Infographic on 'Ethics in AI' highlighting various types of biases including historical, representation, measurement, and evaluation biases, along with the importance of designing fairness in AI systems. Features a processing node for AI and labels for input and output stages.

Bias and Fairness

Here’s the uncomfortable truth about AI bias. It rarely announces itself. There’s no error message, no warning light, no moment where the system admits it has a problem. The model just does what it was trained to do — faithfully, at scale, and often with complete confidence.

Because bias in AI isn’t primarily about algorithmic malfunction. It’s about inheritance. AI systems learn from data that reflects the world as it was, not necessarily the world as it should be. And if that data carries the fingerprints of historical inequality, underrepresentation, or flawed measurement — the model doesn’t question it. It learns it.

The refinery analogy holds here too, and it holds uncomfortably well. Contaminated feedstock doesn’t trigger an alarm at the intake valve. It moves through the system, gets processed with everything else, and comes out the other end as contaminated output — refined, packaged, and delivered with the same confidence as everything around it. The pipeline worked perfectly. That’s the problem.

So where does the contamination get in? There are four entry points worth understanding.


Historical Bias is perhaps the most insidious. It occurs when training data reflects pre-existing inequality — when past decisions were themselves discriminatory, and those decisions become the ground truth a model learns from. A hiring algorithm trained on a decade of appointment records from a male-dominated industry doesn’t invent a preference for male candidates. It learns one from the data. The discrimination was already there. The model just scaled it.

Representation Bias arises when certain groups are underrepresented in training data. A facial recognition system trained predominantly on lighter-skinned faces will tend to perform less accurately on darker-skinned ones, not because of any deliberate design choice, but because the training set didn’t reflect the full diversity of the population it would eventually serve. Gaps in the data become gaps in performance. And those gaps tend to fall hardest on the people already least served by existing systems.

Measurement Bias is subtler still. It emerges when the features used to train a model capture some populations less accurately than others. Predictive health models built on clinical data may underperform for groups historically less likely to access formal healthcare — not because the model is poorly designed, but because the measurement itself was uneven at source. The data records what was measured. It can’t record what wasn’t.

Evaluation Bias occurs when the test data used to validate a model doesn’t reflect the diversity of the real world it will operate in. A model can pass every benchmark with flying colours and still fail badly for populations underrepresented in the evaluation set. If you don’t test for it, you won’t find it until it’s already causing harm in deployment.

Four different entry points. The same result: a model that has learned the wrong lessons, at scale, and has no idea.
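The evaluation point is easy to see with numbers. What follows is a minimal sketch on made-up data, not a real benchmark: the function name, the group labels "A" and "B", and every record are illustrative assumptions. It shows how a single headline accuracy figure can hide a group the model gets entirely wrong.

```python
from collections import defaultdict

def accuracy_by_group(y_true, y_pred, groups):
    """Return (overall accuracy, {group: accuracy}) for parallel lists."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    overall = sum(correct.values()) / sum(total.values())
    per_group = {g: correct[g] / total[g] for g in total}
    return overall, per_group

# Hypothetical evaluation set: 18 records from group "A", only 2 from group "B".
groups = ["A"] * 18 + ["B"] * 2
y_true = [1, 0] * 9 + [1, 0]
y_pred = [1, 0] * 9 + [0, 1]  # right on every "A" record, wrong on every "B" record

overall, per_group = accuracy_by_group(y_true, y_pred, groups)
print(overall)    # 0.9
print(per_group)  # {'A': 1.0, 'B': 0.0}
```

Ninety percent accuracy looks like a pass. Broken down by group, it is a total failure for the smaller one. Reporting metrics per group is the simplest way to make that visible before deployment.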


Consider a real example. Starting in 2014, Amazon built an experimental CV screening tool to help filter job applicants. According to a 2018 Reuters report, it was trained on a decade of CVs submitted to the company, most of which came from men. The data reflected a male-dominated industry. The model didn’t know that. It just learned what a successful candidate looked like based on the evidence it was given. It began penalising CVs that included the word “women’s” (as in women’s chess club or women’s rugby team) and downgrading graduates of two all-women’s colleges. The model wasn’t told to discriminate against women. It learned to. From data that already did. Amazon had abandoned the project by the time the story became public in 2018. But the lesson endures: a model trained on biased history will faithfully reproduce that history, at scale, until someone intervenes.

Which brings us to fairness — and here’s where it gets genuinely hard.

Fairness sounds like a simple goal. It isn’t. Multiple formal definitions of fairness exist — statistical parity, equalised error rates, individual fairness, counterfactual fairness — and they are mathematically distinct. More importantly, they frequently conflict. Satisfying one fairness criterion can, and often does, violate another.

A model that achieves equal error rates across demographic groups may not achieve equal outcomes. A model that achieves equal outcomes may not treat similar individuals similarly. Except in special cases, such as when the groups have identical base rates, no model can satisfy all of these criteria at once. That means choosing one is not a technical decision. It’s a values decision.
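A toy calculation makes the conflict concrete. This is a sketch under stated assumptions: the two groups, their base rates (6 in 10 versus 2 in 10 truly positive) and both sets of predictions are invented for illustration. It compares the selection rate (the quantity statistical parity cares about) with true and false positive rates (the quantities equalised error rates care about).

```python
def rates(y_true, y_pred):
    """Selection rate, true positive rate and false positive rate for one group."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    positives = sum(y_true)
    negatives = len(y_true) - positives
    return {
        "selection": (tp + fp) / len(y_true),
        "tpr": tp / positives,
        "fpr": fp / negatives,
    }

# Hypothetical groups with different base rates.
truth_a = [1] * 6 + [0] * 4   # 6 of 10 truly positive
truth_b = [1] * 2 + [0] * 8   # 2 of 10 truly positive

# Classifier 1: predicts every label correctly.
perfect_a = rates(truth_a, truth_a)
perfect_b = rates(truth_b, truth_b)
# Error rates match (tpr 1.0, fpr 0.0 in both groups),
# but selection rates are 0.6 versus 0.2: statistical parity fails.

# Classifier 2: forced to select 4 of 10 in each group.
pred_a = [1] * 4 + [0] * 6    # misses 2 true positives in group A
pred_b = [1] * 4 + [0] * 6    # adds 2 false positives in group B
parity_a = rates(truth_a, pred_a)
parity_b = rates(truth_b, pred_b)
# Selection rates now match (0.4 and 0.4),
# but group A loses true positives and group B gains false positives.

for name, result in [("perfect A", perfect_a), ("perfect B", perfect_b),
                     ("parity A", parity_a), ("parity B", parity_b)]:
    print(name, {k: round(v, 2) for k, v in result.items()})
```

Neither classifier is broken. Each satisfies one criterion and violates the other, and the only way to pick between them is to decide which kind of unfairness matters more in context.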

This is the point most AI ethics conversations skirt around. Technical mitigations exist — reweighting training data, adjusting loss functions, post-processing predictions. They matter. But they cannot resolve the underlying question of what fairness actually means in a given context, for a given population, with given consequences. That question requires human judgement, institutional accountability, and explicit deliberation — not optimisation.
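To give a flavour of what reweighting training data can look like, here is a minimal sketch of one well-known scheme, often called reweighing. Each record is weighted by P(group) × P(label) / P(group, label), so that group and label look statistically independent in the weighted data. The function names and the tiny hiring dataset are hypothetical, and this is one possible approach, not the only one.

```python
from collections import Counter

def reweighing_weights(groups, labels):
    """Weight each record by P(group) * P(label) / P(group, label)."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        group_counts[g] * label_counts[y] / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]

def weighted_positive_rate(group, groups, labels, weights):
    """Share of a group's total weight that sits on positive labels."""
    total = sum(w for g, w in zip(groups, weights) if g == group)
    positive = sum(
        w for g, y, w in zip(groups, labels, weights) if g == group and y == 1
    )
    return positive / total

# Hypothetical hiring history: 4 of 6 men appointed, 1 of 4 women appointed.
groups = ["m"] * 6 + ["f"] * 4
labels = [1, 1, 1, 1, 0, 0] + [1, 0, 0, 0]

weights = reweighing_weights(groups, labels)
rate_m = weighted_positive_rate("m", groups, labels, weights)
rate_f = weighted_positive_rate("f", groups, labels, weights)
print(round(rate_m, 3), round(rate_f, 3))  # 0.5 0.5
```

Weights like these can be supplied to training routines that accept per-sample weights. Notice what the sketch already assumes, though: that equal positive rates across groups is the right target. The arithmetic is easy. Choosing the target is the values decision.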

And without that deliberate specification, optimisation processes will do exactly what they were designed to do: maximise predictive accuracy. Not fairness. Accuracy.

Fairness has to be designed in. Because a system left to optimise on its own will not choose it.


Next: Part 3 — Privacy and Consent.
