3.2 Discrimination Through Data
Module 3: Why Privacy Matters — The Real Costs
Explores how data enables algorithmic discrimination in hiring, insurance, pricing, and criminal justice — including the role of proxy variables and connections to civil rights law.
Data does not discriminate on its own, but the systems built on it can, and often do. The challenge is that modern algorithmic discrimination is frequently invisible, deniable, and scalable in ways that human bias never was.
Algorithmic hiring bias
In 2018, Reuters reported that Amazon had scrapped an internal AI recruiting tool after discovering that it systematically downgraded resumes from women. The system had been trained on resumes submitted to the company over a 10-year period, most of which came from men, reflecting the male-dominated tech industry. The model learned that male-associated signals predicted selection: it favoured verbs such as 'executed' and 'captured', which appeared more often on male engineers' resumes, and penalised resumes that mentioned women's colleges or contained the word 'women's'. According to the report, engineers edited the model to ignore those specific terms but could not guarantee that it would not find other ways to discriminate, and the project was abandoned. The lesson: a model trained on biased historical data will replicate and automate that bias.
Insurance redlining and differential pricing
Traditional redlining, the practice of denying services to residents of minority neighbourhoods by drawing red lines on maps, was outlawed by the US Fair Housing Act of 1968. But ZIP codes and postcodes remain powerful proxies for race and income. When insurance companies set premiums based on local claims data, they can produce racially disparate outcomes without a single race-based variable in the model. A 2017 investigation by ProPublica and Consumer Reports, covering four US states, found that some major insurers charged premiums that were on average as much as 30% higher in predominantly minority ZIP codes than in whiter ZIP codes with similar accident costs.
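The mechanism described above can be illustrated with a minimal sketch. Everything in it is hypothetical: the ZIP codes, the premium table, the group labels, and the resident counts are invented for illustration and do not come from any insurer or from the ProPublica data. The point is only that a pricing rule which never sees group membership can still produce different average prices by group when residence is segregated.

```python
from statistics import mean

# Hypothetical premium table keyed only by ZIP code.
# The pricing rule has no access to race or any other protected attribute.
PREMIUM_BY_ZIP = {"11111": 1200.0, "22222": 1000.0}


def quote(zip_code):
    """Return the annual premium for a ZIP code (the only pricing input)."""
    return PREMIUM_BY_ZIP[zip_code]


# Hypothetical residents as (zip_code, group) pairs. The group label is used
# only for the audit below; it is never passed to quote().
residents = (
    [("11111", "group_a")] * 80 + [("11111", "group_b")] * 20
    + [("22222", "group_a")] * 20 + [("22222", "group_b")] * 80
)


def average_premium_by_group(people):
    """Audit step: average the quoted premium within each group."""
    quotes = {}
    for zip_code, group in people:
        quotes.setdefault(group, []).append(quote(zip_code))
    return {group: mean(values) for group, values in quotes.items()}


averages = average_premium_by_group(residents)
for group, value in sorted(averages.items()):
    print(group, value)
```

In this toy setup, group_a pays more on average simply because more of its members live in the higher-priced ZIP code. Whether such a gap is justified depends on whether the price difference tracks actual risk, which is exactly the question the investigation examined.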
Predictive policing and criminal risk scores
ProPublica's landmark 2016 investigation 'Machine Bias' examined COMPAS, a risk-assessment algorithm used in US courts to predict the likelihood of re-offending. The investigation found that, among defendants who did not go on to re-offend, Black defendants were nearly twice as likely as white defendants to have been flagged as high-risk (a false positive), while white defendants who did re-offend were more likely to have been labelled low-risk (a false negative). The algorithm did not use race as an input, but the data it was trained on reflected racially skewed patterns of policing and prosecution.
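The comparison described above rests on two error rates computed separately for each group: the false positive rate (flagged high-risk but did not re-offend) and the false negative rate (labelled low-risk but did re-offend). The following is a minimal sketch of that calculation, not ProPublica's actual analysis code. The function name, group labels, and toy records are all hypothetical.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Compute false positive and false negative rates per group.

    records: iterable of (group, flagged_high_risk, reoffended) tuples.
    Returns {group: {"fpr": float or None, "fnr": float or None}}.
    """
    counts = defaultdict(lambda: {"fp": 0, "tn": 0, "fn": 0, "tp": 0})
    for group, flagged, reoffended in records:
        if reoffended:
            counts[group]["tp" if flagged else "fn"] += 1
        else:
            counts[group]["fp" if flagged else "tn"] += 1

    rates = {}
    for group, c in counts.items():
        did_not_reoffend = c["fp"] + c["tn"]
        did_reoffend = c["fn"] + c["tp"]
        rates[group] = {
            # Share of non-re-offenders who were wrongly flagged high-risk.
            "fpr": c["fp"] / did_not_reoffend if did_not_reoffend else None,
            # Share of re-offenders who were wrongly labelled low-risk.
            "fnr": c["fn"] / did_reoffend if did_reoffend else None,
        }
    return rates


# Hypothetical toy records, invented for illustration only.
toy_records = (
    [("group_a", True, False)] * 4 + [("group_a", False, False)] * 6
    + [("group_a", True, True)] * 7 + [("group_a", False, True)] * 3
    + [("group_b", True, False)] * 2 + [("group_b", False, False)] * 8
    + [("group_b", True, True)] * 5 + [("group_b", False, True)] * 5
)

rates = error_rates_by_group(toy_records)
for group in sorted(rates):
    print(group, rates[group])
```

In the toy data, group_a has twice the false positive rate of group_b while group_b has the higher false negative rate, mirroring the shape of the disparity the investigation reported. A single overall accuracy figure would hide this, which is why error rates need to be broken down by group.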
Why 'neutral' data is not neutral
Shopping habits, ZIP codes, credit scores, browsing behaviour: none of these explicitly mention race, sex, or religion. But when the patterns in that data correlate with protected characteristics (because of historical discrimination, residential segregation, or economic inequality), they function as proxies. Civil rights law in many jurisdictions addresses this through the concept of disparate impact: a practice can be unlawful if it disproportionately harms a protected group without adequate justification, even where there is no discriminatory intent.
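One common way to screen for disparate impact is to compare selection rates across groups. The sketch below divides each group's selection rate by the highest group's rate and flags ratios under 0.8, a rule of thumb drawn from US employment selection guidelines (often called the four-fifths rule). It is a screening heuristic, not a legal test that applies in every jurisdiction or context. The function names, group labels, and outcome data are hypothetical.

```python
FOUR_FIFTHS = 0.8  # rule-of-thumb threshold, used here as an assumption


def selection_rates(outcomes):
    """outcomes: iterable of (group, was_selected) pairs -> {group: rate}."""
    totals = {}
    selected = {}
    for group, was_selected in outcomes:
        totals[group] = totals.get(group, 0) + 1
        selected[group] = selected.get(group, 0) + int(was_selected)
    return {group: selected[group] / totals[group] for group in totals}


def impact_ratios(outcomes):
    """Ratio of each group's selection rate to the highest group's rate."""
    rates = selection_rates(outcomes)
    best = max(rates.values())
    if best == 0:
        # Nobody was selected, so no ratio can be computed.
        return {group: None for group in rates}
    return {group: rate / best for group, rate in rates.items()}


# Hypothetical outcomes of an automated screening step.
toy_outcomes = (
    [("group_a", True)] * 50 + [("group_a", False)] * 50
    + [("group_b", True)] * 30 + [("group_b", False)] * 70
)

ratios = impact_ratios(toy_outcomes)
flagged = [g for g, r in ratios.items() if r is not None and r < FOUR_FIFTHS]
print(ratios)
print("Below four-fifths threshold:", flagged)
```

Note that the check needs only outcomes and group membership, not the model's inputs. That is why disparate impact can be detected even when the system uses no protected attribute directly and relies entirely on proxies.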
Your takeaway
Algorithmic discrimination is a civil rights issue, not merely a technical one. Understanding how proxy variables work helps you recognise it — and supports the case for algorithmic transparency and accountability in high-stakes decisions.