
You Need Demographic Data to Prove You're Not Biased: The GDPR-AI Act Tension

Article 10 of the EU AI Act requires that training, validation, and testing datasets be examined for possible biases, particularly in relation to protected characteristics. The obligation is clear: if you are building a high-risk AI system, you must be able to demonstrate that you have tested for discriminatory outcomes across demographic groups.

GDPR Article 9 restricts the processing of special category data: racial or ethnic origin, political opinions, religious beliefs, health data, sexual orientation, and biometric data. These restrictions exist for good reason. They also create a direct operational conflict with Article 10.

The practical tension is straightforward. You cannot test for racial bias without processing racial data. You cannot validate that your model treats men and women equitably without knowing which outputs correspond to which group. You cannot stratify results by protected characteristic if you have no protected characteristics in your dataset.

And the risk is bidirectional. If you collect demographic data improperly to satisfy your AI Act bias testing obligations, you may solve your Article 10 problem and create a GDPR Article 9 enforcement problem. Two regulatory regimes, two enforcement bodies, one engineering decision.

This is not a hypothetical conflict for legal scholars to debate. It is a live operational problem that every engineering team deploying high-risk AI in the EU must resolve, today, in their data pipelines and testing frameworks. The choices you make here will determine whether your bias testing is defensible or merely decorative. For broader context on how this fits into the EU AI Act's compliance architecture, see our Compliance-as-Architecture: An Engineering Leader's Guide to the EU AI Act.


The obligation under Article 10

Article 10 of the AI Act imposes specific requirements on the data governance practices of high-risk AI system providers. Training, validation, and testing datasets must be "relevant, sufficiently representative, and to the best extent possible, free of errors and complete." More critically, these datasets must be examined "in view of possible biases that are likely to affect the health and safety of persons, have a negative impact on fundamental rights, or lead to discrimination."

This is not a suggestion. It is a requirement with enforcement behind it.

Bias testing, to be meaningful, requires demographic data. You need to know the demographic composition of your dataset to test whether outcomes differ across groups. If your credit scoring model approves 78% of applications from one ethnic group and 52% from another, that disparity is invisible unless you can stratify by ethnicity. If your recruitment screening tool ranks candidates differently by gender, you will never detect it without gender labels attached to your test data.

Aggregate statistical testing (fairness metrics computed across the entire population) is insufficient if you cannot break results down by protected characteristic. A model can appear fair in aggregate whilst systematically disadvantaging a minority group. Simpson's paradox is not an abstract statistical curiosity; it is a concrete failure mode in bias evaluation.
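To see how aggregate metrics can mislead, consider a toy example with hypothetical approval counts, constructed so the aggregate rates match exactly while every stratum shows a gap:

```python
# Hypothetical approval counts, chosen so that aggregate rates are identical
# (0.60 vs 0.60) while group B fares worse in *every* income stratum.
data = {
    ("A", "high_income"): (90, 100),   # (approved, applicants)
    ("A", "low_income"):  (30, 100),
    ("B", "high_income"): (160, 200),
    ("B", "low_income"):  (20, 100),
}

def rate(approved, total):
    return approved / total

def aggregate_rate(group):
    approved = sum(a for (g, _), (a, n) in data.items() if g == group)
    total = sum(n for (g, _), (a, n) in data.items() if g == group)
    return approved / total

print(aggregate_rate("A"), aggregate_rate("B"))  # 0.6 0.6 -- looks fair
for (g, stratum), (a, n) in data.items():
    # Stratified view: B trails A by ten points in each stratum.
    print(g, stratum, rate(a, n))
```

The stratified rates are 0.90 vs 0.80 and 0.30 vs 0.20: the disparity is only visible once results are broken down by group and stratum, which is precisely what demographic labels make possible.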

The Act expects "appropriate measures" to detect, prevent, and mitigate bias. Regulators will look for evidence that you tested, not merely that you documented a policy stating your commitment to fairness. A policy without corresponding test results is a liability, not a defence.

The restriction under GDPR Article 9

Special category data can only be processed under specific legal bases enumerated in GDPR Article 9(2). There is no general-purpose exemption for "we need this data to check our AI system is not biased."

Explicit consent (Article 9(2)(a)) is the most commonly cited basis. But consent creates its own methodological problem: selection bias. Those who consent to provide demographic data may differ systematically from those who do not. If your bias test is built on a self-selected sample, your test results may not generalise to the full population your system serves. You have traded one problem for another.

Substantial public interest (Article 9(2)(g)) is another potential basis, supported by member state or Union law. Its scope for commercial AI bias testing is largely untested. No court has definitively ruled that a private company's obligation under the AI Act constitutes a substantial public interest sufficient to invoke this basis. It may well qualify, but building your compliance strategy on an untested legal theory is a risk your legal team should quantify, not assume away.

Legitimate interest, the workhorse legal basis for much commercial data processing, does not apply to special category data under Article 9. This is a common misconception. Article 6(1)(f) legitimate interest cannot override Article 9's restrictions.

The UK GDPR mirrors these restrictions in substance. Post-Brexit divergence has introduced some procedural differences, but the fundamental tension between bias testing obligations and special category data restrictions remains identical. If you are deploying in both markets, you face this problem twice, under two enforcement regimes.

Practical approaches for engineering teams

There is no clean solution to this tension. There are, however, approaches that are defensible if implemented rigorously and documented honestly. Each carries limitations. Your job is to choose the approach whose limitations you can best manage, and to document that choice with precision.

Synthetic data for bias testing

Generate synthetic datasets that mirror real-world demographic distributions without containing actual personal data. If the synthetic data is not derived from real individuals in a way that renders them identifiable, it falls outside GDPR's scope entirely. You sidestep the Article 9 restriction by never processing real special category data.

The limitations are significant. Synthetic data reflects the assumptions of its generator. If your generation model has blind spots, or if it smooths over distributional irregularities that exist in the real world, your bias test inherits those blind spots. You may certify your model as fair against a synthetic population that does not accurately represent the real one.

Regulators have not yet issued definitive guidance on whether synthetic data constitutes sufficient evidence of bias testing under Article 10. The prudent position is to use synthetic data as one component of a broader testing strategy, not as the sole basis for your bias assessment. Document the generation methodology, the assumptions embedded in it, and the ways in which your synthetic population may diverge from reality.
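A minimal sketch of the generation step, with invented marginal distributions standing in for calibrated ones. Everything here is an assumption for illustration, including the field names and the independence between fields:

```python
import random

# Illustrative sketch only: the marginal distributions below are invented
# assumptions. In practice they would be calibrated against census or survey
# data, and the independence between fields is itself a limitation to document.
random.seed(42)

DEMOGRAPHIC_ASSUMPTIONS = {
    "gender": {"female": 0.51, "male": 0.49},
    "age_band": {"18-34": 0.30, "35-54": 0.40, "55+": 0.30},
}

def sample_person():
    person = {}
    for field, dist in DEMOGRAPHIC_ASSUMPTIONS.items():
        values, weights = zip(*dist.items())
        person[field] = random.choices(values, weights=weights)[0]
    return person

# No real individuals: records are drawn from documented marginals only.
population = [sample_person() for _ in range(10_000)]
share_female = sum(p["gender"] == "female" for p in population) / len(population)
print(f"synthetic female share: {share_female:.3f}")  # approximates 0.51
```

The dictionary of assumptions is itself part of the documentation: committing it alongside the generation code records exactly which distributional claims your bias test rests on.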

Proxy variable analysis

Detect potential bias through correlated non-protected variables. Postcode correlates with ethnicity in many geographies. Name frequency patterns correlate with national origin. Certain purchasing behaviours correlate with age or gender.

Proxy analysis does not require processing special category data directly. You analyse variables that are themselves non-sensitive, looking for patterns that suggest disparate treatment of groups you have not explicitly identified.

The limitations are real. Proxy analysis is inherently imprecise. It can over-detect bias (flagging disparities driven by legitimate factors that happen to correlate with a protected characteristic) and under-detect bias (missing disparities in groups whose proxy signals are weak or confounded). It is a useful screening tool, not a definitive assessment.

The advantage is practical: you can run proxy analysis on your existing data without triggering Article 9 restrictions or requiring a new consent programme. Combine it with other approaches for a more complete picture.

Consent-based collection

Collect demographic data specifically for bias evaluation, with explicit informed consent under Article 9(2)(a). This is the most direct approach and, in some respects, the most methodologically sound; you are testing with real demographic data from real users.

The consent must be genuinely voluntary. No service degradation for non-participation. No dark patterns nudging users towards consent. The consent mechanism must clearly explain that demographic data will be used solely for bias evaluation, how long it will be retained, and how it will be protected.

Selection bias remains the central risk. Document this limitation explicitly. Report both the demographics of your consenting population and, where possible, estimate how it may differ from your full user base. A bias test conducted on a non-representative sample is still informative; but only if you acknowledge the representativeness gap in your documentation.
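One way to make the representativeness gap concrete is to compare the consenting sample's demographic shares against an external benchmark and disclose the distance alongside the bias results. All figures below are invented for illustration:

```python
# Sketch: quantify how far the consenting sample drifts from a reference
# benchmark and report that distance alongside the bias results themselves.
# All figures are invented; the benchmark would come from census data or,
# where lawful, the full user base.
benchmark = {"18-34": 0.30, "35-54": 0.40, "55+": 0.30}
consented = {"18-34": 0.45, "35-54": 0.38, "55+": 0.17}

gaps = {band: round(consented[band] - benchmark[band], 2) for band in benchmark}
tvd = 0.5 * sum(abs(g) for g in gaps.values())  # total variation distance

print(gaps)               # {'18-34': 0.15, '35-54': -0.02, '55+': -0.13}
print(f"TVD: {tvd:.2f}")  # TVD: 0.15 -- disclose this with the test report
```

A sample that over-represents younger users by fifteen points, as in this sketch, does not invalidate the bias test, but the number belongs in the same report as the fairness metrics.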

This approach also creates a data retention tension. You must retain enough demographic data for meaningful bias testing whilst minimising data storage under GDPR's data minimisation principle. Define clear retention periods, enforce them technically, and document the rationale.

Anonymisation and aggregation

Process demographic data only in anonymised, aggregated form. If data is truly anonymised, meaning it cannot be re-identified even by the data controller or in combination with other datasets, it falls outside GDPR's scope.

The threshold for true anonymisation under GDPR is high. Pseudonymisation (replacing names with codes, for example) does not qualify. You must demonstrate that re-identification is not reasonably likely by any means. Techniques such as k-anonymity, l-diversity, and differential privacy can help achieve this threshold, but each introduces its own trade-offs in data utility.
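As one gate before releasing an extract for bias testing, a k-anonymity check over quasi-identifiers can be sketched as follows. The field names and k=5 are assumptions, and passing this check alone does not establish GDPR-grade anonymisation:

```python
from collections import Counter

# Minimal k-anonymity check over quasi-identifiers before releasing an
# extract for bias testing. Field names and k=5 are assumptions, and passing
# this check alone does not establish GDPR-grade anonymisation.
def k_anonymous(records, quasi_identifiers, k=5):
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values()) >= k

records = [
    {"age_band": "35-54", "region": "north", "outcome": "approved"},
] * 6 + [
    {"age_band": "18-34", "region": "south", "outcome": "denied"},
]

# The single 18-34/south record is unique on its quasi-identifiers,
# so the extract fails the check.
print(k_anonymous(records, ["age_band", "region"], k=5))  # False
```

The usual remedies (coarsening bands, suppressing rare combinations) are exactly the operations that erode the granularity of the bias test, which is the trade-off the next paragraph describes.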

Aggregation reduces the granularity of your bias testing. You may be able to determine that your model treats broad demographic groups equitably, but you may miss intersectional bias affecting smaller subgroups. A model that is fair across gender and fair across ethnicity can still be unfair for a specific combination of gender and ethnicity. Aggregation can mask precisely the biases you are trying to find.

Documentation is your defence

Whatever approach you choose (and you will likely combine several), document everything. The methodology. The reasoning for choosing it over alternatives. The limitations you have identified. The residual risk you have accepted.

This documentation must be specific and technical. "We conducted bias testing using synthetic data" is insufficient. "We generated synthetic test populations using [method], calibrated against [source], with the following known distributional assumptions: [list]. We tested for disparate impact across [characteristics] using [metrics]. Results: [summary]. Known limitations: [list]" is closer to defensible.
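One way to keep such a record specific and contemporaneous is a machine-readable artefact committed to version control next to the test code. Every value below is illustrative; the point is the level of specificity:

```python
import json
from datetime import date

# Sketch of a machine-readable bias-test record kept in version control next
# to the test code. Every value below is illustrative; the point is the level
# of specificity: method, calibration source, assumptions, metrics, limits.
bias_test_record = {
    "date": date(2025, 1, 15).isoformat(),
    "system": "credit-scoring-v3",  # hypothetical system identifier
    "methodology": "synthetic population + proxy screening",
    "calibration_source": "national census marginals (referenced separately)",
    "distributional_assumptions": [
        "demographic fields sampled independently (no modelled correlations)",
    ],
    "characteristics_tested": ["gender", "age_band"],
    "metrics": ["demographic parity difference", "equalised odds gap"],
    "results_summary": "max parity gap 0.04 across tested characteristics",
    "known_limitations": [
        "synthetic population may under-represent intersectional subgroups",
        "no validation against real demographic data",
    ],
}

print(json.dumps(bias_test_record, indent=2))
```

Because the record lives in the same repository as the tests, its history is traceable commit by commit, which matters for the credibility point below.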

This is not a legal question you can delegate entirely to counsel. The choice of bias testing methodology is an engineering decision with legal consequences. Your lawyers can advise on which legal bases are available. They cannot tell you whether your synthetic data generation adequately captures the demographic distributions in your production population. That is your responsibility.

As discussed in our pillar guide on compliance architecture, documentation that lives in your engineering workflow (version-controlled, traceable, and contemporaneous) is fundamentally more credible than documentation produced after the fact in response to a regulatory enquiry.

The key is this: demonstrate that you have thought rigorously about the tension between Articles 9 and 10, evaluated the available approaches against your specific system and context, and made a defensible choice. There is no "right" answer to this tension, because no clean answer exists yet. What regulators will look for is evidence of rigorous, honest analysis, not a pretence that the problem is solved.

The regulatory horizon

The European Data Protection Board has not issued definitive guidance on the specific tension between GDPR Article 9 and AI Act Article 10. Various national data protection authorities are beginning to address it, but positions vary across member states. The French CNIL has been more permissive in its interpretation of bias testing as a legitimate processing purpose; other DPAs have been more cautious.

The AI Act's implementing and delegated acts, expected over the coming years, may clarify acceptable approaches to bias testing under data protection constraints. Harmonised standards referenced by the Act may include specific provisions. But "may" and "expected" are not a compliance strategy.

Do not wait for perfect regulatory clarity. The enforcement timeline is already moving. Build a defensible approach now, document it thoroughly, and be prepared to adapt as guidance crystallises. Ongoing post-market monitoring must include periodic reassessment of your bias testing methodology as both the regulatory landscape and your model's real-world behaviour evolve.

The tension between data protection and bias testing is not a flaw in the regulatory framework. It is a reflection of a genuine conflict between two legitimate objectives: protecting individuals from discriminatory AI systems, and protecting individuals from intrusive processing of their most sensitive data. Both matter. Neither can be dismissed.

Engineering teams that treat this as a checkbox exercise (picking an approach, documenting it once, and moving on) will find themselves exposed when regulators begin examining bias testing methodologies in detail. The teams that will be best positioned are those that engage with the tension honestly, implement multiple complementary approaches, document their limitations transparently, and revisit their methodology as the regulatory and technical landscape evolves.

Systima works with engineering teams navigating precisely this intersection, where data protection constraints collide with AI governance obligations and the answer is not in any single regulation. If your team is building the bias testing framework for a high-risk AI system and needs to get the engineering decisions right, we can help.