
The EU AI Act Changes What a Head of AI Needs to Know

"The last time a Head of AI made a model selection decision without considering regulatory documentation, the gap did not surface until a compliance review six months later asked for evidence that the model's failure modes had been assessed before deployment. The evidence did not exist, because nobody had thought to look for it."

This is the kind of problem the EU AI Act creates for AI leadership. Not a new checklist to complete, but a set of obligations that attach to architectural decisions that were previously treated as purely technical. The Act does not prescribe how to build AI systems. It prescribes what must be demonstrable about the systems that are built, and those requirements reach into decisions that most AI leaders make without regulatory awareness.

The following sections examine where the obligations are established, where the ambiguities lie, and where a Head of AI needs to exercise judgement in territory the regulation has not yet clarified.

GPAI model selection and downstream documentation obligations

This one is straightforward. Articles 51 to 56 establish specific obligations for providers of general-purpose AI (GPAI) models. They must supply technical documentation, a summary of training data, and a copyright compliance policy. When a deployer integrates a GPAI model into a high-risk system, the adequacy of that documentation becomes the deployer's problem.

The practical question for a Head of AI is whether the model provider's technical documentation covers what the deployer needs for conformity assessment. If the system may be classified as high-risk under Annex III, the deployer must demonstrate an understanding of the model's capabilities, limitations, and potential failure modes. For most foundation models today, the provider's documentation covers general capabilities, safety benchmarks, and content policies. It rarely covers the specific failure modes that matter for a particular use case in a particular regulatory context.

Not sure whether your system uses a GPAI model, or what that means for your obligations? The GPAI obligations guide covers the two-tier structure and helps you determine where you sit.

This means model evaluation must include a documentation adequacy assessment alongside the standard cost, quality, and latency analysis. A Head of AI who evaluates models without considering the provider's GPAI documentation is making a selection that may be technically sound but leaves a gap in the conformity evidence chain.
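One way to make that concrete is to treat documentation adequacy as a first-class dimension of the model selection matrix. The sketch below is illustrative only: the criteria names are my own shorthand for what a deployer typically needs, not terms taken from the Act, and `ModelCandidate` and its scores are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ModelCandidate:
    """One row in a model selection matrix: the usual engineering metrics,
    plus whether the provider's GPAI documentation covers each criterion
    the deployer needs for its conformity evidence."""
    name: str
    cost_per_1k_tokens: float
    quality_score: float        # internal benchmark, 0-1
    latency_p95_ms: float
    doc_criteria: dict = field(default_factory=dict)

    def documentation_gaps(self) -> list[str]:
        """Criteria the provider's documentation does not cover."""
        return [k for k, covered in self.doc_criteria.items() if not covered]

# Illustrative criteria a deployer might require before selecting a model.
REQUIRED_DOC_CRITERIA = [
    "capabilities_and_limitations",
    "training_data_summary",
    "known_failure_modes",
    "evaluation_methodology",
]

candidate = ModelCandidate(
    name="example-model-v2",
    cost_per_1k_tokens=0.002,
    quality_score=0.91,
    latency_p95_ms=850,
    doc_criteria={
        "capabilities_and_limitations": True,
        "training_data_summary": True,
        "known_failure_modes": False,   # the typical gap for a specific use case
        "evaluation_methodology": True,
    },
)

print(candidate.documentation_gaps())  # ['known_failure_modes']
```

A gap here does not disqualify the model; it tells the Head of AI what evidence the deployer must produce itself, which belongs in the selection decision alongside cost and quality.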

Persistent memory and the Article 12 logging challenge

Article 12 requires automatic logging for high-risk AI systems that enables the reconstruction of decision chains. The requirement itself is established law. What the Article does not address is how this maps to specific architectural patterns; that is an engineering problem that the regulation leaves to the implementer.

Article 12 only applies to high-risk systems. Unsure whether your system qualifies? The "When Annex III applies" section of the engineering compliance guide walks through the classification criteria.

Persistent memory is where this engineering problem becomes acute. Agentic AI systems increasingly accumulate state across sessions: user preferences, conversation history, contextual knowledge built over time. Each decision the system makes is influenced by this accumulated history.

In a stateless architecture, logging is relatively straightforward: capture the input, the model call, and the output. In a stateful architecture with persistent memory, every decision is influenced by accumulated context that must itself be versioned and made reconstructable. Given a specific output from six months ago, can the full chain be traced: which model version was called, with what accumulated context, what intermediate reasoning occurred, and how the persistent memory influenced the result?
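The key architectural move is to version the accumulated context so every output can be tied back to the exact memory state that influenced it. Below is a minimal sketch, assuming a content-addressed store for memory snapshots; the field names and helper functions are my own, not anything Article 12 prescribes.

```python
import hashlib
import json
import time
import uuid

def snapshot_ref(memory_state: dict) -> str:
    """Content-address the accumulated context so it can be versioned and
    retrieved later; the snapshot itself would be written to durable storage."""
    canonical = json.dumps(memory_state, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()

def log_decision(model_id: str, model_version: str,
                 memory_state: dict, prompt: str, output: str) -> dict:
    """Emit one append-only record tying an output to everything that
    influenced it: model version, memory snapshot, and the prompt."""
    return {
        "event_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "model_id": model_id,
        "model_version": model_version,
        "memory_snapshot": snapshot_ref(memory_state),  # reconstructable context
        "prompt": prompt,
        "output": output,
    }

record = log_decision(
    model_id="example-model",
    model_version="2024-06-01",
    memory_state={"user_pref": "concise", "history_len": 42},
    prompt="Summarise the claim.",
    output="The claim is ...",
)
print(record["memory_snapshot"][:12])
```

Because the snapshot reference is deterministic, an auditor given a six-month-old record can retrieve the exact memory state that shaped the decision, rather than whatever the memory store holds today.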

A Head of AI who implements persistent memory without designing the corresponding logging infrastructure is creating a system that works but cannot explain itself to a regulator. The cost of adding decision reconstruction to an architecture that was not designed for it is substantially higher than building it in from the start. (We built open-source tooling for Article 12 logging to demonstrate the architectural patterns involved.)

The provider/deployer boundary: established law, unresolved application

The Act defines two roles: providers develop or place AI systems on the market (Article 3(3)); deployers use them under their authority (Article 3(4)). Which role a company occupies determines which obligations apply. Provider obligations include conformity assessment, technical documentation, and post-market monitoring. Deployer obligations are lighter: oversight, transparency, record-keeping.

Not sure whether your company is a provider or a deployer? The "Where do you fit?" section of the engineering compliance guide covers three common scenarios: consuming a hosted model via API, fine-tuning an open model, and orchestrating multiple models into a composite system.

The distinction is clear in simple scenarios. It becomes genuinely ambiguous when a company assembles foundation models, custom tools, memory systems, and guardrails into an autonomous workflow using an orchestration framework. The company did not develop the underlying models, but the resulting system's behaviour is substantially determined by the orchestration layer's decisions: which tools to invoke, when to escalate, how to handle failures, what context to carry forward.

Article 25 provides a partial answer: a deployer who makes a "substantial modification" to a high-risk system becomes a provider for purposes of that modification. But what constitutes a substantial modification when the modification is an entire orchestration layer? This is an open question. Regulatory guidance has not clarified it, and the harmonised standards are still being developed.

A Head of AI does not need a definitive answer to this question. What they need is an awareness that the question exists, that it has significant cost implications (provider obligations are materially more expensive than deployer obligations), and that the architectural choices being made today will determine how defensible the company's classification position is when the question is eventually tested.

Risk classification: use case determines category, architecture determines exposure

Annex III defines high-risk AI system categories by intended purpose: systems used in critical infrastructure, education, employment, essential services, law enforcement, migration, and the administration of justice, among others. A system's risk classification depends on what it does, not on how it is built.

Unsure which Annex III category your system falls into, or whether it falls into one at all? The engineering compliance guide lists the five categories most relevant to technology companies and explains the scoring-and-ranking boundary question.

However, architecture determines the practical exposure within a given classification. A system classified as high-risk because it processes employment decisions carries the same Annex III classification whether it is a simple scoring model or a multi-step agentic workflow with tool invocations. But the complexity of satisfying the obligations (logging, human oversight, conformity assessment) scales with architectural complexity.

For agentic systems that invoke external services with real-world side effects, the architecture also determines whether an initially lower-risk system drifts into a higher-risk category. Adding a tool that accesses financial data or personal records can change the system's functional profile. A Head of AI who approves tool integrations without considering the risk classification implications is not violating a specific provision; they are failing to manage regulatory exposure.

Conformity assessment is not model evaluation

For high-risk systems, Article 43 requires conformity assessment before the system can be placed on the market. This is a structured evaluation against harmonised standards demonstrating that the system meets the essential requirements of Chapter III, Section 2.

Unsure whether your system requires conformity assessment? It applies to high-risk systems. The Delve case study illustrates what genuine conformity assessment requires and how it differs from what most companies currently do.

This is distinct from model evaluation. Measuring accuracy, tracking hallucination rates, running A/B tests, and optimising cost per query are engineering evaluation activities. They measure whether a system performs well. Conformity assessment demonstrates that a system meets specific regulatory requirements, with documented methodology and an evidence trail that can withstand scrutiny.

The distinction matters because a system that performs excellently may still fail conformity assessment if the evidence trail is inadequate, the evaluation methodology is not documented and repeatable, or the risk management system does not address the specific risks identified in the system's risk assessment. A Head of AI who treats internal evaluation as equivalent to conformity assessment preparation is creating a gap that becomes expensive to close later.

Cost optimisation creates compliance risk when it breaks audit trails

LLM cost optimisation is a core responsibility for any Head of AI. Model routing, prompt caching, response compression, and tiered inference strategies can reduce costs substantially.

These strategies also create compliance risk when they are designed without considering Article 12's logging requirements. Model routing decisions that swap models mid-chain create a decision path that the audit trail must capture: which model was called, why the routing logic selected it, and what context was passed between them. If the routing logic is not logged, the decision chain cannot be reconstructed.
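The fix is small when designed in: the router records not just which model it chose, but the signal that drove the choice. A minimal sketch, assuming a simple two-tier routing policy; the model names and the word-count heuristic are placeholders, not a recommended routing strategy.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("router")

def route(query: str) -> str:
    """Select a model tier and record why, so the decision chain stays
    reconstructable. The 'why' is what routing logic usually fails to log."""
    complexity = len(query.split())  # placeholder heuristic
    model = "large-model" if complexity > 50 else "small-model"
    log.info("routed query to %s (heuristic=word_count, value=%d)",
             model, complexity)
    return model

chosen = route("Summarise this short policy question.")
```

In production the log line would go to the same append-only audit store as the model calls themselves, so a reconstructed chain shows the routing decision between them.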

Prompt caching strategies that reuse responses across users may create data isolation issues under GDPR, particularly when cached prompts contain personal data or cached responses reflect one user's context applied to another's query.
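The standard mitigation is to scope the cache key to the tenant or user, so identical prompts from different users never share an entry. A hedged sketch; the scoping scheme is an assumption for illustration, not a prescribed pattern.

```python
import hashlib

def cache_key(tenant_id: str, prompt: str) -> str:
    """Include the tenant in the key: identical prompts from different
    tenants must not collide, because a cached response may reflect
    tenant-specific context or personal data."""
    return hashlib.sha256(f"{tenant_id}:{prompt}".encode()).hexdigest()

# Same prompt, different tenants: distinct cache entries, no cross-user reuse.
assert cache_key("tenant-a", "hello") != cache_key("tenant-b", "hello")
```

The trade-off is explicit: per-tenant scoping lowers the cache hit rate, which is exactly the kind of cost-versus-compliance decision that belongs with the Head of AI rather than buried in an optimisation layer.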

None of this means cost optimisation is incompatible with compliance. It means the two must be co-designed. A Head of AI who treats cost optimisation and compliance architecture as separate workstreams will eventually discover they are the same workstream, usually when a compliance review identifies gaps that require rearchitecting the optimisation layer.

What this means for the role

The EU AI Act has made compliance an architectural constraint. Some of these constraints are precisely defined (GPAI documentation obligations, Article 12 logging, conformity assessment requirements). Others are ambiguous and will remain so until regulatory guidance, harmonised standards, and enforcement practice clarify them (the provider/deployer boundary for assembled systems, the threshold for substantial modification).

A Head of AI does not need to be a lawyer. But they need to know which decisions carry regulatory weight, which obligations are established, and where the ambiguities lie. The engineering leader's guide to the EU AI Act provides a comprehensive treatment of how these obligations translate into engineering artefacts.

This is the intersection we built the Fractional Head of AI service around: engineering leadership that accounts for regulatory constraints where they are established, and exercises informed judgement where they are not.