Why UK SaaS Companies Must Redesign Engineering and QA for the EU AI Act
For a comprehensive, structured treatment of these obligations and how they translate into engineering artefacts, see our full guide: Compliance-as-Architecture: An Engineering Leader's Guide to the EU AI Act.
If you are unsure whether the EU AI Act applies to your UK-based company, we have explored that in detail here: Why UK firms need to comply with the EU AI Act.
This article assumes scope has already been established and focuses on something more practical:
If the Act applies to you, your engineering and QA processes must change.
The EU AI Act does not primarily demand new policies. It demands evidence that risk is being managed in how AI systems are built, tested, deployed, and monitored.
That evidence comes from engineering systems, not legal documents.
The Core Shift
Most UK SaaS companies treat compliance as a legal exercise.
"Engineering builds the product. Legal writes policy. QA runs functional tests".
This is not the right approach to remain compliant with the EU AI Act, because it effectively collapses that separation.
If your development process cannot show:
- What risks were identified
- How those risks were tested
- What controls were implemented
- How the system is monitored
- What happens when it fails
...then you do not have compliance, regardless of what your policy documents say.
Your SDLC becomes part of your regulatory evidence.
Step 1: Are You High-Risk?
Before redesigning your engineering workflow, answer a practical question:
Does your AI system materially influence decisions about individuals in the EU?
Examples include:
- Hiring or firing decisions
- Insurance underwriting or claims approval
- Credit scoring
- Access to education
- Access to healthcare
- Risk scoring that affects individuals’ opportunities
Note: This is not an exhaustive list.
If your system influences outcomes of this nature, you are likely dealing with a high-risk AI system under the Act.
If your product generates marketing copy or summarises internal notes, your obligations are lighter. Governance still matters, but the engineering burden is different.
The key takeaway: do not assume.
Map each of your features to its real-world impact.
Step 2: Are You the Builder or the Integrator?
In practical terms, there are two positions you may occupy.
If you build or significantly modify the AI system, the responsibility sits with you.
If you embed someone else’s model into your product without materially changing it, your obligations are narrower, but you remain responsible for how it behaves in your context.
Put more simply:
- If you control training or architecture, you carry full lifecycle responsibility.
- Note that fine-tuning somebody else's model may move you into 'provider' obligation territory (the boundary is not definitive), so it is safest to act as if you carry provider responsibilities when fine-tuning.
- If you call an external API, you must still monitor performance, enforce oversight, and manage risk at the application layer.
This distinction determines how deep the redesign must go.
What Article 9 Means in Engineering Terms
Article 9 requires risk management to be continuous and iterative across the lifecycle of the AI system.
Translated into engineering language, this means you must be able to produce evidence showing:
- What risks were identified before launch
- How those risks were evaluated
- What controls were built into the system
- How the system is monitored in production
- What happens when predefined limits are exceeded
Those outputs must exist as artefacts, not as intentions.
For example:
- A risk register stored in your Git repository
- Evaluation reports attached to model releases
- Monitoring dashboards in production, with the ability to retrieve and export logs for specific model runs
- Incident logs retained and exportable
If a regulator or customer asks how you determined a model was safe to deploy, the answer must be retrievable from your engineering systems.
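As a sketch of what a repository-stored risk artefact might look like, consider the structure below. The schema is purely illustrative (the field names and values are assumptions, not anything prescribed by the Act):

```python
from dataclasses import dataclass, asdict

@dataclass
class RiskEntry:
    """One row of a risk register kept under version control (illustrative schema)."""
    risk_id: str
    description: str
    severity: str          # e.g. "low" / "medium" / "high"
    mitigation: str        # the control built into the system
    test_reference: str    # the evaluation that exercises this risk
    owner: str             # named accountable person

entry = RiskEntry(
    risk_id="RISK-001",
    description="Biased scoring across business sectors",
    severity="high",
    mitigation="Sector-level performance parity check in CI",
    test_reference="tests/eval/test_sector_parity.py",
    owner="ai-risk-owner@example.com",
)

# Serialise to a plain dict so it can be committed as JSON or YAML
# alongside the feature it relates to.
record = asdict(entry)
print(record["risk_id"])
```

Kept as a file in Git, every change to the register is itself versioned and attributable, which is exactly the retrievability the Act implies.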
Redesigning the Development Lifecycle
Here is how an AI feature should look under an AI Act-aligned process.
Note
The level of rigour described below reflects what is expected for high-risk systems.
For lower-risk systems, the same structure may not be legally mandated, but many organisations will still adopt elements of it as good engineering practice, and to pre-empt later risk level change in an evolving system.
Discovery
Define the intended purpose precisely.
Example:
"This model assists underwriting decisions for small business insurance in Germany."Then, identify foreseeable risks.
Examples:
- Biased scoring across sectors
- Incorrect denial of legitimate claims
- Performance drift due to economic shifts
Required output: A documented risk list linked to the feature and stored in your repository.
Design
Decide in advance how risks will be controlled, and actions to take.
Examples:
- Human review required if model confidence falls below 0.65
- Manual override available to underwriters
- Logging of all inputs and outputs
- Maximum tolerated false-positive rate set at 3 percent
Required output: An architecture document showing controls, thresholds, and escalation paths.
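The design-time decisions above can also be captured as a machine-readable control definition that both the application and later CI gates read from. A minimal sketch, where the structure is an assumption and the values simply echo the examples above:

```python
# Risk controls decided at design time, versioned with the code.
RISK_CONTROLS = {
    "human_review_confidence_threshold": 0.65,  # below this, route to a human
    "manual_override_enabled": True,
    "log_all_inputs_outputs": True,
    "max_false_positive_rate": 0.03,            # enforced as a deployment gate
}

def requires_human_review(confidence: float, controls: dict = RISK_CONTROLS) -> bool:
    """Escalate to human review when model confidence falls below the agreed threshold."""
    return confidence < controls["human_review_confidence_threshold"]

print(requires_human_review(0.60))  # True: below 0.65, escalate
print(requires_human_review(0.90))  # False: proceed automatically
```

The benefit of a single source of truth like this is that the threshold in the architecture document, the runtime behaviour, and the CI gate cannot silently diverge.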
Development
Build structured evaluation.
Examples:
- Test against a curated dataset of historical claims
- Measure false positives and false negatives
- Compare performance across sectors
Define deployment thresholds.
For example, deployment is blocked if:
- False positive rate exceeds 3 percent
- Drift exceeds 5 percent from baseline
- Performance disparity between groups exceeds tolerance
Required output: Automated evaluation report generated in CI and attached to the change record.
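A deployment gate of this kind reduces to a small evaluation step run in CI. Everything here is illustrative; the metric and the thresholds restate the examples above:

```python
def false_positive_rate(predictions, labels):
    """FPR = false positives / actual negatives."""
    fp = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 0)
    negatives = sum(1 for y in labels if y == 0)
    return fp / negatives if negatives else 0.0

def deployment_allowed(predictions, labels, drift, *, max_fpr=0.03, max_drift=0.05):
    """Block deployment when any evaluation threshold is breached."""
    return false_positive_rate(predictions, labels) <= max_fpr and drift <= max_drift

# Toy evaluation run: 1 false positive out of 50 negatives -> FPR = 2%
preds  = [1] + [0] * 49
labels = [0] * 50
print(deployment_allowed(preds, labels, drift=0.02))  # True: within tolerance
print(deployment_allowed(preds, labels, drift=0.08))  # False: drift too high
```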
Pre-Deployment
Verify that:
- Thresholds are met
- Oversight triggers function correctly
- Logging is active
- Rollback mechanisms work
Required output: A release record containing:
- Model version
- Dataset version
- Evaluation metrics
- Active risk controls
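A release record like the one listed above can be assembled automatically at deploy time. The field names and example values below are assumptions for illustration:

```python
import json
from datetime import datetime, timezone

def build_release_record(model_version, dataset_version, metrics, controls):
    """Assemble an audit-ready release record, stored alongside the deployment."""
    return {
        "model_version": model_version,
        "dataset_version": dataset_version,
        "evaluation_metrics": metrics,
        "active_risk_controls": controls,
        "released_at": datetime.now(timezone.utc).isoformat(),
    }

record = build_release_record(
    model_version="underwriting-model-1.4.2",
    dataset_version="claims-2025-q3",
    metrics={"false_positive_rate": 0.021, "drift": 0.012},
    controls=["human_review_below_0.65", "manual_override", "full_io_logging"],
)

# Serialised as JSON, the record can be attached to the release artefact.
print(json.dumps(record, indent=2)[:40])
```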
Production
Monitor continuously.
Examples:
- Drift detection
- Alerts if confidence distribution shifts materially
- Alerts if manual override frequency increases
- Alerts tied to complaint or anomaly signals
Define responses in advance.
Examples:
- Automatic rollback
- Escalation to human review
- Temporary suspension of automated decisions
Required output: Live dashboards and structured incident logs.
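One simple way to wire a drift alert to a pre-defined response is to compare the live confidence distribution against the baseline captured at release. The mean-shift check below is a deliberately crude stand-in for a proper statistical test (e.g. PSI or Kolmogorov-Smirnov), and the thresholds are assumptions:

```python
def mean_shift(baseline, live):
    """Relative shift of the live mean from the baseline mean."""
    base_mean = sum(baseline) / len(baseline)
    live_mean = sum(live) / len(live)
    return abs(live_mean - base_mean) / base_mean if base_mean else 0.0

def drift_response(baseline, live, *, alert_at=0.05, suspend_at=0.15):
    """Pre-defined responses: 'ok', 'alert' (escalate to human review),
    'suspend' (halt automated decisions)."""
    shift = mean_shift(baseline, live)
    if shift >= suspend_at:
        return "suspend"
    if shift >= alert_at:
        return "alert"
    return "ok"

baseline = [0.80, 0.82, 0.78, 0.81]  # confidence scores captured at release
print(drift_response(baseline, [0.80, 0.79, 0.81, 0.82]))  # "ok"
print(drift_response(baseline, [0.60, 0.62, 0.61, 0.63]))  # "suspend"
```

The important property is that the response is chosen before the incident, not improvised during it.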
Rethinking QA for AI Systems
Traditional QA asks:
Does the feature work?
AI QA must ask:
- How often is it wrong?
- How wrong is it allowed to be?
- What happens when it is wrong?
There is no binary pass or fail, only tolerances.
Define in advance:
- Maximum tolerated error rates
- Maximum tolerated drift
- Escalation thresholds
- Time-to-human-review expectations
If those numbers are not defined before deployment, you do not have a functioning risk management system.
QA expands beyond code correctness into model evaluation and data quality governance.
CI/CD Becomes Compliance Infrastructure
To repeat an earlier point: this is not a governance-committee problem. Risk management should live in your pipeline.
Concrete integration points:
- Every model change triggers automated evaluation
- Deployment is blocked if thresholds are not met
- Evaluation reports are attached to pull requests
- Model versions are tagged and logged at deployment
- An audit-ready summary is generated automatically on release
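In the pipeline itself, the gate reduces to a script that reads the evaluation report produced earlier in the job and fails the build when thresholds are breached. The report structure and thresholds here are assumptions:

```python
def gate(metrics, max_fpr=0.03, max_drift=0.05):
    """Return a list of threshold breaches; an empty list means deployment may proceed."""
    failures = []
    if metrics["false_positive_rate"] > max_fpr:
        failures.append("false positive rate above limit")
    if metrics["drift"] > max_drift:
        failures.append("drift above limit")
    return failures

# In CI this dict would be loaded from the evaluation report file;
# here it is inlined for illustration.
report = {"false_positive_rate": 0.041, "drift": 0.012}
failures = gate(report)
for msg in failures:
    print("GATE FAILURE:", msg)
# A real pipeline step would then call sys.exit(1) when failures is non-empty,
# which is what actually blocks the merge or deployment.
```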
The CI/CD system becomes part of your regulatory control framework.
Organisational Implications
Tooling and process alone are not sufficient; accountability must also be clear.
You will likely need:
- A named AI risk owner inside engineering
- QA expanded to include model evaluation and data quality
- DevOps responsible for monitoring and logging integrity
- Legal reviewing outputs and documentation, not designing technical controls
If engineering does not own this, the organisation will struggle to demonstrate compliance.
The Cost of Delay
High-risk obligations apply from August 2026.
Redesigning a development process is not a minor change. It requires:
- Defining new thresholds
- Building evaluation infrastructure
- Adjusting release gates
- Training teams
- Embedding new habits
If you wait until customers or regulators request evidence, you will retrofit controls into systems that were not designed for them.
If you design the process now, compliance artefacts become normal outputs of development.
The Competitive Reality
The EU AI Act introduces a new category of engineering maturity.
Companies that:
- Define risk tolerances clearly
- Automate evaluation
- Monitor continuously
- Log consistently
- Design human oversight deliberately
...will ship AI features with confidence and lower regulatory exposure.
Companies that treat compliance as paperwork will accumulate invisible risk inside their systems.
This is a system design undertaking, rather than a legal one.
The question is therefore:
Will your engineering process generate evidence that your AI behaves within defined limits, as required by the EU AI Act?