Why UK SaaS Companies Must Redesign Engineering and QA for the EU AI Act
For a comprehensive, structured treatment of these obligations and how they translate into engineering artefacts, see our full guide: Compliance-as-Architecture: An Engineering Leader's Guide to the EU AI Act.
If you are unsure whether the EU AI Act applies to your UK-based company, we have explored that in detail here: Why UK firms need to comply with the EU AI Act.
This article assumes scope has already been established and focuses on something more practical:
If the Act applies to you, your engineering and QA processes must change.
The EU AI Act does not primarily demand new policies. It demands evidence that risk is being managed in how AI systems are built, tested, deployed, and monitored.
That evidence comes from engineering systems, not legal documents.
The Core Shift
Most UK SaaS companies treat compliance as a legal exercise.
"Engineering builds the product. Legal writes policy. QA runs functional tests".
This is not the right approach to remain compliant with the EU AI Act, because it effectively collapses that separation.
If your development process cannot show:
- What risks were identified
- How those risks were tested
- What controls were implemented
- How the system is monitored
- What happens when it fails
...then you do not have compliance, regardless of what your policy documents say.
Your SDLC becomes part of your regulatory evidence.
Step 1: Are You High-Risk?
Before redesigning your engineering workflow, answer a practical question:
Does your AI system materially influence decisions about individuals in the EU?
Examples include:
- Hiring or firing decisions
- Insurance underwriting or claims approval
- Credit scoring
- Access to education
- Access to healthcare
- Risk scoring that affects individuals’ opportunities
Note: This is not an exhaustive list.
If your system influences outcomes of this nature, you are likely dealing with a high-risk AI system under the Act.
If your product generates marketing copy or summarises internal notes, your obligations are lighter. Governance still matters, but the engineering burden is different.
The key takeaway: do not assume.
Map each of your features to its real-world impact.
Step 2: Are You the Builder or the Integrator?
In practical terms, there are two positions you may occupy.
If you build or significantly modify the AI system, the responsibility sits with you.
If you embed someone else’s model into your product without materially changing it, your obligations are narrower, but you remain responsible for how it behaves in your context.
Put more simply:
- If you control training or architecture, you carry full lifecycle responsibility.
- Note that fine-tuning somebody else's model may move you into 'provider' obligation territory (the boundary is not definitive), so it is safest to act as if you carry provider responsibilities when fine-tuning.
- If you call an external API, you must still monitor performance, enforce oversight, and manage risk at the application layer.
This distinction determines how deep the redesign must go.
What Article 9 Means in Engineering Terms
Article 9 requires risk management to be continuous and iterative across the lifecycle of the AI system.
Translated into engineering language, this means you must be able to produce evidence showing:
- What risks were identified before launch
- How those risks were evaluated
- What controls were built into the system
- How the system is monitored in production
- What happens when predefined limits are exceeded
Those outputs must exist as artefacts, not as intentions.
For example:
- A risk register stored in your Git repository
- Evaluation reports attached to model releases
- Monitoring dashboards in production, with the ability to retrieve and export logs for specific model runs
- Incident logs retained and exportable
If a regulator or customer asks how you determined a model was safe to deploy, the answer must be retrievable from your engineering systems.
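As a sketch of what a repository-stored risk artefact might look like, consider the structure below. The schema is purely illustrative (the field names and values are assumptions, not anything prescribed by the Act):

```python
from dataclasses import dataclass, asdict

@dataclass
class RiskEntry:
    """One row of a risk register kept under version control (illustrative schema)."""
    risk_id: str
    description: str
    severity: str          # e.g. "low" / "medium" / "high"
    mitigation: str        # the control built into the system
    test_reference: str    # the evaluation that exercises this risk
    owner: str             # named accountable person

entry = RiskEntry(
    risk_id="RISK-001",
    description="Biased scoring across business sectors",
    severity="high",
    mitigation="Sector-level performance parity check in CI",
    test_reference="tests/eval/test_sector_parity.py",
    owner="ai-risk-owner@example.com",
)

# Serialise to a plain dict so it can be committed as JSON or YAML
# alongside the feature it relates to.
record = asdict(entry)
print(record["risk_id"])
```

Kept as a file in Git, every change to the register is itself versioned and attributable, which is exactly the retrievability the Act implies.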
Redesigning the Development Lifecycle
Here is how an AI feature should look under an AI Act-aligned process.
Note
The level of rigour described below reflects what is expected for high-risk systems.
For lower-risk systems, the same structure may not be legally mandated, but many organisations will still adopt elements of it as good engineering practice, and to pre-empt later risk level change in an evolving system.
Discovery
Define the intended purpose precisely.
Example:
"This model assists underwriting decisions for small business insurance in Germany."Then, identify foreseeable risks.
Examples:
- Biased scoring across sectors
- Incorrect denial of legitimate claims
- Performance drift due to economic shifts
Required output: A documented risk list linked to the feature and stored in your repository.
Design
Decide in advance how risks will be controlled, and actions to take.
Examples:
- Human review required if model confidence falls below 0.65
- Manual override available to underwriters
- Logging of all inputs and outputs
- Maximum tolerated false-positive rate set at 3 percent
Required output: An architecture document showing controls, thresholds, and escalation paths.
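The design-time decisions above can also be captured as a machine-readable control definition that both the application and later CI gates read from. A minimal sketch, where the structure is an assumption and the values simply echo the examples above:

```python
# Risk controls decided at design time, versioned with the code.
RISK_CONTROLS = {
    "human_review_confidence_threshold": 0.65,  # below this, route to a human
    "manual_override_enabled": True,
    "log_all_inputs_outputs": True,
    "max_false_positive_rate": 0.03,            # enforced as a deployment gate
}

def requires_human_review(confidence: float, controls: dict = RISK_CONTROLS) -> bool:
    """Escalate to human review when model confidence falls below the agreed threshold."""
    return confidence < controls["human_review_confidence_threshold"]

print(requires_human_review(0.60))  # True: below 0.65, escalate
print(requires_human_review(0.90))  # False: proceed automatically
```

The benefit of a single source of truth like this is that the threshold in the architecture document, the runtime behaviour, and the CI gate cannot silently diverge.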
Development
Build structured evaluation.
Examples:
- Test against a curated dataset of historical claims
- Measure false positives and false negatives
- Compare performance across sectors
Define deployment thresholds.
For example, deployment is blocked if:
- False positive rate exceeds 3 percent
- Drift exceeds 5 percent from baseline
- Performance disparity between groups exceeds tolerance
Required output: Automated evaluation report generated in CI and attached to the change record.
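A deployment gate of this kind reduces to a small evaluation step run in CI. Everything here is illustrative; the metric and the thresholds restate the examples above:

```python
def false_positive_rate(predictions, labels):
    """FPR = false positives / actual negatives."""
    fp = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 0)
    negatives = sum(1 for y in labels if y == 0)
    return fp / negatives if negatives else 0.0

def deployment_allowed(predictions, labels, drift, *, max_fpr=0.03, max_drift=0.05):
    """Block deployment when any evaluation threshold is breached."""
    return false_positive_rate(predictions, labels) <= max_fpr and drift <= max_drift

# Toy evaluation run: 1 false positive out of 50 negatives -> FPR = 2%
preds  = [1] + [0] * 49
labels = [0] * 50
print(deployment_allowed(preds, labels, drift=0.02))  # True: within tolerance
print(deployment_allowed(preds, labels, drift=0.08))  # False: drift too high
```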
Pre-Deployment
Verify that:
- Thresholds are met
- Oversight triggers function correctly
- Logging is active
- Rollback mechanisms work
Required output: A release record containing:
- Model version
- Dataset version
- Evaluation metrics
- Active risk controls
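A release record like the one listed above can be assembled automatically at deploy time. The field names and example values below are assumptions for illustration:

```python
import json
from datetime import datetime, timezone

def build_release_record(model_version, dataset_version, metrics, controls):
    """Assemble an audit-ready release record, stored alongside the deployment."""
    return {
        "model_version": model_version,
        "dataset_version": dataset_version,
        "evaluation_metrics": metrics,
        "active_risk_controls": controls,
        "released_at": datetime.now(timezone.utc).isoformat(),
    }

record = build_release_record(
    model_version="underwriting-model-1.4.2",
    dataset_version="claims-2025-q3",
    metrics={"false_positive_rate": 0.021, "drift": 0.012},
    controls=["human_review_below_0.65", "manual_override", "full_io_logging"],
)

# Serialised as JSON, the record can be attached to the release artefact.
print(json.dumps(record, indent=2)[:40])
```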
Production
Monitor continuously.
Examples:
- Drift detection
- Alerts if confidence distribution shifts materially
- Alerts if manual override frequency increases
- Alerts tied to complaint or anomaly signals
Define responses in advance.
Examples:
- Automatic rollback
- Escalation to human review
- Temporary suspension of automated decisions
Required output: Live dashboards and structured incident logs.
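One simple way to wire a drift alert to a pre-defined response is to compare the live confidence distribution against the baseline captured at release. The mean-shift check below is a deliberately crude stand-in for a proper statistical test (e.g. PSI or Kolmogorov-Smirnov), and the thresholds are assumptions:

```python
def mean_shift(baseline, live):
    """Relative shift of the live mean from the baseline mean."""
    base_mean = sum(baseline) / len(baseline)
    live_mean = sum(live) / len(live)
    return abs(live_mean - base_mean) / base_mean if base_mean else 0.0

def drift_response(baseline, live, *, alert_at=0.05, suspend_at=0.15):
    """Pre-defined responses: 'ok', 'alert' (escalate to human review),
    'suspend' (halt automated decisions)."""
    shift = mean_shift(baseline, live)
    if shift >= suspend_at:
        return "suspend"
    if shift >= alert_at:
        return "alert"
    return "ok"

baseline = [0.80, 0.82, 0.78, 0.81]  # confidence scores captured at release
print(drift_response(baseline, [0.80, 0.79, 0.81, 0.82]))  # "ok"
print(drift_response(baseline, [0.60, 0.62, 0.61, 0.63]))  # "suspend"
```

The important property is that the response is chosen before the incident, not improvised during it.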
Rethinking QA for AI Systems
Traditional QA asks:
Does the feature work?
AI QA must ask:
- How often is it wrong?
- How wrong is it allowed to be?
- What happens when it is wrong?
There is no binary pass or fail, only tolerances.
Define in advance:
- Maximum tolerated error rates
- Maximum tolerated drift
- Escalation thresholds
- Time-to-human-review expectations
If those numbers are not defined before deployment, you do not have a functioning risk management system.
QA expands beyond code correctness into model evaluation and data quality governance.
CI/CD Becomes Compliance Infrastructure
To repeat an earlier point: this is not a governance-committee problem. Risk management should live in your pipeline.
Concrete integration points:
- Every model change triggers automated evaluation
- Deployment is blocked if thresholds are not met
- Evaluation reports are attached to pull requests
- Model versions are tagged and logged at deployment
- An audit-ready summary is generated automatically on release
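In the pipeline itself, the gate reduces to a script that reads the evaluation report produced earlier in the job and fails the build when thresholds are breached. The report structure and thresholds here are assumptions:

```python
def gate(metrics, max_fpr=0.03, max_drift=0.05):
    """Return a list of threshold breaches; an empty list means deployment may proceed."""
    failures = []
    if metrics["false_positive_rate"] > max_fpr:
        failures.append("false positive rate above limit")
    if metrics["drift"] > max_drift:
        failures.append("drift above limit")
    return failures

# In CI this dict would be loaded from the evaluation report file;
# here it is inlined for illustration.
report = {"false_positive_rate": 0.041, "drift": 0.012}
failures = gate(report)
for msg in failures:
    print("GATE FAILURE:", msg)
# A real pipeline step would then call sys.exit(1) when failures is non-empty,
# which is what actually blocks the merge or deployment.
```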
The CI/CD system becomes part of your regulatory control framework.
Organisational Implications
Tooling and process alone are not sufficient; accountability must also be clear.
You will likely need:
- A named AI risk owner inside engineering
- QA expanded to include model evaluation and data quality
- DevOps responsible for monitoring and logging integrity
- Legal reviewing outputs and documentation, not designing technical controls
If engineering does not own this, the organisation will struggle to demonstrate compliance.
The Cost of Delay
High-risk obligations apply from August 2026.
Redesigning a development process is not a minor change. It requires:
- Defining new thresholds
- Building evaluation infrastructure
- Adjusting release gates
- Training teams
- Embedding new habits
If you wait until customers or regulators request evidence, you will retrofit controls into systems that were not designed for them.
If you design the process now, compliance artefacts become normal outputs of development.
The Competitive Reality
The EU AI Act introduces a new category of engineering maturity.
Companies that:
- Define risk tolerances clearly
- Automate evaluation
- Monitor continuously
- Log consistently
- Design human oversight deliberately
...will ship AI features with confidence and lower regulatory exposure.
Companies that treat compliance as paperwork will accumulate invisible risk inside their systems.
This is a system design undertaking, rather than a legal one.
The question is therefore:
Will your engineering process generate evidence that your AI behaves within defined limits, as required by the EU AI Act?