
Open-Source Article 12 Logging Infrastructure For The EU AI Act

We built an open-source audit logging library for the EU AI Act. This post explains what it does, and what it deliberately does not do (and the significance of that difference).

The library is available as @systima/aiact-audit-log.

It is MIT-licensed, published on npm, and designed for TypeScript teams building AI systems that fall within scope of the EU AI Act's Article 12 record-keeping obligations.

What follows is an account of where a logging library fits into the compliance landscape, grounded in the engineering decisions we made and the trade-offs we accepted.

The problem we were solving

Article 12 of the EU AI Act requires that high-risk AI systems be designed and developed with capabilities enabling the automatic recording of events over the lifetime of the system. The logs must be adequate for tracing back the system's activity, facilitating post-market monitoring, and supporting deployer oversight.

We covered the full technical requirements in What to Log, How Long to Keep It, and How to Reconstruct: Article 12 for Engineers. That post describes what you need to capture, how long you need to keep it, and what "reconstruct a decision" actually means in practice.

The engineering challenge is not understanding the requirements. The challenge is that most production AI systems were not built with Article 12 in mind, and retrofitting compliant logging into an existing inference pipeline is tedious, error-prone work. The schema design, the hash chain for tamper evidence, the storage layout, the retention enforcement, the context propagation for multi-step decisions: each of these is a solved problem individually, but assembling them into a coherent logging layer is a significant engineering effort.

We built the library because we kept designing the same logging architecture for different clients. The schema, the hash chain, the batching, the S3 storage layout: the fundamentals were identical across engagements. The differences were in what constituted a "relevant event" for each system, how the logs fed into risk management, and what the monitoring procedures looked like. Those differences are the hard part. The logging infrastructure is the easy part, once you have it.

What the library provides

The core of the library is an AuditLogger class that writes structured, hash-chained log entries to S3-compatible storage. Every entry includes a SHA-256 hash of its contents and a reference to the previous entry's hash, forming a tamper-evident chain that a regulator or notified body can independently verify.

The schema maps directly to Article 12. Every field is annotated with the specific paragraph it relates to: entryId for unique event identification (12(1)), decisionId for correlating related events (12(2)(a)), timestamp for automatic recording (12(1), 12(3)(a)), eventType for risk situation identification (12(2)(a)), and so on. The mapping is documented in full in the repository's COMPLIANCE.md.
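To make the hash-chaining concrete, here is a minimal sketch of how a tamper-evident chain of entries can be built with Node's standard crypto module. The field names mirror the mapping described above, but the interface and functions are illustrative, not the library's actual API.

```typescript
import { createHash } from "node:crypto";

// Hypothetical entry shape; the real library's schema will differ.
interface AuditEntry {
  entryId: string;    // unique event identification (Art. 12(1))
  decisionId: string; // correlation of related events (Art. 12(2)(a))
  timestamp: string;  // automatic recording (Art. 12(1), 12(3)(a))
  eventType: string;  // risk situation identification (Art. 12(2)(a))
  payload: unknown;
  prevHash: string;   // previous entry's hash ("GENESIS" for the first)
  hash: string;       // SHA-256 over this entry's contents + prevHash
}

function hashEntry(e: Omit<AuditEntry, "hash">): string {
  // Hash a serialisation of the entry including prevHash, so any later
  // modification breaks the chain from that point onward.
  return createHash("sha256").update(JSON.stringify(e)).digest("hex");
}

function appendEntry(
  chain: AuditEntry[],
  fields: Omit<AuditEntry, "hash" | "prevHash">
): AuditEntry {
  const prevHash = chain.length ? chain[chain.length - 1].hash : "GENESIS";
  const unhashed = { ...fields, prevHash };
  const entry = { ...unhashed, hash: hashEntry(unhashed) };
  chain.push(entry);
  return entry;
}
```

Because each hash covers the previous entry's hash, deleting, reordering, or editing any entry invalidates every hash after it, which is what makes the chain independently verifiable.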

For teams using the Vercel AI SDK, the library includes middleware that automatically captures every LLM call (both generateText and streamText) without requiring manual instrumentation at each call site. This matters because the real compliance risk with Article 12 is not the schema. It is incomplete instrumentation. If a developer forgets to log a call, that gap is invisible until an auditor asks for a decision trace that passes through it.

Context propagation via AsyncLocalStorage allows decision IDs and metadata to flow through async call chains without manual threading. This reduces the friction that causes developers to skip logging in the first place.
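The propagation pattern looks roughly like the following sketch. The function names and context shape here are hypothetical; only the underlying mechanism, Node's AsyncLocalStorage, is what the library actually relies on.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";
import { randomUUID } from "node:crypto";

// Hypothetical decision context; the library's actual API will differ.
interface DecisionContext {
  decisionId: string;
  metadata?: Record<string, string>;
}

const als = new AsyncLocalStorage<DecisionContext>();

// Run a multi-step decision under one decisionId. Every async call made
// inside fn sees the same context without it being passed as a parameter.
function withDecision<T>(
  fn: () => Promise<T>,
  metadata?: Record<string, string>
): Promise<T> {
  return als.run({ decisionId: randomUUID(), metadata }, fn);
}

// Deep inside the call chain, a logging helper recovers the id. If no
// context is present, a fresh id is generated so the event is still logged;
// only the correlation is weaker.
function currentDecisionId(): string {
  return als.getStore()?.decisionId ?? randomUUID();
}
```

The key property is that currentDecisionId works inside utility functions and error handlers that were never told about the decision, which is exactly where manually threaded IDs tend to go missing.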

The library also includes a coverage diagnostic that analyses logged entries and reports gaps: missing event types, tool call/result mismatches, long gaps between entries, and whether all entries are manually captured rather than automatically instrumented. This is the closest the library comes to answering "are you logging enough?", though the answer is always qualified by what the integrator defines as "enough."

What it deliberately does not do

The library provides technical logging infrastructure.

It does not automatically provide or prove compliance.

It does not define what constitutes a risk-relevant event. Article 12(2)(a) requires logging events "relevant for identifying situations that may result in the AI system presenting a risk." What counts as relevant depends entirely on the system's intended purpose, its deployment context, and its risk profile. A credit scoring system and a content moderation system have fundamentally different risk profiles. The library captures whatever events you tell it to capture. The decision of what to capture is yours.

It does not implement risk management. Article 9 requires a risk management system. The audit log provides raw data that feeds into risk management, but does not define risk criteria, implement risk scoring, or generate risk assessments. Those are organisational decisions, not library features.

It does not design human oversight. Article 14 requires human oversight mechanisms appropriate to the system's risk level. The library records human interventions when they occur; it does not define when human review should be required or design the oversight workflow.

It does not establish monitoring procedures. Article 72 requires post-market monitoring. The library provides the data layer for monitoring; you can query logs, compute statistics, and export evidence packages. But it does not define KPIs, set alert thresholds, or establish escalation procedures.

It does not produce technical documentation. Annex IV requires a comprehensive documentation package. The library's schema documentation and COMPLIANCE.md contribute to this, but they are one input among many.

We are explicit about this boundary because most compliance tooling is not, and we think the ambiguity is harmful. An engineering team that installs a library and believes they have satisfied Article 12 is in a worse position than a team that knows they have not, because the first team has stopped looking for gaps.

Design decisions worth explaining

A few choices in the library's design are non-obvious and worth documenting.

Why SHA-256 hash chains rather than external timestamping. The hash chain provides ordering and non-modification evidence that is self-contained; you can verify it with nothing but the logs themselves and standard tooling. External timestamping services (RFC 3161) provide stronger evidence but introduce a dependency on a third party and add latency to every write. For most deployments, the hash chain combined with S3 Object Lock in Compliance mode provides sufficient integrity evidence. The library is designed so that external timestamping can be layered on top without modifying the core chain.
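Self-contained verification is the point of that design choice. The following is a minimal sketch, under the assumption that each entry's hash covers its own contents plus the previous hash; the field names are illustrative, not the library's export format.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape of an exported, hash-chained entry.
interface ChainedEntry {
  prevHash: string;
  hash: string;
  [key: string]: unknown;
}

// Walk the chain with nothing but the entries and standard tooling.
function verifyChain(entries: ChainedEntry[]): boolean {
  let expectedPrev = "GENESIS";
  for (const entry of entries) {
    // A broken link means deletion or reordering.
    if (entry.prevHash !== expectedPrev) return false;
    // A hash mismatch means the entry was modified after writing.
    const { hash, ...rest } = entry;
    const recomputed = createHash("sha256")
      .update(JSON.stringify(rest))
      .digest("hex");
    if (recomputed !== hash) return false;
    expectedPrev = hash;
  }
  return true;
}
```

A regulator or notified body can run a check like this against an exported JSONL file without trusting any service operated by the system provider, which is the property external timestamping would otherwise be needed to approximate.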

Why S3 rather than a database. Audit logs have a write-heavy, append-only access pattern with infrequent reads. S3 is cheaper, simpler, and better suited to this pattern than a database. It also supports lifecycle policies for automated retention management, Object Lock for immutability, and cross-region replication for durability. The date-partitioned key layout ({systemId}/{year}/{month}/{day}/{fileIndex}.jsonl) enables efficient range queries and maps naturally to lifecycle policy prefixes.
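A sketch of the key layout makes the lifecycle and query argument concrete. The helper below is illustrative, not the library's implementation; only the key pattern itself comes from the design described above.

```typescript
// Build an object key following the documented layout:
// {systemId}/{year}/{month}/{day}/{fileIndex}.jsonl
function objectKey(systemId: string, ts: Date, fileIndex: number): string {
  const y = ts.getUTCFullYear();
  const m = String(ts.getUTCMonth() + 1).padStart(2, "0");
  const d = String(ts.getUTCDate()).padStart(2, "0");
  return `${systemId}/${y}/${m}/${d}/${fileIndex}.jsonl`;
}

// A range query for one day reduces to a single ListObjectsV2 prefix,
// and an S3 lifecycle rule can target a prefix such as "sys-1/2025/"
// to expire or transition a whole year of logs.
function dayPrefix(systemId: string, ts: Date): string {
  return objectKey(systemId, ts, 0).replace(/\/0\.jsonl$/, "/");
}
```

Zero-padding the month and day keeps keys lexicographically ordered, so prefix listings come back in chronological order without any server-side sorting.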

Why AsyncLocalStorage for context propagation. The alternative is requiring developers to pass decisionId through every function call in a multi-step decision chain. This works in principle but fails in practice; developers omit it, especially in utility functions and error handlers. AsyncLocalStorage from Node.js's async_hooks module propagates context automatically through async call chains. The trade-off is that it does not work across EventEmitter callbacks or native addon boundaries, but these edge cases are documented and the middleware auto-generates a decisionId if context is missing, so logging is never lost; only correlation may be affected.

Why a coverage diagnostic rather than coverage enforcement. The library cannot know what events your system should be logging, because that depends on your system's architecture and risk profile. Instead of pretending to enforce completeness, the coverage diagnostic reports what it observes and flags common patterns that suggest gaps. "No human interventions logged" is a warning, not an error, because some systems genuinely have no human oversight loop. The diagnostic makes gaps visible; the integrator decides which gaps are real.
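The flavour of those heuristics can be sketched as follows. The event-type names, thresholds, and warning strings here are hypothetical; the real diagnostic's checks and output will differ.

```typescript
// Hypothetical logged-event shape for the sketch.
interface LoggedEvent {
  eventType: string;
  timestamp: number; // epoch milliseconds
}

// Report warnings, not errors: the integrator decides which gaps are real.
function coverageWarnings(events: LoggedEvent[], maxGapMs: number): string[] {
  const warnings: string[] = [];
  const calls = events.filter((e) => e.eventType === "tool_call").length;
  const results = events.filter((e) => e.eventType === "tool_result").length;
  if (calls !== results) {
    warnings.push(`tool call/result mismatch: ${calls} calls, ${results} results`);
  }
  if (!events.some((e) => e.eventType === "human_intervention")) {
    // A warning only: some systems genuinely have no human oversight loop.
    warnings.push("no human interventions logged");
  }
  for (let i = 1; i < events.length; i++) {
    if (events[i].timestamp - events[i - 1].timestamp > maxGapMs) {
      warnings.push(`gap exceeding ${maxGapMs}ms before event ${i}`);
    }
  }
  return warnings;
}
```

Each warning is a prompt to investigate, not a verdict; a long gap may mean the system was idle, or it may mean an uninstrumented code path handled traffic for an hour.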

The relationship between this library and Systima's consulting practice

The boundary between "logging infrastructure" and "full compliance" is exactly the boundary between what the library provides and what our consulting practice provides.

The library handles the data layer.

Our consulting practice handles the organisational layer: risk management design, human oversight architecture, monitoring procedures, technical documentation, and the integration work that connects a logging library to a functioning compliance programme.

To be absolutely clear: this is not a bait-and-switch. The library is MIT-licensed, fully functional, and does not phone home or require any sort of commercial licence.

If you can close the gap between logging infrastructure and full compliance with your own team, the library costs you nothing. If you need help closing that gap, we exist.

Getting started

npm install @systima/aiact-audit-log

The README has a quick-start guide, a full API reference, and CLI documentation. COMPLIANCE.md has the article-by-article field mapping and the precise scope of the compliance claims.

If you are building a high-risk AI system and need help designing the compliance architecture around the logging layer, Systima's AI Governance and Compliance practice works with engineering teams to design and implement the full stack: logging, risk management, human oversight, monitoring, and technical documentation.