Data Integrity in an Age of Autonomous Writes

The first two articles in this series examined what is lost when agents bypass the application layer — the business rules, the authorisation controls, the audit trail, the filters that govern what each user can see. This article examines what happens when agents don’t just read, but write.

The answer is not complicated, but it is in some respects worse than the security and identity problems already discussed: the damage is structural, cumulative, and often invisible until it is very difficult to reverse.

The Mundane Reality of Agent-Driven Data Corruption

It is tempting to frame data integrity risks in sophisticated terms — race conditions, constraint violations, consistency anomalies. The reality is more prosaic, and more alarming for it.

I have seen a very ordinary version of this in data migration work. Dates are always one of the painful areas. On one migration to an upgraded system, the data loaded without errors, but the new application started showing some customers with ages in the thousands.

The cause was not a failed load. It was an old assumption made visible. Some customers had originally come from a much older mainframe system where years were stored as two digits: 65 for 1965, for example. The system we were migrating from had quietly interpreted those values as twentieth-century dates. The new system was stricter, and for those migrated customers it filled the missing century digits with 00. Newer customers had proper four-digit years, but the older migrated customers were now treated as if they had been born between years 0000 and 0099.

Structurally, the dates were present. The load succeeded. The application had data it could process. But the meaning had been lost in translation, and the result was absurd.
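The arithmetic behind those ages can be shown in a few lines. This is a minimal sketch, not the migration code itself: the function names, the fixed reference date, and the 130-year plausibility limit are all assumptions made for illustration.

```python
from datetime import date

def expand_zero_fill(two_digit_year: int) -> int:
    # What the stricter target system did: missing century digits become 00.
    return two_digit_year

def expand_with_century(two_digit_year: int, century: int = 1900) -> int:
    # What the legacy system silently assumed: two-digit years are 19xx.
    return century + two_digit_year

def is_plausible_birth_year(year: int, today: date, max_age: int = 130) -> bool:
    # A semantic check: a structurally valid year can still be absurd.
    return today.year - max_age <= year <= today.year

today = date(2024, 1, 1)  # fixed reference date so the example is repeatable
stored = 65               # the two-digit value carried over from the mainframe

wrong_year = expand_zero_fill(stored)
right_year = expand_with_century(stored)

print(today.year - wrong_year)                      # 1959
print(today.year - right_year)                      # 59
print(is_plausible_birth_year(wrong_year, today))   # False
print(is_plausible_birth_year(right_year, today))   # True
```

Both expansions produce a valid year, so a type or format check accepts either. Only a check that encodes what a birth year means catches the difference.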

Agents can produce the same class of failure with no migration involved. An agent tasked with updating a customer record may update some fields and not others, because it completed part of its task before a timeout, an error, or a re-prioritisation intervened. The record now exists in a half-finished state that no application workflow ever intended or anticipated. No constraint was violated. No error was logged. The data simply no longer reflects reality.

An agent inferring values from context — a price, a status, a category — may make a plausible but incorrect inference. The value written is syntactically valid, passes every structural check, and is wrong.

An agent updating related data across multiple tables or documents may update one and not the other, leaving contradictions that will silently propagate into reports, decisions, and downstream systems. The application that originally managed these tables always updated them together, in a specific sequence, as part of a defined workflow. The agent had no knowledge of that relationship.

None of this requires the agent to malfunction. It may be doing precisely what it was asked to do. The problem is that what it was asked to do was defined without sufficient understanding of the integrity requirements of the underlying data.
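The partial-update and related-table failures described above can be reproduced with an in-memory SQLite database. The schema and the function names below are hypothetical, and the interruption is simulated. The sketch contrasts two writes issued independently with the same two writes grouped into one transaction, which is the kind of protection the original application workflow would have provided.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so
# transactions exist only where the code issues BEGIN explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, status TEXT NOT NULL);
    CREATE TABLE account (id INTEGER PRIMARY KEY,
                          customer_id INTEGER NOT NULL,
                          active INTEGER NOT NULL);
    INSERT INTO customer VALUES (1, 'active'), (2, 'active');
    INSERT INTO account VALUES (10, 1, 1), (20, 2, 1);
""")

def close_customer(conn, customer_id, interrupt=False):
    # Two related writes with nothing tying them together.
    conn.execute("UPDATE customer SET status = 'closed' WHERE id = ?", (customer_id,))
    if interrupt:
        raise TimeoutError("simulated interruption between the two writes")
    conn.execute("UPDATE account SET active = 0 WHERE customer_id = ?", (customer_id,))

def close_customer_atomically(conn, customer_id, interrupt=False):
    # The same two writes, but either both happen or neither does.
    conn.execute("BEGIN")
    try:
        close_customer(conn, customer_id, interrupt)
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise

def state(conn, customer_id):
    status = conn.execute(
        "SELECT status FROM customer WHERE id = ?", (customer_id,)).fetchone()[0]
    active = conn.execute(
        "SELECT active FROM account WHERE customer_id = ?", (customer_id,)).fetchone()[0]
    return status, active

for operation, customer_id in ((close_customer, 1), (close_customer_atomically, 2)):
    try:
        operation(conn, customer_id, interrupt=True)
    except TimeoutError:
        pass

print(state(conn, 1))  # ('closed', 1): half-finished, yet no constraint was violated
print(state(conn, 2))  # ('active', 1): the interrupted operation left no trace
```

A transaction only helps if the writer knows the two updates belong together. An agent with raw table access and no knowledge of the relationship would have no reason to group them.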

The Scale Problem

A data entry error made by a human affects one record. An agent making the same class of error may make it across thousands or hundreds of thousands of records before anyone notices — or before anything breaks in a way that is visible.

This is not a theoretical risk. Agents are valuable precisely because they operate at speed and at scale, automating what would otherwise require significant human effort. That same speed and scale applies to errors. An agent that misunderstands a field’s meaning, or applies a transformation incorrectly, or leaves records in a partially updated state, will do so consistently and rapidly across every record it touches.

The blast radius of agent-driven data corruption is qualitatively different from anything the application layer was designed to handle. Recovery — identifying affected records, understanding what the correct state should have been, and restoring it — is a significant undertaking with no clean solution in most current architectures.

Constraints Are Necessary But Not Sufficient

The obvious response is to enforce integrity at the database layer: foreign keys, check constraints, not-null rules. These matter, and the case for strengthening them is real. But they address only a subset of the problem.

Structural constraints can ensure that a value is of the right type, within an acceptable range, or present when required. They cannot ensure that the value is meaningful in context, that related records are consistent with each other, or that a write operation is complete in the sense that the business requires.

A record with all mandatory fields populated, all foreign keys satisfied, and all check constraints passing may still be fundamentally wrong — because the agent wrote a plausible value that is incorrect, or because a related record in another table was not updated to match, or because the operation that should have accompanied this write was never executed.

The database knows the rules it has been given. It does not know what the data is supposed to mean.
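A short illustration of that gap, using SQLite and a hypothetical product table. The "true" price is an assumption standing in for knowledge the business holds and the database does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE product (
        id INTEGER PRIMARY KEY,
        category_id INTEGER NOT NULL REFERENCES category(id),
        price_cents INTEGER NOT NULL CHECK (price_cents > 0)
    );
    INSERT INTO category VALUES (1, 'hardware');
""")

TRUE_PRICE_CENTS = 19990  # known to the business, invisible to the database

def try_write(price_cents):
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "INSERT INTO product (category_id, price_cents) VALUES (1, ?)",
                (price_cents,))
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

negative = try_write(-500)    # structurally invalid: the CHECK constraint fires
plausible = try_write(1999)   # a misplaced decimal point: every rule passes

print(negative)   # rejected
print(plausible)  # accepted
```

The second write satisfies the not-null rule, the foreign key, and the check constraint, and it is still wrong by a factor of ten.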

The Case for a Controlled Write Surface

If agents cannot be trusted to maintain data integrity through direct write access, the logical response is to remove that access entirely. All writes must pass through a controlled surface that exposes business operations rather than raw data manipulation.

That surface might be an API, a procedural database package, a command handler, or some other governed operation boundary. The implementation is less important than the property it provides: the caller invokes a complete business operation, and the system protects the data from partial or incoherent mutation.

This is not simply a return to service-oriented architecture, though there is a superficial resemblance. The motivation and the design requirements are distinct. Traditional service APIs were designed for predictable, application-to-application integration. A controlled write surface for agentic consumers must be designed for a non-deterministic caller that may invoke operations in any order, with any parameters, without any inherent understanding of the business context it is operating in.

That changes the design requirements considerably. The surface cannot assume that the caller understands the implications of what it is requesting. Every exposed operation must be complete — it must leave the data in a valid, consistent state regardless of what the agent passes in. This is not a convenience layer; it is an integrity enforcement layer, and it must be designed as such.

The agent’s role becomes one of invocation rather than manipulation. It calls a business operation — create order, update customer status, record payment — and the controlled surface ensures that everything that operation requires happens correctly and completely. The agent does not need to know which tables are involved, which fields must be updated in concert, or what state the data must be in when the operation is complete. That knowledge lives behind the operation boundary, where it can be maintained, tested, and governed.
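As a sketch of what one such operation might look like, the function below implements a "record payment" operation over a hypothetical orders and payments schema. The table layout, the status values, and the name record_payment are all assumptions for illustration. The caller supplies only an order identifier and an amount. Validation, the related writes, and the status transition all happen behind the boundary, inside one transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # explicit transactions only
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         total_cents INTEGER NOT NULL,
                         paid_cents INTEGER NOT NULL DEFAULT 0,
                         status TEXT NOT NULL DEFAULT 'open');
    CREATE TABLE payments (id INTEGER PRIMARY KEY,
                           order_id INTEGER NOT NULL REFERENCES orders(id),
                           amount_cents INTEGER NOT NULL);
    INSERT INTO orders (id, total_cents) VALUES (1, 10000);
""")

def record_payment(conn, order_id, amount_cents):
    """One complete business operation: the only write path offered to the caller."""
    # Reject malformed input before touching any data.
    if not isinstance(amount_cents, int) or amount_cents <= 0:
        raise ValueError("amount_cents must be a positive integer")
    conn.execute("BEGIN IMMEDIATE")  # take the write lock up front
    try:
        row = conn.execute(
            "SELECT total_cents, paid_cents FROM orders WHERE id = ?",
            (order_id,)).fetchone()
        if row is None:
            raise LookupError(f"no such order: {order_id}")
        total, paid = row
        if paid + amount_cents > total:
            raise ValueError("payment would exceed the order total")
        new_paid = paid + amount_cents
        status = "paid" if new_paid == total else "part_paid"
        # Both tables change together, or not at all.
        conn.execute(
            "INSERT INTO payments (order_id, amount_cents) VALUES (?, ?)",
            (order_id, amount_cents))
        conn.execute(
            "UPDATE orders SET paid_cents = ?, status = ? WHERE id = ?",
            (new_paid, status, order_id))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return status

first = record_payment(conn, 1, 4000)
second = record_payment(conn, 1, 6000)
try:
    record_payment(conn, 1, 1)  # the order is already fully paid
    third = "accepted"
except ValueError:
    third = "rejected"

print(first, second, third)  # part_paid paid rejected
```

Whatever the caller passes in, the order and its payments end in a mutually consistent state: a rejected call changes nothing, and an accepted call changes both tables.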

The Database as the Final Invariant Layer

A controlled write surface is a necessary response, but it is not sufficient on its own, for a simple reason: a table may be written to by multiple APIs, services, jobs, or database routines. Different systems, different agents, and different integration paths may all have legitimate write access to the same underlying data. No single operation boundary sees all of them.

This is where the database re-enters the picture — not as a passive store, but as the enforcer of invariants that must be true regardless of which write path was used.

The division of responsibility is principled and clear. The controlled write surface enforces what is true for a specific business operation — the rules, the workflow, the context that belong to that operation. The database enforces what is always true for the data — the constraints, the referential integrity rules, the consistency requirements that apply universally, across every API, every agent, every integration that will ever write to these tables.

These are not alternatives. They are complementary layers, each enforcing what the other cannot. The operation boundary cannot be the last line of defence because it is not the only gate. The database is the only layer that sees every write, from every source, through every path — and it is therefore the only layer that can enforce rules that must hold universally.

This is a strong argument for richer, more expressive database-level constraints than most current implementations provide. Foreign keys and simple check constraints are a start. What is needed is the ability to express and enforce complex invariants: rules that span multiple tables, that apply across related documents, that assert conditions which must be true of the data as a whole. This is not a new idea; it is a feature of the relational model that was always intended and has never been fully realised. The SQL standard has defined general assertions (CREATE ASSERTION) since SQL-92, yet few mainstream database products have implemented them. Agents make the case for it urgent in a way that decades of academic argument did not.
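Where declarative multi-table constraints are unavailable, a trigger can approximate one. The sketch below uses SQLite and a hypothetical schema to enforce the rule that payments against an order never exceed its total, whichever path performs the insert. It is a stand-in for a declarative invariant, not an equivalent: it guards only inserts into payments, and updates or deletes would need triggers of their own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, total_cents INTEGER NOT NULL);
    CREATE TABLE payments (id INTEGER PRIMARY KEY,
                           order_id INTEGER NOT NULL,
                           amount_cents INTEGER NOT NULL CHECK (amount_cents > 0));
    INSERT INTO orders VALUES (1, 10000);
""")

# A cross-table rule: existing payments plus the new one must fit the order total.
conn.execute("""
    CREATE TRIGGER payments_within_order_total
    BEFORE INSERT ON payments
    BEGIN
        SELECT RAISE(ABORT, 'payments would exceed order total')
        WHERE (SELECT COALESCE(SUM(amount_cents), 0) FROM payments
               WHERE order_id = NEW.order_id) + NEW.amount_cents
              > (SELECT total_cents FROM orders WHERE id = NEW.order_id);
    END
""")

def raw_insert(order_id, amount_cents):
    # Simulates a writer that bypasses every operation boundary.
    try:
        with conn:
            conn.execute(
                "INSERT INTO payments (order_id, amount_cents) VALUES (?, ?)",
                (order_id, amount_cents))
        return "accepted"
    except sqlite3.DatabaseError as exc:
        return f"rejected: {exc}"

first = raw_insert(1, 7000)
second = raw_insert(1, 7000)  # would bring the paid total to 14000

print(first)
print(second)
```

The first insert is accepted and the second is refused by the database itself, even though the writer used raw SQL and no business operation was involved.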

The Integrity of the Data Estate

The arrival of agents as autonomous writers surfaces a question that most organisations have never had to answer directly: what are the invariants of your data? Not the application rules — those are encoded in workflows and business logic — but the properties that must be true of the data itself, regardless of how it was written or by whom.

For most organisations, that question does not have a clean answer. The invariants exist, but they live in application code, in the heads of experienced developers, in tribal knowledge accumulated over years of working with a specific system. They were never expressed formally, because the application was always there to enforce them informally.

Agents make that informality untenable. When autonomous systems are writing at scale, across multiple paths, without human oversight of individual operations, the invariants need to be explicit, formal, and enforced at the layer that sees everything.

That layer is the database. It is time to treat it accordingly.
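One modest way to begin making invariants explicit is to write each one down as a named query that returns the rows violating it. The sketch below does this over a hypothetical customer and account schema. The two invariants and the year bounds are illustrative assumptions, and a periodic audit of this kind detects violations after the fact; it does not prevent them.

```python
import sqlite3

# Each invariant is a name plus a query that returns the ids of violating rows.
INVARIANTS = {
    "closed customers have no active accounts": """
        SELECT c.id FROM customer c
        JOIN account a ON a.customer_id = c.id
        WHERE c.status = 'closed' AND a.active = 1
    """,
    "birth years are plausible": """
        SELECT id FROM customer WHERE birth_year NOT BETWEEN 1900 AND 2024
    """,
}

def find_violations(conn):
    # Run every invariant query and collect the offending ids under its name.
    return {name: [row[0] for row in conn.execute(sql)]
            for name, sql in INVARIANTS.items()}

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY,
                           status TEXT NOT NULL,
                           birth_year INTEGER NOT NULL);
    CREATE TABLE account (id INTEGER PRIMARY KEY,
                          customer_id INTEGER NOT NULL,
                          active INTEGER NOT NULL);
    INSERT INTO customer VALUES (1, 'active', 1965), (2, 'closed', 65);
    INSERT INTO account VALUES (10, 1, 1), (20, 2, 1);
""")

violations = find_violations(conn)
for name, ids in violations.items():
    print(name, ids)
```

Here customer 2 violates both rules. Once an invariant is written down in this form, it can be reviewed, tested, and eventually promoted to a constraint or trigger that the database enforces on every write.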