I’ve been spending a lot of time recently thinking about what happens when AI agents are pointed at real production systems — not the clean demos, but the actual accumulated architectural decisions of the last decade. What I keep coming back to is this: the software industry has been running a tab. We decomposed systems that didn’t need decomposing, distributed data that worked perfectly well together, and replaced database-enforced integrity with the optimistic assumption that well-behaved services would keep everything consistent. The tab wasn’t obviously wrong — some systems genuinely needed this architecture, and the teams building them had the discipline to make it work.

AI agents are the debt collectors. And they’re revealing exactly how much was borrowed.
What agents expose
When an agent interacts with your system, it doesn’t care about your architectural intentions. It has a goal, it has tools, and it will use them. Point an agent at a microservice estate and you quickly discover three things that careful human developers had been quietly working around.
First, integrity at the application layer is largely invisible to agents. The relational database that microservices replaced enforced constraints at the infrastructure level — foreign keys, check constraints, transactional boundaries. Violate them and you get a hard stop. Microservice architectures often moved this enforcement into application code, service contracts, and team convention. An agent calling a service API has no visibility into the full invariant surface behind that API. If the service accepts the write, the agent has little reason to know whether the wider system still makes sense. Silent corruption becomes much easier.
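To make the "hard stop" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The two-table schema and the try_insert helper are hypothetical, invented purely for illustration; the point is only that the database itself refuses the bad write, whoever the caller is.

```python
import sqlite3

# Hypothetical schema for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"
    " quantity INTEGER NOT NULL CHECK (quantity > 0))"
)
conn.execute("INSERT INTO customers (id) VALUES (1)")
conn.commit()


def try_insert(customer_id: int, quantity: int) -> str:
    """Attempt a write and report whether the database accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO orders (customer_id, quantity) VALUES (?, ?)",
                (customer_id, quantity),
            )
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


results = [
    try_insert(1, 2),   # valid order for an existing customer
    try_insert(99, 2),  # customer 99 does not exist: foreign key violation
    try_insert(1, 0),   # zero quantity: CHECK constraint violation
]
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

Only the first write lands. An agent calling a service API gets no equivalent guarantee unless the service re-implements each of these rules itself.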
Second, cross-service writes have no default rollback. An agent completing a multi-step task might touch an order service, an inventory service, and a billing service in sequence. In a relational system, that’s a transaction — it either all commits or none of it does. In a microservice estate, each write is independent unless the architecture has explicitly provided a saga, compensating action, idempotency boundary, or some other recovery mechanism. A failure halfway through can leave the system in an inconsistent state with no simple rollback primitive. Human developers handled this with careful orchestration and local knowledge. Agents don’t yet reason about failure recovery at that level of sophistication.
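As a rough sketch of what "explicitly provided" means, the following shows a compensating-action runner of the kind a saga needs. Everything here is an assumption for illustration: run_saga, the in-memory state dictionary, and the three stand-in service functions are hypothetical, and a real implementation would also have to deal with compensations that fail.

```python
from typing import Callable, List, Tuple

# (name, action, compensation)
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> List[str]:
    """Run steps in order; on failure, undo completed steps in reverse order."""
    log: List[str] = []
    done: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception as exc:
            log.append(f"{name}: failed ({exc})")
            for undo_name, _, compensate in reversed(done):
                compensate()
                log.append(f"{undo_name}: compensated")
            return log
        done.append(step)
        log.append(f"{name}: done")
    return log


# In-memory stand-ins for three services (hypothetical).
state = {"orders": [], "stock": 5, "charges": []}


def create_order() -> None:
    state["orders"].append("o-1")


def cancel_order() -> None:
    state["orders"].remove("o-1")


def reserve_stock() -> None:
    state["stock"] -= 1


def release_stock() -> None:
    state["stock"] += 1


def charge_card() -> None:
    raise RuntimeError("billing unavailable")  # simulated mid-task failure


def refund_card() -> None:
    state["charges"].clear()


log = run_saga([
    ("order", create_order, cancel_order),
    ("inventory", reserve_stock, release_stock),
    ("billing", charge_card, refund_card),
])
```

The undo logic lives in the caller, not the infrastructure. An agent that simply calls the three services in sequence gets none of it.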
Third, and most fundamentally, agents need to query data that microservices were never designed to serve. The canonical microservice answer to cross-service queries is: don’t do them. Pre-build your read models. Materialise projections via event streams. Design your BFF layer for known query patterns.
Agents don’t have known query patterns in the same way. An agent asked to identify high-value customers with open support tickets whose last three orders were delayed needs to span customer, support, and fulfilment services simultaneously. There may be no pre-built read model for that, and the combinatorial space of questions an agent might ask is effectively unbounded.
The data isn’t actually integrated. It’s siloed, with event-driven plumbing designed to serve the queries its architects anticipated. The relational monolith it replaced could have answered that question with a JOIN.
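For comparison, here is roughly what that question looks like when the data lives in one relational store. This is a sketch under stated assumptions: the table names, columns, sample rows, and the 10,000 "high-value" threshold are all invented, and the query uses a window function, which needs SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, lifetime_value REAL);
CREATE TABLE tickets (id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
                     placed_at TEXT, delayed INTEGER);

INSERT INTO customers VALUES (1, 'Acme', 50000), (2, 'Birch', 60000), (3, 'Cedar', 500);
INSERT INTO tickets VALUES (1, 1, 'open'), (2, 2, 'open'), (3, 3, 'open');
INSERT INTO orders VALUES
  (1, 1, '2024-01-01', 0), (2, 1, '2024-02-01', 1),
  (3, 1, '2024-03-01', 1), (4, 1, '2024-04-01', 1),
  (5, 2, '2024-02-01', 1), (6, 2, '2024-03-01', 0), (7, 2, '2024-04-01', 1),
  (8, 3, '2024-02-01', 1), (9, 3, '2024-03-01', 1), (10, 3, '2024-04-01', 1);
""")

# High-value customers with an open ticket whose last three orders were all delayed.
QUERY = """
WITH recent AS (
  SELECT customer_id, delayed,
         ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY placed_at DESC) AS rn
  FROM orders
)
SELECT c.name
FROM customers c
JOIN recent r ON r.customer_id = c.id AND r.rn <= 3
WHERE c.lifetime_value >= 10000
  AND EXISTS (SELECT 1 FROM tickets t
              WHERE t.customer_id = c.id AND t.status = 'open')
GROUP BY c.id, c.name
HAVING COUNT(*) = 3 AND MIN(r.delayed) = 1
ORDER BY c.name
"""
rows = [name for (name,) in conn.execute(QUERY)]
```

With this sample data only Acme qualifies: Birch has an on-time order among its last three, and Cedar falls below the value threshold. Nobody had to anticipate the question in advance.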
Two non-deterministic systems, one outcome
There is a deeper problem than bad writes and missing rollbacks. It is about the nature of the systems involved.
Microservice estates often expose non-deterministic behaviour at the system boundary. Eventual consistency means the state visible to one service at any given moment depends on which events have propagated and which haven’t. Network retries mean the same logical operation can produce different outcomes depending on timing. Race conditions across services are structurally hard to eliminate without distributed locking — which most microservice designs deliberately avoid as a scaling constraint. Event ordering is frequently not guaranteed, so consumers see different state depending on message arrival sequence. Human developers learn to work within this — they understand the failure modes, they write defensive code, they know which operations are idempotent and which aren’t.
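A toy illustration of the event-ordering point, with two made-up event types (set_price and discount_pct) that do not commute. The names and numbers are assumptions chosen only to show that the same events, delivered in a different order, leave a consumer in a different state.

```python
from itertools import permutations
from typing import Tuple

Event = Tuple[str, int]


def apply_event(price: int, event: Event) -> int:
    """Apply one event to a consumer's local view of a price (in cents)."""
    kind, value = event
    if kind == "set_price":
        return value
    if kind == "discount_pct":
        return price * (100 - value) // 100
    raise ValueError(f"unknown event: {kind}")


events = [("set_price", 200), ("discount_pct", 50)]

# Replay the same two events in every possible arrival order.
outcomes = set()
for order in permutations(events):
    price = 100  # every consumer starts from the same state
    for event in order:
        price = apply_event(price, event)
    outcomes.add(price)
```

Two arrival orders produce two different final prices, and both consumers believe they are correct.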
Agents are also non-deterministic. The same goal, presented to the same agent against the same nominal system state, can produce different action sequences on different runs. An agent may read stale state from one service and base a write decision on it. It may interpret an ambiguous API response differently across invocations. It may retry an operation that partially succeeded without knowing it succeeded.
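The retry case is the easiest of these to sketch. Below, a hypothetical in-memory PaymentService accepts an idempotency key, so a retry of a call that already succeeded returns the original result instead of charging twice. The class, its method, and the key format are illustrative assumptions, not any particular provider's API.

```python
from typing import Dict, List, Tuple


class PaymentService:
    """Hypothetical in-memory stand-in for a billing service."""

    def __init__(self) -> None:
        self.charges: List[Tuple[str, int]] = []
        self._seen: Dict[str, str] = {}

    def charge(self, amount: int, idempotency_key: str) -> str:
        # A repeated key returns the original result instead of charging again.
        if idempotency_key in self._seen:
            return self._seen[idempotency_key]
        charge_id = f"ch-{len(self.charges) + 1}"
        self.charges.append((charge_id, amount))
        self._seen[idempotency_key] = charge_id
        return charge_id


svc = PaymentService()
first = svc.charge(100, idempotency_key="task-42/step-3")
# Suppose the response was lost in transit and the agent retries the same step.
retry = svc.charge(100, idempotency_key="task-42/step-3")
```

Without the key, the retry would create a second charge. The protection only works if the service offers it and the agent knows to reuse the same key, which is exactly the kind of convention an agent has no implicit way to learn.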
When you couple two non-deterministic systems, you don’t add their uncertainties — you multiply them. This is not just an engineering concern. In dynamical systems, chaos emerges when small differences in initial conditions produce wildly divergent outcomes. A minor difference in event propagation timing — one service seeing a slightly stale view of the world — becomes the initial condition the agent reasons from. The agent’s action, compounded across multiple services, can amplify that small difference into a large and potentially irreversible divergence in system state. The butterfly effect is a useful way to think about the failure mechanics.
What matters most about those failure mechanics is that they resist reproduction. A data integrity failure in a monolith is usually deterministic and reproducible: you can find it, recreate it, and fix it. A corruption event caused by an agent interacting with an eventually consistent microservice estate may never reproduce under test conditions. The specific combination of propagation timing, agent reasoning path, and service response that caused the failure is unlikely to recur in exactly the same form. You are left with corrupted data, no clear cause, and no reliable way to verify the fix.
This is a qualitatively different class of problem. It is not just harder to prevent — it is harder to know it has happened.
The over-application problem
This would matter less if microservices had only been applied where they genuinely add value. They weren’t.
The pattern was designed for a specific class of problem: systems at Google, Amazon, and Netflix scale, where independent deployment across hundreds of engineers, genuine throughput constraints, and strict team ownership boundaries justified the operational complexity. Microservices are as much an organisational scaling pattern as a technical one. The industry cargo-culted the solution without the problem.
The backlash is no longer coming from database traditionalists. David Heinemeier Hansson has long argued for the “Majestic Monolith” as the right default for Basecamp-scale teams. Amazon’s Prime Video team published a case study showing that one of its monitoring services cut its infrastructure cost by over 90% after consolidating a distributed, serverless design into a single-process implementation. Sam Newman, who wrote the definitive book on microservices, has consistently maintained that they are inappropriate below a certain organisational threshold, and that for a small team the “microservice tax” can be hard to justify.
In my experience, the majority of commercial systems — internal enterprise applications, mid-market SaaS products, departmental tooling — got complexity without the compensating benefits. Small teams gained no meaningful advantage from independent deployment. Query patterns that relational joins would have handled trivially became elaborate choreography. The operational overhead was real; the scale that justified it was not.
These are precisely the systems where the agent integrity problem lands hardest, because they also have the least spare engineering capacity to build compensating layers. A Netflix can construct a semantic query layer over its service estate. A mid-market SaaS company usually cannot do that well without starving the product work it actually needs to ship.
LLMs build monoliths better
There is a further irony that I haven’t seen analysed anywhere, and I want to flag it as instinct rather than settled argument — though I think the instinct is sound.
I’ve been using LLM coding tools heavily enough to form a view: they are structurally better at building monoliths than microservices. If AI-assisted development becomes a normal part of software construction, then this matters for which architectural patterns survive.
The training data distribution favours monoliths. Public code repositories are abundant with complete, coherent Rails applications, Django projects, Laravel codebases, and Spring Boot services. Well-architected microservice estates are mostly proprietary; what’s public tends to be fragments without the operational scaffolding that makes them function. LLMs learned from coherent wholes.
The context window favours monoliths. A monolith’s relevant logic — models, business rules, schema, query patterns — can largely fit within or near a model’s working context. A microservice problem requires understanding contracts across multiple repositories, event schemas, deployment configurations, and service mesh behaviour. That coherence rarely fits in context, so the model operates with partial information and fills the gaps poorly.
The reasoning required favours monoliths. LLMs handle local, synchronous, transactional logic well. Distributed state, eventual consistency, saga patterns, and compensating transactions require holding a mental model of time-separated, failure-prone interactions across trust boundaries. That is genuinely hard reasoning; models do it unreliably.
An LLM writing a monolith can reason about the whole system’s behaviour. An LLM writing a microservice is writing one actor in a choreography it cannot fully see. The bugs it introduces are the worst kind — locally correct, systemically wrong.
I want to be clear that the structural argument is inference rather than established research. But consider the revealed preference: many of the most visible demonstrations of AI coding productivity (vibe coding, solo SaaS MVPs, game jams, compiler projects, Cursor building Cursor) share a common characteristic. They are monolithic or near-monolithic in structure. The use of AI agents on large, mature microservice estates is conspicuously absent from the discourse, and the reason isn’t caution alone. The context window can’t hold a distributed codebase coherently. The feedback loop that makes AI coding productive (run it, see if it works) breaks down when running anything at all first requires standing up a full multi-service environment with its dependencies. The iteration cycle collapses. Someone should run the formal study, but the industry has already run the informal one. The silence is telling.
What this means in practice
The implication isn’t that every microservice estate should be collapsed into a monolith. For systems at genuine scale, with genuine team distribution, the benefits are real and the compensating investment is justified. Those systems will need to be explicit about the level of agency they are allowing: read-only agents that inspect state, advisory agents that recommend changes, and write-capable agents that execute operations directly. The integrity risk changes dramatically between those levels. Once agents can write, mature estates will need agent-aware write paths with explicit validation semantics, semantic query layers that present a coherent data surface, and careful governance around what agents can and cannot touch.
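One way to make those three levels explicit is a small policy gate in front of every agent tool call. This is a minimal sketch, not a governance framework: the Agency enum, the REQUIRED mapping, and the operation names are all hypothetical.

```python
from enum import IntEnum


class Agency(IntEnum):
    """Levels of agent agency, ordered from least to most privileged."""
    READ_ONLY = 1
    ADVISORY = 2
    WRITE = 3


# Minimum level required for each kind of operation (hypothetical names).
REQUIRED = {
    "read": Agency.READ_ONLY,
    "recommend": Agency.ADVISORY,
    "write": Agency.WRITE,
}


def authorize(level: Agency, operation: str) -> bool:
    """Allow an operation only if it is known and the agent's level is high enough."""
    required = REQUIRED.get(operation)
    return required is not None and level >= required


checks = {
    "read_only_can_read": authorize(Agency.READ_ONLY, "read"),
    "advisory_can_write": authorize(Agency.ADVISORY, "write"),
    "writer_can_write": authorize(Agency.WRITE, "write"),
    "unknown_operation": authorize(Agency.WRITE, "drop_table"),
}
```

Unknown operations are denied by default. That is the safer failure mode when the caller is an agent improvising its own plan.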
But for the majority of systems where microservices were the wrong choice to begin with, the arrival of agents may force an honest reckoning. The architectural complexity that was tolerable when all the consumers were known services written by disciplined human developers becomes untenable when the consumer is an agent with ad-hoc goals and no implicit understanding of your conventions.
I don’t think this plays out slowly. The pressure arrives as soon as organisations give agents write access to operational systems, rather than limiting them to read-only analysis or human-reviewed recommendations. The tab is due.