There’s a growing narrative that AI agents can now “code for hours” and build meaningful applications with minimal human involvement.
I’ve spent some time looking into this, and I find myself slightly unconvinced—not because it isn’t happening, but because the framing doesn’t quite line up with how software engineering actually works.
If an agent is genuinely producing useful software over an extended period, then one of three things must be true:
- The problem is very tightly constrained
- The quality bar is lower than we’d normally accept
- Or we’re not talking about typical engineering problems
In practice, it’s usually a combination.
Three patterns hiding behind the headline
What’s being described as “hours of autonomous coding” tends to fall into three fairly distinct categories.
1. Well-bounded, conventional problems
If you point an agent at something like a CRUD application, an API layer, or a simple front end, it can make steady progress for quite a long time.
That’s not especially surprising. These are problems with well-established patterns, strong framework defaults, and relatively little ambiguity. The “specification” is largely implicit.
The agent isn’t designing something new—it’s assembling something familiar.
2. Iteration loops at scale
A lot of what looks like sustained progress is really just persistence:
generate → run → error → fix → repeat
Given enough cycles, this converges more often than you might expect, particularly if there are tests or clear runtime signals to guide it.
This is useful, but it’s worth being precise about what it is. It’s not architectural reasoning—it’s search with feedback.
That distinction matters once you move beyond straightforward problems.
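To make the “search with feedback” framing concrete, here is a toy sketch of that loop. Everything in it is invented for illustration: `CANDIDATES` stands in for a model proposing successive patches, and a tiny assertion plays the role of the runtime signal. No real agent framework works exactly like this.

```python
# Toy sketch of the generate -> run -> error -> fix loop.
# CANDIDATES stands in for a model proposing successive patches;
# the error message is the only feedback signal guiding the search.

CANDIDATES = [
    "def mean(xs): return sum(xs) / len(x)",    # NameError: x is undefined
    "def mean(xs): return sum(xs) // len(xs)",  # wrong: integer division
    "def mean(xs): return sum(xs) / len(xs)",   # satisfies the check
]

def check(src: str):
    """Run a candidate against a tiny test; return an error string or None."""
    ns: dict = {}
    try:
        exec(src, ns)
        assert ns["mean"]([1, 2]) == 1.5
        return None
    except Exception as e:
        return f"{type(e).__name__}: {e}"

def loop(candidates):
    """Try candidates in order until one passes; return (attempt, source)."""
    for attempt, src in enumerate(candidates, start=1):
        err = check(src)
        if err is None:
            return attempt, src
        # A real agent would feed `err` back into the next generation step.
    return None

print(loop(CANDIDATES))  # prints (3, 'def mean(xs): return sum(xs) / len(xs)')
```

Notice that nothing in the loop reasons about design; it simply retries until the signal goes green. That is exactly why it converges well on closed problems and poorly on open-ended ones.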
3. Pre-engineered environments
Many of the more impressive examples rely on a well-prepared starting point: clean codebases, sensible structure, decent test coverage.
In other words, the environment is doing a lot of the work.
Agents perform well when the constraints are already in place. Remove those constraints, and the results become far less predictable.
Where this breaks down
The limitations show up in the areas you’d expect:
- Ambiguous or evolving requirements
- Trade-offs between competing concerns
- Long-term maintainability
- Cross-cutting issues like security, cost, and performance
These aren’t edge cases. This is most real-world software.
Agents don’t handle these particularly well—not because they’re flawed, but because these problems aren’t easily reduced to a closed loop with a clear success condition.
What is changing
There is a genuine shift, but it’s not quite the one being advertised.
The leverage is moving upstream.
The more precisely you define:
- constraints
- structure
- expected behaviour
the more effective the agent becomes.
Which leads to a slightly counterintuitive point:
If you want an agent to run for hours and produce something useful, you generally need to invest more effort in the specification, not less.
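One concrete way to invest in the specification is to write the expected behaviour down as executable checks before any implementation exists. The `slugify` function below is a made-up example, not taken from any particular project, but the pattern is the point: the assertions pin down constraints and edge cases precisely enough for an iteration loop to converge on them.

```python
import re

# A tiny executable specification, written before the implementation.
# Each assertion pins down a constraint the generated code must satisfy;
# the more precise these are, the better a generate-and-fix loop converges.

def spec(slugify) -> None:
    assert slugify("Hello, World!") == "hello-world"      # lowercase, punctuation dropped
    assert slugify("  spaced  out  ") == "spaced-out"     # whitespace collapsed
    assert slugify("already-a-slug") == "already-a-slug"  # idempotent on valid input
    assert slugify("") == ""                              # degenerate case defined, not implied

# A reference implementation that satisfies the spec -- the kind of
# localised code an agent can reliably iterate toward:
def slugify(text: str) -> str:
    words = re.findall(r"[a-z0-9]+", text.lower())
    return "-".join(words)

spec(slugify)  # silence means every constraint holds
```

The effort here went into the four assertions, not the two-line implementation. That ratio is the upstream shift in practice.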
A more realistic model
The pattern that seems to work is relatively straightforward:
- Humans define the architecture, invariants, and boundaries
- Agents handle localised implementation and iteration
- Tooling provides continuous validation
That combination is powerful. It can significantly increase execution throughput.
But it’s not autonomous software engineering in any meaningful sense. It’s assisted implementation with a very fast feedback loop.
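A minimal sketch of that division of labour, using nothing beyond the standard library: the human fixes the boundary (a function signature and its invariants), the agent would fill in the localised body, and a validation harness supplies the fast feedback loop. `insertion_sort` and the harness are illustrative names, not any real tooling.

```python
import random

# Human-owned: the boundary and its invariants. These stay fixed
# across agent iterations.
def invariants(sort_fn) -> None:
    """Validation harness: random inputs checked against a trusted oracle."""
    for _ in range(200):
        xs = [random.randint(-50, 50) for _ in range(random.randint(0, 20))]
        assert sort_fn(list(xs)) == sorted(xs)

# Agent-owned: a localised implementation that only has to satisfy
# the harness, not redesign the system around it.
def insertion_sort(xs: list) -> list:
    for i in range(1, len(xs)):
        j = i
        while j > 0 and xs[j - 1] > xs[j]:
            xs[j - 1], xs[j] = xs[j], xs[j - 1]
            j -= 1
    return xs

# Tooling-owned: continuous validation on every change.
invariants(insertion_sort)
```

The human decisions here are small but load-bearing: what the function promises, and how that promise is checked. Those are exactly the parts the loop cannot search its way into.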
Final observation
There’s real value here, particularly for well-understood domains and repetitive tasks.
But it’s worth separating two ideas that are often conflated:
- an agent generating code continuously for an extended period
- an agent designing and delivering a robust, production-quality system
Those are not the same thing.
And for now, the second still depends heavily on human judgement.