Simon Griffiths

Focusing on Data, Architecture and AI

Simon Griffiths architects data-first systems and is sceptical about the rest.

Drawing on long experience across enterprise data, architecture, and AI, he prefers platforms designed for reality, not just the latest narrative.

How to Build Good Software with AI — Not How You’ve Been Told


In a previous post, I wrote about the current narrative around AI agents “coding for hours” and what’s actually going on behind the scenes. If you haven’t read that, it’s here:
https://simongriffiths.io/2026/04/21/ai-agents-coding-for-hours-whats-really-going-on/

That post was really about demystifying the claim.

This one is more practical.

Because once you step away from the hype, the more interesting question becomes:

If agents aren’t magically building production systems on their own… how should we actually use AI to build software properly?

Over the past few months, I’ve settled into a pattern that works for me. It’s not especially fashionable, and it’s definitely not “set an agent loose and come back later”.

It’s closer to a structured, iterative specification workflow—just with AI heavily involved.

Start with intent, not detail

I always begin with a short requirement. Not a full spec, just enough to define direction:

  • What am I trying to build?
  • What constraints matter?
  • What architectural approach am I leaning towards?

This isn’t about completeness. It’s about setting boundaries.
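
As a concrete (and entirely invented) illustration, the initial requirement might be no more than a few lines of markdown:

```markdown
# Intent: internal document portal

- Build: a small web app for storing and searching internal documents.
- Constraints: single team, existing Postgres instance, no new cloud services.
- Leaning towards: a modular monolith with a thin API layer.
```

Everything here (the project, the constraints, the architectural lean) is a placeholder. The point is that it fits on one screen and sets boundaries without pretending to be complete.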

Expand into detailed specs (one piece at a time)

From there, I use AI chat to build out a detailed specification—but always in small, focused chunks.

That constraint turns out to be important.

If you try to describe everything at once, you either lose coherence or end up with something that looks plausible but isn’t internally consistent.

Instead, I’ll take one area (say, authentication or document storage) and iterate on it repeatedly:

  • refine the design
  • challenge assumptions
  • check alignment with earlier decisions

This is where AI is genuinely useful. Not because it writes the answer, but because it helps you explore the space more quickly and from multiple angles.

Stabilise by compressing to markdown

Once a section feels coherent, I summarise it into markdown.

That step matters more than it might seem.

The earlier interaction is exploratory. The markdown is deliberate. It forces decisions to be written down in a stable form.

It also makes the output portable. I can move it between tools without dragging a long conversational history along with it.

Branch by feature

When I want to extend the system, I take that summary and start a new thread focused on a specific feature.

Effectively:

  • take the current state
  • branch into a focused exploration
  • develop that feature independently

I might do this several times in parallel.

For anything unfamiliar or high-risk, I’ll often build a small prototype in a separate repository. Not to prove the whole system, just to answer specific questions:

  • does this actually work in practice?
  • what are the edge cases?
  • where does the theory break down?

Merge and align

At some point, all of those feature-level specs need to come back together.

I merge them into a single combined spec and then go through a different kind of iteration—less creative, more analytical:

  • do the pieces fit together?
  • are there conflicting assumptions?
  • have constraints been violated?

This is where inconsistencies tend to surface.

Final consolidation and baseline

Once everything is aligned, I produce a clean, consolidated markdown spec.

That becomes the input to the coding AI tool.

If I’ve built prototypes, I also ask the tool to review them and reconcile any differences between what was specified and what actually worked. That feedback is folded back into the spec.

Then I run a few final validation passes, resolve any remaining ambiguity, and baseline the spec in Git.

Only then do I start building.

One more piece: skills and coding standards still matter

One thing I haven’t mentioned so far is that I’m not handing a raw spec to an AI coding tool and hoping for the best.

I use a set of well-defined skills and coding guidelines alongside the spec. These cover things like structure, naming, error handling, security patterns, and how different components should be implemented.

In practice, this acts as a second layer of control. The spec defines what needs to be built. The skills and guidelines define how it should be built.

That turns out to be important.

Without that layer, the output can be technically correct but inconsistent, or drift away from the architectural intent over time. With it, the implementation becomes much more predictable and aligned.

So the spec isn’t working in isolation—it’s part of a system that includes reusable skills and guardrails for the code itself.

What this approach actually gives me

A few things stand out.

First, separating exploration from commitment is critical. Early stages are iterative and messy. Later stages are controlled and precise. Mixing those tends to produce poor results.

Second, working in small scopes is essential. Feature-level thinking keeps both the human and the AI grounded.

Third, prototypes still matter. AI can help reason about a system, but it doesn’t remove the need to test assumptions against reality.

And finally, the spec becomes a real artefact. Not documentation as an afterthought, but the backbone of the build.

Where it starts to creak

There is a downside.

This process relies quite heavily on manual summarisation and moving content between contexts. Over time, that introduces risk:

  • decisions get softened or lost
  • constraints aren’t always enforced consistently
  • small inconsistencies creep in when features are merged

In effect, I’m acting as the consistency engine.

That works up to a point, but it doesn’t scale particularly well.

There’s also a broader pattern emerging in the industry that reinforces this. Even where coding agents increase velocity, there’s evidence that quality can degrade without strong controls, with measurable increases in things like cognitive complexity and maintenance burden. (arXiv)

So the problem isn’t “can AI generate code?”
It clearly can.

The problem is: can we keep the system coherent and understandable as it does so?

The next step

The direction I’m moving towards is introducing a more formal “canonical spec”.

Not a bigger document, but a more structured one:

  • a stable set of requirements with IDs
  • explicit architectural decisions
  • feature specs that reference both
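
To make that concrete, a fragment of such a canonical spec might look something like the sketch below. The ID scheme and headings are invented for illustration; the point is that requirements, decisions, and features are separately addressable and cross-referenced:

```markdown
## REQ-001: Authentication
Users authenticate via the corporate identity provider.
Related decisions: ADR-004.

## ADR-004: No local credential storage
Decision: delegate all credential handling to the identity provider.
Status: accepted.

## FEAT-010: Document sharing
Implements: REQ-001. Constrained by: ADR-004.
```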

The idea is to maintain a single authoritative source of truth, and then feed slices of that into AI for each piece of work, rather than repeatedly summarising and reconstructing context.
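
As a rough sketch of what “feeding slices” could mean in practice, here is a hypothetical helper that pulls a single requirement section out of a canonical markdown spec by its ID. The spec format (`## REQ-nnn:` headings) is an assumption for illustration, not an established convention:

```python
import re

def extract_slice(spec_text: str, req_id: str) -> str:
    """Return the spec section whose heading carries the given requirement ID."""
    # Match from the heading with this ID up to the next "## " heading
    # (or the end of the document), across multiple lines.
    pattern = rf"(^## {re.escape(req_id)}:.*?)(?=^## |\Z)"
    match = re.search(pattern, spec_text, flags=re.MULTILINE | re.DOTALL)
    if match is None:
        raise KeyError(f"requirement {req_id} not found in spec")
    return match.group(1).strip()

# Invented example spec, in the assumed ID-per-heading format.
spec = """\
## REQ-001: Authentication
Users sign in via the corporate identity provider.
Constraint: no local password storage (see ADR-004).

## REQ-002: Document storage
Documents are stored as immutable versions.
"""

print(extract_slice(spec, "REQ-001"))
```

The extracted slice, plus any decisions it references, would then be the entire context handed to the AI for that piece of work, rather than a re-summarised conversation.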

That should reduce drift, improve traceability, and make the whole process more scalable.

I haven’t fully implemented that yet, but it feels like the natural next step.


For now, this approach strikes a reasonable balance.

It uses AI where it’s strong (iteration, expansion, reframing) and keeps control where it matters (structure, validation, and final decision-making).

It’s not agents running for hours building systems on their own.

It’s something more grounded.

AI helping you think—while you remain responsible for making the system actually work.
