Simon Griffiths

Focusing on Data, Architecture and AI

Simon Griffiths architects data-first systems, and is sceptical about the rest.

Drawing on long experience across enterprise data, architecture, and AI, he prefers platforms designed for reality, not just the latest narrative.

AI Didn’t Break Infrastructure — It Exposed Its Edges

For a long time, the industry settled into a comfortable model for infrastructure.

Two CPUs. A standard server. Add more boxes when you need more scale. Keep everything broadly interchangeable. Let software do the heavy lifting.

It worked. In fact, it worked so well that it became an assumption rather than a design choice.

But that model was always shaped by the workloads we cared about at the time.

The First Signs Were Already There

When Oracle introduced Oracle Exadata, the reaction was fairly predictable.

It didn’t fit the prevailing model. It looked too integrated. Too opinionated. Too far away from the idea of standardised infrastructure.

Underneath the reaction, though, was something more interesting.

A stack optimised for lowest-cost generic compute just wasn’t up to the task of delivering the I/O throughput demanded by critical database workloads. It was common on commodity platforms to see idle CPU while the I/O cards were consistently maxed out. In the cloud, the situation was worse – database systems sized on CPU alone consistently failed to deliver performance. The simple – but costly – solution was to over-provision CPU because, in most clouds, I/O is rationed per core. This was expensive in compute charges, and even more expensive in the per-CPU database licences required for the mostly idle cores.
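
To make the economics concrete, here is a back-of-envelope sketch of that over-provisioning trap. Every figure is an illustrative assumption, not any vendor’s actual ration or price list:

```python
# Back-of-envelope sketch of the over-provisioning trap.
# All figures are illustrative assumptions, not any vendor's pricing.

TARGET_IOPS = 400_000     # what the database workload actually needs
IOPS_PER_VCPU = 10_000    # assumed per-core I/O ration in a cloud
VCPUS_FOR_WORK = 8        # vCPUs the workload would use if I/O were free

# To get the I/O, you must buy the cores that come with it.
vcpus_for_io = TARGET_IOPS // IOPS_PER_VCPU     # -> 40 vCPUs
idle_vcpus = vcpus_for_io - VCPUS_FOR_WORK      # -> 32 mostly idle vCPUs

# Per-core database licensing multiplies the waste: licences are due
# on all provisioned cores, although only a fraction do useful work.
licence_multiplier = vcpus_for_io / VCPUS_FOR_WORK   # -> 5x

print(f"provisioned: {vcpus_for_io} vCPUs, idle: {idle_vcpus}")
print(f"licence cost vs right-sized CPU: {licence_multiplier:.0f}x")
```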

Exadata wasn’t trying to be different for the sake of it. It was optimising for very specific constraints—data locality, I/O bottlenecks, latency between compute and storage. It was a system designed around the behaviour of a database workload, not around a generic idea of infrastructure. Exadata delivered a different balance between CPU and I/O, one which produced database efficiencies impossible on commodity hardware.

The long-term trend is that CPU performance roughly doubles every couple of years. I/O, on the other hand, increases in speed much more slowly. The result is that, over time, we ended up with servers that had plenty of CPU resource but comparatively little I/O. That’s fine for app servers, but not for databases.
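
Compounding those two growth rates over a few hardware generations shows how quickly the imbalance opens up. A minimal sketch, with rough assumed growth figures rather than measured ones:

```python
# Sketch of the widening compute-versus-I/O gap: compound two growth
# rates. Both figures are rough assumptions for illustration only.

cpu_growth_per_gen = 2.0   # CPU throughput roughly doubles per generation
io_growth_per_gen = 1.3    # I/O bandwidth improves far more slowly

cpu = io = 1.0
for gen in range(1, 6):
    cpu *= cpu_growth_per_gen
    io *= io_growth_per_gen
    print(f"gen {gen}: CPU x{cpu:4.1f}, I/O x{io:3.1f}, gap x{cpu / io:4.1f}")
```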

The key performance limitation had started to shift from compute to data movement.

AI Took This to a New Level

Early work on frontier models quickly exposed that commodity hardware wasn’t enough, and NVIDIA stepped up to fill the gap with its GPUs. But fierce competition to produce the next great model drove straight past the limits of what a bolted-on GPU can do, and pushed the industry toward ever more exotic chips and architectures.

Training large models isn’t just “more compute.” It behaves differently. You’re dealing with highly parallel operations, extreme memory bandwidth pressure, and sensitivity to how thousands of devices communicate with each other.
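
One way to feel the communication pressure: in plain data-parallel training with a ring all-reduce, each device moves roughly twice the full gradient buffer every step. A hedged sketch, where model size, precision, and link speeds are all illustrative assumptions:

```python
# Rough estimate of per-step gradient traffic in data-parallel training
# using a ring all-reduce. Model size, precision, and link speeds are
# illustrative assumptions, not measurements of any real system.

PARAMS = 70e9           # parameters in an assumed large model
BYTES_PER_GRAD = 2      # fp16 gradients
N_GPUS = 1024

grad_bytes = PARAMS * BYTES_PER_GRAD              # ~140 GB of gradients
# Ring all-reduce moves ~2 * (N - 1) / N of the buffer per device.
per_gpu_traffic = 2 * (N_GPUS - 1) / N_GPUS * grad_bytes

for gbps in (100, 400, 3200):   # illustrative per-device link speeds
    seconds = per_gpu_traffic / (gbps / 8 * 1e9)  # Gb/s -> bytes/s
    print(f"{gbps:5d} Gb/s link: ~{seconds:5.1f} s of pure communication per step")
```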

That’s why hardware like the NVIDIA A100 or Google TPU exists. Not as faster general-purpose processors, but as architectures designed for a very specific class of computation.

And even within AI, the picture fragments quickly.

  • Training environments optimise for scale and throughput.
  • Batch inference cares about efficiency and cost per token (see the sketch below).
  • Real-time inference shifts the focus again—latency, responsiveness, sometimes even moving back toward CPUs or edge devices.

There isn’t a single “AI infrastructure.” There are multiple, each with its own constraints.
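
For the batch case, “cost per token” reduces to simple arithmetic: instance price divided by sustained throughput. A minimal sketch with placeholder numbers rather than real benchmark or pricing figures:

```python
# Cost-per-token arithmetic for batch inference.
# Price and throughput are placeholders, not real benchmark figures.

gpu_hour_usd = 4.00         # assumed hourly price of one accelerator
tokens_per_second = 2_500   # assumed sustained batch throughput

tokens_per_hour = tokens_per_second * 3600
usd_per_million_tokens = gpu_hour_usd / tokens_per_hour * 1e6
print(f"~${usd_per_million_tokens:.2f} per million tokens")

# Real-time inference inverts the objective: you sacrifice throughput
# (and therefore cost per token) to win on per-request latency.
```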

The Differentiated Hardware Stack

In the “commodity era” the constraint was (nearly) always CPU; once CPU became plentiful, new factors started to drive hardware stack design.

Data locality matters — in both database and AI training, moving data to where processing happens is the dominant cost, so higher bandwidth is needed, and the latency of those links matters too. This is not the usual datacentre North-South I/O route, but East-West – between servers in a dense AI cluster, or between nodes in a database cluster. For databases, it’s also between compute and long-term storage.
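
To get a feel for why locality dominates, compare how long it takes to move a sizeable working set at different effective bandwidths. The bandwidth figures below are rough illustrations of the tiers involved, not measured numbers:

```python
# Time to move a 10 TB working set at various effective bandwidths.
# All bandwidth figures are rough illustrations of the tiers involved.

WORKING_SET_TB = 10

links_gbps = {
    "shared North-South datacentre path": 10,
    "East-West cluster fabric": 100,
    "accelerator-class interconnect": 1600,
}

for name, gbps in links_gbps.items():
    seconds = WORKING_SET_TB * 8e12 / (gbps * 1e9)  # TB -> bits, Gb/s -> b/s
    print(f"{name:36s}: {seconds / 60:6.1f} min")
```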

The shape of the workload matters — databases tolerate standard compute, but AI requires massive parallelisation and specialised vector arithmetic. Databases additionally need high bandwidth to the storage tier.
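
That shape difference shows up even in toy code: the heart of training and inference is dense linear algebra, where every output element is an independent dot product, which is exactly what wide vector units and GPUs exploit. A toy illustration:

```python
import numpy as np

# Toy illustration of why AI workloads parallelise so well: every
# element of C in C = A @ B is an independent dot product, so the
# work fans out naturally across SIMD lanes, cores, or GPU threads.
A = np.random.rand(512, 512).astype(np.float32)
B = np.random.rand(512, 512).astype(np.float32)

C = A @ B   # one call, ~2 * 512**3 multiply-adds, all independent

# A scalar triple loop does the identical arithmetic one operation
# at a time; the maths is the same, the hardware utilisation is not.
print(C.shape, f"~{2 * 512**3 / 1e6:.0f} MFLOPs in one call")
```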

As a general pattern, data movement has become the critical limiting factor.
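
A standard way to express this is the roofline model: attainable throughput is the lesser of what the compute units can do and what the memory system can feed them. A minimal sketch, with rough illustrative peak figures rather than any specific accelerator’s specs:

```python
# Minimal roofline model: performance is capped either by compute
# or by data movement. Peak figures are rough illustrations only.

PEAK_TFLOPS = 300    # assumed accelerator peak (dense fp16)
MEM_BW_TBPS = 2.0    # assumed memory bandwidth, TB/s

def attainable_tflops(flops_per_byte: float) -> float:
    """Roofline: min(compute roof, bandwidth * arithmetic intensity)."""
    return min(PEAK_TFLOPS, MEM_BW_TBPS * flops_per_byte)

for intensity in (1, 10, 150, 1000):   # FLOPs per byte moved
    print(f"intensity {intensity:4d} FLOP/B -> "
          f"{attainable_tflops(intensity):6.1f} TFLOPS")
```

Below the crossover point, adding faster compute changes nothing; only moving data more efficiently does.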

In this new world, interconnects matter more. Memory hierarchies matter more. Placement decisions matter more. Software frameworks are no longer abstracted cleanly from the hardware—they increasingly assume and exploit its shape.

Even cloud providers have adjusted. Instead of presenting purely uniform infrastructure, they now expose very distinct environments—different chips, different network characteristics, different performance envelopes.

That’s not a marketing choice. It’s a reflection of the underlying reality.

The Boundary Has Moved

The interesting shift isn’t that infrastructure has become “specialised.”

It’s that the boundary of what we treat as a stable, interchangeable layer has moved.

At one level, things still look familiar. There is still a base layer of broadly consistent compute, storage, and networking. That hasn’t gone away.

But above that, the system becomes more structured, more intentional.

In other words, the infrastructure is no longer something you can fully abstract away.

Where This Leaves Us

It’s tempting to frame this as a dramatic shift, but it’s more accurate to see it as an adjustment.

The model we’ve been using hasn’t failed. It just has boundaries. And those boundaries are now visible.

For some workloads, the familiar patterns still work perfectly well.

For others, especially in AI, the system needs to be designed with a clearer understanding of how compute, memory, storage, and network behave together.

That’s the real change.

Not that infrastructure is suddenly different—but that we can no longer pretend it isn’t.
