The AI infrastructure buildout has been designed, almost exclusively, around training. That focus made sense in 2023. It does not make sense in 2026. The gigawatt campuses, the 500-megawatt power agreements, the liquid cooling systems for GPU clusters on 24-hour model runs: all of it reflects training requirements. It is the wrong infrastructure for what AI workloads are becoming.
Inference is now the dominant AI compute use case by volume. Every time a user interacts with an AI product, an inference request runs. Autonomous agent actions, API calls, and real-time recommendations all run on inference. That demand already dwarfs training by volume, by frequency, and by the number of organisations dependent on it. The ratio of inference to training workloads widens every quarter as AI adoption expands. Crucially, inference and training have fundamentally different infrastructure needs. Most of what the industry has built to date, and most of what is currently under construction, optimises for the wrong one.
Why Training and Inference Are Different Infrastructure Problems
Training workloads are characterised by sustained, high-density compute demand. A large model training run keeps a full GPU cluster at near-maximum utilisation for days or weeks. The thermal output is constant, the power draw is predictable, and operators can optimise the infrastructure for a single, well-defined operating condition. That is why training data centers have driven the shift to high-density liquid cooling. The heat flux is too high for air cooling, and too steady to call for systems built around variable load.
By contrast, inference workloads are characterised by bursty, variable demand and strict latency requirements. A user waiting for a response cannot tolerate the multi-second delays that are irrelevant during a multi-day training run. Inference workloads spike unpredictably and drop to near-zero between requests. They require infrastructure that ramps quickly and maintains low latency at variable utilisation levels. The power profile is fundamentally different as well. As we have covered in our analysis of agentic AI creating a power demand profile nobody designed data centers for, the shift from batch training to real-time inference is one of the most consequential changes in AI infrastructure demand and one of the least well-addressed in current facility design.
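The contrast between the two load shapes can be made concrete with a small sketch. The utilisation samples below are invented for illustration, not measurements from any facility, and the function names are hypothetical. The sketch compares a flat training-style profile with a bursty inference-style profile on two simple measures: peak-to-average ratio and the largest swing between consecutive samples.

```python
from statistics import mean

# Hypothetical utilisation samples (fraction of rated capacity), one per interval.
# These values are illustrative assumptions, not measured data.
TRAINING_PROFILE = [0.95] * 12
INFERENCE_PROFILE = [0.05, 0.05, 0.90, 0.10, 0.05, 1.00,
                     0.05, 0.05, 0.80, 0.05, 0.05, 0.10]


def peak_to_average(profile):
    """Return peak utilisation divided by mean utilisation."""
    return max(profile) / mean(profile)


def largest_step(profile):
    """Return the largest change in utilisation between consecutive samples."""
    return max(abs(b - a) for a, b in zip(profile, profile[1:]))


if __name__ == "__main__":
    for name, profile in [("training", TRAINING_PROFILE),
                          ("inference", INFERENCE_PROFILE)]:
        print(f"{name}: peak/average = {peak_to_average(profile):.2f}, "
              f"largest step = {largest_step(profile):.2f}")
```

Under these assumed numbers the training profile has a peak-to-average ratio of about 1 and no swings, while the inference profile peaks at several times its average and swings across most of its range between adjacent intervals. Power and cooling systems have to absorb both of those differences.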
The Thermal Design Mismatch
Operators design high-density direct-to-chip or immersion cooling systems around consistent heat flux at rack densities that reflect training GPU configurations. Those systems are not well suited to the variable heat loads that inference workloads produce. An inference server handling bursty requests generates a thermal profile that cycles between high and low output. That variability strains cooling systems designed for sustained peak load.
The consequence is operational inefficiency. A cooling system sized for training loads but running at partial utilisation during inference consumes more energy per useful unit of compute than a system designed for inference from the start. At gigawatt campus scale, that efficiency gap translates into hundreds of millions of dollars in operating cost over the facility’s lifetime. The difference between designing for sustained peak load and designing for variable inference load is not marginal. As we have covered in our analysis of how colocation is being redefined by AI workload requirements, the operators who understand these workload-specific infrastructure requirements are building facilities that perform materially better than those who treat all AI compute as equivalent.
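A rough way to see why partial utilisation hurts is to assume that part of a cooling plant's power draw is fixed (pumps and controls that run regardless of load) and the rest scales with IT load. The sketch below is a toy model under exactly that assumption. The 40 percent fixed share, the 0.2 cooling-to-IT ratio, and the 1,000 kW rating are hypothetical placeholders, not figures from any real facility or vendor.

```python
def cooling_power_kw(it_load_kw, rated_it_kw, fixed_fraction, cooling_ratio):
    """Cooling power as a fixed floor plus a part proportional to IT load.

    All parameters are illustrative assumptions:
      rated_it_kw    - IT load the cooling plant was sized for
      fixed_fraction - share of rated cooling power drawn regardless of load
      cooling_ratio  - cooling power per IT kW at full rated load
    """
    rated_cooling_kw = rated_it_kw * cooling_ratio
    fixed_kw = fixed_fraction * rated_cooling_kw
    variable_kw = (1 - fixed_fraction) * rated_cooling_kw * (it_load_kw / rated_it_kw)
    return fixed_kw + variable_kw


def cooling_overhead_per_it_kw(utilisation, rated_it_kw=1000.0,
                               fixed_fraction=0.4, cooling_ratio=0.2):
    """Cooling kW spent per kW of IT load actually being served."""
    it_load_kw = utilisation * rated_it_kw
    cooling_kw = cooling_power_kw(it_load_kw, rated_it_kw,
                                  fixed_fraction, cooling_ratio)
    return cooling_kw / it_load_kw


if __name__ == "__main__":
    for utilisation in (1.0, 0.6, 0.3):
        overhead = cooling_overhead_per_it_kw(utilisation)
        print(f"utilisation {utilisation:.0%}: "
              f"{overhead:.3f} kW cooling per kW IT")
```

With these placeholder values, the cooling overhead per kW of IT load is 0.2 at full utilisation and roughly 0.39 at 30 percent, so it nearly doubles. The direction of the effect depends only on the fixed share being greater than zero; the size of the effect depends entirely on the assumed numbers.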
The Network Architecture Mismatch
The network fabric inside a training campus optimises for all-to-all communication between GPU nodes within a cluster. Training workloads require tight coupling between compute nodes, high bisection bandwidth, and low intra-cluster latency. The topology optimises for large, monolithic jobs that occupy the entire cluster.
Inference at scale requires a different network topology. Inference requests are typically independent of each other and do not require all-to-all communication patterns. The network bottleneck in inference is instead the connection between the inference server and the user or application generating requests. Low external latency matters more than high internal bisection bandwidth. Much of the network investment in a training-optimised campus is therefore misallocated for inference. Rebuilding it for inference requires not just reconfiguration but, in many cases, physical infrastructure changes that campus designers never anticipated.
Why This Matters for the Next Phase of the Buildout
The industry is beginning to recognise this mismatch. At Data Center World 2026, Ram Nagappan, VP of AI infrastructure at Oracle Cloud Infrastructure, said operators must now design for two different AI patterns. Training and distributed inference each require a distinct infrastructure approach. That framing reflects a growing acknowledgement that the single-purpose training campus model is insufficient for the full range of AI workload requirements. The question is whether that acknowledgement will translate into design changes quickly enough. A significant overhang of training-optimised infrastructure that is suboptimal for inference is already forming.
The operators best positioned to navigate this shift are building modular, adaptable facilities that can be configured for both workload types, rather than facilities fully optimised for either. As we have covered in our analysis of how agentic AI is rewriting data center design requirements, the shift toward agentic and real-time AI workloads is making that design challenge more urgent.
Facilities planned in 2024 and 2025 on training-first assumptions are facing design obsolescence pressure before they have even opened. The developers who anticipated this are building for flexibility from the ground up. Those who did not are building facilities that will require expensive retrofitting to serve an inference-dominant workload mix that is already here rather than still arriving. As we have covered in our analysis of the inference cost crisis driving enterprises off the cloud, the economics of inference infrastructure are distinct from those of training, and the gap between them is growing. The data center industry has not yet fully priced that distinction into its design decisions.
