For most of the past decade, custom AI silicon was a Google story. The company began deploying its Tensor Processing Units in its own data centers in 2015 and said relatively little about them publicly for years. The rest of the industry watched with interest but did not replicate the strategy at scale, in part because the capital requirements of custom chip development were prohibitive for most operators and in part because Nvidia’s GPU ecosystem was performing well enough that the ROI case for custom silicon was difficult to make at the board level. That period of cautious observation is over.
In 2026, every major hyperscaler is deploying custom AI silicon at production scale simultaneously, the design partnerships and manufacturing commitments underpinning those programs are locking in competitive positions that will persist for years, and the implications for the AI infrastructure market are more consequential than the industry has yet fully internalized.
A Structural Inflection Point
The custom silicon AI accelerator race has reached a structural inflection point that is different from the incremental advances of previous years in three specific ways. First, the programs are no longer experimental. Google’s TPU v7 Ironwood is handling more than 75% of Gemini model computations internally. Amazon’s Trainium 3 is deployed at scale with Anthropic and OpenAI as confirmed production customers. Meta has committed to four MTIA chip generations over two years in a multi-gigawatt deployment with Broadcom. Microsoft’s Maia 200 is processing a significant share of Azure AI inference workloads. These are not research programs or limited pilots. They are production infrastructure that is already changing the competitive economics of AI compute.
Second, the pace of development has accelerated to annual release cycles that match or exceed Nvidia’s cadence, compressing the window available to Nvidia and AMD to respond. Third, the design partnerships and manufacturing commitments that underpin these programs are creating structural advantages that compound over time rather than reset with each chip generation.
The Four Programs Reshaping the Market
Understanding the custom silicon AI accelerator race in 2026 requires examining each of the four major hyperscaler programs on its own terms before examining how they interact and what their collective impact means for the broader market.
Google’s TPU program is the oldest and most mature. The Ironwood architecture, its seventh generation TPU, represents a decade of compounding design investment that no competitor can replicate quickly regardless of capital commitment. Google has confirmed that its frontier models including Gemini, Veo, and Imagen all train and serve on TPU infrastructure internally, with former Google engineers describing the company as fully committed to TPUs across its entire AI stack. The Ironwood deployment at scale demonstrates that a custom silicon program can reach the performance and reliability level required for frontier model serving, which is the threshold that earlier TPU generations had not consistently cleared across all workload types.
Amazon’s Trainium 3 program represents the most aggressive deployment of custom silicon outside of Google’s own operations. The chip delivers 2.52 petaflops of FP8 compute per chip with 144 gigabytes of HBM3e memory, achieving approximately 50% better price-performance than equivalent Nvidia H100 or B200 instances on AWS. More significantly, Anthropic’s commitment to run its Claude model inference on Trainium infrastructure and OpenAI’s adoption of Trainium 3 for training and inference workloads both represent commercial endorsements from frontier AI model developers that validate the chip’s production-readiness in ways that Amazon’s own benchmarks cannot. Amazon is simultaneously developing Trainium 3 Ultra with 128 gigabytes of HBM4 for 2026 to 2027 and has Trainium 4 on a 2nm node in development for 2027 to 2028, establishing a roadmap cadence that will maintain competitive pressure on Nvidia’s pricing in the AWS ecosystem.
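To make the price-performance figure concrete, the short Python sketch below converts a price-performance gain into the implied cost reduction for a fixed amount of work. It assumes that "price-performance" means work delivered per dollar, which the text above does not define explicitly, and the function name is illustrative only.

```python
def cost_reduction(price_performance_gain: float) -> float:
    """Fraction of cost saved on a fixed workload.

    Assumption: price-performance = work delivered per dollar, so a gain
    of 0.50 means 1.5x the work for the same spend.
    """
    if price_performance_gain <= -1.0:
        raise ValueError("gain must be greater than -1.0")
    return 1.0 - 1.0 / (1.0 + price_performance_gain)


saving = cost_reduction(0.50)  # the ~50% figure quoted for Trainium 3
print(f"Implied cost reduction: {saving:.1%}")  # Implied cost reduction: 33.3%
```

Under that reading, 50% better price-performance corresponds to roughly one-third lower cost for the same work, not half.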
Meta’s MTIA Strategy and the Broadcom Partnership
Meta’s approach to custom silicon differs from Google’s and Amazon’s in ways that reflect the specific workload profile of its business. The company’s AI compute requirements are driven by recommendation engines and ranking systems that serve billions of users across Facebook, Instagram, and WhatsApp, a workload profile that is inference-heavy, highly repetitive, and tolerant of the specialised optimisation that custom silicon enables more readily than general-purpose GPUs. Meta announced four new MTIA chip generations in March 2026, built on RISC-V architecture, with up to 25 times compute gains across the lineup. The company has committed to deploying these chips in partnership with Broadcom in a first-phase deployment exceeding 1 gigawatt, with a sustained multi-gigawatt rollout planned.
The Meta-Broadcom relationship is the clearest illustration of how the custom silicon race is restructuring the semiconductor supply chain. Broadcom now holds approximately 70% of the custom AI accelerator design market, providing the IP blocks, networking components, and packaging expertise that allow hyperscalers to build competitive custom chips without developing those capabilities entirely internally.
Broadcom’s AI semiconductor revenue hit $8.4 billion in Q1 2026 alone and is projected to approach $48 billion over the following four quarters based on a $73 billion backlog of orders from hyperscalers and AI developers, a figure that makes Broadcom’s AI revenue trajectory comparable to Nvidia’s even though Broadcom occupies a completely different position in the value chain. The design partnership model that Broadcom has established with Google, Meta, and now OpenAI is not easily replicable by competitors, because Broadcom’s lead in custom AI accelerator design reflects years of accumulated expertise across multiple generations of hyperscaler programs.
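The scale of that projection is easier to judge as a run rate. The sketch below is plain arithmetic on the three figures quoted above. Treating "approach $48 billion" as a ceiling, and assuming all of that revenue is drawn from the quoted backlog, are simplifications made here for illustration.

```python
# Figures quoted in the article, in billions of US dollars.
q1_2026_ai_revenue = 8.4
next_four_quarters = 48.0  # "approach $48 billion": treated here as a ceiling
backlog = 73.0

avg_per_quarter = next_four_quarters / 4
uplift_vs_q1 = avg_per_quarter / q1_2026_ai_revenue - 1
# Assumption: all projected revenue is drawn from the quoted backlog.
backlog_consumed = next_four_quarters / backlog

print(f"Average per quarter: ${avg_per_quarter:.1f}bn")       # $12.0bn
print(f"Uplift over Q1 run rate: {uplift_vs_q1:.0%}")         # 43%
print(f"Share of backlog consumed: {backlog_consumed:.0%}")   # 66%
```

On these assumptions the projection implies an average quarter roughly 43% above the Q1 2026 level, with about a third of the backlog still outstanding after four quarters.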
The Inference Workload as the Decisive Battlefield
The strategic logic of the custom silicon AI accelerator race is most clearly visible when examined through the lens of inference workloads rather than training workloads. Training is where Nvidia’s GPU advantage has been most durable, because the generalised computational flexibility of GPU architecture accommodates the varied and evolving requirements of model development in ways that custom silicon optimised for specific inference patterns cannot match as easily. Inference is where custom silicon’s advantages are most pronounced, because inference workloads are more predictable, more repetitive, and more sensitive to cost-per-token economics than training workloads.
The shift in AI compute toward inference is structural and accelerating. Inference workloads now represent roughly two-thirds of all AI compute according to Deloitte’s TMT Predictions 2026, a proportion that will continue growing as AI deployment in production applications expands faster than frontier model training at the leading edge.
The economics of serving AI at scale, where cost-per-token directly determines whether an AI application is commercially viable, create relentless pressure on inference infrastructure costs that custom silicon programs are specifically designed to address. Google running Gemini inference on TPUs rather than Nvidia GPUs is not primarily a performance decision. It is an economics decision. The cost differential between TPU inference and GPU inference at the scale Google operates is enormous in absolute terms, and that differential grows with every generation of custom silicon that Broadcom and TSMC help hyperscalers deploy.
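The cost-per-token logic can be sketched in a few lines of Python. The function below derives serving cost per million tokens from an hourly instance price and a sustained throughput. Every input value is a placeholder chosen for illustration, not a measured or published figure, and the function name is hypothetical.

```python
def cost_per_million_tokens(hourly_price_usd: float, tokens_per_second: float) -> float:
    """Serving cost in USD per one million tokens for one accelerator instance."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return hourly_price_usd / tokens_per_hour * 1_000_000


# Placeholder inputs for illustration only; not measured or published figures.
gpu_cost = cost_per_million_tokens(hourly_price_usd=10.0, tokens_per_second=5000)
asic_cost = cost_per_million_tokens(hourly_price_usd=6.0, tokens_per_second=4500)

print(f"GPU instance:  ${gpu_cost:.3f} per million tokens")
print(f"ASIC instance: ${asic_cost:.3f} per million tokens")
print(f"ASIC saving:   {1 - asic_cost / gpu_cost:.0%}")
```

The point of the sketch is that an accelerator with lower raw throughput can still win on cost per token if its hourly price is low enough, which is why the comparison is an economic one rather than a pure benchmark contest.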
The Nvidia Response and Its Limitations
Nvidia is not standing still in response to the custom silicon challenge. The B300 Blackwell Ultra, which began shipping in January 2026, delivers 15 petaflops of dense FP4 compute and 288 gigabytes of HBM3e memory, maintaining Nvidia’s position as the performance leader across multiple workload benchmarks. At rack scale, the GB300 NVL72 system delivers 1.1 exaflops of dense FP4 compute through 36 Grace Blackwell Superchips connected via NVLink 5, a system architecture that custom silicon programs at single-chip scale cannot currently match for the highest-density training workloads.
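The rack-scale figure follows from the per-GPU figure by simple multiplication, as the Python check below shows. It assumes two Blackwell Ultra GPUs per Grace Blackwell Superchip, a detail the paragraph above does not state, which gives the 72 GPUs implied by the NVL72 name.

```python
superchips_per_rack = 36
gpus_per_superchip = 2           # assumption: not stated in the article
dense_fp4_pflops_per_gpu = 15.0  # per-GPU figure quoted above

gpus_per_rack = superchips_per_rack * gpus_per_superchip
rack_exaflops = gpus_per_rack * dense_fp4_pflops_per_gpu / 1000

print(gpus_per_rack)           # 72
print(f"{rack_exaflops:.2f}")  # 1.08
```

The result, 1.08 exaflops, rounds to the 1.1 exaflops quoted for the system.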
However, Nvidia faces a structural challenge in the inference market that hardware performance leadership alone cannot resolve. The CUDA software ecosystem that has been Nvidia’s most durable competitive advantage is being eroded by the maturation of alternative software frameworks. OpenAI’s Triton compiler, which allows developers to write hardware-agnostic kernels in Python with backends for Google TPU, AWS Trainium, and AMD accelerators, is reducing the switching cost that previously locked developers into Nvidia’s ecosystem. As covered in our analysis of the AI inference cost crisis in enterprise infrastructure, the economics of inference at production scale create structural incentives to find lower-cost alternatives to Nvidia GPU infrastructure regardless of CUDA’s software advantages. Those incentives are now being met by custom silicon programs that have reached production quality across a widening range of workload categories.
AMD’s Position in the New Landscape
AMD occupies a distinct position in the custom silicon AI accelerator race. Unlike the hyperscalers, AMD is not developing chips for internal use. It is competing directly against Nvidia in the merchant GPU market while attempting to establish a credible alternative for AI accelerator workloads. The company’s MI350 series is already deployed at scale and is described as the fastest-ramping product in AMD’s history. More consequentially, the MI450 series, which anchors AMD’s Helios rack-scale platform, is targeting an H2 2026 launch on TSMC’s 2nm process with HBM4 memory, representing a potential step change in AMD’s competitive position against both Nvidia and hyperscaler custom silicon in the inference segment.
AMD’s 6 gigawatt deployment commitment with OpenAI and its partnership with Meta for Llama model training are commercially significant endorsements that validate AMD’s position as a credible Nvidia alternative for at least a portion of hyperscaler workloads. However, AMD faces the same structural challenge in the inference market that Nvidia faces, though from a weaker starting position. Custom silicon optimised for specific hyperscaler inference workloads will always have cost and performance advantages over general-purpose merchant GPUs in the workload categories it is designed for, because custom design eliminates the overhead that general-purpose architecture carries to support workload flexibility. AMD can win in the segments that hyperscaler custom silicon does not adequately cover, chiefly training and specialised inference workloads. It cannot win in the segments where Google, Amazon, and Meta have deployed custom silicon specifically optimised for their highest-volume workloads.
The TSMC Chokepoint
Every major custom silicon program covered in this analysis (Google TPU, Amazon Trainium, Meta MTIA, Microsoft Maia, and OpenAI’s planned ASIC) is manufactured at TSMC. The foundry produces approximately 92% of advanced AI chips at process nodes of 7nm and below. TSMC’s 3nm process is running at or near full capacity utilisation, with demand significantly exceeding available supply. This shared manufacturing dependency creates a structural constraint that no amount of investment in chip design can resolve on a short timeline, because TSMC’s capacity expansion takes years to bring online and is contested by every major AI infrastructure program simultaneously.
The TSMC chokepoint creates constraints that go beyond simple supply limits. Every hyperscaler custom silicon program, AMD’s merchant GPU roadmap, and Nvidia’s production pipeline compete for the same foundry capacity. Manufacturing availability now shapes deployment timelines as much as design completion does. Even when a hyperscaler designs a more competitive chip, it cannot deploy it faster than TSMC’s capacity allows. The shared manufacturing bottleneck reduces the advantage of superior design by imposing the same constraint across competing programs.
TSMC’s $100 billion investment in five new US fabs, with two Arizona facilities at 4nm and 3nm completing around 2026, will eventually expand the capacity available to AI chip programs. However, the demand growth rate from AI infrastructure programs is outpacing even TSMC’s aggressive capacity expansion, and the supply constraint will remain material through at least 2027 and likely beyond.
The Software Ecosystem Battle Running Alongside the Hardware Race
The hardware competition between custom silicon and Nvidia GPUs is inseparable from a parallel software ecosystem battle that will ultimately determine whether the hardware transitions can actually happen at the pace and scale that the roadmaps imply. Custom silicon that delivers superior performance per dollar for inference workloads is only commercially relevant if the software required to run production AI workloads on that silicon is available, debugged, and performant. Building that software ecosystem is harder and slower than building the hardware itself, and the gap between chip availability and software ecosystem readiness has been the primary constraint limiting custom silicon adoption in previous generations.
Google’s advantage in this dimension reflects its decade-long head start. The JAX framework and the XLA compiler that Google developed to run AI workloads on TPUs have matured through multiple generations of hardware and model development. Developers who have built on JAX for Google Cloud AI workloads can move to new TPU generations with relatively modest porting effort. Amazon’s Neuron SDK, which provides PyTorch and JAX support for Trainium workloads, has matured significantly with Trainium 3, and the confirmed deployment of Anthropic’s Claude models on Trainium infrastructure demonstrates that the software stack can support frontier model serving at production quality. Meta’s RISC-V-based MTIA architecture presents a more complex software challenge because it departs more significantly from conventional GPU programming models, requiring specialised compilers and runtime systems that Meta is developing internally alongside the hardware.
OpenAI’s Triton and the Declining CUDA Moat
OpenAI’s Triton compiler is the most significant software development in the custom silicon ecosystem. It lets developers write high-performance AI kernels in Python and automatically optimises them across multiple hardware targets. Triton now supports mature backends for Google TPU, AWS Trainium, and AMD’s MI series, and developers increasingly use it as a primary compilation target for AI frameworks. This shift has clear practical implications. Developers can now build new AI models on Triton and deploy them across multiple hardware platforms with far less porting effort than before, when achieving competitive performance required CUDA-specific optimisation.
Triton does not eliminate the CUDA moat. The enormous ecosystem of CUDA-optimised libraries, pre-tuned model implementations, and developer tooling that has accumulated over Nvidia’s decade of GPU dominance represents a switching cost that Triton cannot simply dissolve. However, Triton meaningfully reduces the marginal cost of supporting non-CUDA hardware for new model development, which means that the models being built today are less CUDA-dependent than the models built three years ago. As the AI model population shifts toward newer architectures developed with hardware-agnostic frameworks, the software ecosystem advantage that has sustained Nvidia’s market position will erode incrementally rather than disappear suddenly. That erosion is already underway, and the trajectory favours custom silicon adoption over the three to five year horizon that infrastructure investment decisions are made against.
The Neocloud Sector’s Exposure to Custom Silicon Disruption
The neocloud sector’s competitive position rests on a specific premise: that enterprises and AI developers will pay a premium for dedicated Nvidia GPU infrastructure over shared hyperscaler cloud alternatives. That premise has been commercially valid because Nvidia GPU access has been scarce, hyperscaler pricing has been opaque, and the performance of dedicated GPU infrastructure for AI workloads has been demonstrably better than shared cloud alternatives for many use cases. Each of those supporting conditions is now under pressure from the custom silicon transition in ways that neocloud operators have not yet fully addressed in their business models or their investor communications.
The scarcity premium on Nvidia GPU access is declining as Nvidia scales production and hyperscalers expand deployments. Hyperscaler pricing is becoming more transparent as Trainium and TPU offerings give enterprises a credible benchmark for evaluating GPU-based alternatives. At the same time, custom silicon is challenging the performance advantage of dedicated GPU infrastructure by delivering better performance per dollar for the fastest-growing segment of enterprise AI workloads. Neoclouds that built their businesses around privileged access to Nvidia GPUs are now facing a shift. The hardware they positioned as scarce is becoming more widely available, the pricing they framed as competitive is being undercut by custom silicon alternatives, and the workloads they targeted are moving toward infrastructure they do not control.
As we showed in our analysis of the GPU-as-a-service neocloud model, operators that built differentiated service layers above the hardware are in a stronger position than those that relied on hardware access as their primary competitive advantage. The shift to custom silicon is accelerating pressure on the latter group faster than most expected.
The Training Market as the Last Nvidia Stronghold
The training market represents the most durable remaining stronghold for Nvidia GPU dominance, and understanding its dynamics is essential for operators and investors making long-horizon infrastructure commitments. Training frontier AI models requires the kind of generalised high-performance compute that Nvidia’s GPU architecture provides across varied and evolving model architectures. The sheer diversity of experimental training workloads that frontier AI labs run, testing new architectures, scaling laws, and training methodologies, makes the flexibility of GPU hardware more valuable in training than in inference, where workload predictability enables the specialisation that custom silicon requires to deliver its economic advantages.
However, the training market is also evolving in ways that gradually reduce the absolute advantage of Nvidia’s position. The Trainium 3 deployments at Anthropic and OpenAI demonstrate that custom silicon can support frontier model training, not just inference, for at least some of the training workloads at the leading edge of AI development. AMD’s MI450, which targets rack-scale performance leadership with HBM4 memory, represents the most credible merchant GPU challenge to Nvidia’s training dominance that the market has seen. And the hyperscalers’ custom silicon roadmaps all include enhanced training capabilities in next-generation designs, reflecting the strategic priority of reducing GPU dependence across the full spectrum of AI compute rather than just the inference portion.
Nvidia’s training market stronghold will persist through 2026 and likely through 2027. Whether it remains intact through 2028 depends on how quickly custom silicon programs and AMD’s merchant GPU roadmap close the performance and ecosystem gaps that currently define the competitive boundary between GPU dominance and custom silicon opportunity.
The Infrastructure Implications for Data Center Operators
The custom silicon AI accelerator race is not just a semiconductor story. Its outcomes will reshape how data centers are designed, financed, and operated in ways that infrastructure operators cannot afford to ignore. The transition of inference workloads from Nvidia GPU infrastructure to custom silicon changes the hardware and thermal profile of the facilities serving those workloads. Custom silicon programs optimised for specific inference workloads tend to achieve better performance-per-watt than general-purpose GPU infrastructure at those workloads, which means that inference-focused facilities built around custom silicon will have different power density, cooling, and rack layout requirements than GPU-centric AI factories.
The infrastructure implications are also financial. The GPU-collateralised debt structures that have become a defining feature of neocloud financing depend on the market value of Nvidia GPU assets remaining sufficient to support the debt secured against them. As custom silicon captures inference market share from Nvidia GPUs, the residual value trajectory of GPU hardware changes in ways that current debt underwriting models have not fully incorporated. A Trainium 3 deployment at AWS, where Anthropic is training models on hundreds of thousands of chips at 50% better price-performance than equivalent Nvidia hardware, represents a data point that lenders underwriting GPU-collateralised debt must factor into their view of how quickly GPU hardware economic value depreciates. As covered in our analysis of the GPU-as-a-service neocloud business model, the neocloud operators who built their businesses around Nvidia GPU hardware access face the most direct exposure to the competitive pressure that custom silicon creates.
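The collateral argument can be illustrated with a toy loan-to-value calculation in Python. The model below is a deliberately simple sketch and does not represent any lender's underwriting method. It assumes straight-line loan amortisation and a constant annual depreciation rate on the hardware, and every parameter value is hypothetical.

```python
def loan_to_value(initial_ltv: float, annual_depreciation: float,
                  years: float, loan_term_years: float) -> float:
    """Loan-to-value ratio after `years`.

    Assumptions (illustrative only): the loan amortises in a straight line
    over its term, and the collateral loses a fixed fraction of its value
    each year.
    """
    remaining_loan = initial_ltv * max(0.0, 1.0 - years / loan_term_years)
    asset_value = (1.0 - annual_depreciation) ** years
    return remaining_loan / asset_value


# Hypothetical scenarios: same loan, two different depreciation paths.
for rate in (0.20, 0.40):
    ltv = loan_to_value(initial_ltv=0.70, annual_depreciation=rate,
                        years=2, loan_term_years=5)
    print(f"depreciation {rate:.0%}/yr -> LTV after 2 years: {ltv:.2f}")
```

In this toy model a loan that stays comfortably covered under the slower depreciation path exceeds the value of its collateral under the faster one, which is the sensitivity the paragraph above describes.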
What the Arms Race Means for Enterprise AI Buyers
For enterprise AI buyers, the custom silicon AI accelerator race has consequences that are simultaneously positive in the short term and complex in the medium term. The short-term consequence is lower inference costs. As hyperscaler custom silicon programs reduce the cost of serving AI workloads internally, those cost savings flow through to enterprise cloud pricing over time. AWS charging less for Trainium-based inference than for equivalent Nvidia GPU-based inference creates competitive pressure on Google Cloud and Azure to match or beat that pricing with their own custom silicon economics, driving a competitive dynamic that benefits enterprise buyers regardless of which hyperscaler’s infrastructure their workloads run on.
The medium-term consequence is more complex. Hyperscaler custom silicon programs are optimised for specific model architectures and workload patterns that reflect the hyperscalers’ own AI applications. An enterprise whose AI workloads align well with those patterns will benefit from the cost and performance improvements that custom silicon delivers. An enterprise whose workloads require the flexibility and breadth of Nvidia’s CUDA ecosystem will find that the inference cost advantages of custom silicon are not accessible to them without significant engineering investment in porting workloads to alternative hardware platforms. The diversification of the AI accelerator market is creating a more complex decision landscape for enterprise AI infrastructure strategy, where the optimal hardware choice depends increasingly on workload specifics rather than on a simple preference for the dominant platform.
The Outcome Nobody Is Fully Pricing In
TrendForce forecasts that ASIC-based AI servers will rise to over 30% of shipments by 2026, with custom silicon expected to surpass GPU shipments among top hyperscalers by 2028 as inference workloads drive adoption at scale. That trajectory represents a structural market shift that the infrastructure investment community has not fully priced into either Nvidia’s valuation or the valuations of the neocloud operators whose competitive positions depend on Nvidia GPU hardware remaining the dominant AI accelerator platform.
The uncertainty around that projection is genuine. Nvidia’s software ecosystem advantage is real and has proven durable through previous challenges. The CUDA switching costs that enterprise developers face when migrating to alternative hardware are not trivial, and the open-source compiler frameworks that are reducing those costs are still maturing. AMD’s MI450 roadmap may or may not close the performance gap with Blackwell and its successors sufficiently to capture the merchant GPU market share that hyperscaler custom silicon is vacating. And the hyperscaler custom silicon programs themselves face the risk that TSMC capacity constraints, design complexity challenges, or software ecosystem gaps slow their deployment below the pace their roadmaps imply.
The direction of travel is clear. The custom silicon AI accelerator race is real: the programs operate at production scale, design partnerships are locked in, and manufacturing capacity is committed. Inference economics that are structural rather than cyclical are pushing hyperscalers toward custom silicon, and that silicon is already reshaping the AI accelerator market. The real question is how quickly the shift unfolds and how well operators, investors, and enterprise buyers account for a hardware landscape that looks materially different from the one on which they originally based their decisions.
