The networking layer of AI data centers has historically attracted less attention than compute and cooling. GPUs generate the headlines. Power consumption drives the policy debates. Cooling determines where facilities can be built. The industry treats the cables and switches connecting everything as implementation details and incrementally upgrades them as bandwidth demand increases.
That framing is no longer accurate. The bandwidth requirements of current and next-generation AI clusters have outpaced what conventional electrical interconnects can deliver at acceptable power consumption levels. Co-packaged optics, a technology that integrates optical components directly with switching and processor silicon rather than connecting them through pluggable transceivers, is transitioning from a research project to a production deployment requirement. The companies building AI infrastructure today are making architectural decisions about interconnects that will shape their compute density, power efficiency, and operational economics for years. Understanding why co-packaged optics is moving from optional enhancement to structural requirement requires examining what conventional electrical interconnects can no longer deliver at AI scale.
The I/O Wall That Electrical Interconnects Cannot Climb
AI clusters are growing in ways that expose a fundamental physical limit of copper-based interconnects. A training cluster connecting 100,000 GPUs, which is not an aspirational scale for 2026 but a current deployment reality for leading hyperscalers, requires an interconnect fabric that can move data between those GPUs fast enough to keep them productively occupied. At 200 gigabits per second per lane, the rate at which current high-end interconnects operate, copper’s physics becomes a binding constraint: signal integrity degrades over distance, power consumption rises with frequency, and the heat generated by high-speed copper at the densities AI clusters require compounds the thermal management challenge that GPU clusters already create.
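As a rough back-of-envelope illustration of that scale, the sketch below estimates the aggregate edge bandwidth and electrical lane count such a cluster implies; the per-GPU network bandwidth is an assumption for illustration, not a vendor figure.

```python
# Back-of-envelope sizing for a large training fabric.
# All inputs are illustrative assumptions, not vendor specifications.

NUM_GPUS = 100_000       # cluster size discussed above
GBPS_PER_GPU = 800       # assumed scale-out network bandwidth per GPU (Gb/s)
LANE_RATE_GBPS = 200     # per-lane rate of current high-end interconnects (Gb/s)

aggregate_tbps = NUM_GPUS * GBPS_PER_GPU / 1_000
lanes_at_edge = NUM_GPUS * GBPS_PER_GPU / LANE_RATE_GBPS

print(f"Aggregate edge bandwidth: {aggregate_tbps:,.0f} Tb/s")
print(f"Electrical lanes at the server edge alone: {lanes_at_edge:,.0f}")
# A multi-tier fabric multiplies that lane count again at every switching layer,
# which is where per-lane power and reach limits start to dominate.
```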
Beyond GPUs, the hidden architecture powering the AI revolution also includes the networking fabric that determines whether GPU compute capacity can actually be utilised effectively. Jensen Huang describes this constraint as an I/O wall: engineers can scale compute through process improvements and 3D packaging, but chip-to-chip I/O does not scale at the same rate, creating a growing gap between what GPU clusters can theoretically compute and what the interconnect fabric actually delivers. That gap is visible in the utilisation metrics of large AI training clusters, where effective GPU utilisation is often substantially below theoretical peak because the network cannot feed data fast enough to keep all compute units busy.
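A simplified way to see how that gap surfaces in utilisation metrics is to model a training step as compute time plus communication time that the fabric fails to hide; the sketch below does exactly that, with purely illustrative timings.

```python
# Toy model of effective GPU utilisation when communication is not fully
# overlapped with compute. All timings and overlap fractions are illustrative.

def effective_utilisation(compute_ms: float, comm_ms: float, overlap: float) -> float:
    """Fraction of wall-clock time the GPUs spend computing.

    overlap: fraction of communication hidden behind compute (0.0 to 1.0).
    """
    exposed_comm_ms = comm_ms * (1.0 - overlap)
    return compute_ms / (compute_ms + exposed_comm_ms)

# Example step: 100 ms of compute and 60 ms of gradient exchange.
for overlap in (0.9, 0.5, 0.0):
    util = effective_utilisation(100.0, 60.0, overlap)
    print(f"overlap={overlap:.0%}: utilisation={util:.0%}")
# A slower fabric lowers the achievable overlap, dragging utilisation well
# below theoretical peak even though the GPUs themselves are never at fault.
```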
Why the 1.6T Generation Makes Copper Unworkable
The transition from 800 gigabit to 1.6 terabit interconnect speeds, already underway in 2026, pushes copper harder against its physical limits. At the 224 gigabits per second per lane that the 1.6 terabit generation requires, traditional passive copper cannot reliably span distances beyond a single server rack, and in some configurations cannot even span the rack itself. That reach constraint is not solvable through better cable manufacturing. It reflects the fundamental physics of how electromagnetic signals propagate at high frequencies through conductive materials.
What Co-Packaged Optics Changes
Co-packaged optics addresses the I/O wall by replacing the electrical signal paths between optical engines and switching silicon with photonic signal paths governed by fundamentally different physics. A conventional pluggable optical transceiver sits at the edge of a switch, converting electrical signals from the ASIC to optical signals for transmission and back again. That conversion adds latency, consumes power, and requires physical space at the switch faceplate that limits port density.
CPO integrates the optical engine directly with the switching ASIC inside the same package, eliminating the electrical path between them and replacing it with a photonic connection. That integration delivers as much as 3.5 times better power efficiency for the interconnect function than pluggable transceiver architectures, according to NVIDIA’s published specifications for its Quantum-X and Spectrum-X photonics switch platforms. It improves signal reliability by removing the mechanical connections and insertion points that pluggable transceivers introduce. And it enables higher bandwidth density by packing more optical connections per unit of package area than faceplate-mounted pluggable modules allow.
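To put the 3.5x figure in concrete terms, the sketch below compares the optics power budget of a fully populated switch under pluggable and co-packaged assumptions; only the 3.5x ratio comes from NVIDIA’s published claim, while the port count and per-module wattage are assumptions chosen for illustration.

```python
# Rough optics power budget for one switch, pluggable vs. co-packaged.
# The 3.5x efficiency ratio is NVIDIA's published claim; the port count and
# per-module wattage are illustrative assumptions, not measured values.

PORTS = 144                      # assumed fully populated switch
PLUGGABLE_WATTS_PER_PORT = 15.0  # assumed draw of an 800G pluggable module (W)
CPO_EFFICIENCY_GAIN = 3.5        # power reduction attributed to co-packaged optics

pluggable_total_w = PORTS * PLUGGABLE_WATTS_PER_PORT
cpo_total_w = pluggable_total_w / CPO_EFFICIENCY_GAIN

print(f"Pluggable optics:   {pluggable_total_w:,.0f} W per switch")
print(f"Co-packaged optics: {cpo_total_w:,.0f} W per switch")
print(f"Savings:            {pluggable_total_w - cpo_total_w:,.0f} W per switch")
# Multiplied across the hundreds or thousands of switches in a large fabric,
# the difference becomes megawatts of facility power and cooling load.
```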
How NVIDIA and Marvell Are Bringing CPO to Market
NVIDIA’s Quantum-X InfiniBand photonics switches became commercially available in early 2026, delivering 115 terabits per second of throughput across 144 ports operating at 800 gigabits per second each. The Spectrum-X Ethernet photonics platform follows in the second half of 2026. Both platforms use silicon photonics integrated circuits fabricated on TSMC’s COUPE packaging platform, which bonds photonic integrated circuits with electronic control chips in a configuration that can evolve toward tighter integration in subsequent generations.
Marvell’s interconnect technology portfolio and Broadcom’s optical switch development represent the merchant silicon dimension of the same transition. Both companies are building CPO-enabled networking silicon that will be deployed in hyperscaler-designed switches alongside NVIDIA’s vertically integrated platforms. The competitive dynamics between NVIDIA’s integrated networking approach and the merchant silicon alternatives will shape the economics and architectural diversity of AI cluster interconnects over the next hardware cycle.
The Pluggable to CPO Transition Timeline
The transition from pluggable optical transceivers to co-packaged optics is not happening simultaneously across all interconnect tiers. The AI data center network has distinct segments with different bandwidth requirements, reach characteristics, and economics that will transition on different timelines.
Scale-out networking, which connects switches to each other across the data center fabric, is transitioning to CPO most rapidly because it operates at the highest bandwidth densities and the longest distances within a data center, where the power and bandwidth advantages of CPO are most material. Meta presented reliability data at OFC 2026 comparing pluggable and CPO implementations for scale-out switches and concluded that CPO demonstrated better reliability than pluggable transceivers in its evaluation. That finding accelerates adoption by removing the reliability risk premium that new interconnect technology typically carries.
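One way to see why that reliability comparison matters operationally is to estimate how many optical-link failures a large fabric would generate per year; the link count and annual failure rates below are assumptions chosen for illustration, not figures from Meta’s presentation.

```python
# Illustrative estimate of yearly optical-link failures in a large fabric.
# The link count and annual failure rates (AFR) are assumptions, not
# figures from Meta's OFC 2026 data.

OPTICAL_LINKS = 50_000  # assumed optical links in the scale-out fabric

scenarios = {
    "pluggable (assumed 2% AFR)": 0.02,
    "co-packaged (assumed 0.5% AFR)": 0.005,
}

for label, afr in scenarios.items():
    expected_failures = OPTICAL_LINKS * afr
    print(f"{label}: ~{expected_failures:,.0f} failures per year")
# Each failure is a maintenance dispatch or a degraded link in a tightly coupled
# training job, so even a modest AFR difference changes the operational calculus.
```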
Scale-up networking, which connects GPUs within a single server or rack, has historically used copper because the distances involved are short enough that copper’s physics are manageable. As rack densities increase and GPU interconnect requirements push beyond 800 gigabits per second, copper is reaching its limits even at these short distances. NVIDIA’s roadmap introduces CPO for scale-up interconnects beginning in 2027 and 2028, initially for cross-rack connections where reach requirements exceed what copper can reliably serve. A prototype scale-up rack system demonstrated by Ayar Labs and Wiwynn at OFC 2026 showed a configuration with 100 percent CPO interconnect, with copper retained only for power delivery and cooling systems.
The Infrastructure Implications for Operators
The shift toward optically integrated AI compute stacks creates operational considerations for data center operators that extend beyond the networking team. CPO-enabled switches require liquid cooling because integrating optical components with high-power switching ASICs generates heat at densities that air cooling cannot manage. Operators planning CPO-based networking deployments need to confirm that their cooling infrastructure can support it; for facilities that have not yet deployed direct-to-chip or immersion cooling for compute, that may mean infrastructure investment beyond the networking hardware itself.
The serviceability model for CPO also differs from pluggable transceivers in ways that affect operational planning. A failed pluggable transceiver can be replaced in the field by a technician with standard tools and a replacement module. CPO failure modes are more complex because the optical engine is integrated into the switch package. The industry is addressing this through architectures that use pluggable lasers at the front of compute trays, allowing the laser elements, which are the component most likely to require replacement, to remain field-serviceable while the integrated photonic circuitry remains in the package. That design choice represents a considered tradeoff between the performance advantages of full integration and the operational requirements of large-scale data center management.
Serviceability and Cost Considerations
The capital cost of CPO-enabled infrastructure is higher than that of pluggable alternatives in the near term, reflecting the newness of the manufacturing processes and the limited scale of current production. MediaTek and Microsoft’s collaboration on MicroLED interconnect technology represents one dimension of the broader industry investment in alternative optical interconnect approaches that could expand the competitive landscape and drive down costs as volumes increase. As CPO manufacturing scales through 2026 and 2027 and costs approach parity with pluggable transceivers for equivalent bandwidth, the economic case will strengthen even for operators who currently prioritise performance over lifecycle economics when evaluating the technology.
Operators designing AI infrastructure deployments today need to account for the CPO transition in their planning horizons. Facilities built around purely pluggable interconnect architectures for their initial AI deployments will require infrastructure upgrades within three to four years as CPO becomes the performance standard for scale-out networking and begins penetrating scale-up applications. Operators can lower transition costs and complexity by incorporating CPO-compatible cooling infrastructure and switch form factor requirements into current facility designs, even while deploying pluggable switches initially.
