Huawei has taken a major step forward in AI computing with the unveiling of its Supernode 384 architecture, a development that signals a serious challenge to Nvidia’s dominance in the field. The announcement came during the Kunpeng Ascend Developer Conference held in Shenzhen last Friday, where Huawei presented its latest innovation amid ongoing US-led trade restrictions and rising tech tensions between China and the West.
Zhang Dixuan, head of Huawei’s Ascend computing unit, explained during his keynote that the need for a new approach stemmed from scalability issues in traditional server systems. “As parallel computing scales, cross-machine bandwidth in conventional architectures has become a major limitation for AI training,” he said.
To solve this, Huawei moved away from the traditional Von Neumann model, instead opting for a peer-to-peer design tailored specifically for the growing demands of AI workloads, particularly Mixture-of-Experts (MoE) models, which rely on multiple specialised subnetworks to process complex tasks efficiently.
The result is the CloudMatrix 384 system, which combines 384 Ascend AI chips spread across 12 compute cabinets and four bus cabinets. The system delivers a staggering 300 petaflops of raw compute power alongside 48TB of high-bandwidth memory, a notable leap in AI infrastructure capability.
Benchmark tests place the Supernode 384 in a strong competitive position. For example, Meta's LLaMA 3 models ran at 132 tokens per second per card on the system, roughly 2.5 times faster than on typical cluster setups.
Meanwhile, AI models from Alibaba (Qwen) and DeepSeek performed even better, reaching speeds between 600 and 750 tokens per second per card. These figures highlight how the Supernode 384 is tuned for the heavy communication demands of next-gen AI applications.
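As a back-of-envelope illustration, the per-card figures quoted above can be scaled to the full 384-chip system. This is a rough sketch only: it assumes perfectly linear scaling across all cards, which real Mixture-of-Experts workloads will not achieve.

```python
# Illustrative arithmetic from the reported per-card throughput figures.
# Assumes ideal linear scaling across the 384 chips in a CloudMatrix 384
# system; actual aggregate throughput would be lower in practice.

CARDS = 384  # chips per CloudMatrix 384 system, per Huawei's announcement

def system_throughput(tokens_per_sec_per_card: float, cards: int = CARDS) -> float:
    """Aggregate tokens/s if every card sustained the per-card rate."""
    return tokens_per_sec_per_card * cards

llama3 = system_throughput(132)     # LLaMA 3 benchmark figure
qwen_low = system_throughput(600)   # lower end of the Qwen/DeepSeek range
qwen_high = system_throughput(750)  # upper end of the range

print(f"LLaMA 3 (ideal): {llama3:,.0f} tokens/s")  # 50,688
print(f"Qwen/DeepSeek (ideal): {qwen_low:,.0f} to {qwen_high:,.0f} tokens/s")
```

Even with generous discounts for communication overhead, the gap between the two ranges shows why Huawei highlights the Qwen and DeepSeek numbers.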
This performance boost is largely due to Huawei's switch from traditional Ethernet interconnects to high-speed bus communication, which resulted in a 15x increase in bandwidth and a tenfold drop in latency, from 2 microseconds to just 200 nanoseconds.
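The tenfold latency claim follows directly from the two quoted figures; a minimal sanity check (using only the 2 microsecond and 200 nanosecond numbers reported above):

```python
# Sanity-check the quoted interconnect latency figures.
ethernet_latency_ns = 2_000  # 2 microseconds over conventional Ethernet
bus_latency_ns = 200         # 200 nanoseconds over the high-speed bus

speedup = ethernet_latency_ns / bus_latency_ns
print(f"Latency reduction: {speedup:.0f}x")  # 10x, matching the tenfold claim
```

Latency matters more than raw bandwidth for MoE models, since routing tokens between expert subnetworks generates many small cross-chip messages.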
The development of Supernode 384 also reflects broader geopolitical currents. With American sanctions limiting Huawei’s access to cutting-edge semiconductors, the company has been forced to innovate around these barriers.
According to SemiAnalysis, Huawei’s CloudMatrix 384 is likely powered by the Ascend 910C processor. While this chip may not match the peak performance of the latest US offerings, the overall system design places Huawei ahead in terms of large-scale architectural innovation.
In short, Huawei’s AI strategy now hinges not just on chip specs, but on integrated systems designed to maximise performance through smarter architecture — a bold play to rival Nvidia’s stronghold on the AI computing market.
