SOURCE: d-Matrix
Generative AI inference pioneer d-Matrix, in collaboration with AI infrastructure leaders Arista, Broadcom and Supermicro, announced SquadRack™, the industry’s first blueprint for disaggregated, standards-based, rack-scale solutions for ultra-low-latency batched inference.
SquadRack comes at a time when cloud providers, sovereign clouds and enterprises are struggling to keep up with demand for generative AI inference. SquadRack provides a reference architecture for building turnkey solutions that enable blazing-fast agentic AI, reasoning and video generation. Compared with traditional accelerators, it delivers up to 3x better cost-performance, up to 3x higher energy efficiency, and up to 10x faster token generation.
Configured with eight nodes in a single rack, SquadRack enables customers to run generative AI models of up to 100 billion parameters at blazing-fast speeds. For larger models or large-scale deployments, it scales out to hundreds of nodes across multiple racks over industry-standard Ethernet.
“With the launch of SquadRack, we’re enabling customers to scale inference the right way: with high efficiency, low latency, and standards-based deployment. Corsair delivers the compute-memory acceleration, while JetStream delivers I/O acceleration. Combined with Supermicro’s AI servers, Arista’s Ethernet switches, and Broadcom’s PCIe and Ethernet switch chips, we’re delivering an AI inference rack that speeds up time to deployment. It’s a big step forward in making AI infrastructure commercially viable at scale.” - Sid Sheth, CEO and Co-Founder, d-Matrix.
“Supermicro is proud to collaborate with d-Matrix in delivering an efficient AI inference rack solution that combines compute acceleration, efficient networking, and server density in one integrated platform. Our proven track record in rack-level integration, along with d-Matrix’s inference acceleration products, offers customers a practical path to scaling AI inference across the enterprise and cloud.” - Vik Malyala, President & Managing Director, EMEA and SVP Technology & AI, Supermicro.
“As a leader in high-performance PCIe and Ethernet connectivity, Broadcom is excited to see d-Matrix advancing AI infrastructure solutions. d-Matrix is unlocking a new level of performance and efficiency in AI inference while leveraging the standards-based networking ecosystem that Broadcom has long supported.” - Jas Tremblay, Vice President and General Manager, Data Center Solutions Group, Broadcom.
“Arista’s cloud networking fabric is designed to meet the rigorous demands of AI infrastructure. JetStream’s ability to enable accelerator-to-accelerator communication over standard Ethernet pairs perfectly with Arista’s high-performance switches. Together, we’re demonstrating how AI inference can scale efficiently without requiring proprietary networking fabrics.” - Vijay Vusirikala, Distinguished Lead, AI Systems and Networks, Arista Networks.
SquadRack’s key components include:
- d-Matrix Corsair™ Inference Accelerators with innovative compute-memory integration, delivering ultra-low-latency, high-throughput inference
- d-Matrix JetStream™ IO Accelerators enabling ultra-low-latency, device-initiated, accelerator-to-accelerator communication over standard Ethernet
- Supermicro X14 AI Server Platform integrated with Corsair accelerators and JetStream NICs
- Broadcom PCIe switches for scaling up within a single node
- Arista Leaf Ethernet Switches connected to JetStream NICs, enabling high-performance, scalable, standards-based multi-node communication
- d-Matrix Aviator™ software stack, which makes it easy for customers to deploy Corsair and JetStream at scale and speeds up time to inference
SquadRack configurations will be available for purchase through Supermicro in Q1 2026.
