FuriosaAI partners with Broadcom to build next-generation inference platform for the Agentic Era

May 27, 2026


Written by

The Furiosa Team


AI is at a major inflection point as agentic AI emerges and AI data centers rapidly shift toward inference-centric infrastructure. These next-generation agentic applications, powered by increasingly capable frontier AI models with advanced reasoning abilities, run continuous loops of inference calls that generate enormous token volumes. They demand massive compute, memory, and scale-up capacity that pushes current hardware to its limits.

Today, FuriosaAI is announcing a strategic partnership with Broadcom to develop our third-generation AI accelerator. This collaboration evolves Furiosa’s Tensor Contraction Processor (TCP) architecture into a multi-die chiplet system, creating a next-generation inference engine engineered for the high-volume token requirements of global hyperscale environments.

"Inference performance is no longer defined solely by raw compute. It is increasingly a function of data reuse and communication efficiency across servers and racks," said Charlie Kawwas, Ph.D., president of Broadcom’s Semiconductor Solutions Group. "By pairing Furiosa’s TCP architecture with Broadcom’s market-leading XPU Technology and IP Platform, Ethernet scale-up and fabric switches, we are building a platform that addresses the key bottlenecks of large-scale agentic AI."

Proven maturity: RNGD in mass production for the data center

This partnership is built on a foundation of proven commercial success. FuriosaAI’s data center inference chip, RNGD, is now in mass production. Fabricated by TSMC, RNGD delivers world-class performance and breakthrough energy efficiency for enterprise and hyperscale customers, demonstrating that our architecture is validated for running frontier models in production environments.

Our hardware is supported by a software stack that provides a real-world alternative to CUDA. Furiosa’s SDK overcomes the "CUDA moat" with a general-purpose compiler that automatically maps high-level PyTorch code to silicon with extreme efficiency. For cases requiring maximum performance, our Virtual ISA provides a declarative, low-level programming model that offers granular control without the non-deterministic complexity of traditional GPU programming, enabling developers to deploy new models and optimizations in days rather than months.

“Bringing together Broadcom’s infrastructure capabilities and Furiosa’s Tensor Contraction Processor architecture and its industry-defining software stack allows us to move beyond the chip level and deliver a comprehensive solution for the token factory era,” said Furiosa Cofounder and CEO June Paik.

“Having proven the performance and efficiency of our architecture with RNGD, our second-generation chip now in mass production with TSMC, we will deliver a third-generation inference solution that offers industry-leading performance per watt for even the largest, most complex frontier AI models and agentic workloads.”

Furiosa's third-gen AI accelerator for the high-volume token requirements of global hyperscale environments.

Why TCP is the right architecture for the inference era

GPUs carry a "legacy tax" from their origins in graphics. Their SIMT (Single Instruction, Multiple Threads) model struggles with the irregular memory patterns and high-frequency communication required by modern data center workloads.

Our Tensor Contraction Processor (TCP) is a clean-sheet design optimized for the mathematical heart of AI.

TCP focuses on high-bandwidth data movement and massive tensor operations rather than managing thousands of tiny threads. It treats memory access as a first-class citizen, eliminating the efficiency "cliff" GPUs hit when models outgrow rigid cache hierarchies.

As a result, TCP achieves higher performance per watt than GPUs, maximizing token density in power-constrained data center racks.
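A tensor contraction multiplies two tensors and sums over an index they share; ordinary matrix multiplication is the simplest case. The short Python sketch below illustrates only the mathematical operation the TCP is named after. The function name `contract` is hypothetical, and the sketch says nothing about how TCP hardware actually schedules computation or moves data.

```python
def contract(a, b):
    """Contract the shared index k: out[i][j] = sum over k of a[i][k] * b[k][j].

    `a` is an I x K nested list and `b` is a K x J nested list.
    """
    k = len(b)
    # The contracted index must have the same size in both operands.
    assert all(len(row) == k for row in a), "shared index sizes must match"
    cols = len(b[0])
    return [[sum(row[t] * b[t][j] for t in range(k)) for j in range(cols)] for row in a]


print(contract([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Higher-rank operations in transformer models, such as attention scores or batched projections, follow the same pattern with more indices kept or summed.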

A roadmap for frontier inference at scale

Our third-generation platform represents a significant evolution in technical capability, incorporating HBM4/4E memory, 2nm process technology, and high-speed inter-chip networking. Our design prioritizes an all-to-all-capable topology to support complex communication patterns, such as Mixture-of-Experts (MoE) routing, that are essential for frontier models in hyperscale environments.
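To see why MoE routing stresses the interconnect, consider a simplified model in which each token is routed to a single expert and experts are spread evenly across devices. The sketch below works under those assumptions (top-1 routing, contiguous expert placement, and the hypothetical helper names `dispatch_traffic` and `communicating_pairs`) and is an illustration, not a description of Furiosa's design. It counts how many tokens each device must send to each other device.

```python
def dispatch_traffic(routing, experts_per_device, num_devices):
    """Count tokens each source device must send to each destination device.

    routing[src] lists the expert ID chosen for each token on device `src`
    (top-1 routing assumed). Experts are assumed to be placed contiguously,
    so expert e lives on device e // experts_per_device.
    """
    traffic = [[0] * num_devices for _ in range(num_devices)]
    for src, expert_ids in enumerate(routing):
        for e in expert_ids:
            dst = e // experts_per_device
            traffic[src][dst] += 1
    return traffic


def communicating_pairs(traffic):
    """Number of (src, dst) pairs with src != dst that exchange at least one token."""
    n = len(traffic)
    return sum(1 for s in range(n) for d in range(n) if s != d and traffic[s][d] > 0)


# Hypothetical routing decisions: 4 devices, 8 experts, 2 experts per device.
routing = [
    [0, 3, 5, 7],
    [1, 2, 6, 4],
    [7, 0, 2, 5],
    [3, 4, 1, 6],
]
traffic = dispatch_traffic(routing, experts_per_device=2, num_devices=4)
for row in traffic:
    print(row)  # each row is [1, 1, 1, 1]
print(communicating_pairs(traffic))  # 12 of the 12 possible remote pairs
```

With routing spread this evenly, every device sends tokens to every other device in a single MoE layer, which is the all-to-all pattern an all-to-all-capable topology is meant to carry efficiently.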

The future of AI is not just about raw TFLOPS; it is about building a sustainable, efficient, and integrated infrastructure for all data centers. Through this partnership with Broadcom, and backed by the proven success of RNGD and our software stack, FuriosaAI is establishing the new standard for the world’s most demanding inference needs.

Furiosa's RNGD accelerator is now in mass production.
