FuriosaAI partners with Broadcom to build next-generation inference platform for the Agentic Era

No items found.

AI is at a major inflection point as agentic AI emerges and AI data centers rapidly shift toward inference-centric infrastructure. These next-generation agentic applications, powered by increasingly capable frontier AI models with advanced reasoning abilities, require continuous loops of inference calls that generate enormous token volumes and massive compute, memory, and scale-up capabilities that push current hardware to its limit.

Today, FuriosaAI is announcing a strategic partnership with Broadcom to develop our third-generation AI accelerator. This collaboration evolves Furiosa's Tensor Contraction Processor (TCP) architecture into a scale-up inference platform engineered for serving frontier agentic systems at massive scale of global hyperscale environments.

"Inference performance is no longer defined solely by raw compute. It is increasingly a function of data reuse and communication efficiency across servers and racks," said Charlie Kawwas, Ph. D., president of Broadcom’s Semiconductor Solutions Group. "By pairing Furiosa’s TCP architecture with Broadcom’s market-leading XPU Technology and IP Platform, Ethernet scale-up and fabric switches, we are building a platform that addresses the key bottlenecks of large-scale agentic AI."

Proven maturity: RNGD in mass production for the data center

This partnership is built on a foundation of proven commercial maturity. FuriosaAI's data center inference chip, RNGD, is now in mass production. Fabricated at TSMC’s 5nm advanced process, RNGD is an 180W, PCIe-based accelerator designed for high-performance LLM and agentic AI workloads. RNGD has been validated in production environments by global leaders including Samsung SDS and LG AI Research, proving the efficiency of the TCP architecture.

Our hardware is supported by a software stack that provides a real-world alternative to CUDA. Furiosa’s SDK overcomes the "CUDA moat" by leveraging a general compiler, which automatically maps high-level PyTorch code to silicon with extreme efficiency. For cases requiring maximum performance, our Virtual ISA provides a declarative, low-level programming model for granular control without the non-deterministic complexity of traditional GPU programming, enabling developers to deploy new models and optimizations in days rather than months.

“Bringing together Broadcom’s infrastructure capabilities and Furiosa’s Tensor Contraction Processor architecture and its industry-defining software stack allows us to move beyond the chip level and deliver a comprehensive solution for the token factory era,” said Furiosa Cofounder and CEO June Paik.

“Having proven the performance and efficiency of our architecture with RNGD, our second-generation chip now in mass production with TSMC, we will deliver a third-generation inference solution that offers industry-leading performance per watt for even the largest, most complex frontier AI models and agentic workloads.”

Furiosa's third-gen AI accelerator for the high-volume token requirements of global hyperscale environments.

Why TCP is the right architecture for the inference era

GPUs carry a "legacy tax" from their origins in graphics. Their SIMT (Single Instruction, Multiple Threads) model struggles with the irregular memory patterns and high-frequency communication required by modern data center workloads.

Our Tensor Contraction Processor (TCP) is a clean-sheet design optimized for the mathematical heart of AI.

TCP focuses on high-bandwidth data movement and massive tensor operations rather than managing thousands of tiny threads. It treats memory access as a first-class citizen, eliminating the efficiency "cliff" GPUs hit when models outgrow rigid cache hierarchies.

TCP achieves superior performance-per-watt, maximizing token density in power-constrained data center racks.

A roadmap for frontier inference at scale

Our third-generation platform represents a significant evolution in technical capability, incorporating HBM4/4E, 2nm process technology, and high-speed inter-chip networking. Our design prioritizes an all-to-all-capable topology to support the complex communication patterns—like Mixture-of-Experts (MoE) routing—essential for frontier models in hyperscale environments.

The future of AI is not just about raw TFLOPS; it is about building a sustainable, efficient, and integrated infrastructure for all data centers. Through this partnership with Broadcom, and backed by the proven success of RNGD and our software stack, FuriosaAI is establishing the new standard for the world’s most demanding inference needs.

Furiosa's RNGD accelerator is now in mass production.

Written by

The Furiosa Team

FuriosaAI partners with Broadcom to build next-generation inference platform for the Agentic Era

Proven maturity: RNGD in mass production for the data center

Why TCP is the right architecture for the inference era

A roadmap for frontier inference at scale

Blog

FuriosaAI CEO June Paik Meets The Princess Royal at British Embassy Seoul

FuriosaAI and Samsung SDS Launch Korea’s First Domestic NPUaaS to Expand Enterprise AI Access

FuriosaAI Expands European AI Infrastructure with RNGD Deployment at Equinix’s Lisbon Data Center

Get the latest updates on FuriosaAI