RNGD Enters Mass Production: 4,000 High-Performance AI Accelerators Shipped by TSMC
News
FuriosaAI is officially shipping RNGD in volume.
With the first delivery of 4,000 units already received from our partners TSMC and ASUS, our high-performance AI chip is immediately available to enterprise customers globally as both a standalone PCIe card and a turnkey server.
“Turning a first-principles architecture into volume silicon is something few AI chip startups have achieved,” said Furiosa co-founder and CEO June Paik.
“With RNGD now shipping, we are giving enterprises the power to run the most advanced LLMs and agentic AI at scale, without the massive energy and infrastructure penalties of legacy solutions. We will build on this success and move forward aggressively to achieve our mission to make high-performance AI computing truly sustainable for every enterprise.”
High Performance, No Data Center Retrofit
Modern AI models require massive compute, but the vast majority of enterprise data centers are air-cooled and capped at 15kW per rack. Power-hungry legacy GPUs, often burning 600W+ per chip, require expensive and time-consuming infrastructure upgrades. GPU data centers face substantial additional hurdles in obtaining, delivering, and paying for the huge quantities of electricity they need to operate. The industry needs a high-performance alternative that works in the racks you have today.
RNGD is a data center inference accelerator for advanced language models and generative and agentic AI. It delivers both high performance (512 TOPS INT8) and breakthrough energy efficiency, which is critical both for overcoming infrastructure bottlenecks and for lowering Total Cost of Ownership.
FuriosaAI has steadily advanced hardware stabilization and software stack optimization, leading to real-world milestones in the second half of last year, including LG AI Research’s adoption of RNGD for its EXAONE models and a public demonstration of gpt-oss models in collaboration with OpenAI.
In parallel, the company has established a stable manufacturing and supply chain through close partnerships with TSMC, SK hynix, and other global technology leaders.
Because our architecture is purpose-built for AI, RNGD delivers 3.5x greater compute density (throughput per rack) than H100-based systems in standard environments. RNGD is now available in two form factors:
RNGD PCIe Card: A drop-in accelerator delivering frontier model performance at a strict 180W TDP.
NXT RNGD Server: A plug-and-play, 4U rackmount server housing 8 RNGD cards. Because the system draws just 3kW, you can stack five NXT RNGD Servers in a single standard air-cooled rack, delivering 20 petaOPS (INT8) per rack.
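The rack-level figures above can be checked with simple arithmetic. A minimal sketch using only the numbers quoted in this announcement (card compute, card TDP, cards per server, server draw, servers per rack):

```python
# Sketch: verify the per-rack density and power figures quoted above.
CARD_TOPS_INT8 = 512     # per-card INT8 compute
CARD_TDP_W = 180         # per-card TDP (PCIe card spec above)
CARDS_PER_SERVER = 8     # NXT RNGD Server configuration
SERVER_POWER_KW = 3      # total server draw, including host overhead
SERVERS_PER_RACK = 5     # fits a standard 15kW air-cooled rack

rack_tops = CARD_TOPS_INT8 * CARDS_PER_SERVER * SERVERS_PER_RACK
rack_power_kw = SERVER_POWER_KW * SERVERS_PER_RACK

print(rack_tops / 1000)   # 20.48 -- the "20 peta" per-rack figure
print(rack_power_kw)      # 15 -- exactly the 15kW air-cooled rack cap
```

Note that 8 cards at 180W each is 1,440W, leaving the remainder of the 3kW server budget for the host CPU, memory, and fans, and five 3kW servers land exactly on the 15kW per-rack cap cited earlier.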
RNGD is supported by a full-featured SDK which provides advanced optimization techniques such as inter-chip tensor parallelism, plus support for popular models including Qwen 2 and Qwen 2.5. The Furiosa SDK offers torch.compile support, a drop-in replacement for vLLM, and OpenAI API compatibility, enabling developers to move quickly with minimal changes to their existing code. Precompiled artifacts on the Hugging Face Hub now support context lengths up to 32K tokens, enabling more complex and context-aware applications.
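Because the SDK exposes an OpenAI-compatible API, existing client code typically needs only a base-URL change to target an RNGD-served model. A minimal sketch of such a request (the endpoint URL and model name below are illustrative placeholders, not documented values):

```python
import json

# Sketch: an OpenAI-compatible server accepts a standard
# chat-completions request body; only the base URL changes.
BASE_URL = "http://localhost:8000/v1"  # hypothetical local RNGD endpoint

payload = {
    "model": "Qwen/Qwen2.5-7B-Instruct",  # a model family the SDK supports
    "messages": [
        {"role": "user", "content": "Summarize RNGD in one sentence."}
    ],
    "max_tokens": 64,
}

# A client would POST this body to {BASE_URL}/chat/completions,
# exactly as it would against the OpenAI API.
print(json.dumps(payload, indent=2))
```

Since the SDK also acts as a drop-in replacement for vLLM, existing vLLM serving scripts and OpenAI client libraries should work with similarly small changes.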
Silicon Proven in Production
In rigorous validation using its EXAONE model, LG AI Research confirmed that RNGD delivers 2.25x better performance per watt than comparable GPUs.
We optimized OpenAI’s 120B-parameter gpt-oss model to run on just two RNGD cards, proving we can handle massive parameter counts with a fraction of the hardware usually required.
The costs and infrastructure challenges of running advanced AI on GPUs are becoming a bottleneck for the entire industry. But there is now an alternative solution for high-performance inference in any data center. RNGD is shipping, scalable, and ready to deploy now.
To inquire about the RNGD card or NXT RNGD Server, contact us here.