Furiosa RNGD - Gen 2 data center accelerator

Powerfully efficient AI inference for enterprise and cloud

#1 EFFICIENT LLAMA INFERENCE

Chart: token/s/W comparison

Llama 3.1 70B (2,048 input tokens / 128 output tokens / 8 cards)
RNGD: Furiosa SDK / FP8 / 957.05 token/s
H100 SXM: TensorRT-LLM 0.15.0 / FP8 / 2,064.53 token/s
L40S: TensorRT-LLM 0.15.0 / FP8 / 163.53 token/s

Llama 3.1 8B (128 input tokens / 4,096 output tokens / 1 card)
RNGD: Furiosa SDK / FP8 / 3,935.25 token/s
H100 SXM: TensorRT-LLM 0.15.0 / FP8 / 13,222.06 token/s
L40S: TensorRT-LLM 0.15.0 / FP8 / 2,989.17 token/s
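As a sanity check on the efficiency claim, token/s/W can be derived from the throughput figures above together with each card's TDP. This is a rough sketch: it assumes TDP approximates sustained power draw and uses the card counts listed for each configuration.

```python
# Token/s/W derived from the chart's throughput numbers and each card's TDP.
# Assumption: TDP is a proxy for sustained power; card counts match the
# listed configurations (8 cards for Llama 3.1 70B, 1 card for Llama 3.1 8B).
tdp_w = {"RNGD": 180, "H100 SXM": 700, "L40S": 350}

llama_70b = {"RNGD": 957.05, "H100 SXM": 2064.53, "L40S": 163.53}    # 8 cards
llama_8b = {"RNGD": 3935.25, "H100 SXM": 13222.06, "L40S": 2989.17}  # 1 card

def tokens_per_watt(throughput, cards):
    return {dev: tps / (tdp_w[dev] * cards) for dev, tps in throughput.items()}

eff_70b = tokens_per_watt(llama_70b, cards=8)
eff_8b = tokens_per_watt(llama_8b, cards=1)
# Llama 3.1 70B: RNGD ~0.66, H100 SXM ~0.37, L40S ~0.06 token/s/W
```

Even though the H100 SXM posts higher absolute throughput, dividing by power reverses the ranking, which is the basis for the per-watt comparison in the chart.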

                          RNGD           H100 SXM       L40S
Technology                TSMC 5nm       TSMC 4nm       TSMC 5nm
BF16/FP8 (TFLOPS)         256/512        989/1979       362/733
INT8/INT4 (TOPS)          512/1024       1979/-         733/733
Memory Capacity (GB)      48             80             48
Memory Bandwidth (TB/s)   1.5            3.35           0.86
Host I/F                  Gen5 x16       Gen5 x16       Gen4 x16
TDP (W)                   180            700            350

Disclaimer: Measurements were made internally by FuriosaAI based on current specifications and/or internal engineering calculations. NVIDIA results were retrieved from the NVIDIA website, https://github.com/NVIDIA/Tens..... /perf-overview.md, on Aug 25, 2024.

EFFICIENT AI INFERENCE IS HERE

RNGD delivers high-performance LLM and multimodal deployment capabilities while maintaining a radically efficient 180W power profile.
512TFLOPS
64TFLOPS (FP8) x 8 processing elements
48GB
HBM3 memory capacity
2 x HBM3
CoWoS-S, 6.0Gbps
256MB SRAM
384TB/s on-chip bandwidth
1.5TB/s
HBM3 memory bandwidth
180W TDP
Targeting air-cooled data centers
PCIe P2P support for LLM
BF16, FP8, INT8, INT4 support
Multiple-Instance and Virtualization
Secure boot & model encryption

Tensor contraction, not matmul

Tensor Contraction Processor (TCP)
At the heart of Furiosa RNGD is the Tensor Contraction Processor architecture (ISCA 2024), designed specifically for efficient tensor contraction operations. The fundamental computation of modern deep learning is tensor contraction, a higher-dimensional generalization of matrix multiplication. Most commercial deep learning accelerators today, however, expose fixed-size matmul instructions as primitives.

RNGD breaks away from that, unlocking powerful performance and efficiency.
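To make the distinction concrete, here is a minimal NumPy sketch: matrix multiplication is the rank-2 special case of tensor contraction, while a general contraction sums over shared indices of arbitrary-rank tensors. The tensor names and shapes below are illustrative, not taken from RNGD itself.

```python
import numpy as np

# Matrix multiplication is the rank-2 special case of tensor contraction:
A = np.random.rand(4, 5)
B = np.random.rand(5, 6)
C = np.einsum("ik,kj->ij", A, B)       # identical to A @ B

# A higher-dimensional contraction, e.g. a batched attention-style product:
# contract a rank-4 and a rank-3 tensor over one shared index 'd'.
X = np.random.rand(2, 8, 16, 32)       # (batch, heads, seq, dim)
W = np.random.rand(2, 32, 64)          # (batch, dim, out)
Y = np.einsum("bhsd,bdo->bhso", X, W)  # sums over 'd'; result is rank 4
```

A matmul-primitive accelerator must decompose the second operation into many fixed-size matrix multiplies; a contraction-native architecture can treat it as a single operation.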

Tensor Contraction Processor

TCP is the compute architecture underlying Furiosa accelerators. With the tensor operation as a first-class citizen, the Tensor Contraction Processor (TCP) unlocks unparalleled energy efficiency.

Tensor mapping for max utilization

We elevate the programming interface between hardware and software to treat tensor contraction as a single, unified operation.

This fundamental design choice streamlines programming and maximizes parallelism and data reuse, while providing the flexibility to reconfigure compute and memory resources based on tensor shapes.

The Furiosa Compiler leverages this hardware flexibility and reconfigurability to select the most optimized tactics, delivering powerful, efficient deep learning acceleration at every scale of deployment.
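One way to see the contrast in software: to run a general contraction through a fixed 2-D GEMM primitive, a compiler must first transpose and reshape operands into matrix form, extra data movement that a contraction-native mapping avoids. A minimal NumPy sketch (shapes illustrative, not RNGD-specific):

```python
import numpy as np

X = np.random.rand(2, 8, 16, 32)   # (batch, heads, seq, dim)
W = np.random.rand(32, 64)         # (dim, out)

# Contraction-native view: a single operation over the shared index 'd'.
Y_contract = np.einsum("bhsd,do->bhso", X, W)

# GEMM-lowered view: collapse (batch, heads, seq) into one row dimension,
# run a plain 2-D matmul, then restore the original index structure.
Y_gemm = (X.reshape(-1, 32) @ W).reshape(2, 8, 16, 64)
```

Both produce the same result; the difference is that the lowered form fixes a particular flattening, whereas a compiler targeting a contraction-native ISA can choose the mapping that best fits the tensor shapes.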

Advanced packaging technology

Advanced packaging (CoWoS-S) for optimal single-chip compute density, memory bandwidth, and energy efficiency.

Turnkey AI inference you can own today

Furiosa NXT RNGD (pronounced “renegade”) Server delivers exceptional performance and cost-efficient scalability for inference with advanced LLM and agentic AI applications. Designed for air-cooled data centers, the NXT RNGD Server can be deployed on-premises, in managed environments, or in colocation facilities.
8 x RNGD
Tensor Contraction Processor (TCP)
12 TB/s
Memory Bandwidth
4 petaFLOPS
512 TFLOPS (FP8) x 8 RNGDs
3 kW
Power Consumption
384 GB
HBM3 Capacity
Dual AMD EPYC 9354
CPU
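The server-level figures follow directly from the per-card RNGD specs scaled by the eight cards; the quick check below confirms this. (Note that the 3 kW system figure exceeds 8 × 180 W of accelerator TDP, presumably covering the host CPUs and other components.)

```python
# Server-level figures are per-card RNGD specs scaled by 8 cards.
cards = 8
per_card = {"fp8_tflops": 512, "hbm3_gb": 48, "mem_bw_tbps": 1.5}

server = {k: v * cards for k, v in per_card.items()}
assert server["fp8_tflops"] == 4096   # ~4 petaFLOPS
assert server["hbm3_gb"] == 384       # matches the 384 GB HBM3 capacity
assert server["mem_bw_tbps"] == 12.0  # matches the 12 TB/s bandwidth
```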
“After extensively testing a wide range of options, we found RNGD to be a highly effective solution for deploying EXAONE models. RNGD provides a compelling combination of benefits: excellent real-world performance, a dramatic reduction in our total cost of ownership, and a surprisingly straightforward integration.”
Kijeong Jeon, Product Unit Leader, LG AI Research

SOFTWARE FOR LLM DEPLOYMENT

Furiosa SW Stack consists of a model compressor, serving framework, runtime, compiler, profiler, debugger, and a suite of APIs for ease of programming and deployment.
Built for advanced inference deployment
Comprehensive software toolkit for optimizing large language models on RNGD. User-friendly APIs facilitate seamless state-of-the-art LLM deployment.
Maximizing data center utilization
Ensure higher utilization and flexibility across small and large deployments with containerization, SR-IOV, Kubernetes, and other cloud-native components.
Robust ecosystem support
Effortlessly deploy models from library to end-user with PyTorch 2.x integration. Leverage the vast advancements of open-source AI and seamlessly transition models into production.

BE IN THE KNOW

Sign up to be notified first about RNGD availability and product updates.

Blog

News
Secure, Production-Ready Agentic AI: The Furiosa and Helikai Partnership

News
RNGD Enters Mass Production: 4,000 High-Performance AI Accelerators Shipped by TSMC

Our Viewpoints
How PyTorch handles dynamic tensor shapes