FuriosaAI


Furiosa RNGD - Gen 2 data center accelerator

Powerfully efficient AI inference for enterprise and cloud

#1 EFFICIENT LLAMA INFERENCE

[Chart: token/s/W for Llama 3.1 70B, RNGD vs. L40S and H100]

Llama 3.1 70B (2,048 input tokens / 128 output tokens / 8 cards):

RNGD: Furiosa SDK / FP8 / 957.05 token/s
L40S: TensorRT-LLM 0.15.0 / FP8 / 163.53 token/s
H100 SXM: TensorRT-LLM 0.15.0 / FP8 / 2,064.53 token/s

[Chart: token/s/W for Llama 3.1 8B, RNGD vs. L40S and H100]

Llama 3.1 8B (128 input tokens / 4,096 output tokens / 1 card):

RNGD: Furiosa SDK / FP8 / 3,935.25 token/s
L40S: TensorRT-LLM 0.15.0 / FP8 / 2,989.17 token/s
H100 SXM: TensorRT-LLM 0.15.0 / FP8 / 13,222.06 token/s

Spec                    | RNGD          | L40S          | H100 SXM
Process technology      | TSMC 5nm      | TSMC 5nm      | TSMC 4nm
BF16/FP8 (TFLOPS)       | 256/512       | 362/733       | 989/1979
INT8/INT4 (TOPS)        | 512/1024      | 733/733       | 1979/-
Memory capacity (GB)    | 48            | 48            | 80
Memory bandwidth (TB/s) | 1.5           | 0.86          | 3.35
Host interface          | PCIe Gen5 x16 | PCIe Gen4 x16 | PCIe Gen5 x16
TDP (W)                 | 180           | 350           | 700

Disclaimer: Measurements taken internally by FuriosaAI on current specifications and/or internal engineering calculations. Nvidia results were retrieved from Nvidia's published benchmarks, https://github.com/NVIDIA/Tens... /perf-overview.md, on Aug 25, 2024.
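As a sanity check on the efficiency claim, token/s/W can be derived from the published throughput and TDP figures above. This is a sketch under a simplifying assumption: every card is taken to draw its full rated TDP for the whole run, which real deployments will not match exactly.

```python
# Derive tokens/s/W from the published throughput and TDP figures above.
# Assumption: each card draws its full rated TDP during the run.

def tokens_per_second_per_watt(throughput_tok_s: float, tdp_w: float, num_cards: int) -> float:
    """Aggregate throughput divided by worst-case total board power."""
    return throughput_tok_s / (tdp_w * num_cards)

# Llama 3.1 70B, 8 cards (figures from the chart and spec table above)
rngd_70b = tokens_per_second_per_watt(957.05, 180, 8)    # ~0.66 tok/s/W
l40s_70b = tokens_per_second_per_watt(163.53, 350, 8)    # ~0.06 tok/s/W
h100_70b = tokens_per_second_per_watt(2064.53, 700, 8)   # ~0.37 tok/s/W

print(f"RNGD vs L40S: {rngd_70b / l40s_70b:.1f}x")   # ~11.4x
print(f"RNGD vs H100: {rngd_70b / h100_70b:.1f}x")   # ~1.8x
```

Under this TDP assumption, RNGD leads both GPUs on perf-per-watt for the 70B workload even though the H100 posts higher absolute throughput.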

INFERENCE WITHOUT CONSTRAINTS

Performance

Deploy the most capable models with low latency and high throughput

Efficiency

Lower total cost of ownership with less energy, fewer racks, and today's air-cooled data centers

Programmability

Stay future-proof for tomorrow’s models and transition with ease

EFFICIENT AI INFERENCE IS HERE


RNGD (pronounced "Renegade") delivers high-performance LLM and multimodal deployment capabilities while maintaining a radically efficient 180W power profile.

512 TFLOPS (FP8): 64 TFLOPS x 8 processing elements
48GB HBM3 memory capacity: 2 x HBM3, CoWoS-S, 6.0 Gbps
256MB SRAM: 384 TB/s on-chip bandwidth
1.5 TB/s HBM3 memory bandwidth
180W TDP: targeting air-cooled data centers
PCIe P2P support for LLM
BF16, FP8, INT8, INT4 support
Multi-instance and virtualization
Secure boot & model encryption
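The headline figures above are mutually consistent, which a back-of-the-envelope check makes visible. One assumption not stated on this page: each HBM3 stack presents a 1,024-bit interface, which comes from the JEDEC HBM3 standard rather than from the spec list.

```python
# Back-of-the-envelope check of the listed memory and compute figures.
# Assumption: a 1024-bit interface per HBM3 stack (JEDEC HBM3 standard);
# the stack count and 6.0 Gbps per-pin rate are from the spec list above.

stacks = 2
bits_per_stack = 1024          # HBM3 interface width (assumption, JEDEC)
pin_rate_gbps = 6.0            # per-pin data rate from the spec list

hbm_bw_tb_s = stacks * bits_per_stack * pin_rate_gbps / 8 / 1000
print(f"HBM3 bandwidth: {hbm_bw_tb_s:.3f} TB/s")   # 1.536, quoted as 1.5 TB/s

# Peak FP8 compute: 8 processing elements at 64 TFLOPS each
print(f"FP8 peak: {8 * 64} TFLOPS")                # 512 TFLOPS
```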

SOFTWARE FOR LLM DEPLOYMENT

The Furiosa SW Stack consists of a model compressor, serving framework, runtime, compiler, profiler, debugger, and a suite of APIs for ease of programming and deployment.

Available now.


Built for advanced inference deployment

A comprehensive software toolkit for optimizing large language models on RNGD, with user-friendly APIs that streamline state-of-the-art LLM deployment.

Maximizing data center utilization

Ensure higher utilization and flexibility for small and large deployments alike with containerization, SR-IOV, Kubernetes, and other cloud-native components.

Robust ecosystem support

Effortlessly deploy models from library to end-user with PyTorch 2.x integration. Leverage the vast advancements of open-source AI and seamlessly transition models into production.

THE RNGD SERIES

2025

RNGD-S

Leadership performance for creatives, media and entertainment, and video AI

Q3 2024

RNGD

150W versatile inference for all infrastructure deployments

2025

RNGD-MAX

350W powerful inference with maximum compute density


BE IN THE KNOW

Sign up to be notified first about RNGD availability and product updates.