FuriosaAI



Furiosa RNGD - Gen 2 data center accelerator

Powerfully efficient AI inference for enterprise and cloud

EFFICIENT LLM INFERENCE

[Charts: Efficiency (tokens/watt), Llama 3 8B, H100 and L40S comparisons]

Llama 3 8B

2048 input tokens / 2048 output tokens

RNGD: FuriosaSDK / FP8 / 3047 tokens/s

H100 SXM: TensorRT-LLM 0.11.0 / FP8 / 8399 tokens/s

L40S: TensorRT-LLM 0.11.0 / FP8 / 1912 tokens/s
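The tokens/watt comparison can be approximated from the figures on this page. The sketch below divides each reported Llama 3 8B throughput by the TDP listed in the spec table. Using TDP as the power denominator is an assumption, since the page does not state how power was measured for the charts, and the names in the code are illustrative.

```python
# Llama 3 8B throughput in tokens/s, as reported on this page.
THROUGHPUT_TOKENS_PER_S = {"RNGD": 3047.0, "H100 SXM": 8399.0, "L40S": 1912.0}

# Board TDP in watts, from the spec table on this page.
# Assumption: TDP stands in for actual power draw during the benchmark.
TDP_WATTS = {"RNGD": 150.0, "H100 SXM": 700.0, "L40S": 350.0}


def tokens_per_watt(device: str) -> float:
    """Return throughput divided by TDP (tokens/s per watt) for one device."""
    return THROUGHPUT_TOKENS_PER_S[device] / TDP_WATTS[device]


efficiency = {device: tokens_per_watt(device) for device in THROUGHPUT_TOKENS_PER_S}

for device, value in efficiency.items():
    relative = value / efficiency["RNGD"]
    print(f"{device:9s} {value:6.2f} tokens/s per W  ({relative:.2f}x RNGD)")
```

Under that assumption RNGD comes out at roughly 20.3 tokens/s per watt, against about 12.0 for H100 SXM and 5.5 for L40S.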

[Charts: Efficiency (queries/watt), GPT-J, H100 and L40S comparisons]

GPT-J

MLPerf data center, closed, offline scenario / 99.9% accuracy

RNGD: FuriosaSDK / FP8 / 15.13 queries/s

H100 SXM: TensorRT-LLM 0.11.0 / FP8 / 30.375 queries/s

L40S: TensorRT-LLM 0.11.0 / FP8 / 12.25 queries/s
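Another way to read the GPT-J numbers is energy per query. The sketch below divides TDP (watts, i.e. joules per second) by the reported queries per second to get joules per query. It assumes each device draws its full TDP during the run, which the page does not confirm, so the results are upper-bound estimates rather than measurements.

```python
# GPT-J MLPerf offline throughput in queries/s, as reported on this page.
QUERIES_PER_S = {"RNGD": 15.13, "H100 SXM": 30.375, "L40S": 12.25}

# Board TDP in watts, from the spec table on this page.
# Assumption: the device draws its full TDP for the whole benchmark.
TDP_WATTS = {"RNGD": 150.0, "H100 SXM": 700.0, "L40S": 350.0}


def joules_per_query(device: str) -> float:
    """Return estimated energy per query: watts / (queries per second)."""
    return TDP_WATTS[device] / QUERIES_PER_S[device]


energy = {device: joules_per_query(device) for device in QUERIES_PER_S}

for device, joules in energy.items():
    print(f"{device:9s} {joules:6.2f} J per query")
```

With these inputs the estimate is about 9.9 J per query for RNGD, 23.0 J for H100 SXM, and 28.6 J for L40S.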

Specification            RNGD       H100 SXM   L40S
Technology               TSMC 5nm   TSMC 4nm   TSMC 5nm
BF16/FP8 (TFLOPS)        256/512    989/1979   362/733
INT8/INT4 (TOPS)         512/1024   1979/-     733/733
Memory capacity (GB)     48         80         48
Memory bandwidth (TB/s)  1.5        3.35       0.86
Host interface (PCIe)    Gen5 x16   Gen5 x16   Gen4 x16
TDP (W)                  150        700        350

Disclaimer: FuriosaAI figures are internal measurements based on current specifications and/or internal engineering calculations. NVIDIA results were retrieved from the NVIDIA website, https://github.com/NVIDIA/Tens... /perf-overview.md, on Aug 25, 2024.
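Two ratios can be derived directly from the spec table: peak FP8 compute per watt of TDP, and peak FP8 compute per byte of memory bandwidth. The sketch below is illustrative only; the `Accelerator` class and its field names are hypothetical, and the inputs are peak datasheet values rather than measured performance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Accelerator:
    """Peak figures copied from the spec table on this page."""
    name: str
    fp8_tflops: float    # peak FP8 compute, TFLOPS
    mem_bw_tb_s: float   # memory bandwidth, TB/s
    tdp_w: float         # thermal design power, W

    @property
    def tflops_per_watt(self) -> float:
        # Peak FP8 TFLOPS per watt of TDP.
        return self.fp8_tflops / self.tdp_w

    @property
    def flops_per_byte(self) -> float:
        # TFLOPS divided by TB/s gives peak FLOP per byte of memory traffic.
        return self.fp8_tflops / self.mem_bw_tb_s


DEVICES = [
    Accelerator("RNGD", 512.0, 1.5, 150.0),
    Accelerator("H100 SXM", 1979.0, 3.35, 700.0),
    Accelerator("L40S", 733.0, 0.86, 350.0),
]

for dev in DEVICES:
    print(f"{dev.name:9s} {dev.tflops_per_watt:5.2f} TFLOPS/W  "
          f"{dev.flops_per_byte:6.1f} FLOP/byte")
```

A lower FLOP/byte figure means more memory bandwidth is available per unit of compute, which matters for bandwidth-bound workloads such as LLM decoding at small batch sizes.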

INFERENCE WITHOUT CONSTRAINTS

Performance

Deploy the most capable models with low latency and high throughput

Efficiency

Lower total cost of ownership with less energy, fewer racks, and today's air-cooled data centers

Programmability

Stay future-proof for tomorrow’s models and transition with ease

EFFICIENT AI INFERENCE IS HERE


RNGD (pronounced "Renegade") delivers high-performance LLM and multimodal deployment capabilities while maintaining a radically efficient 150W power profile.

512TFLOPS: 64TFLOPS (FP8) x 8 processing elements
48GB: HBM3 memory capacity
2 x HBM3: CoWoS-S, 6.0Gbps
256MB SRAM: 384TB/s on-chip bandwidth
1.5TB/s: HBM3 memory bandwidth
150W TDP: Targeting air-cooled data centers
PCIe P2P support for LLM
BF16, FP8, INT8, INT4 support
Multiple-instance and virtualization
Secure boot & model encryption
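The headline figures above are consistent with each other, which a short calculation can confirm. The sketch below rebuilds peak FP8 compute from the per-element figure and estimates HBM3 bandwidth from the stack count and pin rate. The 1024-bit interface width per HBM3 stack is an assumption taken from the HBM3 standard, not from this page, and all constant names are illustrative.

```python
# Figures stated on this page.
PROCESSING_ELEMENTS = 8
FP8_TFLOPS_PER_ELEMENT = 64
HBM3_STACKS = 2
PIN_RATE_GBPS = 6.0          # per-pin data rate, Gb/s
SRAM_BANDWIDTH_TB_S = 384.0  # on-chip SRAM bandwidth, TB/s

# Assumption (not stated on this page): each HBM3 stack exposes a
# 1024-bit data interface, as defined by the HBM3 standard.
HBM3_BUS_WIDTH_BITS = 1024

# Peak FP8 compute: per-element throughput times element count.
peak_fp8_tflops = PROCESSING_ELEMENTS * FP8_TFLOPS_PER_ELEMENT

# HBM bandwidth: stacks * bus width (bits) * pin rate (Gb/s) / 8 bits per byte.
hbm_bandwidth_gb_s = HBM3_STACKS * HBM3_BUS_WIDTH_BITS * PIN_RATE_GBPS / 8
hbm_bandwidth_tb_s = hbm_bandwidth_gb_s / 1000

# How much faster the on-chip SRAM is than HBM, by peak bandwidth.
sram_to_hbm_ratio = SRAM_BANDWIDTH_TB_S / hbm_bandwidth_tb_s

print(f"Peak FP8 compute: {peak_fp8_tflops} TFLOPS")
print(f"HBM3 bandwidth:   {hbm_bandwidth_gb_s:.0f} GB/s")
print(f"SRAM/HBM ratio:   {sram_to_hbm_ratio:.0f}x")
```

The compute product matches the 512TFLOPS headline, and the bandwidth estimate of 1536 GB/s agrees with the stated 1.5TB/s after rounding.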

SOFTWARE FOR LLM DEPLOYMENT

The Furiosa SW Stack consists of a model compressor, serving framework, runtime, compiler, profiler, debugger, and a suite of APIs for ease of programming and deployment.

Public release coming in Q4.


Built for advanced inference deployment

Comprehensive software toolkit for optimizing large language models on RNGD. User-friendly APIs facilitate seamless state-of-the-art LLM deployment.

Maximizing data center utilization

Ensure higher utilization and flexibility for small and large deployments with containerization, SR-IOV, Kubernetes, as well as other cloud native components.

Robust ecosystem support

Effortlessly deploy models from library to end-user with PyTorch 2.x integration. Leverage the vast advancements of open-source AI and seamlessly transition models into production.

RNGD Series

RNGD-S (2025): Leadership performance for creatives, media and entertainment, and video AI

RNGD (Q3 2024): 150W versatile inference for all infrastructure deployments

RNGD-MAX (2025): 350W powerful inference with maximum compute density

