FuriosaAI


RNGD Product Page

Furiosa RNGD - Gen 2 data center accelerator

Powerfully efficient AI inference for Enterprise and Cloud

EFFICIENT LLM INFERENCE

[Chart: Efficiency (tokens/watt), Llama 3 8B — RNGD vs. H100 and L40S]

Llama 3 8B
2048 input tokens / 2048 output tokens

RNGD: FuriosaSDK / FP8 / 3047 tokens/s
H100 SXM: TensorRT-LLM 0.11.0 / FP8 / 8399 tokens/s
L40S: TensorRT-LLM 0.11.0 / FP8 / 1912 tokens/s

[Chart: Efficiency (queries/watt), GPT-J — RNGD vs. H100 and L40S]

GPT-J
MLPerf data center, closed, offline scenario / 99.9% accuracy

RNGD: FuriosaSDK / FP8 / 15.13 queries/s
H100 SXM: TensorRT-LLM 0.11.0 / FP8 / 30.375 queries/s
L40S: TensorRT-LLM 0.11.0 / FP8 / 12.25 queries/s

                         RNGD        H100 SXM    L40S
Technology               TSMC 5nm    TSMC 4nm    TSMC 5nm
BF16/FP8 (TFLOPS)        256/512     989/1979    362/733
INT8/INT4 (TOPS)         512/1024    1979/-      733/733
Memory Capacity (GB)     48          80          48
Memory Bandwidth (TB/s)  1.5         3.35        0.86
Host I/F                 Gen5 x16    Gen5 x16    Gen4 x16
TDP (W)                  150         700         350

Disclaimer: Measurements were made internally by FuriosaAI on current specifications and/or derived from internal engineering calculations. Nvidia results were retrieved from the Nvidia website, https://github.com/NVIDIA/Tens... /perf-overview.md, on Aug 25, 2024.
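The efficiency charts above are simply throughput divided by TDP. A minimal sketch reproducing those per-watt figures from the throughput numbers and TDPs quoted on this page:

```python
# Perf-per-watt for the three cards, using only the throughput
# figures and TDPs quoted on this page.

tdp_w = {"RNGD": 150, "H100 SXM": 700, "L40S": 350}            # TDP (W)
llama_tps = {"RNGD": 3047, "H100 SXM": 8399, "L40S": 1912}     # Llama 3 8B, tokens/s
gptj_qps = {"RNGD": 15.13, "H100 SXM": 30.375, "L40S": 12.25}  # GPT-J, queries/s

for card, watts in tdp_w.items():
    print(f"{card:8s}  {llama_tps[card] / watts:5.2f} tokens/s/W  "
          f"{gptj_qps[card] / watts:.3f} queries/s/W")
```

This works out to roughly 20.3 tokens/s/W for RNGD versus 12.0 for the H100 SXM and 5.5 for the L40S, which is the ratio the charts above illustrate.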

EFFICIENT AI INFERENCE IS HERE


RNGD delivers high-performance LLM and multimodal deployment capabilities while maintaining a radically efficient 150W power profile.

512 TFLOPS: 64 TFLOPS (FP8) x 8 processing elements
48 GB: HBM3 memory capacity
2 x HBM3: CoWoS-S, 6.0 Gbps
256 MB SRAM: 384 TB/s on-chip bandwidth
1.5 TB/s: HBM3 memory bandwidth
150 W TDP: targeting air-cooled data centers
PCIe P2P support for LLM
BF16, FP8, INT8, INT4 support
Multiple-Instance and Virtualization
Secure boot & model encryption
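As a quick sanity check, the headline figures above are internally consistent. Note that the 1024-bit-per-stack interface width used below is the standard HBM3 value, an assumption not stated on this page:

```python
# Peak FP8 compute: 8 processing elements x 64 TFLOPS each.
peak_fp8_tflops = 8 * 64
print(peak_fp8_tflops)  # 512

# HBM3 bandwidth: 2 stacks x 1024 bits per stack (standard HBM3
# interface width, an assumption) x 6.0 Gbps per pin / 8 bits per byte.
hbm_bw_gbs = 2 * 1024 * 6.0 / 8
print(hbm_bw_gbs)  # 1536.0 GB/s, i.e. ~1.5 TB/s
```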

SOFTWARE FOR LLM DEPLOYMENT

The Furiosa SW Stack consists of a model compressor, serving framework, runtime, compiler, profiler, debugger, and a suite of APIs for ease of programming and deployment.

Public release coming in Q4.


Built for advanced inference deployment

A comprehensive software toolkit for optimizing large language models on RNGD. User-friendly APIs make deploying state-of-the-art LLMs seamless.

Maximizing Data Center Utilization

Ensure higher utilization and flexibility for small and large deployments with containerization, SR-IOV, Kubernetes, and other cloud-native components.

Robust Ecosystem Support

Effortlessly deploy models from library to end-user with PyTorch 2.x integration. Leverage the vast advancements of open-source AI and seamlessly transition models into production.

RNGD SERIES

2025

RNGD-S

Leadership performance for creatives, media and entertainment, and video AI

Q3 2024

RNGD

150W versatile inference for all infrastructure deployments

2025

RNGD-MAX

350W powerful inference with maximum compute density


BE IN THE KNOW

Sign up to be notified first about RNGD availability and product updates.