FuriosaAI

Furiosa RNGD - Gen 2 data center accelerator

Powerfully efficient AI inference for Enterprise and Cloud

#1 EFFICIENT LLAMA INFERENCE

[Chart: token/s/W on Llama 3.1 70B and Llama 3.1 8B, comparing RNGD, H100 SXM, and L40S; figures below]

Llama 3.1 70B (2,048 input tokens / 128 output tokens / 8 cards)

RNGD: Furiosa SDK / FP8 / 957.05 token/s
H100 SXM: TensorRT-LLM 0.15.0 / FP8 / 2,064.53 token/s
L40S: TensorRT-LLM 0.15.0 / FP8 / 163.53 token/s

Llama 3.1 8B (128 input tokens / 4,096 output tokens / 1 card)

RNGD: Furiosa SDK / FP8 / 3,935.25 token/s
H100 SXM: TensorRT-LLM 0.15.0 / FP8 / 13,222.06 token/s
L40S: TensorRT-LLM 0.15.0 / FP8 / 2,989.17 token/s

                         RNGD       H100 SXM   L40S
Technology               TSMC 5nm   TSMC 4nm   TSMC 5nm
BF16/FP8 (TFLOPS)        256/512    989/1,979  362/733
INT8/INT4 (TOPS)         512/1,024  1,979/-    733/733
Memory Capacity (GB)     48         80         48
Memory Bandwidth (TB/s)  1.5        3.35       0.86
Host I/F (PCIe)          Gen5 x16   Gen5 x16   Gen4 x16
TDP (W)                  180        700        350

Disclaimer: Measurements were made internally by FuriosaAI based on current specifications and/or internal engineering calculations. NVIDIA results were retrieved from the NVIDIA website, https://github.com/NVIDIA/Tens..... /perf-overview.md, on Aug 25, 2024.
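As a sanity check on the efficiency claim, performance per watt follows from dividing each published throughput by total board power (cards x TDP, from the table above). A minimal sketch in Python using only the figures quoted on this page; real wall power will differ from TDP:

```python
# Token/s/W from the throughput and TDP figures quoted above.
# Total power is approximated as cards x TDP; actual draw will differ.

systems_70b = {  # Llama 3.1 70B, 8 cards per system
    "RNGD":     (957.05,  8, 180),   # (token/s, cards, TDP in W)
    "H100 SXM": (2064.53, 8, 700),
    "L40S":     (163.53,  8, 350),
}

for name, (tps, cards, tdp) in systems_70b.items():
    watts = cards * tdp
    print(f"{name:9s} {tps / watts:.3f} token/s/W")

# RNGD      0.665 token/s/W
# H100 SXM  0.369 token/s/W
# L40S      0.058 token/s/W
```

By this estimate, RNGD delivers roughly 1.8x the H100 SXM's token/s/W on the 70B workload despite the lower absolute throughput.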

EFFICIENT AI INFERENCE IS HERE


RNGD delivers high-performance LLM and multimodal deployment capabilities while maintaining a radically efficient 180W power profile.

512 TFLOPS (FP8): 64 TFLOPS x 8 processing elements
48 GB HBM3 memory capacity
2 x HBM3 stacks: CoWoS-S packaging, 6.0 Gbps
256 MB SRAM: 384 TB/s on-chip bandwidth
1.5 TB/s HBM3 memory bandwidth
180 W TDP: targeting air-cooled data centers
PCIe P2P support for LLMs
BF16, FP8, INT8, INT4 support
Multi-instance support and virtualization
Secure boot and model encryption
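The 48 GB per-card capacity lines up with the benchmark configurations above: at FP8, model weights take roughly one byte per parameter, so an 8B model fits comfortably on one card, while a 70B model's weights (about 70 GB) exceed a single card and must be sharded; the benchmark shards them across eight cards. A rough estimate, ignoring KV cache, activations, and runtime overhead:

```python
# Rough FP8 weight footprint: ~1 byte per parameter (ignores KV cache,
# activations, and runtime overhead, which add to the real requirement).
CARD_HBM_GB = 48

for params_b, cards in [(8, 1), (70, 8)]:
    weights_gb = params_b       # FP8: ~1 GB per billion parameters
    per_card = weights_gb / cards
    print(f"{params_b}B model on {cards} card(s): "
          f"~{per_card:.1f} GB weights per card of {CARD_HBM_GB} GB HBM3")

# 8B model on 1 card(s): ~8.0 GB weights per card of 48 GB HBM3
# 70B model on 8 card(s): ~8.8 GB weights per card of 48 GB HBM3
```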

SOFTWARE FOR LLM DEPLOYMENT

The Furiosa SW Stack consists of a model compressor, serving framework, runtime, compiler, profiler, debugger, and a suite of APIs for ease of programming and deployment.

Available now.


Built for advanced inference deployment

A comprehensive software toolkit for optimizing large language models on RNGD. User-friendly APIs make state-of-the-art LLM deployment seamless, as the sketch below illustrates.
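For illustration only, a minimal offline-generation sketch assuming a vLLM-style Python API; the furiosa_llm module name, LLM class, and SamplingParams are assumptions based on that style, not details confirmed on this page:

```python
# Hypothetical sketch of offline LLM inference on RNGD, assuming a
# vLLM-style API. Names below are assumptions; consult the Furiosa SDK
# documentation for the actual interface.
from furiosa_llm import LLM, SamplingParams  # assumed module and classes

# Load a Llama 3.1 8B model onto a single RNGD card.
llm = LLM("meta-llama/Llama-3.1-8B-Instruct")

params = SamplingParams(temperature=0.7, max_tokens=256)
for output in llm.generate(["What makes inference power-efficient?"], params):
    print(output.outputs[0].text)
```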

Maximizing Data Center Utilization

Ensure higher utilization and flexibility for small and large deployments with containerization, SR-IOV, Kubernetes, and other cloud-native components; see the sketch below.
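In Kubernetes terms, a device plugin typically exposes an accelerator as an extended resource that Pods can request. A minimal sketch with the official Kubernetes Python client; the resource name furiosa.ai/rngd and the container image are placeholders, not confirmed names:

```python
# Hypothetical sketch: requesting an RNGD device in a Kubernetes Pod via
# a device plugin. The extended-resource name "furiosa.ai/rngd" is an
# assumption; the actual name is defined by Furiosa's device plugin.
from kubernetes import client, config

config.load_kube_config()

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="rngd-inference"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="llm-server",
                image="example.com/llm-server:latest",  # placeholder image
                resources=client.V1ResourceRequirements(
                    limits={"furiosa.ai/rngd": "1"}  # request one RNGD card
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```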

Robust Ecosystem Support

Effortlessly deploy models from library to end user with PyTorch 2.x integration. Leverage the vast advancements of open-source AI and seamlessly transition models into production, as sketched below.
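PyTorch 2.x integration typically surfaces as a torch.compile backend. A minimal sketch under that assumption; the backend string "furiosa" is hypothetical, not a confirmed name:

```python
# Hypothetical sketch: compiling a PyTorch model via a torch.compile
# backend. The backend name "furiosa" is an assumption; the Furiosa SDK
# documentation defines the real integration point.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "meta-llama/Llama-3.1-8B-Instruct"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16)
model.eval()

compiled = torch.compile(model, backend="furiosa")  # assumed backend name

inputs = tok("Efficient inference is", return_tensors="pt")
with torch.no_grad():
    logits = compiled(**inputs).logits  # forward pass through compiled graph
print(logits.shape)
```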


BE IN THE KNOW

Sign up to be notified first about RNGD availability and product updates.