RNGD Product Page
![Rngd chip](https://furiosa.imgix.net/rngd_chip.png?auto=compress%2Cformat&crop=focalpoint&fit=crop&fm=webp&fp-x=0.5&fp-y=0.5&h=229&q=85&transformer=imgix&w=500)
The most efficient data center accelerator for high-performance LLM and multimodal deployment
- 512 TFLOPS FP8 compute (64 TFLOPS x 8 Processing Elements)
- 48 GB HBM3 Memory Capacity
- 1.5 TB/s Memory Bandwidth
- 150 W Thermal Design Power
Tensor Contraction Processor
The Tensor Contraction Processor (TCP) is the compute architecture underlying Furiosa accelerators. By treating tensor contraction as a first-class operation, TCP unlocks unparalleled energy efficiency.
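Tensor contraction generalizes matrix multiplication: two tensors are multiplied and summed over one or more shared indices. The minimal NumPy sketch below (all shapes and names are illustrative, not tied to the TCP hardware) shows an attention-style batched contraction expressed with `einsum`:

```python
import numpy as np

# A batched attention-style contraction:
# scores[b, i, j] = sum over d of Q[b, i, d] * K[b, j, d]
batch, seq, dim = 2, 4, 8
rng = np.random.default_rng(0)
Q = rng.standard_normal((batch, seq, dim))
K = rng.standard_normal((batch, seq, dim))

# einsum names the indices; "d" appears in both inputs but not the
# output, so it is the contracted (summed-over) index
scores = np.einsum("bid,bjd->bij", Q, K)
assert scores.shape == (batch, seq, seq)
```

Each distinct contraction shape exposes different parallelism and data-reuse opportunities, which is what an architecture built around tensor contraction can exploit.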
Performance results
Llama 2 7B benchmarks:
- Energy efficiency: Perf/Watt (tokens/s/W); higher is better. Measured at Batch=16 and Batch=32, Input Length=2K, Output Length=2K.
- Latency (ms); lower is better. Measured at Batch=1, Sequence Length=128.
- Throughput (tokens/s); higher is better. Measured at Batch=16 and Batch=32, Input Length=2K, Output Length=2K.
| | RNGD | H100 | L40S |
|---|---|---|---|
| Technology | TSMC 5nm | TSMC 4nm | TSMC 5nm |
| BF16/FP8 (TFLOPS) | 256/512 | 989/1979 | 362/733 |
| INT8/INT4 (TOPS) | 512/1024 | 1979/- | 733/733 |
| Memory Capacity (GB) | 48 | 80 | 48 |
| Memory Bandwidth (TB/s) | 1.5 | 3.35 | 0.86 |
| Host I/F | PCIe Gen5 x16 | PCIe Gen5 x16 | PCIe Gen4 x16 |
| TDP (W) | 150 | 700 | 350 |
Disclaimer: Measurements were made internally by FuriosaAI based on current specifications and/or internal engineering calculations. NVIDIA results were retrieved from the NVIDIA website, https://developer.nvidia.com/deep-learning-performance-training-inference/ai-inference, on February 14, 2024.
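The efficiency claim can be sanity-checked directly from the table above. Dividing peak FP8 throughput by TDP gives a rough peak-compute-per-watt figure (a simple proxy, not a measured workload result):

```python
# Peak FP8 TFLOPS and TDP (W), taken from the comparison table
specs = {
    "RNGD": (512, 150),
    "H100": (1979, 700),
    "L40S": (733, 350),
}

# Peak compute efficiency in TFLOPS per watt
perf_per_watt = {name: tflops / tdp for name, (tflops, tdp) in specs.items()}
# RNGD: ~3.41, H100: ~2.83, L40S: ~2.09 TFLOPS/W
```

By this metric, RNGD's low 150 W TDP more than offsets its lower absolute peak throughput.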
Purpose-built for tensor contraction
AI models structure data in tensors of various shapes. The RNGD chip fully exploits parallelism and data reuse by flexibly adapting to each tensor contraction with software-defined tactics and supporting model-wise operator fusion.
Uniquely designed for AI inference deployment, the Furiosa TCP architecture unlocks superior utilization, performance, and energy efficiency.
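Operator fusion, mentioned above, means computing a chain of operators in one pass instead of materializing each intermediate tensor. This conceptual NumPy sketch (an illustration of the general idea, not Furiosa's compiler) contrasts an unfused matmul-bias-ReLU chain with a fused row-at-a-time version:

```python
import numpy as np

def matmul_bias_relu_unfused(x, W, b):
    # Three separate passes, each materializing a full intermediate tensor
    t = x @ W
    t = t + b
    return np.maximum(t, 0.0)

def matmul_bias_relu_fused(x, W, b):
    # One pass per output row: the bias add and activation are applied
    # while the matmul result is still "hot", so no full-size
    # intermediate tensor is ever written out
    out = np.empty((x.shape[0], W.shape[1]))
    for i in range(x.shape[0]):
        out[i] = np.maximum(x[i] @ W + b, 0.0)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
W = rng.standard_normal((8, 3))
b = rng.standard_normal(3)
assert np.allclose(matmul_bias_relu_unfused(x, W, b),
                   matmul_bias_relu_fused(x, W, b))
```

Fusing operators this way cuts memory traffic between compute and HBM, which is often the bottleneck in inference.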
RNGD Series
![Fai mock RNGDS](https://furiosa.imgix.net/fai-mock-RNGDS.png?auto=compress%2Cformat&crop=focalpoint&fit=crop&fm=webp&fp-x=0.5&fp-y=0.5&h=746&q=85&transformer=imgix&w=500)
RNGD-S 2025
Leadership performance for creatives, media and entertainment, and video AI
![Fai mock RNGD](https://furiosa.imgix.net/fai-mock-RNGD.png?auto=compress%2Cformat&crop=focalpoint&fit=crop&fm=webp&fp-x=0.5&fp-y=0.5&h=473&q=85&transformer=imgix&w=500)
RNGD Q3 2024
Versatile cloud and on-prem LLM and multimodal deployment
- 512 TFLOPS FP8 compute
- 48 GB HBM3 Memory Capacity
- 1.5 TB/s Memory Bandwidth
- 150 W TDP
![Fai mock RNGD Max 1](https://furiosa.imgix.net/fai-mock-RNGDMax-1.png?auto=compress%2Cformat&crop=focalpoint&fit=crop&fm=webp&fp-x=0.5&fp-y=0.5&h=260&q=85&transformer=imgix&w=500)
RNGD-Max 2025
Powerful cloud and on-prem LLM and multimodal deployment
FURIOSA STACK
Bring your PyTorch model and weights for deep learning inference. Our comprehensive software stack, including a compiler, runtime, compressor, and profiler, optimizes and distributes the model across multiple RNGD chips. The serving engine makes the model instantly available to your customers. Deploy efficiently and at scale with our Kubernetes integrations and cloud-native support.