FuriosaAI


Meet RNGD - 2nd-gen AI accelerator

The most efficient data center accelerator for high-performance LLM and multimodal deployment

512 TFLOPS
64 TFLOPS (FP8) x 8 Processing Elements
48 GB
HBM3 Memory Capacity
1.5 TB/s
Memory Bandwidth
150 W
Thermal Design Power

Performance results

Llama 2 7B

Energy Efficiency: Perf/Watt (tokens/s/W), higher is better
3x at Batch=32, Input Length=2K, Output Length=2K
4x at Batch=16, Input Length=2K, Output Length=2K

Latency (ms), lower is better
vs. H100: Batch=1, Sequence Length=128
vs. L40S: Batch=1, Sequence Length=128

Throughput (tokens/s), higher is better
vs. H100: Batch=16, Input Length=2K, Output Length=2K
vs. L40S: Batch=32, Input Length=2K, Output Length=2K

                          RNGD        H100        L40S
Technology                TSMC 5nm    TSMC 4nm    TSMC 5nm
BF16/FP8 (TFLOPS)         256/512     989/1979    362/733
INT8/INT4 (TOPS)          512/1024    1979/-      733/733
Memory Capacity (GB)      48          80          48
Memory Bandwidth (TB/s)   1.5         3.35        0.86
Host I/F (PCIe)           Gen5 x16    Gen5 x16    Gen4 x16
TDP (W)                   150         700         350

Disclaimer: RNGD results are based on FuriosaAI's internal measurements on current specifications and/or internal engineering calculations. Nvidia results were retrieved from the Nvidia website, https://developer.nvidia.com/d..., on February 14, 2024.

Purpose-built for tensor contraction

Uniquely designed for AI inference deployment, the Furiosa Tensor Contraction Processor (TCP) architecture unlocks superior utilization, performance, and energy efficiency.

AI models structure data as tensors of various shapes. The RNGD chip fully exploits parallelism and data reuse by flexibly adapting to each tensor contraction with software-defined tactics and by fusing operators across the model.
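To make the term concrete, here is a minimal NumPy sketch of our own (an illustration, not Furiosa code) showing that core inference operators such as matrix multiplication and attention scoring are tensor contractions, i.e. sums over shared tensor indices:

```python
import numpy as np

# A tensor contraction sums over one or more shared indices.
# Matrix multiplication is the simplest case: contract over index k.
A = np.random.rand(128, 64)          # shape (m, k)
B = np.random.rand(64, 256)          # shape (k, n)
C = np.einsum("mk,kn->mn", A, B)     # same result as A @ B

# Batched attention scores are also a contraction: for each batch b
# and head h, contract queries and keys over the feature index d.
Q = np.random.rand(8, 16, 512, 64)   # (batch, heads, seq, d)
K = np.random.rand(8, 16, 512, 64)   # (batch, heads, seq, d)
scores = np.einsum("bhqd,bhkd->bhqk", Q, K)

print(C.shape, scores.shape)         # (128, 256) (8, 16, 512, 512)
```

An accelerator that treats contraction as its native primitive can choose a parallelization and data-reuse strategy per contraction shape, rather than forcing every operator through a fixed matrix-multiply unit.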

RNGD Series

RNGD-S 2025

Leadership performance for creatives, media and entertainment, and video AI

RNGD Q3 2024

Versatile cloud and on-prem LLM and multimodal deployment

512 TFLOPS
48 GB HBM3 Memory Capacity
1.5 TB/s Memory Bandwidth
150 W TDP

RNGD-Max 2025

Powerful cloud and on-prem LLM and multimodal deployment

ENTERPRISE-READY & CLOUD-READY STACK

[Diagram: Furiosa software stack]

Workflow Integration

Our platform integrates seamlessly with your existing workflow, offering easy-to-use APIs, support for popular AI frameworks such as PyTorch, ONNX, and TensorFlow Lite, and compatibility with NumPy data structures.
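As a sketch of what that integration typically looks like, the snippet below exports a PyTorch model to ONNX using standard PyTorch APIs; the final hand-off to the Furiosa runtime is left as commented-out pseudocode because the exact SDK entry points are an assumption here, not taken from this page:

```python
import torch
import torch.nn as nn

# A toy model standing in for a real network.
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
model.eval()

# Export to ONNX, a common interchange point for accelerator toolchains.
example_input = torch.randn(1, 64)
torch.onnx.export(model, example_input, "model.onnx",
                  input_names=["input"], output_names=["logits"])

# Hypothetical hand-off to the Furiosa runtime; the module and function
# names below are illustrative assumptions, not the documented SDK.
# from furiosa.runtime import session
# with session.create("model.onnx") as sess:
#     outputs = sess.run(example_input.numpy())
```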

Model Optimization

Our advanced compiler ensures your models achieve peak performance per watt, and our profiling tools identify performance bottlenecks so that model performance and efficiency can be improved further.
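Vendor profilers differ, but the general workflow of locating bottlenecks before compiling for an accelerator looks like this generic sketch using torch.profiler (standard PyTorch tooling, not the Furiosa toolchain):

```python
import torch
import torch.nn as nn
from torch.profiler import profile, record_function, ProfilerActivity

model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024))
x = torch.randn(32, 1024)

# Profile a forward pass to see which operators dominate runtime;
# those are the first candidates for fusion or quantization.
with profile(activities=[ProfilerActivity.CPU]) as prof:
    with record_function("forward"):
        model(x)

print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
```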

Flexible Deployment

Furiosa hardware gives you the flexibility to allocate multiple processing elements (PEs) to match your workloads. It is designed for data center adaptability, incorporating containerization and Kubernetes to facilitate the swift scaling of AI projects.
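On a Kubernetes cluster, a pod would typically request the accelerator through an extended resource advertised by a device plugin. The sketch below uses the standard Kubernetes Python client; the resource name "furiosa.ai/rngd" is a hypothetical placeholder, not a name confirmed by this page:

```python
from kubernetes import client, config

config.load_kube_config()  # use local kubeconfig credentials

# Request one RNGD card via an extended resource. "furiosa.ai/rngd" is a
# hypothetical placeholder for whatever resource name the vendor's
# device plugin actually registers with the kubelet.
pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="llm-inference"),
    spec=client.V1PodSpec(
        containers=[
            client.V1Container(
                name="server",
                image="my-registry/llm-server:latest",  # placeholder image
                resources=client.V1ResourceRequirements(
                    limits={"furiosa.ai/rngd": "1"},
                ),
            )
        ],
        restart_policy="Never",
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```

Once a device plugin advertises the cards to the scheduler, inference pods scale horizontally like any other containerized Kubernetes workload.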