RNGD Product Page
The most efficient data center accelerator for high-performance LLM and Multimodal deployment
Llama 2 7B
| Metric | Configuration | L40S | H100 | RNGD |
|---|---|---|---|---|
| Perf/Watt (tokens/sec/W) | Batch Size=16, Input Length=2K, Output Length=2K | 1.52 | Undisclosed | 6.24 |
| Perf/Watt (tokens/sec/W) | Batch Size=32, Input Length=2K, Output Length=2K | Undisclosed | 3.19 | 8.62 |
| Metric | Configuration | L40S | H100 | RNGD |
|---|---|---|---|---|
| 1st Token Latency (ms) | Batch Size=1, Sequence Length=128 | 14 | 7 | 8 |
| Metric | Configuration | L40S | H100 | RNGD |
|---|---|---|---|---|
| Throughput (tokens/s) | Batch Size=16, Input Length=2K, Output Length=2K | 531 | Undisclosed | 935 |
| Throughput (tokens/s) | Batch Size=32, Input Length=2K, Output Length=2K | Undisclosed | 2230 | 1293 |
Disclaimer: Measurements were performed internally by FuriosaAI based on current specifications and/or internal engineering calculations. Nvidia results were retrieved from the Nvidia website, https://developer.nvidia.com/deep-learning-performance-training-inference/ai-inference, on February 14, 2024.
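The Perf/Watt figures above appear roughly consistent with throughput divided by TDP, though that relationship is an inference from the published numbers, not a vendor-stated formula. A minimal sanity check:

```python
# Sanity check: Perf/Watt ~= throughput (tokens/s) / TDP (W).
# This is an inferred relationship based on the tables above,
# not a documented measurement methodology.
specs = {
    # name: (throughput tokens/s, TDP W, published tokens/s/W)
    "L40S (batch 16)": (531, 350, 1.52),
    "H100 (batch 32)": (2230, 700, 3.19),
    "RNGD (batch 16)": (935, 150, 6.24),
}

for name, (tput, tdp, published) in specs.items():
    derived = tput / tdp
    print(f"{name}: derived {derived:.2f} tokens/s/W vs published {published}")
```

The derived values agree with the published efficiency numbers to within rounding, which suggests the chips' TDP figures were used as the power denominator.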
| Specification | L40S | H100 | RNGD |
|---|---|---|---|
| Process Technology | TSMC 5nm | TSMC 4nm | TSMC 5nm |
| BF16/FP8 (TFLOPS) | 362/733 | 989/1979 | 256/512 |
| INT8/INT4 (TOPS) | 733/733 | 1979/- | 512/1024 |
| Memory Capacity (GB) | 48 | 80 | 48 |
| Memory Bandwidth (TB/s) | 0.86 | 3.35 | 1.5 |
| Host Interface | PCIe Gen4 x16 | PCIe Gen5 x16 | PCIe Gen5 x16 |
| TDP (W) | 350 | 700 | 150 |
Purpose-built for tensor contraction
How Furiosa TCA unlocks powerful performance and energy efficiency
AI models structure data as tensors of varying dimensions, and the architecture adapts to each tensor contraction through compiler-defined tactics.
Intermediate tensors are kept in on-chip memory (SRAM), an approach akin to model-wide operator fusion.
This lets the chip fully exploit parallelism and maximize data reuse, driving high utilization in inference deployment.
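The fusion idea above can be illustrated in plain NumPy. This is a conceptual sketch only, not Furiosa's compiler or API: fusing two contractions into one plan avoids materializing the intermediate tensor, which on an accelerator corresponds to keeping it in on-chip SRAM rather than round-tripping through DRAM.

```python
import numpy as np

# Toy two-layer projection: contract x with W1, then with W2.
batch, d_in, d_hidden, d_out = 4, 8, 16, 8
rng = np.random.default_rng(0)
x = rng.standard_normal((batch, d_in))
w1 = rng.standard_normal((d_in, d_hidden))
w2 = rng.standard_normal((d_hidden, d_out))

# Unfused: the (batch x d_hidden) intermediate is fully materialized
# between the two contractions - on real hardware, a trip to off-chip
# memory.
intermediate = np.einsum("bi,ih->bh", x, w1)
unfused = np.einsum("bh,ho->bo", intermediate, w2)

# Fused: both contractions expressed as one plan. Conceptually, the
# intermediate values stay in fast local storage - the effect that
# compiler-scheduled operator fusion achieves on-chip.
fused = np.einsum("bi,ih,ho->bo", x, w1, w2, optimize=True)

assert np.allclose(unfused, fused)
```

The results are numerically identical; the difference is where the intermediate lives, which is exactly what determines memory traffic and, in turn, utilization and energy efficiency.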
Meet the RNGD Series
RNGD-S
Leadership performance for creatives, media and entertainment, and video AI
RNGD
Versatile cloud and on-prem LLM and Multimodal deployment
RNGD-Max
Powerful cloud and on-prem LLM and Multimodal deployment