
Furiosa Software Stack

Built for superior utilization and performance in AI inference

Furiosa’s Software Stack has been co-designed with the RNGD hardware from day one. Together, they deliver industry-leading performance and efficiency for deploying AI models today and in the future.



Bring your PyTorch model and weights for deep learning inference. Our comprehensive software stack, including a compiler, runtime, compressor, and profiler, optimizes and distributes the model across multiple RNGD chips. The serving engine then makes the model instantly available to your customers. Deploy efficiently and at scale with our Kubernetes integrations and cloud-native support.
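The distribution step described above can be sketched in miniature. This is an illustrative, self-contained example only; the names (`CompiledShard`, `partition_layers`) are hypothetical placeholders, not the Furiosa SDK API, and a real compiler would partition by compute cost rather than round-robin.

```python
# Hypothetical sketch: evenly distribute a model's layers across RNGD chips.
# Names here are illustrative, not the actual Furiosa SDK interface.
from dataclasses import dataclass, field

@dataclass
class CompiledShard:
    chip_id: int
    layers: list = field(default_factory=list)  # layer names assigned to this chip

def partition_layers(layers, num_chips):
    """Assign layers to chips round-robin so each chip gets an even share."""
    shards = [CompiledShard(chip_id=i) for i in range(num_chips)]
    for idx, layer in enumerate(layers):
        shards[idx % num_chips].layers.append(layer)
    return shards

# A 32-layer decoder split across 4 chips: 8 layers per chip.
layers = [f"decoder.block.{i}" for i in range(32)]
shards = partition_layers(layers, num_chips=4)
```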


Maximizing Data Center Utilization

Designed for data centers, the stack leverages containerization, virtual machine technologies, and Kubernetes. Each 8-core RNGD card can be partitioned into multiple NPUs with full isolation, powered by SR-IOV. This flexibility yields higher utilization across diverse workloads.
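The partitioning idea can be illustrated with a small sketch. This is a conceptual model only, assuming the 8 cores split evenly into fixed-size NPUs; it is not the SR-IOV driver or management interface, and `partition_card` is a hypothetical name.

```python
# Conceptual sketch of splitting an 8-core RNGD card into isolated NPUs.
# Not the real SR-IOV interface; purely illustrative.
def partition_card(total_cores=8, cores_per_npu=2):
    """Split a card into NPUs of equal size with disjoint core sets."""
    if total_cores % cores_per_npu != 0:
        raise ValueError("cores_per_npu must evenly divide total_cores")
    return [
        {"npu_id": i,
         "cores": list(range(i * cores_per_npu, (i + 1) * cores_per_npu))}
        for i in range(total_cores // cores_per_npu)
    ]

# Four isolated 2-core NPUs from one 8-core card.
npus = partition_card(total_cores=8, cores_per_npu=2)
```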

Built for LLM Inference and more

A comprehensive software toolkit, including a compiler, compression and quantization tools, and a model parallelizer, optimizes large language models on RNGD. Deploy state-of-the-art LLMs seamlessly with our user-friendly APIs.
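As a flavor of what a compression toolkit does, here is a minimal sketch of symmetric int8 weight quantization, one common technique; the function names are illustrative and this is not the Furiosa compressor's actual algorithm or API.

```python
# Minimal sketch of symmetric int8 quantization (illustrative only).
def quantize_int8(weights):
    """Map float weights to int8 values with a single symmetric scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return [v * scale for v in q]

w = [0.5, -1.27, 0.03, 1.27]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
```

The int8 representation cuts weight memory to a quarter of float32, at the cost of a small, bounded rounding error per weight.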

Robust Ecosystem Support

Seamlessly integrate our accelerator for inference into new or existing workflows. The stack supports popular AI frameworks such as PyTorch 2.0, and Optimum Furiosa offers pre-optimized, state-of-the-art AI models like BERT, Llama 2 and 3, and Mixtral for easy deployment and evaluation.