Furiosa SDK 2025.1.0 Release Blog
Technical Updates

We’re excited to introduce Furiosa SDK 2025.1.0 (release notes), packed with major enhancements for seamless LLM deployment with our user-friendly Furiosa LLM APIs. This release includes significant latency optimizations, tool calling support, flexible device remapping in containers, and a streamlined process for converting Hugging Face models.
This marks our third major SDK release for RNGD in just five months, reflecting our rapid innovation to deliver the most efficient LLM inference for data centers.
LLM Latency Optimization
Furiosa SDK 2025.1.0 delivers significant LLM latency optimizations, improving time to first token (TTFT) by up to 11.66% and time per output token (TPOT) by up to 11.45% for large inputs (30k tokens) and outputs (1k tokens).
This means faster response times and better efficiency for high-throughput AI workloads.
Get started by updating to the latest SDK and benchmarking supported models.
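When benchmarking, TTFT is the delay until the first generated token arrives and TPOT is the average gap between subsequent tokens. A minimal, framework-agnostic sketch for computing both from recorded token arrival timestamps (the helper name and timing approach are our own illustration, not part of the SDK):

```python
def compute_ttft_tpot(request_start, token_arrival_times):
    """Compute time to first token (TTFT) and time per output token (TPOT)
    from a request-start timestamp and per-token arrival timestamps,
    all in seconds."""
    if not token_arrival_times:
        raise ValueError("no tokens were generated")
    ttft = token_arrival_times[0] - request_start
    if len(token_arrival_times) == 1:
        return ttft, 0.0
    # TPOT: average inter-token latency after the first token.
    tpot = (token_arrival_times[-1] - token_arrival_times[0]) / (
        len(token_arrival_times) - 1
    )
    return ttft, tpot


# Synthetic example: first token after 0.5 s, then one token every 0.02 s.
ttft, tpot = compute_ttft_tpot(0.0, [0.5, 0.52, 0.54, 0.56])
```

With a streaming client, `request_start` is taken just before sending the request and each arrival timestamp is recorded as a token chunk is received.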
OpenAI API’s Tool Calling Support
Furiosa LLM now supports tool calling, enabling models to interact with external tools and functions.
This allows developers to seamlessly integrate AI-driven automation into their applications with minimal changes, which is critical for building Agentic AI applications.
Get started by using the `llama3_json` parser, with more options coming in future releases. For more information, refer to the `tool calling` documentation.
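Because Furiosa LLM exposes an OpenAI-compatible server, tool calling follows the standard OpenAI function-calling request shape. A minimal sketch of building such a request (the weather tool, model name, and all field values are illustrative placeholders, not part of the Furiosa SDK):

```python
import json

# A tool definition in the OpenAI function-calling schema.
# The tool itself is a hypothetical example.
tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}]


def build_tool_call_request(model, user_message):
    """Build a chat-completions request body that offers the model our
    tool; works against any OpenAI-compatible endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "tools": tools,
        "tool_choice": "auto",
    }


body = build_tool_call_request(
    "meta-llama/Llama-3.1-8B-Instruct",  # placeholder model name
    "What is the weather in Seoul?",
)
print(json.dumps(body, indent=2))
```

In practice you would send this body with the `openai` Python client (pointing `base_url` at your local Furiosa LLM server) and then inspect `tool_calls` in the response to dispatch the requested function.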
`furiosa-llm build` for easily converting Hugging Face models
The new `furiosa-llm build` command simplifies converting Hugging Face models into optimized model artifacts for RNGD.
This streamlines deployment, reducing manual setup while ensuring peak performance.
Get started by referring to the building a model artifact documentation.
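The exact flags depend on your model and target configuration; the invocation below is only a sketch of the general shape (the model ID and output directory are placeholders — check `furiosa-llm build --help` and the documentation for the real options):

```shell
# Convert a Hugging Face model into an optimized RNGD model artifact
# (placeholder model ID and output path).
furiosa-llm build meta-llama/Llama-3.1-8B-Instruct ./llama-3.1-8b-artifact
```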
Automatic optimization of Blocked KV cache allocation
Furiosa SDK 2025.1.0 automatically maximizes blocked KV cache allocation by reducing memory fragmentation, improving memory efficiency for LLM inference.
There is no manual tuning required to get the best performance: with the latest Furiosa LLM engine, maximum memory utilization comes automatically.
Get started with Furiosa LLM with this quick start documentation.
And there’s more coming! In the next few months, we’re rolling out enhanced tensor parallelism, speculative decoding with a draft model, embeddings API support, a torch.compile() backend, and more.
With RNGD now in key enterprise customers’ hands, we’re prioritizing rapid SDK updates, so join us on our journey.
🔗 Sign up to be notified first about RNGD: https://furiosa.ai/signup.