Why should you care about RNGD?
Some new tech products create a big splash because they have an amazing marketing deck. Or gorgeous renders of products that don't exist yet. Or billions in VC funding from the most media-savvy, unicorn-collecting Sand Hill Road visionaries.
That isn’t FuriosaAI.
The company name is a little weird, even by AI tech startup standards. Our new flagship product’s name (RNGD) seems hard to pronounce and sounds a little bit like a Scandinavian metal band. Our headquarters is 5,649 miles from Silicon Valley.
Even so, you should still care about RNGD. Why?
Because it solves real problems for large-scale AI inference in data centers. Problems that are becoming major headaches for CTOs and ML engineers around the world.
It’s here now. An actual physical product running models like Llama 3.1 70B.
The problem with AI inference today
Running large language models (LLMs) and multimodal models in production today is difficult, expensive, or both. Even setting aside the substantial upfront costs of obtaining a fleet of high-performance GPUs, businesses must contend with eye-popping electricity bills, complex and costly liquid cooling systems, and server room infrastructure that was never designed to accommodate cards that consume 1,000 watts or more each.
Enter RNGD: The chip you actually want in your data center
RNGD is our answer to these problems. (It’s pronounced “Renegade,” by the way. Apologies if you were thinking “RUHN-ga-dah” in your head.)
Here's what RNGD delivers:
Efficiency: 150W TDP. The GPU in your gaming PC probably uses more power.
Performance: Runs models like Llama 3.1 70B without breaking a sweat.
Programmability: Our compiler treats entire models as single fused operations. Because no one is eager to spend more time hand-tuning kernels.
Solving three challenges with one chip
Creating a chip that's efficient, powerful, *and* programmable is hard. That's why most chips excel in one or two areas but fall flat in the third.
We built RNGD from the ground up to nail this balance, starting with a new chip architecture called the Tensor Contraction Processor (TCP). (Spoiler: It turns out that matrix multiplication isn’t all you need.) We also incorporated cutting-edge technology like HBM3 and a 5nm node.
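For readers who want a concrete picture of what a tensor contraction is, here's a minimal NumPy sketch. This is our illustration of the general math, not FuriosaAI's API or compiler: matrix multiplication is just the simplest contraction, and operations like multi-head attention scores are contractions too, expressible in one shot rather than as a loop of individual matmuls.

```python
import numpy as np

# Matrix multiplication is a contraction over one shared index:
A = np.random.rand(128, 64)
B = np.random.rand(64, 256)
C = np.einsum("ik,kj->ij", A, B)          # equivalent to A @ B
assert np.allclose(C, A @ B)

# A batched attention-score computation is also a single contraction,
# over the head and feature dimensions at once:
Q = np.random.rand(8, 512, 64)            # (heads, seq, dim)
K = np.random.rand(8, 512, 64)
scores = np.einsum("hqd,hkd->hqk", Q, K)  # (heads, seq, seq)
```

Expressing whole operations as contractions gives a compiler structure it can fuse and schedule across an entire model, rather than forcing everything through a fixed matrix-multiply primitive.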
Real-world benefits (a.k.a. the stuff that actually matters)
Lower Total Cost of Ownership: Spend less on energy bills, server infrastructure, and hardware (see the quick back-of-the-envelope math after this list). Your customers, investors, and CFO will thank you.
Simplified Deployments: Deploy on premises or in the cloud, as easily as you would with a run-of-the-mill CPU server.
Faster Innovation: Engineers spend less time optimizing and more time building products and services.
Sustainability: It’s possible to do more with AI without supersizing your carbon footprint.
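To put rough numbers on the energy line item alone, here's a back-of-the-envelope sketch using the card power figures mentioned above. The $0.10/kWh electricity rate is our assumption for illustration, and cooling overhead is ignored:

```python
# Back-of-the-envelope annual energy cost per card, running 24/7.
# Assumption: $0.10/kWh electricity; cooling overhead not included.
HOURS_PER_YEAR = 24 * 365
PRICE_PER_KWH = 0.10  # USD, assumed for illustration

def annual_energy_cost(watts: float) -> float:
    kwh = watts / 1000 * HOURS_PER_YEAR
    return kwh * PRICE_PER_KWH

print(annual_energy_cost(150))   # RNGD at 150W TDP    -> ~$131/year
print(annual_energy_cost(1000))  # a 1,000W GPU card   -> ~$876/year
```

Multiply that gap across a rack, then a data center, and the difference stops being a rounding error.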
What's Next?
If you’ve read this far, you’re probably ready to see some technical details. Visit our RNGD product page here and sign up here for benchmark performance stats and other updates. We will have much more to share soon.
And if you’re attending Hot Chips in person this week, please come by our booth and say hi. We’ll be the ones in T-shirts that look like they could be inspired by a Scandinavian metal band.
Sign up here to be notified first about RNGD availability and product updates.