LG AI Research taps FuriosaAI to achieve 2.25x better LLM inference performance vs. GPUs

Summary
- After months of rigorous performance, energy efficiency, and software stack evaluations, LG AI Research has adopted FuriosaAI’s RNGD AI accelerator for inference computing with its EXAONE models.
- RNGD achieves 2.25x better LLM inference performance per watt vs. GPUs, while also meeting LG AI Research’s demanding latency and throughput requirements.
- This achievement validates RNGD’s enterprise readiness and represents one of the first enterprise adoptions of an on-premise AI solution as a superior alternative to GPUs for inference with LLMs.
- Furiosa and LG AI Research will partner to supply RNGD servers to enterprises using EXAONE across key sectors like electronics, finance, telecommunications, and biotechnology.
- Because it can be deployed in a wide range of settings, RNGD addresses the growing need for sovereign AI and enables enterprises like LG AI Research to own and control their AI stack and deploy advanced LLMs.
The biggest barrier to scaling AI is the unsustainable power consumption and costs associated with traditional GPU hardware. Today, we're pleased to announce a major step toward solving this challenge.
FuriosaAI’s RNGD (pronounced “Renegade”) accelerator has successfully passed LG AI Research's rigorous performance tests with its EXAONE models.
Following this successful evaluation, Furiosa and LG AI Research will offer RNGD Server to enterprise customers deploying LLMs. These customers include the diverse spectrum of LG businesses spanning electronics, finance, telecommunications, and biotechnology.
LG AI Research concluded RNGD delivers high performance, meets low-latency service requirements, and achieves significant improvements in energy efficiency compared to previous GPU solutions.
“After extensively testing a wide range of options, we found RNGD to be a highly effective solution for deploying EXAONE models. RNGD provides a compelling combination of benefits: excellent real-world performance, a dramatic reduction in our total cost of ownership, and a surprisingly straightforward integration," said Kijeong Jeon, Lead, Product Unit, LG AI Research. "For a project of this scale and ambition, the entire process was quite impressive.”
This milestone demonstrates RNGD satisfies enterprises’ real-world needs for inference with advanced LLMs, while also delivering significantly improved energy efficiency.

RNGD Server comprises eight RNGD accelerators in a single, air-cooled 4U chassis.
Testing power consumption as well as performance
LG AI Research first announced plans two years ago to evaluate RNGD, assess the accelerator’s efficiency, and, if successful, integrate it into various EXAONE-based services across LG. We unveiled RNGD at Hot Chips last summer and began sampling with customers last fall. The accelerator leverages our unique Tensor Contraction Processor (TCP) chip architecture to deliver up to 512 TFLOPS of FP8 performance within a Thermal Design Power (TDP) of just 180W.
RNGD Server aggregates the power of eight RNGD accelerators into a single, air-cooled 4U chassis, enabling high compute density. Up to five RNGD Server Systems can be deployed within a single, standard 15kW air-cooled rack.
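A quick back-of-the-envelope check of the rack-level power budget, using only the figures above (this counts accelerator TDP alone; host CPUs, memory, and fans draw additional power not included here):

```python
# Rack power math from the article's figures: 180W TDP per RNGD card,
# 8 cards per RNGD Server, up to 5 servers per 15kW air-cooled rack.
TDP_PER_CARD_W = 180
CARDS_PER_SERVER = 8
SERVERS_PER_RACK = 5
RACK_BUDGET_W = 15_000

accelerator_power_w = TDP_PER_CARD_W * CARDS_PER_SERVER * SERVERS_PER_RACK
print(accelerator_power_w)                    # 7200 W of accelerator TDP per rack
print(accelerator_power_w <= RACK_BUDGET_W)   # True: headroom remains for hosts and cooling
```

Forty accelerators thus fit comfortably within a standard 15kW rack's power envelope.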
LG AI Research has adopted RNGD to ensure power efficiency, cost-effectiveness, and scalability when delivering its LLM services. The company evaluated RNGD for its ability to meet demanding, real-world benchmarks using 7.8-billion-parameter and 32-billion-parameter versions of EXAONE 3.5, both available with 4K and 32K context windows.
Performance and Efficiency Results
LG AI Research’s direct, real-world comparison demonstrates a fundamental leap in the economics of high-performance AI inference.
While meeting the rigorous performance requirements of LG AI Research, RNGD achieved 2.25x better performance per watt for LLMs compared to a GPU-based solution.
Thanks to its greater compute density, an RNGD-powered rack can generate 3.75x more tokens for EXAONE models than a GPU rack operating within the same power constraints.
Using a single server with four RNGD cards and a batch size of one, LG AI Research ran the EXAONE 3.5 32B model and achieved 60 tokens/second with a 4K context window and 50 tokens/second with a 32K context window.
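For intuition, those decode rates convert directly into average inter-token latency (a rough conversion; real end-to-end latency also includes prefill time for the prompt):

```python
def ms_per_token(tokens_per_second: float) -> float:
    """Convert decode throughput to average inter-token latency in milliseconds."""
    return 1000.0 / tokens_per_second

# Figures reported above for EXAONE 3.5 32B on four RNGD cards, batch size 1.
print(round(ms_per_token(60), 1))  # ~16.7 ms/token with the 4K context window
print(round(ms_per_token(50), 1))  # 20.0 ms/token with the 32K context window
```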

Deployment and integration
After installing RNGD hardware at its Koreit Tower data center, LG AI Research collaborated with our team to launch an enterprise-ready solution. We successfully optimized and scaled EXAONE 3.0, 3.5, and 4.0 models, progressing from a single card to two-card, four-card, and then eight-card server configurations. To achieve this, we applied tensor parallelism not only across multiple processing elements but also across multiple RNGD cards.
We leveraged the unique strengths of Furiosa’s innovative TCP chip architecture and its globally optimizing compiler, which together enable RNGD to maximize SRAM data reuse between transformer blocks. To get the most out of tensor parallelism, we optimized PCIe paths for peer-to-peer (P2P) communication, tuned communication scheduling, and applied compiler tactics that overlap inter-chip DMA operations with computation.
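As a conceptual illustration of the tensor-parallel approach (this is a generic sketch, not Furiosa's actual API): a layer's weight matrix is sharded column-wise across devices, each device computes its slice of the output independently, and the slices are gathered afterward.

```python
# Minimal pure-Python sketch of column-wise tensor parallelism across two
# hypothetical "cards". Real implementations shard large matrices and run
# the partial matmuls concurrently on separate devices.

def matvec(x, w):
    """x: length-k activation vector; w: k x n weight matrix as a list of rows."""
    n = len(w[0])
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(n)]

x = [1.0, 2.0]                      # activation vector
w = [[1.0, 2.0, 3.0, 4.0],          # full 2x4 weight matrix
     [5.0, 6.0, 7.0, 8.0]]

# Shard columns: card 0 holds columns 0-1, card 1 holds columns 2-3.
w0 = [row[:2] for row in w]
w1 = [row[2:] for row in w]

# Each card computes its output slice independently (in parallel on hardware)...
y0 = matvec(x, w0)
y1 = matvec(x, w1)

# ...then the slices are concatenated to reconstruct the full output.
y_parallel = y0 + y1
assert y_parallel == matvec(x, w)   # identical to the single-device result
print(y_parallel)                   # [11.0, 14.0, 17.0, 20.0]
```

The gather step is where inter-chip communication cost arises, which is why P2P paths and overlapping DMA with compute matter.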
This successful integration highlights the maturity and ease-of-use of our software stack, including the vLLM-compatible Furiosa-LLM serving framework. The migration demonstrates the platform's programmability and simplified optimization process. It also showcases key advantages required in real-world service environments, such as support for an OpenAI-compatible API server, monitoring with Prometheus metrics, Kubernetes integration for large-scale deployment in cloud-native environments, and easy deployment through a publicly available SDK.
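Because Furiosa-LLM exposes an OpenAI-compatible API server, existing OpenAI-style clients can target it with little more than a base-URL change. A minimal sketch of such a request; the endpoint address and model identifier below are assumptions for illustration:

```python
import json

# Assumed local endpoint for an OpenAI-compatible server; the actual
# host, port, and model name depend on the deployment.
BASE_URL = "http://localhost:8000/v1"

payload = {
    "model": "EXAONE-3.5-32B-Instruct",   # hypothetical model identifier
    "messages": [
        {"role": "user", "content": "Summarize this quarterly report."}
    ],
    "max_tokens": 256,
}

# POST this JSON to f"{BASE_URL}/chat/completions" with any HTTP client,
# e.g. requests.post(f"{BASE_URL}/chat/completions", json=payload).
print(json.dumps(payload, indent=2))
```

The same payload shape works with any OpenAI-compatible client library by pointing its base URL at the serving endpoint.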

RNGD achieved 2.25x better performance per watt for LLMs compared to LG AI Research's GPU-based solution.
Next Steps for Furiosa and LG AI Research
Furiosa and LG AI Research are committed to enabling businesses to deploy advanced models and agentic AI sustainably, scalably, and economically. After adding support for EXAONE 4.0, we are now working with LG AI Research to develop new software features, expand to additional customers and markets, and provide a powerful and sustainable AI infrastructure stack for advanced AI applications.
Moving forward, LG AI Research plans to expand ChatEXAONE's availability to external clients, utilizing RNGD to facilitate this expansion. ChatEXAONE is LG AI Research's EXAONE-powered enterprise AI agent that provides robust capabilities such as document analysis, deep research, data analysis, and Retrieval-Augmented Generation (RAG).
We will work closely with LG AI Research to continue optimizing RNGD software and hardware for EXAONE models around specific business use cases.
To learn more and to test RNGD Server with your use case, reach out to us here.

Eight RNGDs in a server right before their delivery to LG.