Algorithm - Research Engineer (LLM Serving and Acceleration)

Seoul, South Korea (On-site)

About the job

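FuriosaAI develops AI solutions that are vertically integrated from hardware to algorithms.
The FuriosaAI Algorithm team conducts research aimed at delivering LLM services that optimize latency while maximizing energy efficiency on the Furiosa NPU (Neural Processing Unit).
The Algorithm team has more than 10 members and works closely with the SW-Platform team so that its research results reach the world as real products.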

Responsibilities

Analyze and research LLM serving systems, and implement and evaluate serving strategies and inference acceleration algorithms on NPUs and GPUs
- Analyze the features and code of existing LLM / multi-modal inference tools (vLLM, TensorRT-LLM, DeepSpeed-MII, etc.)
- Research serving strategies and algorithms (Selective Batching, Sarathi-Serve, Dynamic SplitFuse, etc.) and inference acceleration methods (Speculative Decoding, KV-cache pruning, etc.)
- Implement NPU-targeted serving strategies, algorithms, and inference acceleration methods on GPU systems first, in order to validate and compare them

Minimum Qualifications

At least 3 years of development experience using Python, C++, CUDA programming, or similar
Experience with major DL frameworks such as PyTorch and TensorFlow
Strong CS knowledge (especially networking and multiprocessing/threading)
Ability to communicate clearly about work requirements and issues
Experience developing or analyzing large-scale open-source code

Preferred Qualifications

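Experience using LLM inference tools such as vLLM, TensorRT-LLM, or DeepSpeed-MII
Development and research experience with efficient LLM inference methods
Deep understanding of inference for Transformer-based models
Strong intellectual curiosity about a wide range of deep learning algorithms and applications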

Contact

minsup.lee@furiosa.ai
