FuriosaAI


Software - AI Software Engineer (Generative AI)

Seoul, South Korea (On-site)


About the job

  • FuriosaAI is seeking a Software Engineer to join our Platform Software Team.

  • This team conducts focused research and engineering to develop a cutting-edge, end-to-end LLM serving solution and to deliver a streamlined software development kit (SDK) for the FuriosaAI Tensor Contraction Processor (TCP) architecture.

  • We are looking for an AI Software Engineer who will contribute to full-stack, real-world AI product development, from analyzing and researching to implementing inference and serving methods for Generative AI models.

Responsibilities

1. Design & Optimization of Generative AI Model Inference

  • Parallelism strategies: data/pipeline/tensor/sequence/context/expert parallelism, and new parallelism methods

  • Serving strategies: Selective Batching, Sarathi-Serve, Dynamic SplitFuse, dynamic MoE expert load, etc.

  • Inference acceleration techniques: Speculative Decoding, KV-cache dropping, Sparse Attention, Hybrid Linear Attention (e.g., MiniMax-01), etc.

  • LLM reasoning inference techniques: search-based methods (MCTS, MCTSr and variants, Best-of-N, etc.) in combination with Chain-of-Thought, Tree-of-Thought, and Forest-of-Thought.

  • Research on Generative AI models beyond LLMs (e.g., Diffusion Models).

2. Generative AI Model & System Co-Design

  • Co-design Generative AI models and systems while considering Furiosa's Tensor Contraction Processor (TCP) architecture and software stack (Compiler, Runtime, Serving Stack).

  • Conduct performance modeling of various Generative AI models and systems on GPUs/NPUs to optimize inference techniques tailored for RNGD.

  • Implement optimized Generative AI model inference methods in the FuriosaAI SDK.

3. Analysis & Research of Existing Inference Frameworks

  • Analyze the features and source code of existing Generative AI model inference frameworks such as vLLM, TensorRT-LLM, and DeepSpeed-MII.

  • Research and analyze state-of-the-art Generative AI model inference and system architectures, focusing on optimizing them for Furiosa's TCP architecture.

  • Use profiling tools like Nsight to analyze GPU execution and study CUDA/Triton kernel performance.

Minimum Qualifications

  • Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent industry experience.

  • Programming languages: Python, C++, Rust, or CUDA.

  • Hands-on experience with deep learning frameworks such as PyTorch or TensorFlow.

  • Strong understanding of computer science concepts, particularly networking, multi-processing, multi-threading, and/or distributed systems.

  • Effective communication skills to discuss project requirements and technical issues.

Preferred Qualifications

  • Experience using LLM inference frameworks such as vLLM, TensorRT-LLM, or DeepSpeed-MII.

  • Experience in developing or analyzing large-scale open-source models and projects.

  • Hands-on experience in developing and researching efficient LLM inference methods.

  • Deep understanding of Transformer-based model inference.

  • Strong intellectual curiosity about various deep learning algorithms and applications.

  • Strong proficiency in the Rust programming language.
