Inference-Aligned SFT for Diffusion LLMs via Group-based Trajectory Sampling

Seunghyuk Oh, Minjae Lee, Kevin Galim, Minseo Kim, Hyung Il Koo, Wonjun Kang, Hanbaek Lyu, Kangwook Lee

diffusion-llm

discrete-diffusion

sft

Abstract

Diffusion large language models (dLLMs) are trained to denoise randomly masked sequences, yet in practice they are commonly decoded by progressively unmasking tokens in order of model confidence. Consequently, the masking patterns used in supervised fine-tuning (SFT) often diverge from those encountered at inference time, resulting in suboptimal training signals. We propose Group-based Trajectory Sampling, which constructs inference-aligned training trajectories directly from ground-truth targets. We use an initial model to iteratively categorize ground-truth tokens into ordered groups based on how much context the model needs to confidently predict each one. By training on trajectories sampled in this group order, the model sees masking patterns closer to those it would actually produce during inference. Across Sudoku, Countdown, and Trip Planning, our approach consistently outperforms standard SFT in accuracy across diverse settings. These findings demonstrate that aligning training trajectories with inference-time unmasking enables more reliable SFT of dLLMs.
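The grouping and sampling steps described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual procedure, so everything here is an assumption made for illustration: the `confidence_fn` interface, the fixed `threshold`, the fallback of revealing the single most confident token when none passes the threshold, and the uniform choice of how many tokens to reveal are all hypothetical and are not taken from the paper.

```python
import random
from typing import Callable, List, Sequence, Set

# Hypothetical interface (not from the paper): given the ground-truth tokens
# and the set of already revealed positions, return the model's confidence in
# the ground-truth token at every position. Values at revealed positions are
# ignored.
ConfidenceFn = Callable[[Sequence[int], Set[int]], List[float]]


def assign_groups(
    target: Sequence[int],
    confidence_fn: ConfidenceFn,
    threshold: float = 0.9,  # assumed cutoff, chosen only for illustration
) -> List[List[int]]:
    """Iteratively split positions into ordered groups.

    Group k holds the positions that become confidently predictable once all
    positions in groups 0..k-1 are revealed as context.
    """
    revealed: Set[int] = set()
    groups: List[List[int]] = []
    while len(revealed) < len(target):
        conf = confidence_fn(target, revealed)
        masked = [i for i in range(len(target)) if i not in revealed]
        group = [i for i in masked if conf[i] >= threshold]
        if not group:
            # Assumed fallback so the loop always makes progress:
            # reveal the single most confident masked position.
            group = [max(masked, key=lambda i: conf[i])]
        groups.append(group)
        revealed.update(group)
    return groups


def sample_masked_positions(groups: List[List[int]], rng: random.Random) -> Set[int]:
    """Sample one training mask that respects the group order.

    Positions are ordered group by group (shuffled within a group), a random
    prefix is treated as revealed, and the remaining suffix is returned as the
    set of positions to mask. Assumes at least one position exists.
    """
    order: List[int] = []
    for group in groups:
        shuffled = list(group)
        rng.shuffle(shuffled)
        order.extend(shuffled)
    num_revealed = rng.randrange(len(order))  # always leaves >= 1 masked token
    return set(order[num_revealed:])
```

In this sketch, earlier groups are always revealed before later ones, so a sampled mask never hides a token from an early group while exposing one from a later group, except within the group that straddles the cutoff. That is one plausible reading of "trajectories sampled in this group order"; the paper's sampling scheme may differ.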

Related Publications

Transformers in the Dark: Navigating Unknown Search Spaces via Bandit Feedback

TMLR

2026

transformer

TABED: Test-Time Adaptive Ensemble Drafting for Robust Speculative Decoding in LVLMs

EACL

2026

speculative-decoding

vision-language

Draft-based Approximate Inference for LLMs

ICLR

2026

speculative-decoding

kv-cache