ICLR
2026

Inference-Aligned SFT for Diffusion LLMs via Group-based Trajectory Sampling

Seunghyuk Oh, Minjae Lee, Kevin Galim, Minseo Kim, Hyung Il Koo, Wonjun Kang, Hanbaek Lyu, Kangwook Lee
diffusion-llm
discrete-diffusion
sft

Abstract

Diffusion large language models (dLLMs) are trained to denoise randomly masked sequences, yet in practice they are commonly decoded by progressively unmasking tokens in order of model confidence. Consequently, the masking patterns seen during supervised fine-tuning (SFT) often diverge from those encountered at inference time, yielding suboptimal training signals. We propose Group-based Trajectory Sampling, which constructs inference-aligned training trajectories directly from ground-truth targets. We use an initial model to iteratively categorize ground-truth tokens into ordered groups according to how much context the model needs to predict each one confidently. Training on trajectories sampled in this group order exposes the model to masking patterns closer to those it actually produces during inference. Across Sudoku, Countdown, and Trip Planning, our approach consistently outperforms standard SFT in accuracy across diverse settings. These findings demonstrate that aligning training trajectories with inference-time unmasking enables more reliable SFT of dLLMs.
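The grouping and sampling steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the scorer interface `Scorer`, the confidence `threshold`, the fallback of revealing the single most confident position when no token clears the threshold, and the rule of sampling a cut point `k` between groups are all hypothetical choices. A toy scorer with hand-written dependencies (`toy_score`, `deps`) stands in for the initial model.

```python
import random
from typing import Callable, Dict, List, Sequence

# Hypothetical scorer interface (an assumption, not the paper's API): given which
# positions are currently revealed, return the model's confidence in the
# ground-truth token at every position.
Scorer = Callable[[List[bool]], List[float]]


def build_groups(n: int, score: Scorer, threshold: float = 0.9) -> List[List[int]]:
    """Order positions 0..n-1 into groups; earlier groups need less context.

    Assumed rule: in each round, every still-masked position whose confidence
    reaches `threshold` joins the next group and is revealed with its
    ground-truth token before the next round.
    """
    revealed = [False] * n
    groups: List[List[int]] = []
    while not all(revealed):
        conf = score(revealed)
        masked = [i for i in range(n) if not revealed[i]]
        group = [i for i in masked if conf[i] >= threshold]
        if not group:
            # Assumed fallback so the loop always terminates: take the single
            # most confident masked position.
            group = [max(masked, key=lambda i: conf[i])]
        for i in group:
            revealed[i] = True
        groups.append(group)
    return groups


def sample_training_mask(groups: Sequence[Sequence[int]], n: int,
                         rng: random.Random) -> List[bool]:
    """Mask for one SFT example (True = masked): groups before a random cut
    point k are revealed; group k and all later groups stay masked."""
    k = rng.randrange(len(groups))
    masked = [False] * n
    for group in groups[k:]:
        for i in group:
            masked[i] = True
    return masked


# Toy stand-in for the initial model: a position becomes confidently
# predictable once all of its hand-written dependencies are revealed.
deps: Dict[int, List[int]] = {0: [], 1: [0], 2: [0], 3: [1, 2]}


def toy_score(revealed: List[bool]) -> List[float]:
    return [1.0 if all(revealed[d] for d in deps[i]) else 0.1 for i in range(len(deps))]


if __name__ == "__main__":
    groups = build_groups(len(deps), toy_score)
    print(groups)  # [[0], [1, 2], [3]]
    print(sample_training_mask(groups, len(deps), random.Random(0)))
```

In this sketch the SFT loss would be computed only on the masked positions, so each training example approximates an intermediate state of a confidence-ordered decoder rather than a uniformly random masking.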


Related Publications

Transformers in the Dark: Navigating Unknown Search Spaces via Bandit Feedback

TMLR
2026
search
transformer

TABED: Test-Time Adaptive Ensemble Drafting for Robust Speculative Decoding in LVLMs

EACL
2026
speculative-decoding
vision-language

Draft-based Approximate Inference for LLMs

ICLR
2026
speculative-decoding
kv-cache