TMLR
2026

Transformers in the Dark: Navigating Unknown Search Spaces via Bandit Feedback

Jungtaek Kim, Thomas Zeng, Ziqian Lin, Minjae Lee, Chungpa Lee, Jy-yong Sohn, Hyung Il Koo, Kangwook Lee
search
transformer

Abstract

We investigate whether Transformer architectures can approximate search algorithms without external components. We introduce a framework called unknown tree search with bandit feedback, where tree extensions and feedback signals are externally provided for controlled evaluation. We show that Transformers possess sufficient expressiveness to implement distinct search strategies, can be trained to approximate those strategies from scratch, and demonstrate generalization to longer horizons and deeper trees. Additionally, fine-tuning pretrained language models on search trajectories unlocks enhanced capabilities.
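To make the setting concrete, here is a minimal sketch of what an unknown tree search problem with bandit feedback might look like. It is not the paper's formulation. The class `UnknownTreeEnv`, its methods `expand` and `feedback`, the Gaussian noise model, and the epsilon-greedy baseline `greedy_search` are all hypothetical names and assumptions chosen for illustration. The sketch only mirrors the abstract's description: the searcher never sees the full tree, and both tree extensions and reward signals come from an external component.

```python
import random


class UnknownTreeEnv:
    """Hypothetical environment: the tree is hidden and revealed only on request."""

    def __init__(self, branching=2, depth=3, noise=0.1, seed=0):
        self.branching = branching
        self.depth = depth
        self.noise = noise
        self.rng = random.Random(seed)
        self._means = {}  # hidden mean reward per node, created lazily

    def _mean(self, node):
        if node not in self._means:
            self._means[node] = self.rng.random()
        return self._means[node]

    def expand(self, node):
        """Externally provided tree extension: return the children of `node`."""
        if len(node) >= self.depth:
            return []  # leaves have no children
        return [node + (i,) for i in range(self.branching)]

    def feedback(self, node):
        """Bandit feedback: a noisy reward for the queried node only."""
        return self._mean(node) + self.rng.gauss(0.0, self.noise)


def greedy_search(env, budget=20, epsilon=0.2, seed=0):
    """Epsilon-greedy search over the revealed frontier (assumed baseline)."""
    rng = random.Random(seed)
    root = ()  # nodes are tuples of child indices; the root is the empty tuple
    frontier = {root: env.feedback(root)}  # node -> observed noisy reward
    visited = [root]
    for _ in range(budget):
        if not frontier:
            break
        if rng.random() < epsilon:
            node = rng.choice(sorted(frontier))  # explore a random frontier node
        else:
            node = max(frontier, key=frontier.get)  # exploit the best reward seen
        del frontier[node]
        for child in env.expand(node):
            frontier[child] = env.feedback(child)
            visited.append(child)
    return visited
```

Calling `greedy_search(UnknownTreeEnv(), budget=3)` expands three nodes and returns the nodes revealed along the way. A learned policy, such as a Transformer trained on these trajectories, would replace the hand-written selection rule inside the loop.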

Related Publications

TABED: Test-Time Adaptive Ensemble Drafting for Robust Speculative Decoding in LVLMs

EACL
2026
speculative-decoding
vision-language

Draft-based Approximate Inference for LLMs

ICLR
2026
speculative-decoding
kv-cache