Transformers in the Dark: Navigating Unknown Search Spaces via Bandit Feedback

Jungtaek Kim, Thomas Zeng, Ziqian Lin, Minjae Lee, Chungpa Lee, Jy-yong Sohn, Hyung Il Koo, Kangwook Lee

transformer

Abstract

We investigate whether Transformer architectures can approximate search algorithms without relying on an external search component. We introduce a simplified framework, called unknown tree search with bandit feedback, in which tree extensions and feedback signals are provided externally so that search capabilities can be evaluated in a controlled way. We show that Transformers are expressive enough to implement distinct search strategies and can be trained from scratch to approximate those strategies. The trained models can also generalize to longer horizons and deeper trees. Additionally, fine-tuning pretrained language models on search trajectories unlocks stronger search capabilities.
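To make the setting concrete, here is a minimal Python sketch of the interaction loop. It is not the authors' implementation. The environment `UnknownTree`, its seeded random rewards, and the greedy `best_first_search` strategy are all hypothetical illustrations. The tree is hidden, and a node's children become known only after that node is expanded (the externally provided tree extension). The searcher observes a scalar reward only for the nodes it queries (the bandit feedback), and it has a fixed query budget. A hand-coded strategy like this one stands in for the kind of search behavior a Transformer might be trained to approximate.

```python
import heapq
import random
from dataclasses import dataclass, field


@dataclass
class UnknownTree:
    """Hypothetical environment: a complete tree revealed only through expansion."""
    depth: int
    branching: int = 2
    seed: int = 0
    # Cache of rewards for nodes the searcher has actually queried.
    _rewards: dict = field(default_factory=dict, init=False, repr=False)

    def root(self):
        return ()  # a node is the tuple of child indices from the root

    def children(self, node):
        # Tree extension: children are supplied by the environment on demand.
        if len(node) >= self.depth:
            return []
        return [node + (i,) for i in range(self.branching)]

    def feedback(self, node):
        # Bandit feedback: a reward is observed only for queried nodes.
        # Seeding with a string keeps rewards reproducible across runs.
        if node not in self._rewards:
            self._rewards[node] = random.Random(str((self.seed, node))).random()
        return self._rewards[node]


def best_first_search(env, budget):
    """Greedy strategy (an assumption): always expand the best-rewarded frontier node."""
    root = env.root()
    best_node, best_reward = root, env.feedback(root)
    frontier = [(-best_reward, root)]  # max-heap via negated rewards
    queries = 1
    while frontier and queries < budget:
        _, node = heapq.heappop(frontier)
        for child in env.children(node):
            if queries >= budget:
                break
            reward = env.feedback(child)
            queries += 1
            if reward > best_reward:
                best_node, best_reward = child, reward
            heapq.heappush(frontier, (-reward, child))
    return best_node, best_reward, queries


if __name__ == "__main__":
    env = UnknownTree(depth=6, branching=2, seed=0)
    node, reward, used = best_first_search(env, budget=20)
    print(node, round(reward, 3), used)
```

Swapping `best_first_search` for another policy, such as uniform random expansion, changes only how the next node is chosen. The environment interface stays the same, which is what makes different search strategies directly comparable in this kind of setup.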

Related Publications

Transformers in the Dark: Navigating Unknown Search Spaces via Bandit Feedback

TMLR

2026

transformer

TABED: Test-Time Adaptive Ensemble Drafting for Robust Speculative Decoding in LVLMs

EACL

2026

speculative-decoding

vision-language

Draft-based Approximate Inference for LLMs

ICLR

2026

speculative-decoding

kv-cache