Resource Info
Paper: http://arxiv.org/abs/2505.13417
Code & Data: https://github.com/THU-KEG/AdaptThink
Public: arXiv
Date: 2025.06.20
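The authors argue that simple problems do not need thinking mode: answering directly can also reach high accuracy. For efficiency, the model should use thinking selectively rather than turning it on for every input. They train the model with reinforcement learning so that, at inference time, it decides automatically whether to use thinking.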
AdaptThink, a novel RL algorithm to teach reasoning models to choose the optimal thinking mode adaptively based on problem difficulty.
Can the reasoning model learn to select Thinking or NoThinking mode adaptively based on the difficulty of the input problem, thereby achieving more efficient reasoning without sacrificing or even improving performance?
Constrained Optimization Objective
An ideal selection policy should prefer NoThinking as long as overall performance is not diminished. Let denote the reference model, which is the initial $$
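The idea of "prefer NoThinking as long as overall performance is not diminished" can be illustrated with a small sketch. This is not the paper's training objective or code; it only compares already-sampled responses. The `Sample` record, the function names, and the toy data are all hypothetical, and the tie-breaking rule for the case where neither policy meets the constraint is an assumption made here for completeness.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One sampled response (hypothetical record type for illustration)."""
    no_thinking: bool  # True if the response skipped the thinking phase
    correct: bool      # True if the final answer was judged correct


def no_thinking_rate(samples):
    """Fraction of responses produced in NoThinking mode."""
    return sum(s.no_thinking for s in samples) / len(samples)


def accuracy(samples):
    """Fraction of responses with a correct final answer."""
    return sum(s.correct for s in samples) / len(samples)


def satisfies_constraint(policy_samples, reference_samples):
    """Constraint: the policy's accuracy must not drop below the reference model's."""
    return accuracy(policy_samples) >= accuracy(reference_samples)


def better_policy(a, b, reference):
    """Pick between two candidate policies under the constrained objective."""
    feas_a = satisfies_constraint(a, reference)
    feas_b = satisfies_constraint(b, reference)
    if feas_a != feas_b:
        # A policy that keeps accuracy beats one that loses it.
        return a if feas_a else b
    if feas_a:
        # Both keep accuracy: prefer the one that uses NoThinking more often.
        return a if no_thinking_rate(a) >= no_thinking_rate(b) else b
    # Assumption: if neither keeps accuracy, fall back to the more accurate one.
    return a if accuracy(a) >= accuracy(b) else b


# Toy data: the reference model always thinks and answers 3 of 4 correctly.
reference = [Sample(False, True), Sample(False, True),
             Sample(False, False), Sample(False, True)]
# Skips thinking on half of the problems with no loss in accuracy.
adaptive = [Sample(True, True), Sample(True, True),
            Sample(False, True), Sample(False, False)]
# Never thinks, and accuracy drops below the reference.
always_skip = [Sample(True, True), Sample(True, False),
               Sample(True, False), Sample(True, True)]

chosen = better_policy(adaptive, always_skip, reference)
```

In this toy setup `always_skip` has the highest NoThinking rate but violates the accuracy constraint, so `adaptive` is chosen. This is the trade-off the constrained objective is meant to capture.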
Author: Geaming
Copyright notice: Unless otherwise stated, all articles on this blog are licensed under BY-NC-SA. Please credit the source when reposting!