2025-06-20
Paper Reading

Contents

Summary Overview
Main Content
Related Work
Experiments
Metrics
Models
Datasets
Baselines
Results
Case Study
🤖
Others

Resource | Info
Paper | http://arxiv.org/abs/2505.13417
Code & Data | https://github.com/THU-KEG/AdaptThink
Public | arXiv
Date | 2025.06.20

Summary Overview

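The authors argue that simple problems do not need the thinking mode: answering directly can already reach high accuracy. For efficiency, the model should use thinking selectively rather than turning it on for every input. The model is trained with reinforcement learning so that, at inference time, it decides on its own whether to think.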

AdaptThink is a novel RL algorithm that teaches reasoning models to choose the optimal thinking mode adaptively based on problem difficulty. It has two core components:

  1. a constrained optimization objective that encourages the model to choose NoThinking while maintaining the overall performance;
  2. an importance sampling strategy that balances Thinking samples during on-policy training, thereby enabling cold start and allowing the model to explore and exploit both thinking modes throughout the training process.
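The second component can be illustrated with a small toy sketch. It assumes the mode is decided by the first token of the response, and that this first token is drawn from a fixed 50/50 proposal rather than from the policy, so both modes get sampled even when the policy almost never picks NoThinking at the start of training (the cold-start case). The token strings, the 0.5 proposal probability, and all function names below are hypothetical and chosen only for illustration; they are not taken from the notes above.

```python
import random
from collections import Counter

# Hypothetical markers: which first token means which mode is an assumption.
THINK_START = "<think_start>"  # first token of a Thinking response
THINK_END = "</think>"         # emitted first, it stands for NoThinking


def sample_first_token(rng, p_nothinking=0.5):
    """Draw the first token from a fixed, balanced proposal distribution."""
    return THINK_END if rng.random() < p_nothinking else THINK_START


def importance_weight(policy_prob, proposal_prob=0.5):
    """Ratio that corrects for sampling from the proposal, not the policy."""
    return policy_prob / proposal_prob


def sample_modes(n, seed=0):
    """Count how often each mode is drawn over n samples."""
    rng = random.Random(seed)
    return Counter(sample_first_token(rng) for _ in range(n))
```

With a policy that assigns probability 0.01 to NoThinking, `importance_weight(0.01)` down-weights those samples, yet roughly half of the sampled responses still explore NoThinking.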


Main Content

Can the reasoning model learn to select Thinking or NoThinking mode adaptively based on the difficulty of the input problem, thereby achieving more efficient reasoning without sacrificing or even improving performance?


Constrained Optimization Objective

An ideal selection policy should prefer NoThinking as long as the overall performance is not diminished. Let $\pi_{\theta_\text{ref}}$ denote the reference model, which is the initial …
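A common way to turn a "prefer X unless performance drops" constraint into a trainable signal is to fold it into a per-sample score. The sketch below is one such relaxation under stated assumptions, not the formula from the paper: `delta` (a small bonus for NoThinking), `ref_mean_reward` (the reference model's average reward on the same problem), and the function name are all hypothetical.

```python
def penalized_score(is_nothinking, reward, ref_mean_reward, delta=0.05):
    """Toy per-sample score for a constrained objective.

    is_nothinking   -- True if the response skipped thinking
    reward          -- task reward of this response (e.g. 1.0 if correct)
    ref_mean_reward -- assumed average reward of the reference model
    delta           -- assumed small bonus for choosing NoThinking
    """
    bonus = delta if is_nothinking else 0.0
    # A response scores well only if it does not fall below the reference;
    # among equally good responses, NoThinking gets the extra bonus.
    return bonus + reward - ref_mean_reward
```

On an easy problem where the reference is always correct, a correct NoThinking answer scores `delta` and a correct Thinking answer scores `0.0`, so NoThinking is preferred. On a hard problem where skipping thinking yields a wrong answer, the score goes negative and the bonus cannot compensate.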

Experiments

Metrics

Models

Datasets

Baselines

Results

Case Study

🤖

Others

Author: Geaming

Post link:

Copyright notice: unless otherwise stated, all posts on this blog are licensed under BY-NC-SA. Please credit the source when reposting!