Enhancing Non-Reasoning Models with Reasoning Models

MinWoo(Daniel) Park | Tech Blog

Created: 2025-04-23 11:45:54 +0000

Last modified: 2025-04-23 20:56:50 +0900

url: https://arxiv.org/abs/2504.09639
pdf: https://arxiv.org/pdf/2504.09639
html: https://arxiv.org/html/2504.09639v1
abstract: Recent advancements in large language models (LLMs), such as DeepSeek-R1 and OpenAI-o1, have demonstrated the significant effectiveness of test-time scaling, achieving substantial performance gains across various benchmarks. These advanced models utilize deliberate “thinking” steps to systematically enhance answer quality. In this paper, we propose leveraging these high-quality outputs generated by reasoning-intensive models to improve less computationally demanding, non-reasoning models. We explore and compare methodologies for utilizing the answers produced by reasoning models to train and improve non-reasoning models. Through straightforward Supervised Fine-Tuning (SFT) experiments on established benchmarks, we demonstrate consistent improvements across various benchmarks, underscoring the potential of this approach for advancing the ability of models to answer questions directly.
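
To make the idea in the abstract concrete, here is a minimal sketch of one way a reasoning model's output could be turned into a training target for a non-reasoning model: drop the "thinking" portion and keep only the final answer. This is an illustration, not the paper's pipeline. The abstract says several methodologies are compared, and this shows only one simple option. The `<think>...</think>` delimiter follows DeepSeek-R1's output format. The function names and the `prompt`/`completion` record schema are hypothetical.

```python
import re

# Assumption: the reasoning model wraps its deliberation in <think>...</think>,
# as DeepSeek-R1 does; other models may use a different delimiter.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_thinking(response: str) -> str:
    """Return the response with any <think> block removed."""
    return THINK_BLOCK.sub("", response).strip()


def build_sft_record(question: str, reasoning_response: str) -> dict:
    """Pair a question with the final answer only (hypothetical record schema)."""
    return {"prompt": question, "completion": strip_thinking(reasoning_response)}


raw = "<think>2 + 2 is 4, then times 3 is 12.</think>\nThe answer is 12."
record = build_sft_record("What is (2 + 2) * 3?", raw)
print(record["completion"])  # prints "The answer is 12."
```

Records built this way could then be passed to any standard SFT trainer. Whether to keep, summarize, or discard the thinking text is exactly the kind of choice the paper sets out to compare.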
