
MinWoo(Daniel) Park | Tech Blog


DeepSeekMath

  • Related Project: Private
  • Category: Paper Review
  • Date: 2025-03-11

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

  • url: https://arxiv.org/abs/2402.03300
  • pdf: https://arxiv.org/pdf/2402.03300
  • html: https://arxiv.org/html/2402.03300v1
  • abstract: Mathematical reasoning poses a significant challenge for language models due to its complex and structured nature. In this paper, we introduce DeepSeekMath 7B, which continues pre-training DeepSeek-Coder-Base-v1.5 7B with 120B math-related tokens sourced from Common Crawl, together with natural language and code data. DeepSeekMath 7B has achieved an impressive score of 51.7% on the competition-level MATH benchmark without relying on external toolkits and voting techniques, approaching the performance level of Gemini-Ultra and GPT-4. Self-consistency over 64 samples from DeepSeekMath 7B achieves 60.9% on MATH. The mathematical reasoning capability of DeepSeekMath is attributed to two key factors: First, we harness the significant potential of publicly available web data through a meticulously engineered data selection pipeline. Second, we introduce Group Relative Policy Optimization (GRPO), a variant of Proximal Policy Optimization (PPO), that enhances mathematical reasoning abilities while concurrently optimizing the memory usage of PPO.
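The abstract's key algorithmic contribution, GRPO, drops PPO's learned value network and instead scores each sampled completion against the other completions in its group, which is where the memory savings come from. A minimal sketch of that group-relative advantage step, assuming the standard z-score normalization over per-sample rewards (an illustration of the idea, not the full training loop):

```python
import statistics

def group_relative_advantages(rewards):
    """Compute GRPO-style advantages for one group of sampled outputs.

    Each reward is normalized against the group's mean and standard
    deviation, replacing the value-function baseline used in PPO.
    """
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # All samples scored identically: no relative signal in this group.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]
```

Because the baseline is just group statistics, no separate critic model has to be trained or held in memory alongside the policy.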
