abstract: We are pleased to introduce Grok 3, our most advanced model yet, blending strong reasoning with extensive pretraining knowledge. Trained on our Colossus supercluster with 10x the compute of previous state-of-the-art models, Grok 3 displays significant improvements in reasoning, mathematics, coding, world knowledge, and instruction-following tasks. Grok 3’s reasoning capabilities, refined through large-scale reinforcement learning, allow it to think for seconds to minutes, correcting errors, exploring alternatives, and delivering accurate answers. Grok 3 has leading performance across both academic benchmarks and real-world user preferences, achieving an Elo score of 1402 in the Chatbot Arena. Alongside it, we’re unveiling Grok 3 mini, which represents a new frontier in cost-efficient reasoning. Both models are still in training and will evolve rapidly with your feedback. We are rolling out Grok 3 to users in the coming days, along with an early preview of its reasoning capabilities.
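To give a feel for what an Elo score such as the 1402 quoted above means, here is a minimal sketch that converts a rating gap into an expected head-to-head score. It assumes the textbook Elo logistic formula with a scale of 400; the Chatbot Arena's actual rating procedure may differ. The function name `expected_score` and the opponent rating of 1350 are hypothetical and chosen only for illustration.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the standard Elo model.

    A value of 0.5 means the two are evenly matched; ties count as half a win.
    The 400-point scale is the textbook Elo convention (an assumption here).
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


if __name__ == "__main__":
    grok3_rating = 1402.0   # Elo score reported in the abstract
    other_rating = 1350.0   # hypothetical opponent rating, for illustration only

    p = expected_score(grok3_rating, other_rating)
    print(f"Expected score against a {other_rating:.0f}-rated model: {p:.3f}")
```

Under this model only the rating difference matters, so the same gap implies the same expected preference rate anywhere on the scale.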
Massive Compute and Training Advances – Grok 3 was trained on the Colossus supercluster, using 10x the compute of previous state-of-the-art models. Large-scale reinforcement learning (RL) enables long-form reasoning, error correction, and exploration of alternative solutions.
Performance Across Benchmarks – Grok 3 (Think) leads across major academic and real-world benchmarks:
AIME 2025 Math Competition: 93.3% (highest among competitors)
GPQA (graduate-level reasoning): 84.6%
LiveCodeBench (code generation): 79.4%
MMMU (multimodal understanding): 78%
Grok 3 Mini – A cost-efficient reasoning model optimized for STEM, achieving 95.8% on AIME 2024 and 80.4% on LiveCodeBench.
Grok 3’s “Think” Mode – Users can activate “Think” mode to inspect the model’s reasoning process, improving transparency and trust.
DeepSearch AI Agent – A new “truth-seeking” AI that synthesizes real-time web knowledge, providing concise and fact-verified insights beyond standard browser searches.
API & Enterprise Rollout – The Grok 3 and DeepSearch APIs will soon be available, offering enhanced tool use and code execution capabilities.