X | Grok 3 · MinWoo Park

Created: 2025-02-22 08:52:44 +0000

Last modified: 2025-02-22 20:56:50 +0900

Grok 3 Beta — The Age of Reasoning Agents

url: https://x.ai/blog/grok-3?utm_source=substack&utm_medium=email

abstract: We are pleased to introduce Grok 3, our most advanced model yet: blending strong reasoning with extensive pretraining knowledge. Trained on our Colossus supercluster with 10x the compute of previous state-of-the-art models, Grok 3 displays significant improvements in reasoning, mathematics, coding, world knowledge, and instruction-following tasks. Grok 3’s reasoning capabilities, refined through large scale reinforcement learning, allow it to think for seconds to minutes, correcting errors, exploring alternatives, and delivering accurate answers. Grok 3 has leading performance across both academic benchmarks and real-world user preferences, achieving an Elo score of 1402 in the Chatbot Arena. Alongside it, we’re unveiling Grok 3 mini, which represents a new frontier in cost-efficient reasoning. Both models are still in training and will evolve rapidly with your feedback. We are rolling out Grok 3 to users in the coming days, along with an early preview of its reasoning capabilities.

Massive Compute and Training Advances

Grok 3 was trained on the Colossus supercluster with 10x the compute of previous state-of-the-art models. Its reasoning, refined through large-scale reinforcement learning (RL), supports long-form thinking, error correction, and exploration of alternative solutions.

Performance Across Benchmarks – Grok 3 (Think) leads across major academic and real-world benchmarks:

AIME 2025 Math Competition: 93.3% (highest among competitors)
GPQA (graduate-level reasoning): 84.6%
LiveCodeBench (code generation): 79.4%
MMMU (multimodal understanding): 78%


Grok 3 Mini – A cost-efficient reasoning model optimized for STEM, achieving 95.8% on AIME 2024 and 80.4% on LiveCodeBench.
Grok 3’s “Think” Mode – Users can activate “Think” mode to inspect the model’s reasoning process, improving transparency and trust.
DeepSearch AI Agent – A new “truth-seeking” AI that synthesizes real-time web knowledge, providing concise and fact-verified insights beyond standard browser searches.
API & Enterprise Rollout – The Grok 3 and DeepSearch APIs will soon be available, offering enhanced tool use and code execution capabilities.
