MinWoo Park
一切唯心造 不狂不及 (Everything is created by the mind alone; without obsession, nothing is attained)
POST | Except Notifier
POST | Tokenizer
POST | LLM Training
POST | Estimation FLOPs of LLaMA-2
Model | BERT
Context Length
Attn | Multi-matrix Factorization Attention
Nvidia | Cosmos
RAG | Cache-Augmented Generation
LFM | NVIDIA Cosmos
Search-o1
Visual Tokenizer | Scaling Visual Tokenizers for Reconstruction and Generation
Transformer2 | Self-adaptive LLMs
RAG | VideoRAG
Optimizing Pretraining Data Mixtures
RL Reasoning | Advancing Language Model Reasoning
Tech Report | Qwen2.5-1M Technical Report
Tech Report | DeepSeek-V3 Technical Report
Post | DeepSeek
Satori
Diffusion | Image to Image Diffusion
Diffusion | Kolors
Test-time Scaling
Dream Booth
Goedel-Prover
MLLM | Image to Video
VLM | LLMs Can Easily Learn from Structure, Not Content
VLM | Scaling VLM
Attn | Prune Sub-quadratic Attention
EQ-VAE
Score of Mixture
Post | RAG
S* Test Time Scaling for Code Generation
Google | AI CoScientist
X | Grok 3
Perplexity | R1 1776
Qwen2.5-VL Technical Report
Inner Thinking Transformers
Query Expansion
Survey | Scaling Laws
Comet
Anthropic | Claude 3.7
Weekly 25/1W
OrderSum
Kafka KRaft
LLM FineTune | Training with MXFP4
OpenAI | GPT-4.5 System Card
Predictable Scale
RAG | Agentic Deep Graph Reasoning
Every FLOP Counts
R1-Zero's Aha Moment in Visual Reasoning on a 2B Model
DeepSeekMath
L1
Gemma 3
Communication-Efficient LM
An Expanded Massive Multilingual Dataset
OpenAI | Monitoring Reasoning Models
Transformers without Normalization
Weekly | March Week 3
Block Diffusion
Flow to the Mode
Gemini Embedding
Official Agent Project
AutoAgent
Sample, Scrutinize and Scale
Transformer | nGPT
Model | Qwen2-VL Video
Stop Overthinking
TinyR1-32B-Preview
Weekly | March Week 4
DAPO
Model | Mistral Small 3.1
I Have Covered All the Bases
Every Sample Matters
Understanding R1-Zero-Like Training
Qwen2.5-Omni Technical Report
Exploring Data Scaling Trends
Rediscovers a Semantic Variant of BM25
Command A
Proof or Bluff
Scaling Language-Free Visual Representation Learning
MegaMath
GitHub MCP Server
Meta | Llama 4
Paper Bench
InfiniteICL
DION
MUON
Reward Models Know
Right Question
GigaTok
Swan-GPT
Scaling Laws for Multimodal
Seed Thinking
ReTool
Dynamic-Length Float
d1
Perception LM
Enhancing Non-Reasoning Models with Reasoning Models
LMs are Implicitly Continuous
Reasoning Models Without Thinking
Socio Verse
TTRL
Pangu Ultra MoE
On Path to Multimodal Generalist
ZeroSearch
Overcoming Vocabulary Mismatch
AlphaEvolve
Claude 4
Illusion of Thinking
Illusion of the Illusion of Thinking
ERNIE 4.5 Technical Report
Self-Guided Process Reward Optimization with Masked Step Advantage for Process Reinforcement Learning
Skywork-Reward-V2
Deep Research Agents
Survey on Evaluation of LLM-based Agents
How Well Does GPT-4o Understand Vision
Why is Your Language Model a Poor Implicit Reward Model?
SAS
Synergy Dilemma
Grok 4
Agents
Efficient Reasoning Models | A Survey
Towards Large Reasoning Models
A Survey | Context Engineering for LLMs
Apple LFM | Tech Report
Seed-X
Thinking Beyond Tokens
Qwen-3
Deep Researcher with Test-Time Diffusion
The Big LLM Architecture Comparison
RF-DETR vs. YOLOv12
Jet-Nemotron
Causal Attn
Mamba 2