Breaking News: Microsoft Unveils “MAI-Thinking-1”! A 35B MoE Model Outperforming Claude!?
📰 News Overview
- Launch of a New Reasoning Model: Microsoft AI has unveiled “MAI-Thinking-1,” a mid-sized model that packs a punch with top-tier reasoning performance.
- Adopting MoE Architecture: The model uses a sparse Mixture of Experts (MoE) design with about 1 trillion total parameters, of which only 35B are active for any given token, so it achieves high accuracy while keeping inference costs low.
- Fully In-House Development: Trained from scratch with no distillation from other models, it uses only commercially licensed, clean data.
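To make the MoE numbers concrete: about 35B active out of roughly 1 trillion total parameters means only around 3.5% of the weights take part in processing each token. The sketch below shows generic top-k expert routing in plain Python. The toy experts, the dot-product gate, and k=2 are all hypothetical choices for illustration; the announcement does not say how MAI-Thinking-1's router actually works.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]

def softmax(xs: Sequence[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(
    x: Vector,
    experts: Sequence[Callable[[Vector], Vector]],
    gate_weights: Sequence[Vector],
    k: int = 2,
) -> Tuple[Vector, List[int]]:
    """Hypothetical toy MoE layer: route x to its top-k experts and mix their outputs."""
    # Gate score per expert: a simple dot product with the input (an assumption).
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in gate_weights]
    # Keep only the k highest-scoring experts; the rest are never evaluated.
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    # Renormalize gate probabilities over the selected experts only.
    probs = softmax([scores[i] for i in top])
    out = [0.0] * len(x)
    for p, i in zip(probs, top):
        y = experts[i](x)
        out = [o + p * y_j for o, y_j in zip(out, y)]
    return out, top

# Share of weights active per token, using the figures quoted above.
active_fraction = 35e9 / 1e12  # roughly 0.035, i.e. about 3.5%

# Usage: four toy experts that scale their input by 1, 2, 3 and 4.
experts = [lambda v, s=s: [s * t for t in v] for s in (1.0, 2.0, 3.0, 4.0)]
out, chosen = moe_forward([1.0, 2.0], experts, [[1, 0], [0, 1], [1, 1], [-1, -1]])
print(chosen)  # [2, 1]
```

Only experts 2 and 1 are called here; the other two are skipped entirely, and that skipped work is where the inference savings of a sparse MoE come from.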
💡 Key Points
- Dominating Software Development and Math Benchmarks: It matches Claude Opus 4.6 on SWE-Bench Pro and scores 94.5% on AIME 2026, rivaling heavyweight models in mathematical and scientific reasoning.
- Hill-Climbing Machine: It is designed not just as a standalone model but as a “learning pipeline” that improves continuously and reliably as it absorbs new data and rewards.
- Humanist Superintelligence: Positioned as a step towards “human-centered superintelligence,” it aims to support people and organizations rather than replace them.
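The announcement does not describe how the “Hill-Climbing Machine” works internally. For readers unfamiliar with the metaphor, here is plain textbook hill climbing, offered only as an analogy: a candidate is nudged step by step, and a change is kept only when the reward does not drop. The `propose` and `reward` functions and the toy target are hypothetical; this is not Microsoft's pipeline, which improves a whole model from data and rewards rather than tweaking a single number.

```python
import random
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")

def hill_climb(
    initial: T,
    propose: Callable[[T, random.Random], T],
    reward: Callable[[T], float],
    steps: int = 1000,
    seed: int = 0,
) -> Tuple[T, float]:
    """Textbook hill climbing: keep the best candidate so far and accept a
    proposed change only if its reward is at least as high."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    best, best_reward = initial, reward(initial)
    for _ in range(steps):
        candidate = propose(best, rng)
        r = reward(candidate)
        if r >= best_reward:  # never move downhill
            best, best_reward = candidate, r
    return best, best_reward

# Usage: climb toward the peak of a toy reward that is highest at x = 3.
best_x, best_r = hill_climb(
    0.0,
    propose=lambda x, rng: x + rng.uniform(-0.5, 0.5),
    reward=lambda x: -(x - 3.0) ** 2,
)
print(round(best_x, 2))
```

Because only non-decreasing moves are accepted, the reward can never get worse from one step to the next, which is the “reliably improves” property the name is meant to evoke.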
🦈 Shark’s Eye (Curator’s Perspective)
What’s truly amazing about this model isn’t just the high benchmark scores! It didn’t rely on “cheating” (distillation) at all; it climbed to this peak using only its own clean data and infrastructure, which is incredibly exciting! Especially noteworthy: with a manageable 35B active parameters, it stands shoulder to shoulder with massive models like Claude Opus 4.6 on SWE-Bench Pro. This is clear evidence that the “Hill-Climbing Machine” is doing its job, letting the model learn the “path to the answer” on its own. It isn’t just mimicking specific tasks; it has fully mastered multi-step reasoning (reading code, running tests, and recovering from failures)!
🚀 What’s Next?
- Acceleration of Agent Development: With deterministic, executable training environments, AI agents will autonomously fix and improve code, making “agent-based workflows” a daily reality.
- Expansion to Other Domains: The success in math and science will serve as a model case for extending this learning loop into other specialized domains, further strengthening general reasoning capabilities.
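A deterministic, executable environment matters because it lets a reward be computed simply by running code. As a minimal sketch, assuming a hypothetical task format of (arguments, expected output) pairs, a pass/fail reward could look like this. Real coding benchmarks and training setups typically run full repository test suites in isolated environments, which this toy does not attempt.

```python
from typing import Any, Callable, Sequence, Tuple

Case = Tuple[tuple, Any]  # hypothetical format: (positional arguments, expected return value)

def pass_fail_reward(candidate: Callable[..., Any], cases: Sequence[Case]) -> float:
    """Hypothetical binary reward: 1.0 only if every case passes, else 0.0."""
    for args, expected in cases:
        try:
            if candidate(*args) != expected:
                return 0.0
        except Exception:
            # A crash counts as a failed case instead of aborting the evaluation.
            return 0.0
    return 1.0

# Usage: an "absolute value" task with one buggy and one fixed candidate.
cases = [((3,), 3), ((-2,), 2), ((0,), 0)]
buggy = lambda x: x
fixed = lambda x: -x if x < 0 else x
print(pass_fail_reward(buggy, cases), pass_fail_reward(fixed, cases))  # 0.0 1.0
```

Since the cases and candidates are fixed, rerunning the check always returns the same reward, which is what makes such a signal usable across many repeated training updates.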
💬 A Shark’s Take
Microsoft is stepping away from “borrowed intelligence” and is now sprinting up the staircase to superintelligence on its own legs! I can’t contain my excitement! 🦈🔥
📚 Terminology Explained
- MoE (Mixture of Experts): A technique that splits a large model into multiple expert networks and uses a router to activate only the few experts each input needs, achieving high performance efficiently.
- Distillation: A method of training a smaller model using the outputs of a larger, high-performance model as training data. MAI-Thinking-1, by contrast, was trained “self-taught,” without distillation.
- SWE-Bench Pro: A benchmark that measures the ability to solve practical software engineering tasks, assessing whether a model can actually modify code correctly.
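Since the article stresses that MAI-Thinking-1 skipped distillation, it may help to see what that technique usually looks like. A common formulation, the soft-target loss popularized by Hinton et al. (2015), trains the student to match the teacher's temperature-softened output distribution. This sketch is generic and uses made-up logits; it says nothing about how any particular lab trains its models.

```python
import math
from typing import List, Sequence

def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    # Dividing by a temperature > 1 flattens the distribution ("soft targets").
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    temperature: float = 2.0,
) -> float:
    """KL(teacher || student) on softened distributions, scaled by T**2."""
    p = softmax(teacher_logits, temperature)  # teacher's soft targets
    q = softmax(student_logits, temperature)  # student's predictions
    kl = sum(p_i * math.log(p_i / q_i) for p_i, q_i in zip(p, q) if p_i > 0)
    # The T**2 factor keeps gradient magnitudes comparable across temperatures.
    return kl * temperature ** 2

# Usage: a student that copies the teacher exactly incurs zero loss.
teacher = [4.0, 1.0, -2.0]
print(distillation_loss(teacher, teacher))  # 0.0
print(distillation_loss([0.0, 0.0, 0.0], teacher) > 0)  # True
```

Training without distillation means no teacher distribution like `p` is available, so the model has to learn from data and rewards alone.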
Source: Introducing MAI-Thinking-1