Shocking: LLMs Fall Short Against Classical Methods?! The Birth of the Ultimate Auto-Optimization Technique “Centaur”!
📰 News Overview
- Even the latest frontier models like Claude Opus 4.6 and Gemini 3.1 Pro have been found unable to beat classical hyperparameter optimization (HPO) algorithms (CMA-ES and TPE) within certain computational budgets.
- LLMs struggle to track the optimization state across trials, and they tend to stumble more on avoiding out-of-memory (OOM) failures than on exploring diverse options.
- A hybrid method called “Centaur” has been developed that shares the “interpretable internal states” of classical techniques with LLMs, outperforming all classical methods and pure LLM techniques with a mere 0.8B model.
💡 Key Points
- Exposing LLM Weaknesses: While LLMs excel in domain knowledge, they fall short compared to classical algorithms in managing optimization histories involving numerical data.
- Structure of Centaur: Centaur feeds CMA-ES’s internal state (the mean vector, step size, and covariance matrix) directly to the LLM, so the LLM’s domain knowledge is integrated into the optimization process.
- Rise of Small Models: This demonstrates that with clever methodologies, a model in the 0.8B class can achieve top performance in optimization tasks without relying on massive frontier models.
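To make the “share the internal state” idea concrete, here is a minimal sketch of how a CMA-ES snapshot could be turned into text for an LLM. Everything in it is an assumption for illustration: the `CmaState` class, the `state_to_prompt` function, the hyperparameter names, and the prompt layout are hypothetical and are not taken from Centaur itself.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class CmaState:
    """Snapshot of the CMA-ES quantities the article says are shared with the LLM."""
    names: Sequence[str]            # hyperparameter names (hypothetical)
    mean: Sequence[float]           # mean vector
    sigma: float                    # global step size
    cov: Sequence[Sequence[float]]  # covariance matrix


def state_to_prompt(state: CmaState, history: List[tuple]) -> str:
    """Render the optimizer state and recent trials as plain text for an LLM.

    The layout is an illustrative assumption, not the prompt used by Centaur.
    """
    lines = ["CMA-ES state:"]
    for name, m in zip(state.names, state.mean):
        lines.append(f"  mean[{name}] = {m:.4g}")
    lines.append(f"  step_size = {state.sigma:.4g}")
    lines.append("  covariance:")
    for row in state.cov:
        lines.append("    " + " ".join(f"{c:.3g}" for c in row))
    lines.append("Recent trials (config -> loss):")
    for config, loss in history:
        lines.append(f"  {config} -> {loss:.4g}")
    lines.append("Propose the next configuration as JSON.")
    return "\n".join(lines)


# Made-up example values: two hyperparameters and one past trial.
state = CmaState(
    names=["log10_lr", "dropout"],
    mean=[-3.0, 0.1],
    sigma=0.5,
    cov=[[1.0, 0.0], [0.0, 1.0]],
)
prompt = state_to_prompt(state, [({"log10_lr": -3.2, "dropout": 0.15}, 0.412)])
print(prompt)
```

The point of the sketch is that the LLM no longer has to reconstruct the search state from a raw trial log; the classical optimizer hands over a compact summary it already maintains.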
🦈 Shark’s Eye (Curator’s Perspective)
We’re witnessing the end of the era where we could just rely on LLMs for everything! The crucial takeaway here is that even when granted the freedom to “directly edit source code,” LLMs still couldn’t match classical algorithms within a fixed search space. Sure, LLMs can generate “plausible” suggestions, but they still struggle to maintain the “state” that precise mathematical optimization depends on.
Enter “Centaur,” and it’s brilliant! By showing LLMs the “inner workings” of the well-established CMA-ES, it effectively combines LLMs’ domain knowledge with the solid exploratory capabilities of classical methods. The efficiency of this approach, which achieves state-of-the-art (SOTA) results with an ultra-lightweight 0.8B model, could set the standard for future AI development!
🚀 What’s Next?
Moving forward, we’ll see a shift from “going it alone with LLMs” to leveraging classical algorithms as “external tools” or “providers of internal states” in specific mathematical tasks. Especially in resource-constrained environments for model training, techniques like Centaur will become essential!
💬 A Word from Haru Shark
The interesting part of the AI world is that the latest massive models aren’t always the best! Smart sharks know how to pick and choose their tools! 🦈🔥
📚 Terminology Explained
- HPO (Hyperparameter Optimization): A technique for automatically tuning the “settings” (hyperparameters) that influence the efficiency and performance of machine learning models.
- CMA-ES: Covariance Matrix Adaptation Evolution Strategy. A powerful classical, derivative-free algorithm that finds the minimum or maximum of a function by repeatedly sampling candidates from a Gaussian and adapting its mean, step size, and covariance matrix.
- 0.8B Model: A relatively small language model with 800 million parameters. By 2026 standards, it runs smoothly even on smartphones and edge devices.
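For readers who want the CMA-ES entry above in symbols: each generation draws its candidates from a Gaussian whose parameters are exactly the “internal state” discussed earlier. This is the standard textbook sampling rule; the notation ($m$, $\sigma$, $C$, $\lambda$) is chosen here for illustration, since the article itself gives no formulas.

```latex
x_k \sim m + \sigma \, \mathcal{N}(0, C), \qquad k = 1, \dots, \lambda
```

Here $m$ is the mean vector, $\sigma$ the step size, $C$ the covariance matrix, and $\lambda$ the number of candidates per generation. After each generation, CMA-ES updates $m$, $\sigma$, and $C$ from the best-ranked candidates.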
Source: Can LLMs Beat Classical Hyperparameter Optimization Algorithms?