Dominator of the Gemini-3 Era! OSS Agent “Dirac” Takes the Benchmark Crown with Dramatic Cost Cuts!
📰 News Overview
- The OSS coding agent “Dirac” scored 65.2% on Terminal-Bench-2, taking the top spot in the Gemini-3-flash-preview category.
- It outperformed Google’s official baseline (47.6%) and the top-tier closed-source agent “Junie CLI” (64.3%).
- Through its own optimizations, Dirac cut API costs by 64.8% on average (roughly 2.8 times more cost-efficient), while also generating code faster and more accurately.
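The two cost figures line up. Cutting costs by 64.8% means paying 35.2% of the original price, so the same budget stretches 1 / 0.352 ≈ 2.84 times as far. Here is a quick Python check; the function name `efficiency_multiplier` is just for illustration.

```python
def efficiency_multiplier(cost_reduction_pct: float) -> float:
    # After a cut of p percent you pay (1 - p/100) of the original cost,
    # so the same budget covers 1 / (1 - p/100) times as much work.
    remaining = 1 - cost_reduction_pct / 100
    return 1 / remaining


m = efficiency_multiplier(64.8)
print(f"{m:.2f}x")  # 2.84x
```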
💡 Key Highlights
- Selective Context: Because a model’s reasoning tends to degrade as the context grows longer, Dirac tightly curates what goes into the context, striking a balance between accuracy and cost.
- Advanced Editing Techniques: By combining hash-anchored parallel edits with Abstract Syntax Tree (AST) operations, Dirac can complete large-scale code modifications within a single task.
- Minimal Prompts: Dirac skips the Model Context Protocol (MCP) and follows a bang-for-the-buck design philosophy: get the most out of the model with the fewest instructions.
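The post doesn’t show Dirac’s code, so the following is only a minimal sketch of how hash-anchored editing can work in general. It assumes a hypothetical scheme in which every line is tagged with a short SHA-1 prefix of its content. The model cites `line:hash` pairs, and all edits in a batch are validated together before any of them is applied. Every name here (`line_hash`, `anchor_lines`, `Edit`, `apply_edits`) is invented for this example.

```python
import hashlib
from dataclasses import dataclass


def line_hash(text: str) -> str:
    # Hypothetical anchor scheme: first 6 hex chars of the line's SHA-1
    return hashlib.sha1(text.encode("utf-8")).hexdigest()[:6]


def anchor_lines(source: str) -> list[str]:
    # What the model would be shown: "<line>:<hash>|<content>"
    return [f"{i}:{line_hash(t)}|{t}" for i, t in enumerate(source.splitlines(), start=1)]


@dataclass
class Edit:
    line: int      # 1-indexed line to replace
    anchor: str    # hash the model saw for that line
    new_text: str  # replacement; may span several lines, or be "" to delete


def apply_edits(source: str, edits: list[Edit]) -> str:
    lines = source.splitlines()
    if len({e.line for e in edits}) != len(edits):
        raise ValueError("two edits target the same line")
    # Validate the whole batch first, so either every edit applies or none does
    for e in edits:
        if not 1 <= e.line <= len(lines) or line_hash(lines[e.line - 1]) != e.anchor:
            raise ValueError(f"stale or invalid anchor at line {e.line}")
    # Apply bottom-up so replacements never shift the lines of pending edits
    for e in sorted(edits, key=lambda e: e.line, reverse=True):
        lines[e.line - 1 : e.line] = e.new_text.splitlines()
    return "\n".join(lines)  # a trailing newline is not preserved in this sketch


src = "def f():\n    return 1\n\ndef g():\n    return 2"
print("\n".join(anchor_lines(src)))
edits = [
    Edit(2, line_hash("    return 1"), "    return 10"),
    Edit(5, line_hash("    return 2"), "    return 20"),
]
print(apply_edits(src, edits))
```

Every anchor is checked against the current file. An edit aimed at a line that has since changed therefore fails loudly instead of silently corrupting code. Applying the batch bottom-up keeps the line numbers of the remaining edits valid.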
🦈 Shark’s Eye (Curator’s Perspective)
While existing AI agents tend to lean on “just feed it long contexts,” Dirac deliberately narrows down the information it passes along to unleash the true power of Gemini-3, and that is incredibly cool! Especially notable: it builds on the open-source coding agent Cline and adds hash-anchor-based parallel editing, which significantly improves the reliability of its modifications in real-world use. That is a crucial step toward “unbreakable AI fixes.” Rather than piling on unnecessary prompts, Dirac pairs time-tested techniques like AST operations with cutting-edge LLMs. The result is a sharply honed “tool” with outstanding cost performance!
🚀 What’s Next?
- With dramatic reductions in API costs, large-scale automated refactoring projects that were previously deemed too costly are set to accelerate.
- The focus will shift from “quantity of information” to “quality of information (curation)” in agent development.
💬 A Word from Haru-Same
Fast, cheap, and accurate—just like the ocean’s swiftest hunter, the shark itself! This is definitely the trusty sidekick every engineer needs!
📚 Term Explanations
- Terminal-Bench-2: One of the most demanding benchmarks, measuring an AI agent’s ability to carry out terminal operations and modify actual GitHub repositories.
- AST Operations: A technique that treats code as a tree structure (an abstract syntax tree) that computers can easily work with. This allows precise structural modifications without introducing syntax errors.
- Hash Anchors: A method for marking specific locations in code with hash values, so that modifications stay consistent even when several edits are made in parallel.
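To make the AST idea concrete, here is a small sketch using Python’s standard `ast` module (Python 3.9+ for `ast.unparse`). It is not Dirac’s implementation, and `RenameFunction` and `rename_function` are hypothetical names. The sketch renames a function and its call sites but leaves alone a string literal that happens to contain the same word. A blind find-and-replace would get that case wrong.

```python
import ast


class RenameFunction(ast.NodeTransformer):
    """Rename a function definition and every bare-name reference to it.

    Scope-unaware sketch: any ast.Name equal to `old` is renamed.
    """

    def __init__(self, old: str, new: str) -> None:
        self.old, self.new = old, new

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
        if node.name == self.old:
            node.name = self.new
        self.generic_visit(node)  # keep walking into the function body
        return node

    def visit_Name(self, node: ast.Name) -> ast.Name:
        # Catches call sites like load(...) and references like f = load;
        # string literals are ast.Constant nodes, so they are never touched.
        if node.id == self.old:
            node.id = self.new
        return node


def rename_function(source: str, old: str, new: str) -> str:
    tree = ast.parse(source)  # raises SyntaxError if the input is not valid Python
    tree = RenameFunction(old, new).visit(tree)
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # Python 3.9+; comments and formatting are not preserved


src = 'def load(path):\n    return open(path).read()\n\ndata = load("load.txt")\n'
print(rename_function(src, "load", "read_file"))
```

One trade-off: `ast.unparse` discards comments and the original formatting. For real refactoring, a format-preserving tree library such as LibCST is a better fit.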
Source: Show HN: OSS Agent I built topped the TerminalBench on Gemini-3-flash-preview