Dominator of the Gemini-3 Era! OSS Agent "Dirac" Takes the Benchmark Crown with Dramatic Cost Cuts!

#Dirac #Gemini-3 #Open Source

※この記事はアフィリエイト広告を含みます

Dominator of the Gemini-3 Era! OSS Agent “Dirac” Takes the Benchmark Crown with Dramatic Cost Cuts!

📰 News Overview

The OSS coding agent “Dirac” scored 65.2% in Terminal-Bench-2, securing the top spot in the Gemini-3-flash-preview category.
It outperformed Google’s official baseline (47.6%) and the top-tier closed-source agent “Junie CLI” (64.3%).
Through unique optimizations, Dirac has achieved an average reduction of API costs by 64.8% (around 2.8 times more efficient), all while generating code more quickly and accurately.

💡 Key Highlights

Selective Context: To prevent the model’s inference ability from diminishing with longer context lengths, information is tightly curated, achieving a balance between accuracy and cost.
Advanced Editing Techniques: Utilizing hash anchors for parallel editing and Abstract Syntax Tree (AST) operations, Dirac completes large-scale code modifications in a single task.
Minimal Prompts: Avoiding the Model Context Protocol (MCP), Dirac employs a design philosophy that maximizes results (bang-for-the-buck) with minimal instructions.

🦈 Shark’s Eye (Curator’s Perspective)

While existing AI agents tend to lean towards “just feed it long contexts,” Dirac’s approach of deliberately “narrowing down” information to unleash the true power of Gemini-3 is incredibly cool! Especially notable is how it builds on Cline while implementing hash anchor-based parallel editing, significantly enhancing the reliability of modifications in practical applications—this is a crucial approach for achieving “unbreakable AI fixes.” Rather than piling on unnecessary prompts, Dirac’s blend of traditional techniques like AST operations with cutting-edge LLMs represents a sharp refinement as a “tool,” producing outstanding cost performance!

🚀 What’s Next?

With dramatic reductions in API costs, large-scale automated refactoring projects that were previously deemed too costly are set to accelerate.
The focus will shift from “quantity of information” to “quality of information (curation)” in agent development.

💬 A Word from Haru-Same

Fast, cheap, and accurate—just like the ocean’s swiftest hunter, the shark itself! This is definitely the trusty sidekick every engineer needs!

📚 Term Explanations

Terminal-Bench-2: One of the highest-difficulty benchmarks measuring an AI agent’s ability to perform terminal operations and modify actual GitHub repositories.
AST Operations: A technique that treats code as a tree structure (abstract syntax tree) that is easier for computers to understand, allowing for accurate structural modifications while avoiding syntax errors.
Hash Anchors: A method for marking specific locations in code with hash values to ensure that modifications remain consistent even during parallel work.
Source: Show HN: OSS Agent I built topped the TerminalBench on Gemini-3-flash-preview