[AI Minor News]

Dominator of the Gemini-3 Era! OSS Agent "Dirac" Takes the Benchmark Crown with Dramatic Cost Cuts!


Note: This article contains affiliate advertising.

📰 News Overview

  • The OSS coding agent “Dirac” scored 65.2% in Terminal-Bench-2, securing the top spot in the Gemini-3-flash-preview category.
  • It outperformed Google’s official baseline (47.6%) and the top-tier closed-source agent “Junie CLI” (64.3%).
  • Thanks to its own optimizations, Dirac cuts API costs by 64.8% on average (roughly 2.8 times cheaper per task), while also generating code faster and more accurately.
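
To see how a 64.8% cost reduction lines up with the "around 2.8 times" figure, here is a quick arithmetic check in Python. The function name `efficiency_factor` is made up for this illustration and does not come from Dirac.

```python
def efficiency_factor(reduction_pct: float) -> float:
    """Return how many times cheaper a run is after a given % cost reduction."""
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction_pct must be in [0, 100)")
    remaining = 1 - reduction_pct / 100  # fraction of the original cost still paid
    return 1 / remaining


factor = efficiency_factor(64.8)
print(f"{factor:.2f}x")  # 1 / 0.352, i.e. about 2.84x
```

In other words, paying only 35.2% of the original bill is the same as getting about 2.8 runs for the price of one.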

💡 Key Highlights

  • Selective Context: Because a model’s reasoning tends to degrade as the context grows longer, Dirac tightly curates the information it passes in, balancing accuracy against cost.
  • Advanced Editing Techniques: Using hash anchors for parallel editing together with Abstract Syntax Tree (AST) operations, Dirac completes large-scale code modifications in a single task.
  • Minimal Prompts: Dirac does not use the Model Context Protocol (MCP) and keeps its prompts minimal, aiming for the most bang for the buck from the fewest instructions.
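
The hash-anchor idea from the highlights above can be sketched roughly as follows. This is a minimal illustration, not Dirac's actual implementation: the source does not describe its anchor format, so the 8-character SHA-256 prefix and the `anchor` and `apply_edits` helpers are assumptions made for this example.

```python
import hashlib


def anchor(line: str) -> str:
    """Hypothetical anchor: a short hash of the line's content."""
    return hashlib.sha256(line.encode("utf-8")).hexdigest()[:8]


def apply_edits(lines: list[str], edits: dict[str, str]) -> list[str]:
    """Replace each line whose anchor appears in `edits`.

    Edits are keyed by content hash rather than line number, and every
    anchor is resolved against the original file, so edits in one batch
    cannot disturb each other. Stale or ambiguous anchors are rejected
    instead of being applied to the wrong line.
    """
    index: dict[str, list[int]] = {}
    for i, line in enumerate(lines):
        index.setdefault(anchor(line), []).append(i)

    result = list(lines)
    for key, new_line in edits.items():
        positions = index.get(key, [])
        if len(positions) != 1:
            raise ValueError(f"anchor {key} matched {len(positions)} lines")
        result[positions[0]] = new_line
    return result


source = ["def add(a, b):", "    return a - b", "", "def mul(a, b):", "    return a + b"]
edits = {
    anchor("    return a - b"): "    return a + b",  # fix add()
    anchor("    return a + b"): "    return a * b",  # fix mul()
}
patched = apply_edits(source, edits)
print("\n".join(patched))
```

The point of the design is the failure mode: if the file changed since the model last read it, the anchor no longer matches and the edit is refused, rather than silently landing on whatever now sits at that line number.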

🦈 Shark’s Eye (Curator’s Perspective)

While existing AI agents tend to lean on “just feed it a long context,” Dirac deliberately narrows down what the model sees to bring out the true power of Gemini-3, and that is incredibly cool! Especially notable is that it builds on Cline and adds hash-anchor-based parallel editing, which makes modifications far more reliable in real-world use. That is a crucial step toward “AI fixes that don’t break things.” Rather than piling on unnecessary prompts, Dirac pairs traditional techniques like AST operations with cutting-edge LLMs, honing itself into a sharp “tool” with outstanding cost performance!
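
For a feel of what an AST operation looks like in practice, here is a small sketch using Python's standard `ast` module. It is not Dirac's code; the `RenameFunction` class is a hypothetical example of a structural edit, and `ast.unparse` requires Python 3.9 or later.

```python
import ast


class RenameFunction(ast.NodeTransformer):
    """Rename a function definition and every bare-name reference to it."""

    def __init__(self, old: str, new: str) -> None:
        self.old, self.new = old, new

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.AST:
        if node.name == self.old:
            node.name = self.new
        self.generic_visit(node)  # keep walking into the function body
        return node

    def visit_Name(self, node: ast.Name) -> ast.AST:
        if node.id == self.old:
            node.id = self.new
        return node


code = "def fetch(url):\n    return url\n\nresult = fetch('fetch me')\n"
tree = RenameFunction("fetch", "download").visit(ast.parse(code))
new_code = ast.unparse(tree)
print(new_code)
```

Unlike a plain find-and-replace, the string literal `'fetch me'` is left alone because it is a constant node, not a name. One caveat: `ast.unparse` drops comments and original formatting, so a real editing tool would likely need a formatting-preserving approach.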

🚀 What’s Next?

  • With dramatic reductions in API costs, large-scale automated refactoring projects that were previously deemed too costly are set to accelerate.
  • The focus in agent development will shift from “quantity of information” to “quality of information (curation).”
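
As a toy picture of what curating context could mean, the sketch below ranks candidate snippets by keyword overlap with the task and keeps only what fits in a small budget. Everything here is an assumption for illustration: the `select_context` helper, the overlap heuristic, and the word-count budget are not taken from Dirac, whose actual selection logic is not described in the source.

```python
def select_context(task: str, snippets: dict[str, str], budget: int) -> list[str]:
    """Pick the snippets most relevant to the task within a word budget."""
    task_words = set(task.lower().split())

    def score(text: str) -> int:
        # Number of distinct task words that also appear in the snippet.
        return len(task_words & set(text.lower().split()))

    ranked = sorted(snippets, key=lambda name: score(snippets[name]), reverse=True)
    chosen, used = [], 0
    for name in ranked:
        cost = len(snippets[name].split())  # crude stand-in for a token count
        if score(snippets[name]) > 0 and used + cost <= budget:
            chosen.append(name)
            used += cost
    return chosen


snippets = {
    "auth.py": "def login(user, password): check password hash",
    "billing.py": "def charge(card, amount): call payment gateway",
    "README.md": "project overview and setup notes",
}
picked = select_context("fix the login password check", snippets, budget=10)
print(picked)  # ['auth.py']
```

Irrelevant files never reach the model at all, which is where both the accuracy and the cost savings would come from in a scheme like this.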

💬 A Word from Haru-Same

Fast, cheap, and accurate—just like the ocean’s swiftest hunter, the shark itself! This is definitely the trusty sidekick every engineer needs!

📚 Term Explanations

  • Terminal-Bench-2: A demanding benchmark that measures how well an AI agent can carry out realistic, end-to-end tasks in a terminal environment.

  • AST Operations: A technique that treats code as a tree structure (an abstract syntax tree) that programs can analyze reliably, allowing accurate structural modifications while avoiding syntax errors.

  • Hash Anchors: A method for marking specific locations in code with hash values to ensure that modifications remain consistent even during parallel work.

  • Source: Show HN: OSS Agent I built topped the TerminalBench on Gemini-3-flash-preview

[Disclaimer]
This article was structured by AI and is verified and managed by the operator. Accuracy is not guaranteed, and we assume no responsibility for external content.