[AI Minor News]

The 1.58-bit Revolution! Introducing "Ternary Bonsai" – 8B Model Runs at Just 1.75GB



※ This article contains affiliate advertising.

📰 News Summary

  • A new model with 1.58-bit (ternary) representation: PrismML has released the ‘Ternary Bonsai’ family (8B, 4B, 1.7B). By constraining weights to {-1, 0, +1}, it shrinks memory use to roughly one-ninth of a standard 16-bit model.
  • Extreme Compression Without Sacrificing Accuracy: The average benchmark score improves by 5 points over the previous 1-bit model. The 8B model (1.75GB) reaches an average score of 75.5, rivaling the performance of Qwen3 8B, whose 16-bit weights occupy roughly ten times the memory.
  • Lightning-Fast Native Performance on Apple Devices: Achieving 82 toks/sec on the M4 Pro chip and 27 toks/sec on the iPhone 17 Pro Max, this model boasts energy efficiency improvements of 3 to 4 times over previous designs.
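The memory figures above can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes all 8B weights are ternary, packed at 1.6 bits per weight (five ternary values per byte, since 3^5 = 243 ≤ 256), with one FP16 scale per 128-weight group; the packing layout is a common convention, not PrismML's confirmed on-disk format:

```python
import math

PARAMS = 8e9   # 8B weights, all ternary
GROUP = 128    # weights sharing one FP16 scale factor

# Information content of one ternary weight
bits_per_weight = math.log2(3)   # ~1.585 bits

# Assumed practical packing: 5 ternary values per byte -> 1.6 bits/weight
packed_bits = 8 / 5

weight_gb = PARAMS * packed_bits / 8 / 1e9   # packed ternary weights
scale_gb = (PARAMS / GROUP) * 2 / 1e9        # FP16 scales, 2 bytes each
fp16_gb = PARAMS * 2 / 1e9                   # 16-bit baseline, 2 bytes/weight

print(f"theoretical bits/weight: {bits_per_weight:.3f}")
print(f"ternary model: {weight_gb + scale_gb:.2f} GB")
print(f"fp16 baseline: {fp16_gb:.1f} GB (~{fp16_gb / (weight_gb + scale_gb):.1f}x larger)")
```

Under these assumptions the total comes out to about 1.73GB, closely matching the reported 1.75GB, and the 16-bit baseline is roughly nine times larger, matching the headline reduction.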

💡 Key Highlights

  • No Escape Hatches, Fully Quantized: The entire network uses the 1.58-bit representation consistently, from the embeddings and attention to the MLP and LM head; no layer is left in higher precision.
  • Group-Level Quantization Scheme: Each group of 128 weights shares a single FP16 scale factor while the weights themselves are encoded in 1.58 bits, keeping per-weight overhead tiny while maintaining high intelligence density.
  • Released Under Apache 2.0 License: Model weights are open-source and immediately available on Mac, iPhone, and iPad through MLX.
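The group-level scheme described above can be sketched as follows. Note this is a minimal illustration: the article does not publish Ternary Bonsai's actual quantizer, so the absmean-style rounding rule here is an assumption borrowed from common ternary-quantization recipes:

```python
import numpy as np

def quantize_ternary(w: np.ndarray, group_size: int = 128):
    """Quantize a 1-D weight vector to {-1, 0, +1} with one scale per group."""
    w = w.reshape(-1, group_size)
    # Assumed absmean scaling, typical of 1.58-bit recipes (not confirmed here)
    scales = np.abs(w).mean(axis=1, keepdims=True) + 1e-8
    q = np.clip(np.round(w / scales), -1, 1).astype(np.int8)
    return q, scales.astype(np.float16)

def dequantize(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Recover approximate FP32 weights from ternary codes and group scales."""
    return (q.astype(np.float32) * scales.astype(np.float32)).reshape(-1)

rng = np.random.default_rng(0)
w = rng.normal(0, 0.02, size=1024).astype(np.float32)
q, s = quantize_ternary(w)
print("unique codes:", np.unique(q))            # subset of {-1, 0, 1}
print("reconstruction MSE:", float(np.mean((dequantize(q, s) - w) ** 2)))
```

The storage cost is then 1.58 bits per weight plus 16 bits per 128-weight group for the scale, i.e. an overhead of only 0.125 bits per weight.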

🦈 Shark’s Perspective (Curator’s Insights)

The “Ternary Bonsai” is truly a game changer, redefining the physical limits of local AI! What’s fascinating is the clever choice of “1.58 bits”: a seemingly odd number that captures nuance a mere 1 bit (binary) never could. Spending that extra 0.58 bits to gain a zero state is a genius move! And by leaving no high-precision escape hatches, sticking to low bits across every layer, PrismML shows real dedication! With this kind of performance, who needs the cloud? The era of the iPhone 17 Pro Max delivering server-grade intelligence at lightning speed is here!

🚀 What’s Next?

The standard for on-device AI is rapidly shifting from “16-bit” to “1.58-bit.” This will enable advanced inference even on low-memory, cost-effective devices, accelerating the proliferation of AI agents. Developers will soon adopt a new norm: differentiating between 1-bit (ultra-lightweight) and 1.58-bit (high-performance, lightweight) models!

💬 Shark’s Quick Take

Having an 8B model running smoothly on an iPhone feels like my stomach just grew tenfold! This efficiency is pure shark-level energy-saving power with high performance! 🦈🔥

📚 Terminology

  • Ternary Weights: A technique for representing the brain cells (weights) of AI using only three states {-1, 0, +1}, drastically reducing computational cost.

  • 1.58-bit Representation: The number of bits needed to express three states (log2(3) ≈ 1.58), offering greater expressiveness than 1 bit (binary).

  • Pareto Frontier: The line of best achievable trade-offs between performance and size; along it, you cannot improve one without worsening the other. This breakthrough pushes the frontier toward smaller, more powerful models!

  • Source: Ternary Bonsai: Top Intelligence at 1.58 Bits
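The 1.58-bit figure in the terminology above becomes concrete when packing weights for storage: since 3^5 = 243 ≤ 256, five ternary values fit into one byte, giving 8/5 = 1.6 bits per weight in practice. The sketch below is a generic base-3 packing scheme; the article does not specify Ternary Bonsai's actual storage format:

```python
def pack_trits(trits):
    """Pack ternary values {-1, 0, +1} into bytes, 5 per byte (3^5 = 243 <= 256)."""
    assert len(trits) % 5 == 0
    out = bytearray()
    for i in range(0, len(trits), 5):
        b = 0
        for t in trits[i:i + 5]:
            b = b * 3 + (t + 1)   # map {-1,0,+1} -> {0,1,2}, then base-3 encode
        out.append(b)
    return bytes(out)

def unpack_trits(data):
    """Invert pack_trits: decode each byte back into 5 ternary values."""
    trits = []
    for b in data:
        group = []
        for _ in range(5):
            group.append(b % 3 - 1)
            b //= 3
        trits.extend(reversed(group))   # digits come out least-significant first
    return trits

w = [1, -1, 0, 0, 1, -1, -1, 1, 0, 1]
packed = pack_trits(w)
assert unpack_trits(packed) == w
print(f"{len(w)} weights -> {len(packed)} bytes ({8 * len(packed) / len(w):.1f} bits/weight)")
```

1.6 bits per weight is slightly above the theoretical 1.585, which is why real-world footprints land a hair over the log2(3) ideal.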

🦈 HaruSame’s Hand-Picked AI Recommendations!
【Disclaimer】
This article was structured by AI and is verified and managed by the operator. Accuracy is not guaranteed, and we assume no responsibility for external content.
🦈