[AI Minor News]

※ This article contains affiliate advertising.

Run Google’s Gemma 4 Locally on Your Mac at Lightning Speed! The New LM Studio CLI is a Game Changer

📰 News Overview

  • Google’s latest AI, “Gemma 4 26B-A4B,” showcases performance rivaling 400B models thanks to the Mixture-of-Experts (MoE) architecture, all while using minimal resources.
  • The popular app LM Studio has been updated to version 0.4.0, introducing a headless CLI (lms) that operates without a GUI, allowing for direct control from servers or terminals.
  • Reports indicate that the 26B model runs locally at an impressive 51 tokens per second on a MacBook Pro with the M4 Pro chip (the sketch below shows one way to measure this yourself).
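LM Studio’s local server speaks an OpenAI-compatible HTTP API (by default at http://localhost:1234/v1), so you can check a tokens-per-second figure like that yourself. A minimal Python sketch, assuming the server is running and using gemma-4-26b-a4b as a hypothetical model identifier (list the real name on your machine, e.g. with lms ls):

```python
import time
import requests  # pip install requests

URL = "http://localhost:1234/v1/chat/completions"  # LM Studio's default local endpoint
MODEL = "gemma-4-26b-a4b"  # assumption: replace with the identifier your install reports

payload = {
    "model": MODEL,
    "messages": [{"role": "user", "content": "Explain Mixture-of-Experts in three sentences."}],
    "max_tokens": 256,
}

start = time.perf_counter()
resp = requests.post(URL, json=payload, timeout=120)
resp.raise_for_status()
elapsed = time.perf_counter() - start

data = resp.json()
# Assumption: the server reports token usage in the OpenAI-compatible format.
completion_tokens = data["usage"]["completion_tokens"]
print(data["choices"][0]["message"]["content"])
print(f"{completion_tokens} tokens in {elapsed:.1f}s -> {completion_tokens / elapsed:.1f} tok/s")
```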

💡 Key Points

  • The Power of MoE: The model holds 26B parameters but activates only about 4B (8 experts) per token, drastically cutting inference cost while still scoring a high 82.6% on MMLU Pro. A toy routing sketch follows this list.
  • New Engine “llmster”: LM Studio’s core has been rebuilt as an independent daemon (background service), adding parallel request handling and Model Context Protocol (MCP) support; the second sketch below exercises the parallel handling.
  • Privacy and Cost: Because no external API is involved, there are no API fees, no network round-trips, and no risk of your data leaving the machine, enabling a fully offline environment.
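To make the 26B-A4B idea concrete, here is a toy top-k routing sketch (an illustration of the general MoE technique, not Gemma 4’s actual implementation): a gate scores every expert for each token, only the top 8 run, and their outputs are blended by the gate weights.

```python
import numpy as np

def moe_forward(x, experts, gate_w, top_k=8):
    """Toy top-k MoE layer: route token vector x to top_k of len(experts) experts."""
    scores = x @ gate_w                    # one gate score per expert
    top = np.argsort(scores)[-top_k:]      # indices of the top_k highest-scoring experts
    weights = np.exp(scores[top] - scores[top].max())
    weights /= weights.sum()               # softmax over the selected experts only
    # Only the selected experts execute; the rest stay idle (the "A4B" effect).
    return sum(w * experts[i](x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n_experts = 64, 32
# Each "expert" is a random linear map standing in for a feed-forward block.
mats = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(n_experts)]
experts = [lambda x, M=M: x @ M for M in mats]
gate_w = rng.standard_normal((d, n_experts))

y = moe_forward(rng.standard_normal(d), experts, gate_w)
print(y.shape)  # (64,) -- full-size output, but only 8 of 32 experts did any work
```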
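And a quick way to poke at the daemon’s parallel request handling from Python, again assuming the default local endpoint and the same hypothetical model name:

```python
import time
import requests
from concurrent.futures import ThreadPoolExecutor

URL = "http://localhost:1234/v1/chat/completions"
MODEL = "gemma-4-26b-a4b"  # assumption: use the identifier your install reports

def ask(question: str) -> str:
    # One blocking chat-completion request per thread.
    resp = requests.post(URL, json={
        "model": MODEL,
        "messages": [{"role": "user", "content": question}],
        "max_tokens": 64,
    }, timeout=120)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

questions = ["What is MoE?", "What does 'headless' mean?", "Define 'token'."]

start = time.perf_counter()
# If the server really handles requests in parallel, total time should be
# well under the sum of the three individual request times.
with ThreadPoolExecutor(max_workers=len(questions)) as pool:
    answers = list(pool.map(ask, questions))
print(f"{len(answers)} answers in {time.perf_counter() - start:.1f}s")
```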

🦈 Shark’s Eye (Curator’s Perspective)

The crux of this news lies in the synergy between Google’s efficient model “Gemma 4” and LM Studio’s evolution as a “developer tool”! The balance of the 26B-A4B model is particularly impressive. Thanks to MoE, it achieves a groundbreaking combination of “the lightweight nature of a 4B model” and “the intelligence of a model exceeding 10B.” With the unified memory of the M4 Mac, you can summon this beast with just a command, without launching a desktop app. It’s incredibly cool how it crushes the typical local LLM challenges of being “heavy and slow” from both architecture and tool perspectives! 🦈🔥

🚀 What’s Next?

With the rise of headless CLIs that don’t require a GUI, the integration of AI into not just personal PCs but also corporate servers and CI/CD pipelines will accelerate. Moreover, with the proven efficiency of MoE models, we can expect a future where high-performance AI with vast knowledge bases runs smoothly on our devices without being “heavy.”

💬 A Word from Haru-Same

Finally, our shark’s Mac has gained some “thinking muscle”! We’ve entered an era where you can spar (interact) with AI via the command line without worrying about API fees! Shark-tastic times ahead! 🦈✨

📚 Terminology

  • MoE (Mixture of Experts): An architecture that combines many small “expert” sub-networks and activates only a few of them for each token, letting a very large model run quickly while staying smart.

  • Headless: A system that operates without a screen (GUI), controlled via command line or network. It’s lightweight and suited for automation.

  • Token: The smallest unit of text an AI processes, typically a word or word fragment. At roughly 0.75 English words per token, 51 tokens per second works out to over 2,000 words per minute, far beyond typical human reading speed of around 250 words per minute.

  • Source: Running Gemma 4 locally with LM Studio’s new headless CLI and Claude Code

【Disclaimer】
This article was structured by AI and is verified and managed by the operator. Accuracy is not guaranteed, and we assume no responsibility for external content.
🦈