[AI Minor News Flash] Claude Fees Slashed by Up to 90%! The Revolutionary Auto Prompt Caching Tool ‘Prompt-caching’ is Here!
📰 News Overview
- Cut Token Costs by Up to 90%: An open-source tool called ‘Prompt-caching’ has been released, which automatically caches system prompts, tool definitions, and file contents when using Anthropic’s Claude, drastically reducing costs on reuse.
- Versatile Caching Modes: Equipped with four usage-specific modes, including Bug Fix Mode (caching stack traces), Refactoring Mode (caching style guides), and File Tracking (which auto-caches a file the second time it is read).
- Broad Compatibility: This tool is not just a plugin for Claude Code, but also works with MCP clients like Cursor and Windsurf, and can be used in apps developed with the Anthropic SDK.
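What the tool automates is the cache breakpoint that the Anthropic Messages API exposes as a cache_control field on a content block. As a minimal sketch of the manual approach it replaces, here is what a cached request body looks like (payload only, nothing is sent; the model name and system prompt are illustrative placeholders):

```python
# Sketch of an Anthropic Messages API request body using prompt caching:
# a cache_control breakpoint marks the prefix up to that block as
# cacheable. Payload construction only -- no API call is made here.

# Stand-in for a large, stable system prompt (style guide, tool defs, etc.)
LONG_SYSTEM_PROMPT = "You are a careful coding assistant. " * 200

request_body = {
    "model": "claude-3-5-sonnet-20241022",  # example model id
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,
            # Marks this block (and everything before it) as a cacheable
            # prefix; identical prefixes on later requests are read from
            # cache at a fraction of the normal input price.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [
        {"role": "user", "content": "Explain this stack trace."}
    ],
}
```

Hand-placing breakpoints like this in every request is exactly the chore the tool's automatic cache point insertion removes.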
💡 Key Points
- Automatic Cache Point Insertion: The tool automatically detects and marks cache points based on context (stable prompts and file reads), eliminating the need to set cache_control manually.
- Impressive Savings Rates: Statistics show potential token savings of 92% for general coding tasks, 85% for bug fixing, and 80% for refactoring.
- Visualization Tool Provided: Includes get_cache_stats, a tool that tracks cache hit rates and cumulative savings, letting you see the benefits in numbers.
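The article names get_cache_stats but doesn't show it, so as an illustrative stand-in (not the tool's actual code), here is how such stats can be derived from the per-request usage fields the Anthropic API reports, such as cache_creation_input_tokens and cache_read_input_tokens:

```python
# Illustrative stand-in for the numbers a tool like get_cache_stats
# reports. Field names follow the Anthropic API's usage object:
# cache_creation_input_tokens (tokens written to cache),
# cache_read_input_tokens (tokens served from cache), and
# input_tokens (uncached input).

def cache_stats(usages):
    """Aggregate a list of per-request usage dicts into hit rate and savings."""
    created = sum(u.get("cache_creation_input_tokens", 0) for u in usages)
    read = sum(u.get("cache_read_input_tokens", 0) for u in usages)
    uncached = sum(u.get("input_tokens", 0) for u in usages)
    total = created + read + uncached
    # Effective cost in normalized units: cache writes bill at 1.25x,
    # cache reads at 0.1x, uncached input at 1.0x (rates from the article).
    effective = 1.25 * created + 0.1 * read + 1.0 * uncached
    return {
        "cache_hit_rate": read / total if total else 0.0,
        "cost_savings": 1 - effective / total if total else 0.0,
    }

# Example: first request writes a 1000-token prefix, second reads it back.
stats = cache_stats([
    {"cache_creation_input_tokens": 1000, "input_tokens": 50},
    {"cache_read_input_tokens": 1000, "input_tokens": 50},
])
```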
🦈 Shark’s Eye (Curator’s Perspective)
The true brilliance of this tool lies in the benefits it offers to SDK users! While Claude Code has a built-in caching feature, developers working with custom apps or agents using the SDK had to manually define cache boundaries, which was a hassle. This tool automates that process and even includes an “analyze_cacheability” feature to analyze cache effectiveness—now that’s clever! Writing to the cache typically costs 1.25 times the normal input-token price, but every subsequent read drops to just 0.1 times. This “invest now for massive future savings” approach is bound to be a game-changer for developing AI agents that engage in long conversations! 🦈🔥
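The 1.25x-write / 0.1x-read economics above can be checked with a few lines of arithmetic (prices normalized so an uncached input token costs 1.0; request count is the only variable):

```python
# Break-even math for prompt caching: the first request writes the
# cached prefix at 1.25x the normal price, each of the remaining
# n - 1 requests reads it at 0.1x. Without caching, all n pay 1.0x.

def savings(n_requests: int) -> float:
    without_cache = 1.0 * n_requests
    with_cache = 1.25 + 0.1 * (n_requests - 1)
    return 1 - with_cache / without_cache

# Even the second request already beats the no-cache baseline, and
# long sessions approach the headline savings figures.
print(f"2 requests:  {savings(2):.1%} saved")
print(f"50 requests: {savings(50):.1%} saved")
```

At 50 reuses the total cost is 1.25 + 4.9 = 6.15 versus 50 without caching, i.e. roughly 88% saved, which lines up with the ~80–92% figures quoted above.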
🚀 What’s Next?
If development costs drop to a tenth, running agents based on “super long context” that were previously budget-busting will become feasible. The “fuel efficiency” of AI development will dramatically improve!
💬 A Word from Haru-Shark
Developers feeling parched from rising token costs! With these cost-saving tips from your friendly shark, you can fully unleash the power of Claude! Sharky shark! 🦈✨
📚 Terminology Explained
- Prompt Caching: A technique that temporarily stores the contents of sent prompts on the AI server, reducing computational load and costs when the same content is submitted again.
- MCP (Model Context Protocol): A common standard for AI models to communicate with external tools and data sources. With MCP compatibility, similar features can be used across various AI editors.
- Token: The smallest unit of text processed by AI. Fees for using APIs like Claude are charged based on the number of tokens used.

Source: Prompt-caching