Google is rolling out a feature in its Gemini API that the company claims will make its latest AI models cheaper for third-party developers.
Google calls the feature “implicit caching” and says it can deliver 75% savings on “repetitive context” passed to models via the Gemini API. It supports Google’s Gemini 2.5 Pro and 2.5 Flash models.
That’s likely to be welcome news to developers as the cost of using frontier models continues to grow.
Caching, a widely adopted practice in the AI industry, reuses frequently accessed or pre-computed data from models to cut down on computing requirements and cost. For example, caches can store answers to questions users often ask of a model, eliminating the need for the model to re-create answers to the same request.
Google previously offered prompt caching, but only in explicit form, meaning devs had to define their highest-frequency prompts themselves. While the cost savings were supposed to be guaranteed, explicit prompt caching typically involved a lot of manual work.
Some developers weren’t pleased with how Google’s explicit caching implementation worked for Gemini 2.5 Pro, which they said could cause surprisingly large API bills. Complaints reached a fever pitch in the past week, prompting the Gemini team to apologize and pledge to make changes.
In contrast to explicit caching, implicit caching is automatic. Enabled by default for Gemini 2.5 models, it passes on cost savings if a Gemini API request to a model hits a cache.
TechCrunch event: Berkeley, CA | June 5
“[W]hen you send a request to one of the Gemini 2.5 models, if the request shares a common prefix as one of previous requests, then it’s eligible for a cache hit,” explained Google in a blog post. “We will dynamically pass cost savings back to you.”
The minimum prompt token count for implicit caching is 1,024 for 2.5 Flash and 2,048 for 2.5 Pro, according to Google’s developer documentation, which is not a terribly big amount, meaning it shouldn’t take much to trigger these automatic savings. Tokens are the raw bits of data models work with, with a thousand tokens equivalent to about 750 words.
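Those documented minimums can be sketched as a rough eligibility check. This is a hedged illustration, not part of any Google SDK: the 4-characters-per-token ratio is only a common rule of thumb (real counts come from the API's own token counter), and the function names are hypothetical.

```python
# Documented minimum prompt token counts for implicit caching.
MIN_TOKENS = {"gemini-2.5-flash": 1024, "gemini-2.5-pro": 2048}

def estimate_tokens(text: str) -> int:
    """Crude estimate: roughly 4 characters per token for English text."""
    return max(1, len(text) // 4)

def may_hit_implicit_cache(prompt: str, model: str) -> bool:
    """True if the prompt is long enough to be eligible for a cache hit."""
    return estimate_tokens(prompt) >= MIN_TOKENS[model]

short = "Summarize this paragraph."
long_context = "word " * 2000  # ~2,500 estimated tokens

print(may_hit_implicit_cache(short, "gemini-2.5-flash"))       # False
print(may_hit_implicit_cache(long_context, "gemini-2.5-pro"))  # True
```

In practice a long system prompt or a pasted document clears the threshold easily, which is why Google frames the savings as automatic.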
Given that Google’s previous cost-savings claims for caching fell short, there are some buyer-beware areas in this new feature. For one, Google recommends that developers keep repetitive context at the beginning of requests to increase the chances of implicit cache hits. Context that might change from request to request should be appended at the end, the company says.
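That ordering advice can be shown as a minimal sketch. The helper below is hypothetical (it is not from Google's SDK); it only illustrates the structure Google recommends: stable, repeated material first, per-request material last, so consecutive requests share a common prefix.

```python
# Hedged sketch of Google's ordering recommendation: keep the stable
# context at the start of every request so requests share a prefix.
def build_prompt(stable_context: str, user_query: str) -> str:
    # Repetitive material first; request-specific material appended last.
    return f"{stable_context}\n\n{user_query}"

# Illustrative stand-in for a large, unchanging system document.
SYSTEM_DOC = "You are a support bot. Product manual: (large, unchanging text)"

p1 = build_prompt(SYSTEM_DOC, "How do I reset my password?")
p2 = build_prompt(SYSTEM_DOC, "Where is the billing page?")

# Both prompts begin with the same bytes, so the second request is more
# likely to be eligible for an implicit cache hit on its shared prefix.
print(p1[:len(SYSTEM_DOC)] == p2[:len(SYSTEM_DOC)])  # True
```

Reversing the order (query first, document last) would make every request's prefix unique and defeat the prefix-matching cache.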
For another, Google didn’t offer any third-party verification that the new implicit caching system would deliver the promised automatic savings. So we’ll have to see what early adopters say.
Keep reading the article on Tech Crunch
Google announced on Thursday that it’s rolling out new AI-powered defenses to help combat scams on Chrome. The tech giant is going to start using Gemini Nano, its on-device large language model (LLM), on desktop to protect users against online scams. It’s also launching new AI-powered warnings for Chrome on Android to help users be aware of spammy notifications.
Google notes that the Enhanced Protection mode of Chrome’s Safe Browsing offers the highest level of protection, giving users twice the protection against phishing and other online threats compared to the browser’s Standard Protection mode. Now Google will use Gemini Nano to provide Enhanced Protection users with an additional layer of defense against online scams.
Google says this on-device approach will provide immediate insight into risky websites to protect users against scams, including those that haven’t been seen before.
“Gemini Nano’s LLM is perfect for this use because of its ability to distill the varied, complex nature of websites, helping us adapt to new scam tactics more quickly,” Google said in a blog post.
The company is already using this AI-powered defense to protect users from remote tech support scams. Google plans to expand this defense to Android devices and even more types of scams in the future.
As for the new AI-powered warnings, Google notes that the risk from scammy sites can extend beyond the site itself through notifications if you have them enabled. Malicious websites can use notifications to try to scam you, which is why Chrome will now help you be aware of malicious, spammy, or misleading notifications on Android.
Now when Chrome’s on-device machine learning model flags a notification as possibly being a scam, you will receive a warning. You can choose to either unsubscribe or view the content that was blocked. If you think the warning was shown incorrectly, you can allow all future notifications from that site.
As part of today’s announcement, Google shared that it has been using AI to stop scams in Search by detecting and blocking hundreds of millions of scammy results every day. Its AI-powered scam detection systems have helped to catch 20 times the number of scammy pages, Google says.
For example, Google has seen an increase in bad actors impersonating airline customer service agents and scamming people looking for help. The company says it has reduced these scams by more than 80%, decreasing the risk of users coming across a scammy phone number on Search.