Store LLM responses and serve them instantly for identical requests. Cut costs and latency dramatically.
Prompt caching intercepts LLM API requests and checks whether the same prompt has been seen before. If it has, the cached response is returned immediately without calling the provider. No API cost, no round trip to the provider, no waiting for the model to generate.
For workloads with repeated or similar prompts, caching can reduce costs by 50-90% and cut response times on cache hits from seconds to single-digit milliseconds. The actual savings depend on how often requests hit the cache.
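To see how hit rate drives those numbers, here is a rough back-of-the-envelope sketch. The function name and its parameters are made up for illustration, and it assumes a cache hit costs nothing and that lookup overhead on a miss is negligible.

```python
def estimate_with_cache(hit_rate, cost_per_call, llm_latency_ms, cache_latency_ms):
    """Return (expected cost per request, expected latency in ms).

    Assumption: a hit costs nothing and takes cache_latency_ms;
    a miss costs cost_per_call and takes llm_latency_ms.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    miss_rate = 1.0 - hit_rate
    # Only misses reach the provider, so only misses are billed.
    expected_cost = miss_rate * cost_per_call
    # Average latency is a hit/miss weighted mix of the two paths.
    expected_latency = hit_rate * cache_latency_ms + miss_rate * llm_latency_ms
    return expected_cost, expected_latency
```

Under these assumptions the fraction of cost saved equals the hit rate, so a 50-90% saving corresponds to roughly 50-90% of requests being served from the cache.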
An LLM proxy sits between your application and the provider. When a request arrives, the proxy computes a cache key from the prompt, model, temperature, and other parameters. If a matching response exists in the cache and has not expired, it is returned directly.
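The lookup flow described above can be sketched in a few lines. This is a minimal in-memory illustration, not how any particular proxy implements it: `TTLCache`, `handle_request`, and `call_provider` are hypothetical names, and the cache key is assumed to have been computed already.

```python
import time


class TTLCache:
    """In-memory response cache where each entry expires after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}  # key -> (expires_at, response)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, response = entry
        if self.clock() >= expires_at:
            # Expired entries are treated as misses and dropped.
            del self._entries[key]
            return None
        return response

    def put(self, key, response):
        self._entries[key] = (self.clock() + self.ttl_seconds, response)


def handle_request(cache, key, call_provider):
    """Return (response, was_cache_hit); call the provider only on a miss."""
    cached = cache.get(key)
    if cached is not None:
        return cached, True
    response = call_provider()
    cache.put(key, response)
    return response, False
```

The second return value makes it easy to count hits and misses, which is what a cost dashboard would need to report savings.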
The cache key must account for all parameters that affect the response. Two requests with the same prompt but different temperatures should not share a cache entry. Most caching implementations hash the full request body or a normalized version of it.
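One way to build such a key is to serialize the request body in a canonical form and hash it. The sketch below assumes the body is a JSON-like dict; the field names in the usage note (`model`, `messages`, `temperature`) follow the common chat-API shape and are illustrative, and `cache_key` is a hypothetical helper.

```python
import hashlib
import json


def cache_key(request_body):
    """Hash a normalized JSON request body into a hex cache key."""
    # sort_keys and fixed separators make the serialization independent
    # of dict key order and whitespace, so equivalent bodies hash the same.
    normalized = json.dumps(
        request_body,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

With this approach, `{"model": "m", "temperature": 0, "messages": [...]}` and the same body with its keys in a different order produce the same key, while changing `temperature` from 0 to 0.7 produces a different one.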
Caching works best when your application sends the same or very similar prompts repeatedly. Common scenarios include classification tasks (same categories, different inputs with identical system prompts), retrieval-augmented generation with overlapping context, customer support bots answering frequently asked questions, and development/testing where you call the same prompt hundreds of times.
Caching helps less for open-ended creative generation, unique conversational contexts, or when you want model variability (high temperature settings).
Stockyard includes a built-in prompt cache as one of its 76 middleware modules. It is toggleable at runtime, uses embedded SQLite for storage, and supports configurable TTL. Cache hits are tracked in the cost dashboard so you can see exactly how much you are saving.
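To make the idea of a SQLite-backed cache with a TTL concrete, here is a small sketch using Python's standard `sqlite3` module. It is not Stockyard's actual schema or code: the `prompt_cache` table, its columns, and the helper functions are all assumptions made for illustration.

```python
import sqlite3
import time


def open_cache(path=":memory:"):
    """Open (or create) a SQLite database holding a hypothetical cache table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS prompt_cache ("
        " key TEXT PRIMARY KEY,"
        " response TEXT NOT NULL,"
        " expires_at REAL NOT NULL)"
    )
    return conn


def put(conn, key, response, ttl_seconds, now=None):
    """Store a response that expires ttl_seconds after `now`."""
    now = time.time() if now is None else now
    conn.execute(
        "INSERT OR REPLACE INTO prompt_cache (key, response, expires_at)"
        " VALUES (?, ?, ?)",
        (key, response, now + ttl_seconds),
    )
    conn.commit()


def get(conn, key, now=None):
    """Return the cached response, or None if it is missing or expired."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT response FROM prompt_cache WHERE key = ? AND expires_at > ?",
        (key, now),
    ).fetchone()
    return row[0] if row else None
```

Storing the expiry timestamp in each row keeps the TTL check inside the query, so expired rows are never returned even before they are cleaned up.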
Enable it with one API call:
curl -X PUT http://localhost:4200/api/proxy/modules/cache \
  -d '{"enabled": true}'
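If you would rather toggle the module from a script, the same request can be built with Python's standard library. This is a sketch: the URL and payload are taken from the curl command, while `build_cache_toggle_request` is a hypothetical helper name. Like `curl -d`, it does not set a JSON `Content-Type` header, and whether the server requires one is not something this page specifies.

```python
import json
import urllib.request


def build_cache_toggle_request(enabled, base_url="http://localhost:4200"):
    """Build a PUT request that turns the cache module on or off."""
    body = json.dumps({"enabled": enabled}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/proxy/modules/cache",
        data=body,
        method="PUT",
    )


# To send it against a running instance:
#   with urllib.request.urlopen(build_cache_toggle_request(True)) as resp:
#       print(resp.status)
```

Passing `False` builds the matching request to turn the cache back off, assuming the endpoint accepts `{"enabled": false}` in the same way.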
No separate Redis or Memcached instance needed. The cache lives in the same SQLite database as everything else. Install Stockyard and try it.