Does prompt caching affect response quality?

No. Cached responses are identical to what the provider would return for the same request. Caching only applies to exact matches on all request parameters.

How much can prompt caching save?

Savings depend on your workload. Applications with repeated prompts (classification, templated queries, testing) can see 50-90% cost reduction. Unique conversational prompts see minimal cache hits.

Do I need Redis for prompt caching?

Not with Stockyard. The cache is built into the proxy using embedded SQLite. No external cache service required. Other tools like LiteLLM typically require Redis for caching.

What Is Prompt Caching? Save Money and Reduce LLM Latency

The short version

Prompt caching intercepts LLM API requests and checks if the same prompt has been seen before. If it has, the cached response is returned instantly without calling the provider. No API cost, no network latency, no waiting for the model to generate.

For workloads with repeated or similar prompts, caching can reduce costs by 50-90% and cut response times from seconds to single-digit milliseconds.

How it works

An LLM proxy sits between your application and the provider. When a request arrives, the proxy computes a cache key from the prompt, model, temperature, and other parameters. If a matching response exists in the cache and has not expired, it is returned directly.

The cache key must account for all parameters that affect the response. Two requests with the same prompt but different temperatures should not share a cache entry. Most caching implementations hash the full request body or a normalized version of it.

When caching helps

Caching works best when your application sends the same or very similar prompts repeatedly. Common scenarios include classification tasks (same categories, different inputs with identical system prompts), retrieval-augmented generation with overlapping context, customer support bots answering frequently asked questions, and development/testing where you call the same prompt hundreds of times.

Caching helps less for open-ended creative generation, unique conversational contexts, or when you want model variability (high temperature settings).

Caching in Stockyard

Stockyard includes a built-in prompt cache as one of its 76 middleware modules. It is toggleable at runtime, uses embedded SQLite for storage, and supports configurable TTL. Cache hits are tracked in the cost dashboard so you can see exactly how much you are saving.

Enable it with one API call:

curl -X PUT http://localhost:4200/api/proxy/modules/cache \
  -d '{"enabled": true}'

No separate Redis or Memcached instance needed. The cache lives in the same SQLite database as everything else. Install Stockyard and try it.

What is prompt caching?

The short version

How it works

When caching helps

Caching in Stockyard