Run Ollama or LM Studio locally. When your local model is down, overloaded, or too slow, Stockyard fails over to a cloud provider automatically. Your app never knows the difference.
Add your local Ollama endpoint and one or more cloud providers. Stockyard knows which provider to try first based on the model name.
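A provider list with a failover order might look like the following sketch; the field names (`name`, `base_url`, `priority`) are illustrative assumptions, not Stockyard's actual configuration schema.

```python
# Hypothetical provider list; field names are illustrative, not Stockyard's schema.
PROVIDERS = [
    {"name": "ollama", "base_url": "http://localhost:11434", "priority": 0},
    {"name": "anthropic", "base_url": "https://api.anthropic.com", "priority": 1},
    {"name": "openai", "base_url": "https://api.openai.com", "priority": 2},
]

def ordered_providers(providers):
    """Return providers in failover order: lowest priority number first."""
    return sorted(providers, key=lambda p: p["priority"])

first = ordered_providers(PROVIDERS)[0]["name"]  # the local endpoint is tried first
```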
When your app sends a request, Stockyard routes it to Ollama first. If Ollama responds, your cost is zero and latency stays low.
If Ollama is down or overloaded, or the request times out, Stockyard retries it against the next provider. Circuit breakers prevent hammering a dead endpoint.
Lookout records every request, including failovers. You can see which requests went local, which fell back to cloud, what each cost, and why.
Combine aliasing with failover: your app requests "fast", Stockyard tries Ollama first and falls back to cloud if needed.
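The try-local-then-fall-back behaviour described in the steps above can be sketched as a simple loop. Everything here is hypothetical: the `ProviderError` type and the `complete()` callables stand in for whatever transport Stockyard actually uses.

```python
class ProviderError(Exception):
    """Raised when a provider is down, overloaded, or times out (illustrative)."""

def complete_with_failover(providers, prompt):
    """Try each (name, call) pair in order; return the first successful response."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            errors.append((name, exc))  # record the failure and fall through
    raise ProviderError(f"all providers failed: {errors}")

# Example: the local endpoint is "down", so the request falls back to cloud.
def ollama(prompt):
    raise ProviderError("connection refused")

def cloud(prompt):
    return f"cloud answer to: {prompt}"

used, reply = complete_with_failover([("ollama", ollama), ("cloud", cloud)], "hi")
```

Your application only sees `reply`; which backend produced it is a routing detail.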
Stockyard's failover is model-aware. It sends Claude requests to Anthropic first, GPT requests to OpenAI first, and Ollama model requests to your local endpoint first. If the primary fails, it walks down the provider list by priority. This isn't round-robin; it's routing that understands which provider owns which model.
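One way to picture model-aware ordering is a prefix lookup that picks the owning provider first and keeps the rest as fallbacks. The prefix rules and provider names below are assumptions for illustration, not Stockyard's actual routing table.

```python
def provider_order(model: str) -> list[str]:
    """Return providers in try-order for a model name (illustrative rules)."""
    fallback = ["ollama", "anthropic", "openai"]
    if model.startswith("claude"):
        primary = "anthropic"      # Claude models are owned by Anthropic
    elif model.startswith("gpt"):
        primary = "openai"         # GPT models are owned by OpenAI
    else:
        primary = "ollama"         # local model names default to the local endpoint
    return [primary] + [p for p in fallback if p != primary]
```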
Circuit breakers track per-provider health. When a provider fails repeatedly, Stockyard stops sending traffic until it recovers. You can monitor and reset circuit breaker state through the admin API.
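A minimal sketch of the circuit-breaker idea: after a run of consecutive failures the breaker opens and traffic stops; after a cooldown it allows a trial request again. The thresholds, timings, and method names are illustrative, not Stockyard's internals.

```python
import time

class CircuitBreaker:
    """Minimal per-provider circuit breaker (thresholds are illustrative)."""

    def __init__(self, threshold=3, cooldown=30.0):
        self.threshold = threshold   # consecutive failures before opening
        self.cooldown = cooldown     # seconds to wait before retrying
        self.failures = 0
        self.opened_at = None        # None means the circuit is closed

    def allow(self, now=None):
        """Return True if traffic may be sent to this provider."""
        if self.opened_at is None:
            return True
        now = time.monotonic() if now is None else now
        return now - self.opened_at >= self.cooldown  # half-open after cooldown

    def record_failure(self, now=None):
        self.failures += 1
        if self.failures >= self.threshold and self.opened_at is None:
            self.opened_at = time.monotonic() if now is None else now

    def record_success(self):
        self.failures = 0
        self.opened_at = None  # close the circuit again
```

Resetting breaker state through an admin API would amount to calling something like `record_success()` on the stuck provider.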
Cost: Local inference costs nothing per request. Use cloud only when local can't handle it.
Privacy: Sensitive requests stay on your hardware. Only non-sensitive overflow hits cloud providers.
Reliability: Your app keeps working even if cloud providers have outages, and even if your local GPU is busy with another workload.
Flexibility: Start with Ollama, add cloud later. Or start with cloud, move to local as you scale. The proxy handles the transition.
Install Stockyard, add your local and cloud providers, and your app gets reliable LLM access regardless of which backend is available.
Install Stockyard