Failover Routing

Local first. Cloud when needed.

Run Ollama or LM Studio locally. When your local model is down, overloaded, or too slow, Stockyard fails over to a cloud provider automatically. Your app never knows the difference.

How it works

1

Configure your providers

Add your local Ollama endpoint and one or more cloud providers. Stockyard knows which provider to try first based on the model name.

Free
2

Requests go local first

When your app sends a request, Stockyard routes to Ollama. If Ollama responds, your cost is zero and latency stays low.

Free
3

Automatic failover when local is unavailable

If Ollama is down, overloaded, or times out, Stockyard retries the request against the next provider. Circuit breakers prevent hammering a dead endpoint.

Free
4

See what happened in traces

Lookout records every request — including failovers. You can see which requests went local, which fell back to cloud, what it cost, and why.

Free — Lookout
# stockyard.yaml providers: - name: ollama base_url: http://localhost:11434/v1 priority: 1 - name: openai api_key: sk-... priority: 2 # Use aliasing for clean model names model_alias: enabled: true aliases: fast: ollama/llama3 smart: claude-sonnet-4-20250514

Combine aliasing with failover: your app requests "fast", Stockyard tries Ollama first, falls back to cloud if needed.

Model-aware routing

Stockyard's failover is model-aware. It sends Claude requests to Anthropic first, GPT requests to OpenAI first, and Ollama model requests to your local endpoint first. If the primary fails, it walks down the provider list by priority. This isn't round-robin — it's intelligent routing that understands which provider owns which model.

Circuit breakers track per-provider health. When a provider fails repeatedly, Stockyard stops sending traffic until it recovers. You can monitor and reset circuit breaker state through the admin API.

Cost: Local inference costs nothing per-request. Use cloud only when local can't handle it.

Privacy: Sensitive requests stay on your hardware. Only non-sensitive overflow hits cloud providers.

Reliability: Your app works even if cloud providers have outages. And it works even if your local GPU is being used for something else.

Flexibility: Start with Ollama, add cloud later. Or start with cloud, move to local as you scale. The proxy handles the transition.

One endpoint for local and cloud.

Install Stockyard, add your local and cloud providers, and your app gets reliable LLM access regardless of which backend is available.

Install Stockyard
Proxy-Only Setup → Model Aliasing → Proxy Docs →
Explore: Self-hosted proxy · OpenAI-compatible · Model aliasing
Stockyard also makes 150 focused self-hosted tools — browse the catalog or get everything for $29/mo.