Failover Routing

Local first. Cloud when needed.

Run Ollama or LM Studio locally. When your local model is down, overloaded, or too slow, Stockyard fails over to a cloud provider automatically. Your app never knows the difference.

How it works

Configure your providers

Add your local Ollama endpoint and one or more cloud providers. Stockyard knows which provider to try first based on the model name.

Free

Requests go local first

When your app sends a request, Stockyard routes it to Ollama first. If Ollama responds, there is no per-request cost and latency stays low.

Free

Automatic failover when local is unavailable

If Ollama is down, overloaded, or times out, Stockyard retries the request against the next provider in priority order. Circuit breakers prevent hammering a dead endpoint.

Free
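The failover step above amounts to a loop over providers in priority order. The sketch below is a minimal illustration, not Stockyard's implementation: `call_with_failover`, `send`, `circuit_open`, and `AllProvidersFailed` are hypothetical names, and treating `ConnectionError`/`TimeoutError` as the failover triggers is an assumption (detecting an overloaded provider would need its own check).

```python
from typing import Callable, Sequence


class AllProvidersFailed(Exception):
    """Hypothetical error raised when every provider failed or was skipped."""


def call_with_failover(
    providers: Sequence[str],
    send: Callable[[str], str],
    circuit_open: Callable[[str], bool],
) -> tuple[str, str]:
    """Try providers in order; return (provider, response) from the first success."""
    errors: dict[str, str] = {}
    for name in providers:
        # Skip providers whose circuit breaker is open instead of hammering them.
        if circuit_open(name):
            errors[name] = "circuit open"
            continue
        try:
            return name, send(name)
        except (ConnectionError, TimeoutError) as exc:
            # Assumed failover triggers: record why, then try the next provider.
            errors[name] = f"{type(exc).__name__}: {exc}"
    raise AllProvidersFailed(errors)


def fake_send(provider: str) -> str:
    # Simulates a local Ollama that is down while the cloud provider answers.
    if provider == "ollama":
        raise ConnectionError("connection refused")
    return f"hello from {provider}"


print(call_with_failover(["ollama", "openai"], fake_send, lambda p: False))
# ('openai', 'hello from openai')
```

The caller only sees a response and the provider that served it, which mirrors the claim that your app never knows a failover happened.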

See what happened in traces

Lookout records every request, including failovers. You can see which requests went local, which fell back to cloud, what each one cost, and why.

Free — Lookout
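To make the local-versus-cloud breakdown concrete, here is a small sketch that summarizes trace records. Lookout's real record schema is not described here, so `TraceRecord`, its fields, `LOCAL_PROVIDERS`, and `summarize` are all hypothetical, and the sample costs are made-up illustrative values.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class TraceRecord:
    # Hypothetical fields; Lookout's actual schema may differ.
    request_id: str
    provider: str                           # provider that finally served the request
    cost_usd: float
    failover_reason: Optional[str] = None   # None when the first choice succeeded


LOCAL_PROVIDERS = {"ollama"}  # assumption: which provider names count as local


def summarize(traces: list[TraceRecord]) -> dict:
    """Count local vs. cloud requests, total cloud spend, and failover reasons."""
    local = [t for t in traces if t.provider in LOCAL_PROVIDERS]
    cloud = [t for t in traces if t.provider not in LOCAL_PROVIDERS]
    return {
        "local": len(local),
        "cloud": len(cloud),
        "cloud_cost_usd": round(sum(t.cost_usd for t in cloud), 6),
        "failover_reasons": Counter(t.failover_reason for t in traces if t.failover_reason),
    }


# Illustrative values only, not real pricing.
sample = [
    TraceRecord("r1", "ollama", 0.0),
    TraceRecord("r2", "openai", 0.0021, "ollama timeout"),
    TraceRecord("r3", "openai", 0.0015, "ollama timeout"),
]
print(summarize(sample))
```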

Example setup

```yaml
# stockyard.yaml
providers:
  - name: ollama
    base_url: http://localhost:11434/v1   # Ollama's OpenAI-compatible endpoint
    priority: 1                           # tried first
  - name: openai
    api_key: sk-...
    priority: 2                           # fallback when Ollama fails

# Use aliasing for clean model names
model_alias:
  enabled: true
  aliases:
    fast: ollama/llama3
    smart: claude-sonnet-4-20250514
```

Combine aliasing with failover: your app requests "fast", and Stockyard tries Ollama first, falling back to a cloud provider if needed.
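Alias resolution happens before routing: the name your app sends is mapped to a real model name, and routing then works on that result. This sketch mirrors the `model_alias` block from `stockyard.yaml` as a Python dict. `resolve_model` is a hypothetical helper, and passing unknown names through unchanged is an assumption about how non-aliased requests behave.

```python
from typing import Any

# Mirrors the model_alias section of stockyard.yaml.
MODEL_ALIAS: dict[str, Any] = {
    "enabled": True,
    "aliases": {
        "fast": "ollama/llama3",
        "smart": "claude-sonnet-4-20250514",
    },
}


def resolve_model(requested: str, cfg: dict[str, Any]) -> str:
    """Map an alias to its target model; assumed: unknown names pass through."""
    if not cfg.get("enabled", False):
        return requested  # aliasing switched off: use the name as sent
    return cfg.get("aliases", {}).get(requested, requested)


print(resolve_model("fast", MODEL_ALIAS))   # ollama/llama3
print(resolve_model("gpt-4o", MODEL_ALIAS)) # gpt-4o
```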

Model-aware routing

Stockyard's failover is model-aware. It sends Claude requests to Anthropic first, GPT requests to OpenAI first, and Ollama model requests to your local endpoint first. If the primary fails, it walks down the provider list by priority. This isn't round-robin: the first choice is always the provider that owns the requested model.
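The ordering described above (owning provider first, then the rest by priority) can be sketched as follows. The prefix-matching rules in `primary_for`, the `Provider` type, `route_order`, and the `anthropic` entry with priority 3 are hypothetical illustrations; the sketch only computes the order in which providers would be tried.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Provider:
    name: str
    priority: int  # lower number is tried earlier, as in stockyard.yaml


def primary_for(model: str) -> Optional[str]:
    # Hypothetical name-matching rules for "which provider owns which model".
    m = model.lower()
    if m.startswith("ollama/"):
        return "ollama"
    if m.startswith("claude"):
        return "anthropic"
    if m.startswith("gpt"):
        return "openai"
    return None


def route_order(model: str, providers: list[Provider]) -> list[str]:
    """Owning provider first, then the remaining providers by priority."""
    names = [p.name for p in sorted(providers, key=lambda p: p.priority)]
    primary = primary_for(model)
    if primary in names:
        names.remove(primary)
        names.insert(0, primary)
    return names


providers = [Provider("ollama", 1), Provider("openai", 2), Provider("anthropic", 3)]
print(route_order("claude-sonnet-4-20250514", providers))  # ['anthropic', 'ollama', 'openai']
print(route_order("ollama/llama3", providers))             # ['ollama', 'openai', 'anthropic']
```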

Circuit breakers track per-provider health. When a provider fails repeatedly, Stockyard stops sending traffic until it recovers. You can monitor and reset circuit breaker state through the admin API.
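A per-provider circuit breaker can be sketched as a small state machine: closed while requests succeed, open after repeated failures, and allowing a trial request after a cooldown. The threshold, cooldown, and the `CircuitBreaker` class itself are assumptions for illustration; Stockyard's actual thresholds and recovery rules aren't described here. The `reset` method only stands in for the kind of reset the admin API offers.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Hypothetical breaker: opens after `threshold` consecutive failures,
    then allows a trial request once `cooldown` seconds have passed."""

    def __init__(self, threshold: int = 3, cooldown: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True  # closed: traffic flows normally
        # Open: block traffic until the cooldown has elapsed, then allow a trial.
        return self.clock() - self.opened_at >= self.cooldown

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None  # provider recovered: close the circuit

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = self.clock()  # (re)open and restart the cooldown

    def reset(self) -> None:
        # Manual reset, analogous to resetting breaker state by hand.
        self.record_success()


# One breaker per provider, e.g. {"ollama": CircuitBreaker(), "openai": CircuitBreaker()}
breaker = CircuitBreaker(threshold=2, cooldown=10.0)
breaker.record_failure()
breaker.record_failure()
print(breaker.allow())  # False
```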

Why local + cloud

Cost: Local inference costs nothing per-request. Use cloud only when local can't handle it.

Privacy: Sensitive requests stay on your hardware. Only non-sensitive overflow hits cloud providers.

Reliability: Your app works even if cloud providers have outages. And it works even if your local GPU is being used for something else.

Flexibility: Start with Ollama, add cloud later. Or start with cloud, move to local as you scale. The proxy handles the transition.

One endpoint for local and cloud.

Install Stockyard, add your local and cloud providers, and your app gets reliable LLM access regardless of which backend is available.
