Provider Setup Guide

Environment variables, example requests, and notes for every supported provider.

Stockyard auto-detects providers from environment variables at startup. Set a key, start the binary, and that provider is available through the OpenAI-compatible proxy. Every request gets the same middleware stack — caching, guardrails, rate limiting, tracing — regardless of which provider handles it. You can also run Stockyard as a standalone proxy without the rest of the platform.

Quick start
Set an environment variable (e.g., OPENAI_API_KEY). Stockyard detects it on startup and registers the provider. Send requests to http://localhost:4200/v1/chat/completions with the provider's model name. Stockyard routes to the right provider automatically.

How auto-detection works

On startup, Stockyard checks for known environment variables and registers each provider it finds. No configuration file is required. The startup log shows exactly what was detected:

# Set your keys
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
export CEREBRAS_API_KEY=csk-...

# Start Stockyard
stockyard
# Output:
  Provider: openai (from OPENAI_API_KEY)
  Provider: anthropic (from ANTHROPIC_API_KEY)
  Provider: cerebras (from CEREBRAS_API_KEY)
  Providers: 3 (openai, anthropic, cerebras)

You can set as many provider keys as you want. Stockyard routes each request based on the model name in the request body — gpt-4o goes to OpenAI, claude-sonnet-4-20250514 goes to Anthropic, and so on.
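Conceptually, the routing step behaves like a prefix match on the model name. This is only a rough sketch to illustrate the idea, not Stockyard's actual implementation:

```shell
# Illustrative only: map a model name to a provider by prefix
route_model() {
  case "$1" in
    gpt-*|o3*|o4-*) echo openai ;;
    claude-*)       echo anthropic ;;
    gemini-*)       echo gemini ;;
    *)              echo unknown ;;
  esac
}

route_model gpt-4o-mini                # openai
route_model claude-sonnet-4-20250514   # anthropic
```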

Native providers

These four providers have custom adapters that handle their specific API formats natively. They support streaming, function calling, and embeddings out of the box.

OpenAI Native

Env var
OPENAI_API_KEY
Get a key
platform.openai.com/api-keys
Models
gpt-4o, gpt-4o-mini, gpt-4.1, o3, o4-mini, etc.
Supports
Chat, Streaming, Embeddings, Function calling, Vision
curl http://localhost:4200/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"hello"}]}'

Anthropic Native

Env var
ANTHROPIC_API_KEY
Get a key
console.anthropic.com
Models
claude-sonnet-4-20250514, claude-opus-4-20250514, claude-haiku, etc.
Supports
Chat, Streaming, Function calling, Vision
curl http://localhost:4200/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-sonnet-4-20250514","messages":[{"role":"user","content":"hello"}]}'

Stockyard translates OpenAI-format requests to Anthropic's Messages API format automatically. You send OpenAI-shaped requests; Stockyard handles the conversion.

Google Gemini Native

Env var
GEMINI_API_KEY
Get a key
aistudio.google.com/apikey
Models
gemini-2.5-pro, gemini-2.5-flash, gemini-2.0-flash, etc.
Supports
Chat, Streaming, Embeddings, Function calling, Vision
curl http://localhost:4200/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"gemini-2.5-flash","messages":[{"role":"user","content":"hello"}]}'

Groq Native

Env var
GROQ_API_KEY
Get a key
console.groq.com/keys
Models
llama-3.3-70b-versatile, mixtral-8x7b-32768, gemma2-9b-it, etc.
Supports
Chat, Streaming, Function calling
curl http://localhost:4200/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"llama-3.3-70b-versatile","messages":[{"role":"user","content":"hello"}]}'

Cloud providers (OpenAI-compatible)

These providers use the OpenAI-compatible protocol. Stockyard routes requests to their base URL using the standard /v1/chat/completions format. Set the env var and they work.

DeepSeek Compatible

Env var
DEEPSEEK_API_KEY
Get a key
platform.deepseek.com
Base URL
https://api.deepseek.com/v1
Models
deepseek-chat, deepseek-reasoner, deepseek-coder
curl http://localhost:4200/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"deepseek-chat","messages":[{"role":"user","content":"hello"}]}'

Mistral Compatible

Env var
MISTRAL_API_KEY
Get a key
console.mistral.ai
Base URL
https://api.mistral.ai/v1
Models
mistral-large-latest, mistral-small-latest, codestral-latest

Cerebras Compatible

Env var
CEREBRAS_API_KEY
Get a key
cloud.cerebras.ai
Base URL
https://api.cerebras.ai/v1
Models
llama3.1-8b, llama3.1-70b, llama-4-scout

Cerebras delivers extremely fast inference on its custom wafer-scale hardware. For supported models, time-to-first-token is often among the lowest of any cloud provider.

SambaNova Compatible

Env var
SAMBANOVA_API_KEY
Get a key
cloud.sambanova.ai
Base URL
https://api.sambanova.ai/v1
Models
DeepSeek-R1, Meta-Llama-3.3-70B-Instruct, Qwen models

Fireworks AI Compatible

Env var
FIREWORKS_API_KEY
Get a key
fireworks.ai
Base URL
https://api.fireworks.ai/inference/v1

Together AI Compatible

Env var
TOGETHER_API_KEY
Get a key
together.xyz
Base URL
https://api.together.xyz/v1

DeepInfra Compatible

Env var
DEEPINFRA_API_KEY
Get a key
deepinfra.com
Base URL
https://api.deepinfra.com/v1/openai
Note
DeepInfra's base URL includes /openai at the end. This is correct — do not add an extra /v1.

NVIDIA NIM Compatible

Env var
NVIDIA_API_KEY
Get a key
build.nvidia.com
Base URL
https://integrate.api.nvidia.com/v1

For self-hosted NIM containers, override the base URL to point to your local instance (e.g., http://localhost:8000/v1).

Hugging Face Compatible

Env var
HF_TOKEN
Get a key
huggingface.co/settings/tokens
Base URL
https://router.huggingface.co/v1

Hugging Face's Inference Providers service routes to 15+ backend providers (Cerebras, Groq, Together, etc.) through a single endpoint. Use model names in Hugging Face format (e.g., meta-llama/Llama-3.1-8B-Instruct).
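A request through Stockyard looks the same as for any other provider; only the model name changes to the HF format:

```shell
curl http://localhost:4200/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"meta-llama/Llama-3.1-8B-Instruct","messages":[{"role":"user","content":"hello"}]}'
```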

xAI (Grok) Compatible

Env var
XAI_API_KEY
Get a key
console.x.ai
Base URL
https://api.x.ai/v1

Cohere Compatible

Env var
COHERE_API_KEY
Get a key
dashboard.cohere.com
Base URL
https://api.cohere.com/compatibility/v1

Uses Cohere's OpenAI-compatible endpoint. Model names: command-r-plus, command-r, etc.

Replicate Compatible

Env var
REPLICATE_API_TOKEN
Get a key
replicate.com
Base URL
https://openai-proxy.replicate.com/v1

Perplexity Compatible

Env var
PERPLEXITY_API_KEY
Get a key
perplexity.ai/settings/api
Base URL
https://api.perplexity.ai

OpenRouter Compatible

Env var
OPENROUTER_API_KEY
Get a key
openrouter.ai/keys
Base URL
https://openrouter.ai/api/v1

OpenRouter aggregates 100+ models from multiple providers. Use their model naming format (e.g., anthropic/claude-3.5-sonnet).

Azure OpenAI Compatible

Env var
AZURE_OPENAI_API_KEY
Base URL
Set per-deployment: https://{resource}.openai.azure.com/openai/deployments/{deployment}
Note
Azure uses the api-key header instead of Authorization: Bearer. Stockyard handles this automatically when using the Azure adapter. You will need to set a custom base URL for your specific Azure deployment.
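One way to wire up a specific deployment is to register it through the custom-provider API described in the Custom providers section. The resource, deployment, and provider names below are placeholders; substitute your own:

```shell
# Placeholder resource/deployment names -- substitute your Azure values
curl -X POST http://localhost:4200/api/proxy/providers \
  -H "Content-Type: application/json" \
  -H "X-Admin-Key: YOUR_ADMIN_KEY" \
  -d '{
    "name": "azure-gpt4o",
    "base_url": "https://my-resource.openai.azure.com/openai/deployments/gpt-4o",
    "api_key": "'"$AZURE_OPENAI_API_KEY"'"
  }'
```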

AI21 Labs Compatible

Env var
AI21_API_KEY
Get a key
studio.ai21.com
Base URL
https://api.ai21.com/studio/v1
Models
jamba-large, jamba-mini

FriendliAI Compatible

Env var
FRIENDLI_TOKEN
Get a key
friendli.ai
Base URL
https://api.friendli.ai/serverless/v1

Hyperbolic Compatible

Env var
HYPERBOLIC_API_KEY
Base URL
https://api.hyperbolic.xyz/v1

Novita AI Compatible

Env var
NOVITA_API_KEY
Base URL
https://api.novita.ai/v3/openai

Featherless AI Compatible

Env var
FEATHERLESS_API_KEY
Base URL
https://api.featherless.ai/v1

Lambda Labs Compatible

Env var
LAMBDA_API_KEY
Base URL
https://api.lambdalabs.com/v1

Nebius Compatible

Env var
NEBIUS_API_KEY
Base URL
https://api.studio.nebius.ai/v1

Lepton AI Compatible

Env var
LEPTON_API_KEY
Base URL
https://api.lepton.ai/v1

Nscale Compatible

Env var
NSCALE_API_KEY
Base URL
https://inference.api.nscale.com/v1

Baseten Compatible

Env var
BASETEN_API_KEY
Base URL
https://bridge.baseten.co/v1/direct

Moonshot / Kimi Compatible

Env var
MOONSHOT_API_KEY
Base URL
https://api.moonshot.cn/v1
Models
moonshot-v1-8k, moonshot-v1-32k, moonshot-v1-128k

DashScope / Qwen Compatible

Env var
DASHSCOPE_API_KEY
Base URL
https://dashscope-intl.aliyuncs.com/compatible-mode/v1
Models
qwen-turbo, qwen-plus, qwen-max

Uses the international DashScope endpoint. For Chinese mainland access, the base URL may differ.

Yi / 01.AI Compatible

Env var
YI_API_KEY
Base URL
https://api.01.ai/v1

GitHub Models Compatible

Env var
GITHUB_MODELS_TOKEN
Get a key
github.com/settings/tokens (use a PAT with no scopes)
Base URL
https://models.inference.ai.azure.com

Free, rate-limited model inference through GitHub's marketplace, useful for prototyping.

Local / self-hosted providers

These providers run on your own hardware. No API key is needed — Stockyard connects to them on localhost. Make sure the inference server is running before starting Stockyard, or configure the base URL to point to your server's address.

Auto-detection for local providers
Local providers (Ollama, LM Studio, vLLM, SGLang, TGI) are not auto-detected from environment variables because they do not require API keys. Configure them through the Stockyard API or config file instead. See custom providers below.

Ollama Local

Default URL
http://localhost:11434/v1
Install
ollama.com/download
# Start Ollama and pull a model
ollama serve
ollama pull llama3.1

# Send through Stockyard
curl http://localhost:4200/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"llama3.1","messages":[{"role":"user","content":"hello"}]}'

LM Studio Local

Default URL
http://localhost:1234/v1
Install
lmstudio.ai

Start the LM Studio server from the app, load a model, then point Stockyard at it.

vLLM Local

Default URL
http://localhost:8000/v1
Install
pip install vllm or Docker: vllm/vllm-openai:latest
# Start vLLM
vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000

Production-grade serving with continuous batching, PagedAttention, and tensor parallelism. Use this for high-throughput self-hosted inference.
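Once the server is up and registered with Stockyard (local providers are not auto-detected), requests flow through the proxy using the model name vLLM serves:

```shell
curl http://localhost:4200/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"meta-llama/Llama-3.1-8B-Instruct","messages":[{"role":"user","content":"hello"}]}'
```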

SGLang Local

Default URL
http://localhost:30000/v1
Install
pip install sglang
# Start SGLang
python -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct --port 30000

Text Generation Inference (TGI) Local

Default URL
http://localhost:8080/v1
Install
Docker: ghcr.io/huggingface/text-generation-inference:latest
# Start TGI
docker run --gpus all -p 8080:80 \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id meta-llama/Llama-3.1-8B-Instruct

Custom providers

Any OpenAI-compatible endpoint works with Stockyard. If your provider is not in the list above but speaks the OpenAI chat completions protocol, you can add it as a custom provider through the API:

curl -X POST http://localhost:4200/api/proxy/providers \
  -H "Content-Type: application/json" \
  -H "X-Admin-Key: YOUR_ADMIN_KEY" \
  -d '{
    "name": "my-provider",
    "base_url": "https://my-endpoint.com/v1",
    "api_key": "my-api-key"
  }'

This registers the provider at runtime without restarting Stockyard. The provider will be available immediately for routing.

Troubleshooting

Provider not detected on startup

Check that the environment variable name matches exactly (case-sensitive). Run env | grep API_KEY to verify. Stockyard logs every detected provider on startup — if you do not see it in the logs, the env var is not set in the process environment.

401 Unauthorized from provider

Your API key is invalid or expired. Go to the provider's dashboard and generate a new key. Make sure the key has no leading or trailing whitespace.

Connection refused (local providers)

The inference server is not running. Start Ollama, LM Studio, vLLM, SGLang, or TGI before sending requests through Stockyard. Check that the port matches the default or your custom configuration.
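A quick check is to curl the server's models endpoint directly, bypassing Stockyard. Ollama's default port is shown; substitute 1234 for LM Studio, 8000 for vLLM, and so on:

```shell
# A running OpenAI-compatible server should answer with a JSON model list
curl -s http://localhost:11434/v1/models
```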

Model not found

The model name in your request does not match any model available at the provider. Check the provider's documentation for their exact model naming format. Some providers use prefixed names (e.g., meta-llama/Llama-3.1-8B-Instruct on HuggingFace) while others use short names (e.g., llama3.1 on Ollama).

Checking provider health

curl http://localhost:4200/api/proxy/providers/health \
  -H "X-Admin-Key: YOUR_ADMIN_KEY"

Returns per-provider status, latency, and any errors. Also available in the dashboard under the Overview page.
