Provider Setup Guide
Environment variables, example requests, and notes for every supported provider.
Stockyard auto-detects providers from environment variables at startup. Set a key, start the binary, and that provider is available through the OpenAI-compatible proxy. Every request gets the same middleware stack (caching, guardrails, rate limiting, tracing) regardless of which provider handles it. You can also run Stockyard as a standalone proxy without the full platform.
Set an environment variable (e.g., OPENAI_API_KEY). Stockyard detects it on startup and registers the provider. Send requests to http://localhost:4200/v1/chat/completions with the provider's model name. Stockyard routes to the right provider automatically.
How auto-detection works
On startup, Stockyard checks for known environment variables and registers each provider it finds. No configuration file is required. The startup log shows exactly what was detected:
# Set your keys
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
export CEREBRAS_API_KEY=csk-...

# Start Stockyard
stockyard

# Output:
Provider: openai (from OPENAI_API_KEY)
Provider: anthropic (from ANTHROPIC_API_KEY)
Provider: cerebras (from CEREBRAS_API_KEY)
Providers: 3 (openai, anthropic, cerebras)
You can set as many provider keys as you want. Stockyard routes each request based on the model name in the request body — gpt-4o goes to OpenAI, claude-sonnet-4-20250514 goes to Anthropic, and so on.
Native providers
These four providers have custom adapters that handle their specific API formats natively. They support streaming, function calling, and embeddings out of the box.
OpenAI Native
curl http://localhost:4200/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"hello"}]}'
Anthropic Native
curl http://localhost:4200/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"claude-sonnet-4-20250514","messages":[{"role":"user","content":"hello"}]}'
Stockyard translates OpenAI-format requests to Anthropic's Messages API format automatically. You send OpenAI-shaped requests; Stockyard handles the conversion.
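For illustration, the curl request above corresponds roughly to this payload against Anthropic's Messages API (a sketch based on Anthropic's public request format; the exact translation, including the max_tokens default, is an internal detail of Stockyard):

```json
{
  "model": "claude-sonnet-4-20250514",
  "max_tokens": 1024,
  "messages": [{"role": "user", "content": "hello"}]
}
```

Note that Anthropic requires max_tokens on every request, while OpenAI treats it as optional; the proxy has to fill in a value when the client omits it.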
Google Gemini Native
curl http://localhost:4200/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"gemini-2.5-flash","messages":[{"role":"user","content":"hello"}]}'
Groq Native
curl http://localhost:4200/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"llama-3.3-70b-versatile","messages":[{"role":"user","content":"hello"}]}'
Cloud providers (OpenAI-compatible)
These providers use the OpenAI-compatible protocol. Stockyard routes requests to their base URL using the standard /v1/chat/completions format. Set the env var and they work.
DeepSeek Compatible
curl http://localhost:4200/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"deepseek-chat","messages":[{"role":"user","content":"hello"}]}'
Mistral Compatible
Cerebras Compatible
Cerebras delivers extremely fast inference on their custom wafer-scale hardware. Expect the lowest time-to-first-token of any cloud provider for supported models.
SambaNova Compatible
Fireworks AI Compatible
Together AI Compatible
DeepInfra Compatible
DeepInfra's base URL includes /openai at the end. This is correct; do not add an extra /v1.
NVIDIA NIM Compatible
For self-hosted NIM containers, override the base URL to point to your local instance (e.g., http://localhost:8000/v1).
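One way to point Stockyard at a self-hosted NIM container is the custom-provider API covered at the end of this guide (a sketch; "nim-local" is an arbitrary name, and the admin key is a placeholder):

```shell
# Register a local NIM container as a provider at runtime
curl -X POST http://localhost:4200/api/proxy/providers \
  -H "Content-Type: application/json" \
  -H "X-Admin-Key: YOUR_ADMIN_KEY" \
  -d '{"name":"nim-local","base_url":"http://localhost:8000/v1","api_key":"none"}'
```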
Hugging Face Compatible
HF Inference Providers routes to 15+ backend providers (Cerebras, Groq, Together, etc.) through a single endpoint. Use model names in HuggingFace format (e.g., meta-llama/Llama-3.1-8B-Instruct).
xAI (Grok) Compatible
Cohere Compatible
Uses Cohere's OpenAI-compatible endpoint. Model names: command-r-plus, command-r, etc.
Replicate Compatible
Perplexity Compatible
OpenRouter Compatible
OpenRouter aggregates 100+ models from multiple providers. Use their model naming format (e.g., anthropic/claude-3.5-sonnet).
Azure OpenAI Compatible
Azure uses the api-key header instead of Authorization: Bearer. Stockyard handles this automatically when using the Azure adapter. You will need to set a custom base URL for your specific Azure deployment.
AI21 Labs Compatible
FriendliAI Compatible
Hyperbolic Compatible
Novita AI Compatible
Featherless AI Compatible
Lambda Labs Compatible
Nebius Compatible
Lepton AI Compatible
Nscale Compatible
Baseten Compatible
Moonshot / Kimi Compatible
DashScope / Qwen Compatible
Uses the international DashScope endpoint. For Chinese mainland access, the base URL may differ.
Yi / 01.AI Compatible
GitHub Models Compatible
Free model inference through GitHub's marketplace. Good for prototyping. Rate-limited.
Local / self-hosted providers
These providers run on your own hardware. No API key is needed — Stockyard connects to them on localhost. Make sure the inference server is running before starting Stockyard, or configure the base URL to point to your server's address.
Local providers (Ollama, LM Studio, vLLM, SGLang, TGI) are not auto-detected from environment variables because they do not require API keys. Configure them through the Stockyard API or config file instead. See custom providers below.
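For example, Ollama could be registered through the custom-provider API shown at the end of this guide (a sketch; "ollama-local" is an arbitrary name, and http://localhost:11434/v1 is Ollama's default OpenAI-compatible endpoint):

```shell
# Register a local Ollama server; no real API key is needed
curl -X POST http://localhost:4200/api/proxy/providers \
  -H "Content-Type: application/json" \
  -H "X-Admin-Key: YOUR_ADMIN_KEY" \
  -d '{"name":"ollama-local","base_url":"http://localhost:11434/v1","api_key":"unused"}'
```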
Ollama Local
# Start Ollama and pull a model
ollama serve
ollama pull llama3.1

# Send through Stockyard
curl http://localhost:4200/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"llama3.1","messages":[{"role":"user","content":"hello"}]}'
LM Studio Local
Start the LM Studio server from the app, load a model, then point Stockyard at it.
vLLM Local
# Start vLLM
vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000
Production-grade serving with continuous batching, PagedAttention, and tensor parallelism. Use this for high-throughput self-hosted inference.
SGLang Local
# Start SGLang
python -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct --port 30000
Text Generation Inference (TGI) Local
# Start TGI
docker run --gpus all -p 8080:80 \
ghcr.io/huggingface/text-generation-inference:latest \
--model-id meta-llama/Llama-3.1-8B-Instruct
Custom providers
Any OpenAI-compatible endpoint works with Stockyard. If your provider is not in the list above but speaks the OpenAI chat completions protocol, you can add it as a custom provider through the API:
curl -X POST http://localhost:4200/api/proxy/providers \
-H "Content-Type: application/json" \
-H "X-Admin-Key: YOUR_ADMIN_KEY" \
-d '{
"name": "my-provider",
"base_url": "https://my-endpoint.com/v1",
"api_key": "my-api-key"
}'
This registers the provider at runtime without restarting Stockyard. The provider will be available immediately for routing.
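Once registered, requests go through the same proxy endpoint as any built-in provider (a sketch; "my-model" is a placeholder for whatever model the custom endpoint actually serves):

```shell
# Route a request to the newly registered custom provider
curl http://localhost:4200/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"my-model","messages":[{"role":"user","content":"hello"}]}'
```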
Troubleshooting
Provider not detected on startup
Check that the environment variable name matches exactly (case-sensitive). Run env | grep API_KEY to verify. Stockyard logs every detected provider on startup — if you do not see it in the logs, the env var is not set in the process environment.
401 Unauthorized from provider
Your API key is invalid or expired. Go to the provider's dashboard and generate a new key. Make sure the key has no leading or trailing whitespace.
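A quick way to spot stray whitespace is to compare the raw key against a trimmed copy (a sketch using POSIX parameter expansion; the key value below is a fake placeholder with a deliberate trailing space):

```shell
key="sk-abc123 "   # fake key with a trailing space, for demonstration
trimmed="${key#"${key%%[![:space:]]*}"}"       # strip leading whitespace
trimmed="${trimmed%"${trimmed##*[![:space:]]}"}"  # strip trailing whitespace
if [ "$key" != "$trimmed" ]; then
  echo "key contains stray whitespace"
fi
```

In practice, substitute key="$OPENAI_API_KEY" (or whichever variable failed) and re-export the trimmed value if the check fires.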
Connection refused (local providers)
The inference server is not running. Start Ollama, LM Studio, vLLM, SGLang, or TGI before sending requests through Stockyard. Check that the port matches the default or your custom configuration.
Model not found
The model name in your request does not match any model available at the provider. Check the provider's documentation for their exact model naming format. Some providers use prefixed names (e.g., meta-llama/Llama-3.1-8B-Instruct on HuggingFace) while others use short names (e.g., llama3.1 on Ollama).
Checking provider health
curl http://localhost:4200/api/proxy/providers/health \
  -H "X-Admin-Key: YOUR_ADMIN_KEY"
Returns per-provider status, latency, and any errors. Also available in the dashboard under the Overview page.