Provider Setup Guide
Environment variables, example requests, and notes for every supported provider.
Stockyard auto-detects providers from environment variables at startup. Set a key, start the binary, and that provider is available through the OpenAI-compatible proxy. Every request gets the same middleware stack (caching, guardrails, rate limiting, tracing) regardless of which provider handles it. You can also run Stockyard as a standalone proxy without the full platform.
Set an environment variable (e.g., OPENAI_API_KEY). Stockyard detects it on startup and registers the provider. Send requests to http://localhost:4200/v1/chat/completions with the provider's model name. Stockyard routes to the right provider automatically.
How auto-detection works
On startup, Stockyard checks for known environment variables and registers each provider it finds. No configuration file is required. The startup log shows exactly what was detected:
# Set your keys
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
export CEREBRAS_API_KEY=csk-...

# Start Stockyard
stockyard

# Output:
Provider: openai (from OPENAI_API_KEY)
Provider: anthropic (from ANTHROPIC_API_KEY)
Provider: cerebras (from CEREBRAS_API_KEY)
Providers: 3 (openai, anthropic, cerebras)
You can set as many provider keys as you want. Stockyard routes each request based on the model name in the request body — gpt-4o goes to OpenAI, claude-sonnet-4-20250514 goes to Anthropic, and so on.
Native providers
These four providers have custom adapters that handle their specific API formats natively. They support streaming, function calling, and embeddings out of the box.
OpenAI Native
curl http://localhost:4200/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"hello"}]}'
Anthropic Native
curl http://localhost:4200/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"claude-sonnet-4-20250514","messages":[{"role":"user","content":"hello"}]}'
Stockyard translates OpenAI-format requests to Anthropic's Messages API format automatically. You send OpenAI-shaped requests; Stockyard handles the conversion.
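For illustration, the curl request above corresponds roughly to this payload against Anthropic's Messages API (a sketch based on Anthropic's public request format; the exact translation, including the max_tokens default, is an internal detail of Stockyard):

```json
{
  "model": "claude-sonnet-4-20250514",
  "max_tokens": 1024,
  "messages": [{"role": "user", "content": "hello"}]
}
```

Note that Anthropic requires max_tokens on every request, while OpenAI treats it as optional; the proxy has to fill in a value when the client omits it.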
Google Gemini Native
curl http://localhost:4200/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"gemini-2.5-flash","messages":[{"role":"user","content":"hello"}]}'
Groq Native
curl http://localhost:4200/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"llama-3.3-70b-versatile","messages":[{"role":"user","content":"hello"}]}'
Cloud providers (OpenAI-compatible)
These providers use the OpenAI-compatible protocol. Stockyard routes requests to their base URL using the standard /v1/chat/completions format. Set the env var and they work.
DeepSeek Compatible
curl http://localhost:4200/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"deepseek-chat","messages":[{"role":"user","content":"hello"}]}'
Mistral Compatible
Cerebras Compatible
Cerebras delivers extremely fast inference on their custom wafer-scale hardware. Expect the lowest time-to-first-token of any cloud provider for supported models.
SambaNova Compatible
Fireworks AI Compatible
Together AI Compatible
DeepInfra Compatible
DeepInfra's base URL includes /openai at the end. This is correct; do not add an extra /v1.
NVIDIA NIM Compatible
For self-hosted NIM containers, override the base URL to point to your local instance (e.g., http://localhost:8000/v1).
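One way to point Stockyard at a self-hosted NIM container is the custom-provider API covered at the end of this guide (a sketch; "nim-local" is an arbitrary name, and the admin key is a placeholder):

```shell
# Register a local NIM container as a provider at runtime
curl -X POST http://localhost:4200/api/proxy/providers \
  -H "Content-Type: application/json" \
  -H "X-Admin-Key: YOUR_ADMIN_KEY" \
  -d '{"name":"nim-local","base_url":"http://localhost:8000/v1","api_key":"none"}'
```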
Hugging Face Compatible
HF Inference Providers routes to 15+ backend providers (Cerebras, Groq, Together, etc.) through a single endpoint. Use model names in HuggingFace format (e.g., meta-llama/Llama-3.1-8B-Instruct).
xAI (Grok) Compatible
Cohere Compatible
Uses Cohere's OpenAI-compatible endpoint. Model names: command-r-plus, command-r, etc.
Replicate Compatible
Perplexity Compatible
OpenRouter Compatible
OpenRouter aggregates 100+ models from multiple providers. Use their model naming format (e.g., anthropic/claude-3.5-sonnet).
Azure OpenAI Compatible
Azure uses the api-key header instead of Authorization: Bearer. Stockyard handles this automatically when using the Azure adapter. You will need to set a custom base URL for your specific Azure deployment.
AI21 Labs Compatible
FriendliAI Compatible
Hyperbolic Compatible
Novita AI Compatible
Featherless AI Compatible
Lambda Labs Compatible
Nebius Compatible
Lepton AI Compatible
Nscale Compatible
Baseten Compatible
Moonshot / Kimi Compatible
DashScope / Qwen Compatible
Uses the international DashScope endpoint. For Chinese mainland access, the base URL may differ.
Yi / 01.AI Compatible
GitHub Models Compatible
Free model inference through GitHub's marketplace. Good for prototyping. Rate-limited.
Local / self-hosted providers
These providers run on your own hardware. No API key is needed — Stockyard connects to them on localhost. Make sure the inference server is running before starting Stockyard, or configure the base URL to point to your server's address.
Local providers (Ollama, LM Studio, vLLM, SGLang, TGI) are not auto-detected from environment variables because they do not require API keys. Configure them through the Stockyard API or config file instead. See custom providers below.
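For example, Ollama could be registered through the custom-provider API shown at the end of this guide (a sketch; "ollama-local" is an arbitrary name, and http://localhost:11434/v1 is Ollama's default OpenAI-compatible endpoint):

```shell
# Register a local Ollama server; no real API key is needed
curl -X POST http://localhost:4200/api/proxy/providers \
  -H "Content-Type: application/json" \
  -H "X-Admin-Key: YOUR_ADMIN_KEY" \
  -d '{"name":"ollama-local","base_url":"http://localhost:11434/v1","api_key":"unused"}'
```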
Ollama Local
# Start Ollama and pull a model
ollama serve
ollama pull llama3.1

# Send through Stockyard
curl http://localhost:4200/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"llama3.1","messages":[{"role":"user","content":"hello"}]}'
LM Studio Local
Start the LM Studio server from the app, load a model, then point Stockyard at it.
vLLM Local
# Start vLLM
vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000
Production-grade serving with continuous batching, PagedAttention, and tensor parallelism. Use this for high-throughput self-hosted inference.
SGLang Local
# Start SGLang
python -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct --port 30000
Text Generation Inference (TGI) Local
# Start TGI
docker run --gpus all -p 8080:80 \
ghcr.io/huggingface/text-generation-inference:latest \
--model-id meta-llama/Llama-3.1-8B-Instruct
Custom providers
Any OpenAI-compatible endpoint works with Stockyard. If your provider is not in the list above but speaks the OpenAI chat completions protocol, you can add it as a custom provider through the API:
curl -X POST http://localhost:4200/api/proxy/providers \
-H "Content-Type: application/json" \
-H "X-Admin-Key: YOUR_ADMIN_KEY" \
-d '{
"name": "my-provider",
"base_url": "https://my-endpoint.com/v1",
"api_key": "my-api-key"
}'
This registers the provider at runtime without restarting Stockyard. The provider will be available immediately for routing.
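Once registered, requests go through the same proxy endpoint as any built-in provider (a sketch; "my-model" is a placeholder for whatever model the custom endpoint actually serves):

```shell
# Route a request to the newly registered custom provider
curl http://localhost:4200/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"my-model","messages":[{"role":"user","content":"hello"}]}'
```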
Troubleshooting
Provider not detected on startup
Check that the environment variable name matches exactly (case-sensitive). Run env | grep API_KEY to verify. Stockyard logs every detected provider on startup — if you do not see it in the logs, the env var is not set in the process environment.
401 Unauthorized from provider
Your API key is invalid or expired. Go to the provider's dashboard and generate a new key. Make sure the key has no leading or trailing whitespace.
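A quick way to spot stray whitespace is to compare the raw key against a trimmed copy (a sketch using POSIX parameter expansion; the key value below is a fake placeholder with a deliberate trailing space):

```shell
key="sk-abc123 "   # fake key with a trailing space, for demonstration
trimmed="${key#"${key%%[![:space:]]*}"}"       # strip leading whitespace
trimmed="${trimmed%"${trimmed##*[![:space:]]}"}"  # strip trailing whitespace
if [ "$key" != "$trimmed" ]; then
  echo "key contains stray whitespace"
fi
```

In practice, substitute key="$OPENAI_API_KEY" (or whichever variable failed) and re-export the trimmed value if the check fires.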
Connection refused (local providers)
The inference server is not running. Start Ollama, LM Studio, vLLM, SGLang, or TGI before sending requests through Stockyard. Check that the port matches the default or your custom configuration.
Model not found
The model name in your request does not match any model available at the provider. Check the provider's documentation for their exact model naming format. Some providers use prefixed names (e.g., meta-llama/Llama-3.1-8B-Instruct on HuggingFace) while others use short names (e.g., llama3.1 on Ollama).
Checking provider health
curl http://localhost:4200/api/proxy/providers/health \
  -H "X-Admin-Key: YOUR_ADMIN_KEY"
Returns per-provider status, latency, and any errors. Also available in the dashboard under the Overview page.