Every option for routing LLM traffic through your own infrastructure, compared on what actually matters: deployment complexity, provider coverage, and operational overhead.
A self-hosted LLM proxy sits between your application and LLM providers. It routes requests, tracks costs, adds guardrails, and keeps your data on your own infrastructure. The best ones minimize operational burden while maximizing control.
The criteria that matter most:

- How many providers does it support out of the box?
- How much infrastructure does it require beyond the proxy itself?
- Can you deploy it in five minutes, or does it need a weekend?
- Does it handle streaming, function calling, and the latest model features?
Stockyard is a single Go binary with embedded SQLite. It ships as a ~25MB download and runs with zero external dependencies: 76 middleware modules for cost tracking, PII redaction, rate limiting, caching, and model aliasing, plus 16 provider integrations. The API is OpenAI-compatible, so any SDK that talks to /v1/chat/completions works without code changes. The proxy core is Apache 2.0; the full platform is BSL 1.1.
The tradeoff: SQLite means single-node. No horizontal scaling. If you need multi-region clustering, this is not the right tool. If you need a proxy that runs on one server and just works, it is.
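Because the API is OpenAI-compatible, pointing an existing app at the proxy is just a base-URL change. The sketch below builds the raw request by hand to make that visible; the localhost port, the proxy key, and the `default-chat` alias are assumptions, not Stockyard defaults — substitute the values from your own deployment.

```python
import json
import urllib.request

# Only the base URL changes versus talking to a provider directly.
# "http://localhost:8080" is an assumed proxy address for illustration.
PROXY_BASE = "http://localhost:8080/v1"  # instead of https://api.openai.com/v1

payload = {
    "model": "gpt-4o",  # or a proxy-side alias, e.g. "default-chat"
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": False,
}

req = urllib.request.Request(
    f"{PROXY_BASE}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer YOUR_PROXY_KEY",  # hypothetical key
    },
)
# urllib.request.urlopen(req) would send it; the proxy handles routing,
# cost tracking, and redaction before forwarding upstream.
```

Any OpenAI SDK works the same way: set its `base_url` to the proxy and leave the rest of the code untouched.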
LiteLLM is a Python-based proxy with broad provider support (100+ models). It requires Python, pip, and typically Postgres for production use. It is a good fit if your team already runs Python infrastructure and wants a proxy it can extend with custom Python code. Active open-source community.
The tradeoff: the Python runtime adds deployment complexity, the Postgres dependency means another service to manage, and cold starts can be slow.
Portkey is primarily a SaaS product with a gateway component. It has strong observability features and a polished dashboard, but the self-hosted option requires Docker and external databases.
The tradeoff: self-hosted mode is not the primary product, so there are fewer battle-tested self-hosted deployments in the wild.
Kong is an enterprise API gateway with LLM plugins. The infrastructure is mature and battle-tested at scale; if you already run Kong for your API layer, adding LLM routing is natural.
The tradeoff: massive operational footprint. Postgres required. Configuration is complex. Overkill if all you need is an LLM proxy.
Finally, the DIY route: you can build an LLM proxy from Nginx reverse-proxy rules or Envoy filters. Zero dependencies beyond the proxy itself, and maximum control.
The tradeoff: you build everything yourself. No cost tracking, no model aliasing, no provider failover, no token counting, no streaming support out of the box. Every feature is a custom implementation.
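To make the gap concrete, here is roughly where a DIY proxy starts: a bare pass-through that forwards a request body upstream. Everything in the list above is absent and would be custom work on top. The upstream URL and key handling are illustrative assumptions, equivalent to a one-rule Nginx `proxy_pass`.

```python
import urllib.request

# One hard-coded upstream -- no failover, no routing table.
UPSTREAM = "https://api.openai.com/v1/chat/completions"

def forward(body: bytes, api_key: str) -> urllib.request.Request:
    """Build the upstream request for a raw pass-through proxy.

    Note what is NOT here: no cost tracking, no token counting,
    no model aliasing, no provider failover, no PII redaction,
    and no special handling for SSE streaming responses.
    Each of those is a separate custom implementation.
    """
    return urllib.request.Request(
        UPSTREAM,
        data=body,  # body is relayed untouched
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
```

A purpose-built proxy earns its keep precisely in the middleware this sketch omits.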
| Feature | Stockyard | LiteLLM | Portkey | Kong AI |
|---|---|---|---|---|
| Self-contained binary | ✓ | ✗ | ✗ | ✗ |
| No external DB | ✓ | ✗ | ✗ | ✗ |
| Providers (built-in) | 40 | 100+ | 20+ | 10+ |
| Middleware modules | 76 | ~15 | ~20 | ~10 |
| Deploy time | <5 min | ~30 min | ~1 hr | ~2 hr |
| OpenAI-compatible | ✓ | ✓ | ✓ | ✓ |
| Cost tracking | ✓ | ✓ | ✓ | Plugin |
| PII redaction | ✓ | ✗ | ✓ | ✗ |
Stockyard installs in one command. No Docker, no Postgres, no Python.
Get started in 5 minutes