Building an LLM proxy in Go: architecture decisions

· Michael

Stockyard is an LLM proxy written in Go. It ships as a single ~25MB binary with embedded SQLite, 76 middleware modules, 29 integrated products, and a web console. No external dependencies, no runtime, no container required.

This post is about the Go-specific decisions that make that possible. Not a tutorial on how to build a proxy, but the architectural choices I made and what I would do differently.

Why Go

The main competitor in this space, LiteLLM, is written in Python. Python is fine for a proxy, but it requires a runtime, a package manager, and typically Postgres and Redis running alongside it. The install story is pip install plus standing up two external services.

I wanted the install story to be curl | sh and done. That means a statically linked binary with no runtime dependencies. Go gives you that with CGO_ENABLED=0 go build. Rust would also work, but I know Go better and the ecosystem is more mature for the kind of HTTP middleware and database work an LLM proxy needs.

Go's concurrency model also fits the workload. An LLM proxy spends most of its time waiting for upstream API responses. Goroutines handle that naturally without explicit async/await or thread pool management.

Go 1.22 ServeMux

Stockyard uses the standard library net/http ServeMux, specifically the Go 1.22 version with method-based routing. No Gin, no Echo, no Chi.

This was a deliberate choice. Third-party routers add a dependency, introduce their own middleware patterns, and make the binary depend on someone else's release cycle. The stdlib ServeMux in 1.22 supports GET /path and POST /path patterns, which is enough for an API with 360+ endpoints.

One gotcha: Go 1.22's ServeMux panics on duplicate route registration at startup. This bit me more than once when two packages tried to register the same path. The fix is to check for duplicates before registration, but the panic-on-boot behavior is aggressive compared to how other routers handle it.

// Go 1.22 method-based routing
mux.HandleFunc("GET /api/proxy/modules", listModules)
mux.HandleFunc("PUT /api/proxy/modules/{name}", updateModule)
mux.HandleFunc("DELETE /api/proxy/aliases/{alias}", deleteAlias)

// Homepage: exact match prevents catch-all
mux.HandleFunc("GET /{$}", serveHomepage)
  

The middleware chain

Stockyard has 76 middleware modules. Rate limiting, cost tracking, caching, PII redaction, prompt injection detection, model aliasing, failover routing, and more. They all run in a single request pipeline.

Each module implements a simple interface: receive the request, optionally modify it, optionally short-circuit it, or pass it to the next module. The chain is built at startup and modules can be toggled at runtime without restarting the server.

// Simplified middleware interface
type Module interface {
    Name() string
    Enabled() bool
    Process(req *ProxyRequest) (*ProxyResponse, error)
}
  

Runtime toggling works because each module checks its Enabled() state on every request. Toggling a module flips a boolean in the database. The next request sees the new state. No restart, no config reload, no signal handling.

The order matters. Rate limiting runs before cost tracking, which runs before caching, which runs before the upstream call. The chain is deterministic and the order is hardcoded. I considered making it configurable, but the complexity was not worth it for the current product.
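Putting the pieces together, the chain walk might look like this. The types mirror the interface above, but runChain and cacheModule are hypothetical stand-ins, not Stockyard's actual code:

```go
package main

import "fmt"

// Stand-in types for Stockyard's request/response (hypothetical fields).
type ProxyRequest struct{ Model string }
type ProxyResponse struct{ Body string }

type Module interface {
	Name() string
	Enabled() bool
	Process(req *ProxyRequest) (*ProxyResponse, error)
}

// cacheModule is a toy module: when enabled it answers from "cache",
// short-circuiting the rest of the chain.
type cacheModule struct{ enabled bool }

func (c *cacheModule) Name() string  { return "cache" }
func (c *cacheModule) Enabled() bool { return c.enabled }
func (c *cacheModule) Process(req *ProxyRequest) (*ProxyResponse, error) {
	return &ProxyResponse{Body: "cached answer"}, nil
}

// runChain walks the modules in their hardcoded order. Enabled() is
// consulted per request, which is what makes runtime toggling work;
// a non-nil response short-circuits, nil passes the request along.
func runChain(mods []Module, req *ProxyRequest) (*ProxyResponse, error) {
	for _, m := range mods {
		if !m.Enabled() {
			continue
		}
		resp, err := m.Process(req)
		if err != nil {
			return nil, fmt.Errorf("%s: %w", m.Name(), err)
		}
		if resp != nil {
			return resp, nil
		}
	}
	return nil, fmt.Errorf("no module handled the request")
}

func main() {
	req := &ProxyRequest{Model: "gpt-4o"}
	resp, _ := runChain([]Module{&cacheModule{enabled: true}}, req)
	fmt.Println(resp.Body) // cached answer
}
```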

Pure-Go SQLite

Stockyard uses modernc.org/sqlite instead of mattn/go-sqlite3. The reason is simple: mattn requires CGO, which means a C compiler in the build chain, cross-compilation headaches, and a dynamically linked binary on some platforms.

modernc.org/sqlite is a machine-translated C-to-Go port of SQLite. It is slower than the C version, but it compiles with CGO_ENABLED=0 and produces a fully static binary. For an LLM proxy where the database is never the bottleneck, the performance trade-off is worth it.

// Build command - no C compiler needed
CGO_ENABLED=0 go build -o stockyard ./cmd/stockyard/
  

I wrote about the SQLite production experience in more detail in SQLite in production.

Embedded static files

The web console, documentation site, and all static assets are embedded in the binary using Go's embed directive. This means the binary serves its own UI without a separate web server, CDN, or file system dependency.

//go:embed static
var staticFiles embed.FS
  

The trade-off is that changing a page requires rebuilding the binary. For a product that ships as a single file, this is acceptable. For a product with a large team making frequent UI changes, it would be painful.

One practical consequence: the site HTML files exist twice in the repo. Once in site/ for editing, and once in internal/site/static/ for embedding. They must stay in sync. This is annoying but manageable with a copy step before each build.

Streaming SSE pass-through

LLM APIs stream responses as Server-Sent Events. The proxy needs to forward these chunks as they arrive, not buffer the entire response. Go's http.Flusher interface makes this straightforward:

flusher, ok := w.(http.Flusher)
if !ok {
    // writer cannot stream; fall back to a buffered response
}

// For each chunk from upstream:
if _, err := w.Write(chunk); err != nil {
    return // client disconnected; stop forwarding
}
flusher.Flush() // push the chunk to the client immediately
  

The complication is failover. If the upstream provider fails mid-stream, you cannot retry transparently because the client has already received partial data. Stockyard handles this by only failing over before the first chunk is sent. Once streaming starts, if the provider fails, the error propagates to the client.

Shared HTTP transport

Early versions of Stockyard created a new HTTP client for each provider on each request. This worked but was slow: a fresh TCP connection and TLS handshake on every call. The fix was a shared http.Transport with connection pooling, reused across all requests to each provider.

This is the kind of thing that is obvious in retrospect but easy to miss when you are focused on getting the middleware chain right. The proxy overhead dropped significantly once connections were reused.

What I would do differently

Use a code generator for the API routes. With 360+ endpoints registered by hand, route registration is tedious and error-prone. A generator that reads handler function signatures and emits route registration code would save time and catch duplicates at build time instead of at boot time.

Separate the site build from the binary build. Right now, editing an HTML page requires rebuilding the entire Go binary. A file-watching development mode that serves from disk instead of the embedded FS would speed up site iteration.

Add structured logging from day one. I started with log.Printf and it works, but structured logging with levels would make production debugging easier. This is a common Go regret.

The full source is at github.com/stockyard-dev/Stockyard if you want to see how any of this actually works.

— Michael
