Tack Room (Studio)
Version, test, and optimize every prompt in your stack.
Prompt Templates
# Create a versioned prompt template
curl -X POST http://localhost:4200/api/studio/templates \
-d '{"slug":"summarizer","name":"Summarizer","content":"Summarize: {{text}}"}'

# List all templates
curl http://localhost:4200/api/studio/templates
Templates support {{variable}} interpolation. Every edit creates a new version automatically.
A/B Experiments
# Run an A/B test across models
curl -X POST http://localhost:4200/api/studio/experiments/run \
-d '{
  "name": "speed-vs-quality",
  "prompt": "Explain quantum computing in one paragraph",
  "models": ["gpt-4o","claude-sonnet-4-5-20250929","gemini-2.0-flash"],
  "runs": 3,
  "eval": "length"
}'
Eval methods: length (longer = better), concise (shorter = better), json (valid JSON), contains (keyword match), or empty for cost comparison.
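A rough sketch of how these eval methods could score a single response. The scoring rules below are assumptions for illustration, not Studio's exact implementation:

```python
import json

def score(response: str, method: str = "", eval_arg: str = "") -> float:
    """Score one model response under the named eval method.

    Assumed rules, matching the descriptions above:
    - length:   longer responses score higher (raw character count)
    - concise:  shorter responses score higher (inverse length)
    - json:     1.0 if the response parses as valid JSON, else 0.0
    - contains: 1.0 if eval_arg appears in the response, else 0.0
    - "":       always 0.0; ranking falls back to cost comparison
    """
    if method == "length":
        return float(len(response))
    if method == "concise":
        return 1.0 / (1 + len(response))
    if method == "json":
        try:
            json.loads(response)
            return 1.0
        except ValueError:
            return 0.0
    if method == "contains":
        return 1.0 if eval_arg in response else 0.0
    return 0.0  # empty eval: compare variants by cost instead
```

With `runs: 3`, each model's final score would be an average over its three runs.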
Benchmarks
# Run a multi-prompt benchmark
curl -X POST http://localhost:4200/api/studio/benchmarks/run \
-d '{
  "name": "model-eval-q1",
  "models": ["gpt-4o-mini","deepseek-chat"],
  "prompts": [
    {"name":"summarize","prompt":"Summarize: ...","eval":"concise"},
    {"name":"code","prompt":"Write fizzbuzz","eval":"contains","eval_arg":"for"}
  ],
  "runs": 3
}'
See the Studio product page and the Diff view (coming soon).
Prompt Templates
Create versioned prompt templates for reuse and A/B testing:
curl -X POST http://localhost:4200/api/studio/templates \
-H "Authorization: Bearer sy_admin_..." \
-H "Content-Type: application/json" \
-d '{
"name": "customer-support-v1",
"system_prompt": "You are a helpful customer support agent for {{company}}. Be concise and friendly.",
"variables": ["company"],
"model": "gpt-4o",
"temperature": 0.3
}'
{
"id": "tpl_abc123",
"name": "customer-support-v1",
"version": 1,
"created_at": "2026-02-28T12:00:00Z"
}
Listing Templates
curl http://localhost:4200/api/studio/templates \
-H "Authorization: Bearer sy_admin_..."
{
"templates": [
{"id": "tpl_abc123", "name": "customer-support-v1", "version": 1, "model": "gpt-4o"},
{"id": "tpl_def456", "name": "code-review-v2", "version": 2, "model": "claude-sonnet-4-5"}
],
"total": 2
}
A/B Experiments
Run experiments to compare prompts, models, or temperatures:
curl -X POST http://localhost:4200/api/studio/experiments/run \
-H "Authorization: Bearer sy_admin_..." \
-H "Content-Type: application/json" \
-d '{
"name": "GPT-4o vs Claude on support",
"variants": [
{"model": "gpt-4o", "template": "customer-support-v1", "variables": {"company": "Acme"}},
{"model": "claude-sonnet-4-5", "template": "customer-support-v1", "variables": {"company": "Acme"}}
],
"test_inputs": [
{"messages": [{"role": "user", "content": "How do I reset my password?"}]},
{"messages": [{"role": "user", "content": "I want a refund"}]}
],
"eval_criteria": ["helpfulness", "conciseness", "tone"]
}'
Results include side-by-side comparisons and cost breakdowns:
curl http://localhost:4200/api/studio/experiments/exp_id \
-H "Authorization: Bearer sy_admin_..."
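To make the cost side of those comparisons concrete, here is a minimal sketch of aggregating a per-model cost total from token usage. The price table and record shape are illustrative assumptions, not Studio's actual pricing data:

```python
# (input, output) USD per million tokens -- assumed example prices only
PRICE_PER_1M = {
    "gpt-4o": (2.50, 10.00),
    "claude-sonnet-4-5": (3.00, 15.00),
}

def cost_breakdown(runs):
    """Sum per-model USD cost from a list of run records.

    runs: list of {"model", "input_tokens", "output_tokens"} dicts,
    one per experiment run (field names are assumptions).
    """
    totals = {}
    for r in runs:
        price_in, price_out = PRICE_PER_1M[r["model"]]
        usd = (r["input_tokens"] * price_in
               + r["output_tokens"] * price_out) / 1_000_000
        totals[r["model"]] = totals.get(r["model"], 0.0) + usd
    return totals
```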
Benchmark Runs
Run benchmark suites to compare model performance and cost:
curl -X POST http://localhost:4200/api/studio/benchmarks/run \
-H "Authorization: Bearer sy_admin_..." \
-H "Content-Type: application/json" \
-d '{
"models": ["gpt-4o", "claude-sonnet-4-5", "llama-3.3-70b-versatile"],
"prompts": ["Explain quantum computing in one paragraph"],
"runs_per_model": 3
}'
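Since each model runs `runs_per_model` times, benchmark results are naturally reported as per-model averages. A minimal sketch of that aggregation, with assumed field names:

```python
from statistics import mean

def summarize_benchmark(results):
    """Collapse per-run benchmark results into per-model averages.

    results: list of {"model", "latency_ms", "cost_usd"} dicts, one
    per run (runs_per_model entries for each model).  The field names
    are assumptions for illustration.
    """
    by_model = {}
    for r in results:
        by_model.setdefault(r["model"], []).append(r)
    return {
        model: {
            "avg_latency_ms": mean(r["latency_ms"] for r in runs),
            "avg_cost_usd": mean(r["cost_usd"] for r in runs),
            "runs": len(runs),
        }
        for model, runs in by_model.items()
    }
```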
Tack Room Configuration
# stockyard.yaml
apps:
studio:
max_experiments: 100
max_templates: 500
default_eval_model: "gpt-4o"
Use the promptlint module to automatically validate prompt templates against common anti-patterns before they reach the LLM.
Tack Room Status
curl http://localhost:4200/api/studio/status \
-H "Authorization: Bearer sy_admin_..."
{
"templates": 12,
"experiments_run": 47,
"last_experiment": "2026-02-27T18:00:00Z"
}
Template Variables
Templates support Mustache-style variables ({{variable}}) that are substituted at runtime. This lets you reuse the same prompt across different contexts:
# Template with variables
{
"name": "product-description",
"system_prompt": "Write a {{tone}} product description for {{product}} targeting {{audience}}.",
"variables": ["tone", "product", "audience"],
"defaults": {"tone": "professional", "audience": "enterprise buyers"}
}
Variables without defaults are required at runtime. Variables with defaults can be overridden.
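The required-versus-default rule can be sketched in a few lines. This is a simplified illustration of Mustache-style substitution, not the real renderer:

```python
import re

def render(template, variables, defaults=None):
    """Substitute {{variable}} placeholders, applying defaults first.

    Supplied variables override defaults; a placeholder with neither a
    supplied value nor a default raises KeyError, mirroring the rule
    that variables without defaults are required at runtime.
    """
    values = {**(defaults or {}), **variables}

    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"missing required variable: {name}")
        return str(values[name])

    return re.sub(r"\{\{(\w+)\}\}", substitute, template)
```

For the product-description template above, calling `render` with only `{"product": "widgets"}` would succeed (tone and audience fall back to their defaults), while omitting `product` would fail.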
Version History
Each template update creates a new version. Previous versions are preserved for rollback and A/B comparison. The promptpad module automatically resolves template references to the latest version unless a specific version is pinned.
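The latest-unless-pinned resolution rule amounts to a small lookup. A sketch under assumed data shapes, not promptpad's actual code:

```python
def resolve(template_versions, name, pinned=None):
    """Resolve a template reference to a concrete (version, content) pair.

    template_versions: {name: {version_number: content}} -- an assumed
    in-memory shape for illustration.  Without a pin, the highest
    version number wins, matching how promptpad resolves references to
    the latest version; a pinned version is returned as-is.
    """
    versions = template_versions[name]
    version = pinned if pinned is not None else max(versions)
    return version, versions[version]
```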
Evaluation Criteria
Experiments support custom evaluation criteria. Built-in criteria include: helpfulness, conciseness, tone, accuracy, and relevance. Define custom criteria as natural language descriptions that an evaluator model scores on a 1–5 scale.
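The evaluator-model flow can be sketched in two small helpers: one that builds a grading prompt for a criterion, and one that averages the resulting 1–5 scores. The prompt wording and score shapes here are assumptions, not Studio's actual evaluator:

```python
def evaluator_prompt(criterion, response):
    """Build a grading prompt for one criterion (wording is assumed)."""
    return (
        f"Rate the following response for {criterion} on a scale of 1-5. "
        f"Reply with only the number.\n\nResponse:\n{response}"
    )

def aggregate_scores(scores):
    """Average per-criterion 1-5 scores across test inputs.

    scores: {criterion: [score, score, ...]} -> {criterion: mean}
    """
    return {crit: sum(vals) / len(vals) for crit, vals in scores.items()}
```

An experiment with `eval_criteria: ["helpfulness", "conciseness", "tone"]` would issue one such grading call per criterion per test input, then aggregate.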
Use mockllm during prompt development to test template rendering and workflow logic without making real API calls.
Typical Tack Room Workflow
A typical prompt engineering workflow with Tack Room:
| Step | Action | API |
|---|---|---|
| 1. Draft | Create a prompt template with variables | POST /api/studio/templates |
| 2. Lint | Validate with promptlint module | Automatic on proxy |
| 3. Test | Run A/B experiment across models | POST /api/studio/experiments/run |
| 4. Evaluate | Review results and scores | GET /api/studio/experiments/{id} |
| 5. Benchmark | Measure latency and cost across providers | POST /api/studio/benchmarks/run |
| 6. Deploy | Pin the winning template version in production | PUT /api/proxy/modules/promptpad |
API Summary
| Method | Path | Description |
|---|---|---|
| GET | /api/studio/status | Studio app health and stats |
| GET | /api/studio/templates | List prompt templates |
| POST | /api/studio/templates | Create versioned template |
| POST | /api/studio/experiments/run | Run A/B experiment |
| GET | /api/studio/experiments/{id} | Get experiment results and scores |
| POST | /api/studio/benchmarks/run | Benchmark models on a prompt set |
All Tack Room endpoints require admin authentication. Templates created via the API are immediately available for use through the promptpad proxy module.
The /playground page provides a visual interface for testing templates and running experiments without curl commands.
Prompt Engineering Tips
When creating templates in Tack Room, keep these best practices in mind:
| Practice | Why It Matters |
|---|---|
| Use system prompts for role | More consistent behavior across models |
| Keep variables focused | Smaller variable surface = fewer edge cases |
| Test with multiple models | Prompts that work on GPT-4o may need tweaks for Claude |
| Version every change | Tack Room auto-versions; never edit in place |
| Run experiments before deploy | A/B test catches regressions that manual review misses |
For the full Tack Room API reference, see API Reference: Tack Room.