Deployment
This guide covers deploying an Agentrail server to production — from environment variables and Docker setup to logging, security, and horizontal scaling considerations.
Environment Variables
Never put secrets in config files. Use environment variables for all credentials:
| Variable | Required | Purpose |
|---|---|---|
| ANTHROPIC_API_KEY | If using Anthropic | LLM provider API key |
| OPENAI_API_KEY | If using OpenAI | LLM provider API key |
| TAVILY_API_KEY | If using web search | Tavily search integration |
| BRAVE_SEARCH_API_KEY | If using web search | Brave Search integration |
| JINA_API_KEY | If using web search | Jina Search integration |
| AGENTRAIL_DATA_DIR | Recommended | Root data directory for sessions, KB, skills |
| AGENTRAIL_CONFIG_PATH | Optional | Override config file location |
| UI_SECRET_TOKEN | If UI is public | Bearer token for /api/* routes |
| PORT | Optional | Server port (default: 3000) |
| SANDBOX_IMAGE | Optional | Override sandbox Docker image |
Non-sensitive settings (timeouts, feature flags, model IDs) can stay in agentrail.yaml.
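As a minimal sketch of how the "Required" column above might be enforced at startup, the check below reports which variables are missing for the features you have enabled. The `findMissingEnv` helper and the `Features` flags are hypothetical names used for illustration; they are not part of Agentrail.

```typescript
// Hypothetical startup check; names below are illustrative, not Agentrail APIs.
type Env = Record<string, string | undefined>;

interface Features {
  anthropic: boolean; // using Anthropic as the LLM provider
  openai: boolean; // using OpenAI as the LLM provider
  publicUi: boolean; // UI reachable from outside a trusted network
}

// Returns the names of variables that the enabled features require
// but that are unset, empty, or whitespace-only in the given environment.
function findMissingEnv(env: Env, features: Features): string[] {
  const required: string[] = [];
  if (features.anthropic) required.push("ANTHROPIC_API_KEY");
  if (features.openai) required.push("OPENAI_API_KEY");
  if (features.publicUi) required.push("UI_SECRET_TOKEN");
  return required.filter((name) => !env[name]?.trim());
}
```

At startup you could call `findMissingEnv(process.env, { ... })` and exit with a clear error message if the returned list is not empty, rather than failing later on the first provider call.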
Docker Compose
A production-ready docker-compose.yml:
version: "3.8"
services:
  server:
    build: .
    ports:
      - "3000:3000"
    environment:
      - NODE_ENV=production
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - AGENTRAIL_DATA_DIR=/data/agentrail
      - AGENTRAIL_CONFIG_PATH=/app/config/agentrail.yaml
      - UI_SECRET_TOKEN=${UI_SECRET_TOKEN}
    volumes:
      - agentrail-data:/data/agentrail
      - /var/run/docker.sock:/var/run/docker.sock
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 5s
      retries: 3
  sandbox:
    image: ghcr.io/yai-dev/agentrail-sandbox:latest
    # Pre-pulls the image so it is ready when first needed.
    command: ["echo", "image ready"]
volumes:
  agentrail-data:
Data Directory
All persistent data lives under dataDir:
{dataDir}/
tenants/
{tenantId}/
sessions/ # conversation history and compaction archives
knowledge_bases/ # knowledge base indexes
users/ # user memory notes
skills/ # skill packages
In production, always mount dataDir as a persistent volume. Data written here must survive container restarts.
Set the path via environment variable in your app:
const dataDir = process.env.AGENTRAIL_DATA_DIR ?? `${process.env.HOME}/.agentrail`;
const sessionStore = new SessionManager(dataDir);
const knowledgeManager = new KnowledgeManager(dataDir);
const sandboxManager = new SandboxManager(dataDir, { image: config.sandbox.image });
Health Check
Add a health endpoint for load balancer and orchestrator probes:
app.get("/health", (c) => c.json({ status: "ok", uptime: process.uptime() }));
Use separate liveness and readiness probes if your orchestrator supports them:
- Liveness: return 200 if the process is running
- Readiness: return 200 only once startup tasks (e.g. image pre-pull) are complete
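The liveness/readiness split above can be sketched as a small state tracker. `createReadiness` and the task name used in the usage note are hypothetical, chosen for illustration; they are not Agentrail APIs.

```typescript
// Hypothetical readiness tracker; not an Agentrail API.
type ProbeResult = {
  status: 200 | 503;
  body: { status: string; pending: string[] };
};

function createReadiness(tasks: string[]) {
  const pending = new Set(tasks);
  return {
    // Mark one startup task (for example "sandbox-image-pull") as finished.
    done(task: string): void {
      pending.delete(task);
    },
    // Liveness: the process is running and able to answer, so always 200.
    liveness(): ProbeResult {
      return { status: 200, body: { status: "ok", pending: [] } };
    },
    // Readiness: 200 only once every startup task has completed, 503 until then.
    readiness(): ProbeResult {
      const left = Array.from(pending);
      return left.length === 0
        ? { status: 200, body: { status: "ready", pending: [] } }
        : { status: 503, body: { status: "starting", pending: left } };
    },
  };
}
```

In a route handler you would return the tracker's result with `c.json(result.body, result.status)`, and call `done("sandbox-image-pull")` once the pre-pull has finished. The route paths you expose these on are up to your orchestrator's probe configuration.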
Logging
Agentrail does not bundle a logging library. Use the structured logger of your choice and instrument key lifecycle points:
import { createChatRoute } from "@agentrail/app/advanced";
app.route(
  "/chat",
  createChatRoute({
    // ...
    plugins: [
      {
        name: "request-logger",
        onRequestStart(ctx) {
          console.log(
            JSON.stringify({
              level: "info",
              event: "request_start",
              kind: ctx.kind,
              agentId: ctx.agentId,
              sessionId: ctx.sessionId,
              tenantId: ctx.tenantId,
            }),
          );
        },
        onRequestEnd(ctx) {
          console.log(
            JSON.stringify({
              level: "info",
              event: "request_end",
              sessionId: ctx.sessionId,
            }),
          );
        },
      },
    ],
  }),
);
Pipe the server's stdout/stderr to your logging infrastructure (CloudWatch, Datadog, GCP Logging, etc.).
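If you prefer not to repeat `JSON.stringify` in every hook, a small helper can keep log lines consistent and add a timestamp. This is only a sketch: `formatLogLine` and `logEvent` are hypothetical names, and a dedicated structured logger would replace them in most production setups.

```typescript
// Hypothetical helper; swap in your structured logger of choice.
type Level = "debug" | "info" | "warn" | "error";

// Builds one JSON log line. Fields whose value is undefined are dropped by
// JSON.stringify, so optional context such as tenantId can be passed as-is.
// Fields are spread last, so avoid reusing the keys ts, level, or event.
function formatLogLine(
  level: Level,
  event: string,
  fields: Record<string, unknown> = {},
  now: Date = new Date(),
): string {
  return JSON.stringify({ ts: now.toISOString(), level, event, ...fields });
}

// Writes errors to stderr and everything else to stdout.
function logEvent(level: Level, event: string, fields?: Record<string, unknown>): void {
  const line = formatLogLine(level, event, fields);
  if (level === "error") console.error(line);
  else console.log(line);
}
```

Inside a plugin hook this would be called as `logEvent("info", "request_start", { sessionId: ctx.sessionId })`. One JSON object per line is the shape most log collectors parse without extra configuration.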
Security
API Authentication
Protect /api/* routes with a Bearer token when the server is publicly accessible:
app.use("/api/*", async (c, next) => {
  const token = process.env.UI_SECRET_TOKEN;
  if (!token) return next(); // skip auth in dev
  if (c.req.header("Authorization") !== `Bearer ${token}`) {
    return c.json({ error: "Unauthorized" }, 401);
  }
  return next();
});
CORS
If the API is consumed by a browser from a different origin, configure CORS explicitly:
import { cors } from "hono/cors";
app.use(
  "/api/*",
  cors({
    origin: process.env.ALLOWED_ORIGIN ?? "http://localhost:5173",
    allowMethods: ["GET", "POST", "DELETE"],
    allowHeaders: ["Content-Type", "Authorization"],
    exposeHeaders: ["X-Session-Id"],
  }),
);
Note: expose X-Session-Id so the browser client can read the session ID from stream responses.
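If more than one browser origin needs access, an allow-list is one way to keep the origin explicit. The sketch below assumes a comma-separated `ALLOWED_ORIGINS` variable, which is a hypothetical name and not one Agentrail defines; both helper functions are likewise illustrative.

```typescript
// Hypothetical helpers: ALLOWED_ORIGINS is an assumed comma-separated variable,
// not one defined by Agentrail.
function parseAllowedOrigins(raw: string | undefined): string[] {
  return (raw ?? "")
    .split(",")
    .map((o) => o.trim().replace(/\/+$/, "")) // drop whitespace and trailing slashes
    .filter((o) => o.length > 0);
}

// Returns the request origin when it is on the allow-list, otherwise null so
// that the origin is not echoed back in Access-Control-Allow-Origin.
function resolveOrigin(allowed: string[], requestOrigin: string): string | null {
  return allowed.includes(requestOrigin) ? requestOrigin : null;
}
```

Hono's `cors` middleware accepts a function for `origin`, so this could be wired in as `origin: (origin) => resolveOrigin(allowed, origin)`, with `allowed` computed once at startup from `parseAllowedOrigins(process.env.ALLOWED_ORIGINS)`. Check the behavior against the Hono version you run.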
Secrets Rotation
API keys can be rotated without restarting the server if you load them lazily (e.g. from a secrets manager) at request time rather than injecting them into a static AgentConfig at startup.
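A minimal sketch of lazy loading, assuming the secret can be re-read on demand: the getter below caches the value for a short TTL and reloads it afterwards, so a rotated key is picked up without a restart. `createLazySecret` is a hypothetical helper, and the loader is kept synchronous for brevity; a secrets-manager client would typically need an async variant.

```typescript
// Hypothetical helper: caches a secret for ttlMs and reloads it once the
// cached copy is older than that. `load` is whatever reads the current value
// (for example a file mounted by your orchestrator).
function createLazySecret(
  load: () => string,
  ttlMs: number,
  now: () => number = Date.now, // injectable clock, mainly for tests
): () => string {
  let cached: { value: string; loadedAt: number } | undefined;
  return () => {
    const t = now();
    if (!cached || t - cached.loadedAt >= ttlMs) {
      cached = { value: load(), loadedAt: t };
    }
    return cached.value;
  };
}
```

For example, `const getKey = createLazySecret(() => readFileSync(secretPath, "utf8").trim(), 60_000)` (with `readFileSync` from `node:fs` and `secretPath` pointing at wherever your platform mounts the secret) would re-read the file at most once a minute. Call `getKey()` at request time instead of copying the value into a static config object.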
Sandbox
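The sandbox requires Docker. See the Docker Compose section above for the socket mount. Mounting /var/run/docker.sock gives the server container control over the host's Docker daemon, which is effectively root-level access to the host, so limit who can reach the server accordingly.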
Pin the sandbox image to a specific version tag in production:
environment:
  - SANDBOX_IMAGE=ghcr.io/yai-dev/agentrail-sandbox:v1.2.3
Set an idle timeout to automatically destroy containers that have been inactive:
const sandboxManager = new SandboxManager(dataDir, {
  image: process.env.SANDBOX_IMAGE ?? "ghcr.io/yai-dev/agentrail-sandbox:latest",
  idleTimeoutMs: 20 * 60 * 1000, // 20 minutes
});
On graceful shutdown, destroy all running containers before exiting:
async function shutdown() {
  await runPluginLifecycle(plugins, "stop");
  await sandboxManager.destroyAll();
  process.exit(0);
}
process.on("SIGTERM", () => void shutdown());
process.on("SIGINT", () => void shutdown());
Horizontal Scaling
The default SessionManager writes to the local filesystem, which ties sessions to one server instance. For horizontal scaling:
Option 1 — Shared filesystem (NFS, EFS, etc.)
Mount the same dataDir path on all instances. This works for moderate scale with a small number of instances.
Option 2 — Custom AgentrailSessionStore (advanced)
Implement AgentrailSessionStore for cases where a shared filesystem is not available. The interface is designed so this swap requires no changes to route or profile code. See Build a Storage Backend for the full implementation guide.
See Configure Sessions for the custom store implementation guide.
Note: The sandbox creates Docker containers on the local host and cannot be shared across instances directly. For multi-instance deployments with sandbox support, consider routing requests for the same session to the same instance (sticky sessions at the load balancer), or using a remote Docker daemon.
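To illustrate the sticky-session idea, the sketch below maps a session ID to one instance deterministically by hashing it. `pickInstance` is a hypothetical helper for a custom router or proxy; most load balancers offer cookie- or header-based affinity that achieves the same result without custom code.

```typescript
import { createHash } from "node:crypto";

// Hypothetical routing helper: maps a session ID to one of N instances so the
// same session always lands on the instance that holds its sandbox container.
function pickInstance(sessionId: string, instances: string[]): string {
  if (instances.length === 0) throw new Error("no instances configured");
  const digest = createHash("sha256").update(sessionId).digest();
  // Interpret the first 4 bytes of the hash as an unsigned 32-bit integer.
  const bucket = digest.readUInt32BE(0) % instances.length;
  return instances[bucket];
}
```

Plain modulo hashing remaps most sessions whenever the instance list changes, so existing sandbox containers can be orphaned during a scale-up or scale-down. Consistent hashing or load-balancer affinity reduces that churn.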
Production Checklist
- [ ] API keys are set via environment variables, not in config files
- [ ] AGENTRAIL_DATA_DIR points to a persistent volume
- [ ] Docker socket is mounted for sandbox operations
- [ ] SANDBOX_IMAGE pins a specific version tag (not latest)
- [ ] UI_SECRET_TOKEN is set if the server is publicly accessible
- [ ] CORS is configured with an explicit origin
- [ ] A /health endpoint is registered
- [ ] Graceful shutdown destroys sandbox containers
- [ ] Log output is captured by your logging infrastructure
- [ ] Idle sandbox timeout is tuned for your workload
- [ ] Session storage strategy is chosen for your scaling requirements