Troubleshooting
Common issues and how to resolve them.
LLM Provider Errors
"Provider not found" or ProviderNotFoundError
You forgot to register the built-in providers. Add this import before any agent code runs:
import "@agentrail/runtime-core/providers";
This registers both the Anthropic and OpenAI providers.
"401 Unauthorized" or "Invalid API key"
- Check that the correct environment variable is set (ANTHROPIC_API_KEY or OPENAI_API_KEY).
- Make sure the key is not expired or revoked.
- If using a custom baseUrl, confirm the endpoint accepts your key format.
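The first check in the list above can be scripted. The following is a minimal sketch, not part of the AgentRail API: checkApiKey and KEY_VARS are hypothetical names, and the only facts taken from this page are the two environment variable names. It catches the common cases of a missing key, an empty key, and a key pasted with stray whitespace.

```typescript
// Hypothetical helper: maps each built-in provider to the environment
// variable named in the checklist above.
const KEY_VARS: Record<string, string> = {
  anthropic: "ANTHROPIC_API_KEY",
  openai: "OPENAI_API_KEY",
};

type Env = Record<string, string | undefined>;

// Returns a description of the problem, or null if the key looks usable.
// This cannot tell whether a key is expired or revoked; only the provider can.
function checkApiKey(provider: string, env: Env): string | null {
  const varName = KEY_VARS[provider];
  if (varName === undefined) {
    return `no known key variable for provider "${provider}"`;
  }
  const value = env[varName];
  if (value === undefined || value.trim() === "") {
    return `${varName} is not set or is empty`;
  }
  if (value !== value.trim()) {
    return `${varName} has leading or trailing whitespace`;
  }
  return null;
}
```

In a Node.js process you would pass process.env as the second argument, for example checkApiKey("anthropic", process.env).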
Wrong model or provider
The model.provider field must match a registered provider name exactly:
- "anthropic": for Claude models
- "openai": for GPT models (and compatible APIs)
Check for typos. The error message will list available providers.
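To illustrate what "exactly" means here, the sketch below mimics an exact-match lookup. It is an assumption about how such a registry could behave, not the runtime's actual code: the registered set, resolveProvider, and the error wording are all hypothetical.

```typescript
// Hypothetical stand-in for the runtime's provider registry.
const registered = new Set<string>(["anthropic", "openai"]);

// Exact, case-sensitive lookup. No trimming or lowercasing is applied,
// so "Anthropic" and "openai " are both rejected.
function resolveProvider(name: string): string {
  if (registered.has(name)) {
    return name;
  }
  const available = Array.from(registered).sort().join(", ");
  throw new Error(
    `Provider not found: "${name}". Available providers: ${available}`
  );
}
```

Under this model, a capitalized name or a trailing space fails in the same way as a completely unknown provider, which is why the list of available providers in the error is the quickest thing to compare against.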
Sandbox Issues
"Cannot connect to Docker daemon"
The sandbox requires a running Docker daemon. Verify:
docker info
If Docker is not running, start it before launching the server.
"Image not found" or slow first start
The sandbox image must be pulled or built locally:
# Pull the published image
docker pull ghcr.io/yai-dev/agentrail-sandbox:latest
# Or build locally
docker build -t agentrail-sandbox:latest docker/sandbox
Then make sure sandbox.image in your config matches the image name.
Sandbox container exits immediately
Check the container logs:
docker logs <container-id>
Common causes:
- Port conflict (another process on the same port)
- Missing environment variables inside the container
- Insufficient memory allocated to Docker
Session and Storage Issues
Corrupted session (parse errors on startup)
If session.jsonl or messages.jsonl is corrupted (e.g. partial write during a crash), you have two options:
- Delete the session: remove the session directory under <dataDir>/tenants/<tenantId>/sessions/<sessionId>/
- Manual repair: open the JSONL file, find the malformed line (usually the last one), and remove it
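For the manual repair option, a small script can locate the malformed line for you. The sketch below works on the file contents as a string and assumes only that each non-blank line should be a standalone JSON value; findMalformedLines and dropMalformedLines are illustrative names, not AgentRail utilities. Back up the file before writing any repaired output over it.

```typescript
// Returns the 1-based line numbers of lines that are not valid JSON.
// Blank lines are skipped, since a trailing newline is normal in JSONL.
function findMalformedLines(jsonl: string): number[] {
  const bad: number[] = [];
  jsonl.split(/\r?\n/).forEach((line, index) => {
    if (line.trim() === "") return;
    try {
      JSON.parse(line);
    } catch {
      bad.push(index + 1);
    }
  });
  return bad;
}

// Drops the malformed lines and keeps every other line unchanged.
function dropMalformedLines(jsonl: string): string {
  const bad = new Set(findMalformedLines(jsonl));
  return jsonl
    .split(/\r?\n/)
    .filter((_line, index) => !bad.has(index + 1))
    .join("\n");
}
```

If findMalformedLines reports anything other than the final line, the damage is probably not a simple partial write, and deleting the session may be the safer option.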
"Unable to find config/agentrail.yaml"
The config loader searches upward from the current working directory. Make sure you are running the server from the project root, or set the AGENTRAIL_CONFIG_PATH environment variable to an absolute path.
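To see why the working directory matters, the sketch below lists the paths an upward search would try. It is a simplified model that assumes POSIX-style absolute paths and assumes the loader looks for config/agentrail.yaml in each ancestor directory; candidateConfigPaths is a hypothetical name and the real loader may behave differently.

```typescript
// Lists the paths an upward search would try, nearest directory first.
// Assumes POSIX-style absolute paths; this is a model, not the real loader.
function candidateConfigPaths(
  cwd: string,
  relative: string = "config/agentrail.yaml"
): string[] {
  const parts = cwd.split("/").filter((part) => part !== "");
  const candidates: string[] = [];
  for (let depth = parts.length; depth >= 0; depth--) {
    const dir = "/" + parts.slice(0, depth).join("/");
    candidates.push(dir === "/" ? `/${relative}` : `${dir}/${relative}`);
  }
  return candidates;
}
```

If the server is started from a directory that is not inside the project tree, none of these candidates will exist, which is the case where setting AGENTRAIL_CONFIG_PATH helps.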
Orchestration Issues
Sub-agent never responds
Check these settings in agentrail.yaml:
orchestration:
  subagent:
    pollIntervalMs: 500 # lower = faster polling, higher CPU
    fakeExecution: "" # must be empty for real execution
If fakeExecution is set to "echo", the sub-agent will echo inputs without calling the LLM.
Stale orchestration state after code changes
If you changed orchestration logic but the manager replays old events on startup, the state may be inconsistent. For development, delete the session directory to start fresh:
rm -rf <dataDir>/tenants/<tenantId>/sessions/<sessionId>/
Wait condition never resolves
- Verify the target agentIds are correct
- Check kind: use "agent-idle" to wait for job completion, "agent-closed" to wait for termination
- If using match: "all", all listed agents must satisfy the condition
- Check timeoutAt if you set one: the wait resolves with timed_out status when the deadline passes
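The checklist above can be read as a small decision procedure. The sketch below is one possible model of it, written to show why a mistyped agent ID or an unexpected match mode leaves a wait pending. It is not the orchestrator's implementation. The field names agentIds, kind, match and timeoutAt and the timed_out status come from this page; the "any" match mode, the agent state names, the "pending" and "satisfied" statuses, and treating timeoutAt as epoch milliseconds are all assumptions.

```typescript
type WaitKind = "agent-idle" | "agent-closed";
type AgentState = "running" | "idle" | "closed"; // hypothetical state names

interface WaitCondition {
  agentIds: string[];
  kind: WaitKind;
  match: "all" | "any"; // "any" is assumed; only "all" appears on this page
  timeoutAt?: number; // assumed to be epoch milliseconds
}

type WaitResult = "pending" | "satisfied" | "timed_out";

function evaluateWait(
  cond: WaitCondition,
  states: Record<string, AgentState | undefined>,
  now: number
): WaitResult {
  const wanted: AgentState = cond.kind === "agent-idle" ? "idle" : "closed";
  // An ID that is missing from `states` (for example, a typo) never matches.
  const hits = cond.agentIds.filter((id) => states[id] === wanted).length;
  const satisfied =
    cond.agentIds.length > 0 &&
    (cond.match === "all" ? hits === cond.agentIds.length : hits > 0);
  if (satisfied) return "satisfied";
  if (cond.timeoutAt !== undefined && now >= cond.timeoutAt) return "timed_out";
  return "pending";
}
```

In this model, a single wrong ID under match: "all" keeps the result at "pending" indefinitely unless timeoutAt is set, which matches the symptom described in this section.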
Build and Development Issues
TypeScript errors after pulling changes
pnpm install
pnpm build:packages
pnpm typecheck
The workspace uses project references. A fresh build resolves most type resolution issues.
Tests fail with "module not found"
Make sure packages are built before running tests:
pnpm build:packages && pnpm test
Still Stuck?
If none of the above helps:
- Search existing issues
- Open a bug report with reproduction steps