Automate with MCP write tools
I want my coding agent (Cursor / Devin / Codex / Claude Code) to capture failures as tests programmatically without going through the UI. The four MCP write tools — snapshots.capture, runs.trigger, settings.update, cluster.dismiss — close that loop.
Sprint 14 shipped 4 MCP read tools. Sprint 15 ships 4 MCP write tools, gated by the per-tool permission tier system from P10 of the AI-native principles ADR. The promise is: if you can click it, you can curl it — your coding agent now has the same write surface a human has via the UI, with every mutation logged to audit_log and surfaced in the Settings tab.
The four write tools:
| Tool | What it does | Required tier |
|---|---|---|
| snapshots.capture | Capture a trace as a regression-test snapshot | propose-only |
| runs.trigger | Queue a suite run | propose-only |
| settings.update | Apply a YAML diff to an agent's settings | propose-only |
| cluster.dismiss | Mark a failure cluster as dismissed | propose-only |
Permission tiers
Every write tool dispatch threads a CallerContext through lib/mcp/permission-tiers.ts. Three tiers, ordered least → most restrictive:
- auto-allowed: the action runs immediately without human approval. Reads sit here, plus RJ's self-maintenance (clustering, headline regen). No user-visible writes are tagged auto-allowed in v1.
- propose-only: the action MAY run but produces a proposal the user has to one-click commit (or the call writes an audit row that the UI exposes as a <ProposedAction> card). All 4 Sprint 15 write tools sit here.
- read-only: the tool is registered but the dispatcher rejects the call with a permission-denied JSON-RPC error (-32001). Used for "block this tool entirely for this caller" (e.g. a read-only API token or demo-mode session).
A caller may invoke a tool when rank(callerTier) <= rank(toolTier). In v1, every authenticated user defaults to propose-only for writes. Sprint 16+ will pull per-token / per-org tiers from a dedicated table.
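The comparison above can be sketched in a few lines of TypeScript. This is an illustration only, not the contents of lib/mcp/permission-tiers.ts: the numeric ranks, the function names and the error message text are assumptions, and the sketch models write-tool dispatch only.

```typescript
// Tier names come from the text; the numeric ranks are an ASSUMPTION
// that follows the stated ordering (least to most restrictive).
type Tier = "auto-allowed" | "propose-only" | "read-only";

const TIER_RANK: Record<Tier, number> = {
  "auto-allowed": 0,
  "propose-only": 1,
  "read-only": 2,
};

function rank(tier: Tier): number {
  return TIER_RANK[tier];
}

// A caller may invoke a tool when rank(callerTier) <= rank(toolTier).
function canInvoke(callerTier: Tier, toolTier: Tier): boolean {
  return rank(callerTier) <= rank(toolTier);
}

// Hypothetical helper: returns a JSON-RPC style error for a rejected call,
// or null when the call may proceed. -32001 is the code named in the text;
// the message wording is made up for this sketch.
function permissionError(
  callerTier: Tier,
  toolTier: Tier,
): { code: number; message: string } | null {
  if (canInvoke(callerTier, toolTier)) return null;
  return {
    code: -32001,
    message: "permission_denied: caller tier " + callerTier + " cannot invoke a " + toolTier + " tool",
  };
}
```

Under these assumptions a read-only caller is rejected by every propose-only write tool, while the default propose-only caller passes.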
snapshots.capture — capture a trace as a test
The typical pattern: your agent runs the suite, sees a CHANGED-UNEXPECTED verdict on a new trace, asks the user to confirm, then calls snapshots.capture to lock the trace as a baseline.
curl -s -X POST https://runtime-judgement-app.vercel.app/api/mcp \
  -H "Authorization: Bearer $RJ_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
      "name": "snapshots.capture",
      "arguments": {
        "agentId": "01HZAGENT...",
        "traceId": "01HZTRACE...",
        "name": "Refund flow: empty result on order #243",
        "suite": "refunds-must-pass",
        "verdictCriterion": "search-tool must not return empty for valid order ids"
      }
    }
  }' | jq '.result.structuredContent'
Response (success):
{
  "ok": true,
  "snapshotId": "01HZSNAP...",
  "auditId": "01HZAUDIT..."
}
lib/mcp/tools/snapshots-capture.ts resolves the trace's most recent attribution server-side, hands off to createSnapshot(), and writes an audit_log row with action snapshot_captured. verdictCriterion is stashed on the audit payload (not on snapshots — the schema asserts on cited spans + L1/L4, which are structural) so the human-readable criterion surfaces in the audit-log UI without affecting verdict computation.
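For agents written in TypeScript rather than shell, the same request could be assembled with a small typed helper. The sketch below is an assumption-laden illustration: buildToolCall and the SnapshotCaptureArgs type are hypothetical names, and the argument fields are taken only from the curl example above (with verdictCriterion treated as optional).

```typescript
// Generic JSON-RPC envelope for an MCP tools/call request.
type ToolCallRequest<A> = {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: A };
};

// Field names mirror the curl example; this type is a sketch, not the
// server's schema.
type SnapshotCaptureArgs = {
  agentId: string;
  traceId: string;
  name: string;
  suite: string;
  verdictCriterion?: string;
};

// Hypothetical helper: wraps tool arguments in the JSON-RPC envelope.
function buildToolCall<A extends object>(
  name: string,
  args: A,
  id: number = 1,
): ToolCallRequest<A> {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

const captureRequest = buildToolCall<SnapshotCaptureArgs>("snapshots.capture", {
  agentId: "01HZAGENT...",
  traceId: "01HZTRACE...",
  name: "Refund flow: empty result on order #243",
  suite: "refunds-must-pass",
  verdictCriterion: "search-tool must not return empty for valid order ids",
});

// The serialized body can be POSTed to /api/mcp with any HTTP client,
// using the same Authorization and Content-Type headers as the curl call.
const captureBody: string = JSON.stringify(captureRequest);
```

Typing the arguments per tool lets the compiler catch a missing traceId or a misspelled field before the request leaves the agent.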
runs.trigger — queue a suite run
curl -s -X POST https://runtime-judgement-app.vercel.app/api/mcp \
  -H "Authorization: Bearer $RJ_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
      "name": "runs.trigger",
      "arguments": {
        "agentId": "01HZAGENT...",
        "suiteId": "01HZSUITE...",
        "triggerKind": "manual",
        "triggerLabel": "post-cursor-PR-243"
      }
    }
  }' | jq '.result.structuredContent'
Response:
{
  "ok": true,
  "runId": "01HZRUN...",
  "auditId": "01HZAUDIT..."
}
lib/mcp/tools/runs-trigger.ts inserts a single snapshot_runs row with verdict='PENDING' (the value added in migration 0030) and writes a run_triggered audit entry. The orchestrator (Sprint 15) picks up the audit row, fans the placeholder out across every snapshot in the suite, then stamps the final verdict.
Poll the run with runs.get (or stream via /api/runs/[id]/stream) until verdict !== "PENDING".
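A polling loop along these lines might look as follows. It is a minimal sketch under stated assumptions: the response of runs.get is assumed to carry a verdict string, the fetcher is injected as a parameter (so the loop stays independent of any HTTP client), and the names pollRun, GetRun, intervalMs and maxAttempts are all hypothetical.

```typescript
// ASSUMPTION: runs.get returns at least a verdict field.
type RunSummary = { verdict: string };

// Injected fetcher, for example a wrapper around a runs.get tool call.
type GetRun = (runId: string) => Promise<RunSummary>;

function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// Polls until the verdict is no longer "PENDING", or gives up after
// maxAttempts polls so a stuck orchestrator cannot hang the agent forever.
async function pollRun(
  getRun: GetRun,
  runId: string,
  opts: { intervalMs?: number; maxAttempts?: number } = {},
): Promise<RunSummary> {
  const intervalMs = opts.intervalMs ?? 2000;
  const maxAttempts = opts.maxAttempts ?? 150;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const summary = await getRun(runId);
    if (summary.verdict !== "PENDING") return summary;
    // Skip the wait after the final attempt.
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  throw new Error("run " + runId + " still PENDING after " + maxAttempts + " polls");
}
```

The attempt cap is a design choice of the sketch: without it, a run whose placeholder never leaves PENDING would keep the caller polling indefinitely.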
settings.update — apply a YAML diff
The same YAML round-trip the Settings page uses, exposed as an MCP write. Validates via validateSettingsYaml(), then delegates to saveAgentSettings() which writes its own audit_log row.
YAML='agent:
  id: 01HZAGENT...
  name: cx-support
policies:
  pr_gate_required: true
  slo_floor: 0.95
integrations:
  slack:
    channels:
      - "#cx-alerts"
'
curl -s -X POST https://runtime-judgement-app.vercel.app/api/mcp \
  -H "Authorization: Bearer $RJ_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d "$(jq -nc --arg yaml "$YAML" '{
    jsonrpc: "2.0", id: 1,
    method: "tools/call",
    params: { name: "settings.update", arguments: { agentId: "01HZAGENT...", yaml: $yaml } }
  }')" | jq '.result.structuredContent'
Response includes the parsed AgentSettings shape + the audit id. saveAgentSettings() already writes the settings_updated audit row (with the full from/to diff and tier: propose-only); the MCP wrapper just propagates the audit id rather than double-logging.
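The delegate-and-propagate design described above can be illustrated with a short sketch. Everything here is hypothetical: the dependency signatures only stand in for validateSettingsYaml() and saveAgentSettings(), whose real shapes are not shown in this page, and AgentSettings is reduced to an opaque record.

```typescript
// ASSUMPTION: the real AgentSettings shape is not shown; use an opaque record.
type AgentSettings = Record<string, unknown>;

type ValidateResult =
  | { ok: true; settings: AgentSettings }
  | { ok: false; error: string };

// Stand-ins for validateSettingsYaml() and saveAgentSettings().
// These signatures are assumed for illustration.
type SettingsDeps = {
  validate: (yaml: string) => ValidateResult;
  save: (agentId: string, settings: AgentSettings) => Promise<{ settings: AgentSettings; auditId: string }>;
};

type UpdateResult =
  | { ok: true; settings: AgentSettings; auditId: string }
  | { ok: false; error: string };

// Hypothetical MCP wrapper: validate, delegate, and pass the audit id through.
async function settingsUpdate(
  deps: SettingsDeps,
  args: { agentId: string; yaml: string },
): Promise<UpdateResult> {
  const parsed = deps.validate(args.yaml);
  if (parsed.ok === false) return { ok: false, error: parsed.error };

  // save() is the only place an audit row is written; the wrapper does not
  // log again, it just returns the id that save() produced.
  const saved = await deps.save(args.agentId, parsed.settings);
  return { ok: true, settings: saved.settings, auditId: saved.auditId };
}
```

Keeping the audit write in one place means the UI and the MCP path produce identical audit rows for the same change.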
cluster.dismiss — dismiss a failure cluster
curl -s -X POST https://runtime-judgement-app.vercel.app/api/mcp \
  -H "Authorization: Bearer $RJ_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0", "id": 1,
    "method": "tools/call",
    "params": {
      "name": "cluster.dismiss",
      "arguments": {
        "clusterId": "01HZCLUSTER...",
        "reason": "Same root cause as the PII redaction work in PR #244; already resolved upstream."
      }
    }
  }' | jq '.result.structuredContent'
Marks the row dismissed (the cluster stays for audit; loadActiveClusters() filters it out). reason is optional free-text — not persisted on failure_clusters but stamped on the audit payload so the dismissed-with-reason history is queryable. Idempotent: a second dismiss is a no-op on the row but still writes a fresh audit entry so you can see every dismiss attempt.
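The idempotency behaviour can be modelled with an in-memory sketch. This is not the real implementation: ClusterStore, its method names and the shape of the audit entries are assumptions chosen to mirror the description (the row changes once, every attempt is audited, and the reason lives only on the audit entry).

```typescript
type Cluster = { id: string; dismissed: boolean };

// The reason is stored on the audit entry, not on the cluster row.
type DismissAudit = { action: "cluster_dismissed"; clusterId: string; reason: string | null };

// Hypothetical in-memory stand-in for failure_clusters plus audit_log.
class ClusterStore {
  private clusters = new Map<string, Cluster>();
  readonly audit: DismissAudit[] = [];

  add(id: string): void {
    this.clusters.set(id, { id, dismissed: false });
  }

  // Idempotent on the row: a repeat call changes nothing there,
  // but still appends a fresh audit entry.
  dismiss(clusterId: string, reason?: string): { changed: boolean } {
    const cluster = this.clusters.get(clusterId);
    if (cluster === undefined) throw new Error("cluster not found");
    const changed = !cluster.dismissed;
    cluster.dismissed = true;
    this.audit.push({ action: "cluster_dismissed", clusterId, reason: reason ?? null });
    return { changed };
  }

  // Mirrors the filtering described for loadActiveClusters():
  // dismissed rows stay stored but are not returned.
  loadActive(): Cluster[] {
    return Array.from(this.clusters.values()).filter((c) => !c.dismissed);
  }
}
```

Calling dismiss twice on the same cluster in this model leaves one dismissed row and two audit entries, which is the history the text says stays queryable.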
Audit log
Every write tool dispatch creates an audit_log row (migration 0029) tagged with:
- action: snapshot_captured · run_triggered · settings_updated · cluster_dismissed
- tier: the policy-required tier at dispatch time (propose-only for all 4 v1 writes)
- actor: the Clerk user id of the caller
- payload_jsonb: the structured input + any auxiliary context (e.g. verdictCriterion, reason, the YAML from/to diff)
The Settings tab's <AuditTimeline> (/app/agents/[id]/settings) renders these in reverse-chronological order, one row per action. From the API: GET /api/agents/[id]/audit returns the same list as JSON.
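A client consuming that JSON might type and order the rows roughly as below. The four tagged fields come from the list above; the created_at timestamp field and both helper functions are assumptions, since the response schema is not spelled out here.

```typescript
type AuditAction =
  | "snapshot_captured"
  | "run_triggered"
  | "settings_updated"
  | "cluster_dismissed";

type AuditRow = {
  action: AuditAction;
  tier: "auto-allowed" | "propose-only" | "read-only";
  actor: string; // Clerk user id of the caller
  payload_jsonb: Record<string, unknown>;
  created_at: string; // ASSUMPTION: an ISO-8601 timestamp field exists
};

// Newest first, matching the timeline's ordering. Does not mutate the input.
function reverseChronological(rows: AuditRow[]): AuditRow[] {
  return rows.slice().sort((a, b) => Date.parse(b.created_at) - Date.parse(a.created_at));
}

// Oldest first, useful for reconstructing what an agent did step by step.
function actionsInOrder(rows: AuditRow[]): AuditAction[] {
  return reverseChronological(rows)
    .reverse()
    .map((row) => row.action);
}
```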
Common pattern: Cursor PR → trigger run → capture on regress
// Pseudo-code for a Cursor extension hook
async function onPrOpened(prMeta) {
  // Queue the gated suite for this PR
  const { runId } = await rj.mcp.call("runs.trigger", {
    agentId: prMeta.rjAgentId,
    suiteId: prMeta.gatedSuiteId,
    triggerKind: "manual",
    triggerLabel: `cursor-pr-${prMeta.number}`,
  })

  // Poll until done (or stream via /api/runs/[id]/stream)
  const summary = await pollRunUntilFinal(runId)
  if (summary.verdict !== "regression") return

  // find() returns undefined when no test matches, so guard before reading traceId
  const changed = summary.perTest.find(t => t.verdict === "CHANGED-UNEXPECTED")
  if (!changed) return

  // Writes stay human-confirmed: ask before capturing
  const confirmed = await askUser("RJ regressed on a new failure pattern. Capture as test?")
  if (!confirmed) return

  await rj.mcp.call("snapshots.capture", {
    agentId: prMeta.rjAgentId,
    traceId: changed.traceId,
    name: `Regression from PR #${prMeta.number}`,
    suite: prMeta.gatedSuiteId,
  })
}
The audit log captures the full sequence — run_triggered → snapshot_captured — so the user can reconstruct what the agent did even if they weren't watching.
Pitfalls
-32001 permission_denied
Your caller tier is more restrictive than the tool requires. In v1, this almost always means the caller is read-only — check the token's tier (or the route's [MCP-TIER-DOWNGRADE] marker in app/api/mcp/route.ts). Read tools work; writes need a propose-only-or-better caller.
-32000 agent not found
Ownership is enforced at every write tool (getAgentForUser() or isOwner()). An agentId you don't own returns the same error as a missing agent so the API doesn't leak the existence-vs-ownership distinction. Confirm with rj agents list or agents.list.
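The no-leak behaviour can be sketched as a lookup that collapses "missing" and "not yours" into one result. This is an illustration under assumptions: lookupOwnedAgent is a hypothetical stand-in (not the real getAgentForUser() or isOwner()), the ownerId field is invented for the example, and the message text is a guess; only the -32000 code comes from the text.

```typescript
type Agent = { id: string; ownerId: string };

type AgentLookup =
  | { ok: true; agent: Agent }
  | { ok: false; error: { code: number; message: string } };

// Hypothetical stand-in for the ownership check.
function lookupOwnedAgent(
  agents: Map<string, Agent>,
  agentId: string,
  userId: string,
): AgentLookup {
  const agent = agents.get(agentId);
  // A missing agent and an agent owned by someone else take the same branch,
  // so the response cannot be used to probe which agent ids exist.
  if (agent === undefined || agent.ownerId !== userId) {
    return { ok: false, error: { code: -32000, message: "agent not found" } };
  }
  return { ok: true, agent };
}
```

Because both failure cases return byte-identical errors, the only way for a caller to tell them apart is to list the agents it actually owns.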
Settings update fails silently in the UI
settings.update returns the parsed AgentSettings on success — if you see the response but the Settings page hasn't refreshed, the headline cache may still be warm. The route invalidates it on every commit but a stale tab might show the old YAML; reload to confirm.
runs.trigger returns immediately but no run shows up
The tool only inserts the PENDING placeholder row and writes the run_triggered audit row; the orchestrator picks up the audit log and does the actual fan-out asynchronously. Watch audit_log (or the Settings audit timeline) for the matching entry; if it's there but the run stays PENDING and no fanned-out snapshot_runs rows materialise, the orchestrator is stuck. Sprint 15+'s orchestrator publishes structured logs to lib/observability/.
Next steps
- Read tools — the other half of the MCP surface: Use RJ from Claude Code via MCP
- Wire your repo so PR webhooks resolve to one agent: Connect your repo to an agent
- Authenticate the CLI without copy-pasting tokens: Log the rj CLI in
- The P10 principle that motivates the tier system: dev-docs/strategy/2026-05-26-adr-ai-native-principles.md (Agent-as-collaborator, not tool)