cookbook · intermediate · 10 min read · Updated 2026-05-27

Automate with MCP write tools

I want my coding agent (Cursor / Devin / Codex / Claude Code) to capture failures as tests programmatically without going through the UI. The four MCP write tools — snapshots.capture, runs.trigger, settings.update, cluster.dismiss — close that loop.

Sprint 14 shipped 4 MCP read tools. Sprint 15 ships 4 MCP write tools, gated by the per-tool permission tier system from P10 of the AI-native principles ADR. The promise is: if you can click it, you can curl it — your coding agent now has the same write surface a human has via the UI, with every mutation logged to audit_log and surfaced in the Settings tab.

The four write tools:

| Tool | What it does | Required tier |
| --- | --- | --- |
| snapshots.capture | Capture a trace as a regression-test snapshot | propose-only |
| runs.trigger | Queue a suite run | propose-only |
| settings.update | Apply a YAML diff to an agent's settings | propose-only |
| cluster.dismiss | Mark a failure cluster as dismissed | propose-only |

Permission tiers

Every write tool dispatch threads a CallerContext through lib/mcp/permission-tiers.ts. Three tiers, ordered least → most restrictive:

  • auto-allowed — the action runs immediately without human approval. Reads sit here, plus RJ's self-maintenance (clustering, headline regen). No user-visible writes are tagged auto-allowed in v1.
  • propose-only — the action MAY run but produces a proposal the user has to one-click commit (or the call writes an audit row that the UI exposes as a <ProposedAction> card). All 4 Sprint 15 write tools sit here.
  • read-only — the tool is registered but the dispatcher rejects the call with a permission-denied JSON-RPC error (-32001). Used for "block this tool entirely for this caller" (e.g. a read-only API token or demo-mode session).

A caller may invoke a tool when rank(callerTier) <= rank(toolTier). In v1, every authenticated user defaults to propose-only for writes. Sprint 16+ will pull per-token / per-org tiers from a dedicated table.
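
The comparison above can be sketched in a few lines of TypeScript. This is a minimal illustration, not the contents of lib/mcp/permission-tiers.ts: the names Tier, TIER_RANK, canInvoke, and checkPermission are assumptions made for this example, and the sketch encodes only the stated rule for the write tools. How read tools are treated for read-only callers is not covered here.

```typescript
// Hypothetical sketch; the real lib/mcp/permission-tiers.ts may differ.
type Tier = "auto-allowed" | "propose-only" | "read-only";

// Rank follows the ordering above: least restrictive (0) to most restrictive (2).
const TIER_RANK: Record<Tier, number> = {
  "auto-allowed": 0,
  "propose-only": 1,
  "read-only": 2,
};

// JSON-RPC error code this article gives for permission-denied.
const PERMISSION_DENIED = -32001;

// The stated rule: rank(callerTier) <= rank(toolTier).
function canInvoke(callerTier: Tier, toolTier: Tier): boolean {
  return TIER_RANK[callerTier] <= TIER_RANK[toolTier];
}

type DispatchResult =
  | { ok: true }
  | { ok: false; error: { code: number; message: string } };

// Returns a JSON-RPC style error object when the caller's tier is too restrictive.
function checkPermission(callerTier: Tier, toolTier: Tier): DispatchResult {
  if (canInvoke(callerTier, toolTier)) return { ok: true };
  return {
    ok: false,
    error: { code: PERMISSION_DENIED, message: "permission_denied" },
  };
}
```

Under this rule a propose-only caller passes the check for all four write tools, while a read-only caller is rejected with -32001.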


snapshots.capture — capture a trace as a test

The typical pattern: your agent runs the suite, sees a CHANGED-UNEXPECTED verdict on a new trace, asks the user to confirm, then calls snapshots.capture to lock the trace as a baseline.

curl -s -X POST https://runtime-judgement-app.vercel.app/api/mcp \
  -H "Authorization: Bearer $RJ_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
      "name": "snapshots.capture",
      "arguments": {
        "agentId": "01HZAGENT...",
        "traceId": "01HZTRACE...",
        "name": "Refund flow — empty result on order #243",
        "suite": "refunds-must-pass",
        "verdictCriterion": "search-tool must not return empty for valid order ids"
      }
    }
  }' | jq '.result.structuredContent'

Response (success):

{
  "ok": true,
  "snapshotId": "01HZSNAP...",
  "auditId": "01HZAUDIT..."
}

lib/mcp/tools/snapshots-capture.ts resolves the trace's most recent attribution server-side, hands off to createSnapshot(), and writes an audit_log row with action snapshot_captured. verdictCriterion is stashed on the audit payload (not on snapshots — the schema asserts on cited spans + L1/L4, which are structural) so the human-readable criterion surfaces in the audit-log UI without affecting verdict computation.
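
For agents that call the endpoint from code rather than a shell, the same request can be built in TypeScript. The sketch below mirrors the curl body above; buildToolCall and callTool are hypothetical helper names, and the error handling assumes the server returns a standard JSON-RPC 2.0 error object with code and message.

```typescript
// Shape of a JSON-RPC 2.0 "tools/call" request, mirroring the curl body above.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Pure helper: builds the request body for any MCP tool.
function buildToolCall(
  name: string,
  args: Record<string, unknown>,
  id = 1,
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Hypothetical helper: posts the request and returns result.structuredContent,
// the same field the curl examples extract with jq.
async function callTool(
  endpoint: string,
  token: string,
  name: string,
  args: Record<string, unknown>,
): Promise<unknown> {
  const res = await fetch(endpoint, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(buildToolCall(name, args)),
  });
  const json = await res.json();
  if (json.error) {
    throw new Error(`MCP error ${json.error.code}: ${json.error.message}`);
  }
  return json.result?.structuredContent;
}
```

Because JSON.stringify escapes newlines and quotes, the same helper also works for multi-line string arguments such as the yaml field of settings.update.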


runs.trigger — queue a suite run

curl -s -X POST https://runtime-judgement-app.vercel.app/api/mcp \
  -H "Authorization: Bearer $RJ_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
      "name": "runs.trigger",
      "arguments": {
        "agentId": "01HZAGENT...",
        "suiteId": "01HZSUITE...",
        "triggerKind": "manual",
        "triggerLabel": "post-cursor-PR-243"
      }
    }
  }' | jq '.result.structuredContent'

Response:

{
  "ok": true,
  "runId": "01HZRUN...",
  "auditId": "01HZAUDIT..."
}

lib/mcp/tools/runs-trigger.ts inserts a single snapshot_runs row with verdict='PENDING' (the value added in migration 0030) and writes a run_triggered audit entry. The orchestrator (Sprint 15) picks up the audit row, fans the placeholder out across every snapshot in the suite, then stamps the final verdict.

Poll the run with runs.get (or stream via /api/runs/[id]/stream) until verdict !== "PENDING".
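
A polling loop for this could look like the sketch below. It assumes only that the run object exposes a verdict field; pollUntilSettled, the getRun callback (for example a wrapper around runs.get), and the interval and attempt defaults are all illustrative choices, not part of the documented API.

```typescript
// Assumed shape: only `verdict` is taken from the text above; other fields are omitted.
interface RunSummary {
  verdict: string;
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Calls `getRun` repeatedly until the verdict is no longer "PENDING".
// Throws if the run is still pending after `maxAttempts` polls.
async function pollUntilSettled(
  getRun: () => Promise<RunSummary>,
  intervalMs = 2000,
  maxAttempts = 150,
): Promise<RunSummary> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const run = await getRun();
    if (run.verdict !== "PENDING") return run;
    await sleep(intervalMs);
  }
  throw new Error(`Run still PENDING after ${maxAttempts} attempts`);
}
```

Bounding the number of attempts keeps a stuck orchestrator from hanging the calling agent indefinitely.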


settings.update — apply a YAML diff

This is the same YAML round-trip the Settings page uses, exposed as an MCP write. The tool validates the input via validateSettingsYaml(), then delegates to saveAgentSettings(), which writes its own audit_log row.

YAML='agent:
  id: 01HZAGENT...
  name: cx-support
policies:
  pr_gate_required: true
  slo_floor: 0.95
integrations:
  slack:
    channels:
      - "#cx-alerts"
'

curl -s -X POST https://runtime-judgement-app.vercel.app/api/mcp \
  -H "Authorization: Bearer $RJ_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d "$(jq -nc --arg yaml "$YAML" '{
    jsonrpc: "2.0", id: 1,
    method: "tools/call",
    params: { name: "settings.update", arguments: { agentId: "01HZAGENT...", yaml: $yaml } }
  }')" | jq '.result.structuredContent'

Response includes the parsed AgentSettings shape + the audit id. saveAgentSettings() already writes the settings_updated audit row (with the full from/to diff and tier: propose-only); the MCP wrapper just propagates the audit id rather than double-logging.


cluster.dismiss — dismiss a failure cluster

curl -s -X POST https://runtime-judgement-app.vercel.app/api/mcp \
  -H "Authorization: Bearer $RJ_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0", "id": 1,
    "method": "tools/call",
    "params": {
      "name": "cluster.dismiss",
      "arguments": {
        "clusterId": "01HZCLUSTER...",
        "reason": "Same root cause as the PII redaction work in PR #244 — already resolved upstream."
      }
    }
  }' | jq '.result.structuredContent'

Marks the row dismissed (the cluster stays for audit; loadActiveClusters() filters it out). reason is optional free-text — not persisted on failure_clusters but stamped on the audit payload so the dismissed-with-reason history is queryable. Idempotent: a second dismiss is a no-op on the row but still writes a fresh audit entry so you can see every dismiss attempt.
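
The idempotency behaviour described above can be modelled in memory as follows. This is a toy model for illustration only, not the tool's implementation: Cluster, AuditEntry, dismissCluster, and activeClusters are hypothetical names, and the arrays stand in for the failure_clusters and audit_log tables.

```typescript
interface Cluster {
  id: string;
  dismissed: boolean;
}

interface AuditEntry {
  action: "cluster_dismissed";
  clusterId: string;
  reason?: string; // stored on the audit payload, not on the cluster row
}

// Marks the cluster dismissed and always appends an audit entry.
// `changed` is false when the cluster was already dismissed (row no-op).
function dismissCluster(
  cluster: Cluster,
  audit: AuditEntry[],
  reason?: string,
): { changed: boolean } {
  const changed = !cluster.dismissed;
  cluster.dismissed = true;
  audit.push({
    action: "cluster_dismissed",
    clusterId: cluster.id,
    ...(reason !== undefined ? { reason } : {}),
  });
  return { changed };
}

// Stand-in for loadActiveClusters(): dismissed rows are kept but filtered out.
function activeClusters(clusters: Cluster[]): Cluster[] {
  return clusters.filter((c) => !c.dismissed);
}
```

Calling dismissCluster twice on the same cluster leaves the row unchanged the second time but produces two audit entries, matching the behaviour described above.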


Audit log

Every write tool dispatch creates an audit_log row (migration 0029) tagged with:

  • action: snapshot_captured · run_triggered · settings_updated · cluster_dismissed
  • tier: the policy-required tier at dispatch time (propose-only for all 4 v1 writes)
  • actor: the Clerk user id of the caller
  • payload_jsonb: the structured input plus any auxiliary context (e.g. verdictCriterion, reason, the YAML from/to diff)

The Settings tab's <AuditTimeline> (/app/agents/[id]/settings) renders these in reverse-chrono with one row per action. From the API: GET /api/agents/[id]/audit returns the same list as JSON.
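
When consuming the audit list from code, it can help to type the rows and reduce them to a per-actor timeline. The sketch below assumes each row carries the four fields listed above plus a created_at ISO timestamp; that timestamp field, the AuditRow type, and the timelineFor helper are assumptions for illustration, since the exact JSON shape of GET /api/agents/[id]/audit is not specified here.

```typescript
type AuditAction =
  | "snapshot_captured"
  | "run_triggered"
  | "settings_updated"
  | "cluster_dismissed";

interface AuditRow {
  action: AuditAction;
  tier: "auto-allowed" | "propose-only" | "read-only";
  actor: string; // Clerk user id of the caller
  payload_jsonb: Record<string, unknown>;
  created_at: string; // assumed field: ISO 8601 timestamp
}

// Returns the actions one actor performed, oldest first.
// filter() copies the array, so the input rows are not reordered.
function timelineFor(rows: AuditRow[], actor: string): AuditAction[] {
  return rows
    .filter((row) => row.actor === actor)
    .sort((a, b) => Date.parse(a.created_at) - Date.parse(b.created_at))
    .map((row) => row.action);
}
```

Sorting explicitly by timestamp avoids depending on the order in which the endpoint returns rows.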


Common pattern: Cursor PR → trigger run → capture on regress

// Sketch of a Cursor extension hook. rj.mcp.call, pollRunUntilFinal, and askUser
// are placeholder helpers this example assumes the integration provides.
async function onPrOpened(prMeta) {
  // Queue the gated suite for this PR.
  const { runId } = await rj.mcp.call("runs.trigger", {
    agentId: prMeta.rjAgentId,
    suiteId: prMeta.gatedSuiteId,
    triggerKind: "manual",
    triggerLabel: `cursor-pr-${prMeta.number}`,
  })

  // Poll until done (or stream via /api/runs/[id]/stream)
  const summary = await pollRunUntilFinal(runId)
  if (summary.verdict !== "regression") return

  // find() returns undefined when no test matches, so guard before reading traceId.
  const changed = summary.perTest.find(t => t.verdict === "CHANGED-UNEXPECTED")
  if (!changed) return

  // Writes are propose-only: ask the user before capturing.
  const confirmed = await askUser("RJ regressed on a new failure pattern. Capture as test?")
  if (!confirmed) return

  await rj.mcp.call("snapshots.capture", {
    agentId: prMeta.rjAgentId,
    traceId: changed.traceId,
    name: `Regression from PR #${prMeta.number}`,
    suite: prMeta.gatedSuiteId,
  })
}

The audit log captures the full sequence (run_triggered → snapshot_captured), so the user can reconstruct what the agent did even if they weren't watching.


Pitfalls

-32001 permission_denied

Your caller tier is more restrictive than the tool requires. In v1, this almost always means the caller is read-only — check the token's tier (or the route's [MCP-TIER-DOWNGRADE] marker in app/api/mcp/route.ts). Read tools work; writes need a propose-only-or-better caller.

-32000 agent not found

Ownership is enforced at every write tool (getAgentForUser() or isOwner()). An agentId you don't own returns the same error as a missing agent so the API doesn't leak the existence-vs-ownership distinction. Confirm with rj agents list or agents.list.
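
A client can turn these two codes into actionable hints. The sketch below assumes the server returns a standard JSON-RPC 2.0 error object; explainMcpError is a hypothetical helper, and the mapping covers only the two codes this article describes (other failures may also use -32000, which JSON-RPC reserves as a generic server-error code).

```typescript
// Standard JSON-RPC 2.0 error object fields.
interface JsonRpcError {
  code: number;
  message: string;
}

// Maps the error codes described in this article to a troubleshooting hint.
function explainMcpError(error: JsonRpcError): string {
  switch (error.code) {
    case -32001:
      return "Permission denied: the caller tier is more restrictive than the tool requires. Check the token's tier.";
    case -32000:
      return "Agent not found: the agentId is missing or not owned by this caller. Confirm with agents.list.";
    default:
      return `Unexpected MCP error ${error.code}: ${error.message}`;
  }
}
```

Keeping the default branch verbose makes it easier to spot codes that are not documented here.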

Settings update fails silently in the UI

settings.update returns the parsed AgentSettings on success — if you see the response but the Settings page hasn't refreshed, the headline cache may still be warm. The route invalidates it on every commit but a stale tab might show the old YAML; reload to confirm.

runs.trigger returns immediately but no run shows up

The tool only registers the request: it inserts the PENDING placeholder row and writes the run_triggered audit row. The orchestrator then picks up the audit row and does the actual fan-out asynchronously. Watch audit_log (or the Settings audit timeline) for the matching entry; if it's there but no per-snapshot snapshot_runs rows materialise, the orchestrator is stuck. Sprint 15+'s orchestrator publishes structured logs to lib/observability/.

