A self-correcting agent loop with the RJ MCP server
Wire Claude Code into a closed loop that attributes its own failures, applies the suggested fix, locks a regression snapshot, and re-verifies — all without leaving the editor.
A normal coding agent fixes the test it can see. A self-correcting agent attributes the failure to its true root-cause span, fixes that, locks the failure in as a regression snapshot, and re-verifies before it commits — so the next change can't silently reintroduce the bug.
The RJ MCP server gives the agent three tools that close that loop:
- rj.attribute_trace: turn a failing trace into a cited root-cause span plus a suggestedFix
- rj.suggest_snapshot: lock that attribution in as a regression baseline
- rj.verify_change: replay the suite against the current code; block commit if anything regressed
This guide is the reference loop: the mcp.json config block, a
copy-pasteable Claude Code prompt, and the exact tool-call sequence the agent
runs on each iteration.
For setup details (Codex / Aider config, token handling, pitfalls) see Use RJ from Claude Code via MCP. This doc focuses only on the loop.
The loop in one diagram
┌──────────────────────────────────────────────────────────┐
│ agent runs the code → a test fails (or a trace looks bad) │
└──────────────────────────────────────────────────────────┘
                             │
                             ▼
       rj.attribute_trace   ← which span actually caused it?
                             │
            { citedSpans, l1, l4, suggestedFix }
                             │
                 agent applies suggestedFix
                             │
       rj.suggest_snapshot  ← lock the failure as a baseline
                             │
                  { snapshotId, suiteId }
                             │
       rj.verify_change     ← did the fix hold, suite-wide?
                             │
            ┌────────────────┴────────────────┐
   changedUnexpected > 0                 else (passed)
            │                                 │
        loop again                        git commit
    (re-attribute the                 (the snapshot now
     new failing span)                 guards this forever)
The single decision is changedUnexpected: if the verify step reports any
unexpected change, the fix moved or reintroduced a failure — loop again on the
new cited span. Otherwise the loop terminates and the agent commits.
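That decision can be written as one pure function. The sketch below is illustrative only: `decideNext`, `LoopAction`, and `VerifyCounts` are hypothetical names, not part of the RJ MCP server, and the only field read is `changedUnexpected` from the `rj.verify_change` result.

```typescript
// Hypothetical helper: not part of the RJ MCP server.
type LoopAction = "loop-again" | "commit";

// Minimal view of the rj.verify_change result: only the field the decision needs.
interface VerifyCounts {
  changedUnexpected: number;
}

function decideNext(result: VerifyCounts): LoopAction {
  const count = result.changedUnexpected;
  if (!Number.isInteger(count) || count < 0) {
    // Treat a malformed count as "not verified" rather than as a pass.
    throw new Error(`invalid changedUnexpected: ${count}`);
  }
  return count > 0 ? "loop-again" : "commit";
}
```

Throwing on a malformed count is a design choice in this sketch: a missing or negative value should never be read as a green light.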
Step 1: Register the MCP server
Add the server to your agent's MCP config. For Claude Code, create or edit
mcp.json (project root) or ~/.config/claude-code/mcp_servers.json:
{
  "mcpServers": {
    "rj": {
      "command": "npx",
      "args": ["@runtime-judgement/mcp-server"],
      "env": {
        "RJ_API_URL": "https://app.runtimejudgement.com",
        "RJ_API_KEY": "rj_live_..."
      }
    }
  }
}
Restart the agent. Ask it "What MCP tools do you have?" — it should list
rj.attribute_trace, rj.suggest_snapshot, and rj.verify_change.
Keep the key out of git. The env block is plaintext on disk. Add mcp.json to .gitignore, or inject RJ_API_KEY from your shell or a secrets manager at launch time instead of hardcoding it.
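One way to avoid hardcoding the key is to build the config at launch from the environment. The following is a minimal sketch, not an RJ-provided utility: `buildMcpConfig` and the `Env` type are hypothetical, and it assumes the key is already exported as `RJ_API_KEY` in the shell that starts the agent. The package name and URL are copied from the config block above.

```typescript
// Hypothetical helper: builds the mcp.json content with the key taken from the environment.
type Env = Record<string, string | undefined>;

function buildMcpConfig(env: Env) {
  const apiKey = env["RJ_API_KEY"];
  if (!apiKey) {
    // Fail fast so the agent never starts with an empty key.
    throw new Error("RJ_API_KEY is not set");
  }
  return {
    mcpServers: {
      rj: {
        command: "npx",
        args: ["@runtime-judgement/mcp-server"],
        env: {
          // Fall back to the URL from the example config when none is exported.
          RJ_API_URL: env["RJ_API_URL"] ?? "https://app.runtimejudgement.com",
          RJ_API_KEY: apiKey,
        },
      },
    },
  };
}
```

Under Node, `JSON.stringify(buildMcpConfig(process.env), null, 2)` would produce the file content. Any file written this way still holds the key in plaintext, so it belongs in .gitignore as well.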
Step 2: The loop prompt
Paste this into Claude Code once. It defines the loop as a standing instruction the agent follows whenever a test fails:
You have the RJ MCP tools (rj.attribute_trace, rj.suggest_snapshot,
rj.verify_change). Whenever a test fails while you work, run this loop
instead of guessing at the fix:
1. Capture the trace of the failing run and the span id where the failure
surfaced. Call rj.attribute_trace with:
- trace: the run's trace JSON
- errorSpanId: the span where it surfaced
- errorDescription: one sentence on what went wrong
Read citedSpans, l1, l4, and suggestedFix from the result.
2. Apply suggestedFix as a concrete code edit. If suggestedFix is too vague,
use citedSpans to locate the real cause and fix that span's code — not the
symptom downstream.
3. Lock the failure in: call rj.suggest_snapshot with the attributionId from
step 1 and a descriptive name. Add it to my suite (suiteName) so it's
replayed on every future verify. Keep the returned suiteId.
4. Re-verify: call rj.verify_change with that suiteId.
- If changedUnexpected > 0: the fix didn't hold. Take the regressed
outcome's cited span and GO BACK TO STEP 1 for that span.
- If changedUnexpected == 0 (verdict pass or drift): stop and show me
the verdict. Only then propose the commit.
Never commit while changedUnexpected > 0. Show me each tool's result as you
go — don't summarize the counts, paste them.
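The four steps of the prompt can also be read as a small control loop. The sketch below is an assumption-heavy illustration, not how the agent is implemented: in practice the agent invokes the tools over MCP, and the `LoopTools` interface, `runLoop`, the `regressedSpanId` field (a simplification of the per-snapshot outcomes), and the `maxIterations` cap are all hypothetical additions.

```typescript
// All names below are illustrative stand-ins for the agent's MCP tool calls.
interface Attribution { attributionId: string; citedSpans: string[]; suggestedFix: string; }
interface Snapshot { snapshotId: string; suiteId: string; }
interface Verify { changedUnexpected: number; regressedSpanId?: string; }

interface LoopTools {
  attribute(errorSpanId: string): Promise<Attribution>; // rj.attribute_trace
  applyFix(attribution: Attribution): Promise<void>;    // the agent's code edit
  snapshot(attributionId: string): Promise<Snapshot>;   // rj.suggest_snapshot
  verify(suiteId: string): Promise<Verify>;             // rj.verify_change
}

async function runLoop(tools: LoopTools, firstSpanId: string, maxIterations = 5) {
  let spanId = firstSpanId;
  for (let i = 1; i <= maxIterations; i++) {
    const attribution = await tools.attribute(spanId);                    // step 1
    await tools.applyFix(attribution);                                    // step 2
    const { suiteId } = await tools.snapshot(attribution.attributionId);  // step 3
    const result = await tools.verify(suiteId);                           // step 4
    if (result.changedUnexpected === 0) {
      return { readyToCommit: true, iterations: i };
    }
    // The fix did not hold: continue with the regressed span if one is reported.
    spanId = result.regressedSpanId ?? spanId;
  }
  // Cap reached without a clean verify: never report ready-to-commit.
  return { readyToCommit: false, iterations: maxIterations };
}
```

The iteration cap is not part of the prompt above; it is included here so a fix that keeps moving the failure cannot loop indefinitely.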
Step 3: What each tool call looks like
rj.attribute_trace
// in:
{
  trace: <raw trace JSON>,
  errorSpanId: "orchestrate-tool-selection",
  errorDescription: "orchestrator selected web-search instead of internal-kb"
}
// out:
{
  attributionId: "01HZATTR...", // → feeds rj.suggest_snapshot
  l1: { axis: "semantic", confidence: 0.87 },
  l4: { category: "Tool Selection Errors", confidence: 0.81 },
  citedSpans: ["orchestrate-tool-selection"],
  suggestedFix: "Refresh tool descriptions on every invocation rather than caching them across sessions."
}
The agent reads suggestedFix and edits the code. citedSpans is the
fallback when suggestedFix is too abstract to apply directly.
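If the result is handled in a script rather than read by the agent, it may be worth checking its shape before acting on it. This is a minimal sketch under assumptions: `AttributionCore` and `isAttributionCore` are hypothetical names, the field list is inferred from the example output above rather than from a published schema, and `l1` and `l4` are deliberately left unchecked.

```typescript
// Only the fields the loop acts on; inferred from the example output, not a published schema.
interface AttributionCore {
  attributionId: string;
  citedSpans: string[];
  suggestedFix: string;
}

// Type guard: true only when all three fields are present with the expected types.
function isAttributionCore(value: unknown): value is AttributionCore {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const spans = v["citedSpans"];
  return (
    typeof v["attributionId"] === "string" &&
    typeof v["suggestedFix"] === "string" &&
    Array.isArray(spans) &&
    spans.every((s) => typeof s === "string")
  );
}
```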
rj.suggest_snapshot
// in:
{
  attributionId: "01HZATTR...",
  name: "orchestrator picks web-search over internal-kb — stale tool descriptions",
  suiteName: "self-correcting-loop"
}
// out:
{
  snapshotId: "01HZSNAP...",
  suiteId: "01HZSUITE...", // → feeds rj.verify_change
  nextStep: "Run rj.verify_change against suiteId to confirm the fix holds."
}
nextStep literally points the agent at the verify call — the loop is
self-describing.
rj.verify_change
// in:
{ suiteId: "01HZSUITE..." }
// out (the decision point):
{
  total: 6,
  passed: 6,
  changedUnexpected: 0, // 0 → commit; > 0 → loop again
  changedIntentional: 0,
  outcomes: [ /* per-snapshot status + citedSpanIds when regressed */ ]
}
changedUnexpected == 0 is the green light. If it's > 0, the regressed
entry in outcomes carries the new cited span — that span becomes the
errorSpanId for the next rj.attribute_trace call, and the loop repeats.
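Picking the next `errorSpanId` out of the verify result might look like the sketch below. The shape of each `outcomes` entry is an assumption: the example above only says an entry carries a per-snapshot status plus `citedSpanIds` when regressed, so the `"regressed"` status string, the `Outcome` interface, and `nextErrorSpanId` are all hypothetical.

```typescript
// Assumed entry shape; the "regressed" status value is a guess, not a documented constant.
interface Outcome {
  status: string;
  citedSpanIds?: string[];
}

interface VerifyResult {
  changedUnexpected: number;
  outcomes: Outcome[];
}

// Returns the span to re-attribute, or null when the suite is clean.
function nextErrorSpanId(result: VerifyResult): string | null {
  if (result.changedUnexpected === 0) return null;
  for (const outcome of result.outcomes) {
    const first = (outcome.citedSpanIds ?? [])[0];
    if (outcome.status === "regressed" && first !== undefined) {
      return first;
    }
  }
  // A regression was counted but no span was cited: surface it instead of guessing.
  throw new Error("changedUnexpected > 0 but no regressed outcome cites a span");
}
```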
Why this is better than "just fix the test"
- It fixes the cause, not the symptom. rj.attribute_trace tells the agent which span is actually responsible, so the patch lands upstream instead of papering over the surfacing span.
- Every fix becomes a permanent guard. rj.suggest_snapshot turns the one-off bug into a snapshot that the suite replays forever: the same suite your GitHub Actions gate runs on every PR.
- The loop terminates on evidence, not vibes. changedUnexpected is a hard signal. The agent doesn't decide it's "probably fixed"; it re-runs the suite and reads the count.
What next
- Gate the same suite in CI: the suiteId from rj.suggest_snapshot is exactly what the GitHub Actions gate runs, so the inner loop and the PR gate share one baseline.
- Full MCP setup + pitfalls: Use RJ from Claude Code via MCP
- Design the suite the loop builds: Build a snapshot suite for regression gating