cookbook · intermediate · 7 min read · Updated 2026-06-04

A self-correcting agent loop with the RJ MCP server

Wire Claude Code into a closed loop that attributes its own failures, applies the suggested fix, locks a regression snapshot, and re-verifies — all without leaving the editor.

A normal coding agent fixes the test it can see. A self-correcting agent attributes the failure to its true root-cause span, fixes that, locks the failure in as a regression snapshot, and re-verifies before it commits — so the next change can't silently reintroduce the bug.

The RJ MCP server gives the agent three tools that close that loop:

  • rj.attribute_trace — turn a failing trace into a cited root-cause span + a suggestedFix
  • rj.suggest_snapshot — lock that attribution in as a regression baseline
  • rj.verify_change — replay the suite against the current code; block commit if anything regressed

This guide is the reference loop: the mcp.json config block, a copy-pasteable Claude Code prompt, and the exact tool-call sequence the agent runs on each iteration.

For setup details (Codex / Aider config, token handling, pitfalls) see "Use RJ from Claude Code via MCP". This doc focuses only on the loop.


The loop in one diagram

   ┌──────────────────────────────────────────────────────────┐
   │ agent runs the code → a test fails (or a trace looks bad) │
   └──────────────────────────────────────────────────────────┘
                              │
                              ▼
                     rj.attribute_trace        ← which span actually caused it?
                              │
              { citedSpans, l1, l4, suggestedFix }
                              │
                  agent applies suggestedFix
                              │
                     rj.suggest_snapshot        ← lock the failure as a baseline
                              │
                       { snapshotId, suiteId }
                              │
                     rj.verify_change            ← did the fix hold, suite-wide?
                              │
            ┌─────────────────┴──────────────────┐
  changedUnexpected > 0               changedUnexpected == 0
            │                                    │
       loop again                           git commit
    (re-attribute the                    (the snapshot now
     new failing span)                    guards this forever)

The single decision is changedUnexpected: if the verify step reports any unexpected change, the fix moved or reintroduced a failure — loop again on the new cited span. Otherwise the loop terminates and the agent commits.
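The control flow above can be sketched as a small driver. This is a minimal sketch, not the server's API: the tool calls are injected as plain synchronous functions so the example runs without a live MCP connection (real MCP calls would be asynchronous), and `LoopTools`, `nextErrorSpanId`, and the `maxIterations` cap are hypothetical names introduced for illustration.

```typescript
// Hypothetical shapes, modelled on the payload fields this guide mentions.
interface Attribution {
  attributionId: string;
  citedSpans: string[];
  suggestedFix: string;
}

interface VerifySummary {
  changedUnexpected: number;
  // Assumed convenience field: the span to re-attribute after a regression.
  nextErrorSpanId?: string;
}

// The three RJ tools plus the code edit, injected by the caller.
interface LoopTools {
  attribute(errorSpanId: string): Attribution;
  applyFix(attribution: Attribution): void;
  snapshot(attributionId: string): { suiteId: string };
  verify(suiteId: string): VerifySummary;
}

function selfCorrect(
  tools: LoopTools,
  firstSpanId: string,
  maxIterations = 5, // assumed safety cap; the guide does not specify one
): { committed: boolean; iterations: number } {
  let spanId = firstSpanId;
  for (let i = 1; i <= maxIterations; i++) {
    const attribution = tools.attribute(spanId);
    tools.applyFix(attribution);
    const { suiteId } = tools.snapshot(attribution.attributionId);
    const summary = tools.verify(suiteId);
    if (summary.changedUnexpected === 0) {
      // Nothing regressed: the loop terminates and a commit is safe.
      return { committed: true, iterations: i };
    }
    // The fix moved or reintroduced a failure: loop on the new span.
    spanId = summary.nextErrorSpanId ?? spanId;
  }
  return { committed: false, iterations: maxIterations };
}
```

The iteration cap is a design choice of this sketch: it keeps an agent from looping indefinitely when successive fixes keep shifting the failure.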


Step 1: Register the MCP server

Add the server to your agent's MCP config. For Claude Code, create or edit mcp.json (project root) or ~/.config/claude-code/mcp_servers.json:

{
  "mcpServers": {
    "rj": {
      "command": "npx",
      "args": ["@runtime-judgement/mcp-server"],
      "env": {
        "RJ_API_URL": "https://app.runtimejudgement.com",
        "RJ_API_KEY": "rj_live_..."
      }
    }
  }
}

Restart the agent. Ask it "What MCP tools do you have?" — it should list rj.attribute_trace, rj.suggest_snapshot, and rj.verify_change.

Keep the key out of git. The env block is plaintext on disk. Add mcp.json to .gitignore, or inject RJ_API_KEY from your shell / a secrets manager at launch time instead of hardcoding it.
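One way to avoid a hardcoded key is to generate the config at launch time from environment variables. The sketch below assumes a small launcher script of your own; `buildMcpConfig` and the `Env` type are hypothetical helpers, not part of the RJ package. The package name and URL are the ones from the config block above.

```typescript
// A plain record stands in for the process environment so the
// function stays pure and easy to test.
type Env = Record<string, string | undefined>;

function buildMcpConfig(env: Env) {
  const apiKey = env["RJ_API_KEY"];
  if (!apiKey) {
    // Fail loudly rather than writing a config with an empty key.
    throw new Error("RJ_API_KEY is not set; export it before launching the agent");
  }
  return {
    mcpServers: {
      rj: {
        command: "npx",
        args: ["@runtime-judgement/mcp-server"],
        env: {
          // Fall back to the hosted URL from this guide when no override is set.
          RJ_API_URL: env["RJ_API_URL"] ?? "https://app.runtimejudgement.com",
          RJ_API_KEY: apiKey,
        },
      },
    },
  };
}
```

A launcher could serialize the result with `JSON.stringify(config, null, 2)` and write it to the config path. The generated file still contains the key in plaintext, so it should stay in .gitignore.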


Step 2: The loop prompt

Paste this into Claude Code once. It defines the loop as a standing instruction the agent follows whenever a test fails:

You have the RJ MCP tools (rj.attribute_trace, rj.suggest_snapshot,
rj.verify_change). Whenever a test fails while you work, run this loop
instead of guessing at the fix:

1. Capture the trace of the failing run and the span id where the failure
   surfaced. Call rj.attribute_trace with:
     - trace:        the run's trace JSON
     - errorSpanId:  the span where it surfaced
     - errorDescription: one sentence on what went wrong
   Read citedSpans, l1, l4, and suggestedFix from the result.

2. Apply suggestedFix as a concrete code edit. If suggestedFix is too vague,
   use citedSpans to locate the real cause and fix that span's code — not the
   symptom downstream.

3. Lock the failure in: call rj.suggest_snapshot with the attributionId from
   step 1 and a descriptive name. Add it to my suite (suiteName) so it's
   replayed on every future verify. Keep the returned suiteId.

4. Re-verify: call rj.verify_change with that suiteId.
     - If changedUnexpected > 0: the fix didn't hold. Take the regressed
       outcome's cited span and GO BACK TO STEP 1 for that span.
     - If changedUnexpected == 0 (verdict pass or drift): stop and show me
       the verdict. Only then propose the commit.

Never commit while changedUnexpected > 0. Show me each tool's result as you
go — don't summarize the counts, paste them.

Step 3: What each tool call looks like

rj.attribute_trace

// in:
{
  trace: { /* raw trace JSON */ },
  errorSpanId: "orchestrate-tool-selection",
  errorDescription: "orchestrator selected web-search instead of internal-kb"
}
// out:
{
  attributionId: "01HZATTR...",      // → feeds rj.suggest_snapshot
  l1: { axis: "semantic", confidence: 0.87 },
  l4: { category: "Tool Selection Errors", confidence: 0.81 },
  citedSpans: ["orchestrate-tool-selection"],
  suggestedFix: "Refresh tool descriptions on every invocation rather than caching them across sessions."
}

The agent reads suggestedFix and edits the code. citedSpans is the fallback when suggestedFix is too abstract to apply directly.
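Before acting on suggestedFix, a wrapper can check that the result has the expected shape. The interface below is inferred from the example output above and is an assumption, not a published schema; `isAttributionResult` is a hypothetical helper.

```typescript
// Shape inferred from the example output in this guide (assumed, not official).
interface AttributionResult {
  attributionId: string;
  l1: { axis: string; confidence: number };
  l4: { category: string; confidence: number };
  citedSpans: string[];
  suggestedFix: string;
}

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null;
}

// True when `x` is an object with a string label field and a numeric confidence.
function isScored(x: unknown, label: string): boolean {
  return isRecord(x) && typeof x[label] === "string" && typeof x["confidence"] === "number";
}

function isAttributionResult(v: unknown): v is AttributionResult {
  if (!isRecord(v)) return false;
  const spans = v["citedSpans"];
  return (
    typeof v["attributionId"] === "string" &&
    typeof v["suggestedFix"] === "string" &&
    Array.isArray(spans) &&
    spans.every((s: unknown) => typeof s === "string") &&
    isScored(v["l1"], "axis") &&
    isScored(v["l4"], "category")
  );
}
```

Rejecting a malformed result early keeps the agent from editing code based on a missing or partial attribution.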

rj.suggest_snapshot

// in:
{
  attributionId: "01HZATTR...",
  name: "orchestrator picks web-search over internal-kb — stale tool descriptions",
  suiteName: "self-correcting-loop"
}
// out:
{
  snapshotId: "01HZSNAP...",
  suiteId: "01HZSUITE...",            // → feeds rj.verify_change
  nextStep: "Run rj.verify_change against suiteId to confirm the fix holds."
}

nextStep literally points the agent at the verify call — the loop is self-describing.

rj.verify_change

// in:
{ suiteId: "01HZSUITE..." }
// out (the decision point):
{
  total: 6,
  passed: 6,
  changedUnexpected: 0,              // 0 → commit;  > 0 → loop again
  changedIntentional: 0,
  outcomes: [ /* per-snapshot status + citedSpanIds when regressed */ ]
}

changedUnexpected == 0 is the green light. If it's > 0, the regressed entry in outcomes carries the new cited span — that span becomes the errorSpanId for the next rj.attribute_trace call, and the loop repeats.
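The decision can be sketched as a pure function over the verify result. The guide does not show an outcomes entry, so the `Outcome` shape here (a `status` string with a `"regressed"` value, plus `citedSpanIds`) is an assumption for illustration, as are the names `nextAction` and `NextAction`.

```typescript
// Assumed entry shape: the guide only says each outcome carries a
// per-snapshot status plus citedSpanIds when regressed.
interface Outcome {
  snapshotId: string;
  status: string; // assumed values such as "passed" or "regressed"
  citedSpanIds?: string[];
}

interface VerifyResult {
  total: number;
  passed: number;
  changedUnexpected: number;
  changedIntentional: number;
  outcomes: Outcome[];
}

type NextAction =
  | { kind: "commit" }
  | { kind: "loop"; errorSpanId: string | undefined };

function nextAction(result: VerifyResult): NextAction {
  // Zero unexpected changes is the only condition that allows a commit.
  if (result.changedUnexpected === 0) return { kind: "commit" };
  // Otherwise pick the first regressed outcome that names a cited span.
  const regressed = result.outcomes.find(
    (o) => o.status === "regressed" && (o.citedSpanIds?.length ?? 0) > 0,
  );
  return { kind: "loop", errorSpanId: regressed?.citedSpanIds?.[0] };
}
```

When `errorSpanId` comes back undefined, the regression carried no cited span, and the agent would need to fall back to the span where the failure surfaced.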


Why this is better than "just fix the test"

  • It fixes the cause, not the symptom. rj.attribute_trace tells the agent which span is actually responsible, so the patch lands upstream instead of papering over the surfacing span.
  • Every fix becomes a permanent guard. rj.suggest_snapshot turns the one-off bug into a snapshot that the suite replays forever — the same suite your GitHub Actions gate runs on every PR.
  • The loop terminates on evidence, not vibes. changedUnexpected is a hard signal. The agent doesn't decide it's "probably fixed" — it re-runs the suite and reads the count.
