Wire RJ into GitHub Actions as a PR gate
Add ProveAI Origin as a CI check that blocks merges when a snapshot suite detects an unexpected regression.
Every PR check you add is a promise: this thing you changed hasn't broken the behaviour we locked in. ProveAI Origin's GitHub Action makes that promise about your agent's failure patterns — it runs your snapshot suite on every PR and exits non-zero the moment a snapshot's verdict shifts unexpectedly.
By the end of this guide you'll have a workflow that:
- Triggers on every pull request to main
- Runs your snapshot suite against the live RJ pipeline
- Prints a verdict summary to the Actions log
- Blocks merge on regression verdicts (error and empty also fail the check) and passes on pass and drift
Before you begin
Read What are traces and why do they matter if you're not sure what a snapshot suite is — it explains the vocabulary (spans, verdicts, L1/L4 labels) that the output below refers to.
You also need at least one suite to gate on. If you don't have one yet, follow Build a snapshot suite for regression gating first — that guide covers the full paste-to-snapshot-to-suite flow in ~15 minutes.
Step 1: Generate an API token
Open /app/settings and generate a token under API Keys. Tokens are prefixed rj_live_.
What you should see: A one-time reveal of the token string, e.g. rj_live_01HZABC.... Copy it before closing the dialog; it won't be shown again.
Step 2: Add the token as a repo secret
In your GitHub repository:
- Go to Settings → Secrets and variables → Actions
- Click New repository secret
- Name: RJ_API_TOKEN
- Value: the rj_live_* token you just copied
What you should see: The secret appears in the list as RJ_API_TOKEN with a masked value. Never inline the token in the workflow YAML.
Step 3: Find your suite ID
Open /app/snapshot-suites and click the suite you want to gate on. The URL becomes:
https://runtime-judgement-app.vercel.app/app/snapshot-suites/01HZSUITE...
The long alphanumeric string at the end is the suite ULID. Copy it.
What you should see: A page showing the suite name, the list of snapshots inside it, and previous run history. If the suite has no snapshots, the Action will exit with verdict=empty (non-zero) — add at least one snapshot before gating.
Step 4: Add the workflow file
Create .github/workflows/runtime-judgement.yml in your repo. This is adapted from the sample at .github/workflows/runtime-judgement-ci.sample.yml in the RJ source:
name: ProveAI Origin gate
on:
  pull_request:
    branches: [main]
permissions:
  contents: read
jobs:
  rj-gate:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: ProveAI Origin suite run
        id: rj
        uses: runtime-judgement/rj-action@v0
        with:
          suite-id: 01HZSUITE_REPLACE_WITH_YOURS
          rj-api-token: ${{ secrets.RJ_API_TOKEN }}
      - name: Print verdict
        if: always()
        run: |
          echo "Verdict: ${{ steps.rj.outputs.verdict }}"
          echo "Total: ${{ steps.rj.outputs.total }}"
          echo "Passed: ${{ steps.rj.outputs.passed }}"
          echo "Changed-intentional: ${{ steps.rj.outputs.changed-intentional }}"
          echo "Changed-unexpected: ${{ steps.rj.outputs.changed-unexpected }}"
          echo "Errored: ${{ steps.rj.outputs.errored }}"
          echo "Run IDs: ${{ steps.rj.outputs.run-ids }}"
Replace 01HZSUITE_REPLACE_WITH_YOURS with the ULID from Step 3.
Versioning: runtime-judgement/rj-action@v0 is the published marketplace coordinate (a floating major tag that tracks the latest 0.x). Pin to @v0.1.0 if you want an immutable reference.
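If you prefer the pinned form, only the uses line of the suite-run step should need to change. The fragment below is a minimal sketch that mirrors the step from the workflow in this step; nothing else about it is new.

```yaml
      # Pinned variant of the suite-run step: @v0.1.0 instead of the floating @v0 tag.
      - name: ProveAI Origin suite run
        id: rj
        uses: runtime-judgement/rj-action@v0.1.0
        with:
          suite-id: 01HZSUITE_REPLACE_WITH_YOURS
          rj-api-token: ${{ secrets.RJ_API_TOKEN }}
```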
Step 5: Configure inputs
The action accepts five inputs. Two are required; the rest are optional:
| Input | Required | Default | When to change it |
|---|---|---|---|
| suite-id | Yes | (none) | Set once; matches your suite ULID |
| rj-api-token | Yes | (none) | Always via ${{ secrets.RJ_API_TOKEN }} |
| rj-base-url | No | https://runtime-judgement-app.vercel.app | Self-hosted or staging deployments |
| fail-on-unexpected | No | true | Set to "false" for advisory-only runs |
| timeout-ms | No | 300000 (5 min) | Reduce for fast suites; increase for large ones |
Advisory-only mode (fail-on-unexpected: "false") is useful when you're first adding the gate — it logs the verdict without blocking merge, so you can observe what the suite catches before enforcing it.
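As a rough sketch, an advisory-only version of the suite-run step might look like the fragment below. It uses only the inputs listed in the table; the "(advisory)" suffix in the step name is just an illustrative label, not something the action requires.

```yaml
      # Advisory-only: the verdict is still computed and exposed as an output,
      # but an unexpected change should not fail the check.
      - name: ProveAI Origin suite run (advisory)
        id: rj
        uses: runtime-judgement/rj-action@v0
        with:
          suite-id: 01HZSUITE_REPLACE_WITH_YOURS
          rj-api-token: ${{ secrets.RJ_API_TOKEN }}
          fail-on-unexpected: "false"
```

Once you trust what the suite flags, remove the fail-on-unexpected line (or set it back to "true") to start enforcing the gate.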
Step 6: Understand the verdict
The action sets a verdict output and exits accordingly:
| Verdict | Meaning | Exit code |
|---|---|---|
| pass | Every snapshot matched its baseline | 0 |
| drift | Some snapshots changed, but the judge ruled the change intentional | 0 |
| regression | At least one snapshot changed unexpectedly; something broke | 1 |
| error | At least one snapshot errored (infra failure, not a regression) | 1 |
| empty | Suite has no snapshots (or suite-id is misconfigured) | 1 |
What you should see when the check runs: In the Actions log for the "Print verdict" step, you'll see something like:
Verdict: pass
Total: 4
Passed: 4
Changed-intentional: 0
Changed-unexpected: 0
Errored: 0
Run IDs: 01HZRUN1...,01HZRUN2...,01HZRUN3...,01HZRUN4...
On a regression the "ProveAI Origin suite run" step turns red, and changed-unexpected will be non-zero. Use the run IDs in the output to deep-link directly to /app/snapshot-suites and inspect which snapshot shifted.
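If you would rather see the verdict on the run's summary page than dig through the log, one option is an extra step that appends to GitHub's job summary file. This is a sketch under assumptions: the step name and the environment variable names (VERDICT, UNEXPECTED, RUN_IDS) are illustrative, while the output names are the ones the "Print verdict" step already reads.

```yaml
      # Optional extra step: write the verdict to the job summary.
      # Outputs are passed through env so the shell never interpolates them inline.
      - name: Summarise verdict
        if: always()
        env:
          VERDICT: ${{ steps.rj.outputs.verdict }}
          UNEXPECTED: ${{ steps.rj.outputs.changed-unexpected }}
          RUN_IDS: ${{ steps.rj.outputs.run-ids }}
        run: |
          {
            echo "### ProveAI Origin verdict: ${VERDICT}"
            echo ""
            echo "- Changed-unexpected: ${UNEXPECTED}"
            echo "- Run IDs: ${RUN_IDS}"
          } >> "$GITHUB_STEP_SUMMARY"
```

The if: always() condition matters here: without it the step would be skipped whenever the suite run exits non-zero, which is exactly when the summary is most useful.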
Troubleshooting
Token missing or invalid
Error: HTTP 401 — unauthorized
Check that the secret is named exactly RJ_API_TOKEN (case-sensitive) and that the token value starts with rj_live_. Re-generate the token from /app/settings if you've lost it — old tokens are revoked on regeneration.
Suite not found
verdict: empty
or
Error: HTTP 404 — suite not found
Confirm the suite ULID in the workflow matches the one in the URL at /app/snapshot-suites. ULIDs are case-sensitive. The empty verdict also fires if the suite exists but has no snapshots — add at least one snapshot from the snapshot guide.
Timeout
Error: suite run exceeded timeout (300000ms)
The default 5-minute timeout covers suites of up to ~60 snapshots on the Standard pipeline. Increase timeout-ms for larger suites (and keep the job's timeout-minutes above it, or the job will be cancelled first), or split the suite into multiple smaller ones and run them in parallel jobs.
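One way to run several smaller suites in parallel is a GitHub Actions matrix. The fragment below is a sketch, not a tested configuration: the two suite IDs are placeholders, the matrix key name suite is an arbitrary choice, and the timeout values are illustrative.

```yaml
jobs:
  rj-gate:
    runs-on: ubuntu-latest
    # Keep the job limit above timeout-ms (600000 ms = 10 min).
    timeout-minutes: 15
    strategy:
      # Let every suite finish even if one of them reports a regression.
      fail-fast: false
      matrix:
        suite:
          - 01HZSUITE_A_REPLACE_WITH_YOURS
          - 01HZSUITE_B_REPLACE_WITH_YOURS
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: ProveAI Origin suite run
        id: rj
        uses: runtime-judgement/rj-action@v0
        with:
          suite-id: ${{ matrix.suite }}
          rj-api-token: ${{ secrets.RJ_API_TOKEN }}
          timeout-ms: "600000"
```

Each matrix entry becomes its own job and its own PR check, so if you use branch protection you would need to mark each of them as required.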
False positive (verdict=regression on a change you intended)
Accept the new run as the baseline. Open /app/snapshot-suites, click into the run that flagged, find the snapshot that changed, and click Accept to roll the baseline forward. The next CI run will pass. This is the designed workflow: a regression flag is a conversation, not always a block.
Alternatively, use fail-on-unexpected: "false" temporarily while you review whether the change is intentional, then re-enable once the baseline is updated.
What next
- Build the suite the gate runs against: Build a snapshot suite for regression gating
- Use RJ directly inside your coding agent: Use RJ from Claude Code via MCP — the same verification loop, without leaving your editor
- Understand why spans fail the way they do: Common failure modes in multi-agent systems