What's an AI-native agent page?

When you click into an agent at /app/agents/[id], you land on a page composed of six sections and held together by ten principles. This is not a dashboard with widgets — every surface is generated from current state, every claim is cited, and every read has a write equivalent in YAML, CLI, and MCP.

This cookbook walks through both, first the six sections and then the ten principles, so the rest of the product feels coherent rather than incidental.

The two ADRs behind the design:

dev-docs/strategy/2026-05-26-adr-agent-page-ia-evolution.md — why six sections, not five tabs
dev-docs/strategy/2026-05-26-adr-ai-native-principles.md — the ten principles every section follows

The six sections

The per-agent page is Overview · Tests · Suites · Runs · Pipeline · Settings. Each is a task-shaped destination — a job-to-be-done — not a noun-shaped tab.

Overview

"Tell me where this agent stands and what to do next — in one screen."

Composed of: judge headline (P1) · failure clusters (P2) · suggested actions (P4) · lifecycle ring (status dial) · drift forecast · activity stream.

This is the landing section. If you only check one section a day, check this one.

Tests

"Curate the regression library. Capture what matters; retire what's stale."

Composed of: AI-suggested captures · topic-clustered groups (P2 with fallback) · per-test health ring · test history as narrative (not table) · NL filter · YAML round-trip.

A test in RJ is a snapshot: an "expected" verdict on a recorded trace. The Tests section is where the regression library lives. See Build a snapshot suite for regression gating for the capture flow.

Suites

"Define what regressions are unacceptable in which context. Keep them in shape."

Composed of: policy badges (MUST-PASS / ALERT / INFORMATIONAL) · AI composition suggestions · coverage map · per-suite SLO · suite-vs-suite diff with time-travel · YAML round-trip.

Suites are groups of tests with a gating policy. One suite per logical component, one policy per suite.
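As a rough illustration of what "one policy per suite" could mean at gate time, the sketch below maps a suite's policy and its failed tests to an outcome. Only the three policy names come from the badges listed above. The outcome semantics (a failing MUST-PASS suite blocks, a failing ALERT suite notifies, a failing INFORMATIONAL suite is only recorded) and every type and function name are assumptions made for illustration, not RJ's actual API.

```typescript
// Hypothetical types: only the three policy names come from the article.
type SuitePolicy = "MUST-PASS" | "ALERT" | "INFORMATIONAL";
type GateOutcome = "pass" | "block" | "notify" | "record";

interface SuiteRun {
  suite: string;
  policy: SuitePolicy;
  failedTests: string[]; // ids of tests that failed in this run
}

// Assumed semantics: a failing MUST-PASS suite blocks, a failing ALERT
// suite notifies, and a failing INFORMATIONAL suite is only recorded.
function gate(run: SuiteRun): GateOutcome {
  if (run.failedTests.length === 0) return "pass";
  switch (run.policy) {
    case "MUST-PASS":
      return "block";
    case "ALERT":
      return "notify";
    case "INFORMATIONAL":
      return "record";
  }
}
```

Because the policy belongs to the suite rather than to individual tests, the same failing test could block in one suite and merely be recorded in another.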

Runs

"See every suite run. Understand verdicts. Drill the cause. Replay if needed."

Composed of: live streaming (P5) · pre-emptive verdict prediction · why-diff vs last passing run · run clustering (P2) · counterfactual replay (P6) · cited evidence per test.

Every PR, every scheduled run, every replay shows up here. See Counterfactual replay and pipeline comparison for the time-travel surface.

Pipeline

"Choose and tune the judge. Compare configs on real traces."

Composed of: visual graph editor · versioning with diff · two-pipeline replay on history (P6) · cost-quality frontier · AI-tuned candidates · per-test sensitivity for pending switches.

Each agent can override the workspace-level default pipeline. The cost-quality frontier shows you which pipeline is the cheapest you can use without regressing.
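One way to read "the cheapest you can use without regressing" is as a filter followed by a minimum: discard any candidate that fails a test the current pipeline passes, then take the lowest-cost survivor. The sketch below assumes exactly that definition. The PipelineResult shape, the cost unit, and the function name are hypothetical.

```typescript
// Hypothetical shape: one pipeline evaluated on recorded traces.
interface PipelineResult {
  name: string;
  costPerRun: number;    // assumed unit: cost of one suite run
  passedTests: string[]; // ids of tests whose verdict matched "expected"
}

// Return the cheapest candidate that passes every test the baseline passes,
// or undefined when every candidate regresses at least one test.
function cheapestWithoutRegression(
  baseline: PipelineResult,
  candidates: PipelineResult[],
): PipelineResult | undefined {
  const safe = candidates.filter((c) =>
    baseline.passedTests.every((id) => c.passedTests.indexOf(id) !== -1),
  );
  // filter() returned a fresh array, so sorting it does not mutate the input.
  return safe.sort((a, b) => a.costPerRun - b.costPerRun)[0];
}
```

A candidate that passes additional tests is still eligible; under this assumed definition, only losing a previously passing test counts as a regression.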

Settings

"Identity, policies, integrations, access — without ceremony."

Composed of: NL command bar + declarative YAML (bidirectional) · policy-as-code blocks · integrations with live last-event · access with usage-aware right-sizing · audit trail (P10).

See Configure with natural language or YAML for the NL/YAML round-trip.

The ten principles

Every section ships against the same ten contracts. Each principle names a concrete shipped primitive (a component, route, or convention), so the principles are enforceable contracts rather than stylistic guidance.

P1 — RJ writes the headline. Every section opens with a judge-authored 1-line summary that regenerates on state change. Not "92% pass rate" — "cx-support is regressing on 2 of 14 tests since Cursor PR #243, 12 min ago." Primitive: <JudgeHeadline> + /api/judge/headline.

P2 — Cluster, don't list. Auto-group by root cause / topic / pattern, never alphabetically. Threshold = 5 by default; below threshold a fallback banner explains why clustering didn't engage. Primitive: clusterByCause() in lib/clustering/.
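The threshold-and-fallback behaviour in P2 can be sketched as below. This is not the real clusterByCause() from lib/clustering/, whose signature the article does not show. The Failure shape, the reading of "threshold" as a minimum item count, the banner wording, and the function name are all assumptions.

```typescript
// Hypothetical failure record; `cause` stands in for whatever root-cause
// label the real clustering derives.
interface Failure {
  testId: string;
  cause: string;
}

type ClusterView =
  | { mode: "clustered"; clusters: Record<string, Failure[]> }
  | { mode: "fallback"; banner: string; items: Failure[] };

const CLUSTER_THRESHOLD = 5; // default from the article

// Group failures by cause once there are enough of them; otherwise return
// the flat list plus a banner explaining why clustering did not engage.
function clusterOrFallback(
  failures: Failure[],
  threshold: number = CLUSTER_THRESHOLD,
): ClusterView {
  if (failures.length < threshold) {
    return {
      mode: "fallback",
      banner: `Only ${failures.length} failures; clustering starts at ${threshold}.`,
      items: failures,
    };
  }
  const clusters: Record<string, Failure[]> = {};
  for (const f of failures) {
    (clusters[f.cause] ??= []).push(f);
  }
  return { mode: "clustered", clusters };
}
```

Returning a tagged union makes the fallback an explicit state the UI has to render, rather than an empty cluster list that silently degrades to a flat table.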

P3 — Cite or it didn't happen. Every claim, verdict, summary, or suggestion links to the span, trace, commit, prompt diff, or run that justifies it. Enforced by an eslint rule — <Verdict> and <JudgeHeadline> must contain at least one <Citation>.

P4 — Anticipate the next action. Buttons are generated from state, not menu items. Show "Reject PR #243", not "Open PR queue". Max 3 suggestions per surface, each with confidence + a reasoning expander. Primitive: <SuggestedActions> + /api/suggestions.
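A minimal sketch of the "max 3 suggestions per surface" rule follows. The article specifies the cap and that each suggestion carries a confidence and a reasoning expander. Ranking by confidence, the Suggestion shape, and the function name are assumptions; the real <SuggestedActions> component and /api/suggestions response may differ.

```typescript
// Hypothetical suggestion shape.
interface Suggestion {
  label: string;      // state-specific, e.g. "Reject PR #243"
  confidence: number; // 0..1
  reasoning: string;  // text shown in the reasoning expander
}

const MAX_SUGGESTIONS = 3; // per-surface cap from the article

// Keep the highest-confidence suggestions, at most three per surface.
// Sorting a copy leaves the caller's array untouched.
function pickSuggestions(all: Suggestion[]): Suggestion[] {
  return [...all]
    .sort((a, b) => b.confidence - a.confidence)
    .slice(0, MAX_SUGGESTIONS);
}
```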

P5 — Stream, don't poll. Live runs, live drift, live verdicts. SSE infrastructure at /api/runs/{id}/stream, /api/agents/{id}/drift/stream, /api/agents/{id}/activity/stream. Never make the user wait for a refresh.
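Server-sent events arrive as plain text: "event:" and "data:" lines, with a blank line ending each event. In a browser the built-in EventSource API does this parsing for you. The sketch below parses a buffer by hand to show the wire format a client of /api/runs/{id}/stream would see. The event name and JSON payload in the usage note are hypothetical, since the article does not document them, and the parser is simplified (it ignores "id:" and "retry:" fields).

```typescript
interface SseEvent {
  event: string; // defaults to "message" when no event: field is present
  data: string;
}

// Parse a complete text/event-stream buffer into events. Simplified:
// handles event: and data: fields and comment lines only.
function parseSse(buffer: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const block of buffer.split(/\r?\n\r?\n/)) {
    let event = "message";
    const data: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line === "" || line.charAt(0) === ":") continue; // blank or comment
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      if (value.charAt(0) === " ") value = value.slice(1); // strip one leading space
      if (field === "event") event = value;
      else if (field === "data") data.push(value);
    }
    // Blocks without any data: line (e.g. keep-alive comments) yield no event.
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return events;
}
```

For example, a stream containing a hypothetical `event: verdict` frame with a JSON `data:` line, followed by a `: keep-alive` comment, yields a single event; the comment block produces nothing.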

P6 — Time-travel by default. Every view has an "as of" slider. Replay any test against any historical pipeline / prompt / commit. Default scope is the current section. Primitive: <AsOfSlider> + 90-day full-fidelity retention.
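The 90-day retention window implies a bound on what an "as of" slider can select. The sketch below clamps a requested timestamp into that window. Clamping (rather than rejecting out-of-range values) and the function name are assumptions; the article only states the 90-day full-fidelity retention.

```typescript
const RETENTION_DAYS = 90; // full-fidelity retention from the article
const DAY_MS = 24 * 60 * 60 * 1000;

// Clamp a requested "as of" timestamp into [now - 90 days, now].
function clampAsOf(requested: Date, now: Date): Date {
  const earliest = now.getTime() - RETENTION_DAYS * DAY_MS;
  const t = Math.min(Math.max(requested.getTime(), earliest), now.getTime());
  return new Date(t);
}
```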

P7 — Composable via chat. ⌘+/ from any section opens a global chat panel scoped to the current section + selection. The chat can filter, diff, export, and trigger — not just answer. Primitive: <ChatPanel> singleton with auto-injected section context.

P8 — Programmable in both directions. UI ↔ CLI ↔ YAML ↔ MCP. Same operations, multiple surfaces. If you can click it, you can curl it. The CLI (Use the rj CLI) and MCP server (Use RJ from Claude Code via MCP) wrap the same primitives the UI uses.

P9 — Confidence-aware. Every prediction / cluster / suggestion shows confidence. User correction (👍 / 👎 / pin / override) POSTs feedback that RJ visibly learns from on next render. The confidence value is capped at 0.99 — no false certainty.
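The 0.99 cap is simple to state in code. The sketch below clamps a raw score before display. The cap comes from the article; the lower bound of 0, the handling of non-finite input, the percentage label, and the function name are assumptions.

```typescript
const MAX_CONFIDENCE = 0.99; // cap from the article: no false certainty

// Clamp a raw score into [0, 0.99] and format it for display.
// Non-finite input (NaN, Infinity) is treated as 0 in this sketch.
function displayConfidence(raw: number): { value: number; label: string } {
  const safe = isFinite(raw) ? raw : 0;
  const value = Math.min(Math.max(safe, 0), MAX_CONFIDENCE);
  return { value, label: `${Math.round(value * 100)}%` };
}
```

Under this sketch a model score of 1.0 renders as "99%", so no surface ever shows "100%".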

P10 — Agent-as-collaborator, not tool. RJ takes small autonomous actions (auto-capture suggestions, auto-cluster grouping) with full audit trail. Larger actions are proposed with one-click commit. Per-action permission tiers (auto-allowed / propose-only / read-only) gate what's auto vs proposed vs read-only. Primitive: audit_log table (migration 0029) + <ProposedAction> component.
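The permission tiers can be pictured as a small dispatch step that always writes an audit entry. The tier names below come from the article. The disposition names, the AuditEntry shape, and the in-memory array standing in for the audit_log table are hypothetical, and the real table's columns are not described here.

```typescript
// Tier names come from the article; everything else is hypothetical.
type PermissionTier = "auto-allowed" | "propose-only" | "read-only";
type Disposition = "execute-and-audit" | "propose" | "deny";

interface AuditEntry {
  action: string;
  tier: PermissionTier;
  disposition: Disposition;
  at: string; // ISO timestamp
}

// Decide what happens to an action under its tier and record the decision,
// so nothing is executed, proposed, or denied without an audit entry.
function dispatch(
  action: string,
  tier: PermissionTier,
  auditLog: AuditEntry[],
  now: Date = new Date(),
): Disposition {
  const disposition: Disposition =
    tier === "auto-allowed" ? "execute-and-audit"
    : tier === "propose-only" ? "propose"
    : "deny";
  auditLog.push({ action, tier, disposition, at: now.toISOString() });
  return disposition;
}
```

Logging denied and proposed actions as well as executed ones is a design choice of this sketch; it keeps the trail complete enough to answer "what did RJ try to do?" and not only "what did RJ do?".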

What this rules out

The principles are a contract. A PR that violates one requires an explicit waiver in the description. Specifically, the following are rejected by default:

Static metric cards as primary device (fails P1)
Alphabetical or strictly-chronological default lists for failures (fails P2)
Tooltips claiming explanations without cited evidence (fails P3)
Generic ... menus where state-specific actions could be generated (fails P4)
Refresh-to-see-new-data UX (fails P5)
"Latest only" views with no historical lookback (fails P6)
Chat as a separate page rather than a global panel (fails P7)
UI-only operations with no CLI / YAML / MCP equivalent (fails P8)
Predictions or suggestions with no confidence display (fails P9)
Mystery autonomy — actions taken without audit, or with no permission tier (fails P10)

If you're contributing UI to RJ and a principle blocks the change you want to make, write up the case in your PR description and a maintainer will sign off (or push back). The waiver process is the principles' release valve — not their loophole.

How to read the rest of the cookbook

Every other cookbook article maps to one or two principles + one or two sections:

Bridge LangSmith traces into RJ → feeds the Runs section, uses P5 streaming for live ingestion
Build a snapshot suite for regression gating → lives in Tests + Suites, uses P8 YAML round-trip
Counterfactual replay and pipeline comparison → lives in Runs, the canonical example of P6 time-travel
Configure with natural language or YAML → lives in Settings, the canonical example of P7 + P8
Use the rj CLI → P8 in terminal form
Use RJ from Claude Code via MCP → P8 in MCP form
Wire RJ into GitHub Actions → gates the Runs section's PR-triggered runs

Next steps

Configure your first agent: Configure with natural language or YAML
Lock in your first regression baseline: Build a snapshot suite for regression gating
The full IA ADR: dev-docs/strategy/2026-05-26-adr-agent-page-ia-evolution.md — Overview · Tests · Suites · Runs · Pipeline · Settings
The full principles ADR: dev-docs/strategy/2026-05-26-adr-ai-native-principles.md — P1 through P10
