Cookbook · Intermediate · 7 min read · Updated 2026-05-20

Bridge LangSmith traces into RJ for attribution

Pull failed runs from a LangSmith project, attribute root causes in RJ, and write verdicts back as feedback — zero code change.

LangSmith tells you that a run failed. ProveAI Origin tells you which span caused it and why, using span-level attribution against a fixed vocabulary of 14 failure categories.

The @runtime-judgement/rj-langsmith bridge connects the two in a single command: it pulls recent traces from a LangSmith project, normalises them to OTEL gen-ai semconv, ingests them into RJ, runs your snapshot suite, and writes the verdict back to LangSmith as feedback — so the attribution shows up in your existing LangSmith dashboards without any pipeline change.

Read What are traces and why do they matter for background on how RJ models spans and verdicts before continuing.


Step 1: Install the bridge

npm install @runtime-judgement/rj-langsmith
# or
pnpm add @runtime-judgement/rj-langsmith

What you should see: The package installs with zero peer dependencies. If you see a peer-dependency warning for @langchain/core or similar, it comes from a conflicting package.json elsewhere in your monorepo, not from the bridge. Install in a standalone directory, or pass --legacy-peer-deps to npm.


Step 2: Set the environment

You need four values. Get them from the two dashboards:

# LangSmith — from https://smith.langchain.com/settings
export LANGSMITH_API_KEY="lsv2_pt_..."

# ProveAI Origin — from https://runtime-judgement-app.vercel.app/app/settings
export RJ_API_URL="https://runtime-judgement-app.vercel.app"
export RJ_API_KEY="rj_live_..."

# The RJ snapshot suite to run on each cycle
export RJ_SUITE_ID="01HZ..."

LANGSMITH_API_URL is optional and defaults to https://api.smith.langchain.com. Set it if you're on a self-hosted LangSmith deployment.

Generate an rj_live_ token from the API Keys section on the settings page. Tokens are shown once — copy the value before closing the dialog.

Put these in a .env file for local use and load them with dotenv or direnv. For production cron jobs (Inngest, Trigger.dev, Vercel Cron), set them as environment secrets in whatever job runner you use — treat them the same as a database password.
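A scheduled job that starts with a missing variable tends to fail late and with an unhelpful error, so it can help to check all four values up front. The following is a minimal sketch, not part of the bridge package: readBridgeEnv is a hypothetical helper name, and the only facts it relies on are the variable names and the default LangSmith URL given above.

```typescript
// The four required variables listed in this step.
const REQUIRED_VARS = [
  "LANGSMITH_API_KEY",
  "RJ_API_URL",
  "RJ_API_KEY",
  "RJ_SUITE_ID",
] as const

type RequiredVar = (typeof REQUIRED_VARS)[number]
type BridgeEnv = Record<RequiredVar, string> & { LANGSMITH_API_URL: string }

// Hypothetical helper (an assumption, not a bridge export): returns the
// validated values, or throws naming every missing variable at once.
function readBridgeEnv(env: Record<string, string | undefined>): BridgeEnv {
  const missing = REQUIRED_VARS.filter((name) => {
    const value = env[name]
    return value === undefined || value.trim() === ""
  })
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`)
  }
  return {
    LANGSMITH_API_KEY: env.LANGSMITH_API_KEY as string,
    RJ_API_URL: env.RJ_API_URL as string,
    RJ_API_KEY: env.RJ_API_KEY as string,
    RJ_SUITE_ID: env.RJ_SUITE_ID as string,
    // Optional variable; falls back to the documented default.
    LANGSMITH_API_URL: env.LANGSMITH_API_URL ?? "https://api.smith.langchain.com",
  }
}
```

In a Node.js script you would call readBridgeEnv(process.env) once at startup, before constructing anything that needs the keys.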


Step 3: Run your first cycle

npx rj-langsmith run --project my-agent --since 24h

Replace my-agent with the project name exactly as it appears in LangSmith.

What you should see on stdout:

rj-langsmith: ingested=12 passed=10 changed=2 feedback=12 elapsed=4318ms

If ingested=0, LangSmith found no traces with errors in the last 24 hours — that's fine. Try --since 7d to widen the window, or verify the project name.
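If you wrap the CLI in a script, the summary line is easy to turn into numbers for alerting. This sketch assumes the line always has exactly the shape of the sample above (same fields, same order); that is an assumption drawn from the one example, not a documented format guarantee. parseSummaryLine and CycleSummary are hypothetical names.

```typescript
interface CycleSummary {
  ingested: number
  passed: number
  changed: number
  feedback: number
  elapsedMs: number
}

// Assumed format, taken from the sample output in this step:
// rj-langsmith: ingested=12 passed=10 changed=2 feedback=12 elapsed=4318ms
const SUMMARY_PATTERN =
  /^rj-langsmith: ingested=(\d+) passed=(\d+) changed=(\d+) feedback=(\d+) elapsed=(\d+)ms$/

// Returns the parsed counters, or null when the line does not match.
function parseSummaryLine(line: string): CycleSummary | null {
  const match = SUMMARY_PATTERN.exec(line.trim())
  if (!match) return null
  const [ingested, passed, changed, feedback, elapsedMs] = match.slice(1).map(Number)
  return { ingested, passed, changed, feedback, elapsedMs }
}
```

A caller might alert when changed is greater than zero, and treat a null result as a sign that the output format differs from the one assumed here.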


What the cycle does

The command executes six steps automatically:

  1. Query LangSmith for every root run in the named project with status=error or a non-empty error field, within the --since window.

  2. Hydrate each trace's full run graph — root run plus all descendants linked by parent_run_id.

  3. Normalise each run into an OTEL gen-ai semconv span. The key mappings are:

    LangSmith field                        OTEL field
    run.id                                 span_id
    run.parent_run_id                      parent_span_id
    run.name                               span_name
    run.error                              status_message
    run_type=llm extra.invocation_params   gen_ai.request.{model,temperature}
    run_type=tool name/inputs/outputs      tool.name/tool.parameters/tool.output
  4. POST each normalised trace to /api/traces. RJ deduplicates by SHA-256 hash per user, so re-running the same window is safe.

  5. Trigger your snapshot suite via POST /api/snapshot-suites/{suite-id}/run.

  6. Write one LangSmith feedback row per ingested trace with key runtime_judgement.<suite-id> and score=1.0 so the verdict appears in LangSmith's eval views.
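To make the normalisation step (step 3) concrete, here is a minimal sketch of the field mappings listed above. It is an illustration only: the LangSmithRun and NormalisedSpan shapes, the attributes bag, and the normaliseRun name are assumptions, and the bridge's real normalizer may differ in structure and coverage.

```typescript
// Subset of a LangSmith run, limited to the fields named in the mapping.
interface LangSmithRun {
  id: string
  parent_run_id: string | null
  name: string
  run_type: string
  error?: string | null
  inputs?: unknown
  outputs?: unknown
  extra?: { invocation_params?: { model?: string; temperature?: number } }
}

// Assumed span shape: top-level identifiers plus a flat attribute bag.
interface NormalisedSpan {
  span_id: string
  parent_span_id: string | null
  span_name: string
  status_message: string | null
  attributes: Record<string, unknown>
}

function normaliseRun(run: LangSmithRun): NormalisedSpan {
  const attributes: Record<string, unknown> = {}

  if (run.run_type === "llm") {
    // extra.invocation_params -> gen_ai.request.{model,temperature}
    const params = run.extra?.invocation_params
    if (params?.model !== undefined) {
      attributes["gen_ai.request.model"] = params.model
    }
    if (params?.temperature !== undefined) {
      attributes["gen_ai.request.temperature"] = params.temperature
    }
  } else if (run.run_type === "tool") {
    // name/inputs/outputs -> tool.name/tool.parameters/tool.output
    attributes["tool.name"] = run.name
    attributes["tool.parameters"] = run.inputs ?? null
    attributes["tool.output"] = run.outputs ?? null
  }

  return {
    span_id: run.id,
    parent_span_id: run.parent_run_id,
    span_name: run.name,
    status_message: run.error ?? null,
    attributes,
  }
}
```

Runs of any other run_type keep only the four identifier and status fields in this sketch, since the mapping above lists no further attributes for them.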


Step 4: Watch attributions land

Open /app/history in RJ. Each ingested trace appears as an attribution with:

  • An L1 axis (execution or semantic)
  • An L4 category (one of 14 TRAIL labels, e.g. Tool-related, Context Handling Failures)
  • A cited span — the specific run the judge pinpointed as the cause
  • An explanation and suggested fix

Click any attribution to see the cited evidence quote and the full span DAG that led to the verdict.

What you should see: Attributions grouped by the trace IDs from LangSmith. If some traces have no L4 category, the judge returned low confidence. To increase precision, switch the pipeline preset to standard or consistent by calling POST /api/attributions with the preset in the body, for example "pipeline": "consistent". The CLI currently always uses the default preset.


Step 5: Set up continuous ingest

Run the bridge on a schedule. Here's a Vercel Cron job definition:

{
  "crons": [
    {
      "path": "/api/cron/rj-langsmith-cycle",
      "schedule": "0 * * * *"
    }
  ]
}

And the route handler:

// app/api/cron/rj-langsmith-cycle/route.ts
import { LangSmithBridge } from "@runtime-judgement/rj-langsmith"

export async function GET() {
  const bridge = new LangSmithBridge({
    langsmithApiKey: process.env.LANGSMITH_API_KEY!,
    rjApiUrl: process.env.RJ_API_URL!,
    rjApiKey: process.env.RJ_API_KEY!,
    rjSuiteId: process.env.RJ_SUITE_ID!,
  })

  const summary = await bridge.cycle({
    project: "my-agent",
    since: "1h",
    limit: 100,
  })

  return Response.json(summary)
}

The LangSmithBridge class also exposes a decomposed API if you need to interleave steps with your own logic:

// Pull and ingest traces, then run the suite separately
const ingested = await bridge.pullAndIngest({ project: "my-agent", since: "1h" })
const run = await bridge.runSnapshotSuite()
const feedback = await bridge.writeBackFeedback(run.suiteRunId)

Common pitfalls

Span name conflicts

If multiple runs share the same name in LangSmith (e.g. "LLMChain" appearing dozens of times), the RJ compressor has less signal to distinguish the causal span. Rename your chains and tools to be unique and descriptive — "search-tool" beats "tool", "rerank-chain" beats "chain".
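Before renaming anything, it can help to see which names are actually overloaded. The sketch below counts repeated run names in a list of runs you have already fetched; countDuplicateNames is a hypothetical helper, not part of the bridge, and it assumes only that each run has a name field.

```typescript
// Returns every run name that occurs more than once, with its count.
function countDuplicateNames(runs: { name: string }[]): Record<string, number> {
  // Null-prototype objects avoid collisions with names like "constructor".
  const counts: Record<string, number> = Object.create(null)
  for (const run of runs) {
    counts[run.name] = (counts[run.name] ?? 0) + 1
  }

  const duplicates: Record<string, number> = Object.create(null)
  for (const name of Object.keys(counts)) {
    if (counts[name] > 1) duplicates[name] = counts[name]
  }
  return duplicates
}
```

Names with the highest counts are the first candidates for more descriptive labels.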

Missing parent_run_id

LangSmith traces are a flat list of runs linked by parent_run_id. If a run has parent_run_id: null and is not the root run, the normalizer cannot reconstruct the DAG correctly — RJ will attribute against a flattened tree and accuracy degrades. Verify your LangChain instrumentation sets parent_run_id on every child run. The run.trace_id field (the root run's id) is used as a fallback when parent_run_id is missing, but it's lossy.
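A quick check over a trace's flat run list can reveal broken links before you ingest. This sketch relies on the statement above that run.trace_id holds the root run's id; findBrokenLinks and RunNode are hypothetical names, and the second check (parents that are absent from the list) is an extra assumption about what else can break the DAG.

```typescript
interface RunNode {
  id: string
  parent_run_id: string | null
  trace_id: string
}

interface BrokenLinks {
  orphans: RunNode[]   // no parent, yet not the root of the trace
  dangling: RunNode[]  // parent id that is not present in the list
}

function findBrokenLinks(runs: RunNode[]): BrokenLinks {
  // Lookup of every run id present in this trace.
  const known: Record<string, boolean> = Object.create(null)
  for (const run of runs) known[run.id] = true

  const orphans = runs.filter(
    (run) => run.parent_run_id === null && run.id !== run.trace_id,
  )
  const dangling = runs.filter(
    (run) => run.parent_run_id !== null && !known[run.parent_run_id],
  )
  return { orphans, dangling }
}
```

An empty result for both lists suggests the run graph can be rebuilt without falling back to trace_id.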

OTEL-vs-LangSmith schema differences

OTEL gen-ai semconv tracks LLM calls with gen_ai.request.model and token counts under gen_ai.usage.*. LangSmith embeds these inside extra.invocation_params and {prompt,completion,total}_tokens. The normalizer maps the 80% case automatically. If your traces use a custom model wrapper that doesn't populate invocation_params, token counts will be missing from the attributed span — attribution still works, but the judge has less context on model-level signals.
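As a rough illustration of the token mapping, the sketch below copies LangSmith's prompt and completion counts into gen_ai.usage.* attributes. The attribute names gen_ai.usage.input_tokens and gen_ai.usage.output_tokens follow current OTEL gen-ai semconv naming; whether the bridge emits exactly these names is an assumption, as are the helper names tokenUsageAttributes and hasTokenCounts.

```typescript
// Token fields as LangSmith reports them on a run.
interface TokenFields {
  prompt_tokens?: number | null
  completion_tokens?: number | null
  total_tokens?: number | null
}

// Assumed target names: gen_ai.usage.input_tokens / output_tokens.
// total_tokens is not copied because it is the sum of the other two.
function tokenUsageAttributes(run: TokenFields): Record<string, number> {
  const attrs: Record<string, number> = {}
  if (typeof run.prompt_tokens === "number") {
    attrs["gen_ai.usage.input_tokens"] = run.prompt_tokens
  }
  if (typeof run.completion_tokens === "number") {
    attrs["gen_ai.usage.output_tokens"] = run.completion_tokens
  }
  return attrs
}

// True when at least one token count would reach the attributed span.
function hasTokenCounts(run: TokenFields): boolean {
  return Object.keys(tokenUsageAttributes(run)).length > 0
}
```

Running hasTokenCounts over the llm runs from a custom model wrapper is a cheap way to find out whether that wrapper falls outside the automatically mapped case.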

Tool calls inside outputs

OpenAI-shaped tool_calls arrays embedded inside an llm run's outputs.generations are stringified verbatim today (v0.1 gap). They're captured in the OTEL span but not split into separate child spans. If your failure pattern is a malformed tool call, attribution will still work — the evidence quote will be inside the stringified gen_ai.completion attribute. A future parser update will split them into first-class spans.

Large traces with inputs_s3_url

If LangSmith stores your trace payloads in S3 (used by LangSmith Hub for large runs), the normalizer surfaces the S3 URL as an attribute but does not auto-fetch it. Pre-hydrate those runs before ingesting:

// bridge.pullAndIngest does NOT fetch S3 payloads. If your runs use
// inputs_s3_url, fetch those payloads yourself before calling the bridge.
const runs = await bridge.pullAndIngest({ project: "my-agent", since: "24h" })

CLI reference

rj-langsmith run --project <name> --since <duration|ISO> [--suite <id>] [--limit <n>]

Flag       Env var      Default  Notes
--project  -            -        LangSmith project name (required)
--suite    RJ_SUITE_ID  -        Snapshot suite ULID
--since    -            24h      24h, 7d, 30m, or ISO 8601
--limit    -            100      Max traces per cycle (hard cap: 1000)

Exit codes: 0 = success, 1 = config error, 2 = runtime error.
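The --since flag accepts either a relative duration or an ISO 8601 timestamp. The sketch below shows one way such a value could be resolved to a cutoff time. It assumes the only duration units are m, h, and d (the ones that appear in the table), and resolveSince is a hypothetical name; the CLI's own parser may accept more forms.

```typescript
// Milliseconds per unit, for the three units shown in the flag table.
const UNIT_MS: Record<string, number> = {
  m: 60 * 1000,
  h: 60 * 60 * 1000,
  d: 24 * 60 * 60 * 1000,
}

// Resolves "24h", "7d", "30m", or an ISO 8601 string to a cutoff Date.
function resolveSince(value: string, now: Date = new Date()): Date {
  const match = /^(\d+)([mhd])$/.exec(value)
  if (match) {
    const amount = Number(match[1])
    return new Date(now.getTime() - amount * UNIT_MS[match[2]])
  }
  // Date.parse is more lenient than strict ISO 8601; good enough for a sketch.
  const parsed = Date.parse(value)
  if (isNaN(parsed)) {
    throw new Error(`Unrecognised --since value: ${value}`)
  }
  return new Date(parsed)
}
```

Passing now explicitly keeps the function deterministic, which makes the window arithmetic easy to check in isolation.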

