SentryML
F.lux on computer screen stitched

Photo: Paowee (CC BY-SA 4.0)

tooling

SmithDB and Five Other Things LangChain Shipped at Interrupt 2026

LangChain's Interrupt 2026 surfaced a purpose-built trace database, a context version-control system, and an automated failure-triage engine. What ML platform teams should act on first.

By SentryML Editorial · · 8 min read

LangChain wrapped Day 1 of Interrupt 2026 with a dense set of announcements anchored by SmithDB, a purpose-built trace database, and several adjacent additions to the LangSmith platform. If you run agents in production, at least three of these warrant a second look before your next oncall rotation.

The signal

The headliner is SmithDB — a new data layer under LangSmith, built in Rust with Apache DataFusion and the Vortex file toolkit. It is in production today: 100% of US Cloud ingestion and all tracing UI queries already run on it. Self-hosted deployment is coming.

The case for a dedicated database is straightforward once you have dealt with real agent traces. A tool-using agent doing a few-shot RAG loop can generate hundreds of nested spans per trace. Spans stay open for hours if the agent is waiting on a human interrupt. Payloads are multimodal — images, audio, large tool responses. Start and end events arrive out of order because async callbacks land minutes apart. General-purpose OLTP databases grind on this. Postgres was not designed for tree-aware queries over gigabyte traces, and the performance shows.

Mechanics: how SmithDB is built

SmithDB’s architecture is three layers: object storage for durable trace data, a small PostgreSQL metastore holding segment metadata, and stateless ingestion/query/compaction services that scale horizontally by adding compute. There is no local disk to shard; stateless services merge segments on read.

Under the hood it is an object-storage-backed log-structured merge tree (LSM). Writes buffer in memory and flush as immutable sorted batches. Periodic compaction collapses segments. Queries stream-merge multiple segments into an ordered result. This is the same structural bet that Snowflake and Delta Lake made on object storage over attached disks. The difference is the query engine — DataFusion plus Vortex instead of SQL over Parquet — optimized for the random-access and tree-traversal patterns that agent traces require.

Performance numbers from the announcement:

WorkloadP50P99
Trace tree load92 ms595 ms
Single run load71 ms358 ms
Runs filtering82 ms434 ms
Full-text search400 ms870 ms
Trace ingestion630 ms1.47 s

The 400 ms P50 for full-text search is the number that will matter at 3am when you need to grep across traces for a specific tool response string. The claimed headline speedup is up to 12x on core LangSmith operations versus the previous infrastructure.

The rest of the stack

SmithDB got the engineering attention it deserved, but four other announcements are operationally relevant.

Context Hub is a versioned store for the files that govern agent behavior: AGENTS.md, skill definitions, policies, reference materials. Think of it as Git for agent context, with environment tagging (dev/staging/prod), collaborative commenting, and CLI push/pull via langsmith hub push and langsmith hub pull. The practical problem it solves: when a production agent starts misbehaving, you have to grep across code repos and conversation logs to figure out which version of a system prompt was running. Context Hub pins that. It also lets non-engineers iterate on instructions without a deployment cycle, which shifts context ownership in ways your platform team will need to think through.

LangSmith Engine is an automated failure triage loop. It monitors production traces, clusters failures into named issues (not individual traces), reads your source code to diagnose root causes, and proposes fixes as pull requests, new evaluators, or additions to your regression dataset. It is in public beta. The claim is that it replaces the Monday-morning manual trace review most agent teams are doing today. Whether it delivers depends heavily on clustering quality — but the architecture is sound: failures as a signal source, evaluation dataset as the accumulator, PR as the output.

LLM Gateway adds runtime governance at the API call layer: hard spend caps, PII and secrets redaction on requests and responses, and policy event integration into the trace workflow. If you run multi-tenant agents with different spend budgets per customer or project, this is the guardrail you would otherwise build yourself. The redaction is bidirectional — it strips PII from what the model sees and from what gets logged, which matters if you are tracing user interactions in a regulated industry.

LangSmith Sandboxes hit general availability. They are hardware-virtualized microVM environments for code execution. The relevant feature for agent teams is snapshot/fork: you can checkpoint a sandbox mid-run and branch it for parallel evaluation paths. Authentication proxies inject credentials at the network layer, so secrets do not enter the agent runtime.

Operational takeaway

Three things worth adding to your runbook or platform backlog:

1. If you self-host LangSmith, track the SmithDB rollout. On-prem availability is listed as “coming soon.” The performance gap between SmithDB and the previous Postgres-backed stack is large enough that if you run high-volume agent workloads on self-hosted LangSmith, upgrading should be high priority. The object storage plus stateless architecture also makes it significantly easier to scale trace ingest without managing disk sharding.

2. Wire a drift alert on trace structure, not just content. SmithDB makes tree-aware queries practical at interactive latencies. That unlocks a class of monitoring that was too slow before: span count per trace, tool call cardinality per agent invocation, trace depth distribution over time. If your agent starts hallucinating tool use or gets stuck in a retry storm, those patterns show up in the trace structure before they show up in output quality metrics.

You can approximate this today with the LangSmith Python SDK against any project already on SmithDB:

from langsmith import Client
from datetime import datetime, timedelta
import statistics

client = Client()

def check_tool_call_cardinality(project_name: str, p99_threshold: int = 20):
    runs = client.list_runs(
        project_name=project_name,
        run_type="chain",
        start_time=datetime.utcnow() - timedelta(hours=1),
    )

    tool_counts = []
    for run in runs:
        children = list(client.list_runs(
            project_name=project_name,
            run_type="tool",
            trace_id=run.trace_id,
        ))
        tool_counts.append(len(children))

    if not tool_counts:
        return {}

    sorted_counts = sorted(tool_counts)
    p50 = statistics.median(sorted_counts)
    p99 = sorted_counts[int(len(sorted_counts) * 0.99)]

    if p99 > p99_threshold:
        print(f"[ALERT] tool-call p99={p99} exceeds threshold={p99_threshold}")

    return {"p50": p50, "p99": p99, "n": len(tool_counts)}

A normal agent run hits two to four tools. A jailbroken or looping agent hits forty. This is trivially detectable from span structure and almost no one has it wired up.

3. Evaluate Context Hub as a replacement for ad hoc system prompt versioning. Most teams today track system prompts in a git repo, a spreadsheet, or a Notion page — none of which have environment-aware deployment or rollback. Context Hub’s environment tagging and version pinning give you reproducible agent runs and a clear audit trail for “which prompt was live when this incident happened.”

Sources

Sources

  1. We built SmithDB, the data layer for agent observability
  2. Introducing LangSmith Context Hub
  3. Introducing LangSmith Engine
  4. Everything we shipped at Interrupt
#agent-observability #tracing #langsmith #mlops #infra
Subscribe

SentryML — in your inbox

ML observability & MLOps — model monitoring, drift detection, debugging in production. — delivered when there's something worth your inbox.

No spam. Unsubscribe anytime.

Related

Comments