SmithDB and Five Other Things LangChain Shipped at Interrupt 2026

LangChain wrapped Day 1 of Interrupt 2026 with a dense set of announcements anchored by SmithDB, a purpose-built trace database, and several adjacent additions to the LangSmith platform. If you run agents in production, at least three of these warrant a second look before your next oncall rotation. The throughline is the one we keep coming back to: the agent authority gap is an observability problem, and better trace infrastructure is what makes that observability real.

The signal

The headliner is SmithDB — a new data layer under LangSmith, built in Rust with Apache DataFusion and the Vortex file toolkit. It is in production today: 100% of US Cloud ingestion and all tracing UI queries already run on it. Self-hosted deployment is coming.

The case for a dedicated database is straightforward once you have dealt with real agent traces. A tool-using agent doing a few-shot RAG loop can generate hundreds of nested spans per trace. Spans stay open for hours if the agent is waiting on a human interrupt. Payloads are multimodal — images, audio, large tool responses. Start and end events arrive out of order because async callbacks land minutes apart. General-purpose OLTP databases grind on this. Postgres was not designed for tree-aware queries over gigabyte traces, and the performance shows.

Mechanics: how SmithDB is built

SmithDB’s architecture is three layers: object storage for durable trace data, a small PostgreSQL metastore holding segment metadata, and stateless ingestion/query/compaction services that scale horizontally by adding compute. There is no local disk to shard; stateless services merge segments on read.

Under the hood it is an object-storage-backed log-structured merge tree ↗ (LSM). Writes buffer in memory and flush as immutable sorted batches. Periodic compaction collapses segments. Queries stream-merge multiple segments into an ordered result. This is the same structural bet that Snowflake and Delta Lake made on object storage over attached disks. The difference is the query engine — DataFusion plus Vortex instead of SQL over Parquet — optimized for the random-access and tree-traversal patterns that agent traces require.

Performance numbers from the announcement:

Workload	P50	P99
Trace tree load	92 ms	595 ms
Single run load	71 ms	358 ms
Runs filtering	82 ms	434 ms
Full-text search	400 ms	870 ms
Trace ingestion	630 ms	1.47 s

The 400 ms P50 for full-text search is the number that will matter at 3am when you need to grep across traces for a specific tool response string. The claimed headline speedup is up to 12x on core LangSmith operations versus the previous infrastructure.

The rest of the stack

SmithDB got the engineering attention it deserved, but four other announcements are operationally relevant.

Context Hub is a versioned store for the files that govern agent behavior: AGENTS.md, skill definitions, policies, reference materials. Think of it as Git for agent context, with environment tagging (dev/staging/prod), collaborative commenting, and CLI push/pull via langsmith hub push and langsmith hub pull. The practical problem it solves: when a production agent starts misbehaving, you have to grep across code repos and conversation logs to figure out which version of a system prompt was running. Context Hub pins that. It also lets non-engineers iterate on instructions without a deployment cycle, which shifts context ownership in ways your platform team will need to think through.

LangSmith Engine is an automated failure triage loop. It monitors production traces, clusters failures into named issues (not individual traces), reads your source code to diagnose root causes, and proposes fixes as pull requests, new evaluators, or additions to your regression dataset. It is in public beta. The claim is that it replaces the Monday-morning manual trace review most agent teams are doing today. Whether it delivers depends heavily on clustering quality — but the architecture is sound: failures as a signal source, evaluation dataset as the accumulator, PR as the output.

LLM Gateway adds runtime governance at the API call layer: hard spend caps, PII and secrets redaction on requests and responses, and policy event integration into the trace workflow. If you run multi-tenant agents with different spend budgets per customer or project, this is the guardrail you would otherwise build yourself. The redaction is bidirectional — it strips PII from what the model sees and from what gets logged, which matters if you are tracing user interactions in a regulated industry.

LangSmith Sandboxes hit general availability. They are hardware-virtualized microVM environments for code execution. The relevant feature for agent teams is snapshot/fork: you can checkpoint a sandbox mid-run and branch it for parallel evaluation paths. Authentication proxies inject credentials at the network layer, so secrets do not enter the agent runtime.

Operational takeaway

Three things worth adding to your runbook or platform backlog:

1. If you self-host LangSmith, track the SmithDB rollout. On-prem availability is listed as “coming soon.” The performance gap between SmithDB and the previous Postgres-backed stack is large enough that if you run high-volume agent workloads on self-hosted LangSmith, upgrading should be high priority. The object storage plus stateless architecture also makes it significantly easier to scale trace ingest without managing disk sharding.

2. Wire a drift alert on trace structure, not just content. SmithDB makes tree-aware queries practical at interactive latencies. That unlocks a class of monitoring that was too slow before: span count per trace, tool call cardinality per agent invocation, trace depth distribution over time. If your agent starts hallucinating tool use or gets stuck in a retry storm, those patterns show up in the trace structure before they show up in output quality metrics. This is structural drift, a sibling of the input and prediction drift in our drift, data-quality, and model-decay metrics taxonomy.

You can approximate this today with the LangSmith Python SDK against any project already on SmithDB:

from langsmith import Client
from datetime import datetime, timedelta
import statistics

client = Client()

def check_tool_call_cardinality(project_name: str, p99_threshold: int = 20):
    runs = client.list_runs(
        project_name=project_name,
        run_type="chain",
        start_time=datetime.utcnow() - timedelta(hours=1),
    )

    tool_counts = []
    for run in runs:
        children = list(client.list_runs(
            project_name=project_name,
            run_type="tool",
            trace_id=run.trace_id,
        ))
        tool_counts.append(len(children))

    if not tool_counts:
        return {}

    sorted_counts = sorted(tool_counts)
    p50 = statistics.median(sorted_counts)
    p99 = sorted_counts[int(len(sorted_counts) * 0.99)]

    if p99 > p99_threshold:
        print(f"[ALERT] tool-call p99={p99} exceeds threshold={p99_threshold}")

    return {"p50": p50, "p99": p99, "n": len(tool_counts)}

A normal agent run hits two to four tools. A jailbroken or looping agent hits forty. This is trivially detectable from span structure and almost no one has it wired up. If you would rather build on an open standard than a vendor schema, the same span-level signal maps onto the OpenTelemetry GenAI semantic conventions, and a sudden tool-call spike is exactly the kind of anomaly the MITRE ATLAS detection-engineering runbook turns into an alert.

3. Evaluate Context Hub as a replacement for ad hoc system prompt versioning. Most teams today track system prompts in a git repo, a spreadsheet, or a Notion page — none of which have environment-aware deployment or rollback. Context Hub’s environment tagging and version pinning give you reproducible agent runs and a clear audit trail for “which prompt was live when this incident happened.”

Sources

We built SmithDB, the data layer for agent observability ↗ — LangChain engineering blog with full architecture details, LSM design rationale, and the complete performance benchmark table.
Introducing LangSmith Context Hub ↗ — Announcement and technical overview of the versioned context management system, including CLI integration and environment tagging.
Introducing LangSmith Engine ↗ — Details on the automated failure clustering, root cause analysis, and PR-drafting system currently in public beta.
Everything we shipped at Interrupt ↗ — Full Day 1 summary covering Sandboxes GA, LLM Gateway, and Managed Deep Agents, authored by Jacob Talbot.

SmithDB and Five Other Things LangChain Shipped at Interrupt 2026

The signal

Mechanics: how SmithDB is built

The rest of the stack

Operational takeaway

Sources

Sources

SentryML — in your inbox

Related

The Agent Authority Gap Is an Observability Problem

Federated Learning in Production: What Substra Actually Does for Privacy-Preserving ML

ML Model Monitoring Tools & Frameworks (2026)

Comments