Stele
A new category · Provenance for AI agents

The system of context for your AI agents.

Capture every input, every retrieval, every tool call, every output — the full context behind every agent decision. In your own warehouse. Replayable on demand. The substrate beneath audit, debugging, reproduction, erasure, and bias detection.

agent.py
from stele.sdk import Tracer
from stele.writer import IcebergWriter

writer = IcebergWriter.from_env()
tracer = Tracer(sink=writer, agent_id="sales-bot")

with tracer.trace(input_text=user_msg) as t:
    docs = retriever.invoke(user_msg)
    t.record_retrieval(source="kb", documents=docs)
    response = llm.complete(prompt(docs, user_msg))
    t.set_output(response.text)
Every input, retrieval, and output becomes a queryable row.
The questions you'll be asked

Pick any agent decision from yesterday. Now answer:

Five questions that come from a board review, a regulator letter, or a Monday-morning incident. Each one needs a row, not a guess.

Q1
What context did the agent see — and was that context the same yesterday as it is now?
Q2
Which model produced this output? At what temperature? With what prompt template version?
Q3
Would the new model you're about to deploy have answered differently?
Q4
A regulator asked for the audit log of every decision this agent made on data we deprecated last week. Where do you start?
Q5
A customer asked for their data to be erased. Can you prove it's gone — including from every captured trace?
What changes

Every decision becomes a queryable row in your warehouse.

For every agent invocation we capture the full context the model saw — the user input, every retrieval (with the source-data version pinned), every tool call (with response hash), the prompt and parameters, the model output, and any post-hoc outcome — into structured tables in your own warehouse. Replayable from any point.

The answers stop being “we'd have to dig” and start being one SQL query.

Every agent decision
Every data lookup
Every tool invoked
Every model output
Every outcome
Every replay
Every divergence
Every PII finding
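The layer list above maps to one structured row per decision. A minimal sketch of what such a row could look like, with hypothetical field names chosen for illustration (not the actual Stele schema):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TraceRow:
    """One agent decision as a structured warehouse row (hypothetical schema)."""
    trace_id: str
    agent_id: str
    input_text: str
    retrievals: list = field(default_factory=list)  # (source, pinned data version)
    tool_calls: list = field(default_factory=list)  # (tool name, response hash)
    model: str = ""
    params: dict = field(default_factory=dict)      # temperature, prompt template version, ...
    output_text: str = ""
    outcome: Optional[str] = None                   # post-hoc signal, if any

row = TraceRow(
    trace_id="tr-001",
    agent_id="sales-bot",
    input_text="What is our refund policy?",
    retrievals=[("kb", "snapshot-2024-05-01")],
    tool_calls=[("crm_lookup", "sha256:ab12")],
    model="gpt-4o",
    params={"temperature": 0.2, "prompt_template": "v3"},
    output_text="Refunds are available within 30 days.",
)
```

With the context pinned per row, a question like "which model, at what temperature, with which prompt template?" becomes a field lookup (row.model, row.params) rather than a log dig.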
What this unlocks

Every role reaches into the same warehouse for different answers.

Every item below ships today. Each one is a SQL query, a CLI command, or a one-line oc invocation against the warehouse the SDK has been writing to.

AI / ML engineering
Reproduce, attribute, test before rollout.
  • Reproduce any production decision. Same inputs, same model params, same tool responses (looked up by response_hash), same source data (read AS-OF the captured Iceberg snapshot).
  • Per-call replay for agentic loops. Target a specific intermediate model call inside a think → tool → think → respond loop with --call-index N.
  • Counterfactual divergence attribution. Replay one trace with each differing layer reverted to the other's value; the layer whose reversion converges the output gets the attribution weight.
  • Test a new model / prompt against last week's real production traces — before rollout. A/B replay across multiple LLM providers (OpenAI, Anthropic, any OpenAI-protocol endpoint).
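The counterfactual attribution step above can be sketched as a loop over differing layers: revert one layer at a time to the other trace's value, replay, and credit the layers whose reversion makes the outputs converge. A minimal sketch with a stubbed replay function; all names are illustrative, not the Stele API:

```python
def attribute_divergence(trace_a, trace_b, replay, layers):
    """For each layer that differs between two traces, replay trace_a with
    that layer reverted to trace_b's value. Layers whose reversion makes
    the output match trace_b's output share the attribution weight."""
    converging = []
    for layer in layers:
        if trace_a[layer] == trace_b[layer]:
            continue  # identical layers cannot explain the divergence
        reverted = {**trace_a, layer: trace_b[layer]}
        if replay(reverted) == trace_b["output"]:
            converging.append(layer)
    # Split the weight evenly across converging layers.
    return {layer: 1.0 / len(converging) for layer in converging} if converging else {}

# Stub replay: the output flips only when the model layer is the new one.
def replay(trace):
    return "new answer" if trace["model"] == "model-v2" else "old answer"

a = {"model": "model-v1", "prompt": "v3", "output": "old answer"}
b = {"model": "model-v2", "prompt": "v3", "output": "new answer"}
weights = attribute_divergence(a, b, replay, layers=["model", "prompt"])
# weights == {"model": 1.0}: only reverting the model layer converges the output
```

In practice a replay may converge only approximately; an equivalence check (rather than string equality) would sit where the `==` comparison is.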
Compliance + Legal
  • Per-decision audit reports. 6 regulator templates: EU AI Act, HIPAA, SOX, GDPR, NIST RMF, plain. Markdown or PDF. Optional PKCS#7 signature.
  • Subject erasure with signed receipt. GDPR Article 17 / FTC consent orders. Row-level deletes + signed JSON in 4 minutes.
  • Disparate-impact analysis. EEOC 4/5ths, demographic parity, equalized odds. Demographics never enter our substrate.
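The 4/5ths screen itself is simple arithmetic: a group whose selection rate falls below 80% of the highest group's rate is flagged for potential disparate impact. A minimal sketch (illustrative, not the Stele implementation); consistent with the note above, demographics are joined in at analysis time rather than stored in the substrate:

```python
def four_fifths_flags(selected, total):
    """selected/total: dicts of group -> counts. Returns groups whose
    selection rate is below 4/5 of the highest group's rate, with the ratio."""
    rates = {g: selected[g] / total[g] for g in total}
    best = max(rates.values())
    return {g: rate / best for g, rate in rates.items() if rate / best < 0.8}

flags = four_fifths_flags(
    selected={"group_a": 50, "group_b": 30},
    total={"group_a": 100, "group_b": 100},
)
# group_b's rate (0.30) is 60% of group_a's (0.50), below the 80% threshold
```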
Finance + FinOps
  • Per-trace cost attribution. Token usage (input, output, cache-read, cache-write, reasoning) captured per trace. Roll up by agent, team, customer.
  • Cost-quality Pareto. Replay against a cheaper model; quantify the quality delta via equivalence_to_source. Find the substitution that maintains quality.
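The per-trace rollup above is a group-by over captured token counts. A minimal sketch with illustrative field names and made-up per-token prices (real prices vary by model and provider):

```python
from collections import defaultdict

# Illustrative per-token prices in USD; not real pricing.
PRICE = {"input": 3e-6, "output": 15e-6, "cache_read": 0.3e-6}

def cost_by_agent(traces):
    """Sum token cost per agent_id from per-trace usage records."""
    totals = defaultdict(float)
    for t in traces:
        totals[t["agent_id"]] += sum(
            t["usage"].get(kind, 0) * price for kind, price in PRICE.items()
        )
    return dict(totals)

traces = [
    {"agent_id": "sales-bot", "usage": {"input": 1000, "output": 200}},
    {"agent_id": "sales-bot", "usage": {"input": 500, "output": 100, "cache_read": 2000}},
    {"agent_id": "support-bot", "usage": {"input": 2000, "output": 400}},
]
costs = cost_by_agent(traces)
```

Swapping `agent_id` for a team or customer key gives the other rollups mentioned above.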
Product + Quality
  • Catch hallucinated tool use. Find traces where the model claimed tool use that didn't happen. One SQL query against agent.tool_calls.
  • Outcome capture, closing the loop. Record post-hoc signals (regression, incident, customer complaint, audit finding). Aggregate over time to find quality drift.
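The hallucinated-tool-use check compares what an output claims against what was actually captured. A minimal sketch of that logic over hypothetical trace and tool-call records (not the Stele schema); detecting the claim itself, e.g. by scanning the output text, is assumed done upstream:

```python
def hallucinated_tool_traces(traces, tool_calls):
    """Return trace_ids whose output claims a tool was used but for which
    no tool call was actually captured."""
    captured = {tc["trace_id"] for tc in tool_calls}
    return [
        t["trace_id"]
        for t in traces
        if t["claims_tool_use"] and t["trace_id"] not in captured
    ]

traces = [
    {"trace_id": "tr-1", "claims_tool_use": True},   # claims a lookup happened
    {"trace_id": "tr-2", "claims_tool_use": True},
    {"trace_id": "tr-3", "claims_tool_use": False},
]
tool_calls = [{"trace_id": "tr-2", "tool": "crm_lookup"}]
flagged = hallucinated_tool_traces(traces, tool_calls)
# flagged == ["tr-1"]: output claimed tool use, but no call was captured
```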
Security + Procurement
  • BYO Iceberg catalog. Use your existing lakehouse — Polaris, AWS S3 Tables, Databricks Unity, Snowflake. Iceberg REST protocol; any compliant catalog.
  • Zero sub-processor agreements. Trace data never crosses your VPC boundary. Smaller security review surface; no DPA / SCC / BAA to negotiate for trace content.
Marquee use case — for your AI quality team

The question observability cannot answer.

Production agents routinely produce confident output citing data they never looked up and tool calls they never invoked. The page-of-prose answer reads correct; the receipts behind it don't exist.

Observability dashboards log a “successful” trace with output tokens and a green status. They count events — they don't read what the events contain, and they don't compare them against the work the model actually performed. With a system of context, the work performed is one row away. Count the tool calls actually captured for the trace; the answer is zero.

Observability vendors structurally cannot run this query against your operational warehouse — they don't write to your warehouse, only theirs.

Traces with zero captured tool calls
SELECT t.trace_id
FROM agent.traces t
WHERE NOT EXISTS (
  SELECT 1 FROM agent.tool_calls tc
  WHERE tc.trace_id = t.trace_id
);
What's coming next

Sorted into what's actually new, what's an extension, and what's operational.

No frozen promises. The split below is honest about which items already have a working version today and which are net-new in the warehouse.

01
Extending what ships today
Q3 2026

Embedder + Tool layer reversibility

Counterfactual attribution covers 5 of 7 layers cleanly today. EMBEDDER and TOOL reversal close the gap, completing the seven-layer attribution surface.

Q4 2026

Real-time anomaly alerting

Webhook / Slack alert when a hallucination flag or output anomaly fires. Today the same query runs on demand against your warehouse.

Q4 2026

Cost-quality Pareto, packaged

Doable today via replay + SQL. Coming: a single command that returns “here's the model substitution that holds your quality at lower cost.”

02
New capabilities in development
Q3 2026

Reasoning content capture + drift detection

Capture the model's reasoning text alongside the answer. Detect when reasoning style drifts at the cluster level. None of the agent-observability vendors do this.

Q1 2027

Web console

Trace search, divergence reports, audit-report viewer, cost dashboards. The operator surface today is the CLI; the console reads from the same warehouse.

03
Operational maturity
In progress

SOC 2 Type II attestation

Most of our compliance surface is structural — trace data never leaves your VPC — but enterprise procurement still asks for the badge. We'll publish the attestation date when the audit window opens.

Honest answers

Questions a security review will actually open with.

Including the answers that say not yet. Your DPO will reach for these on the first call.

Where does our trace data actually go?

Into your Apache Iceberg tables, in your S3 bucket, in your AWS account. The Stele SDK runs in your agent's process; the writer talks to your Iceberg REST catalog (Apache Polaris, AWS S3 Tables, Databricks Unity, Snowflake-managed Iceberg). Trace content never crosses your VPC boundary.

Make every agent decision queryable.

30 minutes. We'll walk through the architecture, the compliance surface, and how a pilot would land in your VPC.

Read docs