Introducing the Data Context Layer: The Infra AI Agents Need to Actually Understand Your Data

Yoni Michael, Co-Founder
Kostas Pardalis, Co-Founder

Imagine asking an AI agent: “What happens downstream if I change the join key in stg_subscriptions?”

A good code agent (Claude Code, Codex, any of the current crop) will read your downstream SQL files, trace the ref() calls, and tell you it’s used by int_subscription_periods and fct_arr_reporting_monthly. If you push it, it might even reason about the join structure.

But that’s not what you actually need to know. What you need to know is that the downstream fact table’s grain is per month per subscription, that the join key flows through to the GROUP BY, that arr is a SUM() aggregation, which means a grain change would silently inflate your most important revenue metric. You need to know that this metric feeds the board-level ARR dashboard, and that the LEFT JOIN could fan out with a different key, producing duplicate rows that look like legitimate data.

The code agent’s answer isn’t wrong, it’s just not complete. The gap between what agents can derive on the fly and what they need to know to be trustworthy is the central challenge of applying AI to data work.

The agents we have today

AI agents are entering the data stack in force. Teams are deploying them for analytics, debugging and writing transformations, and automating BI workflows. The trajectory is clear: agents will become core operators of data systems, not just query tools.

The capabilities are genuinely impressive. Point Claude Code or Codex at a dbt project and watch it parse SQL, Jinja, and YAML configurations, understand file dependencies, and refactor models across multiple files in one pass. The natural next step is to plug in database access via MCP servers and give the same agent the ability to query your warehouse directly. Now it can inspect schemas, check row counts, retrieve execution history, and run diagnostic queries against live data.

With code understanding and database access in hand, you’d think data engineering would be solved.

The re-derivation problem

Even with database access, a code agent’s understanding of your data system is ad-hoc and ephemeral. Every time you ask a question, it re-reads the files, re-parses the SQL, and re-derives whatever understanding it needs. That analysis is probabilistic (it might miss a lineage path or hallucinate one), limited by context window (500 models won’t fit in a single conversation), and gone the moment the session ends.

For a single model, this works fine. For understanding how a change propagates through 50 interconnected models, across source systems and into BI reports, it breaks down. The agent would need to read every SQL file, re-parse every join, re-derive every grain, and re-trace every lineage path for every question, each time. No amount of model intelligence compensates for doing this from scratch each session.

The database access doesn’t fill the gap the way you’d expect. It gives the agent the current state of your warehouse (row counts, column types, query results) but not the intent behind it. Ask it why the fct_pipeline row count spiked 40% yesterday and it will report that the count went from 12,400 to 17,360, perhaps noting that “more data is flowing in.”

The real answer is that someone added a substage column upstream, the join went from 1:1 to 1:many, and the model’s grain changed. It’s not a data spike, it’s a schema evolution that redefined what each row represents. But diagnosing that requires knowing the model’s grain, seeing upstream lineage, understanding join cardinality, and correlating with a deployment that happened at 2am. None of this context lives in the code or the database.
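A stripped-down version of that grain shift, with invented names and plain Python lists standing in for tables, shows why the row count is a symptom rather than the story:

```python
# Before: one stage row per stage name, so the join is effectively 1:1.
deals = [{"deal_id": i, "stage": "prospect"} for i in range(5)]
stages_before = [{"stage": "prospect"}]

# After: an upstream change adds a substage column, making the join 1:many.
stages_after = [{"stage": "prospect", "substage": "cold"},
                {"stage": "prospect", "substage": "warm"}]

def join(left, right, key):
    """Naive inner join: every matching pair of rows becomes an output row."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]

before = join(deals, stages_before, "stage")
after = join(deals, stages_after, "stage")

# Row count doubles, but no "new data" arrived: each row now means
# deal-per-substage instead of deal.
print(len(before), len(after))  # 5 10
```

An agent that only sees the counts reports a spike; an agent that knows the model’s grain reports a redefinition of what a row means.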

The agent understands code but starts from scratch every time, and it can see data but doesn’t truly understand what that data means. It has no persistent knowledge of how data flows, what it represents, or how it behaves across all of the different data systems.

What data engineering actually requires

The examples above aren’t edge cases. They represent the everyday work of data engineering: understanding impact, diagnosing root causes, and making changes safely in systems where everything is connected.

Doing this well requires answering four questions:

  1. How does data flow? Not just which files reference which, but actual data flow at the column level, from source systems through streaming platforms, through transformations to the reports that consume the output.
  2. What does it mean? The dataset identity of every model: what uniquely identifies each row (its grain), which columns are measures and what aggregation they use, which columns are dimensions, and what the business logic is actually doing.
  3. How does it behave? Runtime behavior: job execution patterns, failure modes, performance characteristics. Observed behavior linked back to the models that produce it, so you can correlate a Monday morning failure with the upstream change that caused it.
  4. Why is it built this way? The logical intent behind each model, separated from the physical reality of how it’s materialized. A single dbt model might produce tables in dev, staging, and production. Understanding the difference between “the logic is wrong” and “the prod table is stale” requires keeping these layers distinct.
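One way to picture the persistent context these four questions demand is a small graph of per-model metadata. The field names and models below are a hypothetical sketch, not Typedef’s actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class ModelContext:
    name: str
    grain: list                # columns that uniquely identify a row
    measures: dict             # measure column -> aggregation, e.g. {"arr": "SUM"}
    upstream: list = field(default_factory=list)
    downstream: list = field(default_factory=list)

graph = {
    "stg_subscriptions": ModelContext(
        "stg_subscriptions", grain=["sub_id"], measures={},
        downstream=["fct_arr_reporting_monthly"]),
    "fct_arr_reporting_monthly": ModelContext(
        "fct_arr_reporting_monthly", grain=["month", "sub_id"],
        measures={"arr": "SUM"}, upstream=["stg_subscriptions"]),
}

# An agent answering the join-key question reads this directly instead of
# re-parsing SQL: the grain plus the SUM() measure flag the fan-out risk.
fct = graph["fct_arr_reporting_monthly"]
print(fct.grain, fct.measures)  # ['month', 'sub_id'] {'arr': 'SUM'}
```

The value isn’t the data structure itself but that it is computed once, kept current, and queryable in milliseconds rather than re-derived per conversation.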

No amount of ad-hoc SQL parsing or database querying answers these questions reliably. They require a versioned, cross-platform understanding of the entire system. One that’s maintained and updated as the system evolves, and available instantly when an agent needs it.

The data context layer

This is what we’ve spent the past year building at Typedef: a data context layer that provides agents with the system-level understanding they need to operate effectively.

Context isn’t something you extract in a single pass. It’s layered: each phase of analysis builds on the one before it, adding a different kind of understanding. It starts with parsing every SQL model to build column-level lineage from the AST. Then it grounds that against warehouse reality, because a dbt project says a model exists, but the warehouse shows what it actually looks like. These are different things, and keeping them distinct is what lets an agent tell “the logic is wrong” from “the infrastructure is broken.”

Then the graph extends beyond transformations entirely. Source systems upstream and consumption tools downstream are stitched into the same structure. A single traversal can trace impact from a Salesforce field change through ingestion, through dbt, all the way to every dashboard and reverse sync that consumes the output.
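A single traversal over such a stitched graph is, mechanically, just breadth-first search over a cross-platform edge list. A minimal sketch with invented node names:

```python
from collections import deque

# Hypothetical column-level edges: a Salesforce field flows through
# ingestion and dbt all the way to a dashboard.
edges = {
    "salesforce.account.tier": ["raw.accounts.tier"],
    "raw.accounts.tier": ["stg_accounts.tier"],
    "stg_accounts.tier": ["fct_arr_reporting_monthly.tier"],
    "fct_arr_reporting_monthly.tier": ["dashboard.arr_by_tier"],
}

def impacted(node):
    """Everything downstream of node: a plain breadth-first traversal."""
    seen, queue = set(), deque([node])
    while queue:
        for nxt in edges.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# Impact of changing the Salesforce field: every consumer, across systems.
print(sorted(impacted("salesforce.account.tier")))
```

The hard part isn’t the traversal; it’s building and maintaining edges that cross system boundaries, so the BFS has something true to walk.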

Only after all this groundwork can you do the hardest part: propagating understanding through the graph. What uniquely identifies each row transforms through every join, aggregation, and filter. Understanding a model's identity structure requires understanding every model upstream of it. This is the kind of analysis that can't be done ad-hoc in a context window. It has to be computed across the full graph, maintained as the system evolves, and available instantly when an agent needs to answer a question.
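To illustrate what “propagating identity” means, here are toy transformation rules, a deliberate simplification of whatever the real analysis does:

```python
def propagate_grain(upstream_grain, op, arg):
    """Toy rules for how row identity transforms through one operation."""
    if op == "group_by":          # aggregation: grain becomes the group keys
        return sorted(arg)
    if op == "join_1_to_many":    # fan-out: grain extends by the many side's key
        return sorted(set(upstream_grain) | set(arg))
    if op == "filter":            # filtering never changes what identifies a row
        return upstream_grain
    raise ValueError(f"unknown operation: {op}")

# A subscription table joined 1:many to months, then aggregated per month:
g = ["sub_id"]
g = propagate_grain(g, "join_1_to_many", ["month"])  # ['month', 'sub_id']
g = propagate_grain(g, "group_by", ["month"])        # ['month']
print(g)
```

Because each step’s output grain depends on its input grain, the computation is inherently whole-graph: you can’t know a fact table’s identity without resolving every model above it.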

The result is that the join key question from the opening becomes answerable, instantly and correctly. The agent traverses pre-computed lineage, checks grain and aggregation types, identifies the fan-out risk, and connects the impact to the dashboards that consume the output. All from persistent, structured context that was computed once and maintained incrementally.

Measuring the difference

We built a rigorous evaluation framework comprising dozens of real data tasks drawn from production projects, each run multiple times across five difficulty levels in isolated environments with zero information leakage.

The results:

Task success rate:

  - Baseline (coding agents + data access, no context layer): 46.4%
  - Typedef agents (with data context layer): 81.5%

The only variable here is context. The data context layer nearly doubled the per-attempt pass rate and added 35 percentage points to overall task success. It doesn’t make the model smarter. It gives the model the information it needs to apply its intelligence effectively.

Principles that generalize

Beyond the specifics of our implementation, we’ve found several principles that apply broadly to making agents effective in data systems.

  1. Deterministic analysis first, LLM where needed. Structural analysis like joins, column lineage, and grain propagation can be extracted deterministically from SQL. Reserving LLM inference for what genuinely requires judgment (business semantics, intent classification) makes the system faster, cheaper, and more reliable.
  2. Separate logical intent from physical reality. A dbt model defines what you want data to look like. A warehouse table is where it actually lives. These are different things, and conflating them makes it impossible for an agent to reason about why something is broken.
  3. Graph traversal over prompt stuffing. Instead of dumping your entire dbt project into a prompt and hoping the model finds what it needs, let agents traverse structured relationships on demand. A targeted graph query is more precise, more scalable, and far more token-efficient.
  4. Evaluate relentlessly. Every architectural decision, every new data source, every change to agent behavior should be validated against real task performance. If you can’t measure whether a change helps, you can’t know whether it helps.
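Principle 2 is straightforward to encode once logical and physical objects are kept as separate records. A hypothetical sketch, with invented fields and timestamps:

```python
from dataclasses import dataclass

@dataclass
class LogicalModel:
    name: str
    sql: str        # the intent: what the data should look like

@dataclass
class PhysicalTable:
    model: str
    database: str   # where a materialization actually lives
    last_built: str # ISO date of the last successful run

model = LogicalModel("fct_pipeline", "SELECT deal_id, stage FROM stg_deals")
tables = [
    PhysicalTable("fct_pipeline", "dev",  "2024-06-02"),
    PhysicalTable("fct_pipeline", "prod", "2024-05-28"),
]

# "The logic is wrong" is a property of the LogicalModel; "the prod table is
# stale" is a property of one PhysicalTable. Conflate them and neither
# diagnosis is expressible.
stale = [t for t in tables if t.database == "prod" and t.last_built < "2024-06-01"]
print([t.database for t in stale])  # ['prod']
```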

The core insight

AI agents don’t fail at data engineering because they can’t write SQL. They fail because they lack persistent, system-level understanding of the data they’re operating on. How it flows, what it means, how it behaves at runtime, and why it’s structured the way it is.

The data context layer provides that understanding. It transforms agents from tools that can read your code into tools that genuinely understand your data. And the difference, as our evaluations show, is dramatic.

Reach out if you're interested in digging in further and geeking out on data context!

Want to see this in action?

Watch Typedef's agents handle impact analysis and root cause tracing across a real data stack.

Give your agents full context.

See how the Data Context Layer gives AI agents the lineage, grain, and semantic knowledge they need to actually work.