On June 2, 2026, at Snowflake Summit, Snowflake announced a new service called Cortex Sense. It collects the business definitions and background knowledge an AI agent needs, and it gives that information to the agent at the moment the agent answers a question. Cortex Sense is in private preview. This post explains what it does and reports the improvement Snowflake published. It then explains the one problem that better context cannot fix.
1. What Snowflake announced
Cortex Sense is not a chatbot, and it is not a product you talk to. It is a background service that two new Snowflake agents use. The first is CoWork, a work assistant for everyday employees, formerly called Snowflake Intelligence. The second is CoCo, a coding assistant, formerly called Cortex Code. Both agents use the same Cortex Sense service when they answer a question.
The selling point is automatic context with no manual setup. Cortex Sense collects a shared set of business definitions from four sources that already exist in your Snowflake account. When an agent answers a question, Cortex Sense finds the relevant definitions and gives them to the agent. The four sources are:
- Query history, which is the record of past queries people have run.
- Object metadata, which is the names and descriptions of tables and columns.
- Dashboard definitions from BI tools, including Power BI and Tableau.
- Semantic views from Horizon Context, which are Snowflake's stored, governed definitions of metrics.
2. How it works
Snowflake describes the two parts, Horizon Context and Cortex Sense, as a library and a librarian. Here is what each part does in plain terms. Horizon Context is the library. It stores the metric definitions and the glossary. It also holds the record of how each metric was built from the raw tables, which is called the lineage. Cortex Sense is the librarian. It reads this store when a query comes in and picks out the definitions that fit the question.
What makes Cortex Sense different from a fixed set of definitions is that it picks the definitions for each question as the question arrives. It does not read from a copy prepared in advance. The agent does not have to be told ahead of time which definitions matter. Cortex Sense works that out when the agent runs the query.
Snowflake describes the retrieval as hybrid. It matches definitions in two ways at once, by meaning and by exact keyword, then reorders the results so the most relevant definitions come first and gives the top ones to the agent before the agent answers. This is the same kind of hybrid search with reranking that Snowflake documents elsewhere in its platform. Snowflake refreshes this search automatically as the underlying data changes.
Cortex Sense is in private preview, and Snowflake has not published its full internal design. So we will not assert which engine runs underneath, or exactly how it combines and weighs the four sources. What is clear is the shape of the work. Cortex Sense finds definitions, ranks them, and serves the top ones to the agent.
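To make that shape concrete, here is a toy sketch of hybrid retrieval followed by ranking. It is not Snowflake's implementation, which is unpublished. The glossary entries, the function names, and the 50/50 weighting are all assumptions made for illustration, and the "semantic" score is a crude stand-in (token overlap) for a real embedding similarity.

```python
import re

# Hypothetical glossary entries; real definitions would come from the governed store.
DEFINITIONS = {
    "revenue_finance": "recognized revenue net of refunds by fiscal quarter",
    "revenue_sales": "booked revenue at contract signature",
    "active_users": "distinct users with at least one session per day",
}

def tokens(text):
    # Lowercase word tokens, as a set.
    return set(re.findall(r"[a-z]+", text.lower()))

def keyword_score(question, name):
    # Exact-keyword match: share of the definition's name tokens found in the question.
    name_tokens = tokens(name.replace("_", " "))
    return len(name_tokens & tokens(question)) / len(name_tokens)

def semantic_score(question, text):
    # Stand-in for embedding similarity (an assumption): Jaccard overlap of tokens.
    q, d = tokens(question), tokens(text)
    return len(q & d) / len(q | d)

def retrieve(question, top_k=2, keyword_weight=0.5):
    # Combine both scores, sort best-first, and keep the top_k definition names.
    scored = []
    for name, text in DEFINITIONS.items():
        score = (keyword_weight * keyword_score(question, name)
                 + (1 - keyword_weight) * semantic_score(question, text))
        scored.append((score, name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]

question = "What was recognized revenue last fiscal quarter for finance?"
top_definitions = retrieve(question)
```

In this toy setup the finance definition ranks first because it matches on both the keyword and the overlap score. Note that nothing in the sketch looks at the answer the agent eventually produces; it only chooses what to hand over.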
3. The problem it targets
Cortex Sense aims at a common problem, and it is the right problem to work on. An AI agent can read your table layout perfectly and still not know what your business means by it. Suppose you ask an agent for third-quarter revenue. The agent can give the wrong answer if it does not know that your financial year starts in February. Finance and sales may define revenue differently from each other. Snapshot tables store values for a specific date, and a simple query can read them wrong. None of this is written in the column names.
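The fiscal-year case is easy to show in a few lines. The sketch below computes the date range of a quarter for a fiscal year that starts in a given month. The function name and the year 2025 are hypothetical, chosen only for the example.

```python
from datetime import date

def fiscal_quarter_range(fiscal_year_start_year, quarter, start_month=2):
    """Return (first_day, last_day) of a fiscal quarter.

    fiscal_year_start_year: calendar year in which the fiscal year begins.
    start_month: first month of the fiscal year (2 = February, as in the example).
    """
    # Count months from zero so the arithmetic wraps cleanly past December.
    first_index = (start_month - 1) + 3 * (quarter - 1)
    first = date(fiscal_year_start_year + first_index // 12, first_index % 12 + 1, 1)
    next_index = first_index + 3
    next_first = date(fiscal_year_start_year + next_index // 12, next_index % 12 + 1, 1)
    # The quarter ends the day before the next quarter begins.
    return first, date.fromordinal(next_first.toordinal() - 1)

fiscal_q3 = fiscal_quarter_range(2025, 3)                    # February start
calendar_q3 = fiscal_quarter_range(2025, 3, start_month=1)   # January start
```

With a February start, the third quarter runs from August 1 to October 31. An agent that assumes a calendar year would query July 1 to September 30 instead, and both queries would run without error.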
Cortex Sense fixes part of this problem when the agent runs the query. It surfaces the business meaning that already exists in your account but that the agent would otherwise never see. This is useful work.
4. What Snowflake's numbers show
Snowflake supports the claim with its own benchmark on hard enterprise questions. The benchmark reports three numbers. Here is what each one measures.
| Setup | Accuracy on hard enterprise questions |
|---|---|
| A general coding agent on its own, connected to Snowflake through its standard connector (MCP) | about 23% |
| Snowflake's own agents, CoWork and CoCo, without Cortex Sense | about 47% |
| The same agents with Cortex Sense | about 83% (about 86% in a separate measure) |
The headline jump from 23% to 83% compares a general agent with no governed context against Snowflake's agents with the full service on. The most direct comparison is the same agent with the service off and then on, which is 47% to 83%. Either way, the improvement is large. Governed context makes these agents much better at hard questions. That is a clear gain, and we should say so. The figures are Snowflake's own benchmark, which is linked in the sources below.
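The two comparisons differ only in the baseline. A short calculation, using the approximate figures from the table, shows the gap in percentage points for each.

```python
# Approximate accuracy figures as reported in Snowflake's benchmark.
general_agent = 0.23
snowflake_agents = 0.47
with_cortex_sense = 0.83

def gain_in_points(before, after):
    # Difference in percentage points, rounded to avoid floating-point noise.
    return round((after - before) * 100)

headline_gain = gain_in_points(general_agent, with_cortex_sense)
like_for_like_gain = gain_in_points(snowflake_agents, with_cortex_sense)
```

The headline comparison is a gain of 60 points. The like-for-like comparison, the same agents with the service off and then on, is a gain of 36 points.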
5. Why some answers are still wrong
Now look at what the 83% leaves out. If 83% of answers are correct, then about one in six is wrong. The agent does not flag these answers as uncertain, and it does not refuse them. It gives them in the same confident voice as the answers that are right. Better context helps the agent produce a better answer. It does not check whether the answer is correct.
You can see why from how the service works. Cortex Sense finds definitions, ranks them, and gives the top ones to the agent. It never runs the agent's query against the code that built the data, and it never checks the result. Cortex Sense can give the agent exactly the right definition and still have no way to know whether the answer that comes back is computed correctly.
Some answers are still wrong for three reasons, and the reasons add up.
First, the business context that Snowflake gives the agent is not complete. The semantic view holds a metric's definition, but it does not record every rule that decides whether a calculation on that metric is valid.
Second, the retrieval is not perfect. The hybrid search ranks definitions by how well they match the question, so it can surface the wrong definition, or miss the right one, even when the right one exists.
Third, and this is the hardest reason, some errors do not come from the context at all. They come from how the query combines the data. You can give the agent the correct and complete definition of a metric, and the agent can still compute the wrong number, because that metric gets combined across time or across levels of detail in a way that is not valid. A definition tells you what a metric means. It does not tell you what math is safe to do with it. Whether a calculation is valid depends on how the metric was built in the transformation code upstream.
Snowflake's Horizon Context does build lineage. For Snowflake-native data it goes down to the column level, stitched from query logs and other feeds. But having lineage is not the same as catching this error, for two reasons. First, that lineage records which columns feed which. It does not carry the rule that a distinct count cannot be added across time, which is the fact you need to catch this error. Second, Cortex Sense serves this context to inform the agent's guess. Nothing reads an additivity rule and blocks the invalid answer before it goes out. Serving context is not checking the answer.
So more context helps with the first two problems. It does not solve the third. And the context that does solve the third is a different kind. It is not more business definitions, and it is not lineage mined from query logs. It is a record of how each metric is calculated, compiled from the transformation code and carrying the rule for what math is valid on it, together with a check that reads that rule before the answer goes out.
6. A worked example, and a disclosure
A disclosure first. Cortex Sense is in private preview, so we have not tested it, and we make no claim about it. The example below comes from a Cortex Analyst query, which Snowflake ships today, running against a governed Snowflake semantic view. We use it only to show the kind of error that any context layer leaves unfixed.
The question was ordinary, and the agent did the safest possible thing. Someone asked how many active users there were in August. The agent wrote no custom SQL. It just asked the governed semantic view for a governed metric, total_daily_active_users, for August. The governed answer came back as 363,021. The true figure was 11,221, counting each user once. The answer was about 32 times too high.
Here is why, and why it is so hard to see. The metric total_daily_active_users is defined as a sum of daily_active_users, and a sum looks safe to add up. But daily_active_users is itself a per-day count of distinct users, built upstream in the dbt code with a COUNT(DISTINCT user_id). Summing it across the month adds each day's distinct count together, so a user who was active on twenty days is counted twenty times. The metric is correct for a single day. It is not valid to add across days.
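The double counting is easy to reproduce on a tiny data set. The sketch below uses a made-up activity log of three users over three days; the dates, user IDs, and variable names are hypothetical, and the Python steps only mirror the SQL aggregations described above.

```python
from collections import defaultdict

# Hypothetical activity log: (day, user_id). Three users over three days.
events = [
    ("2025-08-01", "u1"), ("2025-08-01", "u2"),
    ("2025-08-02", "u1"), ("2025-08-02", "u2"), ("2025-08-02", "u3"),
    ("2025-08-03", "u1"),
]

# Upstream step: the equivalent of a per-day COUNT(DISTINCT user_id).
users_by_day = defaultdict(set)
for day, user in events:
    users_by_day[day].add(user)
daily_active_users = {day: len(users) for day, users in users_by_day.items()}

# Governed metric: a plain SUM over the daily counts.
total_daily_active_users = sum(daily_active_users.values())

# What the question wanted: distinct users over the whole period.
period_active_users = len({user for _, user in events})
```

The daily counts are 2, 3, and 1, so the sum is 6, while only 3 distinct users exist. Each daily figure is correct on its own. The error appears only when the figures are added.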
Now notice where that fact is, and is not. It is not in the agent's request, which only named a governed metric. It is not in the metric's definition, which is a plain sum. It is not in the data in the tables, where one table has a row per user and the other has a daily count, and neither shows that adding the counts double counts people. The COUNT(DISTINCT) that makes the sum invalid is a step in the dbt code between the two tables. You cannot tell whether the answer is valid from the metric or from the tables. You can tell only from the transformation that built the data.
This is the kind of error a context layer does not catch on its own. It is not rare. It is a common and easily missed error when an AI agent queries data, and a governed layer produces it as readily as an ungoverned one.
7. What actually turns a guess into a checked answer
The only thing that turns a good guess into a checked answer is to verify the answer against the code that built the data. This is what we mean by a compiler in the loop for data agents.
In practice, this means working out the properties of each metric from its lineage, and checking the answer against those properties before it goes out. Take the active-users metric above. Its definition is a plain sum, so nothing on the surface warns you. Typedef reads the metric's lineage back to the upstream COUNT(DISTINCT user_id) that the sum is built on, sees that adding a per-day distinct count across days re-counts each user every day, and marks the metric as not safe to add across time. A check that knows this can block the sum and recompute the distinct count over the whole month instead, which returns the true 11,221.
Two parts do this work. Typedef ships a classifier that decides from a metric's definition whether it can be added across time. When that fact is buried upstream, as it is here, the check follows the metric's lineage to the distinct count to find it. That second part is metric provenance. The property it checks has a name, time-additivity, and you can test it. A compiler can enforce a property you can test. A context layer cannot.
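A minimal sketch of such a check follows. It is an illustration of the idea, not Typedef's implementation: the lineage dictionary, the list of non-additive aggregations, and the function names are all assumptions, and total_order_amount is a made-up metric added for contrast.

```python
# Hypothetical lineage: each node records its aggregation and its upstream input.
LINEAGE = {
    "total_daily_active_users": {"agg": "SUM", "input": "daily_active_users"},
    "daily_active_users": {"agg": "COUNT_DISTINCT", "input": "user_id"},
    "total_order_amount": {"agg": "SUM", "input": "order_amount"},
}

# Aggregations whose results cannot be summed across time (assumed list).
NON_ADDITIVE = {"COUNT_DISTINCT"}

def is_time_additive(metric):
    """Walk the lineage upstream; any non-additive step makes the metric non-additive."""
    node = metric
    while node in LINEAGE:
        if LINEAGE[node]["agg"] in NON_ADDITIVE:
            return False
        node = LINEAGE[node]["input"]
    return True

def check_rollup(metric, days_in_range):
    """Return 'blocked' when a non-additive metric is rolled up over more than one day."""
    if days_in_range > 1 and not is_time_additive(metric):
        return "blocked"
    return "ok"

august_request = check_rollup("total_daily_active_users", days_in_range=31)
single_day_request = check_rollup("total_daily_active_users", days_in_range=1)
```

The August request is blocked because the walk reaches the upstream distinct count, even though the metric's own definition is a plain sum. The single-day request passes, which matches the point above that the metric is correct for one day.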
So here is the one idea to keep. Context is not correctness. Business definitions help the agent answer, but they cannot tell you whether the calculation is valid. To know that, you have to read how the data was built and check the query against it.
8. Why we care about this here at Typedef
Typedef is the compiler for data agents. It re-derives trusted metrics from your transformation code on every commit. Our goal is to fill the validation-loop gap that exists today with data agents, and to let data professionals reach the levels of productivity that software engineers have had for a while now.
9. FAQ
Is Cortex Sense available yet? No. As of June 2026 it is in private preview, it has no announced date for general release, and it works only with Snowflake. The 83% figure describes the preview, not a product you can buy today.
How is Cortex Sense different from Horizon Context? Horizon Context stores the governed definitions and the lineage. Cortex Sense reads that store when a query comes in and picks the definitions that fit the question. It also draws on query history, metadata, and BI dashboards.
| | Horizon Context | Cortex Sense |
|---|---|---|
| What it is | The store of governed definitions and lineage | The service that retrieves and serves that context |
| When it works | Maintained ahead of time | Runs at the moment the agent answers |
| Sources | Semantic views, glossary, lineage | Horizon Context plus query history, metadata, BI dashboards |
In short, Horizon Context stores the definitions, and Cortex Sense delivers them. For the store on its own, including what it can and cannot guarantee about a metric, see "What Is Horizon Context?"
How does Cortex Sense compare to Databricks Genie Ontology? Both are governed context layers for agents, and both were announced in the same week. Each one works with only one vendor. Cortex Sense works only with Snowflake, and Genie Ontology works only with Databricks. For the full Databricks side, see our explainers "What Is Genie Ontology?" for the context layer and "What Are Metrics in Unity Catalog?" for the governed metric store it reads from. The deeper question is whether either one checks its answers or only adds context.
Did Typedef test Cortex Sense? No. Cortex Sense is in private preview, and we make no claim about it. The worked example in this post is a Cortex Analyst query against a governed semantic view. We use it to show the kind of error that any context layer leaves unfixed.
What is time-additivity? Time-additivity is whether you can validly add a metric across time. You cannot add a distinct count of users or servers across months, because the same user or server appears in more than one month, so adding the counts double-counts them.
Sources
- Snowflake, CoWork powers the agentic enterprise (press release, the agents Cortex Sense feeds)
- Snowflake, Snowflake CoWork: the personal work agent (blog, the Cortex Sense benchmark)
- Snowflake, Horizon Context: the governed context layer (blog, where definitions and lineage are stored)
- Snowflake, Cortex Search: high-quality enterprise AI retrieval (engineering blog, how Snowflake's hybrid search with reranking works)
- Atlan, Snowflake Cortex Sense and the enterprise context layer (explainer)
- SiliconANGLE, Snowflake moves up the AI stack (analysis, accuracy claim and skepticism)
