How to Migrate from Snowflake Semantic Views to Databricks Unity Catalog Metric Views

Typedef Team

Organizations building lakehouse architectures often need to move their semantic models from warehouse-based systems to unified platforms. The migration from Snowflake Semantic Views to Databricks Unity Catalog Metric Views represents a strategic shift—consolidating metrics across SQL analytics, data science notebooks, and ML pipelines under a single governance framework.

What Semantic Layer Migration Means

A semantic layer migration involves transferring business logic definitions from one platform's metric system to another's. This isn't copying data—it's translating how metrics are defined, calculated, and exposed to consuming applications.

Snowflake Semantic Views exist as native database objects within Snowflake schemas. Teams define them through SQL DDL or YAML specifications, and Snowflake's query engine processes them using the SEMANTIC_VIEW() clause. The definitions include logical tables, relationships between tables, and metric formulas.

Databricks Unity Catalog Metric Views operate as catalog objects in the lakehouse. Teams define them through YAML or SQL, and Spark SQL processes them using the MEASURE() clause. The definitions include source tables, joins, measures, and dimensions.

The migration translates these definitions from Snowflake's model to Databricks' model while preserving the business logic each metric represents.

How Teams Migrate Semantic Layers Today

Current migration approaches follow manual, step-by-step processes that require significant technical translation work.

Export and Document Phase

Teams start by extracting Snowflake semantic model definitions through information schema queries:

sql
SELECT semantic_view_name, schema_name
FROM INFORMATION_SCHEMA.SEMANTIC_VIEWS;

SELECT metric_name, aggregation, expression
FROM INFORMATION_SCHEMA.SEMANTIC_METRICS
WHERE semantic_view_name = 'SALES_ANALYTICS';

They document each semantic view's structure—which tables it references, what relationships it defines, which metrics it contains, and how those metrics calculate values.
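Much of this documentation can be collected with a short script. The sketch below is a minimal example using snowflake-connector-python and the two information schema queries shown above; the connection details are placeholders, and the view and column names are taken directly from those queries, so adjust them for your account and Snowflake version.

python
import snowflake.connector

# Placeholder connection details; supply your own account, credentials, and context.
conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",
    database="ANALYTICS", warehouse="ANALYTICS_WH",
)

def document_semantic_views(conn) -> dict:
    """Collect every semantic view and its metric definitions into a plain dict."""
    cur = conn.cursor()
    cur.execute("SELECT semantic_view_name, schema_name FROM INFORMATION_SCHEMA.SEMANTIC_VIEWS")
    inventory = {}
    for view_name, schema_name in cur.fetchall():
        cur.execute(
            "SELECT metric_name, aggregation, expression "
            "FROM INFORMATION_SCHEMA.SEMANTIC_METRICS "
            "WHERE semantic_view_name = %s",
            (view_name,),
        )
        inventory[f"{schema_name}.{view_name}"] = [
            {"metric": name, "aggregation": agg, "expression": expr}
            for name, agg, expr in cur.fetchall()
        ]
    return inventory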

Manual Translation Phase

Analysts manually rewrite Snowflake definitions into Databricks format. A Snowflake semantic view:

sql
CREATE SEMANTIC VIEW sales_analytics AS
  LOGICAL TABLES
    orders (
      FACTS order_amount, tax_amount
      DIMENSIONS customer_id, order_date
    )
  RELATIONSHIPS
    orders.customer_id = customers.customer_id
  METRICS
    total_revenue AS SUM(order_amount)

Becomes a Databricks metric view:

yaml
metric_view:
  name: sales_metrics
  source_table: main.sales.orders
  joins:
    - table: main.sales.customers
      on: orders.customer_id = customers.customer_id
  measures:
    - name: total_revenue
      expr: SUM(order_amount)
  dimensions:
    - name: order_date
      expr: DATE(order_timestamp)

Validation Phase

Teams run parallel queries against both platforms to verify results match:

Snowflake query:

sql
SELECT customer_region, total_revenue
FROM SEMANTIC_VIEW(sales_analytics
  DIMENSIONS customer_region
  METRICS total_revenue)

Databricks query:

sql
SELECT customer_region, MEASURE(total_revenue)
FROM main.sales.sales_metrics
GROUP BY customer_region

They compare results row-by-row, investigating discrepancies caused by time zones, rounding behavior, or NULL handling differences.
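The comparison itself can be scripted. Here is a minimal sketch using pandas, assuming each query result has been loaded into a DataFrame with customer_region and total_revenue columns; the tolerance parameter absorbs expected rounding differences.

python
import pandas as pd

def compare_results(sf_df: pd.DataFrame, db_df: pd.DataFrame, tolerance: float = 0.01) -> pd.DataFrame:
    """Join Snowflake and Databricks results on the dimension key and flag rows that differ."""
    merged = sf_df.merge(db_df, on="customer_region", suffixes=("_sf", "_db"), how="outer")
    merged["abs_diff"] = (merged["total_revenue_sf"] - merged["total_revenue_db"]).abs()
    # Rows missing from one side (NaN diff) or exceeding the tolerance need investigation.
    return merged[merged["abs_diff"].isna() | (merged["abs_diff"] > tolerance)]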

Cutover Phase

After validation, teams update BI tool connections and queries to point at Databricks. Tableau custom SQL changes from SEMANTIC_VIEW() syntax to MEASURE() syntax. Dashboard queries get rewritten with explicit GROUP BY clauses.

Access controls migrate separately. Snowflake role grants become Unity Catalog grants. Row-level security policies get recreated in Unity Catalog's format.

Problems with Current Migration Approaches

These manual processes introduce risks and inefficiencies that slow adoption and create errors.

Semantic Drift During Translation

Manual translation introduces subtle logic changes. A Snowflake metric defined as AVG(order_amount) might get rewritten as SUM(order_amount) / COUNT(order_id) in Databricks. While mathematically equivalent for simple cases, they diverge when NULL values exist or when aggregating across different dimension grains.
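A toy example makes the divergence concrete, using plain Python in place of SQL: AVG() excludes NULLs from both the numerator and the denominator, while SUM() divided by a row count does not.

python
order_amounts = [100.0, 200.0, None]  # one order has a NULL amount

# AVG(order_amount): NULLs are excluded from both the sum and the count
avg_semantics = sum(a for a in order_amounts if a is not None) / sum(
    1 for a in order_amounts if a is not None
)

# SUM(order_amount) / COUNT(order_id): every order row lands in the denominator
sum_over_count = sum(a for a in order_amounts if a is not None) / len(order_amounts)

print(avg_semantics)   # 150.0
print(sum_over_count)  # 100.0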

The problem compounds with derived metrics. If a base metric's formula changes slightly during translation, every metric that references it inherits that drift. Teams discover these discrepancies weeks later when dashboards show different numbers than before.

Incomplete Dependency Mapping

Metrics in Snowflake semantic views can reference other metrics in their definitions, and Databricks measures can reference other measures. Mapping these dependencies manually leads to errors.

A metric like profit_margin = profit / revenue depends on both profit and revenue being defined first. If translation happens out of order, the metric view creation fails. Teams spend hours tracing dependency chains to find the correct ordering.

Lost Context and Intent

Snowflake's synonym feature helps AI systems map natural language to metrics:

sql
METRICS total_revenue AS SUM(order_amount)
  SYNONYM ('sales', 'revenue', 'total sales')

Databricks lacks native synonym support. During migration, these semantic hints disappear unless teams manually encode them in descriptions or create duplicate metric definitions—both approaches create maintenance overhead.

Time grain handling differs between platforms. Snowflake allows implicit time conversions. Databricks requires explicit dimension definitions at each grain. Migrating a single order_date dimension might require creating separate order_day, order_month, order_quarter, and order_year dimensions. Teams often miss these requirements until users complain about missing functionality.
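Concretely, a single date column fans out into several dimension entries. The sketch below shows what that expansion looks like, assuming the name/expr dimension format from the earlier YAML example and Spark SQL's DATE_TRUNC function; it is illustrative, not a prescribed API.

python
def expand_time_grains(column: str, base_name: str) -> list[dict]:
    """Produce one dimension entry per grain for a single date or timestamp column."""
    grains = ["day", "month", "quarter", "year"]
    return [
        {"name": f"{base_name}_{grain}", "expr": f"DATE_TRUNC('{grain.upper()}', {column})"}
        for grain in grains
    ]

dimensions = expand_time_grains("order_date", "order")
# -> order_day, order_month, order_quarter, order_year entries for the YAML definition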

Validation Gaps

Comparing query results between platforms catches arithmetic errors but misses structural issues. A metric that accidentally double-counts values due to incorrect join logic might still produce plausible-looking numbers. Without schema-level validation, these errors persist until business users notice dashboard anomalies.

Row-level security and column masking policies need separate testing. A metric view might return correct aggregate values but expose unauthorized row-level detail because access policies weren't properly recreated in Unity Catalog.

Query Syntax Fragmentation

Snowflake infers grouping dimensions from the SEMANTIC_VIEW() clause parameters. Databricks requires explicit GROUP BY statements. This syntax difference breaks all existing BI tool queries, notebooks, and custom applications.

Teams face a choice: rewrite every query during cutover (causing disruption) or maintain dual query templates during transition (creating confusion about which syntax to use).

Governance Gaps During Transition

During the migration window, some metrics exist in Snowflake, some in Databricks, and some in both. Users don't know which system holds the "official" version. Dashboards pull from different sources, creating metric inconsistency.

Access control policies applied in one platform don't automatically transfer to the other. A user restricted from seeing revenue metrics in Snowflake might gain access during migration if Unity Catalog grants aren't precisely replicated.

Making Semantic Layer Migrations Better

Better migration strategies reduce manual translation errors and maintain semantic consistency across platforms.

Schema-Driven Translation

Rather than manually rewriting each metric definition, build translation logic that maps Snowflake's schema to Databricks' schema programmatically.

Create mapping rules for common patterns:

  • Snowflake FACTS → Databricks measures with SUM(), COUNT(), or AVG() expressions
  • Snowflake DIMENSIONS → Databricks dimensions with appropriate column expressions
  • Snowflake RELATIONSHIPS → Databricks join specifications
  • Snowflake TIME_DIMENSION → Multiple Databricks dimensions at different grains

Apply these rules to generate Databricks YAML from Snowflake information schema metadata:

python
snowflake_metrics = query_snowflake_info_schema()
databricks_yaml = translate_schema(snowflake_metrics, mapping_rules)
validate_yaml_syntax(databricks_yaml)

This programmatic approach ensures consistent translation logic across all metrics and catches structural errors before deployment.
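One possible shape for a translate_schema function is sketched below, using PyYAML. It assumes the Snowflake metadata has already been collected into a dict with base_table, relationships, metrics, and dimensions keys; the helper names and YAML layout follow the earlier examples and are illustrative rather than an official schema.

python
import yaml

def translate_schema(snowflake_view: dict, catalog_prefix: str = "main.sales") -> str:
    """Map a documented Snowflake semantic view onto the Databricks metric view YAML layout."""
    spec = {
        "metric_view": {
            "name": snowflake_view["name"].lower(),
            "source_table": f"{catalog_prefix}.{snowflake_view['base_table']}",
            "joins": [
                {"table": f"{catalog_prefix}.{rel['table']}", "on": rel["condition"]}
                for rel in snowflake_view.get("relationships", [])
            ],
            "measures": [
                {"name": m["metric"], "expr": m["expression"]}
                for m in snowflake_view.get("metrics", [])
            ],
            "dimensions": [
                {"name": d["name"], "expr": d["expression"]}
                for d in snowflake_view.get("dimensions", [])
            ],
        }
    }
    return yaml.safe_dump(spec, sort_keys=False)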

Dependency Graph Analysis

Build a dependency graph of all metrics before translation. Parse each metric's formula to identify which other metrics or measures it references. Sort metrics topologically so base metrics get created before derived metrics reference them.

python
dependency_graph = build_metric_dependencies(snowflake_metrics)
creation_order = topological_sort(dependency_graph)

for metric in creation_order:
    databricks_yaml = translate_metric(metric)
    create_metric_view(databricks_yaml)

This prevents creation failures from missing dependencies and ensures derived metrics reference the correct base metrics.
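Python's standard-library graphlib handles the ordering. A minimal sketch, assuming each metric's referenced names have already been parsed out of its formula:

python
from graphlib import TopologicalSorter

# Hypothetical dependency map: metric -> set of metrics its formula references.
metric_dependencies = {
    "revenue": set(),
    "profit": {"revenue"},
    "profit_margin": {"profit", "revenue"},
}

# static_order() yields base metrics before the metrics that depend on them,
# and raises CycleError if two metrics reference each other.
creation_order = list(TopologicalSorter(metric_dependencies).static_order())
print(creation_order)  # e.g. ['revenue', 'profit', 'profit_margin']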

Semantic Validation Beyond Results

Query result comparison catches calculation errors but misses semantic drift. Add validation that compares metric definitions directly:

Check aggregation types match:

  • If Snowflake uses SUM(), Databricks should use SUM(), not AVG()

Verify join cardinality preservation:

  • One-to-many joins in Snowflake should remain one-to-many in Databricks
  • Join keys should match exactly

Validate dimension expressions:

  • Time dimensions should use equivalent date functions
  • Categorical dimensions should reference the same underlying columns

This definition-level validation catches issues that identical query results can hide.
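A sketch of such checks, assuming both definitions have been parsed into dicts with aggregation, join, and dimension fields (the field names here are illustrative):

python
def validate_definitions(sf_metric: dict, db_measure: dict) -> list[str]:
    """Compare metric definitions structurally and return a list of mismatches."""
    issues = []
    if sf_metric["aggregation"].upper() != db_measure["aggregation"].upper():
        issues.append(
            f"aggregation mismatch: {sf_metric['aggregation']} vs {db_measure['aggregation']}"
        )
    if sorted(sf_metric["join_keys"]) != sorted(db_measure["join_keys"]):
        issues.append("join keys differ between platforms")
    if sf_metric["join_cardinality"] != db_measure["join_cardinality"]:
        issues.append("join cardinality changed during translation")
    if sf_metric["dimension_columns"] != db_measure["dimension_columns"]:
        issues.append("dimension expressions reference different columns")
    return issues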

Parallel Running with Reconciliation

Rather than cutting over all at once, run both semantic layers in parallel with automated reconciliation. Schedule jobs that query identical metrics from both platforms and alert on discrepancies larger than acceptable thresholds (accounting for expected rounding differences).

python
for metric in critical_metrics:
    sf_result = query_snowflake_semantic_view(metric)
    db_result = query_databricks_metric_view(metric)

    diff_pct = calculate_difference(sf_result, db_result)

    if diff_pct > threshold:
        alert_team(f"Metric {metric} shows {diff_pct}% difference")
        log_sample_rows(sf_result, db_result)

This catches calculation drift early while both systems are available for comparison.

Metadata Preservation

Capture semantic metadata from Snowflake and transfer it to Databricks even when direct equivalents don't exist.

Snowflake synonyms become Databricks metric descriptions:

yaml
measures:
  - name: total_revenue
    expr: SUM(order_amount)
    description: "Total revenue. Also known as: sales, total sales, gross revenue"

This preserves the semantic intent for AI systems that parse metric descriptions even without native synonym support.
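Folding synonyms into descriptions can be automated during translation. A small sketch, assuming the synonym list was captured while documenting the Snowflake view (the helper name is hypothetical):

python
def measure_with_synonyms(name: str, expr: str, description: str, synonyms: list[str]) -> dict:
    """Append Snowflake synonyms to the measure description so the intent survives migration."""
    if synonyms:
        description = f"{description} Also known as: {', '.join(synonyms)}"
    return {"name": name, "expr": expr, "description": description}

measure = measure_with_synonyms(
    "total_revenue", "SUM(order_amount)", "Total revenue.",
    ["sales", "total sales", "gross revenue"],
)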

Certification status and ownership metadata transfer through Unity Catalog properties:

sql
ALTER METRIC VIEW main.sales.sales_metrics
SET TBLPROPERTIES (
  'certified' = 'true',
  'owner' = 'finance_team',
  'migrated_from' = 'snowflake.analytics.sales_analytics'
);

Access Control Mapping

Build explicit mappings between Snowflake roles and Unity Catalog groups before migrating access policies. Don't assume role names match directly—verify which users belong to each role and create equivalent Unity Catalog groups.

python
snowflake_roles = get_snowflake_roles()
unity_catalog_groups = get_unity_catalog_groups()

role_mapping = create_role_group_mapping(snowflake_roles, unity_catalog_groups)

for metric_view in migrated_views:
    sf_grants = get_snowflake_grants(metric_view)
    uc_grants = translate_grants(sf_grants, role_mapping)
    apply_unity_catalog_grants(metric_view, uc_grants)

Test access controls with sample users from each role to verify permissions work as expected.

Query Rewriting Assistance

Provide query rewriting tools rather than requiring manual updates. Build converters that transform Snowflake SEMANTIC_VIEW() queries into Databricks MEASURE() queries:

Input:

sql
SELECT * FROM SEMANTIC_VIEW(
  sales_analytics
  DIMENSIONS customer_region, product_category
  METRICS total_revenue
)
WHERE product_category = 'Electronics'

Output:

sql
SELECT
  customer_region,
  product_category,
  MEASURE(total_revenue)
FROM main.sales.sales_metrics
WHERE product_category = 'Electronics'
GROUP BY customer_region, product_category

Integrate these converters into migration guides and documentation so teams can quickly translate their existing queries.
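A deliberately simplistic converter sketch using regular expressions is shown below. It handles only the single-view, single-WHERE pattern from the example above and assumes dimension and metric names are comma-separated identifiers; a production converter would need a real SQL parser.

python
import re

def rewrite_semantic_view_query(sql: str, metric_view: str) -> str:
    """Rewrite a simple SEMANTIC_VIEW() query into MEASURE() + GROUP BY form."""
    dims = re.search(r"DIMENSIONS\s+(.+?)\s+METRICS", sql, re.IGNORECASE | re.DOTALL)
    metrics = re.search(r"METRICS\s+(.+?)\s*\)", sql, re.IGNORECASE | re.DOTALL)
    where = re.search(r"\)\s*(WHERE\s+.+)$", sql, re.IGNORECASE | re.DOTALL)

    dim_list = [d.strip() for d in dims.group(1).split(",")]
    metric_list = [m.strip() for m in metrics.group(1).split(",")]

    select_items = dim_list + [f"MEASURE({m})" for m in metric_list]
    lines = [
        "SELECT",
        "  " + ",\n  ".join(select_items),
        f"FROM {metric_view}",
    ]
    if where:
        lines.append(where.group(1).strip())
    lines.append("GROUP BY " + ", ".join(dim_list))
    return "\n".join(lines)

# rewrite_semantic_view_query(original_sql, "main.sales.sales_metrics")
# produces the GROUP BY form shown in the output above.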

Governance Continuity

Maintain a single source of truth for which metrics are authoritative during migration. Use a metadata registry that tracks:

  • Which semantic layer (Snowflake or Databricks) holds the official version of each metric
  • Migration status (planned, in progress, validated, cutover complete)
  • Validation results and known discrepancies
  • Scheduled cutover dates for each metric group

This registry prevents confusion about which system to trust and provides audit trails for compliance requirements.
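The registry does not need to be elaborate; a small table or one record per metric captures the state the list above describes. A sketch with illustrative field names and sample values:

python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class MetricMigrationRecord:
    """One registry row tracking where a metric's authoritative definition lives."""
    metric_name: str
    system_of_record: str                 # "snowflake" or "databricks"
    status: str                           # planned | in_progress | validated | cutover_complete
    cutover_date: Optional[date] = None
    known_discrepancies: list[str] = field(default_factory=list)

registry = {
    "total_revenue": MetricMigrationRecord(
        "total_revenue", system_of_record="databricks",
        status="validated", cutover_date=date(2025, 11, 1),
    ),
}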

Future Direction: Cross-Platform Semantic Standards

The Open Semantic Interchange initiative aims to create vendor-neutral semantic layer definitions. Multiple vendors are collaborating on standard YAML formats that would theoretically work across any compatible platform.

If successful, semantic layer migrations would become export-import operations rather than manual translations. Teams would export standard YAML from Snowflake and import it directly into Databricks with no transformation logic required.

Current state: OSI remains in specification development. Working groups are defining standard formats for simple metrics—aggregations and basic ratios. Advanced features like time grain shortcuts, conversion metrics, and platform-specific optimizations will likely remain vendor-specific.

Expected timeline: Simple metric portability in 2026-2027. Full cross-platform compatibility will take longer as vendors balance standardization with differentiation.

The broader trend: Semantic layers are becoming standard infrastructure in data platforms. As LLM adoption accelerates, the need for governed metric definitions grows. Platforms will continue investing in native semantic layer capabilities, making migrations between systems more common and more critical to get right.

Teams migrating today should build with future portability in mind—use standard SQL expressions where possible, maintain YAML definitions in version control, and avoid platform-specific features unless they provide clear value.

Building Reliable Migration Infrastructure

Semantic layer migrations introduce data quality risks that require robust validation infrastructure. Organizations need systems that can verify metric consistency across platforms, validate semantic operations during translation, and maintain governance controls throughout the transition.

Typedef helps teams build data infrastructure for AI systems that need reliable semantic processing. When migrating metric definitions between platforms, maintaining data quality and semantic correctness becomes essential—not just for the migration itself, but for ongoing governance as your lakehouse evolves.
