Organizations building lakehouse architectures often need to move their semantic models from warehouse-based systems to unified platforms. The migration from Snowflake Semantic Views to Databricks Unity Catalog Metric Views represents a strategic shift—consolidating metrics across SQL analytics, data science notebooks, and ML pipelines under a single governance framework.
What Semantic Layer Migration Means
A semantic layer migration involves transferring business logic definitions from one platform's metric system to another's. This isn't copying data—it's translating how metrics are defined, calculated, and exposed to consuming applications.
Snowflake Semantic Views exist as native database objects within Snowflake schemas. Teams define them through SQL DDL or YAML specifications, and Snowflake's query engine processes them using the SEMANTIC_VIEW() clause. The definitions include logical tables, relationships between tables, and metric formulas.
Databricks Unity Catalog Metric Views operate as catalog objects in the lakehouse. Teams define them through YAML or SQL, and Spark SQL processes them using the MEASURE() clause. The definitions include source tables, joins, measures, and dimensions.
The migration translates these definitions from Snowflake's model to Databricks' model while preserving the business logic each metric represents.
How Teams Migrate Semantic Layers Today
Current migration approaches follow manual, step-by-step processes that require significant technical translation work.
Export and Document Phase
Teams start by extracting Snowflake semantic model definitions through information schema queries:
```sql
SELECT semantic_view_name, schema_name
FROM INFORMATION_SCHEMA.SEMANTIC_VIEWS;

SELECT metric_name, aggregation, expression
FROM INFORMATION_SCHEMA.SEMANTIC_METRICS
WHERE semantic_view_name = 'SALES_ANALYTICS';
```
They document each semantic view's structure—which tables it references, what relationships it defines, which metrics it contains, and how those metrics calculate values.
Manual Translation Phase
Analysts manually rewrite Snowflake definitions into Databricks format. A Snowflake semantic view:
```sql
CREATE SEMANTIC VIEW sales_analytics AS
  LOGICAL TABLES
    orders (
      FACTS order_amount, tax_amount
      DIMENSIONS customer_id, order_date
    )
  RELATIONSHIPS
    orders.customer_id = customers.customer_id
  METRICS
    total_revenue AS SUM(order_amount)
```
Becomes a Databricks metric view:
```yaml
metric_view:
  name: sales_metrics
  source_table: main.sales.orders
  joins:
    - table: main.sales.customers
      on: orders.customer_id = customers.customer_id
  measures:
    - name: total_revenue
      expr: SUM(order_amount)
  dimensions:
    - name: order_date
      expr: DATE(order_timestamp)
```
Validation Phase
Teams run parallel queries against both platforms to verify results match:
Snowflake query:
```sql
SELECT customer_region, total_revenue
FROM SEMANTIC_VIEW(
  sales_analytics
  DIMENSIONS customer_region
  METRICS total_revenue
)
```
Databricks query:
```sql
SELECT customer_region, MEASURE(total_revenue)
FROM main.sales.sales_metrics
GROUP BY customer_region
```
They compare results row-by-row, investigating discrepancies caused by time zones, rounding behavior, or NULL handling differences.
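Teams often script this comparison rather than eyeballing exports. Below is a minimal sketch using pandas, with two hypothetical result sets and a one-cent tolerance chosen purely for illustration:

```python
import pandas as pd

# Assumed: each platform's query result already loaded into a DataFrame keyed by the dimension.
sf = pd.DataFrame({"customer_region": ["EMEA", "APAC"], "total_revenue": [1200.50, 980.00]})
db = pd.DataFrame({"customer_region": ["EMEA", "APAC"], "total_revenue": [1200.50, 979.98]})

merged = sf.merge(db, on="customer_region", suffixes=("_sf", "_db"))
merged["abs_diff"] = (merged["total_revenue_sf"] - merged["total_revenue_db"]).abs()

# Flag rows that differ by more than a cent; smaller rounding differences are tolerated.
mismatches = merged[merged["abs_diff"] > 0.01]
print(mismatches)
```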
Cutover Phase
After validation, teams update BI tool connections and queries to point at Databricks. Tableau custom SQL changes from SEMANTIC_VIEW() syntax to MEASURE() syntax. Dashboard queries get rewritten with explicit GROUP BY clauses.
Access controls migrate separately. Snowflake role grants become Unity Catalog grants. Row-level security policies get recreated in Unity Catalog's format.
Problems with Current Migration Approaches
These manual processes introduce risks and inefficiencies that slow adoption and create errors.
Semantic Drift During Translation
Manual translation introduces subtle logic changes. A Snowflake metric defined as AVG(order_amount) might get rewritten as SUM(order_amount) / COUNT(order_id) in Databricks. While mathematically equivalent for simple cases, they diverge when NULL values exist or when aggregating across different dimension grains.
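A toy example makes the divergence concrete. This is plain Python rather than platform code, with None standing in for SQL NULL:

```python
# Hypothetical order rows: (order_id, order_amount); None models a SQL NULL.
orders = [(1, 100.0), (2, 200.0), (3, None), (4, 300.0)]

# AVG(order_amount): SQL ignores NULLs in both numerator and denominator.
non_null = [amt for _, amt in orders if amt is not None]
avg_amount = sum(non_null) / len(non_null)   # 600 / 3 = 200.0

# SUM(order_amount) / COUNT(order_id): NULL amounts still count in the denominator.
rewritten = sum(non_null) / len(orders)      # 600 / 4 = 150.0

print(avg_amount, rewritten)  # 200.0 vs 150.0 -- the "equivalent" rewrite drifted
```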
The problem compounds with derived metrics. If a base metric's formula changes slightly during translation, every metric that references it inherits that drift. Teams discover these discrepancies weeks later when dashboards show different numbers than before.
Incomplete Dependency Mapping
Snowflake semantic views can reference other metrics in their definitions. Databricks measures can reference other measures. But mapping these dependencies manually leads to errors.
A metric like profit_margin = profit / revenue depends on both profit and revenue being defined first. If translation happens out of order, the metric view creation fails. Teams spend hours tracing dependency chains to find the correct ordering.
Lost Context and Intent
Snowflake's synonym feature helps AI systems map natural language to metrics:
```sql
METRICS
  total_revenue AS SUM(order_amount)
  SYNONYM ('sales', 'revenue', 'total sales')
```
Databricks lacks native synonym support. During migration, these semantic hints disappear unless teams manually encode them in descriptions or create duplicate metric definitions—both approaches create maintenance overhead.
Time grain handling differs between platforms. Snowflake allows implicit time conversions. Databricks requires explicit dimension definitions at each grain. Migrating a single order_date dimension might require creating separate order_day, order_month, order_quarter, and order_year dimensions. Teams often miss these requirements until users complain about missing functionality.
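One way to avoid missing grains is to expand each date column into the full set of grain dimensions mechanically. The sketch below is illustrative only; the naming convention and the DATE_TRUNC expressions are assumptions, not verified Databricks syntax:

```python
def expand_time_grains(column: str, grains=("day", "month", "quarter", "year")) -> list[dict]:
    """Generate one dimension entry per grain for a single date column."""
    return [
        {
            "name": f"{column}_{grain}",
            "expr": f"DATE_TRUNC('{grain.upper()}', {column})",
        }
        for grain in grains
    ]

# Produces order_date_day, order_date_month, order_date_quarter, order_date_year
dimensions = expand_time_grains("order_date")
```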
Validation Gaps
Comparing query results between platforms catches arithmetic errors but misses structural issues. A metric that accidentally double-counts values due to incorrect join logic might still produce plausible-looking numbers. Without schema-level validation, these errors persist until business users notice dashboard anomalies.
Row-level security and column masking policies need separate testing. A metric view might return correct aggregate values but expose unauthorized row-level detail because access policies weren't properly recreated in Unity Catalog.
Query Syntax Fragmentation
Snowflake infers grouping dimensions from the SEMANTIC_VIEW() clause parameters. Databricks requires explicit GROUP BY statements. This syntax difference breaks all existing BI tool queries, notebooks, and custom applications.
Teams face a choice: rewrite every query during cutover (causing disruption) or maintain dual query templates during transition (creating confusion about which syntax to use).
Governance Gaps During Transition
During the migration window, some metrics exist in Snowflake, some in Databricks, and some in both. Users don't know which system holds the "official" version. Dashboards pull from different sources, creating metric inconsistency.
Access control policies applied in one platform don't automatically transfer to the other. A user restricted from seeing revenue metrics in Snowflake might gain access during migration if Unity Catalog grants aren't precisely replicated.
Making Semantic Layer Migrations Better
Better migration strategies reduce manual translation errors and maintain semantic consistency across platforms.
Schema-Driven Translation
Rather than manually rewriting each metric definition, build translation logic that maps Snowflake's schema to Databricks' schema programmatically.
Create mapping rules for common patterns:
- Snowflake FACTS → Databricks measures with SUM(), COUNT(), or AVG() expressions
- Snowflake DIMENSIONS → Databricks dimensions with appropriate column expressions
- Snowflake RELATIONSHIPS → Databricks join specifications
- Snowflake TIME_DIMENSION → Multiple Databricks dimensions at different grains
Apply these rules to generate Databricks YAML from Snowflake information schema metadata:
```python
snowflake_metrics = query_snowflake_info_schema()
databricks_yaml = translate_schema(snowflake_metrics, mapping_rules)
validate_yaml_syntax(databricks_yaml)
```
This programmatic approach ensures consistent translation logic across all metrics and catches structural errors before deployment.
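One way to flesh out the translate_schema step is a dictionary-driven mapping. The sketch below is illustrative only: the metadata field names and the generated YAML layout are assumptions based on the examples above, not an authoritative Databricks spec.

```python
import yaml

# Assumed shape of rows pulled from Snowflake's information schema.
snowflake_metrics = [
    {"metric_name": "total_revenue", "aggregation": "SUM", "expression": "order_amount"},
    {"metric_name": "order_count", "aggregation": "COUNT", "expression": "order_id"},
]

def translate_schema(metrics: list[dict], view_name: str, source_table: str) -> str:
    """Map Snowflake metric rows onto a Databricks metric view YAML document."""
    measures = [
        {"name": m["metric_name"], "expr": f"{m['aggregation']}({m['expression']})"}
        for m in metrics
    ]
    view = {"metric_view": {"name": view_name, "source_table": source_table, "measures": measures}}
    return yaml.safe_dump(view, sort_keys=False)

print(translate_schema(snowflake_metrics, "sales_metrics", "main.sales.orders"))
```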
Dependency Graph Analysis
Build a dependency graph of all metrics before translation. Parse each metric's formula to identify which other metrics or measures it references. Sort metrics topologically so base metrics get created before derived metrics reference them.
```python
dependency_graph = build_metric_dependencies(snowflake_metrics)
creation_order = topological_sort(dependency_graph)

for metric in creation_order:
    databricks_yaml = translate_metric(metric)
    create_metric_view(databricks_yaml)
```
This prevents creation failures from missing dependencies and ensures derived metrics reference the correct base metrics.
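Once dependencies are extracted, Python's standard-library graphlib handles the ordering. A minimal sketch, with a hand-written dependency map standing in for parsed formulas:

```python
from graphlib import TopologicalSorter

# metric -> set of metrics it references; built by parsing each formula in practice.
dependencies = {
    "revenue": set(),
    "profit": set(),
    "profit_margin": {"profit", "revenue"},
}

creation_order = list(TopologicalSorter(dependencies).static_order())
print(creation_order)  # base metrics first, e.g. ['revenue', 'profit', 'profit_margin']
```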
Semantic Validation Beyond Results
Query result comparison catches calculation errors but misses semantic drift. Add validation that compares metric definitions directly:
Check aggregation types match:
- If Snowflake uses SUM(), Databricks should use SUM(), not AVG()
Verify join cardinality preservation:
- One-to-many joins in Snowflake should remain one-to-many in Databricks
- Join keys should match exactly
Validate dimension expressions:
- Time dimensions should use equivalent date functions
- Categorical dimensions should reference the same underlying columns
This definition-level semantic validation catches issues that identical query results might hide.
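These checks can run as plain assertions over the two sets of definitions. A minimal sketch, assuming both sides have been normalized into dicts keyed by metric name (the field names are illustrative):

```python
def validate_definitions(sf_defs: dict, db_defs: dict) -> list[str]:
    """Compare definition metadata directly instead of query results."""
    issues = []
    for name, sf in sf_defs.items():
        db = db_defs.get(name)
        if db is None:
            issues.append(f"{name}: missing from Databricks metric view")
            continue
        if sf["aggregation"] != db["aggregation"]:
            issues.append(f"{name}: aggregation {sf['aggregation']} became {db['aggregation']}")
        if sf["source_column"] != db["source_column"]:
            issues.append(f"{name}: source column changed")
    return issues

issues = validate_definitions(
    {"total_revenue": {"aggregation": "SUM", "source_column": "order_amount"}},
    {"total_revenue": {"aggregation": "AVG", "source_column": "order_amount"}},
)
print(issues)  # flags the SUM -> AVG drift
```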
Parallel Running with Reconciliation
Rather than cutover all at once, run both semantic layers in parallel with automated reconciliation. Schedule jobs that query identical metrics from both platforms and alert on discrepancies larger than acceptable thresholds (accounting for expected rounding differences).
```python
for metric in critical_metrics:
    sf_result = query_snowflake_semantic_view(metric)
    db_result = query_databricks_metric_view(metric)
    diff_pct = calculate_difference(sf_result, db_result)
    if diff_pct > threshold:
        alert_team(f"Metric {metric} shows {diff_pct}% difference")
        log_sample_rows(sf_result, db_result)
```
This catches calculation drift early while both systems are available for comparison.
Metadata Preservation
Capture semantic metadata from Snowflake and transfer it to Databricks even when direct equivalents don't exist.
Snowflake synonyms become Databricks metric descriptions:
```yaml
measures:
  - name: total_revenue
    expr: SUM(order_amount)
    description: "Total revenue. Also known as: sales, total sales, gross revenue"
```
This preserves the semantic intent for AI systems that parse metric descriptions even without native synonym support.
Certification status and ownership metadata transfer through Unity Catalog properties:
```sql
ALTER METRIC VIEW main.sales.sales_metrics SET TBLPROPERTIES (
  'certified' = 'true',
  'owner' = 'finance_team',
  'migrated_from' = 'snowflake.analytics.sales_analytics'
);
```
Access Control Mapping
Build explicit mappings between Snowflake roles and Unity Catalog groups before migrating access policies. Don't assume role names match directly—verify which users belong to each role and create equivalent Unity Catalog groups.
```python
snowflake_roles = get_snowflake_roles()
unity_catalog_groups = get_unity_catalog_groups()
role_mapping = create_role_group_mapping(snowflake_roles, unity_catalog_groups)

for metric_view in migrated_views:
    sf_grants = get_snowflake_grants(metric_view)
    uc_grants = translate_grants(sf_grants, role_mapping)
    apply_unity_catalog_grants(metric_view, uc_grants)
```
Test access controls with sample users from each role to verify permissions work as expected.
Query Rewriting Assistance
Provide query rewriting tools rather than requiring manual updates. Build converters that transform Snowflake SEMANTIC_VIEW() queries into Databricks MEASURE() queries:
Input:
```sql
SELECT * FROM SEMANTIC_VIEW(
  sales_analytics
  DIMENSIONS customer_region, product_category
  METRICS total_revenue
)
WHERE product_category = 'Electronics'
```
Output:
```sql
SELECT customer_region, product_category, MEASURE(total_revenue)
FROM main.sales.sales_metrics
WHERE product_category = 'Electronics'
GROUP BY customer_region, product_category
```
Integrate these converters into migration guides and documentation so teams can quickly translate their existing queries.
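A converter for the simple pattern above can be sketched with regular expressions, though production use calls for a real SQL parser. The sketch below is illustrative only: it handles a single SEMANTIC_VIEW() call with one DIMENSIONS list, one METRICS list, and an optional trailing WHERE clause.

```python
import re

def rewrite_semantic_view_query(sql: str, metric_view: str) -> str:
    """Naive rewrite of the simple SEMANTIC_VIEW() pattern shown above.
    Real queries need a SQL parser; this only covers the single-view case."""
    dims = re.search(r"DIMENSIONS\s+(.*?)\s+METRICS", sql, re.S).group(1).strip()
    metrics = re.search(r"METRICS\s+(.*?)\s*\)", sql, re.S).group(1).strip()
    where = re.search(r"\)\s*(WHERE .*)$", sql, re.S)
    measures = ", ".join(f"MEASURE({m.strip()})" for m in metrics.split(","))
    return (
        f"SELECT {dims}, {measures}\n"
        f"FROM {metric_view}\n"
        + (where.group(1).strip() + "\n" if where else "")
        + f"GROUP BY {dims}"
    )
```

Run against the Snowflake query above with the target view main.sales.sales_metrics, this reproduces the Databricks query shown in the output block.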
Governance Continuity
Maintain a single source of truth for which metrics are authoritative during migration. Use a metadata registry that tracks:
- Which semantic layer (Snowflake or Databricks) holds the official version of each metric
- Migration status (planned, in progress, validated, cutover complete)
- Validation results and known discrepancies
- Scheduled cutover dates for each metric group
This registry prevents confusion about which system to trust and provides audit trails for compliance requirements.
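The registry itself does not require heavy tooling; a small table or one typed record per metric is enough. A sketch of one possible record shape (the field names are illustrative):

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum

class MigrationStatus(Enum):
    PLANNED = "planned"
    IN_PROGRESS = "in_progress"
    VALIDATED = "validated"
    CUTOVER_COMPLETE = "cutover_complete"

@dataclass
class MetricRegistryEntry:
    metric_name: str
    authoritative_platform: str          # "snowflake" or "databricks"
    status: MigrationStatus
    known_discrepancies: list[str] = field(default_factory=list)
    cutover_date: date | None = None

entry = MetricRegistryEntry("total_revenue", "snowflake", MigrationStatus.VALIDATED)
```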
Future Direction: Cross-Platform Semantic Standards
The Open Semantic Interchange initiative aims to create vendor-neutral semantic layer definitions. Multiple vendors are collaborating on standard YAML formats that would theoretically work across any compatible platform.
If successful, semantic layer migrations would become export-import operations rather than manual translations. Teams would export standard YAML from Snowflake and import it directly into Databricks with no transformation logic required.
Current state: OSI remains in specification development. Working groups are defining standard formats for simple metrics—aggregations and basic ratios. Advanced features like time grain shortcuts, conversion metrics, and platform-specific optimizations will likely remain vendor-specific.
Expected timeline: Simple metric portability in 2026-2027. Full cross-platform compatibility will take longer as vendors balance standardization with differentiation.
The broader trend: Semantic layers are becoming standard infrastructure in data platforms. As LLM adoption accelerates, the need for governed metric definitions grows. Platforms will continue investing in native semantic layer capabilities, making migrations between systems more common and more critical to get right.
Teams migrating today should build with future portability in mind—use standard SQL expressions where possible, maintain YAML definitions in version control, and avoid platform-specific features unless they provide clear value.
Building Reliable Migration Infrastructure
Semantic layer migrations introduce data quality risks that require robust validation infrastructure. Organizations need systems that can verify metric consistency across platforms, validate semantic operations during translation, and maintain governance controls throughout the transition.
Typedef helps teams build data infrastructure for AI systems that need reliable semantic processing. When migrating metric definitions between platforms, maintaining data quality and semantic correctness becomes essential—not just for the migration itself, but for ongoing governance as your lakehouse evolves.
