
How to Migrate from Omni Analytics to Databricks Unity Catalog Metric Views

Typedef Team

Organizations standardizing on Databricks face a common question: should metrics live in an external semantic layer or within the lakehouse itself? As Unity Catalog Metric Views reached production readiness in late 2025, teams began evaluating whether to consolidate their semantic layer into their data platform.

This migration represents more than a technical change. It's a shift from maintaining metrics as external abstractions to treating them as native catalog objects that work seamlessly across SQL, notebooks, and ML pipelines.

What Is Migration to Databricks Unity Catalog Metric Views?

Unity Catalog Metric Views are semantic layer objects built directly into Databricks. They define business metrics—revenue, customer counts, conversion rates—as first-class catalog entities alongside tables and views.

A metric view contains three core components:

Measures: Aggregations and calculations that answer business questions (total sales, average order value, churn rate)

Dimensions: Attributes for slicing data (customer region, product category, time periods)

Join specifications: Relationship definitions between fact and dimension tables in your lakehouse

The architecture differs fundamentally from external semantic layers. Metric views execute entirely within Spark SQL and Photon, with no middleware translation:

Unity Catalog metric view → Spark SQL optimizer → Photon execution → Delta Lake storage

Users query metrics with the MEASURE() aggregate function:

sql
SELECT
  customer_region,
  MEASURE(total_revenue),
  MEASURE(order_count)
FROM main.analytics.sales_metrics
WHERE order_date >= '2025-01-01'
GROUP BY customer_region;

Migration means moving metric definitions from an external platform into Unity Catalog, then updating data consumers (dashboards, notebooks, ML pipelines) to query metrics directly from Databricks.

How Teams Migrate Semantic Layers Today

Teams currently using external semantic layers follow several common patterns when consolidating into Databricks:

Manual Metric Translation

Most migrations start with exporting existing metric definitions—often in YAML or JSON format—then manually rewriting them as Unity Catalog metric views. Data teams review each metric's aggregation logic, join requirements, and filter conditions, then translate them into the Unity Catalog YAML schema.

Common workflow:

  • Export semantic model from current platform
  • Document each metric's calculation logic
  • Map source tables to Delta Lake paths
  • Rewrite join relationships for Unity Catalog syntax
  • Create metric views using SQL DDL or YAML files
  • Test metric values match the original system

Parallel Operation Period

Teams rarely cut over immediately. Instead, they run both systems in parallel for weeks or months. Analysts query metrics from both sources and compare results to verify correctness. This parallel phase catches calculation errors, missing dimensions, or subtle differences in how aggregations handle nulls and edge cases.

The parallel period also gives business users time to adjust. Dashboards gradually switch from the external semantic layer to direct Databricks connections, allowing teams to validate each dashboard before decommissioning the old system.

Dashboard Reconstruction

Most external semantic layers provide pre-built dashboard interfaces or embedded analytics. When migrating, teams must rebuild these visualizations using Databricks SQL dashboards or by connecting BI tools directly to Databricks SQL warehouses.

This reconstruction phase often reveals hidden dependencies—reports that query multiple semantic models, custom filters that don't map cleanly to metric view dimensions, or embedded analytics that require API changes in applications.

Access Control Remapping

External semantic layers maintain their own permission models. Migration requires translating these permissions into Unity Catalog's RBAC system. Data teams audit who can access which metrics, then grant appropriate SELECT permissions on metric views while ensuring users don't accidentally gain access to underlying raw tables.

Row-level security adds another layer. If the external semantic layer filtered data based on user attributes (sales reps seeing only their territory), teams must implement equivalent row filters in Unity Catalog on the base tables.

BI Tool Reconfiguration

Tableau, Power BI, and other BI tools connected to the external semantic layer need new data source configurations. Teams replace semantic layer API connections with direct JDBC/ODBC connections to Databricks SQL warehouses, then update all custom SQL to use MEASURE() syntax instead of the previous query format.

Problems with Current Migration Approaches

Standard migration methods introduce friction that slows adoption and creates risk:

Metric Definition Drift

When teams manually translate metrics, subtle errors creep in. A ratio calculated as SUM(revenue) / SUM(orders) in the old system might accidentally become AVG(revenue / orders) in the new metric view—mathematically different results that won't be obvious until someone questions the numbers.
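The difference is easy to demonstrate with a toy dataset (the figures below are purely illustrative):

```python
# Two order rows: (revenue, order_count)
rows = [(100.0, 1), (100.0, 2)]

# Ratio of sums -- the usual "average order value" definition
ratio_of_sums = sum(r for r, _ in rows) / sum(n for _, n in rows)

# Average of per-row ratios -- a common mistranslation
avg_of_ratios = sum(r / n for r, n in rows) / len(rows)

print(ratio_of_sums)  # 66.66...
print(avg_of_ratios)  # 75.0
```

Both formulas look like "average order value," but they diverge whenever order counts vary across rows, and nothing in the dashboard will flag the discrepancy.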

Conversion metrics that track cohort behavior over time windows are particularly error-prone. The external semantic layer might handle date logic one way, while the manual Unity Catalog translation handles it differently, leading to metrics that look correct but produce different cohort membership.

Incomplete Semantic Model Coverage

Teams typically migrate their most important metrics first, leaving less-used metrics for later phases. This creates a hybrid state where some teams query Unity Catalog metric views while others still use the external semantic layer for niche metrics. The split creates confusion about which system represents the source of truth.

Worse, some metrics depend on others. If "profit margin" requires both "revenue" and "cost" metrics, but only revenue has been migrated, teams face a choice: delay using the new system until all dependencies migrate, or duplicate logic across both systems.

Lost Business Context

External semantic layers often accumulate metadata that doesn't transfer cleanly: certifications indicating which metrics are production-ready, detailed descriptions written for business users, synonym mappings for natural language queries, and organizational tags that help users find relevant metrics.

When migrating, this context gets lost or requires manual recreation in Unity Catalog. Teams end up with technically correct metric views that lack the surrounding documentation and trust signals that made the original semantic layer valuable.

Performance Regression During Transition

Direct queries to Databricks should perform better than routing through external middleware, but the transition period often sees worse performance. Why? Teams haven't yet optimized their Delta tables for the new query patterns. The external semantic layer might have cached results or used specific aggregation strategies that don't automatically carry over.

Queries that worked fine through the external layer suddenly time out or consume excessive compute credits when run directly against Unity Catalog metric views, forcing teams to tune SQL warehouse sizes, adjust clustering, or add query optimization hints.

Broken Integration Chains

Applications embedding analytics from the external semantic layer face integration challenges. The embedding API, authentication methods, and result formats differ between the external platform and Databricks SQL. Development teams must rewrite integration code, update authentication flows, and handle different error conditions—work that delays the migration timeline.

ML pipelines pulling metrics for feature engineering face similar issues. If data scientists used the external semantic layer's Python library to fetch metrics, they must rewrite those calls to use Databricks' SQL connector, changing code in notebooks and production ML jobs.

No Rollback Safety Net

Once teams start updating dashboards and applications to query Unity Catalog directly, rolling back becomes difficult. If a critical metric produces wrong values in production, teams can't simply switch back to the external semantic layer—they've already modified too many downstream consumers.

This lack of rollback safety creates pressure to get everything perfect before migration, which paradoxically slows the process. Teams spend weeks validating edge cases rather than migrating incrementally with confidence.

Making Migration Better

Effective migration to Unity Catalog Metric Views requires treating metrics as code with automated validation, incremental rollout strategies, and governance from day one.

Automated Metric Translation and Validation

Rather than manually rewriting metric definitions, build automated translation scripts that convert existing semantic models into Unity Catalog YAML format. These scripts should handle common patterns: simple aggregations, ratio metrics, derived metrics, and join specifications.

Translation approach:

Export the current semantic model's YAML or JSON definitions. Parse each metric to extract the aggregation type, source columns, filters, and dimension relationships. Map these to Unity Catalog's schema:

yaml
metric_view:
  name: revenue_metrics
  source_table: prod.sales.orders

  measures:
    - name: total_revenue
      expr: SUM(order_amount)
      description: "Total revenue from all completed orders"

    - name: avg_order_value
      expr: SUM(order_amount) / COUNT(DISTINCT order_id)
      description: "Average revenue per order"

  dimensions:
    - name: order_date
      expr: DATE(order_timestamp)

    - name: customer_region
      expr: customers.region

  joins:
    - table: prod.sales.customers
      on: orders.customer_id = customers.customer_id
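
A minimal sketch of the translation step, assuming the exported model is JSON with fields like those below (the export field names are hypothetical, not Omni's or Databricks' actual schemas):

```python
import json

def translate_metric_model(exported: dict) -> dict:
    """Convert an exported semantic model (JSON) into a Unity Catalog
    metric-view-style definition, ready to serialize as YAML."""
    return {
        "metric_view": {
            "name": exported["model_name"],
            "source_table": exported["base_table"],
            "measures": [
                {
                    "name": m["name"],
                    "expr": m["sql"],
                    "description": m.get("description", ""),
                }
                for m in exported.get("measures", [])
            ],
            "dimensions": [
                {"name": d["name"], "expr": d["sql"]}
                for d in exported.get("dimensions", [])
            ],
            "joins": [
                {"table": j["table"], "on": j["condition"]}
                for j in exported.get("joins", [])
            ],
        }
    }

exported = json.loads("""{
  "model_name": "revenue_metrics",
  "base_table": "prod.sales.orders",
  "measures": [{"name": "total_revenue", "sql": "SUM(order_amount)"}],
  "dimensions": [{"name": "order_date", "sql": "DATE(order_timestamp)"}],
  "joins": []
}""")

uc_definition = translate_metric_model(exported)
print(uc_definition["metric_view"]["name"])  # revenue_metrics
```

Real semantic models carry more constructs (filters, derived metrics, custom time grains), so expect to extend the script per platform; the value is that each mapping rule is written once and reviewed once.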

After translation, automate validation. Write SQL queries that fetch the same metric for identical time periods and dimensions from both systems, then compare results. Flag any discrepancies for manual review:

sql
-- Query old system
SELECT customer_region, total_revenue
FROM old_semantic_layer.revenue_metrics
WHERE order_date = '2025-06-01'

-- Query new metric view
SELECT customer_region, MEASURE(total_revenue) AS total_revenue
FROM prod.analytics.revenue_metrics
WHERE order_date = '2025-06-01'

-- Compare results
-- Flag rows where values differ by more than 0.01%

Automated validation catches calculation errors before they reach production dashboards.
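
The comparison step itself can be scripted. This sketch assumes both queries' results have been fetched into dicts keyed by dimension value, using the 0.01% relative tolerance mentioned above:

```python
def find_discrepancies(old: dict, new: dict, rel_tol: float = 0.0001) -> list:
    """Compare metric values from both systems; return dimension keys whose
    values differ by more than rel_tol, or that are missing on one side."""
    flagged = []
    for key in set(old) | set(new):
        if key not in old or key not in new:
            flagged.append(key)
            continue
        baseline = max(abs(old[key]), abs(new[key]), 1e-12)
        if abs(old[key] - new[key]) / baseline > rel_tol:
            flagged.append(key)
    return flagged

# Illustrative values fetched from each system for the same day
old_values = {"EMEA": 120_000.0, "APAC": 75_000.0, "AMER": 310_000.0}
new_values = {"EMEA": 120_000.0, "APAC": 75_900.0, "AMER": 310_000.0}

print(find_discrepancies(old_values, new_values))  # ['APAC']
```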

Incremental Metric Migration with Feature Flags

Migrate metrics domain by domain, not all at once. Start with a single business area—marketing metrics, for example—and fully transition those before moving to finance or operations.

Within each domain, use feature flags to control which consumers use Unity Catalog metric views versus the existing system. Dashboards can toggle between data sources based on a configuration flag, allowing quick rollback if issues arise.

Feature flag pattern in dashboards:

python
# Dashboard code checks feature flag
if feature_flags.use_unity_catalog_metrics:
    query = """
        SELECT customer_segment, MEASURE(conversion_rate)
        FROM prod.marketing.campaign_metrics
        WHERE campaign_date >= '2025-01-01'
    """
    connection = databricks_connection
else:
    query = """
        SELECT customer_segment, conversion_rate
        FROM legacy_semantic_layer.campaign_metrics
        WHERE campaign_date >= '2025-01-01'
    """
    connection = legacy_connection

results = connection.execute(query)

This approach lets teams migrate individual dashboards, validate results with business users, then expand the rollout. If a metric produces unexpected values, disable the feature flag for that dashboard while investigating—no need to roll back the entire migration.

Preserve Semantic Metadata

Migrate more than just calculation logic. Transfer the organizational metadata that makes metrics trustworthy and discoverable:

Certification status: Mark production-ready metrics explicitly in Unity Catalog:

sql
ALTER METRIC VIEW prod.finance.revenue_metrics
SET TBLPROPERTIES (
  'certified' = 'true',
  'owner' = 'finance_team@company.com',
  'review_date' = '2025-06-15'
);

Business descriptions: Include detailed explanations that business users need:

yaml
measures:
  - name: net_revenue
    expr: SUM(order_amount * (1 - discount_pct) - refund_amount)
    description: |
      Total order revenue after discounts and refunds.
      Used for all financial reporting. Excludes tax and shipping.
      Calculation reviewed quarterly by Finance team.
      Contact: finance_team@company.com if values look incorrect.

Domain tags: Organize metrics by business function for easier search:

sql
ALTER METRIC VIEW prod.marketing.campaign_metrics
SET TAGS (
  'domain' = 'marketing',
  'pii' = 'false',
  'update_frequency' = 'hourly',
  'data_classification' = 'internal'
);

These tags enable self-service metric discovery. Analysts searching Unity Catalog can filter by domain, find relevant metrics, and understand their intended use without asking data teams.

Optimize for New Query Patterns

Unity Catalog metric views query Delta tables directly, bypassing the external semantic layer's caching or pre-aggregation strategies. This means query patterns change, and tables need different optimization.

Z-ordering for common dimensions: If most queries filter by date and region, add Z-ordering:

sql
OPTIMIZE prod.sales.orders
ZORDER BY (order_date, customer_region);

This co-locates related data within Delta files, making range scans faster.

Liquid clustering for high-cardinality dimensions: For dimensions with thousands of distinct values (customer_id, product_sku), use liquid clustering:

sql
ALTER TABLE prod.sales.orders
CLUSTER BY (customer_id);

Liquid clustering automatically maintains optimal data layout as new data arrives.

Appropriate SQL warehouse sizing: External semantic layers often hid the underlying compute. With direct queries to Databricks, teams must right-size SQL warehouses. Start with Serverless SQL for auto-scaling, or provision warehouses based on query concurrency:

  • Small warehouse: Up to 10 concurrent analysts
  • Medium warehouse: 10-50 concurrent users
  • Large warehouse: 50+ concurrent users or complex aggregations
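
The rule of thumb above can be captured as a small helper for provisioning scripts (the thresholds are the illustrative ones from this list, not official Databricks guidance):

```python
def suggest_warehouse_size(concurrent_users: int,
                           complex_aggregations: bool = False) -> str:
    """Map expected query concurrency to a starting SQL warehouse size."""
    if complex_aggregations or concurrent_users > 50:
        return "Large"
    if concurrent_users > 10:
        return "Medium"
    return "Small"

print(suggest_warehouse_size(8))    # Small
print(suggest_warehouse_size(25))   # Medium
print(suggest_warehouse_size(120))  # Large
```

Treat the output as a starting point only; observed queue times and query latency should drive the final sizing.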

Caching strategies: Enable Databricks' result cache and disk cache for frequently accessed metrics. The first query pays full computation cost; subsequent identical queries return instantly from cache.

Build Testing Framework

Treat metric views like application code. Write automated tests that verify metric calculations stay correct as underlying tables evolve.

Schema validation tests: Ensure metric views reference tables and columns that exist:

sql
-- Test that source table exists
SELECT COUNT(*) FROM prod.sales.orders;

-- Test that required columns exist
DESCRIBE prod.sales.orders;
-- Verify order_amount, order_timestamp, customer_id columns present

Value range tests: Check that metrics produce reasonable values:

sql
-- Revenue should be positive
SELECT COUNT(*) FROM (
  SELECT MEASURE(total_revenue) AS revenue
  FROM prod.analytics.revenue_metrics
  WHERE order_date = CURRENT_DATE - 1
) WHERE revenue < 0;
-- Should return 0

Consistency tests: Verify metrics match known totals:

sql
-- Monthly revenue should match finance system's reported total
SELECT MEASURE(total_revenue) AS databricks_revenue
FROM prod.analytics.revenue_metrics
WHERE order_date >= '2025-06-01' AND order_date < '2025-07-01';
-- Compare to finance system's June revenue

Run these tests in CI/CD pipelines before deploying metric view changes to production. Failed tests block deployment, preventing broken metrics from reaching dashboards.
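
In CI, these checks can be driven by a small harness that runs each SQL probe and asserts on the result. Here `run_query` is a stubbed stand-in for a real call through the Databricks SQL connector:

```python
# Sketch of a CI harness for metric tests; run_query is stubbed for
# illustration and would execute against a dev SQL warehouse in practice.
def run_query(query_key: str) -> list:
    stub_results = {
        "negative_revenue_rows": [(0,)],
        "source_row_count": [(1_250_431,)],
    }
    return stub_results[query_key]

METRIC_TESTS = [
    # (test name, query key, predicate on the first result row)
    ("no negative revenue", "negative_revenue_rows", lambda row: row[0] == 0),
    ("source table non-empty", "source_row_count", lambda row: row[0] > 0),
]

failures = []
for name, query_key, check in METRIC_TESTS:
    row = run_query(query_key)[0]
    if not check(row):
        failures.append(name)

# A non-empty failures list should fail the CI job
print(failures)  # []
```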

Enable ML and Notebook Integration

One of Unity Catalog metric views' strengths: data scientists can use the same metrics in notebooks and ML pipelines that analysts use in dashboards.

Feature engineering with metric views:

python
# Data scientist in notebook
df = spark.sql("""
  SELECT
    customer_id,
    MEASURE(lifetime_revenue) AS customer_ltv,
    MEASURE(purchase_frequency) AS purchase_freq,
    MEASURE(days_since_last_purchase) AS recency
  FROM prod.analytics.customer_metrics
  WHERE first_purchase_date < '2025-01-01'
""")

# Use metrics as ML features (assumes feature_df also carries a label
# column, e.g. churned, joined from elsewhere)
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import GBTClassifier

assembler = VectorAssembler(
    inputCols=["customer_ltv", "purchase_freq", "recency"],
    outputCol="features"
)

feature_df = assembler.transform(df)
model = GBTClassifier(labelCol="churned").fit(feature_df)

This ensures ML models and business reports use identical metric definitions. Before migration, data scientists often wrote their own SQL to calculate features, leading to subtle differences between model features and dashboard metrics. Unity Catalog metric views eliminate this inconsistency.

Implement Proper Access Controls

Grant permissions at the metric view level, not the underlying table level. This enables governed self-service: business users query certified metrics without accessing raw data.

sql
-- Grant metric access to analyst role
GRANT SELECT ON METRIC VIEW prod.finance.revenue_metrics
TO `finance_analysts`;

-- Revoke access to underlying tables
REVOKE SELECT ON TABLE prod.sales.orders
FROM `finance_analysts`;

Analysts can now query revenue metrics, slice by any dimension, and filter by date ranges—but they can't see individual order records or customer PII in the raw orders table.

Row-level security: If different user groups should see different data subsets (regional managers seeing only their region), implement row filters on base tables:

sql
-- current_user_region() is a placeholder: implement it via a
-- user-to-region lookup table or is_account_group_member() checks
CREATE OR REPLACE FUNCTION filter_by_region(region STRING)
RETURNS BOOLEAN
RETURN region = current_user_region();

ALTER TABLE prod.sales.orders
SET ROW FILTER filter_by_region ON (customer_region);

Metric view queries automatically respect these filters. A regional manager querying MEASURE(total_revenue) sees only revenue from their region, without needing special query syntax.

Establish Change Management Process

Metric definitions should change through formal processes, not ad-hoc updates. Store Unity Catalog metric view YAML files in version control:

metrics/
  finance/
    revenue_metrics.yaml
    cost_metrics.yaml
    profitability_metrics.yaml
  marketing/
    campaign_metrics.yaml
    attribution_metrics.yaml
  operations/
    fulfillment_metrics.yaml

Changes flow through pull requests with required approvals. When a data analyst proposes adding a new metric or modifying an existing calculation, the pull request triggers:

  1. Automated tests validate the metric definition syntax
  2. Integration tests compare values to expected results
  3. Peer review ensures calculation logic matches business requirements
  4. Approval from metric owner (Finance team for revenue metrics, for example)
  5. Deployment to dev environment for stakeholder validation
  6. Promotion to production after sign-off

This process prevents the "surprise metric changes" that erode trust in analytics. Business users know that production metrics won't change without review and notification.
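
Step 1 of the pipeline above (definition syntax validation) can be as simple as checking required fields on the parsed YAML. The dict below stands in for a metric view file loaded with a YAML parser:

```python
REQUIRED_MEASURE_FIELDS = {"name", "expr", "description"}

def validate_metric_view(definition: dict) -> list:
    """Return a list of validation errors; an empty list means it passes."""
    errors = []
    mv = definition.get("metric_view", {})
    if not mv.get("name"):
        errors.append("metric_view.name is required")
    if not mv.get("source_table"):
        errors.append("metric_view.source_table is required")
    for i, measure in enumerate(mv.get("measures", [])):
        missing = REQUIRED_MEASURE_FIELDS - set(measure)
        if missing:
            errors.append(f"measures[{i}] missing: {sorted(missing)}")
    return errors

definition = {
    "metric_view": {
        "name": "revenue_metrics",
        "source_table": "prod.sales.orders",
        "measures": [{"name": "total_revenue", "expr": "SUM(order_amount)"}],
    }
}

print(validate_metric_view(definition))
# ["measures[0] missing: ['description']"]
```

Requiring a description on every measure is a policy choice, but it is an effective way to enforce the "business descriptions" practice mechanically rather than by reviewer vigilance.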

Monitor Metric Usage and Performance

Track which metrics get queried, by whom, and how often. Unity Catalog's audit logs capture this information:

sql
SELECT
  request_params.target_name AS metric_view,
  user_identity.email AS user,
  COUNT(*) AS query_count,
  AVG(execution_duration) AS avg_duration_ms
FROM system.access.audit
WHERE action_name = 'SELECT'
  AND request_params.target_type = 'METRIC_VIEW'
  AND event_date >= CURRENT_DATE - 30
GROUP BY metric_view, user
ORDER BY query_count DESC;

Usage analytics reveal several insights:

Popular metrics that warrant performance optimization through better table clustering or dedicated compute resources

Rarely used metrics that might be candidates for deprecation, reducing maintenance burden

Heavy users who might benefit from training on more efficient query patterns

Slow queries that need optimization through better SQL warehouse sizing, additional indexes, or query rewriting

Performance degradation over time suggesting that tables need optimization maintenance

Set up alerts for metric queries that exceed latency thresholds. If a previously fast metric suddenly takes 10x longer to compute, investigate whether data volume increased, table optimization degraded, or a schema change affected query plans.
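
The alerting rule can be sketched as a comparison against a rolling baseline; the 10x factor and the latency figures are illustrative:

```python
def latency_alerts(baselines_ms: dict, latest_ms: dict,
                   factor: float = 10.0) -> list:
    """Flag metric views whose latest query latency exceeds
    factor x their historical baseline."""
    return [
        metric
        for metric, latest in latest_ms.items()
        if metric in baselines_ms and latest > factor * baselines_ms[metric]
    ]

baselines = {"revenue_metrics": 800, "campaign_metrics": 1_200}
latest = {"revenue_metrics": 9_500, "campaign_metrics": 1_300}

print(latency_alerts(baselines, latest))  # ['revenue_metrics']
```

In practice the baseline would come from the audit-log query above, aggregated over a trailing window per metric view.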

Future Direction for Lakehouse Semantics

Unity Catalog metric views continue to evolve. Several trends will shape the next generation of lakehouse semantics:

Real-Time Metric Computation

Current metric views query batch Delta tables, with data freshness limited by the ETL schedule. Future versions will support streaming aggregations, enabling near-real-time metrics without separate stream processing infrastructure.

Marketing teams could query campaign metrics that update as events stream in, rather than waiting for hourly or daily batch loads. Operations teams could monitor fulfillment metrics with sub-minute latency, spotting issues before they cascade.

Fiscal Calendar and Time Intelligence

Native support for fiscal calendars, custom hierarchies, and period-over-period comparisons without manual date logic. Organizations with fiscal years that don't align to calendar years currently build complex SQL to handle fiscal quarters and year-over-year comparisons.

Built-in fiscal calendar support would let teams define "fiscal year starts in July" once, then automatically get correct fiscal quarter aggregations, fiscal YoY comparisons, and fiscal period-to-date calculations.

Metric Dependency Graphs

Explicit dependency tracking between metrics. If "profit margin" derives from "revenue" and "cost" metrics, Unity Catalog would visualize this relationship and prevent breaking changes to upstream metrics without updating dependents.

Analysts could explore metric lineage: "Show me all metrics that depend on customer segmentation logic" or "Which dashboards would break if I change this revenue calculation?"

Enhanced Natural Language Interfaces

Databricks Assistant will consume metric views as primary context for natural language queries. Instead of writing SQL, analysts ask "What was revenue growth by product category last quarter?" and the LLM generates correct MEASURE() queries.

The key difference from generic SQL generation: the LLM references certified metric definitions rather than guessing table joins and aggregation logic, which substantially improves the accuracy of generated queries.

Cross-Platform Semantic Standards

The Open Semantic Interchange initiative aims to standardize semantic layer definitions across data platforms. If OSI succeeds, exporting Unity Catalog metric definitions to other systems—or importing metrics defined elsewhere—would become seamless.

Teams could define metrics once and query them from Databricks, Snowflake, or BigQuery through a standard API. This portability reduces vendor lock-in and makes it easier to adopt best-of-breed tools without redefining business logic.

Current reality: metric views are Databricks-specific. Future possibility: define metrics in an open format that works across platforms.

MLOps Integration

Tighter coupling between metric views and MLflow feature stores. Define a metric once, use it as both a BI metric and an ML feature with automatic consistency tracking.

Data scientists could register metrics as features without duplicating calculation logic. When a metric definition changes, both dashboards and ML models update automatically, eliminating the "dashboard says X but the model uses Y" problem that plagues ML teams.

How Typedef Can Help

Migrating to Unity Catalog Metric Views requires clean, well-structured data feeding your lakehouse. Typedef provides semantic data processing that prepares unstructured data for analysis, ensuring your metric calculations run on reliable inputs.

For teams building AI-powered data pipelines, Typedef handles the data transformation and classification that sits upstream of your semantic layer. Clean data in means trustworthy metrics out.


Consolidating your semantic layer into Unity Catalog aligns analytics with the lakehouse model. Metrics become catalog objects that work seamlessly across SQL, notebooks, and ML pipelines—eliminating the fragmentation that external semantic layers create. For teams committed to Databricks, this consolidation reduces operational overhead while making metrics more accessible to both analysts and data scientists.
