How to Migrate from Holistics to Databricks Unity Catalog Metric Views

Typedef Team

The semantic layer landscape has shifted. As lakehouse architectures mature, teams face a critical decision: maintain metrics in external BI platforms or consolidate them into the data platform itself. For organizations running Databricks, this means evaluating whether to continue with standalone semantic tools or adopt Unity Catalog Metric Views as the native metrics layer.

What This Migration Involves

This migration moves metric definitions from an external semantic modeling platform into Databricks Unity Catalog as first-class catalog objects. Instead of metrics living in a separate tool that queries your lakehouse, they become native Databricks assets—queryable from SQL, notebooks, ML pipelines, and AI agents.

Holistics provides a semantic layer that sits between data warehouses and business users. It offers metric definitions, relationship modeling, and visualization in one integrated platform. Teams define business logic through a combination of AML (Holistics' analytics-as-code modeling language) and SQL expressions, managed in the Holistics interface or in version-controlled project files.

Unity Catalog Metric Views take a different approach. Metrics are catalog objects registered directly in Databricks, executed by Spark SQL and Photon, and governed through Unity Catalog's security model. They work across all Databricks workloads without middleware.

The migration transforms this:

Databricks Tables → External Tool Connection → External Semantic Layer → Dashboards

Into this:

Databricks Tables → Unity Catalog Metric Views → All Consumers (SQL, Notebooks, ML, AI)

How Teams Handle Semantic Layers Today

Most organizations operate semantic layers as standalone systems separate from their data platforms.

The External Tool Pattern

Teams adopt specialized semantic layer platforms to create a governed metrics layer. These tools connect to data warehouses, provide modeling interfaces, and expose metrics through proprietary APIs or embedded dashboards. Analysts define metrics in the tool's environment, which then generates queries against the underlying warehouse.

The workflow typically involves:

  • Data engineers build tables in the lakehouse
  • Analytics engineers model those tables in the semantic platform
  • Business analysts create metrics using the platform's syntax
  • End users query metrics through the platform's interface

Metric Definition Approaches

Semantic platforms use various modeling paradigms. Some employ YAML-based configurations where analysts declare measures, dimensions, and relationships. Others use SQL-like languages with extensions for semantic concepts. A few provide visual modeling interfaces that generate underlying code.

Regardless of syntax, the pattern remains consistent: metric logic lives outside the data warehouse, in a separate metadata layer managed by the semantic tool.

Multi-System Reality

Data teams rarely use just one tool. Data scientists work in notebooks, analysts use SQL editors, ML engineers build pipelines, and increasingly, AI agents query data programmatically. Each system needs metric access, creating several patterns:

Some teams maintain the semantic platform as the single source of truth, forcing all metric queries through its API. This centralizes definitions but creates a bottleneck.

Others duplicate metric logic across systems. The "revenue" calculation exists in the semantic platform, in notebook libraries, in ML feature pipelines, and in custom applications. Teams accept the duplication burden to avoid API dependencies.

A third group attempts synchronization—extracting metric definitions from the semantic platform and translating them into code for other systems. This requires custom tooling and constant maintenance as definitions evolve.

Problems with External Semantic Layers on Lakehouses

Operating semantic layers outside the data platform creates friction that compounds as organizations scale.

Metric Fragmentation

When metric definitions live in one system while data science, ML, and analytics happen in another, drift becomes inevitable. A data scientist builds a churn prediction model using "monthly recurring revenue" calculated in a notebook. The finance dashboard shows different MRR numbers because the semantic platform applies different filters or handles edge cases differently.

This isn't a documentation problem. Even with perfect runbooks, maintaining identical complex calculations across Python, SQL, and semantic platform syntax requires constant vigilance. As soon as one calculation updates without synchronized changes everywhere else, numbers diverge and trust erodes.
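To make the drift concrete, here is a purely illustrative sketch: two versions of the same MRR calculation, one written in a notebook and one mirroring the semantic platform's SQL, that differ only in how refunds are handled. The main.billing.subscriptions table and its columns are hypothetical.

python
from pyspark.sql import functions as F

subs = spark.table("main.billing.subscriptions")  # hypothetical table

# Notebook version: sums all active subscriptions
mrr_notebook = subs.filter(F.col("status") == "active") \
    .agg(F.sum("monthly_amount").alias("mrr"))

# Semantic-platform version: also excludes refunded subscriptions
mrr_platform = spark.sql("""
    SELECT SUM(monthly_amount) AS mrr
    FROM main.billing.subscriptions
    WHERE status = 'active' AND NOT is_refunded
""")
# The two "identical" metrics now disagree by exactly the refunded amount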

Governance Boundaries

Unity Catalog provides comprehensive data governance: row-level security that filters data based on user identity, column masking that protects sensitive fields, audit logs that track every access, and certification workflows that mark trusted assets. None of this extends to external semantic tools.

If a table has row-level security restricting users to their own region's data, that protection only applies when querying the table directly. When users query through the semantic platform, engineers must recreate those access rules in that platform's permission system. Now the same security policy exists in two places, maintained separately, with the risk they diverge.

Audit and compliance become fragmented. Unity Catalog knows which tables were accessed but not which metrics were computed. The semantic platform knows which metrics were queried but not the underlying data lineage. Pulling a complete picture requires correlating logs across systems.

Performance Gaps

Semantic platforms generate SQL that runs on Databricks, but they can't fully optimize for Databricks-specific capabilities. Photon's vectorized execution engine expects certain query patterns. Delta Lake provides Z-ordering and file statistics that enable aggressive data skipping. Adaptive Query Execution dynamically adjusts plans based on runtime statistics.

External query generators produce generic SQL that works across multiple warehouses. This portability comes at a cost—queries don't take full advantage of Databricks optimizations. A native metric view leverages Spark SQL's Catalyst optimizer with deep knowledge of Delta Lake statistics and cluster configurations.

More critically, metrics computed through external tools aren't reusable by other Databricks workloads. That carefully tuned customer lifetime value metric powering executive dashboards can't feed an ML model training pipeline without recreating the calculation.

Integration Complexity

Modern data stacks involve notebooks for exploration, SQL editors for ad-hoc analysis, orchestration tools for pipelines, BI platforms for visualization, and increasingly, AI agents for natural language queries. Each tool needing metric access faces a choice:

Connect through the semantic platform's API, adding network latency, authentication complexity, and another failure point. Or duplicate the metric logic locally, accepting drift risk and maintenance overhead.

Building agentic applications becomes particularly challenging. AI agents need programmatic access to metric definitions to generate accurate queries. External semantic platforms often lack APIs designed for agentic workflows, forcing workarounds.

Operational Overhead

Each additional system in the stack adds operational burden. The semantic platform requires:

  • Separate authentication and connection management
  • Dedicated monitoring and alerting
  • Platform-specific troubleshooting when performance degrades
  • Coordination between Databricks upgrades and semantic tool compatibility
  • Vendor relationship management and license tracking

When Databricks releases new capabilities—lakehouse federation, AI functions, enhanced security features—adopting them requires waiting for vendor support rather than immediate utilization.

Making It Better: Lakehouse-Native Metrics

Unity Catalog Metric Views consolidate the semantic layer into the data platform, eliminating the boundaries that create fragmentation.

Unified Metric Definitions

Metrics become catalog objects alongside tables, views, and volumes. Register them with SQL DDL; the source, joins, dimensions, and measures are declared in a YAML body embedded in the statement:

sql
CREATE VIEW main.finance.revenue_metrics
WITH METRICS
LANGUAGE YAML
AS $$
version: 0.1
source: main.sales.orders
joins:
  - name: customers
    source: main.sales.customers
    on: source.customer_id = customers.customer_id
dimensions:
  - name: order_month
    expr: date_trunc('MONTH', order_date)
  - name: customer_region
    expr: customers.region
measures:
  - name: total_revenue
    expr: SUM(order_amount)
  - name: order_count
    expr: COUNT(DISTINCT order_id)
$$;

Because the definition is plain YAML, teams can keep it in version control and deploy it through the same CI/CD pipelines that manage the rest of their catalog DDL.

These definitions are versioned, documented, and discoverable through Unity Catalog's metadata APIs—the same way teams already manage tables and views.
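As a sketch of that discoverability (assuming the standard per-catalog information_schema), listing and inspecting metric views looks like any other metadata query:

python
# List views registered in the finance schema
spark.sql("""
    SELECT table_catalog, table_schema, table_name
    FROM main.information_schema.views
    WHERE table_schema = 'finance'
""").show()

# DESCRIBE surfaces the view's dimensions and measures
spark.sql("DESCRIBE TABLE main.finance.revenue_metrics").show()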

Query Consistency Across Workloads

SQL editors, notebooks, BI tools, and ML pipelines all reference the same metric definitions using MEASURE() syntax:

sql
SELECT
  customer_region,
  MEASURE(total_revenue) AS total_revenue,
  MEASURE(order_count) AS order_count
FROM main.finance.revenue_metrics
WHERE order_month >= '2024-01-01'
GROUP BY customer_region;

Data scientists training models use identical logic:

python
features_df = spark.sql("""
  SELECT
    customer_id,
    MEASURE(lifetime_revenue) AS ltv,
    MEASURE(purchase_frequency) AS frequency
  FROM main.ml.customer_metrics
  GROUP BY customer_id
""")

BI dashboards, notebook analyses, and ML features share the same calculation logic, eliminating drift.

Integrated Governance

Unity Catalog's security model applies automatically. If the underlying orders table has row-level security filtering by region, those filters apply to metric view queries without additional configuration. Column masking on sensitive fields flows through transparently.
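A minimal sketch of that inheritance, assuming orders carries a region column and per-region account groups exist:

python
# Row filter on the source table: each user sees only regions whose
# account group they belong to
spark.sql("""
    CREATE OR REPLACE FUNCTION main.sales.region_filter(region STRING)
    RETURN is_account_group_member(concat('region_', region))
""")
spark.sql("""
    ALTER TABLE main.sales.orders
    SET ROW FILTER main.sales.region_filter ON (region)
""")

# No metric-view configuration needed: this query is filtered per user
spark.sql("""
    SELECT customer_region, MEASURE(total_revenue) AS total_revenue
    FROM main.finance.revenue_metrics
    GROUP BY customer_region
""").show()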

Certification workflows extend to metrics:

sql
ALTER VIEW main.finance.revenue_metrics
SET TBLPROPERTIES ('certified' = 'true');
ALTER VIEW main.finance.revenue_metrics OWNER TO `finance_analytics`;

Audit logs track metric usage alongside table access, providing complete lineage from raw data through metrics to business decisions.
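For example, with system tables enabled, recent Unity Catalog activity is one query away (a sketch; the exact events to filter on for metric view usage depend on your workload):

python
# Recent Unity Catalog access events from the audit system table
spark.sql("""
    SELECT event_time, user_identity.email AS user, action_name
    FROM system.access.audit
    WHERE event_date >= date_sub(current_date(), 7)
      AND service_name = 'unityCatalog'
    ORDER BY event_time DESC
    LIMIT 100
""").show()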

Performance Optimization

Metric view queries execute through Spark SQL with full access to Databricks optimizations. Photon accelerates aggregations through vectorized execution. Delta Lake's file statistics enable data skipping that eliminates reading irrelevant partitions. Adaptive Query Execution adjusts join strategies based on runtime data distribution.

For frequently queried metrics at fixed dimensions, materialized views can pre-compute results:

sql
CREATE MATERIALIZED VIEW main.finance.monthly_revenue AS
SELECT
  order_month,
  MEASURE(total_revenue) AS total_revenue
FROM main.finance.revenue_metrics
GROUP BY order_month;

But the flexibility of metric views means most use cases don't require materialization—ad-hoc slicing performs well enough.

Migration Pathway

Moving from external semantic layers to Unity Catalog follows a pattern:

Export current metric definitions from the existing platform. Most provide YAML or JSON exports of their semantic models. Translate these to Unity Catalog metric view syntax—simple aggregations map directly, while derived metrics may require expression adjustments.
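A minimal translation sketch, assuming an export file whose measures and dimensions entries each carry a name and a sql expression (real Holistics exports will need construct-by-construct mapping rules):

python
import yaml

def to_metric_view_yaml(exported: dict, source_table: str) -> str:
    """Map a simplified export structure onto the metric view YAML spec."""
    spec = {
        "version": 0.1,
        "source": source_table,
        "dimensions": [{"name": d["name"], "expr": d["sql"]}
                       for d in exported.get("dimensions", [])],
        "measures": [{"name": m["name"], "expr": m["sql"]}
                     for m in exported.get("measures", [])],
    }
    return yaml.safe_dump(spec, sort_keys=False)

with open("holistics_export.yml") as f:  # hypothetical export file
    exported = yaml.safe_load(f)

body = to_metric_view_yaml(exported, "main.sales.orders")
spark.sql(f"CREATE VIEW main.finance.revenue_metrics "
          f"WITH METRICS LANGUAGE YAML AS $$\n{body}\n$$")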

Register metric views in Unity Catalog and run parallel validation. Query the same metrics through both systems and compare results. Discrepancies usually stem from subtle differences in filter logic or join behavior that need reconciliation.
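A parallel-validation sketch, assuming the legacy platform's numbers were exported to a CSV in a volume (the path and tolerance are illustrative):

python
from pyspark.sql import functions as F

new = spark.sql("""
    SELECT customer_region, MEASURE(total_revenue) AS total_revenue
    FROM main.finance.revenue_metrics
    GROUP BY customer_region
""")
legacy = (spark.read.csv("/Volumes/main/migration/holistics_revenue.csv",
                         header=True, inferSchema=True)
               .withColumnRenamed("total_revenue", "legacy_revenue"))

mismatches = (new.join(legacy, "customer_region")
                 .withColumn("delta",
                             F.col("total_revenue") - F.col("legacy_revenue"))
                 .filter(F.abs(F.col("delta")) > 0.01))
mismatches.show()  # any rows here need reconciliation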

Update downstream consumers to query Unity Catalog directly. SQL-based tools can connect via JDBC and use MEASURE() syntax. Notebooks reference metric views through standard Spark SQL. Applications call Databricks SQL APIs instead of semantic platform APIs.
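For applications outside the workspace, a sketch using the databricks-sql-connector package (hostname, HTTP path, and token are placeholders):

python
from databricks import sql  # pip install databricks-sql-connector

with sql.connect(
    server_hostname="<workspace-hostname>",
    http_path="<warehouse-http-path>",
    access_token="<personal-access-token>",
) as conn:
    with conn.cursor() as cur:
        cur.execute("""
            SELECT customer_region, MEASURE(total_revenue) AS total_revenue
            FROM main.finance.revenue_metrics
            GROUP BY customer_region
        """)
        for region, revenue in cur.fetchall():
            print(region, revenue)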

The migration can happen incrementally, domain by domain. Start with finance metrics, validate thoroughly, then move to marketing, product, and operations. This phased approach reduces risk while maintaining business continuity.

The Lakehouse Semantic Layer Future

The industry is moving toward platform-native semantic layers rather than standalone tools.

AI-Native Metrics

Large language models need structured access to metric definitions to answer business questions accurately. When metrics exist as catalog objects, AI agents can query metadata to understand available metrics, their calculations, and appropriate usage:

User: "Show Q4 revenue growth by product category"
AI Agent:
  → Queries system.information_schema.metrics
  → Identifies main.finance.revenue_metrics
  → Finds MEASURE(total_revenue) definition
  → Generates query with correct time filters and grouping
  → Returns accurate results

This grounding prevents hallucinations that occur when LLMs attempt to write SQL without knowledge of metric semantics. Real-time context engineering becomes straightforward when metrics are catalog-accessible.
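A sketch of that grounding step: pull the metric view's schema and fold it into the agent's prompt so generated SQL uses MEASURE() against known names (the prompt wording is illustrative, and the DESCRIBE output shape is assumed):

python
# Fetch the metric view's dimensions and measures as grounding context
rows = spark.sql("DESCRIBE TABLE main.finance.revenue_metrics").collect()
context = "\n".join(f"{r.col_name}: {r.data_type}" for r in rows)

prompt = f"""You answer business questions with Unity Catalog metric views.
Available metric view: main.finance.revenue_metrics
{context}

Rules: aggregate with MEASURE(<measure_name>) and GROUP BY dimensions only."""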

Cross-Workload Metric Reuse

As organizations consolidate data science, ML, and analytics onto unified platforms, metric reuse across workloads becomes essential. The "customer lifetime value" metric should mean the same thing whether it's:

  • Displayed on an executive dashboard
  • Used as a feature in a churn prediction model
  • Referenced in a Python notebook during exploratory analysis
  • Returned by a chatbot answering business questions

Lakehouse-native metrics enable this consistency without custom synchronization infrastructure.

Federated Semantic Layers

Unity Catalog's data sharing capabilities will extend to metrics. Organizations will share metric definitions with partners and customers without replicating underlying data:

sql
CREATE SHARE partner_metrics;
ALTER SHARE partner_metrics ADD METRIC VIEW main.finance.public_revenue_metrics;

Business partners query your metrics through their own Databricks environments, seeing only what you've explicitly shared, with all access logged for audit.

Governance at Scale

As metric catalogs grow to thousands of definitions across dozens of domains, automated governance becomes critical. Metrics inherit access controls from source tables automatically. Lineage visualization shows the path from raw data through transformations to metrics to dashboards. Certification workflows ensure only validated metrics appear in production environments.

Intelligent tagging systems will categorize metrics by domain, sensitivity, and usage pattern automatically, making discovery intuitive even in large catalogs.
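Tag-based categorization is already expressible in DDL; a sketch with illustrative domain and sensitivity values:

python
spark.sql("""
    ALTER VIEW main.finance.revenue_metrics
    SET TAGS ('domain' = 'finance', 'sensitivity' = 'internal')
""")

# Tags become queryable metadata for discovery
spark.sql("""
    SELECT schema_name, table_name, tag_name, tag_value
    FROM main.information_schema.table_tags
    WHERE tag_name = 'domain'
""").show()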

Convergence with Feature Stores

The line between business metrics and ML features is blurring. Both represent curated, versioned calculations over raw data. Databricks Feature Store and metric views will likely merge conceptually, with unified interfaces for defining calculations that serve both analytics and ML use cases.

This convergence eliminates the current duplication where data scientists maintain separate feature logic that duplicates business metrics already defined elsewhere.

How Typedef Can Help

Migrating semantic layers requires careful data transformation and validation to ensure downstream systems continue functioning correctly. Typedef provides infrastructure for building reliable AI pipelines that handle the data processing workflows involved in semantic layer migrations and ongoing metric operations.


The shift from external semantic platforms to lakehouse-native metric views represents architectural consolidation. Instead of maintaining metric definitions in a separate system with its own security model, query interface, and operational requirements, metrics become catalog objects governed through the same mechanisms as tables and views. Teams that complete this transition report improved metric consistency, reduced operational overhead, and faster adoption of new platform capabilities.
