How to Migrate from Looker (LookML Semantic Layer) to Databricks Unity Catalog Metric Views

Typedef Team

The semantic layer has become critical infrastructure for modern data organizations. As companies consolidate analytics workloads onto lakehouse platforms, the question shifts from "Do we need a semantic layer?" to "Where should our semantic layer live?" For organizations running Databricks as their primary data platform, moving semantic logic from LookML to Unity Catalog Metric Views represents an architectural realignment that brings metrics closer to data and unlocks capabilities impossible in BI-centric approaches.

What is a Semantic Layer Migration?

A semantic layer migration involves moving business metric definitions, dimensional models, and data relationships from one system to another while maintaining analytical continuity. This isn't simply copying code—it requires translating proprietary modeling languages, restructuring governance policies, and retraining users on new query patterns.

LookML semantic layers operate as modeling code that sits within a BI platform. The semantic layer lives as git-versioned files defining views (table abstractions), dimensions (attributes for slicing), measures (aggregations like SUM or COUNT), and explores (pre-joined analytical surfaces). When users query through the platform's interface, LookML generates SQL that executes on the connected warehouse.

Databricks Unity Catalog Metric Views take a different architectural approach by embedding the semantic layer directly into the data catalog as first-class objects. Metric views are registered to Unity Catalog alongside tables and views, defined in YAML format, and queried using SQL with a special MEASURE clause. Unlike LookML's BI-platform coupling, metric views integrate natively with the lakehouse architecture, making metrics accessible to SQL queries, Python notebooks, ML pipelines, and AI assistants without routing through a BI layer.

The migration challenge: translating years of accumulated business logic from LookML's proprietary syntax into Unity Catalog's structure while ensuring dashboards, reports, and analytical workflows continue functioning without disruption.

How Teams Handle LookML Semantic Layers Today

Organizations running LookML typically operate with a three-layer stack architecture. At the bottom sits the data warehouse—Snowflake, BigQuery, or Databricks—storing raw and transformed tables. The middle layer contains LookML definitions: view files mapping to warehouse tables, model files organizing explores by business domain, and derived tables implementing complex transformations. The top layer consists of the BI platform itself, where business users interact through graphical interfaces to build dashboards and explore data.

The typical workflow follows a clear division of responsibilities. Data engineers build tables using SQL or transformation frameworks, creating the foundation layer. Analytics engineers then model these tables in LookML, defining how non-technical users should interact with data. They specify which columns are dimensions versus measures, how tables join together, what aggregations make sense, and which fields require special formatting or calculations.

Business analysts consume this semantic layer through explore interfaces. They drag dimensions and measures into a workspace, apply filters, and visualize results—all without writing SQL. Behind the scenes, LookML generates optimized SQL that runs on the warehouse, returns results to the BI platform, and renders visualizations.

Version control and deployment happen through git workflows. LookML files live in repositories where changes go through pull request reviews before merging to production. Development branches allow testing new metrics or explores without affecting production dashboards. CI/CD pipelines deploy changes across environments, maintaining separation between development, staging, and production instances.

For multi-platform organizations, the pattern becomes more complex. A company using Snowflake for structured data and Databricks for ML workloads might maintain LookML pointing at Snowflake tables while data scientists in Databricks notebooks redefine the same metrics in Python. This creates parallel semantic layers that drift over time, with analysts and data scientists using different definitions for identical business concepts.

Data science teams work around the separation by maintaining their own metric definitions. They can't programmatically access LookML logic, so they reverse-engineer metric definitions by reading LookML files or asking analytics engineers to translate business rules into Python. This leads to duplicated code where the same "monthly recurring revenue" calculation exists in LookML for dashboards and in PySpark for ML feature engineering—maintained separately, often diverging in subtle but meaningful ways.

Problems with LookML-Based Semantic Layers

While LookML established semantic layer best practices and remains powerful within its domain, architectural limitations surface as organizations scale analytics beyond traditional BI use cases.

Platform Lock-in Creates Migration Friction

LookML syntax is proprietary and non-portable. The years of accumulated business logic encoded in view files, derived tables, and explore definitions exist in a format that only works within one ecosystem. Organizations can't export LookML to open standards or translate it automatically to other semantic layer formats.

The lock-in extends beyond syntax. LookML's abstractions—liquid templating, conditional logic in measures, extends syntax for reusable components—don't map cleanly to other systems. Even seemingly simple translations encounter edge cases: a LookML measure using "type: count_distinct" with a custom SQL expression requires careful restructuring when moving to systems that handle distinct counts differently.

This creates substantial switching costs. Companies evaluating other BI tools or data platforms must account for the fact that switching means rebuilding the entire semantic layer from scratch. Business logic that took years to accumulate has to be manually reconstructed, with a high risk of losing institutional knowledge embedded in LookML comments, naming conventions, and organizational patterns.

Disconnected from Lakehouse Workloads

The BI-platform architecture fundamentally separates semantic definitions from the data platform. Data scientists working in Databricks notebooks can't reference LookML metrics. They see tables and views in Unity Catalog, but the business logic defining "active user," "churn risk," or "customer lifetime value" lives elsewhere—inside the BI platform's proprietary format.

This creates metric inconsistencies across the organization. The executive dashboard shows Q3 revenue using one definition (LookML measure summing order totals), while the sales forecasting model uses another (Python code summing different columns with different filters). Both claim to represent "revenue," but subtle differences in logic lead to discrepancies that erode trust in data.

ML pipelines face particular challenges. Feature engineering often requires the exact aggregations and calculations already defined in LookML—average order value, purchase frequency, time since last order. But there's no programmatic way to invoke LookML logic from PySpark jobs. Data scientists must either reimplement metrics in Python (risking divergence) or extract pre-aggregated features through the BI platform's API (introducing performance overhead and architectural complexity).

The result: fragmented semantic layers. Analytics engineers maintain business logic in LookML for BI. Data scientists maintain parallel logic in Python for ML. Engineers maintain yet another version in SQL for operational pipelines. All three claim to implement the same metrics, but without shared definitions, they drift apart.

Query Latency and Architecture Overhead

Every LookML query follows a multi-hop path: the user request hits the BI platform, the LookML generator produces SQL, the SQL is sent to the warehouse, the warehouse executes it and returns results, the results travel back through the BI platform, and the platform formats and displays them. For interactive dashboards this works acceptably, but for high-volume analytics or embedded applications, the round-trip overhead compounds.

The architecture can't leverage platform-specific optimizations. LookML generates generic SQL that works across multiple warehouse types, but this means missing Databricks-specific features like adaptive query execution, Delta Lake's data skipping, or Photon's vectorized processing. Queries that could run faster with platform-native optimizations instead execute as generic SQL, leaving performance on the table.

Caching becomes fragmented across layers. The BI platform caches some results, the warehouse caches others, but there's no unified caching strategy. A data scientist running a similar query in a notebook can't benefit from results cached by the BI platform, and vice versa. Each system maintains its own cache, duplicating work and storage.

Governance Complexity Across System Boundaries

Access control spans two systems: the BI platform and the warehouse. Users need permissions in both places, creating administrative overhead and potential security gaps. A user might have access to explore a model in the BI platform but lack warehouse permissions, or vice versa—leading to confusing "access denied" errors that require coordination between teams to resolve.

Row-level security becomes duplicated. If different users should see different data subsets, this must be implemented twice: once in warehouse row-level security policies and again in LookML access grants. Maintaining consistency between these parallel implementations requires careful coordination, with risk of mismatched filters exposing sensitive data or incorrectly restricting access.

Lineage tracking stops at the warehouse boundary. The BI platform can show which explores use which LookML views, but it can't trace data back through transformation pipelines to source systems. Organizations running transformation frameworks see two disconnected lineage graphs: one showing table dependencies in the warehouse, another showing explore relationships in LookML. Understanding end-to-end data flow—from source extraction through transformations to final dashboard—requires manually stitching together metadata from multiple systems.

Audit requirements become fragmented. Query activity appears in both BI platform logs (who queried which explore) and warehouse audit tables (what SQL executed). Compliance reporting requires combining data from both systems, with different logging formats, retention periods, and access patterns complicating analysis.

AI Integration Gaps

Modern AI features like natural language querying struggle with architecturally separated semantic layers. An AI assistant needs access to both semantic metadata (to understand that "revenue" means SUM of order_amount field) and the data platform (to execute queries efficiently). When these live in different systems, integration requires awkward bridges.

Databricks AI features—Assistant for query generation, Genie for conversational analytics—are optimized for Unity Catalog objects. They understand tables, views, and metric views registered in the catalog, reading metadata to generate accurate queries. But they can't natively consume LookML definitions living in a separate BI platform. Organizations wanting AI-powered analytics must either maintain duplicate semantic metadata or build custom integrations.

The problem compounds with retrieval-augmented generation patterns. LLMs generating SQL need access to accurate schema information, relationship definitions, and business logic. When this metadata lives in LookML rather than the data catalog, AI systems can't reliably reference it, leading to hallucinated table names, incorrect joins, or misunderstood business rules.

Making It Better: Migration to Unity Catalog Metric Views

Moving from LookML to Unity Catalog Metric Views eliminates architectural friction by consolidating the semantic layer into the data platform itself. Rather than maintaining business logic in a separate BI system, metrics become first-class catalog objects governed alongside tables and views.

Understand the Architectural Shift

The migration represents more than syntax translation—it's an architectural realignment. LookML lives in the BI layer, generating SQL that executes elsewhere. Metric views live in the data catalog, executing natively within Spark SQL with full access to platform optimizations.

This changes where semantic logic resides. Instead of LookML files in a git repository external to the data platform, metric definitions are registered to Unity Catalog as queryable objects. The governance model shifts from BI platform access controls to catalog-level RBAC, with row-level security and column masking flowing directly from table policies.

The query interface changes fundamentally. LookML users interact through graphical explore interfaces that hide SQL complexity. Metric view users write SQL with a special MEASURE clause that explicitly invokes metric computations. This trade-off exchanges point-and-click simplicity for programmatic flexibility—SQL queries work in notebooks, scheduled jobs, ML pipelines, and embedded applications without requiring BI platform connectivity.

Translate Core Semantic Concepts

LookML views map to metric view sources. A view file referencing a database table becomes a metric view with that table as its source. Dimensions defined in the view translate to dimension definitions in YAML format. The syntax changes but the concept remains: dimensions represent attributes for slicing and grouping data.

Measures require careful translation. LookML's measure types—sum, count, average, count_distinct—translate to SQL aggregate expressions. A LookML measure with "type: sum" on a field becomes a metric view measure with "expr: SUM(field_name)". More complex measures using custom SQL require preserving the exact logic while adapting syntax.
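
As a rough illustration with hypothetical names, a LookML sum measure reduces to a plain SQL aggregate expression that can be sanity-checked against the source table before it becomes a metric view measure:

```sql
-- LookML (source, illustrative):
--   measure: total_revenue {
--     type: sum
--     sql: ${order_amount} ;;
--   }
-- The metric view measure carries the equivalent aggregate as its expr.
-- Sanity-check the expression standalone against the source table:
SELECT SUM(order_amount) AS total_revenue
FROM catalog.schema.orders;
```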

Explores with multiple joined views become metric views with join definitions. The join specifications need translation from LookML's relationship syntax to Unity Catalog's join format. Critical details include preserving join conditions, join types (left, inner, full), and the relationship cardinality that determines how aggregations behave.

Derived tables present special challenges. LookML allows defining views based on custom SQL rather than direct table references. These either need migration as Unity Catalog views first (then referenced by metric views) or direct embedding of the SQL in metric view source definitions. The choice depends on whether other processes need to query the derived logic independently.
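
A hedged sketch of the first option, with hypothetical names: re-create the derived table's SQL as a standard Unity Catalog view, which a metric view can then use as its source and other workloads can query independently:

```sql
-- A LookML derived table re-created as a regular Unity Catalog view
-- (illustrative schema: orders enriched with customer region).
CREATE OR REPLACE VIEW catalog.schema.order_facts AS
SELECT
  o.order_id,
  o.customer_id,
  o.order_date,
  o.order_amount,
  c.region AS customer_region
FROM catalog.schema.orders AS o
LEFT JOIN catalog.schema.customers AS c
  ON o.customer_id = c.customer_id;
```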

Establish Data Foundation

Unity Catalog must contain the tables currently referenced by LookML. For data already in Databricks, this means registering tables to the catalog if they aren't already. For data in other warehouses, Lakehouse Federation can query the external data in place, or the data can be replicated into Databricks storage.

Foreign key relationships defined implicitly through LookML joins should be made explicit in Unity Catalog. Define foreign key constraints on fact-to-dimension relationships, providing metadata the query engine uses to understand join paths. This enables automatic join construction and helps with query optimization.
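
A minimal sketch with hypothetical tables; note that Unity Catalog primary and foreign key constraints are informational rather than enforced, and the referenced primary key must exist first:

```sql
-- Declare the primary key on the dimension table (the column must be NOT NULL),
-- then the foreign key on the fact table that references it.
ALTER TABLE catalog.schema.customers
  ADD CONSTRAINT pk_customers PRIMARY KEY (customer_id);

ALTER TABLE catalog.schema.orders
  ADD CONSTRAINT fk_orders_customers
  FOREIGN KEY (customer_id) REFERENCES catalog.schema.customers (customer_id);
```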

Governance policies need to be in place before metric views are created. Row-level security, column masking, and access grants should be configured on the base tables; these policies then flow through to metric view queries automatically, ensuring consistent access control regardless of how the data is accessed.

Create and Validate Metric Views

Metric view creation happens through Unity Catalog's interface or SQL DDL statements. Each metric view definition includes source specification (which table or SQL query provides data), dimension definitions (attributes for grouping), measure definitions (aggregations to compute), and optional join specifications (for multi-table metrics).

The YAML structure mirrors LookML concepts but uses different syntax. Dimension expressions use SQL rather than LookML's field references. Measure expressions use standard aggregate functions. Join conditions use SQL comparison operators. The translation is mechanical but requires careful attention to edge cases around null handling, date functions, and string operations.
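
A minimal sketch of the DDL shape, assuming the YAML fields described above and reusing the hypothetical order_facts view from the earlier example as a single-table source (join specifications are omitted); exact keywords and field names should be checked against the current metric view documentation:

```sql
CREATE VIEW catalog.schema.sales_metrics
WITH METRICS
LANGUAGE YAML
AS $$
version: 0.1
source: catalog.schema.order_facts   # the hypothetical view created earlier
dimensions:
  - name: customer_region
    expr: customer_region
  - name: order_date
    expr: order_date
measures:
  - name: total_revenue
    expr: SUM(order_amount)
  - name: order_count
    expr: COUNT(order_id)
$$;
```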

Validation ensures accuracy. For critical metrics, run parallel queries: one through LookML (generating SQL to the warehouse) and another directly querying the new metric view. Compare results to verify translations preserve business logic. Common translation errors include incorrect join conditions causing row multiplication, distinct counts not wrapped properly, or date truncation functions producing different results.
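
A hedged validation sketch using the same illustrative names: compute the legacy aggregation (equivalent to the SQL LookML would generate) and the metric view result at the same grain, and return only the groups where they disagree:

```sql
WITH legacy AS (   -- hand-written equivalent of the SQL LookML would generate
  SELECT
    c.region AS customer_region,
    SUM(o.order_amount) AS total_revenue
  FROM catalog.schema.orders AS o
  LEFT JOIN catalog.schema.customers AS c
    ON o.customer_id = c.customer_id
  GROUP BY c.region
),
metric AS (        -- the same metric through the new metric view
  SELECT
    customer_region,
    MEASURE(total_revenue) AS total_revenue
  FROM catalog.schema.sales_metrics
  GROUP BY customer_region
)
SELECT
  COALESCE(l.customer_region, m.customer_region) AS customer_region,
  l.total_revenue AS legacy_revenue,
  m.total_revenue AS metric_revenue
FROM legacy AS l
FULL OUTER JOIN metric AS m
  ON l.customer_region = m.customer_region
WHERE l.total_revenue IS DISTINCT FROM m.total_revenue;
```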

Update Query Patterns

Dashboards and reports need modification to query metric views instead of LookML explores. The query syntax shifts from the BI platform's interface to SQL with MEASURE clauses. This requires rewriting dashboard SQL or, for tools with native Unity Catalog support, reconnecting to metric views as data sources.

The MEASURE clause explicitly invokes metric computation:

```sql
SELECT
  customer_region,
  MEASURE(total_revenue),
  MEASURE(order_count)
FROM catalog.schema.sales_metrics
WHERE order_date >= '2025-01-01'
GROUP BY customer_region;
```

This syntax ensures the query engine applies proper aggregation semantics. Unlike standard views where aggregations can be incorrectly re-aggregated, MEASURE clauses signal that the metric definition should be applied at the current grain.

Python and notebook users gain direct metric access without routing through BI platforms. The same metrics available in dashboards become queryable programmatically, enabling consistent definitions across analytical and ML workloads.
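
For example, assuming the illustrative sales_metrics view above exposes order_date as a dimension, the same governed measure can be re-aggregated at a monthly grain from a notebook or scheduled job (for instance via spark.sql() in Python):

```sql
-- Monthly revenue trend computed directly against the metric view;
-- the measure definition is applied at whatever grain the query groups by.
SELECT
  DATE_TRUNC('MONTH', order_date) AS order_month,
  MEASURE(total_revenue) AS monthly_revenue
FROM catalog.schema.sales_metrics
GROUP BY DATE_TRUNC('MONTH', order_date)
ORDER BY order_month;
```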

Reconfigure Governance

Access control migrates from BI platform permissions to Unity Catalog GRANT statements. User groups receiving explore access in LookML should receive SELECT privileges on corresponding metric views. The advantage: permissions apply universally whether querying through SQL, Python, notebooks, or BI tools.
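
A minimal sketch with hypothetical catalog, schema, and group names, assuming metric views are granted like other catalog views:

```sql
-- Parent-object privileges so the group can resolve the metric view,
-- then query access on the metric view itself (names hypothetical).
GRANT USE CATALOG ON CATALOG catalog TO `sales_analysts`;
GRANT USE SCHEMA ON SCHEMA catalog.schema TO `sales_analysts`;
GRANT SELECT ON TABLE catalog.schema.sales_metrics TO `sales_analysts`;
```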

Row-level security translates from LookML access grants to Unity Catalog row filters. If LookML enforced user-specific filters through model access grants, implement equivalent logic using Unity Catalog's row filter syntax. The filters then apply automatically to all queries regardless of query interface.
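
A hedged sketch with hypothetical names, assuming Databricks SQL row filter functions on Unity Catalog tables: a boolean function encodes the access rule once, and attaching it to the base table applies it to every downstream query, including queries against metric views built on that table:

```sql
-- Admins see every row; EMEA analysts see only EMEA customers (names hypothetical).
CREATE OR REPLACE FUNCTION catalog.schema.region_filter(region STRING)
RETURN is_account_group_member('global_admins')
    OR (is_account_group_member('emea_analysts') AND region = 'EMEA');

-- Attach the filter to the base table; metric view queries inherit it.
ALTER TABLE catalog.schema.customers
  SET ROW FILTER catalog.schema.region_filter ON (region);
```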

Audit reporting shifts to Unity Catalog's logging. Query activity previously tracked in BI platform logs now appears in catalog audit tables. Compliance reporting needs updates to reference new audit sources, but gains completeness—every query against metric views logs consistently, whether from dashboards, notebooks, or applications.

Enable User Adoption

Training becomes essential as teams shift from graphical interfaces to SQL-based querying. The transition from drag-and-drop explores to writing SQL with MEASURE clauses requires skill development, particularly for business analysts accustomed to point-and-click workflows.

Documentation must migrate from LookML comments to Unity Catalog metadata. Descriptions, business context, calculation logic, and usage guidelines should populate metric view properties. This metadata feeds into AI assistants and catalog search, helping users understand available metrics.

AI integration provides a bridge for less-technical users. Databricks Assistant and Genie consume Unity Catalog metadata to enable natural language querying. Metric views with clear names, descriptions, and synonyms become accessible through conversational interfaces, reducing SQL requirements for basic analytical questions.

Future Direction and Industry Trends

The semantic layer is consolidating into data platforms rather than existing as separate BI-tool add-ons. This shift reflects broader architectural trends toward unified data platforms that handle storage, processing, analytics, and AI workloads under one governance model.

AI-native semantics will become table stakes. As organizations deploy more LLM-powered analytics—chatbots answering business questions, automated report generation, AI-assisted data exploration—the semantic layer becomes the foundation for accuracy. LLMs require structured metadata to generate correct queries, and metric views provide that foundation directly in the data catalog where AI systems naturally integrate.

Real-time semantic layers will extend current capabilities. Today's metric views operate on batch data, but streaming analytics require the same semantic consistency. Future extensions will likely support metric definitions over streaming tables, enabling governed KPIs on live data feeds without separate semantic logic for batch versus streaming workloads.

Metrics as features will blur analytical boundaries. The same "customer lifetime value" metric used in executive dashboards should be directly accessible as an ML feature in predictive models. Tighter integration between semantic layers and ML platforms will make this seamless, ensuring consistent metric definitions whether supporting human analysis or powering automated decision-making.

Cross-platform standards may reduce migration friction. Initiatives like Open Semantic Interchange aim to create vendor-neutral semantic layer formats. If successful, future migrations might involve exporting to standard YAML and importing to any compliant platform, reducing the lock-in that currently makes semantic layer migrations challenging.

Governance will become more granular. Current metric view permissions operate at the view level—users either have access to all metrics in a view or none. Future capabilities may support metric-level access control, allowing organizations to grant access to revenue metrics but not cost metrics within the same view, or apply different row filters based on which measure is being queried.

How Typedef Can Help

Semantic layer migrations involve complex data transformations, validation workflows, and metadata processing. Typedef provides infrastructure for building reliable data pipelines that power migration workflows. For organizations modernizing their data stack, having robust data processing capabilities accelerates semantic layer transitions and ongoing data operations on the lakehouse.
