
How to Migrate from Beagle to Databricks Unity Catalog Metric Views

Typedef Team


Migrating from Beagle's conversational BI semantic layer to Databricks Unity Catalog Metric Views centralizes metric definitions directly in your data platform. Unity Catalog Metric Views provide governed, flexible metric definitions with native Databricks integration, eliminating the need for external semantic layer tools while gaining the benefits of lakehouse-native performance, governance, and AI readiness.


Why migrate to Unity Catalog Metric Views

The case for platform-native semantic layers

Beagle operates as a middleware semantic layer, sitting between your data warehouse and consumption tools like Microsoft Teams. While this approach works for conversational analytics, it introduces architectural complexity:

External dependency overhead - Beagle requires separate infrastructure, API management, and integration maintenance between your data platform and end-user tools.

Governance fragmentation - Business logic lives outside your data catalog, creating separate systems for metric definitions and data lineage.

Limited flexibility - Metrics are optimized for conversational queries in Teams rather than flexible analytical workflows across notebooks, SQL editors, and BI tools.

Unity Catalog Metric Views take a different approach by bringing the semantic layer directly into the data platform. This lakehouse-native architecture provides several advantages over middleware solutions.

Unity Catalog Metric Views advantages

Single governance layer - Metrics live in Unity Catalog alongside tables and views, providing unified lineage, access control, and data discovery.

Flexible consumption - Define metrics once and query them from SQL warehouses, notebooks, dashboards, Genie spaces, and external BI tools without middleware translation.

Runtime dimension flexibility - Unlike static views, Metric Views separate measure definitions from dimension groupings, allowing ad-hoc slicing without recreating metric definitions (see the example after this list).

Native performance - Queries execute directly on Spark SQL with Photon acceleration, avoiding the latency of external API calls and data transfer.

AI and ML integration - Metrics become available as features in ML pipelines and through Databricks Assistant for natural language queries.
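
The runtime flexibility point is easiest to see in a query. A minimal sketch, assuming a metric view like the sales_metrics example built later in this guide:

sql
-- Same definition, two ad-hoc groupings - no view changes required
SELECT customer_region, MEASURE(total_revenue) as revenue
FROM main.sales.sales_metrics
GROUP BY customer_region;

SELECT customer_segment, order_date, MEASURE(total_revenue) as revenue
FROM main.sales.sales_metrics
GROUP BY customer_segment, order_date;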

Understanding the architectural shift

Beagle's semantic layer architecture

Beagle implements semantic understanding through:

Metadata layer - Stores organization-specific nomenclature and business terminology trained on user queries.

NLP processing - Translates natural language questions from Teams into structured queries against connected data sources.

Row-level security - Applies user-based filters and logical constraints before returning results.

Proactive nudges - Sends personalized notifications based on data thresholds and usage patterns.

The architecture centers on conversational access through Microsoft Teams, with the semantic layer acting as an intelligent query translator.

Unity Catalog Metric Views architecture

Metric Views shift the semantic layer into the catalog itself:

YAML-based definitions - Metrics are defined declaratively in YAML format specifying measures, dimensions, joins, and semantic metadata.

Unity Catalog registration - Metric views become first-class catalog objects with permissions, lineage, and audit logging.

Query rewriting engine - When queried, Spark SQL's optimizer rewrites MEASURE() clauses into optimized aggregations and joins (sketched below).

Multi-level joins - Support both star and snowflake schemas with automatic join path resolution.

Semantic metadata - Display names, formats, and synonyms provide context for visualization and AI tools.

This architecture enables flexible querying across any dimension at runtime while maintaining centralized governance.
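
To make the rewriting step concrete, here is a sketch of the transformation; the rewritten form is illustrative only, and the actual plan the optimizer produces may differ:

sql
-- What the analyst writes:
SELECT customer_region, MEASURE(total_revenue) as revenue
FROM main.sales.sales_metrics
GROUP BY customer_region;

-- Roughly what the engine executes after rewriting:
-- SELECT c.region, SUM(o.order_amount)
-- FROM main.sales.fct_orders o
-- JOIN main.sales.dim_customers c ON o.customer_id = c.customer_id
-- GROUP BY c.region;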

Pre-migration assessment

Audit your Beagle implementation

Before migrating, document your current Beagle setup:

Identify connected data sources - List all databases, tables, and views that Beagle queries; Unity Catalog requires these sources to be accessible from Databricks (see the inventory query after this list).

Document metric definitions - Extract business logic for key metrics including aggregation formulas, filters, and business rules. These become measure definitions in Metric Views.

Map user personas - Document which teams and roles use which metrics. This informs Unity Catalog permission structuring.

Catalog use cases - Identify primary workflows: sales performance tracking, operational monitoring, executive dashboards, etc. This guides metric view organization.

Note integration points - Document how Beagle connects to Power BI, Teams, and other tools. Plan equivalent integrations with Databricks SQL warehouses.
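
If your sources already live in Unity Catalog, the system information schema can jump-start the inventory; a sketch, with placeholder schema names:

sql
-- List candidate source tables and views (schema names are examples)
SELECT table_catalog, table_schema, table_name, table_type
FROM system.information_schema.tables
WHERE table_schema IN ('sales', 'finance', 'marketing')
ORDER BY table_catalog, table_schema, table_name;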

Evaluate Unity Catalog readiness

Confirm your Databricks environment meets migration requirements:

Unity Catalog migration - If still using Hive Metastore, plan migration to Unity Catalog first. Metric Views require Unity Catalog.

Runtime version - Databricks Runtime 17.1+ required for snowflake schema joins. Check cluster configurations.

Permission model - Review Unity Catalog privilege requirements. Users need CREATE TABLE on schemas and SELECT on source tables.

Compute resources - Ensure SQL warehouses are sized appropriately for metric query workloads.

Define migration scope

Prioritize metrics for migration:

Start with high-value metrics - Begin with the most frequently queried metrics like revenue, user counts, and conversion rates.

Identify metric dependencies - Map which metrics build on others. Migrate foundational metrics before derived metrics.

Group by domain - Organize metrics by business domain (Finance, Sales, Marketing) matching your organizational structure.

Plan phased rollout - Migrate one domain at a time rather than big-bang replacement.

Migrating metric definitions

Transform business logic to measure definitions

Convert Beagle's metric logic into Unity Catalog measure syntax:

Simple aggregations

Beagle conceptual logic:

Metric: Total Revenue
Aggregation: SUM(order_amount)
Filters: WHERE status = 'completed'

Unity Catalog measure:

yaml
measures:
  - name: total_revenue
    expr: SUM(order_amount)
    comment: "Sum of completed order amounts"

Ratio metrics

Beagle logic:

Metric: Average Order Value
Formula: Total Revenue / Order Count

Unity Catalog measure:

yaml
measures:
  - name: avg_order_value
    expr: SUM(order_amount) / COUNT(DISTINCT order_id)
    comment: "Average revenue per order"

The measure correctly computes numerator and denominator separately before division, avoiding incorrect ratio aggregations.
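
A quick check of the difference, assuming the sales_metrics view defined later in this guide:

sql
-- Ratio of sums: every order carries its correct weight
SELECT customer_region, MEASURE(avg_order_value) as aov
FROM main.sales.sales_metrics
GROUP BY customer_region;
-- Averaging pre-computed per-row ratios instead (e.g. AVG(amount / count))
-- would weight small and large groups equally and skew the result.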

Distinct count metrics

Beagle logic:

Metric: Unique Customers
Aggregation: COUNT(DISTINCT customer_id)

Unity Catalog measure:

yaml
measures:
  - name: unique_customers
    expr: COUNT(DISTINCT customer_id)
    comment: "Number of distinct customers"

Complex derived metrics

Beagle logic:

Metric: Customer Lifetime Value
Formula: (Total Revenue - Costs) / Unique Customers

Unity Catalog approach - define base measures first:

yaml
measures:
  - name: total_revenue
    expr: SUM(order_amount)

  - name: total_costs
    expr: SUM(cost_amount)

  - name: unique_customers
    expr: COUNT(DISTINCT customer_id)

  - name: customer_ltv
    expr: (SUM(order_amount) - SUM(cost_amount)) / COUNT(DISTINCT customer_id)
    comment: "Net revenue per customer"

Map dimensions and hierarchies

Transform Beagle's dimension handling to Unity Catalog:

Time dimensions

Beagle typical usage:

"Show me revenue by month for last quarter"

Unity Catalog time dimensions:

yaml
dimensions:
  - name: order_date
    expr: DATE(order_timestamp)
    comment: "Order date"

  - name: order_month
    expr: DATE_TRUNC('month', order_timestamp)
    comment: "Order month"

  - name: order_quarter
    expr: DATE_TRUNC('quarter', order_timestamp)
    comment: "Order quarter"

  - name: order_year
    expr: YEAR(order_timestamp)
    comment: "Order year"

Define multiple time granularities as separate dimensions to enable flexible querying.
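
With those dimensions in place, the same measure rolls up at any granularity; a sketch, assuming a metric view that exposes these dimensions alongside total_revenue:

sql
-- Monthly roll-up
SELECT order_month, MEASURE(total_revenue) as revenue
FROM main.sales.sales_metrics
GROUP BY order_month;

-- Quarterly roll-up from the same definition
SELECT order_quarter, MEASURE(total_revenue) as revenue
FROM main.sales.sales_metrics
GROUP BY order_quarter;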

Geographic hierarchies

Beagle hierarchy:

Country > Region > State > City

Unity Catalog dimensions:

yaml
dimensions:
  - name: customer_city
    expr: customers.city

  - name: customer_state
    expr: customers.state

  - name: customer_region
    expr: customers.region

  - name: customer_country
    expr: customers.country

Join logic handles the dimensional relationships. Users can aggregate at any level.

Product hierarchies

Unity Catalog dimensional expressions:

yaml
dimensions:
  - name: product_name
    expr: products.name

  - name: product_category
    expr: products.category

  - name: product_department
    expr: products.department

Define join relationships

Unity Catalog Metric Views require explicit join specifications:

Star schema example

yaml
source: main.sales.fct_orders

joins:
  - name: customers
    source: main.sales.dim_customers
    on: fct_orders.customer_id = customers.customer_id

  - name: products
    source: main.sales.dim_products
    on: fct_orders.product_id = products.product_id

  - name: dates
    source: main.sales.dim_dates
    on: DATE(fct_orders.order_timestamp) = dates.date_key

Snowflake schema example

Multi-hop joins for normalized dimensions:

yaml
source: main.sales.fct_orders

joins:
  - name: stores
    source: main.sales.dim_stores
    on: fct_orders.store_id = stores.store_id

  - name: regions
    source: main.sales.dim_regions
    on: stores.region_id = regions.region_id

  - name: countries
    source: main.sales.dim_countries
    on: regions.country_id = countries.country_id

Metric Views automatically resolve multi-hop join paths when querying dimensions from nested tables.
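
For example, grouping by a country-level dimension transparently traverses orders through stores and regions; a sketch, assuming a hypothetical main.sales.store_metrics view over this schema that defines a store_country dimension as countries.country_name:

sql
-- Multi-hop resolution: the engine joins fct_orders -> stores -> regions -> countries
SELECT store_country, MEASURE(total_revenue) as revenue
FROM main.sales.store_metrics
GROUP BY store_country;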

Creating your first metric view

Basic metric view structure

Complete YAML definition for a sales metric view:

yaml
version: 1.1

source: main.sales.fct_orders

joins:
  - name: customers
    source: main.sales.dim_customers
    on: fct_orders.customer_id = customers.customer_id

dimensions:
  - name: order_date
    expr: DATE(order_timestamp)
    comment: "Date of order"

  - name: order_month
    expr: DATE_TRUNC('month', order_timestamp)
    comment: "Order month"

  - name: customer_region
    expr: customers.region
    comment: "Customer geographic region"

  - name: customer_segment
    expr: customers.segment
    comment: "Customer business segment"

measures:
  - name: total_revenue
    expr: SUM(order_amount)
    comment: "Total order revenue"

  - name: order_count
    expr: COUNT(order_id)
    comment: "Number of orders"

  - name: unique_customers
    expr: COUNT(DISTINCT customer_id)
    comment: "Number of distinct customers"

  - name: avg_order_value
    expr: SUM(order_amount) / COUNT(DISTINCT order_id)
    comment: "Average revenue per order"

Register metric view in Unity Catalog

Create the metric view using SQL (the YAML below is abridged; use the full definition above):

sql
CREATE VIEW main.sales.sales_metrics
COMMENT 'Core sales performance metrics'
WITH METRICS
LANGUAGE YAML
AS $$
version: 1.1

source: main.sales.fct_orders

joins:
  - name: customers
    source: main.sales.dim_customers
    on: fct_orders.customer_id = customers.customer_id

dimensions:
  - name: order_date
    expr: DATE(order_timestamp)

  - name: customer_region
    expr: customers.region

measures:
  - name: total_revenue
    expr: SUM(order_amount)

  - name: unique_customers
    expr: COUNT(DISTINCT customer_id)
$$;

Alternatively, create via Catalog Explorer UI:

  1. Navigate to Catalog Explorer
  2. Select target schema
  3. Click "Create" > "Metric View"
  4. Use YAML editor or visual builder
  5. Save and register

Query metric views

Query using MEASURE() clause for aggregations:

sql
SELECT
  customer_region,
  MEASURE(total_revenue) as revenue,
  MEASURE(unique_customers) as customers
FROM main.sales.sales_metrics
WHERE order_date >= CURRENT_DATE - INTERVAL 30 DAYS
GROUP BY customer_region
ORDER BY revenue DESC;

Key query requirements (illustrated after this list):

  • Every measure must be wrapped in the MEASURE() function
  • Dimensions must appear in the GROUP BY clause or in filters
  • SELECT * is not supported - name dimensions and measures explicitly
  • JOINs are not allowed at query time - define them in the YAML
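
A minimal illustration of these rules, assuming the sales_metrics view from above:

sql
-- Valid: measure wrapped in MEASURE(), dimension in GROUP BY
SELECT customer_region, MEASURE(total_revenue) as revenue
FROM main.sales.sales_metrics
GROUP BY customer_region;

-- Invalid: bare aggregation over a measure column
-- SELECT SUM(total_revenue) FROM main.sales.sales_metrics;

-- Invalid: SELECT * and query-time joins are rejected
-- SELECT * FROM main.sales.sales_metrics m JOIN main.sales.dim_dates d ON ...;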

Migrating complex metric patterns

Time-based metrics and windows

Beagle often handles period-over-period comparisons through natural language. In Unity Catalog, implement using SQL:

Period-over-period comparison

sql
WITH current_period AS (
  SELECT
    customer_region,
    MEASURE(total_revenue) as revenue
  FROM main.sales.sales_metrics
  WHERE order_month = DATE_TRUNC('month', CURRENT_DATE)
  GROUP BY customer_region
),
prior_period AS (
  SELECT
    customer_region,
    MEASURE(total_revenue) as revenue
  FROM main.sales.sales_metrics
  WHERE order_month = DATE_TRUNC('month', ADD_MONTHS(CURRENT_DATE, -1))
  GROUP BY customer_region
)
SELECT
  c.customer_region,
  c.revenue as current_revenue,
  p.revenue as prior_revenue,
  (c.revenue - p.revenue) / p.revenue * 100 as pct_change
FROM current_period c
JOIN prior_period p ON c.customer_region = p.customer_region;

Rolling window aggregations

sql
WITH daily AS (
  SELECT
    order_date,
    customer_region,
    MEASURE(total_revenue) as daily_revenue
  FROM main.sales.sales_metrics
  GROUP BY order_date, customer_region
)
SELECT
  order_date,
  customer_region,
  daily_revenue,
  SUM(daily_revenue) OVER (
    PARTITION BY customer_region
    ORDER BY order_date
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ) as rolling_7day_revenue
FROM daily
ORDER BY customer_region, order_date;

Cohort and retention metrics

Beagle might provide cohort analysis through custom reports. Implement in Unity Catalog using measures and SQL logic:

Customer cohort definition

yaml
dimensions:
  - name: cohort_month
    expr: DATE_TRUNC('month', customers.first_order_date)
    comment: "Month of customer's first order"

measures:
  - name: cohort_customers
    expr: COUNT(DISTINCT customer_id)
    comment: "Customers in cohort"

  - name: cohort_revenue
    expr: SUM(order_amount)
    comment: "Revenue from cohort"

Query cohort retention:

sql
WITH cohorts AS (
  SELECT
    cohort_month,
    MEASURE(cohort_customers) as initial_customers
  FROM main.sales.customer_metrics
  WHERE cohort_month >= '2024-01-01'
  GROUP BY cohort_month
),
active_by_month AS (
  SELECT
    cohort_month,
    order_month,
    MEASURE(cohort_customers) as active_customers
  FROM main.sales.customer_metrics
  WHERE order_month >= cohort_month
  GROUP BY cohort_month, order_month
)
SELECT
  c.cohort_month,
  a.order_month,
  c.initial_customers,
  a.active_customers,
  a.active_customers / c.initial_customers as retention_rate
FROM cohorts c
JOIN active_by_month a ON c.cohort_month = a.cohort_month
ORDER BY c.cohort_month, a.order_month;

Conditional metrics

Beagle might implement conditional logic implicitly. Make explicit in Metric Views:

yaml
measures:
  - name: high_value_orders
    expr: COUNT(CASE WHEN order_amount > 1000 THEN order_id END)
    comment: "Orders exceeding $1000"

  - name: conversion_rate
    expr: COUNT(DISTINCT CASE WHEN purchased = 1 THEN customer_id END) / COUNT(DISTINCT customer_id)
    comment: "Fraction of customers who purchased"

  - name: avg_purchase_frequency
    expr: COUNT(order_id) / COUNT(DISTINCT customer_id)
    comment: "Average orders per customer"

Implementing governance and security

Unity Catalog permission model

Set permissions on metric views:

sql
-- Grant read access to the analyst role (metric views are secured like other views)
GRANT SELECT ON TABLE main.sales.sales_metrics TO analysts;

-- Let data engineers manage the definition by transferring ownership
ALTER VIEW main.sales.sales_metrics OWNER TO data_engineers;

-- Revoke access
REVOKE SELECT ON TABLE main.sales.sales_metrics FROM contractors;

Users query metric views without accessing underlying tables directly, providing governed data access.

Row-level security integration

Unity Catalog row-level security policies apply to metric views:

sql
-- Create row access policy
-- (current_user_region() is a placeholder for a helper UDF mapping users to regions)
CREATE FUNCTION main.sales.customer_region_filter(region STRING)
RETURN IF(
  IS_ACCOUNT_GROUP_MEMBER('regional_managers'),
  region = current_user_region(),
  true
);

-- Apply to source table
ALTER TABLE main.sales.fct_orders
SET ROW FILTER main.sales.customer_region_filter ON (customer_region);

Filters automatically apply when querying metric views, ensuring users see only authorized data.

Column masking for sensitive data

Apply masking policies to underlying tables:

sql
-- Create masking function
CREATE FUNCTION main.sales.mask_customer_email(email STRING)
RETURN CASE
  WHEN IS_ACCOUNT_GROUP_MEMBER('pii_authorized')
    THEN email
  ELSE REGEXP_REPLACE(email, '(.{2})[^@]+(@.*)', '$1***$2')
END;

-- Apply to column
ALTER TABLE main.sales.dim_customers
ALTER COLUMN email
SET MASK main.sales.mask_customer_email;

Masked columns flow through to metric view dimensions.
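
For example, if the metric view exposed a customer_email dimension (hypothetical here, defined as customers.email), unauthorized users would see only the masked form:

sql
-- Members of pii_authorized see full addresses; everyone else sees e.g. 'jo***@example.com'
SELECT customer_email, MEASURE(order_count) as orders
FROM main.sales.sales_metrics
GROUP BY customer_email;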

Audit and compliance tracking

Monitor metric view usage:

sql
-- Query audit logs
SELECT
  user_name,
  request_params.metric_view,
  event_time,
  action_name
FROM system.access.audit
WHERE request_params.metric_view = 'main.sales.sales_metrics'
  AND event_date >= CURRENT_DATE - 30
ORDER BY event_time DESC;

Track query patterns and access frequency in Catalog Explorer: open the metric view's Insights tab (Catalog > sales_metrics > Insights) to see the most frequent queries, top users by query count, and query performance metrics.

Data processing with Fenic

While Unity Catalog provides the semantic layer, Fenic can handle the data preparation and transformation workflows that feed metric views.

Processing unstructured data for metrics

Fenic's semantic operators extract structured data from unstructured sources:

python
import fenic as fc
from pydantic import BaseModel
from typing import List, Literal

class CustomerFeedback(BaseModel):
    sentiment: Literal["positive", "neutral", "negative"]
    product_category: str
    issue_type: str
    satisfaction_score: int

session = fc.Session.get_or_create(
    fc.SessionConfig(
        app_name="feedback_processing",
        semantic=fc.SemanticConfig(
            language_models={
                "gpt": fc.OpenAILanguageModel(
                    model_name="gpt-4o",
                    rpm=500,
                    tpm=100000
                )
            },
            default_language_model="gpt"
        )
    )
)

# Extract structured feedback from text
feedback = (
    session.read.docs("s3://data/feedback/*.md", content_type="markdown")
    .with_column(
        "structured",
        fc.semantic.extract(fc.col("content"), CustomerFeedback)
    )
    .unnest("structured")
)

# Write to table for metric view consumption
feedback.write.save_as_table("main.analytics.customer_feedback", mode="overwrite")

This structured data becomes a source for sentiment metrics in Unity Catalog.
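
A sketch of a metric view over that table, using the creation syntax shown earlier; the column names follow the CustomerFeedback model above, so adjust them to your actual schema:

sql
CREATE VIEW main.analytics.feedback_metrics
COMMENT 'Customer feedback sentiment metrics'
WITH METRICS
LANGUAGE YAML
AS $$
version: 1.1

source: main.analytics.customer_feedback

dimensions:
  - name: sentiment
    expr: sentiment

  - name: product_category
    expr: product_category

measures:
  - name: feedback_count
    expr: COUNT(*)

  - name: avg_satisfaction
    expr: AVG(satisfaction_score)
$$;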

Enriching dimensional data

Use Fenic to enrich dimension tables with semantic classification:

python
from pydantic import BaseModel
from typing import Literal

class ProductEnrichment(BaseModel):
    primary_use_case: str
    target_audience: Literal["consumer", "business", "enterprise"]
    price_tier: Literal["budget", "mid", "premium"]

# Enrich product descriptions
products = (
    session.read.table("main.sales.dim_products")
    .with_column(
        "enrichment",
        fc.semantic.extract(fc.col("description"), ProductEnrichment)
    )
    .unnest("enrichment")
)

# Update dimension table
products.write.save_as_table("main.sales.dim_products", mode="overwrite")

Enriched dimensions provide additional slicing options in metric views.

Cleaning and standardizing data

Fenic's semantic operators normalize inconsistent data:

python
# Standardize customer segments
examples = fc.ClassifyExampleCollection()
examples.create_example(fc.ClassifyExample(input="SMB", output="Small Business"))
examples.create_example(fc.ClassifyExample(input="MM", output="Mid-Market"))
examples.create_example(fc.ClassifyExample(input="ENT", output="Enterprise"))

customers = (
    session.read.table("main.sales.dim_customers")
    .with_column(
        "standardized_segment",
        fc.semantic.classify(
            fc.col("raw_segment"),
            classes=["Small Business", "Mid-Market", "Enterprise"],
            examples=examples
        )
    )
)

customers.write.save_as_table("main.sales.dim_customers", mode="overwrite")

Standardized segments ensure consistent metric reporting.

Testing and validation

Validate metric calculations

Compare Beagle results to Unity Catalog Metric View results:

sql
-- Test total revenue calculation
WITH beagle_results AS (
  -- Historical results from Beagle exports
  SELECT
    DATE '2024-10-01' as period,  -- 2024-Q4 quarter start, matching DATE_TRUNC below
    'APAC' as region,
    5250000 as beagle_revenue
),
uc_results AS (
  SELECT
    DATE_TRUNC('quarter', order_date) as period,
    customer_region as region,
    MEASURE(total_revenue) as uc_revenue
  FROM main.sales.sales_metrics
  WHERE order_date BETWEEN '2024-10-01' AND '2024-12-31'
    AND customer_region = 'APAC'
  GROUP BY period, region
)
SELECT
  b.period,
  b.region,
  b.beagle_revenue,
  u.uc_revenue,
  ABS(b.beagle_revenue - u.uc_revenue) as difference,
  ABS(b.beagle_revenue - u.uc_revenue) / b.beagle_revenue * 100 as pct_diff
FROM beagle_results b
JOIN uc_results u ON b.period = u.period AND b.region = u.region;

Investigate discrepancies to ensure metric definitions match.

Validate dimension relationships

Test join logic:

sql
-- Verify customer counts match across joins
SELECT
  COUNT(DISTINCT customer_id) as unique_customers,
  COUNT(*) as total_orders
FROM main.sales.fct_orders o
JOIN main.sales.dim_customers c ON o.customer_id = c.customer_id;

-- Compare to metric view
SELECT
  MEASURE(unique_customers) as unique_customers,
  MEASURE(order_count) as total_orders
FROM main.sales.sales_metrics;

Results should match, confirming correct join definitions.

Performance testing

Measure query performance:

sql
-- Adaptive query execution improves plan quality (enabled by default on recent runtimes)
SET spark.sql.adaptive.enabled = true;

-- Run test query with timing
SELECT
  customer_region,
  order_month,
  MEASURE(total_revenue) as revenue,
  MEASURE(order_count) as orders
FROM main.sales.sales_metrics
WHERE order_date >= '2024-01-01'
GROUP BY customer_region, order_month;

-- Review query profile in SQL editor
-- Check execution time and data scanned

Optimize warehouse size based on typical query latency requirements.

Migrating integrations and workflows

Microsoft Teams integration

Unity Catalog Metric Views don't provide built-in Teams integration like Beagle does. Implement equivalent functionality using Databricks SQL:

Create SQL warehouse endpoint

  1. Create serverless SQL warehouse for low-latency queries
  2. Generate personal access token for authentication
  3. Configure appropriate auto-termination settings

Teams integration options

Option 1: Databricks Assistant - Natural language queries in the Databricks workspace, shared into Teams via links

Option 2: Custom bot - Build Teams bot using Databricks SQL API:

python
import requests
import os

DATABRICKS_HOST = os.environ["DATABRICKS_HOST"]
DATABRICKS_TOKEN = os.environ["DATABRICKS_TOKEN"]

def query_metrics(question):
    # Convert natural language to SQL (using LLM)
    sql = generate_sql_from_question(question)

    # Execute via SQL API
    response = requests.post(
        f"{DATABRICKS_HOST}/api/2.0/sql/statements/",
        headers={"Authorization": f"Bearer {DATABRICKS_TOKEN}"},
        json={
            "statement": sql,
            "warehouse_id": "warehouse_id_here"
        }
    )

    return response.json()

Option 3: Scheduled reports - Use Databricks workflows to generate and send reports to Teams channels

Power BI integration

Connect Power BI directly to Unity Catalog Metric Views:

Setup connection

  1. Install Databricks ODBC driver
  2. Configure data source with SQL warehouse endpoint
  3. Authenticate using token or OAuth

Import metric view

// Power Query M code
let
    Source = Odbc.DataSource(
        "dsn=Databricks",
        [
            HierarchicalNavigation=true
        ]
    ),

    Database = Source{[Name="main"]}[Data],
    Schema = Database{[Name="sales"]}[Data],
    MetricView = Schema{[Name="sales_metrics"]}[Data],

    Query = Value.NativeQuery(
        MetricView,
        "SELECT
            customer_region,
            order_month,
            MEASURE(total_revenue) as revenue,
            MEASURE(order_count) as orders
         FROM main.sales.sales_metrics
         WHERE order_date >= '2024-01-01'
         GROUP BY customer_region, order_month"
    )
in
    Query

Create measures in Power BI that reference the imported metrics.

Dashboard migration

Migrate Beagle dashboards to Databricks AI/BI dashboards:

Create dashboard from metric view

  1. Navigate to Databricks workspace
  2. Select "New" > "Dashboard"
  3. Add visualizations using metric view as source:
sql
-- Revenue trend
SELECT
  order_month,
  MEASURE(total_revenue) as revenue
FROM main.sales.sales_metrics
WHERE order_date >= ADD_MONTHS(CURRENT_DATE, -12)
GROUP BY order_month
ORDER BY order_month;
  4. Configure filters, drill-downs, and formatting
  5. Share with appropriate users or groups

AI/BI dashboard features

  • Natural language query via Genie spaces
  • Automatic visualization suggestions
  • Dynamic filtering and drill-down
  • Scheduled refresh and email delivery
  • Embedding in external applications

Notebook integration

Access metric views from Databricks notebooks:

python
# Query metric view with PySpark
revenue_by_region = spark.sql("""
  SELECT
    customer_region,
    MEASURE(total_revenue) as revenue,
    MEASURE(unique_customers) as customers
  FROM main.sales.sales_metrics
  WHERE order_date >= CURRENT_DATE - INTERVAL 90 DAYS
  GROUP BY customer_region
""")

# Convert to Pandas for analysis
import pandas as pd
df = revenue_by_region.toPandas()

# Visualize
import matplotlib.pyplot as plt
df.plot(kind='bar', x='customer_region', y='revenue')
plt.show()

Notebooks provide exploratory analysis while maintaining metric consistency.

Post-migration optimization

Optimize metric view performance

Cluster source tables

sql
-- Apply liquid clustering keyed on the date column
ALTER TABLE main.sales.fct_orders
CLUSTER BY (order_date);

-- Re-layout existing data incrementally
OPTIMIZE main.sales.fct_orders;

Queries with date filters automatically skip irrelevant data files.

Create table statistics

sql
-- Analyze table for query optimization
ANALYZE TABLE main.sales.fct_orders COMPUTE STATISTICS;

-- Analyze specific columns
ANALYZE TABLE main.sales.dim_customers COMPUTE STATISTICS FOR COLUMNS
  (customer_id, region, segment);

Statistics help the query optimizer choose efficient execution plans.

Understand join strategies

Spark SQL chooses join strategies automatically - small dimension tables are typically broadcast. Comments in the YAML can document the expected behavior:

yaml
# In metric view YAML
joins:
  - name: customers  # Small dimension table - broadcast join
    source: main.sales.dim_customers
    on: fct_orders.customer_id = customers.customer_id

  - name: products  # Larger dimension - may shuffle
    source: main.sales.dim_products
    on: fct_orders.product_id = products.product_id

Monitor query patterns

Track metric view usage to identify optimization opportunities:

sql
-- Most expensive queries
SELECT
  query_text,
  execution_duration_ms,
  rows_produced,
  data_read_bytes / 1024 / 1024 as data_read_mb
FROM system.query.history
WHERE query_text LIKE '%sales_metrics%'
  AND start_time >= CURRENT_DATE - 7
ORDER BY execution_duration_ms DESC
LIMIT 20;

Optimize frequently-run expensive queries through table design or query rewriting.

Implement caching strategies

Result caching

Databricks SQL warehouses cache results for identical queries. Encourage reuse:

sql
-- Save common aggregations as views so repeated queries are identical
CREATE VIEW main.sales.monthly_revenue AS
SELECT
  order_month,
  customer_region,
  MEASURE(total_revenue) as revenue
FROM main.sales.sales_metrics
GROUP BY order_month, customer_region;

Subsequent identical queries are then served from the result cache.

Delta cache

Configure SQL warehouse with Delta caching for hot data:

Warehouse Settings:
- Enable Delta caching: Yes
- Cache size: Automatically managed

Frequently accessed data stays in local SSD cache for faster access.

Handling the transition period

Run parallel systems

During migration, run Beagle and Unity Catalog in parallel:

Validation period workflow

  1. Keep Beagle operational for production queries
  2. Migrate metric definitions to Unity Catalog
  3. Run validation queries comparing results
  4. Identify and resolve discrepancies
  5. Gradually shift users to Unity Catalog

Communication plan

  • Announce migration timeline to stakeholders
  • Provide training on Unity Catalog metric views
  • Share query syntax documentation
  • Offer office hours for questions

Gradual user migration

Migrate users by domain or role:

Phase 1: Data team migration

  • Analysts and data engineers use Unity Catalog first
  • Validate metric accuracy and performance
  • Refine based on feedback

Phase 2: Business intelligence migration

  • Migrate dashboard creators and report builders
  • Update scheduled reports and dashboards
  • Ensure BI tool connections work properly

Phase 3: End user migration

  • Transition business users from Beagle to new tools
  • Provide self-service query templates
  • Support natural language queries via Genie

Decommission Beagle

Once migration completes:

Pre-decommission checklist

  • All critical metrics migrated and validated
  • All users transitioned to new tools
  • Documentation updated with new query patterns
  • Integration points migrated or replaced
  • Historical data archived if needed

Decommission steps

  1. Announce final shutdown date
  2. Disable new user access to Beagle
  3. Monitor for any remaining usage
  4. Export historical audit logs
  5. Shut down Beagle infrastructure
  6. Cancel licenses and subscriptions

Common migration challenges and solutions

Challenge: Complex custom calculations

Issue - Beagle might implement proprietary calculation logic not easily translated to SQL.

Solution - Break down complex calculations into intermediate measures:

yaml
measures:
  # Base measures
  - name: gross_revenue
    expr: SUM(order_amount)

  - name: returns
    expr: SUM(CASE WHEN order_type = 'return' THEN order_amount ELSE 0 END)

  - name: discounts
    expr: SUM(discount_amount)

  # Derived measure
  - name: net_revenue
    expr: SUM(order_amount) -
          SUM(CASE WHEN order_type = 'return' THEN order_amount ELSE 0 END) -
          SUM(discount_amount)

Query derived measures or calculate at query time using CTEs.
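
For logic that resists a single measure expression, combine base measures at query time; a sketch, assuming the measures above are registered in sales_metrics:

sql
WITH base AS (
  SELECT
    customer_region,
    MEASURE(gross_revenue) as gross,
    MEASURE(returns) as returned,
    MEASURE(discounts) as discounted
  FROM main.sales.sales_metrics
  GROUP BY customer_region
)
SELECT
  customer_region,
  gross - returned - discounted as net_revenue,
  (gross - returned - discounted) / gross as net_margin
FROM base;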

Challenge: User-specific customizations

Issue - Beagle provides personalized metric views based on user roles.

Solution - Implement using Unity Catalog permissions and views:

sql
-- Create role-specific views
CREATE VIEW main.sales.regional_manager_metrics AS
SELECT
  order_date,
  customer_region,
  MEASURE(total_revenue) as revenue,
  MEASURE(order_count) as orders
FROM main.sales.sales_metrics
WHERE customer_region = current_user_region()
GROUP BY order_date, customer_region;

-- Grant access by role
GRANT SELECT ON VIEW main.sales.regional_manager_metrics TO regional_managers;

Challenge: Natural language query capability

Issue - Users accustomed to natural language queries in Beagle may resist SQL syntax.

Solution - Leverage Databricks Genie and AI/BI features:

Genie spaces provide natural language query interface:

  1. Create Genie space linked to metric view
  2. Configure display names and synonyms in metric view YAML
  3. Users ask questions in natural language
  4. Genie translates to SQL and returns results

Semantic metadata helps AI understand metrics:

yaml
measures:
  - name: total_revenue
    expr: SUM(order_amount)
    comment: "Total order revenue"
    metadata:
      display_name: "Revenue"
      format: "currency"
      synonyms: ["sales", "income", "total sales"]

Challenge: Integration gaps

Issue - Third-party tools integrated with Beagle may not support Unity Catalog.

Solution - Use Databricks SQL API as universal interface:

python
# Generic API wrapper for any tool
import requests

def execute_metric_query(sql, warehouse_id):
    response = requests.post(
        f"{DATABRICKS_HOST}/api/2.0/sql/statements/",
        headers={"Authorization": f"Bearer {DATABRICKS_TOKEN}"},
        json={
            "statement": sql,
            "warehouse_id": warehouse_id,
            "disposition": "INLINE"
        }
    )

    result = response.json()
    return result['result']['data_array']

Build custom integrations using REST API for tools lacking native connectors.

Measuring migration success

Define success metrics

Track migration progress and effectiveness:

Adoption metrics

  • Percentage of users querying Unity Catalog Metric Views
  • Query volume on new system vs Beagle
  • Number of migrated metric definitions
  • Number of connected BI tools and dashboards

Performance metrics

  • Average query latency
  • Query failure rate
  • Compute cost per query
  • Cache hit rate

Business metrics

  • Time to create new metrics
  • Metric definition inconsistencies (should decrease)
  • User satisfaction scores
  • Number of support tickets related to metrics

Establish monitoring

Set up alerts and dashboards:

sql
-- Queries to back a migration monitoring dashboard

-- Query volume trend
SELECT
  DATE(event_time) as date,
  COUNT(*) as query_count
FROM system.access.audit
WHERE request_params.metric_view IS NOT NULL
GROUP BY date
ORDER BY date;

-- Top metric views by usage
SELECT
  request_params.metric_view as metric_view,
  COUNT(*) as usage_count
FROM system.access.audit
WHERE event_date >= CURRENT_DATE - 30
GROUP BY request_params.metric_view
ORDER BY usage_count DESC
LIMIT 10;

-- Performance of queries against a metric view
SELECT
  AVG(execution_duration_ms) as avg_duration_ms,
  MAX(execution_duration_ms) as max_duration_ms
FROM system.query.history
WHERE query_text LIKE '%sales_metrics%';

Continuous improvement

Iterate based on usage patterns:

Regularly review:

  • Slow-running queries for optimization
  • Frequently queried combinations for view materialization
  • Permission issues or access patterns
  • Feature requests from users

Quarterly assessment:

  • Evaluate new Unity Catalog features
  • Review metric definitions for accuracy
  • Audit permissions and governance policies
  • Update documentation and training materials

Conclusion

Migrating from Beagle to Unity Catalog Metric Views represents a shift from middleware-based semantic layers to lakehouse-native metric governance. Unity Catalog provides enterprise-grade governance, flexible query capabilities, and native integration with the Databricks ecosystem while eliminating the operational overhead of external semantic layer tools.

The migration requires careful planning, systematic metric translation, and thorough validation. By following the structured approach outlined here - auditing your current implementation, transforming metric definitions, implementing governance, and gradually transitioning users - you can successfully migrate while maintaining business continuity.

Unity Catalog Metric Views integrate naturally with the broader Databricks platform, enabling consistent metric definitions across SQL queries, notebooks, dashboards, and AI-powered analytics. This unified approach reduces metric sprawl, improves governance, and accelerates insight generation.

For organizations processing complex unstructured data as part of their metric pipelines, Fenic provides complementary capabilities for extraction, classification, and transformation, ensuring clean, structured data feeds your Unity Catalog metric views.
