
How to Migrate from AtScale to Databricks Unity Catalog Metric Views (Databricks)

Typedef Team


Organizations running AtScale on Databricks can eliminate middleware infrastructure by migrating to Unity Catalog Metric Views. This technical guide covers extraction of AtScale semantic models, translation to YAML specifications, deployment of metric views, and integration with BI tools while maintaining business logic consistency.


When to Migrate to Native Metric Views

Infrastructure Elimination Benefits

AtScale operates as middleware between BI tools and data platforms. For organizations running exclusively on Databricks, this architecture introduces unnecessary layers:

  • Separate semantic engine instances requiring dedicated compute
  • Query virtualization overhead with network hops between tools, AtScale, and Databricks
  • Dual governance models requiring permission synchronization
  • Additional monitoring infrastructure for semantic layer operations

Unity Catalog Metric Views eliminate middleware by registering semantic definitions as first-class database objects. Metrics execute through native Spark SQL optimization without external translation.
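
For example, querying a metric view is plain Databricks SQL: the MEASURE() aggregation function evaluates each metric's definition at whatever grain the query groups by (view and column names here are illustrative and match later examples):

sql
SELECT
  customer_region,
  MEASURE(total_revenue) AS total_revenue
FROM main.sales.revenue_metrics
GROUP BY customer_region;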

Native Execution Advantages

Metric views execute directly in Databricks SQL without external query rewriting:

Query optimization: Spark SQL's Catalyst optimizer processes metric view queries at compilation time, enabling predicate pushdown, join reordering, and partition pruning that external tools cannot access.

Governance integration: Row-level security, column masking, and RBAC flow from Unity Catalog directly to metric views. No synchronization between systems.

Zero data movement: Queries execute on Delta Lake tables without caching layers or aggregate tables. AtScale's aggregate management creates data copies; metric views query source tables with intelligent optimization.

AI integration: Databricks Assistant and Genie spaces consume metric views natively for natural language analytics.

Migration Decision Factors

Migrate when:

  • Operating exclusively on Databricks (multi-platform environments require different approaches)
  • AtScale infrastructure creates query performance bottlenecks
  • Maintaining dual governance models increases operational complexity
  • Licensing and compute costs of dedicated AtScale instances outweigh the migration effort
  • Integration with Databricks ML and data science workflows becomes critical
  • Real-time streaming metrics cannot flow through AtScale's batch-oriented architecture

Organizations using Snowflake, BigQuery, or Redshift alongside Databricks face constraints since metric views operate only within Databricks.


Pre-Migration Assessment

AtScale Implementation Inventory

Document current deployment:

Semantic models:

  • Cube and virtual cube count
  • Measures per cube (aggregations, calculated measures, ratios)
  • Dimension hierarchies and drill paths
  • Relationships between fact and dimension tables
  • Aggregate tables managed by AtScale's autonomous optimization

Query patterns:

  • Most frequently queried measure/dimension combinations
  • Average query response times by cube
  • Concurrent user loads during peak hours
  • Queries depending on AtScale-specific optimizations (aggregate awareness, caching)

BI tool connections:

  • MDX protocol consumers (Excel, Power BI)
  • SQL/JDBC connections
  • REST API applications
  • Custom applications on AtScale endpoints

Migration Complexity Analysis

Direct translation cases:

  • Simple SUM, AVG, COUNT, MIN, MAX measures
  • Standard dimension hierarchies
  • Star schema with direct fact-to-dimension joins
  • SQL-based BI tool connections
  • Models without heavy aggregate optimization dependencies

Adaptation requirements:

  • Calculated measures using AtScale-specific functions
  • Models optimized for AtScale's aggregate awareness
  • Snowflake schemas with multi-hop dimension chains (requires Databricks Runtime 17.1+)
  • MDX-dependent Excel pivot tables
  • Dynamic security filters in AtScale

Architectural changes:

  • Sub-second query requirements depending on pre-computed aggregates
  • Applications using AtScale's autonomous aggregate creation
  • Programmatic access through AtScale APIs
  • Cross-platform queries spanning multiple data sources

Data Readiness Verification

Confirm Databricks tables meet requirements:

Schema validation:

  • Tables registered in Unity Catalog (not legacy Hive metastore)
  • Stable schemas with documented column types
  • Clear primary/foreign key relationships between fact and dimension tables
  • Delta format for optimal performance

Data quality:

  • No orphaned foreign key references (see the check after this list)
  • Consistent date formats across time dimensions
  • Appropriate numeric precision for measure columns
  • Documented null handling for critical measures
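
To spot orphaned foreign keys, an anti-join against the dimension table is usually enough (a sketch using the example fact and dimension tables from later sections):

sql
SELECT COUNT(*) AS orphaned_orders
FROM main.sales.orders o
LEFT JOIN main.sales.customers c
  ON o.customer_id = c.customer_id
WHERE c.customer_id IS NULL;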

Performance baseline:

  • Direct queries against source tables without AtScale (a sample baseline query follows this list)
  • Current query performance through AtScale for comparison
  • Queries benefiting most from AtScale aggregates
  • Existing query performance issues
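
A representative baseline to time directly against the source tables might look like this (a sketch using the example schema from later sections):

sql
SELECT
  c.region,
  SUM(o.order_amount) AS revenue,
  COUNT(DISTINCT o.order_id) AS orders
FROM main.sales.orders o
JOIN main.sales.customers c
  ON o.customer_id = c.customer_id
GROUP BY c.region;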

Technical Requirements

Verify environment readiness:

  • Unity Catalog enabled across workspaces
  • Databricks Runtime 13.3 or higher (17.1+ for multi-hop snowflake schema joins)
  • SQL warehouses configured for metric view queries
  • Appropriate compute sizing
  • Unity Catalog permissions allowing CREATE VIEW on target schemas
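
Two quick checks cover the last items on this list: confirm the workspace is attached to a Unity Catalog metastore and that your principal holds the needed privileges on the target schema (schema name is illustrative):

sql
SELECT current_metastore();

SHOW GRANTS ON SCHEMA main.sales;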

Conceptual Translation Patterns

Semantic Model Mapping

AtScale cubes map to metric views with terminology differences:

  • Virtual Cube → Metric View: Unity Catalog objects, not virtualization constructs
  • Measure → Measure: both use aggregate expressions; syntax differs
  • Dimension → Dimension: defined as YAML expressions
  • Hierarchy → Dimension references: no explicit hierarchy objects
  • Relationship → Join specification: defined in YAML, not a separate relationship editor
  • Aggregate Table → Query optimization: no physical aggregates; relies on Delta caching and optimization
  • Calculated Member → Derived measure: measures reference other measures

Measure Expression Translation

Simple aggregations:

AtScale syntax:

SUM([sales_amount])
AVG([order_value])
COUNT(DISTINCT [customer_id])

Metric view YAML:

yaml
measures:
  - name: total_sales
    expr: SUM(sales_amount)

  - name: avg_order_value
    expr: AVG(order_value)

  - name: unique_customers
    expr: COUNT(DISTINCT customer_id)

Ratio measures:

AtScale calculated members:

[Total Revenue] / [Order Count]

Metric view measures:

yaml
measures:
  - name: total_revenue
    expr: SUM(order_amount)

  - name: order_count
    expr: COUNT(order_id)

  - name: avg_order_value
    expr: SUM(order_amount) / COUNT(order_id)

Conditional calculations:

AtScale logic:

IIF([region] = "US", [revenue] * 1.0, [revenue] * 0.85)

Metric view expression:

yaml
measures:
  - name: adjusted_revenue
    expr: |
      SUM(CASE
        WHEN region = 'US' THEN revenue * 1.0
        ELSE revenue * 0.85
      END)

Dimension Translation

AtScale hierarchies flatten into dimension definitions:

AtScale hierarchy structure (Product → Category → Department):

Product Dimension
├─ Product Name
├─ Category
└─ Department

Metric view dimensions:

yaml
dimensions:
  - name: product_name
    expr: product.name

  - name: product_category
    expr: product.category

  - name: product_department
    expr: product.department

BI tools handle drill-down by selecting different dimension granularities rather than traversing hierarchies.
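
For example, drilling from department down to category is just a change in the grouped dimensions (a sketch assuming these dimensions are exposed by a hypothetical product_metrics view):

sql
-- Department level
SELECT product_department, MEASURE(total_sales) AS sales
FROM main.sales.product_metrics
GROUP BY product_department;

-- Drill down to category within department
SELECT product_department, product_category, MEASURE(total_sales) AS sales
FROM main.sales.product_metrics
GROUP BY product_department, product_category;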

Join Specification Translation

AtScale relationship definitions map to YAML join specifications:

AtScale relationships (star schema):

Orders (Fact)
├─ Customers
├─ Products
└─ Dates

Metric view joins:

yaml
version: 1.1
source: main.sales.orders
joins:
  - name: customers
    source: main.sales.customers
    on: customers.customer_key = source.customer_key

  - name: products
    source: main.sales.products
    on: products.product_key = source.product_key

  - name: dates
    source: main.sales.date_dim
    on: dates.date_key = source.order_date_key

Snowflake schemas require multi-hop joins (Databricks Runtime 17.1+):

yaml
joins:
  - name: stores
    source: main.sales.stores
    on: stores.store_key = source.store_key

  - name: regions
    source: main.sales.regions
    on: regions.region_key = stores.region_key

Migration Execution Process

Phase 1: Extract AtScale Metadata

Export semantic model definitions from AtScale Design Center:

Model extraction:

  1. Open Design Center
  2. Select cube for migration
  3. Navigate to Semantic Layer tab
  4. Export as JSON or XML (SML format if available)
  5. Document measures, dimensions, relationships

Security configuration documentation:

  • Row-level filters
  • User group permissions per cube
  • Dynamic security expressions

Query pattern capture:

  • Export query logs
  • Identify frequently used measure/dimension combinations
  • Note performance-optimized queries

Phase 2: Prepare Databricks Tables

Verify source table structure:

sql
-- Validate schema
DESCRIBE EXTENDED main.sales.orders;

-- Check properties
SHOW TBLPROPERTIES main.sales.orders;

Declare primary and foreign key constraints for query optimization:

sql
ALTER TABLE main.sales.orders
ADD CONSTRAINT pk_orders PRIMARY KEY (order_id);

ALTER TABLE main.sales.customers
ADD CONSTRAINT pk_customers PRIMARY KEY (customer_id);

ALTER TABLE main.sales.orders
ADD CONSTRAINT fk_orders_customers
FOREIGN KEY (customer_id) REFERENCES main.sales.customers(customer_id);

Create staging views for AtScale business rules:

sql
CREATE OR REPLACE VIEW main.sales.orders_clean AS
SELECT
  order_id,
  customer_id,
  order_date,
  order_amount,
  CASE
    WHEN order_status = 'cancelled' THEN 0
    ELSE order_amount
  END AS revenue_amount
FROM main.sales.orders_raw
WHERE order_date >= '2020-01-01';

Phase 3: Create YAML Specifications

Translate AtScale definitions to metric view YAML:

Basic structure:

yaml
version: 1.1
source: main.sales.orders_clean
comment: "Revenue metrics from AtScale sales cube"

dimensions:
  - name: order_date
    expr: order_date
    comment: "Transaction date"

  - name: order_month
    expr: DATE_TRUNC('month', order_date)
    comment: "Month of transaction"

  - name: order_year
    expr: YEAR(order_date)
    comment: "Year of transaction"

measures:
  - name: total_revenue
    expr: SUM(revenue_amount)
    comment: "Sum of all order revenue"

  - name: order_count
    expr: COUNT(DISTINCT order_id)
    comment: "Number of orders"

  - name: avg_order_value
    expr: SUM(revenue_amount) / COUNT(DISTINCT order_id)
    comment: "Average revenue per order"

Include joins:

yaml
version: 1.1
source: main.sales.orders_clean
joins:
  - name: customers
    source: main.sales.customers
    on: customers.customer_id = source.customer_id

  - name: products
    source: main.sales.products
    on: products.product_id = source.product_id

dimensions:
  - name: customer_region
    expr: customers.region

  - name: customer_segment
    expr: customers.segment

  - name: product_category
    expr: products.category

measures:
  - name: total_revenue
    expr: SUM(revenue_amount)

Derived measures:

yaml
measures:
  - name: gross_revenue
    expr: SUM(order_amount)

  - name: discount_amount
    expr: SUM(discount)

  - name: refund_amount
    expr: SUM(refund)

  - name: net_revenue
    expr: gross_revenue - discount_amount - refund_amount

  - name: net_margin_pct
    expr: (net_revenue / gross_revenue) * 100

Phase 4: Deploy Metric Views

Create views using SQL DDL:

sql
CREATE VIEW main.sales.revenue_metrics
WITH METRICS LANGUAGE YAML AS $$
version: 1.1
source: main.sales.orders_clean
comment: "Revenue metrics"

joins:
  - name: customers
    source: main.sales.customers
    on: customers.customer_id = source.customer_id

dimensions:
  - name: customer_region
    expr: customers.region

  - name: order_date
    expr: order_date

  - name: order_month
    expr: DATE_TRUNC('month', order_date)

measures:
  - name: total_revenue
    expr: SUM(revenue_amount)
    comment: "Sum of order revenue"

  - name: avg_order_value
    expr: SUM(revenue_amount) / COUNT(DISTINCT order_id)
    comment: "Average order value"
$$;

Verify deployment:

sql
-- Confirm creation
SHOW VIEWS IN main.sales LIKE 'revenue_metrics';

-- View definition
DESCRIBE TABLE EXTENDED main.sales.revenue_metrics AS JSON;

-- Test query
SELECT
  customer_region,
  MEASURE(total_revenue),
  MEASURE(avg_order_value)
FROM main.sales.revenue_metrics
WHERE order_month >= '2024-01-01'
GROUP BY customer_region;

Phase 5: Configure Governance

Apply Unity Catalog permissions:

sql
-- Grant to analyst role
GRANT SELECT ON VIEW main.sales.revenue_metrics TO `analyst_role`;

-- Grant to users
GRANT SELECT ON VIEW main.sales.revenue_metrics TO `user@company.com`;

-- Revoke access
REVOKE ALL PRIVILEGES ON VIEW main.sales.cost_metrics FROM `contractor_role`;

Implement row-level security on source tables:

sql
-- Create filter
CREATE FUNCTION main.sales.region_filter(region STRING)
RETURN IF(
  current_user() IN ('global_manager@company.com'),
  TRUE,
  region = (SELECT home_region FROM main.sales.user_regions WHERE user_email = current_user())
);

-- Apply to table
ALTER TABLE main.sales.orders
SET ROW FILTER main.sales.region_filter ON (region);

Row filters automatically apply to metric view queries.

Enable audit logging:

sql
-- Query history
SELECT
  user_name,
  statement_text,
  execution_duration_ms,
  start_time
FROM system.query.history
WHERE statement_text LIKE '%main.sales.revenue_metrics%'
  AND start_time >= current_date() - 7
ORDER BY start_time DESC;

Phase 6: Update BI Connections

Tableau:

  1. Remove AtScale data source
  2. Add Databricks SQL Warehouse connector
  3. Connect to Unity Catalog
  4. Select metric view as data source (a Custom SQL sketch follows these steps)
  5. Recreate calculated fields from AtScale hierarchies
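
For step 4, Tableau's Custom SQL option can query the metric view directly (a sketch; adjust names to your deployment):

sql
SELECT
  customer_region,
  order_month,
  MEASURE(total_revenue) AS total_revenue,
  MEASURE(avg_order_value) AS avg_order_value
FROM main.sales.revenue_metrics
GROUP BY customer_region, order_month;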

Power BI:

Configure DirectQuery:

  1. Get connection parameters from SQL warehouse
  2. Use DirectQuery mode
  3. Query the metric view with MEASURE() and build any additional report measures on the results

Example query (passed to the Databricks connector as a native SQL statement, since DAX cannot invoke MEASURE() directly):

sql
SELECT
  customer_region,
  MEASURE(total_revenue) AS revenue
FROM main.sales.revenue_metrics
GROUP BY customer_region;

Excel:

AtScale provided MDX endpoints. Databricks requires ODBC:

  1. Install Databricks ODBC driver
  2. Configure DSN to SQL warehouse
  3. Create pivot table from external data source
  4. Query metric views using SQL

Phase 7: Validate Consistency

Create validation queries comparing metric view results to direct queries:

sql
WITH metric_view_results AS (
  SELECT
    customer_region,
    MEASURE(total_revenue) as revenue_mv,
    MEASURE(order_count) as orders_mv,
    MEASURE(avg_order_value) as aov_mv
  FROM main.sales.revenue_metrics
  WHERE order_date BETWEEN '2024-01-01' AND '2024-12-31'
  GROUP BY customer_region
),
direct_query_results AS (
  SELECT
    c.region as customer_region,
    SUM(o.revenue_amount) as revenue_dq,
    COUNT(DISTINCT o.order_id) as orders_dq,
    SUM(o.revenue_amount) / COUNT(DISTINCT o.order_id) as aov_dq
  FROM main.sales.orders_clean o
  JOIN main.sales.customers c ON o.customer_id = c.customer_id
  WHERE o.order_date BETWEEN '2024-01-01' AND '2024-12-31'
  GROUP BY c.region
)
SELECT
  COALESCE(mv.customer_region, dq.customer_region) as region,
  mv.revenue_mv,
  dq.revenue_dq,
  ABS(mv.revenue_mv - dq.revenue_dq) as revenue_diff,
  mv.orders_mv,
  dq.orders_dq
FROM metric_view_results mv
FULL OUTER JOIN direct_query_results dq
  ON mv.customer_region = dq.customer_region
WHERE ABS(mv.revenue_mv - dq.revenue_dq) > 0.01;

Check discrepancies:

  • Filter logic in metric view source
  • Join conditions matching relationships
  • Null handling in measure expressions (see the quick profile after this list)
  • Date range interpretations
  • Distinct count implementations
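
For the null-handling item, a quick profile of the measure column often explains small differences (a sketch):

sql
SELECT
  COUNT(*) AS total_rows,
  COUNT(revenue_amount) AS non_null_rows,
  COUNT(*) - COUNT(revenue_amount) AS null_rows
FROM main.sales.orders_clean;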

Post-Migration Optimization

Performance Tuning

Enable Delta Lake optimizations:

sql
ALTER TABLE main.sales.orders
SET TBLPROPERTIES ('delta.autoOptimize.optimizeWrite' = 'true');

ALTER TABLE main.sales.orders
SET TBLPROPERTIES ('delta.autoOptimize.autoCompact' = 'true');

Create Z-order indexes:

sql
OPTIMIZE main.sales.orders
ZORDER BY (order_date, customer_id);

Analyze statistics:

sql
ANALYZE TABLE main.sales.orders COMPUTE STATISTICS;
ANALYZE TABLE main.sales.orders COMPUTE STATISTICS FOR ALL COLUMNS;

Monitor query performance:

sql
SELECT
  statement_text,
  user_name,
  execution_duration_ms / 1000 as duration_seconds,
  total_task_duration_ms / 1000 as task_duration_seconds,
  read_bytes / 1024 / 1024 as read_mb
FROM system.query.history
WHERE statement_text LIKE '%revenue_metrics%'
  AND execution_duration_ms > 5000
ORDER BY execution_duration_ms DESC
LIMIT 20;

Caching Strategies

Create materialized views for high-traffic patterns:

sql
CREATE MATERIALIZED VIEW main.sales.monthly_revenue_by_region AS
SELECT
  customer_region,
  order_month,
  MEASURE(total_revenue) as revenue,
  MEASURE(order_count) as orders
FROM main.sales.revenue_metrics
WHERE order_date >= '2020-01-01'
GROUP BY customer_region, order_month;

REFRESH MATERIALIZED VIEW main.sales.monthly_revenue_by_region;

Semantic Metadata

Add metadata for AI tool integration:

yaml
version: 1.1
source: main.sales.orders_clean

dimensions:
  - name: customer_region
    expr: customers.region
    comment: "Geographic region"
    metadata:
      display_name: "Customer Region"
      synonyms: ["region", "customer location", "geography"]

measures:
  - name: total_revenue
    expr: SUM(revenue_amount)
    comment: "Sum of order revenue"
    metadata:
      display_name: "Total Revenue"
      format: "$#,##0.00"
      synonyms: ["revenue", "sales", "total sales"]

Handling Migration Challenges

Aggregate Table Dependencies

AtScale manages aggregate tables automatically. Metric views don't replicate this automation.

Create pre-aggregated Delta tables:

sql
CREATE OR REPLACE TABLE main.sales.daily_revenue_agg AS
SELECT
  order_date,
  customer_id,
  product_id,
  SUM(revenue_amount) as revenue,
  COUNT(DISTINCT order_id) as order_count
FROM main.sales.orders_clean
GROUP BY order_date, customer_id, product_id;

CREATE VIEW main.sales.daily_revenue_metrics
WITH METRICS LANGUAGE YAML AS $$
version: 1.1
source: main.sales.daily_revenue_agg

dimensions:
  - name: order_date
    expr: order_date

measures:
  - name: total_revenue
    expr: SUM(revenue)

  - name: order_count
    expr: SUM(order_count)
$$;

Analyze query execution plans:

  1. Run slow query
  2. Open Query History
  3. Analyze execution plan (an EXPLAIN sketch follows these steps)
  4. Identify full table scans or large shuffles
  5. Add indexes or create targeted aggregates
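
For step 3, EXPLAIN shows the Spark plan the metric view query compiles to, which makes full scans and large shuffles visible (a sketch):

sql
EXPLAIN FORMATTED
SELECT customer_region, MEASURE(total_revenue)
FROM main.sales.revenue_metrics
GROUP BY customer_region;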

Excel MDX Connectivity

Databricks doesn't support MDX protocol used by Excel pivot tables.

ODBC SQL connection:

  1. Install Databricks ODBC driver
  2. Configure connection to SQL warehouse
  3. Create pivot tables using SQL queries
  4. Reference metric views with MEASURE()

Migration to Power BI:

  • DirectQuery support for real-time data
  • Better metric view query handling
  • More robust refresh and caching

Cross-Platform Queries

Metric views operate only within Databricks.

Use Lakehouse Federation for external sources:

sql
SELECT
  mv.product_id,
  MEASURE(mv.total_revenue) as databricks_revenue,
  s.snowflake_revenue
FROM main.sales.revenue_metrics mv
JOIN snowflake_prod.sales.revenue s
  ON mv.product_id = s.product_id
WHERE mv.order_date >= '2024-01-01'
GROUP BY mv.product_id, s.snowflake_revenue;

Replicate frequently joined external data into Delta, for example by creating a table from the federated catalog:

sql
CREATE OR REPLACE TABLE main.external.snowflake_products AS
SELECT *
FROM snowflake_prod.sales.products;

Semantic Data Processing with Fenic

Organizations migrating from AtScale often process unstructured data before aggregation. Fenic provides semantic processing that complements metric views.

Extract Structured Data from Unstructured Sources

Process unstructured content into Delta tables for metric view consumption:

python
import fenic as fc
from pydantic import BaseModel
from typing import Literal

class CustomerFeedback(BaseModel):
    sentiment: Literal["positive", "neutral", "negative"]
    product_category: str
    issue_type: str
    priority: Literal["low", "medium", "high"]

# Read from Parquet/CSV/JSON (Fenic's supported formats)
df = (
    fc.read_parquet("main.raw.support_tickets.parquet")
    .with_column(
        "structured_feedback",
        fc.semantic.extract(fc.col("ticket_text"), CustomerFeedback)
    )
)

# Note: Fenic doesn't have write_delta() - use standard I/O methods
# For Delta integration, you would need to convert to pandas/arrow and use Databricks APIs

Create metric views on structured output:

yaml
version: 1.1
source: main.processed.support_tickets

dimensions:
  - name: sentiment
    expr: sentiment

  - name: product_category
    expr: product_category

  - name: issue_type
    expr: issue_type

measures:
  - name: ticket_count
    expr: COUNT(*)

  - name: high_priority_count
    expr: COUNT_IF(priority = 'high')

  - name: negative_sentiment_pct
    expr: COUNT_IF(sentiment = 'negative') / COUNT(*) * 100

Details on semantic operators: Build Reliable AI Pipelines with Fenic's Semantic Operators.

Document Processing Pipelines

Handle markdown documentation:

python
docs_df = (
    fc.read_parquet("main.raw.documentation.parquet")
    .with_column("parsed_md", fc.col("content").cast(fc.MarkdownType))
    .with_column(
        "sections",
        fc.markdown.extract_header_chunks(fc.col("parsed_md"), header_level=2)
    )
    .explode("sections")
    .with_column(
        "section_category",
        fc.semantic.classify(
            fc.col("sections"),
            categories=["tutorial", "reference", "troubleshooting", "API"]
        )
    )
)

# For Delta output, convert and use Databricks APIs separately

Query through metric views:

sql
SELECT
  section_category,
  MEASURE(section_count) as sections,
  MEASURE(avg_section_length) as avg_length
FROM main.docs.documentation_metrics
WHERE publish_date >= '2024-01-01'
GROUP BY section_category;

Transcript Analysis

Process call transcripts:

python
from fenic import TranscriptType

class CallAnalysis(BaseModel):
    # Define your schema fields here
    resolution_status: str
    call_duration_seconds: int
    # ... other fields

calls_df = (
    fc.read_parquet("main.raw.call_transcripts.parquet")
    .with_column("transcript", fc.col("raw_transcript").cast(TranscriptType))
    .with_column(
        "transcript_analysis",
        fc.semantic.extract(fc.col("transcript"), CallAnalysis)
    )
)

# For Delta output, convert and use Databricks APIs separately

Build metric views:

yaml
version: 1.1
source: main.processed.call_analysis

dimensions:
  - name: call_date
    expr: DATE(call_timestamp)

  - name: agent_id
    expr: agent_id

  - name: resolution_status
    expr: resolution_status

measures:
  - name: total_calls
    expr: COUNT(*)

  - name: avg_handle_time
    expr: AVG(call_duration_seconds)

  - name: first_call_resolution_rate
    expr: COUNT_IF(resolution_status = 'resolved') / COUNT(*) * 100

Additional patterns: Create Composable Semantic Operators for Data Transformation.


Production Deployment Practices

Incremental Migration

Migrate in phases:

Phase 1: Non-critical metrics

  • Low-traffic metric views
  • Validate query patterns and performance
  • Establish deployment procedures

Phase 2: Department-specific metrics

  • One business unit at a time
  • Train users on query syntax
  • Gather feedback

Phase 3: Enterprise-critical metrics

  • High-traffic, mission-critical metrics
  • Run parallel systems temporarily
  • Validate consistency before cutover

Documentation

Document metric definitions:

yaml
version: 1.1
source: main.sales.orders_clean
comment: |
  Revenue metrics for sales reporting

  Migration notes:
  - Source: AtScale cube 'Sales_Analysis'
  - Migration date: 2024-12-01
  - Base logic unchanged
  - Validated against 2023-2024 data

  Owner: analytics-team@company.com
  Last updated: 2024-12-04

Track lineage:

  • [Total Sales] → total_revenue: direct translation
  • [AOV] → avg_order_value: changed from a calculated member to an expression
  • [YoY Growth %] → revenue_yoy_pct: uses a window calculation
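
The [YoY Growth %] translation is the one that changes shape: instead of an AtScale calculated member, year-over-year growth can be computed with a window function over metric view output (one possible sketch):

sql
WITH monthly_revenue AS (
  SELECT
    order_month,
    MEASURE(total_revenue) AS revenue
  FROM main.sales.revenue_metrics
  GROUP BY order_month
)
SELECT
  order_month,
  revenue,
  (revenue - LAG(revenue, 12) OVER (ORDER BY order_month))
    / LAG(revenue, 12) OVER (ORDER BY order_month) * 100 AS revenue_yoy_pct
FROM monthly_revenue;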

Monitor Usage

Track metric view queries:

sql
CREATE OR REPLACE VIEW main.admin.metric_view_usage AS
SELECT
  DATE(start_time) as query_date,
  user_name,
  CASE
    WHEN statement_text LIKE '%revenue_metrics%' THEN 'revenue_metrics'
    WHEN statement_text LIKE '%customer_metrics%' THEN 'customer_metrics'
    ELSE 'other'
  END as metric_view,
  COUNT(*) as query_count,
  AVG(execution_duration_ms) / 1000 as avg_duration_seconds,
  SUM(read_bytes) / 1024 / 1024 / 1024 as total_gb_scanned
FROM system.query.history
WHERE statement_text LIKE '%MEASURE(%'
  AND start_time >= current_date() - 30
GROUP BY DATE(start_time), user_name,
  CASE
    WHEN statement_text LIKE '%revenue_metrics%' THEN 'revenue_metrics'
    WHEN statement_text LIKE '%customer_metrics%' THEN 'customer_metrics'
    ELSE 'other'
  END;

Disaster Recovery

Version control YAML:

bash
metric_views/
├── sales/
│   ├── revenue_metrics.yaml
│   ├── customer_metrics.yaml
│   └── product_metrics.yaml
├── marketing/
│   ├── campaign_metrics.yaml
│   └── attribution_metrics.yaml
└── finance/
    ├── cost_metrics.yaml
    └── margin_metrics.yaml

Automate deployment:

python
from pathlib import Path

from databricks.sdk import WorkspaceClient

def deploy_metric_view(yaml_path, catalog, schema):
    # Read the YAML specification verbatim from version control
    yaml_path = Path(yaml_path)
    mv_yaml = yaml_path.read_text()

    view_name = yaml_path.stem

    sql = (
        f"CREATE OR REPLACE VIEW {catalog}.{schema}.{view_name}\n"
        "WITH METRICS LANGUAGE YAML AS $$\n"
        f"{mv_yaml}\n"
        "$$"
    )

    w = WorkspaceClient()
    w.statement_execution.execute_statement(
        warehouse_id="<warehouse_id>",
        statement=sql
    )

Conclusion

Migrating from AtScale to Unity Catalog Metric Views eliminates middleware infrastructure while maintaining semantic consistency. The process requires extracting AtScale models, translating to YAML specifications, validating metric consistency, and updating BI tool connections.

Native Databricks metric views provide:

  • Unified governance through Unity Catalog
  • Direct query optimization via Spark SQL
  • Seamless AI/BI tool integration with Databricks Assistant and Genie
  • Reduced operational complexity
  • Native Delta Lake feature support

Migration follows clear phases: assessment, table preparation, YAML translation, deployment, connection updates, validation, and optimization.

For unstructured data processing before aggregation, Fenic's semantic operators create end-to-end pipelines from raw content to governed metrics. Additional integration patterns: Build Reliable AI Pipelines with Fenic's Semantic Operators and Create Composable Semantic Operators for Data Transformation.

The migration represents a shift from virtualized semantic layers to native Lakehouse capabilities, positioning organizations for AI-driven analytics on unified data platforms.
