How to Migrate from Kyvos Semantic Layer to Databricks Unity Catalog Metric Views

Typedef Team

Kyvos implements traditional OLAP cubes with pre-aggregated MOLAP storage and MDX query interfaces. Databricks Unity Catalog Metric Views operate as catalog-native objects with on-demand Spark SQL computation. Migration requires translating cube definitions to YAML metric views, converting MDX to SQL, flattening hierarchies, and reimplementing time intelligence. Typedef's Fenic framework provides semantic extraction and data transformation capabilities for preprocessing migration artifacts and validating measure logic during the migration process.


Architecture Comparison

Kyvos Smart OLAP Architecture

Kyvos operates as a semantic performance layer sitting between data sources and BI tools:

Pre-aggregation approach:

  • Smart Aggregation technology creates managed MOLAP cubes
  • AI-powered selection of aggregation patterns
  • Pre-computed measures stored in columnar format
  • Automatic aggregation maintenance based on query patterns

Storage modes:

  • MOLAP: Fully pre-aggregated with optimized storage
  • ROLAP: Direct queries to underlying tables
  • HOLAP: Hybrid combining both approaches

Query interfaces:

  • MDX (Multidimensional Expressions) for OLAP operations
  • SQL for relational access patterns
  • DAX support for Power BI semantic models
  • Custom connectors for Tableau, Excel, and other BI tools

Semantic model structure:

  • Star and snowflake schema optimization
  • Ragged and unbalanced hierarchies
  • Parent-child hierarchies
  • Calculated members and measures
  • Row and column-level security

Databricks Unity Catalog Metric Views Architecture

Databricks embeds metrics directly in the lakehouse catalog:

Catalog-native implementation:

  • Metric views as first-class Unity Catalog objects
  • YAML or SQL DDL definitions
  • No separate semantic layer infrastructure
  • Metadata stored in catalog, not pre-aggregated cubes

Computation model:

  • On-demand query execution via Spark SQL
  • Photon vectorized engine for performance
  • Delta Lake optimizations (data skipping, Z-ordering, liquid clustering)
  • Result caching for repeated queries

Query interface:

  • MEASURE() clause for metric invocation (see the example after this list)
  • Standard Spark SQL syntax
  • No MDX support
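
A minimal query sketch of the MEASURE() clause, assuming a metric view named main.sales.sales_metrics with a total_revenue measure and a customer_region dimension (both defined later in this guide):

sql
-- Aggregate a metric view measure at the requested dimension grain
SELECT
  customer_region,
  MEASURE(total_revenue) AS revenue
FROM main.sales.sales_metrics
GROUP BY customer_region
ORDER BY revenue DESC;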

Governance integration:

  • Unity Catalog RBAC applies to metric views
  • Row-level security and column masking inherited from base tables
  • Lineage tracking in catalog metadata
  • Certification and tagging capabilities

Fundamental Differences

Aspect | Kyvos | Databricks Metric Views
Architecture | Separate semantic layer | Catalog-native objects
Storage | Pre-aggregated cubes | Delta tables with on-demand aggregation
Query Language | MDX, SQL, DAX | Spark SQL only
Hierarchies | Native support for ragged/parent-child | Requires flattening
Time Intelligence | Built-in functions | Manual implementation required
Refresh Model | Scheduled cube refresh | Real-time on underlying Delta tables
Security Model | Semantic layer security | Unity Catalog RBAC

Pre-Migration Assessment

Inventory Kyvos Semantic Models

Extract cube metadata:

Document all Kyvos cubes including:

  • Dimension definitions and hierarchies
  • Measure formulas and aggregation types
  • Calculated members
  • Security rules and filters
  • Refresh schedules
  • Storage mode (MOLAP/ROLAP/HOLAP)

Classify measure types:

Simple aggregations:

SUM(order_amount)
COUNT(order_id)
AVG(unit_price)

Calculated measures:

[Net Revenue] = [Gross Revenue] - [Discounts] - [Refunds]
[Profit Margin] = ([Revenue] - [Cost]) / [Revenue]

Semi-additive measures:

DISTINCT COUNT(customer_id)
AVERAGE(account_balance) -- across time dimension only

Time-based calculations:

YTD([Revenue])
ParallelPeriod([Date].[Year], 1, [Date].[Month])

Categorize by translation difficulty:

Measure Type | Translation Approach | Notes
Simple SUM/COUNT/AVG | Direct SQL translation | Straightforward
Calculated measures | SQL expressions | May require restructuring
Distinct counts | COUNT(DISTINCT ...) | Requires careful join handling
Semi-additive | Window clauses | Manual time dimension logic
MDX time intelligence | Custom SQL with date dimension | No direct equivalent

Analyze Source Data Structure

Map Kyvos cubes to Delta tables:

For each cube, identify:

  • Source fact tables
  • Dimension tables
  • Join relationships
  • Data freshness requirements
  • Volume and partitioning strategy

Example mapping:

Kyvos Cube: Sales_Analysis_Cube
├── Fact: orders table (500M rows)
├── Dimension: customers (2M rows)
├── Dimension: products (50K rows)
├── Dimension: date (3,650 days)
└── Dimension: stores (500 locations)

Target Databricks Structure:
├── main.sales.orders (Delta table, partitioned by order_date)
├── main.sales.customers (Delta table)
├── main.sales.products (Delta table)
├── main.common.date_dimension (Delta table)
└── main.sales.stores (Delta table)
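
Before translating measures, a quick row-count check confirms the mapped Delta tables line up with the documented Kyvos source volumes. A minimal sketch, assuming the target tables from the mapping above:

sql
-- Compare Delta table row counts against the Kyvos cube source inventory
SELECT 'orders' AS table_name, COUNT(*) AS row_count FROM main.sales.orders
UNION ALL
SELECT 'customers', COUNT(*) FROM main.sales.customers
UNION ALL
SELECT 'products', COUNT(*) FROM main.sales.products
UNION ALL
SELECT 'stores', COUNT(*) FROM main.sales.stores
UNION ALL
SELECT 'date_dimension', COUNT(*) FROM main.common.date_dimension;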

Assess BI Tool Dependencies

Document query patterns:

Analyze how BI tools currently access Kyvos:

  • MDX queries from Excel/Power BI
  • SQL passthrough from Tableau
  • Custom connector usage
  • Dashboard refresh frequencies
  • User access patterns

Identify breaking changes:

Current Kyvos Feature | Databricks Equivalent | Impact
Automatic drill-down | Manual dimension level selection | BI tools require query updates
MDX TOPCOUNT() | SQL LIMIT with ORDER BY | Query rewrite required
Parent-child hierarchies | Flattened dimension tables | Loss of dynamic hierarchy navigation
Cube browsing in Excel | SQL Warehouse connection | Different connection mechanism

Migration Process

Step 1: Create Delta Table Foundation

Establish star schema in Unity Catalog:

sql
-- Create catalog and schema structure
CREATE CATALOG IF NOT EXISTS main;
CREATE SCHEMA IF NOT EXISTS main.sales;
CREATE SCHEMA IF NOT EXISTS main.common;

-- Fact table
CREATE TABLE main.sales.orders (
    order_id STRING NOT NULL,
    order_date DATE NOT NULL,
    customer_id STRING NOT NULL,
    product_id STRING NOT NULL,
    store_id STRING NOT NULL,
    order_amount DECIMAL(18,2),
    quantity INT,
    discount_amount DECIMAL(18,2),
    refund_amount DECIMAL(18,2),
    tax_amount DECIMAL(18,2)
)
USING DELTA
PARTITIONED BY (order_date)
LOCATION 's3://lakehouse/sales/orders';

-- Dimension tables
CREATE TABLE main.sales.customers (
    customer_id STRING NOT NULL PRIMARY KEY,
    customer_name STRING,
    region STRING,
    segment STRING,
    registration_date DATE
) USING DELTA;

CREATE TABLE main.sales.products (
    product_id STRING NOT NULL PRIMARY KEY,
    product_name STRING,
    category STRING,
    subcategory STRING,
    brand STRING,
    unit_cost DECIMAL(18,2)
) USING DELTA;

Apply Delta Lake optimizations:

sql
-- Z-ordering for common filter patterns
OPTIMIZE main.sales.orders
ZORDER BY (customer_id, product_id);

-- Enable liquid clustering for evolving access patterns
ALTER TABLE main.sales.orders
CLUSTER BY (order_date, customer_id);

-- Collect statistics
ANALYZE TABLE main.sales.orders COMPUTE STATISTICS FOR ALL COLUMNS;

Step 2: Translate Cube Definitions to Metric Views

Basic metric view structure:

sql
CREATE METRIC VIEW main.sales.sales_metrics AS
SELECT
  -- Measures from Kyvos cube
  SUM(order_amount) AS total_revenue,
  SUM(order_amount - discount_amount - refund_amount) AS net_revenue,
  SUM(order_amount - discount_amount - refund_amount - (quantity * products.unit_cost)) AS gross_profit,
  COUNT(DISTINCT order_id) AS order_count,
  COUNT(DISTINCT customer_id) AS unique_customers,
  SUM(quantity) AS total_quantity,

  -- Calculated measures
  SUM(order_amount) / COUNT(DISTINCT order_id) AS avg_order_value,
  (SUM(order_amount - discount_amount - refund_amount - (quantity * products.unit_cost)) / SUM(order_amount)) * 100 AS profit_margin_pct,
  SUM(order_amount) / COUNT(DISTINCT customer_id) AS revenue_per_customer,

  -- Dimensions from Kyvos cube
  orders.order_date,
  customers.region AS customer_region,
  customers.segment AS customer_segment,
  products.category AS product_category,
  products.subcategory AS product_subcategory,
  products.brand AS product_brand

FROM main.sales.orders
JOIN main.sales.customers ON orders.customer_id = customers.customer_id
JOIN main.sales.products ON orders.product_id = products.product_id;

YAML definition approach:

yaml
metric_view:
  name: sales_metrics
  catalog: main
  schema: sales

  source_table: main.sales.orders

  joins:
    - table: main.sales.customers
      on: orders.customer_id = customers.customer_id
      type: inner

    - table: main.sales.products
      on: orders.product_id = products.product_id
      type: inner

  measures:
    - name: total_revenue
      expr: SUM(order_amount)
      description: "Total order revenue including discounts"
      data_type: decimal

    - name: net_revenue
      expr: SUM(order_amount - discount_amount - refund_amount)
      description: "Revenue after discounts and refunds"
      data_type: decimal

    - name: order_count
      expr: COUNT(DISTINCT order_id)
      description: "Number of unique orders"
      data_type: bigint

    - name: avg_order_value
      expr: SUM(order_amount) / COUNT(DISTINCT order_id)
      description: "Average revenue per order"
      data_type: decimal

  dimensions:
    - name: order_date
      expr: order_date
      data_type: date

    - name: customer_region
      expr: customers.region
      data_type: string

    - name: product_category
      expr: products.category
      data_type: string

Step 3: Convert MDX Measures to SQL

Simple aggregation translation:

Kyvos MDX:
[Measures].[Total Revenue]
AGGREGATE: SUM
SOURCE: [order_amount]

Databricks SQL:
SUM(order_amount) AS total_revenue

Calculated measure translation:

Kyvos MDX:
[Measures].[Profit Margin] =
  ([Measures].[Revenue] - [Measures].[Cost]) / [Measures].[Revenue]

Databricks SQL:
(SUM(order_amount) - SUM(quantity * unit_cost)) / SUM(order_amount) AS profit_margin

Distinct count translation:

Kyvos MDX:
[Measures].[Customer Count]
AGGREGATE: DistinctCount
DIMENSION: [Customer].[Customer ID]

Databricks SQL:
COUNT(DISTINCT customer_id) AS customer_count

Ratio measure translation:

Kyvos MDX:
[Measures].[Units Per Order] =
  [Measures].[Total Quantity] / [Measures].[Order Count]

Databricks SQL:
SUM(quantity) / COUNT(DISTINCT order_id) AS units_per_order

Time intelligence translation:

Kyvos MDX:
[Measures].[YTD Revenue] =
  SUM(YTD([Date].[Calendar].[Date]), [Measures].[Revenue])

Databricks SQL:
SUM(order_amount) OVER (
  PARTITION BY YEAR(order_date)
  ORDER BY order_date
  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS ytd_revenue

Step 4: Flatten Hierarchical Dimensions

Product hierarchy example:

Kyvos Hierarchy:
[Product].[Category].[Subcategory].[Product Name]
- Electronics
  - Computers
    - Laptop-15inch-Intel
    - Desktop-Tower-AMD
  - Mobile Devices
    - Smartphone-5G

Databricks flattening approach:

sql
-- Create flattened dimension table
CREATE TABLE main.sales.products_flat AS
SELECT
  product_id,
  product_name,
  -- Level 1: Category
  category,
  -- Level 2: Subcategory
  subcategory,
  -- Level 3: Product name already captured

  -- Composite keys for intermediate rollups
  category AS level1_key,
  CONCAT(category, '|', subcategory) AS level2_key,
  CONCAT(category, '|', subcategory, '|', product_name) AS level3_key,

  -- Display labels
  category AS level1_label,
  subcategory AS level2_label,
  product_name AS level3_label
FROM main.sales.products;

-- Use in metric view
CREATE METRIC VIEW main.sales.product_hierarchy_metrics AS
SELECT
  SUM(order_amount) AS total_revenue,

  -- Multiple dimension columns for different hierarchy levels
  p.category,
  p.subcategory,
  p.product_name,
  p.level2_key,  -- For category + subcategory rollups
  p.level3_key   -- For full hierarchy path

FROM main.sales.orders o
JOIN main.sales.products_flat p ON o.product_id = p.product_id;
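
A rollup query sketch against the flattened dimension, using GROUP BY ROLLUP to emulate the category-to-subcategory drill path that Kyvos handled automatically:

sql
-- Category totals plus subcategory breakdowns from the flattened hierarchy
SELECT
  p.category,
  p.subcategory,
  SUM(o.order_amount) AS revenue
FROM main.sales.orders o
JOIN main.sales.products_flat p ON o.product_id = p.product_id
GROUP BY ROLLUP (p.category, p.subcategory)
ORDER BY p.category, p.subcategory;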

Date hierarchy implementation:

sql
-- Create comprehensive date dimension
CREATE TABLE main.common.date_dimension (
    date_key DATE PRIMARY KEY,

    -- Day level
    day_of_week INT,
    day_name STRING,
    day_of_month INT,
    day_of_year INT,

    -- Week level
    week_of_year INT,
    week_start_date DATE,
    week_end_date DATE,

    -- Month level
    month INT,
    month_name STRING,
    month_abbr STRING,
    month_start_date DATE,
    month_end_date DATE,

    -- Quarter level
    quarter INT,
    quarter_name STRING,
    quarter_start_date DATE,
    quarter_end_date DATE,

    -- Year level
    year INT,

    -- Fiscal calendar
    fiscal_year INT,
    fiscal_quarter INT,
    fiscal_month INT,

    -- Business day indicators
    is_weekend BOOLEAN,
    is_holiday BOOLEAN,
    is_business_day BOOLEAN
) USING DELTA;

-- Populate date dimension
INSERT INTO main.common.date_dimension
SELECT
  d.date_key,
  DAYOFWEEK(d.date_key) AS day_of_week,
  DATE_FORMAT(d.date_key, 'EEEE') AS day_name,
  DAYOFMONTH(d.date_key) AS day_of_month,
  DAYOFYEAR(d.date_key) AS day_of_year,
  WEEKOFYEAR(d.date_key) AS week_of_year,
  DATE_TRUNC('week', d.date_key) AS week_start_date,
  DATE_ADD(DATE_TRUNC('week', d.date_key), 6) AS week_end_date,
  MONTH(d.date_key) AS month,
  DATE_FORMAT(d.date_key, 'MMMM') AS month_name,
  DATE_FORMAT(d.date_key, 'MMM') AS month_abbr,
  DATE_TRUNC('month', d.date_key) AS month_start_date,
  LAST_DAY(d.date_key) AS month_end_date,
  QUARTER(d.date_key) AS quarter,
  CONCAT('Q', QUARTER(d.date_key)) AS quarter_name,
  DATE_TRUNC('quarter', d.date_key) AS quarter_start_date,
  LAST_DAY(ADD_MONTHS(DATE_TRUNC('quarter', d.date_key), 2)) AS quarter_end_date,
  YEAR(d.date_key) AS year,
  -- Fiscal year logic (example: fiscal year starts July 1)
  CASE
    WHEN MONTH(d.date_key) >= 7 THEN YEAR(d.date_key)
    ELSE YEAR(d.date_key) - 1
  END AS fiscal_year,
  CASE
    WHEN MONTH(d.date_key) >= 7 THEN ((MONTH(d.date_key) - 7) DIV 3) + 1
    ELSE ((MONTH(d.date_key) + 5) DIV 3) + 1
  END AS fiscal_quarter,
  CASE
    WHEN MONTH(d.date_key) >= 7 THEN MONTH(d.date_key) - 6
    ELSE MONTH(d.date_key) + 6
  END AS fiscal_month,
  CASE WHEN DAYOFWEEK(d.date_key) IN (1, 7) THEN TRUE ELSE FALSE END AS is_weekend,
  FALSE AS is_holiday,  -- Populate separately
  CASE WHEN DAYOFWEEK(d.date_key) NOT IN (1, 7) THEN TRUE ELSE FALSE END AS is_business_day
FROM (
  SELECT EXPLODE(SEQUENCE(DATE('2020-01-01'), DATE('2030-12-31'), INTERVAL 1 DAY)) AS date_key
) d;
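
The is_holiday flag above is left FALSE and populated separately. A minimal sketch, assuming a hypothetical main.common.holiday_calendar table with a holiday_date column:

sql
-- Mark holidays from a separately maintained calendar table (hypothetical)
MERGE INTO main.common.date_dimension d
USING main.common.holiday_calendar h
  ON d.date_key = h.holiday_date
WHEN MATCHED THEN UPDATE SET
  d.is_holiday = TRUE,
  d.is_business_day = FALSE;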

Step 5: Implement Time Intelligence

Period-over-period calculations:

sql
CREATE METRIC VIEW main.sales.time_comparison_metrics AS
SELECT
  SUM(order_amount) AS current_revenue,

  -- Prior period comparison
  LAG(SUM(order_amount), 1) OVER (
    PARTITION BY customer_region
    ORDER BY date_dim.year, date_dim.month
  ) AS prior_month_revenue,

  -- Calculate month-over-month growth
  (SUM(order_amount) - LAG(SUM(order_amount), 1) OVER (
    PARTITION BY customer_region
    ORDER BY date_dim.year, date_dim.month
  )) / LAG(SUM(order_amount), 1) OVER (
    PARTITION BY customer_region
    ORDER BY date_dim.year, date_dim.month
  ) AS mom_growth_rate,

  -- Year-over-year comparison (12 monthly rows back per region)
  LAG(SUM(order_amount), 12) OVER (
    PARTITION BY customer_region
    ORDER BY date_dim.year, date_dim.month
  ) AS prior_year_revenue,

  -- Dimensions
  date_dim.year,
  date_dim.month,
  date_dim.month_name,
  customers.region AS customer_region

FROM main.sales.orders
JOIN main.common.date_dimension date_dim ON orders.order_date = date_dim.date_key
JOIN main.sales.customers ON orders.customer_id = customers.customer_id
GROUP BY date_dim.year, date_dim.month, date_dim.month_name, customers.region;

Year-to-date calculations:

sql
CREATE METRIC VIEW main.sales.ytd_metrics AS
SELECT
  -- Current period revenue
  SUM(order_amount) AS revenue,

  -- YTD revenue using window clause
  SUM(SUM(order_amount)) OVER (
    PARTITION BY date_dim.year, customer_region
    ORDER BY date_dim.date_key
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS ytd_revenue,

  -- MTD revenue
  SUM(SUM(order_amount)) OVER (
    PARTITION BY date_dim.year, date_dim.month, customer_region
    ORDER BY date_dim.date_key
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS mtd_revenue,

  -- QTD revenue
  SUM(SUM(order_amount)) OVER (
    PARTITION BY date_dim.year, date_dim.quarter, customer_region
    ORDER BY date_dim.date_key
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS qtd_revenue,

  -- Dimensions
  date_dim.date_key,
  date_dim.year,
  date_dim.quarter,
  date_dim.month,
  customers.region AS customer_region

FROM main.sales.orders
JOIN main.common.date_dimension date_dim ON orders.order_date = date_dim.date_key
JOIN main.sales.customers ON orders.customer_id = customers.customer_id
GROUP BY date_dim.date_key, date_dim.year, date_dim.quarter, date_dim.month, customers.region;

Moving average calculations:

sql
CREATE METRIC VIEW main.sales.moving_average_metrics AS
SELECT
  SUM(order_amount) AS daily_revenue,

  -- 7-day moving average
  AVG(SUM(order_amount)) OVER (
    PARTITION BY customer_region
    ORDER BY date_dim.date_key
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ) AS ma_7day,

  -- 30-day moving average
  AVG(SUM(order_amount)) OVER (
    PARTITION BY customer_region
    ORDER BY date_dim.date_key
    ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
  ) AS ma_30day,

  -- 90-day moving average
  AVG(SUM(order_amount)) OVER (
    PARTITION BY customer_region
    ORDER BY date_dim.date_key
    ROWS BETWEEN 89 PRECEDING AND CURRENT ROW
  ) AS ma_90day,

  date_dim.date_key,
  customers.region AS customer_region

FROM main.sales.orders
JOIN main.common.date_dimension date_dim ON orders.order_date = date_dim.date_key
JOIN main.sales.customers ON orders.customer_id = customers.customer_id
GROUP BY date_dim.date_key, customers.region;

Step 6: Configure Security and Governance

Row-level security implementation:

sql
-- Apply row filters on dimension tables
CREATE OR REPLACE TABLE main.sales.customers (
    customer_id STRING PRIMARY KEY,
    customer_name STRING,
    region STRING,
    segment STRING,
    sales_rep STRING
) USING DELTA;

-- Create row access policy
CREATE FUNCTION main.sales.region_filter(region STRING)
RETURNS BOOLEAN
RETURN EXISTS (
  SELECT 1
  FROM main.security.user_region_access
  WHERE user_email = current_user()
    AND authorized_region = region
);

-- Apply filter to table
ALTER TABLE main.sales.customers
SET ROW FILTER main.sales.region_filter(region);

-- Metric views automatically inherit row-level security
CREATE METRIC VIEW main.sales.regional_sales AS
SELECT
  SUM(order_amount) AS revenue,
  COUNT(DISTINCT order_id) AS order_count,
  customers.region
FROM main.sales.orders
JOIN main.sales.customers ON orders.customer_id = customers.customer_id;
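
The region_filter function reads from an access-mapping table that must exist before the row filter is applied. A minimal sketch, assuming a main.security schema; the rows shown are illustrative:

sql
-- Access-mapping table consulted by the row filter function
CREATE SCHEMA IF NOT EXISTS main.security;

CREATE TABLE IF NOT EXISTS main.security.user_region_access (
    user_email STRING NOT NULL,
    authorized_region STRING NOT NULL
) USING DELTA;

INSERT INTO main.security.user_region_access VALUES
  ('analyst.west@company.com', 'West'),
  ('analyst.east@company.com', 'East');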

Column masking:

sql
-- Mask sensitive customer information
CREATE FUNCTION main.sales.mask_customer_name(name STRING)
RETURNS STRING
RETURN CASE
  WHEN is_member('sensitive_data_access') THEN name
  ELSE CONCAT(LEFT(name, 1), '***')
END;

ALTER TABLE main.sales.customers
ALTER COLUMN customer_name
SET MASK main.sales.mask_customer_name;

Grant metric view access:

sql
-- Grant access to specific roles
GRANT SELECT ON METRIC VIEW main.sales.sales_metrics TO `sales_analyst`;
GRANT SELECT ON METRIC VIEW main.sales.finance_metrics TO `finance_team`;

-- Grant access to all metric views in schema
GRANT SELECT ON SCHEMA main.sales TO `executive_team`;

Add certification metadata:

sql
ALTER METRIC VIEW main.sales.sales_metrics
SET TBLPROPERTIES (
  'certified' = 'true',
  'owner' = 'sales_analytics',
  'last_validated' = '2025-01-15',
  'validation_status' = 'passed',
  'source_system' = 'migrated_from_kyvos'
);

-- Add discovery tags
ALTER METRIC VIEW main.sales.sales_metrics
SET TAGS (
  'domain' = 'sales',
  'sensitivity' = 'internal',
  'pii' = 'false',
  'refresh_frequency' = 'real_time'
);

Step 7: Optimize for Query Performance

Implement materialized aggregations:

sql
-- Create materialized view for frequently queried aggregations
CREATE MATERIALIZED VIEW main.sales.daily_sales_rollup AS
SELECT
  order_date,
  customers.region AS customer_region,
  products.category AS product_category,
  SUM(order_amount) AS daily_revenue,
  COUNT(DISTINCT customer_id) AS daily_customers,
  COUNT(DISTINCT order_id) AS daily_orders,
  SUM(quantity) AS daily_quantity
FROM main.sales.orders
JOIN main.sales.customers ON orders.customer_id = customers.customer_id
JOIN main.sales.products ON orders.product_id = products.product_id
GROUP BY order_date, customer_region, product_category;

-- Query materialized view instead of raw data
SELECT
  order_date,
  customer_region,
  SUM(daily_revenue) AS revenue
FROM main.sales.daily_sales_rollup
WHERE order_date >= '2024-01-01'
GROUP BY order_date, customer_region;

Configure table optimization:

sql
-- Enable auto-optimize for Delta tables
ALTER TABLE main.sales.orders
SET TBLPROPERTIES (
  'delta.autoOptimize.optimizeWrite' = 'true',
  'delta.autoOptimize.autoCompact' = 'true'
);

-- Schedule regular optimization
CREATE OR REPLACE PROCEDURE optimize_sales_tables()
LANGUAGE SQL
AS
BEGIN
  OPTIMIZE main.sales.orders ZORDER BY (order_date, customer_id);
  OPTIMIZE main.sales.customers;
  OPTIMIZE main.sales.products;
END;

-- Create job to run optimization
CREATE JOB daily_optimization
SCHEDULE CRON '0 2 * * *'  -- 2 AM daily
AS
CALL optimize_sales_tables();

Implement caching strategy:

sql
-- Delta cache automatically handles frequently accessed data
-- Query results cache for repeated identical queries

-- Force cache refresh when needed
REFRESH METRIC VIEW main.sales.sales_metrics;

-- Monitor cache effectiveness
SELECT
  table_name,
  cache_hit_ratio,
  bytes_cached,
  bytes_read
FROM system.information_schema.table_cache_statistics
WHERE table_schema = 'sales'
ORDER BY cache_hit_ratio ASC;

Step 8: Migrate BI Tool Connections

Tableau connection update:

Before (Kyvos):
Server: kyvos-server.company.com
Port: 8080
Connection Type: Kyvos OLAP
Authentication: Kyvos credentials

After (Databricks):
Server: company.databricks.com
HTTP Path: /sql/1.0/warehouses/warehouse_id
Connection Type: Databricks
Authentication: Personal Access Token

Query pattern translation:

sql
-- Kyvos MDX query pattern:
-- SELECT [Measures].[Revenue] ON COLUMNS,
-- [Date].[Year].[Year].MEMBERS ON ROWS
-- FROM [Sales_Cube]
-- WHERE [Product].[Category].[Electronics]

-- Databricks SQL equivalent:
SELECT
  date_dim.year,
  MEASURE(total_revenue) AS revenue
FROM main.sales.sales_metrics
JOIN main.common.date_dimension date_dim
  ON sales_metrics.order_date = date_dim.date_key
WHERE product_category = 'Electronics'
GROUP BY date_dim.year
ORDER BY date_dim.year;

Power BI semantic model update:

Power BI DirectQuery Configuration:

1. Remove Kyvos connector
2. Install Databricks connector from AppSource
3. Connect to SQL Warehouse endpoint
4. Import metric views as DirectQuery tables
5. Update DAX measures:

Before:
Total Revenue = SUM(Sales[Revenue])

After:
Total Revenue = SUM('sales_metrics'[total_revenue])

6. Refresh dataset schema
7. Validate dashboard visuals

Excel connection update:

Excel Configuration:

1. Remove MDX connection to Kyvos
2. Add new connection via Data > Get Data > Databricks
3. Configure connection:
   - Server: company.databricks.com
   - HTTP Path: /sql/1.0/warehouses/warehouse_id
   - Authentication: Azure Active Directory
4. Load metric view as Table
5. Create PivotTables using metric view data

Step 9: Validate Metric Accuracy

Create validation framework:

sql
-- Export Kyvos results for validation
-- (Assume Kyvos data exported to validation schema)

CREATE TABLE main.validation.kyvos_export_results AS
SELECT * FROM parquet.`s3://migration/kyvos_export/sales_metrics/*.parquet`;

-- Compare with Databricks metric view
CREATE TABLE main.validation.comparison_results AS
SELECT
  COALESCE(k.order_date, d.order_date) AS order_date,
  COALESCE(k.region, d.region) AS region,
  k.revenue AS kyvos_revenue,
  d.revenue AS databricks_revenue,
  ABS(k.revenue - d.revenue) AS revenue_diff,
  ABS(k.revenue - d.revenue) / NULLIF(k.revenue, 0) * 100 AS revenue_diff_pct,
  k.order_count AS kyvos_order_count,
  d.order_count AS databricks_order_count,
  ABS(k.order_count - d.order_count) AS order_count_diff,
  CASE
    WHEN ABS(k.revenue - d.revenue) / NULLIF(k.revenue, 0) * 100 <= 0.01 THEN 'PASS'
    ELSE 'FAIL'
  END AS validation_status
FROM main.validation.kyvos_export_results k
FULL OUTER JOIN (
  SELECT
    order_date,
    customer_region AS region,
    MEASURE(total_revenue) AS revenue,
    MEASURE(order_count) AS order_count
  FROM main.sales.sales_metrics
  GROUP BY order_date, customer_region
) d ON k.order_date = d.order_date AND k.region = d.region;

-- Report validation failures
SELECT
  COUNT(*) AS total_comparisons,
  SUM(CASE WHEN validation_status = 'PASS' THEN 1 ELSE 0 END) AS passed,
  SUM(CASE WHEN validation_status = 'FAIL' THEN 1 ELSE 0 END) AS failed,
  AVG(revenue_diff_pct) AS avg_revenue_diff_pct,
  MAX(revenue_diff_pct) AS max_revenue_diff_pct
FROM main.validation.comparison_results;

Automated validation queries:

sql
-- Validate measure totals
WITH kyvos_totals AS (
  SELECT
    SUM(revenue) AS total_revenue,
    SUM(order_count) AS total_orders
  FROM main.validation.kyvos_export_results
),
databricks_totals AS (
  SELECT
    SUM(MEASURE(total_revenue)) AS total_revenue,
    SUM(MEASURE(order_count)) AS total_orders
  FROM main.sales.sales_metrics
)
SELECT
  k.total_revenue AS kyvos_total_revenue,
  d.total_revenue AS databricks_total_revenue,
  ABS(k.total_revenue - d.total_revenue) / k.total_revenue * 100 AS revenue_variance_pct,
  k.total_orders AS kyvos_total_orders,
  d.total_orders AS databricks_total_orders,
  ABS(k.total_orders - d.total_orders) AS order_variance,
  CASE
    WHEN ABS(k.total_revenue - d.total_revenue) / k.total_revenue * 100 <= 0.01
      AND ABS(k.total_orders - d.total_orders) = 0
    THEN 'VALIDATION PASSED'
    ELSE 'VALIDATION FAILED'
  END AS validation_result
FROM kyvos_totals k, databricks_totals d;

Step 10: Apply Typedef for Data Transformation

Typedef's Fenic framework provides semantic data processing capabilities for migration preprocessing:

Extract and classify measures:

python
import fenic as fc
from pydantic import BaseModel, Field
from typing import List, Literal

# Configure session
config = fc.SessionConfig(
    default_model="gpt-4o-mini",
    max_retries=3,
    timeout=30
)

# Session creation via Fenic's get_or_create entry point
# (config fields above are indicative; align them with your Fenic version)
session = fc.Session.get_or_create(config)

# Define measure classification schema
class MeasureClassification(BaseModel):
    measure_type: Literal["simple_sum", "simple_count", "simple_avg",
                          "calculated", "distinct_count", "semi_additive",
                          "time_intelligence", "mdx_specific"]
    sql_translation: str = Field(description="Equivalent Spark SQL expression")
    dependencies: List[str] = Field(description="List of dependent columns or measures")
    translation_notes: str = Field(description="Notes about translation approach")

# Load Kyvos measure definitions
measures_df = fc.read_csv("kyvos_measure_export.csv")
# Classify each measure
classified_measures = (
    measures_df
    .with_column(
        "classification",
        fc.semantic.extract(
            fc.col("measure_definition"),
            MeasureClassification
        )
    )
    .unnest("classification")
)

# Export classification results
# Note: Write operations not documented in Fenic - use alternative export method
classified_measures.collect()  # Or handle export through other means

Transform hierarchical dimension data:

python
from pydantic import BaseModel, Field
from typing import List

class HierarchyStructure(BaseModel):
    dimension_name: str
    levels: List[str]
    level_count: int
    has_ragged_structure: bool
    flattening_approach: str = Field(description="Recommended approach to flatten hierarchy")

# Load Kyvos hierarchy definitions
hierarchy_df = fc.read_json("kyvos_hierarchy_export.json")

# Analyze hierarchy structures
analyzed = (
    hierarchy_df
    .with_column(
        "structure",
        fc.semantic.extract(
            fc.col("hierarchy_definition"),
            HierarchyStructure
        )
    )
    .unnest("structure")
)

# Generate flattening SQL
flattening_sql = (
    analyzed
    .with_column(
        "ddl_statement",
        fc.semantic.map(
            fc.col("flattening_approach"),
            "Generate CREATE TABLE SQL for this dimension flattening approach"
        )
    )
)

# Note: Write operations not documented in Fenic - use alternative export method
flattening_sql.collect()  # Or handle export through other means

Validate measure logic:

python
class MeasureValidation(BaseModel):
    is_valid: bool
    validation_notes: str
    recommended_test_queries: List[str]

# Validate translated measures
validation_df = fc.read_csv("translated_measures.csv")

validated = (
    validation_df
    .with_column(
        "validation",
        fc.semantic.extract(
            fc.col("sql_translation"),
            MeasureValidation
        )
    )
    .unnest("validation")
    .filter(fc.col("is_valid") == True)
)

# Note: Write operations not documented in Fenic - use alternative export method
validated.collect()  # Or handle export through other means

Enrich customer dimensions:

python
class CustomerSegment(BaseModel):
    segment_type: Literal["enterprise", "mid_market", "small_business", "individual"]
    value_tier: Literal["high_value", "medium_value", "low_value"]
    engagement_level: Literal["active", "moderate", "dormant"]

# Load customer transaction data
customer_df = fc.read_parquet("s3://migration/customer_transactions/*.parquet")

# Apply semantic enrichment
enriched = (
    customer_df
    .with_column(
        "segment",
        fc.semantic.extract(
            fc.col("transaction_history"),
            CustomerSegment
        )
    )
    .unnest("segment")
)

# Write enriched customer dimension
# Note: Write operations not documented in Fenic - use alternative export method
enriched.collect()  # Or handle export through other means

Post-Migration Optimization

Monitor Query Performance

Track metric view usage:

sql
-- Query audit logs
SELECT
  request_params.metric_view,
  COUNT(*) AS query_count,
  AVG(execution_duration_ms) AS avg_duration_ms,
  PERCENTILE(execution_duration_ms, 0.50) AS p50_duration_ms,
  PERCENTILE(execution_duration_ms, 0.95) AS p95_duration_ms,
  PERCENTILE(execution_duration_ms, 0.99) AS p99_duration_ms,
  MAX(execution_duration_ms) AS max_duration_ms
FROM system.access.audit
WHERE action_name = 'queryMetricView'
  AND event_date >= CURRENT_DATE - 7
GROUP BY request_params.metric_view
ORDER BY query_count DESC;

Identify optimization opportunities:

sql
-- Find slow queries
SELECT
  request_params.metric_view,
  request_params.query_text,
  execution_duration_ms,
  rows_produced,
  bytes_scanned,
  user_identity.email
FROM system.access.audit
WHERE action_name = 'queryMetricView'
  AND execution_duration_ms > 10000  -- Queries over 10 seconds
  AND event_date >= CURRENT_DATE - 1
ORDER BY execution_duration_ms DESC
LIMIT 50;

Monitor warehouse utilization:

sql
-- Warehouse performance metrics
SELECT
  warehouse_id,
  warehouse_name,
  DATE(start_time) AS date,
  COUNT(*) AS query_count,
  SUM(execution_duration) AS total_execution_time_ms,
  AVG(execution_duration) AS avg_execution_time_ms,
  SUM(rows_produced) AS total_rows_produced,
  SUM(CASE WHEN error_message IS NOT NULL THEN 1 ELSE 0 END) AS failed_queries
FROM system.query.history
WHERE start_time >= CURRENT_DATE - 7
GROUP BY warehouse_id, warehouse_name, DATE(start_time)
ORDER BY date DESC, total_execution_time_ms DESC;

Implement Performance Tuning

Create aggregation tables for hot paths:

sql
-- Monthly aggregation for reporting
CREATE TABLE main.sales.monthly_aggregates AS
SELECT
  DATE_TRUNC('month', order_date) AS month,
  customers.region AS customer_region,
  products.category AS product_category,
  SUM(order_amount) AS monthly_revenue,
  COUNT(DISTINCT customer_id) AS monthly_customers,
  COUNT(DISTINCT order_id) AS monthly_orders,
  SUM(quantity) AS monthly_quantity
FROM main.sales.orders
JOIN main.sales.customers ON orders.customer_id = customers.customer_id
JOIN main.sales.products ON orders.product_id = products.product_id
GROUP BY DATE_TRUNC('month', order_date), customer_region, product_category;

-- Use aggregation table for month-level queries
SELECT
  month,
  customer_region,
  SUM(monthly_revenue) AS revenue
FROM main.sales.monthly_aggregates
WHERE month >= '2024-01-01'
GROUP BY month, customer_region;

Optimize partition strategy:

sql
-- Repartition fact table by frequently filtered columns
CREATE OR REPLACE TABLE main.sales.orders
USING DELTA
PARTITIONED BY (order_year, order_month)
AS
SELECT
  *,
  YEAR(order_date) AS order_year,
  MONTH(order_date) AS order_month
FROM main.sales.orders_staging;

-- Analyze partition statistics
DESCRIBE DETAIL main.sales.orders;
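
A quick check that the new partition columns are actually pruned for typical month-level filters; inspect the physical plan for partition filters on order_year and order_month:

sql
-- Verify partition pruning on the repartitioned fact table
EXPLAIN
SELECT SUM(order_amount) AS revenue
FROM main.sales.orders
WHERE order_year = 2024
  AND order_month = 6;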

Establish Data Quality Monitoring

Create quality check framework:

sql
CREATE TABLE main.monitoring.metric_quality_log (
    check_timestamp TIMESTAMP,
    metric_view STRING,
    check_type STRING,
    check_status STRING,
    error_details STRING,
    actual_value DECIMAL(18,2),
    expected_range STRING
);

-- Quality check procedure
CREATE OR REPLACE PROCEDURE validate_metric_quality()
LANGUAGE SQL
AS
BEGIN
  -- Check for NULL measures
  INSERT INTO main.monitoring.metric_quality_log
  SELECT
    CURRENT_TIMESTAMP,
    'main.sales.sales_metrics',
    'null_check',
    CASE WHEN COUNT(*) = 0 THEN 'PASS' ELSE 'FAIL' END,
    CASE WHEN COUNT(*) > 0 THEN CONCAT('Found ', COUNT(*), ' NULL values') ELSE NULL END,
    COUNT(*),
    '0'
  FROM (
    SELECT order_date
    FROM main.sales.sales_metrics
    WHERE MEASURE(total_revenue) IS NULL
  );

  -- Check for negative revenue
  INSERT INTO main.monitoring.metric_quality_log
  SELECT
    CURRENT_TIMESTAMP,
    'main.sales.sales_metrics',
    'negative_value_check',
    CASE WHEN MIN(MEASURE(total_revenue)) >= 0 THEN 'PASS' ELSE 'FAIL' END,
    CASE WHEN MIN(MEASURE(total_revenue)) < 0 THEN 'Negative revenue detected' ELSE NULL END,
    MIN(MEASURE(total_revenue)),
    '>= 0'
  FROM main.sales.sales_metrics;

  -- Check for data freshness
  INSERT INTO main.monitoring.metric_quality_log
  SELECT
    CURRENT_TIMESTAMP,
    'main.sales.sales_metrics',
    'freshness_check',
    CASE WHEN MAX(order_date) >= CURRENT_DATE - 1 THEN 'PASS' ELSE 'FAIL' END,
    CASE WHEN MAX(order_date) < CURRENT_DATE - 1 THEN 'Stale data detected' ELSE NULL END,
    DATEDIFF(DAY, MAX(order_date), CURRENT_DATE),
    '<= 1 day'
  FROM main.sales.sales_metrics;
END;

-- Schedule quality checks
CREATE JOB metric_quality_checks
SCHEDULE CRON '0 */4 * * *'  -- Every 4 hours
AS
CALL validate_metric_quality();

Alert on quality failures:

sql
-- Query recent quality check failures
SELECT
  check_timestamp,
  metric_view,
  check_type,
  error_details,
  actual_value,
  expected_range
FROM main.monitoring.metric_quality_log
WHERE check_status = 'FAIL'
  AND check_timestamp >= CURRENT_TIMESTAMP - INTERVAL 24 HOURS
ORDER BY check_timestamp DESC;

Document Migration Artifacts

Create metadata registry:

sql
CREATE TABLE main.documentation.migration_metadata (
    metric_view STRING,
    source_kyvos_cube STRING,
    migration_date DATE,
    validation_status STRING,
    known_differences STRING,
    performance_baseline STRING,
    owner STRING,
    documentation_url STRING
);

INSERT INTO main.documentation.migration_metadata VALUES
(
  'main.sales.sales_metrics',
  'Sales_Analysis_Cube',
  '2025-01-15',
  'VALIDATED',
  'Profit margin calculation: 0.05% difference due to NULL handling approach',
  'P95 latency: 2.3s for typical queries on 500M rows',
  'sales_analytics_team@company.com',
  'https://wiki.company.com/databricks/metric-views/sales-metrics'
);

Version control metric definitions:

Store metric view YAML definitions in source control:

yaml
# git repository: databricks-metric-views/sales/sales_metrics.yaml

version: 1.0.0
migration_date: 2025-01-15
source_system: kyvos
source_cube: Sales_Analysis_Cube

validation:
  status: passed
  validation_date: 2025-01-20
  total_revenue_variance: 0.001%
  order_count_variance: 0

known_differences:
  - measure: profit_margin
    difference: 0.05%
    reason: "Different NULL handling: Databricks uses COALESCE, Kyvos ignored NULLs"
    impact: negligible

  - feature: hierarchical_drilldown
    difference: manual_implementation
    reason: "Databricks requires explicit dimension levels, Kyvos had automatic drill-down"
    impact: "BI tools must specify dimension granularity in queries"

performance:
  baseline_query_latency_p95: 2.3s
  data_volume: 500M_rows
  warehouse_size: medium

metric_view:
  name: sales_metrics
  catalog: main
  schema: sales
  # ... rest of metric view definition

Migration Patterns and Solutions

Pattern 1: Pre-Aggregation Performance Gap

Scenario: Kyvos MOLAP cubes provide sub-second query response through pre-aggregation. Databricks on-demand computation shows higher latency for cold queries.

Solution approaches:

Materialized views for high-frequency queries:

sql
CREATE MATERIALIZED VIEW main.sales.hourly_revenue AS
SELECT
  DATE_TRUNC('hour', order_timestamp) AS hour,
  customers.region AS customer_region,
  products.category AS product_category,
  SUM(order_amount) AS hourly_revenue,
  COUNT(DISTINCT order_id) AS hourly_orders,
  COUNT(DISTINCT customer_id) AS hourly_customers
FROM main.sales.orders
JOIN main.sales.customers ON orders.customer_id = customers.customer_id
JOIN main.sales.products ON orders.product_id = products.product_id
GROUP BY DATE_TRUNC('hour', order_timestamp), customer_region, product_category;

-- Query materialized view instead of raw data
SELECT
  hour,
  customer_region,
  SUM(hourly_revenue) AS revenue
FROM main.sales.hourly_revenue
WHERE hour >= CURRENT_TIMESTAMP - INTERVAL 24 HOURS
GROUP BY hour, customer_region;

Pre-computed rollup tables:

sql
-- Daily rollup
CREATE TABLE main.sales.daily_rollup AS
SELECT
  order_date,
  customers.region AS customer_region,
  products.category AS product_category,
  SUM(order_amount) AS daily_revenue,
  COUNT(DISTINCT order_id) AS daily_orders,
  AVG(order_amount) AS avg_order_value
FROM main.sales.orders
JOIN main.sales.customers ON orders.customer_id = customers.customer_id
JOIN main.sales.products ON orders.product_id = products.product_id
GROUP BY order_date, customer_region, product_category;

-- Schedule daily refresh
CREATE JOB refresh_daily_rollup
SCHEDULE CRON '0 1 * * *'
AS
MERGE INTO main.sales.daily_rollup t
USING (
  SELECT
    order_date,
    customers.region AS customer_region,
    products.category AS product_category,
    SUM(order_amount) AS daily_revenue,
    COUNT(DISTINCT order_id) AS daily_orders,
    AVG(order_amount) AS avg_order_value
  FROM main.sales.orders
  JOIN main.sales.customers ON orders.customer_id = customers.customer_id
  JOIN main.sales.products ON orders.product_id = products.product_id
  WHERE order_date >= CURRENT_DATE - 7
  GROUP BY order_date, customer_region, product_category
) s
ON t.order_date = s.order_date
  AND t.customer_region = s.customer_region
  AND t.product_category = s.product_category
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;

Pattern 2: MDX to SQL Translation

Scenario: BI tools use MDX queries that require translation to Spark SQL.

Common MDX patterns and SQL equivalents:

TOPCOUNT translation:

MDX:
TOPCOUNT([Customer].[Customer].MEMBERS, 10, [Measures].[Revenue])

Spark SQL:
SELECT
  customer_id,
  customer_name,
  MEASURE(total_revenue) AS revenue
FROM main.sales.sales_metrics
GROUP BY customer_id, customer_name
ORDER BY MEASURE(total_revenue) DESC
LIMIT 10;

CROSSJOIN translation:

MDX:
CROSSJOIN([Product].[Category].MEMBERS, [Date].[Year].MEMBERS)

Spark SQL:
SELECT DISTINCT
  product_category,
  year
FROM main.sales.sales_metrics
CROSS JOIN main.common.date_dimension
ORDER BY product_category, year;

FILTER translation:

MDX:
FILTER([Customer].[Region].MEMBERS, [Measures].[Revenue] > 1000000)

Spark SQL:
SELECT
  customer_region,
  MEASURE(total_revenue) AS revenue
FROM main.sales.sales_metrics
GROUP BY customer_region
HAVING MEASURE(total_revenue) > 1000000;

Pattern 3: Semi-Additive Measure Handling

Scenario: Account balances and inventory levels that don't sum across time dimension.

Solution:

sql
-- Last balance per period
CREATE METRIC VIEW main.finance.balance_metrics AS
SELECT
  -- Last value for the period
  MAX_BY(account_balance, snapshot_date) AS ending_balance,

  -- Average for the period
  AVG(account_balance) AS avg_balance,

  -- Opening balance (first value)
  MIN_BY(account_balance, snapshot_date) AS opening_balance,

  account_id,
  account_type,
  DATE_TRUNC('month', snapshot_date) AS month
FROM main.finance.daily_balances
GROUP BY account_id, account_type, DATE_TRUNC('month', snapshot_date);

-- Query ending balance (semi-additive across accounts, not time)
SELECT
  month,
  account_type,
  SUM(ending_balance) AS total_ending_balance
FROM main.finance.balance_metrics
WHERE month = '2025-01-01'
GROUP BY month, account_type;

Pattern 4: Parent-Child Hierarchy Flattening

Scenario: Employee reporting structure with unlimited depth.

Solution:

sql
-- Flatten recursive hierarchy
CREATE OR REPLACE TABLE main.hr.employee_hierarchy AS
WITH RECURSIVE org_structure AS (
  -- Base case: CEO and executives
  SELECT
    employee_id,
    employee_name,
    manager_id,
    1 AS level,
    employee_name AS level1_manager,
    CAST(NULL AS STRING) AS level2_manager,
    CAST(NULL AS STRING) AS level3_manager,
    CAST(NULL AS STRING) AS level4_manager,
    CAST(employee_name AS STRING) AS reporting_path
  FROM main.hr.employees
  WHERE manager_id IS NULL

  UNION ALL

  -- Recursive case
  SELECT
    e.employee_id,
    e.employee_name,
    e.manager_id,
    os.level + 1,
    CASE WHEN os.level + 1 = 1 THEN e.employee_name ELSE os.level1_manager END,
    CASE WHEN os.level + 1 = 2 THEN e.employee_name ELSE os.level2_manager END,
    CASE WHEN os.level + 1 = 3 THEN e.employee_name ELSE os.level3_manager END,
    CASE WHEN os.level + 1 = 4 THEN e.employee_name ELSE os.level4_manager END,
    CONCAT(os.reporting_path, ' > ', e.employee_name)
  FROM main.hr.employees e
  JOIN org_structure os ON e.manager_id = os.employee_id
  WHERE os.level < 10  -- Limit recursion depth
)
SELECT * FROM org_structure;

-- Use in metric view
CREATE METRIC VIEW main.hr.headcount_metrics AS
SELECT
  COUNT(DISTINCT employee_id) AS headcount,
  SUM(salary) AS total_compensation,

  level AS org_level,
  level1_manager,
  level2_manager,
  level3_manager
FROM main.hr.employee_hierarchy
GROUP BY level, level1_manager, level2_manager, level3_manager;
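
A query sketch against the flattened org structure, rolling headcount up to one top-level manager's organization (the manager name shown is a placeholder):

sql
-- Headcount by second-level manager within a single top-level org
SELECT
  level2_manager,
  MEASURE(headcount) AS headcount,
  MEASURE(total_compensation) AS total_compensation
FROM main.hr.headcount_metrics
WHERE level1_manager = 'Top Level Manager'
GROUP BY level2_manager
ORDER BY headcount DESC;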

Validation and Testing Framework

Measure Accuracy Validation

Aggregate-level validation:

sql
-- Compare total measures
CREATE OR REPLACE VIEW main.validation.measure_totals_comparison AS
WITH kyvos_totals AS (
  SELECT
    'kyvos' AS source,
    SUM(total_revenue) AS total_revenue,
    SUM(order_count) AS total_orders,
    SUM(customer_count) AS total_customers
  FROM main.validation.kyvos_export
),
databricks_totals AS (
  SELECT
    'databricks' AS source,
    SUM(MEASURE(total_revenue)) AS total_revenue,
    SUM(MEASURE(order_count)) AS total_orders,
    SUM(MEASURE(unique_customers)) AS total_customers
  FROM main.sales.sales_metrics
)
SELECT
  k.total_revenue AS kyvos_revenue,
  d.total_revenue AS databricks_revenue,
  ABS(k.total_revenue - d.total_revenue) AS revenue_diff,
  ABS(k.total_revenue - d.total_revenue) / k.total_revenue * 100 AS revenue_diff_pct,
  k.total_orders AS kyvos_orders,
  d.total_orders AS databricks_orders,
  ABS(k.total_orders - d.total_orders) AS order_diff,
  CASE
    WHEN ABS(k.total_revenue - d.total_revenue) / k.total_revenue * 100 <= 0.01
      AND ABS(k.total_orders - d.total_orders) = 0
    THEN 'PASS'
    ELSE 'FAIL'
  END AS validation_status
FROM kyvos_totals k, databricks_totals d;

Dimension-level validation:

sql
-- Compare by dimension
CREATE OR REPLACE VIEW main.validation.dimension_comparison AS
SELECT
  COALESCE(k.customer_region, d.customer_region) AS region,
  k.revenue AS kyvos_revenue,
  d.revenue AS databricks_revenue,
  ABS(k.revenue - d.revenue) / NULLIF(k.revenue, 0) * 100 AS variance_pct,
  CASE
    WHEN ABS(k.revenue - d.revenue) / NULLIF(k.revenue, 0) * 100 <= 1.0 THEN 'PASS'
    WHEN ABS(k.revenue - d.revenue) / NULLIF(k.revenue, 0) * 100 <= 5.0 THEN 'WARNING'
    ELSE 'FAIL'
  END AS status
FROM main.validation.kyvos_export k
FULL OUTER JOIN (
  SELECT
    customer_region,
    SUM(MEASURE(total_revenue)) AS revenue
  FROM main.sales.sales_metrics
  GROUP BY customer_region
) d ON k.customer_region = d.customer_region
ORDER BY variance_pct DESC;

Query Performance Testing

Benchmark query performance:

sql
-- Create performance test queries
CREATE TABLE main.validation.performance_tests (
    test_id INT,
    test_name STRING,
    query_text STRING,
    expected_max_duration_ms INT
);

INSERT INTO main.validation.performance_tests VALUES
(1, 'daily_revenue_by_region',
 'SELECT order_date, customer_region, MEASURE(total_revenue) FROM main.sales.sales_metrics WHERE order_date >= CURRENT_DATE - 7 GROUP BY order_date, customer_region',
 5000),
(2, 'top_products_ytd',
 'SELECT product_name, MEASURE(total_revenue) FROM main.sales.sales_metrics WHERE YEAR(order_date) = YEAR(CURRENT_DATE) GROUP BY product_name ORDER BY MEASURE(total_revenue) DESC LIMIT 100',
 3000);

-- Execute and log performance
CREATE TABLE main.validation.performance_results (
    test_id INT,
    execution_timestamp TIMESTAMP,
    execution_duration_ms BIGINT,
    rows_returned BIGINT,
    test_status STRING
);

-- Performance test execution procedure
CREATE OR REPLACE PROCEDURE run_performance_tests()
LANGUAGE SQL
AS
BEGIN
  DECLARE test_cursor CURSOR FOR
    SELECT test_id, query_text, expected_max_duration_ms
    FROM main.validation.performance_tests;

  FOR test_record IN test_cursor DO
    -- Execute test query and log results
    -- (Implementation depends on execution framework)
    INSERT INTO main.validation.performance_results
    SELECT
      test_record.test_id,
      CURRENT_TIMESTAMP,
      execution_duration,
      row_count,
      CASE
        WHEN execution_duration <= test_record.expected_max_duration_ms THEN 'PASS'
        ELSE 'FAIL'
      END
    FROM query_execution_log
    WHERE query_id = LAST_QUERY_ID();
  END FOR;
END;

Data Lineage Verification

Validate measure dependencies:

sql
-- Document measure lineage
CREATE TABLE main.documentation.measure_lineage (
    metric_view STRING,
    measure_name STRING,
    source_tables ARRAY<STRING>,
    source_columns ARRAY<STRING>,
    transformation_logic STRING
);

INSERT INTO main.documentation.measure_lineage VALUES
(
  'main.sales.sales_metrics',
  'total_revenue',
  ARRAY('main.sales.orders'),
  ARRAY('order_amount'),
  'SUM(order_amount)'
),
(
  'main.sales.sales_metrics',
  'net_revenue',
  ARRAY('main.sales.orders'),
  ARRAY('order_amount', 'discount_amount', 'refund_amount'),
  'SUM(order_amount - discount_amount - refund_amount)'
);

-- Query lineage for measure
SELECT
  measure_name,
  transformation_logic,
  array_join(source_tables, ', ') AS source_tables,
  array_join(source_columns, ', ') AS source_columns
FROM main.documentation.measure_lineage
WHERE metric_view = 'main.sales.sales_metrics'
ORDER BY measure_name;

Production Deployment

Establish Metric Governance Framework

Metric certification process:

sql
-- Metric certification workflow
CREATE TABLE main.governance.metric_certification (
    metric_view STRING PRIMARY KEY,
    certification_status STRING,
    certified_by STRING,
    certification_date DATE,
    review_date DATE,
    certification_notes STRING
);

-- Certification procedure
CREATE OR REPLACE PROCEDURE certify_metric_view(
    view_name STRING,
    certified_by_email STRING,
    notes STRING
)
LANGUAGE SQL
AS
BEGIN
  -- Validate metric view exists
  IF NOT EXISTS (
    SELECT 1
    FROM main.information_schema.tables
    WHERE table_catalog || '.' || table_schema || '.' || table_name = view_name
      AND table_type = 'METRIC VIEW'
  ) THEN
    SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Metric view does not exist';
  END IF;

  -- Record certification
  MERGE INTO main.governance.metric_certification t
  USING (SELECT view_name AS metric_view) s
  ON t.metric_view = s.metric_view
  WHEN MATCHED THEN UPDATE SET
    certification_status = 'CERTIFIED',
    certified_by = certified_by_email,
    certification_date = CURRENT_DATE,
    review_date = ADD_MONTHS(CURRENT_DATE, 6),
    certification_notes = notes
  WHEN NOT MATCHED THEN INSERT
    (metric_view, certification_status, certified_by, certification_date, review_date, certification_notes)
  VALUES
    (view_name, 'CERTIFIED', certified_by_email, CURRENT_DATE, ADD_MONTHS(CURRENT_DATE, 6), notes);

  -- Apply certification flag to metric view
  EXECUTE IMMEDIATE 'ALTER METRIC VIEW ' || view_name ||
    ' SET TBLPROPERTIES (''certified'' = ''true'', ''certification_date'' = ''' ||
    CURRENT_DATE || ''')';
END;

Change management:

sql
-- Track metric view changes
CREATE TABLE main.governance.metric_change_log (
    change_id INT GENERATED ALWAYS AS IDENTITY,
    metric_view STRING,
    change_type STRING,
    change_description STRING,
    changed_by STRING,
    change_timestamp TIMESTAMP,
    version STRING,
    rollback_ddl STRING
);

-- Log changes
INSERT INTO main.governance.metric_change_log
(metric_view, change_type, change_description, changed_by, change_timestamp, version, rollback_ddl)
VALUES
(
  'main.sales.sales_metrics',
  'MEASURE_ADDED',
  'Added profit_margin_pct measure',
  current_user(),
  CURRENT_TIMESTAMP,
  '1.1.0',
  'ALTER METRIC VIEW main.sales.sales_metrics DROP MEASURE profit_margin_pct'
);

Configure Refresh and Maintenance

Schedule table maintenance:

sql
-- Create maintenance procedures
CREATE OR REPLACE PROCEDURE maintain_sales_tables()
LANGUAGE SQL
AS
BEGIN
  -- Optimize fact table
  OPTIMIZE main.sales.orders
  ZORDER BY (order_date, customer_id, product_id);

  -- Vacuum old files
  VACUUM main.sales.orders RETAIN 168 HOURS;

  -- Update statistics
  ANALYZE TABLE main.sales.orders COMPUTE STATISTICS FOR ALL COLUMNS;
  ANALYZE TABLE main.sales.customers COMPUTE STATISTICS FOR ALL COLUMNS;
  ANALYZE TABLE main.sales.products COMPUTE STATISTICS FOR ALL COLUMNS;

  -- Refresh materialized views
  REFRESH MATERIALIZED VIEW main.sales.daily_sales_rollup;
  REFRESH MATERIALIZED VIEW main.sales.hourly_revenue;
END;

-- Schedule maintenance
CREATE JOB sales_table_maintenance
SCHEDULE CRON '0 2 * * *'  -- 2 AM daily
AS
CALL maintain_sales_tables();

Implement Monitoring and Alerting

Create monitoring dashboard:

sql
-- Metric health metrics
CREATE OR REPLACE VIEW main.monitoring.metric_health AS
SELECT
  metric_view,
  COUNT(*) AS query_count_24h,
  AVG(execution_duration_ms) AS avg_duration_ms,
  PERCENTILE(execution_duration_ms, 0.95) AS p95_duration_ms,
  SUM(CASE WHEN error_message IS NOT NULL THEN 1 ELSE 0 END) AS error_count,
  MAX(event_time) AS last_query_time
FROM system.access.audit
WHERE action_name = 'queryMetricView'
  AND event_date >= CURRENT_DATE - 1
GROUP BY metric_view;

-- Alert on unhealthy metrics
SELECT
  metric_view,
  'HIGH_ERROR_RATE' AS alert_type,
  CONCAT('Error rate: ', ROUND(error_count * 100.0 / query_count_24h, 2), '%') AS alert_message
FROM main.monitoring.metric_health
WHERE error_count * 100.0 / query_count_24h > 5.0  -- More than 5% error rate

UNION ALL

SELECT
  metric_view,
  'SLOW_PERFORMANCE' AS alert_type,
  CONCAT('P95 latency: ', p95_duration_ms, 'ms') AS alert_message
FROM main.monitoring.metric_health
WHERE p95_duration_ms > 10000;  -- P95 over 10 seconds

Conclusion

Migrating from Kyvos to Databricks Unity Catalog Metric Views requires systematic translation of OLAP concepts to lakehouse-native patterns. The architecture shift from pre-aggregated cubes to on-demand computation demands careful performance tuning and query optimization. Typedef's Fenic framework provides semantic processing capabilities for preprocessing migration artifacts, validating measure logic, and enriching dimension data during the migration process.

The lakehouse-native approach eliminates separate semantic layer infrastructure while maintaining governed metric definitions through Unity Catalog. Organizations gain unified data access patterns, native AI integration, and real-time metric freshness at the cost of rewriting MDX queries and flattening hierarchical dimensions.

For additional technical resources on semantic data transformation and AI-powered data processing, refer to Typedef's documentation and Fenic framework guides.
