Kyvos implements traditional OLAP cubes with pre-aggregated MOLAP storage and MDX query interfaces. Databricks Unity Catalog Metric Views operate as catalog-native objects with on-demand Spark SQL computation. Migration requires translating cube definitions to YAML metric views, converting MDX to SQL, flattening hierarchies, and reimplementing time intelligence. Typedef's Fenic framework provides semantic extraction and data transformation capabilities for preprocessing migration artifacts and validating measure logic during the migration process.
Architecture Comparison
Kyvos Smart OLAP Architecture
Kyvos operates as a semantic performance layer sitting between data sources and BI tools:
Pre-aggregation approach:
- Smart Aggregation technology creates managed MOLAP cubes
- AI-powered selection of aggregation patterns
- Pre-computed measures stored in columnar format
- Automatic aggregation maintenance based on query patterns
Storage modes:
- MOLAP: Fully pre-aggregated with optimized storage
- ROLAP: Direct queries to underlying tables
- HOLAP: Hybrid combining both approaches
Query interfaces:
- MDX (Multidimensional Expressions) for OLAP operations
- SQL for relational access patterns
- DAX support for Power BI semantic models
- Custom connectors for Tableau, Excel, and other BI tools
Semantic model structure:
- Star and snowflake schema optimization
- Ragged and unbalanced hierarchies
- Parent-child hierarchies
- Calculated members and measures
- Row and column-level security
Databricks Unity Catalog Metric Views Architecture
Databricks embeds metrics directly in the lakehouse catalog:
Catalog-native implementation:
- Metric views as first-class Unity Catalog objects
- YAML or SQL DDL definitions
- No separate semantic layer infrastructure
- Metadata stored in catalog, not pre-aggregated cubes
Computation model:
- On-demand query execution via Spark SQL
- Photon vectorized engine for performance
- Delta Lake optimizations (data skipping, Z-ordering, liquid clustering)
- Result caching for repeated queries
Query interface:
- MEASURE() clause for metric invocation (see the sketch after this list)
- Standard Spark SQL syntax
- No MDX support
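For orientation, here is a minimal sketch of that query pattern, using the main.sales.sales_metrics view built in Step 2 of this guide: dimensions are selected and grouped like ordinary columns, while measures are invoked through MEASURE().

```sql
-- Sketch: dimensions group as normal columns, measures are wrapped in MEASURE()
SELECT
  customer_region,
  MEASURE(total_revenue) AS revenue,
  MEASURE(order_count) AS orders
FROM main.sales.sales_metrics
GROUP BY customer_region;
```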
Governance integration:
- Unity Catalog RBAC applies to metric views
- Row-level security and column masking inherited from base tables
- Lineage tracking in catalog metadata
- Certification and tagging capabilities
Fundamental Differences
| Aspect | Kyvos | Databricks Metric Views |
|---|---|---|
| Architecture | Separate semantic layer | Catalog-native objects |
| Storage | Pre-aggregated cubes | Delta tables with on-demand aggregation |
| Query Language | MDX, SQL, DAX | Spark SQL only |
| Hierarchies | Native support for ragged/parent-child | Requires flattening |
| Time Intelligence | Built-in functions | Manual implementation required |
| Refresh Model | Scheduled cube refresh | Real-time on underlying Delta tables |
| Security Model | Semantic layer security | Unity Catalog RBAC |
Pre-Migration Assessment
Inventory Kyvos Semantic Models
Extract cube metadata (a tracking-table sketch follows the list below):
Document all Kyvos cubes including:
- Dimension definitions and hierarchies
- Measure formulas and aggregation types
- Calculated members
- Security rules and filters
- Refresh schedules
- Storage mode (MOLAP/ROLAP/HOLAP)
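To keep this inventory auditable over a multi-month migration, it can be recorded in a small tracking table in Unity Catalog. The sketch below is an assumption, not a Kyvos or Databricks artifact; the main.migration schema and all column names are hypothetical.

```sql
-- Hypothetical tracking table for the cube inventory (names are illustrative)
CREATE TABLE IF NOT EXISTS main.migration.kyvos_cube_inventory (
  cube_name          STRING,
  dimension_count    INT,
  measure_count      INT,
  calculated_members INT,
  storage_mode       STRING,  -- MOLAP / ROLAP / HOLAP
  refresh_schedule   STRING,
  security_rules     STRING,
  migration_status   STRING   -- e.g., NOT_STARTED / IN_PROGRESS / VALIDATED
) USING DELTA;
```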
Classify measure types:
Simple aggregations:
SUM(order_amount)
COUNT(order_id)
AVG(unit_price)
Calculated measures:
[Net Revenue] = [Gross Revenue] - [Discounts] - [Refunds]
[Profit Margin] = ([Revenue] - [Cost]) / [Revenue]
Semi-additive measures:
DISTINCT COUNT(customer_id)
AVERAGE(account_balance) -- across time dimension only
Time-based calculations (see the translation sketch after these examples):
YTD([Revenue])
ParallelPeriod([Date].[Year], 1, [Date].[Month])
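YTD translation is covered in Step 5; the ParallelPeriod pattern has no direct SQL counterpart, so a hedged sketch of the usual workaround (a LAG window over a month-level aggregate, assuming the date dimension built in Step 4) looks like this:

```sql
-- Sketch only: prior-year same-month revenue via LAG over a monthly aggregate
SELECT
  d.year,
  d.month,
  SUM(o.order_amount) AS revenue,
  LAG(SUM(o.order_amount), 12) OVER (ORDER BY d.year, d.month) AS same_month_prior_year_revenue
FROM main.sales.orders o
JOIN main.common.date_dimension d ON o.order_date = d.date_key
GROUP BY d.year, d.month;
```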
Categorize by translation difficulty:
| Measure Type | Translation Approach | Notes |
|---|---|---|
| Simple SUM/COUNT/AVG | Direct SQL translation | Straightforward |
| Calculated measures | SQL expressions | May require restructuring |
| Distinct counts | COUNT(DISTINCT ...) | Requires careful join handling |
| Semi-additive | Window clauses | Manual time dimension logic |
| MDX time intelligence | Custom SQL with date dimension | No direct equivalent |
Analyze Source Data Structure
Map Kyvos cubes to Delta tables:
For each cube, identify:
- Source fact tables
- Dimension tables
- Join relationships
- Data freshness requirements
- Volume and partitioning strategy
Example mapping:
Kyvos Cube: Sales_Analysis_Cube
├── Fact: orders table (500M rows)
├── Dimension: customers (2M rows)
├── Dimension: products (50K rows)
├── Dimension: date (3,650 days)
└── Dimension: stores (500 locations)
Target Databricks Structure:
├── main.sales.orders (Delta table, partitioned by order_date)
├── main.sales.customers (Delta table)
├── main.sales.products (Delta table)
├── main.common.date_dimension (Delta table)
└── main.sales.stores (Delta table)
Assess BI Tool Dependencies
Document query patterns:
Analyze how BI tools currently access Kyvos:
- MDX queries from Excel/Power BI
- SQL passthrough from Tableau
- Custom connector usage
- Dashboard refresh frequencies
- User access patterns
Identify breaking changes:
| Current Kyvos Feature | Databricks Equivalent | Impact |
|---|---|---|
| Automatic drill-down | Manual dimension level selection | BI tools require query updates |
| MDX TOPCOUNT() | SQL LIMIT with ORDER BY | Query rewrite required |
| Parent-child hierarchies | Flattened dimension tables | Loss of dynamic hierarchy navigation |
| Cube browsing in Excel | SQL Warehouse connection | Different connection mechanism |
Migration Process
Step 1: Create Delta Table Foundation
Establish star schema in Unity Catalog:
```sql
-- Create catalog and schema structure
CREATE CATALOG IF NOT EXISTS main;
CREATE SCHEMA IF NOT EXISTS main.sales;
CREATE SCHEMA IF NOT EXISTS main.common;

-- Fact table
CREATE TABLE main.sales.orders (
  order_id STRING NOT NULL,
  order_date DATE NOT NULL,
  customer_id STRING NOT NULL,
  product_id STRING NOT NULL,
  store_id STRING NOT NULL,
  order_amount DECIMAL(18,2),
  quantity INT,
  discount_amount DECIMAL(18,2),
  refund_amount DECIMAL(18,2),
  tax_amount DECIMAL(18,2)
) USING DELTA
PARTITIONED BY (order_date)
LOCATION 's3://lakehouse/sales/orders';

-- Dimension tables
CREATE TABLE main.sales.customers (
  customer_id STRING NOT NULL PRIMARY KEY,
  customer_name STRING,
  region STRING,
  segment STRING,
  registration_date DATE
) USING DELTA;

CREATE TABLE main.sales.products (
  product_id STRING NOT NULL PRIMARY KEY,
  product_name STRING,
  category STRING,
  subcategory STRING,
  brand STRING,
  unit_cost DECIMAL(18,2)
) USING DELTA;
```
Apply Delta Lake optimizations:
```sql
-- Z-ordering for common filter patterns
OPTIMIZE main.sales.orders ZORDER BY (customer_id, product_id);

-- Enable liquid clustering for evolving access patterns
ALTER TABLE main.sales.orders CLUSTER BY (order_date, customer_id);

-- Collect statistics
ANALYZE TABLE main.sales.orders COMPUTE STATISTICS FOR ALL COLUMNS;
```
Step 2: Translate Cube Definitions to Metric Views
Basic metric view structure:
```sql
CREATE METRIC VIEW main.sales.sales_metrics AS
SELECT
  -- Measures from Kyvos cube
  SUM(order_amount) AS total_revenue,
  SUM(order_amount - discount_amount - refund_amount) AS net_revenue,
  SUM(order_amount - discount_amount - refund_amount - (quantity * products.unit_cost)) AS gross_profit,
  COUNT(DISTINCT order_id) AS order_count,
  COUNT(DISTINCT customer_id) AS unique_customers,
  SUM(quantity) AS total_quantity,

  -- Calculated measures
  SUM(order_amount) / COUNT(DISTINCT order_id) AS avg_order_value,
  (SUM(order_amount - discount_amount - refund_amount - (quantity * products.unit_cost))
    / SUM(order_amount)) * 100 AS profit_margin_pct,
  SUM(order_amount) / COUNT(DISTINCT customer_id) AS revenue_per_customer,

  -- Dimensions from Kyvos cube
  orders.order_date,
  customers.region,
  customers.segment,
  products.category,
  products.subcategory,
  products.brand
FROM main.sales.orders
JOIN main.sales.customers ON orders.customer_id = customers.customer_id
JOIN main.sales.products ON orders.product_id = products.product_id;
```
YAML definition approach:
```yaml
metric_view:
  name: sales_metrics
  catalog: main
  schema: sales
  source_table: main.sales.orders

  joins:
    - table: main.sales.customers
      on: orders.customer_id = customers.customer_id
      type: inner
    - table: main.sales.products
      on: orders.product_id = products.product_id
      type: inner

  measures:
    - name: total_revenue
      expr: SUM(order_amount)
      description: "Total order revenue including discounts"
      data_type: decimal
    - name: net_revenue
      expr: SUM(order_amount - discount_amount - refund_amount)
      description: "Revenue after discounts and refunds"
      data_type: decimal
    - name: order_count
      expr: COUNT(DISTINCT order_id)
      description: "Number of unique orders"
      data_type: bigint
    - name: avg_order_value
      expr: SUM(order_amount) / COUNT(DISTINCT order_id)
      description: "Average revenue per order"
      data_type: decimal

  dimensions:
    - name: order_date
      expr: order_date
      data_type: date
    - name: customer_region
      expr: customers.region
      data_type: string
    - name: product_category
      expr: products.category
      data_type: string
```
Step 3: Convert MDX Measures to SQL
Simple aggregation translation:
Kyvos MDX:
[Measures].[Total Revenue]
AGGREGATE: SUM
SOURCE: [order_amount]
Databricks SQL:
SUM(order_amount) AS total_revenue
Calculated measure translation:
Kyvos MDX:
[Measures].[Profit Margin] =
([Measures].[Revenue] - [Measures].[Cost]) / [Measures].[Revenue]
Databricks SQL:
(SUM(order_amount) - SUM(quantity * unit_cost)) / SUM(order_amount) AS profit_margin
Distinct count translation:
Kyvos MDX:
[Measures].[Customer Count]
AGGREGATE: DistinctCount
DIMENSION: [Customer].[Customer ID]
Databricks SQL:
COUNT(DISTINCT customer_id) AS customer_count
Ratio measure translation:
Kyvos MDX:
[Measures].[Units Per Order] =
[Measures].[Total Quantity] / [Measures].[Order Count]
Databricks SQL:
SUM(quantity) / COUNT(DISTINCT order_id) AS units_per_order
Time intelligence translation:
Kyvos MDX:
[Measures].[YTD Revenue] =
SUM(YTD([Date].[Calendar].[Date]), [Measures].[Revenue])
Databricks SQL:
SUM(order_amount) OVER (
PARTITION BY YEAR(order_date)
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS ytd_revenue
Step 4: Flatten Hierarchical Dimensions
Product hierarchy example:
Kyvos Hierarchy:
[Product].[Category].[Subcategory].[Product Name]
- Electronics
- Computers
- Laptop-15inch-Intel
- Desktop-Tower-AMD
- Mobile Devices
- Smartphone-5G
Databricks flattening approach:
```sql
-- Create flattened dimension table
CREATE TABLE main.sales.products_flat AS
SELECT
  product_id,
  product_name,
  -- Level 1: Category
  category,
  -- Level 2: Subcategory
  subcategory,
  -- Level 3: Product name already captured
  -- Composite keys for intermediate rollups
  category AS level1_key,
  CONCAT(category, '|', subcategory) AS level2_key,
  CONCAT(category, '|', subcategory, '|', product_name) AS level3_key,
  -- Display labels
  category AS level1_label,
  subcategory AS level2_label,
  product_name AS level3_label
FROM main.sales.products;

-- Use in metric view
CREATE METRIC VIEW main.sales.product_hierarchy_metrics AS
SELECT
  SUM(order_amount) AS total_revenue,
  -- Multiple dimension columns for different hierarchy levels
  p.category,
  p.subcategory,
  p.product_name,
  p.level2_key,  -- For category + subcategory rollups
  p.level3_key   -- For full hierarchy path
FROM main.sales.orders o
JOIN main.sales.products_flat p ON o.product_id = p.product_id;
```
Date hierarchy implementation:
```sql
-- Create comprehensive date dimension
CREATE TABLE main.common.date_dimension (
  date_key DATE PRIMARY KEY,
  -- Day level
  day_of_week INT,
  day_name STRING,
  day_of_month INT,
  day_of_year INT,
  -- Week level
  week_of_year INT,
  week_start_date DATE,
  week_end_date DATE,
  -- Month level
  month INT,
  month_name STRING,
  month_abbr STRING,
  month_start_date DATE,
  month_end_date DATE,
  -- Quarter level
  quarter INT,
  quarter_name STRING,
  quarter_start_date DATE,
  quarter_end_date DATE,
  -- Year level
  year INT,
  -- Fiscal calendar
  fiscal_year INT,
  fiscal_quarter INT,
  fiscal_month INT,
  -- Business day indicators
  is_weekend BOOLEAN,
  is_holiday BOOLEAN,
  is_business_day BOOLEAN
) USING DELTA;

-- Populate date dimension
INSERT INTO main.common.date_dimension
SELECT
  d.date_key,
  DAYOFWEEK(d.date_key) AS day_of_week,
  DATE_FORMAT(d.date_key, 'EEEE') AS day_name,
  DAYOFMONTH(d.date_key) AS day_of_month,
  DAYOFYEAR(d.date_key) AS day_of_year,
  WEEKOFYEAR(d.date_key) AS week_of_year,
  DATE_TRUNC('week', d.date_key) AS week_start_date,
  DATE_ADD(DATE_TRUNC('week', d.date_key), 6) AS week_end_date,
  MONTH(d.date_key) AS month,
  DATE_FORMAT(d.date_key, 'MMMM') AS month_name,
  DATE_FORMAT(d.date_key, 'MMM') AS month_abbr,
  DATE_TRUNC('month', d.date_key) AS month_start_date,
  LAST_DAY(d.date_key) AS month_end_date,
  QUARTER(d.date_key) AS quarter,
  CONCAT('Q', QUARTER(d.date_key)) AS quarter_name,
  DATE_TRUNC('quarter', d.date_key) AS quarter_start_date,
  LAST_DAY(ADD_MONTHS(DATE_TRUNC('quarter', d.date_key), 2)) AS quarter_end_date,
  YEAR(d.date_key) AS year,
  -- Fiscal year logic (example: fiscal year starts July 1)
  CASE WHEN MONTH(d.date_key) >= 7 THEN YEAR(d.date_key) ELSE YEAR(d.date_key) - 1 END AS fiscal_year,
  CASE
    WHEN MONTH(d.date_key) >= 7 THEN ((MONTH(d.date_key) - 7) DIV 3) + 1
    ELSE ((MONTH(d.date_key) + 5) DIV 3) + 1
  END AS fiscal_quarter,
  CASE
    WHEN MONTH(d.date_key) >= 7 THEN MONTH(d.date_key) - 6
    ELSE MONTH(d.date_key) + 6
  END AS fiscal_month,
  CASE WHEN DAYOFWEEK(d.date_key) IN (1, 7) THEN TRUE ELSE FALSE END AS is_weekend,
  FALSE AS is_holiday,  -- Populate separately
  CASE WHEN DAYOFWEEK(d.date_key) NOT IN (1, 7) THEN TRUE ELSE FALSE END AS is_business_day
FROM (
  SELECT EXPLODE(SEQUENCE(DATE('2020-01-01'), DATE('2030-12-31'), INTERVAL 1 DAY)) AS date_key
) d;
```
Step 5: Implement Time Intelligence
Period-over-period calculations:
```sql
CREATE METRIC VIEW main.sales.time_comparison_metrics AS
SELECT
  SUM(order_amount) AS current_revenue,

  -- Prior period comparison
  LAG(SUM(order_amount), 1) OVER (
    PARTITION BY customer_region
    ORDER BY date_dim.year, date_dim.month
  ) AS prior_month_revenue,

  -- Calculate month-over-month growth
  (SUM(order_amount) - LAG(SUM(order_amount), 1) OVER (
    PARTITION BY customer_region
    ORDER BY date_dim.year, date_dim.month
  )) / LAG(SUM(order_amount), 1) OVER (
    PARTITION BY customer_region
    ORDER BY date_dim.year, date_dim.month
  ) AS mom_growth_rate,

  -- Year-over-year comparison
  LAG(SUM(order_amount), 12) OVER (
    PARTITION BY customer_region
    ORDER BY date_dim.year, date_dim.month
  ) AS prior_year_revenue,

  -- Dimensions
  date_dim.year,
  date_dim.month,
  date_dim.month_name,
  customers.region AS customer_region
FROM main.sales.orders
JOIN main.common.date_dimension date_dim ON orders.order_date = date_dim.date_key
JOIN main.sales.customers ON orders.customer_id = customers.customer_id
GROUP BY date_dim.year, date_dim.month, date_dim.month_name, customers.region;
```
Year-to-date calculations:
```sql
CREATE METRIC VIEW main.sales.ytd_metrics AS
SELECT
  -- Current period revenue
  SUM(order_amount) AS revenue,

  -- YTD revenue using window clause
  SUM(SUM(order_amount)) OVER (
    PARTITION BY date_dim.year, customer_region
    ORDER BY date_dim.date_key
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS ytd_revenue,

  -- MTD revenue
  SUM(SUM(order_amount)) OVER (
    PARTITION BY date_dim.year, date_dim.month, customer_region
    ORDER BY date_dim.date_key
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS mtd_revenue,

  -- QTD revenue
  SUM(SUM(order_amount)) OVER (
    PARTITION BY date_dim.year, date_dim.quarter, customer_region
    ORDER BY date_dim.date_key
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS qtd_revenue,

  -- Dimensions
  date_dim.date_key,
  date_dim.year,
  date_dim.quarter,
  date_dim.month,
  customers.region AS customer_region
FROM main.sales.orders
JOIN main.common.date_dimension date_dim ON orders.order_date = date_dim.date_key
JOIN main.sales.customers ON orders.customer_id = customers.customer_id
GROUP BY date_dim.date_key, date_dim.year, date_dim.quarter, date_dim.month, customers.region;
```
Moving average calculations:
```sql
CREATE METRIC VIEW main.sales.moving_average_metrics AS
SELECT
  SUM(order_amount) AS daily_revenue,

  -- 7-day moving average
  AVG(SUM(order_amount)) OVER (
    PARTITION BY customer_region
    ORDER BY date_dim.date_key
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ) AS ma_7day,

  -- 30-day moving average
  AVG(SUM(order_amount)) OVER (
    PARTITION BY customer_region
    ORDER BY date_dim.date_key
    ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
  ) AS ma_30day,

  -- 90-day moving average
  AVG(SUM(order_amount)) OVER (
    PARTITION BY customer_region
    ORDER BY date_dim.date_key
    ROWS BETWEEN 89 PRECEDING AND CURRENT ROW
  ) AS ma_90day,

  date_dim.date_key,
  customers.region AS customer_region
FROM main.sales.orders
JOIN main.common.date_dimension date_dim ON orders.order_date = date_dim.date_key
JOIN main.sales.customers ON orders.customer_id = customers.customer_id
GROUP BY date_dim.date_key, customers.region;
```
Step 6: Configure Security and Governance
Row-level security implementation:
```sql
-- Apply row filters on dimension tables
CREATE OR REPLACE TABLE main.sales.customers (
  customer_id STRING PRIMARY KEY,
  customer_name STRING,
  region STRING,
  segment STRING,
  sales_rep STRING
) USING DELTA;

-- Create row access policy
CREATE FUNCTION main.sales.region_filter(region STRING)
RETURNS BOOLEAN
RETURN EXISTS (
  SELECT 1
  FROM main.security.user_region_access
  WHERE user_email = current_user()
    AND authorized_region = region
);

-- Apply filter to table
ALTER TABLE main.sales.customers
SET ROW FILTER main.sales.region_filter ON (region);

-- Metric views automatically inherit row-level security
CREATE METRIC VIEW main.sales.regional_sales AS
SELECT
  SUM(order_amount) AS revenue,
  COUNT(DISTINCT order_id) AS order_count,
  customers.region
FROM main.sales.orders
JOIN main.sales.customers ON orders.customer_id = customers.customer_id;
```
Column masking:
```sql
-- Mask sensitive customer information
CREATE FUNCTION main.sales.mask_customer_name(name STRING)
RETURNS STRING
RETURN CASE
  WHEN is_member('sensitive_data_access') THEN name
  ELSE CONCAT(LEFT(name, 1), '***')
END;

ALTER TABLE main.sales.customers
ALTER COLUMN customer_name SET MASK main.sales.mask_customer_name;
```
Grant metric view access:
```sql
-- Grant access to specific roles
GRANT SELECT ON METRIC VIEW main.sales.sales_metrics TO `sales_analyst`;
GRANT SELECT ON METRIC VIEW main.sales.finance_metrics TO `finance_team`;

-- Grant access to all metric views in schema
GRANT SELECT ON SCHEMA main.sales TO `executive_team`;
```
Add certification metadata:
```sql
ALTER METRIC VIEW main.sales.sales_metrics SET TBLPROPERTIES (
  'certified' = 'true',
  'owner' = 'sales_analytics',
  'last_validated' = '2025-01-15',
  'validation_status' = 'passed',
  'source_system' = 'migrated_from_kyvos'
);

-- Add discovery tags
ALTER METRIC VIEW main.sales.sales_metrics SET TAGS (
  'domain' = 'sales',
  'sensitivity' = 'internal',
  'pii' = 'false',
  'refresh_frequency' = 'real_time'
);
```
Step 7: Optimize for Query Performance
Implement materialized aggregations:
```sql
-- Create materialized view for frequently queried aggregations
CREATE MATERIALIZED VIEW main.sales.daily_sales_rollup AS
SELECT
  order_date,
  customers.region AS customer_region,
  products.category AS product_category,
  SUM(order_amount) AS daily_revenue,
  COUNT(DISTINCT orders.customer_id) AS daily_customers,
  COUNT(DISTINCT order_id) AS daily_orders,
  SUM(quantity) AS daily_quantity
FROM main.sales.orders
JOIN main.sales.customers ON orders.customer_id = customers.customer_id
JOIN main.sales.products ON orders.product_id = products.product_id
GROUP BY order_date, customers.region, products.category;

-- Query materialized view instead of raw data
SELECT
  order_date,
  customer_region,
  SUM(daily_revenue) AS revenue
FROM main.sales.daily_sales_rollup
WHERE order_date >= '2024-01-01'
GROUP BY order_date, customer_region;
```
Configure table optimization:
```sql
-- Enable auto-optimize for Delta tables
ALTER TABLE main.sales.orders SET TBLPROPERTIES (
  'delta.autoOptimize.optimizeWrite' = 'true',
  'delta.autoOptimize.autoCompact' = 'true'
);

-- Schedule regular optimization
CREATE OR REPLACE PROCEDURE optimize_sales_tables()
LANGUAGE SQL
AS
BEGIN
  OPTIMIZE main.sales.orders ZORDER BY (order_date, customer_id);
  OPTIMIZE main.sales.customers;
  OPTIMIZE main.sales.products;
END;

-- Create job to run optimization
CREATE JOB daily_optimization
SCHEDULE CRON '0 2 * * *'  -- 2 AM daily
AS CALL optimize_sales_tables();
```
Implement caching strategy:
```sql
-- Delta cache automatically handles frequently accessed data
-- Query results cache for repeated identical queries

-- Force cache refresh when needed
REFRESH METRIC VIEW main.sales.sales_metrics;

-- Monitor cache effectiveness
SELECT
  table_name,
  cache_hit_ratio,
  bytes_cached,
  bytes_read
FROM system.information_schema.table_cache_statistics
WHERE table_schema = 'sales'
ORDER BY cache_hit_ratio ASC;
```
Step 8: Migrate BI Tool Connections
Tableau connection update:
Before (Kyvos):
Server: kyvos-server.company.com
Port: 8080
Connection Type: Kyvos OLAP
Authentication: Kyvos credentials
After (Databricks):
Server: company.databricks.com
HTTP Path: /sql/1.0/warehouses/warehouse_id
Connection Type: Databricks
Authentication: Personal Access Token
Query pattern translation:
```sql
-- Kyvos MDX query pattern:
-- SELECT [Measures].[Revenue] ON COLUMNS,
--        [Date].[Year].[Year].MEMBERS ON ROWS
-- FROM [Sales_Cube]
-- WHERE [Product].[Category].[Electronics]

-- Databricks SQL equivalent:
SELECT
  date_dim.year,
  MEASURE(total_revenue) AS revenue
FROM main.sales.sales_metrics
JOIN main.common.date_dimension date_dim
  ON sales_metrics.order_date = date_dim.date_key
WHERE product_category = 'Electronics'
GROUP BY date_dim.year
ORDER BY date_dim.year;
```
Power BI semantic model update:
Power BI DirectQuery Configuration:
1. Remove Kyvos connector
2. Install Databricks connector from AppSource
3. Connect to SQL Warehouse endpoint
4. Import metric views as DirectQuery tables
5. Update DAX measures:
Before:
Total Revenue = SUM(Sales[Revenue])
After:
Total Revenue = SUM('sales_metrics'[total_revenue])
6. Refresh dataset schema
7. Validate dashboard visuals
Excel connection update:
Excel Configuration:
1. Remove MDX connection to Kyvos
2. Add new connection via Data > Get Data > Databricks
3. Configure connection:
- Server: company.databricks.com
- HTTP Path: /sql/1.0/warehouses/warehouse_id
- Authentication: Azure Active Directory
4. Load metric view as Table
5. Create PivotTables using metric view data
Step 9: Validate Metric Accuracy
Create validation framework:
```sql
-- Export Kyvos results for validation
-- (Assume Kyvos data exported to validation schema)
CREATE TABLE main.validation.kyvos_export_results AS
SELECT * FROM parquet.`s3://migration/kyvos_export/sales_metrics/*.parquet`;

-- Compare with Databricks metric view
CREATE TABLE main.validation.comparison_results AS
SELECT
  COALESCE(k.order_date, d.order_date) AS order_date,
  COALESCE(k.region, d.region) AS region,
  k.revenue AS kyvos_revenue,
  d.revenue AS databricks_revenue,
  ABS(k.revenue - d.revenue) AS revenue_diff,
  ABS(k.revenue - d.revenue) / NULLIF(k.revenue, 0) * 100 AS revenue_diff_pct,
  k.order_count AS kyvos_order_count,
  d.order_count AS databricks_order_count,
  ABS(k.order_count - d.order_count) AS order_count_diff,
  CASE
    WHEN ABS(k.revenue - d.revenue) / NULLIF(k.revenue, 0) * 100 <= 0.01 THEN 'PASS'
    ELSE 'FAIL'
  END AS validation_status
FROM main.validation.kyvos_export_results k
FULL OUTER JOIN (
  SELECT
    order_date,
    customer_region AS region,
    MEASURE(total_revenue) AS revenue,
    MEASURE(order_count) AS order_count
  FROM main.sales.sales_metrics
  GROUP BY order_date, customer_region
) d ON k.order_date = d.order_date AND k.region = d.region
WHERE ABS(k.revenue - d.revenue) / NULLIF(k.revenue, 0) * 100 > 0.01;

-- Report validation failures
SELECT
  COUNT(*) AS total_comparisons,
  SUM(CASE WHEN validation_status = 'PASS' THEN 1 ELSE 0 END) AS passed,
  SUM(CASE WHEN validation_status = 'FAIL' THEN 1 ELSE 0 END) AS failed,
  AVG(revenue_diff_pct) AS avg_revenue_diff_pct,
  MAX(revenue_diff_pct) AS max_revenue_diff_pct
FROM main.validation.comparison_results;
```
Automated validation queries:
```sql
-- Validate measure totals
WITH kyvos_totals AS (
  SELECT
    SUM(revenue) AS total_revenue,
    SUM(order_count) AS total_orders
  FROM main.validation.kyvos_export_results
),
databricks_totals AS (
  SELECT
    SUM(MEASURE(total_revenue)) AS total_revenue,
    SUM(MEASURE(order_count)) AS total_orders
  FROM main.sales.sales_metrics
)
SELECT
  k.total_revenue AS kyvos_total_revenue,
  d.total_revenue AS databricks_total_revenue,
  ABS(k.total_revenue - d.total_revenue) / k.total_revenue * 100 AS revenue_variance_pct,
  k.total_orders AS kyvos_total_orders,
  d.total_orders AS databricks_total_orders,
  ABS(k.total_orders - d.total_orders) AS order_variance,
  CASE
    WHEN ABS(k.total_revenue - d.total_revenue) / k.total_revenue * 100 <= 0.01
     AND ABS(k.total_orders - d.total_orders) = 0
    THEN 'VALIDATION PASSED'
    ELSE 'VALIDATION FAILED'
  END AS validation_result
FROM kyvos_totals k, databricks_totals d;
```
Step 10: Apply Typedef for Data Transformation
Typedef's Fenic framework provides semantic data processing capabilities for migration preprocessing:
Extract and classify measures:
```python
import fenic as fc
from pydantic import BaseModel, Field
from typing import List, Literal

# Configure session
config = fc.SessionConfig(
    default_model="gpt-4o-mini",
    max_retries=3,
    timeout=30
)
session = fc.FenicSession(config)
fc.set_session(session)

# Define measure classification schema
class MeasureClassification(BaseModel):
    measure_type: Literal["simple_sum", "simple_count", "simple_avg", "calculated",
                          "distinct_count", "semi_additive", "time_intelligence", "mdx_specific"]
    sql_translation: str = Field(description="Equivalent Spark SQL expression")
    dependencies: List[str] = Field(description="List of dependent columns or measures")
    translation_notes: str = Field(description="Notes about translation approach")

# Load Kyvos measure definitions
measures_df = fc.read_csv("kyvos_measure_export.csv")

# Classify each measure
classified_measures = (
    measures_df
    .with_column(
        "classification",
        fc.semantic.extract(
            fc.col("measure_definition"),
            MeasureClassification
        )
    )
    .unnest("classification")
)

# Export classification results
# Note: Write operations not documented in Fenic - use alternative export method
classified_measures.collect()  # Or handle export through other means
```
Transform hierarchical dimension data:
```python
from pydantic import BaseModel, Field
from typing import List

class HierarchyStructure(BaseModel):
    dimension_name: str
    levels: List[str]
    level_count: int
    has_ragged_structure: bool
    flattening_approach: str = Field(description="Recommended approach to flatten hierarchy")

# Load Kyvos hierarchy definitions
hierarchy_df = fc.read_json("kyvos_hierarchy_export.json")

# Analyze hierarchy structures
analyzed = (
    hierarchy_df
    .with_column(
        "structure",
        fc.semantic.extract(
            fc.col("hierarchy_definition"),
            HierarchyStructure
        )
    )
    .unnest("structure")
)

# Generate flattening SQL
flattening_sql = (
    analyzed
    .with_column(
        "ddl_statement",
        fc.semantic.map(
            fc.col("flattening_approach"),
            "Generate CREATE TABLE SQL for this dimension flattening approach"
        )
    )
)

# Note: Write operations not documented in Fenic - use alternative export method
flattening_sql.collect()  # Or handle export through other means
```
Validate measure logic:
```python
class MeasureValidation(BaseModel):
    is_valid: bool
    validation_notes: str
    recommended_test_queries: List[str]

# Validate translated measures
validation_df = fc.read_csv("translated_measures.csv")

validated = (
    validation_df
    .with_column(
        "validation",
        fc.semantic.extract(
            fc.col("sql_translation"),
            MeasureValidation
        )
    )
    .unnest("validation")
    .filter(fc.col("is_valid") == True)
)

# Note: Write operations not documented in Fenic - use alternative export method
validated.collect()  # Or handle export through other means
```
Enrich customer dimensions:
```python
class CustomerSegment(BaseModel):
    segment_type: Literal["enterprise", "mid_market", "small_business", "individual"]
    value_tier: Literal["high_value", "medium_value", "low_value"]
    engagement_level: Literal["active", "moderate", "dormant"]

# Load customer transaction data
customer_df = fc.read_parquet("s3://migration/customer_transactions/*.parquet")

# Apply semantic enrichment
enriched = (
    customer_df
    .with_column(
        "segment",
        fc.semantic.extract(
            fc.col("transaction_history"),
            CustomerSegment
        )
    )
    .unnest("segment")
)

# Write enriched customer dimension
# Note: Write operations not documented in Fenic - use alternative export method
enriched.collect()  # Or handle export through other means
```
Post-Migration Optimization
Monitor Query Performance
Track metric view usage:
```sql
-- Query audit logs
SELECT
  request_params.metric_view,
  COUNT(*) AS query_count,
  AVG(execution_duration_ms) AS avg_duration_ms,
  PERCENTILE(execution_duration_ms, 0.50) AS p50_duration_ms,
  PERCENTILE(execution_duration_ms, 0.95) AS p95_duration_ms,
  PERCENTILE(execution_duration_ms, 0.99) AS p99_duration_ms,
  MAX(execution_duration_ms) AS max_duration_ms
FROM system.access.audit
WHERE action_name = 'queryMetricView'
  AND event_date >= CURRENT_DATE - 7
GROUP BY request_params.metric_view
ORDER BY query_count DESC;
```
Identify optimization opportunities:
```sql
-- Find slow queries
SELECT
  request_params.metric_view,
  request_params.query_text,
  execution_duration_ms,
  rows_produced,
  bytes_scanned,
  user_identity.email
FROM system.access.audit
WHERE action_name = 'queryMetricView'
  AND execution_duration_ms > 10000  -- Queries over 10 seconds
  AND event_date >= CURRENT_DATE - 1
ORDER BY execution_duration_ms DESC
LIMIT 50;
```
Monitor warehouse utilization:
```sql
-- Warehouse performance metrics
SELECT
  warehouse_id,
  warehouse_name,
  DATE(start_time) AS date,
  COUNT(*) AS query_count,
  SUM(execution_duration) AS total_execution_time_ms,
  AVG(execution_duration) AS avg_execution_time_ms,
  SUM(rows_produced) AS total_rows_produced,
  SUM(CASE WHEN error_message IS NOT NULL THEN 1 ELSE 0 END) AS failed_queries
FROM system.query.history
WHERE start_time >= CURRENT_DATE - 7
GROUP BY warehouse_id, warehouse_name, DATE(start_time)
ORDER BY date DESC, total_execution_time_ms DESC;
```
Implement Performance Tuning
Create aggregation tables for hot paths:
```sql
-- Monthly aggregation for reporting
CREATE TABLE main.sales.monthly_aggregates AS
SELECT
  DATE_TRUNC('month', order_date) AS month,
  customers.region AS customer_region,
  products.category AS product_category,
  SUM(order_amount) AS monthly_revenue,
  COUNT(DISTINCT orders.customer_id) AS monthly_customers,
  COUNT(DISTINCT order_id) AS monthly_orders,
  SUM(quantity) AS monthly_quantity
FROM main.sales.orders
JOIN main.sales.customers ON orders.customer_id = customers.customer_id
JOIN main.sales.products ON orders.product_id = products.product_id
GROUP BY DATE_TRUNC('month', order_date), customers.region, products.category;

-- Use aggregation table for month-level queries
SELECT
  month,
  customer_region,
  SUM(monthly_revenue) AS revenue
FROM main.sales.monthly_aggregates
WHERE month >= '2024-01-01'
GROUP BY month, customer_region;
```
Optimize partition strategy:
```sql
-- Repartition fact table by frequently filtered columns
CREATE OR REPLACE TABLE main.sales.orders
USING DELTA
PARTITIONED BY (order_year, order_month)
AS
SELECT
  *,
  YEAR(order_date) AS order_year,
  MONTH(order_date) AS order_month
FROM main.sales.orders_staging;

-- Analyze partition statistics
DESCRIBE DETAIL main.sales.orders;
```
Establish Data Quality Monitoring
Create quality check framework:
```sql
CREATE TABLE main.monitoring.metric_quality_log (
  check_timestamp TIMESTAMP,
  metric_view STRING,
  check_type STRING,
  check_status STRING,
  error_details STRING,
  actual_value DECIMAL(18,2),
  expected_range STRING
);

-- Quality check procedure
CREATE OR REPLACE PROCEDURE validate_metric_quality()
LANGUAGE SQL
AS
BEGIN
  -- Check for NULL measures
  INSERT INTO main.monitoring.metric_quality_log
  SELECT
    CURRENT_TIMESTAMP,
    'main.sales.sales_metrics',
    'null_check',
    CASE WHEN COUNT(*) = 0 THEN 'PASS' ELSE 'FAIL' END,
    CASE WHEN COUNT(*) > 0 THEN CONCAT('Found ', COUNT(*), ' NULL values') ELSE NULL END,
    COUNT(*),
    '0'
  FROM (
    SELECT order_date
    FROM main.sales.sales_metrics
    WHERE MEASURE(total_revenue) IS NULL
  );

  -- Check for negative revenue
  INSERT INTO main.monitoring.metric_quality_log
  SELECT
    CURRENT_TIMESTAMP,
    'main.sales.sales_metrics',
    'negative_value_check',
    CASE WHEN MIN(MEASURE(total_revenue)) >= 0 THEN 'PASS' ELSE 'FAIL' END,
    CASE WHEN MIN(MEASURE(total_revenue)) < 0 THEN 'Negative revenue detected' ELSE NULL END,
    MIN(MEASURE(total_revenue)),
    '>= 0'
  FROM main.sales.sales_metrics;

  -- Check for data freshness
  INSERT INTO main.monitoring.metric_quality_log
  SELECT
    CURRENT_TIMESTAMP,
    'main.sales.sales_metrics',
    'freshness_check',
    CASE WHEN MAX(order_date) >= CURRENT_DATE - 1 THEN 'PASS' ELSE 'FAIL' END,
    CASE WHEN MAX(order_date) < CURRENT_DATE - 1 THEN 'Stale data detected' ELSE NULL END,
    DATEDIFF(DAY, MAX(order_date), CURRENT_DATE),
    '<= 1 day'
  FROM main.sales.sales_metrics;
END;

-- Schedule quality checks
CREATE JOB metric_quality_checks
SCHEDULE CRON '0 */4 * * *'  -- Every 4 hours
AS CALL validate_metric_quality();
```
Alert on quality failures:
```sql
-- Query recent quality check failures
SELECT
  check_timestamp,
  metric_view,
  check_type,
  error_details,
  actual_value,
  expected_range
FROM main.monitoring.metric_quality_log
WHERE check_status = 'FAIL'
  AND check_timestamp >= CURRENT_TIMESTAMP - INTERVAL 24 HOURS
ORDER BY check_timestamp DESC;
```
Document Migration Artifacts
Create metadata registry:
```sql
CREATE TABLE main.documentation.migration_metadata (
  metric_view STRING,
  source_kyvos_cube STRING,
  migration_date DATE,
  validation_status STRING,
  known_differences STRING,
  performance_baseline STRING,
  owner STRING,
  documentation_url STRING
);

INSERT INTO main.documentation.migration_metadata VALUES (
  'main.sales.sales_metrics',
  'Sales_Analysis_Cube',
  '2025-01-15',
  'VALIDATED',
  'Profit margin calculation: 0.05% difference due to NULL handling approach',
  'P95 latency: 2.3s for typical queries on 500M rows',
  'sales_analytics_team@company.com',
  'https://wiki.company.com/databricks/metric-views/sales-metrics'
);
```
Version control metric definitions:
Store metric view YAML definitions in source control:
```yaml
# git repository: databricks-metric-views/sales/sales_metrics.yaml
version: 1.0.0
migration_date: 2025-01-15
source_system: kyvos
source_cube: Sales_Analysis_Cube

validation:
  status: passed
  validation_date: 2025-01-20
  total_revenue_variance: 0.001%
  order_count_variance: 0

known_differences:
  - measure: profit_margin
    difference: 0.05%
    reason: "Different NULL handling: Databricks uses COALESCE, Kyvos ignored NULLs"
    impact: negligible
  - feature: hierarchical_drilldown
    difference: manual_implementation
    reason: "Databricks requires explicit dimension levels, Kyvos had automatic drill-down"
    impact: "BI tools must specify dimension granularity in queries"

performance:
  baseline_query_latency_p95: 2.3s
  data_volume: 500M_rows
  warehouse_size: medium

metric_view:
  name: sales_metrics
  catalog: main
  schema: sales
  # ... rest of metric view definition
```
Migration Patterns and Solutions
Pattern 1: Pre-Aggregation Performance Gap
Scenario: Kyvos MOLAP cubes provide sub-second query response through pre-aggregation. Databricks on-demand computation shows higher latency for cold queries.
Solution approaches:
Materialized views for high-frequency queries:
```sql
CREATE MATERIALIZED VIEW main.sales.hourly_revenue AS
SELECT
  DATE_TRUNC('hour', order_timestamp) AS hour,
  customers.region AS customer_region,
  products.category AS product_category,
  SUM(order_amount) AS hourly_revenue,
  COUNT(DISTINCT order_id) AS hourly_orders,
  COUNT(DISTINCT orders.customer_id) AS hourly_customers
FROM main.sales.orders
JOIN main.sales.customers ON orders.customer_id = customers.customer_id
JOIN main.sales.products ON orders.product_id = products.product_id
GROUP BY DATE_TRUNC('hour', order_timestamp), customers.region, products.category;

-- Query materialized view instead of raw data
SELECT
  hour,
  customer_region,
  SUM(hourly_revenue) AS revenue
FROM main.sales.hourly_revenue
WHERE hour >= CURRENT_TIMESTAMP - INTERVAL 24 HOURS
GROUP BY hour, customer_region;
```
Pre-computed rollup tables:
```sql
-- Daily rollup
CREATE TABLE main.sales.daily_rollup AS
SELECT
  order_date,
  customers.region AS customer_region,
  products.category AS product_category,
  SUM(order_amount) AS daily_revenue,
  COUNT(DISTINCT order_id) AS daily_orders,
  AVG(order_amount) AS avg_order_value
FROM main.sales.orders
JOIN main.sales.customers ON orders.customer_id = customers.customer_id
JOIN main.sales.products ON orders.product_id = products.product_id
GROUP BY order_date, customers.region, products.category;

-- Schedule daily refresh
CREATE JOB refresh_daily_rollup
SCHEDULE CRON '0 1 * * *'
AS
MERGE INTO main.sales.daily_rollup t
USING (
  SELECT
    order_date,
    customers.region AS customer_region,
    products.category AS product_category,
    SUM(order_amount) AS daily_revenue,
    COUNT(DISTINCT order_id) AS daily_orders,
    AVG(order_amount) AS avg_order_value
  FROM main.sales.orders
  JOIN main.sales.customers ON orders.customer_id = customers.customer_id
  JOIN main.sales.products ON orders.product_id = products.product_id
  WHERE order_date >= CURRENT_DATE - 7
  GROUP BY order_date, customers.region, products.category
) s
ON t.order_date = s.order_date
  AND t.customer_region = s.customer_region
  AND t.product_category = s.product_category
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
```
Pattern 2: MDX to SQL Translation
Scenario: BI tools use MDX queries that require translation to Spark SQL.
Common MDX patterns and SQL equivalents:
TOPCOUNT translation:
MDX:
TOPCOUNT([Customer].[Customer].MEMBERS, 10, [Measures].[Revenue])
Spark SQL:
SELECT
customer_id,
customer_name,
MEASURE(total_revenue) AS revenue
FROM main.sales.sales_metrics
GROUP BY customer_id, customer_name
ORDER BY MEASURE(total_revenue) DESC
LIMIT 10;
CROSSJOIN translation:
MDX:
CROSSJOIN([Product].[Category].MEMBERS, [Date].[Year].MEMBERS)
Spark SQL:
SELECT DISTINCT
product_category,
year
FROM main.sales.sales_metrics
CROSS JOIN main.common.date_dimension
ORDER BY product_category, year;
FILTER translation:
MDX:
FILTER([Customer].[Region].MEMBERS, [Measures].[Revenue] > 1000000)
Spark SQL:
SELECT
customer_region,
MEASURE(total_revenue) AS revenue
FROM main.sales.sales_metrics
GROUP BY customer_region
HAVING MEASURE(total_revenue) > 1000000;
Pattern 3: Semi-Additive Measure Handling
Scenario: Account balances and inventory levels that don't sum across time dimension.
Solution:
```sql
-- Last balance per period
CREATE METRIC VIEW main.finance.balance_metrics AS
SELECT
  -- Last value for the period
  MAX_BY(account_balance, snapshot_date) AS ending_balance,
  -- Average for the period
  AVG(account_balance) AS avg_balance,
  -- Opening balance (first value)
  MIN_BY(account_balance, snapshot_date) AS opening_balance,
  account_id,
  account_type,
  DATE_TRUNC('month', snapshot_date) AS month
FROM main.finance.daily_balances
GROUP BY account_id, account_type, DATE_TRUNC('month', snapshot_date);

-- Query ending balance (semi-additive across accounts, not time)
SELECT
  month,
  account_type,
  SUM(ending_balance) AS total_ending_balance
FROM main.finance.balance_metrics
WHERE month = '2025-01-01'
GROUP BY month, account_type;
```
Pattern 4: Parent-Child Hierarchy Flattening
Scenario: Employee reporting structure with unlimited depth.
Solution:
```sql
-- Flatten recursive hierarchy
CREATE OR REPLACE TABLE main.hr.employee_hierarchy AS
WITH RECURSIVE org_structure AS (
  -- Base case: CEO and executives
  SELECT
    employee_id,
    employee_name,
    manager_id,
    1 AS level,
    employee_name AS level1_manager,
    CAST(NULL AS STRING) AS level2_manager,
    CAST(NULL AS STRING) AS level3_manager,
    CAST(NULL AS STRING) AS level4_manager,
    CAST(employee_name AS STRING) AS reporting_path
  FROM main.hr.employees
  WHERE manager_id IS NULL

  UNION ALL

  -- Recursive case
  SELECT
    e.employee_id,
    e.employee_name,
    e.manager_id,
    os.level + 1,
    CASE WHEN os.level + 1 = 1 THEN e.employee_name ELSE os.level1_manager END,
    CASE WHEN os.level + 1 = 2 THEN e.employee_name ELSE os.level2_manager END,
    CASE WHEN os.level + 1 = 3 THEN e.employee_name ELSE os.level3_manager END,
    CASE WHEN os.level + 1 = 4 THEN e.employee_name ELSE os.level4_manager END,
    CONCAT(os.reporting_path, ' > ', e.employee_name)
  FROM main.hr.employees e
  JOIN org_structure os ON e.manager_id = os.employee_id
  WHERE os.level < 10  -- Limit recursion depth
)
SELECT * FROM org_structure;

-- Use in metric view
CREATE METRIC VIEW main.hr.headcount_metrics AS
SELECT
  COUNT(DISTINCT employee_id) AS headcount,
  SUM(salary) AS total_compensation,
  level AS org_level,
  level1_manager,
  level2_manager,
  level3_manager
FROM main.hr.employee_hierarchy
GROUP BY level, level1_manager, level2_manager, level3_manager;
```
Validation and Testing Framework
Measure Accuracy Validation
Aggregate-level validation:
```sql
-- Compare total measures
CREATE OR REPLACE VIEW main.validation.measure_totals_comparison AS
WITH kyvos_totals AS (
  SELECT
    'kyvos' AS source,
    SUM(total_revenue) AS total_revenue,
    SUM(order_count) AS total_orders,
    SUM(customer_count) AS total_customers
  FROM main.validation.kyvos_export
),
databricks_totals AS (
  SELECT
    'databricks' AS source,
    SUM(MEASURE(total_revenue)) AS total_revenue,
    SUM(MEASURE(order_count)) AS total_orders,
    SUM(MEASURE(unique_customers)) AS total_customers
  FROM main.sales.sales_metrics
)
SELECT
  k.total_revenue AS kyvos_revenue,
  d.total_revenue AS databricks_revenue,
  ABS(k.total_revenue - d.total_revenue) AS revenue_diff,
  ABS(k.total_revenue - d.total_revenue) / k.total_revenue * 100 AS revenue_diff_pct,
  k.total_orders AS kyvos_orders,
  d.total_orders AS databricks_orders,
  ABS(k.total_orders - d.total_orders) AS order_diff,
  CASE
    WHEN ABS(k.total_revenue - d.total_revenue) / k.total_revenue * 100 <= 0.01
     AND ABS(k.total_orders - d.total_orders) = 0
    THEN 'PASS'
    ELSE 'FAIL'
  END AS validation_status
FROM kyvos_totals k, databricks_totals d;
```
Dimension-level validation:
```sql
-- Compare by dimension
CREATE OR REPLACE VIEW main.validation.dimension_comparison AS
SELECT
  COALESCE(k.customer_region, d.customer_region) AS region,
  k.revenue AS kyvos_revenue,
  d.revenue AS databricks_revenue,
  ABS(k.revenue - d.revenue) / NULLIF(k.revenue, 0) * 100 AS variance_pct,
  CASE
    WHEN ABS(k.revenue - d.revenue) / NULLIF(k.revenue, 0) * 100 <= 1.0 THEN 'PASS'
    WHEN ABS(k.revenue - d.revenue) / NULLIF(k.revenue, 0) * 100 <= 5.0 THEN 'WARNING'
    ELSE 'FAIL'
  END AS status
FROM main.validation.kyvos_export k
FULL OUTER JOIN (
  SELECT
    customer_region,
    SUM(MEASURE(total_revenue)) AS revenue
  FROM main.sales.sales_metrics
  GROUP BY customer_region
) d ON k.customer_region = d.customer_region
ORDER BY variance_pct DESC;
```
Query Performance Testing
Benchmark query performance:
```sql
-- Create performance test queries
CREATE TABLE main.validation.performance_tests (
  test_id INT,
  test_name STRING,
  query_text STRING,
  expected_max_duration_ms INT
);

INSERT INTO main.validation.performance_tests VALUES
  (1, 'daily_revenue_by_region',
   'SELECT order_date, customer_region, MEASURE(total_revenue) FROM main.sales.sales_metrics WHERE order_date >= CURRENT_DATE - 7 GROUP BY order_date, customer_region',
   5000),
  (2, 'top_products_ytd',
   'SELECT product_name, MEASURE(total_revenue) FROM main.sales.sales_metrics WHERE YEAR(order_date) = YEAR(CURRENT_DATE) GROUP BY product_name ORDER BY MEASURE(total_revenue) DESC LIMIT 100',
   3000);

-- Execute and log performance
CREATE TABLE main.validation.performance_results (
  test_id INT,
  execution_timestamp TIMESTAMP,
  execution_duration_ms BIGINT,
  rows_returned BIGINT,
  test_status STRING
);

-- Performance test execution procedure
CREATE OR REPLACE PROCEDURE run_performance_tests()
LANGUAGE SQL
AS
BEGIN
  DECLARE test_cursor CURSOR FOR
    SELECT test_id, query_text, expected_max_duration_ms
    FROM main.validation.performance_tests;

  FOR test_record IN test_cursor DO
    -- Execute test query and log results
    -- (Implementation depends on execution framework)
    INSERT INTO main.validation.performance_results
    SELECT
      test_record.test_id,
      CURRENT_TIMESTAMP,
      execution_duration,
      row_count,
      CASE
        WHEN execution_duration <= test_record.expected_max_duration_ms THEN 'PASS'
        ELSE 'FAIL'
      END
    FROM query_execution_log
    WHERE query_id = LAST_QUERY_ID();
  END FOR;
END;
```
Data Lineage Verification
Validate measure dependencies:
```sql
-- Document measure lineage
CREATE TABLE main.documentation.measure_lineage (
  metric_view STRING,
  measure_name STRING,
  source_tables ARRAY<STRING>,
  source_columns ARRAY<STRING>,
  transformation_logic STRING
);

INSERT INTO main.documentation.measure_lineage VALUES
  (
    'main.sales.sales_metrics',
    'total_revenue',
    ARRAY('main.sales.orders'),
    ARRAY('order_amount'),
    'SUM(order_amount)'
  ),
  (
    'main.sales.sales_metrics',
    'net_revenue',
    ARRAY('main.sales.orders'),
    ARRAY('order_amount', 'discount_amount', 'refund_amount'),
    'SUM(order_amount - discount_amount - refund_amount)'
  );

-- Query lineage for measure
SELECT
  measure_name,
  transformation_logic,
  array_join(source_tables, ', ') AS source_tables,
  array_join(source_columns, ', ') AS source_columns
FROM main.documentation.measure_lineage
WHERE metric_view = 'main.sales.sales_metrics'
ORDER BY measure_name;
```
Production Deployment
Establish Metric Governance Framework
Metric certification process:
```sql
-- Metric certification workflow
CREATE TABLE main.governance.metric_certification (
  metric_view STRING PRIMARY KEY,
  certification_status STRING,
  certified_by STRING,
  certification_date DATE,
  review_date DATE,
  certification_notes STRING
);

-- Certification procedure
CREATE OR REPLACE PROCEDURE certify_metric_view(
  view_name STRING,
  certified_by_email STRING,
  notes STRING
)
LANGUAGE SQL
AS
BEGIN
  -- Validate metric view exists
  IF NOT EXISTS (
    SELECT 1
    FROM main.information_schema.tables
    WHERE table_catalog || '.' || table_schema || '.' || table_name = view_name
      AND table_type = 'METRIC VIEW'
  ) THEN
    SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Metric view does not exist';
  END IF;

  -- Record certification
  MERGE INTO main.governance.metric_certification t
  USING (SELECT view_name AS metric_view) s
  ON t.metric_view = s.metric_view
  WHEN MATCHED THEN UPDATE SET
    certification_status = 'CERTIFIED',
    certified_by = certified_by_email,
    certification_date = CURRENT_DATE,
    review_date = ADD_MONTHS(CURRENT_DATE, 6),
    certification_notes = notes
  WHEN NOT MATCHED THEN INSERT (
    metric_view, certification_status, certified_by,
    certification_date, review_date, certification_notes
  ) VALUES (
    view_name, 'CERTIFIED', certified_by_email,
    CURRENT_DATE, ADD_MONTHS(CURRENT_DATE, 6), notes
  );

  -- Apply certification flag to metric view
  EXECUTE IMMEDIATE
    'ALTER METRIC VIEW ' || view_name ||
    ' SET TBLPROPERTIES (''certified'' = ''true'', ''certification_date'' = ''' || CURRENT_DATE || ''')';
END;
```
Change management:
```sql
-- Track metric view changes
CREATE TABLE main.governance.metric_change_log (
  change_id INT GENERATED ALWAYS AS IDENTITY,
  metric_view STRING,
  change_type STRING,
  change_description STRING,
  changed_by STRING,
  change_timestamp TIMESTAMP,
  version STRING,
  rollback_ddl STRING
);

-- Log changes
INSERT INTO main.governance.metric_change_log
  (metric_view, change_type, change_description, changed_by, change_timestamp, version, rollback_ddl)
VALUES (
  'main.sales.sales_metrics',
  'MEASURE_ADDED',
  'Added profit_margin_pct measure',
  current_user(),
  CURRENT_TIMESTAMP,
  '1.1.0',
  'ALTER METRIC VIEW main.sales.sales_metrics DROP MEASURE profit_margin_pct'
);
```
Configure Refresh and Maintenance
Schedule table maintenance:
```sql
-- Create maintenance procedures
CREATE OR REPLACE PROCEDURE maintain_sales_tables()
LANGUAGE SQL
AS
BEGIN
  -- Optimize fact table
  OPTIMIZE main.sales.orders ZORDER BY (order_date, customer_id, product_id);

  -- Vacuum old files
  VACUUM main.sales.orders RETAIN 168 HOURS;

  -- Update statistics
  ANALYZE TABLE main.sales.orders COMPUTE STATISTICS FOR ALL COLUMNS;
  ANALYZE TABLE main.sales.customers COMPUTE STATISTICS FOR ALL COLUMNS;
  ANALYZE TABLE main.sales.products COMPUTE STATISTICS FOR ALL COLUMNS;

  -- Refresh materialized views
  REFRESH MATERIALIZED VIEW main.sales.daily_sales_rollup;
  REFRESH MATERIALIZED VIEW main.sales.hourly_revenue;
END;

-- Schedule maintenance
CREATE JOB sales_table_maintenance
SCHEDULE CRON '0 2 * * *'  -- 2 AM daily
AS CALL maintain_sales_tables();
```
Implement Monitoring and Alerting
Create monitoring dashboard:
```sql
-- Metric health metrics
CREATE OR REPLACE VIEW main.monitoring.metric_health AS
SELECT
  metric_view,
  COUNT(*) AS query_count_24h,
  AVG(execution_duration_ms) AS avg_duration_ms,
  PERCENTILE(execution_duration_ms, 0.95) AS p95_duration_ms,
  SUM(CASE WHEN error_message IS NOT NULL THEN 1 ELSE 0 END) AS error_count,
  MAX(event_time) AS last_query_time
FROM system.access.audit
WHERE action_name = 'queryMetricView'
  AND event_date >= CURRENT_DATE - 1
GROUP BY metric_view;

-- Alert on unhealthy metrics
SELECT
  metric_view,
  'HIGH_ERROR_RATE' AS alert_type,
  CONCAT('Error rate: ', ROUND(error_count * 100.0 / query_count_24h, 2), '%') AS alert_message
FROM main.monitoring.metric_health
WHERE error_count * 100.0 / query_count_24h > 5.0  -- More than 5% error rate

UNION ALL

SELECT
  metric_view,
  'SLOW_PERFORMANCE' AS alert_type,
  CONCAT('P95 latency: ', p95_duration_ms, 'ms') AS alert_message
FROM main.monitoring.metric_health
WHERE p95_duration_ms > 10000;  -- P95 over 10 seconds
```
Conclusion
Migrating from Kyvos to Databricks Unity Catalog Metric Views requires systematic translation of OLAP concepts to lakehouse-native patterns. The architecture shift from pre-aggregated cubes to on-demand computation demands careful performance tuning and query optimization. Typedef's Fenic framework provides semantic processing capabilities for preprocessing migration artifacts, validating measure logic, and enriching dimension data during the migration process.
The lakehouse-native approach eliminates separate semantic layer infrastructure while maintaining governed metric definitions through Unity Catalog. Organizations gain unified data access patterns, native AI integration, and real-time metric freshness at the cost of rewriting MDX queries and flattening hierarchical dimensions.
For additional technical resources on semantic data transformation and AI-powered data processing, refer to Typedef's documentation and Fenic framework guides.
