<< goback()

How to Migrate from Timbr.ai to Databricks Unity Catalog Metric Views

Typedef Team

How to Migrate from Timbr.ai to Databricks Unity Catalog Metric Views

Migrating from Timbr.ai's ontology-based semantic layer to Databricks Unity Catalog Metric Views requires converting knowledge graphs to YAML metric definitions, reimplementing relationships as joins, and translating semantic SQL to MEASURE() syntax. Typedef's Fenic framework bridges the migration gap by providing declarative semantic operations that work with Databricks, enabling teams to maintain semantic intelligence while gaining native lakehouse performance and governance.


Why Organizations Migrate from Timbr.ai to Databricks

The Consolidation Imperative

Organizations running Timbr.ai typically face infrastructure complexity: semantic layer sits atop existing databases, creating an additional query hop and operational overhead. When Databricks already serves as the primary data platform for ML, data engineering, and analytics, maintaining separate semantic infrastructure introduces friction.

The consolidation drivers:

  • Single platform operations: Eliminate Timbr.ai licensing and infrastructure costs
  • Native Unity Catalog integration: Metrics become first-class catalog objects with lineage
  • Unified governance: RBAC, audit logs, and access control managed in one place
  • Performance gains: Query execution happens natively in Spark/Photon without external hops
  • Lakehouse alignment: Metrics work seamlessly across SQL, Python notebooks, and ML pipelines

When Migration Makes Sense

Migrate from Timbr.ai when:

  • Databricks handles 80%+ of analytical workloads
  • Data science and BI teams need unified metric definitions
  • Infrastructure consolidation drives cost reduction
  • Unity Catalog already governs data assets
  • Native Databricks AI/BI features (Genie, Assistant) are strategic

Maintain Timbr.ai when:

  • Multi-warehouse architecture spans Snowflake, BigQuery, Databricks
  • SPARQL queries and RDF ontologies are requirements
  • Heavy investment in OWL-based domain ontologies
  • Data federation across external sources is core use case

Architecture Comparison: Timbr.ai vs Databricks Metric Views

Timbr.ai Architecture

Virtual knowledge graph sits between consumers and data sources:

BI Tools / Notebooks
         ↓
   Semantic SQL (Timbr)
         ↓
   SQL Knowledge Graph
         ↓
   Ontology Mapping
         ↓
Physical Databases/Data Lakes

Key characteristics:

  • Ontology-driven: OWL/RDFS concepts mapped to SQL tables
  • Relationship-first: Explicit semantic relationships replace JOINs
  • Data virtualization: No data movement, query pushdown to sources
  • Inference capabilities: Semantic reasoning over relationships
  • SQL interface: Standard SQL queries against ontology concepts

Databricks Metric Views Architecture

Native catalog objects within the lakehouse:

BI Tools / Notebooks / Genie
         ↓
   MEASURE() SQL syntax
         ↓
Unity Catalog Metric Views
         ↓
   Spark SQL / Photon
         ↓
   Delta Lake Tables

Key characteristics:

  • Measure-driven: Aggregations defined independently of dimensions
  • Schema-based joins: Star/snowflake schema relationships in YAML
  • Native execution: Spark SQL compiles measures into optimized queries
  • Unity Catalog integration: Metrics are governed catalog objects
  • Flexible aggregation: Query measures across any dimension at runtime

Critical Differences

AspectTimbr.aiDatabricks Metric Views
Semantic ModelOntology (concepts, properties, relationships)Metric View (measures, dimensions, joins)
Query SyntaxSemantic SQL (relationships implicit)Standard SQL + MEASURE() clause
ExecutionVirtual layer with pushdownNative Spark SQL / Photon
Data LocationFederated across sourcesDatabricks Delta Lake
GovernanceTimbr RBAC + source permissionsUnity Catalog RBAC
InferenceBuilt-in semantic reasoningNo inference (explicit definitions)

Migration Process Overview

Phase 1: Assessment and Planning

1. Inventory Timbr.ai Assets

Document existing semantic infrastructure:

sql
-- Query Timbr metadata to list all ontologies
SELECT ontology_name, concept_count, relationship_count
FROM timbr.metadata.ontologies;

-- Export ontology definitions
SELECT concept_name, properties, relationships
FROM timbr.metadata.concepts
WHERE ontology_name = 'sales_ontology';

Catalog:

  • Ontologies: Business domains modeled (Sales, Customer, Product)
  • Concepts: Entity types defined (Customer, Order, Product)
  • Relationships: Connections between concepts (Customer → Order, Order → Product)
  • Measures: Aggregations and calculations
  • Access patterns: Most queried concepts and relationships

2. Map Data Sources to Databricks

Identify where Timbr.ai queries data:

Timbr SQL Knowledge Graph
         ↓
    Federated Sources:
    - Oracle (CRM data)
    - PostgreSQL (Orders)
    - Snowflake (Analytics)

Migration strategy per source:

  • Already in Databricks: Direct mapping to Delta tables
  • External databases: Replicate to Databricks via Delta Live Tables or ETL
  • Real-time sources: Consider Databricks streaming ingestion

3. Assess Query Patterns

Analyze how business users query Timbr.ai:

sql
-- Example Timbr semantic SQL
SELECT
    customer.region,
    SUM(order.amount) as total_revenue
FROM customer
WHERE customer.tier = 'enterprise'

Note implicit relationships: Timbr automatically joined customerorder via ontology relationships. Databricks requires explicit join definitions in metric views.

Phase 2: Data Consolidation

1. Replicate External Data to Databricks

For data outside Databricks, establish ingestion pipelines:

python
# Delta Live Tables pipeline example
import dlt
from pyspark.sql.functions import *

@dlt.table(
    name="customers_bronze",
    comment="Raw customer data from Oracle CRM"
)
def customers_bronze():
    return (
        spark.readStream
        .format("jdbc")
        .option("url", "jdbc:oracle:thin:@crm-db:1521/ORCL")
        .option("dbtable", "customers")
        .option("user", dbutils.secrets.get("db", "oracle_user"))
        .option("password", dbutils.secrets.get("db", "oracle_password"))
        .load()
    )

@dlt.table(
    name="customers_silver",
    comment="Cleansed customer data with quality checks"
)
@dlt.expect_or_drop("valid_email", "email IS NOT NULL AND email LIKE '%@%'")
def customers_silver():
    return (
        dlt.read("customers_bronze")
        .withColumn("ingestion_timestamp", current_timestamp())
        .dropDuplicates(["customer_id"])
    )

2. Build Star Schema in Delta Lake

Transform data into dimensional models:

sql
-- Create dimension tables
CREATE TABLE IF NOT EXISTS sales.dim_customers (
    customer_key BIGINT GENERATED ALWAYS AS IDENTITY,
    customer_id STRING,
    customer_name STRING,
    region STRING,
    tier STRING,
    created_date DATE
);

CREATE TABLE IF NOT EXISTS sales.dim_products (
    product_key BIGINT GENERATED ALWAYS AS IDENTITY,
    product_id STRING,
    product_name STRING,
    category STRING,
    price DECIMAL(10,2)
);

-- Create fact table
CREATE TABLE IF NOT EXISTS sales.fact_orders (
    order_key BIGINT GENERATED ALWAYS AS IDENTITY,
    customer_key BIGINT,
    product_key BIGINT,
    order_date DATE,
    order_amount DECIMAL(10,2),
    quantity INT,
    CONSTRAINT fk_customer FOREIGN KEY (customer_key) REFERENCES sales.dim_customers(customer_key),
    CONSTRAINT fk_product FOREIGN KEY (product_key) REFERENCES sales.dim_products(product_key)
);

Phase 3: Convert Ontologies to Metric Views

1. Map Timbr.ai Concepts to Databricks Tables

Timbr.ai ontology:

Concept: Customer
  Properties: customer_id, name, region, tier

Concept: Order
  Properties: order_id, order_date, amount
  Relationship: Order.customer → Customer

Concept: Product
  Properties: product_id, name, category
  Relationship: Order.product → Product

Databricks mapping:

dim_customers → Customer concept
fact_orders → Order concept
dim_products → Product concept

Joins define relationships

2. Translate Measures to Metric View YAML

Timbr.ai measure definitions (conceptual):

sql
-- Measures in Timbr typically defined in UI or metadata
MEASURE total_revenue = SUM(Order.amount)
MEASURE order_count = COUNT(Order.order_id)
MEASURE avg_order_value = SUM(Order.amount) / COUNT(Order.order_id)

Databricks Metric View YAML:

yaml
metric_view:
  name: sales_metrics
  source_table: sales.fact_orders

  joins:
    - name: customers
      source: sales.dim_customers
      on: fact_orders.customer_key = customers.customer_key

    - name: products
      source: sales.dim_products
      on: fact_orders.product_key = products.product_key

  dimensions:
    - name: customer_region
      expr: customers.region
      description: "Customer geographic region"

    - name: customer_tier
      expr: customers.tier
      description: "Customer tier: enterprise, pro, free"

    - name: product_category
      expr: products.category
      description: "Product category"

    - name: order_date
      expr: fact_orders.order_date
      description: "Date order was placed"

  measures:
    - name: total_revenue
      expr: SUM(order_amount)
      description: "Total revenue from orders"

    - name: order_count
      expr: COUNT(*)
      description: "Number of orders"

    - name: avg_order_value
      expr: SUM(order_amount) / COUNT(*)
      description: "Average revenue per order"

    - name: unique_customers
      expr: COUNT(DISTINCT customer_key)
      description: "Number of distinct customers"

3. Create Metric View in Unity Catalog

sql
-- Create metric view from YAML
CREATE METRIC VIEW sales.sales_metrics
FROM 'dbfs:/metric_views/sales_metrics.yaml';

-- Or define inline using SQL
CREATE METRIC VIEW sales.sales_metrics AS
SELECT
  SUM(order_amount) AS total_revenue,
  COUNT(*) AS order_count,
  COUNT(DISTINCT customer_key) AS unique_customers,
  customers.region AS customer_region,
  customers.tier AS customer_tier,
  products.category AS product_category
FROM sales.fact_orders
JOIN sales.dim_customers customers
  ON fact_orders.customer_key = customers.customer_key
JOIN sales.dim_products products
  ON fact_orders.product_key = products.product_key;

Phase 4: Query Translation

From Timbr.ai Semantic SQL:

sql
-- Implicit relationships via ontology
SELECT
    customer.region,
    SUM(order.amount) as revenue
FROM customer
JOIN order ON customer.customer_id = order.customer_id
WHERE customer.tier = 'enterprise'
GROUP BY customer.region;

To Databricks Metric View:

sql
-- Explicit MEASURE() syntax
SELECT
    customer_region,
    MEASURE(total_revenue) AS revenue
FROM sales.sales_metrics
WHERE customer_tier = 'enterprise'
GROUP BY customer_region;

Phase 5: Semantic Intelligence with Typedef

Databricks Metric Views handle aggregation but lack Timbr.ai's semantic extraction and enrichment. Typedef's Fenic framework fills this gap.

Semantic extraction from unstructured data:

python
import fenic as fc
from pydantic import BaseModel, Field
from typing import Literal

# Define schema for customer feedback
class FeedbackInsight(BaseModel):
    sentiment: Literal["positive", "neutral", "negative"]
    key_issues: list[str]
    product_mentions: list[str]
    urgency: Literal["low", "medium", "high"]

# Configure Fenic session
config = fc.SessionConfig(
    app_name="customer_feedback_enrichment",
    semantic=fc.SemanticConfig(
        language_models={
            "fast": fc.OpenAILanguageModel(model_name="gpt-4o-mini", rpm=100, tpm=50000)
        },
        default_language_model="fast"
    )
)

session = fc.Session.get_or_create(config)

# Read feedback from Delta table
feedback_df = session.read.delta("sales.customer_feedback")

# Extract structured insights
enriched_feedback = (
    feedback_df
    .with_column("insights", fc.semantic.extract(fc.col("feedback_text"), FeedbackInsight))
    .unnest("insights")
    .filter(fc.col("sentiment") == "negative")
    .filter(fc.col("urgency") == "high")
)

# Write back to Delta Lake for metric views to consume
enriched_feedback.write.delta("sales.feedback_insights")

Now Databricks Metric Views can aggregate over enriched semantic data:

yaml
metric_view:
  name: feedback_metrics
  source_table: sales.feedback_insights

  dimensions:
    - name: sentiment
      expr: sentiment

    - name: urgency
      expr: urgency

  measures:
    - name: feedback_count
      expr: COUNT(*)

    - name: negative_high_urgency_count
      expr: COUNT_IF(sentiment = 'negative' AND urgency = 'high')

Semantic join for entity resolution:

python
# Join customer feedback with product catalog using semantic similarity
products_df = session.read.parquet("sales/dim_products.parquet")
# Semantic join matches product mentions in feedback to actual products
matched_feedback = (
    enriched_feedback
    .semantic.join(
        other=products_df,
        predicate="""
            The product mention "{{ left_on }}" refers to the product "{{ right_on }}"
        """,
        left_on=fc.col("product_mentions"),
        right_on=fc.col("product_name")
    )
)

matched_feedback.write.parquet("sales/feedback_products.parquet")

This pattern bridges Timbr.ai's semantic intelligence with Databricks' native performance.


Migration Patterns for Complex Use Cases

Pattern 1: Hierarchical Dimensions

Timbr.ai approach: Ontology naturally represents hierarchies (Product → Category → Department)

Databricks approach: Define snowflake schema joins

yaml
metric_view:
  name: product_hierarchy_metrics
  source_table: sales.fact_orders

  joins:
    - name: products
      source: sales.dim_products
      on: fact_orders.product_key = products.product_key

    - name: categories
      source: sales.dim_categories
      on: products.category_key = categories.category_key

    - name: departments
      source: sales.dim_departments
      on: categories.department_key = departments.department_key

  dimensions:
    - name: product_name
      expr: products.product_name

    - name: category_name
      expr: categories.category_name

    - name: department_name
      expr: departments.department_name

  measures:
    - name: total_sales
      expr: SUM(order_amount)

Query at any hierarchy level:

sql
-- Department level
SELECT department_name, MEASURE(total_sales)
FROM sales.product_hierarchy_metrics
GROUP BY department_name;

-- Category level
SELECT category_name, MEASURE(total_sales)
FROM sales.product_hierarchy_metrics
GROUP BY category_name;

Pattern 2: Time-Based Analytics

Timbr.ai: Often handles date hierarchies through ontology relationships

Databricks: Define time dimensions explicitly

yaml
dimensions:
  - name: order_date
    expr: fact_orders.order_date

  - name: order_month
    expr: DATE_TRUNC('month', fact_orders.order_date)

  - name: order_quarter
    expr: DATE_TRUNC('quarter', fact_orders.order_date)

  - name: order_year
    expr: YEAR(fact_orders.order_date)

Period-over-period comparisons:

sql
WITH current_period AS (
    SELECT
        customer_region,
        MEASURE(total_revenue) AS current_revenue
    FROM sales.sales_metrics
    WHERE order_quarter = DATE_TRUNC('quarter', CURRENT_DATE())
    GROUP BY customer_region
),
prior_period AS (
    SELECT
        customer_region,
        MEASURE(total_revenue) AS prior_revenue
    FROM sales.sales_metrics
    WHERE order_quarter = DATE_TRUNC('quarter', ADD_MONTHS(CURRENT_DATE(), -3))
    GROUP BY customer_region
)
SELECT
    c.customer_region,
    c.current_revenue,
    p.prior_revenue,
    ((c.current_revenue - p.prior_revenue) / p.prior_revenue) * 100 AS growth_pct
FROM current_period c
JOIN prior_period p ON c.customer_region = p.customer_region;

Pattern 3: Complex Business Rules

Timbr.ai: Business logic embedded in ontology or measures

Databricks: Use filters and derived measures

yaml
metric_view:
  name: customer_segmentation_metrics
  source_table: sales.fact_orders

  # Base filter applied to all queries
  filter: "order_date >= CURRENT_DATE() - INTERVAL 365 DAYS"

  joins:
    - name: customers
      source: sales.dim_customers
      on: fact_orders.customer_key = customers.customer_key

  dimensions:
    - name: customer_segment
      expr: |
        CASE
          WHEN customers.tier = 'enterprise' AND customers.annual_spend > 100000
            THEN 'enterprise_high'
          WHEN customers.tier = 'enterprise'
            THEN 'enterprise_standard'
          WHEN customers.tier = 'pro'
            THEN 'professional'
          ELSE 'standard'
        END

  measures:
    - name: active_customer_count
      expr: COUNT(DISTINCT CASE WHEN order_date >= CURRENT_DATE() - 90 THEN customer_key END)

    - name: churn_rate
      expr: |
        (COUNT(DISTINCT CASE WHEN last_order_date < CURRENT_DATE() - 90 THEN customer_key END)
         / COUNT(DISTINCT customer_key)) * 100

Testing and Validation

1. Compare Query Results

Run parallel queries against Timbr.ai and Databricks:

sql
-- Timbr.ai
SELECT
    customer.region,
    SUM(order.amount) as revenue,
    COUNT(order.order_id) as order_count
FROM customer
JOIN order ON customer.customer_id = order.customer_id
WHERE order.order_date >= '2025-01-01'
GROUP BY customer.region;
sql
-- Databricks Metric View
SELECT
    customer_region,
    MEASURE(total_revenue) as revenue,
    MEASURE(order_count) as order_count
FROM sales.sales_metrics
WHERE order_date >= '2025-01-01'
GROUP BY customer_region;

Validate results match within acceptable tolerance (account for timestamp differences, rounding).

2. Performance Benchmarking

Compare query execution times:

python
import time

# Timbr query via JDBC
timbr_start = time.time()
timbr_result = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:timbr://...") \
    .option("query", "SELECT ...") \
    .load()
timbr_duration = time.time() - timbr_start

# Databricks native query
databricks_start = time.time()
databricks_result = spark.sql("""
    SELECT customer_region, MEASURE(total_revenue)
    FROM sales.sales_metrics
    GROUP BY customer_region
""")
databricks_duration = time.time() - databricks_start

print(f"Timbr: {timbr_duration:.2f}s, Databricks: {databricks_duration:.2f}s")

3. Semantic Quality Validation

When using Typedef for semantic enrichment, validate extraction accuracy:

python
# Sample validation set
sample_feedback = [
    "Product is terrible, completely broken",
    "Loving the new features, works great",
    "App crashes on startup, urgent fix needed"
]

# Extract insights
results = (
    session.create_dataframe({"feedback": sample_feedback})
    .with_column("insights", fc.semantic.extract(fc.col("feedback"), FeedbackInsight))
    .unnest("insights")
    .collect()
)

# Manual validation
for row in results:
    print(f"Feedback: {row['feedback']}")
    print(f"Detected sentiment: {row['sentiment']}")
    print(f"Urgency: {row['urgency']}")
    print("---")

Production Deployment Strategy

1. Phased Rollout

Phase A: Read-Only Validation (Weeks 1-2)

  • Metric views created in dev catalog
  • BI tools read from both Timbr.ai and Databricks
  • Compare results in dashboards side-by-side
  • No production traffic to Databricks yet

Phase B: Pilot Users (Weeks 3-4)

  • Select 5-10 early adopters
  • Switch their dashboards to Databricks metric views
  • Collect feedback on performance and accuracy
  • Timbr.ai remains primary for most users

Phase C: Gradual Migration (Weeks 5-8)

  • Migrate teams incrementally (10% → 25% → 50% → 100%)
  • Monitor query performance via Databricks Query History
  • Track adoption through Unity Catalog insights
  • Maintain Timbr.ai as fallback

Phase D: Complete Cutover (Week 9+)

  • All users on Databricks metric views
  • Decommission Timbr.ai infrastructure
  • Archive ontology definitions for reference

2. Access Control Migration

Map Timbr.ai permissions to Unity Catalog:

sql
-- Create groups matching Timbr.ai roles
CREATE GROUP IF NOT EXISTS sales_analysts;
CREATE GROUP IF NOT EXISTS finance_team;
CREATE GROUP IF NOT EXISTS executive_viewers;

-- Grant permissions on metric views
GRANT SELECT ON METRIC VIEW sales.sales_metrics TO sales_analysts;
GRANT SELECT ON METRIC VIEW sales.sales_metrics TO finance_team;
GRANT SELECT ON METRIC VIEW sales.sales_metrics TO executive_viewers;

-- Apply row-level security if needed
ALTER TABLE sales.fact_orders SET ROW FILTER
  region_filter ON (region = current_user_region());

3. Documentation and Training

Create migration artifacts:

Metric view catalog (resource):

  • List all metric views with descriptions
  • Map old Timbr.ai concept names to new metric view names
  • Document MEASURE() syntax examples
  • Provide query patterns for common use cases

Training materials:

  • Side-by-side query comparisons (Timbr.ai SQL vs Databricks)
  • Video walkthrough of Catalog Explorer
  • FAQs addressing common migration questions

Advanced: Maintaining Semantic Intelligence

Databricks Metric Views excel at aggregation but don't natively handle:

  • Semantic extraction from unstructured text
  • Entity resolution across inconsistent naming
  • Classification into business taxonomies
  • Relationship inference beyond explicit joins

Typedef's Fenic framework provides these capabilities through declarative semantic operators.

Semantic Enrichment Pipeline

python
import fenic as fc
from pydantic import BaseModel, Field
from typing import List, Literal

# Schema for extracting structured data from support tickets
class TicketExtraction(BaseModel):
    issue_category: Literal["bug", "feature_request", "question", "complaint"]
    affected_product: str = Field(description="Name of the product mentioned")
    severity: Literal["low", "medium", "high", "critical"]
    customer_sentiment: Literal["positive", "neutral", "negative"]
    action_items: List[str]

# Configure session
config = fc.SessionConfig(
    app_name="support_ticket_enrichment",
    semantic=fc.SemanticConfig(
        language_models={
            "extractor": fc.OpenAILanguageModel(model_name="gpt-4o-mini", rpm=100, tpm=50000),
            "classifier": fc.AnthropicLanguageModel(model_name="claude-3-5-haiku-latest", rpm=50, input_tpm=100000, output_tpm=50000)
        },
        default_language_model="extractor"
    )
)

session = fc.Session.get_or_create(config)

# Read raw tickets from Delta
tickets = session.read.parquet("support/raw_tickets.parquet")

# Semantic enrichment pipeline
enriched_tickets = (
    tickets
    # Extract structured information
    .with_column("extracted",
        fc.semantic.extract(fc.col("ticket_text"), TicketExtraction, model_alias="extractor")
    )
    .unnest("extracted")

    # Classify customer tier using semantic predicate
    .with_column("high_value_customer",
        fc.semantic.predicate(
            "Is this a high-value customer based on their history? {{ history }}",
            history=fc.col("customer_history"),
            model_alias="classifier"
        )
    )

    # Semantic join to match affected product to product catalog
    .semantic.join(
		    other=session.read.parquet("products/catalog.parquet"),
		    predicate="""
		        The mentioned product "{{ left_on }}" refers to catalog product "{{ right_on }}"
    """,
		    left_on=fc.col("affected_product"),
		    right_on=fc.col("product_name")
)

    # Generate embeddings for similarity search
    .with_column("ticket_embedding",
        fc.semantic.embed(fc.col("ticket_text"))
    )
)

# Write enriched data back to Delta
enriched_tickets.write.parquet("support/enriched_tickets.parquet")

Now create a metric view over enriched data:

sql
CREATE METRIC VIEW support.ticket_metrics
FROM 'dbfs:/metric_views/ticket_metrics.yaml';
yaml
metric_view:
  name: ticket_metrics
  source_table: support.enriched_tickets

  joins:
    - name: products
      source: products.catalog
      on: enriched_tickets.product_key = products.product_key

  dimensions:
    - name: issue_category
      expr: issue_category

    - name: severity
      expr: severity

    - name: customer_sentiment
      expr: customer_sentiment

    - name: product_name
      expr: products.product_name

  measures:
    - name: ticket_count
      expr: COUNT(*)

    - name: critical_issues
      expr: COUNT_IF(severity = 'critical')

    - name: negative_sentiment_pct
      expr: (COUNT_IF(customer_sentiment = 'negative') / COUNT(*)) * 100

    - name: avg_resolution_time
      expr: AVG(resolution_time_hours)

Query semantically enriched metrics:

sql
SELECT
    product_name,
    MEASURE(ticket_count),
    MEASURE(critical_issues),
    MEASURE(negative_sentiment_pct)
FROM support.ticket_metrics
WHERE severity IN ('high', 'critical')
GROUP BY product_name
ORDER BY MEASURE(critical_issues) DESC;

Common Migration Challenges and Solutions

Challenge 1: Timbr.ai Inference Rules

Problem: Timbr.ai ontologies support inference (e.g., "Enterprise customers are VIP customers"). Databricks has no native inference.

Solution: Materialize inferred relationships in Delta tables or use SQL CASE expressions:

sql
-- Materialized inference
CREATE OR REPLACE TABLE sales.customer_classifications AS
SELECT
    customer_key,
    tier,
    CASE
        WHEN tier = 'enterprise' THEN TRUE
        WHEN lifetime_value > 50000 THEN TRUE
        ELSE FALSE
    END AS is_vip
FROM sales.dim_customers;

-- Reference in metric view
metric_view:
  joins:
    - name: customers
      source: sales.customer_classifications
      on: fact_orders.customer_key = customers.customer_key

  dimensions:
    - name: is_vip
      expr: customers.is_vip

Challenge 2: Query Performance Regression

Problem: Initial Databricks queries slower than Timbr.ai due to unoptimized joins.

Solution: Optimize star schema and leverage Databricks features:

sql
-- Create optimized tables with Z-ordering
OPTIMIZE sales.fact_orders
ZORDER BY (order_date, customer_key, product_key);

-- Use liquid clustering for dimension tables
ALTER TABLE sales.dim_customers
CLUSTER BY (customer_key, region);

-- Enable Photon for SQL warehouses (automatic in newer Databricks runtimes)
-- Set compute settings: Photon Acceleration = ON

Challenge 3: Missing Semantic Context

Problem: Databricks metric views aggregate numbers but don't extract meaning from text fields.

Solution: Typedef enrichment pipeline (learn more):

python
# Weekly batch enrichment job
def enrich_feedback_batch():
    session = fc.Session.get_or_create(config)

    # Read new feedback since last run
    new_feedback = (
		    session.read.parquet("support/raw_feedback.parquet")
		    .filter(fc.col("processed") == False)
)

    # Semantic extraction
    enriched = (
        new_feedback
        .with_column("insights", fc.semantic.extract(fc.col("feedback_text"), FeedbackSchema))
        .unnest("insights")
        .with_column("embedding", fc.semantic.embed(fc.col("feedback_text")))
        .with_column("processed", fc.lit(True))
    )

    # Merge back to source
    enriched.write.parquet("support/raw_feedback.parquet", mode="overwrite")

# Schedule as Databricks job
# Runs daily at 2 AM UTC

Monitoring and Observability

Track Migration Success

Unity Catalog Insights:

sql
-- Query frequency by metric view
SELECT
    table_name,
    COUNT(*) as query_count,
    AVG(execution_time_ms) as avg_execution_ms
FROM system.access.audit
WHERE table_catalog = 'sales'
  AND table_schema = 'metrics'
  AND action_name = 'SELECT'
  AND event_date >= CURRENT_DATE() - 30
GROUP BY table_name
ORDER BY query_count DESC;

Typedef Pipeline Metrics:

python
# Metrics are always included in QueryResult - no parameter needed
result = enriched_tickets.collect()

print(f"Total LLM tokens: {result.metrics.lm_metrics.total_tokens}")
print(f"Estimated cost: ${result.metrics.lm_metrics.estimated_cost:.4f}")
print(f"Pipeline execution time: {result.metrics.execution_time}s")

Conclusion

Migrating from Timbr.ai to Databricks Unity Catalog Metric Views consolidates semantic layer infrastructure while gaining native lakehouse performance. The migration requires converting ontology concepts to metric view YAML, reimplementing relationships as explicit joins, and translating semantic SQL to MEASURE() syntax.

For organizations that valued Timbr.ai's semantic intelligence—extracting meaning from text, resolving entities, inferring relationships—Typedef's Fenic framework bridges the gap. Declarative semantic operators provide the extraction and classification capabilities Databricks lacks, while Unity Catalog Metric Views handle governed aggregation at scale.

The result: unified metrics definitions across SQL, Python, and AI tools, native Unity Catalog governance, and the semantic processing power needed for modern data workloads.

Migration Resources:

Get Started:

the next generation of

data processingdata processingdata processing

Join us in igniting a new paradigm in data infrastructure. Enter your email to get early access and redefine how you build and scale data workflows with typedef.