Sylvain's Blog

Most AI analytics failures are not caused by weak models. They are caused by weak semantics.

Large language models are surprisingly good at writing SQL, but they struggle when business definitions are ambiguous, joins are inconsistent, or metrics are defined differently across teams. Without structure, even advanced AI agents become unreliable.

This is why the semantic layer matters.

A semantic layer acts as the translation system between raw warehouse tables and business meaning. It gives AI agents a shared understanding of entities, metrics, dimensions, relationships, and analytical rules.

In this post, we'll go deep into the best practices for designing semantic layers specifically for AI analytics agents.

Why AI Agents Need Semantic Layers

Traditional BI tools already benefit from semantic modeling, but AI agents amplify the need dramatically.

Humans can often compensate for ambiguous schemas because they carry implicit business knowledge. AI agents do not.

Consider a warehouse table with these columns:

orders(
  id,
  amount,
  status,
  type,
  created_at
)

To a human analyst, the meaning may be obvious. To an AI agent, almost everything is ambiguous:

Does amount include refunds?
Which statuses count as completed revenue?
What does type represent?
Are timestamps UTC?
Should canceled orders be excluded?

Without semantic structure, the model fills gaps probabilistically. That leads to inconsistent analytics and hallucinated assumptions.

A semantic layer eliminates this ambiguity by explicitly encoding business meaning.
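As a minimal illustration, the answers to the questions above can be recorded as structured data and rendered into context that an agent receives alongside the schema. The answers themselves are hypothetical stand-ins for whatever your business actually decides: amount is before refunds, only completed orders count as revenue, and timestamps are UTC. The same goes for the names ColumnSpec, ORDERS_SPEC, and render_context.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnSpec:
    """Business meaning attached to one physical column."""
    name: str
    description: str
    allowed_values: list[str] = field(default_factory=list)


# Hypothetical answers for orders(id, amount, status, type, created_at).
ORDERS_SPEC = {
    "table": "orders",
    "columns": [
        ColumnSpec("amount", "Order total before refunds; refunds are stored separately."),
        ColumnSpec("status", "Lifecycle state of the order.",
                   ["pending", "completed", "canceled"]),
        ColumnSpec("type", "Sales channel that produced the order.",
                   ["web", "mobile", "b2b"]),
        ColumnSpec("created_at", "Order creation time, stored in UTC."),
    ],
    "revenue_statuses": ["completed"],   # which statuses count as completed revenue
    "excluded_statuses": ["canceled"],   # excluded from every metric on this table
}


def render_context(spec: dict) -> str:
    """Turn the spec into plain text to send to the agent with the schema."""
    lines = [f"Table {spec['table']}:"]
    for col in spec["columns"]:
        line = f"- {col.name}: {col.description}"
        if col.allowed_values:
            line += f" Allowed values: {', '.join(col.allowed_values)}."
        lines.append(line)
    lines.append(f"Revenue counts only statuses: {', '.join(spec['revenue_statuses'])}.")
    lines.append(f"Always exclude statuses: {', '.join(spec['excluded_statuses'])}.")
    return "\n".join(lines)


print(render_context(ORDERS_SPEC))
```

The model no longer has to guess: every question from the list above has a written, reviewable answer.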

Core Design Principles

The best semantic layers are designed around business concepts, not physical tables.

Here are the principles that matter most:

1. Model Business Entities, Not Raw Schemas

Your semantic layer should expose concepts like:

Customers
Orders
Subscriptions
Sessions
Invoices

Avoid exposing fragmented warehouse implementation details directly to AI agents.
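One way to apply this is a catalog that maps each business entity to the physical table behind it, while showing the agent only the entity side. In this sketch the Entity and Catalog classes and the table names (analytics.dim_customer_v3, analytics.stg_orders_clean) are hypothetical. The query compiler would call resolve, and the agent would only ever see agent_view.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str           # business-facing name the agent sees
    description: str    # what one row of this entity represents
    source_table: str   # physical table, kept internal


class Catalog:
    def __init__(self, entities: list[Entity]):
        self._by_name = {e.name: e for e in entities}

    def agent_view(self) -> dict[str, str]:
        """Entity names and descriptions only; physical tables stay hidden."""
        return {name: e.description for name, e in self._by_name.items()}

    def resolve(self, entity_name: str) -> str:
        """Used by the query compiler, not the agent, to find the backing table."""
        try:
            return self._by_name[entity_name].source_table
        except KeyError:
            raise KeyError(f"Unknown entity: {entity_name!r}") from None


catalog = Catalog([
    Entity("customers", "One row per paying customer account.", "analytics.dim_customer_v3"),
    Entity("orders", "One row per checkout, including canceled ones.", "analytics.stg_orders_clean"),
])
print(catalog.agent_view())
```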

2. Centralize KPI Definitions

Every important metric should exist in exactly one canonical definition.

Bad:

SUM(amount)

Better:

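metrics:
  net_revenue:
    # COALESCE guards against a NULL refund_amount: without it, orders with
    # no refund would evaluate to NULL and be silently dropped by SUM.
    sql: SUM(order_amount - COALESCE(refund_amount, 0))
    description: Revenue excluding refunds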

If teams define metrics differently across dashboards, AI agents will amplify the inconsistency.
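Here is a small sketch of the "exactly one definition" rule. It is a registry that accepts a metric once and rejects any later, different definition under the same name, instead of silently overriding it. MetricRegistry is a hypothetical name and the SQL strings are illustrative.

```python
class MetricRegistry:
    """Holds exactly one canonical SQL definition per metric name."""

    def __init__(self) -> None:
        self._metrics: dict[str, str] = {}

    def register(self, name: str, sql: str) -> None:
        existing = self._metrics.get(name)
        if existing is not None and existing != sql:
            # A second, different definition is a conflict, never an override.
            raise ValueError(f"{name!r} is already defined as {existing!r}; got {sql!r}")
        self._metrics[name] = sql

    def sql_for(self, name: str) -> str:
        return self._metrics[name]


registry = MetricRegistry()
registry.register("net_revenue", "SUM(order_amount - COALESCE(refund_amount, 0))")

try:
    # A dashboard's ad-hoc variant of the same metric is rejected.
    registry.register("net_revenue", "SUM(amount)")
except ValueError as err:
    print(err)
```

Re-registering the identical SQL is allowed, so loading the same definitions twice is harmless. Only a genuine disagreement fails loudly.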

3. Prefer Explicitness Over Cleverness

Humans tolerate abstraction. AI systems prefer clarity.

Avoid:

Overloaded column meanings
Implicit join behavior
Hidden filtering logic
Undocumented transformations

The more explicit the semantic model, the safer the agent becomes.

How to Model Metrics and Dimensions

Metrics and dimensions form the core language of analytics agents.

Metrics Should Encode Business Logic

A semantic metric should contain:

Calculation logic
Filters
Aggregation behavior
Business description
Expected grain

Example:

metrics:
  active_users_30d:
    sql: COUNT(DISTINCT user_id)
    filters:
      last_activity_days: "<= 30"
    grain: user
    description: Users active in the last 30 days
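Because the definition is structured, it can be checked and compiled mechanically, so the 30-day window lives in the semantic layer rather than in each prompt. The toy sketch below validates the fields used in the YAML above and renders SQL for a given day. It supports only the last_activity_days filter, and it assumes a hypothetical last_activity_at date column on a hypothetical analytics.users table. Real semantic layers handle many more filter types.

```python
from datetime import date, timedelta

# Fields the example definition carries; aggregation lives inside `sql`.
REQUIRED_KEYS = ("sql", "filters", "grain", "description")


def missing_fields(metric: dict) -> list[str]:
    """Required keys that are absent or empty (an empty `filters` is allowed)."""
    return [
        key for key in REQUIRED_KEYS
        if key not in metric or (key != "filters" and not metric[key])
    ]


def compile_metric(name: str, metric: dict, table: str, as_of: date) -> str:
    """Render a metric as SQL, applying its declared filters for a given day.

    Only the `last_activity_days` filter with `<=` is supported, mapped onto
    an assumed `last_activity_at` date column.
    """
    missing = missing_fields(metric)
    if missing:
        raise ValueError(f"{name} is missing {missing}")
    where = []
    for filter_name, condition in metric["filters"].items():
        op, value = condition.split()
        if filter_name != "last_activity_days" or op != "<=":
            raise ValueError(f"Unsupported filter: {filter_name} {condition}")
        cutoff = as_of - timedelta(days=int(value))
        where.append(f"last_activity_at >= DATE '{cutoff.isoformat()}'")
    sql = f"SELECT {metric['sql']} AS {name}\nFROM {table}"
    if where:
        sql += "\nWHERE " + " AND ".join(where)
    return sql


active_users_30d = {
    "sql": "COUNT(DISTINCT user_id)",
    "filters": {"last_activity_days": "<= 30"},
    "grain": "user",
    "description": "Users active in the last 30 days",
}
print(compile_metric("active_users_30d", active_users_30d, "analytics.users", date(2024, 3, 31)))
```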

Dimensions Should Be Stable and Predictable

Dimensions are how AI agents slice and group data.

Strong dimensions are:

Consistently named
Business-readable
Low ambiguity
Properly typed

Weak dimensions force the agent to guess what a column means, which makes analyses unstable.

For example, avoid dimensions like:

dim_1
category_type
status_code

Prefer:

customer_segment
subscription_plan
payment_status
acquisition_channel
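Naming rules like these can be linted automatically. The patterns below are hypothetical heuristics derived from the examples above (positional names, bare _code and _type suffixes). Adjust them to your own conventions.

```python
import re

# Hypothetical naming heuristics; tune them to your own conventions.
WEAK_PATTERNS = [
    (re.compile(r"^dim_\d+$"), "positional name carries no business meaning"),
    (re.compile(r"_code$"), "codes need a decoded, human-readable companion"),
    (re.compile(r"_type$"), "generic 'type' suffix; name what is being typed"),
]


def dimension_warnings(name: str) -> list[str]:
    """Reasons a dimension name is likely to confuse an agent (empty if none)."""
    return [reason for pattern, reason in WEAK_PATTERNS if pattern.search(name)]


for dim in ["dim_1", "category_type", "status_code", "customer_segment", "payment_status"]:
    print(dim, dimension_warnings(dim))
```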

Time Dimensions Need Special Attention

Time handling is one of the most common sources of AI analytics errors.

Your semantic layer should define:

Timezone assumptions
Business calendar logic
Fiscal periods
Week start conventions
Rolling window definitions

Otherwise, the same business question may produce different results depending on how the prompt is phrased.
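Here is a sketch of what defining these conventions can look like in code. One object owns the timezone, week start, and fiscal calendar, so every query derives dates the same way. The values are hypothetical: a UTC-5 reporting offset, Monday week start, and a fiscal year starting in February. For simplicity it uses a fixed UTC offset. A real layer would normally use an IANA zone name, for example through Python's zoneinfo module, so daylight-saving changes are handled.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta, timezone


@dataclass(frozen=True)
class TimeConventions:
    utc_offset_hours: int = -5        # hypothetical reporting timezone (fixed offset)
    week_start: int = 0               # 0 = Monday ... 6 = Sunday, as in date.weekday()
    fiscal_year_start_month: int = 2  # hypothetical: fiscal year begins in February

    def business_date(self, ts_utc: datetime) -> date:
        """Calendar date of a UTC timestamp in the reporting timezone."""
        tz = timezone(timedelta(hours=self.utc_offset_hours))
        return ts_utc.astimezone(tz).date()

    def week_start_date(self, d: date) -> date:
        """First day of the week containing d under this convention."""
        return d - timedelta(days=(d.weekday() - self.week_start) % 7)

    def fiscal_quarter(self, d: date) -> int:
        """Fiscal quarter 1-4, counted from the fiscal year's start month."""
        return (d.month - self.fiscal_year_start_month) % 12 // 3 + 1


conv = TimeConventions()
ts = datetime(2024, 3, 4, 2, 30, tzinfo=timezone.utc)
day = conv.business_date(ts)
print(day)                                                 # 2024-03-03
print(conv.week_start_date(day))                           # 2024-02-26 (Monday weeks)
print(TimeConventions(week_start=6).week_start_date(day))  # 2024-03-03 (Sunday weeks)
print(conv.fiscal_quarter(day))                            # 1
```

In this example, an order placed at 02:30 UTC on Monday 4 March 2024 falls on Sunday 3 March in the reporting timezone. That puts it in the week starting 26 February under a Monday convention, but in the week starting 3 March under a Sunday convention. The semantic layer should settle silent differences like this once, for every query.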

Best Practices for Joins and Relationships

Joins are where many AI-generated SQL queries fail.

AI models often generate:

Incorrect join paths
Duplicate-producing joins
Cartesian explosions
Grain mismatches

Semantic layers should define relationships explicitly.

Example:

joins:
  customers:
    type: many_to_one
    on: orders.customer_id = customers.id

  refunds:
    type: one_to_many
    on: orders.id = refunds.order_id

Relationship metadata is extremely valuable for AI agents because it helps them reason about aggregation safety.
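Declared relationships also let a tool, rather than the model, decide how two entities connect. This sketch stores the two joins from the example above and finds a join path by breadth-first search over declared edges only. When an edge is traversed in reverse, its cardinality is flipped. RELATIONSHIPS and join_path are hypothetical names.

```python
from collections import deque

# Declared relationships, mirroring the joins example above.
RELATIONSHIPS = {
    ("orders", "customers"): ("many_to_one", "orders.customer_id = customers.id"),
    ("orders", "refunds"): ("one_to_many", "orders.id = refunds.order_id"),
}

FLIPPED = {"many_to_one": "one_to_many", "one_to_many": "many_to_one"}


def neighbors(entity: str):
    """Entities reachable in one declared join, with cardinality seen from `entity`."""
    for (left, right), (cardinality, condition) in RELATIONSHIPS.items():
        if left == entity:
            yield right, cardinality, condition
        elif right == entity:
            # Walking the edge backwards flips many_to_one <-> one_to_many.
            yield left, FLIPPED.get(cardinality, cardinality), condition


def join_path(start: str, goal: str) -> list[tuple[str, str, str, str]]:
    """Shortest chain of declared joins from start to goal (breadth-first search)."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        entity, path = queue.popleft()
        if entity == goal:
            return path
        for nxt, cardinality, condition in neighbors(entity):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(entity, nxt, cardinality, condition)]))
    raise LookupError(f"No declared relationship connects {start} to {goal}")


for step in join_path("customers", "refunds"):
    print(step)
```

Both steps from customers to refunds come back as one_to_many. That tells the caller any customer-level measure will be repeated in the joined result. An entity with no declared relationship raises an error instead of inviting a guessed join.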

Encode Cardinality

Cardinality is critical.

For each relationship, the agent should know whether it is:

One-to-one
One-to-many
Many-to-one
Many-to-many

Without this, aggregation bugs become inevitable.
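A toy example makes the risk concrete. Joining orders to refunds (one-to-many) repeats each order row once per refund, so summing the order amount after the join double counts. The data is made up. sum_is_safe is a hypothetical helper expressing the rule that a measure can be summed safely only across one-to-one or many-to-one steps.

```python
# Made-up rows: order 1 has two partial refunds, order 2 has none.
orders = [
    {"order_id": 1, "amount": 100.0},
    {"order_id": 2, "amount": 50.0},
]
refunds = [
    {"order_id": 1, "refund": 10.0},
    {"order_id": 1, "refund": 5.0},
]


def join_orders_to_refunds(orders: list[dict], refunds: list[dict]) -> list[dict]:
    """Naive left join: each order row appears once per matching refund row."""
    out = []
    for order in orders:
        matches = [r for r in refunds if r["order_id"] == order["order_id"]]
        if not matches:
            out.append({**order, "refund": 0.0})  # orders without refunds keep one row
        for match in matches:
            out.append({**order, **match})
    return out


def sum_is_safe(cardinalities: list[str]) -> bool:
    """A measure can be summed only if no join step away from its entity fans out."""
    return all(c in ("one_to_one", "many_to_one") for c in cardinalities)


joined = join_orders_to_refunds(orders, refunds)
naive_revenue = sum(row["amount"] for row in joined)  # order 1 counted twice
correct_revenue = sum(o["amount"] for o in orders)    # summed at order grain
total_refunds = sum(row["refund"] for row in joined)  # refund is the "many" side, so no duplication

print(naive_revenue, correct_revenue, total_refunds)                # 250.0 150.0 15.0
print(sum_is_safe(["one_to_many"]), sum_is_safe(["many_to_one"]))  # False True
```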

Governance and Consistency

A semantic layer is not just a technical artifact. It is also a form of organizational governance.

AI agents magnify inconsistencies across teams. If finance and product disagree on "active user," the model cannot resolve the conflict safely.

Strong governance includes:

Canonical metric ownership
Versioned definitions
Schema review processes
Documentation standards
Data quality validation

Semantic layers should evolve deliberately, not organically.
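Here is a sketch of canonical ownership and versioned definitions. The hypothetical MetricDefinition class below lets only the owning team revise a metric, and it keeps every previous version. The team names and the is_internal column are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class MetricDefinition:
    name: str
    owner: str                      # team accountable for this definition
    sql: str
    version: int = 1
    history: list[tuple[int, str]] = field(default_factory=list)

    def revise(self, new_sql: str, requested_by: str) -> None:
        """Change the SQL only via the owning team, archiving the old version."""
        if requested_by != self.owner:
            raise PermissionError(f"{self.name} is owned by {self.owner}, not {requested_by}")
        if new_sql == self.sql:
            return  # nothing changed, so no new version
        self.history.append((self.version, self.sql))
        self.sql = new_sql
        self.version += 1


active_users = MetricDefinition("active_users", owner="product-analytics",
                                sql="COUNT(DISTINCT user_id)")
active_users.revise("COUNT(DISTINCT CASE WHEN NOT is_internal THEN user_id END)",
                    requested_by="product-analytics")
print(active_users.version, active_users.history)  # 2 [(1, 'COUNT(DISTINCT user_id)')]
```

If finance wants a different "active user," this design forces them to negotiate with the owner or create a separately named metric. They cannot quietly edit the shared one.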

Designing Semantic Layers for AI Agents

AI-native semantic layers require additional structure beyond traditional BI tooling.

Include Rich Descriptions

LLMs benefit enormously from descriptive metadata.

Weak:

metric: revenue

Better:

metric: revenue
description: |
  Gross completed transaction revenue excluding
  refunds, chargebacks, and internal test purchases.

Rich descriptions improve reasoning quality dramatically.
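Description quality can be checked before a metric is published. The hypothetical description_issues function below flags three problems: missing descriptions, descriptions that only restate the metric name, and very short ones. The eight-word minimum is an arbitrary threshold chosen for illustration.

```python
import re
from typing import Optional

MIN_WORDS = 8  # arbitrary threshold for illustration


def description_issues(name: str, description: Optional[str]) -> list[str]:
    """Return reasons a metric description is too weak to guide an agent."""
    if not description or not description.strip():
        return ["missing description"]
    words = re.findall(r"[a-z0-9]+", description.lower())
    name_words = set(re.findall(r"[a-z0-9]+", name.lower()))
    issues = []
    if set(words) <= name_words:
        issues.append("only restates the metric name")
    if len(words) < MIN_WORDS:
        issues.append(f"fewer than {MIN_WORDS} words")
    return issues


print(description_issues("revenue", "Revenue"))
print(description_issues(
    "revenue",
    "Gross completed transaction revenue excluding refunds, "
    "chargebacks, and internal test purchases.",
))
```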

Add Query Constraints

Semantic layers should encode safety guidance:

constraints:
  max_scan_days: 365
  require_partition_filter: true
  avoid_select_star: true

This helps prevent expensive or unsafe AI-generated queries.
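Constraints like these are most useful when a tool enforces them before any SQL runs. This sketch checks a hypothetical structured form of the agent's planned query (selected columns plus the date range on the partition column) against the three constraints above. PlannedQuery and violations are illustrative names, and the date range is treated as inclusive.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

CONSTRAINTS = {"max_scan_days": 365, "require_partition_filter": True, "avoid_select_star": True}


@dataclass
class PlannedQuery:
    """Hypothetical structured plan produced before any SQL is rendered."""
    columns: list[str]
    start: Optional[date] = None  # inclusive bounds on the partition column
    end: Optional[date] = None


def violations(query: PlannedQuery, constraints: dict) -> list[str]:
    """List every constraint the planned query breaks (empty if it is allowed)."""
    problems = []
    if constraints["avoid_select_star"] and "*" in query.columns:
        problems.append("SELECT * is not allowed; list columns explicitly")
    if query.start is None or query.end is None:
        if constraints["require_partition_filter"]:
            problems.append("query must filter on the partition column")
    else:
        days = (query.end - query.start).days + 1
        if days > constraints["max_scan_days"]:
            problems.append(f"scan of {days} days exceeds {constraints['max_scan_days']}")
    return problems


print(violations(PlannedQuery(["*"]), CONSTRAINTS))
print(violations(PlannedQuery(["order_id", "amount"], date(2024, 1, 1), date(2024, 12, 31)), CONSTRAINTS))
```

Returning every violation at once, rather than failing on the first, gives the agent enough feedback to repair its plan in a single retry.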

Define Analytical Intent

AI agents perform better when semantic objects include usage guidance.

Example:

metrics:
  churn_rate:
    intended_uses:
      - retention_analysis
      - cohort_reporting
      - executive_dashboarding

This gives the model higher-level analytical context beyond SQL syntax.
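Intended uses can also drive metric selection. When the agent knows what kind of analysis it is doing, it can narrow the candidate metrics. The churn_rate uses come from the example above. The net_revenue entry and the helper names are hypothetical.

```python
METRICS = {
    "churn_rate": {
        "intended_uses": ["retention_analysis", "cohort_reporting", "executive_dashboarding"],
    },
    # Hypothetical second metric for contrast.
    "net_revenue": {
        "intended_uses": ["financial_reporting", "executive_dashboarding"],
    },
}


def metrics_for(use: str) -> list[str]:
    """Metrics whose declared intended uses include `use`, sorted by name."""
    return sorted(
        name for name, spec in METRICS.items() if use in spec.get("intended_uses", [])
    )


def check_use(metric: str, use: str) -> None:
    """Raise if an agent tries to use a metric outside its declared purpose."""
    if use not in METRICS[metric].get("intended_uses", []):
        raise ValueError(f"{metric} is not intended for {use}")


print(metrics_for("retention_analysis"))      # ['churn_rate']
print(metrics_for("executive_dashboarding"))  # ['churn_rate', 'net_revenue']
```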

Common Semantic Layer Mistakes

Treating the Semantic Layer Like Documentation

A semantic layer is executable business logic, not a wiki.

If definitions are not enforced programmatically, drift will happen.
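A simple form of programmatic enforcement is a drift check in CI. It compares every locally embedded metric definition (in a dashboard or notebook, say) against the canonical one. The hypothetical drifted function below normalizes only whitespace and letter case before comparing. Lowercasing also changes string literals, so a production check would compare parsed SQL instead.

```python
import re

# Canonical definitions, e.g. loaded from the semantic layer's repository.
CANONICAL = {"net_revenue": "SUM(order_amount - COALESCE(refund_amount, 0))"}


def normalize(sql: str) -> str:
    """Collapse whitespace and case so pure formatting changes are not flagged."""
    return re.sub(r"\s+", " ", sql).strip().lower()


def drifted(local_defs: dict[str, str]) -> list[str]:
    """Names whose locally embedded SQL differs from the canonical definition."""
    return sorted(
        name for name, sql in local_defs.items()
        if name in CANONICAL and normalize(sql) != normalize(CANONICAL[name])
    )


dashboard_defs = {
    "net_revenue": "sum(order_amount  -  coalesce(refund_amount, 0))",  # reformatted only
}
notebook_defs = {"net_revenue": "SUM(amount)"}  # a real divergence
print(drifted(dashboard_defs), drifted(notebook_defs))  # [] ['net_revenue']
```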

Exposing Raw Warehouse Complexity

AI agents should not need to understand every staging table or ETL artifact.

Expose curated analytical entities instead.

Ignoring Grain Definitions

Every metric and entity should define its expected grain clearly.

Many AI-generated aggregation bugs are actually grain mismatches.

Underinvesting in Descriptions

Sparse metadata leads to weaker reasoning.

AI agents rely heavily on descriptive context to infer analytical intent correctly.

Final Thoughts

The semantic layer is becoming the control plane for AI analytics.

As AI agents become more capable, the limiting factor shifts away from SQL generation and toward semantic reliability.

Organizations that invest in strong semantic modeling will build safer, more trustworthy analytics systems. Those that rely on raw schemas and prompt engineering alone will struggle with inconsistency, hallucinations, and governance problems.

In the AI era, semantics are infrastructure.

Designing a Semantic Layer for AI Analytics Agents
