Most AI coding agents are optimized for generating code. But analytics work is different. A good data analyst agent needs context about your warehouse, semantic understanding of tables, reproducible workflows, and the ability to turn exploratory conversations into reusable analysis pipelines.
In this post, we'll walk through how to build an AI Data Analyst Agent in Claude Code using a structured claude.md system prompt, a semantic.yaml schema layer, and reusable skills that let you replay analyses after discovering insights through conversation.
The goal is not just "chatting with your database." It's creating an agent that behaves like a disciplined analytics engineer.
The Architecture of an Analytics Agent
A strong AI data analyst agent needs more than SQL generation. It needs memory, structure, and constraints.
At a high level, the stack looks like this:
- Claude Code: The execution environment and agent runtime.
- claude.md: Persistent operating instructions defining behavior, rules, and workflows.
- semantic.yaml: A semantic abstraction layer describing tables, metrics, joins, and business meaning.
- Skills: Reusable analysis procedures that standardize common workflows.
- Data warehouse access: Snowflake, BigQuery, Postgres, DuckDB, or similar systems.
The key insight is this: analytics agents fail less when they operate from structured semantic context instead of raw schemas.
Designing the claude.md File
The claude.md file acts as the operating manual for the agent. Claude Code reads it from a file named CLAUDE.md, typically at the project root. Think of it as a persistent system prompt that defines how the analyst should behave.
A good analytics-focused claude.md usually contains:
- Business context and KPI definitions
- SQL safety rules
- Preferred query patterns
- Data quality expectations
- Visualization conventions
- Instructions for reproducibility
Example:
```markdown
# Analytics Agent Instructions
You are a senior data analyst.
Always:
- Prefer semantic layer definitions over raw table names
- Explain assumptions before querying
- Validate joins before aggregation
- Avoid SELECT *
- Limit exploratory queries to 100 rows first
- Save reusable workflows as skills
Business Definitions:
- "Active User" = user with a completed session in the last 30 days
- Revenue excludes refunds and test transactions
- Use UTC timestamps unless specified otherwise
```
Rules like these make the agent's behavior noticeably more consistent, because every session starts from the same definitions and safety defaults. Without them, agents tend to generate fragile or misleading SQL.
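The two SQL safety rules in this example (no SELECT *, and a 100-row cap on exploratory queries) can also be enforced in code. That way, a query that slips past the instructions is still caught before it reaches the warehouse. The following is a minimal Python sketch of such a pre-execution check. guard_exploratory_sql is a hypothetical helper, not part of Claude Code. It uses regular expressions rather than a real SQL parser, so it only handles simple queries whose LIMIT, if any, is the last clause.

```python
import re

# Hypothetical pre-execution check. Regex-based, so it only understands simple
# queries whose LIMIT (if present) is the final clause; it is not a SQL parser.
MAX_EXPLORATORY_ROWS = 100

SELECT_STAR = re.compile(r"\bSELECT\s+(?:DISTINCT\s+)?\*", re.IGNORECASE)
TRAILING_LIMIT = re.compile(r"\bLIMIT\s+(\d+)\s*$", re.IGNORECASE)


def guard_exploratory_sql(sql: str, max_rows: int = MAX_EXPLORATORY_ROWS) -> str:
    """Reject SELECT * and make sure the query returns at most max_rows rows."""
    query = sql.strip().rstrip(";").rstrip()
    if SELECT_STAR.search(query):
        raise ValueError("SELECT * is not allowed; list the columns explicitly")
    match = TRAILING_LIMIT.search(query)
    if match is None:
        # No LIMIT at the end: append one.
        return f"{query}\nLIMIT {max_rows}"
    if int(match.group(1)) > max_rows:
        # LIMIT is larger than allowed: replace the number with the cap.
        return query[: match.start(1)] + str(max_rows)
    return query


print(guard_exploratory_sql("SELECT country, order_amount FROM orders"))
print(guard_exploratory_sql("SELECT country FROM orders LIMIT 5000;"))
```

The prompt explains why the rules exist. The check applies them even on the occasions when the model ignores an instruction.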
Using semantic.yaml as a Semantic Layer
One of the biggest problems with AI-generated analytics is schema ambiguity. Column names rarely explain business meaning clearly enough.
This is where semantic.yaml becomes critical.
Instead of exposing raw tables directly, define semantic entities, metrics, dimensions, and relationships.
Example:
```yaml
tables:
  orders:
    description: Customer purchase transactions
    metrics:
      total_revenue:
        sql: SUM(order_amount)
        description: Gross revenue before refunds
      completed_orders:
        sql: COUNT(order_id)
        filters:
          status: completed
    dimensions:
      - customer_id
      - country
      - created_at
    joins:
      customers:
        type: many_to_one
        # Quoted because YAML 1.1 parsers such as PyYAML read a bare "on" key as boolean true.
        "on": orders.customer_id = customers.id
```
This gives the agent semantic understanding instead of forcing it to infer meaning from raw SQL schemas.
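To make this concrete, here is a minimal Python sketch of how the agent, or a helper tool it calls, could compile a metric definition into SQL instead of writing the aggregation by hand. It assumes the YAML has already been parsed into a dictionary, for example with PyYAML's yaml.safe_load. The dictionary is inlined here so the sketch runs without dependencies. compile_metric is a hypothetical helper, not part of any particular semantic-layer product.

```python
# The semantic.yaml example as a parsed dictionary (inlined to avoid a YAML dependency).
SEMANTIC = {
    "tables": {
        "orders": {
            "description": "Customer purchase transactions",
            "metrics": {
                "total_revenue": {"sql": "SUM(order_amount)",
                                  "description": "Gross revenue before refunds"},
                "completed_orders": {"sql": "COUNT(order_id)",
                                     "filters": {"status": "completed"}},
            },
            "dimensions": ["customer_id", "country", "created_at"],
            "joins": {"customers": {"type": "many_to_one",
                                    "on": "orders.customer_id = customers.id"}},
        }
    }
}


def compile_metric(semantic: dict, table: str, metric: str, by: list[str]) -> str:
    """Build a GROUP BY query for one metric, allowing only declared names."""
    spec = semantic["tables"][table]
    if metric not in spec["metrics"]:
        raise KeyError(f"unknown metric {metric!r} on {table!r}")
    unknown = [d for d in by if d not in spec["dimensions"]]
    if unknown:
        raise KeyError(f"undeclared dimensions: {unknown}")
    m = spec["metrics"][metric]
    sql = f"SELECT {', '.join([*by, m['sql'] + ' AS ' + metric])}\nFROM {table}"
    filters = m.get("filters", {})
    if filters:
        # Filter values come from trusted config and are not escaped.
        conds = " AND ".join(f"{col} = '{val}'" for col, val in filters.items())
        sql += f"\nWHERE {conds}"
    if by:
        sql += f"\nGROUP BY {', '.join(by)}"
    return sql


print(compile_metric(SEMANTIC, "orders", "completed_orders", ["country"]))
```

Because the metric's SQL and filters live in one file, every query for completed_orders applies the same status filter. A request for an undeclared dimension fails loudly instead of being guessed.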
Benefits include:
- Safer SQL generation
- Consistent KPI definitions
- Fewer hallucinated joins
- Better business alignment
- Improved explainability
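The claude.md rule "Validate joins before aggregation" pairs naturally with the type: many_to_one annotation. If orders-to-customers really is many-to-one, joining must not change the number of order rows. Below is a minimal sketch of that check using Python's built-in sqlite3 module. join_preserves_rows is a hypothetical helper, and the table rows are toy values invented for illustration.

```python
import sqlite3

ON = "orders.customer_id = customers.id"  # the join condition from semantic.yaml


def join_preserves_rows(conn: sqlite3.Connection, left: str, right: str, on: str) -> bool:
    """True if LEFT JOIN keeps exactly one output row per row of `left`.

    LEFT JOIN is used so unmatched rows still count once; only fan-out is detected.
    Names and `on` are interpolated directly, so they must come from trusted config.
    """
    before = conn.execute(f"SELECT COUNT(*) FROM {left}").fetchone()[0]
    after = conn.execute(
        f"SELECT COUNT(*) FROM {left} LEFT JOIN {right} ON {on}"
    ).fetchone()[0]
    return before == after


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, order_amount REAL);
    CREATE TABLE customers (id INTEGER, country TEXT);
    INSERT INTO orders VALUES (1, 10, 50.0), (2, 10, 20.0), (3, 11, 35.0);
    INSERT INTO customers VALUES (10, 'US'), (11, 'DE');
""")
print(join_preserves_rows(conn, "orders", "customers", ON))  # True

# A duplicated customer id silently turns the "many_to_one" join into a fan-out.
conn.execute("INSERT INTO customers VALUES (10, 'CA')")
print(join_preserves_rows(conn, "orders", "customers", ON))  # False
```

Running a check like this before SUM(order_amount) catches the classic double-counting bug: with the duplicate key, orders 1 and 2 would each be counted twice.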
In practice, the semantic layer becomes the difference between "AI autocomplete" and a trustworthy analytics system.
Creating Reusable Analysis Skills
One of the most underrated features in agentic workflows is reusable skills.
During exploratory analysis, you often discover a useful workflow through conversation:
- Investigating churn spikes
- Analyzing conversion funnels
- Segmenting high-value customers
- Debugging revenue anomalies
The problem is that conversational insights are ephemeral. Once the chat is over, reproducing the exact reasoning path can be difficult.
Skills solve this by converting successful workflows into reusable procedures.
Example skill:
```yaml
name: analyze_conversion_drop
description: |
  Investigate funnel conversion declines by segment,
  traffic source, device type, and release window.
steps:
  - compare weekly conversion trends
  - identify statistically significant drops
  - segment by acquisition channel
  - correlate with deployment events
  - generate summary findings
```
Instead of rediscovering the workflow manually, the agent can replay the same analytical methodology consistently.
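The skill leaves "identify statistically significant drops" abstract. One common way to implement that step is a one-sided two-proportion z-test comparing this week's conversion rate with last week's. This is an assumption, not a method the skill prescribes. The sketch uses only the Python standard library. conversion_drop_test is a hypothetical name, and the visitor and conversion counts in the usage line are made-up illustrative numbers.

```python
from math import sqrt
from statistics import NormalDist


def conversion_drop_test(conv_prev: int, n_prev: int,
                         conv_curr: int, n_curr: int) -> tuple[float, float]:
    """One-sided two-proportion z-test for a drop in conversion rate.

    Returns (z, p_value). A small p-value means a drop this large would be
    unlikely if both weeks shared the same underlying conversion rate.
    """
    p_prev = conv_prev / n_prev
    p_curr = conv_curr / n_curr
    # Pooled rate under the null hypothesis that nothing changed.
    pooled = (conv_prev + conv_curr) / (n_prev + n_curr)
    se = sqrt(pooled * (1 - pooled) * (1 / n_prev + 1 / n_curr))
    z = (p_curr - p_prev) / se
    p_value = NormalDist().cdf(z)  # P(Z <= z) under the null hypothesis
    return z, p_value


# 5.0% last week vs 4.2% this week, 10,000 visitors each (illustrative numbers).
z, p_value = conversion_drop_test(500, 10_000, 420, 10_000)
print(round(z, 2), p_value < 0.01)  # -2.7 True
```

When the skill repeats this test across many segments, a multiple-comparison correction (Bonferroni, for example) keeps the number of false alarms in check.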
Turning Conversations into Repeatable Analysis
This is where AI analyst agents become genuinely powerful.
Much of today's analytical work is trapped inside Slack threads, notebooks, or one-off conversations, and the investigative logic behind it disappears once the meeting ends.
A mature analytics agent should:
- Capture useful workflows discovered during conversation
- Convert them into reusable skills
- Parameterize the inputs
- Replay analyses on future datasets
- Maintain methodological consistency
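Parameterizing the inputs is what makes replay possible: the methodology stays fixed while dates and regions change. Below is a minimal Python sketch of that idea. The Skill record, the replay function, and the payments table are all hypothetical. The sketch refuses to run when an input is missing or unexpected. It passes values as driver parameters (the named :placeholder style supported by Python's sqlite3 module) instead of splicing them into the SQL string.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    """Hypothetical saved skill: a fixed query plus the inputs it accepts."""
    name: str
    inputs: tuple[str, ...]
    sql: str  # uses named placeholders such as :start_date


def replay(skill: Skill, conn: sqlite3.Connection, **params: str) -> list[tuple]:
    """Run a saved skill with new inputs, rejecting missing or unknown ones."""
    missing = set(skill.inputs) - set(params)
    extra = set(params) - set(skill.inputs)
    if missing or extra:
        raise ValueError(f"{skill.name}: missing={sorted(missing)} extra={sorted(extra)}")
    return conn.execute(skill.sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE payments (account_id INTEGER, region TEXT, paid_at TEXT);
    INSERT INTO payments VALUES
        (1, 'EU', '2024-03-01'), (1, 'EU', '2024-03-02'), (2, 'US', '2024-03-02');
""")
payments_by_account = Skill(
    name="payments_by_account",
    inputs=("start_date", "end_date", "region"),
    sql="""SELECT account_id, COUNT(*) FROM payments
           WHERE paid_at BETWEEN :start_date AND :end_date AND region = :region
           GROUP BY account_id ORDER BY account_id""",
)
print(replay(payments_by_account, conn,
             start_date="2024-03-01", end_date="2024-03-31", region="EU"))  # [(1, 2)]
```

Validating inputs against the declared list is a small design choice. It prevents a replay from silently running with a default or a typo'd parameter name.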
For example, imagine discovering an effective fraud detection workflow while chatting with the agent. Instead of losing that reasoning process, the agent can save it as:
```yaml
skill: detect_payment_fraud
inputs:
  - start_date
  - end_date
  - region
workflow:
  - identify anomalous transaction velocity
  - compare against historical baselines
  - cluster suspicious accounts
  - score fraud likelihood
```
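The first two workflow steps (anomalous transaction velocity compared against a historical baseline) could be implemented in many ways. Here is one simple option, sketched in Python as an assumption rather than a recommended fraud model. It flags accounts whose transaction count in the current period sits more than a chosen number of standard deviations above that account's own history. The function name, the default threshold of 3, and the toy counts are hypothetical.

```python
from statistics import mean, stdev


def flag_velocity_anomalies(
    history: dict[str, list[int]],
    current: dict[str, int],
    threshold: float = 3.0,
) -> dict[str, float]:
    """Flag accounts whose current transaction count is unusually high.

    history: each account's transaction counts in past periods (the baseline)
    current: each account's transaction count in the period under analysis
    Returns {account: z_score} for accounts with z_score above threshold.
    """
    flagged = {}
    for account, count in current.items():
        past = history.get(account, [])
        if len(past) < 2:
            continue  # too little history for a baseline; skipped here
        # A flat history has zero spread; fall back to 1.0 (an arbitrary choice)
        # so the division below is defined.
        sigma = stdev(past) or 1.0
        z = (count - mean(past)) / sigma
        if z > threshold:
            flagged[account] = round(z, 2)
    return flagged


history = {"acct_a": [4, 5, 6, 5, 4], "acct_b": [10, 12, 11, 9, 13]}
current = {"acct_a": 30, "acct_b": 12, "acct_c": 50}
print(flag_velocity_anomalies(history, current))  # {'acct_a': 30.12}
```

Note that acct_c, a brand-new account with no baseline, is skipped entirely. A real workflow would need a separate rule for new accounts, which is exactly the kind of decision worth recording in the saved skill.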
Over time, your analytics organization accumulates a library of reusable analytical intelligence instead of isolated dashboard queries.
Best Practices for Reliable AI Analytics
Building a useful analytics agent is less about model intelligence and more about operational discipline.
Here are the practices that matter most:
- Always use a semantic layer: Raw schemas are not enough for reliable business analytics.
- Encode business definitions centrally: Never let metrics drift between prompts.
- Treat successful analyses as assets: Save reusable workflows as skills.
- Require explainability: The agent should explain assumptions before querying.
- Constrain SQL generation: Safe defaults reduce hallucinations and expensive queries.
- Start exploratory queries small: Limit rows before running warehouse-scale scans.
- Separate reasoning from execution: Semantic planning should happen before query generation.
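"Encode business definitions centrally" can be taken literally. Keep each KPI in one function or module that every prompt, skill, and notebook calls, rather than restating it each time. Below is a minimal Python sketch for the "Active User" definition from the claude.md example: a completed session in the last 30 days, evaluated in UTC. The Session record, its field names, and the inclusive window boundaries are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

ACTIVE_WINDOW = timedelta(days=30)


@dataclass(frozen=True)
class Session:
    """Hypothetical session record; ended_at must be timezone-aware."""
    user_id: str
    completed: bool
    ended_at: datetime


def active_users(sessions: list[Session], as_of: datetime) -> set[str]:
    """Users with at least one completed session in the 30 days up to as_of.

    Both datetimes are normalized to UTC; window boundaries are inclusive
    (an assumption the written definition leaves open).
    """
    if as_of.tzinfo is None:
        raise ValueError("as_of must be timezone-aware")
    as_of = as_of.astimezone(timezone.utc)
    start = as_of - ACTIVE_WINDOW
    return {
        s.user_id
        for s in sessions
        if s.completed and start <= s.ended_at.astimezone(timezone.utc) <= as_of
    }


now = datetime(2024, 6, 30, tzinfo=timezone.utc)
sessions = [
    Session("u1", True, now - timedelta(days=3)),
    Session("u2", False, now - timedelta(days=1)),   # not completed
    Session("u3", True, now - timedelta(days=45)),   # outside the window
]
print(active_users(sessions, now))  # {'u1'}
```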
The future of analytics agents is not simply "natural language to SQL." The real opportunity is building systems that preserve institutional analytical reasoning and make it reusable.
When done correctly, AI analyst agents become less like chatbots and more like collaborative analytics infrastructure.