Skip to content
Module 03 of 845 min readIntermediate

CTEs and query composition

Multi-CTE pipelines, materialisation behaviour, the dbt-style readability pattern.

38%

Listen along

Read “CTEs and query composition” aloud

Plays in your browser using on-device text-to-speech — nothing leaves the page.

Learning objectives

By the end of this module, you should be able to:

  • 01Decompose a complex query into named CTEs that read top-to-bottom
  • 02Recognise when a CTE is just sugar vs when it materialises (and the perf implication)
  • 03Compose multiple CTEs into a readable analytical pipeline

A common table expression (CTE) is a named, query-scoped subquery you declare with WITH. CTEs are the analyst's most important readability tool — they let you decompose a 100-line query into named steps that read like prose.

From nested subquery to readable pipeline

The same query, two ways. First as a nested subquery — works, but you have to read it inside-out:

sql
SELECT country, AVG(monthly_spend) AS avg_spend
FROM (
SELECT customer_id, country,
SUM(amount) AS monthly_spend
FROM (
SELECT customer_id, country, DATE_TRUNC('month', order_date) AS m, amount
FROM orders JOIN customers USING (customer_id)
) raw
GROUP BY customer_id, country
) per_cust
GROUP BY country;

Now the same logic as a CTE pipeline — reads top-to-bottom like steps in a recipe:

sql
WITH enriched_orders AS (
SELECT customer_id, country, DATE_TRUNC('month', order_date) AS m, amount
FROM orders
JOIN customers USING (customer_id)
),
customer_monthly AS (
SELECT customer_id, country, SUM(amount) AS monthly_spend
FROM enriched_orders
GROUP BY customer_id, country
)
SELECT country, AVG(monthly_spend) AS avg_spend
FROM customer_monthly
GROUP BY country;

Naming is everything

A well-named CTE is documentation. 'enriched_orders' tells the reader (and reviewer, and future-you) what this step does. Anonymous subqueries don't get to do that.

Materialisation: CTE vs subquery, perf considerations

Historically (and still in older Postgres), CTEs were 'optimisation fences' — the engine materialised the CTE result before the outer query, even when inlining would have been faster. Postgres 12+, MySQL 8+, and most modern engines inline CTEs by default when there are no side effects.

  • Postgres 12+ — CTEs are inlined unless WITH … AS MATERIALIZED is specified, or the CTE is referenced more than once, or there's a recursive or non-deterministic component
  • MySQL 8+ — inlines aggressively
  • Snowflake / BigQuery / DuckDB — CTEs are pure sugar, inlined transparently
  • Older Postgres (≤11) — CTEs always materialised; that meant 'WITH x AS (slow_query) SELECT * FROM x' could be 10× slower than the equivalent subquery

When materialisation helps

If a CTE is referenced multiple times in the outer query and is expensive to compute, an explicit WITH … AS MATERIALIZED can be a perf win — the engine evaluates once and reuses. The default inlined behaviour would compute it twice.

Multi-CTE analytical pipelines

The dbt-style 'one CTE per logical step' pattern is the working analyst's house style. Each CTE has one job; the final SELECT is usually a one-liner that pulls from the last CTE.

sql
WITH active_users AS (
SELECT user_id
FROM users
WHERE deleted_at IS NULL AND last_seen_at > NOW() - INTERVAL '30 days'
),
revenue_30d AS (
SELECT user_id, SUM(amount) AS spend_30d
FROM orders
WHERE order_date > NOW() - INTERVAL '30 days'
GROUP BY user_id
),
feature_use AS (
SELECT user_id, COUNT(DISTINCT feature) AS features_used
FROM events
WHERE event_type = 'feature_used' AND event_date > NOW() - INTERVAL '30 days'
GROUP BY user_id
)
SELECT a.user_id,
COALESCE(r.spend_30d, 0) AS spend_30d,
COALESCE(f.features_used, 0) AS features_used
FROM active_users a
LEFT JOIN revenue_30d r USING (user_id)
LEFT JOIN feature_use f USING (user_id);

Exercise

You have three tables: customers(id, signup_date, country), orders(customer_id, order_date, amount), churn_events(customer_id, churned_at). Write a multi-CTE query that returns: country, total customers, lifetime spend per customer (avg), and 90-day churn rate (% of customers signed up >90 days ago who have a churn_event). Use one CTE per logical step.

Key takeaways

  • CTEs turn nested logic into a readable top-to-bottom pipeline
  • Postgres 12+ inlines CTEs by default — use AS MATERIALIZED when you want the old materialisation behaviour
  • One CTE per logical step; name them descriptively; the final SELECT is usually a one-liner

Further reading

  1. 01

    dbt SQL style guide

    dbt LabsThe most influential modern SQL style guide. Single source for the multi-CTE pipeline pattern.

Loading progress…
LeadAfrikPublic Economics Hub