A cohort is a group of entities that share an entry event (signup, first purchase, loan disbursal). Cohort analysis tracks how those entities behave through time relative to that entry — the most common form is retention.
The cohort table
Step 1: assign each entity to a cohort. Usually the cohort is the month or week of their first event.
WITH cohorts AS (SELECT user_id,DATE_TRUNC('month', MIN(activity_date)) AS cohort_monthFROM activityGROUP BY user_id)SELECT cohort_month, COUNT(*) AS cohort_sizeFROM cohortsGROUP BY cohort_monthORDER BY cohort_month;
The retention curve
Step 2: for each cohort and each elapsed period, count how many users are still active.
WITH cohorts AS (SELECT user_id,DATE_TRUNC('month', MIN(activity_date)) AS cohort_monthFROM activityGROUP BY user_id),activity_with_cohort AS (SELECT a.user_id,c.cohort_month,DATE_TRUNC('month', a.activity_date) AS active_monthFROM activity aJOIN cohorts c USING (user_id))SELECT cohort_month,active_month,COUNT(DISTINCT user_id) AS active_usersFROM activity_with_cohortGROUP BY cohort_month, active_monthORDER BY cohort_month, active_month;
Step 3: pivot to wide form so the cohort triangle becomes readable. CASE-based pivots work in every dialect; some engines (Postgres with tablefunc, Snowflake, DuckDB) have a PIVOT clause.
WITH base AS (/* the GROUP BY query from Step 2, plus a 'months_since_join' calc */SELECT cohort_month,active_month,EXTRACT(YEAR FROM AGE(active_month, cohort_month)) * 12 +EXTRACT(MONTH FROM AGE(active_month, cohort_month)) AS months_since,COUNT(DISTINCT user_id) AS active_usersFROM activity_with_cohortGROUP BY cohort_month, active_month)SELECT cohort_month,MAX(CASE WHEN months_since = 0 THEN active_users END) AS m0,MAX(CASE WHEN months_since = 1 THEN active_users END) AS m1,MAX(CASE WHEN months_since = 2 THEN active_users END) AS m2,MAX(CASE WHEN months_since = 3 THEN active_users END) AS m3FROM baseGROUP BY cohort_monthORDER BY cohort_month;
Retention rate = activeₙ / active₀
The cohort-size column (m0) is the denominator. Divide every other column by m0 to convert raw counts to retention rates. Some teams display both — counts in the chart hover, rates in the headline number.
When the cohort is the wrong unit
Cohort-by-signup-date is the default, but not always the right one. Consider:
- Cohort-by-channel — when you're asking 'is the new acquisition channel retaining as well as organic?'
- Cohort-by-feature-adoption — when you're asking 'do users who hit feature X in their first week retain better?' (Activation cohorts.)
- Cohort-by-experiment-arm — A/B test cohorts. The cohort grouping is the experiment variant.
- No cohort at all — if you're asking 'what's our retention this month?', a rolling-window query is the right shape, not a cohort triangle.
Exercise
Your CFO wants to know whether the loans your microfinance arm disbursed in Q1 2025 are being repaid on schedule, broken down by loan officer. (1) What's the cohort? (2) What's the retention-equivalent metric? (3) Write the query.