GROUP BY and aggregate functions — SQL Module 5 | LeadAfrik Public Economics Hub

Aggregation is how you turn rows into summaries: counts, sums, averages, min/max. The verb is GROUP BY, and once it clicks, half your daily reporting work writes itself.

Aggregate functions without GROUP BY

sql

SELECT COUNT(*), SUM(total_amount), AVG(total_amount)
FROM orders;

Without GROUP BY, an aggregate collapses the entire table to one row. Three numbers: total count of orders, total revenue, average order value.

GROUP BY

sql

SELECT country, COUNT(*), SUM(total_amount)
FROM orders o
JOIN customers c ON c.id = o.customer_id
GROUP BY country
ORDER BY 3 DESC;

Now we get one row per country, with the count and revenue for that country. ORDER BY 3 means 'sort by the third column' — a shortcut. The result is a country-level rollup, sorted by revenue descending.

The GROUP BY rule

Every column in your SELECT must either be (a) inside an aggregate function or (b) listed in the GROUP BY. Otherwise the database doesn't know which value to show — there are many rows per group.

Common aggregates

COUNT(*) — every row, including NULLs
COUNT(column) — non-NULL values only
COUNT(DISTINCT column) — unique values
SUM(column) — total
AVG(column) — mean (NULLs ignored)
MIN(column), MAX(column) — extremes

HAVING — filtering after aggregation

sql

SELECT country, COUNT(*) AS order_count
FROM orders o
JOIN customers c ON c.id = o.customer_id
GROUP BY country
HAVING COUNT(*) > 100
ORDER BY order_count DESC;

WHERE filters rows before grouping. HAVING filters groups after. If you try to use COUNT(*) in WHERE, the database rejects it — at the WHERE stage, the count doesn't exist yet.

Exercise

Find the top 5 customers by lifetime revenue, including their email and total spent.