Window functions are SQL's superpower. They let you compute a per-row value that depends on other rows in a 'window' — without collapsing rows the way GROUP BY does. The Foundations course introduced SUM() OVER (). This module gets you fluent in the rest of the toolkit.
ROW_NUMBER vs RANK vs DENSE_RANK
All three assign a per-row position within an ordered partition. The difference is in how they handle ties.
SELECT name, score,ROW_NUMBER() OVER (ORDER BY score DESC) AS rn,RANK() OVER (ORDER BY score DESC) AS rk,DENSE_RANK() OVER (ORDER BY score DESC) AS drFROM students;-- name score rn rk dr-- Aisha 95 1 1 1-- Bilal 95 2 1 1-- Chen 90 3 3 2-- Dera 85 4 4 3
- ROW_NUMBER — unique 1, 2, 3, … even when values tie (assigned arbitrarily unless you add a tiebreaker to ORDER BY)
- RANK — same rank for ties; the next row jumps the count (1, 1, 3 instead of 1, 1, 2)
- DENSE_RANK — same rank for ties; the next row is consecutive (1, 1, 2)
Which one when?
Use ROW_NUMBER for de-duplication ('keep the most recent row per customer'). Use RANK when leaderboards must skip positions after ties. Use DENSE_RANK when computing a small number of distinct tiers (top-1, top-2, top-3 with ties counted as one tier).
LAG, LEAD, FIRST_VALUE, LAST_VALUE
These let a row peek at neighbouring rows. The killer use case is period-over-period change without a self-join.
SELECT month, revenue,LAG(revenue, 1) OVER (ORDER BY month) AS prev_month,revenue - LAG(revenue, 1) OVER (ORDER BY month) AS mom_change,(revenue * 1.0 / LAG(revenue, 1) OVER (ORDER BY month) - 1) AS mom_pctFROM monthly_revenue;
LAG(col, n) returns the value n rows before the current row in the window. LEAD(col, n) returns n rows after. FIRST_VALUE returns the first row's value in the window — useful for 'difference from a baseline'. LAST_VALUE has a frame gotcha covered below.
Frames: ROWS BETWEEN vs RANGE BETWEEN
Every window function operates over a 'frame' — a subset of rows within the partition. Default frame for most engines is 'all rows from the start of the partition up to and including the current row.' For LAST_VALUE that surprises people because it gives you the current row, not the last.
-- 3-row trailing moving averageSELECT date, value,AVG(value) OVER (ORDER BY dateROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg_3FROM daily_metrics;-- LAST_VALUE that actually gives the last row in the partitionSELECT date, value,LAST_VALUE(value) OVER (ORDER BY dateROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS partition_lastFROM daily_metrics;
- ROWS frame — counts physical rows. BETWEEN 2 PRECEDING AND CURRENT ROW = a 3-row sliding window.
- RANGE frame — value-based; rows with the same ORDER BY value are treated as a single position. BETWEEN '7 days' PRECEDING AND CURRENT ROW gives a calendar-window, not a row-count window.
- Default frame is RANGE UNBOUNDED PRECEDING — the reason LAST_VALUE 'doesn't work as expected' without an explicit frame.
Always specify the frame for LAST_VALUE and any moving aggregate
Half the bugs I've seen with window functions come from relying on the default frame. If you write a moving average or a 'last value in the window' query, spell out the frame explicitly. Future-you will thank you.
Exercise
You have a table sales(customer_id, sale_date, amount). Write a query that returns, for each sale: the customer's running total to date, the running average, the rank of this sale by amount among that customer's sales, and the days since their previous sale. Result should keep one row per sale.