Skip to content
Module 10 of 1250 min readBeginner

Performance: indexes, EXPLAIN, and query smell

Why a query is slow, how to read an execution plan, and the three patterns that fix 80% of issues.

83%

Listen along

Read “Performance: indexes, EXPLAIN, and query smell” aloud

Plays in your browser using on-device text-to-speech — nothing leaves the page.

When a query is slow, it's almost always one of three things: missing index, accidental cartesian, or doing work in the wrong order. Knowing how to read an execution plan tells you which.

EXPLAIN

sql
EXPLAIN ANALYZE
SELECT *
FROM orders
WHERE customer_id = 12345;

EXPLAIN shows the planned execution. EXPLAIN ANALYZE actually runs it and shows real times. You're looking for two things: how the database is fetching rows (index scan? sequential scan?), and how many rows pass through each step.

Indexes

An index is a sorted lookup structure on one or more columns. Without one, the database scans every row to find matches (sequential scan, O(n)). With one, it can binary-search to the right rows (index scan, O(log n)).

sql
CREATE INDEX idx_orders_customer ON orders(customer_id);

Indexes speed up reads but slow down writes (the index has to be updated). Indexes also take disk space. Don't index everything — index columns you actually filter or join on.

Three patterns that fix 80% of slow queries

  1. Add an index on the column you're filtering / joining on
  2. Filter first, join second — push WHERE clauses as early as possible (the planner usually does this for you, but check)
  3. Avoid SELECT * on wide tables; pick the columns you need

Always read the row count

EXPLAIN shows estimated and actual row counts at each stage. If they diverge wildly (estimate 10, actual 10 million), your statistics are stale — run ANALYZE on the table.

Cardinality and selectivity

Cardinality = number of distinct values. Selectivity = how much an index narrows things down. Indexing a column with two values (boolean) is rarely worth it; indexing user_id (millions of distinct values) almost always is.

Exercise

A query that used to run in 200ms now takes 8 seconds against a transactions table that has grown from 1m to 50m rows. The query: SELECT customer_id, SUM(amount) FROM transactions WHERE transaction_date >= '2025-01-01' AND status = 'completed' GROUP BY customer_id ORDER BY SUM(amount) DESC LIMIT 100. Walk through how you would diagnose and fix it, in order. What is the first thing to check, then second, then third? When have you done enough?

Loading progress…
LeadAfrikPublic Economics Hub