Every SQL engine is doing the same three things behind your SELECT: parsing the text into a tree, planning the cheapest way to execute it, and running the plan. Advanced SQL starts when you stop treating that as a black box.
The parse → plan → execute lifecycle
- Parse — the SQL text is tokenised and built into a syntax tree. Errors at this stage are syntax errors.
- Rewrite — view definitions are expanded, simple algebraic simplifications are applied.
- Plan — the optimiser enumerates candidate plans (different join orders, scan types, aggregation strategies) and assigns each a cost estimate based on table statistics. The cheapest plan wins.
- Execute — the chosen plan runs. The engine streams rows through the operators (scan, join, sort, aggregate) and produces the result.
Cost-based optimisation
Modern engines (Postgres, MySQL, Snowflake, BigQuery, Trino) are cost-based — they compare candidate plans using estimated cost. Cost is a unit-less number combining I/O cost, CPU cost, and memory cost. The plan with the lowest cost is chosen.
Cost estimates depend on table statistics: row counts, column histograms, NDV (number of distinct values), null fractions, correlation between physical row order and column values. The planner reads these from pg_statistic (Postgres), information_schema.tables (MySQL), or equivalent catalogue tables.
Stale statistics = bad plans
If statistics say a table has 1,000 rows but it actually has 10M, the planner will think a sequential scan is cheap and skip the index. Symptom: a query that 'suddenly got slow' after a big data load. Fix: ANALYZE the table (Postgres) or ANALYZE TABLE (MySQL). Auto-vacuum/auto-analyze handles this in most cases, but heavy bulk loads can outrun it.
What the optimiser actually decides
- Scan type per table: sequential scan, index scan, index-only scan, bitmap heap scan
- Join order: A JOIN B JOIN C vs A JOIN (B JOIN C) etc. Theoretical N! combinations; the planner uses heuristics + dynamic programming
- Join algorithm per pair: nested loop, hash join, merge join
- Aggregation strategy: hash aggregate vs sort-then-aggregate (GroupAggregate)
- Whether to use parallel workers (most modern engines)
- Whether intermediate results spill to disk (sort, hash, aggregate spills)
The 'plan you got vs the plan you wanted' problem
When a query is slow, the question is rarely 'should I add more memory?' — it's 'why did the planner make a bad choice?'. The five usual suspects:
- Stale statistics (most common) — ANALYZE
- Skewed data — one value covers 90% of rows; histogram-based planning gets it wrong. Fix: extended statistics, query hints in some engines, or rewrite
- Missing index — sequential scan when you wanted index lookup
- Suboptimal join order — usually surfaces with a 10+ join query; planner heuristics give up. Fix: rewrite to influence order, or set join_collapse_limit (Postgres)
- Parameter sniffing (SQL Server) / generic plans (Postgres) — a prepared statement gets a plan optimised for one parameter that's wrong for another. Fix: force a re-plan or use literal values
Exercise
A query that was fast yesterday is now timing out today. Nothing about the query changed. Walk through the diagnostic sequence — what do you check first, second, third? Why in that order?