Skip to content
Module 07 of 860 min readAdvanced

Materialised views and partitioning

MV refresh strategies (manual/scheduled/concurrent/incremental), range/list/hash partitions, pruning.

88%

Listen along

Read “Materialised views and partitioning” aloud

Plays in your browser using on-device text-to-speech — nothing leaves the page.

Learning objectives

By the end of this module, you should be able to:

  • 01Recognise when a materialised view pays off (read-heavy, expensive query, tolerant of staleness)
  • 02Pick a refresh strategy: manual, scheduled, concurrent, incremental
  • 03Partition large tables by range/list/hash and benefit from partition pruning

Materialised views and partitioning are the two main tools for managing tables that have outgrown their query budget. MVs trade staleness for read speed. Partitioning trades schema complexity for I/O scalability and lifecycle ease.

Materialised views: when they pay off

A materialised view (MV) is a query whose result is stored as data on disk. Subsequent reads serve from the stored data; the underlying query doesn't re-execute. When the upstream tables change, the MV is stale until REFRESH'd.

  • Read pattern is hot (queried many times) and write pattern on underlying tables is moderate (refresh cost is amortised)
  • Underlying query is expensive (multi-join with aggregation, window functions, full table scans)
  • Consumers tolerate the staleness window — typically minutes to hours
  • You DON'T have a unique-key dependency that makes the MV inappropriate as a source for further joins (most engines allow indexes on MVs, which fixes this)

Refresh strategies

  • Manual REFRESH — you call REFRESH MATERIALIZED VIEW name; whenever needed. Right for batch / ETL contexts
  • Scheduled — cron-style refresh every N minutes. Right for dashboards where 'a few minutes stale' is fine
  • REFRESH CONCURRENTLY (Postgres) — refresh without blocking reads. Requires a unique index on the MV. Slower but doesn't lock readers out
  • Incremental refresh — only re-compute the rows that changed. Postgres lacks this natively; Materialize, ClickHouse, Snowflake (dynamic tables) all have varying levels of support
sql
-- Build a materialised daily revenue summary
CREATE MATERIALIZED VIEW daily_revenue AS
SELECT DATE(order_date) AS day,
country,
COUNT(*) AS orders,
SUM(amount) AS revenue
FROM orders
JOIN customers USING (customer_id)
GROUP BY DATE(order_date), country;
-- Unique index enables REFRESH CONCURRENTLY
CREATE UNIQUE INDEX idx_daily_revenue_pk ON daily_revenue (day, country);
-- Non-blocking refresh
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_revenue;

Partitioning: range / list / hash

Partitioning splits one logical table into multiple physical tables based on a partition key. Queries that filter on the key get 'partition pruning' — the engine touches only the relevant partitions.

  • Range partitioning — by a date or numeric range. Most common pattern: events_2025_01, events_2025_02, etc.
  • List partitioning — by a discrete value. e.g., orders_kenya, orders_uganda, orders_tanzania
  • Hash partitioning — by hash of a column. Right for even distribution across N partitions when there's no natural range or list
sql
-- Postgres range partitioning by date
CREATE TABLE events (
id BIGSERIAL,
event_date DATE NOT NULL,
user_id BIGINT,
payload JSONB
) PARTITION BY RANGE (event_date);
CREATE TABLE events_2025_01 PARTITION OF events FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
CREATE TABLE events_2025_02 PARTITION OF events FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');
-- … one partition per month
-- Query with partition pruning — only touches Feb partition
SELECT COUNT(*) FROM events WHERE event_date BETWEEN '2025-02-05' AND '2025-02-12';

Why partition at all

  1. Query speed — pruning means scanning O(weeks) rows instead of O(years)
  2. Maintenance speed — DROP TABLE events_2024_01 is O(1) and instant; DELETE FROM events WHERE event_date < … is O(N) and locks rows
  3. Index size — indexes are per-partition. Smaller indexes fit better in memory; each per-partition index is independently tuneable
  4. Backup/restore granularity — partitions can be archived to colder storage, restored independently

Partitioning isn't free

The schema complexity is real. Cross-partition queries are slower (no pruning), unique constraints across all partitions require the partition key in the constraint, and the operational overhead of partition creation/drop usually needs automation (pg_partman, Postgres native auto-partitioning, or a cron job). Don't partition tables under ~100M rows; the maintenance cost outweighs the gain.

Exercise

Your events table is 400M rows and growing 10M/month. Queries are 90% 'last 7 days' filters, 10% historical analytics. Reads peak at 200 QPS during business hours. (1) Decide between (a) just adding more indexes, (b) building a materialised view of the last 7 days, (c) partitioning by event_date. (2) Justify. (3) What's the operational discipline that has to come with your choice?

Key takeaways

  • MVs trade staleness for read speed; pick a refresh strategy that fits the staleness budget
  • REFRESH CONCURRENTLY (Postgres) requires a unique index — non-blocking refresh
  • Partition by range for time-series tables; by list for discrete categories; by hash for even distribution
  • Don't partition under ~100M rows — schema/operations complexity outweighs the benefit
Loading progress…
LeadAfrikPublic Economics Hub