Materialised views and partitioning are the two main tools for managing tables that have outgrown their query budget. MVs trade staleness for read speed. Partitioning trades schema complexity for I/O scalability and lifecycle ease.
Materialised views: when they pay off
A materialised view (MV) is a query whose result is stored as data on disk. Subsequent reads serve from the stored data; the underlying query doesn't re-execute. When the upstream tables change, the MV is stale until REFRESH'd.
- Read pattern is hot (queried many times) and write pattern on underlying tables is moderate (refresh cost is amortised)
- Underlying query is expensive (multi-join with aggregation, window functions, full table scans)
- Consumers tolerate the staleness window — typically minutes to hours
- You DON'T have a unique-key dependency that makes the MV inappropriate as a source for further joins (most engines allow indexes on MVs, which fixes this)
Refresh strategies
- Manual REFRESH — you call REFRESH MATERIALIZED VIEW name; whenever needed. Right for batch / ETL contexts
- Scheduled — cron-style refresh every N minutes. Right for dashboards where 'a few minutes stale' is fine
- REFRESH CONCURRENTLY (Postgres) — refresh without blocking reads. Requires a unique index on the MV. Slower but doesn't lock readers out
- Incremental refresh — only re-compute the rows that changed. Postgres lacks this natively; Materialize, ClickHouse, Snowflake (dynamic tables) all have varying levels of support
-- Build a materialised daily revenue summaryCREATE MATERIALIZED VIEW daily_revenue ASSELECT DATE(order_date) AS day,country,COUNT(*) AS orders,SUM(amount) AS revenueFROM ordersJOIN customers USING (customer_id)GROUP BY DATE(order_date), country;-- Unique index enables REFRESH CONCURRENTLYCREATE UNIQUE INDEX idx_daily_revenue_pk ON daily_revenue (day, country);-- Non-blocking refreshREFRESH MATERIALIZED VIEW CONCURRENTLY daily_revenue;
Partitioning: range / list / hash
Partitioning splits one logical table into multiple physical tables based on a partition key. Queries that filter on the key get 'partition pruning' — the engine touches only the relevant partitions.
- Range partitioning — by a date or numeric range. Most common pattern: events_2025_01, events_2025_02, etc.
- List partitioning — by a discrete value. e.g., orders_kenya, orders_uganda, orders_tanzania
- Hash partitioning — by hash of a column. Right for even distribution across N partitions when there's no natural range or list
-- Postgres range partitioning by dateCREATE TABLE events (id BIGSERIAL,event_date DATE NOT NULL,user_id BIGINT,payload JSONB) PARTITION BY RANGE (event_date);CREATE TABLE events_2025_01 PARTITION OF events FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');CREATE TABLE events_2025_02 PARTITION OF events FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');-- … one partition per month-- Query with partition pruning — only touches Feb partitionSELECT COUNT(*) FROM events WHERE event_date BETWEEN '2025-02-05' AND '2025-02-12';
Why partition at all
- Query speed — pruning means scanning O(weeks) rows instead of O(years)
- Maintenance speed — DROP TABLE events_2024_01 is O(1) and instant; DELETE FROM events WHERE event_date < … is O(N) and locks rows
- Index size — indexes are per-partition. Smaller indexes fit better in memory; each per-partition index is independently tuneable
- Backup/restore granularity — partitions can be archived to colder storage, restored independently
Partitioning isn't free
The schema complexity is real. Cross-partition queries are slower (no pruning), unique constraints across all partitions require the partition key in the constraint, and the operational overhead of partition creation/drop usually needs automation (pg_partman, Postgres native auto-partitioning, or a cron job). Don't partition tables under ~100M rows; the maintenance cost outweighs the gain.
Exercise
Your events table is 400M rows and growing 10M/month. Queries are 90% 'last 7 days' filters, 10% historical analytics. Reads peak at 200 QPS during business hours. (1) Decide between (a) just adding more indexes, (b) building a materialised view of the last 7 days, (c) partitioning by event_date. (2) Justify. (3) What's the operational discipline that has to come with your choice?