Modern apps log a lot of semi-structured data: event payloads, webhook bodies, configuration. SQL has evolved to query it natively. Postgres JSONB, MySQL 8 JSON, Snowflake VARIANT, BigQuery JSON, DuckDB JSON — all give you operators to read into the structure without unmarshalling in application code.
-> vs ->> in Postgres
-- events.payload is JSONBSELECTpayload -> 'user' AS user_json, -- returns JSONBpayload -> 'user' ->> 'id' AS user_id_text, -- returns text(payload -> 'user' ->> 'id')::int AS user_id_int -- cast to intFROM eventsWHERE payload ->> 'event_type' = 'purchase';
- -> returns JSONB (typed sub-value)
- ->> returns text (the JSON scalar coerced to a string)
- Chain -> for nested access, finish with ->> to compare to a string in WHERE
- Cast with ::type when you need a numeric or date comparison
Containment with @>
-- Find events whose payload contains user.country = 'Kenya' at any nesting levelSELECT * FROM eventsWHERE payload @> '{"user": {"country": "Kenya"}}'::jsonb;
The @> operator tests JSON containment: does the left value contain the right structure as a subset? It's the natural fit for 'has this set of attributes' queries and — critically — it's GIN-indexable.
GIN indexes for JSONB
-- Default GIN: supports @>, ?, ?&, ?| operatorsCREATE INDEX idx_events_payload_gin ON events USING GIN (payload);-- Path-ops GIN: smaller index, only supports @> (most common operator)CREATE INDEX idx_events_payload_pathops ON events USING GIN (payload jsonb_path_ops);-- Expression index on a specific JSON path (cheapest if you only query one path)CREATE INDEX idx_events_user_country ON events ((payload -> 'user' ->> 'country'));
jsonb_path_ops is usually the right choice
It's about 30% the size of the default GIN and faster to maintain. Only switch to default GIN if you need ? operators (key-existence tests) or full-text style searches inside JSON.
JSON path expressions (SQL/JSON, Postgres 12+)
-- Returns true if any event in items[] has type='premium'SELECT id FROM eventsWHERE payload @? '$.items[*] ? (@.type == "premium")';-- Extract all amounts from items[] as a JSONB arraySELECT jsonb_path_query_array(payload, '$.items[*].amount') FROM events;
SQL/JSON path is more powerful than -> / ->> for queries over arrays and conditional traversal. Standardised in SQL:2016, supported by Postgres 12+, Oracle 12c+, SQL Server 2017+, MySQL 8+.
When to denormalise to columns
JSON columns are flexible — schema-on-read, no migrations to add a field. But for fields you query frequently, denormalising them into real columns is faster and lets you use proper B-tree indexes. The break-even is roughly:
- Read >100 times for every write of the JSON blob → denormalise
- Filter or sort by the field in most queries → denormalise
- Field is part of foreign-key relationships → must denormalise (FKs don't work on JSON)
- Field is rarely read or only read by ad-hoc analytics → keep in JSON
- Schema is genuinely uncertain or differs by record type → keep in JSON
The 'just put it in JSON' anti-pattern
Schemas that should be relational often start as 'we'll put it in JSON for flexibility' and end up unmaintainable. If you find yourself writing payload ->> 'customer_id' in every query, that field deserves to be a column.
Exercise
Your app stores webhook payloads in events(id, received_at, source, payload JSONB). Engineers query the table in three patterns: (a) all events where payload.event_type IN ('purchase', 'refund') in the last 7 days, (b) all events for a given external user — payload.user.external_id — across all time, (c) ad-hoc 'what's in here?' exploration. (1) Design the indexes. (2) Argue for/against extracting event_type to a real column.