Skip to content
Module 02 of 1255 min readMixed

Power Query — connect, clean, shape

Connectors, transformations, the M language underneath. Why ETL belongs in Power Query, not in DAX or the data source.

17%

Listen along

Read “Power Query — connect, clean, shape” aloud

Plays in your browser using on-device text-to-speech — nothing leaves the page.

Learning objectives

By the end of this module, you should be able to:

  • 01Connect to a data source and transform the data via Power Query Editor
  • 02Recognise the M language underneath and read auto-generated M
  • 03Understand query folding — and why pushing transformations into the source matters

Power Query is one of Power BI's most under-appreciated assets. It is a visual ETL tool that generates a functional language called M underneath. The drag-and-drop transformations — remove columns, change types, pivot, merge — get recorded as M steps. The result is reproducible, refresh-able ETL that lives in the workbook.

The Power Query Editor

From Power BI Desktop: Home → Transform data → opens Power Query Editor. The interface has three panes: Queries (data sources), Query Settings (Applied Steps), and the data preview. Every transformation you do shows as a step in 'Applied Steps' — you can edit, delete, or reorder them like a Git history.

The M language

text
// Auto-generated M for a typical ETL flow
let
Source = Csv.Document(File.Contents("sales.csv"), [Delimiter=",", Columns=8, Encoding=65001]),
PromotedHeaders = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
ChangedTypes = Table.TransformColumnTypes(PromotedHeaders, {{"Date", type date}, {"Amount", type number}}),
FilteredRows = Table.SelectRows(ChangedTypes, each [Amount] > 0),
AddedYear = Table.AddColumn(FilteredRows, "Year", each Date.Year([Date]), Int64.Type)
in
AddedYear
M is a functional language. Each step takes the previous table and returns a new one. Every GUI action you take produces a line of M.

Query folding

Query folding is the performance lever

When you connect to a database (SQL Server, Postgres, Snowflake) and apply Power Query steps, Power Query attempts to translate the steps into native SQL and push them to the source — so the source returns only the transformed data, not the full table. This is 'query folding' and it's the single biggest performance factor in Power BI. Right-click any step → 'View Native Query' to see if it's folded. Steps after the first non-foldable step run client-side in Power Query — much slower.

The ETL hygiene rules

  • Filter early: cut rows in the first few steps. Smaller dataset, faster everything downstream.
  • Pick columns: 'Choose Columns' early. Remove columns you won't use; folding is faster on narrow tables.
  • Set types explicitly: don't trust auto-detected types. A bad type assumption causes invisible refresh failures.
  • Use parameters: parameterise file paths, server names, environment differences. Lets you swap dev → prod without editing M.

Exercise

Connect Power BI Desktop to any CSV. In Power Query, apply 5+ transformations: filter, add column, change types, pivot/unpivot. Right-click each step and check whether 'View Native Query' is available. Which steps fold; which don't? What pattern do you see?

Key takeaways

  • Power Query is the right place for ETL — not DAX, not the data source's stored procedures.
  • M is the language; the GUI generates it. Read M to debug; write M for advanced cases.
  • Query folding — when Power Query translates your steps into native source SQL — is the key performance lever.
Loading progress…
LeadAfrikPublic Economics Hub