Pandas data wrangling — filtering, grouping, merging, pivoting, and reshaping — is what you'll do for 80% of any real analysis. The vocabulary is small: five verbs cover almost everything.
Filter — select rows
# Boolean masktier1 = df[df['tier'] == 1]big = df[df['assets_bn'] > 1000]both = df[(df['tier'] == 1) & (df['assets_bn'] > 1000)]# query() — readable for complex filtersdf.query('tier == 1 and assets_bn > 1000')
Group-by aggregate — split, apply, combine
groupby is the most powerful single operation in pandas. It splits the DataFrame by a key, applies an aggregation, and combines the results.
df.groupby('tier')['assets_bn'].sum()df.groupby('tier').agg({'assets_bn': 'sum', 'bank': 'count'})# Multiple keysdf.groupby(['tier', 'region'])['assets_bn'].mean()
Merge — join two DataFrames
merged = pd.merge(df_left, df_right, on='bank', how='inner')# how can be: 'inner', 'left', 'right', 'outer'
Pivot and melt — reshape wide↔long
# Long → widewide = df.pivot(index='month', columns='bank', values='rate')# Wide → longlong = wide.melt(ignore_index=False, var_name='bank', value_name='rate').reset_index()
Method chaining
Pandas methods return DataFrames, which lets you chain operations into a pipeline. This is the single most readable way to express a multi-step transformation.
result = (df.query('tier == 1').groupby('region')['assets_bn'].sum().sort_values(ascending=False).head(5))
Method chaining is the Pythonic data style
Each step does one thing. The pipeline reads top-to-bottom. Compare to nesting: head(5)(sort_values(False)(...)) is unreadable. Chain it.
Exercise
Using bankrates, group by month and compute the mean lending_rate.