NumPy is the foundation of the scientific Python stack. It provides ndarray, an N-dimensional array whose elements all share a single data type, and a large library of vectorised operations on it. Pandas, scikit-learn, and matplotlib build on top of NumPy, and TensorFlow interoperates closely with it.
Why NumPy is fast
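By default, NumPy arrays store their data in a single contiguous block of memory, like a C array, with every element the same type and size. Operations on the array run in compiled C loops, and linear algebra routines are handed to optimised BLAS and LAPACK libraries, instead of looping in the Python interpreter. For numeric work this is typically 10-100x faster than the same computation over Python lists.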
import numpy as np

rates = np.array([0.07, 0.10, 0.12, 0.15, 0.08])
rates.mean()   # ≈ 0.104
rates.std()    # ≈ 0.0287 (population standard deviation, ddof=0)
rates * 100    # array([ 7., 10., 12., 15.,  8.])
rates + 0.01   # adds 1pp to every element
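The memory layout described above can be inspected directly. The following is a small sketch using standard ndarray attributes; the `mixed` array is an illustrative addition, not part of the original example.

```python
import numpy as np

rates = np.array([0.07, 0.10, 0.12, 0.15, 0.08])

# Every element shares one dtype, so each occupies the same number of bytes.
print(rates.dtype)      # float64
print(rates.itemsize)   # 8 bytes per element
print(rates.nbytes)     # 40 bytes for the whole buffer (5 * 8)

# A freshly created array is laid out contiguously in C (row-major) order.
print(rates.flags["C_CONTIGUOUS"])  # True

# Mixing ints and floats does not give a mixed array:
# NumPy upcasts everything to one common dtype.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)      # float64
```

Because every element has the same size and sits next to its neighbours, compiled code can step through the buffer without inspecting each value's type, which is a large part of where the speed comes from.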
Vectorisation — the core mental model
Operations on NumPy arrays are element-wise by default. You almost never need to write a for loop. If your code has a 'for i in range(len(arr))' over a NumPy array, you are probably doing it wrong.
# Slow Python loop
result = []
for r in rates:
    result.append(r * 100)

# Fast NumPy vectorised
result = rates * 100
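Loops that contain an if/else can usually be vectorised too. As a rough sketch, assuming a hypothetical 0.10 threshold and made-up "high"/"low" labels, `np.where` chooses between two values element by element:

```python
import numpy as np

rates = np.array([0.07, 0.10, 0.12, 0.15, 0.08])

# Loop version: label each rate relative to a hypothetical 0.10 threshold.
labels_loop = []
for r in rates:
    labels_loop.append("high" if r > 0.10 else "low")

# Vectorised version: np.where picks from the two options element by element.
labels_vec = np.where(rates > 0.10, "high", "low")

print(labels_loop)          # ['low', 'low', 'high', 'high', 'low']
print(labels_vec.tolist())  # ['low', 'low', 'high', 'high', 'low']
```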
Slicing and indexing
arr = np.array([10, 20, 30, 40, 50])
arr[0]          # 10
arr[-1]         # 50
arr[1:3]        # array([20, 30])
arr[arr > 20]   # array([30, 40, 50]) (boolean mask)

# 2D arrays
m = np.array([[1, 2, 3], [4, 5, 6]])
m.shape         # (2, 3)
m[0, 1]         # 2 (row 0, col 1)
m[:, 1]         # array([2, 5]) (all rows, col 1)
m.sum(axis=0)   # array([5, 7, 9]) (column sums)
m.sum(axis=1)   # array([6, 15]) (row sums)
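Boolean masks can be combined and counted as well as used for selection. The sketch below reuses `arr` and `m` from the example above; the cutoff of 10 for row sums is an arbitrary value chosen for illustration.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# Combine conditions with & (and) and | (or);
# each comparison needs its own parentheses.
middle = arr[(arr > 10) & (arr < 50)]
print(middle)              # [20 30 40]

# A boolean mask is itself an array, so summing it counts the True entries.
print((arr > 20).sum())    # 3

m = np.array([[1, 2, 3], [4, 5, 6]])

# Keep only the rows whose sum exceeds a hypothetical cutoff of 10.
big_rows = m[m.sum(axis=1) > 10]
print(big_rows)            # [[4 5 6]]
```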
Broadcasting
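When operating on arrays of different shapes, NumPy 'broadcasts' the smaller one across the larger. Shapes are compared from the trailing dimension backwards, and two dimensions are compatible when they are equal or one of them is 1. This lets you write expressive vectorised code without explicit loops.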
prices = np.array([100, 200, 300])   # shape (3,)
shares = np.array([10, 5, 2])        # shape (3,)
portfolio_value = prices * shares    # shapes match, so plain element-wise: array([1000, 1000, 600])
total = (prices * shares).sum()      # 2600
prices * 2                           # scalar broadcast to shape (3,): array([200, 400, 600])
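Broadcasting only comes into play when the shapes differ. Here is a minimal sketch of two such cases; `shares_by_account` and `scenarios` are hypothetical arrays invented for illustration.

```python
import numpy as np

prices = np.array([100, 200, 300])            # shape (3,)

# (2, 3) with (3,): the 1-D array is reused for every row.
shares_by_account = np.array([[10, 5, 2],
                              [1, 0, 4]])     # shape (2, 3), hypothetical holdings
values = shares_by_account * prices           # shape (2, 3)
print(values)               # rows: [1000 1000 600] and [100 0 1200]
print(values.sum(axis=1))   # [2600 1300], the total per account

# (3, 1) with (3,): both are stretched, giving every pairwise product.
scenarios = np.array([[0.9], [1.0], [1.1]])   # shape (3, 1), hypothetical multipliers
grid = scenarios * prices
print(grid.shape)           # (3, 3)
```

Shapes that are neither equal nor 1 in some dimension, such as (3,) with (4,), raise a ValueError instead of being stretched.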
If a loop is slow, vectorise it
The single biggest performance win in scientific Python is usually replacing Python-level loops with NumPy vectorised operations. A loop that takes seconds in pure Python can often finish in tens of milliseconds once vectorised, in line with the 10-100x range above; the exact gain depends on the operation and the array size.
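The gain is easy to measure on your own machine. The following is a rough timing sketch, assuming 100,000 randomly generated rates (not real data); the absolute numbers will vary with hardware and NumPy version, so no timings are quoted here.

```python
import time
import numpy as np

n = 100_000
rng = np.random.default_rng(0)      # seeded so the data is reproducible
rates = rng.uniform(0.0, 0.2, n)    # hypothetical rates, not real data

# Python-level loop: one interpreter iteration per element.
start = time.perf_counter()
loop_result = []
for r in rates:
    loop_result.append(r * 100)
loop_seconds = time.perf_counter() - start

# Vectorised: a single call into compiled code.
start = time.perf_counter()
vec_result = rates * 100
vec_seconds = time.perf_counter() - start

print(f"loop:       {loop_seconds * 1e3:.2f} ms")
print(f"vectorised: {vec_seconds * 1e3:.2f} ms")

# Both approaches compute the same values.
print(np.allclose(loop_result, vec_result))  # True
```

For steadier numbers, repeat each measurement several times (for example with the standard library's timeit module) rather than relying on a single run.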
Exercise
Create a NumPy array of the rates [0.07, 0.10, 0.12, 0.15, 0.08]. Compute the mean and the count of rates above 0.10.