Skip to content
Module 01 of 1050 min readAdvanced

Floating point, conditioning, and stability

IEEE 754, machine epsilon, catastrophic cancellation. Condition number. The numerical traps that turn correct maths into wrong numbers.

10%

Listen along

Read “Floating point, conditioning, and stability” aloud

Plays in your browser using on-device text-to-speech — nothing leaves the page.

Numerical methods are where the maths meets the machine. Every formula in this course will eventually be evaluated in floating-point arithmetic, and every approximation will accumulate error. Understanding floating-point representation, machine epsilon, and conditioning is the prerequisite for trusting any number a computer hands you.

IEEE 754 double precision

A double-precision float uses 64 bits: 1 sign + 11 exponent + 52 mantissa. This gives roughly 15-17 significant decimal digits and a range of about ±10^308. Machine epsilon (the smallest x such that 1 + x ≠ 1 in fp) is ε_mach ≈ 2.22 × 10⁻¹⁶.

Three sources of error

  1. Representation error: not every real number can be exactly represented. 0.1 in binary is a repeating fraction, so 0.1 + 0.2 ≠ 0.3 exactly.
  2. Round-off error: arithmetic operations introduce tiny errors of relative size ≈ ε_mach. Accumulated over many operations, these can grow.
  3. Cancellation error: subtracting two near-equal numbers loses precision. Computing (1 + 10⁻¹⁵) - 1 returns ~10⁻¹⁵ correctly but with much fewer significant digits than the inputs.

Catastrophic cancellation in finance

Classic example: the variance computed as E[X²] - E[X]² when these two terms are nearly equal. The naive formula loses precision because both terms are large but their difference is small. The two-pass / Welford algorithm computes the variance directly without subtracting near-equal quantities — always use it.

Condition number

For a function f, the relative condition number is |f'(x) · x / f(x)|. For matrix problems, κ(A) = ‖A‖ · ‖A⁻¹‖. Large condition number means small input changes produce large output changes — the problem is ill-conditioned.

Condition number ε rule of thumb

Solving Ax = b in floating-point loses roughly log₁₀(κ(A)) digits of accuracy. With ε_mach = 10⁻¹⁶ and κ = 10¹⁰, you have only 6 reliable digits in the answer. Always know your κ before trusting numerical results.

Numerical stability of algorithms

An algorithm is backward-stable if the computed answer is the exact answer for a slightly-perturbed input. Most modern numerical-library algorithms (LU with partial pivoting, QR via Householder, SVD via Golub-Kahan-Reinsch) are backward-stable. Naive algorithms (normal equations for least squares, classical Gram-Schmidt for QR) often are not.

Vectorisation and SIMD

Modern CPUs do 4-16 floating-point operations per cycle when data is properly aligned. NumPy, BLAS, and LAPACK exploit this; Python loops do not. The 100× speed gap between numpy.dot and a Python for-loop is the SIMD gap.

Avoiding loss of precision

  • Use log-space when probabilities are tiny. log(p₁ · p₂ · ... · p_N) = Σ log p_i.
  • Reorder sums to add similar-magnitude numbers first (Kahan summation for ultimate precision).
  • Avoid catastrophic cancellation: rewrite (1 - cos(x)) as 2 sin²(x/2) for small x; rewrite √(x² + 1) - x as 1/(√(x² + 1) + x) for large x.
  • When the input range is unknown, normalize before computing (max-shift trick for softmax/logsumexp).

Single vs double precision

Doubles (64-bit) are the default in scientific Python. Singles (32-bit, ε ≈ 10⁻⁷) save memory and run faster on GPUs. Production option-pricing code typically uses doubles; large-scale ML often uses singles or even half-precision floats with explicit mixed-precision arithmetic.

Exercise

Why does the naive sample variance s² = (1/n) Σ (x_i - x̄)² perform better numerically than the algebraically-equivalent s² = (1/n) Σ x_i² - x̄²? Construct a numerical example to show the difference.

Loading progress…
LeadAfrikPublic Economics Hub