What the model got right last time.
Before the 2026 World Cup starts, here's how the same Dixon-Coles + Elo ensemble would have done on three recently completed major tournaments. The test is out-of-sample: pre-tournament Elo only, no hindsight.
| Metric | Value | Note |
|---|---|---|
| Winner accuracy | 77% | 30 of 39 |
| Mean Brier | 0.168 | Coin flip: 0.250 |
| Mean log loss | 0.515 | Coin flip: 0.693 |
| Naive Elo baseline | 77% | Higher-Elo always |

What the numbers mean — honestly.
Picking 30 of 39 winners (77%) is meaningfully better than chance (50%), and the mean Brier score of 0.168 is better than the 0.250 a coin-flip forecaster would score. It's in the same range as published academic models on comparable datasets.
The headline winner accuracy is the same as a naive baseline that always picks the higher-Elo side (77%). That's an honest finding: the Dixon-Coles addition doesn't move the picks much because both methods share the same Elo input. Where Dixon-Coles does help is in confidence calibration — the probability assigned to the winner — which shows up in the Brier and log-loss scores rather than the binary pick.
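To make the distinction between picks and calibration concrete, here is a minimal sketch of the two scoring rules for a binary "team A advances" forecast. The function names and the two example forecasters are hypothetical illustrations, not taken from the backtest runner.

```typescript
// Brier score for one binary forecast: squared error between the probability
// assigned to "team A advances" and the 0/1 outcome. Lower is better.
function brier(pA: number, aWon: boolean): number {
  const y = aWon ? 1 : 0;
  return (pA - y) ** 2;
}

// Log loss: negative log of the probability assigned to what actually happened.
// The probability is clamped away from 0 and 1 so the logarithm stays finite.
function logLoss(pA: number, aWon: boolean, eps = 1e-12): number {
  const p = Math.min(Math.max(pA, eps), 1 - eps);
  return aWon ? -Math.log(p) : -Math.log(1 - p);
}

// Coin-flip reference values quoted on this page.
const coinBrier = brier(0.5, true); // 0.25
const coinLogLoss = logLoss(0.5, true); // ln 2, about 0.693

// Two hypothetical forecasters make the same (correct) pick with different
// confidence. Their accuracy is identical; their scores are not.
const cautious = { brier: brier(0.55, true), logLoss: logLoss(0.55, true) };
const confident = { brier: brier(0.8, true), logLoss: logLoss(0.8, true) };
```

Both forecasters count as one correct pick, but the confident one scores lower (better) on both measures. Had the pick been wrong, the confident forecaster would have been penalised more heavily, which is why these scores reward calibration rather than boldness.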
The misses are worth noting. The model failed to predict major upsets: Morocco beating Spain and Portugal at WC 2022, Switzerland beating Italy at Euro 2024, Uruguay beating Brazil at Copa 2024. Pre-tournament Elo simply doesn't capture the in-tournament form of a side hitting its stride. This is a known limitation that can't be fixed without live in-tournament updates.
Sample size warning. 39 matches is not enough to draw strong statistical conclusions about model calibration. The 95% confidence interval on a 77% accuracy estimate from 39 trials is roughly ±13 percentage points. Three tournaments' worth of knockout matches is illustrative, not definitive.
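The ±13-point figure can be reproduced with the usual normal-approximation (Wald) interval for a proportion. This is a sketch assuming that approximation is the one intended; `waldHalfWidth` is a hypothetical helper name.

```typescript
// Half-width of a normal-approximation (Wald) confidence interval for a
// proportion: z * sqrt(p * (1 - p) / n). z = 1.96 corresponds to 95%.
function waldHalfWidth(successes: number, trials: number, z = 1.96): number {
  const p = successes / trials;
  return z * Math.sqrt((p * (1 - p)) / trials);
}

// 30 correct picks out of 39 knockout matches.
const halfWidth = waldHalfWidth(30, 39); // about 0.132, i.e. roughly ±13 points
```

With only 39 trials, the interval spans roughly 64% to 90%, which is why the per-tournament differences in accuracy should not be over-read.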
2022 · 16 matches
2022 FIFA World Cup
First World Cup in the Middle East. Argentina won their third title, beating France on penalties after a 3-3 draw. 16 knockout matches.
| Accuracy | Mean Brier | Mean log loss | vs naive Elo |
|---|---|---|---|
| 81.3% (13 of 16) | 0.167 | 0.519 | 81% |

| Round | Match | Predicted | Actual | Conf. | Correct |
|---|---|---|---|---|---|
| R16 | Netherlands v USA | Netherlands | Netherlands | 50% | ✓ |
| R16 | Argentina v Australia | Argentina | Argentina | 58% | ✓ |
| R16 | France v Poland | France | France | 51% | ✓ |
| R16 | England v Senegal | England | England | 49% | ✓ |
| R16 | Japan v Croatia (pens) | Croatia | Croatia | 71% | ✓ |
| R16 | Brazil v Korea | Brazil | Brazil | 60% | ✓ |
| R16 | Morocco v Spain (pens) | Spain | Morocco | 47% | ✗ |
| R16 | Portugal v Switzerland | Portugal | Portugal | 48% | ✓ |
| QF | Croatia v Brazil (pens) | Brazil | Croatia | 47% | ✗ |
| QF | Netherlands v Argentina (pens) | Argentina | Argentina | 71% | ✓ |
| QF | Morocco v Portugal | Portugal | Morocco | 23% | ✗ |
| QF | England v France | France | France | 42% | ✓ |
| SF | Argentina v Croatia | Argentina | Argentina | 52% | ✓ |
| SF | France v Morocco | France | France | 51% | ✓ |
| 3P | Croatia v Morocco | Croatia | Croatia | 44% | ✓ |
| F | Argentina v France (pens) | Argentina | Argentina | 72% | ✓ |

2024 · 15 matches
UEFA Euro 2024
Hosted by Germany. Spain won a record fourth Euros, beating England 2-1 in the final at Berlin's Olympiastadion. 15 knockout matches (no 3rd-place game).
| Accuracy | Mean Brier | Mean log loss | vs naive Elo |
|---|---|---|---|
| 73.3% (11 of 15) | 0.168 | 0.507 | 73% |

| Round | Match | Predicted | Actual | Conf. | Correct |
|---|---|---|---|---|---|
| R16 | Switzerland v Italy | Italy | Switzerland | 36% | ✗ |
| R16 | Germany v Denmark | Germany | Germany | 41% | ✓ |
| R16 | England v Slovakia | England | England | 55% | ✓ |
| R16 | Spain v Georgia | Spain | Spain | 61% | ✓ |
| R16 | France v Belgium | France | France | 38% | ✓ |
| R16 | Portugal v Slovenia (pens) | Portugal | Portugal | 83% | ✓ |
| R16 | Romania v Netherlands | Netherlands | Netherlands | 56% | ✓ |
| R16 | Austria v Türkiye | Austria | Türkiye | 33% | ✗ |
| QF | Spain v Germany | Spain | Spain | 43% | ✓ |
| QF | Portugal v France (pens) | Portugal | France | 62% | ✗ |
| QF | Netherlands v Türkiye | Netherlands | Netherlands | 51% | ✓ |
| QF | England v Switzerland (pens) | England | England | 70% | ✓ |
| SF | Spain v France | Spain | Spain | 40% | ✓ |
| SF | Netherlands v England | Netherlands | England | 34% | ✗ |
| F | Spain v England | Spain | Spain | 42% | ✓ |

2024 · 8 matches
2024 Copa América
Hosted by the United States. Argentina won a record 16th title, beating Colombia 1-0 in extra time at Hard Rock Stadium, Miami. 8 knockout matches.
| Accuracy | Mean Brier | Mean log loss | vs naive Elo |
|---|---|---|---|
| 75.0% (6 of 8) | 0.173 | 0.519 | 75% |

| Round | Match | Predicted | Actual | Conf. | Correct |
|---|---|---|---|---|---|
| QF | Argentina v Ecuador (pens) | Argentina | Argentina | 82% | ✓ |
| QF | Venezuela v Canada (pens) | Canada | Canada | 65% | ✓ |
| QF | Colombia v Panama | Colombia | Colombia | 50% | ✓ |
| QF | Uruguay v Brazil (pens) | Brazil | Uruguay | 53% | ✗ |
| SF | Argentina v Canada | Argentina | Argentina | 58% | ✓ |
| SF | Colombia v Uruguay | Uruguay | Colombia | 27% | ✗ |
| 3P | Uruguay v Canada (pens) | Uruguay | Uruguay | 76% | ✓ |
| F | Argentina v Colombia | Argentina | Argentina | 55% | ✓ |

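
The three per-tournament summaries can be reconciled with the headline figures by weighting each tournament by its number of matches. The sketch below uses only the rounded numbers printed on this page; the `TournamentSummary` shape and variable names are hypothetical and are not the format of the backtest dataset.

```typescript
// Per-tournament summary figures as printed on this page (already rounded).
interface TournamentSummary {
  name: string;
  matches: number;
  correct: number;
  meanBrier: number;
  meanLogLoss: number;
}

const summaries: TournamentSummary[] = [
  { name: "2022 FIFA World Cup", matches: 16, correct: 13, meanBrier: 0.167, meanLogLoss: 0.519 },
  { name: "UEFA Euro 2024", matches: 15, correct: 11, meanBrier: 0.168, meanLogLoss: 0.507 },
  { name: "2024 Copa América", matches: 8, correct: 6, meanBrier: 0.173, meanLogLoss: 0.519 },
];

const totalMatches = summaries.reduce((n, t) => n + t.matches, 0); // 39
const totalCorrect = summaries.reduce((n, t) => n + t.correct, 0); // 30

// Match-weighted mean: each tournament counts in proportion to its matches.
function weightedMean(pick: (t: TournamentSummary) => number): number {
  return summaries.reduce((sum, t) => sum + pick(t) * t.matches, 0) / totalMatches;
}

const overallAccuracy = totalCorrect / totalMatches;
const overallBrier = weightedMean((t) => t.meanBrier);
const overallLogLoss = weightedMean((t) => t.meanLogLoss);
```

This gives an accuracy of about 0.769, a mean Brier of about 0.169 and a mean log loss of about 0.514. Those agree with the headline 77%, 0.168 and 0.515 to within the rounding of the per-tournament inputs.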
How the backtest works.
- Take each team's Elo rating as of the start of the tournament. Sources: eloratings.net historical snapshots, approximate to ±25 Elo where the exact snapshot wasn't recoverable. Same scale we use for the 2026 predictions.
- For every knockout match, run the same Dixon-Coles + Elo ensemble that powers the live WC 2026 page. The model produces P(home win), P(draw), P(away win) for regulation time.
- Since knockout matches must produce a winner, collapse the draw probability to whichever side has the higher regulation win probability. This matches how the live bracket simulator resolves draws.
- Compare the model's pick to the actual winner (including penalty-shootout outcomes). Compute Brier score and log loss against the binary outcome.
- As a sanity baseline, also compute what a naive "always pick higher-Elo" rule would have predicted. If the model doesn't beat this baseline materially, that's an honest finding worth reporting — not hidden.
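
The draw-collapse and baseline steps above can be sketched as follows. This is one reading of the description, assuming the whole draw probability is added to the side with the higher regulation win probability. The type names, the tie-break toward the home side, and the example numbers are hypothetical and may differ from the actual runner.

```typescript
// Regulation-time probabilities as produced by a three-way match model.
interface RegulationProbs {
  home: number;
  draw: number;
  away: number;
}

type Side = "home" | "away";

// Assumed draw-collapse rule: the entire draw mass goes to whichever side has
// the higher regulation win probability. Exact ties go to the home side here,
// which is an arbitrary choice made for this sketch.
function collapseDraw(p: RegulationProbs): { pick: Side; pHomeAdvances: number } {
  const pick: Side = p.home >= p.away ? "home" : "away";
  const pHomeAdvances = pick === "home" ? p.home + p.draw : p.home;
  return { pick, pHomeAdvances };
}

// Naive baseline: always pick the side with the higher pre-tournament Elo.
function naivePick(homeElo: number, awayElo: number): Side {
  return homeElo >= awayElo ? "home" : "away";
}

// Hypothetical match, not taken from the backtest dataset.
const probs: RegulationProbs = { home: 0.45, draw: 0.28, away: 0.27 };
const model = collapseDraw(probs); // pick "home", pHomeAdvances about 0.73
const baseline = naivePick(1900, 1820); // "home"
const sameCall = model.pick === baseline; // true
```

Under this rule the model's pick differs from the baseline only when the side with the higher regulation win probability is not the higher-rated side, which is consistent with the two approaches landing on the same winner in most matches.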
The backtest dataset lives in `lib/probability-lab/backtest-data.ts` and the runner in `lib/probability-lab/backtest-runner.ts`. Both are committed to git, so any reader can verify that the predictions match what the live ensemble produces.