Prisoner's Dilemma & Nash Equilibrium

Why two rational actors can each do worse than they jointly could — and when repetition restores cooperation.

Developed by Flood & Dresher (RAND) · formalized by Tucker · Origin 1950 · Intro

Built and reviewed by Stephen Omukoko Okoth

Mathematical Economist · ex-Morgan Stanley FI · Equilar

Theory

What the model says, and why

Two suspects are arrested. Police separate them and offer each the same deal: confess against your partner and walk free while your silent partner takes the heaviest sentence; if both confess, both serve a moderate sentence; if neither confesses, both serve a short sentence on a lesser charge. Confessing (defecting) is the dominant strategy for each player: it gives a better result whatever the partner does. But if both confess, both end up worse off than if both had stayed silent. That’s the dilemma.

The payoffs satisfy:

T > R > P > S     and     2R > T + S

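T = Temptation (defect when partner cooperates), R = Reward (mutual cooperation), P = Punishment (mutual defection), S = Sucker (cooperate when partner defects). The first inequality makes defection dominant and makes mutual cooperation better for both players than mutual defection; the second ensures that mutual cooperation beats taking turns exploiting each other, so (Cooperate, Cooperate) maximizes the players’ combined payoff.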
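The two conditions are easy to check mechanically. The sketch below is only an illustration; the function name `is_prisoners_dilemma` is made up for this example and is not part of the page's playground.

```python
def is_prisoners_dilemma(T, R, P, S):
    """Return True when the four payoffs satisfy both conditions.

    T > R > P > S  makes defection the dominant strategy
                   (T > R against a cooperator, P > S against a defector).
    2R > T + S     makes mutual cooperation worth more in total than
                   alternating between exploiting and being exploited.
    """
    return T > R > P > S and 2 * R > T + S


print(is_prisoners_dilemma(5, 3, 1, 0))  # True
print(is_prisoners_dilemma(7, 3, 1, 0))  # False: 2R = 6 is less than T + S = 7
```

In the second call the ordering still holds, but the players would earn more by taking turns defecting, so the game is no longer a standard Prisoner's Dilemma.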

Nash equilibrium in the one-shot game is (Defect, Defect). Each player’s best response to defection is defection. The Pareto-optimal outcome (Cooperate, Cooperate) is unstable — given that the other plays Cooperate, you do better by deviating to Defect.

The repeated game changes everything. If players meet repeatedly with a high enough probability of continuation (discount factor δ close to 1), cooperation can be sustained as a Nash equilibrium. The tit-for-tat strategy (cooperate first, then mirror the opponent’s previous move) won Axelrod’s famous tournament (1980), and similar reciprocity has been reported in nature (vampire-bat blood sharing, cleaner fish) and in history (the live-and-let-live truces of World War I trench warfare).
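To put a number on "high enough", here is a minimal sketch that assumes the opponent plays grim trigger (cooperate until the first defection, then defect forever) in an infinitely repeated game. Grim trigger is swapped in for tit-for-tat because it gives the simplest closed-form threshold; the function name is hypothetical.

```python
from fractions import Fraction


def grim_trigger_threshold(T, R, P):
    """Smallest discount factor at which cooperating against grim trigger pays.

    Cooperate forever:  R / (1 - delta)
    Defect today:       T + delta * P / (1 - delta)
    Requiring the first to be at least the second and solving for delta
    gives delta >= (T - R) / (T - P).
    """
    return Fraction(T - R, T - P)


delta_star = grim_trigger_threshold(T=5, R=3, P=1)
print(delta_star)  # 1/2
```

With the payoffs T = 5, R = 3, P = 1, cooperation is sustainable under this assumption whenever δ is at least 1/2. A larger temptation T raises the threshold; a harsher punishment (lower P) reduces it.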

Why this matters beyond game theory. The Prisoner’s Dilemma is the cleanest model of situations where individual rationality doesn’t produce collective rationality. Examples: arms races, tax evasion, climate change, oil cartels, advertising wars, fisheries. The structural intuition — that repeated interaction with credible punishment can rescue cooperation — is one of the most important ideas in social science.

Interactive playground

Move the parameters, watch the equilibrium move

Payoffs

Set the four numbers

Status

These payoffs satisfy the Prisoner's Dilemma conditions: T > R > P > S and 2R > T + S.

Payoff matrix

Two players, two strategies

                         Player B
Player A         Cooperate      Defect
Cooperate        (3, 3)         (0, 5)
Defect           (5, 0)         (1, 1)

Each cell is (A’s payoff, B’s payoff). The Nash equilibrium is bottom-right, even though top-left is Pareto-superior.
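The equilibrium can be confirmed by brute force: a cell is a pure-strategy Nash equilibrium when neither player can gain by switching their own action alone. The sketch below uses the payoffs from the matrix; the names `PAYOFFS` and `pure_nash_equilibria` are illustrative only.

```python
from itertools import product

ACTIONS = ("Cooperate", "Defect")

# PAYOFFS[(a, b)] = (A's payoff, B's payoff), copied from the matrix.
PAYOFFS = {
    ("Cooperate", "Cooperate"): (3, 3),
    ("Cooperate", "Defect"): (0, 5),
    ("Defect", "Cooperate"): (5, 0),
    ("Defect", "Defect"): (1, 1),
}


def pure_nash_equilibria(payoffs, actions=ACTIONS):
    """List every cell where neither player gains from a unilateral switch."""
    equilibria = []
    for a, b in product(actions, repeat=2):
        u_a, u_b = payoffs[(a, b)]
        # A considers alternatives while B's action stays fixed, and vice versa.
        a_cannot_improve = all(payoffs[(alt, b)][0] <= u_a for alt in actions)
        b_cannot_improve = all(payoffs[(a, alt)][1] <= u_b for alt in actions)
        if a_cannot_improve and b_cannot_improve:
            equilibria.append((a, b))
    return equilibria


print(pure_nash_equilibria(PAYOFFS))  # [('Defect', 'Defect')]
```

(Cooperate, Cooperate) fails the test because either player can move from 3 to 5 by defecting, which is exactly the instability described in the theory section.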

Equilibrium analysis

One-shot game

Nash equilibrium

(Defect, Defect)

Pareto-optimal

(Cooperate, Cooperate)

Dilemma size (R − P)

2.00

How much each player loses when both defect instead of both cooperating

Repeated game

When repetition supports cooperation

Cooperate forever (vs TFT)

38.49

Defect once (vs TFT)

16.83

Defect always (vs TFT)

12.83

Pays only P forever

Cooperation is stable: the payoff from cooperating forever exceeds the payoff from a one-time defection followed by punishment.
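The page does not state the discount factor or horizon behind these three figures. They are consistent with δ = 0.95 over 20 rounds with payoffs T = 5, R = 3, P = 1, S = 0; that is an inference from the numbers, not something the playground confirms. Under that assumption, a short sketch reproduces them:

```python
def discounted_sum(stream, delta):
    """Present value of a payoff stream, with round t weighted by delta**t."""
    return sum(payoff * delta**t for t, payoff in enumerate(stream))


T, R, P, S = 5, 3, 1, 0
DELTA, ROUNDS = 0.95, 20  # assumed values, inferred from the displayed figures

# R in every round.
cooperate = discounted_sum([R] * ROUNDS, DELTA)
# T once, then P in every remaining round.
defect_then_punished = discounted_sum([T] + [P] * (ROUNDS - 1), DELTA)
# P in every round, including the first.
mutual_defection = discounted_sum([P] * ROUNDS, DELTA)

print(round(cooperate, 2))             # 38.49
print(round(defect_then_punished, 2))  # 16.83
print(round(mutual_defection, 2))      # 12.83
```

Read this way, the 16.83 figure is the temptation payoff taken once followed by the punishment payoff in every later round, and 12.83 is the value of earning only P throughout.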

In the classroom

How to teach it well

Run it as a game first. Before any theory, have students play the dilemma in pairs for ten minutes, a few rounds back to back. Many cooperate the first time; almost all defect by the third round. The defection trajectory itself is a teaching moment.

Connect to real cases. Cartels (OPEC), arms races (Cold War), tax compliance, environmental treaties, advertising wars between Coke and Pepsi. Each one is a Prisoner’s Dilemma with a repeated-game escape mechanism (institutions, monitoring, credible punishment). The Cold War’s MAD doctrine worked precisely because repeated interaction with credible punishment makes mutual restraint stable.

Common student trap. Many believe the “rational” outcome should be cooperation because it’s Pareto-better. Push back: rational means choosing your own best response while taking the opponent’s strategy as given. From within that frame, defection is rational. The dilemma isn’t about irrationality; it’s about the gap between individual and collective rationality, which is exactly why institutions exist.

African policy applications. Tax evasion in fragmented sectors, coordination failures in regional infrastructure, deforestation as a multi-player dilemma. The same structural insight — that repeated interaction with credible monitoring rescues cooperation — applies to public-finance design.