Module 11 of 12 · 50 min read · Intermediate

Limited dependent variables

Logit, probit, tobit, and the linear probability model — when each is right, and the marginal-effect interpretation that everyone gets wrong.

Sometimes the dependent variable doesn't fit the linear-regression mould. It's binary (defaults yes/no), categorical (which insurance plan), censored (wages are observed only for employed people), count (number of doctor visits), or bounded (probability between 0 and 1). Each calls for different machinery.

The linear probability model

When y is binary (0/1), running OLS gives you the linear probability model:

math
Pr(yᵢ = 1 | xᵢ) = β₀ + β₁ x₁ᵢ + ... + βₖ xₖᵢ

Coefficients are marginal effects on probability. Easy to interpret. But two issues:

  • Predicted probabilities can fall outside [0, 1] — economically meaningless
  • Errors are heteroskedastic by construction (variance depends on Pr(y=1))
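Both issues can be seen with a few lines of arithmetic. The sketch below uses made-up coefficients (`BETA0` and `BETA1` are illustrative assumptions, not estimates from any data) to show a fitted probability leaving [0, 1] and an error variance that changes with x.

```python
# Hypothetical LPM coefficients, chosen only for illustration.
BETA0 = 0.10  # intercept
BETA1 = 0.25  # change in Pr(y = 1) per unit of x


def lpm_probability(x):
    """Fitted probability from the linear probability model."""
    return BETA0 + BETA1 * x


def lpm_error_variance(x):
    """Var(u | x) = p(x) * (1 - p(x)) for a binary outcome."""
    p = lpm_probability(x)
    return p * (1.0 - p)


for x in [0.0, 1.0, 2.0, 4.0]:
    p = lpm_probability(x)
    inside = 0.0 <= p <= 1.0
    # The variance differs at every x: heteroskedasticity by construction.
    print(f"x={x:.1f}  p_hat={p:.2f}  inside [0,1]: {inside}  "
          f"implied variance={lpm_error_variance(x):.4f}")
```

At x = 4 the fitted value exceeds 1 and the implied variance turns negative, which is the sense in which out-of-range predictions are meaningless.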

LPM is more popular than it looks

Despite the issues, modern empirical economists often use LPM with robust SEs because: (1) coefficients ARE marginal effects without post-estimation transformations, (2) it plays nicely with fixed effects and IV, (3) substantive results are usually similar to logit/probit. The issues with predicted-probabilities-outside-[0,1] are mostly aesthetic in causal-inference contexts.

Logit

Constrains predicted probabilities to [0, 1] via the logistic function:

math
Pr(yᵢ = 1 | xᵢ) = exp(xᵢβ) / (1 + exp(xᵢβ))

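Estimated by maximum likelihood. The coefficient β is the change in the log-odds of y = 1 per unit change in x, not a marginal effect on probability; exp(β) is the corresponding odds ratio. The marginal effect on probability is ∂P/∂x = β·P(1 − P), which varies with x, so it has to be evaluated at a specific value of x or averaged over the sample.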
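A minimal sketch of the distinction, using the logistic derivative ∂P/∂x = β·P(1 − P). The coefficient value `beta_k = 0.5` is a hypothetical number picked for illustration; the point is that one coefficient implies a single odds ratio but a different probability effect at each value of the index xβ.

```python
import math


def logistic(z):
    """Logistic CDF: exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + math.exp(-z))


def logit_marginal_effect(beta_k, index):
    """dP/dx_k = beta_k * P * (1 - P), evaluated at the linear index x*beta."""
    p = logistic(index)
    return beta_k * p * (1.0 - p)


beta_k = 0.5                      # hypothetical logit coefficient
odds_ratio = math.exp(beta_k)     # constant across observations
print(f"odds ratio: {odds_ratio:.4f}")

for index in [-2.0, 0.0, 2.0]:
    p = logistic(index)
    me = logit_marginal_effect(beta_k, index)
    print(f"index={index:+.1f}  P={p:.3f}  dP/dx={me:.4f}")
```

The effect is largest where P = 0.5 (here β/4 = 0.125) and shrinks toward zero in the tails.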

Marginal effects: MEM vs AME

  • Marginal effect at the means (MEM): plug in mean values of x, compute ∂P/∂x there
  • Average marginal effect (AME): compute ∂P/∂x for each observation, then average
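  • AME is generally preferred: the means used by MEM may describe no actual observation (for example the mean of a binary control), and with skewed regressors the mean can sit far from most of the data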

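In Stata: margins, dydx(*) after logit (AME by default; add the atmeans option for MEM). In R: marginaleffects::avg_slopes(model). In Python: results.get_margeff() in statsmodels (at="overall", the default, gives AME; at="mean" gives MEM).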
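To see why the two summaries can differ, here is a hand computation on a tiny, invented sample with one skewed regressor. The coefficients `B0` and `B1` and the values in `xs` are assumptions for illustration rather than estimates; in applied work the software commands above do this after fitting.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical logit coefficients and a skewed regressor (one large value).
B0, B1 = -1.0, 1.0
xs = [0.1, 0.2, 0.3, 0.5, 4.0]


def slope(x):
    """dP/dx at a given x for the assumed logit model."""
    p = logistic(B0 + B1 * x)
    return B1 * p * (1.0 - p)


x_bar = sum(xs) / len(xs)

mem = slope(x_bar)                          # effect at the mean of x
ame = sum(slope(x) for x in xs) / len(xs)   # mean of individual effects

print(f"mean of x = {x_bar:.2f}")
print(f"MEM = {mem:.4f}")
print(f"AME = {ame:.4f}")
```

The outlier pulls the mean of x to a point where P is close to 0.5, so MEM reports nearly the maximum possible slope even though no observation is near that point; AME averages the slopes the observations actually have.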

Probit

Same idea as logit but with the standard-normal CDF instead of the logistic:

math
Pr(yᵢ = 1 | xᵢ) = Φ(xᵢβ)

Logit and probit give nearly identical fits in practice. Choice is largely tradition: probit is common in IO and macro; logit dominates in epidemiology and machine learning. Coefficients are NOT directly comparable — logit β ≈ 1.6 × probit β as a rule of thumb.
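The rule of thumb can be checked numerically: if logit β ≈ 1.6 × probit β, then the logistic CDF evaluated at 1.6z should track Φ(z) closely. The grid and the comparison below are an illustrative sketch, not a formal result.

```python
import math


def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))


def normal_cdf(z):
    """Standard-normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Evaluate both link functions on a grid of index values from -4 to 4.
grid = [i / 100.0 for i in range(-400, 401)]

gap_scaled = max(abs(logistic_cdf(1.6 * z) - normal_cdf(z)) for z in grid)
gap_unscaled = max(abs(logistic_cdf(z) - normal_cdf(z)) for z in grid)

print(f"max gap with 1.6 scaling:    {gap_scaled:.4f}")
print(f"max gap without any scaling: {gap_unscaled:.4f}")
```

With the scaling the two curves stay within about two percentage points of each other everywhere on the grid, which is why fitted probabilities from the two models are so similar while raw coefficients are not.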

Tobit and censored regression

Use these when y is observed only above (or below) a known threshold while the underlying latent variable is continuous. Examples:

  • Wage offers below reservation wage → person is unemployed, hours = 0
  • Tax-deductible expenses below threshold are not reported
  • Loan amounts below minimum aren't disbursed
  • Test scores ceiling- or floor-censored

Tobit estimates the parameters of the latent continuous distribution (β and the error standard deviation σ) by maximum likelihood, treating the censoring threshold as known. Its key assumptions, normality and homoskedasticity of the latent error, are more restrictive than in OLS: if either fails, the Tobit estimates are generally inconsistent. The Heckman two-step (selection model) is an alternative when censoring is selection-driven (we observe wages only for those who chose to work, and that choice depends on potential wages).
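As a rough sketch of what the maximum-likelihood step involves, the function below writes out the log-likelihood for a Tobit model with one regressor and left-censoring at a known threshold. Censored observations contribute the probability of falling at or below the threshold; uncensored ones contribute a normal density. The function name, the single-regressor setup, and the data are assumptions for illustration; a real estimator would maximise this over the parameters.

```python
import math


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_loglik(beta0, beta1, sigma, data, lower=0.0):
    """Log-likelihood for y = max(lower, beta0 + beta1*x + e), e ~ N(0, sigma^2).

    data is a list of (x, y) pairs; y <= lower marks a censored observation.
    """
    total = 0.0
    for x, y in data:
        mu = beta0 + beta1 * x
        if y <= lower:
            # Censored: we only know the latent value was at or below `lower`.
            total += math.log(normal_cdf((lower - mu) / sigma))
        else:
            # Uncensored: ordinary normal density of the observed y.
            total += math.log(normal_pdf((y - mu) / sigma) / sigma)
    return total


# Hypothetical (x, y) pairs; the first two are censored at zero.
sample = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.5), (3.0, 2.8)]
print(tobit_loglik(-0.5, 1.0, 1.0, sample))
```

Note that `lower` is passed in as a fixed input, while the slope, intercept, and σ are the arguments an optimiser would vary.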

Multinomial logit

Use when y is unordered categorical with K > 2 outcomes: travel mode (car / bus / bike / walk), party voted for, industry of employment. One category is the baseline; K−1 sets of coefficients describe log-odds relative to that baseline.

IIA: independence of irrelevant alternatives

MNL assumes that adding or removing an alternative doesn't change the relative odds among the others. The classic counterexample is the red bus / blue bus problem. Suppose commuters split evenly between car and red bus and are indifferent between otherwise identical buses. Adding a blue bus should only split the bus share (car 1/2, each bus 1/4), shifting probability from red bus and not from car. MNL instead predicts 1/3 for each alternative, drawing probability from car as well. Nested logit and mixed logit (random coefficients) relax IIA.
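The red bus / blue bus prediction follows directly from the MNL choice-probability formula. The sketch below assigns every alternative the same systematic utility (zero, an arbitrary illustrative value) and compares shares before and after the blue bus is added.

```python
import math


def mnl_shares(utilities):
    """Multinomial-logit choice probabilities from systematic utilities.

    utilities maps each alternative's name to its utility V_j;
    P_j = exp(V_j) / sum_k exp(V_k).
    """
    denom = sum(math.exp(v) for v in utilities.values())
    return {name: math.exp(v) / denom for name, v in utilities.items()}


before = mnl_shares({"car": 0.0, "red_bus": 0.0})
after = mnl_shares({"car": 0.0, "red_bus": 0.0, "blue_bus": 0.0})

print("before:", before)
print("after: ", after)

# IIA: the car / red-bus odds are unchanged by the new alternative.
print("odds car vs red bus, before:", before["car"] / before["red_bus"])
print("odds car vs red bus, after: ", after["car"] / after["red_bus"])
```

The car share falls from 1/2 to 1/3 even though the new alternative is a perfect substitute for the red bus only, which is the implausible substitution pattern IIA imposes.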

Ordered logit/probit

Use when y is ordered categorical (Likert: strongly disagree → strongly agree; bond ratings: AAA → D). The model estimates a single β vector plus K−1 cutoffs. It relies on the parallel-regressions assumption: the same β applies across all category cutoffs. Test this with the Brant test (ordered logit); if it fails, fall back to multinomial logit and accept the loss of efficiency.
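A small sketch of how one index xβ and K−1 cutoffs produce K category probabilities in an ordered logit, using P(y ≤ j) = Λ(κⱼ − xβ). The cutoff values are hypothetical and the function name is an assumption for illustration.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def ordered_logit_probs(index, cutoffs):
    """Category probabilities for an ordered logit.

    index   : the linear index x*beta for one observation
    cutoffs : K-1 increasing thresholds; returns K probabilities
    """
    # Cumulative probabilities P(y <= j); the top category closes at 1.
    cumulative = [logistic(c - index) for c in cutoffs] + [1.0]
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs


cutoffs = [-1.0, 0.0, 1.5]  # hypothetical: 3 cutoffs, so 4 ordered categories

for index in [0.0, 1.0]:
    probs = ordered_logit_probs(index, cutoffs)
    print(f"index={index:.1f}  probs={[round(p, 3) for p in probs]}")
```

Raising the index shifts probability mass toward the higher categories while the cutoffs stay fixed, which is the parallel-regressions structure in action.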

Count data: Poisson and negative binomial

Use when y is a non-negative integer count (doctor visits, patents, accidents). Poisson regression assumes the conditional variance equals the conditional mean (equidispersion). Negative binomial relaxes this by allowing overdispersion (var > mean), which is almost universally observed in real count data. Zero-inflated variants handle excess zeros (the population that never visits a doctor regardless of x).
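A quick, crude way to see overdispersion is to compare the sample variance of the counts with their mean. The data below are invented, and the check ignores regressors, so treat it as a first look and not a formal test. The variance function shown is the common NB2 form, var = μ + αμ², which is an assumed parameterisation; other negative binomial variants exist.

```python
import statistics

# Hypothetical doctor-visit counts: many zeros and one heavy user.
visits = [0, 0, 0, 0, 1, 1, 2, 3, 5, 12]

mean_visits = statistics.mean(visits)
var_visits = statistics.variance(visits)  # sample variance (n - 1 divisor)
dispersion = var_visits / mean_visits     # about 1 under Poisson


def negbin_variance(mu, alpha):
    """NB2 variance function: mu + alpha * mu**2 (alpha = 0 gives Poisson)."""
    return mu + alpha * mu ** 2


print(f"mean = {mean_visits:.2f}, variance = {var_visits:.2f}, "
      f"ratio = {dispersion:.2f}")
print("Poisson variance at mu=2:          ", negbin_variance(2.0, 0.0))
print("NB2 variance at mu=2, alpha=0.5:   ", negbin_variance(2.0, 0.5))
```

A ratio well above 1 suggests the Poisson variance assumption is too tight for these data; setting α = 0 in the NB2 form recovers the Poisson case.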

What estimator should you use?

  • Binary y, causal inference: linear probability model with robust SEs (allows IV, FE, clean interpretation)
  • Binary y, prediction: logit or probit (better calibration of predicted probabilities)
  • Categorical y, K outcomes unordered: multinomial logit (test IIA)
  • Categorical y, ordered: ordered logit/probit (test parallel regressions)
  • Count y: negative binomial (default), Poisson if dispersion is well-behaved
  • Censored y, latent continuous: Tobit (or Heckman if selection is the censoring mechanism)

Exercise

You have data on 5,000 SACCO members and want to estimate how monthly contribution affects loan-default probability. The dependent variable is a binary default flag. State (a) which estimator you'd use for causal inference, (b) what marginal effect to report, (c) what robustness check you'd add.

LeadAfrik Public Economics Hub