
In roughly 200 studies across medicine, criminal justice, hiring, wine tasting, and graduate admissions, simple statistical formulas matched or outperformed the judgment of trained experts. Virtually every time. The experts were not amused.

The Framework

Paul Meehl's 1954 review of clinical vs. statistical prediction launched one of the most uncomfortable findings in social science: when given the same information, a simple algorithm — even a crude equal-weighting formula with no training data — consistently outperforms human expert judgment. The reason is not that experts are stupid. It's that experts are inconsistent. Given the same case twice, an expert will often reach different conclusions. An algorithm, given the same inputs, always produces the same output. And consistency, it turns out, matters more than insight.
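To make "crude equal-weighting formula" concrete, here is a minimal Python sketch of a Dawes-style improper linear model. The applicant fields and numbers are invented for illustration, and it assumes every predictor is oriented so that higher means better:

```python
from statistics import mean, stdev

def improper_linear_score(cases, predictors):
    """Equal-weight 'improper linear model' (Dawes): standardize each
    predictor across cases, then sum with unit weights. No training data
    is used; only the direction of each predictor must be known."""
    # Column statistics for z-scoring each predictor.
    stats = {
        p: (mean(c[p] for c in cases), stdev(c[p] for c in cases))
        for p in predictors
    }
    scores = []
    for c in cases:
        z = [(c[p] - stats[p][0]) / stats[p][1] for p in predictors]
        scores.append(sum(z))  # unit weights: every predictor counts equally
    return scores

# Hypothetical graduate-admissions inputs (field names are illustrative only).
applicants = [
    {"gpa": 3.9, "test": 720, "research_years": 2},
    {"gpa": 3.4, "test": 760, "research_years": 0},
    {"gpa": 3.6, "test": 690, "research_years": 3},
]
print(improper_linear_score(applicants, ["gpa", "test", "research_years"]))
```

No weights are fitted; the only judgment calls are which predictors to include and which way they point, which is exactly why the output never varies for the same inputs.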

Kahneman's Chapter 21 extends Meehl's finding beyond clinical psychology into hiring, investment, wine pricing, and parole decisions. Orley Ashenfelter's wine pricing model (using weather data to predict Bordeaux vintages) consistently beat expert wine critics — and the experts hated it. The Apgar score (five variables, each scored 0-2 with equal weighting, assessed one minute after birth) outperforms unaided clinical judgment in evaluating newborn health. The lesson is brutal and universal: wherever a prediction task has a reasonable number of quantifiable inputs, the formula wins.
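The Apgar arithmetic is simple enough to state in a few lines. The five sign names below are the real ones; the 0-2 rating rubric for each sign lives in delivery-room charts, not in this sketch:

```python
# The Apgar score: five signs, each rated 0, 1, or 2 by the clinician,
# summed with equal weights into a 0-10 total.
APGAR_SIGNS = ["appearance", "pulse", "grimace", "activity", "respiration"]

def apgar_total(ratings):
    """Sum five 0-2 ratings into a 0-10 score. Same inputs, same output,
    every time: the consistency that clinical impressions lack."""
    assert set(ratings) == set(APGAR_SIGNS), "rate all five signs"
    assert all(r in (0, 1, 2) for r in ratings.values()), "each sign is 0-2"
    return sum(ratings.values())

score = apgar_total(
    {"appearance": 2, "pulse": 2, "grimace": 1, "activity": 2, "respiration": 2}
)
print(score)  # 9; scores of 7 and above are generally read as reassuring
```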

Where It Comes From

Meehl published Clinical vs. Statistical Prediction in 1954, launching a debate that is still not fully resolved in practice — despite being settled in the data. Chapter 21 of Thinking, Fast and Slow presents Meehl's work alongside later confirmations, including a meta-analysis by William Grove and colleagues that reviewed 136 studies and found that algorithms equaled or beat clinical judgment in virtually all of them. The rare exceptions (where humans slightly outperformed) involved cases where the expert had access to information the algorithm didn't.

> "The research suggests a surprising conclusion: to maximize predictive accuracy, final decisions should be left to formulas, especially in low-validity environments." — Thinking, Fast and Slow, Ch 21

Cross-Library Connections

Hughes's structured behavioral profiling system in Six-Minute X-Ray embodies the Meehl principle: instead of relying on intuitive "gut reads," Hughes provides a systematic protocol — observe specific behaviors, categorize into quadrants, score needs hierarchy — that produces consistent assessments across practitioners. The system's value is consistency over insight.

Kahneman's own structured interview protocol (Chapter 21) translates the Meehl finding into hiring: score six independent traits sequentially, then "close your eyes" for an overall impression. The sequential scoring prevents the halo effect (one positive trait contaminating all others) and produces the consistency that formulas require.

Fisher's objective criteria principle in Getting to Yes mirrors the Meehl logic: instead of relying on the parties' subjective judgments of fairness (which are inconsistent and self-serving), use external standards — market rates, precedent, expert opinion — that produce consistent evaluations.

The Implementation Playbook

Hiring: Replace unstructured interviews with Kahneman's structured protocol. Define six job-relevant traits before the interview. Score each trait 1-5 during the interview before moving to the next. Only after all traits are scored do you form an overall impression. This simple procedure outperforms unstructured interviews because it prevents the halo effect and ensures consistency.
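A minimal Python sketch of the protocol's key constraint. The six trait names are placeholders for whatever the job actually requires, and the overall score here is just the trait sum; Kahneman allows a final intuitive global rating, but only after the six independent scores exist:

```python
class StructuredInterview:
    """Score each trait 1-5 in order; the overall impression is
    unavailable until every trait has its own independent score."""

    TRAITS = ["technical skill", "diligence", "reliability",
              "communication", "judgment", "teamwork"]

    def __init__(self):
        self.scores = {}

    def rate(self, trait, score):
        # Enforce sequential, one-at-a-time scoring.
        expected = self.TRAITS[len(self.scores)]
        if trait != expected:
            raise ValueError(f"score traits in order; next is {expected!r}")
        if score not in range(1, 6):
            raise ValueError("scores run 1-5")
        self.scores[trait] = score

    def overall(self):
        if len(self.scores) < len(self.TRAITS):
            raise RuntimeError("finish all six traits before forming an overall impression")
        return sum(self.scores.values())
```

Raising on out-of-order scoring makes the discipline mechanical rather than aspirational: the global impression cannot contaminate trait scores that are already locked in.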

Investment Decisions: Build a scoring model with 5-8 factors (market size, team quality, unit economics, competitive moat, etc.), weight them equally, and score every opportunity on the same rubric. The algorithm won't catch every unicorn, but it will prevent the systematic overweighting of charismatic founders, exciting narratives, and WYSIATI-driven enthusiasm.
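A hedged sketch of that rubric in Python. Factor names and scores are illustrative, not a recommended investment model:

```python
FACTORS = ["market size", "team quality", "unit economics",
           "competitive moat", "distribution"]  # pick 5-8 factors and freeze them

def rank_opportunities(opportunities):
    """Score every deal on the same 1-5 rubric, weight factors equally,
    and rank by total. Charisma is not a column."""
    totals = {name: sum(scores[f] for f in FACTORS)
              for name, scores in opportunities.items()}
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

deals = {  # hypothetical scores for illustration
    "DealA": {"market size": 5, "team quality": 3, "unit economics": 4,
              "competitive moat": 2, "distribution": 3},
    "DealB": {"market size": 3, "team quality": 5, "unit economics": 2,
              "competitive moat": 4, "distribution": 4},
}
print(rank_opportunities(deals))  # [('DealB', 18), ('DealA', 17)]
```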

Medical Diagnosis: Clinical decision support tools already implement this principle — the question is whether practitioners actually use them or override them with "clinical judgment." The data says: override only when you have information the algorithm doesn't have. Never override because "this case feels different."

Performance Reviews: Create a standardized rubric with specific, observable criteria. Score each criterion independently. Resist the urge to adjust scores to match your overall impression — that's the halo effect defeating the algorithm. The rubric-based review is less satisfying (it feels mechanical) but more accurate (it's consistent).
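One way to make the halo effect visible is a spread check: if every criterion landed on the same number, the review may be one global impression wearing a rubric. This heuristic and its threshold are my illustration, not something from Kahneman:

```python
from statistics import stdev

def halo_warning(criterion_scores, spread_floor=0.5):
    """Heuristic halo check (illustrative assumption, not from the book):
    near-identical scores across criteria suggest one global impression
    rather than independent judgments of observable behavior."""
    return stdev(criterion_scores.values()) < spread_floor

review = {"code quality": 4, "delivery": 4, "mentoring": 4, "communication": 4}
if halo_warning(review):
    print("uniform scores: re-check each criterion against specific evidence")
```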

Personal Decisions: When choosing between apartments, job offers, or any multi-attribute decision, list the attributes, weight them (or just use equal weights; Dawes showed improper linear models work surprisingly well), score each option on each attribute, and go with the highest total score. Your "gut feeling" is useful for identifying factors you forgot to include — but once the factors are listed, the algorithm beats the gut.
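A small Python helper for the apartment case. Attribute names and ratings are made up, and note that "rent" is scored so that cheaper earns a higher number:

```python
def best_option(options, weights=None):
    """Weighted linear score over the listed attributes; equal weights
    by default (the Dawes result suggests that is usually good enough)."""
    attrs = list(next(iter(options.values())))
    weights = weights or {a: 1.0 for a in attrs}
    totals = {name: sum(weights[a] * scores[a] for a in attrs)
              for name, scores in options.items()}
    return max(totals, key=totals.get), totals

# Hypothetical apartments, each attribute rated 1-10 (higher is better).
apartments = {
    "Elm St":  {"rent": 6, "commute": 9, "space": 5, "light": 7},
    "Oak Ave": {"rent": 8, "commute": 4, "space": 8, "light": 6},
}
winner, totals = best_option(apartments)
print(winner, totals)  # Elm St {'Elm St': 27.0, 'Oak Ave': 26.0}
```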

Key Takeaway

The Meehl finding is not about replacing human judgment with machines — it's about recognizing that human judgment's greatest weakness is inconsistency, and that even the crudest consistency produces better outcomes than the most brilliant inconsistency. Experts who use formulas as a first pass and then adjust for specific factors outperform both pure algorithms and pure intuition. The formula is the floor, not the ceiling — but most experts never reach the floor because they skip the formula entirely.

Continue Exploring

[[Kahneman-Klein Two-Condition Test]] — When expert intuition CAN be trusted (regular environments + adequate practice)

[[Structured Interview Protocol]] — The specific hiring implementation of the Meehl principle

[[WYSIATI]] — Why experts' intuitive judgments are swayed by whatever information is most salient


📚 From Thinking, Fast and Slow by Daniel Kahneman