
Lies, Damned Lies, and Statistics (17): The Correlation-Causation Problem and Omitted Variable Bias, aka “Jumping to Conclusions”

[Image: correlation vs causation (source)]

Here is some more detailed information, following my earlier casual remark on the correlation-causation problem. What follows is a fictitious example of what is meant by “Omitted Variable Bias”, a type of statistical bias that illustrates this problem. Suppose we see from Department of Defense data that male U.S. soldiers are more likely to be killed in action than female soldiers. Or, more precisely, and in order to avoid another statistical error: that the percentage of male soldiers killed in action is larger than the percentage of female soldiers killed in action. There is then a correlation between the gender of soldiers and the likelihood of being killed in action.

One could – and one often does – conclude from such a finding that there is a causation of some kind: being male increases the chances of being killed in action. Again, more precisely: one could conclude that some aspect of gender – e.g. a male propensity for risk taking – leads to higher mortality.

However, it’s here that the Omitted Variable Bias pops up. The real cause of the discrepancy between male and female combat mortality may not be gender or something gender-related, but a third element, an “omitted variable” which doesn’t show up in the correlation. In our fictional example, it may be the type of deployment: it may be that male soldiers are more commonly deployed in dangerous combat operations, whereas female soldiers are more active in support operations away from the front line.
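The fictional soldier example can be turned into a small simulation. This is only a sketch: every probability below is invented, and the function names (`simulate_soldiers`, `mortality`) are my own. In this toy world the risk of being killed depends only on the type of deployment, never on gender, yet the overall mortality rates still differ by gender because men are deployed in combat more often.

```python
import random

# Every number here is made up for illustration; none comes from real data.
P_COMBAT = {"male": 0.6, "female": 0.2}        # chance of a combat deployment
P_KILLED = {"combat": 0.05, "support": 0.005}  # risk depends on deployment only

def simulate_soldiers(n=200_000, seed=0):
    """Return a list of (gender, deployment, killed) tuples."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        gender = "male" if rng.random() < 0.5 else "female"
        deployment = "combat" if rng.random() < P_COMBAT[gender] else "support"
        killed = rng.random() < P_KILLED[deployment]
        rows.append((gender, deployment, killed))
    return rows

def mortality(rows, gender, deployment=None):
    """Share killed among soldiers of one gender, optionally within one deployment type."""
    group = [k for g, d, k in rows
             if g == gender and (deployment is None or d == deployment)]
    return sum(group) / len(group)

rows = simulate_soldiers()
for deployment in (None, "combat", "support"):
    print(deployment or "all",
          round(mortality(rows, "male", deployment), 4),
          round(mortality(rows, "female", deployment), 4))
```

Looking at all soldiers together, the male rate comes out at roughly twice the female rate. Within each deployment type the two rates are nearly identical, which is what one would expect once the omitted variable is taken into account.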

OK, time for a real example. It has to do with home-schooling. In the U.S., many parents decide to keep their children away from school and teach them at home. They do so for different reasons: ideological ones, reasons that have to do with their children’s special needs etc. The reasons are not important here. What is important is that many people think that home-schooled children are somehow less well educated (parents, after all, aren’t trained teachers). Proponents of home-schooling, on the other hand, point to a study that found that these children score above average in tests.

However, this is a correlation, not necessarily a causal link. It doesn’t prove that home-schooling is superior to traditional schooling. Parents who teach their children at home are, by definition, heavily involved in their children’s education. The children of such parents do above average in normal schooling as well. The omitted variable here is parents’ involvement. It’s not the fact that the children are schooled at home that explains their above average scores. It’s the type of parents. Instead of comparing home-schooled children to all other children, one should compare them to children from similar families in the traditional system.
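The difference between the two comparisons can be sketched with made-up data. The assumptions are mine and deliberately extreme: only heavily involved parents home-school, parental involvement adds 10 points to a test score, and home-schooling itself adds nothing. None of these numbers comes from the study mentioned above.

```python
import random
from statistics import mean

rng = random.Random(1)
children = []  # (involved_parents, home_schooled, score)
for _ in range(50_000):
    involved = rng.random() < 0.3            # hypothetical share of involved parents
    home = involved and rng.random() < 0.1   # toy rule: only involved parents home-school
    # Score depends on parental involvement only; home-schooling has zero effect here.
    score = 100 + (10 if involved else 0) + rng.gauss(0, 15)
    children.append((involved, home, score))

home_scores = [s for inv, h, s in children if h]
all_others = [s for inv, h, s in children if not h]
similar_families = [s for inv, h, s in children if inv and not h]

naive_gap = mean(home_scores) - mean(all_others)       # home-schooled vs everyone else
fair_gap = mean(home_scores) - mean(similar_families)  # home-schooled vs similar families

print("gap vs all other children:", round(naive_gap, 1))
print("gap vs similar families:  ", round(fair_gap, 1))
```

The first gap is clearly positive even though home-schooling does nothing in this toy model. The second gap hovers around zero, because the comparison group shares the omitted variable.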

Greg Mankiw believes he has found another example of Omitted Variable Bias in this graph plotting test scores for U.S. students against their family income:

[Figure: average SAT scores by family income]

(source, the R-square for each test average/income range chart is about 0.95)

[T]he above graph … show[s] that kids from higher income families get higher average SAT scores. Of course! But so what? This fact tells us nothing about the causal impact of income on test scores. … This graph is a good example of omitted variable bias … The key omitted variable here is parents’ IQ. Smart parents make more money and pass those good genes on to their offspring. Suppose we were to graph average SAT scores by the number of bathrooms a student has in his or her family home. That curve would also likely slope upward. (After all, people with more money buy larger homes with more bathrooms.) But it would be a mistake to conclude that installing an extra toilet raises your kids’ SAT scores. … It would be interesting to see the above graph reproduced for adopted children only. I bet that the curve would be a lot flatter.

Greg Mankiw (source)

In other words, Mankiw expects that adopted children, who usually don’t receive their genes from their adoptive families, would have roughly equal test scores whether they were adopted by rich or by poor families. That would mean, in turn, that the wealth of the family in which you are raised doesn’t influence your education level, test scores or intelligence.

However, in his typical hurry to discard all possible negative effects of poverty, Mankiw may have gone a bit too fast. While it’s not impossible that the correlation is fully explained by differences in parental IQ, other evidence points elsewhere. I’m always suspicious of theories that take one cause, exclude every other type of explanation and end up with a fully deterministic system, especially if the one cause that is selected is DNA. Life is more complex than that. Regarding this particular matter, take a look back at this post, which shows that education levels are to some extent determined by parental income. University enrollment is determined both by test scores and by parental income, to the extent that people from high-income families with average test scores are slightly more likely to enroll in university than people from poor families with high test scores.

What Mankiw did, in trying to avoid the Omitted Variable Bias, was in fact to commit another type of bias, one which we could call the Singular Variable Bias: assuming that a phenomenon has a single cause. In honor of Professor Mankiw (who does some good work, see here for example), I propose that henceforth we call it the Mankiw Bias.
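A toy simulation can show why a flatter curve for adopted children would not, by itself, settle the matter. Everything here is an assumption of mine: the coefficients are invented, `simulate` and `slope` are hypothetical helper names, and the model says nothing about what real data show. The sketch compares a world in which income has no effect on scores with a world in which income and inherited ability both matter.

```python
import random
from statistics import mean

def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    mx, my = mean(xs), mean(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

def simulate(adopted, income_effect, n=20_000, seed=2):
    """Slope of test score on family income in a toy world (all coefficients invented)."""
    rng = random.Random(seed)
    incomes, scores = [], []
    for _ in range(n):
        parent_iq = rng.gauss(0, 1)
        income = 0.6 * parent_iq + rng.gauss(0, 0.8)   # smarter parents earn more
        if adopted:
            inherited = rng.gauss(0, 1)                # unrelated to the raising parents
        else:
            inherited = 0.5 * parent_iq + rng.gauss(0, 0.87)
        score = inherited + income_effect * income + rng.gauss(0, 0.5)
        incomes.append(income)
        scores.append(score)
    return slope(incomes, scores)

for effect in (0.0, 0.3):
    print("income effect", effect,
          "| biological:", round(simulate(False, effect), 2),
          "| adopted:", round(simulate(True, effect), 2))
```

In both worlds the curve for adopted children is flatter than the curve for biological children. Only in the world where income has no effect does it become flat. A curve that is “a lot flatter” is therefore compatible with two causes operating at once, and it takes a slope near zero to support the single-cause reading.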

More posts in this series.

