Lies, Damned Lies, and Statistics (8): Failure to Divide by Population

I often see graphs that show a time series of some sort, but the numbers are plain absolute counts, not normalized by population. Here’s an example of a graph from the Bush era, flaunting the supposedly beneficial effects of Bush’s labor policy on job growth (green line, “jobs on the rise”, number of jobs in thousands):

graph number of jobs

(source)

Presenting the number of jobs without relating it to the population is meaningless. Maybe the population grew faster than the number of jobs, in which case the growth shown here is in fact a decrease in the share of people with a job. Or the population shrank, in which case the growth in employment was even bigger than it looks.
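A minimal sketch of the normalization, using made-up figures: the `jobs` and `population` lists below are hypothetical and are not the data behind the graph. The job count rises every year, yet the employment-population ratio falls because the population grows faster.

```python
# Hypothetical figures in thousands; invented for illustration, not real labor data.
years = [2003, 2004, 2005]
jobs = [130_000, 131_500, 133_000]        # absolute number of jobs: rising
population = [220_000, 224_000, 228_000]  # population (hypothetical): rising faster


def employment_population_ratio(employed, pop):
    """Share of the population that holds a job."""
    return employed / pop


ratios = [employment_population_ratio(j, p) for j, p in zip(jobs, population)]

for year, j, r in zip(years, jobs, ratios):
    # The raw count goes up while the normalized ratio goes down.
    print(f"{year}: jobs = {j:,}k, employment-population ratio = {r:.1%}")
```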

Here’s the correct graph, the employment-population ratio, showing that employment did increase under Bush but decreased during the last years of his presidency:

employment-population ratio

“Population” can mean actual population (i.e. people or residents), but can also mean any other relevant basis of comparison. For example:

The following statistics suggest that 16-year-olds are safer drivers than people in their twenties, and that octogenarians are very safe:

misleading stat accidents

(source)

As the following graph shows, the reason 16-year-olds and octogenarians appear to be safe drivers is that they don’t drive nearly as much as people in other age groups:

misleading stat accidents2

(source)
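The same effect in a few lines of Python, with invented numbers: the age groups, crash counts and mileage below are hypothetical and are not the data behind the graphs above, and `rate` is a hypothetical helper. Ranked by raw crash count, the oldest and youngest drivers look safest; ranked by crashes per million miles driven, the order changes.

```python
# Hypothetical numbers for illustration only (NOT the data behind the graphs above).
# age group -> (crashes per year, miles driven per year)
drivers = {
    "16-19": (2_000, 500_000_000),
    "20-29": (6_000, 2_400_000_000),
    "80+": (500, 150_000_000),
}


def rate(events, exposure, per=1_000_000):
    """Events per `per` units of exposure, e.g. crashes per million miles."""
    return events / exposure * per


# Lowest first: raw counts ignore how much each group drives.
by_count = sorted(drivers, key=lambda group: drivers[group][0])
by_rate = sorted(drivers, key=lambda group: rate(*drivers[group]))

print("Safest first, by raw crash count:        ", by_count)
print("Safest first, by crashes per million miles:", by_rate)
```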

Another example is the national debt. Graphs often show just the national debt in dollars, without relating it to GDP. Although the absolute amounts have some relevance, it’s better to express the debt as a percentage of GDP, because a bigger economy can carry a bigger debt (a poor household may go bankrupt with a debt of $10,000, whereas a rich household can live with a debt of perhaps $100,000).
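A small sketch of the household comparison and of the debt-to-GDP idea. The two debts come from the text; the household incomes and the country figures are hypothetical assumptions, and `debt_ratio_pct` is a hypothetical helper name.

```python
def debt_ratio_pct(debt, income):
    """Debt as a percentage of annual income (for a country: of GDP)."""
    return 100 * debt / income


# The debts come from the text; the incomes are hypothetical.
# The poor household owes 10x less, yet its debt is twice as large relative to income.
poor_household = debt_ratio_pct(10_000, 20_000)
rich_household = debt_ratio_pct(100_000, 400_000)

# A hypothetical country, in billions of dollars: the debt triples, but the economy quadruples.
earlier_ratio = debt_ratio_pct(1_000, 2_500)
later_ratio = debt_ratio_pct(3_000, 10_000)

print(f"Poor household: {poor_household:.0f}% of income")
print(f"Rich household: {rich_household:.0f}% of income")
print(f"Country: {earlier_ratio:.0f}% of GDP -> {later_ratio:.0f}% of GDP")
```

A graph of the absolute debt would show this hypothetical country’s debt tripling, while the ratio shows its burden getting lighter.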

Take this graph for instance:

us national debt corrected for inflation

(source)

Now compare it to this one:

National debt as a percentage of GDP

(source)

Or this slightly more recent one, which includes the latest recession:

National debt as percentage of GDP

(source)

A final example: when comparing the relative safety of air travel and road travel, i.e. the probability of dying in a road accident versus a plane accident, you can find divergent data depending on what you divide by: number of casualties per trip, per mile traveled, per hour traveled, etc.
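A sketch of how the choice of denominator alone can flip the comparison. All totals below are invented for illustration (a road system with many short trips, an air system with few long ones), and `MODES`, `death_rate` and `safer_mode` are hypothetical names. With these numbers, air travel comes out safer per mile but less safe per trip and per hour.

```python
# Hypothetical annual totals for two transport modes; NOT real accident statistics.
MODES = {
    "road": {"deaths": 100, "trips": 1_000_000_000, "miles": 10_000_000_000, "hours": 400_000_000},
    "air": {"deaths": 1, "trips": 1_000_000, "miles": 1_000_000_000, "hours": 2_500_000},
}


def death_rate(mode, denominator):
    """Deaths divided by the chosen exposure measure ("trips", "miles" or "hours")."""
    totals = MODES[mode]
    return totals["deaths"] / totals[denominator]


def safer_mode(denominator):
    """Return the mode with the lower death rate under the given denominator."""
    return min(MODES, key=lambda mode: death_rate(mode, denominator))


for denominator in ("trips", "miles", "hours"):
    print(
        f"per {denominator[:-1]}: road = {death_rate('road', denominator):.1e}, "
        f"air = {death_rate('air', denominator):.1e}, safer: {safer_mode(denominator)}"
    )
```

The data never change here; only the denominator does, which is why a comparison like this should always state what it divides by.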

More posts in this series.

Comments

  1. Andreas R says

    Hi Filip,

    Isn’t the first employment graph only a small section of the second (2003-2005)? The employment drop mentioned in 2006-2008 would not be shown even if the graph was correct.

    • You’re right, the two graphs aren’t comparable in terms of periods covered. That’s because the sources are different. But I guess the recent drop in employment would have shown up in the first one had it covered the recent period.

