Although less common than its sister lie – manipulating the y-axis in graphs – manipulation of the x-axis does occur.

But first a technical note: “bins” are the consecutive ranges into which the values of some characteristic are divided, so that the number of individuals falling into each range can be counted. Together, the bins form a histogram: a graphical representation showing the distribution of a characteristic across an entire population (like a survey group). Here’s an example:

A survey of 31 black cherry trees revealed that 3 of them had a height between 60 and 65 feet, 8 had a height between 70 and 75 feet, etc. There are 6 bins on this graph’s x-axis, probably because the person analyzing the survey data thought that 6 would be an adequate number. And indeed, dividing a population of 31 into 20 or 2 subgroups would probably not result in interesting numbers, at least not in this case.

Working with bins means that the x-axis shows a split of the surveyed population into smaller groups according to certain ranges of the characteristic that was surveyed (height in this case), making it possible to see how many individuals (trees in this case) belong to a certain range or subgroup. Notice that in this example the bins are

- not too numerous
- not too few
- of equal size (always a range of 5 feet)
- consecutive and
- non-overlapping.

As they should be. (The bins needn’t always be of equal size, but they often are.)
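To make the checklist concrete, here is a minimal Python sketch of equal-width binning. The heights are made up for illustration and are not the survey data, the helper name `count_in_bins` is hypothetical, and the convention that each bin includes its lower edge but not its upper one is an assumption (the original graph may use the opposite convention).

```python
from bisect import bisect_right

def count_in_bins(values, edges):
    """Count values per bin; bin i covers edges[i] <= v < edges[i + 1]."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        # Ignore anything outside the overall range covered by the bins.
        if edges[0] <= v < edges[-1]:
            counts[bisect_right(edges, v) - 1] += 1
    return counts

# Made-up heights in feet, for illustration only (NOT the survey data).
heights = [61, 63, 64, 66, 69, 71, 72, 74, 74, 76, 77, 78, 79, 81, 83, 86]

# Shared edges make the bins consecutive and non-overlapping by construction.
edges = list(range(60, 95, 5))  # 60, 65, ..., 90: six bins of 5 feet each

counts = count_in_bins(heights, edges)
widths = [hi - lo for lo, hi in zip(edges, edges[1:])]

for lo, hi, n in zip(edges, edges[1:], counts):
    print(f"{lo}-{hi} ft: {n}")
```

Describing the bins by a single list of edges, rather than by separate (low, high) pairs, is what rules out gaps and overlaps; only the equal-width property still has to be checked, which is what `widths` is for.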

Many histograms have a “bell shape” like in this example (in which case they show what is called a “normal distribution”), but they can also have other shapes, depending on the population and the characteristic surveyed. A survey of the frequency of a certain disease among the population of a country, with the population divided into bins according to individuals’ age, would be skewed to the left (a long tail on the left, with most of the cases on the right) since older people – on the right – may suffer more frequently from the disease.

Since all this is probably old news to most of you, let’s go straight to an example of manipulation of bins. Such manipulation often involves tinkering with the ranges of certain bins, so that the different bins are no longer of the same size. The following example is about income shares across the population of the U.S. Technically, the graph below is not a histogram because the y-axis shows the total income earned within each income range rather than the number of people in it, but for our purposes it’s equivalent:

###### (source, source)

This graph is then used by the Wall Street Journal to argue against increased taxation of the rich as a means to close the budget deficit, because supposedly that’s not where the money is. Or, better, the money is there, otherwise they wouldn’t be rich, but there are just not enough of them. Taxing the middle class would be better according to the WSJ because it’s they who have all the money … at least if you believe their graph. The problem is that the highest bar in their graph is for people making $100-$200K, whereas the bar immediately to the left of this one is for the income range of $75K to $100K – a range only one-quarter as wide ($25K versus $100K). No surprise that the bar for $100-$200K is so much larger than the rest…
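The effect of unequal bin widths can be sketched in a few lines of Python. Only the $75-100K and $100-200K ranges come from the graph discussed above; the $50-75K row and all three income totals are invented for illustration, and the helper name `per_unit_width` is hypothetical. Dividing each bar by the width of its range is one common way to make unequal bins comparable, not necessarily the correction the alternative graphs use.

```python
# (label, low, high, total_income) with ranges in $K.
# The totals are INVENTED for illustration; they are not the WSJ figures.
bins = [
    ("$50-75K", 50, 75, 1000),
    ("$75-100K", 75, 100, 900),
    ("$100-200K", 100, 200, 2000),
]

def per_unit_width(rows):
    """Divide each bin's total by the width of its range."""
    return {label: total / (high - low) for label, low, high, total in rows}

density = per_unit_width(bins)

# Raw totals: the widest bin towers over the others.
tallest_raw = max(bins, key=lambda row: row[3])[0]

# Per $1K of range: the same bin may turn out to be the smallest.
tallest_adjusted = max(density, key=density.get)

print(tallest_raw, tallest_adjusted, density)
```

With these made-up numbers the $100-200K bar is the tallest in raw terms simply because it sweeps up four times as much of the income scale as its neighbours, and it drops to last place once the width is taken into account.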

If you want to argue that taxing the rich *does* make it possible to bring in a lot more revenue, then you could use this alternative graph, made from the same data:

###### (source)

Or this one:

###### (source)

More alternative presentations of the same data are here.

It all depends on what you mean by “rich” and “middle class”, but claiming – as does the WSJ – that $200K is still “middle class” is stretching the point.

