Lies, Damned Lies, and Statistics (6): Statistical Bias in the Design and Execution of Surveys

Statisticians can – wittingly or unwittingly – introduce bias into their work. Take surveys, for instance. Two important steps in the design of a survey are the definition of the population and the selection of the sample. As it’s often impossible (and undesirable) to question a whole population, statisticians usually select a sample from the population and put their questions only to the people in this sample. They assume that the answers given by the people in the sample are representative of the opinions of the entire population.

Bias can be introduced

  • at the moment of the definition of the population
  • at the moment of the selection of the sample
  • at the moment of the execution of the survey (as well as at other moments of the statistician’s work, which I won’t mention here).

Population

Let’s take a fictional example of a survey. Suppose statisticians want to measure public opinion regarding the level of respect for human rights in the country called Dystopia.

First, they set about defining their “population”, i.e. the group of people whose “public opinion” they want to measure. “That’s easy”, you think. So do they, unfortunately. It’s the people living in this country, of course, or is it?

Not quite. Suppose the level of rights protection in Dystopia is very low, as you might expect. That probably means that many people have fled the country. A survey population that includes only the current residents of the country will then overestimate the level of rights protection. And there is another point: dead people can’t talk. We can assume that many victims of rights violations have died because of them. Leaving these dead people out of the survey will also artificially push up the measured level of rights protection. (I’ll mention in a moment how it is at all possible to include dead people in a survey; bear with me).

Hence, doing a survey and then assuming that the people who answered it are representative of the whole population means discarding the opinions of refugees and dead people. If those opinions were included, the results would be different and more accurate. Of course, it’s impossible to ask dead people for their opinions, but it would be advisable to make a statistical correction for their absence. After all, we know their answers: people who died because of rights violations in their country presumably wouldn’t have a good opinion of their political regime.
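To make the idea of a correction concrete, here is a minimal sketch in Python. All the figures are invented for illustration (Dystopia is fictional, after all), and the function name `favorable_share` is my own. It simply weights each group’s share of favorable opinions by the size of the group, and assumes, as argued above, that none of the dead victims would have answered favorably.

```python
def favorable_share(groups):
    """Population-weighted share of favorable opinions.

    groups: list of (group_size, share_favorable) pairs.
    """
    total = sum(size for size, _ in groups)
    return sum(size * share for size, share in groups) / total


# Hypothetical figures, invented purely for illustration.
residents = (900, 0.40)  # people still living in Dystopia
refugees = (80, 0.05)    # people who fled the country
victims = (20, 0.00)     # assumed answer for those killed by the regime

residents_only = favorable_share([residents])
corrected = favorable_share([residents, refugees, victims])

print(f"residents only: {residents_only:.3f}")
print(f"corrected:      {corrected:.3f}")
```

With these made-up numbers the residents-only figure is 0.400 and the corrected one 0.364. The size of the gap depends entirely on the assumed group sizes and opinions, which in a real case would have to be estimated from other sources.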

Sample

And then there are the problems linked to the definition of the sample. An unbiased sample should be a fully random subset of the entire and correctly defined population (needless to say, if the population is defined incorrectly, as in the example above, then the sample is by definition also biased, even if no sampling mistakes have been made). That means that every person in the population should have an equal chance of being chosen. It also means that there shouldn’t be self-selection (a typical flaw in many if not all internet surveys of the “Polldaddy” variety) or self-deselection. The latter is very likely in my Dystopia example. People who are too afraid to talk won’t talk. The harsher the rights violations, the more people will fail to cooperate. So you have the perverse effect that very cruel regimes may score better on human rights surveys than modestly cruel regimes. The latter are cruel, but not cruel enough to scare the hell out of people.
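The perverse effect of self-deselection can be shown with a few lines of arithmetic. The sketch below is in Python; the response rates and the regimes are hypothetical numbers of my own, chosen only to illustrate the mechanism. It computes the share of critical answers among the people who actually respond, given the true share of critics and the probability that critics and supporters are willing to talk.

```python
def observed_critical_share(true_critical, respond_if_critical, respond_if_supportive):
    """Share of critical answers among the people who actually respond."""
    critical_answers = true_critical * respond_if_critical
    supportive_answers = (1 - true_critical) * respond_if_supportive
    return critical_answers / (critical_answers + supportive_answers)


# Hypothetical regimes; every number here is invented for illustration.
# Moderately cruel: half the population is critical, most critics dare to talk.
moderate = observed_critical_share(0.50, 0.60, 0.80)
# Very cruel: 80% are critical, but only 1 critic in 10 dares to talk.
harsh = observed_critical_share(0.80, 0.10, 0.80)

print(f"moderate regime, observed critical share: {moderate:.3f}")
print(f"harsh regime, observed critical share:    {harsh:.3f}")
```

Under these assumptions the harsher regime shows about 33% critical answers against about 43% for the milder one, even though it has far more critics in reality.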

The classic example of a sampling error comes from a poll on the 1948 presidential election in the U.S.

On Election night, the Chicago Tribune printed the headline DEWEY DEFEATS TRUMAN, which turned out to be mistaken. In the morning the grinning President-Elect, Harry S. Truman, was photographed holding a newspaper bearing this headline. The reason the Tribune was mistaken is that their editor trusted the results of a phone survey. Survey research was then in its infancy, and few academics realized that a sample of telephone users was not representative of the general population. Telephones were not yet widespread, and those who had them tended to be prosperous and have stable addresses. (source)

Execution

Another reason why bias in the sampling may occur is the way in which surveys are executed. If the government of Dystopia allows statisticians to operate on its territory, it will probably not allow them to operate freely, or circumstances may not permit them to operate freely. So the people doing the interviews are not allowed to, or don’t dare to, travel around the country. Hence they themselves deselect entire groups from the survey, distorting the randomness of the sample. Again, the more repressive the regime, the more this happens, with possible adverse effects on the results. The people who can be interviewed are perhaps only those living in urban areas, close to the residence of the statisticians. And those living there may have a relatively large stake in the government, which makes them paint a rosy picture of the regime.
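A small simulation shows how much this kind of restricted fieldwork can distort the result. The sketch below is in Python and rests entirely on assumptions of my own: an invented population in which 30% live in urban areas and view the regime favorably 70% of the time, while the rural 70% do so only 20% of the time. It compares a random sample of the whole population with a random sample drawn only from the people the interviewers can reach.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable


def make_population(n_urban, n_rural, p_urban, p_rural):
    """Build a list of (region, holds_favorable_view) pairs."""
    people = [("urban", rng.random() < p_urban) for _ in range(n_urban)]
    people += [("rural", rng.random() < p_rural) for _ in range(n_rural)]
    return people


def favorable_rate(sample):
    """Fraction of people in the sample with a favorable view."""
    return sum(1 for _, favorable in sample if favorable) / len(sample)


# Hypothetical population, invented for illustration.
population = make_population(3000, 7000, 0.70, 0.20)

# Interviewers free to travel: sample drawn from everyone.
full_sample = rng.sample(population, 1000)

# Interviewers confined to the cities: sample drawn from urban residents only.
reachable = [person for person in population if person[0] == "urban"]
urban_sample = rng.sample(reachable, 1000)

print(f"whole population:  {favorable_rate(population):.3f}")
print(f"unrestricted poll: {favorable_rate(full_sample):.3f}")
print(f"urban-only poll:   {favorable_rate(urban_sample):.3f}")
```

The urban-only poll is a perfectly random sample of the wrong group, so a larger sample size would not repair it. It would only make the rosy estimate more precise.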

