Can You Trust Polling Results?

From the lecture series: Meaning from Data — Statistics Made Clear

By Michael Starbird, Ph.D., The University of Texas at Austin

What’s the one essential ingredient for a trustworthy poll? While plenty of pitfalls can taint results, the key to validity is randomness.

Start With A Sample

When used in statistics, the word population refers to the entirety of the collection of people or things that are of interest. A sample is a subset of the total population.

In general, the goal is to infer information about the whole population from information about the sample. In other words, we don’t want to know only about the people who happen to be in the sample. What we’re really interested in is the corresponding facts about the entire population.

This is a transcript from the video series Meaning from Data: Statistics Made Clear. Watch it now, on The Great Courses Plus.

Randomness Is The Key To Good Sampling

The advantage of choosing the sample randomly is that probability lets you make inferences about how well the opinions of the sample represent the opinions of the whole population.

On the other hand, if you deliberately pick certain groups because you believe they mirror reality, you may bring your own biases to the selection process, and those biases will then show up in the people you ask. A sample is representative of the whole population when it has the same characteristics that the whole population does.

The whole point of choosing the sample randomly is that you have a better chance that the proportion of people in the sample with a certain opinion will be close to the proportion in the entire population.
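
A small simulation can make this concrete. The sketch below is only an illustration: the population of 100,000 voters, the 60 percent support level, and the helper name `sample_proportion` are all assumptions made for the example, not figures from the lecture.

```python
import random

def sample_proportion(population, sample_size, rng):
    """Return the share of a random sample that holds the opinion (coded as 1)."""
    sample = rng.sample(population, sample_size)
    return sum(sample) / sample_size

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: 100,000 voters, 60% of whom hold the opinion.
population = [1] * 60_000 + [0] * 40_000

for n in (100, 1_000, 10_000):
    estimate = sample_proportion(population, n, rng)
    print(f"sample of {n:>6}: {estimate:.3f} (population: 0.600)")
```

Larger random samples should tend to land closer to the population figure, though any single run can still miss by a little.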

Roosevelt Versus Landon

The most familiar occasion where this comes up is before an election, when pollsters try to find out what proportion of the voters will vote for the Democratic candidate and what proportion will vote for the Republican candidate.

The fiasco of publishing wildly inaccurate polling predictions caused The Literary Digest to shut down permanently.

There are several major pitfalls in the way sampling can be done. In the 1936 U.S. presidential election, the two primary contenders for the presidency were the incumbent, Franklin Delano Roosevelt, and the Republican opponent, Alfred Landon. At the time, the magazine The Literary Digest had for several elections conducted polls to predict who would win the coming election. They had successfully predicted the outcomes in several elections, so this was a major poll.

In the 1936 election, The Literary Digest sent out 10 million voting surveys, and they received 2.4 million replies. Based on those surveys, The Literary Digest predicted that Landon would win in a landslide, with 370 electoral votes to Roosevelt’s 161.

Well, you may not recall reading about President Landon in your American history books. Obviously he did not win the presidency.

In fact, the only correct aspect of The Literary Digest’s prediction was that the election was a landslide, but unfortunately for them, the landslide went the other way. Roosevelt won the election with about 61 percent of the popular vote and an incredible 523 electoral votes to Landon’s 8.

Obviously, The Literary Digest’s sampling method was not representative of the whole population.

Pitfall 1: A Wealthy Audience

What went wrong? For one thing, The Literary Digest drew its sample from several different kinds of lists. One was the list of subscribers to its own magazine. Car registration records offered another readily available list of names, and the magazine sent surveys to those people as well. It also used telephone listings.

The year 1936 was in the middle of the Great Depression, and many people were having financial problems and were cutting back on their budgets. Probably one of the first things to go in tight times would be one’s subscription to The Literary Digest. In addition, not many people owned cars or telephones. These were luxury items for a lot of people in 1936. Because of this, the people to whom The Literary Digest had sent their survey were likely wealthy people and obviously their opinions were not representative of the population at large.
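
The effect of drawing names only from such lists can be sketched with a toy simulation. Every number here is an assumption chosen for illustration (a 20 percent "wealthy" share, 30 percent versus 70 percent support for a hypothetical candidate A), not data from 1936.

```python
import random

rng = random.Random(1936)  # fixed seed for repeatability

# Hypothetical electorate: 20% "wealthy", 80% "not wealthy".
# Assumed support for candidate A: 30% among the wealthy, 70% among the rest.
population = []
for _ in range(200_000):
    wealthy = rng.random() < 0.20
    supports_a = rng.random() < (0.30 if wealthy else 0.70)
    population.append((wealthy, supports_a))

def share_for_a(voters):
    """Fraction of the given voters who support candidate A."""
    return sum(supports for _, supports in voters) / len(voters)

true_share = share_for_a(population)

# Biased frame: a large sample drawn only from the wealthy (think of
# subscriber, car-registration, and telephone lists).
wealthy_only = [v for v in population if v[0]]
biased_sample = rng.sample(wealthy_only, 20_000)

# Much smaller simple random sample from the whole electorate.
random_sample = rng.sample(population, 1_000)

print(f"true share:             {true_share:.3f}")
print(f"biased sample (20,000): {share_for_a(biased_sample):.3f}")
print(f"random sample (1,000):  {share_for_a(random_sample):.3f}")
```

Under these assumptions the large sample from the wealthy lists stays far from the true share, while the small random sample lands near it. A bigger sample does not repair a biased list.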

Pitfall 2: A Voluntary Response

The Literary Digest poll’s second pitfall was that it was a voluntary response survey.

The magazine sent out all these surveys, and only some people replied. The problem is that the people who send back replies may share a particular bias. Rather than every group replying at the same rate, people with a certain opinion may be more apt to reply. The bias that comes from voluntary responses may not just give an answer that’s a little off; it can give a completely erroneous view of reality.
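
Here is a minimal sketch of how unequal reply rates can distort a result. The mailing list size, the even 50/50 split of opinion, and the reply rates are all hypothetical values picked for the example.

```python
import random

rng = random.Random(7)  # fixed seed for repeatability

# Hypothetical mailing list of 100,000 people, split 50/50 on a question.
# Assumed reply rates: 10% for those in favor, 40% for those opposed.
REPLY_RATE = {True: 0.10, False: 0.40}

mailed = [rng.random() < 0.50 for _ in range(100_000)]  # True = in favor
replies = [in_favor for in_favor in mailed if rng.random() < REPLY_RATE[in_favor]]

share_mailed = sum(mailed) / len(mailed)
share_replies = sum(replies) / len(replies)

print(f"in favor among everyone mailed:   {share_mailed:.3f}")
print(f"in favor among those who replied: {share_replies:.3f}")
```

Even though opinion is evenly split among everyone who was mailed a survey, the replies in this sketch lean heavily toward the group that is more motivated to answer.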

Because of this story, The Literary Digest, which otherwise would simply be lost in the dustbin of history, will now live on forever in statistics textbooks as a great example of bias in sampling.

The Father of Random Polling

George Horace Gallup, founder of the Gallup polls.

A success that came from this Literary Digest fiasco is the story of George Gallup.

At the time, Gallup was a young statistician just starting out, and he did his own poll for the 1936 election. He took a survey of 50,000 people and made two predictions of his own for the election.

  1. He correctly predicted that Roosevelt would win the election.
  2. He also predicted that The Literary Digest poll would be wrong and estimated how wrong they would be before their poll came out.

He was one of the people who introduced the concept of randomness in political polling as a key feature of sampling techniques. That is absolutely one of the fundamental criteria to look for when you’re evaluating whether a sample survey is, in fact, a good one.

The Gold Standard

Randomness is a basic ingredient of essentially all of the standard statistical techniques. The reason is that the analysis of randomness and probability allows us to apply mathematics to understanding the results that we get.

The most basic way to get an accurate sample is to take what’s called a simple random sample. As the name implies, you take the entire population you’re interested in, decide how many people you want to survey, randomly select that many from the population, and then get an answer from each member of the selected sample.

Of course, there are lots of problems in getting the answer from that selected sample. But the simple random sample is the gold standard for finding a representative sample.
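
The selection step of a simple random sample can be sketched in a few lines. The voter roll, its ID format, and the function name `simple_random_sample` are made up for illustration; the hard part in practice, actually getting an answer from everyone selected, is not modeled here.

```python
import random

def simple_random_sample(population, sample_size, seed=None):
    """Pick sample_size distinct members; every subset of that size is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, sample_size)

# Hypothetical voter roll; the IDs are invented for this example.
voter_roll = [f"voter-{i:05d}" for i in range(50_000)]

to_survey = simple_random_sample(voter_roll, 1_000, seed=42)
print(len(to_survey), "people selected, for example:", to_survey[:3])
```

The key design choice is that the list covers the whole population of interest and that chance alone, not the pollster, decides who is on the survey list.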

Common Questions About Polling Results and Statistics

Q: What is a poll in statistics?

Political polling is used both to predict a campaign’s results and to give a candidate or supporters of that candidate a metric by which to measure the candidate’s results. The first poll, called a benchmark, helps a candidate to design a campaign strategy by identifying such factors as the candidate’s overall popularity, the demographics of the people most likely to vote for that candidate, and the issues that matter most to the candidate’s main audience.

Q: What is an exit poll?

An exit poll is conducted on election day. As voters leave the polling station, pollsters ask them whom they voted for. The answers are used to predict election results, since the votes can sometimes take a few days to count.

Q: What is polling data?

Polling data is used by a political candidate to gather information about how well he or she is resonating with potential voters at the start of and during a campaign. It provides the candidate with information such as the demographic features of the individuals most likely to vote for the candidate, and it allows the candidate to test the popularity of various messages. The candidate’s overall approval rating is also a good indicator of whether it is worth staying in the race, because running a political campaign is very expensive.

Q: Why are push polls used?

Push polls are intended to “push” an issue to the forefront of the voter’s mind. For example, a push poll might ask potential voters to evaluate candidates based on their support of healthcare.

This article was updated on 9/28/2019
