Statistics and Averages: How Numbers Can Mislead

FROM THE LECTURE SERIES: UNDERSTANDING THE MISCONCEPTIONS OF SCIENCE

By Don Lincoln, Ph.D., University of Notre Dame

Mark Twain once wrote: “There are three kinds of lies: lies, damned lies, and statistics.” But what is it that makes it so easy to misuse or misunderstand statistics?

Statistics can be difficult to understand sometimes, and at other times, people can use statistics to lie. Everything depends on how you look at the figures. (Image: tadamichi/Shutterstock)

What Really Is the ‘Average’?

Statistics can be pretty tricky. A statistical evaluation of a set of data can be a complex process, but the figure most commonly reported is the average. Most people use the word ‘average’ without thinking about what it means, and many don’t realize that averages can be extremely misleading.

Suppose the owner of a factory tells a job applicant that the starting salary is low, about $20,000 a year. “But,” says the owner, “don’t worry. The average person here makes nearly $63,000 a year.”

That number sounds a lot better, so the applicant takes the job in hopes of quickly working their way up to a better salary. Did the applicant do a sensible thing?

It turns out that the factory employs 100 workers making $20,000 a year, 20 floor supervisors making $30,000 a year, and 2 shift supervisors making $60,000 a year. If you average the salaries of these 122 employees, you get about $22,300 a year, not $63,000. So how did the owner arrive at that figure when talking to the future employee?

Well, what the owner didn’t say was that he included his own $5 million salary in the calculation. Add the owner’s salary to everyone else’s, divide by all 123 people (owner included), and you get about $62,800, or “nearly $63,000.” No one who actually works for the owner makes anywhere near that much! In effect, the applicant was scammed by the employer.
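The owner’s trick is easy to reproduce. Below is a minimal Python sketch, assuming exactly the salary figures given above. The names `staff`, `owner`, and `mean_salary` are illustrative only, not anything from the lecture. It computes the arithmetic mean with and without the owner’s pay.

```python
# Payroll from the example: (number of people, annual salary in dollars)
staff = [(100, 20_000), (20, 30_000), (2, 60_000)]
owner = (1, 5_000_000)

def mean_salary(groups):
    """Arithmetic mean: total pay divided by total headcount."""
    total_pay = sum(count * salary for count, salary in groups)
    headcount = sum(count for count, _ in groups)
    return total_pay / headcount

without_owner = mean_salary(staff)           # 2,720,000 / 122
with_owner = mean_salary(staff + [owner])    # 7,720,000 / 123

print(f"Mean without owner: ${without_owner:,.0f}")  # $22,295
print(f"Mean with owner:    ${with_owner:,.0f}")     # $62,764
```

One person’s pay nearly triples the “average” salary, even though nobody else’s pay has changed.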

This is a transcript from the video series Understanding the Misconceptions of Science. Watch it now, on The Great Courses Plus.

Mean, Median, and Mode

People think that ‘average’ means something like ‘normal’ or ‘most common’. But usually, it is what mathematicians call the ‘arithmetic mean’. In the case of the salary example, this simply means the combined salary of everyone at the factory divided by the total number of people at the factory.
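Written as a formula, the arithmetic mean of n values x_1 through x_n is shown on the left below. The notation is ours, not the lecture’s. When the data come in groups, as with the factory’s pay grades, where n_k people each earn salary s_k, the same mean can be computed with the grouped form on the right.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad\text{or, for grouped data,}\qquad
\bar{x} = \frac{\sum_k n_k s_k}{\sum_k n_k}
```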

The median is a type of average: it is the value in a data set that has half the data above it and the other half below it. (Image: VectorMine/Shutterstock)

There are other quantities that can be called an average. One is the median, the value that half the people are above and half are below. We can modify our earlier factory example to illustrate this.

Suppose the firm has the same staff as before: the owner, 2 shift supervisors, 20 floor supervisors, and 100 factory workers. Now let’s assume there are two kinds of factory workers: 61 of them make $20,000 a year and 39 make $22,000 a year. Counting the supervisors but not the owner, there are 122 employees, and this group of 122 can be divided into two halves of 61 people. Since the 61 lowest-paid people all make $20,000 a year and the next-lowest make $22,000, the median salary falls between $20,000 and $22,000. By the usual convention of averaging the two middle values, it is $21,000.
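Here is a short sketch of the same median calculation using Python’s standard `statistics` module, assuming the modified payroll just described with the owner excluded. The list name `salaries` is illustrative. With an even number of people there is no single middle value. `median_low` and `median_high` return the two values that bracket the middle, and `median` averages them.

```python
import statistics

# Modified payroll from the example, one entry per employee (owner excluded)
salaries = ([20_000] * 61 + [22_000] * 39   # factory workers, two pay grades
            + [30_000] * 20                 # floor supervisors
            + [60_000] * 2)                 # shift supervisors

print(len(salaries))                    # 122
print(statistics.median_low(salaries))  # 20000 (61st value in sorted order)
print(statistics.median_high(salaries)) # 22000 (62nd value in sorted order)
print(statistics.median(salaries))      # 21000.0 (mean of the two)
```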

Apart from the median, there is another type of average, called the ‘mode’. The mode is simply the category with the most people in it. In the example above, the biggest group is the 61 people making $20,000 a year, so the mode is also $20,000 a year.
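Finding the mode amounts to counting how many people fall into each category and picking the largest count. Here is a minimal Python sketch of that, assuming the modified payroll from the median example above with the owner excluded. The variable names are illustrative.

```python
import statistics
from collections import Counter

# Modified payroll from the median example (owner excluded)
salaries = [20_000] * 61 + [22_000] * 39 + [30_000] * 20 + [60_000] * 2

# Count how many people fall into each salary category
counts = Counter(salaries)
print(counts.most_common())
# [(20000, 61), (22000, 39), (30000, 20), (60000, 2)]

# The mode is the category with the largest count
print(statistics.mode(salaries))  # 20000
```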

The bottom line is that the word ‘average’ doesn’t always mean ‘normal’. The way the average is calculated can change the result. A person looking at statistical data needs to know what the underlying data looks like to really understand it. And a skewed distribution, one in which a few figures (or even a single figure) are much larger or much smaller than most of the others, is not the only way to get misleading results.
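To see how differently the three averages react to a skewed distribution, here is a sketch that computes the mean, median, and mode of the original factory payroll, first without and then with the owner. It assumes the figures from the first example and uses Python’s standard `statistics` module. The list names are illustrative.

```python
import statistics

# Original factory payroll, one entry per person
staff = [20_000] * 100 + [30_000] * 20 + [60_000] * 2
everyone = staff + [5_000_000]  # add the owner's single extreme salary

for label, data in [("staff only", staff), ("with owner", everyone)]:
    mean = statistics.mean(data)      # pulled upward by the outlier
    median = statistics.median(data)  # middle value; barely notices it
    mode = statistics.mode(data)      # most common salary
    print(f"{label}: mean {mean:,.0f}, median {median:,.0f}, mode {mode:,.0f}")
```

Run as written, the mean jumps from about $22,295 to about $62,764. The median and mode both stay at $20,000, which is what most people at the factory actually earn.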


The Problem of the 130-pound Baby!

Suppose that members of America’s best professional football team ended up married to the team’s cheerleaders. Now, suppose all of the wives got pregnant together and even gave birth on the same day. After a month or so to recover, the women decided to have a night out and leave the children with the dads.

The average of a heavy weight and a light weight won’t tell you how different the two weights are. (Image: Vladimir Prusakov/Shutterstock)

Let’s calculate the average weight (by which we mean the arithmetic mean) of both groups. Let’s assume the babies weigh about 10 pounds each, the football player dads weigh about 250 pounds, and the women all weigh about 130 pounds. If we calculate the average weight of the women, the answer is 130 pounds.

But what is the average weight of the people who are staying at home? Take one dad and his baby: their total weight is 10 (the baby’s weight) plus 250 (the dad’s weight), which equals 260 pounds. Dividing 260 by 2 gives an average weight of 130 pounds, exactly the same as the women! But we know that the people at home are not a 130-pound baby and a 130-pound man!
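The range, the largest value minus the smallest, is one simple way to see what the mean hides. Using it here is our illustrative choice, not the lecture’s. This Python sketch assumes the weights given above. The number of women (three here) is arbitrary, and since they all weigh the same it does not affect the result.

```python
import statistics

# Weights in pounds from the example; the number of women is an assumption
women = [130, 130, 130]
at_home = [10, 250]  # one baby and one football-player dad

for label, weights in [("women", women), ("dad and baby", at_home)]:
    spread = max(weights) - min(weights)  # range: largest minus smallest
    print(f"{label}: mean {statistics.mean(weights)} lb, range {spread} lb")
```

Both groups report a mean of 130 lb. The range of 0 lb for the women versus 240 lb at home reveals how different the two groups really are.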

So, averages can be very misleading. You need to know the distribution of the figures: how many items there are and how large each one is. If you don’t know how the word ‘average’ is being used in a statistical analysis, you’re liable to be lied to.


Common Questions about Statistics and Averages

Q. Why is the term ‘average’ misunderstood by most people?

Most people think that ‘average’ means ‘normal’ or ‘most common’, but it usually means the arithmetic mean.

Q. What is the median?

The median is the number where half the data is greater than it and half the data is less than it.

Q. How is the mode calculated?

The mode is simply the value (or group) with the largest number of items. For example, in a class of 30, if 17 students get the same score, that score is the mode.

Q. What kind of data affects the average?

Outliers, values far above or far below the rest of the data, affect the average (the mean) the most. The median and the mode are much less sensitive to them.
