Did Famous Genetic Scientist Gregor Mendel Fake His Data?

FROM A LECTURE SERIES BY Professor Michael Starbird, Ph.D.

Statistical analysis shows the founder of modern genetics may have falsified data in his famous pea experiment in order to better correspond with his expectations.

Science proceeds by developing models of what we think the world is like and then looking for data that test those models. How? In many cases, by comparing experimental results with the theory’s predictions.

One famous example is Gregor Mendel’s experiments with peas.

The Monk’s Model

Gregor Mendel, the founder of the science of genetics

In the middle of the 19th century, Mendel was a monk conducting many experiments now viewed as seminal with respect to our understanding of heredity: how characteristics of parents are passed on to children. In particular, Mendel had a model in mind where each of the parents had two genes, of which they each contributed one to the offspring plant. Then, those two genes made up the genetic basis for the plant’s actual appearance.

Mendel worked with many different characteristics, one of which was the color of the pod. He believed that he had some plants with two yellow genes. He took these yellow plants, and he crossed them with plants he assumed to have two green genes. (Yellow is the dominant gene, so if you have a plant with a yellow and a green gene, or two yellow genes, the plant will appear yellow.)

When Mendel crossed plants with two yellow genes (homozygous, meaning both genes are the same) with plants that had two green genes, every offspring plant had one gene from each parent in its genetic makeup. And all of them appeared yellow, because yellow was the dominant gene.

Four Possible Results

The interesting part of the experiment occurs at the next generation.

Suppose you take the plants that resulted from the previous breeding (each of which was heterozygous, meaning it had a yellow gene and a green gene) and cross them with one another. The theory Mendel proposed predicted that each parent would randomly contribute one or the other of its genes to the daughter plant.

Diagram illustrating the results of two generations of cross-breeding two homozygous pea plants, one green and one yellow

As a result, there are four possible things that could happen in this kind of an experiment.

  1. Both parents could contribute yellow genes.
  2. The first parent could contribute the green gene and the second parent the yellow gene.
  3. The first parent could contribute the yellow gene and the second parent the green gene.
  4. Both parents could contribute green genes.

(Only plants with two green genes would have green pods. The other three combinations include at least one yellow gene, so those plants would appear yellow.)
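The four outcomes listed above can be counted mechanically. The following is a minimal sketch, assuming the four gene pairings are equally likely, as Mendel’s model predicts. The labels "Y" and "g" and the function names are illustrative choices, not notation from the lecture.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Each heterozygous parent carries one yellow (Y) and one green (g) gene.
PARENT_GENES = ("Y", "g")


def offspring_genotypes():
    """Count the genotypes produced by the four equally likely pairings."""
    counts = Counter()
    for first, second in product(PARENT_GENES, repeat=2):
        # Sort the pair so that "Yg" and "gY" count as the same genotype.
        counts["".join(sorted((first, second)))] += 1
    return counts


def color(genotype):
    """Yellow is dominant: a single Y gene makes the plant yellow."""
    return "yellow" if "Y" in genotype else "green"


genotypes = offspring_genotypes()
total = sum(genotypes.values())

colors = Counter()
for genotype, n in genotypes.items():
    colors[color(genotype)] += n

yellow_share = Fraction(colors["yellow"], total)
green_share = Fraction(colors["green"], total)
homozygous_among_yellow = Fraction(genotypes["YY"], colors["yellow"])

print("yellow:", yellow_share)
print("green:", green_share)
print("homozygous among yellow:", homozygous_among_yellow)
```

Counting this way gives three yellow outcomes for every green one, and one homozygous plant among every three yellow ones.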

The Experiment Begins

Here’s how these experiments proceeded.

Mendel crossed a bunch of heterozygous plants and looked at the percentage of offspring plants that were yellow and the percentage that were green.

His expectation: One-quarter of the offspring plants would be green, and three-quarters of them would be yellow. Among the yellow plants, he expected one-third to be homozygous (with two yellow genes) and the other two-thirds to be heterozygous (one green and one yellow gene).

Suppose you did an experiment large enough that 200 plants were expected in each of the four quadrants, or 800 plants in all. You would then expect about 200 homozygous yellow plants, 400 heterozygous plants, and 200 green plants. Mendel did experiments of this kind many times.

Heterozygous versus Homozygous

But how did Mendel know whether a plant was heterozygous or homozygous?

Mendel took each yellow plant and bred it with itself 10 times. If the plant was homozygous, of course, self-breeding would always produce a yellow plant.

However, if the plant was heterozygous, he reasoned, the chances were very good that at least one of the 10 self-breedings would contribute two green genes and the offspring would come out green. That would be an indication the plant he started with was heterozygous, with one green and one yellow gene.

Mendel collected a great deal of data, all of which supported his theory. In one instance, with an experiment of this size, he classified 201 plants as homozygous yellow, against an expectation of 200. The ratios in his experiments were all very close to expectation.

Fact-Checking Mendel

Ronald Fisher, English statistician and biologist

In 1936, Ronald Fisher wrote a paper in which he investigated Mendel’s data. In particular, he noted that Mendel’s data were too good to be true.

Remember: When you’re dealing with a random process, you don’t expect the answers to always be exactly according to expectation. You expect a distribution of the answers.

Most of the time, the answers will be within a certain distance of the expectation. But a certain fraction of the time, you’d expect to have outliers. You’d expect to have rare occurrences.

One of the things Ronald Fisher pointed out was that the number of experiments in which Mendel’s data were very close to expectation was too great to be believed.

Faulty Data

Here’s an example of the reasoning Fisher used.

Suppose you flip a coin 1,000 times. You know that, on average, the mean of the distribution of the flips is going to be 500. But you also know that if you actually flip a coin 1,000 times, often the number of heads will be less than 500 or more than 500.
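To make the coin example concrete, here is a small sketch that computes the mean and standard deviation of the number of heads and then simulates the experiment. The seed and the choice of 200 simulated runs are arbitrary assumptions made only for illustration.

```python
import math
import random


def binomial_mean_sd(n, p):
    """Mean and standard deviation of the number of successes in n trials."""
    return n * p, math.sqrt(n * p * (1 - p))


def simulate_heads(n_flips, rng):
    """Flip a fair coin n_flips times and return the number of heads."""
    return sum(rng.random() < 0.5 for _ in range(n_flips))


mean, sd = binomial_mean_sd(1000, 0.5)
print(f"mean = {mean}, standard deviation = {sd:.2f}")

rng = random.Random(0)  # fixed seed, chosen arbitrarily for repeatability
runs = [simulate_heads(1000, rng) for _ in range(200)]
print("fewest heads:", min(runs), "most heads:", max(runs))
```

The standard deviation is about 15.8 heads, so results a dozen or more heads away from 500 are entirely ordinary.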

This is a transcript from the video series Meaning From Data: Statistics Made Clear.

Fisher noticed that Mendel’s data tended to give more outcomes within one standard deviation of the mean than would be expected. When an experiment involves random chance and its outcomes follow a normal distribution, you expect the result to lie within one standard deviation of the mean about 68% of the time. But that means that about 32% of the time, you expect the result of that random experiment to fall more than one standard deviation from the mean.

Yet it turned out that the data reported by Mendel had too high a frequency of being too close to expectation. Fisher argued the data were not properly constructed.
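The 68% figure, and why a long run of close results is suspicious, can be checked with an exact binomial calculation on the coin example. This is only a sketch of the style of reasoning, not Fisher’s actual computation. The figure of 10 experiments is a hypothetical number chosen for illustration, and the experiments are assumed to be independent.

```python
import math


def prob_within_one_sd(n, p):
    """Exact probability that a binomial count lies within one SD of its mean."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    low = math.ceil(mean - sd)
    high = math.floor(mean + sd)
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(low, high + 1)
    )


# Chance that one run of 1,000 fair coin flips lands within one SD of 500.
p_close = prob_within_one_sd(1000, 0.5)

# Hypothetical: chance that 10 independent runs ALL land that close.
N_EXPERIMENTS = 10
p_all_close = p_close**N_EXPERIMENTS

print(f"one experiment within one SD: {p_close:.3f}")
print(f"all {N_EXPERIMENTS} experiments within one SD: {p_all_close:.4f}")
```

A single close result is unremarkable, but the chance that every one of a series of experiments lands that close shrinks quickly as the series grows.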

Misclassification and Randomness

Fisher went on to make another claim about the results from Mendel’s data.

Mendel’s strategy for deciding whether a plant was heterozygous was to perform the self-breeding experiment 10 times. There was a small chance that, if you bred a heterozygous plant with itself 10 times, randomness alone would make the offspring yellow every one of those 10 times, and the plant would be wrongly classified as homozygous.

Fisher’s analysis of Mendel’s data set for green and yellow peas is a famous case study in statistics.

In fact, it’s not a difficult computation to see exactly what proportion of the time that would happen. When you cross a heterozygous plant with itself, the chances are three out of four that at least one of the two genes contributed will be yellow, so there’s a 3/4 chance the offspring will be yellow. If you do it 10 times, there’s a (3/4)^10 chance the offspring will be yellow every single time, which is about 5.6%.

So the probability of misclassifying a heterozygous plant is about 5.6%. In an experiment with 400 heterozygous plants expected, that means Mendel could have misclassified about 22 plants as homozygous that actually were heterozygous.

So the expected number of plants classified as homozygous should have been about 222.5, not 200. The figure of 200 is the expected number of plants that really are homozygous.
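The 5.6% figure is the 10th power of 3/4. A short sketch using exact fractions confirms it. The function name and default argument are illustrative, and the 10 self-breedings are assumed to be independent.

```python
from fractions import Fraction


def prob_all_yellow(n_offspring, p_yellow=Fraction(3, 4)):
    """Chance that every one of n_offspring independent offspring is yellow."""
    return p_yellow**n_offspring


p_misclassify = prob_all_yellow(10)
print(p_misclassify)                   # exact fraction
print(f"{float(p_misclassify):.4f}")   # decimal approximation
```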
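The arithmetic behind the figures of 22 and 222.5 can be sketched as follows. It assumes the illustrative experiment described earlier, with 200 plants expected for each of the four gene pairings, so that 200 yellow plants are truly homozygous and 400 are heterozygous. The constant names are illustrative.

```python
# Assumed size of the illustrative experiment: 200 plants per gene pairing.
EXPECTED_HOMOZYGOUS = 200    # yellow plants with two yellow genes
EXPECTED_HETEROZYGOUS = 400  # yellow plants with one yellow and one green gene

# Chance that a heterozygous plant yields 10 yellow offspring in a row.
p_misclassify = 0.75**10

# Heterozygous plants expected to be wrongly counted as homozygous.
expected_misclassified = EXPECTED_HETEROZYGOUS * p_misclassify

# Total expected to be REPORTED as homozygous, given the 10-offspring test.
expected_reported = EXPECTED_HOMOZYGOUS + expected_misclassified

print(f"expected misclassified: {expected_misclassified:.1f}")
print(f"expected reported as homozygous: {expected_reported:.1f}")
```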

Too Good to Be True

The effect shows up in one specific experiment, of the many Mendel reported. The reported number, 201, is very close to the quasi-expectation of 200. But it’s rather distant from what should have been expected once you include the heterozygous plants that would have been falsely classified as homozygous.

This is the kind of reasoning Fisher used in his article to show that the data Mendel got were too good. In fact, in this case, Mendel wasn’t subtle enough to realize he should have been expecting 222.5 instead of 200 in that box.

So, Mendel’s data came out more like the 200 than what he really should have found.
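One way to put a number on "rather distant" is to measure the reported 201 in standard deviations from each expectation. The sketch below is an illustration rather than Fisher’s own calculation. It assumes 600 yellow plants were tested (200 truly homozygous plus 400 heterozygous, as in the example above) and that each plant’s classification is an independent trial.

```python
import math

N_YELLOW = 600   # assumption: 200 homozygous + 400 heterozygous yellow plants
REPORTED = 201   # number Mendel classified as homozygous


def z_score(observed, n, p):
    """Distance of an observed count from the binomial mean, in SD units."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return (observed - mean) / sd


# Naive model: every heterozygous plant is detected, so p is exactly 1/3.
p_naive = 1 / 3

# Corrected model: a heterozygous plant passes as homozygous with
# probability (3/4)^10, which raises the chance of a "homozygous" label.
p_corrected = 1 / 3 + (2 / 3) * 0.75**10

z_naive = z_score(REPORTED, N_YELLOW, p_naive)
z_corrected = z_score(REPORTED, N_YELLOW, p_corrected)

print(f"distance from 200:   {z_naive:+.2f} SD")
print(f"distance from 222.5: {z_corrected:+.2f} SD")
```

Under these assumptions the reported count sits almost exactly on the naive expectation and well below the corrected one.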

From the lecture series Meaning from Data: Statistics Made Clear
Taught by Professor Michael Starbird, Ph.D.
Photo of Gregor Mendel courtesy of Hugo Iltis – Wellcome Library, London

2 Comments

  1. Oh Please give poor Mendel the benefit of the doubt. He had assistants who, at best, were both marginally trained and weakly motivated. Remember Mendel’s genetics had “adverse” implications for current religious beliefs. However, sitting around watching pea plants grow and sorting the peas into colors sure beat working in the grape fields for the assistants. “Just pick any pile Charlie, Gregor will never know”.

  2. EVERY scientist is human and a product of their times. So Mendel was a bit wrong, but science continuously corrects itself.
