Statistical analysis shows the founder of modern genetics may have falsified data in his famous pea experiments so that the results would better correspond with his expectations.
Science proceeds by developing models of what we think the world is like and then looking for data that test those models. How? By comparing experimental results with the model’s predictions.
One famous example: Gregor Mendel’s experiments with peas.
The Monk’s Model
In the middle of the 19th century, Mendel was a monk conducting experiments now viewed as seminal for our understanding of heredity: how the characteristics of parents are passed on to their children. In particular, Mendel had a model in mind in which each parent plant carried two genes and contributed one of them to the offspring. Those two contributed genes then made up the genetic basis for the offspring plant’s actual appearance.
Mendel worked with many different characteristics, one of which was the color of the pod. He believed that he had some plants with two yellow genes. He took these yellow plants, and he crossed them with plants he assumed to have two green genes. (Yellow is the dominant gene, so if you have a plant with a yellow and a green gene, or two yellow genes, the plant will appear yellow.)
When Mendel crossed plants with two yellow genes (homozygous, meaning both genes are the same) with plants that had two green genes, each offspring had one gene from each parent in its genetic makeup. And all of them appeared yellow, because yellow was the dominant gene.
Four Possible Results
The interesting part of the experiment occurs at the next generation.
Suppose you take the plants that resulted from the previous breeding (each of which was heterozygous, meaning they had a yellow gene and a green gene) and combined them to form potential offspring. The theory Mendel proposed predicted that each parent would randomly contribute one or the other gene to the daughter plant.
As a result, there are four possible things that could happen in this kind of an experiment.
- Both parents could contribute yellow genes.
- The first parent could contribute the green gene and the second parent the yellow gene.
- The first parent could contribute the yellow gene and the second parent the green gene.
- Both parents could contribute green genes.
(Only the plants with two green genes would have green pods; in the other three cases, the pod would appear yellow.)
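The four equally likely combinations can be sketched in a few lines of Python. This is a minimal illustration of the model, not Mendel’s own procedure; the “Y”/“G” labels are ours:

```python
from itertools import product

# Each heterozygous parent carries one yellow ("Y") and one green ("G") gene.
parent = ["Y", "G"]

# Enumerate the four equally likely gene combinations in the offspring.
offspring = list(product(parent, parent))
print(offspring)  # [('Y', 'Y'), ('Y', 'G'), ('G', 'Y'), ('G', 'G')]

# Yellow is dominant: a pod appears green only with two green genes.
phenotypes = ["green" if genes == ("G", "G") else "yellow" for genes in offspring]
print(phenotypes.count("yellow"), phenotypes.count("green"))  # 3 1
```

The three-to-one yellow-to-green ratio falls straight out of the enumeration.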
The Experiment Begins
Here’s how these experiments proceeded.
Mendel crossed a bunch of heterozygous plants and looked at the percentage of offspring plants that were yellow and the percentage that were green.
His expectation: One-quarter of the offspring plants would be green, and three-quarters of them would be yellow. Among the yellow pods, he expected one-third to be homozygous (with two yellow genes) and the other two-thirds to be heterozygous (one green and one yellow gene).
Suppose you did an experiment large enough that 200 plants were expected in each of these four quadrants: 800 offspring in all. You would then expect 200 green plants and 600 yellow plants, with 200 of the yellow plants homozygous and 400 heterozygous. Mendel performed experiments on roughly this scale many times.
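Assuming a hypothetical experiment of 800 offspring (a size we chose so that each of the four quadrants has an expected count of 200), the expected counts follow directly from the model:

```python
n = 800  # hypothetical experiment size: 200 offspring expected per quadrant

expected_green = n * 1 / 4             # two green genes
expected_yellow = n * 3 / 4            # at least one yellow gene
expected_homozygous_yellow = expected_yellow * 1 / 3
expected_heterozygous_yellow = expected_yellow * 2 / 3

print(expected_green, expected_yellow)  # 200.0 600.0
print(expected_homozygous_yellow, expected_heterozygous_yellow)  # 200.0 400.0
```

These are the baseline numbers that the rest of the argument plays against.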
Heterozygous versus Homozygous
But how did Mendel know whether a plant was heterozygous or homozygous?
Mendel took each yellow plant and bred it with itself 10 times. If the plant was homozygous, of course, every self-breeding would yield a yellow plant.
However, if the plant was heterozygous, he reasoned, the chances were very good that in at least one of the 10 self-breedings both contributed genes would be green, and that offspring would come out green. A green offspring would be an indication that the plant he started with was heterozygous: that it had a green and a yellow gene.
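Those “very good” chances are easy to quantify under the model’s assumption that each self-breeding independently yields a green offspring with probability 1/4:

```python
# Probability that a heterozygous plant reveals itself in 10 self-breedings:
# each self-breeding produces a green offspring with probability 1/4.
p_green_once = 1 / 4
p_detected = 1 - (1 - p_green_once) ** 10
print(round(p_detected, 3))  # 0.944
```

So the 10-breeding test catches a heterozygous plant about 94% of the time, which also means it occasionally fails, a point Fisher exploited, as described below.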
Mendel collected a great deal of data, and it all supported his theory. In one experiment of this size, for instance, he found 201 plants that he had classified as homozygous yellow, against an expectation of 200. The ratios were all very close to expectation.
In 1936, Ronald Fisher wrote a paper in which he investigated Mendel’s data. In particular, he noted that Mendel’s data were too good to be true.
Remember: When you’re dealing with a random process, you don’t expect the answers to always be exactly according to expectation. You expect a distribution of the answers.
Most of the time, the answers will be within a certain distance of the expectation. But a certain fraction of the time, you’d expect to have outliers. You’d expect to have rare occurrences.
One of the things Ronald Fisher pointed out was that the number of experiments in which Mendel’s data were very close to expectation was too great to be believed.
Here’s an example of the reasoning Fisher used.
Suppose you flip a coin 1,000 times. You know that, on average, the number of heads will be 500. But you also know that if you actually flip a coin 1,000 times, the number of heads will often be less than 500 or more than 500.
Fisher noticed that Mendel’s data tended to give more outcomes that were within one standard deviation of the mean than would be expected. You expect outcomes of an experiment that involve random chance to lie within one standard deviation of the mean—in a normal distribution, about 68% of the time. But that means that about 32% of the time, you expect the results of that random experiment to have values outside of one standard deviation from the mean.
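A quick simulation makes the point concrete. This is only a sketch; the trial count and seed are arbitrary choices of ours:

```python
import random

random.seed(0)  # arbitrary seed so the sketch is reproducible

n_flips = 1000
mean = n_flips * 0.5               # 500 heads expected on average
sd = (n_flips * 0.5 * 0.5) ** 0.5  # binomial standard deviation, about 15.8

# Repeat the 1,000-flip experiment many times and record how often the
# head count lands within one standard deviation of the mean.
trials = 2000
within = sum(
    abs(sum(random.random() < 0.5 for _ in range(n_flips)) - mean) <= sd
    for _ in range(trials)
)
print(within / trials)  # typically close to 0.68, far from 1.0
```

Roughly a third of honest experiments land more than one standard deviation away; data that almost never do are suspicious.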
Yet it turned out that the data reported by Mendel were too close to expectation too often. Fisher argued the data could not have come from an honestly run random experiment.
Misclassification and Randomness
Fisher went on to make another claim about the results from Mendel’s data.
The strategy by which Mendel classified a plant as homozygous or heterozygous was to perform the self-breeding experiment 10 times. But there was a chance (a small one) that a heterozygous plant, bred with itself 10 times, would by randomness alone produce a yellow offspring every one of those 10 times.
In fact, it’s not a difficult computation to see exactly what proportion of the time that would happen. When a heterozygous plant is crossed with itself, the chances are three out of four that, of the two genes contributed, at least one will be yellow, so each self-breeding yields a yellow offspring with probability 3/4. If you do it 10 times, the chance of getting yellow every single time is (3/4)^10, which is about 5.6%.
So the probability of misclassifying a heterozygous plant as homozygous is 5.6%. In an experiment of this size, with 400 heterozygous yellow plants expected, about 400 × 5.6% ≈ 22.5 of them would be expected to pass the test and be misclassified as homozygous.
The actual expectation for plants classified as homozygous should therefore have been about 222.5, not the 200 predicted for plants that really are homozygous.
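Fisher’s correction can be reproduced with a few lines of arithmetic, using the 400 heterozygous plants expected in the hypothetical 800-plant experiment:

```python
prob_yellow_once = 3 / 4
prob_all_yellow = prob_yellow_once ** 10  # heterozygote passes all 10 tests
print(round(prob_all_yellow, 4))          # 0.0563, i.e. about 5.6%

heterozygous_expected = 400               # heterozygous yellow plants expected
misclassified = heterozygous_expected * prob_all_yellow
print(round(misclassified, 1))            # 22.5

corrected_expectation = 200 + misclassified
print(round(corrected_expectation, 1))    # 222.5
```

The corrected expectation, 222.5, is the number Mendel’s reported counts should have clustered around.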
Too Good to Be True
The effect of this is that when Mendel reported an experiment (and this is one specific example among many), the reported number, 201, was very close to the naive expectation of 200. But it is rather distant from what should have been expected once you include the heterozygous plants that would have been falsely classified as homozygous, namely 222.5.
This is the kind of reasoning Fisher used in his article to show that the data Mendel got were too good. In fact, in this case, Mendel wasn’t subtle enough to realize he should have been expecting 222.5 instead of 200 in that box.
So, Mendel’s data came out close to 200, rather than close to the number he really should have found.