A prominent Georgia college has developed a COVID risk assessment tool. Georgia Tech’s “COVID-19 Event Risk Assessment Planning Tool” lets users browse the nation by county to determine the likelihood of coming in contact with an infected person. How do we know sample data is representative?
According to its website, Georgia Tech’s risk assessment tool uses “real-time U.S. and state-level estimates” for its data. “This map shows the risk level of attending an event, given the event size and location (assuming 10:1 ascertainment bias),” the website said. “The risk level is the estimated chance (0-100%) that at least one COVID-19 positive individual will be present at an event in a county, given the size of the event.”
Event size is determined with a sliding bar on the left side of the map, ranging from 10 people to 10,000 people.
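The website's description can be turned into a short calculation. The sketch below is one common way to compute an "at least one" probability, not necessarily the tool's exact method. It assumes every attendee is independently infected with the same probability, and that true infections equal reported cases multiplied by the ascertainment bias. The function name `event_risk` and its parameters are hypothetical.

```python
def event_risk(reported_cases, population, event_size, ascertainment_bias=10):
    """Estimated chance (0-100%) that at least one infected person attends.

    Assumptions (not taken from the tool itself): attendees are independent
    draws from the county, and true infections are reported cases times
    the ascertainment bias.
    """
    # Estimated share of the county that is currently infected, capped at 100%
    prevalence = min(1.0, reported_cases * ascertainment_bias / population)
    # Chance that none of the attendees is infected
    chance_nobody_infected = (1.0 - prevalence) ** event_size
    return 100.0 * (1.0 - chance_nobody_infected)


# Made-up county: 100 reported cases among 100,000 residents
risks = {size: event_risk(100, 100_000, size) for size in (10, 100, 1_000, 10_000)}
```

Under these assumptions the estimated risk rises quickly with event size, because each extra attendee is one more chance that someone infected is present.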
Making sense of data and statistics can be difficult. And how do we know that statistics drawn from a sample accurately represent a much larger population?
The Height of Randomness
It sounds paradoxical that specific meaning can come from randomness, but it’s actually quite logical.
“Randomness is a central idea of statistical inference and the goal of statistical inference is to answer two questions: How close? and How confident?” said Dr. Michael Starbird, Professor of Mathematics and University Distinguished Teaching Professor at The University of Texas at Austin. “The shape, the center, and the spread of a sample of a small number of elements from the population [on a visual graph]: how close are they to the actual population’s shape, center, and spread? And how confident are we that we are that close?”
In determining the average height of a person, for example, Dr. Starbird said that there’s no way to measure everyone’s height and average the results. There are too many people, and the population keeps changing through deaths, growing children, and so on. Instead, picking someone from a crowd is a better place to start. There are outliers (“It could happen that we choose some NBA basketball player who’s seven feet tall,” he said), but that’s unlikely, partly because such people are the extremes.
To improve the odds of getting an accurate average height, it’s even better to widen your scope and measure several people. Because the people are chosen at random, the larger the sample, the clearer the picture. In other words, measuring one person at random may turn up someone who’s four feet, six inches tall or someone who’s seven feet tall, but measuring both and averaging them gives a height of five feet, nine inches, which is a much more reasonable average, if not 100% accurate.
“They would tend to cancel each other out by randomness,” Dr. Starbird said.
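A small simulation can illustrate the canceling-out effect. This is a sketch under stated assumptions, not data from the article: the population of 10,000 heights is made up (drawn from a bell curve centered on 67 inches), and the helper name `sample_mean_error` is hypothetical.

```python
import random
import statistics


def sample_mean_error(population, sample_size, trials, rng):
    """Average gap between a random sample's mean and the population mean."""
    true_mean = statistics.fmean(population)
    gaps = []
    for _ in range(trials):
        # Pick sample_size different people at random
        sample = rng.sample(population, sample_size)
        gaps.append(abs(statistics.fmean(sample) - true_mean))
    return statistics.fmean(gaps)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 heights in inches, invented for illustration
population = [rng.gauss(67, 4) for _ in range(10_000)]

# Typical error of the sample average for several sample sizes
errors = {n: sample_mean_error(population, n, 500, rng) for n in (1, 2, 10, 100)}

# The two-person example from the text: 4'6" is 54 inches, 7'0" is 84 inches
two_person_average = statistics.fmean([54, 84])  # 69 inches, i.e. 5'9"
```

In this setup the typical error shrinks as the sample grows, which is the sense in which random highs and lows tend to cancel each other out.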
One place where we weigh probability is the American criminal justice system, Dr. Starbird said. We, as jurors, go into a courtroom with the belief that the system is generally fair and just and that a defendant isn’t necessarily guilty of a crime.
“Then, evidence is presented, and as part of the evidence, the witness may actually say, ‘It looked like the person who was in the neighborhood of the crime,’” Dr. Starbird said. “[Or] ‘This person was driving a car that looked like the car that was seen to be fleeing.’ Evidence is deduced about the individual on trial. If the jurors feel that the evidence is unlikely, given the innocence of the person, then the jury finds that person guilty.”
To put Dr. Starbird’s analogy simply, a juror may not assume a defendant’s guilt in a crime, but as more and more evidence points to the defendant, jurors must decide how likely it is that another person, matching the defendant’s description and driving a similar car, was in the area of the crime at the same time.
Random sampling, combined with a bit of logic and reasoning, can help guide us in the right direction on everything from the likelihood of a medicine’s side effects to a court defendant’s fate to the probability of a COVID-infected person attending an event of any size in any county in the country.
Dr. Michael Starbird contributed to this article. Dr. Starbird is Professor of Mathematics and University Distinguished Teaching Professor at The University of Texas at Austin. He received his BA from Pomona College in 1970 and his PhD in Mathematics from the University of Wisconsin–Madison in 1974.