There is an inspiring article that highlights thinking outside the box when it comes to solving problems. The article compares a gifted class of middle school students with a remedial vocational class, measuring the creativity of the students in each. The experiment asked the students in each class one question: *How do you weigh a giraffe?*

The students in the gifted class were not so much gifted as successful, and they were used to succeeding and used to pleasing their teacher. They panicked because they didn’t know how to answer this question. This was way before the Internet, and the students couldn’t go online and look it up.

Meanwhile, in the vocational class, almost immediately some kid just blurted out and said, “Hey, I know what to do. Just take a chainsaw, and chainsaw that giraffe into chunks. Then weigh the chunks.”

Chainsawing the giraffe is an attitude that a good problem solver should have, because you want to have fun, and you also want to be a little bit bad. Breaking rules is a good thing; we’re not talking about actual cruelty to animals here, just thinking outside the box or breaking mathematical rules.

Learn more about looking at solving problems in a whole new way with *The Art and Craft of Mathematical Problem Solving*.

To break the mathematical rules, we need a healthy dose of the 3 C’s.

Concentration, creativity, and confidence are psychological attributes that are important for just about everything, but they’re vital for solving problems. How do we enhance them? All three of them are linked, but confidence is the least important of the three because it’s truly derived from the other two.

As your concentration ability increases, and if your creativity gets stronger, then you’ll naturally become more confident.

To master concentration, you must set aside a quiet time and place for your work. You need to relax. You need to develop good work habits, and you need to find problems to concentrate on that are interesting to you, approachable by you, and addictive. Pretty much, that will do the trick. In order to build up your concentration, you want to build up from level 1, which is a minute or so of concentration, at least to level 3, getting up to an hour.

Collect a stock of back-burner problems. Start cultivating problems that you cannot solve. Make sure they’re interesting and then you’ll think about them. If you can find a problem that’s exciting to you, that annoys you, that sort of gnaws at you, then you’ll think about it. Interesting problems will force you to become a better concentrator.

Concentration leads to confidence, which frees you to explore, which facilitates investigation and creativity.

To build both your confidence and creativity, you need to be disciplined about using those interesting problems. You need to set up a problem-solving routine, some workplace, a lucky pen, and then you should keep to your routine to get your mind in a relaxed state.

Then, occasionally, deliberately break your routine. If you like to work in the morning, work late at night. If you like quiet, go to a noisy café. If you like to work in a restaurant, go sit in a library, etc.

You should also, as a strategy, specifically think about peripheral vision. The peripheral vision strategy is to realize that many problems cannot be solved with direct focus. It’s just like your eyes. Your fovea has very good focus, but it has less sensitivity than the perimeter of your eyes, the periphery of your vision.

Many problems need to percolate in your unconscious in this way. You need to cultivate a good supply of back burners, and just get in the habit of not solving problems. The more you do this, the more you’ll get into a state of investigative, purposeful contemplation, and the more powerful your mind will get.

Let’s look at a tool made famous by Carl Friedrich Gauss. He was a prodigy, and as a teenager, he solved a problem that had been open since Hellenistic times: he found a way to construct a regular heptadecagon (a 17-gon) using only a compass and straightedge. The rest of his career was not much different. What Gauss could do in an afternoon was equivalent to what an ordinary mathematician could do in a lifetime.

When he was 10, he was faced with the problem: How do you find the sum of the numbers 1 + 2 + 3 up to 100? How do you compute this in 1787 when there are no calculators? Well, what little Gauss did was to pair the terms, the beginning term and the end term, (1 + 100); and then the second term and the next to last term, (2 + 99); and then (3 + 98); (4 + 97); and so on down to (50 + 51). Each of those pairs adds up to 101, and there are 50 such pairs. Thus, the sum is 5050, and that’s pretty clever. This is called Gaussian pairing and is an example of a powerful and useful tool.
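The pairing idea is easy to check mechanically. Here is a minimal Python sketch of Gaussian pairing (my own illustration, not part of the lecture):

```python
def gauss_pair_sum(n):
    """Sum 1 + 2 + ... + n by pairing the first and last terms, as
    young Gauss did: (1 + n), (2 + n - 1), ... Each pair adds up to
    n + 1, and for even n there are n // 2 such pairs."""
    assert n % 2 == 0, "this sketch assumes an even n, as in Gauss's 1..100"
    return (n + 1) * (n // 2)

print(gauss_pair_sum(100))  # 5050, Gauss's answer
```

For n = 100, each of the 50 pairs sums to 101, giving 101 × 50 = 5050.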

Wishful thinking is one of your first strategies for this because pretending to solve a problem, even an easier one, keeps you happy. It allows you to keep thinking about solving problems. Even delusion helps – deluding yourself into thinking that you’ve solved a problem actually allows you to solve it later because you can relax and be happy. Making yourself happy and confident, even if it’s through such a transparent thing as delusion, is fine.

A corollary of wishful thinking is a very sensible idea, which I just call the “make it easier” strategy. The idea is completely common sense. If your problem is too hard, just make it easier by removing the hard part. Either make the size smaller or remove an element that makes it hard. For example, if it involves square roots, remove them temporarily.

If you’ve never seen the Gaussian pairing tool, which Gauss used to sum the numbers from 1 to 100, you are undoubtedly impressed. Gaussian pairing is quite clever, but it is a tool, and tools are just tricks; these are things that can be acquired. What you should keep in mind is that strategy and tactics are what make someone a good problem solver, not the tools.

And you should use these new ideas. Any time you see a new, interesting idea, learn it, use it, and make it yours. Ideas are collective human property. They are not private property. Don’t forget that what you’re doing is chainsawing the giraffe. It’s okay to mess around and break some rules.

Learn more: The Problem Solver’s Mind-Set

Let’s look at a quickie: a problem that requires thinking outside the box.

Consider the problem of nine dots arranged in a square grid: how do you join them all by drawing no more than four absolutely straight lines? If you think outside the box, as demonstrated in the diagram, it’s pretty obvious what to do.

As long as you go outside the box, you’re able to get all 9 dots. It’s a fun and challenging problem if you’ve never seen it before.

People may be endowed unequally with confidence, creativity, and power of concentration, but all of these are trainable skills. It’s possible to practice them and improve them, but in order to do so, you will need to see lots of creativity in action and you need lots of open-ended opportunity to experiment.

Taught by Professor Paul Zeitz, Ph.D.

In this full lecture, discover methods that teach you to visualize numbers in a whole new light.

Taught by Professor James Tanton, Ph.D.

Let’s say you buy a lottery ticket; what are the chances that you’re going to be rich for the rest of your life?

You walk across a golf course on a stormy day; what are the chances you’ll be hit by lightning?

What are the chances that your investments will allow you to live happily for the rest of your days?

You have a fever; you have a cough. What are the chances that it’s a serious disease rather than something trivial?

All these are real-life examples of situations where we’re confronted with possibilities whose outcomes we do not know. In fact, I would argue that many or most parts of our lives—and the world and trying to understand the world—involve situations where we don’t know what’s going to happen. They involve the uncertain and the unknown.

It would be nice to say, “Well, our challenge in life is to get rid of uncertainty and be in complete control of everything.” That is not going to happen. One of life’s real challenges is to deal with the uncertain and the unknown in some sort of an effective way; and that is the realm of probability.

Probability accomplishes the really amazing feat of giving a meaningful numerical description of things that we admit we do not know, of the uncertain and the unknown. It gives us information that we actually can act on.

For example, when we hear there’s an 80% chance of rain, what do we do? We take an umbrella. Of course, if it doesn’t rain, we say, “Well, there was a 20% chance it wouldn’t rain. That’s okay.” If it rains, we say, “Oh, yes, the prediction was right. There was an 80% chance of rain.”

Probability is a rather subtle kind of a concept because it can come out one way or the other, and still a probabilistic prediction can be viewed as correct—but decisions made on probability have all sorts of ramifications.

When we make medical decisions, for example, we are making decisions that are based on probabilities, and yet they have extremely serious consequences, including life and death consequences.

In the case of the rain, all we risk is getting wet. But in many other areas, decisions made on the basis of probability carry very serious consequences.

Back before probability was viewed as commonplace, as it is today, between 1750 and 1770 in Paris there was a smallpox epidemic for which an inoculation was available. Unfortunately, the inoculations were rather risky. They reckoned that there was a 1 in 200 chance of death from taking the inoculation, but on the other hand, there was a 1 in 7 chance of eventually dying from the disease. Making that kind of decision is a very dramatic question where we’re weighing probabilities.

If you took that inoculation and died from it, did you make the right decision or not? Of course, you don’t want to be among the 1 in 200 who died from the inoculation. On the other hand, on the basis of probability, it was the right decision. There are many controversies about this kind of thing, and in today’s world, with lawsuits and all, this would be a very serious kind of issue to undertake.
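The two risks from the lecture can be compared directly. A back-of-the-envelope sketch in Python (the ratio calculation is my own illustration; the probabilities are the lecture's):

```python
# The two risks from the lecture's Paris smallpox example.
p_inoculation_death = 1 / 200   # dying from the inoculation itself
p_disease_death = 1 / 7         # eventually dying from the disease

# The disease carries roughly 28-29 times the risk of the inoculation,
# which is why, purely on the basis of probability, inoculating was
# the right decision despite its genuine danger.
risk_ratio = p_disease_death / p_inoculation_death
print(round(risk_ratio, 1))  # 28.6
```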

Well, in many arenas of life, our understanding of the world comes down to understanding processes and outcomes that are probabilistic in nature, that really come about from random chance, that things are happening by randomness alone. Over the last couple of centuries, the scientific descriptions of our world increasingly have included probabilistic components in them.

Many aspects of physics involve questions of probability. We imagine molecules causing things to happen by the aggregate force of probabilistic occurrences; quantum mechanics and thermodynamics work the same way. At the very foundations of our knowledge of these fields is the theory of probability.

Biology, genetics, and evolution are all based very centrally on random behavior, as well. In fact, in all of these areas, the goal is to make definite, predictable, measurable statements about what’s going to happen that are the result of describing random behavior.

The description of random behavior is how we, as scientists and mathematicians, define the world. This is a major paradigm shift in the way science has worked for the last 150 years. As time goes on, there continues to be an increase in the role of probability and randomness at the center of scientific descriptions.

Probability gives us a specific statement about what to expect when things happen at random. But how can it be effective when, by definition, random outcomes of one trial or one experiment are completely unknown? Well, if you repeat those trials many, many times and look at them in the aggregate, that’s when you begin to see glimpses of regularity. It’s the job of probability to put a meaningful numerical value on the things that we admit we don’t know.

Taught by Professor Michael Starbird, Ph.D.

When used in statistics, the word population refers to the entirety of the collection of people or things that are of interest. A sample is a subset of the total population.

In general, the goal is to infer information about the whole population from information about the sample. In other words, it’s not in our interest to know only about the people who are asked in the sample. What we’re really interested in is those aspects of the entire population.

Learn More: Induction in Polls and Science

If you choose the sample randomly, the advantage is that using probability you can make inferences about how well the opinions of the sample do, in fact, represent the opinions of the whole population.

On the other hand, if you intentionally choose certain groups to reflect what you believe to be reflective of reality, you may bring your own biases to the selection process, and those biases are then going to be reflected in the people whom you ask. Representative of the whole population means that the sample should have the same characteristics that the whole population does.

The whole concept of choosing the sample randomly is that you have a better chance that the proportion of people in the sample with a certain opinion will be, in fact, the same as the entire population.

The most familiar occasion where this comes up is before an election, when pollsters try to find out what proportion of the voters will vote for the Democratic candidate and what proportion will vote for the Republican candidate.

There are several major pitfalls in the way sampling can be done. In the 1936 U.S. presidential election, the two primary contenders for the presidency were the incumbent, Franklin Delano Roosevelt, and the Republican opponent, Alfred Landon. At the time, the magazine *The Literary Digest* had for several elections conducted polls to predict who would win the coming election. They had successfully predicted the outcomes in several elections, so this was a major poll.

Learn More: Political Polls—How Weighted Averaging Wins

In the 1936 election, *The Literary Digest* sent out 10 million voting surveys, and they received 2.4 million replies. Based on those surveys, *The Literary Digest* predicted that Landon would win in a landslide, with 370 electoral votes to Roosevelt’s 161.

Well, you may not recall reading about President Landon in your American history books. Obviously he did not win the presidency.

In fact, the only correct aspect of *The Literary Digest*’s prediction was that the election was a landslide, but unfortunately for them, the landslide was the other way. Roosevelt won the election with 62 percent of the popular vote and by an incredible 523 electoral votes to 8 for Landon.

Obviously, *The Literary Digest*’s sampling method was not representative of the whole population.

What went wrong? Well, one thing was that *The Literary Digest* got their samples from several different kinds of lists. One list was the subscribers to their own magazine. They also looked at car registration records, which provided a long list of names, and they sent their surveys to those people. They also used telephone directories.

The year 1936 was in the middle of the Great Depression, and many people were having financial problems and were cutting back on their budgets. Probably one of the first things to go in tight times would be one’s subscription to *The Literary Digest*. In addition, not many people owned cars or telephones. These were luxury items for a lot of people in 1936. Because of this, the people to whom *The Literary Digest* had sent their survey were likely wealthy people and obviously their opinions were not representative of the population at large.

*The Literary Digest* poll’s second pitfall was that it was a voluntary response survey.

This is a transcript from the video series *Meaning from Data: Statistics Made Clear*. It’s available for audio and video download here.

Because of this story, *The Literary Digest*, which otherwise would simply be lost in the dustbin of history, will now live on forever in statistics textbooks as a great example of bias in sampling.

A success that came from this *Literary Digest* fiasco is the story of George Gallup.

At the time, Gallup was a young statistician just starting out, and he did his own poll for the 1936 election. He took a survey of 50,000 people and made two predictions of his own for the election.

- He correctly predicted that Roosevelt would win the election.
- He also predicted that *The Literary Digest* poll would be wrong, and he estimated how wrong it would be before their poll came out.

He was one of the people who introduced the concept of randomness in political polling as a key feature of sampling techniques. That is absolutely one of the fundamental criteria to look for when you’re evaluating whether a sample survey is, in fact, a good one.

Learn More: Samples—The Few, The Chosen

Randomness is a basic ingredient of essentially all of the standard statistical techniques, and the reason it’s an ingredient is that the analysis of randomness through probability allows us to apply mathematics to understanding the results that we get.

The most basic way to get an accurate sample is to take what’s called a simple random sample. As the name implies, you simply take the entire population you’re interested in, decide how many people you want to survey, randomly select them from that group, and then get the answer from each member of that selected sample.
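As a sketch of the idea, Python's standard library can draw a simple random sample directly. The population and proportions below are hypothetical, purely for illustration:

```python
import random

def simple_random_sample(population, k, seed=None):
    """Draw a simple random sample: every subset of size k of the
    population is equally likely to be chosen."""
    return random.Random(seed).sample(population, k)

# Hypothetical population of 10,000 voters, 60% favoring candidate A.
population = ["A"] * 6000 + ["B"] * 4000
sample = simple_random_sample(population, 1000, seed=42)

# With a random sample, the sample proportion tends to land near
# the true population proportion of 0.6.
print(sample.count("A") / len(sample))
```

Because the selection is random, probability theory can quantify how far the sample proportion is likely to stray from the population's 60 percent.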

Of course, there are lots of problems in getting the answer from that selected sample. But the simple random sample is the gold standard for finding a representative sample.

Which would you say is bigger: the complete works of Shakespeare or an ordinary DVD? The complete works of Shakespeare fit in a big book, of roughly 10 million bytes. But any DVD, or any digital camera, for that matter, will hold upwards of four gigabytes, which is 4 billion bytes. A DVD is 400 times bigger. All the printed words in the Library of Congress would be 10 trillion bytes, 10 terabytes. That’s one very large wall full of DVDs, but it’s also about the size of a single high-end personal hard drive. That is, you might carry all the books in the Library of Congress on a single device the size of just one book.

Big Data: How Data Analytics Is Transforming the World – Stream it now on The Great Courses Plus

And data is not merely being stored: We access a lot of data over and over. Google alone returns to the web each day to process another 20 petabytes. What’s that? It’s 20,000 terabytes, 20 million gigabytes, 20 quadrillion bytes. How big do you want to go? Google’s daily processing gets us to one exabyte every 50 days. And 250 days of Google processing may be equivalent to all the words ever spoken by humankind to date, which have been estimated at five exabytes. And nearly one thousand times bigger is the entire content of the World Wide Web, estimated at upwards of one zettabyte, which is 1 trillion gigabytes. That’s 100 million times larger than the Library of Congress. Of course, there is a great deal more that is not on the web.

But let’s turn to the velocity of data. Let’s start a clock, to see what this feels like. Not only is there a lot of data, it’s coming at very high rates. High-speed Internet connections offer speeds 1,000 times faster than dial-up modems connected by ordinary phone lines. Here are some things that are happening every minute of the day. YouTube users upload 72 hours of new video content. In the United States alone, there are 100,000 credit card transactions. Google receives over 2 million search queries. And 200 million email messages are sent. It can be hard to wrap one’s mind around such numbers. How much data is being generated? Let’s turn to Facebook. In only 15 minutes, the number of photos uploaded to Facebook is greater than the number of photographs stored in the New York public photo archives. That’s every 15 minutes! Now think about the data over a day, a week, or a month.

Finally, there is variety. One reason for this can stem from the need to look at historical data, because data today may be more complete than data of yesterday. The cost of a gigabyte in the 1980s was about a million dollars, so a smartphone with 16 gigabytes of memory would have been a $16 million device. Today, someone might comment that 16 gigabytes really isn’t that much memory. This is why yesterday’s data may not have been stored at all, or may not have been stored in as suitable a format as what can be stored today. Now, consider satellite imagery. The images come in a large variety of aspect ratios. While I know that a satellite image will contain pixels, I don’t necessarily know what is in the picture, or not in the picture. I don’t necessarily know where to look. I may not even know what to look for.

So, we stand in a data deluge that is showering large **volumes** of data at high **velocities** with a lot of **variety**. With all this data comes information, and with that information comes the potential for innovation. Steve Jobs, charismatic co-founder of Apple, was diagnosed with pancreatic cancer in 2003. He became one of the first people in the world to have his entire DNA sequenced, as well as that of his tumor. It cost him a six-figure sum, but now he had his entire DNA. Why? When doctors pick medication, they hope the patient’s DNA is sufficiently similar to that of the patients in the drug trial. Steve Jobs’s doctors knew his genetic makeup and could carefully pick treatments. When one treatment became ineffective, they could move to another. While Jobs eventually died from his illness, having all the data and all that information added years to his life.

We all have immense amounts of data available to us every day. Search engines almost instantly return information on what can seem like a boundless array of topics. For millennia, humans have relied on each other to recall information. The Internet is changing that, and how we perceive and recall details in the world. Human beings tend to distribute information through what is called a transactive memory system, and we used to do this by asking each other. Now, we also have lots of transactions with smartphones and other computers. They can even talk to us. In a study covered in *Scientific American*, Daniel Wegner and Adrian Ward discuss how the Internet can deliver information quicker than our own memories can. Have you tried to remember something while a friend types it into a smartphone, gets the answer, and, if it is a place, already has directions? In a sense, the Internet is an external hard drive for our memories.

So, we have a lot of data, with more coming. We aren’t just interested in the data; we are looking at data analysis, and we want to learn something valuable we didn’t already know. For example, UPS must decide on a delivery route for packages to save time and gas. Consider 20 drop-off points; which route is the best? Seems simple enough, but checking all possible routes isn’t that easy. You have 20 choices for the first stop, 19 for the second, and so forth. In all, there are about 2 times 10 to the 18th power possible routes. How big is that number? It’s about five times the estimated age of the universe in seconds. Clearly, we aren’t checking that number of combinations on a computer each time a driver needs a route. Keep in mind, that’s only 20 stops.
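The count of routes is just 20 factorial, which is easy to verify. A quick sketch (the universe-age comparison uses my own rough figure of 13.8 billion years):

```python
import math

# Orderings of 20 drop-off points: 20 * 19 * ... * 1 = 20!
routes = math.factorial(20)
print(routes)  # 2432902008176640000, about 2.4 x 10^18

# For scale: the universe is roughly 13.8 billion years old.
universe_age_seconds = 13.8e9 * 365.25 * 24 * 3600
print(routes / universe_age_seconds)  # a bit over 5 "universe-ages in seconds"
```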

UPS has about 55,000 drivers every day. Until recently, UPS drivers had a general route to follow. It allowed for decisions on the part of the driver. UPS now has a program called ORION, or On-Road Integrated Optimization and Navigation, to help. It uses math to decide on routes. The routes can be counterintuitive but save time in the end. It doesn’t find *the* best route, but a lot of research has been done to find good solutions to this problem. Keep in mind, UPS has a harder problem than simply finding a route to save time. They also must consider other variables, like promised delivery times. How much can this save? Consider these two numbers. Thirty million dollars: that’s the cost to UPS per year if each driver drives just one more mile each day than necessary. Eighty-five million: the number of miles the analytics tools of UPS are saving per year. Data analysis doesn’t always involve exploring a data set that is given. Sometimes, questions arise and the data hasn’t even been gathered. Then, the key is knowing what question to ask, and what data to collect.

As an example, let’s join Oren Etzioni on a flight from Seattle to Los Angeles for his younger brother’s wedding. Wanting to save money, Oren bought his ticket months before the “I dos” were said. During the flight, Oren asked neighboring passengers about their ticket price. Most had paid less, even though many had bought their tickets later. For some of us, this might simply tell us not to worry so much about choosing close to the date of a flight. But Oren was Harvard’s first undergraduate to major in computer science. He graduated in 1986. To him, this was a problem for a computer to solve. He’d seen the world this way before. He helped build MetaCrawler, which was one of the first search engines. InfoSpace bought it. He made a comparison-shopping website, also snatched up. Another startup was bought by Reuters.

So, Oren started with 12,000 price observations grabbed by his computer programs from a travel website over 41 days. He ended up with something that could save customers money, and not just by comparing current prices. It didn’t know why airlines were pricing the way they did, but it could help predict whether fares were more likely to go up or down in the near future. When it became a venture capital-backed startup called Farecast, it began crunching 200 billion flight-price records. Then? Microsoft bought it in 2008, for $110 million, and integrated it into the Bing search engine. What made it possible to predict future fares? Data—lots of it. How big and what’s big enough depends, in part, on what you are asking and how much data you can handle. Then, you must consider how you can approach the question. UPS can’t look for the optimal answer. But they can save millions of dollars finding much better answers. Again, they can do this by asking questions only answerable with the data that is streaming in and available in today’s data explosion.

Taught by Professor Tim Chartier, Ph.D.

Photo of Steve Jobs: Matthew Yohe [CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0) or GFDL (http://www.gnu.org/copyleft/fdl.html)], via Wikimedia Commons

Imagine an experiment in randomness. Take a coin and flip it 200 times, and each time record whether it’s a heads or a tails, putting down Hs for the heads and Ts for the tails. Now, suppose you ask a person to just write down a random list of 200 Hs and Ts, and you put up both lists on a blackboard, one made by actually flipping a coin, and the other made by a human. Even though they may both look like an ocean of Hs and Ts, there is a way to tell which one is truly random, and which is human generated.

Learn more: Our Random World—Probability Defined

The thing to do is look for strings of long sequences where there are all Hs in a row or all Ts in a row. In the 200 Hs and Ts generated by randomly flipping a coin, you might see at least four or five long sequences of Hs or Ts: six Hs in a row here, five Ts there—a lot of streaks of many things in a row.

Now consider the list generated by the human being. How often will a human being write more than four strings of the same letter in a row when they’re trying to be random? Well, we sort of resist this, because we don’t think that’s very random. People think you’ve got to sort of alternate, H-T-H-T, and so in a human-generated list you would see very few long strings of Hs or Ts in a row.

This is a transcript from the video series *What Are the Chances? Probability Made Clear*. Watch it now on The Great Courses Plus.

As a matter of fact, when you flip a coin 200 times, the probability of having at least one string of six or longer of Hs or Ts is roughly 96 percent—very likely. The probability of having at least one string of five is 99.9 percent—it’s essentially certain. You’d be very unlikely to flip a coin that many times without getting these long strings, and if you actually simulate this on the computer, you’ll see that this plays out, that you just almost always get long strings.
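These figures are easy to check with a Monte Carlo simulation. A minimal sketch (my own illustration, not the lecture's code; the estimate varies slightly from run to run):

```python
import random

def has_streak(flips, length):
    """True if the sequence contains a run of `length` (or more)
    identical outcomes in a row. Assumes length >= 2."""
    run = 1
    for prev, cur in zip(flips, flips[1:]):
        run = run + 1 if cur == prev else 1
        if run >= length:
            return True
    return False

def estimate_streak_probability(n_flips=200, streak=6, trials=20_000, seed=1):
    """Monte Carlo estimate of the chance that n_flips fair-coin
    tosses contain a run of `streak` identical results."""
    rng = random.Random(seed)
    hits = sum(
        has_streak([rng.choice("HT") for _ in range(n_flips)], streak)
        for _ in range(trials)
    )
    return hits / trials

# Should land near the lecture's figure of roughly 96 percent.
print(estimate_streak_probability())
```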

One of the common misconceptions that a lot of people have about randomness is illustrated by the coin flipping experiment. Let’s say that you flip a coin many times, and just randomly it happened that 10 times in a row you got heads. Well, doesn’t it seem like the next time it’s more apt to be a tails? It does to most people. And the answer is, of course, that the coin doesn’t know what it’s just done. To the coin, every flip is a new flip, and after 10 heads in a row it’s just as likely to come up heads as it was on the very first flip.

To demonstrate this, you can simulate the following experiment: take a coin and, more than a million times, flip it 11 times. Obviously you do this with a computer. Computers are great, by the way; they don’t care. A million times? They’ll just go ahead and do it. To make the arithmetic easy, you repeat the 11-flip experiment 1,024,000 times, because 1 in 1,024 is the probability of getting 10 heads in a row (2 to the 10th power is 1,024). In other words, if you repeat the 11-flip experiment 1,024,000 times, you expect the first 10 flips to all be heads about 1,000 times.

Learn more: Probability Is in Our Genes

So you run the computer simulation a first time, and the number of times you get 10 heads in the first simulation is 1,008: extremely close to 1,000. What happened to the 11th coin? Well, 521 times it turned out to be a head also, and 487 times it turned out to be a tail. There’s no memory. Approximately half the time heads, half the time tails.

If you do it again, the first 10 might be heads 983 times, and then the 11th flip heads 473 times and tails 510 times. During a third experiment, 1,031 times it came out heads 10 times in a row, and of those, 502 had the next coin be a heads, and 529 a tails. The coin has no memory. After it’s gotten 10 heads in a row, it’s just as likely to be heads the next time as it was the first time you flipped that coin.
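The memoryless behavior described above can be reproduced with a short simulation. A sketch of my own (not the lecture's program; the trial count here is scaled down, and counts vary from run to run):

```python
import random

def eleventh_flip_after_ten_heads(trials, seed=0):
    """Repeat the 11-flip experiment `trials` times; among the runs
    whose first 10 flips are all heads, tally the 11th flip."""
    rng = random.Random(seed)
    heads_11th = tails_11th = 0
    for _ in range(trials):
        first_ten_all_heads = all([rng.random() < 0.5 for _ in range(10)])
        eleventh_is_head = rng.random() < 0.5
        if first_ten_all_heads:
            if eleventh_is_head:
                heads_11th += 1
            else:
                tails_11th += 1
    return heads_11th, tails_11th

h, t = eleventh_flip_after_ten_heads(trials=102_400)
# Roughly 100 runs open with 10 straight heads (102,400 / 1,024),
# and flip 11 splits near 50/50 between heads and tails.
print(h, t)
```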

There is another counterintuitive aspect of probability, and it’s really interesting to think about what is rare, and how we view rarity in probability. Suppose you got dealt the following hand: the two of spades, the nine of spades, the jack of clubs, the eight of spades, and the five of hearts. It probably doesn’t strike you as an impressive hand—not one you’d write home about—but it is. One out of 2,598,960—that’s the probability of getting that hand.

Now if you were dealt the ace, king, queen, jack, ten of spades—a royal flush in spades—what’s the probability of getting this royal flush in spades? Exactly the same—1 out of 2,598,960—and yet you would write home to your mother about this hand for sure. Your previous hand was just an average hand, and yet in your whole life of playing cards, you know what? You will probably never get that hand again, because its probability is almost zero—1 out of 2,598,960. So this is one of the counterintuitive concepts of probability: that rare events happen all the time, but you may not recognize them as significant.
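The 2,598,960 figure is simply the number of distinct 5-card hands from a 52-card deck, a binomial coefficient we can check directly:

```python
import math

# Number of distinct 5-card poker hands: "52 choose 5".
hands = math.comb(52, 5)
print(hands)  # 2598960

# Every specific hand—junk or royal flush—has the same chance of being dealt.
p = 1 / hands
```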

Learn more: Probability Everywhere

Rare events absolutely happen by chance alone. The most common rare event that you see mentioned in the newspapers every day is the lottery. The probability of winning the Powerball jackpot—the big multistate lottery—is approximately 1 out of 146,000,000. That chance is so remote you’d think it would never happen; but it happens regularly. Why? Because a lot of people try. A lot of people buy random numbers, and occasionally some of them win. If you try something rare often enough, it will actually come to pass.
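The 1-in-146,000,000 figure matches the Powerball format of the mid-2000s—5 white balls drawn from 55, plus 1 red Powerball from 42. (That format is an assumption on our part; the game’s rules have changed several times, and each change moves the odds.)

```python
import math

# Assumed mid-2000s Powerball format: choose 5 of 55 white balls,
# then 1 of 42 red balls. Jackpot odds are 1 in C(55,5) * 42.
jackpot_odds = math.comb(55, 5) * 42
print(jackpot_odds)  # 146107962 — roughly 1 in 146 million
```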

This concept—that rare things will actually happen if you repeat them enough and you look for them enough—was encapsulated in an observation that was first made by the astronomer Sir Arthur Eddington in 1929, and he was describing some features of the second law of thermodynamics. He wrote the following:

If I let my fingers wander idly over the keys of a typewriter it might happen that my screed made an intelligible sentence. If an army of monkeys were strumming on typewriters they might write all the books in the British Museum. The chance of their doing so is decidedly more favourable than the chance of the molecules returning to one half of the vessel.

However, you can find patterns in random writing, and in fact an enterprising author made a lot of money a few years ago when he wrote *The Bible Code*. What the author did was take the Bible, written in Hebrew, and search for words spelled out by reading every nth letter—skipping a fixed number of letters at a time. One example was “Atomic holocaust Japan 1945.” He said that this was an example of how the Bible showed the future.

The truth is that this is just a matter of probability. If you take all possible sequences of different lengths, you can by randomness alone find surprising things, and just to demonstrate it, people debunking this analysis found patterns in *War and Peace* and so on. This is another challenging part of probability, namely that if you look for rare things but you have a lot of places to look, you’ll tend to find them.
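The skip-pattern idea is easy to sketch. The hypothetical helper `find_skips` below looks for a word as an equidistant letter sequence; given a long enough text and enough skip lengths to try, short words turn up by chance alone:

```python
import random
import string

def find_skips(text, word, max_skip):
    """Find (start, skip) pairs where word appears as an equidistant
    letter sequence, i.e. text[start::skip] begins with word."""
    hits = []
    for skip in range(1, max_skip + 1):
        for start in range(len(text) - (len(word) - 1) * skip):
            if all(text[start + i * skip] == ch for i, ch in enumerate(word)):
                hits.append((start, skip))
    return hits

# A planted example: reading every 2nd letter of "xcxaxtx" spells "cat".
assert find_skips("xcxaxtx", "cat", 3) == [(1, 2)]

# Random text: with about a million (start, skip) pairs to try, a given
# 3-letter word is all but guaranteed to appear somewhere by chance.
rng = random.Random(0)
text = "".join(rng.choice(string.ascii_lowercase) for _ in range(20_000))
print(len(find_skips(text, "cat", 50)))  # typically dozens of hits
```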

These are some of the challenges of looking at and asking what is random in the world.

In probability, randomness refers to events that occur in no apparent order and are not causally related.

True randomness means that something unfolds purely by chance rather than intentionality, free from human interference.

Cryptography, gambling, statistical sampling, and computer simulation all rely on random number generators.

Many people claim that they can “outsmart” the lottery or predict winning combinations. People even sell tools to this aim, but these tools are most likely a waste of money. To the best of anyone’s knowledge, the process of choosing the winning lottery numbers operates on the principle of randomness.

The earliest known example of cryptography was found in Egyptian hieroglyphics around 2500 B.C.E. This may have been more for amusement than actually for secret communication. The earliest simple substitution ciphers, known as monoalphabetic substitution ciphers, may have been used by Hebrew scholars around 550 B.C.E.

In a monoalphabetic substitution cipher, one letter is substituted for another.

In a monoalphabetic substitution cipher, one letter is substituted for another. For example, every occurrence of the letter A might be replaced by the letter W, all Bs might be replaced by the letter M, all Cs might get replaced by the letter R, and so forth, down the alphabet. With this particular scheme that I just made up, the word CAB would be encoded to read “RWM.”
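A minimal sketch of that made-up substitution—only the three letters mentioned are mapped here, whereas a real cipher would permute all 26:

```python
# Partial monoalphabetic substitution from the example:
# A -> W, B -> M, C -> R.
table = str.maketrans({"A": "W", "B": "M", "C": "R"})
assert "CAB".translate(table) == "RWM"

# Decoding just runs the same substitution in reverse.
inverse = str.maketrans({"W": "A", "M": "B", "R": "C"})
assert "RWM".translate(inverse) == "CAB"
```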

A Caesar cipher is a special case of the monoalphabetic substitution cipher in which each letter is replaced by the letter a fixed number of positions down the alphabet. For example, if we replace A by C—notice that C is two letters away from A—then B would be replaced by two letters away from it, which would be D; C would be replaced by two letters away, which would be E; and so forth, all the way down to Z. When we get to Z, we come back to the beginning of the alphabet; so for Z, we go two letters later, which would be B. The Caesar cipher is named after Julius Caesar, who, in the 1^{st} century B.C.E., used such a cipher with a shift of three to communicate with his generals. Such monoalphabetic encryption schemes are very easy to break.
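The Caesar cipher’s wrap-around shift can be sketched with modular arithmetic (the function name `caesar` is ours, not standard):

```python
import string

ALPHABET = string.ascii_uppercase

def caesar(message, shift):
    """Replace each letter by the letter `shift` positions later,
    wrapping from Z back around to A."""
    return "".join(ALPHABET[(ALPHABET.index(ch) + shift) % 26] for ch in message)

assert caesar("A", 2) == "C"
assert caesar("Z", 2) == "B"                        # wraps past the end
assert caesar("CAB", 3) == "FDE"                    # Caesar's shift of three
assert caesar(caesar("ATTACK", 3), -3) == "ATTACK"  # decoding reverses the shift
```

The last line is exactly why such ciphers are easy to break: knowing how to encode tells you immediately how to decode.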

In the basic ciphers, to decode an encrypted message, one reverses the encryption process. Thus, if people know how to encode a message sent to us, then they also have the power to decode other messages.

Wouldn’t it be great to have a coding scheme such that when people use it to send us messages, the encoding process is easy for them to use while at the same time, we’re absolutely certain that we’re the only ones able to decode the messages?

In this fantasy cipher, we wouldn’t have to trust our friends at all. If they lose the codebook and it gets into the wrong hands, it would not jeopardize the coding scheme. In other words, in our fantasy cipher, knowing how to encode messages would not provide any information as to how to decode it.

If this fantasy were real, then there would be no need to keep the encoding process a guarded secret. Instructions describing how to encode messages could be made public, and only the decoding process would need to be kept secret. In fact, in this fantasy, the encoded messages themselves could be made public as well. Our friends could take out ads in *The New York Times* with an encrypted message directed to us. Everyone would see it, but we’d be the only people who would know how to decode it.

The problem is, if a nemesis of ours sees a secret message sent, why couldn’t he take the encryption process—which we ourselves made public—and just run that process backward to decode the message made just for us? This is a problem. To make this fantasy a reality, we would need to have a secret hidden within the public encryption process. So, even though we make this process public, there’s something secret.

Such amazing ciphers are known as public key codes, because the key for encryption is made public.

We’re now ready to apply number theoretic concepts to show that just such a crypto-fantasy can be a reality. The main question remains: How can the encryption scheme be at once public—everyone knows how to encode messages—and private—only the rightful receiver can decode the messages? Such amazing ciphers are known as public key codes, because the key for encryption is made public.

We’ll make our fantasy a reality by combining the concepts of prime numbers together with modular arithmetic in an extremely clever and elegant way. We’ll begin with a metaphor that captures the idea of this modern encryption scheme. Take a brand-new deck of 52 playing cards. If you were to take it and perform eight perfect shuffles, also known as faro shuffles—you cut the deck exactly in half: 26 and 26—and then shuffle without making a mistake. If you make eight perfect shuffles, then look at the cards, magically, they’ll return to their original order. It’s absolutely amazing, and I urge you to try this for yourself, but if you try it, you have to be able to perform eight perfect shuffles in a row.

Suppose now that we performed just five perfect shuffles: the order of the cards would look thoroughly mixed up, without any semblance of pattern or structure. However, we know a systematic method that would return this mess back to a familiar, less chaotic pattern. We’d perform three more shuffles, bringing the number of shuffles up to eight, and voilà—the cards are transformed from a random mess back to their original order.
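Both shuffle claims are quick to verify in code. This sketch models a perfect “out” faro shuffle—cut the deck exactly 26/26 and interleave, keeping the top card on top:

```python
def faro_shuffle(deck):
    """One perfect out-shuffle: cut exactly in half, then interleave."""
    half = len(deck) // 2
    top, bottom = deck[:half], deck[half:]
    return [card for pair in zip(top, bottom) for card in pair]

deck = list(range(52))

# Five shuffles leave the deck looking thoroughly mixed...
mixed = deck
for _ in range(5):
    mixed = faro_shuffle(mixed)
assert mixed != deck

# ...but three more (eight in all) restore the original order.
restored = mixed
for _ in range(3):
    restored = faro_shuffle(restored)
assert restored == deck
```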

If anyone looks at them, it looks jumbled, but we know exactly what to do.

Notice that we could employ this shuffling idea to produce an encryption scheme. Our friend could write her message to us, one letter on each card; so she could say: M, A, T, H, and so forth. Then she would just shuffle a certain pre-agreed amount of times. Let’s say five. So she performs five perfect shuffles, and then she delivers the deck of cards to us. If anyone looks at them, it looks jumbled, but we know exactly what to do. We would shuffle three times and then we would be able to read the message. Of course, if we were to use this encryption scheme, anyone sending us a coded message could decode any other message sent to us as easily as we could. Easy, assuming that we can do perfect shuffles. To have such a scheme truly fulfill our encryption fantasy, we would need to first figure out how to mathematically shuffle our message and then how to make that shuffling process public without allowing others to unshuffle our message.

Here’s the moment where we introduce our number theory. The public feature arises from the fact that factoring extremely large natural numbers is impossible, for all practical purposes, despite the reality that we know that such a factorization is possible in theory. So now we’re going to start to make a distinction between practice and theory.

To see the basic idea behind this public-versus-secret dichotomy, suppose that someone announced the number 6 and also revealed a secret. The secret is that this number is the product of exactly two primes. Can we uncover the secret? Of course:

6 = 2 × 3. There. In some sense, we just broke the code. What if, instead of 6, the announced number that’s the product of two primes was 91? Can we break this code? With some thought, maybe a little bit of arithmetic, we could figure out that 91 is 7 × 13, and thus we’ve broken this code as well, although it took us a little bit longer.

What if the announced number was 2,911? Can we break this code? No, not so easily. But if we use a calculator or a computer, we’d be able to discover that 2,911 equals 41 × 71, and we’ve broken that code, too.

What if the announced number was a 100-digit number? For all practical purposes, even knowing that this number is, in fact, a product of exactly two primes, we would have no way of determining what the two factors are. In fact, even computers have limits to the size of numbers that they can factor. In this way, notice that we can both announce a piece of information publicly—namely, this enormous number—and yet, from a practical point of view, within that public information is a secret that only we, as the receiver, know.
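Breaking these toy codes is nothing more than trial division, as in the sketch below; the whole point is that the same loop is hopeless for a 100-digit product of two primes:

```python
def two_prime_factors(n):
    """Recover p and q when n is known to be a product of two primes.
    Trial division: fine for small n, utterly impractical for 100 digits."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d, n // d
        d += 1
    raise ValueError("n is prime")

assert two_prime_factors(6) == (2, 3)        # broken at a glance
assert two_prime_factors(91) == (7, 13)      # broken with a little arithmetic
assert two_prime_factors(2911) == (41, 71)   # broken with a calculator
```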

This reality is how individuals will be able to announce an encryption process without revealing the decryption process. To encrypt messages, people need only use the huge natural number. However, to decrypt or decode an encoded message, the receiver will need the prime factors of that huge number, which, for all practical purposes, is a true secret.

Taught by Edward B. Burger, Southwestern University

The focus of this article is on sums. You will learn how to quickly add all the numbers up to 1000 and back down, learn about sums of odd numbers and of even numbers, and even establish Galileo’s results on ratios of sums of numbers—all through the use of a single picture.

We can circle groups of dots in pictures to make sense of division. For example, the division problem 18 ÷ 3 is asking the following question: How many groups of 3 can you find in a picture of 18 dots? There are 6 of them, so 18 ÷ 3 = 6.

We can push this visual picture further and make sense of some complicated division problems. For example, what is 808 ÷ 98? We can see that the answer has to be 8 with a remainder of 24.

You can imagine looking for groups of 100, rather than 98. (The number 98 is awkward to work with directly.) If we visualize this, we see that there will be 8 of these groups, with 8 dots left over.

But each group of 100 is itself off by 2 dots—we wanted groups of 98—so we have an extra 16 dots floating around. That makes for 8 groups of 98 and a remainder of 16 and 8, which is equal to 24 dots. Therefore, 808 ÷ 98 = 8 with a remainder of 24.
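The dot-grouping argument matches what integer division reports, and we can mirror the “groups of 100” reasoning step by step:

```python
# Direct check: 808 = 98 * 8 + 24.
assert divmod(808, 98) == (8, 24)

# The mental method: look for groups of 100 instead of 98.
groups, left_over = divmod(808, 100)   # 8 groups, 8 dots left over
extra = 2 * groups                     # each group of 100 overshoots 98 by 2 dots
assert (groups, left_over + extra) == (8, 24)
```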

When asked to do 34 − 18, we can certainly do the traditional algorithm and get the answer, 16.

But can’t we just see in our minds that the answer has to be 2 + 10 + 4, which is 16? Line up a row of 34 blocks and a row of 18 blocks side by side.

Now we can see that the 2 rows differ by 2 and 10 and 4 blocks, so the difference is 16.

In the same way, 1012 − 797 has to be 3 and 200 and 12—which is 215. From 797 to 800 is 3, from 800 to 1000 is 200, and there is an extra 12, for a total of 215.

This flexibility of thought helps with subtraction in general. For example, consider 1005 − 387.

We have a lot of borrowing to do if we follow the traditional approach: 5 − 7, 0 − 8, and 0 − 3 all need borrows.

But we can make this work simpler.

We are looking for the difference between 1005 dots and 387 dots. Let’s make 1005 friendlier and turn it into 1000. Remove 5 from each and just compute the difference between 1000 and 382 instead. Now we can see the answer: 8 + 10 + 600, or 618.

But if we still want to do the traditional algorithm, then we can remove 1 more dot from each pile and make the problem 999 − 381.

Now we can do the algorithm without any borrows: 9 − 1, 9 − 8, and 9 − 3. This way, we’ve made the problem much easier to do, even if someone insists that we use the algorithm.
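The shift-both-piles trick is easy to sanity-check: removing the same number of dots from both piles never changes the difference. (The helper name `shift_both` is ours, purely for illustration.)

```python
def shift_both(a, b, k):
    """Subtracting k from both numbers leaves the difference unchanged."""
    return a - k, b - k

a, b = shift_both(1005, 387, 6)        # turn the problem into 999 - 381
assert (a, b) == (999, 381)
assert a - b == 1005 - 387 == 618      # no borrows needed in 999 - 381
```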

Isn’t multiplication really a geometry problem? Isn’t 24 × 13, for example, just asking for the area of a rectangle that is 24 units wide and 13 units high?

Then why not just chop up the rectangle into pieces that are manageable? For example, think of 24 as 20 and 4, and 13 as 10 and 3.

Then we see that 24 × 13 must be the areas of the individual pieces added together: 200 + 40 + 60 + 12 = 312.
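The chopped-rectangle picture translates directly into four partial products:

```python
# 24 x 13 as a rectangle chopped into four manageable pieces:
# (20 + 4) wide by (10 + 3) high.
pieces = [20 * 10, 4 * 10, 20 * 3, 4 * 3]   # 200, 40, 60, 12
assert sum(pieces) == 24 * 13 == 312
```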

In a 5-by-5 grid of squares, there are 25 small 1×1 squares within the grid. But we can count 2×2 squares as well. There are 16 of these in total. If we count the 3×3 squares, there turns out to be 9 of those. And there are 4 of the 4×4 squares. Finally, there is 1 large 5×5 square.

So, there are 25 1×1 squares, 16 2×2 squares, 9 3×3 squares, 4 4×4 squares, and 1 5×5 square. Each count of squares is itself a square number!

Why does counting squares on a square grid give square-number answers? Let’s focus on the lower-left corners of the squares we’re counting. For example, some possible lower-left corners of the 2×2 squares can be seen in figure 1.14.

Let’s draw all of the possible lower-left corners. Now we see that there is a square array of them, 4 × 4 of them, which is 16. Thus, there are 16 2×2 squares.
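The lower-left-corner argument gives a one-line count for every square size, which we can check against the tallies above:

```python
def count_squares(n, k):
    """Count the k-by-k squares in an n-by-n grid by counting their
    possible lower-left corners: an (n-k+1)-by-(n-k+1) array of them."""
    return (n - k + 1) ** 2

# In a 5-by-5 grid the counts are the square numbers 25, 16, 9, 4, 1.
assert [count_squares(5, k) for k in range(1, 6)] == [25, 16, 9, 4, 1]
```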

Let’s view the 5‑by‑5 grid as an array of dots as in figure 1.16. This is certainly a picture of 25 dots, but can you see in this picture the sum 1 + 2 + 3 + 4 + 5 + 4 + 3 + 2 +1?

Look at the diagonals: 1, 2, 3, 4, 5, 4, 3, 2, 1.

The sum we seek matches the diagonals of the square. There are 25 dots in all, so without doing any arithmetic, we can say that the value of the sum must be 25.

What is the sum of all the numbers 1 + 2 + 3 + … up to 10 and back down again?

This sum must come from the diagonals of a 10‑by‑10 array of dots. Again, without any arithmetic, the value of the sum must be 10 squared (10^{2}): 100.

What is the sum of all the numbers from 1 to 1000 and back down again? It must be 1000 squared, from a 1000-by-1000 array of dots. That’s 1 million.

If you were to compute this on a calculator—1 + 2 + 3 + …—it would take forever. But the answer is available to us quickly via this picture.

1 + 2 + 3 + … + 998 + 999 + 1000 + 999 + 998 + … + 3 + 2 + 1 = 1000 × 1000 = 1,000,000
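The diagonal picture makes the claim instant, but a brute-force check (the slow way a calculator would do it) confirms it:

```python
def up_and_down(n):
    """1 + 2 + ... + n + ... + 2 + 1, summed the slow way."""
    return sum(range(1, n + 1)) + sum(range(n - 1, 0, -1))

# The diagonals of an n-by-n dot array say the answer is n squared.
assert up_and_down(5) == 25
assert up_and_down(10) == 100
assert up_and_down(1000) == 1_000_000
```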

There is a general formula for the sum of numbers.

1 + 2 + 3 + … + *n* = (*n*^{2} + *n*) ÷ 2

The sum of the first *n* numbers, 1 + 2 + 3 all the way up to some number *n*, is (*n*^{2} + *n*) ÷ 2. For example, the sum of the first 5 numbers, 1 + 2 + 3 + 4 + 5, is 5^{2} + 5 = 25 + 5 = 30, and 30 ÷ 2 = 15. And we can check that 1 + 2 + 3 + 4 + 5 is indeed 15.
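The closed form is easy to test against the brute-force sum for many values of *n* at once:

```python
def sum_to(n):
    """Closed form for 1 + 2 + ... + n."""
    return (n * n + n) // 2

assert sum_to(5) == sum(range(1, 6)) == 15
assert all(sum_to(n) == sum(range(1, n + 1)) for n in range(1, 200))
```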

Where does this formula come from, and why is it true?

Our 5×5 array of dots gave us something akin to this result. We have that 1 + 2 + 3 + 4 + 5 + 4 + 3 + 2 + 1 = 25. Can we get from this answer to just 1 + 2 + 3 + 4 + 5?

If we look at what we have, we see that the sum we want, 1 + 2 + 3 + 4 + 5, is the left half of the equation.

1 + 2 + 3 + 4 + 5 + 4 + 3 + 2 + 1 = 25

Actually, half is not quite right. The right portion of the equation is missing a 5. It’s just the sum 1 + 2 + 3 + 4. We want to see an additional 5, so let’s add a 5 on the left—and to keep things balanced, we need to add a 5 to the right as well.

1 + 2 + 3 + 4 + 5 **+ 5** + 4 + 3 + 2 + 1 = 25 **+ 5**

Now we see 2 copies of what we want. Twice the sum we seek is 25 + 5. So, this means that the sum itself is half of this. 1 + 2 + 3 + 4 + 5 is indeed (5^{2} + 5) ÷ 2. And this matches the general formula. There is nothing special about the number 5. The same ideas show that the sum of the first *n* counting numbers must be half of *n*^{2} + *n*.

2 × (1 + 2 + 3 + 4 + 5) = 25 + 5

1 + 2 + 3 + 4 + 5 = (25 + 5) ÷ 2

Look at the 5-by-5 grid of dots again. Do you see the sum 1 + 3 + 5 + 7 + 9, the sum of the first 5 odd numbers?

We can certainly circle these groups randomly and make them fit.

But such a random picture isn’t enlightening. We want to see a picture that isn’t locked into this particular example of 25 dots. We want a picture that speaks to a higher truth and clearly holds for all possible square arrays. Mathematicians are always on the lookout for this sort of thing, and symmetry is often a pointer to higher truths.

Do you see 1 + 3 + 5 + 7 + 9 in the 5-by-5 array of dots in a way that speaks to a higher truth? Think L shapes.

The sum of the first 5 odd numbers is hidden in the 5-by-5 array as Ls. The sum 1 + 3 + 5 + 7 + 9 must be 5^{2}, or 25.

In the same way, the sum of the first 10 odd numbers, 1 + 3 + 5 + 7 + 9 + 11 + 13 + 15 + 17 + 19, sits in a 10-by-10 array of dots and therefore must be 100, the count of dots in that array.

In general, the sum of the first *n* odd numbers must be *n*^{2}.
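The L-shape picture predicts the answer for every *n*, and a direct sum confirms it:

```python
def sum_of_first_odds(n):
    """1 + 3 + 5 + ... + (2n - 1), summed directly."""
    return sum(range(1, 2 * n, 2))

# The L shapes in an n-by-n array say this must be n squared.
assert sum_of_first_odds(5) == 25
assert sum_of_first_odds(10) == 100
assert all(sum_of_first_odds(n) == n * n for n in range(1, 200))
```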

Galileo lived at the turn of the 17^{th} century and is revered today for his work in science and mathematics. He thought to make fractions out of the odd numbers. For example, take the first 5 odd numbers and use their sum for the numerator of a fraction and the sum of the next 5 odd numbers for its denominator. This gives a fraction that simplifies to 1/3.

Do the same for the first 2 odd numbers, followed by the next 2. You get 1/3 again.

Do it again for the first 10 odd numbers, and the next 10. It’s 1/3 again!

Galileo observed that all the fractions made out of the odd numbers this way are equal. They all equal 1/3. These fractions are today called the Galilean ratios. There is a connection between the ratios and the L shapes in squares. Figure 1.23 is a purely visual proof of the Galilean ratios.

The first 5 L shapes—the sum of the first 5 odd numbers—make 1 block of 25 dots. The next 5 L shapes, for the next 5 odd numbers, make 3 blocks of 25 dots. So the sum of the first 5 odd numbers is 1/3 the sum of the next 5 odd numbers.
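Exact fractions make the ratio check airtight for any *n* we care to try (the helper name `galilean_ratio` is ours):

```python
from fractions import Fraction

def galilean_ratio(n):
    """(sum of first n odd numbers) / (sum of the next n odd numbers)."""
    odds = list(range(1, 4 * n, 2))            # the first 2n odd numbers
    return Fraction(sum(odds[:n]), sum(odds[n:]))

# Every such ratio is exactly 1/3, whatever n we pick.
assert all(galilean_ratio(n) == Fraction(1, 3) for n in (2, 5, 10, 100))
```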

Are there results about sums of even numbers, too? For example, we have a picture for the sum of the first 5 odd numbers. Can we get from this a picture of the first 5 even numbers, 2 + 4 + 6 + 8 + 10?

Just add a dot to each L shape!

This has turned the 5-by-5 square into a rectangle. The sum of the first 5 even numbers must be the 5×5 we had before plus 5 more, 5^{2} + 5, which is 30.

In general, the sum of the first *n* even numbers must come from the picture of *n*^{2} dots plus an extra *n* dots: *n*^{2} + *n*.
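The add-a-dot-to-each-L argument checks out numerically as well:

```python
def sum_of_first_evens(n):
    """2 + 4 + 6 + ... + 2n, summed directly."""
    return sum(range(2, 2 * n + 1, 2))

# Adding one dot to each of the n L shapes turns n^2 into n^2 + n.
assert sum_of_first_evens(5) == 30
assert all(sum_of_first_evens(n) == n * n + n for n in range(1, 200))
```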

We’re coming full circle, because we have seen the expression *n*^{2} + *n* before.

Take the sum of the first 5 even numbers. It equals 5^{2} + 5.

Now divide everything by 2: 2 ÷ 2, 4 ÷ 2, 6 ÷ 2, 8 ÷ 2, 10 ÷ 2, and (5^{2} + 5) ÷ 2.

And we’re back to the formula 1 + 2 + 3 + 4 + 5 = (5^{2} + 5) ÷ 2.

We have come full circle. We’re back to the general formula for the sum of numbers.

- a. What is the sum of the first 1000 counting numbers?

b. What is the sum of the first 1000 odd numbers? (What is the thousandth odd number?)

c. What is the sum of the first 1000 even numbers?

2. Draw a picture to show that the sum of the first 3 odd numbers must be 1/8 the sum of the next 6 odd numbers.

- a. The sum of the first 1000 counting numbers, 1 + 2 + 3 + … + 1000, is (1000^{2} + 1000) ÷ 2 = 500,500.

b. The one-thousandth odd number is 1999, and the sum of the first 1000 odd numbers, 1 + 3 + 5 + … + 1999, is 1000^{2} = 1,000,000.

c. The sum of the first 1000 even numbers: 2 + 4 + 6 + … + 2000, is 1000^{2} + 1000 = 1,001,000. (Divide this by 2 and get back to the sum of the first 1000 counting numbers!)

- (See FIGURE 1.27.) In general, the sum of the first *n* odd numbers is 1/8 the sum of the next 2*n* odd numbers.

Taught by Professor James Tanton, Ph.D.

At the beginning of this game, I give you $100 and a button. Imagine that 100 viewers in 100 rooms across the country are reading this, and that each has been given $100, just like you, and a button, just like you. In a moment, I’m going to ask you to decide whether to push your button. That’s the only decision that you’ll have to make, and in doing so, you’ll be deciding upon a strategy.

In every game, every player has a strategy. If you’re a rational player, you’re going to try to adopt the strategy that will maximize your expected payoff given what you know—or think you know—about the other players in the game.

In every game, every player has a strategy. If you’re a rational player, you’re going to try to adopt the strategy that will maximize your expected payoff given what you know—or think you know—about the other players in the game. But you don’t know enough yet to know whether to push. What does it do? Pushing this button has two effects: one that affects you and one that affects everybody else. Actually, if nobody’s actions affected anyone else, it wouldn’t be a game. Games are interactive.

When you push your button, the first thing it does is to take $2 away from every other player. You push your button, and just like that, everyone else is down to $98. You still have your $100. Sounds rather vicious on your part. But other people may press their buttons, too, you know, and every time they do, you lose $2 along with everybody else. If 60 other people pushed their buttons, you’re going to lose 2 × 60, or $120. Given that I gave you only $100 to start with, you’re going to end up $20 in debt. You’ll have to pay up. Except that there’s a way out of this for you.

I said that pressing your button has two effects, and the second one targets you. If other people press their buttons and cause you damage, pushing your button will cut that damage in half. A moment ago, I said that if 60 people push, you lose $120, but if 60 other people push and you do, too, then you only lose $60. You’re still $40 to the good. You’ve done $2 damage to everybody else, but you’ve saved yourself $60. Are you going to push?
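The payoff rules described so far fit in a few lines, and writing them out exposes the key strategic fact—pushing weakly dominates not pushing (the function name `payoff` is ours, for illustration):

```python
def payoff(you_push, others_pushing):
    """Your final dollars in the button game: start with $100, lose $2
    per other player who pushes—halved if you pushed your own button."""
    damage = 2 * others_pushing
    if you_push:
        damage //= 2
    return 100 - damage

assert payoff(False, 60) == -20   # 60 pushers and you abstain: $20 in debt
assert payoff(True, 60) == 40     # push too and you're $40 to the good

# Whatever the others do, pushing never pays less than not pushing.
assert all(payoff(True, n) >= payoff(False, n) for n in range(100))
```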

You probably made some assumptions about this game, reasonable ones, as it turns out. You’ve assumed that everybody else’s button works the same way that yours does. It does because this game is symmetric; everyone is in the same boat. Also, you’ve probably assumed that everybody else has the same information that you do, that is, that the structure of the game is common knowledge to everyone. Actually, being common knowledge means quite a bit more than that. It’s not just that everybody knows the rules of the game; it’s that everybody knows that everybody knows the rules of the game, and everybody knows that everybody knows that everybody knows the rules of the game, and … you get the idea. And you’re right: Everyone knows the same information that you do.

I want you to think carefully now and decide what you’re going to do: push or not push.

A hundred people, I don’t know. Some of them are going to push. No matter what the other people do, I’m at least as well off pushing as not pushing.

You’re probably entertaining several different lines of thought right about now. One line of reasoning is this: We all know how the game works; it’s obvious. If nobody pushes the button, everybody gets $100. I might not even be concerned about being a nice person, but I don’t have to be. We can all get $100. I’d be crazy to push. That’s a good argument. A second line of argument, maybe even more compelling, is this: A hundred people, I don’t know. Some of them are going to push. No matter what the other people do, I’m at least as well off pushing as not pushing. If I don’t push, I could end up $100 in debt. If I push, at least I end up breaking even. Heck, I’m a good person, and if I’m thinking about pushing, I can imagine what the other people will do—I have to push in self-defense.

Here’s the third line: I’m not going to push. I’m not pushing because it’s the right thing to do in a moral sense. I could lose up to $100. I could go $100 in debt, but it’s worth it for the sake of my ethics.

Or you may decide: Eh, $100; it’s not that much money. It would be too much fun to just stir things up and see what happens. Push the button. Or you may have a competitive streak, and you know that if you don’t push, everybody who does will end up ahead of you. Maybe you don’t have much of a taste for being a chump. Of these five lines of reasoning, it’s interesting to know which, if any, actually are rational. That’s a question that we’ll be visiting over and over as this course goes on.

Okay, it’s time to decide. I really wish that I could tally the votes that are coming in in real time, but of course, I can’t. What I can and will do is tell you, after you make your choice, the results of similar games that I’ve played with other people. Make your choice and, please, state it out loud; keep yourself honest. Push or don’t push. Three, two, one; done.

With groups of strangers who have no training in game theory, generally somewhere between 30% and 70% of the people push the button. That’s a pretty wide range, but if you take the average, you get 50%.

With about 50 of the 100 players pushing, if you didn’t push, you lost 2 × 50 = $100—you’re now broke. If you pushed, the damage was cut in half, and you still have $50.

This might not make that much of an impact on you; after all, this was just a game. No, that’s the wrong way to say that. What I mean to say is, of course, this was just pretend, but the game is real. It’s not pretend. We’re not talking about child’s play here. We defined a set of possible moves by which players interacted with each other; they had common knowledge of the structure of the game; and they made rational decisions about strategies that led to their best expected payoff. These components—players, strategies, payoffs, and common knowledge—are what make a game a game in the game-theoretic sense. And if you change the context of this game by replacing the players with countries and by changing “pushing the button” to “being willing to engage in military conflict,” then we have something that is much more than just a diversion.

Later in this course, we’ll find out how game theory says this game should be played. But at the moment, what we know is how it is played. The variety of responses that we’ve seen in this game—between 30% and 70% pushing—show that one of two things must be the case: Either the theory of game theory isn’t sufficiently common knowledge that people are comfortable choosing rightly, or maybe this game is an inherently dangerous one. Maybe we need to find a way to keep pushing the button from being so tempting an option, because if 30% to 70% of the people in the nuclear version of this game decide to press the button, we’re all in for a very, very bad time.

In any case, the name “game theory” may be an unfortunate one. A more descriptive name would be “strategic interaction decision making.” Game theory sounds like child’s play, and it’s not.

Taught by Professor Scott P. Stevens, James Madison University