# Monkeys, typewriters and Shakespeare

When a concept of physics or mathematics is too difficult for the regular layman to understand, someone will come up with an analogy close to real life to describe this concept or phenomenon. If this analogy is catchy enough, it will then basically be engraved in stone, so to speak, and people will start repeating it blindly, without really understanding it.

The problem with analogies is that they are usually very weak representations of the physical or mathematical phenomena behind them. This means that they can usually be understood in the wrong way and all kinds of misconceptions may arise. Often the wording of the analogy is not precise enough and people get the wrong ideas.

Sometimes the analogies themselves are simply bad and do not correspond to their underlying theory very well, if at all. Sometimes analogies are simply wrong, period.

One widespread analogy which, in my opinion, at least borders on being wrong is the one about monkeys, typewriters and Shakespeare. There are slight variations of this analogy, changing the number of monkeys and the amount of time, but one typical version goes like this:

A million monkeys hammering a million typewriters for a million years will eventually write the entire works of Shakespeare.

Some versions try to fix the fallacy in that statement by replacing "a million years" with "an infinite amount of time".

What this analogy is actually trying to say is that, given a truly random, evenly distributed (uniform) random number generator, drawing more and more values from it pushes the probability that any given finite subsequence has appeared arbitrarily close to 1. (Note that this is not the same as "any given finite subsequence will eventually appear.")

However, the analogy is horribly flawed. There are numerous errors in it:

### A monkey is a bad random number generator

Choosing monkeys, of all things, for the analogy is an extremely poor choice. This is because "monkeys hammering typewriters" is a very poor random number generator.

The theory needs an evenly distributed random number generator. This means that all values have the same probability. An animal hitting a typewriter is probably one of the poorest choices for this. It may well be that some letters (like 'A') are never or very rarely hit while others are being hit almost constantly.
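To make the effect of bias concrete, here is a minimal sketch. The key probabilities are made up for illustration (nobody has measured a monkey here), and `phrase_probability` is a hypothetical helper name. It multiplies the per-key probabilities to get the chance that the next few independent keystrokes spell a given word.

```python
from math import isclose, prod

def phrase_probability(phrase, key_probs):
    """Chance that the next len(phrase) independent keystrokes spell `phrase`."""
    # Keys missing from the table are treated as never being hit.
    return prod(key_probs.get(ch, 0.0) for ch in phrase)

letters = "abcdefghijklmnopqrstuvwxyz"

# Evenly distributed generator: every key is equally likely.
uniform = {ch: 1 / 26 for ch in letters}

# Hypothetical biased monkey: never hits 'a', spreads its hits over the other 25 keys.
biased = {ch: (0.0 if ch == "a" else 1 / 25) for ch in letters}

p_uniform = phrase_probability("hamlet", uniform)
p_biased = phrase_probability("hamlet", biased)

print(p_uniform)  # tiny, but greater than zero
print(p_biased)   # 0.0, because 'a' is never typed
```

With the biased table, no amount of waiting helps: any text containing an 'a' has probability exactly zero.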

### A million years is not enough

Even if we assumed that the monkeys were a true evenly distributed random number generator, why would precisely a million years be enough for the works of Shakespeare to pop up? There's certainly no law of mathematics or physics that says this. It's perfectly possible for the monkeys to hammer their typewriters for a million years and not come up with the works of Shakespeare, even if they create truly random sequences.
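A rough back-of-the-envelope calculation shows how far off a million years is. Every figure below is an assumption chosen for illustration: a typing rate of 10 keys per second, a 27-key typewriter (26 letters plus space), and a target of just one short phrase instead of the entire works. The sketch uses a simple union bound: the chance of the phrase appearing anywhere is at most the number of starting positions times the chance of it appearing at one given position.

```python
# All figures are illustrative assumptions, not measurements.
MONKEYS = 1_000_000
YEARS = 1_000_000
KEYS_PER_SECOND = 10                   # assumed typing rate per monkey
SECONDS_PER_YEAR = 365.25 * 24 * 3600
ALPHABET_SIZE = 27                     # 26 letters plus space
PHRASE = "to be or not to be"          # 18 characters

total_keystrokes = MONKEYS * YEARS * KEYS_PER_SECOND * SECONDS_PER_YEAR

# Chance that the phrase starts at one particular position.
p_at_one_position = ALPHABET_SIZE ** -len(PHRASE)

# Union bound: P(phrase appears anywhere) <= (start positions) * p_at_one_position.
# The number of start positions is at most the total number of keystrokes.
upper_bound = total_keystrokes * p_at_one_position

print(f"total keystrokes:        {total_keystrokes:.2e}")
print(f"upper bound on P(phrase): {upper_bound:.2e}")
```

Under these assumptions the upper bound comes out on the order of a few in a million, and that is for a single 18-character phrase. The complete works are millions of characters long, which makes the corresponding figure unimaginably smaller.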

This is the reason why some people try to fix this problem by changing it to "an infinite amount of time". However, that doesn't leave the analogy flawless either:

### Why a million monkeys?

If we assume that a monkey is a true evenly-distributed random number generator and that it had an infinite amount of time, why do we need a million of them? One would be enough. It doesn't make any difference.

However, even that is not completely flawless:

### Might, not will

Even if we had a true evenly-distributed random number generator and an infinite amount of time, stating that it will eventually generate the entire works of Shakespeare is wrong. It might generate them, but there's no guarantee.

Why is it wrong? For the simple reason that there exist infinitely many random sequences that do not contain the entire works of Shakespeare. It is possible for the random number generator to go through those sequences before it generates Shakespeare's works. Since there are infinitely many such sequences, going through them would take an infinite amount of time, and the works would thus never be generated.

Thus, it is possible for the works of Shakespeare to be generated, and the probability of this gets arbitrarily close to 1 as the generator keeps running, but there's no absolute guarantee that it will happen. It is thus wrong to say that "it will eventually generate the works of Shakespeare".

The misconception that the works of Shakespeare must appear at some point is actually closely related to the so-called gambler's fallacy. This fallacy is the mistaken conception that past events affect future probabilities. The classical example is coin tossing: If we are tossing a coin repeatedly and it has given us 9 heads already, the fallacy is to think that the probability of getting tails in the next toss is very large. Naturally this is not so: The probability of getting tails in the next toss is still 50%. Past tosses do not affect this probability in any way.
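The coin-tossing claim is easy to check empirically. The following is a small simulation sketch, assuming Python's pseudo-random generator is a good enough stand-in for a fair coin; the function name and the fixed seed are my own choices. It looks at every toss that comes right after at least 9 heads in a row and measures how often that toss is tails.

```python
import random

def tails_after_heads_run(num_tosses, run_length=9, seed=0):
    """Fraction of tails among tosses that directly follow >= run_length heads."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    streak = 0                  # current number of consecutive heads
    followups = 0               # tosses preceded by a long enough run of heads
    tails_followups = 0         # how many of those came up tails
    for _ in range(num_tosses):
        is_heads = rng.random() < 0.5
        if streak >= run_length:
            followups += 1
            if not is_heads:
                tails_followups += 1
        streak = streak + 1 if is_heads else 0
    return tails_followups / followups if followups else float("nan")

fraction = tails_after_heads_run(2_000_000)
print(f"tails right after 9+ heads: {fraction:.3f}")
```

The printed fraction should land close to 0.5: the run of heads does nothing to make tails "due".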

The monkeys-typing-Shakespeare statement can be simplified to a coin-tossing statement: "If we toss a coin an infinite number of times, then getting tails must happen at some point."

This is a fallacious statement. Sure, the probability of getting an infinite number of consecutive heads is infinitely small, but each toss still has a 50% chance of coming up heads. There's no law of mathematics which says that in repeated tosses a certain outcome must appear. Yes, it's extremely likely that it will appear, but there's no law that says that it definitely must appear. It's perfectly possible that we get an infinite number of heads. Each toss has a 50% probability, and past tosses do not affect this.
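For reference, the numbers behind this can be written down directly. Assuming a fair coin and independent tosses (the symbol $n$ for the number of tosses is my own notation), the probability that the first $n$ tosses are all heads is

$$
P(\text{first } n \text{ tosses are all heads}) = \left(\tfrac{1}{2}\right)^{n} > 0 \quad \text{for every finite } n,
\qquad
\lim_{n \to \infty} \left(\tfrac{1}{2}\right)^{n} = 0 .
$$

So the chance of never seeing tails shrinks towards zero as the tosses go on, yet after any finite number of tosses it is still strictly positive. Probability theory calls an event like "tails appears at some point" one that happens "almost surely": it has probability 1, which is not the same thing as being logically guaranteed.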

### Some mathematical background

In order to avoid some confusion, let me clarify a few things:

What the monkey analogy is trying to describe is, basically, the property of the mathematical definition of an evenly-distributed random number generator. This definition is closely related to the concept of infinity.

Basically, what it's saying is that if we have an infinite sequence of evenly-distributed random characters (including spaces to separate words, etc), the entire works of Shakespeare must appear in that sequence (an infinite number of times, no less). If they didn't appear, then the randomness would not be evenly distributed, by definition. If the works didn't appear, then there would be some kind of bias in the random number generator, making it not evenly-distributed.

Where the analogy fails is in trying to bring the concept (related to infinity) into a concrete physical form. If you take any finite portion of that sequence, no matter how large, the probability of Shakespeare's works appearing in that portion is less than 1.
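This can be made a bit more precise with a simple bound. The notation is mine, not part of the original analogy: let $k$ be the number of equally likely characters, $m$ the length of the target text, and $P_n$ the probability that the target appears somewhere within the first $n$ characters. Cutting the sequence into $\lfloor n/m \rfloor$ non-overlapping blocks of length $m$, each block independently matches the target with probability $k^{-m}$, so for $n \ge m$

$$
1 - \left(1 - k^{-m}\right)^{\lfloor n/m \rfloor} \;\le\; P_n \;<\; 1 .
$$

The left-hand side tends to 1 as $n \to \infty$, but for every finite $n$ the probability stays strictly below 1, because a sequence that never contains the target (for instance, one character repeated $n$ times) always has positive probability.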

If you had an evenly-distributed random number generator producing random characters, it may never produce Shakespeare's works, no matter how long you wait. You can expand the sequence as much as you want, without any upper limit, and there would still be no guarantee that the works will appear. As stated, each new character produced by the generator is not affected by previously produced characters, and the probability for each new character to break the desired outcome is always the same. It doesn't matter how many times you take a new random character, the probability doesn't change.

The concept of infinity that the mathematical definition is describing is a more abstract concept than can be described with any real-life analogy. The analogy wrongly presents the notion that a physical random number generator must eventually produce the entire works of Shakespeare. This is not so: No matter how long you run the generator, there is no guarantee.

It is possible for the generator to produce the works. And in fact, the probability of this increases with the number of generated characters. In other words, if you were to set the generator to generate a trillion random characters, the probability would be larger than if you set it to generate half a trillion. However, no matter how long you run it, the probability will never reach 1, and there will never be any guarantee. You could literally wait forever and never get the works.
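A toy calculation illustrates both points, that the probability grows with the length of the sequence and that it never reaches 1. The sketch below is only a lower bound under stated assumptions: it splits the sequence into non-overlapping blocks as long as the target and asks whether at least one block matches exactly. The function name and the tiny example (a 2-symbol alphabet and a 3-symbol target, such as the coin sequence heads-tails-heads) are mine. Exact fractions are used so that rounding cannot turn a value just below 1 into 1.

```python
from fractions import Fraction

def appearance_lower_bound(alphabet_size, target_length, num_chars):
    """Lower bound on P(target appears within num_chars uniform random characters).

    Only non-overlapping, aligned blocks are checked, so the true probability
    is at least this large.
    """
    p_block = Fraction(1, alphabet_size ** target_length)  # one block matches
    blocks = num_chars // target_length                    # independent blocks
    return 1 - (1 - p_block) ** blocks

for n in (10, 100, 1000):
    bound = appearance_lower_bound(2, 3, n)
    print(n, float(bound), bound < 1)
```

Each line prints a larger value than the one before, and the final `True` on every line confirms that the exact bound is still below 1. The printed decimal may round to `1.0` for large `n`, which is exactly why the comparison is done on the fraction.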