The purpose of this chapter is to show that if a monkey types infinitely, Shakespeare’s Hamlet and any other works one may wish to add to the list will each be typed, not once, not twice, but infinitely often with a probability of 1. This dramatic fact is a simple consequence of the Borel-Cantelli lemma and will come as no surprise to anyone who has taken a graduate-level course in Probability. The proof of this result, however, is quite accessible to anyone who has but a rudimentary understanding of the concept of independence, together with the notion of limit superior and limit inferior of a sequence of sets.
- Borel-Cantelli lemma
- limit superior of sets
- limit inferior of sets
- limit theorems of probability
Consider a monkey named Sue who is given a word processor with $N$ symbols. We shall assume that these symbols include the 26 letters of the English alphabet (upper and lower case), all the Greek letters, the numbers 0 through 9, a blank space, all the standard punctuation marks, and an assortment of mathematical symbols; imagine, in fact, that $N$ is so large that the keyboard is capable of typing just about anything we might fancy, in any language.
If Sue is handed such a machine and pounds away, randomly, it is clear that most of what she types will be complete gibberish. A stray segment of sense such as “dog eat HAy” or even “Ronnie ReAGan our 40th PresiDent of USA” would surprise no one, but how often, we ask, would she successfully manage to type the Constitution of the United States, or Shakespeare’s Hamlet, or the fundamental mathematical works of the 2026 Fields Medalist(s)?
The purpose of this chapter is to show that if Sue types infinitely, the above works (and any others that one may choose to add to the list) will each be typed, not once, not twice, but infinitely often with a probability of 1. This dramatic fact is a simple consequence of the Borel-Cantelli lemma and will come as no surprise to anyone who has taken a serious graduate-level course in Probability. The proof of this result, however, is quite accessible to anyone who has but a rudimentary understanding of the concept of independence.
The reader is invited, while reading this chapter, to let his/her imagination run wild and concoct a plethora of similar examples. A somewhat mundane objection may be raised immediately: how can Sue (or anyone else, for that matter) type indefinitely? We shall not dwell on this nonmathematical problem, but will remark instead (and prove a little later) that Sue’s never-ending assignment is mathematically equivalent to the task of randomly selecting a number from the interval $[0,1]$.
We would like to mention that our problem is related to the famous “Problem of a printed line,” a popular account of which can be found in George Gamow’s classic book. The solution presented there, however, is entirely deterministic and of a finite character: the automatic printing press considered by Gamow does not print indefinitely, and the probabilities of various outcomes are not calculated.
2. Limit superior and limit inferior of a sequence of sets
Consider any sequence $A_1, A_2, \ldots$ of subsets of a set $S$. Points of $S$ will be denoted by $x$. We know that
$$\bigcup_{k=n}^{\infty} A_k \supseteq \bigcup_{k=n+1}^{\infty} A_k \qquad (n = 1, 2, \ldots).$$
It follows that the set
$$A^* = \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k$$
is the intersection of a decreasing sequence of sets. To better understand this somewhat complicated set, we first let $x \in A^*$ and note that $x \in \bigcup_{k=1}^{\infty} A_k$, so that $x \in A_{k_1}$ for some $k_1$, say. Letting $n = k_1 + 1$, we see that $x$ must belong to some $A_{k_2}$, where $k_2 > k_1$. Continuing in this fashion, we see that $x \in A^*$ if and only if $x \in A_k$ for infinitely many $k$’s.
The set $A^*$ is called the limit superior of the sequence $\{A_n\}$, and is denoted by $\limsup_{n \to \infty} A_n$, or $\overline{\lim}_{n} A_n$, or, rather appropriately, by $\{A_n \text{ i.o.}\}$, where i.o. stands for “infinitely often.”
In a similar fashion, we observe that the set
$$A_* = \bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} A_k$$
is the collection of those points that belong to all but a finite number of the $A_n$’s. $A_*$ is called the limit inferior of the sequence $\{A_n\}$ and is usually denoted by $\liminf_{n \to \infty} A_n$ or $\underline{\lim}_{n} A_n$. We prefer the notation $\{A_n \text{ a.b.f.o.}\}$ (a.b.f.o. means “all but finitely often”). Elementary symbol manipulation may be used to prove that $A_* \subseteq A^*$. It is easier to note, however, that if $x$ belongs to all but finitely many $A_n$’s, it must necessarily belong to an infinite number of them. The above fact is just one of the many similarities between lim sups and lim infs of sets, on the one hand, and of real numbers, on the other (recall that $\liminf_{n} x_n \le \limsup_{n} x_n$ for any real sequence $\{x_n\}$). Likewise, $\limsup_{n} A_n$ and $\liminf_{n} A_n$ must both exist, that is, they are always well-defined sets, just as $\limsup_{n} x_n$ and $\liminf_{n} x_n$ always exist as extended real numbers.
If $A_* = A^*$, the sequence $\{A_n\}$ is said to have a limit, which we define to be the common value of $A_*$ and $A^*$ and denote by $\lim_{n \to \infty} A_n$ (note, again, the analogy with real sequences).
A useful dual relation between these two sets is
$$\Big(\liminf_{n} A_n\Big)^c = \limsup_{n} A_n^c, \qquad \Big(\limsup_{n} A_n\Big)^c = \liminf_{n} A_n^c,$$
where $A^c$ denotes the complement of the set $A$. Here is an informal proof of the first of these two facts: $x \in \liminf_n A_n$ iff $x$ belongs to all but finitely many $A_n$’s iff $x$ belongs to just a finite number of the $A_n^c$’s iff $x \notin \{A_n^c \text{ i.o.}\}$ iff $x \in \big(\limsup_n A_n^c\big)^c$.
A few examples should help familiarize the reader with the above notions; the second and the third are taken from the literature:
Example 1. If $A_1 \subseteq A_2 \subseteq A_3 \subseteq \cdots$, then $\lim_{n} A_n = \bigcup_{n=1}^{\infty} A_n$. Likewise, if $A_1 \supseteq A_2 \supseteq A_3 \supseteq \cdots$, then $\lim_{n} A_n = \bigcap_{n=1}^{\infty} A_n$.
Example 2. If $A_{2n-1} = A$ and $A_{2n} = B$ $(n = 1, 2, \ldots)$, then $\limsup_{n} A_n = A \cup B$ and $\liminf_{n} A_n = A \cap B$.
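For readers who like to experiment, Example 2 can be checked mechanically. The following sketch (in Python, with two small illustrative sets standing in for $A$ and $B$) exploits the fact that, for a periodic sequence of sets, membership “infinitely often” or “all but finitely often” can be decided on any tail window longer than the period:

```python
# Numerical illustration of Example 2: A_n alternates between two sets A and B.
# For a periodic sequence, "x is in infinitely many A_n" is equivalent to
# "x is in some A_n of any tail window longer than the period", and similarly
# for "all but finitely often". (The sets chosen here are illustrative.)

A = {1, 2, 3}
B = {3, 4}

def A_n(n):
    """The n-th set of the sequence: A for odd n, B for even n."""
    return A if n % 2 == 1 else B

universe = A | B
tail = range(100, 110)  # a tail window; the period is 2, so this suffices

lim_sup = {x for x in universe if any(x in A_n(n) for n in tail)}
lim_inf = {x for x in universe if all(x in A_n(n) for n in tail)}

print(lim_sup == A | B)  # limsup A_n = A ∪ B
print(lim_inf == A & B)  # liminf A_n = A ∩ B
```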
Example 3. If $A_n$ is the open unit disk with center at $\big((-1)^n/n,\, 0\big)$, then $\limsup_{n} A_n$ is the closed unit disk centered at the origin with the two points $(0,1)$ and $(0,-1)$ removed, while $\liminf_{n} A_n$ is the open unit disk centered at the origin.
In what follows, the set $S$ will be taken to be the sample space (or set of possible realizations) of a random experiment (one whose outcome cannot be predicted in advance). We shall assume that each subset of $S$ that we encounter is measurable. In other words, each set will be assumed to belong to the sigma algebra $\mathcal{F}$ of events, which is a class of subsets of $S$ satisfying the conditions:
(i) $S \in \mathcal{F}$;
(ii) if $A \in \mathcal{F}$, then $A^c \in \mathcal{F}$; and
(iii) if $A_1, A_2, \ldots \in \mathcal{F}$, then $\bigcup_{n=1}^{\infty} A_n \in \mathcal{F}$.
This restriction ensures that sets such as $\{A_n \text{ i.o.}\}$ and $\{A_n \text{ a.b.f.o.}\}$ are themselves measurable, so that we may meaningfully talk of their probabilities $P(A_n \text{ i.o.})$ and $P(A_n \text{ a.b.f.o.})$. We next move on to a key concept in probability:
Definition: The sequence $A_1, A_2, \ldots$ of events will be said to be independent if, for each finite subcollection $A_{i_1}, A_{i_2}, \ldots, A_{i_k}$,
$$P(A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_k}) = P(A_{i_1})\, P(A_{i_2}) \cdots P(A_{i_k}).$$
Stated informally, this means that the occurrence (or nonoccurrence) of the events in any finite subcollection does not affect the probability of occurrence of the events in another, disjoint, finite subcollection.
The events that represent the successive outcomes of an infinite coin-tossing experiment are usually assumed, on intuitive and empirical grounds, to be independent. We shall make the same assumption regarding Sue’s successive choices of keys on the keyboard.
The Borel-Cantelli lemma is a two-pronged theorem, which asserts that the probability of occurrence of an infinite number of the events $A_n$ is zero or one:

Theorem 2.1 (Borel-Cantelli lemma).
(a) If $\{A_n\}$ is any sequence of events, then $\sum_{n=1}^{\infty} P(A_n) < \infty$ implies that $P(A_n \text{ i.o.}) = 0$.
(b) If $\{A_n\}$ is an independent sequence of events, then $P(A_n \text{ i.o.})$ equals 0 or 1 according as the series $\sum_{n=1}^{\infty} P(A_n)$ converges or diverges.
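The dichotomy in the lemma can be made vivid by simulation. The following Python sketch (all parameters illustrative) compares independent events with summable probabilities $P(A_n) = 1/n^2$ against nonsummable ones $P(A_n) = 1/n$. We cannot simulate infinitely many events, of course, so we simply count how many of the first $100{,}000$ occur: in the summable case only a handful ever do, while in the divergent case the count grows without bound (like the harmonic series, so rather slowly).

```python
import random

# Monte Carlo sketch of the two prongs of the Borel-Cantelli lemma, with
# independent events A_n realized by comparing a uniform draw against P(A_n).
random.seed(1)

def occurrences(prob, N):
    """Number of the first N independent events that occur, where P(A_n) = prob(n)."""
    return sum(random.random() < prob(n) for n in range(1, N + 1))

convergent = occurrences(lambda n: 1 / n**2, 100_000)  # sum P(A_n) < infinity
divergent = occurrences(lambda n: 1 / n, 100_000)      # sum P(A_n) = infinity

print(convergent, divergent)
```

In the convergent case the expected count is $\sum_n 1/n^2 = \pi^2/6 \approx 1.64$ no matter how far we go; in the divergent case it is approximately $\ln N$.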
The following lemma can be proved using elementary properties of probability measures:
Lemma 2.2. If $\{B_n\}$ is an increasing (decreasing) sequence of events, then
$$P\Big(\lim_{n \to \infty} B_n\Big) = \lim_{n \to \infty} P(B_n).$$
Proof of Theorem 2.1
(a) For any $n$, we note that
$$P(A_n \text{ i.o.}) = P\Big(\bigcap_{m=1}^{\infty} \bigcup_{k=m}^{\infty} A_k\Big) \le P\Big(\bigcup_{k=n}^{\infty} A_k\Big) \le \sum_{k=n}^{\infty} P(A_k).$$
On letting $n \to \infty$, we see that $P(A_n \text{ i.o.}) = 0$, since the tail sums of a convergent series tend to zero, proving part (a).
(b) We shall prove that $\sum_{n=1}^{\infty} P(A_n) = \infty$ implies $P(A_n \text{ i.o.}) = 1$. Let $n$ be arbitrary. For any $N > n$, note that
$$P\Big(\bigcap_{k=n}^{N} A_k^c\Big) = \prod_{k=n}^{N} \big(1 - P(A_k)\big) \le \prod_{k=n}^{N} e^{-P(A_k)} = \exp\Big(-\sum_{k=n}^{N} P(A_k)\Big),$$
where the last inequality follows from the fact that $1 - x \le e^{-x}$ for all $x$. Lemma 2.2 now gives us that
$$P\Big(\bigcap_{k=n}^{\infty} A_k^c\Big) = \lim_{N \to \infty} P\Big(\bigcap_{k=n}^{N} A_k^c\Big) \le \lim_{N \to \infty} \exp\Big(-\sum_{k=n}^{N} P(A_k)\Big) = 0,$$
which proves that
$$P\Big(\bigcup_{k=n}^{\infty} A_k\Big) = 1 \quad \text{for each } n.$$
Applying Lemma 2.2 once more, this time to the decreasing sequence $\big\{\bigcup_{k=n}^{\infty} A_k\big\}_{n \ge 1}$, we obtain $P(A_n \text{ i.o.}) = \lim_{n \to \infty} P\big(\bigcup_{k=n}^{\infty} A_k\big) = 1$, proving the result.
As seen by the dates on Refs. [3, 4], the Borel-Cantelli lemmas are classical, and they are now part of virtually all graduate-level books on Probability. Since then, for over 100 years, the literature on the lemmas has focused on weakening the independence requirement in the second lemma, or on more complicated probability models that yield the same conclusions; see, for example, [8, 9, 10]. What distinguishes this work from these and others is that we provide a very down-to-earth application that forces the reader to come to terms with the notions of independence and infinity, as opposed to the finite samples one has in statistical situations. It is a chapter that we feel can cause amusement, astonishment, initial disbelief, and, ultimately, understanding. With this backdrop, we are now in a position to start establishing the claim made at the beginning of this chapter:
Corollary 2.3. If Sue types indefinitely, by successively and independently choosing one of the $N$ available keys, then any specific work containing a total of $m$ characters (including all the blanks, of course) will be typed by her infinitely often, with a probability of 1.
Proof. Let $E_1$ be the event that Sue’s first $m$ random choices lead to the work being typed correctly. It is clear that $P(E_1) = (1/N)^m > 0$. Similarly, let $E_2$ denote a successful completion of the task between the $(m+1)$st and $2m$th keystrokes. In general, $E_k$ denotes the completion of a flawless job between the $\big((k-1)m + 1\big)$st and the $km$th random strokes. It is evident that $P(E_k) = (1/N)^m$ for each $k$, and that the sequence $\{E_k\}$ is independent. Since $\sum_{k=1}^{\infty} P(E_k) = \infty$, it follows by part (b) of the Borel-Cantelli lemma that $P(E_k \text{ i.o.}) = 1$. Since the probability that the work is typed correctly infinitely often is at least as large as $P(E_k \text{ i.o.})$, the proof is complete. (Notice how the events $E_k$ are defined using disjoint blocks of keystrokes; this guarantees their independence.)
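A scaled-down simulation may help make the disjoint-block construction concrete. The sketch below uses illustrative parameters (a 2-key keyboard and the 4-character “work” abba) and estimates the success probability of a single block, which the proof computes as $(1/N)^m$; here $(1/2)^4 = 1/16$. Since the blocks are independent and equally likely to succeed, Borel-Cantelli (b) then guarantees infinitely many successes with probability 1.

```python
import random

# One block examines m fresh keystrokes, so distinct blocks are independent;
# a block succeeds iff its m keystrokes spell out the target work exactly.
random.seed(0)

KEYS = "ab"    # a 2-key keyboard (illustrative)
WORK = "abba"  # a 4-character "work" (illustrative)
m = len(WORK)

def block_success():
    """Simulate one block of m independent keystrokes; True iff it spells WORK."""
    return "".join(random.choice(KEYS) for _ in range(m)) == WORK

trials = 200_000
hits = sum(block_success() for _ in range(trials))
freq = hits / trials

print(freq)  # should be close to (1/2)**4 = 0.0625
```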
Lemma 2.4. If $P(B_n) = 1$ $(n = 1, 2, \ldots)$, then $P\big(\bigcap_{n=1}^{\infty} B_n\big) = 1$.
Proof: Boole’s inequality states that
$$P\Big(\bigcup_{n=1}^{N} B_n^c\Big) \le \sum_{n=1}^{N} P(B_n^c) = 0.$$
Thus $P\big(\bigcap_{n=1}^{N} B_n\big) = 1$ for each $N$. The required conclusion is obtained on letting $N \to \infty$ and using Lemma 2.2.
Corollary 2.5. If Sue types indefinitely, then every piece of writing (of any finite length whatsoever; published, unpublished, or yet to be written by yet unborn individuals; meaningful or gibberish) will be typed by her, infinitely often each, with a probability of 1.
Proof. Denote the works by $W_1, W_2, \ldots$ (there are countably many pieces of writing of finite length), and let $B_n$ be the event that $W_n$ is typed infinitely often. By Corollary 2.3, $P(B_n) = 1$ for each $n$, and thus, by Lemma 2.4, $P\big(\bigcap_{n=1}^{\infty} B_n\big) = 1$, as claimed. (Note: this implies that the probability that some work is typed finitely often (or never) is zero.)
Example 4. (Statistical tests of hypotheses) If a fair coin is tossed infinitely often, a sequence of one million consecutive heads will appear infinitely often with probability 1.
Now, if a coin (of unknown origin) were tossed a million times, and a head appeared each time, the “null” statistical hypothesis
$$H_0\colon P(\text{Heads}) = 1/2$$
would be summarily rejected at most conventional (5%, 1%, 0.00001%) levels of significance. The point to note, however, is that such “extreme” and “erratic” behavior will be exhibited on an infinite number of occasions by any fair coin (and, indeed, by any coin with $0 < P(\text{Heads}) < 1$), with a probability of 1.
Similarly, if a fair coin is tossed infinitely often, an $n$-long alternating sequence $HTHT\ldots$ ($n$ arbitrary) will appear infinitely often, almost certainly. This fact may be compared with the conclusion of a standard nonparametric statistical procedure, the run test: the fair-coin hypothesis would be vigorously rejected, using this test, if a large number of coin tosses yielded an alternating sequence of heads and tails.
3. A probability model for infinite coin tossing
In the above discussion, we often concluded that a particular event (e.g., Hamlet is typed infinitely often) occurred with probability 1. One fundamental question that we did not address, however, was the following: just what probability model describes infinite coin tossing or simian typewriting? Put another way, what are the sample spaces associated with these two experiments? And what exactly is the probability of an event defined to be? We realize then, in retrospect, that we had put the cart before the horse; various events were shown to have probability 1, by assuming the existence of a logically consistent probability (measure) on a sample space that had not been fully described. This practice is fairly standard in the teaching of probability; for example, sequences of independent and identically distributed (i.i.d.) random variables are often introduced as mathematical objects before their existence is proved, using Kolmogorov’s famous Consistency Theorem. Such an approach is often beneficial; as Billingsley  wrote, “It is instructive... to see the show in rehearsal as well as in performance.”
We shall start by noting that three tosses of a fair coin lead to the eight-point sample space
$$S_3 = \{HHH,\, HHT,\, HTH,\, HTT,\, THH,\, THT,\, TTH,\, TTT\}.$$
It seems reasonable to assign probability 1/8 to each of these eight points; thus the probability of any subset $A$ of $S_3$ may be defined by
$$P(A) = \frac{\text{number of points in } A}{8}.$$
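This finite model is easy to enumerate by machine. Here is a minimal Python sketch; the event chosen, “at least two heads,” is just an illustration:

```python
from fractions import Fraction
from itertools import product

# Enumerate the eight-point sample space for three fair-coin tosses and
# compute P(A) = |A| / 8 exactly, as a rational number.
S3 = ["".join(w) for w in product("HT", repeat=3)]

def prob(event):
    """Probability of the event (a predicate on outcomes) under P(A) = |A|/8."""
    return Fraction(len([w for w in S3 if event(w)]), len(S3))

print(len(S3))                                # 8 sample points
print(prob(lambda w: w.count("H") >= 2))      # P(at least two heads) = 1/2
```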
Our analysis is thus complete, and can easily be extended to any finite number of coin tosses. The situation gets rapidly more complicated if the coin is tossed endlessly. This experiment cannot be conceived, carried out, or justified “in practice,” and our neat conclusions would be rendered meaningless if we were unable to mathematically model our procedure. Happily, however, this is not the case. We simply let
$$S = \{\text{all infinite sequences of $H$’s and $T$’s}\}.$$
A typical element of $S$ might be $HHTHTTTHHT\ldots$. It is well known (and easily proved) that $S$ is an uncountable set. It seems reasonable, then, to assign probability zero to each sample point.
The next step is crucial. We identify each element of $S$ with the real number in the interval $[0,1]$ that has the same binary expansion (with $H$ read as 1 and $T$ as 0). For example, the sample outcome $THTHTH\ldots$ is identified with the real number $0.010101\ldots$ (base 2), which equals
$$\frac{1}{4} + \frac{1}{16} + \frac{1}{64} + \cdots = \frac{1}{3}.$$
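The identification is easy to compute with for finite prefixes. The following sketch evaluates the partial sums of the binary expansion exactly, using Python’s `fractions` module, and confirms that the prefixes of $THTHTH\ldots$ approach 1/3:

```python
from fractions import Fraction

def prefix_value(tosses):
    """Exact value of the binary expansion 0.b1 b2 b3 ... truncated after
    len(tosses) digits, with H read as 1 and T as 0."""
    return sum(Fraction(1, 2**k) for k, t in enumerate(tosses, start=1) if t == "H")

# Partial sums of 0.010101... (base 2): 1/4, 1/4 + 1/16, ... -> 1/3.
approx = prefix_value("TH" * 20)
print(approx, Fraction(1, 3) - approx)
```

The remaining error after $2j$ tosses is exactly $4^{-j}/3$, so the convergence is geometric.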
A problem arises immediately: numbers of the form $k/2^n$, where $k$ and $n$ are positive integers, do not have a unique binary representation. In other words, two different sample outcomes such as $HTTTTT\ldots$ and $THHHHH\ldots$ would correspond to the same real number 1/2 (since $1/2 = 0.0111\ldots = 0.1000\ldots$), and the correspondence between $S$ and $[0,1]$ would not, consequently, be one-to-one. We note, however, that numbers of the form $k/2^n$ constitute a denumerable set, and that there are two sample outcomes that correspond to each such number. If one, but not both, of each of these pairs of outcomes were to be removed from $S$, we would be left with a one-to-one map from a censored sample space $S'$ onto $[0,1]$. Moreover, our assumption regarding individual sample points forces $P(S \setminus S')$ to equal zero. Thus, only a set of zero probability is thrown out from the original sample space; we may work with $S'$ in place of $S$ and derive great satisfaction from the knowledge that this would not change the answer to any of our probability calculations.
It is possible to show, in a somewhat nonrigorous fashion (i.e., without using much measure theory), or rigorously, by introducing Lebesgue measure, that infinite coin tossing is mathematically equivalent to choosing a number randomly from the interval $[0,1]$. It can be shown, in a completely analogous way, that infinite random typewriting is equivalent to the single random choice of a number in $[0,1]$; we need, of course, to consider the $N$-ary representation of numbers in $[0,1]$, instead of their binary expansion (where $N$ is the number of typewriter keys). However, we shall not do so here.
Example 5. (Random numbers) Let the random variable $X$ denote the random choice of a number from $[0,1]$. Then
$$P(X \text{ is rational}) = P\Big(\bigcup_{n=1}^{\infty} \{X = r_n\}\Big) \le \sum_{n=1}^{\infty} P(X = r_n),$$
where $r_n$ is the $n$’th rational. Since $P(X = r_n) = 0$ for each $n$, we have that
$$P(X \text{ is rational}) = 0 \quad \text{and} \quad P(X \text{ is irrational}) = 1.$$
This result may be compared with a mundane fact of “reality”: if a person, computer, pointer, or random number generator were asked to choose $X$, limitations of measurement accuracy (or decimal point restrictions) would systematically exclude irrational $X$’s, leading to the “conclusion” that $P(X \text{ is rational}) = 1$.
We would like to next state a thrilling result, called Borel’s law of normal numbers. A number $x$ in $[0,1]$ is said to be normal if its decimal representation has, asymptotically, an equal frequency of the digits 0 through 9:
$$\lim_{n \to \infty} \frac{\#\{k \le n : d_k(x) = j\}}{n} = \frac{1}{10}$$
for each digit $j = 0, 1, \ldots, 9$, where $d_k(x)$ denotes the $k$’th decimal digit of $x$. Borel’s law states that
$$P(X \text{ is normal}) = 1,$$
which is somewhat surprising, since it is awfully hard to think of a single number that is normal (the number 0.012345678910111213..., obtained by writing each integer successively, is known to be normal; the proof is not trivial).
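Normality is an asymptotic property, but one can at least watch the digit frequencies settle down. The sketch below tallies the digits obtained by writing the integers 0 through 99999 in succession, as in the number just mentioned. The counts can be worked out exactly: each nonzero digit appears 50000 times, while the digit 0 appears less often, because leading zeros are never written.

```python
from collections import Counter

# Tally the digits of a prefix of 0.012345678910111213..., obtained by
# writing the integers 0, 1, ..., 99999 in succession (no leading zeros).
digits = "".join(str(n) for n in range(100_000))
counts = Counter(digits)
total = len(digits)

freqs = {d: counts[d] / total for d in "0123456789"}
print(total, counts["0"], counts["1"])
print(max(abs(f - 0.1) for f in freqs.values()))  # worst deviation from 1/10
```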
The Borel-Cantelli lemma yields several consequences that may, at first glance, seem to contradict Borel’s normal number law:
Almost all the numbers in $[0,1]$ (i.e., all except a set of zero Lebesgue measure) have decimal expansions that contain infinitely many chains of length 1000, say, consisting of no digits other than 2, 3, and 4. The nice part is, of course, that almost all of these numbers are normal as well, and so on.
The moral of the Borel-Cantelli lemma should, by now, be quite clear: “The realization of a truly random infinite procedure will, with probability one, contain infinitely many segments that exhibit extreme ‘non-randomness’, of all sizes, patterns and intensities.” The Borel-Cantelli lemma is, after all, a limit theorem of probability, and a quote from the classic treatise of Gnedenko and Kolmogorov  might be in order as well: “In reality, however, the epistemological value of the theory of probability is revealed only by limit theorems. Moreover, without limit theorems, it is impossible to understand the real content of the primary concept of all our sciences—the concept of probability.”
4. Conclusions and future developments
The main results of this chapter, accessible to a second-year undergraduate, are Corollaries 2.3 and 2.5. They follow from the Borel-Cantelli lemmas and Boole’s inequality, respectively. Corollary 2.3 states that in an infinite sequence of keystrokes, any fixed-length “work” appears infinitely often with probability 1. Most undergraduates whom the author has taught have great difficulty believing this fact, since most statistical tests, for example, are based on finite samples. Corollary 2.5 goes one step further, proving that every finite-length piece of work, even those yet unwritten, will each appear infinitely often with probability 1. The undergraduate reader will undoubtedly appreciate the “power of infinity” on reading this chapter, while graduate students will enjoy an impractical yet deep application of the Borel-Cantelli lemmas.
Example 4 makes a contrast between the finite situation and the infinite one. An important practical problem in this regard would be to use Poisson approximations to find the approximate probability that a specific work occurs $k$ times in $n$ keystrokes, and to use this process as the basis of a statistical test for randomness.