The Borel-Cantelli Lemmas, and Their Relationship to Limit Superior and Limit Inferior of Sets (or, Can a Monkey Really Type Hamlet?)

The purpose of this chapter is to show that if a monkey types infinitely, Shakespeare's Hamlet and any other works one may wish to add to the list will each be typed, not once, not twice, but infinitely often with a probability of 1. This dramatic fact is a simple consequence of the Borel-Cantelli lemma and will come as no surprise to anyone who has taken a graduate-level course in Probability. The proof of this result, however, is quite accessible to anyone who has but a rudimentary understanding of the concept of independence, together with the notion of limit superior and limit inferior of a sequence of sets.


Introduction
Consider a monkey named Sue who is given a word processor with N symbols. We shall assume that these symbols include the 26 letters of the English alphabet (upper and lower case), all the Greek letters, the numbers 0 through 9, a blank space, all the standard punctuation marks (, . ; - etc.), and mathematical symbols (∞, ∫, ∇, etc.); imagine, in fact, that N is so large that the keyboard is capable of typing just about anything we might fancy, in any language. (A LaTeX editor could do much of that too, but not in all languages!) If Sue is handed such a machine and pounds away, randomly, it is clear that most of what she types will be complete gibberish. A stray segment of sense such as "dog eat HAy" or even "Ronnie ReAGan our 40th PresiDεητ of USA" would surprise no one, but how often, we ask, would she successfully manage to type the Constitution of the United States, or Shakespeare's Hamlet, or the fundamental mathematical works of the 2026 Fields Medalist(s)?
The purpose of this chapter is to show that if Sue types infinitely, the above works (and any others that one may choose to add to the list) will each be typed, not once, not twice, but infinitely often with a probability of 1. This dramatic fact is a simple consequence of the Borel-Cantelli lemma and will come as no surprise to anyone who has taken a serious graduate-level course in Probability. The proof of this result, however, is quite accessible to anyone who has but a rudimentary understanding of the concept of independence.
The reader is invited, while reading this chapter, to let his/her imagination run wild, and concoct a plethora of similar examples. A somewhat mundane objection may be raised immediately: how can Sue (or anyone else for that matter) type indefinitely? We shall not dwell on this nonmathematical problem, but will remark instead (and prove a little later) that Sue's never-ending assignment is mathematically equivalent to the task of randomly selecting a number from the interval [0, 1]. We would like to mention that our problem is related to the famous "Problem of a printed line," a popular account of which can be found in George Gamow's classic book [5]. The solution presented there, however, is entirely deterministic and of a finite character: the automatic printing press considered by Gamow does not print indefinitely, and the probabilities of various outcomes are not calculated.

Limit superior and limit inferior of a sequence of sets
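Consider any sequence $\{A_n\}_{n=1}^{\infty}$ of subsets of a set $\Omega$. Points of $\Omega$ will be denoted by $\omega$. We know that

$$\bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k = \{\omega : \text{for each } n \ge 1,\ \omega \in A_k \text{ for some } k \ge n\}.$$

To better understand this somewhat complicated set, we first let $n = 1$ and note that $\omega \in A_k$ for some $k \ge 1$, say $k = k_1$. Letting $n = k_1 + 1$, we see that $\omega$ must belong to some $A_{k_2}$, where $k_2 > k_1$. Continuing in this fashion, we see that $\omega \in \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k$ if and only if $\omega \in A_k$ for infinitely many $k$'s.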
The set $\bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k$ is called the limit superior of the sequence $\{A_n\}_{n=1}^{\infty}$, and is denoted by $\limsup A_n$, or $\overline{\lim}\, A_n$, or, rather appropriately, by $\{A_n \text{ i.o.}\}$, where i.o. stands for "infinitely often." In a similar fashion, we observe that the set $\bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} A_k$ is the collection of those points $\omega$ that belong to all but a finite number of the $A_n$'s. This set is called the limit inferior of the sequence $\{A_n\}_{n=1}^{\infty}$ and is usually denoted by $\liminf A_n$ or $\underline{\lim}\, A_n$. We prefer the notation $\{A_n \text{ a.b.f.o.}\}$ (a.b.f.o. means "all but finitely often").

Elementary symbol manipulation may be used to prove that $\underline{\lim}\, A_n \subset \overline{\lim}\, A_n$. It is easier to note, however, that if $\omega$ belongs to all but finitely many $A_n$'s, it must necessarily belong to an infinite number of them. The above fact is just one of the many similarities between lim sups and lim infs of sets, on the one hand, and of real numbers, on the other (recall that $\liminf_{n \to \infty} a_n \le \limsup_{n \to \infty} a_n$). Likewise, $\liminf_{n \to \infty} a_n$ and $\limsup_{n \to \infty} a_n$ must both exist, that is, $-\infty \le \liminf a_n \le \limsup a_n \le \infty$, as must $\underline{\lim}\, A_n$ and $\overline{\lim}\, A_n$, with $\emptyset \subset \underline{\lim}\, A_n \subset \overline{\lim}\, A_n \subset \Omega$. If $\underline{\lim}\, A_n = \overline{\lim}\, A_n$, the sequence $\{A_n\}$ is said to have a limit, which we define to be the common value of $\{A_n \text{ a.b.f.o.}\}$ and $\{A_n \text{ i.o.}\}$ and denote by $\lim_{n \to \infty} A_n$ (note, again, the analogy with real sequences).
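The two definitions can be tried out on a small example. The following Python sketch evaluates truncated versions of the intersection-of-unions and union-of-intersections formulas; the function names, the cutoffs `n_max` and `k_max`, and the sequence `example` are all hypothetical choices made for illustration. A finite truncation is only a stand-in for the infinite operations: it gives the exact answer for a sequence such as this one, which is periodic after a finite transient, provided the cutoffs extend past the transient and cover a full period.

```python
def lim_sup(seq, n_max, k_max):
    """Truncated version of the intersection over n <= n_max of the
    union over n <= k <= k_max of seq(k)."""
    result = None
    for n in range(1, n_max + 1):
        tail_union = set()
        for k in range(n, k_max + 1):
            tail_union |= seq(k)
        result = tail_union if result is None else result & tail_union
    return result


def lim_inf(seq, n_max, k_max):
    """Truncated version of the union over n <= n_max of the
    intersection over n <= k <= k_max of seq(k)."""
    result = set()
    for n in range(1, n_max + 1):
        tail_intersection = set(seq(n))
        for k in range(n + 1, k_max + 1):
            tail_intersection &= seq(k)
        result |= tail_intersection
    return result


def example(n):
    """Hypothetical sequence A_n: 'always' lies in every A_n, 'once' only
    in A_1, while 'even' and 'odd' alternate forever."""
    members = {"always", "even" if n % 2 == 0 else "odd"}
    if n == 1:
        members.add("once")
    return members


print("i.o.     :", sorted(lim_sup(example, 10, 20)))
print("a.b.f.o. :", sorted(lim_inf(example, 10, 20)))
```

In this example the point "once" belongs to neither set, the alternating points belong to the limit superior only, and "always" belongs to both, so the sequence has no limit.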
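A useful dual relation between these two sets is

$$\underline{\lim}\, A_n^c = \left(\overline{\lim}\, A_n\right)^c \quad \text{and} \quad \overline{\lim}\, A_n^c = \left(\underline{\lim}\, A_n\right)^c,$$

where $A^c$ denotes the complement of the set $A$. Here is an informal proof of the first of these two facts: $\omega \in \underline{\lim}\, A_n^c$ iff $\omega$ belongs to all but finitely many $A_n^c$'s iff $\omega$ belongs to just a finite number of the $A_n$'s iff $\omega \notin \overline{\lim}\, A_n$ iff $\omega \in \left(\overline{\lim}\, A_n\right)^c$.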
A few examples should help familiarize the reader with the above notions; the second and the third are taken from [1].

Example 3. If $A_n$ is the unit circle with center at $\left(\frac{(-1)^n}{n}, 0\right)$, then

In what follows, the set $\Omega$ will be taken to be the sample space (or set of possible realizations) of a random experiment (one whose outcome cannot be predicted in advance). We shall assume that each subset of $\Omega$ that we encounter is measurable. In other words, each set $A$ will be assumed to belong to the sigma algebra $\mathcal{A}$ of events, which is a class of subsets of $\Omega$ satisfying the conditions

(i) $\Omega \in \mathcal{A}$;
(ii) $A \in \mathcal{A}$ implies $A^c \in \mathcal{A}$;
(iii) $A_1, A_2, \ldots \in \mathcal{A}$ implies $\bigcup_{n=1}^{\infty} A_n \in \mathcal{A}$.

This restriction ensures that sets such as $\overline{\lim}\, A_n$ and $\underline{\lim}\, A_n$ are themselves measurable, so that we may meaningfully talk of their probabilities $P(A_n \text{ i.o.})$ and $P(A_n \text{ a.b.f.o.})$.

We next move on to a key concept in probability:

Definition: The sequence of events $\{A_n\}_{n=1}^{\infty}$ will be said to be independent if, for each finite subcollection $A_{n_1}, A_{n_2}, \ldots, A_{n_k}$,

$$P\left(A_{n_1} \cap A_{n_2} \cap \cdots \cap A_{n_k}\right) = P(A_{n_1})\, P(A_{n_2}) \cdots P(A_{n_k}).$$

Stated informally, this means that the occurrence (or nonoccurrence) of any finite subcollection $\{A_{n_1}, A_{n_2}, \ldots, A_{n_k}\}$ does not affect the probability of occurrence of another disjoint collection $\{A_{m_1}, A_{m_2}, \ldots, A_{m_j}\}$. The events $\{A_n\}_{n=1}^{\infty}$ that represent the successive outcomes of an infinite coin-tossing experiment are usually assumed, on intuitive and empirical grounds, to be independent. We shall make the same assumption regarding Sue's successive choices $\{B_n\}_{n=1}^{\infty}$ of a keyboard's key.

The Borel-Cantelli lemma is a two-pronged theorem, which asserts that the probability of occurrence of an infinite number of the independent events $\{A_n\}_{n=1}^{\infty}$ is zero or one:

Theorem (The Borel-Cantelli lemma [3,4]).
a. If $\{A_n\}_{n=1}^{\infty}$ is any sequence of events, then $\sum_{n=1}^{\infty} P(A_n) < \infty$ implies $P(A_n \text{ i.o.}) = 0$.
b. If $\{A_n\}_{n=1}^{\infty}$ is an independent sequence, then $P(A_n \text{ i.o.})$ equals 0 or 1 according as the series $\sum_{n=1}^{\infty} P(A_n)$ converges or diverges.
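The following lemma can be proved using elementary properties of probability measures:

a. For any $n$, we note that

$$P(A_n \text{ i.o.}) \le P\left(\bigcup_{k=n}^{\infty} A_k\right) \le \sum_{k=n}^{\infty} P(A_k).$$

On letting $n \to \infty$, we see that $P(A_n \text{ i.o.}) = 0$ whenever the series $\sum_{n=1}^{\infty} P(A_n)$ converges, since the tail of a convergent series tends to zero.

b. In view of part a, it suffices to consider the case in which the series diverges. By independence, for any $m \ge n$,

$$P\left(\bigcap_{k=n}^{m} A_k^c\right) = \prod_{k=n}^{m} \left(1 - P(A_k)\right) \le \exp\left(-\sum_{k=n}^{m} P(A_k)\right),$$

where the last inequality follows from the fact that $1 - x \le e^{-x}$ ($x \ge 0$). Since the series diverges, the right-hand side tends to zero as $m \to \infty$, for every $n$. Lemma 2.1 now gives us that $P(A_n \text{ i.o.}) = 1$, proving the result.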
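The role of the inequality $1 - x \le e^{-x}$ can be checked numerically. The Python sketch below assumes independent events with hypothetical probabilities $P(A_k) = 1/k$ (a divergent series) and $P(A_k) = 1/k^2$ (a convergent one), and computes the probability that none of $A_n, \ldots, A_m$ occurs as a product, together with the exponential upper bound. The function names are illustrative only.

```python
import math


def prob_none(p, n, m):
    """P(none of A_n, ..., A_m occurs) for independent events with
    P(A_k) = p(k), computed as the product of the (1 - p(k))."""
    product = 1.0
    for k in range(n, m + 1):
        product *= 1.0 - p(k)
    return product


def exp_bound(p, n, m):
    """Upper bound exp(-sum of p(k)) that follows from 1 - x <= exp(-x)."""
    return math.exp(-sum(p(k) for k in range(n, m + 1)))


def harmonic(k):
    """P(A_k) = 1/k: the series of probabilities diverges."""
    return 1.0 / k


def inverse_square(k):
    """P(A_k) = 1/k^2: the series of probabilities converges."""
    return 1.0 / (k * k)


for m in (10, 1000, 100000):
    print(m,
          prob_none(harmonic, 2, m), exp_bound(harmonic, 2, m),
          prob_none(inverse_square, 2, m), exp_bound(inverse_square, 2, m))
```

For $P(A_k) = 1/k$ the product telescopes to $1/m$ and tends to zero, so some $A_k$ with $k \ge n$ is certain to occur; for $P(A_k) = 1/k^2$ the product telescopes to $(m+1)/(2m)$ and stays near 1/2, in line with the two halves of the theorem.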
As seen by the dates on Refs. [3,4], the Borel-Cantelli lemmas are classical, and now part of virtually all graduate-level books on Probability such as [1]. Since then, for over 100 years, the literature on the lemmas has focused on weakening the independence requirement in the second lemma, or looking at more complicated probability models that yield the same conclusions. See, for example, [8-10]. What distinguishes this work from these and others is that we provide a very down-to-earth application that forces the reader to come to terms with the notions of independence and infinity, as opposed to the finite samples one has in statistical situations. It is a paper that we feel can cause amusement, astonishment, false disbelief, and, ultimately, understanding. With this backdrop, we are now in a position to start establishing the claim made at the beginning of this chapter:

Example 4. (Statistical tests of hypotheses)
If a fair coin is tossed infinitely often, a sequence of $10^6$ consecutive heads will appear infinitely often with probability 1. Now, if a coin (of unknown origin) were tossed a million times, and a head appeared each time, the "null" statistical hypothesis $H_0$: The coin is fair ($p = 1/2$) would be summarily rejected at most conventional (5%, 1%, 0.00001%) levels of significance. The point to note, however, is that such "extreme" and "erratic" behavior will be exhibited on an infinite number of occasions by any fair coin (and by all coins with $P(H) > 0$), with a probability of 1. Similarly, if a fair coin is tossed infinitely often, an n-long alternating sequence HTHT…HT (n is arbitrary) will appear infinitely often, almost certainly. This fact may be compared with the conclusion of a standard nonparametric statistical procedure, the run test: the fair coin hypothesis would be vigorously rejected, using this test, if a large number of coin tosses yielded an alternating sequence of heads and tails.
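The contrast between finite and infinite tossing can be explored with a small simulation. The Python sketch below is illustrative only: the function names, the seed, the patterns, and the number of tosses are arbitrary assumptions. It counts overlapping occurrences of a pattern in a simulated run of fair tosses, compares the count with the expected number of occurrences, and computes the probability that at least one of a number of disjoint blocks reproduces the pattern. Disjoint blocks are independent, and each matches a pattern of length L with probability $2^{-L}$, so the series of these probabilities diverges; this is one standard way to bring the second half of the Borel-Cantelli lemma to bear.

```python
import random


def count_occurrences(sequence, pattern):
    """Count (possibly overlapping) occurrences of pattern in sequence."""
    width = len(pattern)
    return sum(
        1
        for i in range(len(sequence) - width + 1)
        if sequence[i:i + width] == pattern
    )


def expected_occurrences(n_tosses, pattern_length):
    """Expected number of occurrences of a fixed pattern in n fair tosses:
    (number of starting positions) * 2**(-pattern_length)."""
    positions = max(n_tosses - pattern_length + 1, 0)
    return positions * 0.5 ** pattern_length


def prob_hit_in_blocks(n_blocks, pattern_length):
    """P(at least one of n_blocks disjoint blocks equals the pattern)."""
    return 1.0 - (1.0 - 0.5 ** pattern_length) ** n_blocks


rng = random.Random(0)  # fixed seed, for reproducibility only
tosses = "".join(rng.choice("HT") for _ in range(100_000))
for pattern in ("HHHHH", "HTHTHTHT"):
    print(pattern,
          count_occurrences(tosses, pattern),
          expected_occurrences(len(tosses), len(pattern)))

for n_blocks in (10, 1_000, 100_000):
    print(n_blocks, prob_hit_in_blocks(n_blocks, 10))
```

For a run of $10^6$ heads the per-block probability $2^{-10^6}$ is far too small to observe in any simulation, which is exactly the gap between the finite-sample test and the infinite statement.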

A probability model for infinite coin tossing
In the above discussion, we often concluded that a particular event (e.g., Hamlet is typed infinitely often) occurred with probability 1. One fundamental question that we did not address, however, was the following: just what probability model describes infinite coin tossing or simian typewriting? Put another way, what are the sample spaces associated with these two experiments? And what exactly is the probability of an event defined to be? We realize then, in retrospect, that we had put the cart before the horse; various events were shown to have probability 1, by assuming the existence of a logically consistent probability (measure) on a sample space that had not been fully described. This practice is fairly standard in the teaching of probability; for example, sequences $\{X_n\}_{n=1}^{\infty}$ of independent and identically distributed (i.i.d.) random variables are often introduced as mathematical objects before their existence is proved, using Kolmogorov's famous Consistency Theorem. Such an approach is often beneficial; as Billingsley [2] wrote, "It is instructive... to see the show in rehearsal as well as in performance."

We shall start by noting that three tosses of a fair coin lead to the eight-point sample space

$$\Omega = \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\}.$$

It seems reasonable to assign probability 1/8 to each of these eight points; thus the probability $P(A)$ of any subset $A$ may be defined by

$$P(A) = \frac{|A|}{8},$$

where $|A|$ denotes the number of points in $A$. Our analysis is thus complete, and can easily be extended to any finite number of coin tosses.

The situation gets rapidly more complicated if the coin is tossed endlessly. This experiment cannot be conceived, carried out, or justified "in practice," and our neat conclusions would be rendered meaningless if we were unable to mathematically model our procedure. Happily, however, this is not the case. We simply let

$$\Omega = \{\omega : \omega \text{ is an infinite sequence of H's and T's}\} = \{\omega : \omega \text{ is an infinite sequence of 1's and 0's}\}.$$

A typical element of $\Omega$ might be $\omega = 0010011\ldots$.
It is well known (and easily proved) that $\Omega$ is an uncountable set. It seems reasonable, then, to assign probability zero to each sample point. The next step is crucial. We identify each element of $\Omega$ with the real number in the interval [0,1] that has the same binary expansion. For example, the sample outcome THTHTH... is identified with the real number 0.01010101... (in base 2), which equals $\sum_{k=1}^{\infty} 4^{-k} = 1/3$.

A problem arises immediately: Numbers of the form $k/2^n$, where $k$ and $n$ are positive integers, do not have a unique binary representation. In other words, two different sample outcomes such as HTTTTT... and THHHH... would correspond to the same real number 1/2 (since 1/2 = 0.0111... = 0.100...), and the correspondence between $\Omega$ and [0,1] would not, consequently, be one-to-one. We note, however, that numbers of the form $k/2^n$ constitute a denumerable set, and that there are two sample outcomes that correspond to each such number. If one, but not both, of each of these outcomes were to be removed from $\Omega$, we would be left with a one-to-one map from a censored sample space $\Omega'$ onto [0,1]. Moreover, our assumption regarding individual sample points forces $P(\Omega \setminus \Omega')$ to equal zero. Thus, if a set of zero probability is thrown out from the original sample space, we may let $\Omega = [0, 1]$ and derive great satisfaction from the knowledge that this would not change the answer to any of our probability calculations.
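The identification of toss sequences with binary expansions can be made concrete with exact rational arithmetic. The Python sketch below is a minimal illustration; the function name `binary_value` is hypothetical, and only finite prefixes of an outcome can be evaluated, so the infinite outcomes appear as limits of these partial sums.

```python
from fractions import Fraction


def binary_value(prefix):
    """Exact value of the binary fraction 0.b1 b2 b3 ... for a finite
    prefix of tosses, reading H as the digit 1 and T as the digit 0."""
    value = Fraction(0)
    for position, toss in enumerate(prefix, start=1):
        if toss == "H":
            value += Fraction(1, 2 ** position)
    return value


# Prefixes of THTHTH... give partial sums that approach 1/3.
print(float(binary_value("TH" * 25)))

# HTTTT... equals 1/2 exactly, while prefixes of THHHH... approach 1/2
# from below; the remaining gap halves with every extra toss.
print(binary_value("H" + "T" * 30))
print(Fraction(1, 2) - binary_value("T" + "H" * 30))
```

The last two lines show the non-uniqueness at work: two different outcomes are mapped to the same point 1/2 in the limit.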
It is possible to show, in a somewhat non-rigorous fashion (i.e., without using much measure theory), or rigorously, by introducing Lebesgue measure, that infinite coin tossing is mathematically equivalent to choosing a number randomly from the interval [0,1]. It can be shown, in a completely analogous way, that infinite random typewriting is equivalent to the single random choice of a number in [0,1]. We need, of course, to consider the N-ary representation of numbers in [0,1], instead of their binary expansion (where N is the number of typewriter keys). However, we shall not do so here.
Example 5. (Random Numbers) Let the random variable $X$ denote the random choice of a number from [0,1]. Then

$$P(X \text{ is rational}) = \sum_{n=1}^{\infty} P(X = r_n),$$

where $r_n$ is the $n$th rational in some fixed enumeration of the rationals in [0,1]. Since $P(X = r_n) = 0$ for each $n$, we have that

$$P(X \text{ is rational}) = 0.$$

This result may be compared with a mundane fact of "reality": If a person, computer, pointer, or random number generator were asked to choose $X$, limitations of measurement accuracy (or decimal point restrictions) would systematically exclude irrational $X$'s, leading to the "conclusion" that $P(X \text{ is rational}) = 1$!
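The remark about measurement accuracy is easy to observe on a computer. The Python sketch below, with an arbitrary seed and a hypothetical helper name, converts the output of the standard library's pseudo-random generator to exact fractions and checks that every sampled value is a rational number of the form $k/2^n$. This reflects the finite binary floating-point format, not any property of a truly random choice from [0,1].

```python
import random
from fractions import Fraction


def is_dyadic_rational(x):
    """True if the float x equals k / 2**n exactly for integers k, n >= 0.
    Fraction(x) converts a float to the exact rational it represents."""
    denominator = Fraction(x).denominator
    return denominator & (denominator - 1) == 0  # power-of-two test


rng = random.Random(2024)  # arbitrary seed, for reproducibility only
samples = [rng.random() for _ in range(1000)]
print(all(is_dyadic_rational(x) for x in samples))  # prints True
```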
We would like to next state a thrilling result, called Borel's law of normal numbers [3]: A number in [0,1] is said to be normal if its decimal representation has, asymptotically, an equal frequency of the digits 0 through 9, that is, if

$$\lim_{n \to \infty} \frac{\#\{k \le n : d_k = j\}}{n} = \frac{1}{10}$$

for each $j = 0, 1, 2, \ldots, 9$, where $d_k$ denotes the $k$th decimal digit of the number. Borel's law states that

$$P(X \text{ is normal}) = 1$$

for a number $X$ chosen at random from [0,1]. This is somewhat surprising, since it is awfully hard to think of a single number that is normal (the number 0.012345678910111213..., obtained by writing each integer successively, is known to be normal; the proof is not trivial).
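The digit frequencies of the number 0.012345678910111213... can be tabulated directly. The Python sketch below uses hypothetical function names and arbitrary cutoffs; it writes the integers from 0 up to a cutoff one after another and reports the relative frequency of each digit. A finite tabulation proves nothing about the limit, and the convergence to 1/10 is slow, with the digit 0 lagging behind because integers are written without leading zeros.

```python
from collections import Counter


def concatenated_digits(last_integer):
    """Digits obtained by writing 0, 1, 2, ..., last_integer in succession."""
    return "".join(str(i) for i in range(last_integer + 1))


def digit_frequencies(digits):
    """Relative frequency of each decimal digit in the string digits."""
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in "0123456789"}


for last in (99, 9_999, 999_999):
    frequencies = digit_frequencies(concatenated_digits(last))
    print(last, {d: round(f, 4) for d, f in frequencies.items()})
```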
The Borel-Cantelli lemma yields several consequences that may, at first glance, seem to contradict Borel's normal number law: Almost all the numbers in [0,1] (i.e., all except a set of zero Lebesgue measure) have decimal expansions that contain infinitely many chains of length 1000, say, that contain no digits except 2, 3, and 4. The nice part is, of course, that almost all of these numbers are normal as well, and so on.
The moral of the Borel-Cantelli lemma should, by now, be quite clear: "The realization of a truly random infinite procedure will, with probability one, contain infinitely many segments that exhibit extreme 'non-randomness', of all sizes, patterns and intensities." The Borel-Cantelli lemma is, after all, a limit theorem of probability, and a quote from the classic treatise of Gnedenko and Kolmogorov [6] might be in order as well: "In reality, however, the epistemological value of the theory of probability is revealed only by limit theorems. Moreover, without limit theorems, it is impossible to understand the real content of the primary concept of all our sciences-the concept of probability."

Conclusions and future developments
The main results of this chapter, accessible to a second-year undergraduate, are Corollaries 2.3 and 2.5. They follow from the Borel-Cantelli lemmas and Boole's inequality, respectively. Corollary 2.3 states that in an infinite sequence of keystrokes, any fixed-length "work" appears infinitely often with probability 1. Most undergraduates that the author has taught have great difficulty believing this fact, since most statistical tests, for example, are based on finite samples. Corollary 2.5 goes one step further, proving that all finite-length pieces of work, even those yet unwritten, will each appear infinitely often with probability 1. The undergraduate reader will undoubtedly appreciate the "power of infinity" on reading this chapter, while graduate students will enjoy a nonpractical yet deep application of the Borel-Cantelli lemmas.
Example 4 makes a contrast between the finite situation and the infinite one. An important practical problem in this regard would be to use Poisson approximations as in [7] to find the approximate probability that a specific work occurs x times in n keystrokes and to use this process as the basis of a statistical test for randomness.
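As a rough indication of what such a calculation might look like, the Python sketch below approximates the number of occurrences of a work of length L in n keystrokes on an N-key keyboard by a Poisson variable. This is a sketch under stated assumptions and not the method of [7]: the mean is taken to be the expected number of occurrences, $(n - L + 1) N^{-L}$, keystrokes are assumed independent and uniform, and the clumping caused by works that can overlap themselves is ignored. All function and parameter names are hypothetical.

```python
import math


def poisson_pmf(x, lam):
    """P(Y = x) for a Poisson random variable Y with mean lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


def approx_prob_occurrences(x, n_keystrokes, work_length, n_keys):
    """Poisson approximation to P(the work occurs exactly x times).
    Assumption: mean = (number of starting positions) * n_keys**(-work_length);
    overlaps between occurrences are ignored."""
    positions = max(n_keystrokes - work_length + 1, 0)
    lam = positions * float(n_keys) ** (-work_length)
    return poisson_pmf(x, lam)


# A three-letter word on a 26-key keyboard, in 100,000 keystrokes.
for x in range(11):
    print(x, round(approx_prob_occurrences(x, 100_000, 3, 26), 4))
```

A test for randomness along these lines would compare the observed count of a chosen pattern with these approximate probabilities and reject when the count is improbably large or small.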