Open access peer-reviewed chapter

The Borel-Cantelli Lemmas, and Their Relationship to Limit Superior and Limit Inferior of Sets (or, Can a Monkey Really Type Hamlet?)

Written By

Anant P. Godbole

Submitted: March 6th, 2020 Reviewed: June 4th, 2020 Published: July 9th, 2020

DOI: 10.5772/intechopen.93121

Chapter metrics overview

577 Chapter Downloads

View Full Metrics


The purpose of this chapter is to show that if a monkey types infinitely, Shakespeare’s Hamlet and any other works one may wish to add to the list will each be typed, not once, not twice, but infinitely often with a probability of 1. This dramatic fact is a simple consequence of the Borel-Cantelli lemma and will come as no surprise to anyone who has taken a graduate-level course in Probability. The proof of this result, however, is quite accessible to anyone who has but a rudimentary understanding of the concept of independence, together with the notion of limit superior and limit inferior of a sequence of sets.


  • Borel-Cantelli lemma
  • limit superior of sets
  • limit imferior of sets
  • independence
  • limit theorems of probability

1. Introduction

Consider a monkey named Sue who is given a word processor with Nsymbols. We shall assume that these symbols include the 26 letters of the English alphabet (upper and lower case), all the Greek letters, the numbers 0 through 9, a blank space, all the standard punctuation marks (,.; − etc.), and mathematical symbols (, , , , etc.); imagine, in fact, that Nis so large that the keyboard is capable of typing just anything we might fancy, in any language. (A LaTEX editor could do much of that too, but not in all languages!)

If Sue is handed such a machine and pounds away, randomly, it is clear that most of what she types will be complete gibberish. A stray segment of sense such as “dog eat HAy” or even “Ronnie ReAGan our 40th PresiDεητof USA” would surprise no one, but how often, we ask, would she successfully manage to type the Constitution of the United States, or Shakespeare’s Hamlet, or the fundamental mathematical works of the 2026 Fields Medalist(s)?

The purpose of this chapter is to show that if Sue types infinitely, the above works (and any others that one may choose to add to the list) will eachbe typed, not once, not twice, but infinitely oftenwith a probability of 1. This dramatic fact is a simple consequence of the Borel-Cantelli lemmaand will come as no surprise to anyone who has taken a serious graduate-level course in Probability. The proof of this result, however, is quite accessible to anyone who has but a rudimentary understanding of the concept of independence.

The reader is invited, while reading this chapter, to let his/her imagination run wild, and concoct a plethora of similar examples. A somewhat mundane objection may be raised immediately: how can Sue (or anyone else for that matter) type indefinitely? We shall not dwell on this nonmathematical problem, but will remark instead (and prove a little later) that Sue’s never-ending assignment is mathematically equivalent to the task of randomly selecting a number from the interval 01.

We would like to mention that our problem is related to the famous “Problem of a printed line,” a popular account of which can be found in George Gamow’s classic book [1]. The solution presented there, however, is entirely deterministic and of a finite character: the automatic printing press considered by Gamow does not print indefinitely, and the probabilities of various outcomes are not calculated.


2. Limit superior and limit inferior of a sequence of sets

Consider any sequence Ann=1of subsets of a set Ω. Points of Ωwill be denoted by ω. We know that




It follows that


To better understand this somewhat complicated set, we first let n=1and note that ωAkfor some k1, say k=k1. Letting n=k1, we see that ωmust belong to some Ak2, where k2k1. Continuing in this fashion, we see that ωn=1k=nAkif and only if ωAkfor infinitely many k‘s.

The set n=1k=nAkis called the limit superiorof the sequence Ann=1, and is denoted by limsupAn, or, lim¯An, or, rather appropriately, by Ani.o., where i.o. stands for “infinitely often.”

In a similar fashion, we observe that the set


is a collection of those points ωthat belong to all but a finite numberof the An‘s. n=1k=nAkis called the limit inferiorof the sequence Ann=1and is usually denoted by liminfAnor lim¯An. We prefer the notation Ana.b.f.o(a.b.f.o means “all but finitely often”). Elementary symbol manipulation may be used to prove that lim¯Anlim¯An. It is easier to note, however, that if ωbelongs to all but finitely many An‘s it must necessarily belong to an infinite number of them. The above fact is just one of the many similarities between lim sups and lim infs of sets, on the one hand, and of real numbers, on the other (recall that liminfnanlimsupnan). Likewise liminfnanand limsupnanmust both exist, that is,


as must lim¯Anand lim¯An, with


If Ana.b.f.o=Ani.o., the sequence Anis said to have a limit, which we define to be the common value of Ana.b.f.oand Ani.o.and denote by limnAn(note, again, the analogy with real sequences).

A useful dual relation between these two sets is




where Acdenotes the complement of the set A. Here is an informal proof of the first of these two facts: ωlim¯Anciff ωbelongs to all but finitely many Anc‘s iff ωjust a finite number of the An‘s iff ωlim¯Aniff ωlim¯Anc.

A few examples should help familiarize the reader with the above notions: the second and the third are taken from [2]:

Example 1. If A1A2,then lim¯An=lim¯An=n=1An. Likewise, if A1A2A3, then lim¯An=lim¯An=n=1An.

Example 2. If A2n1=1n1and A2n=11n, then lim¯An=0and lim¯An=11.

Example 3. if Anis the unit circle with center at 1nn0, then




In what follows, the set Ωwill be taken to be the sample space(or set of possible realizations) of a random experiment (one whose outcome cannot be predicted in advance). We shall assume that each subset of Ωthat we encounter is measurable. In other words, each set Awill be assumed to belong to the sigma algebraAof events, which is a class of subsets of Ωsatisfying the conditions

  • ϕA

  • If AA, then AcAand

  • If A1,A2,A, then n=1AnA.

This restriction ensures that sets such as lim¯Anand lim¯Anare themselves measurable, so that we may meaningfully talk of their probabilities PAni.o.and PAna.b.f.o.We next move on to a key concept in probability:

Definition: The sequence of events Ann=1will be said to be independent, if for each finite subcollection An1,An2,,Ank,


Stated informally, this means that the occurrence (or nonoccurrence) of any finite subcollection An1An2Ankdoes not affect the probability of occurrence of another disjoint collection Am1Am2Am.

The events Ann=1that represent the successive outcomes of an infinite coin-tossing experiment are usually assumed, on intuitive and empirical grounds, to be independent. We shall make the same assumption regarding Sue’s successive choices Bnn=1of a keyboard’s key.

The Borel-Cantelli lemmais a two-pronged theorem, which asserts that the probability of occurrence of an infinite number of the independent events Ann=1is zero or one:

Theorem 2.1.(The Borel-Cantelli lemma,[3, 4]).

  1. If Ann=1is any sequence of events, thenn=1PAn<implies thatPAni.o.=0.

  2. IfAnn=1is an independence sequence, thenPAni.o.equals0 or1 according as the seriesn=1PAnconverges or diverges.

The following lemma can be proved using elementary properties of probability measures:

Lemma 2.2.IfAnn=1is an increasing (decreasing) sequence of events thenlimnPAn=Pn=1AnPn=1An.

Proof of Theorem 2.1

  1. For any n, we note that


    On letting n, we see that


    proving part (a).

  2. We shall prove that PAnca.b.f.o=0. Let m,nmnbe arbitrary. Note that


    where the last inequality follows from the fact that 1xexx0. Lemma 2.1 now gives us that


    which proves that


    proving the result.

As seen by the dates on Refs. [3, 4], the Borel-Cantelli lemmas are classical, and now part of virtually all graduate level books on Probability such as [2]. Since then, for over 100 years, the literature on the lemmas has focused on weakening the independence requirement in the second lemma, or looking at more complicated probability models that yield the same conclusions. See for example [5, 6, 7]. What distinguishes this work from these and others is that we provide a very down-to-earth application that forces the reader to come to terms with the notions of independence and infinity, as opposed to the finite samples one has in statistical situations. It is a paper that we feel can cause amusement, astonishment, false disbelief, and, ultimately, understanding. With this backdrop, we are now in a position to start establishing the claim made at the beginning of this chapter:

Corollary 2.3.If Sue types indefinitely, by successively and independently choosing one of theNavailable keys, then any specific work containing a total ofMcharacters (including all the blanks, of course) will be typed by her indfinitely often, with a probability of 1.

Proof.Let A1be the event that Sue’s first Mrandom choices lead to the work being typed correctly. It is clear that PA1=1NM. Similarly, let A2denote a successful completion of the task between the M+1stand 2Mthkeystrokes. In general, Andenotes the completion of a flawless job between n1M+1stand the nMthrandom strokes. It is evident that PAn=1NMn1and that the sequence Ann=1is independent. Since n=1PAn=n=11NM=,it follows by part (b) of the Borel-Cantelli lemma that PAni.o.=1. Since the probability that the work is typed correctly infinitely often is at least as large as PAni.o., the proof is complete. (Notice how the events Ann=1are defined using disjoint blocks; this guarantees their independence.)

Lemma 2.4.IfP(Cn) = 1 (n = 1,2,…), thenPn=1Cn=1.

Proof:Boole’s inequality states that


Thus Pk=1nCk=1for each n. The required conclusion is obtained on letting n,and using Lemma 2.2.

Corollary 2.5.If Sue types indefinitely, then every piece of writing (of any finite length whatsoever; published, unpublished, or yet to be written by yet unborn individuals; meaningful or gibberish) will be typed by her, infinitely often each, with a probability of 1.

Proof.Denote the works by B1,B2,., and set An=Bni.o.,n=12.By Lemma 2.2, PAn=1for each nand thus, by Lemma 2.4, Pn=1An=1, as claimed. (Note: this implies that the probability that somework is typed finitely often (or never) is zero.)

Example 4. (Statistical tests of hypotheses) If a fair coin is tossed infinitely often, a sequence of 106consecutive heads will appear infinitely often with probability 1.

Now, if a coin (of unknown origin) were tossed a million times, and a head appeared each time, the “null” statistical hypothesis


would be summarily rejected at most conventional (5, 1, 0.00001%) levels of significance. The point to note, however, is that such “extreme” and “erratic” behavior willbe exhibited on an infinite number of occasions by any fair coin (and by all coins with PH>0),with a probability of 1.

Similarly, if a fair coin is tossed infinitely often, an n-long alternating sequence HTHTHT(nis arbitrary) will appear infinitely often, almost certainly. This fact may be compared with the conclusion of a standard nonparametric statistical procedure, the run test: the fair coin hypothesis would be vigorously rejected, using this test, if a large number of coin tosses yielded an alternating sequence of heads and tails.


3. A probability model for infinite coin tossing

In the above discussion, we often concluded that a particular event (e.g., Hamletis typed infinitely often) occurred with probability 1. One fundamental question that we did not address, however, was the following: just what probability model describes infinite coin tossing or simian typewriting? Put another way, what are the sample spaces associated with these two experiments? And what exactly is the probability of an event defined to be? We realize then, in retrospect, that we had put the cart before the horse; various events were shown to have probability 1, by assumingthe existence of a logically consistent probability (measure) on a sample space that had not been fully described. This practice is fairly standard in the teaching of probability; for example, sequences Xnn=1of independent and identically distributed (i.i.d.) random variables are often introduced as mathematical objects before their existence is proved, using Kolmogorov’s famous Consistency Theorem. Such an approach is often beneficial; as Billingsley [8] wrote, “It is instructive… to see the show in rehearsal as well as in performance.”

We shall start by noting that three tosses of a fair coin lead to the eight-point sample space


It seems reasonable to assign probability 1/8 to each of these eight points; thus the probability PAof anysubset Amay be defined by


Our analysis is thus complete, and can easily be extended to any finite number of coin tosses. The situation gets rapidly more complicated if the coin is tossed endlessly. This experiment cannot be conceived, carried out, or justified “in practice,” and our neat conclusions would be rendered meaningless if we were unable to mathematically model our procedure. Happily, however, this is not the case. We simply let


A typical element of Ωmight be ω=0010011. It is well known (and easily proved) that Ωis an uncountable set. It seems reasonable, then, to assign probability zero to each sample point.

The next step is crucial. We identify each element of Ωwith the real number in the interval [0,1] that has the same binary expansion. For example, the sample outcome THTHTH … is identified with the real number 0.01010101 … which equals


A problem arises immediately: Numbers of the form k/2n, where kand nare positive integers, do not have a unique binary representation. In other words, two different sample outcomes such as HTTTTT… and THHHH… would correspond to the same real number 1/2 (since 1/2 = 0.0111… = 0.100…), and the correspondence between Ωand [0,1] would not, consequently, be one-to-one. We note, however, that numbers of the form k/2nconstitute a denumerable set, and that there are two sample outcomes that correspond to each such number. If one, but not both, of each of these outcomes were to be removed from Ω, we would be left with a one-to-one map from a censored sample space Ωonto [0,1]. Moreover, our assumption regarding individual sample points forces PΩ\Ωto equal zero. Thus, if a set of zero probability is thrown out from the original sample space, we may let Ω=01and derive great satisfaction from the knowledge that this would not change the answer to any of our probability calculations.

It is possible to show, in a somewhat non-rigorous fashion (i.e., without using much measure theory), or rigorously, by introducing Lebesgue measure, that infinite coin tossing is mathematically equivalent to choosing a number randomly from the interval [0,1]. It can be shown, in a completely analogous way, that infinite random typewriting is equivalent to the single random choice of a number in [0,1]. We need of course, to consider the N-ary representation of numbers in [0,1], instead of their binary expansion (where Nithe number of typewriter keys). However, we shall not do so here.

Example 5.(Random Numbers) Let the random variable Xdenote the random choice of a number from [0,1]. Then


where rnis the n‘th rational. Since, PX=rn=0for each n, we have that


This result may be compared with a mundane fact of “reality”: If a person, computer, pointer, or random number generator were asked to choose X, limitations of measurement accuracy (or decimal point restrictions) would systematically exclude irrational X’s, leading to the “conclusion” that


We would like to next state a thrilling result, called Borel’s law of normal numbers[3]: A number in [0,1] is said to be normal, if its decimal representation has, asymptotically, an equal frequency of the digits 0 through 9:


for each j=0,1,2,….9. Borel’s law states that


which is somewhat surprising, since it is awfully hard to think of a single number that isnormal (the number 0.012345678910111213…, obtained by writing each integer successively, is known to be normal; the proof is not trivial).

The Borel-Cantelli lemma yields several consequences that may, at first glance, seem to contradict Borel’s normal number law:

Almost all the numbers in [0,1] (i.e., all except some with zero Lebesgue measure) have decimal expansions that contain infinitely many chains of length 1000, say, that contain no numbers except 2,3, and 4. The nice part is, of course, that almost all of these numbers are normal as well, and so on.

The moral of the Borel-Cantelli lemma should, by now, be quite clear: “The realization of a truly random infinite procedure will, with probability one, contain infinitely many segments that exhibit extreme ‘non-randomness’, of all sizes, patterns and intensities.” The Borel-Cantelli lemma is, after all, a limit theoremof probability, and a quote from the classic treatise of Gnedenko and Kolmogorov [9] might be in order as well: “In reality, however, the epistemological value of the theory of probability is revealed only by limit theorems. Moreover, without limit theorems, it is impossible to understand the real content of the primary concept of all our sciences—the concept of probability.”


4. Conclusions and future developments

The main results of this chapter, accessible to a second-year undergraduate, are Corollaries 2.3 and 2.5. They follow from the Borel-Cantelli lemmas and Boole’s inequality, respectively. Corollary 2.3 states that in an infinite sequence of keystrokes, any fixed-length “work” appears infinitely often with probability 1. Most undergraduates that the author has taught have great difficulty believing this fact, since most statistical tests, for example, are based on finite samples. Corollary 2.5 goes one step further, proving that everyfinite-length piece of work, even those yet unwritten, will eachappear infinitely often with probability 1. The undergraduate reader will undoubtedly appreciate the “power of infinity” on reading this chapter, while graduate students will enjoy a nonpractical yet deep application of the Borel-Cantelli lemmas.

Example 4 makes a contrast between the finite situation and the infinite one. An important practical problem in this regard would be to use Poisson approximations as in [10] to find the approximate probability that a specific work occurs xtimes in nkeystrokes and to use this process as the basis of a statistical test for randomness.


  1. 1. Gamow G. One Two Three Infinity. New York: Bantam Books; 1971
  2. 2. Ash RB. Real Analysis and Probability. New York: Academic Press; 1972
  3. 3. Borel E. Les probabilités dénombrables et leurs applications arithmétiques. Rendiconti del Circolo Matematico di Palermo. 1909;27:247-271
  4. 4. Cantelli FP. Su due applicazioni di un teorema di G. Boole. Reale Academia Nazionale dei Lincei. 1917;26:295-302
  5. 5. Balakrishnan N, Stepanov A. Generalization of the Borel-Cantelli lemma. The Mathematical Scientist. 2010;35:6162
  6. 6. Stepanov A. On strong convergence. Communications in Statistics, Theory and Methods. 2015;44:1615-1620
  7. 7. Petrov VV. A generalization of the Borel-Cantelli lemma. Statistics & Probability Letters. 2004;67:233-239
  8. 8. Billingsley P. Probability and Measure. New York: John Wiley & Sons, Inc.; 1978
  9. 9. Gnedenko BV, Kolmogorov AN. Limit Theorems for Sums of Independent Random Variables. Boston: Addison Wesley; 1954
  10. 10. Godbole A, Schaffner A. Improved Poisson approximations for word patterns. Advances in Applied Probability. 1993;25:334-347

Written By

Anant P. Godbole

Submitted: March 6th, 2020 Reviewed: June 4th, 2020 Published: July 9th, 2020