Open access peer-reviewed chapter

The Borel-Cantelli Lemmas, and Their Relationship to Limit Superior and Limit Inferior of Sets (or, Can a Monkey Really Type Hamlet?)

Written By

Anant P. Godbole

Reviewed: 04 June 2020 Published: 09 July 2020

DOI: 10.5772/intechopen.93121

From the Edited Volume

Number Theory and Its Applications

Edited by Cheon Seoung Ryoo



The purpose of this chapter is to show that if a monkey types infinitely, Shakespeare’s Hamlet and any other works one may wish to add to the list will each be typed, not once, not twice, but infinitely often with a probability of 1. This dramatic fact is a simple consequence of the Borel-Cantelli lemma and will come as no surprise to anyone who has taken a graduate-level course in Probability. The proof of this result, however, is quite accessible to anyone who has but a rudimentary understanding of the concept of independence, together with the notion of limit superior and limit inferior of a sequence of sets.


  • Borel-Cantelli lemma
  • limit superior of sets
  • limit inferior of sets
  • independence
  • limit theorems of probability

1. Introduction

Consider a monkey named Sue who is given a word processor with $N$ symbols. We shall assume that these symbols include the 26 letters of the English alphabet (upper and lower case), all the Greek letters, the numbers 0 through 9, a blank space, all the standard punctuation marks, and a variety of mathematical symbols; imagine, in fact, that $N$ is so large that the keyboard is capable of typing just about anything we might fancy, in any language. (A LaTeX editor could do much of that too, but not in all languages!)

If Sue is handed such a machine and pounds away, randomly, it is clear that most of what she types will be complete gibberish. A stray segment of sense such as “dog eat HAy” or even “Ronnie ReAGan our 40th PresiDεητ of USA” would surprise no one, but how often, we ask, would she successfully manage to type the Constitution of the United States, or Shakespeare’s Hamlet, or the fundamental mathematical works of the 2026 Fields Medalist(s)?

The purpose of this chapter is to show that if Sue types infinitely, the above works (and any others that one may choose to add to the list) will each be typed, not once, not twice, but infinitely often with a probability of 1. This dramatic fact is a simple consequence of the Borel-Cantelli lemma and will come as no surprise to anyone who has taken a serious graduate-level course in Probability. The proof of this result, however, is quite accessible to anyone who has but a rudimentary understanding of the concept of independence.

The reader is invited, while reading this chapter, to let his/her imagination run wild, and concoct a plethora of similar examples. A somewhat mundane objection may be raised immediately: how can Sue (or anyone else for that matter) type indefinitely? We shall not dwell on this nonmathematical problem, but will remark instead (and prove a little later) that Sue's never-ending assignment is mathematically equivalent to the task of randomly selecting a number from the interval $[0,1]$.

We would like to mention that our problem is related to the famous “Problem of a printed line,” a popular account of which can be found in George Gamow’s classic book [1]. The solution presented there, however, is entirely deterministic and of a finite character: the automatic printing press considered by Gamow does not print indefinitely, and the probabilities of various outcomes are not calculated.


2. Limit superior and limit inferior of a sequence of sets

Consider any sequence $\{A_n\}_{n=1}^{\infty}$ of subsets of a set $\Omega$. Points of $\Omega$ will be denoted by $\omega$. We know that

$$\omega \in \bigcup_{k=n}^{\infty} A_k \iff \omega \in A_k \text{ for some } k \geq n.$$

It follows that

$$\omega \in \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k \iff \text{for each } n, \ \omega \in A_k \text{ for some } k \geq n.$$

To better understand this somewhat complicated set, we first let $n=1$ and note that $\omega \in A_k$ for some $k \geq 1$, say $k = k_1$. Letting $n = k_1 + 1$, we see that $\omega$ must belong to some $A_{k_2}$, where $k_2 > k_1$. Continuing in this fashion, we see that $\omega \in \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k$ if and only if $\omega \in A_k$ for infinitely many $k$'s.

The set $\bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k$ is called the limit superior of the sequence $\{A_n\}_{n=1}^{\infty}$, and is denoted by $\limsup A_n$, or $\overline{\lim}\, A_n$, or, rather appropriately, by $\{A_n \text{ i.o.}\}$, where i.o. stands for "infinitely often."

In a similar fashion, we observe that the set

$$\bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} A_k$$

is the collection of those points $\omega$ that belong to all but a finite number of the $A_n$'s. The set $\bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} A_k$ is called the limit inferior of the sequence $\{A_n\}_{n=1}^{\infty}$ and is usually denoted by $\liminf A_n$ or $\underline{\lim}\, A_n$. We prefer the notation $\{A_n \text{ a.b.f.o.}\}$ (a.b.f.o. means "all but finitely often"). Elementary symbol manipulation may be used to prove that $\underline{\lim}\, A_n \subseteq \overline{\lim}\, A_n$. It is easier to note, however, that if $\omega$ belongs to all but finitely many $A_n$'s, it must necessarily belong to an infinite number of them. The above fact is just one of the many similarities between lim sups and lim infs of sets, on the one hand, and of real numbers, on the other (recall that $\liminf_{n} a_n \leq \limsup_{n} a_n$). Likewise, $\liminf_{n} a_n$ and $\limsup_{n} a_n$ must both exist as extended real numbers, that is,

$$-\infty \leq \liminf_{n} a_n \leq \limsup_{n} a_n \leq \infty,$$

as must $\underline{\lim}\, A_n$ and $\overline{\lim}\, A_n$, with

$$\emptyset \subseteq \underline{\lim}\, A_n \subseteq \overline{\lim}\, A_n \subseteq \Omega.$$
If $\{A_n \text{ a.b.f.o.}\} = \{A_n \text{ i.o.}\}$, the sequence $\{A_n\}$ is said to have a limit, which we define to be the common value of $\{A_n \text{ a.b.f.o.}\}$ and $\{A_n \text{ i.o.}\}$ and denote by $\lim_{n} A_n$ (note, again, the analogy with real sequences).
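These definitions can be checked mechanically in one simple situation. The following tiny sketch (an illustration, not part of the chapter) fixes a point $\omega$ and supposes that its membership in $A_n$ is an eventually periodic function of $n$: then $\omega \in \limsup A_n$ iff it belongs to $A_n$ for at least one index of the repeating cycle, and $\omega \in \liminf A_n$ iff it belongs for every index of the cycle.

```python
def in_limsup(preperiod, cycle):
    """omega is in limsup A_n  <=>  omega lies in A_n for infinitely many n.
    Membership is encoded as a finite pre-period plus a repeating cycle of booleans;
    only the repeating tail matters."""
    return any(cycle)

def in_liminf(preperiod, cycle):
    """omega is in liminf A_n  <=>  omega lies in all but finitely many A_n."""
    return all(cycle)

# omega belongs to A_1, A_3, A_5, ... (alternating membership):
# it is in the limit superior but not in the limit inferior.
print(in_limsup([], [True, False]))   # True
print(in_liminf([], [True, False]))   # False
```

The pre-period argument is deliberately ignored: finitely many exceptions can never affect either limit set, which is exactly the point of the i.o./a.b.f.o. terminology.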

A useful dual relation between these two sets is

$$(\overline{\lim}\, A_n)^c = \underline{\lim}\, A_n^c, \qquad (\underline{\lim}\, A_n)^c = \overline{\lim}\, A_n^c,$$

where $A^c$ denotes the complement of the set $A$. Here is an informal proof of the first of these two facts: $\omega \in \underline{\lim}\, A_n^c$ iff $\omega$ belongs to all but finitely many $A_n^c$'s iff $\omega$ belongs to just a finite number of the $A_n$'s iff $\omega \notin \overline{\lim}\, A_n$ iff $\omega \in (\overline{\lim}\, A_n)^c$.

A few examples should help familiarize the reader with the above notions: the second and the third are taken from [2]:

Example 1. If $A_1 \subseteq A_2 \subseteq \cdots$, then $\underline{\lim}\, A_n = \overline{\lim}\, A_n = \bigcup_{n=1}^{\infty} A_n$. Likewise, if $A_1 \supseteq A_2 \supseteq A_3 \supseteq \cdots$, then $\underline{\lim}\, A_n = \overline{\lim}\, A_n = \bigcap_{n=1}^{\infty} A_n$.

Example 2. If $A_{2n-1} = (-1/n, 1]$ and $A_{2n} = (-1, 1/n]$, then $\underline{\lim}\, A_n = \{0\}$ and $\overline{\lim}\, A_n = (-1, 1]$.
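A finite-horizon check can lend confidence in such computations. The sketch below (illustrative only; it assumes the intervals $A_{2n-1} = (-1/n, 1]$ and $A_{2n} = (-1, 1/n]$, the standard textbook reading of Example 2) inspects membership along a long tail of indices as a heuristic for i.o. and a.b.f.o. membership.

```python
def member(x, n):
    # Assumed reading of Example 2: A_{2n-1} = (-1/n, 1],  A_{2n} = (-1, 1/n].
    if n % 2 == 1:
        m = (n + 1) // 2
        return -1.0 / m < x <= 1.0
    m = n // 2
    return -1.0 < x <= 1.0 / m

def tail_behavior(x, start=1000, stop=2000):
    """Heuristic check over a long tail of indices:
    (in infinitely many A_n?, in all but finitely many A_n?)."""
    hits = [member(x, n) for n in range(start, stop)]
    return any(hits), all(hits)

print(tail_behavior(0.0))   # (True, True): 0 lies in the limit inferior
print(tail_behavior(0.5))   # (True, False): 0.5 lies in the limit superior only
```

Only $x = 0$ belongs to every far-out $A_n$, while every $x \in (-1, 1]$ keeps returning along one of the two subsequences, matching the claimed $\{0\}$ and $(-1, 1]$.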

Example 3. If $A_n$ is the closed unit disk with center $((-1)^n/n, 0)$, $n \geq 1$, then

$$\underline{\lim}\, A_n = \{(x,y) : x^2 + y^2 < 1\}$$

and

$$\overline{\lim}\, A_n = \{(x,y) : x^2 + y^2 \leq 1\} \setminus \{(0,1), (0,-1)\}.$$
In what follows, the set $\Omega$ will be taken to be the sample space (or set of possible realizations) of a random experiment (one whose outcome cannot be predicted in advance). We shall assume that each subset of $\Omega$ that we encounter is measurable. In other words, each set $A$ will be assumed to belong to the sigma algebra $\mathcal{A}$ of events, which is a class of subsets of $\Omega$ satisfying the conditions

  • $\emptyset \in \mathcal{A}$;

  • if $A \in \mathcal{A}$, then $A^c \in \mathcal{A}$; and

  • if $A_1, A_2, \ldots \in \mathcal{A}$, then $\bigcup_{n=1}^{\infty} A_n \in \mathcal{A}$.

This restriction ensures that sets such as $\overline{\lim}\, A_n$ and $\underline{\lim}\, A_n$ are themselves measurable, so that we may meaningfully talk of their probabilities $P(A_n \text{ i.o.})$ and $P(A_n \text{ a.b.f.o.})$. We next move on to a key concept in probability:

Definition: The sequence of events $\{A_n\}_{n=1}^{\infty}$ will be said to be independent if, for each finite subcollection $A_{n_1}, A_{n_2}, \ldots, A_{n_k}$,

$$P\left(\bigcap_{j=1}^{k} A_{n_j}\right) = \prod_{j=1}^{k} P(A_{n_j}).$$

Stated informally, this means that the occurrence (or nonoccurrence) of any finite subcollection $A_{n_1}, A_{n_2}, \ldots, A_{n_k}$ does not affect the probability of occurrence of another, disjoint collection $A_{m_1}, A_{m_2}, \ldots, A_{m_\ell}$.

The events $\{A_n\}_{n=1}^{\infty}$ that represent the successive outcomes of an infinite coin-tossing experiment are usually assumed, on intuitive and empirical grounds, to be independent. We shall make the same assumption regarding Sue's successive choices $\{B_n\}_{n=1}^{\infty}$ of keys on her keyboard.

The Borel-Cantelli lemma is a two-pronged theorem, which asserts that the probability of occurrence of an infinite number of the independent events $\{A_n\}_{n=1}^{\infty}$ is zero or one:

Theorem 2.1. (The Borel-Cantelli lemma, [3, 4]).

  1. If $\{A_n\}_{n=1}^{\infty}$ is any sequence of events, then $\sum_{n=1}^{\infty} P(A_n) < \infty$ implies that $P(A_n \text{ i.o.}) = 0$.

  2. If $\{A_n\}_{n=1}^{\infty}$ is an independent sequence, then $P(A_n \text{ i.o.})$ equals 0 or 1 according as the series $\sum_{n=1}^{\infty} P(A_n)$ converges or diverges.

The following lemma can be proved using elementary properties of probability measures:

Lemma 2.2. If $\{A_n\}_{n=1}^{\infty}$ is an increasing (decreasing) sequence of events, then $\lim_{n} P(A_n) = P(\bigcup_{n=1}^{\infty} A_n)$ (respectively, $P(\bigcap_{n=1}^{\infty} A_n)$).

Proof of Theorem 2.1

  1. For any $n$, we note that

$$P(A_n \text{ i.o.}) \leq P\left(\bigcup_{k=n}^{\infty} A_k\right) \leq \sum_{k=n}^{\infty} P(A_k).$$

    On letting $n \to \infty$, the right-hand side, being the tail of a convergent series, tends to zero, and we see that

$$P(A_n \text{ i.o.}) = 0,$$

    proving part (1).

  2. We shall prove that $P(A_n^c \text{ a.b.f.o.}) = 0$; by the dual relation of this section, this is the same as $P(A_n \text{ i.o.}) = 1$. Let $m, n$ ($m \geq n$) be arbitrary. Note that

$$P\left(\bigcap_{k=n}^{m} A_k^c\right) = \prod_{k=n}^{m} \left(1 - P(A_k)\right) \leq \prod_{k=n}^{m} e^{-P(A_k)} = \exp\left(-\sum_{k=n}^{m} P(A_k)\right),$$

    where the inequality follows from the fact that $1 - x \leq e^{-x}$ ($x \geq 0$). Lemma 2.2 now gives us that

$$P\left(\bigcap_{k=n}^{\infty} A_k^c\right) = \lim_{m \to \infty} P\left(\bigcap_{k=n}^{m} A_k^c\right) \leq \exp\left(-\sum_{k=n}^{\infty} P(A_k)\right) = 0,$$

    since $\sum_{n=1}^{\infty} P(A_n) = \infty$, which proves that

$$P(A_n^c \text{ a.b.f.o.}) = P\left(\bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} A_k^c\right) \leq \sum_{n=1}^{\infty} P\left(\bigcap_{k=n}^{\infty} A_k^c\right) = 0,$$

    proving the result. □
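The dichotomy in part (2) can be illustrated numerically: for independent events, the probability that none of $A_n, \ldots, A_m$ occurs is $\prod_{k=n}^{m} (1 - P(A_k))$, which tends to 0 as $m \to \infty$ exactly when $\sum P(A_k)$ diverges. The two probability sequences below are arbitrary illustrative choices, chosen because their products telescope to closed forms.

```python
import math

def prob_no_occurrence(p, n, m):
    """P(none of the independent events A_n, ..., A_m occurs)
    = prod_{k=n}^{m} (1 - p(k))."""
    return math.prod(1.0 - p(k) for k in range(n, m + 1))

# Divergent case, p_k = 1/k: the product telescopes to 1/m and tends to 0,
# so some A_k eventually occurs with probability 1 (the second prong).
print(prob_no_occurrence(lambda k: 1.0 / k, 2, 1000))       # ≈ 0.001

# Convergent case, p_k = 1/k^2: the product telescopes to (m+1)/(2m) -> 1/2 > 0,
# consistent with only finitely many occurrences (the first prong).
print(prob_no_occurrence(lambda k: 1.0 / k**2, 2, 1000))    # ≈ 0.5005
```

The same computation with any divergent series of probabilities drives the "nothing happens" probability to zero, which is precisely what dooms (or blesses) Sue.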

As seen from the dates on Refs. [3, 4], the Borel-Cantelli lemmas are classical, and they now appear in virtually all graduate-level books on probability, such as [2]. For over 100 years, the literature on the lemmas has focused on weakening the independence requirement in the second lemma, or on more complicated probability models that yield the same conclusions; see, for example, [5, 6, 7]. What distinguishes this work from these and others is that we provide a very down-to-earth application that forces the reader to come to terms with the notions of independence and infinity, as opposed to the finite samples one has in statistical situations. It is a chapter that we feel can cause amusement, astonishment, initial disbelief, and, ultimately, understanding. With this backdrop, we are now in a position to start establishing the claim made at the beginning of this chapter:

Corollary 2.3. If Sue types indefinitely, by successively and independently choosing one of the $N$ available keys, then any specific work containing a total of $M$ characters (including all the blanks, of course) will be typed by her infinitely often, with a probability of 1.

Proof. Let $A_1$ be the event that Sue's first $M$ random choices lead to the work being typed correctly. It is clear that $P(A_1) = (1/N)^M$. Similarly, let $A_2$ denote a successful completion of the task between the $(M+1)$st and $(2M)$th keystrokes. In general, $A_n$ denotes the completion of a flawless job between the $((n-1)M+1)$st and the $(nM)$th random strokes. It is evident that $P(A_n) = (1/N)^M$ for each $n$, and that the sequence $\{A_n\}_{n=1}^{\infty}$ is independent. Since $\sum_{n=1}^{\infty} P(A_n) = \sum_{n=1}^{\infty} (1/N)^M = \infty$, it follows by part (2) of the Borel-Cantelli lemma that $P(A_n \text{ i.o.}) = 1$. Since the probability that the work is typed correctly infinitely often is at least as large as $P(A_n \text{ i.o.})$, the proof is complete. (Notice how the events $\{A_n\}_{n=1}^{\infty}$ are defined using disjoint blocks; this guarantees their independence.) □
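To get a feel for the rate at which certainty sets in, here is a small illustrative computation (the alphabet size and target length below are arbitrary toy choices, not from the chapter): with disjoint blocks of length $M$, the probability that the work has been typed in at least one of the first $n$ blocks is $1 - (1 - N^{-M})^n$, which tends to 1 as $n \to \infty$.

```python
def prob_typed_in_n_blocks(N, M, n):
    """P(a fixed work of length M is typed in at least one of n disjoint
    M-blocks), with N equally likely keys per stroke: 1 - (1 - (1/N)^M)^n."""
    p = (1.0 / N) ** M
    return 1.0 - (1.0 - p) ** n

# Tiny toy case: N = 3 keys and a target of length M = 2, so p = 1/9 per block.
for n in (10, 100, 1000):
    print(n, prob_typed_in_n_blocks(3, 2, n))
# The probability climbs monotonically toward 1; for a Hamlet-sized M the climb
# is unimaginably slow, but the limit is still 1.
```

The corollary strengthens this: not merely "at least once," but infinitely often, with probability 1.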

Lemma 2.4. If $P(C_n) = 1$ ($n = 1, 2, \ldots$), then $P(\bigcap_{n=1}^{\infty} C_n) = 1$.

Proof: Boole's inequality states that

$$P\left(\bigcup_{k=1}^{n} D_k\right) \leq \sum_{k=1}^{n} P(D_k)$$

for any events $D_1, D_2, \ldots, D_n$. Applying it to the complements $C_k^c$, each of which has probability zero, we obtain

$$P\left(\bigcap_{k=1}^{n} C_k\right) = 1 - P\left(\bigcup_{k=1}^{n} C_k^c\right) \geq 1 - \sum_{k=1}^{n} P(C_k^c) = 1.$$

Thus $P(\bigcap_{k=1}^{n} C_k) = 1$ for each $n$. The required conclusion is obtained on letting $n \to \infty$ and using Lemma 2.2. □

Corollary 2.5.If Sue types indefinitely, then every piece of writing (of any finite length whatsoever; published, unpublished, or yet to be written by yet unborn individuals; meaningful or gibberish) will be typed by her, infinitely often each, with a probability of 1.

Proof. Denote the works by $B_1, B_2, \ldots$, and set $A_n = \{B_n \text{ typed i.o.}\}$, $n = 1, 2, \ldots$. By Corollary 2.3, $P(A_n) = 1$ for each $n$ and thus, by Lemma 2.4, $P(\bigcap_{n=1}^{\infty} A_n) = 1$, as claimed. (Note: this implies that the probability that some work is typed finitely often (or never) is zero.) □

Example 4. (Statistical tests of hypotheses) If a fair coin is tossed infinitely often, a sequence of $10^6$ consecutive heads will appear infinitely often with probability 1.

Now, if a coin (of unknown origin) were tossed a million times, and a head appeared each time, the "null" statistical hypothesis

$$H_0 : P(\text{H}) = \frac{1}{2}$$

would be summarily rejected at most conventional (5%, 1%, 0.00001%) levels of significance. The point to note, however, is that such "extreme" and "erratic" behavior will be exhibited on an infinite number of occasions by any fair coin (and by any coin with $P(\text{H}) > 0$), with a probability of 1.

Similarly, if a fair coin is tossed infinitely often, an $n$-long alternating sequence $HTHTHT\ldots$ ($n$ is arbitrary) will appear infinitely often, almost certainly. This fact may be compared with the conclusion of a standard nonparametric statistical procedure, the run test: the fair coin hypothesis would be vigorously rejected, using this test, if a large number of coin tosses yielded an alternating sequence of heads and tails.
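The tension between the finite-sample test and the infinite limit can be made concrete. The sketch below (illustrative; the run lengths are arbitrary choices) computes, by a simple state recursion on the length of the current trailing run of heads, the exact probability that a run of $k$ consecutive heads appears somewhere in $n$ fair tosses; for fixed $k$ this probability tends to 1 as $n \to \infty$.

```python
def prob_head_run(k, n):
    """Exact P(a run of k consecutive heads occurs within n fair coin tosses).
    state[r] = P(no run of k yet, current trailing run of heads has length r);
    'hit' absorbs all probability mass once a run of length k is completed."""
    state = [0.0] * k
    state[0] = 1.0
    hit = 0.0
    for _ in range(n):
        new = [0.0] * k
        for run, p in enumerate(state):
            new[0] += 0.5 * p            # tails: the run resets
            if run + 1 == k:
                hit += 0.5 * p           # heads: a run of length k is completed
            else:
                new[run + 1] += 0.5 * p  # heads: the run extends
        state = new
    return hit

print(prob_head_run(1, 1))     # 0.5
print(prob_head_run(3, 100))   # already very close to 1
```

A statistician who rejects "fairness" upon seeing a long run is making a sound finite-sample decision, even though every fair coin produces such runs infinitely often in the limit.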


3. A probability model for infinite coin tossing

In the above discussion, we often concluded that a particular event (e.g., Hamlet is typed infinitely often) occurred with probability 1. One fundamental question that we did not address, however, was the following: just what probability model describes infinite coin tossing or simian typewriting? Put another way, what are the sample spaces associated with these two experiments? And what exactly is the probability of an event defined to be? We realize then, in retrospect, that we had put the cart before the horse; various events were shown to have probability 1, by assuming the existence of a logically consistent probability (measure) on a sample space that had not been fully described. This practice is fairly standard in the teaching of probability; for example, sequences $\{X_n\}_{n=1}^{\infty}$ of independent and identically distributed (i.i.d.) random variables are often introduced as mathematical objects before their existence is proved, using Kolmogorov's famous Consistency Theorem. Such an approach is often beneficial; as Billingsley [8] wrote, "It is instructive… to see the show in rehearsal as well as in performance."

We shall start by noting that three tosses of a fair coin lead to the eight-point sample space

$$\Omega = \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\}.$$

It seems reasonable to assign probability 1/8 to each of these eight points; thus the probability $P(A)$ of any subset $A$ may be defined by

$$P(A) = \frac{\#(A)}{8},$$

where $\#(A)$ denotes the number of sample points in $A$.
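The finite model is easy to enumerate directly. The short sketch below (an illustration, not from the chapter; the event "at least two heads" is an arbitrary example) counts sample points and applies the counting definition of $P$.

```python
from itertools import product

# The eight equally likely outcomes of three fair coin tosses.
omega = ["".join(t) for t in product("HT", repeat=3)]

def prob(event):
    """P(A) = #(A) / 8 for any subset A of the eight-point sample space."""
    return len(event) / len(omega)

at_least_two_heads = [w for w in omega if w.count("H") >= 2]
print(sorted(at_least_two_heads))   # ['HHH', 'HHT', 'HTH', 'THH']
print(prob(at_least_two_heads))     # 0.5
```

Exactly this counting recipe extends to any finite number of tosses; it is the infinite case that requires the construction described next.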
Our analysis is thus complete, and can easily be extended to any finite number of coin tosses. The situation gets rapidly more complicated if the coin is tossed endlessly. This experiment cannot be conceived, carried out, or justified "in practice," and our neat conclusions would be rendered meaningless if we were unable to mathematically model our procedure. Happily, however, this is not the case. We simply let

$$\Omega = \{\omega = (\omega_1, \omega_2, \ldots) : \omega_n \in \{0, 1\}, \ n = 1, 2, \ldots\},$$

the set of all infinite sequences of 0's and 1's (with 1 representing heads, say, and 0 representing tails). A typical element of $\Omega$ might be $\omega = 0010011\ldots$. It is well known (and easily proved) that $\Omega$ is an uncountable set. It seems reasonable, then, to assign probability zero to each sample point.

The next step is crucial. We identify each element of $\Omega$ with the real number in the interval $[0,1]$ that has the same binary expansion. For example, the sample outcome $THTHTH\ldots$ is identified with the real number $0.010101\ldots$ (base 2), which equals

$$\frac{1}{4} + \frac{1}{4^2} + \frac{1}{4^3} + \cdots = \frac{1}{3}.$$
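The identification is easy to approximate with finite prefixes. The following sketch (illustrative only) maps an H/T string to the corresponding dyadic number, reading H as the binary digit 1 and T as 0.

```python
def coin_prefix_to_number(s):
    """Map a finite H/T prefix to sum_k bit_k / 2^k, with bit 1 for H, 0 for T."""
    return sum((1 if c == "H" else 0) / 2 ** (k + 1) for k, c in enumerate(s))

# Longer and longer prefixes of THTHTH... converge to 0.010101..._2 = 1/3.
print(coin_prefix_to_number("TH" * 5))    # 0.333..., within 2**-10 of 1/3
print(coin_prefix_to_number("TH" * 25))   # indistinguishable from 1/3 in floats
```

Each additional toss pins the number down to one half of the previous dyadic interval, which is exactly why infinite tossing singles out a point of $[0,1]$.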
A problem arises immediately: numbers of the form $k/2^n$, where $k$ and $n$ are positive integers, do not have a unique binary representation. In other words, two different sample outcomes such as $HTTTTT\ldots$ and $THHHHH\ldots$ would correspond to the same real number 1/2 (since $1/2 = 0.0111\ldots = 0.1000\ldots$), and the correspondence between $\Omega$ and $[0,1]$ would not, consequently, be one-to-one. We note, however, that numbers of the form $k/2^n$ constitute a denumerable set, and that there are two sample outcomes that correspond to each such number. If one, but not both, of each of these pairs of outcomes were to be removed from $\Omega$, we would be left with a one-to-one map from a censored sample space $\Omega^*$ onto $[0,1]$. Moreover, our assumption regarding individual sample points forces $P(\Omega \setminus \Omega^*)$ to equal zero. Thus, if a set of zero probability is thrown out of the original sample space, we may let $\Omega = [0,1]$ and derive great satisfaction from the knowledge that this does not change the answer to any of our probability calculations.

It is possible to show, in a somewhat nonrigorous fashion (i.e., without using much measure theory), or rigorously, by introducing Lebesgue measure, that infinite coin tossing is mathematically equivalent to choosing a number randomly from the interval $[0,1]$. It can be shown, in a completely analogous way, that infinite random typewriting is equivalent to the single random choice of a number in $[0,1]$; we need, of course, to consider the $N$-ary representation of numbers in $[0,1]$, instead of their binary expansion (where $N$ is the number of typewriter keys). However, we shall not do so here.

Example 5. (Random numbers) Let the random variable $X$ denote the random choice of a number from $[0,1]$. Then

$$P(X \text{ is rational}) = P\left(\bigcup_{n=1}^{\infty} \{X = r_n\}\right) \leq \sum_{n=1}^{\infty} P(X = r_n),$$

where $r_n$ is the $n$th rational (in some fixed enumeration). Since $P(X = r_n) = 0$ for each $n$, we have that

$$P(X \text{ is rational}) = 0, \quad \text{i.e.,} \quad P(X \text{ is irrational}) = 1.$$

This result may be compared with a mundane fact of "reality": if a person, computer, pointer, or random number generator were asked to choose $X$, limitations of measurement accuracy (or decimal point restrictions) would systematically exclude irrational $X$'s, leading to the "conclusion" that

$$P(X \text{ is rational}) = 1.$$
We would next like to state a thrilling result, called Borel's law of normal numbers [3]: a number $x$ in $[0,1]$ is said to be normal if its decimal representation $x = 0.d_1 d_2 d_3 \ldots$ has, asymptotically, an equal frequency of the digits 0 through 9:

$$\lim_{n \to \infty} \frac{\#\{k \leq n : d_k = j\}}{n} = \frac{1}{10}$$

for each $j = 0, 1, 2, \ldots, 9$. Borel's law states that

$$P(X \text{ is normal}) = 1,$$

which is somewhat surprising, since it is awfully hard to think of a single number that is normal (the number $0.012345678910111213\ldots$, obtained by writing each integer successively, is known to be normal; the proof is not trivial).
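A finite prefix can only hint at normality, since the convergence of digit frequencies is slow, but the tallies are easy to inspect. The sketch below (illustrative; the prefix length is an arbitrary choice) counts digit frequencies in a long prefix of the constant $0.012345678910111213\ldots$

```python
from collections import Counter

def champernowne_digits(n_digits):
    """First n_digits decimal digits of 0.012345678910111213...,
    obtained by concatenating 0, 1, 2, 3, ... in order."""
    out, k = [], 0
    while len(out) < n_digits:
        out.extend(str(k))
        k += 1
    return out[:n_digits]

digits = champernowne_digits(100_000)
freq = {d: c / len(digits) for d, c in sorted(Counter(digits).items())}
print(freq)   # each frequency is already in the rough vicinity of 0.1
```

At this prefix length the digit 1 is still noticeably overrepresented (leading digits are biased), a reminder that the equality of frequencies in the definition is strictly a limit statement.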

The Borel-Cantelli lemma yields several consequences that may, at first glance, seem to contradict Borel’s normal number law:

Almost all the numbers in $[0,1]$ (i.e., all except a set of zero Lebesgue measure) have decimal expansions that contain infinitely many chains of length 1000, say, that contain no digits except 2, 3, and 4. The nice part is, of course, that almost all of these numbers are normal as well, and so on.

The moral of the Borel-Cantelli lemma should, by now, be quite clear: “The realization of a truly random infinite procedure will, with probability one, contain infinitely many segments that exhibit extreme ‘non-randomness’, of all sizes, patterns and intensities.” The Borel-Cantelli lemma is, after all, a limit theorem of probability, and a quote from the classic treatise of Gnedenko and Kolmogorov [9] might be in order as well: “In reality, however, the epistemological value of the theory of probability is revealed only by limit theorems. Moreover, without limit theorems, it is impossible to understand the real content of the primary concept of all our sciences—the concept of probability.”


4. Conclusions and future developments

The main results of this chapter, accessible to a second-year undergraduate, are Corollaries 2.3 and 2.5. They follow from the Borel-Cantelli lemmas and Boole’s inequality, respectively. Corollary 2.3 states that in an infinite sequence of keystrokes, any fixed-length “work” appears infinitely often with probability 1. Most undergraduates that the author has taught have great difficulty believing this fact, since most statistical tests, for example, are based on finite samples. Corollary 2.5 goes one step further, proving that every finite-length piece of work, even those yet unwritten, will each appear infinitely often with probability 1. The undergraduate reader will undoubtedly appreciate the “power of infinity” on reading this chapter, while graduate students will enjoy a nonpractical yet deep application of the Borel-Cantelli lemmas.

Example 4 makes a contrast between the finite situation and the infinite one. An important practical problem in this regard would be to use Poisson approximations as in [10] to find the approximate probability that a specific work occurs x times in n keystrokes and to use this process as the basis of a statistical test for randomness.
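As a hedged illustration of that proposed direction (the parameters below are arbitrary toy choices, not taken from [10]): for a random text of $n$ keystrokes over $N$ keys, the number of occurrences of a fixed $M$-letter word has mean $(n - M + 1)\,N^{-M}$, and a Poisson distribution with that mean is the natural first approximation when occurrences are rare.

```python
import random

def count_word(text, word):
    """Number of (possibly overlapping) occurrences of word in text."""
    M = len(word)
    return sum(text[i:i + M] == word for i in range(len(text) - M + 1))

def simulate(n=1000, alphabet="abcd", word="ab", trials=200, seed=0):
    """Empirical mean word count over random texts, and the Poisson mean
    lambda = (n - M + 1) * (1/N)^M it would be compared against."""
    rng = random.Random(seed)
    counts = [count_word("".join(rng.choices(alphabet, k=n)), word)
              for _ in range(trials)]
    lam = (n - len(word) + 1) * (1.0 / len(alphabet)) ** len(word)
    return sum(counts) / trials, lam

mean, lam = simulate()
print(round(lam, 2))   # 62.44: the Poisson-approximation mean
print(mean)            # the empirical mean lands in the same vicinity
```

A full test of randomness along the lines suggested above would compare the whole empirical count distribution, not just its mean, to the Poisson law.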


References

  1. Gamow G. One Two Three … Infinity. New York: Bantam Books; 1971
  2. Ash RB. Real Analysis and Probability. New York: Academic Press; 1972
  3. Borel E. Les probabilités dénombrables et leurs applications arithmétiques. Rendiconti del Circolo Matematico di Palermo. 1909;27:247-271
  4. Cantelli FP. Su due applicazioni di un teorema di G. Boole. Reale Accademia Nazionale dei Lincei. 1917;26:295-302
  5. Balakrishnan N, Stepanov A. Generalization of the Borel-Cantelli lemma. The Mathematical Scientist. 2010;35:61-62
  6. Stepanov A. On strong convergence. Communications in Statistics, Theory and Methods. 2015;44:1615-1620
  7. Petrov VV. A generalization of the Borel-Cantelli lemma. Statistics & Probability Letters. 2004;67:233-239
  8. Billingsley P. Probability and Measure. New York: John Wiley & Sons, Inc.; 1978
  9. Gnedenko BV, Kolmogorov AN. Limit Theorems for Sums of Independent Random Variables. Boston: Addison-Wesley; 1954
  10. Godbole A, Schaffner A. Improved Poisson approximations for word patterns. Advances in Applied Probability. 1993;25:334-347
