Open access peer-reviewed chapter

# From Asymptotic Normality to Heavy-Tailedness via Limit Theorems for Random Sums and Statistics with Random Sample Sizes

Written By

Victor Korolev and Alexander Zeifman

Reviewed: 10 September 2019 Published: 22 October 2019

DOI: 10.5772/intechopen.89659

From the Edited Volume

## Probability, Combinatorics and Control

Edited by Andrey Kostogryzov and Victor Korolev

## Abstract

This chapter contains a possible explanation of the emergence of heavy-tailed distributions observed in practice instead of the expected normal laws. The bases for this explanation are limit theorems for random sums and statistics constructed from samples with random sizes. As examples of the application of general theorems, conditions are presented for the convergence of the distributions of random sums of independent random vectors with finite covariance matrices to multivariate elliptically contoured stable and Linnik distributions. Also, conditions are presented for the convergence of the distributions of asymptotically normal (in the traditional sense) statistics to multivariate Student distributions. The joint asymptotic behavior of sample quantiles is also considered.

### Keywords

• random sum
• random sample size
• multivariate normal mixtures
• heavy-tailed distributions
• multivariate stable distribution
• multivariate Linnik distribution
• Mittag-Leffler distribution
• multivariate Student distribution
• sample quantiles

AMS 2000 Subject Classification: 60F05, 60G50, 60G55, 62E20, 62G30

## 1. Introduction

In many situations related to experimental data analysis, one often comes across the following phenomenon: although conventional reasoning based on the central limit theorem of probability theory concludes that the expected distribution of observations should be normal, statistical procedures instead expose noticeable non-normality of the real distributions. Moreover, as a rule, the observed non-normal distributions are more leptokurtic than the normal law, having sharper vertices and heavier tails. These situations are typical in financial data analysis (see, e.g., Chapter 4 in [1] or Chapter 8 in [2] and references therein), in experimental physics (see, e.g., [3]), and in other fields dealing with statistical analysis of experimental data. Many attempts were undertaken to explain this heavy-tailedness. The most significant theoretical breakthrough is usually associated with the results of B. Mandelbrot and others who proposed, instead of the standard central limit theorem, to use reasoning based on limit theorems for sums of random summands with infinite variances (see, e.g., [4]), resulting in non-normal stable laws as heavy-tailed models of the distributions of experimental data. However, first, in most cases the key assumption within this approach, the infiniteness of the variances of elementary summands, can hardly be believed to hold in practice and, second, although more heavy-tailed than the normal law, the real distributions often turn out to be more light-tailed than the stable laws.

In this work, in order to give a more realistic explanation of the observed non-normality of the distributions of real data, an alternative approach based on limit theorems for statistics constructed from samples with random sizes is developed. Within this approach, it becomes possible to obtain arbitrarily heavy tails of the data distributions without assuming the non-existence of the moments of the observed characteristics.

This work was inspired by the publication of the paper [5] in which, based on the results of [6], a particular case of random sums was considered. One more reason for writing this work was the recent publication [7], the authors of which reproduced some results of [8, 9] without citing these earlier papers.

Here we give a more general description of the transformation of the limit distribution of a sum of independent random variables or another statistic (i.e., of a measurable function of a sample) under the replacement of the non-random number of summands or the sample size by a random variable. General limit theorems are proved (Section 3). Section 4 contains some comments on heavy-tailedness of scale mixtures of normal distributions. As examples of the application of general theorems, conditions are presented for the convergence of the distributions of random sums of independent random vectors with finite covariance matrices to multivariate elliptically contoured stable and Linnik distributions (Section 5). Also, conditions are presented for the convergence of the distributions of asymptotically normal (in the traditional sense) statistics to multivariate Student distributions (Section 6).

In Section 7, the joint asymptotic behavior of sample quantiles in samples with random sizes is considered in detail as one more example of the application of the general theorem proved in Section 3. In applied research related to risk analysis, a characteristic such as VaR (Value-at-Risk) is very popular, and formally VaR is a certain quantile of the observed risky value. In that section, we also show how the proposed technique can be applied to the continuous-time case assuming that the sample size increases in time following a Cox process. One more interpretation of this setting is related to an important case where the sample size has the mixed Poisson distribution.

In classical problems of mathematical statistics, the size of the available sample, that is, the number of available observations, is traditionally assumed to be deterministic. In the asymptotic settings, it plays the role of an infinitely increasing known parameter. At the same time, in practice the data to be analyzed are very often collected or registered during a certain period of time, and the flow of informative events, each of which brings a next observation, forms a random point process. Therefore, the number of available observations is unknown till the end of the process of their registration and also must be treated as a (random) observation. For example, this is so in insurance statistics where, during different accounting periods, different numbers of insurance events (insurance claims and/or insurance contracts) occur, and in high-frequency financial statistics where the number of events in a limit order book during a time unit essentially depends on the intensity of order flows. Moreover, contemporary statistical procedures of insurance and financial mathematics do take this circumstance into consideration as one of the possible ways of dealing with heavy tails. However, in other fields such as medical statistics or quality control, this approach has not yet become conventional, although the number of patients with a certain disease varies from month to month due to seasonal factors or from year to year due to some epidemic reasons, and the number of failed items varies from lot to lot. In these cases, the number of available observations, as well as the observations themselves, is unknown beforehand and should be treated as random to avoid underestimation of risks or error probabilities.

Therefore, it is quite reasonable to study the asymptotic behavior of general statistics constructed from samples with random sizes for the purpose of construction of suitable and reasonable asymptotic approximations. As this is so, to obtain non-trivial asymptotic distributions in limit theorems of probability theory and mathematical statistics, an appropriate centering and normalization of random variables and vectors under consideration must be used. It should be especially noted that to obtain reasonable approximation to the distribution of the basic statistics, both centering and normalizing values should be non-random. Otherwise, the approximate distribution becomes random itself and, for example, the problem of evaluation of quantiles or significance levels becomes senseless.

In asymptotic settings, statistics constructed from samples with random sizes are special cases of random sequences with random indices. The randomness of indices usually leads to the limit distributions for the corresponding random sequences being heavy-tailed even in the situations where the distributions of non-randomly indexed random sequences are asymptotically normal (see, e.g., [2, 8, 10]).

Many authors noted that the asymptotic properties of statistics constructed from samples with random sizes differ from those of the asymptotically normal statistics in the classical sense. To illustrate this, we will repeatedly cite [11] where the following example is given. Let $X_{(1)}, \ldots, X_{(n)}$ be order statistics constructed from the sample $X_1, \ldots, X_n$. It is well known (see, e.g., [12]) that in the standard situation the sample median is asymptotically normal. At the same time, in [11] it was demonstrated that if the sample size $N_n$ has the geometric distribution with expectation $n$, then the normalized sample median $\sqrt{n}\,\bigl(X_{([N_n/2]+1)} - \operatorname{med} X_1\bigr)$ has the limit distribution function

$$\Psi(x) = \frac{1}{2}\left(1 + \frac{x}{\sqrt{2 + x^2}}\right) \tag{1}$$

(the Student distribution with two degrees of freedom) which has such heavy tails that its moments of orders $\delta \ge 2$ do not exist. In general, as it was shown in [8], if a statistic that is asymptotically normal in the traditional sense is constructed on the basis of a sample with random size having negative binomial distribution, then instead of the expected normal law, the Student distribution with power-type decreasing heavy tails appears as an asymptotic law for this statistic.
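The median example can be explored numerically. The following is a minimal simulation sketch, not the construction used in [11]. It assumes, purely for illustration, that the observations are uniform on $(-1, 1)$, so that the median is 0 and the density at the median equals 1/2 (which makes the classical asymptotic variance of the median equal to 1); the function names, the sample distribution, and the parameter values are all hypothetical choices.

```python
import numpy as np


def psi(x):
    """Limit distribution function (1): Student law with two degrees of freedom."""
    x = np.asarray(x, dtype=float)
    return 0.5 * (1.0 + x / np.sqrt(2.0 + x * x))


def simulate_normalized_medians(n, reps, seed=0):
    """Draw `reps` values of sqrt(n) * X_([N/2]+1), where N is geometric with mean n.

    Assumption of this sketch: observations are uniform on (-1, 1),
    so med X_1 = 0 and the density at the median is 1/2.
    """
    rng = np.random.default_rng(seed)
    sizes = rng.geometric(1.0 / n, size=reps)  # values 1, 2, ... with mean n
    out = np.empty(reps)
    for i, size in enumerate(sizes):
        sample = rng.uniform(-1.0, 1.0, size=size)
        k = size // 2  # zero-based index of the order statistic X_([N/2]+1)
        out[i] = np.sqrt(n) * np.partition(sample, k)[k]
    return out


def empirical_cdf(values, points):
    """Fraction of `values` strictly below each entry of `points`."""
    values = np.sort(values)
    return np.searchsorted(values, points, side="left") / values.size
```

Calling `empirical_cdf(simulate_normalized_medians(1000, 10000), pts)` for a few points `pts` and comparing with `psi(pts)` gives a rough visual check; the agreement is only approximate for finite $n$, in part because small sample sizes occur with positive probability under the geometric law.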

## 2. Notation and definitions: auxiliary results

Let $r \in \mathbb{N}$. We will consider random elements taking values in the $r$-dimensional Euclidean space $\mathbb{R}^r$.

Assume that all the random variables and random vectors are defined on one and the same probability space $(\Omega, \mathfrak{A}, \mathsf{P})$. By the measurability of a random field, we will mean its measurability as a function of two variates, an elementary outcome and a parameter, with respect to the Cartesian product of the $\sigma$-algebra $\mathfrak{A}$ and the Borel $\sigma$-algebra $\mathcal{B}(\mathbb{R}^r)$ of subsets of $\mathbb{R}^r$.

The distribution of a random vector $\xi$ with respect to the measure $\mathsf{P}$ will be denoted $\mathcal{L}(\xi)$. The weak convergence, the coincidence of distributions, and the convergence in probability with respect to a specified probability measure will be denoted by the symbols $\Longrightarrow$, $\stackrel{d}{=}$, and $\stackrel{P}{\longrightarrow}$, respectively.

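Let $\Sigma$ be a positive definite matrix. The normal distribution in $\mathbb{R}^r$ with zero vector of expectations and covariance matrix $\Sigma$ will be denoted $\Phi_\Sigma$. This distribution is defined by its density

$$\phi(x) = \frac{\exp\{-\frac{1}{2} x^{\top} \Sigma^{-1} x\}}{(2\pi)^{r/2} |\Sigma|^{1/2}}, \quad x \in \mathbb{R}^r.$$

The characteristic function $f_Y(t)$ of a random vector $Y$ such that $\mathcal{L}(Y) = \Phi_\Sigma$ has the form

$$f_Y(t) \equiv \mathsf{E} \exp\{i t^{\top} Y\} = \exp\{-\tfrac{1}{2} t^{\top} \Sigma t\}, \quad t \in \mathbb{R}^r. \tag{2}$$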

Consider a sequence $\{S_n\}_{n \ge 1}$ of random elements taking values in $\mathbb{R}^r$. Let $\Xi(\mathbb{R}^r)$ be the set of all nonsingular linear operators acting from $\mathbb{R}^r$ to $\mathbb{R}^r$. The identity operator acting from $\mathbb{R}^r$ to $\mathbb{R}^r$ will be denoted $I_r$. Assume that there exist sequences $\{B_n\}_{n \ge 1}$ of operators from $\Xi(\mathbb{R}^r)$ and $\{a_n\}_{n \ge 1}$ of elements from $\mathbb{R}^r$ such that

$$Y_n \equiv B_n^{-1}(S_n - a_n) \Longrightarrow Y \quad (n \to \infty), \tag{3}$$

where $Y$ is a random element whose distribution with respect to $\mathsf{P}$ will be denoted $H$, $H = \mathcal{L}(Y)$.

Along with $\{S_n\}_{n \ge 1}$, consider a sequence of integer-valued positive random variables $\{N_n\}_{n \ge 1}$ such that for each $n \ge 1$ the random variable $N_n$ is independent of the sequence $\{S_k\}_{k \ge 1}$. Let $c_n \in \mathbb{R}^r$, $D_n \in \Xi(\mathbb{R}^r)$, $n \ge 1$. Now, we will formulate sufficient conditions for the weak convergence of the distributions of the random elements $Z_n = D_n^{-1}(S_{N_n} - c_n)$ as $n \to \infty$.

For $g \in \mathbb{R}^r$, denote $W_n(g) = D_n^{-1}(B_{N_n} g + a_{N_n} - c_n)$. In [13, 14], the following theorem was proved, which establishes sufficient conditions of the weak convergence of multivariate random sequences with independent random indices under operator normalization.

Theorem 1 [14]. Let $\|D_n^{-1}\| \to 0$ as $n \to \infty$ and let the sequence of random variables $\{\|D_n^{-1} B_{N_n}\|\}_{n \ge 1}$ be tight. Assume that there exist a random element $Y$ with distribution $H$ and an $r$-dimensional random field $W(g)$, $g \in \mathbb{R}^r$, such that (3) holds and

$$W_n(g) \Longrightarrow W(g) \quad (n \to \infty)$$

for $H$-almost all $g \in \mathbb{R}^r$. Then the random field $W(g)$ is measurable, linearly depends on $g$ and

$$Z_n \Longrightarrow W(Y) \quad (n \to \infty),$$

where the random field $W(\cdot)$ and the random element $Y$ are independent.

Now, consider an auxiliary statement dealing with the identifiability of a special family of mixtures of multivariate normal distributions. Let $U$ be a nonnegative random variable. The symbol $\mathsf{E}\Phi_{U\Sigma}(\cdot)$ will denote the distribution which for each Borel set $A$ in $\mathbb{R}^r$ is defined as

$$\mathsf{E}\Phi_{U\Sigma}(A) = \int_0^{\infty} \Phi_{u\Sigma}(A)\, d\mathsf{P}(U < u).$$

Let $\mathcal{U}$ be the set of all nonnegative random variables.

It is easy to see that if $Y$ is a random vector such that $\mathcal{L}(Y) = \Phi_\Sigma$ independent of $U$, then $\mathsf{E}\Phi_{U\Sigma} = \mathcal{L}(\sqrt{U}\, Y)$.

Lemma 1. Whatever nonsingular covariance matrix $\Sigma$ is, the family of distributions $\{\mathsf{E}\Phi_{U\Sigma}(\cdot) : U \in \mathcal{U}\}$ is identifiable in the sense that if $U_1 \in \mathcal{U}$, $U_2 \in \mathcal{U}$, and

$$\mathsf{E}\Phi_{U_1\Sigma}(A) = \mathsf{E}\Phi_{U_2\Sigma}(A) \tag{4}$$

for any set $A \in \mathcal{B}(\mathbb{R}^r)$, then $U_1 \stackrel{d}{=} U_2$.

The proof of this lemma is very simple. If $U \in \mathcal{U}$, then the characteristic function $v_U(t)$ corresponding to the distribution $\mathsf{E}\Phi_{U\Sigma}(\cdot)$ has the form

$$
\begin{aligned}
v_U(t) &= \int_0^{\infty} \exp\{-\tfrac{1}{2} t^{\top} (u\Sigma) t\}\, d\mathsf{P}(U < u) = \int_0^{\infty} \exp\{-\tfrac{1}{2} u\, t^{\top} \Sigma t\}\, d\mathsf{P}(U < u) \\
&= \int_0^{\infty} \exp\{-us\}\, d\mathsf{P}(U < u), \quad s = \tfrac{1}{2} t^{\top} \Sigma t, \quad t \in \mathbb{R}^r.
\end{aligned} \tag{5}
$$

But on the right-hand side of (5), there is the Laplace-Stieltjes transform of the random variable $U$. From (4), it follows that $v_{U_1}(t) \equiv v_{U_2}(t)$, whence by virtue of (5) the Laplace-Stieltjes transforms of the random variables $U_1$ and $U_2$ coincide, whence, in turn, it follows that $U_1 \stackrel{d}{=} U_2$. The lemma is proved.

Remark 1. When proving Lemma 1, we established a simple but useful by-product result: if $\psi(s)$ is the Laplace-Stieltjes transform of the random variable $U$, then the characteristic function $v_U(t)$ corresponding to the distribution $\mathsf{E}\Phi_{U\Sigma}$ has the form

$$v_U(t) = \psi\bigl(\tfrac{1}{2} t^{\top} \Sigma t\bigr), \quad t \in \mathbb{R}^r. \tag{6}$$
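Relation (6) lends itself to a quick Monte Carlo sanity check. The sketch below assumes, as an illustrative choice that is not taken from the text, that $U$ has the standard exponential distribution, whose Laplace-Stieltjes transform is $\psi(s) = 1/(1+s)$; the function names and the test values of $t$ and $\Sigma$ are hypothetical.

```python
import numpy as np


def mixture_cf_monte_carlo(t, sigma, reps=200_000, seed=0):
    """Monte Carlo estimate of E exp(i t^T sqrt(U) Y).

    Assumptions of this sketch: U ~ Exp(1), Y ~ N(0, sigma), U and Y independent.
    """
    rng = np.random.default_rng(seed)
    u = rng.exponential(1.0, size=reps)
    y = rng.multivariate_normal(np.zeros(len(t)), sigma, size=reps)
    phase = np.sqrt(u) * (y @ t)  # t^T sqrt(U) Y for every replicate
    return np.mean(np.exp(1j * phase))


def mixture_cf_formula(t, sigma):
    """Right-hand side of (6) with psi(s) = 1 / (1 + s), the transform of Exp(1)."""
    s = 0.5 * t @ sigma @ t
    return 1.0 / (1.0 + s)
```

For any fixed `t` and positive definite `sigma`, the two functions should agree up to Monte Carlo error of order `reps ** -0.5`, and the imaginary part of the estimate should be close to zero because the mixture is symmetric.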

## 3. General theorems

First, consider the case where the random vectors $\{S_n\}_{n \ge 1}$ are formed as growing sums of independent random variables. Namely, let $X_1, X_2, \ldots$ be independent $\mathbb{R}^r$-valued random vectors, and for $n \in \mathbb{N}$ let

$$S_n = X_1 + \ldots + X_n.$$

Consider a sequence of integer-valued positive random variables $\{N_n\}_{n \ge 1}$ such that for each $n \ge 1$ the random variable $N_n$ is independent of the sequence $\{S_k\}_{k \ge 1}$. Let $\{b_n\}_{n \ge 1}$ be an infinitely increasing sequence of positive numbers such that

$$\mathcal{L}\left(\frac{S_n}{\sqrt{b_n}}\right) \Longrightarrow \Phi_\Sigma \tag{7}$$

as $n \to \infty$, where $\Sigma$ is some positive definite matrix.

Let $\{d_n\}_{n \ge 1}$ be an infinitely increasing sequence of positive numbers. As $Z_n$ take the scalar normalized random vector

$$Z_n = \frac{S_{N_n}}{\sqrt{d_n}}.$$

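Theorem 2. Let $N_n \to \infty$ in probability as $n \to \infty$. Assume that the random variables $X_1, X_2, \ldots$ satisfy condition (7) with an asymptotic covariance matrix $\Sigma$. Then a distribution $F$ such that

$$\mathcal{L}(Z_n) \Longrightarrow F \quad (n \to \infty), \tag{8}$$

exists if and only if there exists a distribution function $V(x)$ satisfying the conditions

(i) $V(x) = 0$ for $x < 0$;

(ii) for any $A \in \mathcal{B}(\mathbb{R}^r)$,

$$F(A) = \mathsf{E}\Phi_{U\Sigma}(A) = \int_0^{\infty} \Phi_{u\Sigma}(A)\, dV(u);$$

(iii) $\mathsf{P}(b_{N_n} < d_n x) \Longrightarrow V(x)$, $n \to \infty$.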

Proof. The “if” part. We will essentially exploit Theorem 1. For each $n \ge 1$, set $a_n = c_n = 0$, $B_n = \sqrt{b_n}\, I_r$, $D_n = \sqrt{d_n}\, I_r$. For the convenience of notation, introduce a random variable $U$ with the distribution function $V(x)$. Note that the conditions of the theorem guarantee the tightness of the sequence of random variables

$$\|D_n^{-1} B_{N_n}\| = \sqrt{\frac{b_{N_n}}{d_n}}, \quad n = 1, 2, \ldots,$$

implied by its weak convergence to the random variable $\sqrt{U}$. Further, in the case under consideration, we have $W_n(g) = \sqrt{b_{N_n}/d_n} \cdot g$, $g \in \mathbb{R}^r$. Therefore, the condition $b_{N_n}/d_n \Longrightarrow U$ implies $W_n(g) \Longrightarrow \sqrt{U}\, g$ for all $g \in \mathbb{R}^r$. Condition (7) means that in the case under consideration, $H = \Phi_\Sigma$. Hence, by Theorem 1, $Z_n \Longrightarrow \sqrt{U}\, Y$ where $Y$ is a random element with the distribution $\Phi_\Sigma$ independent of the random variable $U$. It is easy to see that the distribution of the random element $\sqrt{U}\, Y$ coincides with $\mathsf{E}\Phi_{U\Sigma}$ where the matrix $\Sigma$ satisfies (7).

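The “only if” part. Let condition (8) hold. Make sure that the sequence $\{\|D_n^{-1} B_{N_n}\|\}_{n \ge 1}$ is tight. Let $Y$ be a random element with the distribution $\Phi_\Sigma$. There exist $\delta > 0$ and $R > 0$ such that

$$\mathsf{P}(\|Y\| > R) > \delta. \tag{9}$$

For $R$ specified above and an arbitrary $x > 0$, we have

$$
\begin{aligned}
\mathsf{P}(\|Z_n\| > x) &\ge \mathsf{P}\left(\left\|\frac{S_{N_n}}{\sqrt{d_n}}\right\| > x;\ \left\|\frac{S_{N_n}}{\sqrt{b_{N_n}}}\right\| > R\right) \\
&= \mathsf{P}\left(\sqrt{\frac{b_{N_n}}{d_n}} > x \left\|\frac{S_{N_n}}{\sqrt{b_{N_n}}}\right\|^{-1};\ \left\|\frac{S_{N_n}}{\sqrt{b_{N_n}}}\right\| > R\right) \\
&\ge \mathsf{P}\left(\sqrt{\frac{b_{N_n}}{d_n}} > \frac{x}{R};\ \left\|\frac{S_{N_n}}{\sqrt{b_{N_n}}}\right\| > R\right) \\
&= \sum_{k=1}^{\infty} \mathsf{P}(N_n = k)\, \mathsf{P}\left(\sqrt{\frac{b_k}{d_n}} > \frac{x}{R};\ \left\|\frac{S_k}{\sqrt{b_k}}\right\| > R\right) \\
&= \sum_{k=1}^{\infty} \mathsf{P}(N_n = k)\, \mathsf{P}\left(\sqrt{\frac{b_k}{d_n}} > \frac{x}{R}\right) \mathsf{P}\left(\left\|\frac{S_k}{\sqrt{b_k}}\right\| > R\right)
\end{aligned} \tag{10}
$$

(the last equality holds since any constant is independent of any random variable). Since by (7) the convergence $S_k/\sqrt{b_k} \Longrightarrow Y$ takes place as $k \to \infty$, from (9) it follows that there exists a number $k_0 = k_0(R, \delta)$ such that

$$\mathsf{P}\left(\left\|\frac{S_k}{\sqrt{b_k}}\right\| > R\right) > \frac{\delta}{2}$$

for all $k > k_0$. Therefore, continuing (10) we obtain

$$
\begin{aligned}
\mathsf{P}(\|Z_n\| > x) &\ge \frac{\delta}{2} \sum_{k=k_0+1}^{\infty} \mathsf{P}(N_n = k)\, \mathsf{P}\left(\sqrt{\frac{b_k}{d_n}} > \frac{x}{R}\right) \\
&= \frac{\delta}{2} \left[\mathsf{P}\left(\sqrt{\frac{b_{N_n}}{d_n}} > \frac{x}{R}\right) - \sum_{k=1}^{k_0} \mathsf{P}(N_n = k)\, \mathsf{P}\left(\sqrt{\frac{b_k}{d_n}} > \frac{x}{R}\right)\right] \\
&\ge \frac{\delta}{2} \left[\mathsf{P}\left(\sqrt{\frac{b_{N_n}}{d_n}} > \frac{x}{R}\right) - \mathsf{P}(N_n \le k_0)\right].
\end{aligned}
$$

Hence,

$$\mathsf{P}\left(\sqrt{\frac{b_{N_n}}{d_n}} > \frac{x}{R}\right) \le \frac{2}{\delta}\, \mathsf{P}(\|Z_n\| > x) + \mathsf{P}(N_n \le k_0). \tag{11}$$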

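From the condition $N_n \stackrel{P}{\longrightarrow} \infty$ as $n \to \infty$, it follows that for any $\epsilon > 0$ there exists an $n_0 = n_0(\epsilon)$ such that $\mathsf{P}(N_n \le k_0) < \epsilon$ for all $n \ge n_0$. Therefore, with the account of the tightness of the sequence $\{Z_n\}_{n \ge 1}$ that follows from its weak convergence to the random element $Z$ with $\mathcal{L}(Z) = F$ implied by (8), relation (11) implies

$$\lim_{x \to \infty} \sup_{n \ge n_0(\epsilon)} \mathsf{P}\left(\sqrt{\frac{b_{N_n}}{d_n}} > \frac{x}{R}\right) \le \epsilon, \tag{12}$$

whatever $\epsilon > 0$ is. Now assume that the sequence

$$\|D_n^{-1} B_{N_n}\| = \sqrt{\frac{b_{N_n}}{d_n}}, \quad n = 1, 2, \ldots$$

is not tight. In that case, there exist an $\alpha > 0$ and sequences $\mathcal{N}$ of natural and $\{x_n\}_{n \in \mathcal{N}}$ of real numbers satisfying the conditions $x_n \uparrow \infty$ ($n \to \infty$, $n \in \mathcal{N}$) and

$$\mathsf{P}\left(\sqrt{\frac{b_{N_n}}{d_n}} > x_n\right) > \alpha, \quad n \in \mathcal{N}. \tag{13}$$

But, according to (12), for any $\epsilon > 0$ there exist $M = M(\epsilon)$ and $n_0 = n_0(\epsilon)$ such that

$$\sup_{n \ge n_0(\epsilon)} \mathsf{P}\left(\sqrt{\frac{b_{N_n}}{d_n}} > M(\epsilon)\right) \le 2\epsilon. \tag{14}$$

Choose $\epsilon < \alpha/2$ where $\alpha$ is the number from (13). Then for all $n \in \mathcal{N}$ large enough, in accordance with (13), the inequality opposite to (14) must hold. The obtained contradiction by the Prokhorov theorem proves the tightness of the sequence $\{\|D_n^{-1} B_{N_n}\|\}_{n \ge 1}$ or, which in this case is the same, of the sequence $\{b_{N_n}/d_n\}_{n \ge 1}$.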

Introduce the set $\mathcal{W}(Z)$ containing all nonnegative random variables $U$ such that $\mathsf{P}(Z \in A) = \mathsf{E}\Phi_{U\Sigma}(A)$ for any $A \in \mathcal{B}(\mathbb{R}^r)$. Let $L(\cdot, \cdot)$ be any probability metric that metrizes weak convergence in the space of random variables, or, which is the same in this context, in the space of distribution functions, say, the Lévy metric or the smoothed Kolmogorov distance. If $X_1$ and $X_2$ are random variables with the distribution functions $F_1$ and $F_2$ respectively, then we identify $L(X_1, X_2)$ and $L(F_1, F_2)$. Show that there exists a sequence of random variables $\{U_n\}_{n \ge 1}$, $U_n \in \mathcal{W}(Z)$, such that

$$L\left(\frac{b_{N_n}}{d_n},\, U_n\right) \to 0 \quad (n \to \infty). \tag{15}$$

Denote

$$\beta_n = \inf\left\{L\left(\frac{b_{N_n}}{d_n},\, U\right) : U \in \mathcal{W}(Z)\right\}.$$

Prove that $\beta_n \to 0$ as $n \to \infty$. Assume the contrary. In that case, $\beta_n \ge \delta$ for some $\delta > 0$ and all $n$ from some subsequence $\mathcal{N}$ of natural numbers. Choose a subsequence $\mathcal{N}_1 \subseteq \mathcal{N}$ so that the sequence $\{b_{N_n}/d_n\}_{n \in \mathcal{N}_1}$ weakly converges to a random variable $U$ (this is possible due to the tightness of the family $\{b_{N_n}/d_n\}_{n \ge 1}$ established above). But then $W_n(g) \Longrightarrow \sqrt{U}\, g$ ($n \to \infty$, $n \in \mathcal{N}_1$) for any $g \in \mathbb{R}^r$. Applying Theorem 1 to $n \in \mathcal{N}_1$ with condition (7) playing the role of condition (3), we make sure that $U \in \mathcal{W}(Z)$, since condition (8) provides the coincidence of the limits of all weakly convergent subsequences. So, we arrive at a contradiction with the assumption that $\beta_n \ge \delta$ for all $n \in \mathcal{N}_1$. Hence, $\beta_n \to 0$ as $n \to \infty$.

For any $n = 1, 2, \ldots$, choose a random variable $U_n$ from $\mathcal{W}(Z)$ satisfying the condition

$$L\left(\frac{b_{N_n}}{d_n},\, U_n\right) \le \beta_n + \frac{1}{n}.$$

This sequence obviously satisfies condition (15). Now consider the structure of the set $\mathcal{W}(Z)$. This set contains all the random variables defining the family of special mixtures of multivariate normal laws considered in Lemma 1, according to which this family is identifiable. So, whatever a random element $Z$ is, the set $\mathcal{W}(Z)$ contains at most one element. Therefore, actually condition (15) is equivalent to

$$\frac{b_{N_n}}{d_n} \Longrightarrow U \quad (n \to \infty),$$

that is, to condition (iii) of the theorem. The theorem is proved.

Corollary 1. Under the conditions of Theorem 2, non-randomly normalized random sums $S_{N_n}/\sqrt{d_n}$ are asymptotically normal with some covariance matrix $\Sigma'$ if and only if there exists a number $c > 0$ such that

$$\frac{b_{N_n}}{d_n} \Longrightarrow c \quad (n \to \infty).$$

Moreover, in this case, $\Sigma' = c\Sigma$.

This statement immediately follows from Theorem 2 with the account of Lemma 1.
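As a hedged illustration of Corollary 1 (the specific distributions below are assumptions chosen for the example and do not come from the text), take $b_n = d_n = n$ and let $N_n = 1 + \Pi_n$, where $\Pi_n$ has the Poisson distribution with parameter $n$. By the Chebyshev inequality, for any $\varepsilon > 0$ and $n > 2/\varepsilon$,

$$
\mathsf{E}\,\frac{N_n}{n} = 1 + \frac{1}{n}, \qquad \mathsf{D}\,\frac{N_n}{n} = \frac{1}{n}, \qquad \mathsf{P}\left(\left|\frac{N_n}{n} - 1\right| > \varepsilon\right) \le \frac{4}{n\varepsilon^2} \to 0 \quad (n \to \infty),
$$

so that $b_{N_n}/d_n \Longrightarrow 1$ and the random sums remain asymptotically normal with the same covariance matrix. In contrast, if $N_n$ is geometric with expectation $n$, then $N_n/n$ converges weakly to the standard exponential law, which is not degenerate, and the limit law is a proper scale mixture of normals rather than a normal law.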

Now consider a formally more general setting.

Let $N_1, N_2, \ldots$ and $W_1, W_2, \ldots$ be random variables and random vectors, respectively, such that for each $n \ge 1$ the random variable $N_n$ takes only natural values and is independent of the sequence $W_1, W_2, \ldots$. Let

$$T_n = T_n(W_1, \ldots, W_n) = \bigl(T_{n,1}(W_1, \ldots, W_n), \ldots, T_{n,r}(W_1, \ldots, W_n)\bigr)$$

be a statistic taking values in $\mathbb{R}^r$, $r \ge 1$. For each $n \ge 1$ define a random vector (random element) $T_{N_n}$ by setting

$$T_{N_n}(\omega) = T_{N_n(\omega)}\bigl(W_1(\omega), \ldots, W_{N_n(\omega)}(\omega)\bigr)$$

for every elementary outcome $\omega \in \Omega$.

We shall say that a statistic $T_n$ is asymptotically normal with the asymptotic covariance matrix $\Sigma$ if there exists a non-random $r$-dimensional vector $t$ such that

$$\mathcal{L}\bigl(\sqrt{n}\,(T_n - t)\bigr) \Longrightarrow \Phi_\Sigma \quad (n \to \infty). \tag{16}$$

Examples of asymptotically normal statistics are well known. Under certain conditions, the property of asymptotic normality is inherent in maximum likelihood estimators, sample moments, sample quantiles, etc.

Our nearest aim is to describe the asymptotic behavior of the random elements $T_{N_n}$, that is, of statistics constructed from samples with random sizes $N_n$.

Again let $\{d_n\}_{n \ge 1}$ be an infinitely increasing sequence of positive numbers. Now set

$$Z_n = \sqrt{d_n}\,(T_{N_n} - t).$$

Theorem 3. Let $N_n \to \infty$ in probability as $n \to \infty$. Assume that a statistic $T_n$ is asymptotically normal in the sense of (16) with an asymptotic covariance matrix $\Sigma$. Then a distribution $F$ such that

$$\mathcal{L}(Z_n) \Longrightarrow F \quad (n \to \infty),$$

exists if and only if there exists a distribution function $V(x)$ satisfying the conditions

(i) $V(x) = 0$ for $x < 0$;

(ii) for any $A \in \mathcal{B}(\mathbb{R}^r)$,

$$F(A) = \int_0^{\infty} \Phi_{u^{-1}\Sigma}(A)\, dV(u);$$

(iii) $\mathsf{P}(N_n < d_n x) \Longrightarrow V(x)$, $n \to \infty$.

The proof of Theorem 3 relies on Theorem 1, with (16) playing the role of (3), and on Lemma 1. It differs from the proof of Theorem 2 only in that $\sqrt{b_{N_n}/d_n}$ is replaced by $\sqrt{d_n/N_n}$.

Corollary 2. Under the conditions of Theorem 3, the statistic $T_{N_n}$ is asymptotically normal with some covariance matrix $\Sigma'$ if and only if there exists a number $c > 0$ such that

$$\frac{N_n}{d_n} \Longrightarrow c \quad (n \to \infty).$$

Moreover, in this case, $\Sigma' = c^{-1}\Sigma$.

This statement immediately follows from Theorem 3 with the account of Lemma 1.

## 4. Some remarks on the heavy-tailedness of scale mixtures of normals

The one-dimensional marginals of the multivariate limit law in Theorems 2 and 3 are scale mixtures of normals with zero means of the form $\mathsf{E}\Phi(x/\sqrt{U})$, $x \in \mathbb{R}$, where $\Phi(x)$ is the standard normal distribution function and $U$ is a nonnegative random variable. It turns out, although this is far from evident, that these distributions are always leptokurtic, having a sharper vertex and heavier tails than the normal law itself.

It is easy to see that

$$\mathsf{E}\Phi(x/\sqrt{U}) = \mathsf{P}(X\sqrt{U} < x), \quad x \in \mathbb{R},$$

where $X$ is a standard normal variable independent of $U$. First, as a measure of leptokurtosity, consider the excess coefficient which is traditionally used in (descriptive) statistics. Recall that for a random variable $Y$ with $\mathsf{E}Y^4 < \infty$, the excess coefficient (kurtosis) $\kappa(Y)$ is defined as

$$\kappa(Y) = \mathsf{E}\left(\frac{Y - \mathsf{E}Y}{\sqrt{\mathsf{D}Y}}\right)^4.$$

If $\mathsf{P}(X < x) = \Phi(x)$, then $\kappa(X) = 3$. Densities with sharper vertices (and, respectively, with heavier tails) than the normal density have $\kappa > 3$, and $\kappa < 3$ for densities with flatter vertices.

Lemma 2. Let $X$ and $U$ be independent random variables with finite fourth moments; moreover, let $\mathsf{E}X = 0$ and $\mathsf{P}(U \ge 0) = 1$. Then

$$\kappa(XU) \ge \kappa(X).$$

Furthermore, $\kappa(XU) = \kappa(X)$ if and only if $\mathsf{P}(U = \mathrm{const}) = 1$.

For the proof see [10].

So, if $X$ is a standard normal random variable and $U$ is a nonnegative random variable with $\mathsf{E}U^4 < \infty$ independent of $X$, then $\kappa(X\sqrt{U}) \ge 3$ and $\kappa(X\sqrt{U}) = 3$ if and only if $U$ is non-random.
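For the normal case, the inequality can also be checked by a direct computation. The following short derivation is a worked illustration rather than the proof from [10]; it assumes that $X$ is standard normal and independent of $U$, and that $0 < \mathsf{E}U^2 < \infty$. Since $\mathsf{E}X = 0$, $\mathsf{E}X^2 = 1$, and $\mathsf{E}X^4 = 3$,

$$
\mathsf{E}\bigl(X\sqrt{U}\bigr) = 0, \qquad \mathsf{D}\bigl(X\sqrt{U}\bigr) = \mathsf{E}U, \qquad \mathsf{E}\bigl(X\sqrt{U}\bigr)^4 = 3\,\mathsf{E}U^2,
$$

$$
\kappa\bigl(X\sqrt{U}\bigr) = \frac{3\,\mathsf{E}U^2}{(\mathsf{E}U)^2} = 3\left(1 + \frac{\mathsf{D}U}{(\mathsf{E}U)^2}\right) \ge 3,
$$

with equality if and only if $\mathsf{D}U = 0$. For instance, if $U$ has the standard exponential distribution, then $\mathsf{E}U = 1$, $\mathsf{E}U^2 = 2$, and $\kappa(X\sqrt{U}) = 6$, which is the kurtosis of the Laplace distribution.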

Using the Jensen inequality, we can easily obtain one more inequality directly connecting the tails of the normal mixtures with the tails of the normal distribution.

Lemma 3. Assume that the random variable $U$ satisfies the normalization condition $\mathsf{E}U^{-1} = 1$. Then

$$1 - \mathsf{E}\Phi(x/\sqrt{U}) \ge 1 - \Phi(x), \quad x > 0.$$

From Lemma 3, it follows that if $X$ is the standard normal random variable and $U$ is a nonnegative random variable independent of $X$ with $\mathsf{E}U^{-1} = 1$, then for any $x \ge 0$

$$\mathsf{P}\bigl(|X\sqrt{U}| \ge x\bigr) \ge \mathsf{P}(|X| \ge x) = 2[1 - \Phi(x)],$$

that is, scale mixtures of normal laws are always more leptokurtic and have heavier tails than normal laws themselves.

The class of scale mixtures of normal laws is very rich and involves distributions with various rates of tail decrease. For example, this class contains Student distributions with an arbitrary (not necessarily integer) number of degrees of freedom (the Cauchy distribution included), symmetric stable distributions (see the “multiplication theorem” 3.3.1 in [15]), symmetric fractional stable distributions (see [16]), symmetrized gamma distributions with arbitrary shape and scale parameters (see [10]), and symmetrized Weibull distributions with shape parameters belonging to the interval $(0, 1]$ (see [17, 18]). As an example, in Section 6, we will discuss the conditions for the convergence of the distributions of the statistics constructed from samples with random sizes to the multivariate Student distribution.

## 5. Convergence of the distributions of random sums of random vectors with finite covariance matrices to multivariate elliptically contoured stable and Linnik distributions

### 5.1 Convergence of the distributions of random sums of random vectors to multivariate stable laws

Let $\Sigma$ be a positive definite $r \times r$ matrix, $\alpha \in (0, 2]$. A random vector $Z_{\alpha,\Sigma}$ is said to have the (centered) elliptically contoured stable distribution $G_{\alpha,\Sigma}$ with characteristic exponent $\alpha$, if its characteristic function $g_{\alpha,\Sigma}(t)$ has the form

$$g_{\alpha,\Sigma}(t) \equiv \mathsf{E}\exp\{i t^{\top} Z_{\alpha,\Sigma}\} = \exp\{-(t^{\top}\Sigma t)^{\alpha/2}\}, \quad t \in \mathbb{R}^r.$$

Univariate stable distributions are popular examples of heavy-tailed distributions. Their moments of orders $\delta \ge \alpha$ do not exist (the only exception is the normal law corresponding to $\alpha = 2$). Stable laws and only they can be limit distributions for sums of a non-random number of independent identically distributed random variables with infinite variance under linear normalization. Here it will be shown that they also can be limiting for random sums of random vectors with finite covariance matrices. The result of this subsection generalizes the main theorem of [19] to a multivariate case.

By $\zeta_\alpha$, we will denote a positive random variable with the one-sided stable distribution corresponding to the characteristic function

$$g_\alpha(t) = \exp\bigl\{-|t|^{\alpha} \exp\{-\tfrac{1}{2} i\pi\alpha\, \mathrm{sign}\, t\}\bigr\}, \quad t \in \mathbb{R},$$

with $0 < \alpha \le 1$ (for more details see [15] or [4]).

Let $\alpha \in (0, 2]$. It is known that, if $Y$ is a random vector such that $\mathcal{L}(Y) = \Phi_\Sigma$ independent of the random variable $\zeta_{\alpha/2}$, then

$$Z_{\alpha,\Sigma} \stackrel{d}{=} \sqrt{\zeta_{\alpha/2}}\; Y \tag{17}$$

(see Proposition 2.5.2 in [4]). In other words,

$$G_{\alpha,\Sigma} = \mathsf{E}\Phi_{\zeta_{\alpha/2}\Sigma}. \tag{18}$$

As in Section 3, let $X_1, X_2, \ldots$ be independent $\mathbb{R}^r$-valued random vectors. For $n \in \mathbb{N}$, denote $S_n = X_1 + \ldots + X_n$. Consider a sequence of integer-valued positive random variables $\{N_n\}_{n \ge 1}$ such that for each $n \ge 1$ the random variable $N_n$ is independent of the sequence $\{S_k\}_{k \ge 1}$. Let $\{b_n\}_{n \ge 1}$ be an infinitely increasing sequence of positive numbers providing convergence (7) with some positive definite matrix $\Sigma$.

Theorem 4. Let $N_n \to \infty$ in probability as $n \to \infty$. Assume that the random variables $X_1, X_2, \ldots$ satisfy condition (7) with an asymptotic covariance matrix $\Sigma$. Then

$$\mathcal{L}\left(\frac{S_{N_n}}{\sqrt{d_n}}\right) \Longrightarrow G_{\alpha,\Sigma} \quad (n \to \infty)$$

with some infinitely increasing sequence of positive numbers $\{d_n\}_{n \ge 1}$ and some $\alpha \in (0, 2]$, if and only if

$$\frac{b_{N_n}}{d_n} \Longrightarrow \zeta_{\alpha/2}$$

as $n \to \infty$.

Proof. This theorem is a direct consequence of Theorem 2 with the account of relations (17) and (18).

### 5.2 Convergence of the distributions of random sums of random vectors with finite covariance matrices to multivariate elliptically contoured Linnik distributions

In 1953, Yu. V. Linnik [20] introduced the class of univariate symmetric probability distributions defined by the characteristic functions

$$f_\alpha^{L}(t) = \frac{1}{1 + |t|^{\alpha}}, \quad t \in \mathbb{R},$$

where $\alpha \in (0, 2]$. Later, the distributions of this class were called Linnik distributions [21] or $\alpha$-Laplace distributions [22]. Here the first term will be used since it has become conventional. With $\alpha = 2$, the Linnik distribution turns into the Laplace distribution corresponding to the density

$$f^{\Lambda}(x) = \tfrac{1}{2} e^{-|x|}, \quad x \in \mathbb{R}.$$

A random variable with the Linnik distribution with parameter $\alpha$ will be denoted $L_{1,\alpha}$.

The Linnik distributions possess many interesting analytic properties (see, e.g., [17, 18] and the references therein) but, perhaps, most often Linnik distributions are recalled as examples of geometric stable distributions often used as heavy-tailed models of some statistical regularities in financial data [23, 24].

The multivariate Linnik distribution was introduced by D. N. Anderson in [25] where it was proved that the function

$$f_{\alpha,\Sigma}^{L}(t) = \frac{1}{1 + (t^{\top}\Sigma t)^{\alpha/2}}, \quad t \in \mathbb{R}^r, \quad \alpha \in (0, 2], \tag{19}$$

is the characteristic function of an $r$-variate probability distribution, where $\Sigma$ is a positive definite $r \times r$ matrix. In [25], the distribution corresponding to the characteristic function (19) was called the $r$-variate Linnik distribution. For the properties of the multivariate Linnik distributions, see [25, 26].

The $r$-variate Linnik distribution can also be defined in another way. For this purpose, recall that the distribution of a nonnegative random variable $M_\delta$ whose Laplace transform is

$$\psi_\delta(s) \equiv \mathsf{E} e^{-sM_\delta} = \frac{1}{1 + s^{\delta}}, \quad s \ge 0, \tag{20}$$

where $0 < \delta \le 1$, is called the Mittag-Leffler distribution. It is another example of heavy-tailed geometrically stable distributions; for more details see, for example, [17, 18] and the references therein. The Mittag-Leffler distributions are of serious theoretical interest in the problems related to thinned (or rarefied) homogeneous flows of events such as renewal processes or anomalous diffusion or relaxation phenomena, see [27, 28] and the references therein. In [18], it was demonstrated that

$$L_{1,\alpha} \stackrel{d}{=} Y_1 \sqrt{2M_{\alpha/2}}, \tag{21}$$

where $Y_1$ is a random variable with the standard univariate normal distribution independent of the random variable $M_{\alpha/2}$ with the Mittag-Leffler distribution with parameter $\alpha/2$.

Now let $Y$ be a random vector such that $\mathcal{L}(Y) = \Phi_\Sigma$, where $\Sigma$ is a positive definite $r \times r$ matrix, independent of the random variable $M_{\alpha/2}$. By analogy with (21), introduce the random vector $L_{r,\alpha,\Sigma}$ as

$$L_{r,\alpha,\Sigma} = \sqrt{2M_{\alpha/2}}\; Y.$$

Then, in accordance with what has been said in Section 2,

$$\mathcal{L}(L_{r,\alpha,\Sigma}) = \mathsf{E}\Phi_{2M_{\alpha/2}\Sigma}. \tag{22}$$

The distribution (22) will be called the (centered) elliptically contoured multivariate Linnik distribution.

Using Remark 1, we can easily make sure that the two definitions of the multivariate Linnik distribution coincide. Indeed, with the account of (20), according to Remark 1, the characteristic function of the random vector $L_{r,\alpha,\Sigma}$ defined by (22) has the form

$$\mathsf{E}\exp\{i t^{\top} L_{r,\alpha,\Sigma}\} = \psi_{\alpha/2}(t^{\top}\Sigma t) = \frac{1}{1 + (t^{\top}\Sigma t)^{\alpha/2}} = f_{\alpha,\Sigma}^{L}(t), \quad t \in \mathbb{R}^r,$$

which coincides with Anderson’s definition (19).

Our definition (22) together with Theorem 2 opens the way to formulate a theorem stating that the multivariate Linnik distribution can not only be limiting for geometric random sums of independent identically distributed random vectors with infinite second moments [29], but it can also be limiting for random sums of independent random vectors with finite covariance matrices.

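Theorem 5. Let $N_n \to \infty$ in probability as $n \to \infty$. Assume that the random variables $X_1, X_2, \ldots$ satisfy condition (7) with an asymptotic covariance matrix $\Sigma$. Then

$$\mathcal{L}\left(\frac{S_{N_n}}{\sqrt{d_n}}\right) \Longrightarrow \mathcal{L}(L_{r,\alpha,\Sigma}) \quad (n \to \infty)$$

with some infinitely increasing sequence of positive numbers $\{d_n\}_{n \ge 1}$ and some $\alpha \in (0, 2]$, if and only if

$$\frac{b_{N_n}}{d_n} \Longrightarrow 2M_{\alpha/2}$$

as $n \to \infty$.

Proof. This theorem is a direct consequence of Theorem 2 with the account of relation (22).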
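The case $\alpha = 2$ of Theorem 5 (the Laplace limit) is easy to simulate. The sketch below is only an illustration under assumptions that are not part of the text: $r = 1$, the summands are independent signs $\pm 1$ (so that $b_n = n$ and $\Sigma = 1$), $N_n$ is geometric with expectation $n$, and $d_n = n/2$, so that $b_{N_n}/d_n$ converges weakly to twice a standard exponential variable, which is $2M_1$. The function names are hypothetical.

```python
import numpy as np


def geometric_random_sums(n, reps, seed=0):
    """Samples of S_{N_n} / sqrt(d_n) with d_n = n / 2.

    Assumptions of this sketch: X_j = +1 or -1 with probability 1/2 each
    (so b_n = n), and N_n is geometric with mean n, independent of the X_j.
    """
    rng = np.random.default_rng(seed)
    sizes = rng.geometric(1.0 / n, size=reps)
    # A sum of k independent signs equals 2 * Binomial(k, 1/2) - k.
    sums = 2 * rng.binomial(sizes, 0.5) - sizes
    return sums / np.sqrt(n / 2.0)


def laplace_cdf(x):
    """Distribution function of the Laplace law with density exp(-|x|) / 2."""
    x = np.asarray(x, dtype=float)
    tail = 0.5 * np.exp(-np.abs(x))
    return np.where(x < 0, tail, 1.0 - tail)
```

With, say, `z = geometric_random_sums(2000, 200_000)`, the sample variance of `z` should be close to 2 and the ratio `np.mean(z**4) / np.mean(z**2)**2` close to 6, the values for the Laplace law, even though every summand has a finite variance and bounded support.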

## 6. Convergence of the distributions of asymptotically normal statistics to the multivariate Student distribution

The multivariate Student distribution is described, for example, in [30] (also see [31]). Consider an $r$-dimensional normal random vector $Y$ with zero vector of expectations and covariance matrix $\Sigma$. Assume that a random variable $W_{\gamma}$ has the chi-square distribution with parameter (the “number of degrees of freedom”) $\gamma > 0$ (not necessarily integer) and is independent of $Y$. The distribution $P_{\gamma,\Sigma}$ of the random vector

$$Q_{\gamma,\Sigma} = \sqrt{\gamma / W_{\gamma}}\, Y \tag{23}$$

is called the multivariate Student distribution (with parameters $\gamma$ and $\Sigma$). For any $x \in \mathbb{R}^r$ the distribution density of $Q_{\gamma,\Sigma}$ has the form

$$p_{\gamma,\Sigma}(x) = \frac{\Gamma((r+\gamma)/2)}{|\Sigma|^{1/2}\,\Gamma(\gamma/2)\,(\pi\gamma)^{r/2}} \cdot \frac{1}{\left(1 + \frac{1}{\gamma}\, x^{\top}\Sigma^{-1}x\right)^{(r+\gamma)/2}}.$$
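As an illustration of definition (23) and of the density formula, the following Python sketch evaluates $p_{\gamma,\Sigma}(x)$ and draws from $P_{\gamma,\Sigma}$ as a scale mixture of normals. It is a minimal sketch under stated assumptions: it is restricted to $r = 2$, uses only the standard library, and all function names are hypothetical choices made for this example.

```python
import math
import random


def student_density_2d(x, gamma, sigma):
    """Density p_{gamma,Sigma}(x) of the bivariate Student distribution (r = 2)."""
    r = 2
    det = sigma[0][0] * sigma[1][1] - sigma[0][1] * sigma[1][0]
    # Quadratic form x^T Sigma^{-1} x for a 2x2 matrix, written out explicitly.
    quad = (sigma[1][1] * x[0] ** 2
            - (sigma[0][1] + sigma[1][0]) * x[0] * x[1]
            + sigma[0][0] * x[1] ** 2) / det
    log_const = (math.lgamma((r + gamma) / 2.0) - math.lgamma(gamma / 2.0)
                 - 0.5 * math.log(det) - (r / 2.0) * math.log(math.pi * gamma))
    return math.exp(log_const) * (1.0 + quad / gamma) ** (-(r + gamma) / 2.0)


def sample_student_2d(gamma, sigma, rng):
    """Draw Q = sqrt(gamma / W) * Y with W chi-square(gamma) and Y ~ N(0, Sigma)."""
    # chi-square with gamma degrees of freedom = Gamma(shape gamma/2, scale 2)
    w = rng.gammavariate(gamma / 2.0, 2.0)
    a = math.sqrt(sigma[0][0])
    b = sigma[1][0] / a
    c = math.sqrt(sigma[1][1] - b * b)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    s = math.sqrt(gamma / w)
    return s * a * z1, s * (b * z1 + c * z2)


def central_frequency(gamma, sigma, reps, seed=0):
    """Share of draws whose first coordinate falls in [-1, 1]."""
    rng = random.Random(seed)
    hits = sum(abs(sample_student_2d(gamma, sigma, rng)[0]) <= 1.0
               for _ in range(reps))
    return hits / reps


if __name__ == "__main__":
    identity = ((1.0, 0.0), (0.0, 1.0))
    print(student_density_2d((0.0, 0.0), 2.0, identity))
    print(central_frequency(2.0, identity, 100000))
```

For $\gamma = 2$ and $\Sigma = I_2$ the density at the origin equals $1/(2\pi)$, and each one-dimensional marginal is the Student law with two degrees of freedom, for which the probability of $[-1, 1]$ is $1/\sqrt{3}$; the simulated frequency should be close to this value.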

According to Theorem 3, the multivariate Student distribution is the resulting transformation of the limit distribution of an asymptotically normal (in the sense of (16)) statistic under the replacement of the sample size by a random variable whose asymptotic distribution is chi-square. Consider this case in more detail.

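Let $G_{m,m}(x)$ be the gamma-distribution function with the shape parameter coinciding with the scale parameter and equal to $m$:

$$G_{m,m}(x) = \begin{cases} 0 & \text{if } x \le 0, \\[4pt] \dfrac{m^m}{\Gamma(m)} \displaystyle\int_0^x e^{-my} y^{m-1}\,dy & \text{if } x > 0. \end{cases}$$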

Theorem 6. Let $\gamma > 0$ be arbitrary, $\Sigma$ be a positive definite matrix and let $\{d_n\}_{n \ge 1}$ be an infinitely increasing sequence of positive numbers. Assume that $N_n \to \infty$ in probability as $n \to \infty$. Let a statistic $T_n$ be asymptotically normal in the sense of (16). Then the convergence

$$\mathcal{L}\left(\sqrt{d_n}\,(T_{N_n} - t)\right) \Longrightarrow P_{\gamma,\Sigma} \quad (n \to \infty)$$

takes place if and only if

$$\mathsf{P}(N_n < d_n x) \Longrightarrow G_{\gamma/2,\gamma/2}(x) \quad (n \to \infty),$$

where $G_{\gamma/2,\gamma/2}(x)$ is the gamma-distribution function with coinciding shape and scale parameters equal to $\gamma/2$.

Proof. This statement is a direct consequence of Theorem 3, representation (23) and Lemma 1.

Let $N_{p,m}$ be a random variable with the negative binomial distribution

$$\mathsf{P}(N_{p,m} = k) = C_{m+k-2}^{k-1}\, p^m (1-p)^{k-1}, \quad k = 1, 2, \ldots \tag{24}$$

Here $m > 0$ and $p \in (0, 1)$ are parameters; for non-integer $m$, the quantity $C_{m+k-2}^{k-1}$ is defined as

$$C_{m+k-2}^{k-1} = \frac{\Gamma(m+k-1)}{(k-1)!\,\Gamma(m)}.$$

In particular, for $m = 1$, relation (24) determines the geometric distribution. It is well known that

$$\mathsf{E}N_{p,m} = \frac{m(1-p) + p}{p},$$

so that $\mathsf{E}N_{p,m} \to \infty$ as $p \to 0$.

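As is known, the negative binomial distribution with natural $m$ admits an illustrative interpretation in terms of Bernoulli trials. Namely, if $N_{p,m}$ has distribution (24), then $N_{p,m} - 1$ is the number of successes observed in Bernoulli trials before the $m$th failure, if the probability of success in a trial is $1 - p$. In particular, for $m = 1$, the random variable $N_{p,1}$ is the number of trials held up to and including the first failure.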

Lemma 4. For any fixed $m > 0$

$$\lim_{p \to 0}\, \sup_{x \in \mathbb{R}} \left| \mathsf{P}\!\left(\frac{N_{p,m}}{\mathsf{E}N_{p,m}} < x\right) - G_{m,m}(x) \right| = 0,$$

where $G_{m,m}(x)$ is the gamma-distribution function with the shape parameter coinciding with the scale parameter and equal to $m$.

The proof is a simple exercise on characteristic functions; for more details, see [8].
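The statement of Lemma 4 can also be checked numerically. The sketch below computes the uniform distance between the distribution function of $N_{p,m}/\mathsf{E}N_{p,m}$ under (24) and $G_{m,m}$ for a given $p$. It is only an illustration: the function names are hypothetical, the incomplete gamma function is evaluated by its power series (adequate for the moderate arguments used here, not for very large ones), and the supremum is taken over the range up to $20\,\mathsf{E}N_{p,m}$, beyond which both tails are assumed negligible.

```python
import math


def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma function P(a, x) via its power series."""
    if x <= 0.0:
        return 0.0
    # First series term x**a * exp(-x) / Gamma(a + 1); later terms follow by recursion.
    term = math.exp(a * math.log(x) - x - math.lgamma(a + 1.0))
    total, n = term, 0
    while term > 1e-17 * total:
        n += 1
        term *= x / (a + n)
        total += term
    return min(total, 1.0)


def gamma_cdf_equal_params(m, x):
    """G_{m,m}(x): gamma distribution function with both parameters equal to m."""
    return reg_lower_gamma(m, m * x)


def kolmogorov_distance(p, m):
    """sup_x |P(N_{p,m} / E N_{p,m} < x) - G_{m,m}(x)|, truncated at 20 means."""
    mean = (m * (1.0 - p) + p) / p
    k_max = int(20 * mean) + 1
    cdf, dist = 0.0, 0.0
    for k in range(1, k_max + 1):
        g = gamma_cdf_equal_params(m, k / mean)
        dist = max(dist, abs(cdf - g))  # value just before the jump: P(N < k)
        log_pmf = (math.lgamma(m + k - 1.0) - math.lgamma(k) - math.lgamma(m)
                   + m * math.log(p) + (k - 1.0) * math.log(1.0 - p))
        cdf += math.exp(log_pmf)
        dist = max(dist, abs(cdf - g))  # value just after the jump: P(N <= k)
    return dist


if __name__ == "__main__":
    for p in (0.1, 0.02, 0.005):
        print(p, kolmogorov_distance(p, 2.0))
```

Since the distribution function of $N_{p,m}/\mathsf{E}N_{p,m}$ is piecewise constant and $G_{m,m}$ is monotone, it suffices to compare the two functions on both sides of each jump, which is what the loop does. The printed distances should decrease as $p$ decreases.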

Corollary 3. Let $m > 0$ be arbitrary. Assume that for each $n \ge 1$ the random variable $N_n$ has the negative binomial distribution with parameters $p = \frac{1}{n}$ and $m$. Let a statistic $T_n$ be asymptotically normal in the sense of (16). Then

$$\mathcal{L}\left(\sqrt{mn}\,(T_{N_n} - t)\right) \Longrightarrow P_{2m,\Sigma} \quad (n \to \infty),$$

where $P_{2m,\Sigma}$ is the $r$-variate Student distribution with parameters $\gamma = 2m$ and $\Sigma$.

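Proof. By Lemma 4 we have

$$\frac{N_n}{nm} = \frac{N_n}{\mathsf{E}N_n} \cdot \frac{\mathsf{E}N_n}{nm} = \frac{N_n}{\mathsf{E}N_n} \cdot \frac{m(n-1) + 1}{mn} = \frac{N_n}{\mathsf{E}N_n}\left(1 + O\!\left(\frac{1}{n}\right)\right) \Longrightarrow U_m$$

as $n \to \infty$, where $U_m$ is the random variable having the gamma-distribution function with coinciding shape and scale parameters equal to $m$. Now the desired assertion directly follows from Theorem 6.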

Remark 2. The $r$-variate Cauchy distribution ($\gamma = 1$) appears in the situation described in Corollary 3 when the sample size $N_n$ has the negative binomial distribution with the parameters $p = \frac{1}{n}$, $m = \frac{1}{2}$, and $n$ is large.

Remark 3. If the sample size $N_n$ has the negative binomial distribution with the parameters $p = \frac{1}{n}$, $m = 1$ (that is, the geometric distribution with the parameter $p = \frac{1}{n}$), then, as $n \to \infty$, we obtain the limit $r$-variate Student distribution with parameters $\gamma = 2$ and $\Sigma$. Moreover, if $\Sigma = I_r$ (that is, the $r$-variate Student distribution is spherically symmetric), then its one-dimensional marginals have the form (1). As we have already noted, distribution (1) was apparently first introduced as a limit distribution for the sample median in a sample with geometrically distributed random size in [11]. It is worth noting that in the cited paper [11], distribution (1) was not identified as the Student distribution with two degrees of freedom.
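The effect described in Remark 3 is easy to observe in simulation. The following Python sketch takes the sample mean of standard normal observations as the statistic (so that $t = 0$ and $\Sigma = 1$ in the univariate case) and a geometrically distributed sample size with $p = 1/n$. The choice of statistic, the function names, and the shortcut of drawing the mean of $N_n$ normal observations directly from its exact $N(0, 1/N_n)$ law are assumptions of this example, not part of the chapter.

```python
import math
import random


def geometric(p, rng):
    """Draw N with P(N = k) = p * (1 - p)**(k - 1), k = 1, 2, ... (law (24), m = 1)."""
    u = 1.0 - rng.random()  # uniform on (0, 1]
    return int(math.log(u) / math.log(1.0 - p)) + 1


def normalized_mean(n, rng):
    """sqrt(n) * (mean of N_n standard normal observations), N_n geometric(1/n)."""
    size = geometric(1.0 / n, rng)
    # For normal data the mean of `size` observations is exactly N(0, 1/size),
    # so it is drawn directly instead of averaging `size` simulated values.
    mean = rng.gauss(0.0, 1.0) / math.sqrt(size)
    return math.sqrt(n) * mean


def student2_cdf(x):
    """Distribution function of the Student law with two degrees of freedom."""
    return 0.5 + x / (2.0 * math.sqrt(2.0 + x * x))


def tail_frequency(n, level, reps, seed=0):
    """Share of replications in which the normalized mean exceeds `level` in modulus."""
    rng = random.Random(seed)
    hits = sum(abs(normalized_mean(n, rng)) > level for _ in range(reps))
    return hits / reps


if __name__ == "__main__":
    level = 3.0
    print("simulated tail:", tail_frequency(1000, level, 100000))
    print("Student(2) tail:", 2.0 * (1.0 - student2_cdf(level)))
```

For the level 3 the Student law with two degrees of freedom gives the two-sided tail probability $1 - 3/\sqrt{11} \approx 0.0955$, whereas the standard normal law gives about $0.0027$; the simulated frequency should be close to the former.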

Thus, the main conclusion of this section can be formulated as follows. If the number of random factors that determine the observed value of a random variable is itself random, with a distribution that can be approximated by the gamma distribution with coinciding shape and scale parameters (e.g., is negative binomial with probability of success close to one, see Lemma 4), then those functions of the random factors that are regarded as asymptotically normal in the classical situation are actually asymptotically Student with considerably heavier tails. Gamma models and negative binomial models are widely applicable: for instance, the negative binomial distribution is mixed Poisson with a gamma mixing distribution, a fact widely used in insurance. Hence, the Student distribution can be used in descriptive statistics as a rather reasonable heavy-tailed asymptotic approximation.

## 7. The asymptotic distribution of sample quantiles in samples with sizes generated by a Cox process

Sometimes, when the performance of a technical or financial system is analyzed, a forecast of its main characteristics is made on the basis of data accumulated during a certain period of the functioning of the system. As a rule, data are accumulated as a result of some “informative events” that occur during this period. For example, inference concerning the distribution of insurance claims, which is very important for the estimation of, say, the ruin probability of an insurance company, is usually performed on the basis of the sample $W_1, W_2, \ldots, W_{N(T)}$ of the values of insurance claims that arrived within a certain time interval $[0, T]$ (here $N(T)$ denotes the number of claims that arrived during the time interval $[0, T]$). Moreover, this inference is typically used for the prediction of the value of the ruin probability for the next period $[T, 2T]$. But it is obvious (at least in the example above) that the observed number of informative events that occurred during the time interval $[0, T]$ is actually a realization of a random variable, because the number of insurance claims arriving within this interval follows a stochastic counting process. If the random character of the number of available observations is not taken into consideration, then all that can be done is a conditional forecast. To obtain a complete prediction that accounts for the randomness of the number of “informative events,” we should use results similar to Theorems 2 and 3. One rather realistic and general assumption concerning $N(t)$, the number of observations accumulated by the time $t$, is that $N(t)$ is a Cox process. In this section, as an example, we will consider the asymptotic behavior of sample quantiles constructed from a sample whose size is determined by a Cox process. As we have already noted in the introduction, this problem is very important for the proper application of such risk measures as VaR (Value-at-Risk) in, say, financial engineering.

Let $W_1, \ldots, W_n$, $n \ge 1$, be independent identically distributed random variables with common distribution density $p(x)$ and $W_{(1)}, \ldots, W_{(n)}$ be the corresponding order statistics, $W_{(1)} \le W_{(2)} \le \ldots \le W_{(n)}$. Let $r \in \mathbb{N}$ and $\lambda_1, \ldots, \lambda_r$ be some numbers such that $0 < \lambda_1 < \lambda_2 < \ldots < \lambda_r < 1$. The quantiles of orders $\lambda_1, \ldots, \lambda_r$ of the random variable $W_1$ will be denoted $\xi_{\lambda_i}$, $i = 1, \ldots, r$. The sample quantiles of orders $\lambda_1, \ldots, \lambda_r$ are the random variables $W_{([\lambda_i n] + 1)}$, $i = 1, \ldots, r$, with $[a]$ denoting the integer part of a number $a$. The following result due to Mosteller [32] (also see [33], Section 9.2) is classical. Denote

$$Y_{n,j} = \sqrt{n}\left(W_{([\lambda_j n] + 1)} - \xi_{\lambda_j}\right), \quad j = 1, \ldots, r.$$

Theorem 7 [32]. If $p(x)$ is differentiable in some neighborhoods of the quantiles $\xi_{\lambda_i}$ and $p(\xi_{\lambda_i}) \ne 0$, $i = 1, \ldots, r$, then, as $n \to \infty$, the joint distribution of the normalized sample quantiles $Y_{n,1}, \ldots, Y_{n,r}$ weakly converges to the $r$-variate normal distribution with zero vector of expectations and covariance matrix $\Sigma = (\sigma_{ij})$,

$$\sigma_{ij} = \frac{\lambda_i (1 - \lambda_j)}{p(\xi_{\lambda_i})\, p(\xi_{\lambda_j})}, \quad i \le j.$$
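For concreteness, the covariance matrix of Theorem 7 can be computed as follows. The sketch assumes, purely for illustration, that the observations are standard normal (any distribution object exposing `inv_cdf` and `pdf` could be substituted); the function name is hypothetical, and the orders $\lambda_i$ are assumed to be supplied in increasing order.

```python
import math
from statistics import NormalDist


def quantile_covariance(lambdas, dist=NormalDist()):
    """Asymptotic covariance matrix of Theorem 7 for increasing orders `lambdas`."""
    xi = [dist.inv_cdf(lam) for lam in lambdas]  # quantiles xi_{lambda_i}
    dens = [dist.pdf(x) for x in xi]             # density values p(xi_{lambda_i})
    r = len(lambdas)
    sigma = [[0.0] * r for _ in range(r)]
    for i in range(r):
        for j in range(i, r):
            # The formula is stated for i <= j; the matrix is filled in symmetrically.
            value = lambdas[i] * (1.0 - lambdas[j]) / (dens[i] * dens[j])
            sigma[i][j] = sigma[j][i] = value
    return sigma


if __name__ == "__main__":
    for row in quantile_covariance([0.25, 0.5, 0.75]):
        print([round(v, 4) for v in row])
    print("pi / 2 =", round(math.pi / 2.0, 4))
```

For the standard normal density the entry corresponding to the median ($\lambda = 1/2$) equals $\frac{1}{4} \cdot 2\pi = \pi/2$, the classical asymptotic variance of the sample median.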

To take into account the randomness of the sample size, consider the sequence $W_1, W_2, \ldots$ of independent identically distributed random variables with common distribution density $p(x)$.

Let $N(t)$, $t \ge 0$, be a Cox process controlled by a process $\Lambda(t)$. Recall the definition of a Cox process. Let $N_1(t)$, $t \ge 0$, be a standard Poisson process (i.e., a homogeneous Poisson process with unit intensity). Let $\Lambda(t)$, $t \ge 0$, be a random process with non-decreasing right-continuous trajectories, $\Lambda(0) = 0$, $\mathsf{P}(\Lambda(t) < \infty) = 1$ for all $t > 0$. Assume that the processes $\Lambda(t)$ and $N_1(t)$ are independent. Set

$$N(t) = N_1(\Lambda(t)), \quad t \ge 0.$$

The process $N(t)$ is called a doubly stochastic Poisson process (or a Cox process) controlled by the process $\Lambda(t)$. The one-dimensional distributions of a Cox process are mixed Poisson. For example, if $\Lambda(t)$ has the gamma distribution, then $N(t)$ has the negative binomial distribution.
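The construction $N(t) = N_1(\Lambda(t))$ translates directly into a simulation. In the sketch below the controlling process is taken, as an assumption of this example, in the simplest form $\Lambda(t) = tU$ with $U$ gamma distributed with shape and rate both equal to $m$, so that $\mathsf{E}\Lambda(t) = t$; the function names are hypothetical.

```python
import random


def standard_poisson_count(horizon, rng):
    """N_1(horizon): number of unit-rate Poisson arrivals in [0, horizon]."""
    count, clock = 0, rng.expovariate(1.0)
    while clock <= horizon:
        count += 1
        clock += rng.expovariate(1.0)
    return count


def cox_count(t, m, rng):
    """N(t) = N_1(Lambda(t)) with Lambda(t) = t * U, U ~ Gamma(shape m, rate m)."""
    lam = t * rng.gammavariate(m, 1.0 / m)  # gammavariate takes shape and scale
    return standard_poisson_count(lam, rng)


def mean_and_variance(t, m, reps, seed=0):
    """Sample mean and unbiased sample variance of N(t) over `reps` replications."""
    rng = random.Random(seed)
    xs = [cox_count(t, m, rng) for _ in range(reps)]
    mean = sum(xs) / reps
    var = sum((x - mean) ** 2 for x in xs) / (reps - 1)
    return mean, var


if __name__ == "__main__":
    print(mean_and_variance(20.0, 2.0, 20000))
```

In this setting $\mathsf{E}N(t) = t$ and $\mathsf{D}N(t) = t + t^2/m$, so the simulated counts should be markedly overdispersed relative to the Poisson law, in agreement with the mixed Poisson (here negative binomial) character of the one-dimensional distributions.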

Cox processes are widely used as models of inhomogeneous chaotic flows of events, see, for example, [2].

Assume that all the involved random variables and processes are independent. In this section, under the assumption that $\Lambda(t) \to \infty$ in probability, the asymptotics of the joint distribution of the random variables $W_{([\lambda_i N(t)] + 1)}$, $i = 1, \ldots, r$, is considered as $t \to \infty$.

As we have already noted, it was B. V. Gnedenko who drew attention to the essential distinction between the asymptotic properties of sample quantiles constructed from samples with random sizes and the analogous properties of sample quantiles in the standard situation. Briefly recall the history of the problem under consideration. B. V. Gnedenko, S. Stomatovič, and A. Shukri [34] obtained sufficient conditions for the convergence of the distribution of the sample median constructed from a sample of random size. In the candidate (PhD) thesis of A. K. Shukri, these conditions were extended to quantiles of arbitrary orders. In [35], necessary and sufficient conditions for the weak convergence of the one-dimensional distributions of sample quantiles in samples with random sizes were obtained.

Our aim here is to give necessary and sufficient conditions for the weak convergence of the joint distributions of sample quantiles constructed from samples with random sizes driven by a Cox process and to describe the $r$-variate limit distributions emerging here, thus extending Mosteller’s Theorem 7 to samples with random sizes. The results of this section extend those of [36] to the continuous-time case.

Lemma 5. Let $N(t)$ be a Cox process controlled by the process $\Lambda(t)$. Then $N(t) \xrightarrow{P} \infty$ $(t \to \infty)$ if and only if $\Lambda(t) \xrightarrow{P} \infty$ $(t \to \infty)$.

Lemma 6. Let $N(t)$ be a Cox process controlled by the process $\Lambda(t)$. Let $d(t) > 0$ be a function such that $d(t) \to \infty$ $(t \to \infty)$. Then the following conditions are equivalent:

1. One-dimensional distributions of the normalized Cox process weakly converge to the distribution of some random variable $Z$ as $t \to \infty$:

$$\frac{N(t)}{d(t)} \Longrightarrow Z \quad (t \to \infty).$$

2. One-dimensional distributions of the controlling process $\Lambda(t)$, appropriately normalized, converge to the same distribution:

$$\frac{\Lambda(t)}{d(t)} \Longrightarrow Z \quad (t \to \infty).$$

For the proof of Lemmas 5 and 6 see [37].

Now we proceed to the main results of this section. In addition to the notation introduced above, for positive integer $n$ set $Q_j(n) = W_{([\lambda_j n] + 1)}$, $j = 1, \ldots, r$, $Q(n) = (Q_1(n), \ldots, Q_r(n))^{\top}$, $\xi = (\xi_{\lambda_1}, \ldots, \xi_{\lambda_r})^{\top}$. Let $d(t)$ be an infinitely increasing positive function. Set

$$Z(t) = \sqrt{d(t)}\,\bigl(Q(N(t)) - \xi\bigr).$$

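Theorem 8. Let $\Lambda(t) \xrightarrow{P} \infty$ as $t \to \infty$. If $p(x)$ is differentiable in neighborhoods of the quantiles $\xi_{\lambda_i}$ and $p(\xi_{\lambda_i}) \ne 0$, $i = 1, \ldots, r$, then the convergence

$$Z(t) \Longrightarrow Z \quad (t \to \infty)$$

to some random vector $Z$ takes place if and only if there exists a nonnegative random variable $U$ such that

$$\mathsf{P}(Z \in A) = \mathsf{E}\,\Phi_{U^{-1}\Sigma}(A), \quad A \in \mathcal{B}(\mathbb{R}^r),$$

where $\Sigma = (\sigma_{ij})$,

$$\sigma_{ij} = \frac{\lambda_i (1 - \lambda_j)}{p(\xi_{\lambda_i})\, p(\xi_{\lambda_j})}, \quad i \le j,$$

and

$$\frac{\Lambda(t)}{d(t)} \Longrightarrow U \quad (t \to \infty).$$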

The proof is a simple combination of Lemmas 1, 5, and 6 and Theorem 3.

Corollary 4. Under the conditions of Theorem 8, the joint distribution of the normalized sample quantiles $\sqrt{d(t)}\,\bigl(W_{([\lambda_j N(t)] + 1)} - \xi_{\lambda_j}\bigr)$, $j = 1, \ldots, r$, weakly converges to the $r$-variate normal law with zero expectation and covariance matrix $\Sigma$ if and only if

$$\frac{\Lambda(t)}{d(t)} \Longrightarrow 1 \quad (t \to \infty).$$

This statement immediately follows from Theorem 8, taking Lemma 1 into account.

Corollary 5. Under the conditions of Theorem 8, the joint distribution of the normalized sample quantiles $\sqrt{d(t)}\,\bigl(W_{([\lambda_j N(t)] + 1)} - \xi_{\lambda_j}\bigr)$, $j = 1, \ldots, r$, weakly converges to the $r$-variate Student distribution with parameters $\gamma > 0$ and $\Sigma$ defined in Theorem 8, if and only if

$$\mathsf{P}(\Lambda(t) < x\,d(t)) \Longrightarrow G_{\gamma/2,\gamma/2}(x) \quad (t \to \infty),$$

where $G_{\gamma/2,\gamma/2}(x)$ is the gamma-distribution function with coinciding shape and scale parameters equal to $\gamma/2$.
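The situation of Corollary 5 can be illustrated by a small simulation for a single quantile ($r = 1$, the median). The sketch below rests on several assumptions made only for this example: the observations are uniform on $(0, 1)$, so that $\xi_{1/2} = 1/2$, $p(\xi_{1/2}) = 1$ and $\Sigma = 1/4$; the controlling process is $\Lambda(t) = tU$ with $U$ gamma distributed with shape and rate $\gamma/2$; $d(t) = t$; and empty samples, for which the sample quantile is undefined, are redrawn. All function names are hypothetical.

```python
import math
import random


def poisson_count(horizon, rng):
    """Number of unit-rate Poisson arrivals in [0, horizon]."""
    count, clock = 0, rng.expovariate(1.0)
    while clock <= horizon:
        count += 1
        clock += rng.expovariate(1.0)
    return count


def normalized_median(t, gamma, rng):
    """sqrt(t) * (sample median - 1/2), Uniform(0, 1) data, Cox-process sample size."""
    while True:
        lam = t * rng.gammavariate(gamma / 2.0, 2.0 / gamma)  # Lambda(t) = t * U
        size = poisson_count(lam, rng)
        if size > 0:  # redraw empty samples (an assumption of this sketch)
            break
    sample = sorted(rng.random() for _ in range(size))
    index = int(0.5 * size) + 1  # [lambda * N(t)] + 1 with lambda = 1/2
    return math.sqrt(t) * (sample[index - 1] - 0.5)


def tail_frequency(t, gamma, level, reps, seed=0):
    """Share of replications with |Z(t)| / sigma > level, where sigma**2 = 1/4."""
    rng = random.Random(seed)
    sigma = 0.5  # sqrt(lambda * (1 - lambda)) / p(xi) for Uniform(0, 1), lambda = 1/2
    hits = sum(abs(normalized_median(t, gamma, rng)) / sigma > level
               for _ in range(reps))
    return hits / reps


if __name__ == "__main__":
    print(tail_frequency(100.0, 2.0, 3.0, 4000))
```

For $\gamma = 2$ the limit law of $Z(t)/\sigma$ is the Student distribution with two degrees of freedom, whose two-sided tail beyond the level 3 equals $1 - 3/\sqrt{11} \approx 0.0955$, against about $0.0027$ for the normal law. For moderate $t$ the simulated frequency can only be expected to be roughly of the former order.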

Let $0 < \lambda < 1$ and let $\xi_{\lambda}$ be the $\lambda$-quantile of the random variable $W_1$. As above, the standard normal distribution function will be denoted $\Phi(x)$.

## 8. Conclusion

The purpose of the chapter was to give a possible explanation of the emergence of heavy-tailed distributions that are often observed in practice instead of the expected normal laws. As the base for this explanation, limit theorems for random sums and statistics constructed from samples with random sizes were considered. Within this approach, it becomes possible to obtain arbitrarily heavy tails of the data distributions without assuming the non-existence of the moments of the observed characteristics. Some comments were made on the heavy-tailedness of scale mixtures of normal distributions. Two general theorems presenting necessary and sufficient conditions for the convergence of the distributions of random sums of random vectors and multivariate statistics constructed from samples with random sizes were proved. As examples of the application of these general theorems, conditions were presented for the convergence of the distributions of random sums of independent random vectors with finite covariance matrices to multivariate elliptically contoured stable and Linnik distributions. An alternative definition of the latter was proposed. Also, conditions were presented for the convergence of the distributions of asymptotically normal (in the traditional sense) statistics to multivariate elliptically contoured Student distributions when the sample size is replaced by a random variable. The joint asymptotic behavior of sample quantiles in samples with random sizes was considered. Special attention was paid to the continuous-time case assuming that the sample size increases in time following a Cox process resulting in the sample size having the mixed Poisson distribution.

## Acknowledgments

Supported by Russian Science Foundation, project 18-11-00155.

## References

1. Shiryaev AN. Foundations of Financial Mathematics. Vol. 1. Facts, Models. Singapore: World Scientific; 1998
2. Bening V, Korolev V. Generalized Poisson Models and Their Application in Insurance and Finance. Utrecht: VSP; 2002
3. Meerschaert MM, Scheffler H-P. Limit theorems for continuous-time random walks with infinite mean waiting times. Journal of Applied Probability. 2004;41(3):623-638
4. Samorodnitsky G, Taqqu MS. Stable Non-Gaussian Random Processes, Stochastic Models with Infinite Variance. New York: Chapman and Hall; 1994
5. Chen J. From the central limit theorem to heavy-tailed distributions. Journal of Applied Probability. 2003;40(3):803-806
6. Korolev VY. Convergence of random sequences with independent random indices. I. Theory of Probability and its Applications. 1994;39(2):313-333
7. Schluter C, Trede M. Weak convergence to the Student and Laplace distributions. Journal of Applied Probability. 2016;53:121-129
8. Bening V, Korolev V. On an application of the Student distribution in the theory of probability and mathematical statistics. Theory of Probability and its Applications. 2005;49(3):377-391
9. Bening V, Korolev V. Some statistical problems related to the Laplace distribution. Informatics and its Applications. 2008;2(2):19-34
10. Gnedenko BV, Korolev V. Random Summation: Limit Theorems and Applications. Boca Raton: CRC Press; 1996
11. Gnedenko BV. On estimation of the unknown parameters of distributions from a random number of independent observations. Probability Theory and Mathematical Statistics. Proceedings of Tbilisi Mathematical Institute named after A. M. Razmadze. 1989;24:146-150 (in Russian)
12. Kolmogorov AN. The method of median in the theory of errors. Matematicheskii Sbornik. 1931;38(3/4):47-50
13. Korolev VY, Kossova EV. On limit distributions of randomly indexed multidimensional random sequences with an operator normalization. Journal of Mathematical Sciences. 1992;72(1):2915-2929
14. Korolev VY, Kossova EV. Convergence of multidimensional random sequences with independent random indices. Journal of Mathematical Sciences. 1995;76(2):2259-2268
15. Zolotarev VM. One-Dimensional Stable Distributions. Providence: American Mathematical Society; 1986
16. Kolokoltsov V, Korolev V, Uchaikin V. Fractional stable distributions. Journal of Mathematical Sciences. 2001;105(6):2569-2576
17. Korolev VY, Zeifman AI. A note on mixture representations for the Linnik and Mittag-Leffler distributions and their applications. Journal of Mathematical Sciences. 2017;218(3):314-327
18. Korolev VY, Zeifman AI. Convergence of statistics constructed from samples with random sizes to the Linnik and Mittag-Leffler distributions and their generalizations. Journal of the Korean Statistical Society. 2017;46(2):161-181. Available online 25 July 2016. Also available on arXiv:1602.02480v1 [math.PR]
19. Korolev VY. On the convergence of distributions of random sums of independent random variables to stable laws. Theory of Probability and its Applications. 1998;42(4):695-696
20. Linnik YV. Linear forms and statistical criteria, I, II. Selected Translations in Mathematical Statistics and Probability. 1963;3:41-90 (Original paper appeared in: Ukrainskii Matematicheskii Zhournal, 1953. Vol. 5. pp. 207-243, 247-290)
21. Kotz S, Kozubowski TJ, Podgorski K. The Laplace Distribution and Generalizations: A Revisit with Applications to Communications, Economics, Engineering, and Finance. Boston: Birkhauser; 2001
22. Pillai RN. Semi-α-Laplace distributions. Communications in Statistical Theory and Methods. 1985;14:991-1000
23. Kozubowski TJ, Rachev ST. The theory of geometric stable distributions and its use in modeling financial data. European Journal of Operational Research. 1994;74(2):310-324
24. Kozubowski T, Panorska A. Multivariate geometric stable distributions in financial applications. Mathematical and Computer Modelling. 1999;29:83-92
25. Anderson DN. A multivariate Linnik distribution. Statistics & Probability Letters. 1992;14:333-336
26. Ostrovskii IV. Analytic and asymptotic properties of multivariate Linnik’s distribution. Mathematical Physics, Analysis and Geometry. 1995;2(3):436-455
27. Weron K, Kotulski M. On the Cole-Cole relaxation function and related Mittag-Leffler distributions. Physica A: Statistical Mechanics and its Applications. 1996;232:180-188
28. Gorenflo R, Mainardi F. Continuous time random walk, Mittag-Leffler waiting time and fractional diffusion: Mathematical aspects, chap. 4. In: Klages R, Radons G, Sokolov IM, editors. Anomalous Transport: Foundations and Applications. Weinheim, Germany: Wiley-VCH; 2008. pp. 93-127. Available at: http://arxiv.org/abs/0705.0797
29. Kozubowski TJ, Rachev ST. Multivariate geometric stable laws. Journal of Computational Analysis and Applications. 1999;1(4):349-385
30. DeGroot MH. Optimal Statistical Decisions. New York, London: McGraw-Hill Company; 1970
31. Gupta SS. Bibliography on the Multivariate Normal Integrals and Related Topics. Providence: American Mathematical Society; 1963
32. Mosteller F. On some useful “inefficient” statistics. Annals of Mathematical Statistics. 1946;17:377-408
33. David HA. Order Statistics. New York: Wiley; 1970
34. Gnedenko BV, Stomatovič S, Shukri A. On the distribution of the median. Bulletin of Moscow University, Series Mathematics, Mechanics. 1984;2:59-63
35. Selivanova DO. Estimates of Convergence Rate in Limit Theorems for Random Sums. PhD Thesis: Moscow State University; 1995
36. Korolev VY. Asymptotic properties of sample quantiles constructed from samples with random sizes. Theory of Probability and its Applications. 2000;44(2):394-399
37. Korolev VY. On convergence of distributions of compound Cox processes to stable laws. Theory of Probability and its Applications. 1999;43(4):644-650
