
## Abstract

In this chapter, we investigate six loss functions. In particular, the squared error loss function and the weighted squared error loss function, which penalize overestimation and underestimation equally, are recommended for the unrestricted parameter space (−∞, ∞); Stein's loss function and the power-power loss function, which penalize gross overestimation and gross underestimation equally, are recommended for the positive restricted parameter space (0, ∞); and the power-log loss function and Zhang's loss function, which penalize gross overestimation and gross underestimation equally, are recommended for (0, 1). Among the six Bayesian estimators that minimize the corresponding posterior expected losses (PELs), there exist three strings of inequalities. However, a string of inequalities among the six smallest PELs does not exist. Moreover, we summarize three hierarchical models where the unknown parameter of interest belongs to (0, ∞), that is, the hierarchical normal and inverse gamma model, the hierarchical Poisson and gamma model, and the hierarchical normal and normal-inverse-gamma model. In addition, we summarize two hierarchical models where the unknown parameter of interest belongs to (0, 1), that is, the beta-binomial model and the beta-negative binomial model. For empirical Bayesian analysis of the unknown parameter of interest of the hierarchical models, we use two common methods to obtain the estimators of the hyperparameters, that is, the moment method and the maximum likelihood estimator (MLE) method.

### Keywords

- Bayesian estimators
- power-log loss function
- power-power loss function
- restricted parameter spaces
- Stein’s loss function
- Zhang’s loss function

## 1. Introduction

In Bayesian analysis, there are four basic elements: the data, the model, the prior, and the loss function. A Bayesian estimator minimizes some posterior expected loss (PEL) function. We confine our interest to six loss functions in this chapter: the squared error loss function (well known), the weighted squared error loss function ([1], p. 78), Stein's loss function [2, 3, 4, 5, 6, 7, 8, 9, 10], the power-power loss function [11], the power-log loss function [12], and Zhang's loss function [13]. It is worth noting that, among the six loss functions, the first and second are defined on (−∞, ∞), the third and fourth on (0, ∞), and the fifth and sixth on (0, 1).

The squared error loss function and the weighted squared error loss function have been used by many authors for the problem of estimating the variance of a normal distribution (see, e.g., [14, 15]).

For the positive restricted parameter space (0, ∞), Stein's loss function and the power-power loss function, which penalize gross overestimation and gross underestimation equally, are recommended.

Analogously, for the restricted parameter space (0, 1), the power-log loss function and Zhang's loss function, which penalize gross overestimation and gross underestimation equally, are recommended.

The rest of the chapter is organized as follows. In Section 2, we obtain two Bayesian estimators for θ ∈ (−∞, ∞), under the squared error and weighted squared error loss functions. In Section 3, we obtain the Bayesian estimators for θ ∈ (0, ∞) under Stein's loss function and the power-power loss function, and we summarize three hierarchical models where the parameter of interest belongs to (0, ∞). In Section 4, we obtain the Bayesian estimators for θ ∈ (0, 1) under the power-log loss function and Zhang's loss function, and we summarize two hierarchical models where the parameter of interest belongs to (0, 1). In Section 5, we present inequalities among the six Bayesian posterior estimators. Section 6 concludes with some discussions.

## 2. Bayesian estimation for θ ∈ (−∞, ∞)

There are two loss functions which are defined on (−∞, ∞): the squared error loss function and the weighted squared error loss function.

### 2.1 Squared error loss function

The Bayesian estimator under the squared error loss function (well known) is

$$\delta_{SEL}^{\pi}(x) = E\left(\theta \mid x\right),$$

where

$$L_{SEL}(\theta, a) = (\theta - a)^2$$

is the squared error loss function, and the posterior expected squared error loss (PESEL) is

$$\mathrm{PESEL}(a, x) = E\left[(\theta - a)^2 \mid x\right].$$

It is found in [16] that the PESEL is minimized at $\delta_{SEL}^{\pi}(x)$, by taking the partial derivative of the PESEL with respect to $a$, setting it to zero, and solving for $a$.
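As a quick numerical sketch (not part of the chapter; the normal posterior and all numbers are assumptions for illustration), the code below checks by Monte Carlo that the posterior mean attains the smallest PESEL on a grid of candidate actions:

```python
import random

random.seed(0)

# Hypothetical posterior: draws of theta from N(2, 1).
draws = [random.gauss(2.0, 1.0) for _ in range(20000)]

def pesel(a, draws):
    """Posterior expected squared error loss E[(theta - a)^2 | x], Monte Carlo."""
    return sum((t - a) ** 2 for t in draws) / len(draws)

post_mean = sum(draws) / len(draws)  # Bayesian estimator under SEL

# The posterior mean minimizes the PESEL on a grid of candidate actions.
grid = [post_mean + 0.1 * k for k in range(-10, 11)]
best = min(grid, key=lambda a: pesel(a, draws))
assert abs(best - post_mean) < 0.11
```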

### 2.2 Weighted squared error loss function

The Bayesian estimator under the weighted squared error loss function is

$$\delta_{WSEL}^{\pi}(x) = \frac{E\left[w(\theta)\,\theta \mid x\right]}{E\left[w(\theta) \mid x\right]},$$

where

$$L_{WSEL}(\theta, a) = w(\theta)(\theta - a)^2$$

is the weighted squared error loss function with weight function $w(\theta) > 0$, and the posterior expected weighted squared error loss (PEWSEL) is

$$\mathrm{PEWSEL}(a, x) = E\left[w(\theta)(\theta - a)^2 \mid x\right].$$

It is found in [1] that the PEWSEL is minimized at $\delta_{WSEL}^{\pi}(x)$, by taking the partial derivative of the PEWSEL with respect to $a$, setting it to zero, and solving for $a$.
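The weighted case can be checked the same way; in this sketch the weight function w(θ) = θ² and the posterior draws are arbitrary assumptions:

```python
import random

random.seed(1)

# Hypothetical positive posterior draws; the weight w(theta) = theta**2 is an
# arbitrary choice for this sketch.
draws = [abs(random.gauss(2.0, 0.5)) for _ in range(20000)]
w = lambda t: t ** 2

def pewsel(a):
    """Posterior expected weighted squared error loss, Monte Carlo."""
    return sum(w(t) * (t - a) ** 2 for t in draws) / len(draws)

# Bayesian estimator under WSEL: E[w(theta) * theta | x] / E[w(theta) | x].
num = sum(w(t) * t for t in draws) / len(draws)
den = sum(w(t) for t in draws) / len(draws)
delta_wsel = num / den

# The estimator is the exact minimizer of the (empirical) PEWSEL.
grid = [delta_wsel + 0.05 * k for k in range(-20, 21)]
best = min(grid, key=pewsel)
assert abs(best - delta_wsel) < 1e-9
```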

## 3. Bayesian estimation for θ ∈ (0, ∞)

There are many hierarchical models where the parameter of interest is θ ∈ (0, ∞). We summarize three of them: the hierarchical normal and inverse gamma model, the hierarchical Poisson and gamma model, and the hierarchical normal and normal-inverse-gamma model.

**Model (a) (hierarchical normal and inverse gamma model).** This hierarchical model has been investigated by [10, 16, 17]. Suppose that we observe $X_1, X_2, \ldots, X_n$ independently from the normal distribution $N(\mu, \theta)$, where the mean $\mu$ is known and the variance $\theta \in (0, \infty)$ is the parameter of interest, and that $\theta$ has a conjugate inverse gamma prior $\mathrm{IG}(\alpha, \beta)$, where $\alpha > 0$ and $\beta > 0$ are known hyperparameters.

**Model (b) (hierarchical Poisson and gamma model).** This hierarchical model has been investigated by [1, 16, 19, 20]. Suppose that $X_1, X_2, \ldots, X_n$ are independent observations from the Poisson distribution with mean $\lambda \in (0, \infty)$, and that $\lambda$ has a conjugate gamma prior $\mathrm{Gamma}(\alpha, \beta)$, where $\alpha > 0$ and $\beta > 0$ are known hyperparameters.

**Model (c) (hierarchical normal and normal-inverse-gamma model).** This hierarchical model has been investigated by [2, 21, 22]. Let the observations $X_1, X_2, \ldots, X_n$ be independently drawn from $N(\mu, \sigma^2)$ with both $\mu$ and $\sigma^2$ unknown, and let $(\mu, \sigma^2)$ have a conjugate normal-inverse-gamma prior, that is, $\sigma^2 \sim \mathrm{IG}(\alpha, \beta)$ and $\mu \mid \sigma^2 \sim N(\mu_0, \sigma^2/\kappa_0)$, where $\mu_0$, $\kappa_0 > 0$, $\alpha > 0$, and $\beta > 0$ are known hyperparameters; the variance $\sigma^2 \in (0, \infty)$ is the parameter of interest.
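For Model (b), the gamma prior updates in closed form by conjugacy. The following sketch (with made-up counts and hyperparameters) computes the posterior and, as a preview of Section 3.1, the Bayes estimators under the squared error and Stein's loss functions, using the standard facts that a Gamma(α, β) posterior (rate parameterization) has mean α/β and harmonic-type mean 1/E[1/λ | x] = (α − 1)/β when α > 1:

```python
# Hypothetical setup: lambda ~ Gamma(alpha, beta) (rate parameterization),
# X_i | lambda ~ Poisson(lambda). All numbers are made up for illustration.
alpha, beta = 2.0, 1.0
data = [3, 5, 4, 6, 2]

# Conjugate update: the posterior is Gamma(alpha + sum(x), beta + n).
alpha_post = alpha + sum(data)
beta_post = beta + len(data)

post_mean = alpha_post / beta_post        # Bayes estimator under SEL
# Bayes estimator under Stein's loss: 1/E[1/lambda | x] = (alpha_post - 1)/beta_post
# for a gamma posterior with alpha_post > 1.
delta_sl = (alpha_post - 1) / beta_post
assert delta_sl < post_mean  # Stein estimator sits below the posterior mean
```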

### 3.1 Stein’s loss function

#### 3.1.1 One-dimensional case

The Bayesian estimator under Stein's loss function is

$$\delta_{SL}^{\pi}(x) = \frac{1}{E\left[\theta^{-1} \mid x\right]},$$

where

$$L_{SL}(\theta, a) = \frac{a}{\theta} - \log\frac{a}{\theta} - 1$$

is Stein's loss function, and the posterior expected Stein's loss (PESL) is

$$\mathrm{PESL}(a, x) = a\,E\left[\theta^{-1} \mid x\right] - \log a + E\left[\log\theta \mid x\right] - 1.$$

It is found in [10] that the PESL is minimized at $\delta_{SL}^{\pi}(x)$, by taking the partial derivative of the PESL with respect to $a$, setting it to zero, and solving for $a$.
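A Monte Carlo sketch (with an arbitrary positive posterior, not from the chapter) of the standard fact that 1/E[θ⁻¹ | x] minimizes the posterior expected Stein's loss:

```python
import math
import random

random.seed(2)

# Hypothetical positive posterior draws (lognormal), since Stein's loss
# lives on (0, inf).
draws = [math.exp(random.gauss(0.0, 0.7)) for _ in range(20000)]

def stein_loss(theta, a):
    return a / theta - math.log(a / theta) - 1.0

def pesl(a):
    """Posterior expected Stein's loss, Monte Carlo."""
    return sum(stein_loss(t, a) for t in draws) / len(draws)

# Bayesian estimator under Stein's loss: 1 / E[1/theta | x].
delta_sl = 1.0 / (sum(1.0 / t for t in draws) / len(draws))

# It never exceeds the posterior mean (harmonic mean <= arithmetic mean),
post_mean = sum(draws) / len(draws)
assert delta_sl <= post_mean
# and it beats nearby candidate actions in PESL.
assert pesl(delta_sl) <= min(pesl(delta_sl * 1.1), pesl(delta_sl * 0.9))
```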

For the variance parameter θ of the hierarchical normal and inverse gamma model (Model (a)), [10] calculates the Bayesian estimator under Stein's loss function and the PESL in closed form; the PESL depends on the digamma function ψ(·).

For the hierarchical Poisson and gamma model (Model (b)), [20] first calculates the posterior distribution of the Poisson mean λ, which is again a gamma distribution by conjugacy, and then obtains the empirical Bayesian estimators of λ under Stein's loss function.

For the variance parameter σ² of the hierarchical normal and normal-inverse-gamma model (Model (c)), [23] obtains the Bayesian posterior estimator under Stein's loss function.

#### 3.1.2 Multidimensional case

For estimating a covariance matrix, which is assumed to be positive definite, many researchers exploit the multidimensional Stein's loss function (e.g., see [2, 8, 24, 25, 26, 27, 28, 29, 30, 31]). The multidimensional Stein's loss function (see [2]) is originally defined for estimating the $p \times p$ covariance matrix $\Sigma$ of a multivariate normal distribution by an estimator $\hat{\Sigma}$:

$$L\left(\Sigma, \hat{\Sigma}\right) = \operatorname{tr}\left(\hat{\Sigma}\Sigma^{-1}\right) - \log\det\left(\hat{\Sigma}\Sigma^{-1}\right) - p.$$

When $p = 1$, writing $\Sigma = \theta$ and $\hat{\Sigma} = a$, the loss reduces to

$$L(\theta, a) = \frac{a}{\theta} - \log\frac{a}{\theta} - 1,$$

which is the one-dimensional Stein's loss function.
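The multidimensional loss can be evaluated directly; the sketch below hand-codes the 2×2 case (to stay dependency-free) and checks both the defining property L(Σ, Σ) = 0 and the reduction to the one-dimensional Stein's loss when p = 1. The matrices are made up for illustration:

```python
import math

def stein_loss_matrix(sigma, sigma_hat):
    """tr(S_hat S^{-1}) - log det(S_hat S^{-1}) - p, for 1x1 or 2x2 matrices."""
    p = len(sigma)
    if p == 1:
        r = sigma_hat[0][0] / sigma[0][0]
        return r - math.log(r) - 1.0
    # 2x2 case: invert sigma by the adjugate formula.
    (a, b), (c, d) = sigma
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    m = [[sum(sigma_hat[i][k] * inv[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    trace = m[0][0] + m[1][1]
    det_m = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return trace - math.log(det_m) - 2.0

# The loss is zero iff the estimator equals the true covariance matrix.
sigma = [[2.0, 0.5], [0.5, 1.0]]
assert abs(stein_loss_matrix(sigma, sigma)) < 1e-12
# The 1x1 case agrees with the one-dimensional loss a/t - log(a/t) - 1.
assert abs(stein_loss_matrix([[2.0]], [[3.0]])
           - (1.5 - math.log(1.5) - 1.0)) < 1e-12
```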

### 3.2 Power-power loss function

The Bayesian estimator under the power-power loss function minimizes the posterior expected power-power loss (PEPL). It is found in [11] that the minimizer is obtained by taking the partial derivative of the PEPL with respect to the action $a$, setting it to zero, and solving for $a$.

The power-power loss function is proposed in [11], and it has all seven properties proposed in that paper. More specifically, it penalizes gross overestimation and gross underestimation equally, is convex in its argument, and has balanced convergence rates, or penalties, for arguments that are too large or too small. Therefore, it is recommended for the positive restricted parameter space (0, ∞).

## 4. Bayesian estimation for θ ∈ (0, 1)

There are some hierarchical models where the unknown parameter of interest is θ ∈ (0, 1). We summarize two of them: the beta-binomial model and the beta-negative binomial model.

**Model (d) (beta-binomial model).** This hierarchical model has been investigated by [1, 12, 13, 16, 32, 33]. Suppose that $X \mid \theta \sim \mathrm{Binomial}(n, \theta)$ and that the probability parameter $\theta \in (0, 1)$ has a conjugate beta prior $\mathrm{Beta}(\alpha, \beta)$, where $\alpha > 0$ and $\beta > 0$ are known hyperparameters. Moreover, [33] develops an estimation procedure for the parameters of a zero-inflated overdispersed binomial model in the presence of missing responses.
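For Model (d), the beta prior updates in closed form by conjugacy; a minimal sketch with made-up values of α, β, n, and x:

```python
# Hypothetical beta-binomial setup: theta ~ Beta(alpha, beta),
# X | theta ~ Binomial(n, theta); posterior is Beta(alpha + x, beta + n - x).
alpha, beta, n, x = 2.0, 3.0, 20, 7  # made-up values

alpha_post = alpha + x
beta_post = beta + n - x

# Bayes estimator under SEL: the posterior mean, which stays in (0, 1).
delta_sel = alpha_post / (alpha_post + beta_post)
assert 0.0 < delta_sel < 1.0

# It is a weighted average of the prior mean and the sample proportion.
prior_mean, sample_prop = alpha / (alpha + beta), x / n
w = n / (alpha + beta + n)
assert abs(delta_sel - (w * sample_prop + (1 - w) * prior_mean)) < 1e-12
```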

**Model (e) (beta-negative binomial model).** This hierarchical model has been investigated by [1, 34]. Suppose that $X \mid \theta$ follows the negative binomial distribution with size parameter $m$ and probability parameter $\theta \in (0, 1)$, and that $\theta$ has a conjugate beta prior $\mathrm{Beta}(\alpha, \beta)$, where $m$, $\alpha > 0$, and $\beta > 0$ are hyperparameters.

### 4.1 Power-log loss function

A good loss function for θ ∈ (0, 1) should satisfy the six properties, (a)–(f), listed in Table 1 of [12]; property (e), for instance, requires the loss function to be convex in its argument. Property (a) means that any action $a$ is taken in the same restricted space (0, 1) as the parameter θ.

The power-log loss function is constructed in [12] to satisfy these properties, and it is easy to check (see the supplement of [12]) that it does satisfy all six of them.

We remark that the power-log loss function on (0, 1) penalizes gross overestimation and gross underestimation equally.

The Bayesian estimator under the power-log loss function minimizes the posterior expected power-log loss (PEPLL). It is found in [12] that the minimizer is obtained by taking the partial derivative of the PEPLL with respect to the action $a$, setting it to zero, and solving for $a$.

Finally, numerical simulations and a real data example of some monthly magazine exposure data (see [35]) exemplify the theoretical studies in [12] of two size relationships, one among the Bayesian estimators and one among the PEPLLs.

### 4.2 Zhang’s loss function

Zhang et al. [12] proposed six properties for a good loss function on (0, 1) (see Table 1 of [12]). Zhang's loss function is constructed in [13] to satisfy the same properties, and it is easy to check (see the supplement of [13]) that it satisfies all six of them.

The Bayesian estimator under Zhang's loss function minimizes the posterior expected Zhang's loss (PEZL). It is found in [13] that the minimizer is obtained by taking the partial derivative of the PEZL with respect to the action $a$, setting it to zero, and solving for $a$.

Zhang et al. [13] consider an example of some magazine exposure data for the monthly magazine *Signature* (see [12, 35]) and compare the numerical results with those of [12].

For the probability parameter θ of the beta-negative binomial model, [34] obtains the estimators of the hyperparameters when $m$ is known or unknown by the moment method (Theorem 1 in [34]) and the MLE method (Theorem 2 in [34]). Finally, the empirical Bayesian estimator of θ under Zhang's loss function is obtained by substituting the estimated hyperparameters into the Bayesian posterior estimator.

In the numerical simulations of [34], they illustrate three things: the two inequalities among the Bayesian posterior estimators and among the PEZLs; the moment estimators and the MLEs, which are consistent estimators of the hyperparameters; and the goodness of fit of the beta-negative binomial model to the simulated data. The numerical simulations show that the MLEs are better than the moment estimators when estimating the hyperparameters, in terms of the goodness of fit of the model to the simulated data. However, the MLEs are very sensitive to their initial values, and the moment estimators usually prove to be good initial values.

In the real data section of [34], they consider an example of some insurance claim data, which are assumed to follow the beta-negative binomial model (Model (e)). They consider four cases to fit the real data. In the first case, they assume that $m$ is known and fit the model for several candidate values of $m$.

## 5. Inequalities among Bayesian posterior estimators

For the six loss functions, we have the corresponding six Bayesian estimators, each of which minimizes its corresponding posterior expected loss.

In this section, we compare the six Bayesian estimators and the six smallest PELs (see **Table 1** in [36]). The six PELs are the PEWSEL, PEPLL, PESL, PEPL, PESEL, and PEZL. In Table 2, each Bayesian estimator minimizes its corresponding PEL, and the smallest PEL is the PEL evaluated at the corresponding Bayesian estimator.

Domain | Bayesian estimators | PELs | Smallest PELs |
---|---|---|---|
(−∞, ∞) | under the squared error and weighted squared error losses | PESEL, PEWSEL | smallest PESEL, smallest PEWSEL |
(0, ∞) | under Stein's and the power-power losses | PESL, PEPL | smallest PESL, smallest PEPL |
(0, 1) | under the power-log and Zhang's losses | PEPLL, PEZL | smallest PEPLL, smallest PEZL |

**Table 2.** The domains of the six loss functions, with the corresponding Bayesian estimators, PELs, and smallest PELs.

It is easy to see that all six loss functions are well defined on (0, 1), the intersection of the three parameter spaces, so that the six Bayesian estimators can be compared there.

**Theorem 1** (Theorem 1 in [36]). *For θ ∈ (0, 1), there exists a string of inequalities among the six Bayesian estimators; for θ ∈ (0, ∞), there exists a string of inequalities among the four Bayesian estimators whose loss functions are well defined there; and for θ ∈ (−∞, ∞), there exists an inequality between the two Bayesian estimators under the squared error and weighted squared error loss functions.*

The proof of Theorem 1 exploits one key, unified tool, the covariance inequality (see Theorem 4.7.9 (p. 192) in [16]); the details can be found in the supplement of [36].
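One link of such an ordering can be checked directly: by Jensen's inequality (a special case of the covariance-inequality argument), the Bayesian estimator under Stein's loss, a posterior harmonic mean, never exceeds the Bayesian estimator under the squared error loss, the posterior mean. A Monte Carlo sketch with an arbitrary positive posterior:

```python
import math
import random

random.seed(4)

# Arbitrary positive "posterior" draws for the demo.
draws = [math.exp(random.gauss(0.3, 0.8)) for _ in range(10000)]

delta_sel = sum(draws) / len(draws)                          # posterior mean
delta_sl = 1.0 / (sum(1.0 / t for t in draws) / len(draws))  # harmonic mean

# Harmonic mean <= arithmetic mean, hence delta_SL <= delta_SEL.
assert delta_sl <= delta_sel
```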

It is worth noting that the six Bayesian estimators and the six smallest PELs are all functions of the observations and the hyperparameters.

## 6. Conclusions and discussions

In this chapter, we have investigated six loss functions: the squared error loss function, the weighted squared error loss function, Stein's loss function, the power-power loss function, the power-log loss function, and Zhang's loss function. Now we give some suggestions on the conditions for using each of them. It is worth noting that the first two loss functions are defined on (−∞, ∞) and are recommended for the unrestricted parameter space; Stein's loss function and the power-power loss function are defined on (0, ∞) and are recommended for the positive restricted parameter space; and the power-log loss function and Zhang's loss function are defined on (0, 1) and are recommended for that restricted parameter space.

For each one of the six loss functions, we can find a corresponding Bayesian estimator, which minimizes the corresponding posterior expected loss. Among the six Bayesian estimators, there exist three strings of inequalities summarized in Theorem 1 (see also Theorem 1 in [36]). However, a string of inequalities among the six smallest PELs does not exist.

We summarize three hierarchical models where the unknown parameter of interest is θ ∈ (0, ∞), that is, the hierarchical normal and inverse gamma model, the hierarchical Poisson and gamma model, and the hierarchical normal and normal-inverse-gamma model, and two hierarchical models where the unknown parameter of interest is θ ∈ (0, 1), that is, the beta-binomial model and the beta-negative binomial model.

Now we give some suggestions on the selection of the hyperparameters. One way to select the hyperparameters is through the empirical Bayesian analysis, which relies on a conjugate prior modeling, where the hyperparameters are estimated from the observations and the "estimated prior" is then used as a regular prior in the later inference. The marginal distribution can then be used to recover the prior distribution from the observations. For empirical Bayesian analysis, two common methods are used to obtain the estimators of the hyperparameters, that is, the moment method and the MLE method. Numerical simulations show that the MLEs are better than the moment estimators when estimating the hyperparameters, in terms of the goodness of fit of the model to the simulated data. However, the MLEs are very sensitive to their initial values, and the moment estimators usually prove to be good initial values.
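As an illustration of the moment method (a self-contained sketch with simulated data; the model and numbers are assumptions, not the chapter's computations): for the hierarchical Poisson and gamma model, the marginal mean and variance of X are α/β and α/β + α/β², which can be solved for the hyperparameter estimates:

```python
import math
import random

random.seed(5)

# Simulate from a hypothetical Poisson-gamma model: lambda_i ~ Gamma(alpha, beta)
# with rate parameterization, X_i | lambda_i ~ Poisson(lambda_i).
alpha_true, beta_true = 4.0, 2.0

def poisson_draw(lam):
    """Knuth's Poisson sampler; adequate for the small rates in this demo."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

n = 100000
xs = [poisson_draw(random.gammavariate(alpha_true, 1.0 / beta_true))
      for _ in range(n)]

m1 = sum(xs) / n                               # sample mean
s2 = sum((x - m1) ** 2 for x in xs) / (n - 1)  # sample variance

# Marginal moments: E X = alpha/beta and Var X = alpha/beta + alpha/beta**2,
# so the moment estimators solve these two equations.
beta_hat = m1 / (s2 - m1)
alpha_hat = m1 * beta_hat

assert s2 > m1  # overdispersion of the marginal relative to a Poisson
assert abs(alpha_hat - alpha_true) < 0.3
assert abs(beta_hat - beta_true) < 0.2
```

The moment estimates obtained this way can also serve as initial values for an MLE search, which matches the remark above that the moment estimators usually make good starting points.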

## Acknowledgments

The research was supported by the Fundamental Research Funds for the Central Universities (2019CDXYST0016; 2018CDXYST0024), China Scholarship Council (201606055028), National Natural Science Foundation of China (11671060), and MOE project of Humanities and Social Sciences on the west and the border area (14XJC910001).

## Conflict of interest

The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

## References

1. Robert CP. The Bayesian Choice: From Decision-Theoretic Foundations to Computational Implementation. 2nd paperback ed. New York: Springer; 2007
2. James W, Stein C. Estimation with quadratic loss. In: Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability. Vol. 1. 1961. pp. 361-380
3. Brown LD. Inadmissibility of the usual estimators of scale parameters in problems with unknown location and scale parameters. The Annals of Mathematical Statistics. 1968;39:29-48
4. Brown LD. Comment on the paper by Maatta and Casella. Statistical Science. 1990;5:103-106
5. Parsian A, Nematollahi N. Estimation of scale parameter under entropy loss function. Journal of Statistical Planning and Inference. 1996;52:77-91
6. Petropoulos C, Kourouklis S. Estimation of a scale parameter in mixture models with unknown location. Journal of Statistical Planning and Inference. 2005;128:191-218
7. Oono Y, Shinozaki N. On a class of improved estimators of variance and estimation under order restriction. Journal of Statistical Planning and Inference. 2006;136:2584-2605
8. Ye RD, Wang SG. Improved estimation of the covariance matrix under Stein's loss. Statistics & Probability Letters. 2009;79:715-721
9. Bobotas P, Kourouklis S. On the estimation of a normal precision and a normal variance ratio. Statistical Methodology. 2010;7:445-463
10. Zhang YY. The Bayes rule of the variance parameter of the hierarchical normal and inverse gamma model under Stein's loss. Communications in Statistics - Theory and Methods. 2017;46:7125-7133
11. Zhang YY. The Bayes rule of the positive restricted parameter under the power-power loss with an application. Communications in Statistics - Theory and Methods. 2019; under review
12. Zhang YY, Zhou MQ, Xie YH, Song WH. The Bayes rule of the parameter in (0, 1) under the power-log loss function with an application to the beta-binomial model. Journal of Statistical Computation and Simulation. 2017;87:2724-2737
13. Zhang YY, Xie YH, Song WH, Zhou MQ. The Bayes rule of the parameter in (0, 1) under Zhang's loss function with an application to the beta-binomial model. Communications in Statistics - Theory and Methods. 2019. DOI: 10.1080/03610926.2019.1565840
14. Stein C. Inadmissibility of the usual estimator for the variance of a normal distribution with unknown mean. Annals of the Institute of Statistical Mathematics. 1964;16:155-160
15. Maatta JM, Casella G. Developments in decision-theoretic variance estimation. Statistical Science. 1990;5:90-120
16. Casella G, Berger RL. Statistical Inference. 2nd ed. USA: Duxbury; 2002
17. Lehmann EL, Casella G. Theory of Point Estimation. 2nd ed. New York: Springer; 1998
18. Raiffa H, Schlaifer R. Applied Statistical Decision Theory. Cambridge: Harvard University Press; 1961
19. Deely JJ, Lindley DV. Bayes empirical Bayes. Journal of the American Statistical Association. 1981;76:833-841
20. Zhang YY, Wang ZY, Duan ZM, Mi W. The empirical Bayes estimators of the parameter of the Poisson distribution with a conjugate gamma prior under Stein's loss function. Journal of Statistical Computation and Simulation. 2019. DOI: 10.1080/00949655.2019.1652606
21. Mao SS, Tang YC. Bayesian Statistics. 2nd ed. Beijing: China Statistics Press; 2012
22. Chen MH. Bayesian statistics lecture. Statistics Graduate Summer School. Changchun: School of Mathematics and Statistics, Northeast Normal University; 2014
23. Xie YH, Song WH, Zhou MQ, Zhang YY. The Bayes posterior estimator of the variance parameter of the normal distribution with a normal-inverse-gamma prior under Stein's loss. Chinese Journal of Applied Probability and Statistics. 2018;34:551-564
24. Dey D, Srinivasan C. Estimation of a covariance matrix under Stein's loss. The Annals of Statistics. 1985;13:1581-1591
25. Sheena Y, Takemura A. Inadmissibility of non-order-preserving orthogonally invariant estimators of the covariance matrix in the case of Stein's loss. Journal of Multivariate Analysis. 1992;41:117-131
26. Konno Y. Estimation of a normal covariance matrix with incomplete data under Stein's loss. Journal of Multivariate Analysis. 1995;52:308-324
27. Konno Y. Estimation of normal covariance matrices parametrized by irreducible symmetric cones under Stein's loss. Journal of Multivariate Analysis. 2007;98:295-316
28. Sun XQ, Sun DC, He ZQ. Bayesian inference on multivariate normal covariance and precision matrices in a star-shaped model with missing data. Communications in Statistics - Theory and Methods. 2010;39:642-666
29. Ma TF, Jia LJ, Su YS. A new estimator of covariance matrix. Journal of Statistical Planning and Inference. 2012;142:529-536
30. Xu K, He DJ. Further results on estimation of covariance matrix. Statistics & Probability Letters. 2015;101:11-20
31. Tsukuma H. Estimation of a high-dimensional covariance matrix with the Stein loss. Journal of Multivariate Analysis. 2016;148:1-17
32. Singh SK, Singh U, Sharma VK. Expected total test time and Bayesian estimation for generalized Lindley distribution under progressively Type-II censored sample where removals follow the beta-binomial probability law. Applied Mathematics and Computation. 2013;222:402-419
33. Luo R, Paul S. Estimation for zero-inflated beta-binomial regression model with missing response data. Statistics in Medicine. 2018;37:3789-3813
34. Zhou MQ, Zhang YY, Sun Y, Sun J. The empirical Bayes estimators of the probability parameter of the beta-negative binomial model under Zhang's loss function. Computational Statistics and Data Analysis. 2019; under review
35. Danaher PJ. A Markov mixture model for magazine exposure. Journal of the American Statistical Association. 1989;84:922-926
36. Zhang YY, Xie YH, Song WH, Zhou MQ. Three strings of inequalities among six Bayes estimators. Communications in Statistics - Theory and Methods. 2018;47:1953-1961