
## Abstract

In this chapter, we investigate six loss functions. In particular, the squared error loss function and the weighted squared error loss function, which penalize overestimation and underestimation equally, are recommended for the unrestricted parameter space (−∞, ∞); Stein's loss function and the power-power loss function, which penalize gross overestimation and gross underestimation equally, are recommended for the positive restricted parameter space (0, ∞); and the power-log loss function and Zhang's loss function, which penalize gross overestimation and gross underestimation equally, are recommended for (0, 1). Among the six Bayesian estimators that minimize the corresponding posterior expected losses (PELs), there exist three strings of inequalities. However, a string of inequalities among the six smallest PELs does not exist. Moreover, we summarize three hierarchical models where the unknown parameter of interest belongs to (0, ∞), that is, the hierarchical normal and inverse gamma model, the hierarchical Poisson and gamma model, and the hierarchical normal and normal-inverse-gamma model. In addition, we summarize two hierarchical models where the unknown parameter of interest belongs to (0, 1), that is, the beta-binomial model and the beta-negative binomial model. For empirical Bayesian analysis of the unknown parameter of interest of the hierarchical models, we use two common methods to obtain the estimators of the hyperparameters, that is, the moment method and the maximum likelihood estimator (MLE) method.

### Keywords

- Bayesian estimators
- power-log loss function
- power-power loss function
- restricted parameter spaces
- Stein’s loss function
- Zhang’s loss function

## 1. Introduction

In Bayesian analysis, there are four basic elements: the data, the model, the prior, and the loss function. A Bayesian estimator minimizes some posterior expected loss (PEL) function. We confine our interest to six loss functions in this chapter: the squared error loss function (well known), the weighted squared error loss function ([1], p. 78), Stein's loss function [2, 3, 4, 5, 6, 7, 8, 9, 10], the power-power loss function [11], the power-log loss function [12], and Zhang's loss function [13]. It is worth noting that, among the six loss functions, the first and second are defined on (−∞, ∞), the third and fourth on (0, ∞), and the fifth and sixth on (0, 1).

The squared error loss function and the weighted squared error loss function have been used by many authors for the problem of estimating the variance of a normal distribution (see, e.g., [14, 15]).

For the positive restricted parameter space (0, ∞), Stein's loss function and the power-power loss function, which penalize gross overestimation and gross underestimation equally, are recommended.

Analogously, for the restricted parameter space (0, 1), the power-log loss function and Zhang's loss function, which penalize gross overestimation and gross underestimation equally, are recommended.

The rest of the chapter is organized as follows. In Section 2, we obtain two Bayesian estimators for θ ∈ (−∞, ∞), under the squared error and weighted squared error loss functions. In Section 3, we obtain the Bayesian estimators for θ ∈ (0, ∞) under Stein's loss function and the power-power loss function, and we summarize three hierarchical models where the parameter of interest belongs to (0, ∞). In Section 4, we obtain the Bayesian estimators for θ ∈ (0, 1) under the power-log loss function and Zhang's loss function, and we summarize two hierarchical models where the parameter of interest belongs to (0, 1). In Section 5, we present inequalities among the six Bayesian posterior estimators. Section 6 concludes with some discussions.

## 2. Bayesian estimation for θ ∈ (−∞, ∞)

There are two loss functions which are defined on (−∞, ∞): the squared error loss function and the weighted squared error loss function.

### 2.1 Squared error loss function

The Bayesian estimator under the squared error loss function (well known) is

$$\delta_{SEL}^{\pi}(x) = E\left(\theta \mid x\right),$$

where

$$L_{SEL}(\theta, a) = (\theta - a)^2$$

is the squared error loss function, and the posterior expected squared error loss (PESEL) is

$$\mathrm{PESEL}(a, x) = E\left[(\theta - a)^2 \mid x\right].$$

It is found in [16] that the PESEL is minimized at $\delta_{SEL}^{\pi}(x)$, by taking the partial derivative of the PESEL with respect to $a$, setting it to zero, and solving for $a$.
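As a quick numerical sketch (not part of the chapter; the normal posterior and all numbers are assumptions for illustration), the code below checks by Monte Carlo that the posterior mean attains the smallest PESEL on a grid of candidate actions:

```python
import random

random.seed(0)

# Hypothetical posterior: draws of theta from N(2, 1).
draws = [random.gauss(2.0, 1.0) for _ in range(20000)]

def pesel(a, draws):
    """Posterior expected squared error loss E[(theta - a)^2 | x], Monte Carlo."""
    return sum((t - a) ** 2 for t in draws) / len(draws)

post_mean = sum(draws) / len(draws)  # Bayesian estimator under SEL

# The posterior mean minimizes the PESEL on a grid of candidate actions.
grid = [post_mean + 0.1 * k for k in range(-10, 11)]
best = min(grid, key=lambda a: pesel(a, draws))
assert abs(best - post_mean) < 0.11
```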

### 2.2 Weighted squared error loss function

The Bayesian estimator under the weighted squared error loss function is

$$\delta_{WSEL}^{\pi}(x) = \frac{E\left[w(\theta)\,\theta \mid x\right]}{E\left[w(\theta) \mid x\right]},$$

where

$$L_{WSEL}(\theta, a) = w(\theta)(\theta - a)^2$$

is the weighted squared error loss function with weight function $w(\theta) > 0$, and the posterior expected weighted squared error loss (PEWSEL) is

$$\mathrm{PEWSEL}(a, x) = E\left[w(\theta)(\theta - a)^2 \mid x\right].$$

It is found in [1] that the PEWSEL is minimized at $\delta_{WSEL}^{\pi}(x)$, by taking the partial derivative of the PEWSEL with respect to $a$, setting it to zero, and solving for $a$.
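The weighted case can be checked the same way; in this sketch the weight function w(θ) = θ² and the posterior draws are arbitrary assumptions:

```python
import random

random.seed(1)

# Hypothetical positive posterior draws; the weight w(theta) = theta**2 is an
# arbitrary choice for this sketch.
draws = [abs(random.gauss(2.0, 0.5)) for _ in range(20000)]
w = lambda t: t ** 2

def pewsel(a):
    """Posterior expected weighted squared error loss, Monte Carlo."""
    return sum(w(t) * (t - a) ** 2 for t in draws) / len(draws)

# Bayesian estimator under WSEL: E[w(theta) * theta | x] / E[w(theta) | x].
num = sum(w(t) * t for t in draws) / len(draws)
den = sum(w(t) for t in draws) / len(draws)
delta_wsel = num / den

# The estimator is the exact minimizer of the (empirical) PEWSEL.
grid = [delta_wsel + 0.05 * k for k in range(-20, 21)]
best = min(grid, key=pewsel)
assert abs(best - delta_wsel) < 1e-9
```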

## 3. Bayesian estimation for θ ∈ (0, ∞)

There are many hierarchical models where the parameter of interest is θ ∈ (0, ∞). We summarize three of them: the hierarchical normal and inverse gamma model, the hierarchical Poisson and gamma model, and the hierarchical normal and normal-inverse-gamma model.

**Model (a) (hierarchical normal and inverse gamma model).** This hierarchical model has been investigated by [10, 16, 17]. Suppose that we observe $X_1, X_2, \ldots, X_n$ independently from the normal distribution $N(\mu, \theta)$, where the mean $\mu$ is known and the variance $\theta \in (0, \infty)$ is the parameter of interest, and that $\theta$ has a conjugate inverse gamma prior $\mathrm{IG}(\alpha, \beta)$, where $\alpha > 0$ and $\beta > 0$ are known hyperparameters.

**Model (b) (hierarchical Poisson and gamma model).** This hierarchical model has been investigated by [1, 16, 19, 20]. Suppose that $X_1, X_2, \ldots, X_n$ are independent observations from the Poisson distribution with mean $\lambda \in (0, \infty)$, and that $\lambda$ has a conjugate gamma prior $\mathrm{Gamma}(\alpha, \beta)$, where $\alpha > 0$ and $\beta > 0$ are known hyperparameters.

**Model (c) (hierarchical normal and normal-inverse-gamma model).** This hierarchical model has been investigated by [2, 21, 22]. Let the observations $X_1, X_2, \ldots, X_n$ be independently drawn from $N(\mu, \sigma^2)$ with both $\mu$ and $\sigma^2$ unknown, and let $(\mu, \sigma^2)$ have a conjugate normal-inverse-gamma prior, that is, $\sigma^2 \sim \mathrm{IG}(\alpha, \beta)$ and $\mu \mid \sigma^2 \sim N(\mu_0, \sigma^2/\kappa_0)$, where $\mu_0$, $\kappa_0 > 0$, $\alpha > 0$, and $\beta > 0$ are known hyperparameters; the variance $\sigma^2 \in (0, \infty)$ is the parameter of interest.
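For Model (b), the gamma prior updates in closed form by conjugacy. The following sketch (with made-up counts and hyperparameters) computes the posterior and, as a preview of Section 3.1, the Bayes estimators under the squared error and Stein's loss functions, using the standard facts that a Gamma(α, β) posterior (rate parameterization) has mean α/β and harmonic-type mean 1/E[1/λ | x] = (α − 1)/β when α > 1:

```python
# Hypothetical setup: lambda ~ Gamma(alpha, beta) (rate parameterization),
# X_i | lambda ~ Poisson(lambda). All numbers are made up for illustration.
alpha, beta = 2.0, 1.0
data = [3, 5, 4, 6, 2]

# Conjugate update: the posterior is Gamma(alpha + sum(x), beta + n).
alpha_post = alpha + sum(data)
beta_post = beta + len(data)

post_mean = alpha_post / beta_post        # Bayes estimator under SEL
# Bayes estimator under Stein's loss: 1/E[1/lambda | x] = (alpha_post - 1)/beta_post
# for a gamma posterior with alpha_post > 1.
delta_sl = (alpha_post - 1) / beta_post
assert delta_sl < post_mean  # Stein estimator sits below the posterior mean
```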

### 3.1 Stein’s loss function

#### 3.1.1 One-dimensional case

The Bayesian estimator under Stein's loss function is

$$\delta_{SL}^{\pi}(x) = \frac{1}{E\left[\theta^{-1} \mid x\right]},$$

where

$$L_{SL}(\theta, a) = \frac{a}{\theta} - \log\frac{a}{\theta} - 1$$

is Stein's loss function, and the posterior expected Stein's loss (PESL) is

$$\mathrm{PESL}(a, x) = a\,E\left[\theta^{-1} \mid x\right] - \log a + E\left[\log\theta \mid x\right] - 1.$$

It is found in [10] that the PESL is minimized at $\delta_{SL}^{\pi}(x)$, by taking the partial derivative of the PESL with respect to $a$, setting it to zero, and solving for $a$.
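A Monte Carlo sketch (with an arbitrary positive posterior, not from the chapter) of the standard fact that 1/E[θ⁻¹ | x] minimizes the posterior expected Stein's loss:

```python
import math
import random

random.seed(2)

# Hypothetical positive posterior draws (lognormal), since Stein's loss
# lives on (0, inf).
draws = [math.exp(random.gauss(0.0, 0.7)) for _ in range(20000)]

def stein_loss(theta, a):
    return a / theta - math.log(a / theta) - 1.0

def pesl(a):
    """Posterior expected Stein's loss, Monte Carlo."""
    return sum(stein_loss(t, a) for t in draws) / len(draws)

# Bayesian estimator under Stein's loss: 1 / E[1/theta | x].
delta_sl = 1.0 / (sum(1.0 / t for t in draws) / len(draws))

# It never exceeds the posterior mean (harmonic mean <= arithmetic mean),
post_mean = sum(draws) / len(draws)
assert delta_sl <= post_mean
# and it beats nearby candidate actions in PESL.
assert pesl(delta_sl) <= min(pesl(delta_sl * 1.1), pesl(delta_sl * 0.9))
```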

For the variance parameter θ of the hierarchical normal and inverse gamma model (Model (a)), [10] calculates the Bayesian estimator under Stein's loss function and the PESL in closed form; the PESL depends on the digamma function ψ(·).

For the hierarchical Poisson and gamma model (Model (b)), [20] first calculates the posterior distribution of the Poisson mean λ, which is again a gamma distribution by conjugacy, and then obtains the empirical Bayesian estimators of λ under Stein's loss function.

For the variance parameter σ² of the hierarchical normal and normal-inverse-gamma model (Model (c)), [23] obtains the Bayesian posterior estimator under Stein's loss function.

#### 3.1.2 Multidimensional case

For estimating a covariance matrix, which is assumed to be positive definite, many researchers exploit the multidimensional Stein's loss function (e.g., see [2, 8, 24, 25, 26, 27, 28, 29, 30, 31]). The multidimensional Stein's loss function (see [2]) is originally defined for estimating the $p \times p$ covariance matrix $\Sigma$ of a multivariate normal distribution by an estimator $\hat{\Sigma}$:

$$L\left(\Sigma, \hat{\Sigma}\right) = \operatorname{tr}\left(\hat{\Sigma}\Sigma^{-1}\right) - \log\det\left(\hat{\Sigma}\Sigma^{-1}\right) - p.$$

When $p = 1$, writing $\Sigma = \theta$ and $\hat{\Sigma} = a$, the loss reduces to

$$L(\theta, a) = \frac{a}{\theta} - \log\frac{a}{\theta} - 1,$$

which is the one-dimensional Stein's loss function.
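The multidimensional loss can be evaluated directly; the sketch below hand-codes the 2×2 case (to stay dependency-free) and checks both the defining property L(Σ, Σ) = 0 and the reduction to the one-dimensional Stein's loss when p = 1. The matrices are made up for illustration:

```python
import math

def stein_loss_matrix(sigma, sigma_hat):
    """tr(S_hat S^{-1}) - log det(S_hat S^{-1}) - p, for 1x1 or 2x2 matrices."""
    p = len(sigma)
    if p == 1:
        r = sigma_hat[0][0] / sigma[0][0]
        return r - math.log(r) - 1.0
    # 2x2 case: invert sigma by the adjugate formula.
    (a, b), (c, d) = sigma
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    m = [[sum(sigma_hat[i][k] * inv[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    trace = m[0][0] + m[1][1]
    det_m = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return trace - math.log(det_m) - 2.0

# The loss is zero iff the estimator equals the true covariance matrix.
sigma = [[2.0, 0.5], [0.5, 1.0]]
assert abs(stein_loss_matrix(sigma, sigma)) < 1e-12
# The 1x1 case agrees with the one-dimensional loss a/t - log(a/t) - 1.
assert abs(stein_loss_matrix([[2.0]], [[3.0]])
           - (1.5 - math.log(1.5) - 1.0)) < 1e-12
```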

### 3.2 Power-power loss function

The Bayesian estimator under the power-power loss function minimizes the posterior expected power-power loss (PEPL). It is found in [11] that the minimizer is obtained by taking the partial derivative of the PEPL with respect to the action $a$, setting it to zero, and solving for $a$.

The power-power loss function is proposed in [11], and it has all seven properties proposed in that paper. More specifically, it penalizes gross overestimation and gross underestimation equally, is convex in its argument, and has balanced convergence rates, or penalties, for arguments that are too large or too small. Therefore, it is recommended for the positive restricted parameter space (0, ∞).

## 4. Bayesian estimation for θ ∈ (0, 1)

There are some hierarchical models where the unknown parameter of interest is θ ∈ (0, 1). We summarize two of them: the beta-binomial model and the beta-negative binomial model.

**Model (d) (beta-binomial model).** This hierarchical model has been investigated by [1, 12, 13, 16, 32, 33]. Suppose that $X \mid \theta \sim \mathrm{Binomial}(n, \theta)$ and that the probability parameter $\theta \in (0, 1)$ has a conjugate beta prior $\mathrm{Beta}(\alpha, \beta)$, where $\alpha > 0$ and $\beta > 0$ are known hyperparameters. Moreover, [33] develops an estimation procedure for the parameters of a zero-inflated overdispersed binomial model in the presence of missing responses.
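For Model (d), the beta prior updates in closed form by conjugacy; a minimal sketch with made-up values of α, β, n, and x:

```python
# Hypothetical beta-binomial setup: theta ~ Beta(alpha, beta),
# X | theta ~ Binomial(n, theta); posterior is Beta(alpha + x, beta + n - x).
alpha, beta, n, x = 2.0, 3.0, 20, 7  # made-up values

alpha_post = alpha + x
beta_post = beta + n - x

# Bayes estimator under SEL: the posterior mean, which stays in (0, 1).
delta_sel = alpha_post / (alpha_post + beta_post)
assert 0.0 < delta_sel < 1.0

# It is a weighted average of the prior mean and the sample proportion.
prior_mean, sample_prop = alpha / (alpha + beta), x / n
w = n / (alpha + beta + n)
assert abs(delta_sel - (w * sample_prop + (1 - w) * prior_mean)) < 1e-12
```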

**Model (e) (beta-negative binomial model).** This hierarchical model has been investigated by [1, 34]. Suppose that $X \mid \theta$ follows the negative binomial distribution with size parameter $m$ and probability parameter $\theta \in (0, 1)$, and that $\theta$ has a conjugate beta prior $\mathrm{Beta}(\alpha, \beta)$, where $m$, $\alpha > 0$, and $\beta > 0$ are hyperparameters.

### 4.1 Power-log loss function

A good loss function for θ ∈ (0, 1) should satisfy the six properties, (a)–(f), listed in Table 1 of [12]; property (e), for instance, requires the loss function to be convex in its argument. Property (a) means that any action $a$ is taken in the same restricted space (0, 1) as the parameter θ.

The power-log loss function is constructed in [12] to satisfy these properties, and it is easy to check (see the supplement of [12]) that it does satisfy all six of them.

We remark that the power-log loss function on (0, 1) penalizes gross overestimation and gross underestimation equally.

The Bayesian estimator under the power-log loss function minimizes the posterior expected power-log loss (PEPLL). It is found in [12] that the minimizer is obtained by taking the partial derivative of the PEPLL with respect to the action $a$, setting it to zero, and solving for $a$.

Finally, numerical simulations and a real data example of some monthly magazine exposure data (see [35]) exemplify the theoretical studies in [12] of two size relationships, one among the Bayesian estimators and one among the PEPLLs.

### 4.2 Zhang’s loss function

Zhang et al. [12] proposed six properties for a good loss function on (0, 1) (see Table 1 of [12]). Zhang's loss function is constructed in [13] to satisfy the same properties, and it is easy to check (see the supplement of [13]) that it satisfies all six of them.

The Bayesian estimator under Zhang's loss function minimizes the posterior expected Zhang's loss (PEZL). It is found in [13] that the minimizer is obtained by taking the partial derivative of the PEZL with respect to the action $a$, setting it to zero, and solving for $a$.

Zhang et al. [13] consider an example of some magazine exposure data for the monthly magazine *Signature* (see [12, 35]) and compare the numerical results with those of [12].

For the probability parameter θ of the beta-negative binomial model, [34] obtains the estimators of the hyperparameters when $m$ is known or unknown by the moment method (Theorem 1 in [34]) and the MLE method (Theorem 2 in [34]). Finally, the empirical Bayesian estimator of θ under Zhang's loss function is obtained by substituting the estimated hyperparameters into the Bayesian posterior estimator.

In the numerical simulations of [34], they illustrate three things: the two inequalities among the Bayesian posterior estimators and among the PEZLs; the moment estimators and the MLEs, which are consistent estimators of the hyperparameters; and the goodness of fit of the beta-negative binomial model to the simulated data. The numerical simulations show that the MLEs are better than the moment estimators when estimating the hyperparameters, in terms of the goodness of fit of the model to the simulated data. However, the MLEs are very sensitive to their initial values, and the moment estimators usually prove to be good initial values.

In the real data section of [34], they consider an example of some insurance claim data, which are assumed to follow the beta-negative binomial model (Model (e)). They consider four cases to fit the real data. In the first case, they assume that $m$ is known and fit the model for several candidate values of $m$.

## 5. Inequalities among Bayesian posterior estimators

For the six loss functions, we have the corresponding six Bayesian estimators, each of which minimizes its corresponding posterior expected loss.

In this section, we compare the six Bayesian estimators and the six smallest PELs (see **Table 1** in [36]). The six PELs are the PEWSEL, PEPLL, PESL, PEPL, PESEL, and PEZL. In Table 2, each Bayesian estimator minimizes its corresponding PEL, and the smallest PEL is the PEL evaluated at the corresponding Bayesian estimator.

Domain | Bayesian estimators | PELs | Smallest PELs |
---|---|---|---|
(−∞, ∞) | under the squared error and weighted squared error losses | PESEL, PEWSEL | smallest PESEL, smallest PEWSEL |
(0, ∞) | under Stein's and the power-power losses | PESL, PEPL | smallest PESL, smallest PEPL |
(0, 1) | under the power-log and Zhang's losses | PEPLL, PEZL | smallest PEPLL, smallest PEZL |

**Table 2.** The domains of the six loss functions, with the corresponding Bayesian estimators, PELs, and smallest PELs.

It is easy to see that all six loss functions are well defined on (0, 1), the intersection of the three parameter spaces, so that the six Bayesian estimators can be compared there.

**Theorem 1** (Theorem 1 in [36]). *For θ ∈ (0, 1), there exists a string of inequalities among the six Bayesian estimators; for θ ∈ (0, ∞), there exists a string of inequalities among the four Bayesian estimators whose loss functions are well defined there; and for θ ∈ (−∞, ∞), there exists an inequality between the two Bayesian estimators under the squared error and weighted squared error loss functions.*

The proof of Theorem 1 exploits one key, unified tool, the covariance inequality (see Theorem 4.7.9 (p. 192) in [16]); the details can be found in the supplement of [36].
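One link of such an ordering can be checked directly: by Jensen's inequality (a special case of the covariance-inequality argument), the Bayesian estimator under Stein's loss, a posterior harmonic mean, never exceeds the Bayesian estimator under the squared error loss, the posterior mean. A Monte Carlo sketch with an arbitrary positive posterior:

```python
import math
import random

random.seed(4)

# Arbitrary positive "posterior" draws for the demo.
draws = [math.exp(random.gauss(0.3, 0.8)) for _ in range(10000)]

delta_sel = sum(draws) / len(draws)                          # posterior mean
delta_sl = 1.0 / (sum(1.0 / t for t in draws) / len(draws))  # harmonic mean

# Harmonic mean <= arithmetic mean, hence delta_SL <= delta_SEL.
assert delta_sl <= delta_sel
```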

It is worth noting that the six Bayesian estimators and the six smallest PELs are all functions of the observations and the hyperparameters.

## 6. Conclusions and discussions

In this chapter, we have investigated six loss functions: the squared error loss function, the weighted squared error loss function, Stein's loss function, the power-power loss function, the power-log loss function, and Zhang's loss function. Now we give some suggestions on the conditions for using each of them. It is worth noting that the first two loss functions are defined on (−∞, ∞) and are recommended for the unrestricted parameter space; Stein's loss function and the power-power loss function are defined on (0, ∞) and are recommended for the positive restricted parameter space; and the power-log loss function and Zhang's loss function are defined on (0, 1) and are recommended for that restricted parameter space.

For each one of the six loss functions, we can find a corresponding Bayesian estimator, which minimizes the corresponding posterior expected loss. Among the six Bayesian estimators, there exist three strings of inequalities summarized in Theorem 1 (see also Theorem 1 in [36]). However, a string of inequalities among the six smallest PELs does not exist.

We summarize three hierarchical models where the unknown parameter of interest is θ ∈ (0, ∞), that is, the hierarchical normal and inverse gamma model, the hierarchical Poisson and gamma model, and the hierarchical normal and normal-inverse-gamma model, and two hierarchical models where the unknown parameter of interest is θ ∈ (0, 1), that is, the beta-binomial model and the beta-negative binomial model.

Now we give some suggestions on the selection of the hyperparameters. One way to select the hyperparameters is through the empirical Bayesian analysis, which relies on a conjugate prior modeling, where the hyperparameters are estimated from the observations and the "estimated prior" is then used as a regular prior in the later inference. The marginal distribution can then be used to recover the prior distribution from the observations. For empirical Bayesian analysis, two common methods are used to obtain the estimators of the hyperparameters, that is, the moment method and the MLE method. Numerical simulations show that the MLEs are better than the moment estimators when estimating the hyperparameters, in terms of the goodness of fit of the model to the simulated data. However, the MLEs are very sensitive to their initial values, and the moment estimators usually prove to be good initial values.
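As an illustration of the moment method (a self-contained sketch with simulated data; the model and numbers are assumptions, not the chapter's computations): for the hierarchical Poisson and gamma model, the marginal mean and variance of X are α/β and α/β + α/β², which can be solved for the hyperparameter estimates:

```python
import math
import random

random.seed(5)

# Simulate from a hypothetical Poisson-gamma model: lambda_i ~ Gamma(alpha, beta)
# with rate parameterization, X_i | lambda_i ~ Poisson(lambda_i).
alpha_true, beta_true = 4.0, 2.0

def poisson_draw(lam):
    """Knuth's Poisson sampler; adequate for the small rates in this demo."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

n = 100000
xs = [poisson_draw(random.gammavariate(alpha_true, 1.0 / beta_true))
      for _ in range(n)]

m1 = sum(xs) / n                               # sample mean
s2 = sum((x - m1) ** 2 for x in xs) / (n - 1)  # sample variance

# Marginal moments: E X = alpha/beta and Var X = alpha/beta + alpha/beta**2,
# so the moment estimators solve these two equations.
beta_hat = m1 / (s2 - m1)
alpha_hat = m1 * beta_hat

assert s2 > m1  # overdispersion of the marginal relative to a Poisson
assert abs(alpha_hat - alpha_true) < 0.3
assert abs(beta_hat - beta_true) < 0.2
```

The moment estimates obtained this way can also serve as initial values for an MLE search, which matches the remark above that the moment estimators usually make good starting points.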

## Acknowledgments

The research was supported by the Fundamental Research Funds for the Central Universities (2019CDXYST0016; 2018CDXYST0024), China Scholarship Council (201606055028), National Natural Science Foundation of China (11671060), and MOE project of Humanities and Social Sciences on the west and the border area (14XJC910001).

## Conflict of interest

The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

## References

1. Robert CP. The Bayesian Choice: From Decision-Theoretic Foundations to Computational Implementation. 2nd paperback ed. New York: Springer; 2007
2. James W, Stein C. Estimation with quadratic loss. In: Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability. Vol. 1. 1961. pp. 361-380
3. Brown LD. Inadmissibility of the usual estimators of scale parameters in problems with unknown location and scale parameters. The Annals of Mathematical Statistics. 1968;39:29-48
4. Brown LD. Comment on the paper by Maatta and Casella. Statistical Science. 1990;5:103-106
5. Parsian A, Nematollahi N. Estimation of scale parameter under entropy loss function. Journal of Statistical Planning and Inference. 1996;52:77-91
6. Petropoulos C, Kourouklis S. Estimation of a scale parameter in mixture models with unknown location. Journal of Statistical Planning and Inference. 2005;128:191-218
7. Oono Y, Shinozaki N. On a class of improved estimators of variance and estimation under order restriction. Journal of Statistical Planning and Inference. 2006;136:2584-2605
8. Ye RD, Wang SG. Improved estimation of the covariance matrix under Stein's loss. Statistics & Probability Letters. 2009;79:715-721
9. Bobotas P, Kourouklis S. On the estimation of a normal precision and a normal variance ratio. Statistical Methodology. 2010;7:445-463
10. Zhang YY. The Bayes rule of the variance parameter of the hierarchical normal and inverse gamma model under Stein's loss. Communications in Statistics - Theory and Methods. 2017;46:7125-7133
11. Zhang YY. The Bayes rule of the positive restricted parameter under the power-power loss with an application. Communications in Statistics - Theory and Methods. 2019; under review
12. Zhang YY, Zhou MQ, Xie YH, Song WH. The Bayes rule of the parameter in (0, 1) under the power-log loss function with an application to the beta-binomial model. Journal of Statistical Computation and Simulation. 2017;87:2724-2737
13. Zhang YY, Xie YH, Song WH, Zhou MQ. The Bayes rule of the parameter in (0, 1) under Zhang's loss function with an application to the beta-binomial model. Communications in Statistics - Theory and Methods. 2019. DOI: 10.1080/03610926.2019.1565840
14. Stein C. Inadmissibility of the usual estimator for the variance of a normal distribution with unknown mean. Annals of the Institute of Statistical Mathematics. 1964;16:155-160
15. Maatta JM, Casella G. Developments in decision-theoretic variance estimation. Statistical Science. 1990;5:90-120
16. Casella G, Berger RL. Statistical Inference. 2nd ed. USA: Duxbury; 2002
17. Lehmann EL, Casella G. Theory of Point Estimation. 2nd ed. New York: Springer; 1998
18. Raiffa H, Schlaifer R. Applied Statistical Decision Theory. Cambridge: Harvard University Press; 1961
19. Deely JJ, Lindley DV. Bayes empirical Bayes. Journal of the American Statistical Association. 1981;76:833-841
20. Zhang YY, Wang ZY, Duan ZM, Mi W. The empirical Bayes estimators of the parameter of the Poisson distribution with a conjugate gamma prior under Stein's loss function. Journal of Statistical Computation and Simulation. 2019. DOI: 10.1080/00949655.2019.1652606
21. Mao SS, Tang YC. Bayesian Statistics. 2nd ed. Beijing: China Statistics Press; 2012
22. Chen MH. Bayesian statistics lecture. Statistics Graduate Summer School. Changchun: School of Mathematics and Statistics, Northeast Normal University; 2014
23. Xie YH, Song WH, Zhou MQ, Zhang YY. The Bayes posterior estimator of the variance parameter of the normal distribution with a normal-inverse-gamma prior under Stein's loss. Chinese Journal of Applied Probability and Statistics. 2018;34:551-564
24. Dey D, Srinivasan C. Estimation of a covariance matrix under Stein's loss. The Annals of Statistics. 1985;13:1581-1591
25. Sheena Y, Takemura A. Inadmissibility of non-order-preserving orthogonally invariant estimators of the covariance matrix in the case of Stein's loss. Journal of Multivariate Analysis. 1992;41:117-131
26. Konno Y. Estimation of a normal covariance matrix with incomplete data under Stein's loss. Journal of Multivariate Analysis. 1995;52:308-324
27. Konno Y. Estimation of normal covariance matrices parametrized by irreducible symmetric cones under Stein's loss. Journal of Multivariate Analysis. 2007;98:295-316
28. Sun XQ, Sun DC, He ZQ. Bayesian inference on multivariate normal covariance and precision matrices in a star-shaped model with missing data. Communications in Statistics - Theory and Methods. 2010;39:642-666
29. Ma TF, Jia LJ, Su YS. A new estimator of covariance matrix. Journal of Statistical Planning and Inference. 2012;142:529-536
30. Xu K, He DJ. Further results on estimation of covariance matrix. Statistics & Probability Letters. 2015;101:11-20
31. Tsukuma H. Estimation of a high-dimensional covariance matrix with the Stein loss. Journal of Multivariate Analysis. 2016;148:1-17
32. Singh SK, Singh U, Sharma VK. Expected total test time and Bayesian estimation for generalized Lindley distribution under progressively Type-II censored sample where removals follow the beta-binomial probability law. Applied Mathematics and Computation. 2013;222:402-419
33. Luo R, Paul S. Estimation for zero-inflated beta-binomial regression model with missing response data. Statistics in Medicine. 2018;37:3789-3813
34. Zhou MQ, Zhang YY, Sun Y, Sun J. The empirical Bayes estimators of the probability parameter of the beta-negative binomial model under Zhang's loss function. Computational Statistics and Data Analysis. 2019; under review
35. Danaher PJ. A Markov mixture model for magazine exposure. Journal of the American Statistical Association. 1989;84:922-926
36. Zhang YY, Xie YH, Song WH, Zhou MQ. Three strings of inequalities among six Bayes estimators. Communications in Statistics - Theory and Methods. 2018;47:1953-1961