Open access peer-reviewed chapter

A Study on the Comparison of the Effectiveness of the Jackknife Method in the Biased Estimators

By Nilgün Yıldız

Submitted: September 10th 2018; Reviewed: November 2nd 2018; Published: December 18th 2018

DOI: 10.5772/intechopen.82366



In this study, we propose an alternative biased estimator. In the linear regression model, multicollinearity can lead to ill-conditioned design matrices and thus make the ordinary least squares (OLS) estimator inadequate. Alternative estimation techniques have been developed to remove the resulting instability in the estimates, including several biased estimators such as the Stein estimator, the ordinary ridge regression (ORR) estimator, and the principal components regression (PCR) estimator. Liu developed the Liu estimator (LE) by combining the Stein estimator with the ORR estimator. Since both ORR and LE depend on the OLS estimator, multicollinearity affects them both, and they may give misleading information in its presence. To overcome this problem, Liu introduced a new estimator based on two biasing parameters, k and d. Building on it, we sought an estimator that retains the valuable characteristics of the Liu-type estimator (LTE) but has a smaller bias. We propose a modified jackknifed Liu-type estimator (MJLTE), created by combining the ideas underlying the LTE and the jackknifed Liu-type estimator (JLTE). Under the mean square error matrix criterion, the MJLTE is superior to both the LTE and the JLTE. Finally, a real data example and a Monte Carlo simulation are given to illustrate the theoretical results.


  • jackknifed estimators
  • jackknifed Liu-type estimator
  • multicollinearity
  • MSE
  • Liu-type estimator

1. Introduction

Regression analysis seeks answers to questions such as the following. Is there a relationship between the dependent and independent variables? If there is, how strong is it? What form does the relationship take? Is it possible to predict future values of the variables, and how should they be estimated? What is the effect of a particular variable or group of variables on the other variables when certain conditions are controlled? Linear regression is a very important and popular method in statistics. According to Web of Science, the number of publications about linear regression between 2014 and 2018 is given in Figure 1.

Figure 1.

Number of publications published between 2014 and 2018.

According to Figure 1, the number of studies conducted in 2014 is 12,381, while the number of studies conducted in 2018 is 13,137.

The number of publications about linear regression by document types is given in Figure 2.

Figure 2.

Number of publications by document types.

The most common type of document about linear regression is the article. This is followed by proceeding paper, review, and editorial material.

The number of publications about linear regression by research area is given in Figure 3.

Figure 3.

Number of publications by research area.

The most widely published area related to linear regression is engineering, followed by mathematics, computer science, environmental sciences, ecology and other scientific fields.

The number of publications about linear regression by countries is given in Figure 4.

Figure 4.

Number of publications by countries.

The countries with the most publications on linear regression are, in order, the USA, China, England, Germany, Canada, and Australia.

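In regression analysis, the most commonly used method for estimating coefficients is ordinary least squares (OLS). We consider the multiple linear regression model given as

$$y = X\beta + \varepsilon \qquad (1)$$

where $y$ is an $n \times 1$ observable random vector, $X$ is an $n \times p$ matrix of non-stochastic (independent) variables of rank $p$, $\beta$ is a $p \times 1$ vector of unknown parameters associated with $X$, and $\varepsilon$ is an $n \times 1$ vector of error terms with

$$E(\varepsilon) = 0, \qquad \operatorname{Cov}(\varepsilon) = \sigma^{2} I_n \qquad (2)$$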


Several methods are available for estimating the unknown parameters. Apart from OLS, which is the most frequently used, there are three general estimation methods: maximum likelihood, generalized least squares, and the best linear unbiased estimator (BLUE) [1].

The use of once very popular estimators such as OLS has become limited because of multicollinearity, which makes the estimates unstable and inflates the variance of the estimated regression coefficients.

Multicollinearity can be defined as a linear (or nearly linear) relationship between the independent variables. In regression analysis, multicollinearity leads to the following problems:

  • Under perfect multicollinearity, the linear regression coefficients are indeterminate and their standard errors are infinite.

  • Multicollinearity inflates the variances and covariances of the OLS estimates of the regression coefficients.

  • The $R^2$ of the model can be high even though none of the independent variables is significant according to the partial t-tests.

  • The signs of the estimated relationships between the affected independent variables and the dependent variable may contradict theoretical and empirical expectations.

  • If the independent variables are interrelated, some of them may need to be removed from the model. But which variables should be removed? Removing the wrong variable from the model results in a specification error, and there are no simple rules for deciding which variables to include in or exclude from the model.
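
The variance inflation mentioned in the list above can be illustrated with a small sketch. It assumes the simplest textbook setting, which the chapter does not spell out: two standardized regressors with sample correlation r, so that X'X = [[1, r], [r, 1]] and each diagonal entry of (X'X)^{-1} equals 1/(1 - r^2). The function name and the correlation levels are hypothetical and chosen only for illustration.

```python
def ols_variance_factor(r):
    """Diagonal entry of (X'X)^{-1} for two standardized regressors
    with sample correlation r, i.e. 1 / (1 - r^2)."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 1.0 / (1.0 - r * r)


# Hypothetical correlation levels, chosen only for illustration.
for r in (0.0, 0.90, 0.95, 0.99):
    factor = ols_variance_factor(r)
    print(f"r = {r:.2f}: Var(beta_hat_j) = {factor:.2f} * sigma^2")
```

As r approaches 1 the factor grows without bound, which is the sense in which the standard errors become infinite under perfect multicollinearity.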

Methods for dealing with multicollinearity include collecting additional data, respecifying the model (for example, replacing two related variables with their sum as a single variable), and using biased estimators. This chapter provides information on biased estimators used as alternatives to OLS. Many researchers have developed biased regression estimators in the literature [2, 3].

One example of such a biased estimator is the ordinary ridge regression (ORR) estimator introduced by Hoerl and Kennard [4],

$$\hat{\beta}_{ORR} = (X'X + kI)^{-1}X'y, \qquad k > 0$$

where $k$ is a biasing parameter. In later years, researchers combined various estimators to obtain better results. For example, Baye and Parker [5] introduced the r-k class estimator, which combines the ORR and principal component regression (PCR) estimators. Baye and Parker also showed that the r-k class estimator is superior to the PCR estimator under the scalar mean square error (SMSE) criterion.
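
How the biasing parameter k acts can be sketched as follows. The sketch relies on a standard fact that is not stated at this point of the chapter: if X'X has eigenvalues λ_i, then in the eigenbasis of X'X the ORR estimator multiplies the i-th OLS coefficient by λ_i/(λ_i + k). The function name and the eigenvalues below are hypothetical.

```python
def ridge_shrinkage_factors(eigenvalues, k):
    """Factors lambda_i / (lambda_i + k) that ORR applies to the OLS
    coefficients in the eigenbasis of X'X."""
    if k <= 0:
        raise ValueError("the biasing parameter k must be positive")
    return [lam / (lam + k) for lam in eigenvalues]


# Hypothetical eigenvalues: one near-collinear direction, two well-determined ones.
eigenvalues = [0.01, 1.0, 100.0]
factors = ridge_shrinkage_factors(eigenvalues, k=0.5)
for lam, factor in zip(eigenvalues, factors):
    print(f"lambda = {lam:7.2f}: shrinkage factor = {factor:.4f}")
```

Directions with small eigenvalues, where the OLS estimate is least stable, are shrunk the most, while well-determined directions are left almost unchanged.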

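The Liu estimator (LE) was developed by Liu [6] by combining the Stein [7] estimator with the ORR estimator:

$$\hat{\beta}_{LE} = (X'X + I)^{-1}(X'y + d\,\hat{\beta}_{OLS}), \qquad 0 < d < 1$$

Since both ORR and LE depend on the OLS estimator, multicollinearity affects both of them. Therefore, the ORR and LE may give misleading information in the presence of multicollinearity.

To overcome this problem, Liu [8] introduced a new estimator, which is based on the biasing parameters $k$ and $d$, as follows

$$\hat{\beta}_{LTE} = (X'X + kI)^{-1}(X'y - d\,\hat{\beta}), \qquad k > 0, \quad -\infty < d < +\infty$$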


Next, the authors worked on developing an estimator that would still have valuable characteristics of the Liu-type estimator (LTE), but have a smaller bias. In 1956, Quenouille [9] suggested that it is possible to reduce bias by applying a jackknife procedure to a biased estimator.

This procedure enables the processing of experimental data to obtain statistical estimators of unknown parameters. A truncated sample is used to calculate a specific function of the estimators. The advantage of the jackknife procedure is that it yields an estimator with a small bias while retaining beneficial large-sample properties. In this article, we apply the jackknife technique to the LTE. Further, we establish the mean squared error superiority of the proposed estimator over both the LTE and the jackknifed Liu-type estimator (JLTE).

The article is organized as follows: The model as well as LTE and the JLTE are described in Section 2. The proposed new estimator is introduced in Section 3. Superiority of the new estimator vis-a-vis the LTE and the JLTE are studied and the performance of the modified Jackknife Liu-type estimator (MJLTE) is compared to that of the JLTE in Section 4. Sections 5 and 6 consider a real data example and a simulation study to justify the superiority of the suggested estimator.


2. The model

We assume that two or more regressors in $X$ are closely linearly related, so that the model suffers from the multicollinearity problem. The symmetric matrix $S = X'X$ has an eigenvalue–eigenvector decomposition of the form $S = T\Lambda T'$, where $T$ is an orthogonal matrix and $\Lambda$ is a (real) diagonal matrix. The diagonal elements of $\Lambda$ are the eigenvalues of $S$ and the column vectors of $T$ are the eigenvectors of $S$. The orthogonal version of the standard multiple linear regression model is

$$y = XTT'\beta + \varepsilon = Z\gamma + \varepsilon \qquad (6)$$

where $Z = XT$, $\gamma = T'\beta$ and $Z'Z = \Lambda$. The ordinary least squares estimator (OLSE) of $\gamma$ is given by

$$\hat{\gamma}_{OLS} = (Z'Z)^{-1}Z'y = \Lambda^{-1}Z'y$$


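Liu [8] proposed a new biased estimator for $\gamma$, called the Liu-type estimator (LTE), defined as

$$\hat{\gamma}_{LTE}(k,d) = (\Lambda + kI)^{-1}(Z'y - d\,\hat{\gamma}_{OLS}) = (\Lambda + kI)^{-1}(\Lambda - dI)\,\hat{\gamma}_{OLS} = F_{kd}\,\hat{\gamma}_{OLS}$$

where $F_{kd} = (\Lambda + kI)^{-1}(\Lambda - dI)$. The estimator $\hat{\gamma}_{LTE}(k,d)$ has bias vector

$$\operatorname{Bias}(\hat{\gamma}_{LTE}(k,d)) = (F_{kd} - I)\gamma = -(k+d)(\Lambda + kI)^{-1}\gamma \qquad (11)$$

and covariance matrix

$$\operatorname{Cov}(\hat{\gamma}_{LTE}(k,d)) = \sigma^{2}F_{kd}\Lambda^{-1}F_{kd}' \qquad (12)$$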


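Following Hinkley [10], Singh et al. [11], Nyquist [12], and Batah et al. [13], we can propose the jackknifed form of $\hat{\gamma}_{LTE}$. Quenouille [9] and Tukey [14] introduced the jackknife technique to reduce bias. Hinkley [10] stated that, with few exceptions, the jackknife had been applied to balanced models. After some algebraic manipulations, the corresponding jackknife estimator obtained by deleting the $i$th observation $(z_i', y_i)$ is

$$\hat{\gamma}_{LTE(-i)}(k,d) = (A - z_i z_i')^{-1}(Z'y - z_i y_i) = \hat{\gamma}_{LTE}(k,d) - \frac{A^{-1}z_i e_i}{1 - w_i}$$

where $z_i'$ is the $i$th row of $Z$, $e_i = y_i - z_i'\hat{\gamma}_{LTE}(k,d)$ is the Liu-type residual, $w_i = z_i'A^{-1}z_i$ is the distance factor and $A^{-1} = (\Lambda + kI)^{-1}(I - d\Lambda^{-1}) = F_{kd}\Lambda^{-1}$. In view of the non-zero values of $w_i$, which reflect the lack of balance in the model, we use the weighted jackknife procedure. Thus, the weighted pseudo-values are defined as

$$Q_i = \hat{\gamma}_{LTE}(k,d) + n(1 - w_i)\left(\hat{\gamma}_{LTE}(k,d) - \hat{\gamma}_{LTE(-i)}(k,d)\right)$$

and the weighted jackknifed estimator of $\gamma$ is obtained as

$$\hat{\gamma}_{JLTE}(k,d) = \frac{1}{n}\sum_{i=1}^{n} Q_i = \hat{\gamma}_{LTE}(k,d) + A^{-1}\sum_{i=1}^{n} z_i e_i = \left[I + (I - A^{-1}\Lambda)\right]\hat{\gamma}_{LTE}(k,d)$$

However, since $I - A^{-1}\Lambda = I - (\Lambda + kI)^{-1}(\Lambda - dI) = I - F_{kd}$, we obtain

$$\hat{\gamma}_{JLTE}(k,d) = (2I - F_{kd})\,\hat{\gamma}_{LTE}(k,d) = (2I - F_{kd})F_{kd}\,\hat{\gamma}_{OLS}$$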


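From (9) we have

$$\operatorname{Bias}(\hat{\gamma}_{JLTE}(k,d)) = -(I - F_{kd})^{2}\gamma$$

The variance of the JLTE is

$$\operatorname{Cov}(\hat{\gamma}_{JLTE}(k,d)) = \sigma^{2}(2I - F_{kd})F_{kd}\Lambda^{-1}F_{kd}'(2I - F_{kd})' \qquad (19)$$

The MSEMs of the JLTE and the LTE are

$$\operatorname{MSEM}(\hat{\gamma}_{JLTE}(k,d)) = \operatorname{Cov}(\hat{\gamma}_{JLTE}(k,d)) + \operatorname{Bias}(\hat{\gamma}_{JLTE}(k,d))\operatorname{Bias}(\hat{\gamma}_{JLTE}(k,d))' = \sigma^{2}(2I - F_{kd})F_{kd}\Lambda^{-1}F_{kd}'(2I - F_{kd})' + (I - F_{kd})^{2}\gamma\gamma'\left[(I - F_{kd})^{2}\right]' \qquad (20)$$

$$\operatorname{MSEM}(\hat{\gamma}_{LTE}(k,d)) = \sigma^{2}F_{kd}\Lambda^{-1}F_{kd}' + (F_{kd} - I)\gamma\gamma'(F_{kd} - I)' \qquad (21)$$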
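
The weighted jackknife construction of this section can be checked numerically on a toy example. The sketch below is not the author's code. It assumes that the delete-one estimate has the rank-one update form γ̂_LTE − A⁻¹ z_i e_i / (1 − w_i), that the weighted pseudo-values are Q_i = γ̂_LTE + n(1 − w_i)(γ̂_LTE − γ̂_LTE(−i)), and that F_kd = (Λ + kI)⁻¹(Λ − dI). The design Z, the response y and the values of k and d are hypothetical, and Z is chosen with orthogonal columns so that Z'Z is diagonal.

```python
def solve_2x2(m, rhs):
    """Solve the 2x2 linear system m x = rhs by Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    x0 = (rhs[0] * m[1][1] - m[0][1] * rhs[1]) / det
    x1 = (m[0][0] * rhs[1] - m[1][0] * rhs[0]) / det
    return [x0, x1]


# Hypothetical toy data: the columns of Z are orthogonal, so Z'Z is diagonal.
Z = [[2.0, 1.0], [2.0, -1.0], [-2.0, 1.0], [-2.0, -1.0]]
y = [3.0, 1.0, -2.5, -4.0]
k, d = 1.0, 0.5
n, p = len(Z), 2

lam = [sum(row[j] ** 2 for row in Z) for j in range(p)]              # diagonal of Lambda
zty = [sum(row[j] * yi for row, yi in zip(Z, y)) for j in range(p)]  # Z'y
g_ols = [zty[j] / lam[j] for j in range(p)]
F = [(lam[j] - d) / (lam[j] + k) for j in range(p)]                  # diagonal of F_kd
a_inv = [F[j] / lam[j] for j in range(p)]                            # diagonal of A^{-1}
g_lte = [F[j] * g_ols[j] for j in range(p)]

pseudo_mean = [0.0] * p
max_gap = 0.0
for z_i, y_i in zip(Z, y):
    e_i = y_i - sum(z_i[j] * g_lte[j] for j in range(p))   # Liu-type residual
    w_i = sum(z_i[j] ** 2 * a_inv[j] for j in range(p))    # distance factor
    # Delete-one estimate from the rank-one update formula.
    g_del = [g_lte[j] - a_inv[j] * z_i[j] * e_i / (1.0 - w_i) for j in range(p)]
    # Same estimate from (A - z_i z_i')^{-1} (Z'y - z_i y_i), solved directly.
    m = [[1.0 / a_inv[0] - z_i[0] ** 2, -z_i[0] * z_i[1]],
         [-z_i[0] * z_i[1], 1.0 / a_inv[1] - z_i[1] ** 2]]
    direct = solve_2x2(m, [zty[j] - z_i[j] * y_i for j in range(p)])
    max_gap = max(max_gap, max(abs(u - v) for u, v in zip(g_del, direct)))
    # Weighted pseudo-value, accumulated into the average over i.
    for j in range(p):
        pseudo_mean[j] += (g_lte[j] + n * (1.0 - w_i) * (g_lte[j] - g_del[j])) / n

# Closed form (2I - F_kd) F_kd applied to the OLS estimate.
g_jlte_closed = [(2.0 - F[j]) * F[j] * g_ols[j] for j in range(p)]
print("jackknife average:", pseudo_mean)
print("closed form      :", g_jlte_closed)
```

Under these assumptions the averaged pseudo-values and the closed form agree up to rounding, and the rank-one update reproduces the directly solved delete-one estimate.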

3. Our novel MJLTE estimator

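In this section, we present the new estimator for $\gamma$ proposed by Yıldız [15]. The proposed estimator is designated the modified jackknifed Liu-type estimator (MJLTE) and is denoted by $\hat{\gamma}_{MJLTE}(k,d)$:

$$\hat{\gamma}_{MJLTE}(k,d) = \left[I - (k+d)^{2}(\Lambda + kI)^{-2}\right]\left[I - (k+d)(\Lambda + kI)^{-1}\right]\hat{\gamma}_{OLS} = \Phi\,\hat{\gamma}_{OLS} \qquad (22)$$

It may be noted that the proposed estimator MJLTE in (22) is obtained as in the case of the JLTE, but by plugging in the LTE instead of the OLSE. The expressions for the bias, covariance and mean squared error matrix (MSEM) of $\hat{\gamma}_{MJLTE}(k,d)$ are obtained as

$$\operatorname{Bias}(\hat{\gamma}_{MJLTE}(k,d)) = -(k+d)(\Lambda + kI)^{-1}W\gamma \qquad (23)$$

$$\operatorname{Cov}(\hat{\gamma}_{MJLTE}(k,d)) = \sigma^{2}\Phi\Lambda^{-1}\Phi' \qquad (24)$$

$$\operatorname{MSEM}(\hat{\gamma}_{MJLTE}(k,d)) = \sigma^{2}\Phi\Lambda^{-1}\Phi' + (k+d)^{2}(\Lambda + kI)^{-1}W\gamma\gamma'W'(\Lambda + kI)^{-1} \qquad (25)$$

where $W = I + (k+d)(\Lambda + kI)^{-1} - (k+d)^{2}(\Lambda + kI)^{-2} = I + F_{kd} - F_{kd}^{2}$ and $\Phi = (2I - F_{kd})F_{kd}^{2}$.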
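
Because every matrix involved is diagonal in the canonical form, the three estimators can be compared component by component. The sketch below assumes F_kd = (Λ + kI)⁻¹(Λ − dI) with diagonal elements f_i, so that the i-th OLS component is multiplied by f_i for the LTE, by (2 − f_i) f_i for the JLTE and by (2 − f_i) f_i² for the MJLTE. The function names, eigenvalues, coefficient vector and parameter values are hypothetical and serve only as an illustration.

```python
def shrinkage_factors(lam, k, d):
    """Scalar factors multiplying the i-th OLS component in canonical form."""
    f = (lam - d) / (lam + k)  # i-th diagonal element of F_kd
    return {"LTE": f, "JLTE": (2.0 - f) * f, "MJLTE": (2.0 - f) * f * f}


def smse(eigenvalues, gamma, sigma2, k, d, name):
    """trace(MSEM): sum of component variances plus squared biases."""
    total = 0.0
    for lam, g in zip(eigenvalues, gamma):
        c = shrinkage_factors(lam, k, d)[name]
        total += sigma2 * c * c / lam + ((c - 1.0) * g) ** 2
    return total


# Hypothetical inputs, for illustration only.
eigenvalues = [0.5, 5.0, 50.0]
gamma = [1.0, -2.0, 0.5]
k, d = 1.0, 0.5

# Check the identity (I - F_kd) W = I - Phi linking the bias (23) to Phi.
for lam in eigenvalues:
    f = (lam - d) / (lam + k)
    w = 1.0 + f - f * f
    phi = (2.0 - f) * f * f
    assert abs((1.0 - f) * w - (1.0 - phi)) < 1e-12

for name in ("LTE", "JLTE", "MJLTE"):
    print(name, smse(eigenvalues, gamma, sigma2=1.0, k=k, d=d, name=name))
```

Working with scalar factors avoids any matrix inversion; for a non-diagonal design one would first rotate to the canonical form Z = XT.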

4. Properties of the MJLTE

One of the most prominent features of our novel MJLTE estimator is that its bias, under some conditions, is less than that of the LTE estimator from which it originates.

Theorem 4.1. Under the model (1) with the assumptions (2), the inequality

$\|\operatorname{Bias}(\hat{\gamma}_{MJLTE}(k,d))\|^{2} < \|\operatorname{Bias}(\hat{\gamma}_{LTE}(k,d))\|^{2}$ holds true for $d > 0$ and $k > d$.

Proof. From (11) and (23), we can obtain that


It is obvious that the difference is greater than 0, because it consists of the product of the squares in the expression above. Thus, the proof is completed.

Corollary 4.1. The absolute value of the bias of the $i$th component of the MJLTE is smaller than that of the LTE, namely $|\operatorname{Bias}(\hat{\gamma}_{MJLTE}(k,d))|_i < |\operatorname{Bias}(\hat{\gamma}_{LTE}(k,d))|_i$.

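Theorem 4.2. The MJLTE has a smaller variance than the LTE.

Proof. From (12) and (24), it can be shown that

$$\operatorname{Cov}(\hat{\gamma}_{LTE}(k,d)) - \operatorname{Cov}(\hat{\gamma}_{MJLTE}(k,d)) = \sigma^{2}\left(F_{kd}\Lambda^{-1}F_{kd}' - \Phi\Lambda^{-1}\Phi'\right) = \sigma^{2}H$$

$H$ is a diagonal matrix and its $i$th element

$$h_i = \frac{f_i^{2}\left[1 - (2 - f_i)^{2}f_i^{2}\right]}{\lambda_i}, \qquad f_i = \frac{\lambda_i - d}{\lambda_i + k}$$

is a positive number whenever $0 < f_i < 1$. Thus we conclude that $H$ is a positive definite matrix. This completes the proof.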

Next, we prove a necessary and sufficient condition for the MJLTE to outperform the LTE under the MSEM criterion. The proof requires the following lemma.

Lemma 4.1. Let $M$ be a positive definite matrix, namely $M > 0$, and let $\alpha$ be some vector. Then

$M - \alpha\alpha' \ge 0$ if and only if $\alpha'M^{-1}\alpha \le 1$.

Proof. See Farebrother [16].
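
The lemma can be illustrated in the simplest non-trivial case. The sketch below assumes a diagonal 2 × 2 matrix M = diag(m1, m2) and evaluates both sides of the equivalence separately. It uses the fact that a symmetric 2 × 2 matrix is nonnegative definite exactly when both diagonal entries and the determinant are nonnegative. The function names and numbers are hypothetical.

```python
def is_nnd_2x2(a, b, c, tol=1e-12):
    """True if the symmetric matrix [[a, b], [b, c]] is nonnegative definite."""
    return a >= -tol and c >= -tol and a * c - b * b >= -tol


def lemma_both_sides(m_diag, alpha):
    """Evaluate both sides of the equivalence for M = diag(m_diag) > 0."""
    (m1, m2), (a1, a2) = m_diag, alpha
    # Left side: is M - alpha alpha' nonnegative definite?
    left = is_nnd_2x2(m1 - a1 * a1, -a1 * a2, m2 - a2 * a2)
    # Right side: is the quadratic form alpha' M^{-1} alpha at most 1?
    right = a1 * a1 / m1 + a2 * a2 / m2 <= 1.0
    return left, right


# Hypothetical examples: one where the condition holds, one where it fails.
print(lemma_both_sides((4.0, 1.0), (1.0, 0.5)))
print(lemma_both_sides((4.0, 1.0), (2.0, 1.0)))
```

In both examples the two sides agree, as the lemma requires; values exactly on the boundary are avoided here because of floating-point rounding.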

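Theorem 4.3. The MJLTE is superior to the LTE in the MSEM sense, namely $\Delta_1 = \operatorname{MSEM}(\hat{\gamma}_{LTE}(k,d)) - \operatorname{MSEM}(\hat{\gamma}_{MJLTE}(k,d))$ is a nonnegative definite matrix, if and only if the inequality

$$\gamma'\left[L^{-1}\left(\sigma^{2}H + \tilde{F}_{kd}\gamma\gamma'\tilde{F}_{kd}'\right)L^{-1}\right]^{-1}\gamma \le 1$$

is satisfied, with $L = \tilde{F}_{kd}W$, $\tilde{F}_{kd} = F_{kd} - I$ and $W = I + F_{kd} - F_{kd}^{2}$.

Proof. From (21) and (25) we have

$$\Delta_1 = \sigma^{2}H + \tilde{F}_{kd}\gamma\gamma'\tilde{F}_{kd}' - L\gamma\gamma'L'$$

$W = I + F_{kd} - F_{kd}^{2}$ is a positive definite matrix, and we have seen from Theorem 4.2 that $H$ is a positive definite matrix. Therefore, the difference $\Delta_1$ is nonnegative definite if and only if $L^{-1}\Delta_1L^{-1}$ is nonnegative definite. The matrix $L^{-1}\Delta_1L^{-1}$ can be written as

$$L^{-1}\Delta_1L^{-1} = L^{-1}\left(\sigma^{2}H + \tilde{F}_{kd}\gamma\gamma'\tilde{F}_{kd}'\right)L^{-1} - \gamma\gamma'$$

Since the matrix $\sigma^{2}H + \tilde{F}_{kd}\gamma\gamma'\tilde{F}_{kd}'$ is symmetric and positive definite, using Lemma 4.1, we may conclude that $L^{-1}\Delta_1L^{-1}$ is nonnegative definite if and only if the inequality

$$\gamma'\left[L^{-1}\left(\sigma^{2}H + \tilde{F}_{kd}\gamma\gamma'\tilde{F}_{kd}'\right)L^{-1}\right]^{-1}\gamma \le 1$$

is satisfied.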
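
The condition of Theorem 4.3 can be evaluated without explicit matrix inversion when Λ is diagonal. The sketch below is an illustration under stated assumptions rather than the author's procedure. It takes f_i = (λ_i − d)/(λ_i + k), the LTE bias component b_i = (f_i − 1)γ_i, the MJLTE bias component a_i = b_i(1 + f_i − f_i²), and h_i = f_i²[1 − (2 − f_i)² f_i²]/λ_i as the i-th element of H. With α = Lγ and D = σ²H, the left-hand side equals α'(D + bb')⁻¹α, which the Sherman–Morrison formula reduces to sums over components. All numerical inputs are hypothetical.

```python
def mjlte_vs_lte_condition(eigenvalues, gamma, sigma2, k, d):
    """Left-hand side of the Theorem 4.3 inequality for diagonal Lambda.

    Computes alpha'(D + b b')^{-1} alpha with D = sigma^2 H through the
    Sherman-Morrison formula; the MJLTE dominates the LTE in the MSEM
    sense when the returned value is at most 1.
    """
    aDa = aDb = bDb = 0.0
    for lam, g in zip(eigenvalues, gamma):
        v = (k + d) / (lam + k)                          # 1 - f_i
        f = 1.0 - v                                      # i-th element of F_kd
        h = f * f * v * v * (1.0 + (2.0 - f) * f) / lam  # i-th element of H
        b = -v * g                                       # LTE bias component
        a = b * (1.0 + f - f * f)                        # MJLTE bias component
        aDa += a * a / (sigma2 * h)
        aDb += a * b / (sigma2 * h)
        bDb += b * b / (sigma2 * h)
    return aDa - aDb * aDb / (1.0 + bDb)


# Hypothetical configuration, evaluated at two noise levels.
eigs, coefs = [2.0, 10.0], [1.0, -0.5]
for sigma2 in (1.0, 10.0):
    value = mjlte_vs_lte_condition(eigs, coefs, sigma2, k=1.0, d=0.5)
    print(f"sigma^2 = {sigma2:5.1f}: left-hand side = {value:.4f}")
```

In this hypothetical configuration the left-hand side exceeds 1 for σ² = 1 and falls below 1 for σ² = 10, which illustrates that the condition depends on the unknown σ² and γ and has to be checked case by case.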

4.1 Comparison between the JLTE and the MJLTE

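Here, we show that the MJLTE outperforms the JLTE in terms of the sampling variance.

Theorem 4.4. The MJLTE has a smaller variance than the JLTE for $d > 0$ and $k > d$.

Proof. From (19) and (24), the difference can be written as

$$\operatorname{Cov}(\hat{\gamma}_{JLTE}(k,d)) - \operatorname{Cov}(\hat{\gamma}_{MJLTE}(k,d)) = \sigma^{2}\left[(I - V^{2})\Lambda^{-1}(I - V^{2}) - (I - V^{2})(I - V)\Lambda^{-1}(I - V)(I - V^{2})\right]$$

where $V = I - F_{kd}$ and $U = I + F_{kd}$, respectively. It can be shown that

$$\operatorname{Cov}(\hat{\gamma}_{JLTE}(k,d)) - \operatorname{Cov}(\hat{\gamma}_{MJLTE}(k,d)) = \sigma^{2}\Sigma$$

where $\Sigma = (I - V^{2})\Lambda^{-1}VU(I - V^{2})$; $\Sigma$ is a diagonal matrix. Then the $i$th diagonal element of $\operatorname{Cov}(\hat{\gamma}_{JLTE}(k,d)) - \operatorname{Cov}(\hat{\gamma}_{MJLTE}(k,d))$ is

$$\sigma^{2}\,\frac{(1 - v_i^{2})^{2}\,v_i u_i}{\lambda_i}, \qquad v_i = \frac{k + d}{\lambda_i + k}, \quad u_i = \frac{2\lambda_i + k - d}{\lambda_i + k}$$

Hence $\operatorname{Cov}(\hat{\gamma}_{JLTE}(k,d)) - \operatorname{Cov}(\hat{\gamma}_{MJLTE}(k,d)) > 0$, which completes the proof.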
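
The variance comparisons of Theorems 4.2 and 4.4 can be checked component by component. The sketch below uses the eigenvalues and the grid of k and d values reported in Section 5, and assumes F_kd = (Λ + kI)⁻¹(Λ − dI), so that the component variances divided by σ² are f_i²/λ_i for the LTE, (1 − v_i²)²/λ_i for the JLTE and (1 − v_i²)² f_i²/λ_i for the MJLTE, with v_i = 1 − f_i. The function and variable names are hypothetical.

```python
# Eigenvalues of X'X and the (k, d) grid as reported in Section 5.
EIGENVALUES = [1.47, 3.77, 4.52, 15.33, 18.57, 20.97, 41.79, 271.15, 239153.68]
K_VALUES = [0.30, 0.50, 0.70, 1.0]
D_VALUES = [0.10, 0.30, 0.70, 1.0]


def variance_factors(lam, k, d):
    """Component variances divided by sigma^2 for (LTE, JLTE, MJLTE)."""
    v = (k + d) / (lam + k)  # i-th diagonal element of V = I - F_kd
    f = 1.0 - v              # i-th diagonal element of F_kd
    lte = f * f / lam
    jlte = (1.0 - v * v) ** 2 / lam
    mjlte = jlte * f * f
    return lte, jlte, mjlte


# Is the MJLTE component variance the smallest for every eigenvalue and (k, d)?
smallest_everywhere = all(
    variance_factors(lam, k, d)[2] < min(variance_factors(lam, k, d)[:2])
    for lam in EIGENVALUES
    for k in K_VALUES
    for d in D_VALUES
)
print("MJLTE variance smallest on the whole grid:", smallest_everywhere)
```

Writing the factors in terms of v_i rather than f_i keeps the comparison numerically stable for the very large eigenvalue, where f_i is extremely close to 1.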

In the following theorem, we obtain a necessary and sufficient condition for the MJLTE to outperform the JLTE in terms of the matrix mean square error. The proof of the theorem is similar to that of Theorem 4.3.

Theorem 4.5. $\Delta_2 = \operatorname{MSEM}(\hat{\gamma}_{JLTE}(k,d)) - \operatorname{MSEM}(\hat{\gamma}_{MJLTE}(k,d))$ is a nonnegative definite matrix if and only if the inequality

$$\gamma'\left[L^{-1}\left(\sigma^{2}\Sigma + \tilde{F}_{kd}^{2}\gamma\gamma'\tilde{F}_{kd}^{2}\right)L^{-1}\right]^{-1}\gamma \le 1$$

is satisfied.

Proof. From (20) and (25) we have

$$\Delta_2 = \sigma^{2}\Sigma + \tilde{F}_{kd}^{2}\gamma\gamma'\tilde{F}_{kd}^{2} - L\gamma\gamma'L'$$

We have seen from Theorem 4.4 that $\Sigma$ is a positive definite matrix. Therefore, the difference $\Delta_2$ is nonnegative definite if and only if $L^{-1}\Delta_2L^{-1}$ is nonnegative definite. The matrix $L^{-1}\Delta_2L^{-1}$ can be written as

$$L^{-1}\Delta_2L^{-1} = L^{-1}\left(\sigma^{2}\Sigma + \tilde{F}_{kd}^{2}\gamma\gamma'\tilde{F}_{kd}^{2}\right)L^{-1} - \gamma\gamma'$$

Since the matrix $\sigma^{2}\Sigma + \tilde{F}_{kd}^{2}\gamma\gamma'\tilde{F}_{kd}^{2}$ is symmetric and positive definite, using Lemma 4.1, we may conclude that $L^{-1}\Delta_2L^{-1}$ is nonnegative definite if and only if the inequality

$$\gamma'\left[L^{-1}\left(\sigma^{2}\Sigma + \tilde{F}_{kd}^{2}\gamma\gamma'\tilde{F}_{kd}^{2}\right)L^{-1}\right]^{-1}\gamma \le 1$$

is satisfied. The theorems of this section show that, under the stated conditions, the proposed MJLTE is superior to both the LTE and the JLTE.

5. Numerical example

To motivate the problem of estimation in the linear regression model, we consider the hedonic prices of housing attributes. The data consists of 92 detached homes in the Ottawa area sold during 1987 (see Yatchew [17]).

Let $y$ be the sale price (sp) of the house and $X$ be a 92 × 9 observation matrix consisting of the variables: frplc: dummy for fireplace(s), grge: dummy for garage, lux: dummy for luxury appointment, avginc: average neighborhood income, dhwy: distance to highway, lot area: area of lot, nrbed: number of bedrooms, usespc: usable space. The data are given in Table 1.


Table 1.

Data set.

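The eigenvalues of the 9 × 9 matrix $X'X$ are given by $\lambda_1 = 1.47$, $\lambda_2 = 3.77$, $\lambda_3 = 4.52$, $\lambda_4 = 15.33$, $\lambda_5 = 18.57$, $\lambda_6 = 20.97$, $\lambda_7 = 41.79$, $\lambda_8 = 271.15$ and $\lambda_9 = 239153.68$.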

If we use the spectral norm, then the corresponding measure of conditioning of $X$ is the number $\kappa(X) = \sqrt{\lambda_{\max}(X'X)/\lambda_{\min}(X'X)}$, where $\kappa(\cdot) \ge 1$. We obtained $\kappa(X) = 403.27$, which is large, and so $X$ may be considered ill-conditioned.
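
The condition number can be recomputed from the eigenvalues listed above. The sketch assumes the spectral-norm definition, that is, the square root of the ratio of the largest to the smallest eigenvalue of X'X. Because the listed eigenvalues are rounded to two decimals, the result is only expected to be close to the reported 403.27, not identical to it.

```python
import math

# Eigenvalues of X'X as reported in the text (rounded to two decimals).
EIGENVALUES = [1.47, 3.77, 4.52, 15.33, 18.57, 20.97, 41.79, 271.15, 239153.68]


def condition_number(eigenvalues):
    """Spectral condition number of X from the eigenvalues of X'X."""
    return math.sqrt(max(eigenvalues) / min(eigenvalues))


kappa = condition_number(EIGENVALUES)
print(f"kappa(X) = {kappa:.2f}")
```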

In this case, the regression coefficients become insignificant and it is therefore hard to make a valid inference or prediction using the OLS method. To overcome many of the difficulties associated with the OLS estimates, the LTE can be used. With $\hat{\beta} = (X'X)^{-1}X'y$ and biasing parameters $k$ and $d$, the use of $\hat{\beta}_{LTE} = (X'X + kI)^{-1}(X'y - d\,\hat{\beta})$, $k > 0$, $-\infty < d < +\infty$, has become conventional. The LTE is used in the following example. The original model was rewritten in the canonical form shown in (6), $y = Z\gamma + \varepsilon$. The estimators $\hat{\gamma}_{LTE}$, $\hat{\gamma}_{JLTE}$ and $\hat{\gamma}_{MJLTE}$ were computed for $d = 0.10, 0.30, 0.70, 1$ and $k = 0.30, 0.50, 0.70, 1$. The coefficients estimated by these estimators were then transformed back to the original variable scale. The scalar MSE (SMSE = trace(MSEM)) of the estimators for the individual values of $d$ and $k$ is shown in Tables 2–5. The effects of different values of $d$ on the MSE can be seen in Figures 5–8, which show that the proposed estimator (MJLTE) has smaller estimated MSE values than the LTE and JLTE.

d = 0.10 | d = 0.30 | d = 0.70 | d = 1

Table 2.

The estimated MSE values of LTE, JLTE and MJLTE for k = 0.30.

d = 0.10 | d = 0.30 | d = 0.70 | d = 1

Table 3.

The estimated MSE values of LTE, JLTE and MJLTE for k = 0.50.

d = 0.10 | d = 0.30 | d = 0.70 | d = 1

Table 4.

The estimated MSE values of LTE, JLTE and MJLTE for k = 0.70.

d = 0.10 | d = 0.30 | d = 0.70 | d = 1

Table 5.

The estimated MSE values of LTE, JLTE and MJLTE for k = 1.

Figure 5.

MSE of the proposed estimator compared with the others for different values of d when k = 0.30.

Figure 6.

MSE of the proposed estimator compared with the others for different values of d when k = 0.50.

Figure 7.

MSE of the proposed estimator compared with the others for different values of d when k = 0.70.

Figure 8.

MSE of the proposed estimator compared with the others for different values of d when k = 1.

We observed that, for all values of d, SMSE(MJLTE) took smaller values than both SMSE(JLTE) and SMSE(LTE). The estimators' SMSE values are affected by increasing values of k; however, the estimator least affected by these changes is our proposed MJLTE. Compared with the other two estimators, the MJLTE gave the best SMSE values for both small and large values of k and d.


6. A simulation study

We want to illustrate the behavior of the proposed parameter estimator by a Monte Carlo simulation. The main purpose of this article is to demonstrate the construction and the details of the simulation which is designed to evaluate the performances of the estimators LTE, JLTE and MJLTE when the regressors are highly intercorrelated. According to Liu [8] and Kibria [18] the explanatory variables and response variable are generated by using the following equations


where the $z_{ij}$ are independent standard normal pseudo-random numbers and $\gamma$ is specified so that the correlation between any two explanatory variables is given by $\gamma^{2}$. In this study, we used $\gamma = 0.90, 0.95, 0.99$ to investigate the effects of different degrees of collinearity with sample sizes $n = 20, 50$ and $100$, while four different combinations for $(k, d)$ are taken as (0.8, 0.5), (1, 0.7), (1.5, 0.9), (2, 1). The standard deviations considered in the simulation study are $\sigma = 0.1, 1.0, 10$. For each choice of $\gamma$, $\sigma^{2}$ and $n$, the experiment was replicated 1000 times by generating new error terms. The average SMSE was computed using the following formula


Let us consider the LTE, JLTE and MJLTE and compute their respective estimated MSE values at different levels of multicollinearity. According to the simulation results shown in Tables 4 and 5, the estimated MSE values of the LTE, JLTE and MJLTE generally increased with increasing levels of multicollinearity for fixed d and k.

In Table 4, the MSE values of the estimators corresponding to different values of d are given for k = 0.70. For all values of d, the smallest MSE value belongs to the MJLTE, which is also the estimator least affected by multicollinearity according to the MSE criterion.

In Table 5, the MSE values of the estimators corresponding to different values of d are given for k = 1. For all values of d, the smallest MSE value again belongs to the MJLTE, which is also the estimator least affected by multicollinearity according to the MSE criterion.

We can see that the MJLTE is much better than the competing estimators when the explanatory variables are severely collinear. Moreover, in all cases the MJLTE has smaller estimated MSE values than the LTE and JLTE.
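
The generation of collinear regressors described at the start of this section can be sketched as follows. The generating equations themselves are not reproduced in the text, so the sketch assumes the scheme commonly used in this literature, x_ij = (1 − γ²)^(1/2) z_ij + γ z_i,p+1, under which the correlation between any two regressors is γ². The number of regressors p, the sample size, the seed and the function names are hypothetical.

```python
import math
import random


def generate_regressors(n, p, gamma, rng):
    """Assumed scheme: x_ij = sqrt(1 - gamma^2) * z_ij + gamma * z_{i,p+1}."""
    scale = math.sqrt(1.0 - gamma * gamma)
    rows = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(p + 1)]
        rows.append([scale * z[j] + gamma * z[p] for j in range(p)])
    return rows


def sample_correlation(a, b):
    """Pearson correlation of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


rng = random.Random(2018)  # fixed seed so that the sketch is reproducible
X = generate_regressors(n=20000, p=3, gamma=0.95, rng=rng)
corr = sample_correlation([row[0] for row in X], [row[1] for row in X])
print(f"sample correlation = {corr:.3f}, target gamma^2 = {0.95 ** 2:.4f}")
```

A large n is used here only to make the sample correlation settle near γ²; the simulation itself uses n = 20, 50 and 100.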

7. Conclusion

In this paper, we combined the LTE and JLTE estimators to introduce a new estimator, which we called the MJLTE. Combining the ideas underlying the LTE and JLTE enabled us to create a new estimator for the regression coefficients of a linear regression model that is affected by multicollinearity. Moreover, the use of the jackknife procedure enabled us to produce an estimator with a smaller bias. We compared the MJLTE with its originators, the LTE and JLTE, in terms of the MSEM and found that the MJLTE has a smaller variance than both. Thus, the MJLTE is superior to both the LTE and JLTE under certain conditions.

© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite and reference


Nilgün Yıldız (December 18th 2018). A Study on the Comparison of the Effectiveness of the Jackknife Method in the Biased Estimators, Statistical Methodologies, Jan Peter Hessling, IntechOpen, DOI: 10.5772/intechopen.82366. Available from:
