A Study on the Comparison of the Effectiveness of the Jackknife Method in the Biased Estimators

Nilgün Yıldız

doi:10.5772/intechopen.82366

Abstract

In this study, we propose an alternative biased estimator. Multicollinearity in the linear regression model can lead to an ill-conditioned design matrix and thus to inadequacy of the ordinary least squares (OLS) estimator. Researchers have therefore developed alternative estimation techniques that reduce the instability of the estimates. Several biased estimators have been proposed, such as the Stein estimator, the ordinary ridge regression (ORR) estimator, and the principal components regression (PCR) estimator. Liu developed the Liu estimator (LE) by combining the Stein estimator with the ORR estimator. Since both ORR and LE depend on the OLS estimator, multicollinearity affects them both, and they may give misleading information in its presence. To overcome this problem, Liu introduced a new estimator, the Liu-type estimator (LTE), which is based on two biasing parameters, k and d. We then worked on developing an estimator that retains the valuable characteristics of the LTE but has a smaller bias. We propose a modified jackknifed Liu-type estimator (MJLTE) that combines the ideas underlying both the LTE and the jackknifed Liu-type estimator (JLTE). Under the mean square error matrix criterion, the MJLTE is superior to the LTE and the JLTE. Finally, a real data example and a Monte Carlo simulation are given to illustrate the theoretical results.

Keywords

jackknifed estimators
jackknifed Liu-type estimator
multicollinearity
MSE
Liu-type estimator

Author Information

Nilgün Yıldız*
- The Department of Mathematics, Faculty of Arts and Sciences, Marmara University Kuyubaşı, Istanbul, Turkey

*Address all correspondence to: ncelebiyil@gmail.com

1. Introduction

Regression analysis seeks answers to questions such as the following: Is there a relationship between the dependent and independent variables? If so, how strong is it, and what form does it take? Can future values of the variables be predicted, and how should they be estimated? What is the effect of a particular variable or group of variables on the other variables when certain conditions are controlled for? Linear regression is a very important and popular method in statistics. According to Web of Science, the number of publications about linear regression between 2014 and 2018 is given in Figure 1.

Figure 1.
Number of publications published between 2014 and 2018.

According to Figure 1, the number of studies conducted in 2014 is 12,381, while the number of studies conducted in 2018 is 13,137.

The number of publications about linear regression by document types is given in Figure 2.

Figure 2.
Number of publications by document types.

The most common document type for linear regression is the article, followed by proceedings papers, reviews, and editorial material.

The number of publications about linear regression by research area is given in Figure 3.

Figure 3.
Number of publications by research area.

The research area with the most publications on linear regression is engineering, followed by mathematics, computer science, environmental sciences, ecology and other scientific fields.

The number of publications about linear regression by countries is given in Figure 4.

Figure 4.
Number of publications by countries.

The countries with the most publications on linear regression are, in descending order, the USA, China, England, Germany, Canada, and Australia.

In regression analysis, the most commonly used method for estimating coefficients is ordinary least squares (OLS). We consider the multiple linear regression model

$$y = X\beta + \varepsilon \tag{1}$$

where $y$ is an $n \times 1$ observable random vector, $X$ is an $n \times p$ matrix of non-stochastic (independent) variables of rank $p$, $\beta$ is a $p \times 1$ vector of unknown parameters associated with $X$, and $\varepsilon$ is an $n \times 1$ vector of error terms with

$$E(\varepsilon) = 0, \qquad \operatorname{Cov}(\varepsilon) = \sigma^2 I \tag{2}$$

In regression analysis, there are several methods to estimate the unknown parameters. The most frequently used is ordinary least squares. Apart from this method, there are three general estimation approaches: maximum likelihood, generalized least squares, and best linear unbiased estimation (BLUE) [1].

The use of once very popular estimators such as OLS has become limited in the presence of multicollinearity, which makes the estimates unstable and inflates the variance of the regression coefficients.

Multicollinearity is defined as a linear (or nearly linear) relationship among the independent variables. In regression analysis, multicollinearity leads to the following problems:

Under exact multicollinearity, the linear regression coefficients are indeterminate and their standard errors are infinite.
Multicollinearity inflates the variances and covariances of the OLS coefficient estimates.
The model $R^2$ is high, but none of the independent variables is significant according to the partial t test.
The signs of the estimated relationships between the correlated independent variables and the dependent variable may contradict theoretical and empirical expectations.
If the independent variables are interrelated, some of them may need to be removed from the model. But which variables should be removed? Removing the wrong variable results in a specification error, and there are no simple rules for deciding which variables to include in or exclude from the model.

Methods for dealing with multicollinearity include collecting additional data, respecifying the model (for example, replacing two related variables with their sum as a single variable), and using biased estimators. This chapter provides information on biased estimators used as alternatives to OLS. Many researchers have developed biased regression estimators in the literature [2, 3].

One such biased estimator is the ordinary ridge regression (ORR) estimator introduced by Hoerl and Kennard [4],

$$\hat{\beta}_k = (X'X + kI)^{-1} X'y, \qquad k \ge 0 \tag{3}$$

where $k$ is a biasing parameter. In later years, researchers combined various estimators to obtain better results. For example, Baye and Parker [5] introduced the $r$-$k$ class estimator, which combines the ORR and principal component regression (PCR) estimators, and showed that it is superior to the PCR estimator under the scalar mean square error (SMSE) criterion.
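The ridge formula in (3) can be tried out in a few lines. The following is a minimal sketch, assuming NumPy is available; the synthetic two-regressor design and the helper names `ols` and `ridge` are illustrative choices and do not come from the chapter.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares: solve (X'X) b = X'y."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def ridge(X, y, k):
    """Ordinary ridge regression: solve (X'X + kI) b = X'y for k >= 0."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

# Synthetic, nearly collinear design (illustrative data only).
rng = np.random.default_rng(0)
n = 50
x1 = rng.standard_normal(n)
x2 = x1 + 0.01 * rng.standard_normal(n)   # almost a copy of x1
X = np.column_stack([x1, x2])
y = X @ np.array([1.0, 1.0]) + rng.standard_normal(n)

b_ols = ols(X, y)
b_ridge = ridge(X, y, k=1.0)
print("OLS  :", b_ols)
print("ridge:", b_ridge)
```

With `k = 0` the ridge estimate coincides with OLS, and for any `k > 0` its Euclidean norm is smaller than that of the OLS estimate, which is the shrinkage effect that stabilizes the coefficients under collinearity.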

The Liu estimator (LE) was developed by Liu [6] by combining the Stein [7] estimator with the ORR estimator:

$$\hat{\beta}_d = (X'X + I)^{-1}(X'y + d\hat{\beta}), \qquad 0 < d < 1 \tag{4}$$

where $\hat{\beta}$ is the OLS estimator. Since both the ORR and the LE depend on the OLS estimator, multicollinearity affects both of them, and they may give misleading information in its presence. To overcome this problem, Liu [8] introduced a new estimator based on two biasing parameters, $k$ and $d$:

$$\hat{\beta}_{LTE} = (X'X + kI)^{-1}(X'y - d\hat{\beta}), \qquad k > 0, \; -\infty < d < \infty \tag{5}$$

Next, we worked on developing an estimator that retains the valuable characteristics of the Liu-type estimator (LTE) but has a smaller bias. In 1956, Quenouille [9] suggested that bias can be reduced by applying a jackknife procedure to a biased estimator.

This procedure processes the experimental data to obtain a statistical estimator of the unknown parameters: the estimator is recomputed on truncated samples, each obtained by deleting observations, and the results are combined. The advantage of the jackknife procedure is that it yields an estimator with a small bias while retaining beneficial large-sample properties. In this article, we apply the jackknife technique to the LTE. Further, we establish the mean squared error superiority of the proposed estimator over both the LTE and the jackknifed Liu-type estimator (JLTE).

The article is organized as follows. The model, the LTE, and the JLTE are described in Section 2. The proposed estimator is introduced in Section 3. In Section 4, the superiority of the new estimator over the LTE and the JLTE is studied, and the performance of the modified jackknifed Liu-type estimator (MJLTE) is compared with that of the JLTE. Sections 5 and 6 present a real data example and a simulation study to illustrate the superiority of the suggested estimator.

2. The model

We assume that two or more regressors in $X$ are closely linearly related, so that the model suffers from multicollinearity. The symmetric matrix $S = X'X$ has an eigenvalue–eigenvector decomposition of the form $S = T\Lambda T'$, where $T$ is an orthogonal matrix and $\Lambda$ is a real diagonal matrix. The diagonal elements of $\Lambda$ are the eigenvalues of $S$, and the column vectors of $T$ are the eigenvectors of $S$. The orthogonal (canonical) version of the standard multiple linear regression model is

$$y = XTT'\beta + \varepsilon = Z\gamma + \varepsilon \tag{6}$$

where $Z = XT$, $\gamma = T'\beta$ and $Z'Z = \Lambda$. The ordinary least squares estimator (OLSE) of $\gamma$ is given by

$$\hat{\gamma} = (Z'Z)^{-1}Z'y = \Lambda^{-1}Z'y \tag{7}$$
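The canonical transformation in (6) and (7) can be sketched as follows, assuming NumPy; the function name `canonical_form` and the random test data are hypothetical and serve only as an illustration.

```python
import numpy as np

def canonical_form(X, y):
    """Return (lam, T, Z, gamma_hat) for the canonical model y = Z gamma + eps."""
    S = X.T @ X
    lam, T = np.linalg.eigh(S)       # S = T diag(lam) T', with T orthogonal
    Z = X @ T                        # so that Z'Z = diag(lam)
    gamma_hat = (Z.T @ y) / lam      # Lambda^{-1} Z'y, the OLSE of gamma
    return lam, T, Z, gamma_hat

# Illustrative data (not from the chapter).
rng = np.random.default_rng(1)
X = rng.standard_normal((30, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.standard_normal(30)

lam, T, Z, gamma_hat = canonical_form(X, y)
beta_hat = T @ gamma_hat             # back-transform, since gamma = T' beta
```

Because $\Lambda$ is diagonal, every estimator built from it in the following sections acts on each component of $\hat{\gamma}$ separately, which is why the later sketches work with the vector of eigenvalues instead of full matrices.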

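Liu [8] proposed a new biased estimator for $\gamma$, called the Liu-type estimator (LTE), defined for $k \ge 0$ and $-\infty < d < +\infty$ as

$$\begin{aligned}
\hat{\gamma}_{LTE}(k,d) &= (\Lambda + kI)^{-1}(Z'y - d\hat{\gamma}) \\
&= (\Lambda + kI)^{-1}(Z'y - d\Lambda^{-1}Z'y) \\
&= \left[I - (k+d)(\Lambda + kI)^{-1}\right]\hat{\gamma} = F(k,d)\hat{\gamma}
\end{aligned} \tag{8}$$

where

$$F(k,d) = (\Lambda + kI)^{-1}(\Lambda - dI) \tag{10}$$

The estimator $\hat{\gamma}_{LTE}$ has bias vector

$$\operatorname{Bias}(\hat{\gamma}_{LTE}) = [F(k,d) - I]\gamma \tag{11}$$

and covariance matrix

$$\operatorname{Cov}(\hat{\gamma}_{LTE}) = \sigma^2 F(k,d)\Lambda^{-1}F(k,d)' \tag{12}$$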
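Since $F(k,d)$ is diagonal, the LTE and its bias and covariance in (10) to (12) reduce to elementwise operations on the eigenvalues. The sketch below assumes NumPy; the function names and the example values of the eigenvalues, $\gamma$, $\sigma^2$, $k$ and $d$ are illustrative assumptions.

```python
import numpy as np

def F_diag(lam, k, d):
    """Diagonal of F(k,d) = (Lambda + kI)^{-1} (Lambda - dI)."""
    return (lam - d) / (lam + k)

def lte(lam, gamma_hat, k, d):
    """Liu-type estimator F(k,d) gamma_hat in canonical coordinates."""
    return F_diag(lam, k, d) * gamma_hat

def lte_bias(lam, gamma, k, d):
    """Bias vector [F(k,d) - I] gamma."""
    return (F_diag(lam, k, d) - 1.0) * gamma

def lte_cov_diag(lam, sigma2, k, d):
    """Diagonal of sigma^2 F Lambda^{-1} F' (the matrix itself is diagonal)."""
    f = F_diag(lam, k, d)
    return sigma2 * f**2 / lam

# Illustrative values only.
lam = np.array([0.5, 5.0, 50.0])
gamma = np.array([1.0, -2.0, 0.5])
print(lte_bias(lam, gamma, k=0.7, d=0.3))
print(lte_cov_diag(lam, sigma2=1.0, k=0.7, d=0.3))
```

The bias equals $-(k+d)/(\lambda_i + k)$ times $\gamma_i$ in each coordinate, so it is largest for the small eigenvalues, which are also the directions where the OLS variance $\sigma^2/\lambda_i$ is largest.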

Following Hinkley [10], Singh et al. [11], Nyquist [12], and Batah et al. [13], we can propose the jackknifed form of $\hat{\gamma}_{LTE}$. Quenouille [9] and Tukey [14] introduced the jackknife technique to reduce bias. Hinkley [10] stated that, with few exceptions, the jackknife had been applied to balanced models. The corresponding jackknife estimator is obtained by deleting the $i$th observation $(z_i', y_i)$. Using the identity

$$(A - z_i z_i')^{-1} = A^{-1} + \frac{A^{-1}z_i z_i'A^{-1}}{1 - z_i'A^{-1}z_i}$$

we obtain, after some algebraic manipulation,

$$\begin{aligned}
\hat{\gamma}_{LTE(-i)}(k,d) &= (A - z_i z_i')^{-1}(Z'y - z_i y_i) \\
&= \left[A^{-1} + \frac{A^{-1}z_i z_i'A^{-1}}{1 - z_i'A^{-1}z_i}\right](Z'y - z_i y_i) \\
&= A^{-1}Z'y - A^{-1}z_i y_i + \frac{A^{-1}z_i z_i'A^{-1}Z'y}{1 - z_i'A^{-1}z_i} - \frac{A^{-1}z_i z_i'A^{-1}z_i y_i}{1 - z_i'A^{-1}z_i} \\
&= \hat{\gamma}_{LTE}(k,d) - \frac{A^{-1}z_i y_i}{1 - z_i'A^{-1}z_i} + \frac{A^{-1}z_i z_i'\hat{\gamma}_{LTE}(k,d)}{1 - z_i'A^{-1}z_i} \\
&= \hat{\gamma}_{LTE}(k,d) - \frac{A^{-1}z_i\left(y_i - z_i'\hat{\gamma}_{LTE}(k,d)\right)}{1 - z_i'A^{-1}z_i} \\
&= \hat{\gamma}_{LTE}(k,d) - \frac{A^{-1}z_i e_i}{1 - w_i}
\end{aligned} \tag{13}$$

where $z_i'$ is the $i$th row of $Z$, $e_i = y_i - z_i'\hat{\gamma}_{LTE}(k,d)$ is the Liu-type residual, $w_i = z_i'A^{-1}z_i$ is the distance factor and $A^{-1} = (\Lambda + kI)^{-1}(I - d\Lambda^{-1}) = F(k,d)\Lambda^{-1}$, so that $\hat{\gamma}_{LTE}(k,d) = A^{-1}Z'y$. Since the non-zero values of $w_i$ reflect the lack of balance in the model, we use the weighted jackknife procedure. Thus, weighted pseudo-values are defined as

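$$Q_i = \hat{\gamma}_{LTE}(k,d) + n(1 - w_i)\left(\hat{\gamma}_{LTE}(k,d) - \hat{\gamma}_{LTE(-i)}(k,d)\right)$$

and the weighted jackknifed estimator of $\gamma$ is obtained as

$$\hat{\gamma}_{JLTE}(k,d) = \frac{1}{n}\sum_{i=1}^{n} Q_i = \hat{\gamma}_{LTE}(k,d) + A^{-1}\sum_{i=1}^{n} z_i e_i \tag{14}$$

Since

$$\sum_{i=1}^{n} z_i e_i = \sum_{i=1}^{n} z_i\left(y_i - z_i'\hat{\gamma}_{LTE}(k,d)\right) = (I - \Lambda A^{-1})Z'y$$

we have

$$\hat{\gamma}_{JLTE}(k,d) = \hat{\gamma}_{LTE}(k,d) + A^{-1}Z'y - A^{-1}\Lambda A^{-1}Z'y = (2I - A^{-1}\Lambda)\hat{\gamma}_{LTE}(k,d) \tag{15}$$

However, since $I - A^{-1}\Lambda = I - (\Lambda + kI)^{-1}(\Lambda - dI) = I - F(k,d)$, we obtain

$$\hat{\gamma}_{JLTE}(k,d) = [2I - F(k,d)]\hat{\gamma}_{LTE}(k,d) \tag{16}$$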
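The jackknife derivation leading to (16) can be checked numerically. The sketch below, assuming NumPy and synthetic data, computes each leave-one-out estimate directly, compares it with the rank-one update shortcut of (13), averages the weighted pseudo-values, and compares the result with the closed form $[2I - F(k,d)]\hat{\gamma}_{LTE}$. The data, the seed and the values of `k` and `d` are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, k, d = 25, 3, 0.8, 0.3
X = rng.standard_normal((n, p))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.standard_normal(n)

lam, T = np.linalg.eigh(X.T @ X)
Z = X @ T                                   # canonical regressors, Z'Z = diag(lam)
f = (lam - d) / (lam + k)                   # diagonal of F(k,d)
A_inv = np.diag(f / lam)                    # A^{-1} = F(k,d) Lambda^{-1}
A = np.diag(lam / f)
g_lte = A_inv @ Z.T @ y                     # Liu-type estimate of gamma

e = y - Z @ g_lte                           # Liu-type residuals e_i
w = np.einsum("ij,jk,ik->i", Z, A_inv, Z)   # w_i = z_i' A^{-1} z_i

pseudo = np.empty((n, p))
for i in range(n):
    z_i = Z[i]
    # Leave-one-out estimate computed directly from the deleted system ...
    g_del = np.linalg.solve(A - np.outer(z_i, z_i), Z.T @ y - z_i * y[i])
    # ... agrees with the shortcut obtained from the rank-one update identity.
    shortcut = g_lte - (A_inv @ z_i) * e[i] / (1.0 - w[i])
    assert np.allclose(g_del, shortcut)
    pseudo[i] = g_lte + n * (1.0 - w[i]) * (g_lte - g_del)

g_jlte = pseudo.mean(axis=0)                # weighted jackknifed estimator
g_closed = (2.0 - f) * g_lte                # closed form [2I - F] gamma_LTE
print(np.max(np.abs(g_jlte - g_closed)))
```

The printed difference should be at the level of floating-point rounding, which indicates that the pseudo-value average and the closed form describe the same estimator.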

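Substituting $\hat{\gamma}_{LTE}(k,d) = F(k,d)\hat{\gamma}$ from (8), we have

$$\hat{\gamma}_{JLTE}(k,d) = [2I - F(k,d)]F(k,d)\hat{\gamma} \tag{17}$$

$$\operatorname{Bias}(\hat{\gamma}_{JLTE}(k,d)) = -[I - F(k,d)]^2\gamma \tag{18}$$

The covariance matrix of the JLTE is

$$\operatorname{Cov}(\hat{\gamma}_{JLTE}(k,d)) = \sigma^2[2I - F(k,d)]F(k,d)\Lambda^{-1}F(k,d)'[2I - F(k,d)]' \tag{19}$$

The MSEMs of the JLTE and the LTE are

$$\begin{aligned}
\operatorname{MSEM}(\hat{\gamma}_{JLTE}(k,d)) &= \operatorname{Cov}(\hat{\gamma}_{JLTE}(k,d)) + \operatorname{Bias}(\hat{\gamma}_{JLTE}(k,d))\operatorname{Bias}(\hat{\gamma}_{JLTE}(k,d))' \\
&= \sigma^2[2I - F(k,d)]F(k,d)\Lambda^{-1}F(k,d)'[2I - F(k,d)]' + [I - F(k,d)]^2\gamma\gamma'\left([I - F(k,d)]^2\right)'
\end{aligned} \tag{20}$$

$$\operatorname{MSEM}(\hat{\gamma}_{LTE}(k,d)) = \sigma^2 F(k,d)\Lambda^{-1}F(k,d)' + [F(k,d) - I]\gamma\gamma'[F(k,d) - I]' \tag{21}$$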

3. Our novel MJLTE estimator

In this section, we present the new estimator for $\gamma$ proposed by Yıldız [15]. The proposed estimator is called the modified jackknifed Liu-type estimator (MJLTE) and is denoted by $\hat{\gamma}_{MJLTE}(k,d)$:

$$\hat{\gamma}_{MJLTE}(k,d) = \left[I - (k+d)^2(\Lambda + kI)^{-2}\right]\left[I - (k+d)(\Lambda + kI)^{-1}\right]\hat{\gamma} \tag{22}$$

It may be noted that the MJLTE in (22) is obtained in the same way as the JLTE, but by plugging in the LTE instead of the OLSE. The expressions for the bias, covariance and mean squared error matrix (MSEM) of $\hat{\gamma}_{MJLTE}(k,d)$ are

$$\operatorname{Bias}(\hat{\gamma}_{MJLTE}(k,d)) = -(k+d)(\Lambda + kI)^{-1}W\gamma \tag{23}$$

$$\operatorname{Cov}(\hat{\gamma}_{MJLTE}(k,d)) = \sigma^2\Phi\Lambda^{-1}\Phi' \tag{24}$$

$$\operatorname{MSEM}(\hat{\gamma}_{MJLTE}(k,d)) = \sigma^2\Phi\Lambda^{-1}\Phi' + (k+d)^2(\Lambda + kI)^{-1}W\gamma\gamma'W'(\Lambda + kI)^{-1} \tag{25}$$

where $W = I + (k+d)(\Lambda + kI)^{-1} - (k+d)^2(\Lambda + kI)^{-2} = I + [I - F(k,d)] - [I - F(k,d)]^2$ and $\Phi = [2I - F(k,d)]F(k,d)^2$.
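The shrinkage matrix of the MJLTE can be written in several equivalent ways, and the relations between them are easy to check elementwise. The following sketch assumes NumPy and uses illustrative eigenvalues; the function name `mjlte_pieces` is hypothetical, and the variable `g` stands for the diagonal of $(k+d)(\Lambda + kI)^{-1} = I - F(k,d)$. The identity between $\Phi - I$ and $W$ checked at the end is the one implied by (22).

```python
import numpy as np

def mjlte_pieces(lam, k, d):
    """Diagonals of F, G = I - F, Phi and W for given eigenvalues."""
    g = (k + d) / (lam + k)      # (k+d)(Lambda + kI)^{-1}
    f = 1.0 - g                  # F(k,d)
    phi = (1.0 - g**2) * f       # MJLTE multiplier read off from (22)
    w = 1.0 + g - g**2           # W
    return f, g, phi, w

# Illustrative eigenvalues and biasing parameters.
lam = np.array([0.5, 5.0, 50.0])
f, g, phi, w = mjlte_pieces(lam, k=0.7, d=0.3)

# Phi equals [2I - F] F^2, and the bias multiplier Phi - I equals -G W.
print(np.allclose(phi, (2.0 - f) * f**2))
print(np.allclose(phi - 1.0, -g * w))
```

Both checks follow from expanding $(1 - g^2)(1 - g) = 1 - g - g^2 + g^3$, so the MJLTE estimate in canonical coordinates is simply `phi * gamma_hat`.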

4. Properties of the MJLTE

One of the most prominent features of the MJLTE is that its bias, under some conditions, is smaller than that of the LTE from which it originates.

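Theorem 4.1. Under the model (1) with the assumptions (2), the inequality

$$\left\|\operatorname{Bias}(\hat{\gamma}_{MJLTE}(k,d))\right\|^2 < \left\|\operatorname{Bias}(\hat{\gamma}_{LTE}(k,d))\right\|^2$$

holds for $d > 0$ and $k > d$.

Proof. From (11) and (23), we obtain

$$\left\|\operatorname{Bias}(\hat{\gamma}_{MJLTE}(k,d))\right\|^2 - \left\|\operatorname{Bias}(\hat{\gamma}_{LTE}(k,d))\right\|^2 = (k+d)^2(\Lambda + kI)^{-2}\left[W^2 - (\Lambda + kI)^2(\Lambda + kI)^{-2}\right] > 0$$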

It is obvious that the difference is greater than 0, because it consists of the product of the squares in the expression above. Thus, the proof is completed.

Corollary 4.1. The absolute value of the bias of the $i$th component of the MJLTE is smaller than that of the LTE, namely $\left|\operatorname{Bias}(\hat{\gamma}_{MJLTE}(k,d))_i\right| < \left|\operatorname{Bias}(\hat{\gamma}_{LTE}(k,d))_i\right|$.

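Theorem 4.2. The MJLTE has a smaller variance than the LTE.

Proof. From (12) and (24), it can be shown that

$$\operatorname{Cov}(\hat{\gamma}_{LTE}(k,d)) - \operatorname{Cov}(\hat{\gamma}_{MJLTE}(k,d)) = \sigma^2 H$$

where

$$\begin{aligned}
H &= \left[I - (k+d)(\Lambda + kI)^{-1}\right]\Lambda^{-1}\left[I - (k+d)(\Lambda + kI)^{-1}\right]' - \Phi\Lambda^{-1}\Phi' \\
&= F(k,d)\left\{\Lambda^{-1} - [2I - F(k,d)]F(k,d)\Lambda^{-1}F(k,d)'[2I - F(k,d)]'\right\}F(k,d)'
\end{aligned}$$

$H$ is a diagonal matrix whose $i$th diagonal element

$$h_{ii} = \frac{\left[(\lambda_i + k)^4 - (\lambda_i + 2k + d)^2(\lambda_i - d)^2\right](\lambda_i - d)^2}{\lambda_i(\lambda_i + k)^6}$$

is a positive number. Thus we conclude that $H$ is a positive definite matrix. This completes the proof.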
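The closed form of $h_{ii}$ can be compared with the direct difference of the two variances. The sketch below is plain Python; the eigenvalues are a subset of those reported later in the numerical example, while the choice $k = 0.7$, $d = 0.3$ and the function names are illustrative assumptions.

```python
import math

def h_closed(lam, k, d):
    """Closed-form i-th diagonal element of H for one eigenvalue lam."""
    num = ((lam + k)**4 - (lam + 2*k + d)**2 * (lam - d)**2) * (lam - d)**2
    return num / (lam * (lam + k)**6)

def h_direct(lam, k, d):
    """Same element computed as f^2/lam - phi^2/lam with phi = (2 - f) f^2."""
    f = (lam - d) / (lam + k)
    phi = (2.0 - f) * f**2
    return f**2 / lam - phi**2 / lam

k, d = 0.7, 0.3
for lam in [1.47, 3.77, 15.33, 271.15]:
    print(lam, h_closed(lam, k, d), h_direct(lam, k, d))
```

The positivity can also be seen from the factorization $(\lambda_i + k)^2 - (\lambda_i + 2k + d)(\lambda_i - d) = (k+d)^2$, which makes the bracket in the numerator a product of positive terms when $\lambda_i > d$.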

Next, we give a necessary and sufficient condition for the MJLTE to outperform the LTE under the MSEM criterion. The proof requires the following lemma.

Lemma 4.1. Let $M$ be a positive definite matrix, namely $M > 0$, and let $\alpha$ be some vector. Then

$$M - \alpha\alpha' \ge 0 \quad \text{if and only if} \quad \alpha'M^{-1}\alpha \le 1$$

Proof. See Farebrother [16].
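Lemma 4.1 can be illustrated on a small example. The sketch below assumes NumPy; the matrix `M`, the two test vectors and the helper names are arbitrary illustrative choices. For each vector it evaluates both sides of the equivalence.

```python
import numpy as np

def is_nnd(B, tol=1e-10):
    """True if the symmetric matrix B is nonnegative definite (up to tol)."""
    return np.linalg.eigvalsh((B + B.T) / 2.0).min() >= -tol

def lemma_sides(M, alpha):
    """Return (alpha' M^{-1} alpha <= 1, M - alpha alpha' is nnd)."""
    q = alpha @ np.linalg.solve(M, alpha)
    return bool(q <= 1.0), bool(is_nnd(M - np.outer(alpha, alpha)))

M = np.array([[2.0, 0.5],
              [0.5, 1.0]])                 # positive definite
small = np.array([0.5, 0.5])               # quadratic form below 1
large = np.array([1.5, 1.0])               # quadratic form above 1

print(lemma_sides(M, small))
print(lemma_sides(M, large))
```

In both cases the two entries of the returned pair agree, as the lemma states: subtracting the rank-one matrix keeps $M$ nonnegative definite exactly when the quadratic form does not exceed one.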

Theorem 4.3. The MJLTE is superior to the LTE in the MSEM sense, namely $\Delta_1 = \operatorname{MSEM}(\hat{\gamma}_{LTE}(k,d)) - \operatorname{MSEM}(\hat{\gamma}_{MJLTE}(k,d))$ is a nonnegative definite matrix, if and only if the inequality

$$\gamma'\left\{L^{-1}\left[\sigma^2 H + F^*(k,d)\gamma\gamma'F^*(k,d)'\right](L^{-1})'\right\}^{-1}\gamma \le 1 \tag{26}$$

is satisfied, where $L = F^*(k,d)W$, $F^*(k,d) = F(k,d) - I$ and $W = I + [I - F(k,d)] - [I - F(k,d)]^2$.

Proof. From (21) and (25), the difference is

$$\Delta_1 = \operatorname{MSEM}(\hat{\gamma}_{LTE}(k,d)) - \operatorname{MSEM}(\hat{\gamma}_{MJLTE}(k,d)) = \sigma^2 H + F^*(k,d)\gamma\gamma'F^*(k,d)' - L\gamma\gamma'L' \tag{27}$$

where

$$H = F(k,d)\Lambda^{-1}F(k,d)' - \Phi\Lambda^{-1}\Phi'$$

$W$ is a positive definite matrix, and $H$ is positive definite by Theorem 4.2. Therefore, the difference $\Delta_1$ is nonnegative definite if and only if $L^{-1}\Delta_1(L^{-1})'$ is nonnegative definite. The matrix $L^{-1}\Delta_1(L^{-1})'$ can be written as

$$L^{-1}\Delta_1(L^{-1})' = L^{-1}\left[\sigma^2 H + F^*(k,d)\gamma\gamma'F^*(k,d)'\right](L^{-1})' - \gamma\gamma' \tag{28}$$

Since the matrix $\sigma^2 H + F^*(k,d)\gamma\gamma'F^*(k,d)'$ is symmetric and positive definite, using Lemma 4.1 we may conclude that $L^{-1}\Delta_1(L^{-1})'$ is nonnegative definite if and only if the inequality

$$\gamma'\left\{L^{-1}\left[\sigma^2 H + F^*(k,d)\gamma\gamma'F^*(k,d)'\right](L^{-1})'\right\}^{-1}\gamma \le 1$$

is satisfied.
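The condition of Theorem 4.3 can be evaluated numerically and compared with a direct eigenvalue check of $\Delta_1$. The sketch below assumes NumPy; the eigenvalues, the vector $\gamma$ and the two values of $\sigma^2$ are illustrative assumptions, and $W$ is taken in the form $I + G - G^2$ with $G = (k+d)(\Lambda + kI)^{-1}$, which is the form implied by (22).

```python
import numpy as np

def delta1_and_lhs(lam, gamma, sigma2, k, d):
    """Return Delta_1 and the left-hand side of the inequality (26)."""
    g = (k + d) / (lam + k)                    # diagonal of G = I - F
    f = 1.0 - g                                # diagonal of F(k,d)
    phi = (2.0 - f) * f**2                     # diagonal of Phi
    H = np.diag(f**2 / lam - phi**2 / lam)     # Cov(LTE) - Cov(MJLTE), per sigma^2
    Fs = np.diag(f - 1.0)                      # F* = F - I
    W = np.diag(1.0 + g - g**2)
    L = Fs @ W
    gg = np.outer(gamma, gamma)
    M = sigma2 * H + Fs @ gg @ Fs.T
    delta1 = M - L @ gg @ L.T
    L_inv = np.linalg.inv(L)
    inner = L_inv @ M @ L_inv.T
    lhs = gamma @ np.linalg.solve(inner, gamma)
    return delta1, lhs

lam = np.array([0.5, 5.0, 50.0])
gamma = np.ones(3)
for sigma2 in (100.0, 0.01):
    delta1, lhs = delta1_and_lhs(lam, gamma, sigma2, k=0.7, d=0.3)
    min_eig = np.linalg.eigvalsh(delta1).min()
    print(sigma2, lhs <= 1.0, min_eig >= -1e-10)
```

For the larger error variance the variance reduction dominates and both checks agree that $\Delta_1$ is nonnegative definite; for the smaller one the bias terms dominate and both checks fail together, which is the behavior the theorem describes.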

4.1 Comparison between the JLTE and the MJLTE

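Here, we show that the MJLTE outperforms the JLTE in terms of sampling variance.

Theorem 4.4. The MJLTE has a smaller variance than the JLTE for $d > 0$ and $k > d$.

Proof. From (19) and (24), we can write

$$\operatorname{Cov}(\hat{\gamma}_{JLTE}(k,d)) = \sigma^2[2I - F(k,d)]F(k,d)\Lambda^{-1}F(k,d)'[2I - F(k,d)]' = \sigma^2 VU\Lambda^{-1}U'V' \tag{29}$$

and

$$\operatorname{Cov}(\hat{\gamma}_{MJLTE}(k,d)) = \sigma^2\Phi\Lambda^{-1}\Phi' = \sigma^2 VUV\Lambda^{-1}V'U'V' \tag{30}$$

where $V = F(k,d) = I - (k+d)(\Lambda + kI)^{-1}$ and $U = 2I - F(k,d) = I + (k+d)(\Lambda + kI)^{-1}$. It can be shown that

$$\operatorname{Cov}(\hat{\gamma}_{JLTE}(k,d)) - \operatorname{Cov}(\hat{\gamma}_{MJLTE}(k,d)) = \sigma^2\Sigma \tag{31}$$

where $\Sigma = VU\left[\Lambda^{-1} - V\Lambda^{-1}V'\right]U'V'$ is a diagonal matrix. The $i$th diagonal element of $\operatorname{Cov}(\hat{\gamma}_{JLTE}(k,d)) - \operatorname{Cov}(\hat{\gamma}_{MJLTE}(k,d))$ is

$$\sigma^2\,\frac{(\lambda_i + 2k + d)^2(\lambda_i - d)^2(k + d)(2\lambda_i + k - d)}{\lambda_i(\lambda_i + k)^6}$$

which is positive for $d > 0$ and $k > d$. Hence $\operatorname{Cov}(\hat{\gamma}_{JLTE}(k,d)) - \operatorname{Cov}(\hat{\gamma}_{MJLTE}(k,d)) > 0$, which completes the proof.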

In the following theorem, we obtain a necessary and sufficient condition for the MJLTE to outperform the JLTE in terms of the matrix mean square error. The proof is similar to that of Theorem 4.3.

Theorem 4.5. $\Delta_2 = \operatorname{MSEM}(\hat{\gamma}_{JLTE}(k,d)) - \operatorname{MSEM}(\hat{\gamma}_{MJLTE}(k,d))$ is a nonnegative definite matrix if and only if the inequality

$$\gamma'\left\{L^{-1}\left[\sigma^2\Sigma + F^*(k,d)^2\gamma\gamma'\left(F^*(k,d)^2\right)'\right](L^{-1})'\right\}^{-1}\gamma \le 1 \tag{32}$$

is satisfied.

Proof. From (20) and (25) we have

$$\Delta_2 = \operatorname{MSEM}(\hat{\gamma}_{JLTE}(k,d)) - \operatorname{MSEM}(\hat{\gamma}_{MJLTE}(k,d)) = \sigma^2\Sigma + F^*(k,d)^2\gamma\gamma'\left(F^*(k,d)^2\right)' - F^*(k,d)W\gamma\gamma'W'F^*(k,d)'$$

By Theorem 4.4, $\Sigma$ is a positive definite matrix. Therefore, the difference $\Delta_2$ is nonnegative definite if and only if $L^{-1}\Delta_2(L^{-1})'$ is nonnegative definite, where $L = F^*(k,d)W$. The matrix $L^{-1}\Delta_2(L^{-1})'$ can be written as

$$L^{-1}\Delta_2(L^{-1})' = L^{-1}\left[\sigma^2\Sigma + F^*(k,d)^2\gamma\gamma'\left(F^*(k,d)^2\right)'\right](L^{-1})' - \gamma\gamma'$$

Since the matrix $\sigma^2\Sigma + F^*(k,d)^2\gamma\gamma'\left(F^*(k,d)^2\right)'$ is symmetric and positive definite, using Lemma 4.1 we may conclude that $L^{-1}\Delta_2(L^{-1})'$ is nonnegative definite if and only if the inequality

$$\gamma'\left\{L^{-1}\left[\sigma^2\Sigma + F^*(k,d)^2\gamma\gamma'\left(F^*(k,d)^2\right)'\right](L^{-1})'\right\}^{-1}\gamma \le 1$$

is satisfied. This completes the proof.

Theorems 4.1–4.5 show that, under the stated conditions, the proposed MJLTE is superior to the LTE and the JLTE.

5. Numerical example

To motivate the problem of estimation in the linear regression model, we consider the hedonic prices of housing attributes. The data consist of 92 detached homes in the Ottawa area sold during 1987 (see Yatchew [17]).

Let $y$ be the sale price (sp) of the house and $X$ be a $92 \times 9$ observation matrix consisting of the variables frplc (dummy for fireplace(s)), grge (dummy for garage), lux (dummy for luxury appointment), avginc (average neighborhood income), dhwy (distance to highway), lot area (area of lot), nrbed (number of bedrooms), and usespc (usable space). The data are given in Table 1.

sellprix	fireplac	garage	luxbath	avginc	crowdist	ncrosdst	disthwy	lotarea	nrbed	usespace	south	west	nsouth	nwest
180	0	1	0	32.3163	0.93428	0	0.63807	3.63297	3	1.23309	0.84	0.409	0	0
135	0	1	0	31.5016	2.18624	0.13942	0.66452	6.5	3	0.84592	1.44	1.645	0.087171	0.16962
165.9	0	1	0	32.0654	2.31148	0.15337	0.43422	5.72	3	0.87508	1.673	1.595	0.12102	0.16276
101	0	0	0	37.8348	2.54381	0.17924	0.26872	3.136	2	0.71445	2.252	1.183	0.20514	0.10622
127	0	0	0	37.8348	2.5458	0.17947	0.16621	2.7	3	0.73789	2.193	1.293	0.19657	0.12131
235	1	0	1	67.0056	2.77147	0.2046	0.22935	6.695	4	1.3518	2.352	1.466	0.21967	0.14505
195	1	1	0	65.8278	3.08747	0.23979	0.30295	4.23	3	1.16829	2.564	1.72	0.25047	0.17991
184.5	1	0	0	62.3053	3.4844	0.28399	0.33009	4.224	3	0.98228	2.785	2.094	0.28258	0.23123
106	1	1	0	38.4946	3.83086	0.32258	0.87056	3.234	2	0.79507	2.148	3.172	0.19003	0.37917
156	1	1	0	52.3552	3.85306	0.32505	0.39006	5	3	0.90239	2.563	2.877	0.25033	0.33869
195	1	0	0	52.3552	3.88283	0.32836	0.67211	4.8	4	1.29465	2.363	3.081	0.22127	0.36668
206	1	1	0	52.3552	3.92493	0.33305	0.69495	4.75	4	1.7111	2.38	3.121	0.22374	0.37217
157	0	1	0	52.3552	3.95888	0.33683	0.43005	5	3	0.901	2.621	2.967	0.25875	0.35104
180	1	0	0	79.4583	3.96236	0.33722	0.12078	4	3	0.95527	3.015	2.571	0.316	0.29669
193	1	1	0	79.4583	3.9626	0.33725	0.19503	6.35545	3	1.50436	3.063	2.514	0.32297	0.28887
230	1	1	0	52.3552	3.97284	0.33839	0.39307	5	3	1.08421	2.661	2.95	0.26456	0.3487
212	0	1	1	53.8647	4.04329	0.34623	0.22395	5	4	1.08724	2.845	2.873	0.2913	0.33814
102	1	0	0	59.5774	4.11494	0.35421	1.18386	4	2	1.24813	2.113	3.531	0.18495	0.42843
137	1	1	0	59.5774	4.13141	0.35605	1.01187	2.4	3	0.753	2.285	3.442	0.20994	0.41622
187	0	1	0	59.5774	4.13723	0.35669	1.00413	6.10686	3	0.91936	2.297	3.441	0.21168	0.41608
103	0	0	0	59.5774	4.16338	0.35961	1.14419	4	2	0.68893	2.193	3.539	0.19657	0.42953
100	0	0	0	59.5774	4.22521	0.36649	1.22908	4	3	1.07483	2.169	3.626	0.19308	0.44147
152	1	1	0	39.7652	4.29688	0.37447	1.02071	9.9	4	1.01668	3.793	2.019	0.42903	0.22094
127	1	1	0	39.5229	4.52879	0.4003	0.6903	5.16	4	1.64743	2.897	3.481	0.29885	0.42157
119.5	0	1	0	35.6	4.59649	0.40784	1.39392	4.9	2	0.9768	4.206	1.854	0.48903	0.1983
103	0	0	0	39.5229	4.63841	0.41251	0.64774	4.6158	2	1.14358	3.023	3.518	0.31716	0.42665
99	0	1	0	30.6514	4.69599	0.41892	1.00164	5	4	0.8655	4.112	2.268	0.47537	0.25511
75	0	0	0	39.5	4.69941	0.4193	0.49382	1.891	3	1.1436	3.192	3.449	0.34171	0.41718
128	1	1	0	39.5229	4.73353	0.4231	0.7881	5	3	1.03911	2.992	3.668	0.31265	0.44723
132	1	0	0	35.6	4.75938	0.42598	1.23504	5.0892	3	1.32579	4.273	2.096	0.49877	0.23151
132	0	0	0	38.1216	4.80701	0.43128	0.35373	6.5415	4	1.25312	3.851	2.877	0.43745	0.33869
134	0	0	0	39.5	5.04763	0.45808	0.27429	4.725	3	0.80536	3.64	3.497	0.4068	0.42377
120	1	1	0	39.8732	5.08006	0.46169	0.58519	5	3	1.2452	4.208	2.846	0.48932	0.33443
125	0	1	0	39.8732	5.21855	0.47712	0.7121	6.96	2	0.62472	4.391	2.82	0.51591	0.33086
135	1	1	0	25.9545	5.23599	0.47906	1.01716	5.4	3	0.88273	4.563	2.568	0.5409	0.29628
139	1	1	0	38.24	5.26572	0.48237	0.92231	4.08	4	1.44414	3.335	4.075	0.36249	0.50309
151	0	1	0	34.4049	5.34316	0.49099	0.96285	6.6	3	1.03205	3.368	4.148	0.36728	0.51311
116.5	0	1	0	38.1216	5.53704	0.51258	0.51961	5.3	3	0.99325	4.544	3.164	0.53814	0.37807
137	1	1	0	50.0548	5.56767	0.516	0.70637	6.6	3	0.8773	3.757	4.109	0.4238	0.50775
149	1	1	0	55.8667	5.75411	0.53676	1.38296	5.6	3	1.19255	3.37	4.664	0.36757	0.58392
167	1	1	0	58.4763	5.8055	0.54248	1.20341	6.4152	4	1.48624	3.565	4.582	0.3959	0.57266
163	1	1	0	58.4763	5.8213	0.54424	1.03375	4.92	4	1.60065	3.716	4.481	0.41784	0.5588
147.5	0	1	0	58.4763	6.0749	0.57248	0.73326	5.1825	3	0.972	4.16	4.427	0.48235	0.55139
237.7	1	1	0	58.4763	6.0752	0.57252	0.62363	5.4495	4	1.99967	4.241	4.35	0.49412	0.54083
168	1	1	0	44.2475	6.2599	0.59309	1.49998	5	3	1.30275	3.706	5.045	0.41639	0.6362
180	1	1	0	58.4763	6.3173	0.59948	1.03263	5.3014	4	1.59782	4.135	4.776	0.47872	0.59929
156	1	0	0	57.7446	6.6398	0.63539	1.09965	5	3	1.41006	4.354	5.013	0.51053	0.63181
145	1	1	0	44.2475	6.6469	0.63618	1.67399	6.572	4	1.22169	3.891	5.389	0.44327	0.68341
144	1	1	0	57.6861	6.6854	0.64047	0.76067	6.825	4	1.689	5.616	3.627	0.69388	0.44161
140	1	1	0	57.6861	6.69752	0.64182	0.63376	6.3	3	1.14778	5.556	3.74	0.68517	0.45712
236	1	1	1	44.2475	6.7398	0.64653	1.59621	4.725	4	1.63184	4.037	5.397	0.46448	0.68451
172	1	1	0	53.5907	6.8231	0.65581	0.98439	5	4	1.22956	4.596	5.043	0.54569	0.63593
148	1	0	0	71.0269	7.1147	0.68828	1.42051	6	3	1.14276	4.501	5.51	0.53189	0.70001
153.5	1	1	0	71.0269	7.1224	0.68914	1.54559	7.3832	4	1.433	4.406	5.596	0.51809	0.71182
154	1	1	0	71.0269	7.157	0.69299	1.21892	5.292	3	1.44025	4.696	5.401	0.56022	0.68506
145.5	1	1	0	50.7101	7.3023	0.70917	0.6233	6.735	4	1.49297	6.044	4.098	0.75607	0.50624
149	1	0	0	50.7101	7.35272	0.71479	0.4738	6.0888	4	1.26687	6	4.25	0.74967	0.5271
138	1	0	0	50.7101	7.3538	0.71491	0.58063	5.3352	4	1.50339	6.062	4.163	0.75868	0.51516
141.5	1	0	0	52.6378	7.44672	0.72525	1.07227	6.54	4	1.03462	6.403	3.802	0.80822	0.46562
125	1	0	0	52.6378	7.46892	0.72773	0.9696	7.02	4	1.16772	6.368	3.903	0.80314	0.47948
130	0	1	0	52.6378	7.52266	0.73371	0.93924	6.5	4	1.09757	6.396	3.96	0.80721	0.48731
132	0	1	0	52.6378	7.55757	0.7376	0.91361	5.1	4	1.20377	6.411	4.002	0.80939	0.49307
132.9	1	1	0	52.6378	7.58783	0.74097	0.88591	6	3	1.06059	6.421	4.043	0.81084	0.4987
122	1	1	0	52.6378	7.6221	0.74479	0.85984	5	3	0.80494	6.435	4.085	0.81287	0.50446
162	1	1	0	51.5087	7.63628	0.74637	0.35916	4.992	4	1.4318	5.705	5.076	0.70681	0.64046
127.5	1	0	0	73.4464	7.6503	0.74793	0.68757	6.785	3	0.84692	5.488	5.33	0.67529	0.67531
87	0	0	0	42.3138	7.6785	0.75107	1.67919	4.6	4	0.74017	4.769	6.018	0.57083	0.76973
139.9	1	1	0	40.9246	7.90353	0.77613	0.40138	5.5	3	1.19235	6.366	4.684	0.80285	0.58666
240	1	1	1	41.95	7.9194	0.77789	1.16065	5.004	4	1.3972	5.316	5.87	0.6503	0.74942
134	1	1	0	52.6378	7.92504	0.77852	1.08098	4.2581	3	1.2298	6.776	4.11	0.86241	0.50789
136.5	1	0	0	42.3138	7.9381	0.77998	1.58184	4.625	3	1.06694	5.008	6.159	0.60555	0.78908
143.5	1	1	0	51.5087	7.98552	0.78526	0.19433	5	4	1.44972	6.038	5.226	0.75519	0.66104
140	1	1	0	57.5706	7.99308	0.7861	0.94096	7.856	3	1.03167	6.743	4.292	0.85762	0.53287
123	1	0	0	51.5087	8.02072	0.78918	0.7685	6.757	4	1.28782	5.665	5.678	0.701	0.72307
147	1	1	0	57.5706	8.06875	0.79453	0.5518	5.04	3	0.96054	6.565	4.691	0.83176	0.58762
134.9	1	1	0	57.5706	8.16678	0.80544	0.77684	5.185	4	1.35781	6.763	4.578	0.86053	0.57211
154	1	1	0	57.5706	8.23868	0.81345	0.85344	5.27	4	1.60938	6.855	4.57	0.87389	0.57102
143.9	1	0	0	40.9246	8.56769	0.85009	0.53129	5	3	0.91112	6.877	5.11	0.87709	0.64512
126	1	1	0	53.8029	8.62504	0.85648	0.48411	5	4	0.88838	6.885	5.195	0.87825	0.65679
118.5	1	0	0	40.0618	8.9486	0.89251	0.10317	5.1	3	1.05885	6.85	5.758	0.87317	0.73405
158	1	1	1	39.5262	8.9552	0.89325	1.86432	5.035	3	1.1192	5.425	7.125	0.66613	0.92164
118	0	0	0	37.1457	9.2964	0.93124	0.18611	5.035	3	0.93595	7.127	5.969	0.91341	0.763
109.25	0	0	0	30.9704	9.4572	0.94915	0.70329	5.4	3	0.67449	7.56	5.682	0.97632	0.72362
124	0	0	0	49.0297	9.5445	0.95887	0.43645	6	3	1.46683	6.863	6.633	0.87505	0.85412
137	1	0	0	35.5188	9.5795	0.96277	1.7827	5	3	0.97388	5.882	7.561	0.73253	0.98147
142	1	1	0	30.5844	9.6273	0.96809	1.59895	5	4	1.4761	6.056	7.484	0.75781	0.97091
120.5	0	0	0	26.9947	9.6483	0.97043	1.51784	5	3	0.69971	6.132	7.449	0.76885	0.9661
123	0	1	0	51.1569	9.7401	0.98066	1.79905	5	2	0.68574	5.97	7.696	0.74531	1
157.5	1	0	0	51.1569	9.744	0.98109	1.71459	5	4	1.07226	6.039	7.647	0.75534	0.99328
115	1	0	0	47.8688	9.7586	0.98272	0.92055	5	3	0.90163	6.651	7.141	0.84425	0.92384
126.5	1	1	0	55.2901	9.8628	0.99432	0.412	5	3	0.902	7.637	6.241	0.98751	0.80033
155	1	1	0	55.2901	9.9138	1	0.49461	5	3	1.41319	7.723	6.216	1	0.7969

Table 1.

Data set.

The eigenvalues of the $9 \times 9$ matrix $X'X$ are $\lambda_1 = 1.47$, $\lambda_2 = 3.77$, $\lambda_3 = 4.52$, $\lambda_4 = 15.33$, $\lambda_5 = 18.57$, $\lambda_6 = 20.97$, $\lambda_7 = 41.79$, $\lambda_8 = 271.15$ and $\lambda_9 = 239153.68$.

If we use the spectral norm, the corresponding measure of conditioning of $X$ is the number $\kappa(X) = \sqrt{\lambda_{\max}(X'X)/\lambda_{\min}(X'X)}$, where $\kappa(\cdot) \in [1, \infty)$. We obtained $\kappa(X) = 403.27$, which is large, so $X$ may be considered ill-conditioned.
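The condition number can be recomputed from the reported eigenvalues. The short sketch below is plain Python and takes $\kappa(X)$ as the square root of the ratio of the largest to the smallest eigenvalue of $X'X$, which is the spectral condition number of $X$; the variable names are illustrative.

```python
import math

# Eigenvalues of X'X as reported in the text (rounded to two decimals there).
eigenvalues = [1.47, 3.77, 4.52, 15.33, 18.57, 20.97, 41.79, 271.15, 239153.68]

# Spectral condition number of X: sqrt(lambda_max / lambda_min).
kappa = math.sqrt(max(eigenvalues) / min(eigenvalues))
print(round(kappa, 2))
```

The result is about 403, in line with the reported value; the small difference in the decimals is what one would expect from the rounding of the printed eigenvalues.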

In this case, the regression coefficients become insignificant, and it is therefore hard to make valid inferences or predictions using the OLS method. To overcome many of the difficulties associated with the OLS estimates, the use of the LTE, $\hat{\beta}_{LTE} = (X'X + kI)^{-1}(X'y - d\hat{\beta})$ with $k > 0$ and $-\infty < d < +\infty$, where $\hat{\beta} = (X'X)^{-1}X'y$ and $k$ and $d$ are biasing parameters, has become conventional. The LTE is used in the following example. The original model was rewritten in the canonical form (6), $y = Z\gamma + \varepsilon$. The estimators $\hat{\gamma}_{LTE}$, $\hat{\gamma}_{JLTE}$ and $\hat{\gamma}_{MJLTE}$ were computed for $d = 0.10, 0.30, 0.70, 1$ and $k = 0.30, 0.50, 0.70, 1$. The coefficients estimated by these estimators were then transformed back to the original variable scale. The scalar MSE (SMSE = trace(MSEM)) of the estimators for the individual values of $d$ and $k$ is shown in Tables 2–5. The effects of different values of $d$ on the MSE can be seen in Figures 5–8, which show that the proposed estimator (MJLTE) has smaller estimated MSE values than the LTE and JLTE.

	d = 0.10	d = 0.30	d = 0.70	d = 1
MSE(LTE)	810.4511	1037.6454	1900.68971	2905.5467
MSE(JLTE)	733.5563	729.5050	977.1382	1688.9649
MSE(MJLTE)	631.2267	669.0754	967.7905	1289.1137

Table 2.

The estimated MSE values of LTE, JLTE and MJLTE for k = 0.30.

	d = 0.10	d = 0.30	d = 0.70	d = 1
MSE(LTE)	957.6623	1245.7243	2157.4693	3134.9466
MSE(JLTE)	725.6311	752.2125	1102.8970	1872.6471
MSE(MJLTE)	608.2459	656.6023	892.2214	1115.3394

Table 3.

The estimated MSE values of LTE, JLTE and MJLTE for k = 0.50.

	d = 0.10	d = 0.30	d = 0.70	d = 1
MSE(LTE)	1133.2567	1459.7360	2393.9079	2042.7127
MSE(JLTE)	734.5155	795.8311	1234.9720	3340.5986
MSE(MJLTE)	587.0096	633.9972	815.8143	973.5845

Table 4.

The estimated MSE values of LTE, JLTE and MJLTE for k = 0.70.

	d = 0.10	d = 0.30	d = 0.70	d = 1
MSE(LTE)	1415.148	1774.1222	1774.1222	3613.8006
MSE(JLTE)	779.0405	891.0250	891.0250	2274.5162
MSE(MJLTE)	551.0494	588.8484	588.8484	807.4456

Table 5.

The estimated MSE values of LTE, JLTE and MJLTE for k = 1.

Figure 5.
Various MSE of the proposed estimator compared to others for different values of d when k = 0.30.

Figure 6.
Various MSE of the proposed estimator compared to others for different values of d when k = 0.50.

Figure 7.
Various MSE of the proposed estimator compared to others for different values of d when k = 0.70.

Figure 8.
Various MSE of the proposed estimator compared to others for different values of d when k = 1.

We observed that, for all values of d, SMSE(MJLTE) was smaller than both SMSE(JLTE) and SMSE(LTE). The SMSE values of the estimators are affected by increasing values of k; however, the estimator least affected by these changes is the proposed MJLTE. Compared with the other two estimators, the MJLTE gave the best SMSE values for both small and large values of k and d.

6. A simulation study

We illustrate the behavior of the proposed estimator by a Monte Carlo simulation. The purpose of this section is to describe the construction and the details of the simulation, which is designed to evaluate the performance of the estimators LTE, JLTE and MJLTE when the regressors are highly intercorrelated. Following Liu [8] and Kibria [18], the explanatory variables and the response variable are generated by using the following equations

$$x_{ij} = (1 - \gamma^2)^{1/2}z_{ij} + \gamma z_{ip}, \qquad y_i = (1 - \gamma^2)^{1/2}z_i + \gamma z_{ip}, \qquad i = 1, 2, \ldots, n, \; j = 1, 2, \ldots, p$$

where the $z_{ij}$ are independent standard normal pseudo-random numbers and $\gamma$ is specified so that the correlation between any two explanatory variables is given by $\gamma^2$. In this study, we used $\gamma = 0.90, 0.95, 0.99$ to investigate the effects of different degrees of collinearity with sample sizes $n = 20$, 50 and 100, while four different combinations of $(k, d)$ are taken: (0.8, 0.5), (1, 0.7), (1.5, 0.9), (2, 1). The standard deviations considered in the simulation study are $\sigma = 0.1, 1.0, 10$. For each choice of $\gamma$, $\sigma^2$ and $n$, the experiment was replicated 1000 times by generating new error terms. The average SMSE was computed using the following formula

$$\operatorname{SMSE}(\hat{\beta}) = \frac{1}{1000}\sum_{j=1}^{1000}(\hat{\beta}_j - \beta)'(\hat{\beta}_j - \beta)$$

where $\hat{\beta}_j$ is the estimate obtained in the $j$th replication.
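A simulation of this kind can be sketched as follows, assuming NumPy. Several details are assumptions made only for this illustration and are not taken from the chapter: the number of regressors `p = 4`, the use of an extra shared column of normal draws for the common component, the choice of the true coefficient vector as the unit eigenvector of the largest eigenvalue of $X'X$, and the response model $y = X\beta + \sigma\varepsilon$. The collinearity parameter is named `rho` in the code to avoid confusion with the canonical coefficient vector.

```python
import numpy as np

def make_design(n, p, rho, rng):
    """Regressors sqrt(1 - rho^2) z_ij + rho * (shared normal column)."""
    z = rng.standard_normal((n, p + 1))
    return np.sqrt(1.0 - rho**2) * z[:, :p] + rho * z[:, [p]]

def estimators(X, y, k, d):
    """LTE, JLTE and MJLTE in the original coordinates."""
    lam, T = np.linalg.eigh(X.T @ X)
    g_ols = (T.T @ X.T @ y) / lam            # canonical OLS estimate
    f = (lam - d) / (lam + k)                # diagonal of F(k,d)
    return {
        "LTE": T @ (f * g_ols),
        "JLTE": T @ ((2.0 - f) * f * g_ols),
        "MJLTE": T @ ((2.0 - f) * f**2 * g_ols),
    }

def simulate(n=50, p=4, rho=0.95, sigma=1.0, k=0.8, d=0.5, reps=1000, seed=0):
    """Average squared estimation error over reps replications."""
    rng = np.random.default_rng(seed)
    X = make_design(n, p, rho, rng)
    lam, T = np.linalg.eigh(X.T @ X)
    beta = T[:, -1]                          # assumed true coefficients
    total = {"LTE": 0.0, "JLTE": 0.0, "MJLTE": 0.0}
    for _ in range(reps):
        y = X @ beta + sigma * rng.standard_normal(n)   # new error terms
        for name, b in estimators(X, y, k, d).items():
            total[name] += (b - beta) @ (b - beta)
    return {name: s / reps for name, s in total.items()}

results = simulate(reps=200)
for name, value in results.items():
    print(name, value)
```

The design matrix is held fixed across replications and only the error terms are redrawn, matching the description above. Which estimator attains the smallest average error in such a run depends on the assumed coefficient vector, `sigma`, `k` and `d`, so the output of this sketch should not be read as a reproduction of the reported results.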

Let us consider the LTE, JLTE and MJLTE and compute their respective estimated MSE values with the different levels of multicollinearity. According to the simulation results shown in Tables 4 and 5 for LTE, JLTE and MJLTE with increasing levels of multicollinearity there was a general increase in the estimated MSE values Moreover, increasing level of multicollinearity also lead to the increase in the MSE estimators for fixed d and k.

In Table 4, the MSE values of the estimators corresponding to different values of d are given for k = 0.70. For all values of d, the smallest MSE value appears to belong to the MJLTE estimator. The least affected by multicollinearity is MJLTE according to MSE criteria.

In Table 5, the MSE values of the estimators corresponding to different values of d are given for k = 1.

For all values of d, the smallest MSE value appears to belong to the MJLTE estimator. The least affected by multicollinearity is MJLTE according to MSE criteria.

We can see that the MJLTE performs much better than the competing estimators when the explanatory variables are severely collinear. Moreover, in all cases considered, the MJLTE has smaller estimated MSE values than the LTE and JLTE.

7. Conclusion

In this paper, we combined the LTE and JLTE to introduce a new estimator, which we call the MJLTE. Combining the ideas underlying the LTE and JLTE enabled us to create a new estimator for the regression coefficients of a linear regression model that is affected by multicollinearity. Moreover, the use of the jackknife procedure enabled us to produce an estimator with a smaller bias. We compared the MJLTE with its originators, the LTE and JLTE, in terms of the mean square error matrix (MSEM) and found that the MJLTE has a smaller variance than both the LTE and JLTE. Thus, the MJLTE is superior to both the LTE and JLTE under certain conditions.

References

1. Rao CR, Toutenburg H. Linear Models: Least Squares and Alternatives. New York: Springer; 1995
2. Montgomery DC, Peck EA, Vining GG. Introduction to Linear Regression Analysis. New York: Wiley; 2006
3. Chatterjee S, Hadi AS. Regression Analysis by Example. New York: Wiley; 2006
4. Hoerl A, Kennard R. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics. 1970;12:55-67
5. Parker DF, Baye MR. Combining ridge and principal component regression: A money demand illustration. Communications in Statistics: Theory and Methods. 1984;13(2):197-205
6. Liu K. A new class of biased estimate in linear regression. Communications in Statistics: Theory and Methods. 1993;22(2):393-402
7. Stein C. Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. In: Neyman J, editor. Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability. 1956. pp. 197-206
8. Liu K. Using Liu-type estimator to combat collinearity. Communications in Statistics. 2003;32(5):1009-1020
9. Quenouille MH. Notes on bias in estimation. Biometrika. 1956;43:353-360
10. Hinkley DV. Jackknifing in unbalanced situations. Technometrics. 1977;19:285-292
11. Singh B, Chaubey YP, Dwivedi TD. An almost unbiased ridge estimator. Sankhya. 1986;48:342-346
12. Nyquist H. Applications of the jackknife procedure in ridge regression. Computational Statistics & Data Analysis. 1988;6:177-183
13. Batah F, Ramanathan TK, Gore SD. The efficiency of modified jackknife and ridge type regression estimators: A comparison. Surveys in Mathematics and its Applications. 2008;3:111-122
14. Tukey JW. Bias and confidence in not quite large samples (abstract). Annals of Mathematical Statistics. 1958;29:614
15. Yıldız N. On the performance of the jackknified Liu-type estimator in linear regression model. Communications in Statistics: Theory and Methods. 2018;47(9):2278-2290
16. Farebrother RW. Further results on the mean square error of ridge regression. Journal of the Royal Statistical Society, Series B. 1976;38(B):248-250
17. Yatchew A. Semiparametric Regression for the Applied Econometrician. Cambridge: Cambridge University Press; 2003
18. Kibria BMG. Performance of some new ridge regression estimators. Communications in Statistics: Simulation and Computation. 2003;32:2389-2413

[1] 1. Rao RC, Toutenburg H. Linear Models Least Squares and Alternatives. New york: Springer University Press; 1995

[2] 2. Montgomery DC, Peck EA, Vınıng GG. Introduction to Linear Regression Analysis. New york: Wiley; 2006

[3] 3. Chatterjee S, Hadi AS. Regression Analysis by Example. New york: Wiley; 2006

[4] 4. Hoerl A, Kennard R. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics. 1970;12:55-67

[5] 5. Parker DF, Baye MR. Combining ridge and principal component regression: A money demand illustration. Communication in Statistics: Theory and Methods. 1984;13(2):197-205

[6] 6. Liu K. A new class of biased estimate in linear regression. Communication in Statistics: Theory and Methods. 1993;22(2):393-402

[7] 7. Stein C. Inadmissibility of the usual estimator for mean of multivariate normal distribution. In: Neyman J, editor. Proceedings of the Third Berkley Symposium on Mathematical and Statistics Probability. 1956. pp. 197-206

[8] 8. Liu K. Using Liu-type estimator to combat collinearity. Communications in Statistics. 2003;32(5):1009-1020

[9] 9. Quenouille MH. Notes on bias in estimation. Biometrika. 1956;43:353-360

[10] 10. Hinkley DV. Jackknifing in unbalanced situations. Technometrics. 1977;19:285-292

[11] 11. Singh B, Chaubey YP, Dwivedi TD. An almost unbiased ridge estimator. Sankhya. 1986;48:342-346

[12] 12. Nyquist H. Applications of the jackknife procedure in ridge regression. Computational Statistics & Data Analysis. 1988;6:177-183

[13] 13. Batah F, Ramanathan TK, Gore SD. The efficiency of modified jackknife and ridge type regression estimators: A comparison. Surveys in Mathematics and its Applications. 2008;3:111-122

[14] 14. Tukey JW. Bias and confidence in not quite large samples (abstract). Annals of Mathematical Statistics. 1958;29:614

[15] 15. Yıldız N. On the performance of the jackknified Liu-type estimator in linear regression model. Communication in Statistics: Theory and Methods. 2018;47(9):2278-2290

[16] 16. Farebrother RW. Further results on the mean square error of ridge regression. Journal of the Royal Statistical Society, Series B. 1976;38(B):248-250

[17] 17. Yatchew A. Semiparametric Regression for the Applied Econometrician. Cambridge: Cambridge University Press; 2003

[18] 18. Kibria BMG. Performance of some new ridge regression estimators. Communication in Statistics: Simulation and Computation. 2003;32:2389-2413
