In this study, we propose an alternative biased estimator. In the linear regression model, multicollinearity can lead to an ill-conditioned design matrix and thus to an inadequate ordinary least squares (OLS) estimator. Researchers have developed alternative estimation techniques to reduce the instability of the estimates, including biased estimators such as the Stein estimator, the ordinary ridge regression (ORR) estimator and the principal components regression (PCR) estimator. Liu developed the Liu estimator (LE) by combining the Stein estimator with the ORR estimator. Since both the ORR estimator and the LE depend on the OLS estimator, multicollinearity affects them both, and they may therefore give misleading information in its presence. To overcome this problem, Liu introduced a new estimator based on two biasing parameters, k and d. Later work sought an estimator that retains the valuable characteristics of this Liu-type estimator (LTE) but has a smaller bias. We propose a modified jackknifed Liu-type estimator (MJLTE) that combines the ideas underlying the LTE and the jackknifed Liu-type estimator (JLTE). Under the mean square error matrix criterion, the MJLTE is shown to be superior to the LTE and the JLTE under certain conditions. Finally, a real data example and a Monte Carlo simulation are given to illustrate the theoretical results.
- jackknifed estimators
- jackknifed Liu-type estimator
- Liu-type estimator
Regression analysis seeks answers to questions such as the following. Is there a relationship between the dependent and independent variables? If there is, how strong is it, and what form does it take? Is it possible to predict future values of the variables, and how should they be estimated? What is the effect of a particular variable or group of variables on the other variables when certain conditions are controlled? Linear regression is a very important and popular method in statistics. The number of publications on linear regression indexed in the Web of Science between 2014 and 2018 is given in Figure 1.
According to Figure 1, 12,381 studies were published in 2014 and 13,137 in 2018.
The number of publications about linear regression by document types is given in Figure 2.
The most common document type on linear regression is the article, followed by proceedings papers, reviews and editorial material.
The number of publications about linear regression by research area is given in Figure 3.
The research area with the most publications on linear regression is engineering, followed by mathematics, computer science, environmental sciences, ecology and other scientific fields.
The number of publications about linear regression by countries is given in Figure 4.
The countries with the most publications on linear regression are, in order, the USA, China, England, Germany, Canada and Australia.
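In regression analysis, the most commonly used method for estimating the coefficients is ordinary least squares (OLS). We consider the multiple linear regression model
$$y = X\beta + \varepsilon,$$
where $y$ is an $n \times 1$ observable random vector, $X$ is an $n \times p$ matrix of non-stochastic (independent) variables of rank $p$, $\beta$ is a $p \times 1$ vector of unknown parameters associated with $X$, and $\varepsilon$ is an $n \times 1$ vector of error terms with $E(\varepsilon) = 0$ and $\operatorname{Cov}(\varepsilon) = \sigma^2 I_n$. The OLS estimator of $\beta$ is $\hat\beta_{OLS} = (X'X)^{-1}X'y$.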
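The following is a minimal NumPy sketch of the model above and of the OLS estimator $\hat\beta_{OLS} = (X'X)^{-1}X'y$. The sample size, design matrix, coefficients and noise level are illustrative assumptions. They are not values from this study.

```python
import numpy as np

def ols(X, y):
    """OLS estimate (X'X)^{-1} X'y, computed by solving the normal equations."""
    return np.linalg.solve(X.T @ X, X.T @ y)

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.standard_normal((n, p))        # illustrative design of full column rank p
beta = np.array([1.0, -2.0, 0.5])      # illustrative true coefficients
eps = 0.1 * rng.standard_normal(n)     # errors with E(eps) = 0, Cov(eps) = sigma^2 I
y = X @ beta + eps

beta_hat = ols(X, y)
print(beta_hat)                        # close to beta for this well-conditioned X
```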
In regression analysis, there are several methods for estimating the unknown parameters. The most frequently used is the OLS method. Apart from it, there are three general estimation methods: maximum likelihood, generalized least squares and best linear unbiased estimation (BLUE).
The use of once very popular estimators such as the OLS estimator has become limited because of multicollinearity, which makes them unstable. The OLS estimates remain unbiased, but the variances of the estimated regression coefficients are inflated.
Multicollinearity can be defined as a linear (or nearly linear) relationship among the independent variables. In regression analysis, multicollinearity leads to the following problems:
Under perfect multicollinearity, the regression coefficients are indeterminate and their standard errors are infinite. Under near multicollinearity, the standard errors are very large.
Multicollinearity inflates the variances and covariances of the OLS estimates of the regression coefficients.
The $R^2$ value of the model is high, but none of the independent variables is significant according to the partial ($t$) tests.
The direction of the related independent variables’ relations with the dependent variable may contradict the theoretical and empirical expectations.
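To make the variance-inflation point concrete, the sketch below evaluates $\operatorname{Cov}(\hat\beta_{OLS}) = \sigma^2 (X'X)^{-1}$ for two regressors. Their cross-product matrix is constructed to be exactly $n\begin{pmatrix}1 & \rho\\ \rho & 1\end{pmatrix}$, so each coefficient variance is $\sigma^2/(n(1-\rho^2))$. The construction and the values of $n$ and $\rho$ are illustrative assumptions.

```python
import numpy as np

def ols_variances(X, sigma2=1.0):
    """Diagonal of Cov(beta_hat) = sigma^2 (X'X)^{-1}."""
    return sigma2 * np.diag(np.linalg.inv(X.T @ X))

def design_with_correlation(rho, n=200, seed=1):
    """Hypothetical two-column design with X'X = n * [[1, rho], [rho, 1]] exactly."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((n, 2)))   # orthonormal columns: Q'Q = I
    L = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))
    return np.sqrt(n) * Q @ L.T                        # X'X = n L Q'Q L' = n L L'

for rho in (0.0, 0.9, 0.99):
    print(rho, ols_variances(design_with_correlation(rho)))
```

For $\rho = 0.99$, each variance is about 50 times larger than for uncorrelated regressors, because $1/(1 - 0.99^2) \approx 50.25$.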
If the independent variables are interrelated, some of them may need to be removed from the model. But which variables should be removed? Removing the wrong variable from the model results in a specification error, and there are no simple rules for deciding which variables to include in or exclude from the model.
Methods for dealing with multicollinearity include collecting additional data, respecifying the model (for example, replacing two related variables by their sum, taken as a single variable) and using biased estimators. This book provides information on biased estimators used as alternatives to OLS. In the literature, many researchers have developed such biased regression estimators [2, 3].
Examples of such biased estimators include the ordinary ridge regression (ORR) estimator introduced by Hoerl and Kennard,
$$\hat\beta_k = (X'X + kI)^{-1}X'y, \quad k > 0,$$
where $k$ is a biasing parameter. In later years, researchers combined various estimators to obtain better results. For example, Baye and Parker introduced the $r$-$k$ class estimator, which combines the ORR and principal component regression (PCR) estimators. They also showed that the $r$-$k$ class estimator is superior to the PCR estimator under the scalar mean square error (SMSE) criterion.
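Here is a minimal sketch of the ORR estimator $\hat\beta_k = (X'X + kI)^{-1}X'y$. The nearly collinear data and the value of $k$ are illustrative assumptions. Setting $k = 0$ recovers OLS, and a positive $k$ shrinks the estimate.

```python
import numpy as np

def ridge(X, y, k):
    """Ordinary ridge regression estimate (X'X + kI)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

rng = np.random.default_rng(2)
n = 40
x1 = rng.standard_normal(n)
x2 = x1 + 0.01 * rng.standard_normal(n)       # nearly collinear with x1 (assumption)
X = np.column_stack([x1, x2])
y = X @ np.array([1.0, 1.0]) + rng.standard_normal(n)

beta_ols = ridge(X, y, 0.0)                    # k = 0 gives OLS
beta_ridge = ridge(X, y, 0.5)                  # illustrative biasing parameter
print(beta_ols, beta_ridge)
```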
Liu developed the Liu estimator (LE) by combining the Stein estimator with the ORR estimator. Since both the ORR estimator and the LE depend on the OLS estimator, multicollinearity affects both of them. Therefore, they may give misleading information in the presence of multicollinearity.
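The sketch below implements the LE, assuming its common form $\hat\beta_d = (X'X + I)^{-1}(X'y + d\hat\beta_{OLS})$ with $0 < d < 1$. Its dependence on $\hat\beta_{OLS}$ is visible in the code: $d = 1$ returns OLS exactly, and $d = 0$ gives the ridge estimate with $k = 1$. The data are illustrative.

```python
import numpy as np

def liu(X, y, d):
    """Liu estimator (X'X + I)^{-1} (X'y + d * beta_OLS), assumed standard form."""
    XtX, Xty = X.T @ X, X.T @ y
    beta_ols = np.linalg.solve(XtX, Xty)       # the LE is built on the OLS estimate
    return np.linalg.solve(XtX + np.eye(X.shape[1]), Xty + d * beta_ols)

rng = np.random.default_rng(3)
X = rng.standard_normal((30, 3))
y = X @ np.array([0.5, -1.0, 2.0]) + rng.standard_normal(30)
print(liu(X, y, 0.5))
```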
To overcome this problem, Liu introduced a new estimator based on two biasing parameters, $k$ and $d$:
$$\hat\beta_{k,d} = (X'X + kI)^{-1}(X'y - d\hat\beta_{OLS}), \quad k > 0, \ -\infty < d < \infty.$$
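Below is a sketch of the LTE $\hat\beta_{k,d} = (X'X + kI)^{-1}(X'y - d\hat\beta_{OLS})$, using illustrative data. Two special cases serve as checks:

- $d = 0$ gives the ORR estimator.
- $d = -k$ gives back OLS, because then $X'y + k\hat\beta_{OLS} = (X'X + kI)\hat\beta_{OLS}$.

```python
import numpy as np

def liu_type(X, y, k, d):
    """Liu-type estimate (X'X + kI)^{-1} (X'y - d * beta_OLS)."""
    XtX, Xty = X.T @ X, X.T @ y
    beta_ols = np.linalg.solve(XtX, Xty)
    return np.linalg.solve(XtX + k * np.eye(X.shape[1]), Xty - d * beta_ols)

rng = np.random.default_rng(4)
X = rng.standard_normal((30, 3))
y = X @ np.array([1.0, 2.0, 3.0]) + rng.standard_normal(30)
print(liu_type(X, y, k=1.0, d=0.5))
```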
Subsequently, researchers worked on developing an estimator that would retain the valuable characteristics of the Liu-type estimator (LTE) but have a smaller bias. In 1956, Quenouille suggested that bias can be reduced by applying a jackknife procedure to a biased estimator.
The jackknife processes experimental data to obtain statistical estimators of unknown parameters. The estimator is recomputed on truncated samples, each omitting one observation, and the recomputed values are combined. Its advantage is that it yields an estimator with a small bias while retaining the beneficial large-sample properties of the original estimator. In this article, we apply the jackknife technique to the LTE. We also establish the mean squared error superiority, under certain conditions, of the proposed estimator over both the LTE and the jackknifed Liu-type estimator (JLTE).
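The bias-reduction idea can be seen on a textbook case. The sketch uses the ordinary (unweighted, delete-one) jackknife, not the weighted version applied to the LTE later. It is applied to the biased plug-in variance $\frac{1}{n}\sum(x_i - \bar x)^2$. The jackknife estimate $n\hat\theta - (n-1)\bar\theta_{(\cdot)}$, where $\bar\theta_{(\cdot)}$ is the mean of the leave-one-out estimates, reproduces the unbiased sample variance exactly. The data are arbitrary.

```python
import numpy as np

def jackknife(estimator, x):
    """Delete-one jackknife bias-corrected estimate: n*theta - (n-1)*mean(theta_(i))."""
    n = len(x)
    theta = estimator(x)
    loo = np.array([estimator(np.delete(x, i)) for i in range(n)])  # leave-one-out values
    return n * theta - (n - 1) * loo.mean()

x = np.array([2.0, 4.0, 4.0, 5.0, 7.0, 9.0])
plug_in = np.var(x)                 # divides by n, so it is biased downwards
jack = jackknife(np.var, x)         # bias-corrected version
print(plug_in, jack, np.var(x, ddof=1))
```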
The article is organized as follows. The model, the LTE and the JLTE are described in Section 2. The proposed estimator is introduced in Section 3. In Section 4, the superiority of the new estimator over the LTE and the JLTE is studied, and the performance of the modified jackknifed Liu-type estimator (MJLTE) is compared with that of the JLTE. Sections 5 and 6 present a real data example and a simulation study that illustrate the superiority of the proposed estimator.
2. The model
We assume that two or more regressors in $X$ are closely linearly related, so that the model suffers from multicollinearity. The symmetric matrix $X'X$ has an eigenvalue–eigenvector decomposition $X'X = T\Lambda T'$. Here $T$ is an orthogonal matrix and $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_p)$ is a real diagonal matrix. The diagonal elements of $\Lambda$ are the eigenvalues of $X'X$, and the columns of $T$ are the corresponding eigenvectors. The orthogonal (canonical) version of the multiple linear regression model is
$$y = Z\gamma + \varepsilon,$$
where $Z = XT$, $\gamma = T'\beta$ and $Z'Z = \Lambda$. The ordinary least squares estimator (OLSE) of $\gamma$ is given by
$$\hat\gamma = \Lambda^{-1}Z'y.$$
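The canonical transformation can be checked numerically. With $X'X = T\Lambda T'$ taken from a symmetric eigendecomposition, $Z = XT$ satisfies $Z'Z = \Lambda$, and $T\hat\gamma$ recovers $\hat\beta_{OLS}$. Here is a sketch with illustrative data:

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.standard_normal((40, 4))
y = X @ np.array([1.0, 0.0, -1.0, 2.0]) + rng.standard_normal(40)

lam, T = np.linalg.eigh(X.T @ X)   # X'X = T diag(lam) T', T orthogonal
Z = X @ T                          # canonical regressors
gamma_hat = (Z.T @ y) / lam        # OLSE of gamma: Lambda^{-1} Z'y
beta_hat = T @ gamma_hat           # back to the original parameterization, beta = T gamma
print(gamma_hat, beta_hat)
```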
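Liu proposed a new biased estimator for $\gamma$, called the Liu-type estimator (LTE), defined as
$$\hat\gamma_{k,d} = (\Lambda + kI)^{-1}(\Lambda - dI)\hat\gamma,$$
which has bias vector
$$\operatorname{Bias}(\hat\gamma_{k,d}) = -(k+d)(\Lambda + kI)^{-1}\gamma$$
and covariance matrix
$$\operatorname{Cov}(\hat\gamma_{k,d}) = \sigma^2(\Lambda + kI)^{-1}(\Lambda - dI)\Lambda^{-1}(\Lambda - dI)(\Lambda + kI)^{-1}.$$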
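Because $\Lambda$ is diagonal, the canonical LTE $\hat\gamma_{k,d} = (\Lambda + kI)^{-1}(\Lambda - dI)\hat\gamma$, its bias and its covariance all act component-wise. The sketch writes $a_i = (\lambda_i - d)/(\lambda_i + k)$ for the diagonal of $(\Lambda + kI)^{-1}(\Lambda - dI)$. The eigenvalues, $\gamma$, $\sigma^2$, $k$ and $d$ are illustrative assumptions; in practice $\gamma$ and $\sigma^2$ are unknown.

```python
import numpy as np

def lte_moments(lam, gamma, sigma2, k, d):
    """Multipliers, bias vector and covariance matrix of the canonical LTE."""
    a = (lam - d) / (lam + k)                  # diagonal of (Lambda + kI)^{-1}(Lambda - dI)
    bias = -(k + d) / (lam + k) * gamma        # -(k + d)(Lambda + kI)^{-1} gamma
    cov = sigma2 * np.diag(a ** 2 / lam)       # sigma^2 A Lambda^{-1} A (all diagonal)
    return a, bias, cov

lam = np.array([0.01, 0.5, 3.0, 20.0])         # small lam_min mimics multicollinearity
gamma = np.array([1.0, -1.0, 0.5, 2.0])
a, bias, cov = lte_moments(lam, gamma, sigma2=1.0, k=0.5, d=0.3)
mse = np.trace(cov) + bias @ bias              # scalar MSE = trace(MSEM)
print(bias, np.diag(cov), mse)
```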
Quenouille and Tukey introduced the jackknife technique to reduce bias. Following Hinkley, Singh et al., Nyquist and Batah et al., we can propose the jackknifed form of the LTE. Hinkley stated that, with few exceptions, the jackknife had been applied to balanced models. After some algebraic manipulation, the corresponding jackknife estimator obtained by deleting the $i$th observation is
where is the row of , is the Liu-type residual, is the distance factor and . In view of the non-zero value of , which reflects the lack of balance in the model, we use the weighted jackknife procedure. The weighted pseudo-values are thus defined as
the weighted jackknifed estimator of γ is obtained as
However, since , we obtain
From (9) we have
The variance of the JLTE is
The MSEMs of the JLTE and the LTE are
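The sketch below computes the JLTE and its moments under an assumption. By analogy with the jackknifed ridge estimator of Singh et al., we assume the weighted jackknife reduces to $\hat\gamma_{JLTE} = [I + (k+d)(\Lambda + kI)^{-1}]\hat\gamma_{k,d} = W\hat\gamma$, where $\hat\gamma_{k,d} = (\Lambda + kI)^{-1}(\Lambda - dI)\hat\gamma$ and $W = I - (k+d)^2(\Lambda + kI)^{-2}$. Under that assumption, the bias is $-(k+d)^2(\Lambda + kI)^{-2}\gamma$, the covariance is $\sigma^2 W\Lambda^{-1}W$, and the MSEM is their sum plus the bias outer product. All inputs are illustrative.

```python
import numpy as np

def jlte(gamma_hat, lam, k, d):
    """Assumed closed form: [I - (k + d)^2 (Lambda + kI)^{-2}] gamma_hat."""
    return (1.0 - ((k + d) / (lam + k)) ** 2) * gamma_hat

def jlte_moments(lam, gamma, sigma2, k, d):
    """Bias, covariance and MSEM of the JLTE under the assumed closed form."""
    b = (k + d) / (lam + k)                 # diagonal of (k + d)(Lambda + kI)^{-1}
    w = 1.0 - b ** 2                        # diagonal of W
    bias = -(b ** 2) * gamma                # (W - I) gamma
    cov = sigma2 * np.diag(w ** 2 / lam)    # sigma^2 W Lambda^{-1} W
    msem = cov + np.outer(bias, bias)       # MSEM = Cov + bias bias'
    return bias, cov, msem

lam = np.array([0.01, 0.5, 3.0, 20.0])
gamma_hat = np.array([1.2, -0.8, 0.4, 2.1])
k, d = 0.5, 0.3
lte = (lam - d) / (lam + k) * gamma_hat     # canonical LTE
print(jlte(gamma_hat, lam, k, d))
```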
3. Our novel MJLTE estimator
In this section, we present the new estimator for $\gamma$ proposed by Yıldız, designated the modified jackknifed Liu-type estimator (MJLTE) and denoted by $\hat\gamma_{MJLTE}$.
Note that the MJLTE in (22) is obtained in the same way as the JLTE, but by plugging in the LTE instead of the OLSE. The bias, covariance and mean squared error matrix (MSEM) of $\hat\gamma_{MJLTE}$ are obtained as
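Here is a sketch of the MJLTE under stated assumptions. As for the jackknifed ridge estimator of Singh et al., we assume the JLTE is $W\hat\gamma$ with $W = I - (k+d)^2(\Lambda + kI)^{-2}$. Plugging in the LTE $A\hat\gamma$, with $A = (\Lambda + kI)^{-1}(\Lambda - dI)$, then gives $\hat\gamma_{MJLTE} = WA\hat\gamma$. Its bias is $(WA - I)\gamma$ and its covariance is $\sigma^2 WA\Lambda^{-1}AW$. The inputs are illustrative.

```python
import numpy as np

def mjlte(gamma_hat, lam, k, d):
    """Assumed MJLTE: W A gamma_hat, i.e. the JLTE form applied to the LTE."""
    a = (lam - d) / (lam + k)                  # diagonal of A
    w = 1.0 - ((k + d) / (lam + k)) ** 2       # diagonal of W
    return w * a * gamma_hat

def mjlte_moments(lam, gamma, sigma2, k, d):
    """Bias, covariance and MSEM of the MJLTE under the assumed form."""
    m = (1.0 - ((k + d) / (lam + k)) ** 2) * (lam - d) / (lam + k)   # diagonal of W A
    bias = (m - 1.0) * gamma                   # (W A - I) gamma
    cov = sigma2 * np.diag(m ** 2 / lam)       # sigma^2 W A Lambda^{-1} A W
    return bias, cov, cov + np.outer(bias, bias)

lam = np.array([0.01, 0.5, 3.0, 20.0])
gamma_hat = np.array([1.2, -0.8, 0.4, 2.1])
k, d = 0.5, 0.3
print(mjlte(gamma_hat, lam, k, d))
```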
4. Properties of the MJLTE
One of the most prominent features of the MJLTE is that, under some conditions, its bias is smaller than that of the LTE from which it originates.
holds true for and
The difference is positive because it consists of a product of squared terms in the expression above. This completes the proof.
is a positive number. Thus we conclude that H is a positive definite matrix. This completes the proof.
Next, we derive a necessary and sufficient condition for the MJLTE to outperform the LTE under the MSEM criterion. The proof requires the following lemma.
if and only if
Consider the difference between the MSEMs of the LTE and the MJLTE. From (21) and (25), we have
is a positive definite matrix. We have seen from Theorem 2 that is a positive definite matrix. Therefore, the difference is nonnegative definite if and only if is nonnegative definite. The matrix can be written as
Since the matrix is symmetric and positive definite, Lemma 4.1 implies that is nonnegative definite if and only if the following inequality holds:
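We assume Lemma 4.1 is the standard result used in such MSEM comparisons: for a positive definite $M$ and a vector $\alpha$, $M - \alpha\alpha'$ is nonnegative definite if and only if $\alpha'M^{-1}\alpha \le 1$. Under that assumption, the lemma can be checked numerically. The sketch builds a random positive definite $M$ and scales $\alpha$ so that $\alpha'M^{-1}\alpha$ equals a chosen value. It then inspects the smallest eigenvalue of $M - \alpha\alpha'$. The matrices are illustrative.

```python
import numpy as np

def is_nnd(S, tol=1e-10):
    """Nonnegative definiteness of a symmetric matrix via its smallest eigenvalue."""
    return np.linalg.eigvalsh(S).min() >= -tol

rng = np.random.default_rng(6)
p = 4
G = rng.standard_normal((p, p))
M = G @ G.T + p * np.eye(p)                 # positive definite by construction
u = rng.standard_normal(p)
q_u = u @ np.linalg.solve(M, u)             # u' M^{-1} u

for target in (0.5, 2.0):
    alpha = u * np.sqrt(target / q_u)       # now alpha' M^{-1} alpha == target
    print(target, is_nnd(M - np.outer(alpha, alpha)))
```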
4.1 Comparison between the JLTE and the MJLTE
Here, we show that the MJLTE outperforms the JLTE in terms of the sampling variance.
where and , respectively. It can be shown that
Hence the difference of the covariance matrices is positive definite, which completes the proof.
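The sketch below compares the covariances under stated assumptions. We assume the JLTE is $W\hat\gamma$ and the MJLTE is $WA\hat\gamma$, with $W = I - (k+d)^2(\Lambda + kI)^{-2}$ and $A = (\Lambda + kI)^{-1}(\Lambda - dI)$. Both are then diagonal in canonical coordinates, so $\operatorname{Cov}(\text{JLTE}) - \operatorname{Cov}(\text{MJLTE}) = \sigma^2 W(\Lambda^{-1} - A\Lambda^{-1}A)W$. Its diagonal entries are $\sigma^2 w_i^2 (1 - a_i^2)/\lambda_i$, which are nonnegative whenever $|a_i| \le 1$, that is, $-k \le d \le 2\lambda_i + k$. This component-wise derivation is ours and is not a restatement of the theorem above. The sketch evaluates it for illustrative values.

```python
import numpy as np

def cov_difference(lam, sigma2, k, d):
    """Diagonal of Cov(JLTE) - Cov(MJLTE) under the assumed canonical forms."""
    a = (lam - d) / (lam + k)                  # diagonal of A
    w = 1.0 - ((k + d) / (lam + k)) ** 2       # diagonal of W
    return sigma2 * w ** 2 * (1.0 - a ** 2) / lam

lam = np.array([0.01, 0.5, 3.0, 20.0])
diff = cov_difference(lam, sigma2=1.0, k=0.5, d=0.3)
print(diff)   # all entries positive here because |a_i| < 1 for every i
```

With $d$ above $2\lambda_1 + k$ (for example $d = 0.62$ here), the first entry becomes negative.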
In the following theorem, we obtain a necessary and sufficient condition for the MJLTE to outperform the JLTE under the MSEM criterion. The proof is similar to that of Theorem 4.3.
We have seen from Theorem 4.4 that is a positive definite matrix. Therefore, the difference is nonnegative definite if and only if is nonnegative definite. The matrix can be written as
The difference is nonnegative definite if and only if is nonnegative definite. Since the matrix is symmetric and positive definite, Lemma 4.1 implies that is nonnegative definite if and only if the inequality
is satisfied. Theorems 4–6 show that, under the conditions stated in them, the proposed estimator is superior to the LTE and the JLTE in the MSEM sense.
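In practice, conditions of this kind are checked by substituting estimates for the unknown $\gamma$ and $\sigma^2$. The sketch below assumes the forms JLTE $= W\hat\gamma$ and MJLTE $= WA\hat\gamma$, with $W = I - (k+d)^2(\Lambda + kI)^{-2}$ and $A = (\Lambda + kI)^{-1}(\Lambda - dI)$. It forms MSEM(JLTE) $-$ MSEM(MJLTE) directly and tests nonnegative definiteness through the smallest eigenvalue. All inputs are illustrative.

```python
import numpy as np

def msem(m, lam, gamma, sigma2):
    """MSEM of an estimator diag(m) @ gamma_hat in canonical form."""
    bias = (m - 1.0) * gamma
    return sigma2 * np.diag(m ** 2 / lam) + np.outer(bias, bias)

def mjlte_beats_jlte(lam, gamma, sigma2, k, d, tol=1e-10):
    """True if MSEM(JLTE) - MSEM(MJLTE) is nonnegative definite (assumed forms)."""
    a = (lam - d) / (lam + k)
    w = 1.0 - ((k + d) / (lam + k)) ** 2
    delta = msem(w, lam, gamma, sigma2) - msem(w * a, lam, gamma, sigma2)
    return np.linalg.eigvalsh(delta).min() >= -tol

lam = np.array([0.01, 0.5, 3.0, 20.0])
print(mjlte_beats_jlte(lam, np.array([1.0, -1.0, 0.5, 2.0]), 1.0, 0.5, 0.3))
```

The outcome depends on $\gamma$ and $\sigma^2$, which is why the superiority is conditional.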
5. Numerical example
To motivate the problem of estimation in the linear regression model, we consider the hedonic prices of housing attributes. The data consist of 92 detached homes in the Ottawa area sold during 1987 (see Yatchew).
Let y be the
The eigenvalues of the matrix : 9 × 9 are given by , , , , , , , and .
If we use the spectral norm, then the corresponding measure of conditioning of is the number where. We obtained , which is large and so may be considered as being ill-conditioned.
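The conditioning measure can be computed from the eigenvalues of $X'X$. Conventions differ, so the sketch reports both common ones:

- $\lambda_{\max}/\lambda_{\min}$ for $X'X$.
- Its square root, which equals the spectral-norm condition number of $X$ returned by `numpy.linalg.cond`.

The nearly collinear design is an illustrative assumption and is not the housing data.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 92
x1 = rng.standard_normal(n)
x2 = x1 + 0.05 * rng.standard_normal(n)    # strongly related to x1 (assumption)
x3 = rng.standard_normal(n)
X = np.column_stack([x1, x2, x3])

lam = np.linalg.eigvalsh(X.T @ X)          # eigenvalues in ascending order
kappa_xtx = lam[-1] / lam[0]               # lambda_max / lambda_min for X'X
kappa_x = np.linalg.cond(X)                # sigma_max / sigma_min of X (2-norm)
print(lam, kappa_xtx, kappa_x)
```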
In this case, the regression coefficients become insignificant, so it is hard to make valid inferences or predictions using the OLS method. To overcome many of the difficulties associated with the OLS estimates, we use the LTE and its jackknifed versions in the following example, with conventional choices of the biasing parameters $k$ and $d$. The original model was transformed into the canonical form (6), and the LTE, JLTE and MJLTE were computed from the transformed data. The estimated coefficients were then converted back to the original scale of the variables. The scalar MSE values (SMSE = trace(MSEM)) of the estimators for individual values of $k$ and $d$ are shown in Tables 2–5. The effects of different values of $k$ on the MSE can be seen in Figures 5–8. These show that the proposed estimator (MJLTE) has smaller estimated MSE values than the LTE and the JLTE.
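Estimated SMSE values of the kind reported in Tables 2–5 require replacing the unknown $\gamma$ and $\sigma^2$ by estimates. The sketch assumes the OLS plug-ins $\hat\gamma$ and $\hat\sigma^2 = \lVert y - Z\hat\gamma\rVert^2/(n-p)$. It also assumes the forms LTE $= A\hat\gamma$, JLTE $= W\hat\gamma$ and MJLTE $= WA\hat\gamma$, with $A = (\Lambda + kI)^{-1}(\Lambda - dI)$ and $W = I - (k+d)^2(\Lambda + kI)^{-2}$. The data are synthetic and the grid of $k$ values is hypothetical. The $d$ values match the table columns.

```python
import numpy as np

def estimated_smse(X, y, k, d):
    """Plug-in SMSE (trace of MSEM) of LTE, JLTE and MJLTE, using OLS estimates."""
    n, p = X.shape
    lam, T = np.linalg.eigh(X.T @ X)
    Z = X @ T
    g = (Z.T @ y) / lam                          # OLSE of gamma
    s2 = np.sum((y - Z @ g) ** 2) / (n - p)      # estimate of sigma^2
    a = (lam - d) / (lam + k)                    # diagonal of A
    w = 1.0 - ((k + d) / (lam + k)) ** 2         # diagonal of W (assumed form)
    res = {}
    for name, m in (("LTE", a), ("JLTE", w), ("MJLTE", w * a)):
        # estimator = diag(m) g: variance trace plus squared plug-in bias
        res[name] = s2 * np.sum(m ** 2 / lam) + np.sum(((m - 1.0) * g) ** 2)
    return res

rng = np.random.default_rng(8)
x1 = rng.standard_normal(92)
X = np.column_stack([x1, x1 + 0.05 * rng.standard_normal(92), rng.standard_normal(92)])
y = X @ np.array([1.0, 1.0, -0.5]) + rng.standard_normal(92)

for k in (0.1, 0.5, 1.0):                        # hypothetical k grid
    for d in (0.10, 0.30, 0.70, 1.0):            # d values of the table columns
        print(k, d, estimated_smse(X, y, k, d))
```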
Tables 2–5. Estimated SMSE values for selected values of $k$; columns: d = 0.10, d = 0.30, d = 0.70, d = 1.
For all values of $d$, SMSE(MJLTE) is smaller than both SMSE(JLTE) and SMSE(LTE). The SMSE values of all the estimators change as $k$ increases, but the MJLTE is the least affected by these changes. Compared with the other two estimators, the MJLTE gives the smallest SMSE values for both small and large values of $k$ and $d$.
6. A simulation study
We illustrate the behavior of the proposed estimator by a Monte Carlo simulation. The main purpose of this section is to describe the construction and details of a simulation designed to evaluate the performance of the LTE, JLTE and MJLTE when the regressors are highly intercorrelated. Following Liu and Kibria, the explanatory variables and the response variable are generated by
$$x_{ij} = (1 - \rho^2)^{1/2} z_{ij} + \rho z_{i,p+1}, \qquad i = 1, \ldots, n, \quad j = 1, \ldots, p,$$
$$y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i, \qquad i = 1, \ldots, n,$$
where the $z_{ij}$ are independent standard normal pseudo-random numbers and $\rho$ is specified so that the correlation between any two explanatory variables is $\rho^2$. Different values of $\rho$ were used to investigate the effects of different degrees of collinearity. The sample sizes included $n = 100$, and four combinations of $(k, d)$ were taken: (0.8, 0.5), (1, 0.7), (1.5, 0.9) and (2, 1). Several values of the error standard deviation $\sigma$ were considered. For each choice of $\rho$, $\sigma$ and $n$, the experiment was replicated 1000 times by generating new error terms. The average SMSE was computed as
$$\operatorname{SMSE}(\hat\beta) = \frac{1}{1000}\sum_{r=1}^{1000} (\hat\beta_r - \beta)'(\hat\beta_r - \beta),$$
where $\hat\beta_r$ is the estimate obtained in the $r$th replication.
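Below is a compact sketch of this kind of design, with $x_{ij} = \sqrt{1-\rho^2}\,z_{ij} + \rho z_{i,p+1}$ and the $X$ matrix held fixed across replications. It rests on assumptions not stated in the text:

- $\beta$ is the normalized eigenvector of the largest eigenvalue of $X'X$, a common convention in such studies.
- The JLTE and MJLTE use the assumed forms $W\hat\gamma$ and $WA\hat\gamma$, with $W = I - (k+d)^2(\Lambda + kI)^{-2}$ and $A = (\Lambda + kI)^{-1}(\Lambda - dI)$.
- The SMSE is computed in canonical coordinates. This gives the same value as on the original scale because $T$ is orthogonal.

The parameter values in the usage line are illustrative, and fewer replications are used there for speed.

```python
import numpy as np

def make_design(n, p, rho, rng):
    """x_ij = sqrt(1 - rho^2) z_ij + rho z_{i,p+1}; population pairwise correlation rho^2."""
    z = rng.standard_normal((n, p + 1))
    return np.sqrt(1.0 - rho ** 2) * z[:, :p] + rho * z[:, [p]]

def simulate_smse(n, p, rho, sigma, k, d, reps=1000, seed=0):
    """Average SMSE of LTE, JLTE and MJLTE over reps new error vectors (X fixed)."""
    rng = np.random.default_rng(seed)
    X = make_design(n, p, rho, rng)
    lam, T = np.linalg.eigh(X.T @ X)
    beta = T[:, -1]                          # assumption: eigenvector of largest eigenvalue
    gamma = T.T @ beta
    Z = X @ T
    a = (lam - d) / (lam + k)                # LTE multipliers
    w = 1.0 - ((k + d) / (lam + k)) ** 2     # assumed jackknife multipliers
    mult = {"LTE": a, "JLTE": w, "MJLTE": w * a}
    totals = dict.fromkeys(mult, 0.0)
    for _ in range(reps):
        y = X @ beta + sigma * rng.standard_normal(n)   # new error terms each replication
        g = (Z.T @ y) / lam                             # OLSE of gamma
        for name, m in mult.items():
            totals[name] += np.sum((m * g - gamma) ** 2)
    return {name: t / reps for name, t in totals.items()}

print(simulate_smse(n=100, p=4, rho=0.99, sigma=1.0, k=1.0, d=0.7, reps=200))
```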
We now compare the LTE, JLTE and MJLTE in terms of their estimated MSE values at different levels of multicollinearity. According to the simulation results for the LTE, JLTE and MJLTE shown in Tables 4 and 5, the estimated MSE values generally increase as the level of multicollinearity increases. Moreover, an increasing level of multicollinearity also leads to an increase in the MSE of the estimators for fixed
In Table 4, the MSE values of the estimators corresponding to different values of
In Table 5, the MSE values of the estimators corresponding to different values of
For all values of
The MJLTE performs much better than the competing estimators when the explanatory variables are severely collinear. Moreover, in all cases considered, the MJLTE has smaller estimated MSE values than the LTE and the JLTE.
In this paper, we combined the LTE and JLTE estimators to introduce a new estimator, the MJLTE. Combining the ideas underlying the LTE and the JLTE enabled us to construct a new estimator for the regression coefficients of a linear regression model affected by multicollinearity. The jackknife procedure also enabled us to produce an estimator with a smaller bias. Comparing the MJLTE with the LTE and the JLTE in terms of the MSEM, we found that the MJLTE has a smaller variance than both. Thus, the MJLTE is superior to both the LTE and the JLTE under certain conditions.