Open access peer-reviewed chapter

On the Selection of Power Transformation Parameters in Regression Analysis

Written By

Haithem Taha Mohammed Ali and Azad Adil Shareef

Submitted: 26 May 2023 Reviewed: 21 June 2023 Published: 27 December 2023

DOI: 10.5772/intechopen.112297

From the Edited Volume

Research Advances in Data Mining Techniques and Applications

Edited by Yves Rybarczyk

Abstract

In multiple linear regression, several classical methods are used to estimate the parameters of the power transformation models applied to the response variable. Traditionally, these parameters are estimated, together with the other model parameters, by either Maximum Likelihood Estimation or Bayesian methods. In this chapter, attention is paid to four indicators of the efficiency and reliability of regression modeling, and to the possibility of using them as decision rules for choosing the optimal power parameter. The indicators are the coefficient of determination and the p-value of the general linear F-test statistic, as well as the p-value of the Shapiro-Wilk test (SWT) statistic for the normality of the residuals of both the estimated linear regression of the transformed response vector and the estimated nonlinear regression of the original response vector obtained by back-transforming the power transformation model. Real data were used, and a computational algorithm was proposed to estimate the optimal power parameter. The authors concluded that the multiplicity of indicators does not lead to a single optimal value for the power parameter, but this multiplicity may be useful in strengthening the decision-making ability.

Keywords

  • Box-Cox transformation
  • multiple linear regression
  • Shapiro-Wilk test
  • general linear F-test statistic
  • Maximum Likelihood Estimation

1. Introduction

It is known that when some of the assumptions of statistical analysis are not met by the inputs of a linear regression, the outputs of the statistical inference will be unreliable. The two most important conditions that must be fulfilled in the estimated linear regression model are the normality of the residuals and the constancy of their variance, and these are also the conditions most frequently violated [1]. Failure to satisfy these conditions also means that the estimated response mean function does not have a straight-line shape in its relationships with the explanatory variables. The lack of these conditions likewise becomes evident in complicated nonlinear models when the residuals in the original model are additive [2]. Therefore, data transformation tools to linearity, especially those that belong to the power transformation (PT) family, have been used to greatly enhance the utility of statistical modeling and to obtain a better fit as a general goal. That is, the main goal of data transformation is to prepare the data to be compatible with the requirements of statistical inference tools [3]. In short, the conditions confirming the best estimate of a linear regression model are that (i) the transformed response should be normally distributed with constant variance for each value of the predictor variables [4], or (ii) it should at least be closer to a good fit to normality [5].

A large body of literature provides various suggestions and developments on the use of PT for continuous variables in regression models, whether for the dependent variable, the independent variables, or both. In this regard, two main research directions can be distinguished. The first is concerned with various proposals and strategies for developing the mathematical functions of PT models to address more complexities in data patterns; see, for example, [6, 7, 8, 9, 10, 11]. The second direction, which is the focus of this chapter, is concerned with methods for selecting the optimal power parameters for different PT families and datasets; see, for example, [8, 12, 13, 14, 15, 16, 17, 18]. There are many methods used to estimate the power parameters in Multiple Linear Regression (MLR). Traditionally, these parameters can be estimated using either Maximum Likelihood Estimation (MLE) or Bayesian methods in conjunction with the other model parameters [13]. It is also known that MLE is very sensitive to outliers [8]. Therefore, in addition to the traditional estimation methods, some other methods have been proposed that are based on indicators of the efficiency of the statistical modeling. These indicators are used as decision rules to choose the optimal value of the PT parameter [14, 15]. In general, the multiplicity of criteria used for a particular dataset does not lead to a single value, or even to a closed feasible region, for the power parameter. Moreover, the values of the power parameters differ according to the transformation models.

Outside of the traditional methods, Bartlett's approach was to choose a transformation that minimizes some measure of the heterogeneity of variance [16]. Tukey, 1949 [17] used efficiency indicators of ANOVA, such as minimization of the F-test value for non-additivity, minimization of the F ratio for interaction versus error, and maximization of the F ratio for treatments versus error [18]. Anscombe, 1961 and Anscombe and Tukey, 1963 indicated how a certain function of the residuals can provide insight into the PT model [19]. Other authors went on to propose algorithms for power parameter selection using goodness-of-fit tests for the normality of the transformed data [12, 20] and the coefficient of determination of the estimated linear regression of the transformed response [15, 21, 22].

The chapter is divided into four sections. The second section gives a short review of PT models. The third section presents the application and the computational algorithm, and the fourth section presents the conclusions.

2. Power transformation: short review

Finney, 1947 [23] assumed the following simple family of PT to transform both sides of the dose-response regression Y = ηx^β + ε,

$$
\psi(y),\ \psi(x) = \begin{cases} y^{\lambda_1},\ x^{\lambda_2} & \lambda \neq 0 \\ \ln y,\ \ln x & \lambda = 0 \end{cases} \tag{1}
$$

to form a monotonic simple linear regression E[ψ(y)] = η + ψ(x)β for the nonlinear relationship of the positive response Y given the positive dose X. Here λ₁ and λ₂ are the power parameters that can be estimated from the data.

Tukey, in 1957, developed another simple family of PT to accommodate negative values of y by assuming [24],

$$
\psi(y) = \begin{cases} (y+a)^{\lambda} & \lambda \neq 0 \\ \ln(y+a) & \lambda = 0 \end{cases} \tag{2}
$$

where the value of a can be chosen such that y + a > 0. In general, it is assumed that for each λ, ψ(y) is a monotonic function of y over the admissible range [13].

Considering the common family of the Box-Cox transformation (BCT) [13], it is possible to propose the following generalized form,

$$
\psi(y) = \begin{cases} \dfrac{(y+a)^{\lambda} - b}{\lambda\,\mathrm{gm}(y+a)^{\lambda-1}} & \lambda \neq 0 \\[2mm] \mathrm{gm}(y+a)\,\ln(y+a) & \lambda = 0 \end{cases} \tag{3}
$$

where a and b are constants, a is chosen so that y + a > 0, and gm(y + a) represents the geometric mean of the shifted response y + a. Eq. (3) of the BCT family holds for y + a > 0, that is, for y > −a. A number of PT models have been derived from this family; the following PT is equivalent to the simple version of the Finney transformation, Eq. (1), when a = 0, b = 1 and gm(y + a) = 1,

$$
\psi(y) = \begin{cases} \dfrac{y^{\lambda} - 1}{\lambda} & \lambda \neq 0 \\ \ln y & \lambda = 0 \end{cases} \tag{4}
$$

The following PT is an extended form of Eq. (4), obtained when a ≠ 0, b = 1 and gm(y + a) = 1; it is equivalent to the Tukey transformation of Eq. (2), since the analysis of variance is unchanged by a linear transformation [24],

$$
\psi(y) = \begin{cases} \dfrac{(y+a)^{\lambda} - 1}{\lambda} & \lambda \neq 0 \\ \ln(y+a) & \lambda = 0 \end{cases} \tag{5}
$$

While for a = 0 and b = 1, with the geometric-mean scaling retained, we get a PT model equivalent to Eq. (4),

$$
\psi(y) = \begin{cases} \dfrac{y^{\lambda} - 1}{\lambda\,\mathrm{gm}(y)^{\lambda-1}} & \lambda \neq 0 \\[2mm] \mathrm{gm}(y)\,\ln y & \lambda = 0 \end{cases} \tag{6}
$$

Finally, for a ≠ 0 and b = 1, we get the following PT model,

$$
\psi(y) = \begin{cases} \dfrac{(y+a)^{\lambda} - 1}{\lambda\,\mathrm{gm}(y+a)^{\lambda-1}} & \lambda \neq 0 \\[2mm] \mathrm{gm}(y+a)\,\ln(y+a) & \lambda = 0 \end{cases} \tag{7}
$$

The three main properties of the PT family are as follows. The first is continuity as λ goes to zero: considering the BCT of Eq. (4) and using L'Hospital's rule, it can be shown that lim_{λ→0} (y^λ − 1)/λ = ln y. The second property is the concavity of the transformation function ψ(y), which leads to a nonlinear regression model for the original data after back-transforming the model fitted to the transformed data. The third property is flexibility, as transformation by powers is suitable for dealing with many data structures and for achieving a number of goals.
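To make the family concrete, here is a minimal R sketch of the shifted, geometric-mean-scaled BCT of Eq. (3); the function name bct and its default arguments are illustrative choices, not part of the chapter.

```r
# Shifted, geometric-mean-scaled Box-Cox transformation of Eq. (3):
#   psi(y) = ((y + a)^lambda - b) / (lambda * gm(y + a)^(lambda - 1)),  lambda != 0
#   psi(y) =   gm(y + a) * log(y + a),                                  lambda  = 0
bct <- function(y, lambda, a = 0, b = 1, use_gm = TRUE) {
  ys <- y + a                                    # shift so that y + a > 0
  if (any(ys <= 0)) stop("choose 'a' so that y + a > 0")
  gm <- if (use_gm) exp(mean(log(ys))) else 1    # geometric mean of the shifted response
  if (abs(lambda) < .Machine$double.eps^0.5) {
    gm * log(ys)                                 # limiting case as lambda -> 0
  } else {
    (ys^lambda - b) / (lambda * gm^(lambda - 1))
  }
}

# Eq. (4) corresponds to a = 0, b = 1 without geometric-mean scaling, and the
# continuity property lim_{lambda -> 0} (y^lambda - 1)/lambda = ln(y) can be
# checked numerically:
y <- c(2, 5, 9)
all.equal(bct(y, 1e-9, use_gm = FALSE), log(y))   # TRUE
```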

In BCT family models, if the transformation parameter is negative, the order of the variable is reversed; that is, when Y is increasing, ψ(y) is decreasing for λ < 0. So, Tukey, 1977 proposed the following model to maintain the order of the transformed variable [24],

$$
\psi(y) = \begin{cases} y^{\lambda} & \lambda > 0 \\ \ln y & \lambda = 0 \\ -y^{\lambda} & \lambda < 0 \end{cases} \tag{8}
$$
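Assuming the sign convention of Eq. (8), a minimal R sketch of this order-preserving (signed) power transformation might look as follows; the name signed_power is illustrative.

```r
# Order-preserving power transformation of Eq. (8); assumes y > 0.
signed_power <- function(y, lambda) {
  if (lambda > 0) {
    y^lambda
  } else if (lambda == 0) {
    log(y)
  } else {
    -y^lambda      # negating keeps psi(y) increasing in y when lambda < 0
  }
}
```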

BCT, according to Eq. (4) and Eq. (6), is applicable only to positive data. So, Yeo and Johnson, 2000 [25] generalized BCT to include negative and positive values in datasets. They used a smoothness condition to combine the transformations for positive and negative observations, obtaining a one-parameter transformation family. For y ∈ ℝ, the Yeo-Johnson Transformation (YJT) is given by,

$$
\Psi(y) = \begin{cases} \dfrac{(y+1)^{\lambda} - 1}{\lambda} & \lambda \neq 0,\ y \geq 0 \\[1.5mm] \ln(y+1) & \lambda = 0,\ y \geq 0 \\[1.5mm] -\dfrac{(-y+1)^{2-\lambda} - 1}{2-\lambda} & \lambda \neq 2,\ y < 0 \\[1.5mm] -\ln(-y+1) & \lambda = 2,\ y < 0 \end{cases} \tag{9}
$$

Three properties of the YJT are as follows [26]: (i) for y ≥ 0, Ψ(y) ≥ 0, and for y < 0, Ψ(y) < 0; (ii) Ψ(y) is continuous in λ, including at λ = 0 and λ = 2; (iii) Ψ(y) is convex in y for λ > 1 and concave for λ < 1.
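For illustration, the YJT of Eq. (9) can be sketched in R as follows; the function name yeo_johnson is illustrative, and several R packages ship their own implementations.

```r
# Yeo-Johnson transformation of Eq. (9), defined for all real y
yeo_johnson <- function(y, lambda) {
  eps <- .Machine$double.eps^0.5
  out <- numeric(length(y))
  pos <- y >= 0
  # branch for y >= 0: Box-Cox-type transform of (y + 1)
  out[pos] <- if (abs(lambda) > eps) {
    ((y[pos] + 1)^lambda - 1) / lambda
  } else {
    log(y[pos] + 1)
  }
  # branch for y < 0: reflected transform with parameter 2 - lambda
  out[!pos] <- if (abs(lambda - 2) > eps) {
    -(((-y[!pos] + 1)^(2 - lambda)) - 1) / (2 - lambda)
  } else {
    -log(-y[!pos] + 1)
  }
  out
}

# Property (i): the sign of y is preserved by the transformation
yeo_johnson(c(-3, -0.5, 0, 2, 7), lambda = 0.5)
```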

In MLR, for all the previous PT families, an optimal power parameter λ = 1 confirms the linearity of the regression relationship, so no transformation is required; λ < 1 indicates that the regression relationship of the original data is not linear because the response distribution is skewed to the right, and vice versa for λ > 1 [8].

The main idea of using PT models in data processing is based on the assumption that the transformed response variable in MLR follows a normal distribution. As a result, the original response follows an unknown and somewhat complex Probability Density Function (PDF) in the exponential family, in the sense that the response transformation changes the shape of the data and its original unit of measure [27]. Thus, the optimal power parameter and the other model parameters are estimated for the transformed data by the common estimation methods. In the end, the back-transformation represents the fitted nonlinear regression model of the original data. Mathematically, for a univariate Y > 0, under the main assumption Y^(λ) = ψ(y) ~ N(μ, σ²), the PDF of Y is given by f_Y(y; λ, μ, σ²) = f_{Y^(λ)}(ψ(y); λ, μ, σ²) · J(Y, λ), where J(Y, λ) = dψ(y)/dy is the Jacobian factor of the transformation (Y₁, …, Yₙ) → (ψ(Y₁), …, ψ(Yₙ)).

Consider the MLR model Y^(λ) = Xβ + ε, where Y^(λ) = ψ(y) represents the (n × 1) column vector of transformed values of the response variable vector Y, X is the (n × (p + 1)) known information matrix, β is the ((p + 1) × 1) vector of unknown parameters, and ε is the (n × 1) column vector of residuals, normally distributed with mean equal to the (n × 1) zero vector and variance matrix equal to σ²Iₙ. Also, based on the main assumption ψ(y) ~ N(Xβ, σ²Iₙ), the joint PDF of the response variable vector Y is given by the following likelihood function,

$$
L(\lambda, \beta, \sigma^{2} \mid y, X) = f_Y(y) = (2\pi\sigma^{2})^{-n/2} \exp\!\left[-\frac{(Y^{(\lambda)} - X\beta)^{T}(Y^{(\lambda)} - X\beta)}{2\sigma^{2}}\right] \cdot J(Y,\lambda) \tag{10}
$$

where J(Y, λ) = ∏ᵢ₌₁ⁿ dyᵢ^(λ)/dyᵢ. Applying the method of MLE to Eq. (10) and solving ∂ln L/∂β = 0 and ∂ln L/∂σ² = 0, we get the following estimates for each value of λ,

$$
\hat{\beta}(\lambda) = (X^{T}X)^{-1}X^{T}Y^{(\lambda)} \tag{11}
$$

$$
\hat{\sigma}^{2}(\lambda) = \frac{1}{n}\,Y^{(\lambda)T} H\, Y^{(\lambda)} \tag{12}
$$

where H = I − X(XᵀX)⁻¹Xᵀ. Substituting the estimates β̂(λ) and σ̂²(λ) into the logarithm of the likelihood function, Eq. (10), gives what might be called the Box-Cox objective function after ignoring the constant term,

$$
L(\lambda \mid y) = -\frac{n}{2}\log \hat{\sigma}^{2}(\lambda) + \log J(Y,\lambda) \tag{13}
$$

Note that the likelihood for a given λ is inversely proportional to the sum of squared residuals SS_res(λ) of the regression of ψ(y) on X; the likelihood is maximized when SS_res(λ) is minimized. The value of the power parameter λ is optimal when L(λ | y) is at its maximum.
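As a sketch of how Eqs. (11)-(13) can be profiled over candidate values of λ, the following R code assumes the simple BCT of Eq. (4) and a strictly positive response; the function names boxcox_loglik and profile_lambda are illustrative, not from the chapter.

```r
# Profile (Box-Cox) log-likelihood of Eqs. (11)-(13) for the BCT of Eq. (4)
boxcox_loglik <- function(y, X, lambda) {
  eps  <- .Machine$double.eps^0.5
  ylam <- if (abs(lambda) > eps) (y^lambda - 1) / lambda else log(y)   # Eq. (4)
  Xd   <- cbind(1, X)                                   # design matrix with intercept
  H    <- diag(length(y)) - Xd %*% solve(t(Xd) %*% Xd) %*% t(Xd)
  sig2 <- as.numeric(t(ylam) %*% H %*% ylam) / length(y)               # Eq. (12)
  logJ <- (lambda - 1) * sum(log(y))                    # log Jacobian of Eq. (4)
  -length(y) / 2 * log(sig2) + logJ                     # Eq. (13)
}

# Profile over a grid of candidate values and take the maximizer
profile_lambda <- function(y, X, grid = seq(-2, 2, by = 0.1)) {
  loglik <- sapply(grid, function(l) boxcox_loglik(y, X, l))
  grid[which.max(loglik)]
}
```

For a model fitted with lm, the boxcox function in the MASS package produces essentially the same profile log-likelihood curve.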

3. An application and computational algorithm

We consider a real economic dataset that includes five explanatory variables affecting the Current Account of the Republic of Iraq over the period 2004-2020 (Table 1). The dataset was obtained from the Central Bank of Iraq and is also available at https://cbiraq.org/. The R program was used to analyze the data.

| Years | Current Account Y | Deficit/Surplus in general budget X1 | GDP at Current Prices X2 | Oil Revenues X3 | Other Revenues X4 | Public Expenditures X5 |
| --- | --- | --- | --- | --- | --- | --- |
| 2004 | −5,796,516 | 865,248 | 53,235,358 | 28639.1 | 4343.6 | 32117.5 |
| 2005 | 5,048,118 | 14,127,715 | 73,533,598 | 33627.2 | 6875.7 | 26375.1 |
| 2006 | 18,521,580 | 10,248,866 | 95,587,954 | 41076.2 | 7987.1 | 38076.8 |
| 2007 | 33,161,857 | 15,568,219 | 111,455,813 | 44646.1 | 9953.4 | 39031.2 |
| 2008 | 42,020,417 | 20,848,807 | 157,026,061 | 70,124 | 10128.1 | 59403.3 |
| 2009 | −493,311 | 2,642,328 | 130,642,187 | 43309.2 | 11900.1 | 52,567 |
| 2010 | 1,453,244 | 44,022 | 162,064,566 | 59,794 | 10384.2 | 70134.2 |
| 2011 | 29,228,742 | 30,049,726 | 217,327,107 | 98090.2 | 10717.2 | 78757.6 |
| 2012 | 378,788,640 | 14,677,649 | 254,225,490 | 109772.1 | 10045.1 | 105139.5 |
| 2013 | 430,082,730 | −5,360,605 | 273,587,529 | 112894.3 | 945.7 | 119127.5 |
| 2014 | 224,949,984 | −8,086,894 | 266,420,384 | 97072.4 | 8537.4 | 113473.5 |
| 2015 | −4,377,124 | −39,277,264 | 199,715,699 | 51312.6 | 15157.6 | 70397.5 |
| 2016 | 46,126,504 | −12,658,167 | 203,869,832 | 44,267 | 10142.2 | 67067.4 |
| 2017 | 93,634,588 | 1,932,057 | 225,995,179 | 65071.9 | 12350.2 | 75490.1 |
| 2018 | −11,244,618 | −12,514,516 | 226,455,132 | 95619.8 | 10,950 | 80873.1 |
| 2019 | 27,714,354 | −4,156,528 | 276,157,867 | 99216.3 | 8350.6 | 111723.5 |
| 2020 | −1,582,698 | −12,882,754 | 219,768,798 | 54448.5 | 8751.1 | 76082.4 |

Table 1.

The current account and some explanatory variables of the Republic of Iraq for the period 2004-2020 (million IQD).

It is evident from Figure 1 that there are three outliers among the values of the response variable, namely y₉, y₁₀ and y₁₁. Also, regarding BCT and the conditions for its implementation, the positivity constraint on the response is not fulfilled because of the presence of some negative values. Therefore, estimating an MLR model for these data would be risky, and the diagnostic and inference tools might give misleading results. So, there is a definite need for some mathematical preparation to shift the data to another space.

Figure 1.

Box plot of response variable values.

So, the following MLR model was chosen; it addresses the presence of negative values in the data and may offer some robustness against the effects of the outliers,

$$
Z^{(\lambda)} = U\beta + \varepsilon \tag{14}
$$

Z^(λ) represents the (17 × 1) column vector of transformed values of the Simple Index Numbers (SIN) of the original response variable vector Y. Z^(λ) is defined according to the following simplified version of the BCT family,

$$
Z^{(\lambda)} = \begin{cases} \dfrac{z^{\lambda} - 1}{\lambda} & \lambda \neq 0 \\ \ln z & \lambda = 0 \end{cases} \tag{15}
$$

and the i-th value of the vector Z is defined as the following SIN, taking the first year as the base year,

$$
z_i = \frac{y_i + a}{y_1 + a} \times 100 \tag{16}
$$

a is a constant that shifts the location of the response vector into positive space; it is chosen to ensure the BCT constraint Y + a > 0. U is the (17 × 5) known information matrix of the SINs of the explanatory variables, taking the first year as the base year, where,

$$
u_{ik} = \frac{x_{ik}}{x_{1k}} \times 100 \tag{17}
$$

and u₁ₖ = 100% for k = 2, 3, 4, 5. The SIN for the first explanatory variable is defined as,

$$
u_{i1} = \frac{x_{i1} + b_1}{x_{11} + b_1} \times 100 \tag{18}
$$

where u₁₁ = 100% and b₁ is a constant that shifts the location of the first explanatory variable to positive values; it is chosen so that X₁ + b₁ > 0. β is the (6 × 1) vector of unknown parameters and ε is the (17 × 1) column vector of residuals, normally distributed with mean equal to the (17 × 1) zero vector and variance matrix equal to σ²Iₙ.
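The following is a minimal R sketch of the data preparation in Eqs. (16)-(18), assuming a data frame dat holding the columns Y, X1, ..., X5 of Table 1; the function name make_sin and the shift constants are illustrative choices that satisfy the positivity constraints.

```r
# Data preparation of Eqs. (16)-(18): simple index numbers (SIN) with the first
# year as base year, after shifting Y and X1 into positive space.
make_sin <- function(dat) {
  a  <- abs(min(dat$Y))  + 1       # illustrative shift ensuring  Y + a  > 0
  b1 <- abs(min(dat$X1)) + 1       # illustrative shift ensuring X1 + b1 > 0
  z  <- (dat$Y  + a)  / (dat$Y[1]  + a)  * 100          # Eq. (16)
  u1 <- (dat$X1 + b1) / (dat$X1[1] + b1) * 100          # Eq. (18)
  U  <- cbind(u1, sapply(dat[, c("X2", "X3", "X4", "X5")],
                         function(x) x / x[1] * 100))   # Eq. (17)
  list(z = z, U = U)
}
```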

Finally, the nonlinear multiple regression model for the original data, regressing Z on U, is derived from the following back-transform of BCT,

$$
Z = \begin{cases} \left(\lambda Z^{(\lambda)} + 1\right)^{1/\lambda} & \lambda \neq 0 \\ \exp\!\left(Z^{(\lambda)}\right) & \lambda = 0 \end{cases} \tag{19}
$$

Thus, we obtain the estimated multiple nonlinear regression model for the original data from the estimated MLR of the transformed data,

$$
\hat{Z} = \begin{cases} \left(\lambda U\hat{\beta} + 1\right)^{1/\lambda} & \lambda \neq 0 \\ \exp\!\left(U\hat{\beta}\right) & \lambda = 0 \end{cases} \tag{20}
$$
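A small R sketch of the back-transform in Eq. (20); the name bct_inverse is illustrative.

```r
# Back-transform of Eq. (20): fitted values returned to the original (SIN) scale.
# For lambda != 0 this assumes lambda * Z^(lambda) + 1 > 0.
bct_inverse <- function(fitted_transformed, lambda) {
  if (abs(lambda) > .Machine$double.eps^0.5) {
    (lambda * fitted_transformed + 1)^(1 / lambda)
  } else {
    exp(fitted_transformed)
  }
}
```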

A number of modeling efficiency indicators are included in our search algorithm to obtain the optimal power parameter λ. The first is the traditional MLE. The second, third, and fourth are the coefficient of determination (CoD), the p-value of the SWT statistic for residual normality, and the p-value of the general linear F-test statistic, all computed for the estimated linear regression of the transformed response vector. The fifth is the p-value of the SWT statistic for residual normality of the estimated nonlinear regression of the original response vector obtained from the back-transform of BCT. The proposed computational algorithm is as follows:

Step 1: Transform the original response vector Y to the SIN vector Z according to Eq. (16), element by element, and the original information matrix X to the SIN matrix U according to Eqs. (17) and (18), element by element.

Step 2: Choose a set of candidate values for the power parameter, for example λ ∈ Λ where Λ = {−2, −1.9, …, 1.9, 2}. Λ can be expanded to a range over which the log-likelihood curve of the MLE shows a clear interior maximum, and the same applies to the CoD. Also, obtaining the minimum p-value of the general linear F-test statistic inside Λ can be taken as an indicator that the candidate range is acceptable.

Step 3: Transform the SIN vector Z to ψ(Z) using the simple version of the BCT family according to Eq. (15), with the first candidate λ in Λ.

Step 4: Estimate the parameters β̂(λ) and σ̂²(λ) of the MLR of Z^(λ) given U in Eq. (14), using Eq. (11) and Eq. (12).

Step 5: Evaluate the log-likelihood function L(λ | z) according to Eq. (13). Calculate the CoD, the p-value of the SWT statistic for the normality of the residual vector, and the p-value of the general linear F-test statistic.

Step 6: Estimate the multiple nonlinear regression model for the original data using Eq. (20).

Step 7: Calculate the p-value of the SWT for the normality of the residual vector of the estimated multiple nonlinear regression model of the original data.

Step 8: Repeat Steps 3 to 7 for all values of λ ∈ Λ.
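Putting the pieces together, the following R sketch of Steps 2-8 scans a grid of candidate values of λ and records the five indicators; it assumes the illustrative helpers make_sin, boxcox_loglik and bct_inverse defined above, together with base R's lm, summary and shapiro.test.

```r
# Steps 2-8: scan a grid of candidate lambdas and record the five indicators.
scan_lambda <- function(z, U, grid = seq(-2, 2, by = 0.1)) {
  eps  <- .Machine$double.eps^0.5
  rows <- lapply(grid, function(lambda) {
    zl   <- if (abs(lambda) > eps) (z^lambda - 1) / lambda else log(z)  # Eq. (15)
    fit  <- lm(zl ~ U)                            # Step 4: MLR of the transformed SINs
    smry <- summary(fit)
    fst  <- smry$fstatistic                       # general linear F-test statistic
    zhat <- bct_inverse(fitted(fit), lambda)      # Step 6: back-transform, Eq. (20)
    c(lambda   = lambda,
      loglik   = boxcox_loglik(z, U, lambda),                            # MLE indicator
      CoD      = smry$r.squared,                                         # indicator 2
      sw_trans = shapiro.test(residuals(fit))$p.value,                   # indicator 3
      p_F      = unname(pf(fst[1], fst[2], fst[3], lower.tail = FALSE)), # indicator 4
      sw_back  = shapiro.test(z - zhat)$p.value)                         # indicator 5 (Step 7)
  })
  do.call(rbind, rows)
}
```

For example, with s <- make_sin(dat), calling scan_lambda(s$z, s$U) returns one row of indicator values per candidate λ, from which summaries such as Tables 2 and 3 can be assembled.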

The tables below show the results of applying the computational algorithm. Table 2 shows the optimal value of λ for each indicator in its optimal state. Table 3 shows the estimates of the power parameter according to the five indicators for all λ ∈ Λ, where Λ = {−3, −2.9, …, 2.9, 3}.

| Indicators | Value | Optimal λ |
| --- | --- | --- |
| MLE | −22.7 | 0 |
| CoD | 0.69 | −0.5 |
| p-value of SWT of Residuals Normality (Transformed data) | 0.99 | 3 |
| p-value of SWT of Residuals Normality (Back Transformed data) | 0.89 | 2 |
| p-value of F-test statistics | 0.01 | −0.5 |

Table 2.

The optimal values of λ against each indicator in its optimal state.

| λ̂ | MLE | CoD | p-value of SWT of Residuals Normality (Transformed data) | p-value of SWT of Residuals Normality (Back Transformed data) | p-value of F-test statistics |
| --- | --- | --- | --- | --- | --- |
| 1 | −39.8 | 0.63 | 0.94 | 0.84 | 0.03 |
| (−3.0, −1.2) | (−161.6, −61.5) | 0.68 | (0.62, 0.67) | 0.00 | (0.01, 0.02) |
| (−1.1, −0.5) | (−58.5, −30.2) | 0.69 | (0.72, 0.84) | 0.00 | 0.01 |
| (−0.4, −0.3) | (−28.4, −25.6) | 0.68 | (0.58, 0.70) | 0.00 | (0.01, 0.02) |
| −0.2 | −24.5 | 0.67 | 0.24 | 0.00 | 0.02 |
| −0.1 | −23.1 | 0.66 | 0.10 | 0.00 | 0.02 |
| (0, 0.1) | (−22.8, −22.7) | 0.65 | (0.04, 0.05) | 0.00 | (0.02, 0.03) |
| (0.2, 0.5) | (−28.0, −23.1) | 0.64 | (0.11, 0.76) | (0.01, 0.11) | 0.03 |
| (0.6, 1.7) | (−63.1, −29.2) | 0.63 | (0.71, 0.96) | (0.20, 0.89) | 0.03 |
| (1.8, 2.3) | (−84.9, −65.2) | 0.62 | (0.68, 0.85) | (0.79, 0.89) | (0.03, 0.04) |
| (2.4, 3.0) | (−110.3, −87.2) | 0.61 | (0.89, 0.99) | (0.46, 0.75) | 0.04 |

Table 3.

Estimates of the power parameter according to the five indicators for all λ ∈ Λ.

Based on the p-values of the general linear F-test statistic for all λ ∈ Λ in Table 3, we conclude that the full estimated models, whether the nonlinear multiple regression models when λ̂ ≠ 1 or the MLR in which λ̂ = 1, are appropriate for the data. It is also clear that the residuals of the transformed-data models are close to normality except in the case of ln Z, based on the p-value of the SWT of residuals normality (Table 3).

As for the MLE, the highest point of the log-likelihood corresponds to a parameter value close to zero (Figure 2(a)); that is, the optimal transformation is the logarithm. On the other hand, according to the p-value of the SWT of residuals normality at this value, it is quite clear that the residuals are not normally distributed. Therefore, the results of the general linear F-test statistic are not reliable there.

Figure 2.

For all λ ∈ Λ: (a) the log-likelihood curve, (b) the CoD estimates, and (c) the p-values of the general linear F-test statistic.

As mentioned earlier, the value of the optimal λ varies according to the estimation methods and indicators used. Confirming this, the optimal cases of two of the five indicators led to identical values of the optimal power parameter, λ̂ = −0.5: the CoD (Figure 2(b)) and the p-value of the general linear F-test statistic (Figure 2(c)).

4. Conclusions

The use of power transformation models to transform the response variable in regression relationships is, in fact, a way of creating a nonlinear model for the data when the requirements of linear regression analysis are not met. In this sense, the statistical modeling of the transformed data is more like an intermediate station: the statistical analysis does not succeed unless the operations at this station are accurate and meet the requirements of model construction. Accordingly, there are many indicators of the success of the statistical analysis, corresponding to the multiplicity of its reliability conditions. In this regard, when PT models are used, there are many methods for selecting the optimal power parameters. Two common directions can be identified: the first is the use of well-known estimation methods such as MLE, and the second is the use of efficiency criteria of regression modeling as decision rules for estimating the power parameter. We conclude that the multiplicity of criteria for selecting the power parameter does not mean that it leads to a single value. However, the multiplicity of decision rules can contribute to characterizing optimal solutions and support the decision to choose the optimal power parameter.

References

  1. Chatterjee S, Price B. Regression Analysis by Example. New York: John Wiley and Sons, Inc.; 1977. pp. 19-22
  2. Cook RD, Weisberg S. Diagnostics for heteroscedasticity in regression. Biometrika. 1983;70(1):1-10. DOI: 10.1093/biomet/70.1.1
  3. O'Hara RB, Kotze DJ. Do not log-transform count data. Methods in Ecology and Evolution. 2010;1:118-122. DOI: 10.1111/j.2041-210X.2010.00021.x
  4. van Albada SJ, Robinson PA. Transformation of arbitrary distributions to the normal distribution with application to EEG test-retest reliability. Journal of Neuroscience Methods. 2007;161(2):205-211. DOI: 10.1016/j.jneumeth.2006.11.004
  5. Box GEP, Cox DR. An analysis of transformations revisited, rebutted. Journal of the American Statistical Association. 1982;77(377):209-210
  6. Klein Entink RH, van der Linden WJ, Fox JPA. A Box-Cox normal model for response times. British Journal of Mathematical and Statistical Psychology. 2009;62(Pt 3):621-640
  7. Fischer C. Comparing the logarithmic transformation and the Box-Cox transformation for individual tree basal area increment models. Forest Science. 2016;62(3):297-306. DOI: 10.5849/forsci.15-135
  8. Raymaekers J, Rousseeuw PJ. Transforming variables to central normality. Machine Learning. 2021. DOI: 10.1007/s10994-021-05960-5
  9. Ferrari SLP, Fumes G. Box-Cox symmetric distributions and applications to nutritional data. AStA Advances in Statistical Analysis. 2017;101:321-344. DOI: 10.1007/s10182-017-0291-6
  10. Yeo IK, Johnson RA. A new family of power transformations to improve normality or symmetry. Biometrika. 2000;87(4):954-959
  11. Vélez JI, Correa JC, Marmolejo-Ramos F. A new approach to the Box-Cox transformation. Frontiers in Applied Mathematics and Statistics. 2015;1(12):1-10. DOI: 10.3389/fams.2015.00012
  12. Asar Ö, Ilk O, Dag O. Estimating Box-Cox power transformation parameter via goodness of fit tests. Communications in Statistics - Simulation and Computation. 2017;46(1):91-105. DOI: 10.1080/03610918.2014.957839
  13. Box GEP, Cox DR. An analysis of transformations. Journal of the Royal Statistical Society. Series B (Methodological). 1964;26(2):211-252. DOI: 10.1111/j.2517-6161.1964.tb00553.x
  14. Alyousif HT, Abduahad FN. Develop a nonlinear model for the conditional expectation of the Bayesian probability distribution (Gamma - Gamma). Al-Nahrain Journal of Science. 2018;17(2):205-212. Available from: https://anjs.edu.iq/index.php/anjs/article/view/462/408
  15. Al-Saffar A, Mohammed Ali HT. Using power transformations in response surface methodology. In: 2022 International Conference on Computer Science and Software Engineering (CSASE). Iraq: IEEE; 2022. pp. 374-379. DOI: 10.1109/CSASE51777.2022.9759781
  16. Tukey JW. Dyadic ANOVA, an analysis of variance for vectors. Human Biology. 1950;21:65-110
  17. Box GEP, Tidwell PW. Transformation of the independent variables. Technometrics. 1962;4:531-550
  18. Tukey JW. One degree of freedom for non-additivity. Biometrics. 1949;5(3):232-242
  19. Vélez JI, Marmolejo-Ramos F. A new approach to the Box-Cox transformation. Frontiers in Applied Mathematics and Statistics. 2015;1(12):1-10. DOI: 10.3389/fams.2015.00012
  20. Chen G, Lockhart RA, Stephens MA. Box-Cox transformations in linear models: Large sample theory and tests of normality. Canadian Journal of Statistics. 2002;30(2):1-59. DOI: 10.2307/3315946
  21. Draper NR, Smith H. Applied Regression Analysis. New York: John Wiley and Sons Inc.; 1981
  22. Atkinson AC, Riani M, Corbellini A. The Box-Cox transformation: Review and extensions. Statistical Science. 2021;36(2):239-255. DOI: 10.1214/20-STS778
  23. Finney DJ. The principles of biological assay. Supplement to the Journal of the Royal Statistical Society. 1947;9(1):46-81. DOI: 10.2307/2983571
  24. Tukey JW. On the comparative anatomy of transformations. The Annals of Mathematical Statistics. 1957;28(3):602-632
  25. Yeo IK, Johnson RA. A new family of power transformations to improve normality or symmetry. Biometrika. 2000;87(4):954-959. DOI: 10.1093/biomet/87.4.954
  26. Samira S. Exact Box-Cox analysis [thesis]. Electronic Thesis and Dissertation Repository; 2018. Available from: https://ir.lib.uwo.ca/etd/5308
  27. Cook RD, Weisberg S. Residuals and Influence in Regression. New York: Chapman and Hall; 1982
