Open access peer-reviewed chapter

A New Approach of Power Transformations in Functional Non-Parametric Temperature Time Series

Written By

Haithem Taha Mohammed Ali and Sameera Abdulsalam Othman

Submitted: 14 May 2022 Reviewed: 13 June 2022 Published: 22 July 2022

DOI: 10.5772/intechopen.105832

From the Edited Volume

Time Series Analysis - New Insights

Edited by Rifaat Abdalla, Mohammed El-Diasty, Andrey Kostogryzov and Nikolay Makhutov

Abstract

In nonparametric analysis, many authors note that kernel density functions work well when the variable is close to Gaussian in shape. This chapter focuses on improving the forecastability of functional nonparametric time series by using a new approach to parametric power transformation. In this approach, the power parameter is chosen by minimizing the mean integrated square error of the kernel estimator. Many authors have used this criterion in density estimation under the assumption that the original data follow a known probability distribution. In this chapter, the authors assume that the original data have an unknown distribution, set out the theoretical framework for deriving a criterion to estimate the power parameter, and propose an application algorithm, which is applied to two time series of monthly temperature averages.

Keywords

  • functional non-parametric time series
  • power transformation
  • Kernel density function
  • Mean Integrated Square Error

1. Introduction

One of the most common approaches for studying forecasting models is the nonparametric functional regression method, which has been successfully applied in time series analysis. In this chapter, a new approach to power transformation is proposed to improve time series prediction when using functional nonparametric techniques. Although nonparametric regression estimation under dependence is a useful tool for forecasting in time series [1], the functional and nonparametric approaches do not work well in certain circumstances.

Regarding the functional approach, functional data (FD) analysis treats the observations as functions [2] without requiring the conditions of fully parametric or nonparametric modeling. In other words, FD analysis reduces the size of the data by explaining the correlations among a large number of variables through a small number of factors or functions [3]. This transformation of the data structure into a linear combination of a few functions (curves) is equivalent to structural regression models. Several useful families of semi-metrics can be used to measure the proximity between the curves of the functional variables; one example is Functional Principal Components Analysis (FPCA) [4, 5]. In data sets with time-dependent observations, FPCA may lead to weak estimates, and this problem may be exacerbated in time series with seasonal changes [6] (see also [7], who point out that standard PCA may not be suitable when the data distribution is skewed or contains outliers).

As for the nonparametric approach to estimating kernel density functions (KDF), or to prediction in regression and time series models, although this approach is distribution-free, the symmetry of the data is important for obtaining efficient estimators [8, 9].

As for time series and the goal of improving forecastability, time series data sets in practical applications are rarely ready for statistical analysis because of instability in variance, trend, and seasonal variation [10].

Given these requirements, which must be met before analysis and inference in functional nonparametric time series, power transformation (PT) provides a novel corrective framework for predictive modeling.

The rest of the chapter is organized as follows. Section 2 reviews some traditional approaches to PT and their uses in KDF. Section 3 presents the authors' proposal, a new approach to transforming the KDF. Section 4 gives the algorithm for applying the proposed method. Section 5 applies the proposed method to two temperature time series data sets. Finally, Section 6 presents the conclusions and some recommendations for future work.

2. The traditional approaches of transformations in KDF

There is a long tradition of applying PT models in statistical applications. In 1952, Finney [11] used the PT model $\Psi(Z) = Z^{\lambda}$ when $\lambda \neq 0$ and $\Psi(Z) = \log z$ when $\lambda = 0$, where $Z$ represents the original dose variable in the biological assay. The purpose of the transformation in the dose-response relationship was to obtain monotone and linear behavior for intrinsically nonlinear models. In 1964, Box and Cox [12] proposed the following general class of transformations of the response variable in the multiple linear regression model,

$$\Psi(Z) = \begin{cases} \dfrac{Z^{\lambda} - 1}{\lambda} & \text{if } \lambda \neq 0 \\ \log z & \text{if } \lambda = 0 \end{cases} \tag{1}$$

to achieve a linear relationship with normally distributed errors. In 1977, Tukey [13] described an orderly way of re-expressing variables using the following model, which preserves the order of the variable after the PT is applied,

$$\Psi(Z) = \begin{cases} Z^{\lambda} & \text{if } \lambda > 0 \\ \log z & \text{if } \lambda = 0 \\ -Z^{\lambda} & \text{if } \lambda < 0 \end{cases} \tag{2}$$

to make the relationship as close to a straight line as possible. As for nonparametric estimation, many authors point to the usefulness of PT in reducing the bias of the KDF when the data are clearly skewed or heavy tailed [14] (for more details, see [15, 16, 17]).
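The three PT models above can be written as small functions. The following is a minimal sketch, assuming positive data and assuming that the Tukey model flips the sign for negative powers so that the ordering of the data is kept; the function names and the sample values are illustrative only and do not come from the chapter.

```python
import math

def finney(z, lam):
    """Finney-type power transform: z**lam for lam != 0, log(z) for lam == 0."""
    return math.log(z) if lam == 0 else z ** lam

def box_cox(z, lam):
    """Box-Cox transform, Eq. (1): (z**lam - 1) / lam for lam != 0, log(z) otherwise."""
    return math.log(z) if lam == 0 else (z ** lam - 1.0) / lam

def tukey(z, lam):
    """Tukey ladder, Eq. (2): the sign is flipped for lam < 0 so the order of z is kept."""
    if lam > 0:
        return z ** lam
    if lam == 0:
        return math.log(z)
    return -(z ** lam)

# Made-up positive values, used only to show the calls.
temps = [8.5, 12.0, 19.5, 27.0, 33.5]
for lam in (-0.5, 0.0, 0.5):
    print(lam, [round(box_cox(z, lam), 4) for z in temps])
```

With a negative power the plain form `z**lam` reverses the ordering of the data, while the Box-Cox and Tukey forms keep it, which is one practical difference between the models.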

Regarding estimation of the transformation parameter, most transformation methodologies follow a common analytical path: choose the PT model, then propose an algorithm that estimates the power parameter alongside the traditional model parameters. As for transforming the probability density function (PDF) of all or some model variables before proposing an estimation algorithm, there are at least two common methodologies of data transformation. The first, in chronological order, is the Box-Cox transformation (BCT) methodology, which transforms the response variable to normality in parametric multiple regression models [12]. The usual decision rule for selecting the power parameter estimator in this approach is maximization of the log-likelihood function of the PDF of the original data. In some cases Bayesian estimation is used, and many other methods from the literature can also be used to choose the transformation parameter. The second methodology was proposed by Wand, Marron and Ruppert in 1991 [8] to transform the KDF to a symmetric density shape. Here the decision rule for selecting the optimal estimator of the density power parameter is minimization of the Mean Integrated Square Error (MISE) of the KDF estimator. Both methodologies work with the distribution of the transformed data and therefore define the distribution of the original data as a "back-transform" through the change-of-variable technique. Mathematically, for a univariate random variable $Z$, Box and Cox [12] assume that there exists a parametric PT function $\psi(\cdot)$ of $Z$ such that $\psi(z) = Z^{\lambda} \sim N(\mu, \sigma^2)$, under the assumption that the original data have an unknown distribution. Therefore, the PDF of the original variable is given by,

$$f_Z(z) = f_{\psi(z)}\big(\psi(z_i); \mu, \sigma^2\big) \cdot \frac{d\psi(z_i)}{dz_i} \tag{3}$$

In the second methodology [8], Wand, Marron, and Ruppert assume that the KDF of the transformed variable $\psi(z)$, which is close to a symmetric shape, is given by,

$$f_{\psi(z)}\big(\psi(z); \lambda\big) = f_Z\big(\psi^{-1}(\psi(z))\big)\,\big(\psi^{-1}\big)'\big(\psi(z)\big) \tag{4}$$

so that the estimated KDF of the original variable $Z$ is the back-transform of Eq. (4), given by,

$$\hat{f}_Z(z; h, \lambda) = n^{-1}\,\psi'(z) \sum_{i=1}^{n} K_h\big(\psi(z) - \psi(Z_i)\big) \tag{5}$$

where $h$ is the bandwidth, $K_h(\cdot) = h^{-1}K(\cdot/h)$, and the kernel $K$ is a density. In brief, the first methodology aims to improve the efficiency of statistical inference in parametric models on the basis of normality of the data, while the second aims to improve the kernel estimator on the basis of, at least, symmetric data. In the same context, the literature recommends transformations as long as they improve the interpretation of effect sizes between variables [13], or in view of the fact that model parameters are not easily interpreted in terms of the original response [14] (for more details, see [15, 16, 17]).
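The back-transformed estimator of Eq. (5) can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: the kernel is taken to be Gaussian, the transformation is the simple power $\psi(z) = z^{\lambda}$ on positive data, and the absolute value of $\psi'(z)$ is used so that the estimate stays non-negative when $\lambda < 0$. The name `transformed_kde` is hypothetical.

```python
import math

def gaussian_kernel(t):
    """Standard normal density, used here as the kernel K."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def transformed_kde(z, sample, h, lam):
    """Eq. (5) with psi(z) = z**lam (lam != 0, z > 0).

    The density is estimated on the transformed scale and mapped back
    with the Jacobian |psi'(z)| of the change of variable.
    """
    u = z ** lam
    jacobian = abs(lam * z ** (lam - 1.0))
    total = sum(gaussian_kernel((u - zi ** lam) / h) / h for zi in sample)
    return jacobian * total / len(sample)

# Made-up positive sample, for illustration only.
sample = [4.0, 9.0, 16.0, 25.0, 36.0]
print(transformed_kde(10.0, sample, h=0.3, lam=0.5))
```

Because the kernel smoothing happens on the transformed scale, a single bandwidth `h` there corresponds to a location-dependent amount of smoothing on the original scale.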

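Now, letting $U = \psi(Z)$, the optimal value of the PT parameter $\lambda$ is the one that gives the lowest possible MISE of the estimated density in Eq. (5), where

$$\mathrm{MISE}_z(h, \lambda) = E \int \big[\hat{f}_Z(z; h, \lambda) - f_Z(z)\big]^2 dz \tag{6}$$

Assume that the first and second derivatives of $f_Z(z)$ exist, and let $K_1 = \int z^2 K(z)\,dz$ and $K_2 = \int K^2(z)\,dz$, so that

$$\mathrm{MISE}_z(h, \lambda) = \mathrm{AMISE}_z(h, \lambda) + O\big(h^4 + n^{-1}h^{-1}\big) \tag{7}$$

where

$$\mathrm{AMISE}_z(h, \lambda) = \frac{h^4 K_1^2}{4} \int \psi'\big(\psi^{-1}(u)\big)\,\big[f_U''(u; \lambda)\big]^2 du + n^{-1}h^{-1} K_2\, E\big[\psi'(Z)\big] \tag{8}$$

and the minimizing window width for any value of $\lambda$ is given by,

$$h_{\lambda, z} = \left[\frac{K_2\, E\big[\psi'(Z)\big]}{K_1^2 \int \psi'\big(\psi^{-1}(u)\big)\,\big[f_U''(u; \lambda)\big]^2 du}\right]^{1/5} n^{-1/5} \tag{9}$$

and the corresponding minimum of $\mathrm{AMISE}_z(\cdot, \lambda)$ for each fixed value of $\lambda$ equals,

$$\inf_{h>0} \mathrm{AMISE}_z(h, \lambda) = \frac{5}{4}\big(K_1 K_2^2\big)^{2/5} J_z(\lambda)\, n^{-4/5} \tag{10}$$

where,

$$J_z(\lambda) = \left\{\big(E\big[\psi'(Z)\big]\big)^4 \int \psi'\big(\psi^{-1}(u)\big)\,\big[f_U''(u; \lambda)\big]^2 du\right\}^{1/5} \tag{11}$$

Eqs. (10) and (11) measure the influence of the transformation $\psi(z)$ on minimizing the error associated with estimating the density of the original data, $\hat{f}_Z(\cdot; h, \lambda)$. Therefore, the optimal value of $\lambda$ is the one that minimizes $\inf_{h>0} \mathrm{AMISE}_z(h, \lambda)$.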

By the same logic, applying the decision rule to $\mathrm{AMISE}_u(h, \lambda)$, derived from density estimation of the transformed variable $U$, the asymptotically optimal window width for each $\lambda$ is:

$$h_{\lambda, u} = \left[\frac{K_2}{K_1^2\, J_u(\lambda)^5}\right]^{1/5} n^{-1/5} \tag{12}$$

Asymptotically, the optimal choice of $\lambda$ minimizes

$$\inf_{h>0} \mathrm{AMISE}_u(h, \lambda) = \frac{5}{4}\big(K_1 K_2^2\big)^{2/5} J_u(\lambda)\, n^{-4/5} \tag{13}$$

where:

$$J_u(\lambda) = \left\{\int \big[f_U''(u; \lambda)\big]^2 du\right\}^{1/5} \tag{14}$$

In other words, minimizing $J_z(\lambda)$ and $J_u(\lambda)$ is sufficient to establish the optimal $\lambda$, since Eq. (11) and Eq. (14) are the only parts of Eq. (10) and Eq. (13), respectively, that vary with $\lambda$.
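To make the transformed-scale bandwidth and minimum AMISE concrete, the sketch below evaluates them for one special case: a standard Gaussian kernel, for which $K_1 = 1$ and $K_2 = 1/(2\sqrt{\pi})$, and a transformed density that is exactly $N(\mu, \sigma^2)$, for which $\int [f_U''(u)]^2 du = 3/(8\sqrt{\pi}\sigma^5)$. Both choices are assumptions made for illustration, and the function names are hypothetical.

```python
import math

K1 = 1.0                                 # integral of z^2 K(z) dz, standard Gaussian kernel
K2 = 1.0 / (2.0 * math.sqrt(math.pi))    # integral of K(z)^2 dz, same kernel

def j_u_normal(sigma):
    """J_u of Eq. (14) when f_U is N(mu, sigma^2): (3 / (8 sqrt(pi) sigma^5))**(1/5)."""
    return (3.0 / (8.0 * math.sqrt(math.pi) * sigma ** 5)) ** 0.2

def optimal_bandwidth(j_u, n):
    """Bandwidth minimising AMISE_u for a fixed lambda: (K2 / K1^2)^(1/5) / J_u * n^(-1/5)."""
    return (K2 / K1 ** 2) ** 0.2 / j_u * n ** -0.2

def min_amise(j_u, n):
    """Eq. (13): (5/4) (K1 K2^2)^(2/5) J_u n^(-4/5)."""
    return 1.25 * (K1 * K2 ** 2) ** 0.4 * j_u * n ** -0.8

n = 100
h = optimal_bandwidth(j_u_normal(1.0), n)
print(h, min_amise(j_u_normal(1.0), n))
```

In this special case the bandwidth reduces to $(4/3)^{1/5}\sigma n^{-1/5}$, the familiar normal reference rule, which gives a quick sanity check on the formulas.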

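Finally, the relationship between $\mathrm{MISE}_u(h, \lambda)$ and $\mathrm{MISE}_z(h, \lambda)$ is given by the equations:

$$\mathrm{MISE}_z(h, \lambda) = E \int \big[\hat{f}_U(u) - f_U(u)\big]^2\, \psi'\big(\psi^{-1}(u)\big)\, du \tag{15}$$

$$\mathrm{MISE}_u(h, \lambda) = E \int \big[\hat{f}_Z(z) - f_Z(z)\big]^2\, \big(\psi^{-1}\big)'\big(\psi(z)\big)\, dz \tag{16}$$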

Both error functions yield the same results, whether in terms of the original variable or of the transformed variable.
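The link in Eq. (15) follows from a change of variable. As a short sketch, assume that $\psi$ is strictly increasing and differentiable, so that $\hat{f}_Z(z) = \psi'(z)\,\hat{f}_U(\psi(z))$ and $f_Z(z) = \psi'(z)\,f_U(\psi(z))$:

```latex
\begin{aligned}
\int \big[\hat{f}_Z(z) - f_Z(z)\big]^2 \, dz
  &= \int \psi'(z)^2 \,\big[\hat{f}_U(\psi(z)) - f_U(\psi(z))\big]^2 \, dz \\
  &= \int \psi'\big(\psi^{-1}(u)\big)\,\big[\hat{f}_U(u) - f_U(u)\big]^2 \, du,
  \qquad u = \psi(z),\quad du = \psi'(z)\,dz .
\end{aligned}
```

Taking expectations on both sides gives Eq. (15); Eq. (16) follows in the same way with the roles of $\psi$ and $\psi^{-1}$ exchanged.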

3. A new approach of transformations in KDF

Unlike the BCT methodology, which assumes that the original data have an unknown distribution, PTs in KDF estimation are used to shift random variables with a known distribution toward symmetric shapes in order to obtain an efficient kernel density estimate. The statistical literature on nonparametric estimation suggests using the MISE as a decision rule for estimating the power parameter for a number of distributions, such as the lognormal [8, 18], gamma [8], Cauchy [9], Pareto [18] and heavy-tailed distributions [19, 20].

Now, as in the BCT approach, the primary hypothesis of the new approach in this chapter is that the data do not have a known distribution. We use the power transformation to bring the data to a normal shape and use the MISE as the decision rule for choosing the optimal value of the power parameter. In Sections 4 and 5 we use this approach in functional nonparametric time series analysis.

Assume that the random variable $Z$ has an unknown distribution and that $U = \psi(Z)$ represents a PT model. For the Finney transformation (FT), suppose that $U = z^{\lambda}$ follows the normal distribution with mean $\mu$ and variance $\sigma^2$. Then, according to Eq. (3), the PDF of the original variable $Z$ is given by $f_Z(z) = \psi'(z)\, f_U\big(\psi(z_i); \mu, \sigma^2\big)$.

In our proposed approach, assuming normality of the transformed data when the original data have an unknown distribution provides simple options for estimating the power parameter, so that Eq. (14) can be used as the simplest alternative to Eq. (11). Under this assumption we have,

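$$f_U(u; \lambda) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(u-\mu)^2}{2\sigma^2}}, \qquad u \in \mathbb{R} \tag{17}$$

So, the square of the second derivative of Eq. (17) is,

$$\big[f_U''(u; \lambda)\big]^2 = \big(\sqrt{2\pi\sigma^2}\big)^{-2} \exp\!\left(-\frac{(u-\mu)^2}{\sigma^2}\right) \left[\frac{1}{\sigma^4} - \frac{2(u-\mu)^2}{\sigma^6} + \frac{(u-\mu)^4}{\sigma^8}\right] \tag{18}$$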
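For readers who want the intermediate step, the squared second derivative of the normal density can be obtained by differentiating twice and squaring. This is a standard calculation, written out here as a sketch:

```latex
\begin{aligned}
f_U'(u;\lambda)  &= -\frac{u-\mu}{\sigma^2}\, f_U(u;\lambda), \\
f_U''(u;\lambda) &= \left[\frac{(u-\mu)^2}{\sigma^4} - \frac{1}{\sigma^2}\right] f_U(u;\lambda), \\
\big[f_U''(u;\lambda)\big]^2
  &= \frac{1}{2\pi\sigma^2}\exp\!\left(-\frac{(u-\mu)^2}{\sigma^2}\right)
     \left[\frac{1}{\sigma^4} - \frac{2(u-\mu)^2}{\sigma^6} + \frac{(u-\mu)^4}{\sigma^8}\right].
\end{aligned}
```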

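Integrating, we get,

$$\int \big[f_U''(u; \lambda)\big]^2 du = \sigma^{-8}\big(\sqrt{2\pi\sigma^2}\big)^{-1} \left[\sigma^4 \int \big(\sqrt{2\pi\sigma^2}\big)^{-1} e^{-\frac{(u-\mu)^2}{\sigma^2}} du - 2\sigma^2 \int (u-\mu)^2 \big(\sqrt{2\pi\sigma^2}\big)^{-1} e^{-\frac{(u-\mu)^2}{\sigma^2}} du + \int (u-\mu)^4 \big(\sqrt{2\pi\sigma^2}\big)^{-1} e^{-\frac{(u-\mu)^2}{\sigma^2}} du\right] \tag{19}$$

Let $\sigma^2 = 2\delta^2$. Then the first term of Eq. (19) is,

$$\sigma^4 \int \frac{1}{\sqrt{2\pi \cdot 2\delta^2}}\, e^{-\frac{(u-\mu)^2}{2\delta^2}} du = \frac{\sigma^4}{\sqrt{2}} \int \frac{1}{\sqrt{2\pi\delta^2}}\, e^{-\frac{(u-\mu)^2}{2\delta^2}} du = \frac{\sigma^4}{\sqrt{2}} \tag{20}$$

the second term of Eq. (19) is,

$$2\sigma^2 \int (u-\mu)^2 \big(\sqrt{2\pi\sigma^2}\big)^{-1} e^{-\frac{(u-\mu)^2}{\sigma^2}} du = \frac{2\sigma^2}{\sqrt{2}} \int (u-\mu)^2 \big(\sqrt{2\pi\delta^2}\big)^{-1} e^{-\frac{(u-\mu)^2}{2\delta^2}} du = \frac{2\sigma^2}{\sqrt{2}}\, E(U-\mu)^2 = \frac{\sigma^4}{\sqrt{2}} \tag{21}$$

since $E(U-\mu)^2 = \delta^2 = \sigma^2/2$ under the $N(\mu, \delta^2)$ density that appears in the integral, and the third term of Eq. (19) is,

$$\int (u-\mu)^4 \big(\sqrt{2\pi\sigma^2}\big)^{-1} e^{-\frac{(u-\mu)^2}{\sigma^2}} du = \frac{1}{\sqrt{2}} \int (u-\mu)^4 \big(\sqrt{2\pi\delta^2}\big)^{-1} e^{-\frac{(u-\mu)^2}{2\delta^2}} du = \frac{1}{\sqrt{2}}\, E(U-\mu)^4 \tag{22}$$

Using the central moments formula for a real-valued random variable $U$, $E(U-\mu)^n = E\left[\sum_{j=0}^{n} \binom{n}{j} (-1)^{n-j} U^j \mu^{n-j}\right]$, we obtain,

$$E(U-\mu)^4 = E(U^4) - 4\mu E(U^3) + 6\mu^2 E(U^2) - 4\mu^3 E(U) + \mu^4 \tag{23}$$

Based on the moments about zero, $\mu'_k = E(U^k)$, we get,

$$E(U-\mu)^4 = \mu'_4 - 4\mu'_3\mu + 6\mu'_2\mu^2 - 4\mu\,\mu^3 + \mu^4 \tag{24}$$

The first two terms of Eq. (19) cancel. Substituting the three parts defined by Eqs. (20), (21) and (22) into Eq. (19), we get,

$$J_U(\lambda) = \left[\frac{1}{\sigma^8 \sqrt{2\pi\sigma^2}} \cdot \frac{1}{\sqrt{2}} \left(\mu'_4 - 4\mu'_3\mu + 6\mu'_2\mu^2 - 4\mu\,\mu^3 + \mu^4\right)\right]^{1/5} \tag{25}$$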

Eq. (25) completes the derivation. The optimal value of the power parameter is the one that minimizes $J_U(\lambda)$. In the practical application, maximum likelihood estimators were used for the moments about zero, $\hat{\mu}'_k = \sum_{i=1}^{n} u_i^k / n$, and for the central moments, $\hat{\mu}_k = \sum_{i=1}^{n} (u_i - \bar{u})^k / n$.
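The criterion of Eq. (25) is simple to evaluate from a transformed sample. The sketch below plugs the sample moments described above into Eq. (25); the function name `j_u_eq25` and the example values are hypothetical, and the fourth central moment is assembled from the moments about zero as in Eq. (24).

```python
import math

def j_u_eq25(u):
    """J_U(lambda) following Eq. (25), with moments replaced by sample estimates."""
    n = len(u)
    mean = sum(u) / n
    var = sum((x - mean) ** 2 for x in u) / n                # ML estimate of sigma^2
    raw = [sum(x ** k for x in u) / n for k in range(5)]     # raw[k] = sample mean of u^k
    # Fourth central moment written through the moments about zero, Eq. (24).
    m4 = (raw[4] - 4 * raw[3] * mean + 6 * raw[2] * mean ** 2
          - 4 * mean * mean ** 3 + mean ** 4)
    # sigma^8 = var^4; the remaining factors are 1/sqrt(2 pi sigma^2) and 1/sqrt(2).
    integral = m4 / (var ** 4 * math.sqrt(2.0 * math.pi * var) * math.sqrt(2.0))
    return integral ** 0.2

# Made-up transformed values, for illustration only.
print(j_u_eq25([1.0, 2.0, 3.0, 4.0, 5.0]))
```

One property worth noting is that rescaling the transformed data by a constant $c > 0$ divides this quantity by $c$, so the criterion is sensitive to the spread of the transformed sample as well as to its shape.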

4. Proposed application algorithm

For the univariate time series $\{Z_t, t \in \mathbb{R}\}$, assume that the sample is divided into $(p+1)$ statistical samples of size $n = N - s - p + 1$, so that the time series data set can be defined as functional data $(X_i, Y_i)_{i=1,\dots,n}$. The regression model,

$$Y = m(X) + \varepsilon \tag{26}$$

represents the relationship between the smooth functional $m(X)$ and the scalar response $Y_i = Z_{i+s}$, $i = p, \dots, N-s$. The white noise $\varepsilon$ is a sequence of independent identically distributed functions such that $E(\varepsilon \mid X) = 0$. $X_1, X_2, \dots, X_n$ are identically distributed as the functional random variable $X_i = (Z_{i-p+1}, \dots, Z_i)$. Assume $N = n\tau$ for some $n \in \mathbb{N}$ and some $\tau > 0$ to get a statistical sample of curves $X_i = \{Z(t), (i-1)\tau < t \le i\tau\}$ of size $n-1$ and the response $Y_i = Z(i\tau + s)$, $i = 1, \dots, n-1$ [5]. The kernel regression estimator of $m(X)$ in Eq. (26) is given by:

$$\hat{m}(X) = \frac{\sum_{i=1}^{n} Y_i\, K\big(h^{-1} d(X, X_i)\big)}{\sum_{i=1}^{n} K\big(h^{-1} d(X, X_i)\big)} \tag{27}$$

where $K$ is a kernel function, $h$ (depending on $n$) is a positive real bandwidth, and $d(X, X_i)$ denotes any semi-metric index of proximity between the observed curves, based on the functional principal components [5, 6, 21]. Many authors have proposed methods for measuring this proximity, such as the FPCA method, in which $d(X_i, X_j)$ is measured by the square root of the quantity $\int \big(X_i(t) - X_j(t)\big)^2 dt$ or the quantity $\int \big(X_i^{(2)}(t) - X_j^{(2)}(t)\big)^2 dt$ (for more details, see [4, 21, 22, 23, 24, 25]).
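The construction of the curves and the estimator of Eq. (27) can be sketched as follows. This is a simplified illustration, not the R code used in the chapter: the semi-metric is a plain discretised L2 distance, the kernel is an asymmetric quadratic kernel on [0, 1], and the names `make_curves` and `nw_functional` are hypothetical.

```python
import math

def make_curves(z, tau, s):
    """Cut z (length N = n * tau) into curves of length tau.

    Returns the n - 1 explanatory curves X_i, the responses Y_i = Z(i * tau + s)
    with 1 <= s <= tau, and the last curve, which has no observed successor.
    """
    n = len(z) // tau
    curves = [z[i * tau:(i + 1) * tau] for i in range(n)]
    X = curves[:-1]
    Y = [z[(i + 1) * tau + s - 1] for i in range(n - 1)]
    return X, Y, curves[-1]

def l2_distance(x1, x2):
    """Discretised L2 semi-metric between two curves on the same grid."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))

def quadratic_kernel(t):
    """Asymmetric quadratic kernel on [0, 1]; one common choice, assumed here."""
    return 1.5 * (1.0 - t * t) if 0.0 <= t <= 1.0 else 0.0

def nw_functional(x_new, X, Y, h, dist=l2_distance, kernel=quadratic_kernel):
    """Eq. (27): kernel-weighted average of the responses."""
    w = [kernel(dist(x_new, xi) / h) for xi in X]
    total = sum(w)
    if total == 0.0:
        raise ValueError("no curve within bandwidth h; increase h")
    return sum(wi * yi for wi, yi in zip(w, Y)) / total

# Synthetic seasonal series (4 periods of length 12), for illustration only.
z = [20.0 + 10.0 * math.sin(2.0 * math.pi * t / 12.0) for t in range(48)]
X, Y, last = make_curves(z, tau=12, s=1)
print(nw_functional(last, X, Y, h=1.0))
```

Passing the distance and the kernel as arguments keeps the estimator independent of the semi-metric, so a PCA-based or derivative-based proximity could be swapped in without changing the rest of the code.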

The application methodology consists of estimating the smooth functional $m(X)$ in the regression Eq. (26) with the kernel estimator Eq. (27) after transforming the time series data set. The proposed application algorithm for nonparametric estimation of the transformed functional time series, following the new approach to transforming the kernel density, is as follows:

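Step 1: Choose the common range $\Lambda = [-3, 3]$ for the power parameter $\lambda$.

Step 2: Calculate the value of $J_{\psi(Z)}(\lambda)$ according to Eq. (14).

Step 3: Transform the original response variable $Z$ according to the Finney [11] PT model, $\Psi(Z) = Z^{\lambda}$ when $\lambda \neq 0$, and the BCT model Eq. (1), to get the explanatory functional matrices $\Psi_{\lambda}(X) = \big[\Psi_{\lambda}(z)\big]_{n \times \tau}$ (for more about organizing the matrix files in the R program, see [5, 21]).

Step 4: Redefine the functional data of the regression model $(X_i, Y_i)_{i=1,\dots,n}$ so that the statistical sample of curves $X_i = \{Z(t), (i-1)\tau < t \le i\tau\}$ is defined as follows,

$$\Psi_{\lambda}(X_i) = \big\{\Psi_{\lambda}(Z(t)),\ (i-1)\tau < t \le i\tau\big\} \tag{28}$$

and the response $Y_i = Z(i\tau + s)$, $i = 1, \dots, n-1$, is defined as follows,

$$\Psi_{\lambda}(Y_i) = \Psi_{\lambda}\big(Z(i\tau + s)\big) \tag{29}$$

Step 5: Define Eq. (28) and Eq. (29) with $\tau$ equal to the seasonal length.

Step 6: Estimate the regression function in $\Psi_{\lambda}(Y_i) = m\big(\Psi_{\lambda}(X)\big) + \varepsilon$, where

$$\hat{m}\big(\Psi_{\lambda}(X)\big) = \frac{\sum_{i=1}^{n} \Psi_{\lambda}(Y_i)\, K\big(h^{-1} d(\Psi_{\lambda}(X), \Psi_{\lambda}(X_i))\big)}{\sum_{i=1}^{n} K\big(h^{-1} d(\Psi_{\lambda}(X), \Psi_{\lambda}(X_i))\big)} \tag{30}$$

by using the Nadaraya–Watson regression estimator for functional data.

Step 7: Perform steps 2 through 6 for all $\lambda \in \Lambda$.

Step 8: Choose the optimal value that corresponds to the lowest value of $J_U(\lambda)$.

Step 9: Calculate the estimated mean square error of the last curve, $MSE(X_n) = \frac{1}{s}\sum_{j=1}^{s} (\hat{z}_j - z_j)^2$, where $\hat{z}_j$ and $z_j$ are the $j$-th estimated and actual values, respectively, in the last curve. The $\hat{z}_j$ values are computed by back-transforming $\psi(z) = z_i^{\lambda}$.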
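The steps above can be sketched end to end for the FT model. This is a rough illustration under several assumptions that are not taken from the chapter: the data are positive, the criterion is evaluated with the sample-moment form of Eq. (25), the proximity is a discretised L2 distance, the kernel is quadratic, and the bandwidth is set to the distance of the k-th nearest curve. All function names are hypothetical.

```python
import math

def finney(z, lam):
    """FT model psi(z) = z**lam; assumes z > 0 and lam != 0."""
    return z ** lam

def finney_inverse(u, lam):
    """Back-transform of the FT model."""
    return u ** (1.0 / lam)

def j_u(u):
    """Criterion J_U(lambda), Eq. (25), from sample moments of the transformed data."""
    n = len(u)
    mean = sum(u) / n
    var = sum((x - mean) ** 2 for x in u) / n
    m4 = sum((x - mean) ** 4 for x in u) / n
    return (m4 / (var ** 4 * math.sqrt(2.0 * math.pi * var) * math.sqrt(2.0))) ** 0.2

def nw_predict(x_new, X, Y, k):
    """Nadaraya-Watson estimate with an L2 distance and a k-nearest-neighbour bandwidth."""
    d = [math.sqrt(sum((a - b) ** 2 for a, b in zip(x_new, xi))) for xi in X]
    h = max(sorted(d)[min(k, len(d)) - 1], 1e-12)
    w = [1.5 * (1.0 - (di / h) ** 2) if di < h else 0.0 for di in d]
    if sum(w) == 0.0:
        return sum(Y) / len(Y)            # fall back to the mean response
    return sum(wi * yi for wi, yi in zip(w, Y)) / sum(w)

def forecast_last_curve(z, tau, lam, k):
    """Steps 3 to 6 and 9: transform, build curves, predict the held-out last curve, MSE."""
    u = [finney(x, lam) for x in z]
    n = len(u) // tau
    curves = [u[i * tau:(i + 1) * tau] for i in range(n)]
    X = curves[:-2]                       # curves whose successor is used for training
    x_new = curves[-2]                    # curve preceding the held-out last curve
    preds = []
    for s in range(1, tau + 1):
        Y = [curves[i + 1][s - 1] for i in range(n - 2)]
        preds.append(finney_inverse(nw_predict(x_new, X, Y, k), lam))
    actual = z[(n - 1) * tau:n * tau]
    return sum((p - a) ** 2 for p, a in zip(preds, actual)) / tau

def select_lambda(z, grid):
    """Steps 1, 2, 7 and 8: evaluate J_U on the grid and keep the minimiser."""
    scores = {lam: j_u([finney(x, lam) for x in z]) for lam in grid if lam != 0}
    return min(scores, key=scores.get), scores

# Made-up seasonal pattern repeated over 8 years, for illustration only.
pattern = [9.0, 11.0, 15.0, 20.0, 26.0, 31.0, 34.0, 33.0, 29.0, 23.0, 16.0, 11.0]
z = pattern * 8
grid = [round(-3.0 + 0.1 * k, 1) for k in range(61)]     # Lambda = [-3, 3] in steps of 0.1
best, scores = select_lambda(z, grid)
print(best, forecast_last_curve(z, tau=12, lam=best, k=3))
```

Holding out the last curve and predicting each of its values from the preceding curve is one way to read Step 9; other splits are possible and would change the reported error.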

In all PT methodologies, the decision rule for choosing the optimal power parameter always leads to what we might call the area of feasible solutions. For example, the contested question in BCT is: does the optimal parameter obtained from the maximum likelihood method for the original response actually achieve normality of the transformed variable in practice? The authors believe that this problem is due to the nature of the data. In the proposed approach, where the optimal power parameter corresponds to the lowest $J_U(\lambda)$, the challenge is the complexity of the feasible solutions area that we aim to reach: normality of the transformed response in practical application, which provides good conditions for both the functional and the nonparametric approaches in nonstationary seasonal time series (see also [26, 27], which point to other challenges related to the use of PT and the quality of power parameter estimation).

5. Applications

The PT models indicated in the proposed application algorithm were applied to two examples of nonstationary time series of monthly temperature averages [21]. The first consists of 200 observations for Nineveh City in Iraq (TSN) for the period 1976 to 2000 (Figure 1a). The second consists of 300 observations for Tunisia (TST) for the period 1991 to 2015 (Figure 1b). R software was used to analyze the data. The data are available at https://climateknowledgeportal.worldbank.org.

Figure 1.

Plots of the monthly temperature averages series: (a) TSN; (b) TST [21].

Returning to the idea of the feasible solutions area, we must verify the results of choosing the optimal PT value under the proposed density transformation approach and its contribution to meeting the requirements for efficient analysis: the concavity of $J_u(\lambda)$, the normality of the transformed response, and the reduction of the prediction error in functional nonparametric time series analysis.

Mathematically, $J_u(\lambda)$ is a concave function, but a number of authors note that a minimum point may not exist or may not be unique [8, 18]. This may depend on choosing an appropriate PT model [20]. The plots in Figures 2 and 3 show the curves of the ordered pairs $(\hat{\lambda}_i, J_u(\lambda))$ of Eq. (14).

Figure 2.

The curves of the ordered pairs $(\hat{\lambda}_i, J_u(\lambda))$ of the transformed responses of the two time series data sets using FT: (a) TSN. (b) TST.

Figure 3.

The curves of the ordered pairs $(\hat{\lambda}_i, J_u(\lambda))$ of the transformed responses of the two time series data sets using BCT: (a) TSN. (b) TST.

In Figure 2, the curves for both time series using FT have a concavity point in the range $[-3, 0]$, while the $J_u(\lambda)$ values tend to zero and the curves flatten toward the horizontal axis in the range $[0, 3]$.

When BCT is applied, Figure 3 shows that there is no concavity point in the curves of $J_u(\lambda)$, whose value goes to zero as $\lambda$ goes to −3. Therefore, it is not possible to obtain an optimal value for $\lambda$.

As for the normality of the transformed data, Table 1 shows that, in both examples, the response variable is not normal in either its original or its transformed state. Neither optimal value of $\lambda$, corresponding to the minimum of $J_u(\lambda)$, shifted the data to a normal shape. On the other hand, the improvement in the forecastability of the two time series is evident in the estimated mean square errors of the last curve (Table 2).

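| Time Series | Response | λ̂ | p-value (K-Smirnov) | p-value (Sh.-Wilk) |
|---|---|---|---|---|
| TSN | Z_t | 1.0 | 0.0002 | 8.5E–7 |
| TSN | Ψ_λ(Z_t) | −0.4 | 2.2E–16 | 2.8E–6 |
| TST | Z_t | 1.0 | 2.0E–8 | 5.0E–11 |
| TST | Ψ_λ(Z_t) | −0.6 | 2.2E–16 | 5.9E–10 |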

Table 1.

The data normality tests of the original and transformed responses in the two examples using FT model.

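| Time Series | Response | λ̂ | MSE_Z(X_n) |
|---|---|---|---|
| TSN | Z_t | 1.0 | 1.7616 |
| TSN | ψ̂⁻¹(Z_t) | −0.4 | 0.9462 |
| TST | Z_t | 1.0 | 0.4303 |
| TST | ψ̂⁻¹(Z_t) | −0.6 | 0.1994 |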

Table 2.

The MSE estimates of the last curve Xn of the two-time series datasets.
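The size of the improvement reported in Table 2 can be expressed as a relative reduction in MSE. The short calculation below uses only the four values from the table; the helper name `relative_reduction` is illustrative.

```python
# MSE of the last curve, copied from Table 2.
mse = {
    "TSN": {"original": 1.7616, "transformed": 0.9462},
    "TST": {"original": 0.4303, "transformed": 0.1994},
}

def relative_reduction(before, after):
    """Share of the original MSE removed by the transformation."""
    return (before - after) / before

for name, row in mse.items():
    share = relative_reduction(row["original"], row["transformed"])
    print(name, f"{100 * share:.1f}%")
```

On these figures the transformation removes roughly 46% of the last-curve MSE for TSN and roughly 54% for TST.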

6. Conclusions

The analysis of parametric and nonparametric time series, like any statistical modeling process, requires certain conditions to hold so that the results of statistical inference are reliable, which in turn improves forecastability.

Data are rarely ready for statistical analysis, which necessitates the use of power transformation to improve the required output. In this chapter, power transformation was used with a new methodology to improve the outputs of the analysis in three directions: time series, nonparametric estimation and functional analysis. The authors therefore faced the challenge of choosing the optimal method for estimating the power parameter in accordance with the conditions of the feasible solutions area for the three directions. Using the MISE as the criterion for choosing the power parameter in the proposed method did not achieve normality of the data, but it enhanced the forecastability of the time series.

Of the FT and BCT models, the first was applicable and fulfilled the concavity condition for the transformation effect measurement function $J_u(\lambda)$, while with the second model the function curve was divergent at both ends of the power parameter range.

In future work, we recommend developing the proposed methodology with other transformation models and examining its use for other types of time series.

References

  1. Aneiros-Pérez G, Cao R, Vilar-Fernández JM. Functional methods for time series prediction: A nonparametric approach. Journal of Forecasting. 2010;30(4):377-392. DOI: 10.1002/for.1169
  2. Kidziński Ł. Functional time series. Preprint arXiv:1502.07113 [stat.ME]. 2015. https://doi.org/10.48550/arXiv.1502.07113
  3. Kannel PR, Lee S, Kanel SR, Khan SP. Chemometric application in classification and assessment of monitoring locations of an urban river system. Analytica Chimica Acta. 2007;582(2):390-399. DOI: 10.1016/j.aca.2006.09.006
  4. Dauxois J, Pousse A, Romain Y. Asymptotic theory for the principal component analysis of a vector random function: Some applications to statistical inference. Journal of Multivariate Analysis. 1982;12(1):136-154
  5. Ferraty F, Vieu P. Nonparametric models for functional data, with application in regression, time series prediction and curve discrimination. Nonparametric Statistics. 2004;16(1–2):111-125
  6. Shang H, Xu R. Functional time series forecasting of extreme values. Communication Statistics. 2021;7(2):182-199
  7. Maadooliat M, Huang JZ, Hu J. Integrating data transformation in principal components analysis. Journal of Computational and Graphical Statistics. 2015;24(1):84-103. DOI: 10.1080/10618600.2014.891461
  8. Wand MP, Marron JS, Ruppert D. Transformations in density estimation. Journal of the American Statistical Association. 1991;86(414):343-353
  9. Ruppert D, Wand MP. Correcting for kurtosis in density estimation. Australian and New Zealand Journal of Statistics. 1992;34(1):19-29
  10. Chavez-Demoulin V, Davison AC. Modelling time series extremes. REVSTAT-Statistical Journal. 2012;10(1):109-133
  11. Finney DJ. Statistical Method in Biological Assay. 1st ed. London: Charles Griffin & Co. Ltd; 1952
  12. Box GEP, Cox DR. An analysis of transformations. Journal of the Royal Statistical Society. Series B (Methodological). 1964;26(2):211-252
  13. Tukey JW. On the comparative anatomy of transformations. The Annals of Mathematical Statistics. 1957;28:602-632. DOI: 10.1214/aoms/1177706875
  14. Yang L, Marron JS. Iterated transformation–kernel density estimation. Journal of the American Statistical Association. 1999;94(446):580-589
  15. Bean A, Xinyi X, MacEachern S. Transformations and Bayesian density estimation. Electronic Journal of Statistics. 2016;10(2):3355-3373
  16. Pitt D, Guillen M, Bolancé C. Estimation of parametric and nonparametric models for univariate claim severity distributions: An approach using R. Journal of Financial Education. 2011;42(1–2):154-175
  17. Sakthivel KM, Rajitha CS. Kernel density estimation for claim size distributions using shifted power transformation. International Journal of Science and Research. 2013;6(14):2025-2028
  18. Bolance C, Guillen M, Perch Nielsen J. Kernel density estimation of actuarial loss functions. Insurance Mathematics and Economics. 2013;32(1):19-36
  19. Koekemoer G, Swanepoel JWH. Transformation kernel density estimation with applications. Journal of Computational and Graphical Statistics. 2008;17(3):750-769
  20. Bean A. Transformations and Bayesian Estimation of Skewed and Heavy-Tailed Densities. Ohio State University; OhioLINK Electronic Theses and Dissertations Center; 2017. http://rave.ohiolink.edu/etdc/view?acc_num=osu1503015935192212
  21. Othman SA, Ali HTM. Improvement of the nonparametric estimation of functional stationary time series using Yeo-Johnson transformation with application to temperature curves. Advances in Mathematical Physics. 2021;2021. DOI: 10.1155/2021/6676400
  22. Castro PE, Lawton WH, Sylvestre EA. Principal modes of variation for processes with continuous sample curves. Technometrics. 1986;28(4):329-337
  23. Ferraty F, Vieu P. Curves discrimination: A nonparametric functional approach. Computational Statistics & Data Analysis. 2003;44(1–2):161-173
  24. Ferraty F, Vieu P. Nonparametric Functional Data Analysis: Theory and Practice. Springer Science & Business Media; 2006
  25. Febrero-Bande M, de la Fuente MO. Statistical computing in functional data analysis: The R package fda.usc. Journal of Statistical Software. 2012;51(1):1-28
  26. Atkinson AB, Riani M, Corbellini A. The Box–Cox transformation: Review and extensions. Statistical Science. 2021
  27. Soleymani S. Exact Box-Cox Analysis. The University of Western Ontario; Electronic Thesis and Dissertation Repository; 2018. https://ir.lib.uwo.ca/etd/5308/
