Fractal Analysis for Time Series Datasets: A Case Study of Groundwater Quality

Fractal dimension (FD) is a widely used mathematical tool for measuring the long-term memory of time series datasets in various research areas, and it is also applied in chaos theory and in fractal and spectral analysis. FD analysis has been applied in disciplines ranging from biophysics and hydrology to computer networking. In developing countries like India, characterizing water quality parameters is very challenging because of the increasing amount of contaminants in groundwater. In view of health issues and drinking water standards, water quality assurance is a requisite on a regional basis. To quantify water quality, a numerical index known as the water quality index (WQI), widely adopted by researchers worldwide, has been recognized for its significance and applicability in water characterization. Further, statistical and fractal modeling of water quality parameters, such as turbidity, chloride, ferrous (Fe), nitrate, pH, calcium (Ca), magnesium (Mg), fluoride (Fl), total dissolved salts (TDS), alkalinity, hardness, and sulfate, could significantly improve this understanding. Especially in the high mountainous regions of the Himalayas, where observed datasets are scarce, predictability estimation will be highly applicable in WQI modeling. In the current study, statistical relationships among the sample datasets for Tehri District are obtained using the regression equation, the coefficient of correlation, the Hurst exponent, FD, and the predictability index between water parameters. It is concluded that fractal analysis is a better statistical and mathematical tool for calculating water quality indices. Fractal analysis among the various parameters suggested that the water samples are suitable for drinking and safe for health.


Introduction
The Himalayan region has large river basins and therefore a dynamic hydrology, which requires scientific approaches to water resource management and planning on a regional basis. Large-scale developmental activities along the river basins affect the hydrology of the region and hence its water quality. These development activities impact the overall water balance of the region, which has a negative effect on various environmental factors such as flora, fauna, soil, air, and drinking water, and ultimately on human health. Water is a valuable natural resource that contains various suspended substances of organic matter and minerals.
Rapid urbanization and drilling activities drive land transformation, and the resulting surface water pollution affects the quality of drinking water [1,2]. Assessing water quality against standards and estimating constituent concentrations require laboratory processes and statistical methods, and researchers worldwide have used statistical analyses such as multiple linear regression [3][4][5][6]. Studies have emphasized examining the physical properties of surface water, and constituent concentrations can be used to assess the water quality situation or water balance by analyzing parameter deviations from water quality standards [7][8][9][10][11]. Surface water quality over Allahabad district, India, has been modeled by assessing numerical indices with earth observation datasets and laboratory methods such as the X-ray diffraction technique [12]. Statistical methods have been observed to be capable of comparing the numerical indices. The separation of contaminant quantities in groundwater samples has also been studied with a numerical index approach and found to be significant for water parameter characterization [13,14]. Remote sensing and GIS-based approaches, along with ground-based datasets, have also been applied to study water quality parameters [15,16].
Fractal dimension (FD) and laboratory methods have been used together to study water quality, with the indices calculated as the weighted average of all observations of interest [17]. Fractal theory has been widely applied to diverse types of datasets in hydrology, geophysics, and climate, as well as in other research areas, to identify patterns in time series and to describe the irregular and complex behavior of dynamic systems [18][19][20][21]. Spatiotemporal variations of flood-season rainfall in China during 1958-2013 have been analyzed using the Hurst exponent, with the conclusion that the rainfall trends will persist in the future, which has implications for ecological restoration and farming operations [22]. A fractal approach has been applied to estimate climatic indices for climatic variables (pressure, temperature, and rainfall) in the Himalayan foothill region [23]. The fractal dimension varied significantly from station to station, with values closer to unity at high-altitude sites, indicating better climate predictability there than at the low-altitude stations in the Himalayan foothills.
In this investigation, fractal and statistical analyses are carried out to establish relationships among water parameters such as turbidity, chloride, ferrous, nitrate, pH, calcium, magnesium, fluoride, total dissolved salts (TDS), alkalinity, hardness, and sulfate, and to gain a better understanding of the WQI. Fractal analysis can improve the understanding of the WQI, especially in the mountainous regions of the Himalayas, where 3D models are limited in resolving the highly complex topography. Although statistical modeling can capture the effects of terrain and the correlated variations among meteorological parameters, comprehensive investigations of such statistical relationships among observed water quality parameters are lacking over the Himalayan foothills and need to be studied in detail.
This study intends to carry out statistical and fractal analysis of groundwater parameters to establish the relationships among the various indices and to understand the behavior of water quality indices (WQI) through the predictability index (PI), Hurst exponent (H), and fractal dimension (D). This study may offer a basic understanding of the WQI of different water parameters with respect to regional hydrogeochemical processes, in conjunction with laboratory testing methods.

Study region and observational dataset
The state of Uttarakhand, located in the central Himalayan region, has been identified as a hotspot of anthropogenic stress and one of the regions most vulnerable to climate-mediated risks. The region provides water resources that support millions of people in South Asia. During the last few decades, the central Himalayan region has been experiencing the cascading effects of climate change, including rising temperatures, receding glaciers, and erratic precipitation patterns [24]. In this scenario, the Himalayan region is receiving global scientific attention for glacier and water resource studies. However, available climate models are often limited in resolving the highly complex topography of this region, which directly or indirectly affects its water balance. Hence, timely and accurate statistical relationships among water indices, which capture the interdependence of water parameters, complement the understanding of the water available in the fragile ecosystem of Uttarakhand.
The present study is carried out over the Tehri District of Uttarakhand. The detailed study area map (Figure 1) shows the locations of the main rivers and drainages along with the sites where the water quality parameters were collected. The observations of water … In this study, we compute the Hurst exponent, fractal dimension, and predictability index (PI) of water quality parameters, namely (1) turbidity, (2) chloride, (3) ferrous, (4) nitrate, (5) pH, (6) calcium, (7) magnesium, (8) fluoride, (9) total dissolved salts (TDS), (10) alkalinity, (11) hardness, and (12) sulfate, at high-altitude Tehri stations in the Himalayan foothills using fractal theory. Figure 2 shows the box plot of all 12 water parameters for the study site. The irregular pattern in the WQI can be used for prediction by analyzing its dynamics (i.e., whether they follow a chaotic, random, or deterministic structural pattern). Proper identification, classification, and mapping of such highly variable and complex water parameters require frequent monitoring, especially in the context of drinking water.

Statistical analysis
Water quality parameters have been analyzed using numerical indices, multivariate statistics, and earth observation datasets [25]. The average value, the positional average, and the most frequent value of each series are estimated by the mean, median, and mode, respectively. The variability of the sample datasets is measured by the standard deviation, peakedness is estimated by kurtosis, and the symmetry of the distribution is estimated by skewness. The coefficient of variation (the ratio of the standard deviation to the mean) gives the relative variability of the data in a sample.
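The descriptive statistics listed above can be computed with a short sketch using only the Python standard library. It assumes population (divide-by-n) moments and Pearson kurtosis, for which a normal distribution has kurtosis 3 (the threshold the results section compares against); the study does not state which estimators it used, and the function names and sample values are illustrative, not study data.

```python
import statistics


def describe(sample):
    """Summary statistics discussed in the text.

    Assumption: population (divide-by-n) moments; the sample must not be
    constant (m2 > 0) and must have a non-zero mean for the CV.
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    m2 = sum((x - mean) ** 2 for x in sample) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in sample) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in sample) / n  # fourth central moment
    sd = m2 ** 0.5
    return {
        "mean": mean,
        "median": statistics.median(sample),
        "mode": statistics.mode(sample),
        "sd": sd,
        "skewness": m3 / m2 ** 1.5,  # 0 for a symmetric sample
        "kurtosis": m4 / m2 ** 2,    # Pearson kurtosis: 3 for a normal distribution
        "cv": sd / mean,             # coefficient of variation
    }


def kurtosis_type(kurtosis):
    """Classify peakedness relative to the normal-distribution value of 3."""
    if kurtosis < 3:
        return "platykurtic"
    if kurtosis > 3:
        return "leptokurtic"
    return "mesokurtic"


stats = describe([1, 2, 2, 3, 4])  # illustrative values only
print(stats["mean"], stats["median"], stats["mode"])  # 2.4 2 2
print(kurtosis_type(stats["kurtosis"]))               # platykurtic
```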

Regression analysis
This analysis examines the influence of one or more independent variables on a dependent variable. The regression equation with dependent variable Y and independent variable X is represented as [5]:
$$ Y = r\,\frac{\sigma_y}{\sigma_x}\,X + C $$
where C is a constant (the intercept) and r is the coefficient of correlation:
$$ r = \frac{\Psi(XY) - \Psi(X)\,\Psi(Y)}{\sigma_x\,\sigma_y} $$
where σ_y and σ_x are the standard deviations of variables Y and X, respectively, and Ψ(X), Ψ(Y), and Ψ(XY) are the expected values of variables X, Y, and XY, respectively.
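A minimal sketch of the regression and correlation computation, assuming the forms r = (Ψ(XY) − Ψ(X)Ψ(Y)) / (σ_x σ_y) and Y = r(σ_y/σ_x)X + C, with sample means standing in for the expected values Ψ(·) and population standard deviations for σ. The intercept C is chosen so that the line passes through (Ψ(X), Ψ(Y)). The function names and the paired values are hypothetical illustrations, not the study's data.

```python
import math


def expected(values):
    """Sample analogue of the expected value Psi(.)."""
    return sum(values) / len(values)


def regression(x, y):
    """Return (r, slope, C) for Y = r * (sigma_y / sigma_x) * X + C."""
    ex, ey = expected(x), expected(y)
    exy = expected([a * b for a, b in zip(x, y)])
    sigma_x = math.sqrt(expected([(a - ex) ** 2 for a in x]))
    sigma_y = math.sqrt(expected([(b - ey) ** 2 for b in y]))
    r = (exy - ex * ey) / (sigma_x * sigma_y)  # coefficient of correlation
    slope = r * sigma_y / sigma_x
    c = ey - slope * ex                        # line passes through (Psi(X), Psi(Y))
    return r, slope, c


# Hypothetical paired measurements (illustrative only), e.g. X = hardness, Y = calcium.
r, slope, c = regression([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(round(r, 6), round(slope, 6), round(c, 6))  # 1.0 2.0 1.0
```

Because the illustrative Y values lie exactly on Y = 2X + 1, the sketch recovers r = 1, slope 2, and C = 1.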

Fractal dimension (FD)
The term fractal, coined by Benoit Mandelbrot in 1975, comes from the Latin word fractus, meaning "broken" or "fractured"; the basic concept is that fractals have a large degree of self-similarity. Fractals are characterized by self-similarity, showing similar characteristics when analyzed over a large range of scales, so that a single part has characteristics similar to those of the whole fractal [26,27]. Various methods exist for estimating the fractal dimension of a fractal set. Box counting, one of the major categories of fractal analysis, is the most widely used because of its simplicity and automatic computability, and it is applied to analyze image features such as texture segmentation, shape classification, and graphic analysis in many fields [28,29]. The variance and spectral methods are two other major categories of fractal dimension analysis of a time series, which recognize determinism and randomness in data [30]. Fractal geometry has also provided a mathematical model for studying naturally complex features such as coastlines, river boundaries, mountains, and clouds [31,32]. Glacial and fluvial morphologies have been distinguished using an automated multifractal approach: a previous study applied multifractal detrended fluctuation analysis (MFDFA) to estimate the variation in the elevation profiles of glacial and fluvial landscapes [33]. Glacial landscapes were found to have a more complex structure than fluvial landscapes, as indicated by fractal parameters such as the degree of multifractality and the asymmetry index. The fundamental definition of fractal dimension is the Hausdorff dimension; however, the box-counting (box) dimension is another popular definition that is easier to calculate.
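To make the box-counting idea concrete, the sketch below counts the boxes of side ε = 3^−j needed to cover a finite approximation of the middle-thirds Cantor set and fits log N(ε) against log(1/ε); the slope approximates the box dimension, which for this set is known to be log 2 / log 3 ≈ 0.631. The Cantor set is used purely as a textbook test case; it is not part of the study's data, and all function names are illustrative.

```python
import math
from itertools import product


def cantor_points(level):
    """Left ends of the 2**level Cantor intervals, as integers in units of 3**-level."""
    points = []
    for digits in product((0, 2), repeat=level):  # base-3 digits that avoid 1
        value = 0
        for d in digits:
            value = value * 3 + d
        points.append(value)
    return points


def box_count(points, level, j):
    """Number of boxes of side 3**-j that contain at least one interval."""
    return len({p // 3 ** (level - j) for p in points})


def ols_slope(xs, ys):
    """Least-squares slope of ys against xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def box_counting_dimension(points, level):
    """Slope of log N(eps) against log(1/eps) for eps = 3**-j, j = 1..level."""
    scales = range(1, level + 1)
    log_inv_eps = [j * math.log(3) for j in scales]
    log_counts = [math.log(box_count(points, level, j)) for j in scales]
    return ols_slope(log_inv_eps, log_counts)


level = 10
pts = cantor_points(level)  # 1024 intervals of length 3**-10
print(round(box_counting_dimension(pts, level), 4))  # 0.6309 (= log 2 / log 3)
```

For a measured time series, the same counting would be applied to the graph of the series on a grid of boxes; the fit of log-count against log-scale is unchanged.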

The Hurst exponent
The Hurst exponent (H) is a measure of the long-term memory of a time series; for a real-valued time series, it is defined as the exponent in an asymptotic scaling relation [30,34]. The Hurst exponent and the fractal dimension are directly related to each other, and both indicate the roughness of a surface. The value of H classifies a time series as persistent (0.5 < H ≤ 1) or anti-persistent (0 ≤ H < 0.5); when the data are uncorrelated, H = 0.5, which implies that the series is unpredictable. This approach is used in various complex engineering fields because it quantifies statistical self-similarity.
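The classification rule for H can be written as a small helper. This is a sketch only: the text does not say how close to 0.5 an estimate must be to count as uncorrelated (the "Brownian time series" or random-walk case in the results), so the tolerance `tol` below is a hypothetical parameter, not a value from the study.

```python
def classify_hurst(h, tol=0.05):
    """Label a Hurst exponent using the ranges given in the text.

    tol is a hypothetical tolerance for treating an estimate as H = 0.5
    (uncorrelated / random walk); the study does not specify one.
    """
    if not 0.0 <= h <= 1.0:
        raise ValueError("H is expected to lie in [0, 1]")
    if abs(h - 0.5) <= tol:
        return "uncorrelated (random walk)"
    return "persistent" if h > 0.5 else "anti-persistent"


for h in (0.2, 0.52, 0.8):
    print(h, classify_hurst(h))
```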
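In terms of the asymptotic scaling relation, the Hurst exponent of a real-valued time series is defined as:
$$ \left\langle \frac{R(n)}{S(n)} \right\rangle = C\,n^{H} \quad \text{as } n \to \infty $$
where C is a constant, the angular brackets ⟨⋯⟩ denote the expected value, S(n) is the standard deviation of the first n values of the series {X_1, X_2, ⋯, X_n}, and R(n) is the range of their cumulative deviations from the mean:
$$ R(n) = \max(Y_1, Y_2, \ldots, Y_n) - \min(Y_1, Y_2, \ldots, Y_n), \qquad Y_t = \sum_{i=1}^{t} (X_i - m), \qquad m = \frac{1}{n}\sum_{i=1}^{n} X_i $$
The Hurst exponent H is estimated with the rescaled range (R/S) technique and can also be computed with a wavelet method for the time series {X_1, X_2, ⋯, X_n}.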
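A minimal rescaled range (R/S) estimator is sketched below. For each window of n values it takes R(n) as the range of the cumulative deviations from the window mean and S(n) as the window's standard deviation, averages R/S over non-overlapping windows, and estimates H as the least-squares slope of log⟨R/S⟩ against log n. The window sizes, the function names, and the synthetic test series are illustrative assumptions, since the study does not report its implementation details.

```python
import math
import random


def ols_slope(xs, ys):
    """Least-squares slope of ys against xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def rescaled_range(chunk):
    """R(n)/S(n) for one window; None if the window is constant (S = 0)."""
    n = len(chunk)
    m = sum(chunk) / n
    y, y_max, y_min = 0.0, -math.inf, math.inf
    for x in chunk:
        y += x - m  # cumulative deviation Y_t
        y_max, y_min = max(y_max, y), min(y_min, y)
    s = math.sqrt(sum((x - m) ** 2 for x in chunk) / n)
    return (y_max - y_min) / s if s > 0 else None


def hurst_rs(series, window_sizes):
    """Slope of log <R/S> versus log n over non-overlapping windows."""
    log_n, log_rs = [], []
    for n in window_sizes:
        ratios = []
        for start in range(0, len(series) - n + 1, n):
            rs = rescaled_range(series[start:start + n])
            if rs is not None:
                ratios.append(rs)
        if ratios:
            log_n.append(math.log(n))
            log_rs.append(math.log(sum(ratios) / len(ratios)))
    return ols_slope(log_n, log_rs)


sizes = [16, 32, 64, 128, 256, 512]
rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(4096)]  # synthetic, uncorrelated
trend = [float(i) for i in range(4096)]             # synthetic, strongly persistent
print(round(hurst_rs(noise, sizes), 2))  # typically near or slightly above 0.5
print(round(hurst_rs(trend, sizes), 2))  # close to 1
```

Classical R/S is known to overestimate H somewhat for short windows, so uncorrelated noise usually gives a value a little above 0.5; the linear trend illustrates strongly persistent behavior.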

Estimate of the Hurst exponent: Wavelet approach
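If f(t) is a self-affine random process, t is a position parameter (time or distance), a > 0 is a scale (dilatation) parameter, w(t) is a mother wavelet, and
$$ w_{t,a}(t') = \frac{1}{\sqrt{a}}\, w\!\left(\frac{t' - t}{a}\right) $$
is its shifted, dilatated, and scaled version, then the continuous wavelet transform of f(t) is defined as:
$$ W(t,a) = \int_{-\infty}^{\infty} f(t')\, w_{t,a}^{*}(t')\, dt' $$
If the time series f(t) is self-affine, the variance of W(t, a) scales asymptotically with the dilatation parameter as:
$$ \operatorname{Var}\left[ W(t,a) \right] \propto a^{\delta} $$
When the exponent δ lies between −1 and 3 (i.e., −1 ≤ δ ≤ 3), the Hurst exponent is defined as:
$$ H = \begin{cases} \dfrac{\delta + 1}{2}, & -1 \le \delta \le 1 \quad \text{(FGN)} \\[4pt] \dfrac{\delta - 1}{2}, & 1 \le \delta \le 3 \quad \text{(FBM)} \end{cases} $$
where FGN is fractional Gaussian noise and FBM is fractional Brownian motion. The Hurst exponent is linked with the fractal dimension (D) as:
$$ D = 2 - H $$
The predictability index (PI) is then given as:
$$ PI = 2\,\lvert D - 1.5 \rvert $$
If PI is close to zero, the series is unpredictable; the closer PI is to 1, the more predictable the series is.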
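The sketch below chains the wavelet steps together, assuming the standard forms H = (δ + 1)/2 for FGN and H = (δ − 1)/2 for FBM, D = 2 − H, and PI = 2|D − 1.5|. As an assumption not stated in the text, it uses the Haar wavelet as the mother wavelet and discretizes W(t, a) for integer shifts and even scales a = 2m, so each coefficient is the difference between the sums over two adjacent blocks of m samples divided by √a. δ is the least-squares slope of log Var[W] against log a, and the FGN branch is chosen when δ ≤ 1 (the boundary δ = 1 is ambiguous between the two cases). Function names, scales, and the synthetic series are illustrative.

```python
import math
import random
from itertools import accumulate


def ols_slope(xs, ys):
    """Least-squares slope of ys against xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def haar_delta(series, half_widths=(2, 4, 8, 16, 32)):
    """Estimate delta in Var[W(t, a)] ~ a**delta with a discretized Haar CWT (a = 2m)."""
    prefix = list(accumulate(series, initial=0.0))  # prefix[k] = sum of first k samples
    log_a, log_var = [], []
    for m in half_widths:
        a = 2 * m
        coeffs = [
            ((prefix[t + m] - prefix[t]) - (prefix[t + a] - prefix[t + m])) / math.sqrt(a)
            for t in range(len(series) - a + 1)
        ]
        mean = sum(coeffs) / len(coeffs)
        variance = sum((c - mean) ** 2 for c in coeffs) / len(coeffs)
        log_a.append(math.log(a))
        log_var.append(math.log(variance))
    return ols_slope(log_a, log_var)


def hurst_from_delta(delta):
    """H from the wavelet-variance exponent: FGN branch if delta <= 1, FBM otherwise."""
    if not -1.0 <= delta <= 3.0:
        raise ValueError("delta outside [-1, 3]")
    if delta <= 1.0:
        return (delta + 1.0) / 2.0  # fractional Gaussian noise
    return (delta - 1.0) / 2.0      # fractional Brownian motion


def fractal_dimension(h):
    return 2.0 - h


def predictability_index(h):
    return 2.0 * abs(fractal_dimension(h) - 1.5)


rng = random.Random(7)
noise = [rng.gauss(0.0, 1.0) for _ in range(8192)]  # synthetic, uncorrelated
walk = list(accumulate(noise))                     # synthetic random walk

for name, series in (("noise", noise), ("walk", walk)):
    delta = haar_delta(series)
    h = hurst_from_delta(delta)
    print(name, round(delta, 2), round(h, 2), round(predictability_index(h), 2))
```

For uncorrelated noise the Haar coefficient variance does not depend on scale (δ ≈ 0), and for its cumulative sum it grows roughly as a² (δ ≈ 2); both map to H ≈ 0.5 and hence PI ≈ 0, the unpredictable case.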

Results and discussion
Distinguishing fresh from contaminated water and establishing relationships between water parameters have become major concerns for environmentalists and health workers. Because of increasing pollutant levels, it is very challenging for municipal authorities to provide clean drinking water, especially in developing countries. The statistical relationships in water models depend on the dynamics of climatic and soil parameters, as well as on thermodynamic processes among surface water parameters. The statistical relationships established among the various water quality parameters are shown in Table 1; they suggest that the variation among these parameters arises from variability in the originating environment and is affected by the terrain over which the water flows. In dynamic systems, this kind of response generates irregularity, which may appear as a random pattern of some type. Figure 2 shows the box plot of the 12 water quality parameters over the study area. The results of the statistical and fractal analysis are shown in Table 2, and each WQI analysis is discussed subsequently.

Turbidity
The turbidity dataset exhibits normal behavior, as its mean, median, and mode are approximately equal. The standard deviation (1.5) suggests that the sample data points are close together. The positive skewness (1.747) reveals that the curve is not symmetrical, and the kurtosis value (3.13) is slightly above 3, so the distribution is close to mesokurtic (marginally leptokurtic) rather than platykurtic. Turbidity has persistent behavior with chloride, nitrate, fluoride, TDS, alkalinity, and sulfate and anti-persistent behavior with ferrous, pH, Ca, Mg, and hardness parameters.

Chloride
The mean and mode values are of the order of ±0.5, and thus the data show normal behavior even though the median is 0. A high standard deviation (5.009) is observed among the sample points. The skewness (1.52) suggests that the curve is not symmetrical, and the kurtosis (1.3) is less than 3; thus, the curve is platykurtic. Chloride shows Brownian time series (true random walk) behavior with Fl, sulfate, and turbidity parameters. Chloride has persistent behavior with turbidity, nitrate, TDS, and alkalinity and anti-persistent behavior with Fe, pH, Ca, Mg, and hardness parameters.

Ferrous (Fe)
The mean, median, and mode values are approximately equal, and thus the data show normal behavior. The standard deviation (1.031) shows that the sample points are close to each other. The skewness (11.713) suggests that the curve is not symmetrical, and the kurtosis is very large; thus, the curve is leptokurtic rather than platykurtic, and the dataset contains heavy outliers. Fe shows Brownian time series (true random walk) behavior with nitrate, fluoride, and hardness parameters. It has persistent behavior with pH, Ca, and Mg and anti-persistent behavior with chloride, TDS, alkalinity, and sulfate parameters.

Nitrate
The mean, median, and standard deviation are approximately equal; thus, the data exhibit normal behavior, suggesting that the sample data are close to each other. The skewness (1.011) indicates that the curve is not symmetrical, and the kurtosis is less than 3, so the curve is platykurtic. Nitrate has persistent behavior with turbidity, chloride, fluoride, TDS, alkalinity, hardness, and sulfate and anti-persistent behavior with Ca, Mg, Fe, and pH parameters.

pH
The mean and median are almost the same (7.189 and 7.20, respectively), whereas the mode of pH is 0.25; the near-equality of the mean and median indicates normal behavior. The standard deviation (SD) is 0.691 and the skewness is close to 0, so the values are close to each other and the pH distribution is symmetrical. The kurtosis is very large (74.92), so the curve is not platykurtic. pH shows Brownian time series behavior with the fluoride (Fl) parameter, persistent behavior with Ca, Mg, Fe, and nitrate, and anti-persistent behavior with turbidity, chloride, TDS, alkalinity, hardness, and sulfate.

Calcium
The mean, median, and mode values are not close to each other; thus, the curve does not show normal behavior. The high standard deviation (≈40) indicates that the Ca values are widely dispersed. The distribution is positively skewed, and the curve is not platykurtic. Ca shows persistent behavior with turbidity, chloride, TDS, alkalinity, hardness, Mg, Fl, and sulfate and anti-persistent behavior with Fe and pH parameters.

Magnesium
The mean and mode values are 30.75 and 22.0, respectively, and the median is 0, so these measures differ and the curve does not show normal behavior. The standard deviation is high (50.368); thus, the Mg values are widely dispersed. The distribution is positively skewed, and the curve is not platykurtic. Mg shows Brownian time series (true random walk) behavior with pH and alkalinity parameters. Mg has persistent behavior with turbidity, chloride, nitrate, Ca, TDS, hardness, Fl, and sulfate and anti-persistent behavior with the Fe parameter.

Fluoride
The mean and median values are approximately equal, and thus the curve shows normal behavior. The standard deviation (0) suggests that the sample data are close to each other, and the skewness and kurtosis values suggest that the curve is platykurtic. Fl shows Brownian time series (true random walk) behavior with nitrate and hardness parameters. Fl has persistent behavior with turbidity, chloride, TDS, alkalinity, and sulfate and anti-persistent behavior with Fe, pH, Ca, and Mg parameters.

Total dissolved salts (TDS)
The mean, median, and mode values differ; thus, the curve does not follow normal behavior. The standard deviation is high (116.57); thus, the TDS values are not close to each other. The distribution is negatively skewed, and the curve is platykurtic. TDS shows Brownian time series (true random walk) behavior with turbidity, chloride, pH, Fl, and sulfate parameters. TDS has persistent behavior with alkalinity, nitrate, and chloride and anti-persistent behavior with Fe, Ca, Mg, and hardness parameters.

Alkalinity
The mean, median, and mode (1.27) values are nearly equal to each other, and the sample data exhibit normal behavior. A high standard deviation (63.72) is observed in the dataset. The skewness (0.331) is close to 0 and the kurtosis is less than 3; thus, the curve is nearly symmetrical and platykurtic. Alkalinity shows Brownian time series behavior with chloride, Fl, TDS, and sulfate parameters. It has persistent behavior with nitrate and anti-persistent behavior with turbidity, Fe, pH, Ca, Mg, and hardness parameters.

Hardness
The data series does not exhibit normal behavior, as the mean and median differ greatly from the mode (95.0). The standard deviation (102.5) suggests that the data are spread out, and the skewness (1.49) indicates that the curve is not symmetrical; the curve is platykurtic. Hardness shows Brownian time series behavior only with the Mg parameter. It has persistent behavior with turbidity, chloride, nitrate, pH, Ca, Fl, TDS, alkalinity, and sulfate and anti-persistent behavior with the Fe parameter.

Sulfate
The mean and median values differ, and the mode is 0. The standard deviation (17.16) shows that the dataset is widely dispersed. The skewness (2.40) indicates that the curve is not symmetrical, and the large kurtosis (7.22) indicates a heavy-tailed (leptokurtic) distribution. Sulfate shows true random walk behavior with Fl and nitrate parameters. It has persistent behavior with turbidity, chloride, TDS, and alkalinity and anti-persistent behavior with hardness, Fe, pH, Ca, and Mg parameters.

Conclusion
The water parameters from different sources in the Tehri District of Uttarakhand have shown non-platykurtic curves. Most of the water parameter combinations analyzed have shown Brownian time series behavior with each other. The irregular pattern in the WQI can be used for prediction purposes by determining whether its dynamics follow a chaotic, random, or deterministic structural pattern. Almost all groundwater variables, such as turbidity, chloride, iron, nitrate, pH, calcium, magnesium, fluoride, TDS, alkalinity, hardness, and sulfate, are affected by each other. The pH of the sample datasets shows Brownian time series behavior with the fluoride (Fl) parameter. Turbidity, chloride, nitrate, fluoride, TDS, alkalinity, and sulfate have shown persistent behavior with each other. Fe has persistent behavior with pH, Ca, and Mg, and nitrate has persistent behavior with turbidity, chloride, fluoride, TDS, alkalinity, hardness, and sulfate. pH has persistent behavior with Ca, Mg, Fe, and nitrate. Ca shows persistent behavior with turbidity, chloride, TDS, alkalinity, hardness, Mg, Fl, and sulfate. Mg has persistent behavior with turbidity, chloride, nitrate, Ca, TDS, hardness, Fl, and sulfate. Fl has persistent behavior with turbidity, chloride, TDS, alkalinity, and sulfate. TDS has persistent behavior with alkalinity, nitrate, and chloride. Alkalinity has persistent behavior with nitrate only. Hardness has persistent behavior with turbidity, chloride, nitrate, pH, Ca, Fl, TDS, alkalinity, and sulfate. Sulfate has persistent behavior with turbidity, chloride, TDS, and alkalinity.
Turbidity and chloride have anti-persistent behavior with Fe, pH, Ca, Mg, and hardness parameters. Fe has anti-persistent behavior with chloride, TDS, alkalinity, and sulfate. Nitrate has anti-persistent behavior with Ca, Mg, Fe, and pH. pH has anti-persistent behavior with turbidity, chloride, TDS, alkalinity, hardness, and sulfate. Mg has anti-persistent behavior with Fe only, and Ca with Fe and pH. Fl has anti-persistent behavior with Fe, pH, Ca, and Mg. TDS has anti-persistent behavior with Fe, Ca, Mg, and hardness. Alkalinity has anti-persistent behavior with turbidity, Fe, pH, Ca, Mg, and hardness. Hardness has anti-persistent behavior with Fe only, and sulfate has anti-persistent behavior with hardness, Fe, pH, Ca, and Mg.
The persistent behavior observed among the various indices reveals that the variations of the water quality parameters remain within an acceptable range relative to each other. This study focused on the utility of the Hurst exponent, the fractal dimension, and the predictability index (PI) as analysis tools, along with regression and correlation coefficients among the water quality time series. It is concluded that fractal analysis is a better statistical and mathematical tool for calculating water quality indices. Fractal analysis among the various parameters suggested that the water samples are suitable for drinking and safe for health.