Water Quality Parameters and Monitoring Soft Surface Water Quality Using Statistical Approaches

Water is the matrix of life and is indispensable on Earth. Water has a multitude of applications and all known life forms depend on it. Therefore, water quality is important for all of us. Water quality can be represented by a set of physical, chemical, biological and bacteriological characteristics. These parameters allow water to be classified in multiple categories leading to its use for a specific purpose. This chapter establishes the connections between external causes and their effect on water quality parameters. In order to provide information on water quality, different Water Quality Index (WQI) models can be used. In order to study the association between water quality parameters, several correlation coefficients have been developed. For a coherent statistical approach, we have used Pearson and Spearman correlations. In order to exemplify the manner in which WQI can be calculated and interpreted, we used a series of data from our previous work, consisting of 13 parameters measured for water samples taken from the Danube River, from Galati City area, Romania.


Introduction
Water is a common good of society as a whole: it is essential for human, animal and plant life and has a multitude of uses.

Water classification
Water can be classified according to its source into [1,2]: 1. Surface water (water located at the surface of the soil). Surface water sources are represented by running waters, seas, oceans, rivers, lakes and icebergs; 2. Groundwater (water below the surface, in the saturated zone and in direct contact with soil or subsoil). Groundwater sources are represented by groundwater aquifers, deep aquifers and springs; and, 3. Atmospheric water.
According to the field of use, the water is classified as: 1. Drinking water: for domestic consumption and for agriculture; and, 2. Industrial water: auxiliary in manufacturing processes, raw material for various industries, power generator, coolant agent, heating agent, etc.
Wastewater is water that has changed its original properties through use; in other words, it has been contaminated by human activity [2].

Water quality
Water quality is represented by a set of physical, chemical, biological and bacteriological characteristics. These characteristics are also called parameters or indicators. They allow water to be classified into categories that determine the purposes for which it can be used.
Water quality requirements depend on the purposes for which the water will be used. Thus, drinking water must not contain chemicals or micro-organisms which can affect human health. Water used in agriculture must not contain large amounts of sodium ions, high concentrations of nitrates or high concentrations of other contaminants. Requirements for water used in industry are less rigorous than those for drinking water [3].
Water quality also depends on the type of water source and changes with geological, meteorological and land use conditions. The World Health Organization (WHO) has established regulations and standards for water safety in support of public health [3]. The European Union has also established a legal framework for water protection [4]. Water quality criteria in all countries have been established in accordance with the WHO guidelines [3]. In European countries, the framework directives of the European Union are closely followed [4].
The EU Framework Directive requires that operational monitoring should be specific and based on monitoring relevant biological, hydro-morphological and physico-chemical parameters. These world environmental monitoring systems provide for water quality measurements in three categories of parameters [4,5].
To understand the overall health of an ecosystem and the condition of water, a number of water quality parameters or indicators must be analyzed and monitored. In 1998, Sene and Farquharson [6] stated that monitoring of the surface water quality is necessary to assess spatial and temporal regional variations. The process of monitoring the quality of ambient water has led to the development of water standards and the periodic assessments of the environment.
The monitoring program and the parameters to be measured for the study of water quality should be chosen specifically for each locality and each type of water. Although many parameters of water are important for human health or the health of an ecosystem, the analysis of all parameters is not feasible. The standards recommend the analysis of specific parameters for both drinking water and nondrinking water [6,7].
Chemical and physical parameters are important in the rapid determination of water quality while biological parameters provide a detailed and complex analysis of the environment [8].

General physico-chemical parameters
Temperature is an important parameter that influences the chemical properties of water. Temperature affects the density and stratification of water, the density and viscosity of transported sediments, the solubility of dissolved gases and the vapor pressure [9].
pH is determined on site, directly at the water source where the sample is collected, and the air temperature is recorded at the same time. Due to the presence of carbon dioxide, bicarbonates, and carbonates, the pH of the water varies very little from the neutral pH. The pH of natural water is usually in the range 6.5-8. Wastewater can be alkaline, if its pH is higher than 7, or acidic, if its pH is lower than 7 [10].
The conductivity of water is given by the presence of ions in the solution, ions that have the property of transmitting electric current. The higher the ionic concentration of the solution is, the higher the conductivity gets. The conductivity value depends on the amount of substances dissolved in the water. As a rule, high turbidity value also implies a high conductivity [10].
Turbidity expresses the amount of light reflected or absorbed by particles suspended in a water sample and is a measure of its relative clarity. Turbidity is due to solid particles in the form of suspensions or in a colloidal state [10].

Oxygen regime indicators
Dissolved Oxygen is an indicator of water quality whose values are dependent on the type of the water. The amount of oxygen dissolved in water depends on water temperature, air pressure and the quantity of acidic substances and microorganisms.
Oxygen is necessary for aquatic life. A series of aerobic chemical processes take place through dissolved oxygen: the oxidation of organic matter, the oxidation of mineral substances, and the biochemical decomposition of dead organisms in water [9]. With the decrease of oxygen, the self-purification capacity of natural water is reduced, favoring the persistence of pollution with its undesirable consequences. Other indicators of the oxygen regime are Biochemical Oxygen Consumption (CBO5) and Chemical Oxygen Consumption (CCO). Biochemical Oxygen Consumption (CBO5) is the amount of oxygen consumed by microorganisms, during a 5-day period, for the biochemical decomposition of organic substances contained in water, at a temperature of 20°C. Chemical Oxygen Consumption with chromium (CCOCr) is an integral index of the presence of organic substances that are difficult to degrade. Chemical Oxygen Consumption with manganese (CCOMn) is a comprehensive index of the presence of easily degradable organic substances [11].

Biogenic indicators
Nitrites in water result from the incomplete oxidation of organic nitrogen. Their presence indicates an old pollution, because organic substances containing nitrogen are first converted into ammonia under the action of microorganisms, the ammonia is then converted into nitrites, and all these transformations take time. Under normal oxygenation of natural water, nitrogen appears in the form of nitrates. Nitrite and ammonium are present when water pollution occurs and are toxic to living organisms [10].
Nitrates represent the final stage of oxidation of organic nitrogen. If ammonia, nitrites and nitrates are present simultaneously in the water, this indicates a continuous pollution. The simultaneous presence of ammonia and nitrates in the water indicates an intermittent pollution [11].
The phosphate content in natural water is relatively low. High amounts of phosphorus in water can come from excessive use of nitrogen and phosphorus fertilizers. Higher concentrations of phosphorus in surface water can result in eutrophication.

Salinity indicators
Salinity is the content of mineral salts in water, mainly salts of metals such as sodium, magnesium and calcium. The salts present in natural water are formed by the cations Ca²⁺, Mg²⁺, Na⁺, K⁺ and the anions HCO₃⁻, SO₄²⁻, Cl⁻. The chlorides in the water come from natural soil layers, from pollution or from sources of animal origin. The amount of chlorides released from the soil is relatively constant and varies only slightly over time. A significant increase in chloride content is usually an index of organic pollution [11].
Hardness is an indirect indicator of the degree of mineralization of water.

Heavy metals
Heavy metals are those metals that have a high density (i.e., above 5 g/cm³) [10]. In low concentrations, heavy metal ions are essential for the development of metabolic processes in plants and animals. These metals (e.g., cadmium, chromium, cobalt, lead, nickel, mercury, selenium) can come from natural or anthropogenic processes. If certain concentrations are exceeded, they become toxic to living organisms.

Biological and bacteriological indicators
Water quality and its changes due to various forms of pollution may influence the composition of aquatic biocenoses. Biological analysis consists of an inventory of phytoplankton, zooplankton, benthic organisms or periphyton from water samples.
The microbial flora found in the water can be classified into two categories: water-specific microbial flora and microbial impurity flora. Water-specific microbial flora consists of microorganisms that commonly inhabit water and soil: cocci, bacilli, different fungi and bacterial species which play a role in the natural degradation processes of organic substances. Microbial impurity flora consists of species of microorganisms of human or animal origin. This category can include pathogenic saprophytes. These microbes are generally accompanied by high concentrations of organic matter which provide their nutritional support [11].
In bacteriological analysis of water, the total number of germs and the determination of the bacillus coli have been adopted as bacteriological indicators.

Statistical analyses for assessing the surface water quality parameters
Water quality is determined by the biological, chemical and physical parameters of the water. Most often, it is not enough to measure these water quality indicators. In order to draw solid conclusions, it is necessary to apply adequate statistical methods to the measurements. These statistical methods can provide useful information that can lead to actionable advice regarding water management. There are a large number of statistical methods for examining water quality.
The main differences between these methods are the statistical techniques used and the significance of the values determined for each parameter. Statistical indices developed using water quality parameters can be linear, non-linear, segmented linear or segmented non-linear [12]. In order to have a global vision of the changes of the water quality in space and in time, various indices have been developed [13].
The water quality index (WQI) is a number that expresses the general water quality in a particular location, over time, based on several water quality parameters. The aim of this index is to transform a large number of complex water quality measurements into information that is easy for water managers and the public to understand and to use. There are a multitude of methods for calculating water quality indices (WQI). In the following, we present the weighted average method. This method was proposed by Horton in 1965 and developed by Brown et al. in 1970 [14].
For the calculation of the WQI, the following expression was used [15][16][17][18][19]:
$$\mathrm{WQI} = \frac{\sum_{n} W_n q_n}{\sum_{n} W_n} \qquad (1)$$
where the sums run over all water quality parameters and the subscript n denotes the n-th parameter. $W_n$ is the Unit Weight:
$$W_n = \frac{K}{S_n} \qquad (2)$$
$q_n$ is the Quality rating:
$$q_n = 100 \cdot \frac{V_n - V_{id}}{S_n - V_{id}} \qquad (3)$$
where: $V_n$ represents the measured value for the n-th parameter of the water corresponding to a given sample; $V_{id}$ is the ideal value for the n-th parameter corresponding to pure water. $V_{id}$ values are zero for most parameters except for pH and dissolved oxygen: $V_{id}$ for the pH is 7 and for dissolved oxygen (DO) is 14.6 mg/l [20]. $S_n$ represents the standard value allowed for the n-th parameter; K is a constant of proportionality calculated with the formula:
$$K = \frac{1}{\sum_{n} \frac{1}{S_n}} \qquad (4)$$
Water quality is Excellent if the WQI index score is between 0 and 25; Good for values of 26-50; Poor for WQI = 51-75; and, Very Poor for values between 76 and 100. If the value of the WQI index exceeds the value of 100, then the water is unsuitable for drinking and cannot be transformed into drinking water by any process [19,21].
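The calculation can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual weighted arithmetic definitions (unit weight W_n = K/S_n, quality rating q_n = 100·(V_n − V_id)/(S_n − V_id), and K = 1/Σ(1/S_n)). The names `Parameter`, `water_quality_index` and `classify`, as well as the sample values and standards, are hypothetical and are not taken from the Danube data set.

```python
from dataclasses import dataclass


@dataclass
class Parameter:
    name: str
    measured: float      # V_n, measured value
    standard: float      # S_n, standard (permitted) value
    ideal: float = 0.0   # V_id, ideal value in pure water


def water_quality_index(params):
    """Weighted arithmetic WQI: sum(W_n * q_n) / sum(W_n)."""
    # Constant of proportionality K = 1 / sum(1 / S_n)
    k = 1.0 / sum(1.0 / p.standard for p in params)
    weighted_sum = 0.0
    weight_total = 0.0
    for p in params:
        w = k / p.standard                                           # unit weight W_n
        q = 100.0 * (p.measured - p.ideal) / (p.standard - p.ideal)  # quality rating q_n
        weighted_sum += w * q
        weight_total += w
    return weighted_sum / weight_total


def classify(wqi):
    """Map a WQI score to the classes listed in the text.

    Assumption: fractional scores between two bands (e.g. 25.5)
    are assigned to the higher band.
    """
    if wqi <= 25:
        return "Excellent"
    if wqi <= 50:
        return "Good"
    if wqi <= 75:
        return "Poor"
    if wqi <= 100:
        return "Very Poor"
    return "Unsuitable for drinking"


# Hypothetical sample: values and standards are illustrative only.
sample = [
    Parameter("pH", measured=7.5, standard=8.5, ideal=7.0),
    Parameter("DO (mg/l)", measured=8.0, standard=5.0, ideal=14.6),
    Parameter("chlorides (mg/l)", measured=100.0, standard=250.0),
]
wqi = water_quality_index(sample)
print(f"WQI = {wqi:.1f} ({classify(wqi)})")
```

Because each unit weight is inversely proportional to its standard value, parameters with strict limits dominate the index, which is why a small exceedance of a heavy-metal limit can raise the WQI more than a large change in a loosely regulated parameter.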
To study the relationship between two parameters of water samples, several correlation coefficients can be used. The statistics used most often are the Pearson and Spearman coefficients. Linear correlation can be determined using the Pearson correlation coefficient, while monotonic but non-linear correlation can be determined using the Spearman coefficient. The Pearson correlation coefficient measures and describes the degree of linear association between two normally distributed continuous quantitative variables [21]. Let x and y be two variables, in our case two indicators of water quality. The Pearson coefficient, r, is calculated using the expression:
$$r = \frac{S_{xy}}{S_x S_y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \qquad (5)$$
where $S_{xy}$ represents the covariance, $S_x$ and $S_y$ are the standard deviations of the two variables x and y, and $\bar{x}$ and $\bar{y}$ are the mean values of the two variables x and y [21]. The Pearson coefficient takes values between −1 and +1. The absolute value of the coefficient indicates the strength of the relationship between parameters, while the sign of the coefficient indicates the direction of the linear association. If the sign is positive, the two variables are directly correlated and, if the sign is negative, the two variables are inversely correlated. The closer the absolute value of the Pearson correlation coefficient is to 1, the stronger the "intensity" of the linear relationship between the two variables [21]. The variables x and y are linearly uncorrelated if r has the value 0 (Figure 1).
A Pearson coefficient of zero (r = 0), the minimum of its absolute value, is not an indicator of independence of the two characteristics (variables), but only of the absence of linear correlation. The coefficient of determination (r²) is the square of the Pearson coefficient. It indicates the percentage of the total variation of the dependent variable (y) which is explained by the independent variable (x).
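The sketch below illustrates the Pearson coefficient and the coefficient of determination, and shows with a small example why r = 0 does not imply independence. It assumes the standard definition of r as covariance divided by the product of standard deviations. The function name `pearson_r` and all data values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson coefficient r = S_xy / (S_x * S_y) for paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least two values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # The 1/(n-1) factors of the covariance and of the standard deviations
    # cancel, so sums of deviation products are sufficient.
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical paired readings (not the Danube data).
conductivity = [410.0, 455.0, 520.0, 610.0, 700.0]
tds = [262.0, 290.0, 335.0, 390.0, 450.0]
r = pearson_r(conductivity, tds)
print(f"r = {r:.4f}, r^2 = {r * r:.4f}")

# r = 0 does not imply independence: y = x^2 on a symmetric range is
# fully determined by x, yet the linear correlation vanishes.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [v * v for v in xs]
r_quadratic = pearson_r(xs, ys)
print(f"r for y = x^2: {r_quadratic:.4f}")
```

The function does not guard against a constant series (zero standard deviation), for which the coefficient is undefined.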
The Spearman method is a non-parametric method used when the relationship between two variables is monotonic but not necessarily linear [23][24][25]. The Spearman coefficient addresses some limitations of the Pearson coefficient. It is denoted either by ρ or by $r_S$ and represents an alternative to the Pearson coefficient. To calculate the coefficient, the data must have an order or rank. The coefficient can be calculated using the formula:
$$\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n (n^2 - 1)} \qquad (6)$$
where: n is the number of pairs of values and $d_i$ is the difference between the ranks of the two values in pair i:
$$d_i = r_{x_i} - r_{y_i} \qquad (7)$$
where: $r_{x_i}$ is the rank of the value $x_i$ in the ascending ordered system and $r_{y_i}$ is the rank of the value $y_i$ in the ascending ordered system [21][22][23][24][25].
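Eq. (6) is usually used when all n ranks are distinct integers, that is, when there are no tied ranks. When there are tied ranks, Eq. (6) is replaced by the following form:
$$\rho = \frac{\sum_{i=1}^{n} (r_{x_i} - \bar{r}_x)(r_{y_i} - \bar{r}_y)}{\sqrt{\sum_{i=1}^{n} (r_{x_i} - \bar{r}_x)^2}\,\sqrt{\sum_{i=1}^{n} (r_{y_i} - \bar{r}_y)^2}} \qquad (8)$$
where $\bar{r}_x$ and $\bar{r}_y$ are the mean ranks of the values of x and y [23].
Spearman coefficient values are in the range [−1, 1]. The interpretation of these values is similar to that of the Pearson coefficient [21].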
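The two forms of the Spearman coefficient can be sketched as follows. The code assumes the common convention that tied values share the mean of the ranks they occupy, and it uses the rank-difference shortcut only when no ties are present. The helper names `average_ranks` and `spearman_rho` and the data are hypothetical.

```python
import math


def average_ranks(values):
    """Ascending ranks starting at 1; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman coefficient; the d_i shortcut is used only without ties."""
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    if len(set(x)) == n and len(set(y)) == n:
        d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
        return 1 - 6 * d_squared / (n * (n * n - 1))
    # Tied ranks: Pearson coefficient computed on the ranks.
    mx, my = sum(rx) / n, sum(ry) / n
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = math.sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))
    return num / den


# Hypothetical values: y grows monotonically but not linearly with x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 8.0, 27.0, 64.0, 125.0]
rho_monotonic = spearman_rho(x, y)
print(rho_monotonic)

# Hypothetical series with a tie in x.
x_tied = [1.0, 2.0, 2.0, 4.0]
y_tied = [10.0, 20.0, 30.0, 40.0]
rho_tied = spearman_rho(x_tied, y_tied)
print(round(rho_tied, 4))
```

In the first example the ranks of x and y coincide, so every rank difference is zero and the coefficient reaches its maximum even though the relationship is cubic rather than linear.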
For a correct interpretation, the correlation coefficient must be accompanied by a significance test. The correlation coefficient is considered statistically significant if the p-value of the test is below 0.05. The p-value is the probability of obtaining a correlation at least as strong as the one observed if the null hypothesis H0 (no correlation) were true. If p < 0.05, we can reject the null hypothesis H0, and the computed result has statistical significance [24]. In other words, if the p-value of the test is less than the significance threshold α (α = 0.05), hypothesis H1 is accepted: there is a monotonic correlation. If p is greater than 0.05, the H0 hypothesis, which states that there is no monotonic correlation, cannot be rejected [25].
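One way to obtain a p-value without statistical tables is a permutation test, sketched below. This is a substitute technique chosen for illustration, not necessarily the test used in the cited works: the pairing between the two series is shuffled many times, and the p-value is estimated as the share of shuffles that give a correlation at least as strong as the observed one. The function names, the number of permutations, the seed and the data are all hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson coefficient from sums of deviation products."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    s_xy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    s_xx = sum((a - mx) ** 2 for a in x)
    s_yy = sum((b - my) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def permutation_p_value(x, y, n_permutations=5000, seed=0):
    """Two-sided p-value: share of shuffles with |r| >= the observed |r|."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break the pairing, as H0 (no association) implies
        if abs(pearson_r(x, shuffled)) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical paired series (illustrative values only).
wqi_values = [140.0, 132.0, 125.0, 110.0, 98.0, 90.0, 71.0, 60.0, 45.0, 30.0]
chlorides = [310.0, 295.0, 300.0, 260.0, 240.0, 215.0, 180.0, 170.0, 120.0, 95.0]

alpha = 0.05
p_value = permutation_p_value(chlorides, wqi_values)
decision = "reject H0" if p_value < alpha else "cannot reject H0"
print(f"p = {p_value:.4f}: {decision}")
```

Passing ranks instead of raw values to `permutation_p_value` would give the analogous test for the Spearman coefficient, since that coefficient is the Pearson coefficient of the ranks.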

Case study: monitoring water quality of the Danube River using the statistical approach
In this section, we provide an example of how to apply these methods in order to achieve a rapid assessment of water quality. The data set chosen for statistical analysis comes from our previous work [15,16] and consists of 13 water quality parameters that were determined from samples taken from the Danube River. Sampling points were located along the river in the vicinity of Galati, a Danube port city in the south-eastern part of Romania. Water samples were collected from November 2016 to December 2017. We will use data from 3 locations coded D1, D4 and D7. All locations are along the Danube's left bank, D1 being located upstream and D7 downstream (Figure 2). The measured parameters were: potassium and calcium ions, nitrites, nitrates, total nitrogen, ammonium, chlorides, total phosphorus, sulphates, cadmium, chromium, copper, lead, iron, zinc, density, dissolved oxygen, chemical oxygen demand (CCO-Cr), biochemical oxygen demand (CBO5), electrical conductivity, the density of the conductivity, resistivity, pH, salinity, total dissolved solids [15].
From our previous work [15,16], the scatter plot diagrams and the box plot diagrams of the parameters indicated that quality class thresholds were exceeded during certain time periods. Correlations between the measured parameters could not provide a clear conclusion on the water quality condition.
For these reasons, and in order to provide clear information on the water quality condition, we calculated the Water Quality Index (WQI).
The Water Quality Index evaluation consisted of several stages. It is important to scale and weight the values of the monitored parameters according to the allowed limit values.
The water quality standards, $S_n$, were determined from Romanian legislation [26]. In accordance with the requirements of the "Normative on the classification of surface water quality in order to establish the ecological status of water bodies" [26], the limit values of the parameters are given for five water quality classes. In accordance with this legislation, we have used the water standards ($S_n$) for the third quality class. The surface water belonging to this class is considered moderately polluted. Table 1 presents the intermediate results obtained from the application of the Water Quality Index method. The Unit Weights ($W_n$), the constant of proportionality (K) and the ideal values ($V_{id}$) have the same values for all three locations (D1, D4 and D7). The Quality rating ($q_n$) was calculated with Eq. (3) for each parameter. The last stage of the method consists in calculating WQI using Eq. (1).
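A short check makes the role of the unit weights explicit. It assumes the usual weighted arithmetic definitions, $W_n = K / S_n$ and $K = 1 / \sum_n (1/S_n)$. Since both depend only on the standards $S_n$ and not on the measured values, they are identical for D1, D4 and D7, and the weights sum to one:

```latex
\sum_{n} W_n
  = \sum_{n} \frac{K}{S_n}
  = K \sum_{n} \frac{1}{S_n}
  = 1
\quad\Longrightarrow\quad
\mathrm{WQI}
  = \frac{\sum_{n} W_n q_n}{\sum_{n} W_n}
  = \sum_{n} W_n q_n
```

Under these assumptions the WQI is simply the weighted mean of the quality ratings, and only $q_n$ changes from one sample or location to another.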
The obtained values for the water quality index corresponding to the three locations are presented in Figure 3.
According to the diagram from Figure 3, during the time interval November 2016-June 2017, the WQI values for the Danube River water were found in a range for which the water quality index shows that the water was not suitable for consumption and could not be transformed into drinking water by any process. However, by the end of the monitoring time interval (December 2017) the water quality was good or excellent.
According to our previous work [15,16], the water quality indicators have improved. This improvement is reflected in the low values of WQI. The substantial improvement in water quality that occurred is due to the actions taken by the organizations responsible for environmental protection. Figure 4 shows the boxplot diagram representing the main values of WQI in the 3 chosen locations. The average values of WQI are influenced by the extreme values. The 3rd quartile (Q3), which is the median of the upper half of the data set, is the value below which 75% of the values in the data set lie. The high mean, median and 3rd quartile values depict the water as severely polluted.
The value of the third quartile indicates that 75% of the determined values of WQI fall into the category of highly polluted waters. Based on Figure 3, only 25% of the WQI values (those below the 1st quartile, Q1) are low enough to classify the studied water as unpolluted. The information obtained from the WQI calculation was particularly useful in order to analyze how the overall water quality has evolved over time.
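The boxplot summary can be reproduced numerically, as in the sketch below. The WQI series is hypothetical and does not reproduce the values in Figure 4. The "inclusive" quartile method is only one of several conventions, so plotting software may report slightly different quartiles.

```python
import statistics

# Hypothetical monthly WQI values for one location (illustrative only).
wqi_series = [152.0, 148.0, 139.0, 131.0, 127.0, 118.0,
              104.0, 96.0, 62.0, 41.0, 30.0, 22.0]

# Quartiles: 25%, 50% and 75% of the values lie below Q1, the median and Q3.
q1, median, q3 = statistics.quantiles(wqi_series, n=4, method="inclusive")
mean = statistics.fmean(wqi_series)

# Share of samples above 100, the limit beyond which the text
# classifies water as unsuitable for drinking.
share_above_100 = sum(v > 100 for v in wqi_series) / len(wqi_series)

print(f"Q1 = {q1:.2f}, median = {median:.2f}, Q3 = {q3:.2f}, mean = {mean:.2f}")
print(f"share of samples with WQI > 100: {share_above_100:.0%}")
```

Comparing the mean with the median shows how far extreme values pull the average, which is why the quartiles are the more robust basis for judging the typical state of the water.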
An easy method to identify possible sources of pollution is to calculate the correlations between the measured parameters. The Pearson correlation matrix [15] showed a strong positive linear correlation between TDS and Salinity (r = 0.9394) and between TDS and Electrical Conductivity, EC (r = 0.9174). There were also moderate negative correlations between the nitrite concentration and pH (r = −0.65) and between the nitrate concentration and pH (r = −0.68).
To identify possible sources of pollution, the Pearson correlation matrix was computed between WQI and a series of measured parameters (Table 2).
In the absence of dedicated statistical software, the correlation coefficients can also be determined using a free spreadsheet tool. This technique can be exemplified quite easily for the Pearson coefficient between WQI and CCOCr, for the first location D1 (Table 3). Table 3 shows the values of the Pearson correlation coefficient (r) and the coefficient of determination (r²) for the water quality data set. The major influence of several parameters on the high values of WQI is indicated by their strong positive correlation values. Therefore, excessive pollution was likely due to the presence of high concentrations of chlorides, nitrates, nitrites, ammonium, sulphates, lead, cadmium, iron and zinc.
The values of the coefficient of determination (r²) indicate that 89% of the variance of WQI is explained by the chlorides and cadmium concentrations, while 87% is explained by iron and zinc. The nitrates concentration in the Danube River water explains 86% of the variation of the WQI. The high levels of covariance explained by the three groupings suggest significant co-linearity among the nutrient groups.
The strength and direction of monotonic association between water quality variables can be highlighted by the Spearman correlation. Table 4 shows the Spearman coefficients between the WQI and the water quality indicators that were measured. Table 4 shows that the high values obtained for WQI are associated with the high values obtained for chlorides, nitrates, nitrites, ammonium, total nitrogen, sulphates, lead, cadmium, iron and zinc. The association between these variables would be considered statistically significant. Table 5 exemplifies such a calculation for the correlation coefficient between WQI and CCO-Cr.
For chlorides, nitrates, nitrites, ammonium, sulphates, lead, cadmium, iron, zinc and total nitrogen, the p-values are less than 0.001 (i.e., highly significant, with confidence greater than 99.9%). For pH and DO, p < 0.01 means that the statistical links are significant with a confidence of 99%.
Once the correlations between pollutants and WQI are identified, the sources of pollution, or the related processes, can be established.

Conclusions
This chapter highlighted the importance of using statistical methods to display the water quality condition, using WQI evaluation and Pearson and Spearman correlations.
In order to exemplify the statistical methods, we have used a series of data from our previous work, consisting of 13 parameters measured for water samples taken from the Danube River, from the Galati City area, Romania. Statistical correlations were made between quality parameters and the Water Quality Index; thus, it was possible to identify the pollutants that determined an advanced degree of water pollution. The excessive pollution which occurred during the time interval November 2016-June 2017 was due to the presence of high concentrations of chlorides, nitrates, nitrites, ammonium, sulphates, lead, cadmium, iron and zinc. Many statistical software packages are now available for water quality analysis. If, for various reasons, such programs are not available, the statistical approach can be carried out classically, for example with a spreadsheet. The Water Quality Index (WQI) provides information on the overall quality of the water, while the correlation coefficients may indicate the parameters that influenced the changes in water quality.