Evaluating the quality of water bodies is fundamental to the study and use of water. To improve the understanding of the phenomena that occur in these environments, several indices have been proposed over the years using statistical, mathematical and computational techniques. Building such indices requires knowing which variables influence each water body. However, not every region can carry out the full range of analyses, because of financial and sanitary constraints; these constraints can raise treatment costs and lead to more permissive tolerance limits for water quality. There is also a current need for indices whose variables account for climate change, bringing them closer to reality. In this context, multivariate statistical techniques and artificial intelligence are increasingly used to reduce the number of variables, the collection costs and the laboratory analyses while making the indices more representative, and they have produced significant results. These advances help to improve water quality indices, with the aim of obtaining one that portrays the phenomena occurring in water bodies quickly and in a way consistent with the reality and social context of water resources.
- water quality index
- statistical techniques
- machine learning
- artificial intelligence
- environmental monitoring
Several quality indices have been proposed for evaluating water and defining its different uses. These proposals help make the parameter selection criteria more effective, so that the index portrays the true environmental state of the water body. Other important factors in this selection are whether laboratories can analyze the variables (whether they have the equipment, reagents and staff), the financial situation of the region (poorer regions tend to adopt more flexible tolerance limits, since their water treatment is less efficient), collection logistics, and representativeness for an audience that often cannot interpret the individual analytical variables and therefore needs quality indices.
These indices should be quick, simple and objective to compute. A water quality index (WQI) is therefore fundamental for representing a large number of parameters in a single number. However, for this number to portray the reality of a water body, the correct selection of environmental parameters is essential.
This selection of environmental parameters is fundamental not only for building indices but also for monitoring water resources. Monitoring is understood here as the definition of strategies to mitigate environmental problems and so help guarantee sustainable development. A water quality index used within a monitoring program also makes the information easier to communicate, both to those involved in management and to those who use the water directly.
In this context, modeling and constructing a single index that classifies all water bodies, whether lentic or lotic, involves several factors that together make such a unique index difficult to achieve. The morphoclimatic diversity of each region of the world, climate change and innumerable anthropogenic activities, as well as local social and economic development, are the main reasons for different classifications of water quality and, consequently, for the existence of several indices. To face these difficulties, multivariate statistical techniques and artificial intelligence play a fundamental role, facilitating classification and understanding in the search for a globally accepted index.
Accordingly, multivariate statistical techniques and artificial intelligence have been used to reduce the number of environmental parameters to be measured and the costs of sample collection and laboratory analysis, while making the indices more representative, and the results have been significant and promising. Such techniques help identify the variables most suitable for formulating an index and should be part of any environmental monitoring program.
2.1. Method of selection of variables
The water quality index (WQI) was initially proposed by Horton as a linear summation function. This index consisted of a weighted sum of subindices, divided by the sum of the weights and multiplied by two coefficients related to the temperature and the pollution of a watercourse. As selection criteria, Horton used the variables most commonly measured in the analysis of a water body, 10 in total, which made the index more practical to apply. These variables should represent all the water bodies in the country and reflect the availability of data, in order to obtain the smallest deviation among them [1, 2, 3, 4, 5].
An index is constructed in four steps: parameter selection, derivation of subindices, establishment of weights and use of aggregation functions. A set of criteria was developed based on the existing indices, as presented in Table 1. Although indices are not expected to meet all these criteria, different weights can be attributed to each one, depending on the use, region, climate and water body. Thus, once the weights are defined, the criteria can be used to make the index as comprehensive as possible [4, 6, 7].
|1||Relative ease of application|
|2||Balance between necessary technical complexity and simplicity|
|3||Present an understanding of the significance of the presented data|
|4||Include widely used and routinely measured variables|
|5||Include variables that have clear effects on aquatic life, recreational use, or both|
|6||Inclusion of toxic variables|
|7||Easy introduction of new variables|
|8||Based on recommended limits and water quality standards|
|9||Developed with a logical scientific reasoning or procedures|
|10||Tested in a number of geographic areas|
|11||Present agreement with expert opinion|
|12||Demonstrate compliance with biological water quality measures|
|15||Possessing statistical property which allows probabilistic interpretations|
|16||Avoid the eclipsing effect|
|17||Present sensitivity to small changes in water quality|
|18||Present tendencies over time aiming for applicability for comparisons of different locations and for communication with the public|
|19||To present ways of dealing with the absence of data|
|20||Clearly explain the limitations of the index|
2.2. Main indices
Based on the index proposed by Horton, Brown et al. proposed the best-known and most widely used index in the world, the National Sanitation Foundation's Water Quality Index (WQI-NSF). This index can be used to define water quality for different uses such as irrigation, water supply and navigation, as well as for various water bodies (lakes, reservoirs and rivers). Nine parameters were selected according to the criteria presented in Table 1: temperature, pH, turbidity, phosphate, nitrate, total solids, dissolved oxygen (DO), biochemical oxygen demand (BOD) and fecal coliforms [8, 9, 10, 11]. The WQI-NSF is calculated from weights assigned to each parameter on the basis of a survey of 142 experts conducted with the Delphi technique. The weights of each parameter are shown in Table 2.
In 1973, the Engineering Division of the Scottish Research Development Department began research into the development of the Scottish Water Quality Index (WQI-SCO). Using the Delphi technique, and based on the WQI-NSF, 10 parameters were selected for the WQI-SCO with their respective weights: DO (0.18); BOD (0.15); free ammonia (0.12); pH (0.09); total oxidized nitrogen (0.08); phosphate (0.08); suspended solids (0.07); temperature (0.05); conductivity (0.06) and Escherichia coli (0.12). The weights in this index must sum to 1. Two forms of calculation were then tested: the arithmetic form (Eq. 1) and the geometric form (Eq. 2). The geometric form proved more efficient and less biased toward high index values, and is therefore more widely used [9, 10, 11, 12].
where Si is the subindex of the i-th parameter and wi is its weight.
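The difference between the two forms can be illustrated with a short sketch. Eqs. 1 and 2 are not reproduced in this text, so the code assumes the usual weighted arithmetic form (the sum of wi·Si) and weighted geometric form (the product of Si raised to wi). The weights are those listed above for the WQI-SCO; the subindex scores and the function names are hypothetical.

```python
import math

# Weights of the WQI-SCO parameters as listed in the text (they sum to 1).
WEIGHTS = {
    "DO": 0.18, "BOD": 0.15, "free_ammonia": 0.12, "pH": 0.09,
    "total_oxidized_N": 0.08, "phosphate": 0.08, "suspended_solids": 0.07,
    "temperature": 0.05, "conductivity": 0.06, "E_coli": 0.12,
}

def arithmetic_wqi(subindices, weights):
    """Assumed weighted arithmetic form: sum of w_i * S_i."""
    return sum(weights[k] * subindices[k] for k in weights)

def geometric_wqi(subindices, weights):
    """Assumed weighted geometric form: product of S_i ** w_i."""
    return math.prod(subindices[k] ** weights[k] for k in weights)

# Hypothetical subindex scores (0-100): every parameter scores 90 except DO.
scores = {name: 90.0 for name in WEIGHTS}
scores["DO"] = 10.0

wqi_a = arithmetic_wqi(scores, WEIGHTS)  # 90 - 0.18 * 80 = 75.6
wqi_g = geometric_wqi(scores, WEIGHTS)   # lower than the arithmetic value
```

With one severely degraded parameter, the geometric form yields a lower value than the arithmetic form, which is consistent with the observation that it is less biased toward high index values.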
where n is the number of parameters, Ci is the value of the subindex after normalization and Pi is the weight assigned to each parameter.
The index proposed by Bascaron normalizes the parameters in order to balance the influence of each one on the final value of the index. It is also very flexible with regard to the addition of other parameters [9, 13, 14, 15].
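The equation for this index is not reproduced in this text. Assuming the objective form commonly attributed to Bascaron, and using the symbols n, Ci and Pi defined above, it can be written as a weighted mean of the normalized values:

```latex
\mathrm{WQI} = \frac{\sum_{i=1}^{n} C_i \, P_i}{\sum_{i=1}^{n} P_i}
```

Under this assumption, adding a parameter only extends both sums, which is one way to read the flexibility noted above.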
To represent the trophic status of a water body through a WQI, Steinhart et al. developed the Environmental Quality Index (EQI) for the Great Lakes of North America. This index was developed to support the multimillion-dollar cleanup projects carried out on these lakes in the 1970s. The authors evaluated nine physicochemical, biological and toxicity variables, based on specific electrical conductivity, chlorine concentration, pollutants with characteristic odor and color, and organic and inorganic toxic contaminants. The data were converted into subindices through mathematical functions that incorporated national and international tolerance limits. These subindices were then multiplied by weights assigned to each type of variable (0.1 for chemical (C), physical (P) and biological (B) parameters and 0.15 for toxic (T) substances). The quality score ranges from bad (0) to optimal (100). Each score carries a symbol indicating which type of parameter was problematic; for example, a 70C1P1 assessment means that one chemical and one physical parameter are outside the limits stipulated by legislation, even though the overall quality is rated good [9, 6].
In 1970, in Oregon, USA, the O-WQI index was developed. This index was modified by Dunnette for evaluating water in cold-water fishing regions, which are very sensitive to high temperatures. It became more widely used after Cude re-evaluated it and added total phosphorus as an indicator of the eutrophication risk of water bodies in Oregon [17, 18].
Cude, in proposing these modifications, argued that weighting the parameters of a WQI is justifiable only when the water body is evaluated for a single use. He therefore eliminated the weights and used an unweighted harmonic square mean equation. The parameters selected were DO, fecal coliforms, pH, nitrate + ammonia, total solids, BOD, total phosphorus and temperature. The index classifies water as excellent (90–100), good (85–89), acceptable (80–84), poor (60–79) and very poor (10–59) [4, 14, 17, 18, 19, 20, 21].
The O-WQI has been used in several parts of the world since the modifications proposed by Cude. Although the harmonic square mean, Eq. (4), is more sensitive to variations in the parameters, it can sometimes be ambiguous: for example, the individual variables may all have good quality scores while the aggregated value does not reflect that quality [4, 18, 19, 20, 21].
where Si is the subindex value of the variable under analysis and n is the number of parameters.
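The sensitivity of this aggregation can be sketched as follows. Eq. (4) is not reproduced in this text, so the code assumes the unweighted harmonic square mean usually associated with the O-WQI, that is, the square root of n divided by the sum of 1/Si². The scores are hypothetical.

```python
import math

def harmonic_square_mean(subindices):
    """Assumed O-WQI aggregation: sqrt(n / sum(1 / S_i**2))."""
    n = len(subindices)
    return math.sqrt(n / sum(1.0 / s ** 2 for s in subindices))

def arithmetic_mean(subindices):
    """Plain unweighted mean, shown only for comparison."""
    return sum(subindices) / len(subindices)

# Hypothetical subindex scores for eight parameters; one is clearly impaired.
scores = [95, 92, 90, 94, 91, 93, 96, 30]

owqi = harmonic_square_mean(scores)
mean = arithmetic_mean(scores)
```

For these scores the harmonic square mean is roughly 64.5, whereas the arithmetic mean is about 85.1: a single impaired variable pulls the index down much more strongly than it would under simple averaging.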
In Canada, another well-accepted and widely used index (WQI-CCME) was proposed by the Canadian Council of Ministers of the Environment. Its objective was to support water quality management by the country's water treatment and distribution agencies. The index allows flexibility in the choice of variables, which makes it suitable for the most diverse uses and morphoclimatic characteristics of the hydrographic basin under analysis. In addition, it requires neither subindices nor weights for each variable. The aggregation model consists of a few steps. First, the variables are standardized and three factors are determined (F1, F2, F3). F1 is the percentage of parameters that do not meet the quality standards (Eq. (5)). F2 is the percentage of individual tests that do not meet the standards (Eq. (6)). F3 is calculated in three steps: first, the excursion is calculated for each test whose concentration is outside the limit allowed by law (Eqs. (7) and (8)); next, the sum of the excursions is divided by the total number of tests (Eq. (9)); finally, F3 is obtained through Eq. (10). Equations (7) and (8) apply to concentrations above and below the limits, respectively. The WQI-CCME is then calculated with Eq. (11).
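The steps above can be sketched in code. Eqs. (5) to (11) are not reproduced in this text, so the sketch assumes the standard published CCME formulation: excursion = value/limit − 1 (or limit/value − 1 for lower limits), F3 = nse/(0.01·nse + 0.01) and WQI = 100 − √(F1² + F2² + F3²)/1.732. The function name, the data layout and the sample values are hypothetical.

```python
import math

def ccme_wqi(tests):
    """tests: list of (parameter, value, limit, kind) tuples, where kind is
    "max" (value must not exceed limit) or "min" (value must not fall below
    it). Values under a "min" limit are assumed to be non-zero."""
    def failed(value, limit, kind):
        return value > limit if kind == "max" else value < limit

    def excursion(value, limit, kind):
        return value / limit - 1 if kind == "max" else limit / value - 1

    parameters = {t[0] for t in tests}
    failed_tests = [t for t in tests if failed(*t[1:])]
    failed_parameters = {t[0] for t in failed_tests}

    f1 = 100 * len(failed_parameters) / len(parameters)  # scope
    f2 = 100 * len(failed_tests) / len(tests)            # frequency
    nse = sum(excursion(*t[1:]) for t in failed_tests) / len(tests)
    f3 = nse / (0.01 * nse + 0.01)                       # amplitude
    return 100 - math.sqrt(f1 ** 2 + f2 ** 2 + f3 ** 2) / 1.732

# Hypothetical monitoring data: two tests for each of three parameters.
sample = [
    ("DO", 6.5, 5.0, "min"), ("DO", 4.0, 5.0, "min"),
    ("nitrate", 8.0, 10.0, "max"), ("nitrate", 12.0, 10.0, "max"),
    ("turbidity", 3.0, 5.0, "max"), ("turbidity", 4.0, 5.0, "max"),
]
wqi = ccme_wqi(sample)
```

In this example two of the three parameters and two of the six tests fail, so F1 and F2 dominate the result even though the excursions themselves are small.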
The WQI can be calculated with several types of mathematical and statistical techniques. When indices are created, they tend to reflect the needs of the region, the use of the water and the parameters that can be measured. These criteria may introduce deviations in the calculation and, consequently, in the interpretation of the results. Smith states, for example, that a multiplicative WQI calculation can drive the index to a low score if a single variable has a low individual score. Smith therefore used an index based on a minimum operator for aggregating the subindices, according to Eq. (12).
where Ii is the subindex of the i-th parameter.
However, this type of calculation reflects the average quality poorly, since the WQI takes the value of a single subindex and the other variables are eclipsed. This index was adopted in New Zealand as a basis for legislation and for disseminating information about water quality [9, 14, 19, 21, 24].
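A minimal sketch makes this limitation concrete. It assumes that Eq. (12) is simply the minimum of the subindices; the two sites and their scores are hypothetical.

```python
def minimum_operator(subindices):
    """Minimum-operator aggregation: the index equals the lowest subindex."""
    return min(subindices.values())

# Hypothetical sites: A has one impaired variable, B is impaired throughout.
site_a = {"DO": 40, "BOD": 95, "pH": 95, "turbidity": 95}
site_b = {"DO": 40, "BOD": 45, "pH": 42, "turbidity": 41}

index_a = minimum_operator(site_a)
index_b = minimum_operator(site_b)
```

Both sites receive the same index value, although their overall condition differs considerably, because every variable other than the worst one is ignored.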
The evaluation of a WQI and of its forms of aggregation (the most common being the harmonic mean, the geometric mean, the arithmetic mean and the minimum operator mentioned above) should also take the financial question into account. The financial reality of a country affects the choice of WQI, since it determines the resources that can be allocated to monitoring particular parameters. Methods such as the minimum operator and the arithmetic mean can introduce eclipsing and ambiguity into the final result, so that investments aimed at improving certain parameters may not bring the desired outcome. Experts therefore argue that the choice of aggregation method and the weighting criteria for the environmental variables need better evaluation.
Biological, limnological and ecological studies have increased over the last 40 years, and the proposed indices need to be revised so that new variables can be included or weights changed. In addition, there is a lack of indices that adequately portray the impacts of climate change on water bodies, as reported by Alves et al., who consider the improvement of water quality indices to be fundamental. Alves et al. also point to the need for indices that better portray tropical conditions such as those found in Brazil, since most indices were developed for temperate regions [4, 7, 18, 25, 26].
2.3. WQI for reservoirs
In Brazil, in 1975, the Environmental Company of the State of São Paulo (CETESB) started using the geometric quality index proposed by NSF (Eq. (2)), as shown in Figure 1. A number of studies developed in the country are based on this index [27, 28, 29, 30]. In addition, a specific index for reservoirs was developed in Brazil and is widely accepted by the scientific community and the National Water Agency (ANA), which is the Water Quality Index for Reservoirs (IQAR), developed by the Environmental Institute of Paraná (IAP) [31, 32, 33].
Seeking to evaluate water quality in reservoirs, the Environmental Institute of Paraná (IAP) created the IQAR. The selected variables were dissolved oxygen deficit, total inorganic nitrogen, total phosphorus, chemical oxygen demand, transparency, chlorophyll a, residence time, average depth and cyanobacteria. The phytoplankton community (diversity, algal blooms and amount of cyanobacteria) was included in the matrix through the concentrations of chlorophyll a and cyanobacteria, because of its ecological importance in lentic ecosystems; however, this parameter received a different treatment. The matrix presents six classes of water quality, established from the 10th, 25th, 50th, 75th and 90th percentiles of each of the most relevant variables, and weights were assigned to the selected parameters, as shown in Tables 3 and 4.
|Variables “i”||Class I||Class II||Class III||Class IV||Class V||Class VI|
|Deficit of oxygen (%)||≤5||6–20||21–35||36–50||51–70||>70|
|Total phosphorus (mg L−1)||≤0.010||0.011–0.025||0.026–0.040||0.041–0.085||0.086–0.210||>0.210|
|Total inorganic nitrogen (mg L−1)||≤0.15||0.16–0.25||0.26–0.60||0.61–2.00||2.00–5.00||>5.00|
|Chlorophyll-a (mg m−3)||≤1.5||1.5–3.0||3.1–5.0||5.1–10.0||11.0–32.0||>32|
|Secchi depth (m)||≥3||3–2.3||2.2–1.2||1.1–0.6||0.5–0.3||<0.3|
|COD (mg L−1)||≤3||3–5||6–8||9–14||15–30||>30|
|Residence time (days)||≤10||11–40||41–120||121–365||366–550||>550|
|Mean depth (m)||≥35||34–15||14–7||6–3.1||3–1.1||<1|
|Phytoplankton (blooms)||No blooms||Rare||Occasional||Frequent||Frequent/permanent||Permanent|
The water quality class to which each reservoir belongs is calculated through the Water Quality Index of Reservoirs (IQAR), according to Eq. (13).
where IQAR is the Water Quality Index of Reservoirs; qi is the water quality class for variable i, which can range from 1 to 6; and wi is the weight calculated for variable i.
A partial IQAR is calculated from the data collected in each monitoring campaign. The arithmetic mean of two or more partial indices is then calculated to obtain the final IQAR and to define the quality classification of each reservoir. A reservoir is classified as not impacted to very slightly impacted if the IQAR lies between 0 and 1.5; slightly degraded between 1.51 and 2.5; moderately degraded between 2.51 and 3.5; degraded to polluted between 3.51 and 4.5; very polluted between 4.51 and 5.5; and extremely polluted when the IQAR is equal to or above 5.51.
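The procedure can be sketched as follows, assuming that Eq. (13), which is not reproduced in this text, is the weighted mean of the classes, Σ(wi·qi)/Σwi. The total phosphorus class limits come from Table 3; the weights are hypothetical placeholders, since Table 4 is not reproduced here, and the other classes and function names are likewise illustrative.

```python
# Upper bounds of Classes I-V for total phosphorus (mg/L), from Table 3;
# anything above the last bound falls in Class VI.
TOTAL_P_BOUNDS = [0.010, 0.025, 0.040, 0.085, 0.210]

# Upper IQAR limits of the first five quality categories given in the text.
CATEGORY_BOUNDS = [1.5, 2.5, 3.5, 4.5, 5.5]

def quality_class(value, upper_bounds):
    """Return the class (1-6) of a variable for which higher is worse."""
    for cls, bound in enumerate(upper_bounds, start=1):
        if value <= bound:
            return cls
    return len(upper_bounds) + 1

def iqar(classes, weights):
    """Assumed Eq. (13): sum(w_i * q_i) / sum(w_i)."""
    total = sum(weights[k] for k in classes)
    return sum(weights[k] * classes[k] for k in classes) / total

# HYPOTHETICAL weights and classes for one monitoring campaign.
weights = {"total_phosphorus": 2.0, "oxygen_deficit": 3.0, "chlorophyll_a": 1.0}
classes = {
    "total_phosphorus": quality_class(0.030, TOTAL_P_BOUNDS),  # Class III
    "oxygen_deficit": 2,
    "chlorophyll_a": 4,
}

partial = iqar(classes, weights)
# Category 1 is the least impacted and 6 the extremely polluted condition.
category = quality_class(partial, CATEGORY_BOUNDS)
```

A final IQAR would then be the arithmetic mean of the partial values obtained in two or more campaigns.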
Reservoirs are lentic environments that are strongly affected by the variation of temperature and oxygen with depth, which leads to problems such as eutrophication and anoxia in the deep layers, especially when the sediments contain high concentrations of organic matter and metals. In this context, Lee et al. sought to develop a WQI for reservoirs and lakes in South Korea, selecting parameters that would help address these problems and describe the geology, climate and morphology of the more than 70 lakes and reservoirs evaluated. The Water Quality Index for Lakes (LQI) was developed after the interrelationships among the parameters had been identified through principal component analysis (PCA), and four environmental parameters were then selected: total organic carbon, chlorophyll a, total phosphorus and turbidity. Through logistic regression, using a sigmoidal function for each parameter, the average LQI of the water body is calculated (Table 5). The LQI ranges from 0 to 100, where 0 corresponds to a reservoir of poor quality and 100 to one of optimal quality.
|LQI equations for each parameter||Unit|
Azevedo Lopes et al., seeking to evaluate water quality for recreational purposes, proposed the Bathing Conditions Index (ICB) for water bodies in Brazil, using the Delphi technique to define the variables that compose the index: E. coli, density of cyanobacteria, visual clarity and pH. The authors used the minimum operator method, as originally proposed by Smith. They argue that the minimum operator is effective for defining a quality index for recreational use, since a low score in a single variable is enough to determine the quality of the water body [24, 36, 37].
2.4. WQI through fuzzy logic and recent statistical techniques
In the search for better and more accurate water quality indices, computational techniques such as fuzzy logic and neural networks have been used. The fuzzy logic introduced by Zadeh is widely used in the environmental area, particularly for water quality, because it can avoid ambiguity and the eclipsing of variables. Several authors apply the fuzzy approach to obtain a WQI with the smallest deviation and compare it with existing WQIs. The selection of variables does not follow a rigid criterion; they can be chosen on the basis of knowledge of the water bodies and of the monitoring programs. Once defined, the variables become the inputs to the fuzzy system. The fuzzy inference system (FIS) transforms these quantitative values into qualitative values (fuzzification) by applying membership functions and established rules for the interaction between the parameters. The inverse process (defuzzification) then transforms these qualitative values back into a numerical output. The accuracy of fuzzy logic depends on the interaction rules and on the correct selection of parameters, which is a crucial step for the success of the index [14, 20, 32, 38, 39, 40, 41, 42].
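The fuzzification, rule evaluation and defuzzification steps can be illustrated with a deliberately small sketch. It is not taken from any of the cited studies: it uses two inputs (DO and BOD), linear shoulder membership functions with hypothetical breakpoints, the minimum as the AND operator, and a weighted average of crisp rule outputs (a zero-order Sugeno-style simplification) rather than the centroid defuzzification of a full Mamdani system.

```python
def falling(x, a, b):
    """Membership equal to 1 below a, decreasing linearly to 0 at b."""
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    return (b - x) / (b - a)

def rising(x, a, b):
    """Complement of falling: 0 below a, increasing linearly to 1 at b."""
    return 1.0 - falling(x, a, b)

def fuzzy_wqi(do_mg_l, bod_mg_l):
    # Fuzzification (all breakpoints are hypothetical).
    do_low, do_high = falling(do_mg_l, 2, 6), rising(do_mg_l, 2, 6)
    bod_low, bod_high = falling(bod_mg_l, 2, 8), rising(bod_mg_l, 2, 8)

    # Rules as (firing strength, crisp output score); AND is the minimum.
    rules = [
        (min(do_high, bod_low), 90.0),   # high DO and low BOD  -> good
        (min(do_high, bod_high), 50.0),  # high DO and high BOD -> fair
        (min(do_low, bod_low), 50.0),    # low DO and low BOD   -> fair
        (min(do_low, bod_high), 10.0),   # low DO and high BOD  -> poor
    ]

    # Defuzzification: weighted average of the rule outputs.
    total = sum(strength for strength, _ in rules)
    return sum(strength * score for strength, score in rules) / total

clean = fuzzy_wqi(8.0, 1.0)
mixed = fuzzy_wqi(4.0, 5.0)
```

Because every rule contributes in proportion to its firing strength, intermediate conditions produce intermediate scores instead of being decided by a single variable.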
In Brazil, several studies have used fuzzy logic to develop water quality indices for reservoirs and rivers in the states of Rio de Janeiro, São Paulo, Sergipe and others, with significant results [32, 40, 41, 42].
In addition to fuzzy logic, several indices have been proposed that apply linear, nonlinear and multilinear regression and principal component analysis (PCA) to identify weights, establish aggregation models and define the variables that affect the quality of the water body. These techniques are very recent in the development of water quality indices, which reveals how late this type of analysis was adopted for improving WQIs, since many of the known indices were constructed with the Delphi technique. The use of more sophisticated statistical techniques to reduce the number of variables and obtain more specific results in each analysis may be a path toward a single WQI that could be used worldwide. Although some authors defend this path, others believe that a worldwide model should be pursued with much more caution, especially with regard to the use of the water body and the financial conditions of the country [9, 31, 43, 44, 45].
2.5. Future perspectives
The creation of new WQIs has been repetitive in terms of aggregation models and choice of parameters, and has therefore produced little progress toward an index that is better suited to defining water use and to general application. Less rigid, more objective formulas with flexibility in the choice of variables would contribute to the search for an efficient, globally accepted index. The construction of a more precise index is influenced by several factors, related not only to environmental parameters but also to the various aspects of environmental management. A method that gave the user an algorithm for selecting the most coherent environmental variables would therefore be very valuable. This selection should also be linked to the probable types of contamination of the water body in question, the cost of measuring the variables and the definition of the technical staff. On the basis of these considerations, the use of water could be identified according to each legislation, and the search for appropriate water treatment would also be more efficient. In addition, information would be transmitted more quickly between managers and society [4, 9, 19].
An important observation is that the use of WQIs for agriculture is still at an early stage, although works such as that of Stoner split the calculation between the specific uses of irrigation and public supply. Meireles et al. used factor analysis (FA) and PCA to identify the variables that most affect the amount of sodium in the soil, salinity and toxicity to plants in the Acaraú basin, Ceará. Of the 13 parameters evaluated by the authors, those with the greatest weight in the analysis were electrical conductivity, sodium, bicarbonate and the sodium adsorption ratio (SARo). The WQI was calculated as the sum of the quality value of each parameter multiplied by its respective weight and was classified into five classes (Table 6) [9, 46, 47, 48].
|WQI||Restrictions on water use||Recommendation: soil||Recommendation: plant|
|85–100||No restrictions (NR)||It can be used for most soils with low probability of sodification and salinization||No risk of toxicity to most plants|
|70–85||Low restriction (LR)||Recommended for irrigated soils of fine texture or moderate permeability, with leaching of salts. Salinization of thicker-textured soils may occur, and use in soils with 2:1 clay should be avoided.||Avoid use in plants with salt sensitivity|
|55–70||Moderate restriction (MR)||It can be used in soils with high or moderate permeability, suggesting moderate salt leaching.||Plants with moderate salt tolerance still grow|
|40–55||High restriction (HR)||Can be used on soils with high permeability without layers of compaction. Programmed high-frequency irrigations can be adopted in water with electroconductivity above 2000 dS m−1 and SAR above 7.||It should be used to irrigate plants with moderate to high salt tolerance with special salinity control practices, except waters with low levels of Na, Cl and HCO3|
|0–40||Severe restriction (SR)||Use for irrigation under normal conditions should be avoided. In special cases, it can be used occasionally. With water of high salt content, soils must have high permeability and excess water must be applied to avoid accumulation of salts.||Only plants with high salt tolerance, except water with low levels of Na, Cl and HCO3.|
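The calculation and classification described for this irrigation index can be sketched as a weighted sum followed by a lookup in Table 6. The weights and quality values below are hypothetical, not those of Meireles et al., and the treatment of the shared class limits (85, 70, 55 and 40 assigned to the better class) is an assumption, since the table lists them in both neighboring classes.

```python
# Lower limit and label of each class in Table 6, checked from best to worst.
CLASSES = [
    (85, "No restrictions (NR)"),
    (70, "Low restriction (LR)"),
    (55, "Moderate restriction (MR)"),
    (40, "High restriction (HR)"),
    (0, "Severe restriction (SR)"),
]

def irrigation_wqi(quality, weights):
    """Sum of q_i * w_i; the weights are assumed to sum to 1."""
    return sum(quality[k] * weights[k] for k in weights)

def restriction_class(wqi):
    """Return the Table 6 label for a WQI between 0 and 100."""
    for lower, label in CLASSES:
        if wqi >= lower:
            return label
    raise ValueError("WQI must be non-negative")

# HYPOTHETICAL weights and quality values (0-100) for four parameters.
weights = {"EC": 0.3, "Na": 0.2, "HCO3": 0.2, "SAR": 0.3}
quality = {"EC": 80.0, "Na": 60.0, "HCO3": 70.0, "SAR": 50.0}

wqi = irrigation_wqi(quality, weights)
label = restriction_class(wqi)
```

With these values the index is 65, which falls in the moderate restriction class.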
Zahedi compared the index proposed by Meireles et al. with the one developed in his own work, using several statistical techniques, in order to observe possible conflicts between the use of well water for human supply and for irrigation. This analysis aimed to identify water quality for both uses through WQIs. Much remains to be done to analyze conflicts over water use, but indices can provide valuable support in resolving them [46, 47, 49, 50].
Thus, the present work has placed in context the importance of indices based on traditional mathematical techniques alongside more modern techniques such as fuzzy logic (FL), neural networks (NN) and machine learning (ML). Water management is an area of human knowledge that involves not only social and economic aspects but also the use of analytical technology to obtain environmental data and of advanced computational technology, as presented in Figure 2.
Environmental management models are composed of tools which evaluate natural phenomena and relate them to quantitative and qualitative values for decision making. Knowledge about the different parameters which characterize an environmentally degraded area is now one of the pillars of sustainable development and it is also a way of maintaining the sovereignty of a state in relation to the monitoring of water resources.
The evaluation of water quality by indices with several types of aggregation functions has been applied for many years. However, the use of these indices should always be linked to the area that influences the water body, the climate of the region, the geology, the activities developed around it and its use by the population, so that the physical, chemical and biological variables to be measured can be identified.
The type of water body, lentic or lotic, also influences this selection of variables for the construction of the index, and another important factor to be incorporated is the impact of climate change. To meet these objectives, it is necessary to develop indices that are less affected by eclipsing and ambiguity, which requires better aggregation functions and statistics, as suggested by several authors.
In this sense, multivariate statistical techniques, regression, machine learning and artificial intelligence are now important for creating indices that encompass the most diverse variables for each specific use of water. Such indices will also facilitate decision making and communication between environmental managers and society. It is important to emphasize that these methods would allow greater flexibility in the selection of variables, making water quality indices more accurate, efficient, and easier to understand and apply.