Open access peer-reviewed chapter

Interpretation of Water Quality Data in uMngeni Basin (South Africa) Using Multivariate Techniques

Written By

Innocent Rangeti and Bloodless Dzwairo

Submitted: 22 July 2020 Reviewed: 29 October 2020 Published: 16 March 2021

DOI: 10.5772/intechopen.94845

From the Edited Volume

River Basin Management - Sustainability Issues and Planning Strategies

Edited by José Simão Antunes Do Carmo



The major challenge with regular water quality monitoring programmes is making sense of the large and complex physico-chemical data-sets that are generated in a comparatively short period of time. Consequently, this presents difficulties for water management practitioners, who are expected to make informed decisions based on information extracted from these large data-sets. In addition, the nonlinear nature of water quality data-sets often makes it difficult to interpret the spatio-temporal variations. These challenges necessitate effective methods of interpreting water quality results and drawing meaningful conclusions. Hence, this study applied multivariate techniques, namely Cluster Analysis and Principal Component Analysis, to interpret eight years (2005–2012) of water quality data generated from a monitoring exercise at six stations in uMngeni Basin, South Africa. The principal components extracted with eigenvalues greater than 1 were interpreted while considering the pollution issues in the basin. These extracted components explain 67–76% of the water quality variation among the stations. The derived significant parameters suggest that uMngeni Basin was mainly affected by the catchment’s geological processes, surface runoff, domestic sewage effluent, seasonal variation and agricultural waste. Cluster Analysis grouped the six sampling stations into two clusters, low (A) and heavy (B), based on the degree of pollution. Cluster A mainly consists of water sampling stations located at the dam outflows (NDO, IDO, MDO and NDI) and its water can be described as of fairly good quality due to dam retention and attenuation effects. Cluster B consists of dam inflow water sampling stations (MDI and IDI), whose water can be described as polluted when compared to Cluster A. The poor water quality observed at Cluster B sampling stations could be attributed to natural processes and anthropogenic activities through point sources and runoff.
The findings could assist in determining an appropriate set of water quality parameters that would indicate variation of water quality in the basin, with minimum loss of information. It is, therefore, recommended that this approach be used to assist decision-makers regarding strategies for minimising catchment pollution.


  • cluster analysis
  • multivariate technique
  • principal component analysis
  • uMngeni basin
  • water quality

1. Introduction

Water pollution is a global challenge undermining economic growth, the health of millions of people and the physical status of the environment in both developed and developing countries. The current global water scarcity challenge is not only related to inadequacy in terms of quantity but also to the progressive deterioration of quality, which makes water unfit for some uses such as drinking. The deterioration of water quality is attributed to both natural (precipitation rate, weathering processes, soil erosion, etc.) and anthropogenic (urban, industrial, agricultural activities, etc.) factors. Seasonal variations in precipitation, surface run-off, ground water flow, interception and abstraction strongly affect the river discharge and the concentrations of water pollutants in a basin [1]. The effect of a contaminant on water depends upon the characteristics of the water itself as well as the quantity and characteristics of the contaminant.

Water pollutants, which are usually introduced through surface runoff or direct discharge, may at high concentrations overwhelm a river’s capacity to attenuate them, so that catchments fail to meet the minimum quality requirements for various uses such as potable water production. Furthermore, water quality deterioration is often a slow process that is not readily noticeable, due to attenuation effects, until an apparent change occurs. Such situations are being exacerbated by the rapid increase in the demand for freshwater in many countries including South Africa. In view of the limited quantity of freshwater resources worldwide and the effect of anthropogenic activities, protection of these resources has become a priority [2, 3, 4]. It has, therefore, become imperative to monitor the quality of water in freshwater systems in order to prevent its further deterioration and thus ultimately ensure its continuous availability in a quality that meets various uses including potable water production. Pollutants in water can cause acute or chronic illness in humans, especially when polluted water is consumed or when sewage is used to irrigate vegetables meant for human consumption. In specific cases this has resulted in loss of lives. For example, as at 2015, the bacterium Vibrio cholerae caused between 1.3 and 4.0 million infections and 21,000 to 143,000 deaths worldwide [5].

Out of concern for the detrimental effects of pollution, various agencies have been monitoring the quality of raw water in the uMngeni, a 232 km river whose water is treated to supply almost 3.8 million people in and around Durban and Pietermaritzburg (South Africa) with potable water [6, 7, 8, 9, 10]. The primary objectives of such monitoring exercises have been to identify water quality problems, describe the spatio-temporal water quality trends, determine fitness compliance for specific uses and develop monitoring tools such as water quality indices for enhancing information dissemination. Although such monitoring programmes are crucial to a better knowledge of hydrology and pollution problems in catchments such as uMngeni Basin, they tend to produce large and complicated data-sets covering various water parameters. It is often difficult to analyse these data-sets and extract meaningful information from them, which in turn makes it difficult to keep the public, who are the custodians of the resource, informed. Keeping the public updated makes them more participatory in policy formulation and decision making regarding protection of the water resource [11, 12].

The classification and interpretation of monitoring stations are the most important steps in the assessment of water quality. Numerous studies have confirmed multivariate statistical techniques (cluster analysis, principal component analysis, factor analysis and discriminant analysis) as excellent tools for exploring and presenting bulky and complex water quality data-sets [13, 14, 15]. These techniques allow for the determination of spatio-temporal water quality variability, classification of sampling stations and the identification of pollution sources [15, 16, 17, 18]. Furthermore, by eliminating subjective assumptions, multivariate techniques tend to reduce bias when selecting parameters for developing tools such as a water quality index. This ultimately assists in improving the accuracy of such monitoring tools. Nevertheless, the selection of a multivariate technique depends on the nature of the data-set and the research objectives. While there are a number of multivariate techniques, studies have extensively applied Principal Component Analysis (PCA) and Cluster Analysis (CA) due to their suitability for extracting information in various situations [15, 19, 20, 21, 22].

The application of principal component analysis (PCA) to the interpretation of a large and complex volume of data offers a better understanding of water quality and the ecological status of the basin being studied, while also allowing for the identification of possible factors/sources that influence the surface water systems [16]. PCA aims to find combinations of the variables that describe the variation in the data while retaining as much information as possible. This reduction is achieved by transforming the original variables into a new set of uncorrelated variables, known as principal components (PCs) [23, 24, 25]. The PCs are ordered so that the first few retain most of the variation present in the original variables. These few derived variables can be used to provide a meaningful description of the entire data-set with a minimal loss of original information. The eigenvalues indicate the significance of each PC, with a greater value indicating greater importance of the component [26]. The correlations between the PCs and the original variables are given by the loadings [27]. While loadings reflect the relative importance of a variable within the component, it should be highlighted that these values do not show the importance of the component itself [28].
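
As a small worked illustration (a textbook case, not taken from the data of this study), consider two standardised parameters whose correlation coefficient is r, with 0 < r < 1. The symbols R, λ and u below are introduced for this illustration only.

$$R = \begin{pmatrix} 1 & r \\ r & 1 \end{pmatrix}, \qquad \lambda_1 = 1 + r, \quad \lambda_2 = 1 - r, \qquad u_1 = \tfrac{1}{\sqrt{2}}(1,\ 1), \quad u_2 = \tfrac{1}{\sqrt{2}}(1,\ -1)$$

For r = 0.6, the first component has an eigenvalue of 1.6 and explains 1.6/2 = 80% of the total variance, while the second has an eigenvalue of 0.4 and explains the remaining 20%. The loading of each parameter on the first component is $\sqrt{\lambda_1}/\sqrt{2} = \sqrt{0.8} \approx 0.89$, so both parameters correlate strongly with it. The loading thus describes how much a parameter matters within a component, whereas the eigenvalue describes how much the component itself matters.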

PCA has been successfully applied in hydrogeological and hydrogeochemical studies. The application of PCA by Razmkhah, Abrishamchi [29] distinguished the anthropogenic and natural polluting activities along Jairood River in Iran. The results identified five factors which explained 85% of the variation in water quality. Mazlum, Ozer [30] applied PCA to determine the factors causing water quality variability along a tributary, Porsuk, in Turkey. The study identified four PCs which explained 70% of the total water quality variance. The factors were related to the discharge of domestic wastewater, nitrification, industrial wastewater and the seasonal effect. Haag and Westrich [31] applied PCA to analyse the water quality along Neckar River in Germany based on ten parameters monitored from 1993 to 1998. The four principal components extracted, which accounted for 72% of the total variance, were interpreted as: (i) dilution by high discharge, (ii) biological activity, (iii) seasonal effects and (iv) wastewater impact [31].

The limitations of the PCA technique include ignoring the degree of data dispersion as well as a weakness in processing nonlinear data. The result of PCA can also be influenced by uneven sampling intervals, missing values or observations below the detection limits of analytical methods, which can themselves change during the data collection period. It is thus important to treat water quality data before modelling in order to improve the accuracy of the results [32].

On the other hand, cluster analysis (CA) is an unsupervised pattern recognition multivariate technique which groups objects (e.g., water quality variables) based on either their similarities or dissimilarities [33, 34]. Its objective is to sort cases into groups or clusters, so that the degree of association is strong between members of the same cluster and weak between members of different clusters. Most studies have applied the hierarchical clustering (HC) technique to sequentially categorise objects [13, 15, 34, 35]. Based on hierarchical CA, each sampling location can be classified according to its pollution level. The results of an HC analysis are displayed graphically using a tree diagram commonly known as a dendrogram [13, 14, 36, 37]. The technique first groups the most similar objects. These groups are then merged step by step, at progressively lower levels of similarity, until they eventually form a single cluster. The cluster analysis approach offers a reliable classification of surface water, making it possible to design a future spatial sampling strategy that is cost-effective, with a reduced number of sampling sites, without losing any significant information [38].


2. Study area and water monitoring stations

uMngeni Basin, the study area, is situated in KwaZulu-Natal (KZN) Province, which lies along the eastern seaboard of the Republic of South Africa (Figure 1). uMngeni River (the main river in the basin), at 232 km long, is the primary source of raw water, which is then treated to serve a population of almost 3.8 million (as at 2013), in and around Durban metro as well as the city of Pietermaritzburg (PMB).

Figure 1.

uMngeni Basin, KZN Province, Drakensberg Mountains and South Africa.

Key activities that generate point and non-point pollution within the catchment include agriculture and animal farming, while concentrated urban settlements provide a variety of supportive economic activities that generate solid and liquid waste. A consequence of the concentrated development in the catchment area has been the high levels of pollutants entering the water system, which are eventually flushed out to sea. The basin receives much of its rain in summer, with occasional snowfalls in some of its high-lying areas such as the Drakensberg Mountains [39]. The geology of uMngeni Basin varies from basalts, granites, sandstones and shale to tillites [40]. About half of uMngeni Basin sits on top of the Karoo in the KZN part of the Drakensberg Mountains. The other portion extends east on top of the South African Coastal Plate (Figure 2).

Figure 2.

Human settlements dominate from the central parts of the basin (PMB and its periphery) up to the Indian Ocean.

The 2009 land-use map (Figure 2) indicates mixed activities, with cultivation and plantations located predominantly from the centre to the north-west of the basin.

2.1 Water quality monitoring points considered

Six water quality monitoring points shown in Figure 2, namely Midmar Dam Inflow (MDI) (Upstream point), Midmar Dam Outflow (MDO), Nagle Dam Inflow (NDI), Nagle Dam Outflow (NDO), Inanda Dam Inflow (IDI) and Inanda Dam Outflow (IDO) (Downstream point), were considered in this study. The dam inflow stations were assumed to give a reflection of the pollution activities along the river course while the dam outflow stations were expected to depict the dilution and retention effects.

2.1.1 Methods and materials

Multivariate statistical methods have been widely applied in environmental data reduction and in the interpretation of multi-constituent chemical, physical and biological measurements. These techniques have been applied to identify factors that influence water systems, to assist in reliable water resource management and to determine rapid solutions for pollution problems [16, 41]. This study applied PCA and CA techniques to extract information from the raw data regarding the significant parameters influencing the variation of water quality at each of the six stations studied. The Kaiser-Meyer-Olkin (KMO) measure, which tests sampling adequacy, was used to determine the suitability of the water quality data for PCA [42, 43, 44]. Kaiser [43] recommended 0.5 as a minimum (barely acceptable), values between 0.7 and 0.8 as acceptable, and values above 0.9 as excellent. The current study employed the PCA technique to determine the most significant parameters that would explain the variation in water quality.
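
A minimal sketch of how an overall KMO value could be computed is given here, assuming NumPy is available and using the common definition of KMO as the ratio of squared correlations to the sum of squared correlations and squared partial correlations. The function name `kmo` and the synthetic data are illustrative assumptions; they are not the software or the data used in this study.

```python
import numpy as np

def kmo(data):
    """Overall Kaiser-Meyer-Olkin measure for an (observations x variables) array."""
    corr = np.corrcoef(data, rowvar=False)        # Pearson correlation matrix
    inv = np.linalg.inv(corr)                     # requires a non-singular matrix
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale                        # partial correlations (off-diagonal)
    off = ~np.eye(corr.shape[0], dtype=bool)      # mask for off-diagonal entries
    r2 = np.sum(corr[off] ** 2)
    p2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + p2)

# Synthetic example: 96 observations of 4 made-up variables sharing one factor.
rng = np.random.default_rng(0)
shared = rng.normal(size=(96, 1))
data = shared + 0.5 * rng.normal(size=(96, 4))

value = kmo(data)
print(round(value, 3), "suitable for PCA" if value >= 0.5 else "below the 0.5 minimum")
```

With only two variables the partial correlation equals the ordinary correlation, so the measure is always 0.5; higher values need several variables that share common variation.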

PCA is a very powerful multivariate statistical analysis technique used to reduce the dimensionality of a data-set consisting of multiple inter-related variables, while retaining data variability [45]. The technique extracts primary information representative of the typical characteristics of the water environment from a large amount of data and then represents it as a new set of independent variables, the principal components. PCA thus reduces the dimensionality of a multivariate data-set to a small number of independent principal components. Each principal component is a linear combination of all the original variables, which limits the loss of information.

The PCA method is composed of five main operational steps, as follows:

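  1. The original data matrix is shown in Eq. 1:

    $$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{pmatrix} \tag{1}$$

    where $x_{ij}$ is the originally measured data, $i = 1, 2, \ldots, n$ indexes the monitoring station, and $j = 1, 2, \ldots, p$ indexes the water quality parameter.

  2. Standardising the original data with the Z-score standardisation formula to eliminate the impact of dimension (Eq. 2).

    $$x_{ij}^{*} = \frac{x_{ij} - \bar{x}_{j}}{s_{j}} \tag{2}$$

    where $x_{ij}^{*}$ is the standardised variable, $\bar{x}_{j}$ is the average value for the jth indicator, and $s_{j}$ is the standard deviation for the jth indicator.

  3. Calculating the correlation coefficient matrix, R, with the standardised data and determining the correlation between indicators (Eq. 3).

    $$R = (r_{jk})_{p \times p} \tag{3}$$

    where $r_{jk}$ is the correlation coefficient between the jth and kth indicators.

  4. Calculating the eigenvalues and eigenvectors of the correlation coefficient matrix, R, to determine the number of principal components. The eigenvalues of R are represented by $\lambda_{i}$ ($i = 1, 2, \ldots, p$) and their eigenvectors by $u_{i} = (u_{i1}, u_{i2}, \ldots, u_{ip})$. Each eigenvalue corresponds to the variance of its principal component, and a larger variance means a larger contribution rate of that principal component. Further, the cumulative contribution rate of the first m principal components should be more than 80%, as expressed in Eq. 4:

    $$\frac{\sum_{i=1}^{m} \lambda_{i}}{\sum_{i=1}^{p} \lambda_{i}} \geq 80\% \tag{4}$$

    The principal component is represented by Eq. 5.

    $$F_{i} = u_{i1} x_{1} + u_{i2} x_{2} + \cdots + u_{ip} x_{p}, \quad i = 1, 2, \ldots, m \tag{5}$$

    where $x_{j}$ is the standardised indicator variable as shown in Eq. 6:

    $$x_{j} = (x_{1j}^{*}, x_{2j}^{*}, \ldots, x_{nj}^{*})^{T} \tag{6}$$

  5. The obtained principal components are weighted by their contribution rates and summed to obtain a comprehensive evaluation function, as shown in Eq. 7:

    $$F = \sum_{i=1}^{m} \frac{\lambda_{i}}{\sum_{k=1}^{p} \lambda_{k}} F_{i} \tag{7}$$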

Principal components with an eigenvalue greater than 1 were related to the major pollution sources in uMngeni Basin. Water quality parameters with loadings of greater than 0.5 (highlighted in bold in the results tables) were regarded as significantly influencing water quality variation in uMngeni Basin.
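
The operational steps above, together with the eigenvalue and loading cut-offs, can be sketched in a few lines. This is a minimal illustration assuming NumPy; the function name `pca_on_correlation` and the synthetic data are hypothetical, and the use of absolute loadings for the 0.5 cut-off is an assumption made here because the sign of an eigenvector is arbitrary.

```python
import numpy as np

def pca_on_correlation(data):
    """PCA on the correlation matrix of an (observations x parameters) array."""
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)  # Z-score standardisation
    corr = np.corrcoef(z, rowvar=False)                         # correlation matrix R
    eigvals, eigvecs = np.linalg.eigh(corr)                     # R is symmetric
    order = np.argsort(eigvals)[::-1]                           # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = 100.0 * eigvals / eigvals.sum()                 # % of variance per PC
    # Loadings: correlations between parameters (rows) and components (columns).
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    return eigvals, explained, loadings

# Synthetic data: 96 monthly values of 5 made-up parameters, 3 of them related.
rng = np.random.default_rng(42)
common = rng.normal(size=(96, 1))
data = np.hstack([common + 0.3 * rng.normal(size=(96, 3)),
                  rng.normal(size=(96, 2))])

eigvals, explained, loadings = pca_on_correlation(data)
keep = eigvals > 1.0                             # eigenvalue-greater-than-1 rule
significant = np.abs(loadings[:, keep]) > 0.5    # loading cut-off (absolute value assumed)

print("eigenvalues:", np.round(eigvals, 3))
print("cumulative % explained:", np.round(explained.cumsum(), 1))
print("significant parameters per retained PC:\n", significant)
```

Because the analysis is run on the correlation matrix, the eigenvalues sum to the number of parameters, which is why an eigenvalue of 1 marks a component that explains as much variance as a single standardised parameter.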

Thereafter, cluster analysis (CA) was applied to determine the spatial similarity of the six water sampling stations studied. Hierarchical cluster analysis was employed using Ward’s method with Euclidean distances as a measure of dissimilarity [37, 46]. The number of subgroups for analysis was determined by drawing a line across the dendrogram and examining the main clusters branching out beneath that line [47]. The determination of the subgroups was subjective, based on available information regarding pollution activities along the uMngeni River.
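
The merging rule behind Ward’s method can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions, not the software used in the study: the station labels S1 to S6 and their two-feature profiles are made up, and stopping at a chosen number of clusters stands in for drawing a line across the dendrogram.

```python
from itertools import combinations

def ward_clusters(points, n_clusters):
    """Merge clusters until n_clusters remain, each time choosing the pair whose
    merger gives the smallest increase in within-cluster sum of squares."""
    clusters = [{"members": [name], "centroid": list(vec)}
                for name, vec in points.items()]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            na, nb = len(clusters[a]["members"]), len(clusters[b]["members"])
            # Squared Euclidean distance between the two centroids.
            d2 = sum((x - y) ** 2 for x, y in
                     zip(clusters[a]["centroid"], clusters[b]["centroid"]))
            cost = na * nb / (na + nb) * d2      # Ward's merging cost
            if best is None or cost < best[0]:
                best = (cost, a, b)
        _, a, b = best
        na, nb = len(clusters[a]["members"]), len(clusters[b]["members"])
        merged = {
            "members": clusters[a]["members"] + clusters[b]["members"],
            "centroid": [(na * x + nb * y) / (na + nb) for x, y in
                         zip(clusters[a]["centroid"], clusters[b]["centroid"])],
        }
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
    return sorted(sorted(c["members"]) for c in clusters)

# Hypothetical standardised profiles for six stations (values are made up).
stations = {
    "S1": (0.1, 0.2), "S2": (0.2, 0.1), "S3": (0.0, 0.0),
    "S4": (0.3, 0.3), "S5": (2.0, 2.1), "S6": (2.2, 1.9),
}
groups = ward_clusters(stations, n_clusters=2)
print(groups)
```

In practice a library routine such as SciPy’s `scipy.cluster.hierarchy.linkage` with `method="ward"` would normally be used, since it also returns the merge heights needed to draw the dendrogram.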

Eight-year (2005–2012) water quality data-sets obtained from the then Umgeni Water were used in this study. Since monitoring generally depends on the pollution problem at any given time and place, the number and type of parameters monitored at each of the stations varied. As the study was data-driven, a monthly median was used for in-depth analysis. The median was adopted instead of the mean because the latter is normally influenced by outliers, which are common in water quality data-sets, while the former is resistant to them. The period studied was determined in consideration of the criteria explained by Schertz, Alexander [48] and Lettenmaier, Conquest [49]. These studies reported that at least five years of monthly data and two years of monthly data should be sufficient for a defensible monotonic trend study and step-trend (abrupt shift) study, respectively.
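
The resistance of the median to outliers can be illustrated with a short sketch using only the Python standard library. The readings below are hypothetical values chosen for the illustration, with one extreme value standing in for a storm event; they are not data from this study.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical readings keyed by (year, month); 95.0 mimics an outlier.
samples = [
    ((2005, 1), 4.0), ((2005, 1), 5.0), ((2005, 1), 95.0), ((2005, 1), 6.0),
    ((2005, 2), 3.0), ((2005, 2), 4.0), ((2005, 2), 5.0),
]

# Group the individual readings by month.
by_month = defaultdict(list)
for month, value in samples:
    by_month[month].append(value)

monthly_median = {m: median(v) for m, v in by_month.items()}
monthly_mean = {m: mean(v) for m, v in by_month.items()}

for m in sorted(by_month):
    print(m, "median:", monthly_median[m], "mean:", monthly_mean[m])
```

In the first month the single extreme reading pulls the mean to 27.5 while the median stays at 5.5, close to the bulk of the readings.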


3. Results

3.1 PCA analysis results

The Kaiser-Meyer-Olkin (KMO) results for the six stations ranged from 0.610 to 0.786, showing the fitness of the data-sets for PCA. The component matrix tables shown in the different sections of the results only depict PCs with an eigenvalue greater than one (1). Only parameters with a correlation coefficient greater than 0.5 (highlighted in bold black) with their respective principal component were considered as significantly influencing water quality variability at any given station.

3.1.1 Midmar dam inflow (MDI)

Table 1 shows the seven extracted PCs with eigenvalues greater than 1, which together explain 75% of the water quality variation at the Midmar Dam Inflow sampling station. Considering the high positive correlations of nutrient, metal ion and organic-related parameters with component 1 (Table 2), it can be hypothesised that 20.6% of the water quality variation at this station is a result of both anthropogenic and natural processes. The nutrient and organic-related parameters can be explained by the piggery, dairy and maize farming activities surrounding Midmar Dam [6]. Animal manure enters surface water, both accidentally and deliberately, from households, villages, communal farms and feedlots. Without treatment, manure runoff tends to result in algal blooms, which can lead to human health problems if the affected water is consumed. The second component explained 15.7% (Table 1) of the total variance at MDI and showed a positive correlation with Suspended Solids (SS), iron (Fe) and turbidity (Table 2). Turbidity and suspended solids can be related to surface runoff from agricultural activities along uMngeni River, whilst iron (Fe) can be attributed to weathering processes.

Midmar dam inflow: total variance explained
Component | Initial eigenvalues (total, % of variance, cumulative %) | Extraction sums of squared loadings (total, % of variance, cumulative %) | Rotation sums of squared loadings (total, % of variance, cumulative %)

Table 1.

Extracted values of the significant components at Midmar dam inflow (MDI).

Extraction Method: Principal Component Analysis.

Component matrix
Parameter | PC1 | PC2 | PC3 | PC4 | PC5 | PC6 | PC7
Potassium (K) | 0.890 | −0.094 | −0.027 | 0.111 | 0.090 | −0.054 | −0.074
Sulphate (SO4) | 0.852 | −0.027 | −0.213 | −0.078 | −0.128 | −0.018 | 0.072
Chloride (Cl) | 0.824 | −0.322 | −0.169 | −0.099 | −0.129 | −0.008 | −0.071
Total Dissolved Solid | 0.743 | −0.199 | −0.161 | −0.197 | −0.161 | −0.001 | −0.106
Nitrate (NO3) | 0.666 | −0.243 | −0.231 | −0.097 | −0.295 | 0.049 | −0.014
Total Organic Carbon (TOC) | 0.662 | 0.464 | 0.012 | −0.053 | 0.074 | 0.003 | −0.156
Escherichia coli (E. coli) | 0.440 | 0.356 | 0.035 | 0.431 | −0.153 | 0.135 | −0.085
Suspended Solid | 0.416 | 0.741 | 0.005 | 0.172 | 0.181 | 0.113 | 0.319
Iron (Fe) | 0.401 | 0.713 | −0.084 | 0.037 | 0.133 | −0.101 | 0.152
Calcium (Ca) | 0.503 | −0.687 | 0.304 | 0.091 | 0.266 | 0.098 | 0.045
Magnesium (Mg) | 0.602 | −0.681 | 0.168 | 0.025 | 0.089 | 0.073 | 0.009
Sodium (Na) | 0.581 | −0.666 | 0.156 | 0.010 | 0.207 | 0.085 | 0.059
Alkalinity (Alk) | −0.036 | −0.632 | 0.451 | 0.090 | 0.338 | 0.157 | 0.149
Total Phosphate | 0.192 | 0.556 | 0.415 | −0.092 | −0.089 | 0.260 | −0.284
Dissolved Oxygen (DO) | −0.048 | −0.075 | −0.591 | 0.410 | 0.017 | 0.262 | 0.093
Temperature (Temp) | 0.230 | 0.453 | 0.460 | −0.383 | 0.017 | −0.222 | −0.137
Silicon (Si) | 0.347 | 0.321 | −0.379 | −0.460 | −0.026 | 0.059 | 0.121
Ammonia (NH3) | 0.085 | 0.160 | 0.470 | −0.089 | −0.376 | 0.619 | 0.009
Soluble Reactive Phosphate (SRP) | 0.313 | 0.250 | 0.019 | 0.513 | 0.131 | −0.223 | −0.599

Table 2.

The correlation among the parameters measured and the extracted significant components at MDI.

7 components extracted. Bold: significant contributors to the respective principal component in their respective column.

Extraction Method: Principal Component Analysis.
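
As a brief illustration of how the 0.5 loading cut-off is read off a component matrix, the sketch below applies it to the first two components for a subset of the parameters in Table 2. The helper name `significant` and the option of taking absolute values are assumptions made for the illustration; the loadings themselves are copied from Table 2.

```python
# Loadings on components 1 and 2 for a subset of parameters, copied from Table 2.
loadings = {
    "Potassium (K)": (0.890, -0.094),
    "Sulphate (SO4)": (0.852, -0.027),
    "Suspended Solid": (0.416, 0.741),
    "Iron (Fe)": (0.401, 0.713),
    "Calcium (Ca)": (0.503, -0.687),
    "Alkalinity (Alk)": (-0.036, -0.632),
    "Silicon (Si)": (0.347, 0.321),
}
THRESHOLD = 0.5

def significant(column, use_abs):
    """Return parameters whose loading in the given column exceeds THRESHOLD."""
    selected = []
    for name, values in loadings.items():
        value = abs(values[column]) if use_abs else values[column]
        if value > THRESHOLD:
            selected.append(name)
    return selected

pc1 = significant(0, use_abs=True)
pc2_positive = significant(1, use_abs=False)   # positive loadings only
pc2_absolute = significant(1, use_abs=True)    # strong negative loadings included

print("PC1:", pc1)
print("PC2 (positive only):", pc2_positive)
print("PC2 (absolute value):", pc2_absolute)
```

Taking absolute values also flags calcium and alkalinity on component 2, which is in line with the negative relationship between the metal ions and component 2 discussed for this station.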

Agriculture, a sector responsible for 70% of the water abstracted globally, plays a major role in water pollution [50, 51]. Runoff from agricultural activities carries agrochemicals, organic matter, drug residues, sediments and saline drainage into water bodies, which can lead to nutrient enrichment and eutrophication. The resultant water pollution poses a risk to aquatic ecosystems, human health and productive activities. Poor land management practices and deforestation can also explain the water quality variation at MDI. It is important for communities to practise improved land management by planting trees and other vegetation to cover the ground. The negative relationship of metal ions with component 2 can be explained by the seasonality effect. Increased flow in the wet season tends to reduce the concentration of mineral salts in a river system as a result of the dilution effect.

3.1.2 Midmar dam outflow (MDO)

The results in Table 3 indicate that the first seven principal components with eigenvalues greater than one (1) account for 73.7% of the total variance in the water quality data-set at Midmar Dam Outflow station. Component 1, which explains 27.1% of the total variance at MDO (Table 3), is mainly influenced by parameters related to human activities (turbidity and ammonia) as well as natural geological processes (silicon and calcium) (Table 4). Silicon is part of various essential plant minerals and is released during weathering processes. Sodium and potassium, which showed a moderate positive correlation with component 5, reflect natural processes such as weathering.

Midmar dam outflow: total variance explained
Component | Initial eigenvalues (total, % of variance, cumulative %) | Extraction sums of squared loadings (total, % of variance, cumulative %) | Rotation sums of squared loadings (total, % of variance, cumulative %)

Table 3.

Extracted values of the significant components at Midmar dam outflow.

Extraction Method: Principal Component Analysis.

Component matrix
Parameter | PC1 | PC2 | PC3 | PC4 | PC5 | PC6 | PC7
Calcium (Ca) | 0.906 | −0.100 | 0.117 | −0.076 | 0.134 | −0.065 | 0.054
Magnesium (Mg) | 0.856 | −0.131 | 0.109 | −0.236 | 0.171 | −0.045 | 0.039
Ammonia (NH3) | 0.783 | −0.136 | 0.201 | −0.051 | −0.181 | 0.072 | −0.036
Suspended Solids (SS) | 0.679 | 0.533 | −0.039 | −0.187 | −0.118 | −0.163 | 0.079
Sulphates (SO4) | −0.663 | 0.052 | −0.267 | 0.088 | 0.314 | 0.107 | 0.315
Silicon (Si) | 0.613 | 0.253 | 0.099 | 0.390 | 0.123 | −0.164 | 0.086
Dissolved Oxygen (DO) | −0.544 | 0.260 | 0.062 | −0.368 | 0.195 | −0.143 | −0.019
Nitrate (NO3) | 0.519 | −0.334 | 0.068 | 0.130 | −0.303 | 0.288 | −0.061
Total Phosphate (TP) | 0.205 | 0.532 | −0.285 | −0.332 | 0.016 | 0.291 | −0.048
% NH3% | −0.247 | 0.292 | 0.791 | 0.065 | 0.097 | 0.314 | 0.154
Total Organic Carbon (TOC) | 0.212 | 0.412 | −0.169 | 0.459 | −0.116 | −0.355 | 0.306
Potassium (K) | 0.399 | 0.046 | −0.1608 | 0.308 | 0.605 | −0.105 | −0.160
Sodium (Na) | 0.278 | −0.360 | 0.245 | −0.421 | 0.533 | −0.148 | −0.084
Chloride (Cl) | 0.384 | 0.277 | −0.303 | −0.104 | 0.390 | 0.211 | 0.193
Temperature (°C) | 0.320 | −0.159 | −0.057 | 0.437 | 0.209 | 0.558 | 0.160
Escherichia coli (E. coli) | −0.031 | 0.230 | 0.225 | 0.355 | 0.127 | −0.195 | −0.718

Table 4.

The correlation among the parameters measured and the extracted significant components at MDO.

7 components extracted. Bold: significant contributors to the respective principal component in their respective column.

Extraction Method: Principal Component Analysis.

3.1.3 Nagle dam inflow

The PCA technique identified five components, which cumulatively explained 67.4% of the total variance at NDI (Table 5). Component 1, which explained 24.7% of the total variance, is significantly affected by parameters (highlighted in bold black) which normally originate from surface runoff from agricultural areas and effluent from wastewater treatment plants (Table 6). The pollution of water bodies by agriculture is mainly due to fertiliser runoff after rainfall, nutrients (such as nitrogen) that percolate through the soil and contaminate groundwater, and sediment that is eroded from fields and washed into watercourses during and after rainfall. While studying the limnology of South Africa’s major impoundments, Walmsley and Butty [52] described Nagle Dam as a phosphate-limited oligotrophic system.

Nagle dam inflow: total variance explained
Component | Initial eigenvalues (total, % of variance, cumulative %) | Extraction sums of squared loadings (total, % of variance, cumulative %) | Rotation sums of squared loadings (total, % of variance, cumulative %)

Table 5.

Extracted values of the significant components at Nagle dam inflow (NDI).

Extraction Method: Principal Component Analysis.

Component matrix
Parameter | PC1 | PC2 | PC3 | PC4 | PC5
Total Phosphate | 0.776 | −0.084 | −0.350 | 0.307 | 0.068
Soluble Reactive Phosphate (SRP) | 0.757 | −0.207 | −0.323 | 0.299 | 0.080
Suspended Solids (SS) | 0.741 | 0.001 | −0.213 | −0.333 | 0.093
Total Organic Carbon (TOC) | 0.721 | 0.099 | −0.132 | −0.410 | 0.152
Escherichia coli (E. coli) | 0.718 | −0.404 | 0.047 | 0.273 | 0.234
Nitrate (NO3) | 0.677 | −0.278 | 0.337 | −0.076 | −0.251
% NH3% | 0.363 | 0.844 | −0.137 | −0.088 | 0.152
Algal count cell | 0.253 | 0.301 | 0.093 | 0.486 | −0.334
Ammonia-N (NH3) | −0.051 | −0.185 | −0.106 | 0.183 | −0.206

Table 6.

The correlation among the parameters and the significant components at NDI.

5 components extracted. Bold: significant contributors to the respective principal component in their respective column.

Extraction Method: Principal Component Analysis.

3.1.4 Nagle dam outflow

Seven significant components, which explained 75% of the total variance, were extracted at Nagle Dam Outflow station (Table 7). The first component, which contributed 23% (Table 8) of the total variance, is dominated by metal ions, which can be related to natural geological processes such as weathering. The second component, which explains 15.5% of the total variation, is dominated by E. coli, suspended solids and turbidity (highlighted in bold black) (Table 8). These pollutants can be related to the discharge of sewage effluent and runoff from a community practising open defecation.

Nagle dam outflow: total variance explained
Component | Initial eigenvalues (total, % of variance, cumulative %) | Extraction sums of squared loadings (total, % of variance, cumulative %)

Table 7.

Extracted values of the significant components at Nagle dam outflow.

Extraction Method: Principal Component Analysis.

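Component matrix
Parameter | PC1 | PC2 | PC3 | PC4 | PC5 | PC6 | PC7
% NH3% | −0.158 | 0.051 | 0.364 | 0.690 | 0.483 | −0.092 | 0.164
Calcium (Ca) | 0.650 | −0.537 | −0.454 | 0.154 | 0.103 | −0.007 | 0.030
Chloride (Cl) | 0.673 | −0.154 | 0.499 | −0.032 | 0.055 | 0.037 | −0.281
Dissolved Oxygen (DO) | −0.496 | −0.280 | 0.224 | −0.228 | 0.016 | 0.255 | −0.213
Escherichia coli (E. coli) | 0.330 | 0.630 | −0.300 | −0.071 | 0.287 | 0.227 | −0.175
Potassium (K) | 0.707 | −0.235 | −0.219 | 0.259 | −0.269 | −0.213 | 0.031
Magnesium (Mg) | 0.774 | −0.557 | …
Sodium (Na) | 0.549 | −0.591 | −0.482 | 0.257 | 0.017 | −0.040 | −0.050
Ammonia (NH3) | −0.027 | 0.091 | −0.381 | 0.117 | −0.047 | 0.453 | 0.426
Nitrate (NO3) | 0.498 | 0.053 | 0.511 | 0.232 | −0.337 | 0.048 | −0.408
Silicon (Si) | 0.442 | 0.404 | 0.255 | …
Sulphate (SO4) | 0.326 | −0.363 | 0.139 | −0.445 | 0.583 | 0.266 | 0.000
Soluble Reactive Phosphate (SRP) | 0.056 | 0.148 | −0.166 | −0.385 | 0.302 | −0.597 | −0.028
Suspended Solids (SS) | 0.312 | 0.618 | … −0.197
Total Organic Carbon | 0.489 | 0.497 | 0.031 | 0.116 | −0.047 | −0.133 | −0.031
Total Phosphate (TP) | 0.127 | 0.354 | −0.303 | −0.227 | 0.283 | −0.347 | −0.130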

Table 8.

The correlation among the parameters and the significant components at NDO.

7 Components extracted.

Extraction Method: Principal Component Analysis.

3.1.5 Inanda dam inflow

At Inanda Dam Inflow station, seven components which explained 76% of the total variance were extracted (Table 9). Component 1 (Table 10), which explains 11% of the total variance, mainly comprised metal ions (highlighted in bold black), which suggests that geological processes in the area could be contributing significantly to the water quality variation. The high positive correlation between chloride and component 1 also reflects the effect of anthropogenic pollutants at this station. Both legal and illegal effluent discharges from industrial areas, such as Willowton, are the predominant pollution sources that affect Inanda Dam [53]. In developing countries, 70% of industrial wastes are dumped untreated into waters, impacting on the usability of the water resource [54].

Inanda dam inflow: total variance explained
Component | Initial eigenvalues (total, % of variance, cumulative %) | Extraction sums of squared loadings (total, % of variance, cumulative %) | Rotation sums of squared loadings (total, % of variance, cumulative %)

Table 9.

Extracted values of the significant components at Inanda dam inflow.

Extraction Method: Principal Component Analysis.

Component matrix
Parameter | PC1 | PC2 | PC3 | PC4 | PC5 | PC6 | PC7
% NH3% | −0.155 | 0.269 | 0.718 | 0.193 | 0.209 | −0.185 | −0.170
Calcium (Ca) | 0.859 | 0.215 | −0.059 | −0.247 | 0.030 | 0.093 | 0.063
Chloride (Cl) | 0.911 | 0.140 | 0.089 | −0.054 | 0.122 | 0.024 | −0.090
Dissolved Oxygen (DO) | 0.313 | −0.033 | 0.128 | 0.464 | −0.640 | 0.168 | 0.017
Escherichia coli | −0.297 | 0.400 | −0.055 | 0.080 | −0.122 | 0.538 | −0.355
Fluorine (F) | 0.204 | 0.307 | 0.318 | 0.329 | −0.067 | 0.236 | −0.319
Potassium (K) | 0.514 | 0.496 | −0.065 | −0.122 | −0.012 | −0.240 | 0.116
Magnesium (Mg) | 0.734 | 0.145 | 0.075 | −0.161 | 0.074 | 0.411 | 0.176
Sodium (Na) | 0.882 | 0.174 | 0.100 | −0.017 | 0.045 | 0.009 | −0.006
Ammonia (NH3) | 0.163 | 0.330 | 0.194 | 0.239 | −0.195 | 0.012 | 0.681
Nitrate (NO3) | 0.493 | 0.194 | −0.375 | 0.336 | −0.191 | −0.131 | 0.144
Silicon (Si) | −0.581 | 0.216 | 0.178 | 0.174 | 0.322 | 0.192 | 0.306
Sulphate (SO4) | 0.799 | 0.339 | 0.088 | 0.076 | 0.071 | 0.061 | 0.005
Soluble Reactive Phosphate (SRP) | 0.086 | 0.553 | −0.461 | 0.396 | 0.271 | −0.278 | −0.102
Suspended Solids | −0.522 | 0.569 | −0.048 | −0.382 | −0.219 | 0.006 | 0.108
Total Organic Carbon (TOC) | −0.190 | 0.557 | 0.036 | −0.516 | −0.192 | −0.249 | −0.049
Total Phosphate (TP) | −0.303 | 0.645 | −0.351 | 0.229 | 0.227 | −0.142 | −0.158

Table 10.

The correlation among the parameters and the significant components at IDI.

7 components extracted. Bold: significant contributors to the respective principal component in their respective column.

Extraction Method: Principal Component Analysis.

Given the high positive correlation of Soluble Reactive Phosphate (SRP), Suspended Solids (SS), Total Organic Carbon (TOC), Total Phosphate (TP) and turbidity with component 2, it can be inferred that the pollutants in this group, which explained 9% of the variation in water quality, emanate from agricultural activities (Table 10). The negative correlation noted between dissolved oxygen and component 5 indicates deterioration in the water quality. Since component 5 also exhibited a positive correlation with temperature, it can be deduced that climatic conditions could explain the 7.7% variation noted at this station. Component 6, which is mainly influenced by E. coli, suggests the effect of sewage effluent on the water system. This could be attributed to the effluent from DV Wastewater Works, which treats domestic effluent from Pietermaritzburg.

3.1.6 Inanda dam outflow

At the Inanda Dam Outflow station, eight components explaining 75% of the total variance were extracted, as depicted in Table 11. We hypothesised that the pollutants in component 1 (11.8%; highlighted in bold in Table 12) were mainly contributed by metal ions, which reflect the geology of the catchment area. The positive correlation between sulphate and component 1 reflects the effect of anthropogenic polluting activities. Component 3 (Table 12) is mainly attributable to agricultural pollutant sources, given its moderate to high positive correlations with turbidity, nitrate and suspended solids. It is most plausible that the turbidity and suspended solids result from surface runoff due to rainfall. Escherichia coli (E. coli), which dominates component 6, can be attributed to wastewater treatment plants in Pietermaritzburg.

Inanda dam outflow: total variance explained
Component | Initial eigenvalues: Total | % of Variance | Cumulative % | Extraction sums of squared loadings: Total | % of Variance | Cumulative %

Table 11.

Extracted values of the significant components at Inanda dam outflow.

Extraction Method: Principal Component Analysis.
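The columns of a "total variance explained" table can be derived from the eigenvalues of the correlation matrix of the monitored parameters. The following is a minimal sketch only, assuming standardised variables and the eigenvalue-greater-than-1 retention rule used in this chapter. The data are synthetic stand-ins rather than the uMngeni measurements, and the function name `total_variance_explained` is hypothetical.

```python
import numpy as np


def total_variance_explained(data):
    """Eigenvalues, % of variance and cumulative % for a correlation-matrix PCA.

    `data` has one row per sample and one column per water quality parameter.
    """
    corr = np.corrcoef(data, rowvar=False)        # parameter-by-parameter correlations
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh is ascending; reverse it
    pct = 100.0 * eigenvalues / eigenvalues.sum()
    cumulative = np.cumsum(pct)
    return eigenvalues, pct, cumulative


# Synthetic stand-in data (assumption): 96 samples of 6 parameters
# driven by 2 hidden sources plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(96, 2))
mixing = rng.normal(size=(2, 6))
data = latent @ mixing + 0.5 * rng.normal(size=(96, 6))

eigenvalues, pct, cumulative = total_variance_explained(data)
retained = int(np.sum(eigenvalues > 1.0))  # components kept by the eigenvalue > 1 rule

print("Component  Total  % of Variance  Cumulative %")
for i, (ev, p, c) in enumerate(zip(eigenvalues, pct, cumulative), start=1):
    note = "retained" if ev > 1.0 else ""
    print(f"{i:>9}  {ev:5.3f}  {p:13.1f}  {c:12.1f}  {note}")
```

Because the analysis runs on a correlation matrix, the eigenvalues sum to the number of parameters, so an eigenvalue above 1 marks a component that carries more variance than any single standardised parameter.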

Component Matrix
Parameter | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
Magnesium (Mg) | 0.903 | −0.053 | −0.074 | −0.034 | −0.141 | 0.023 | 0.013 | −0.014
Sodium (Na) | 0.881 | 0.007 | −0.096 | −0.078 | −0.102 | 0.030 | −0.107 | −0.029
Calcium (Ca) | 0.877 | −0.033 | 0.056 | 0.108 | −0.146 | 0.174 | −0.189 | 0.000
Chloride (Cl) | 0.841 | −0.050 | −0.031 | 0.029 | 0.071 | 0.170 | 0.257 | 0.001
Potassium (K) | 0.800 | −0.076 | −0.093 | −0.100 | −0.138 | −0.411 | −0.045 | −0.011
Sulphate (SO4) | 0.612 | 0.374 | 0.080 | −0.262 | −0.118 | 0.055 | 0.095 | −0.324
% NH3% | 0.165 | 0.816 | −0.177 | −0.033 | −0.212 | −0.037 | −0.011 | 0.376
Nitrate (NO3) | −0.009 | 0.192 | 0.721 | 0.039 | −0.279 | 0.294 | −0.290 | −0.197
Suspended Solids | 0.325 | 0.347 | 0.531 | 0.121 | 0.317 | −0.194 | 0.170 | −0.178
Ammonia (NH3) | 0.271 | −0.384 | 0.430 | −0.071 | −0.094 | 0.011 | 0.004 | 0.290
Total Phosphate (TP) | −0.008 | −0.226 | 0.106 | −0.636 | −0.032 | 0.172 | −0.051 | 0.217
Soluble Reactive Phosphate (SRP) | 0.017 | −0.366 | 0.232 | −0.605 | −0.017 | −0.043 | 0.160 | 0.371
Fluoride (F) | 0.469 | −0.136 | −0.003 | −0.103 | 0.538 | −0.089 | −0.300 | 0.130
Dissolved Oxygen (DO) | 0.041 | −0.213 | −0.176 | 0.398 | −0.508 | 0.042 | 0.083 | 0.045
Escherichia coli (E. coli) | −0.094 | 0.149 | −0.026 | 0.171 | 0.282 | 0.832 | 0.139 | 0.199
Silicon (Si) | 0.095 | −0.283 | 0.327 | 0.458 | −0.161 | −0.146 | 0.524 | 0.300

Table 12.

The correlation among the parameters and the significant components extracted at IDO.

8 components extracted. Bold: significant contributors to the respective principal component in their respective column.

Extraction Method: Principal Component Analysis.
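The reading of a component matrix described above, in which each component is interpreted through the parameters that load most strongly on it, can be sketched as follows. This is an illustration only: the loadings are six rows copied from Table 12, while the 0.5 cut-off and the helper name `dominant_parameters` are assumptions, since the threshold behind the bold highlighting is not stated in this section.

```python
import numpy as np

# Six rows copied from Table 12 (IDO); columns are components 1 to 8.
parameters = ["Magnesium", "Sodium", "Calcium",
              "Nitrate", "Suspended Solids", "E. coli"]
loadings = np.array([
    [0.903, -0.053, -0.074, -0.034, -0.141, 0.023, 0.013, -0.014],
    [0.881, 0.007, -0.096, -0.078, -0.102, 0.030, -0.107, -0.029],
    [0.877, -0.033, 0.056, 0.108, -0.146, 0.174, -0.189, 0.000],
    [-0.009, 0.192, 0.721, 0.039, -0.279, 0.294, -0.290, -0.197],
    [0.325, 0.347, 0.531, 0.121, 0.317, -0.194, 0.170, -0.178],
    [-0.094, 0.149, -0.026, 0.171, 0.282, 0.832, 0.139, 0.199],
])

THRESHOLD = 0.5  # assumed cut-off for a "significant" loading


def dominant_parameters(loadings, names, threshold):
    """Map component number -> [(parameter, loading)] with |loading| >= threshold."""
    result = {}
    for j in range(loadings.shape[1]):
        hits = [(names[i], float(loadings[i, j]))
                for i in range(loadings.shape[0])
                if abs(loadings[i, j]) >= threshold]
        if hits:
            result[j + 1] = hits
    return result


significant = dominant_parameters(loadings, parameters, THRESHOLD)
for component, hits in significant.items():
    listed = ", ".join(f"{name} ({value:+.3f})" for name, value in hits)
    print(f"Component {component}: {listed}")
```

For this subset, the metal ions group on component 1, nitrate and suspended solids on component 3, and E. coli on component 6, which matches the interpretation given in the text.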

3.2 Cluster analysis

Cluster analysis was used to detect similarities among the sampling stations in the study area. The dendrogram shows that the six sampling stations could be grouped into two significant clusters (A and B), as illustrated in Figure 3. This grouping is supported by the relatively large linkage distance, expressed as Euclidean distance, at which the two groups combine [47]. Cluster A consists of four sampling stations located mostly at dam outflows (NDO, IDO, MDO and NDI), while Cluster B consists of two dam inflow stations (MDI and IDI). Except for Nagle Dam Inflow (NDI), Cluster A comprises dam outflow stations. These stations can be described as less polluted owing to the dilution and retention effects of the dams. Cluster B, on the other hand, is composed of dam inflow stations, which can be described as more polluted as a result of activities along the river course. The PCA results discussed in Section 3.1 of this chapter showed that poor agricultural practices resulting in runoff of agrochemicals, organic matter, drug residues, sediments and saline drainage, as well as sewage and industrial effluent discharges, are key factors reflected in the poor water quality at the dam inflow stations (Cluster B). These practices pose a risk to aquatic ecosystems, human health and productive activities. The significant presence of E. coli, suspended solids and turbidity at both Cluster A and Cluster B sampling stations indicates that raw water along uMngeni Basin is not fit for potable use before treatment.

Figure 3.

Dendrogram of the stations along uMngeni basin.
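The grouping of stations by Euclidean linkage distance can be sketched with SciPy's hierarchical clustering routines. This is a minimal illustration under stated assumptions: the station means are hypothetical values chosen only to mimic an inflow and outflow contrast, not the measured uMngeni data, and Ward linkage is an assumed choice because the linkage method is not named in this section.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from scipy.stats import zscore

stations = ["MDI", "MDO", "NDI", "NDO", "IDI", "IDO"]

# Hypothetical station means (assumption, not measured data).
# Columns: turbidity, log10 E. coli count, suspended solids.
station_means = np.array([
    [30.0, 3.0, 40.0],  # MDI
    [5.0, 1.0, 6.0],    # MDO
    [8.0, 1.5, 10.0],   # NDI
    [4.0, 0.9, 5.0],    # NDO
    [35.0, 3.4, 45.0],  # IDI
    [6.0, 1.2, 7.0],    # IDO
])

# Standardise each parameter so that no single unit dominates the distance.
standardised = zscore(station_means, axis=0)

# Agglomerative clustering on Euclidean distances (Ward linkage assumed).
Z = linkage(standardised, method="ward", metric="euclidean")

# Cut the tree into two clusters, mirroring the A/B split of the dendrogram.
labels = fcluster(Z, t=2, criterion="maxclust")
clusters = dict(zip(stations, (int(label) for label in labels)))

# Leaf order of the dendrogram, computed without drawing a figure.
leaf_order = dendrogram(Z, labels=stations, no_plot=True)["ivl"]

print(clusters)
print(leaf_order)
```

The third column of `Z` holds the distance at which each pair of groups merges. A final merge distance that is much larger than the earlier ones is what justifies cutting the tree into two clusters.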


4. Conclusions

Understanding the primary effects of anthropogenic activities and natural factors on river water quality is important in the study and efficient management of water resources. In this study, PCA assisted in identifying the significant parameters influencing water quality variation at the six stations studied in uMngeni Basin. The PCs extracted suggest that pollution along uMngeni Basin can be attributed to geological processes, sewage effluent, agricultural runoff and surface runoff pollutants. The findings could assist in reducing the number of parameters monitored at any station, and thus ultimately the associated monitoring cost. It is recommended that effluents be treated before discharge into the river and that buffer zone policies be enforced.

The results of the cluster analysis should also assist in categorising sampling sites according to pollution levels. Classifying sampling stations by pollution level can assist in designing an optimal sampling strategy, which could reduce the number of sampling stations and the associated costs. This study highlights the usefulness of multivariate statistical techniques such as PCA and CA in analysing complex data-sets, especially in identifying pollution sources and in better comprehending spatial and temporal variations for effective river water quality management. It is worthwhile to conclude that PCA and CA are useful tools for uncovering concealed information about parameter variance in data-sets. The study recommends the application of PCA and CA for interpreting bulk surface water quality data-sets.



The authors gratefully acknowledge Durban University of Technology for hosting and funding the main author during his Master’s Degree study.


  1. Vega M, Pardo R, Barrado E, Debán L. Assessment of seasonal and polluting effects on the quality of river water by exploratory data analysis. Water Research. 1998;32(12):3581-3592
  2. Dinar S. Water, security, conflict, and cooperation. SAIS Review. 2002;22(2):229-253
  3. Chinhanga JR. Impact of industrial effluent from an iron and steel company on the physico-chemical quality of Kwekwe River water in Zimbabwe. International Journal of Engineering, Science and Technology. 2010;2(7)
  4. Wei S, Wang Y, Lam JC, Zheng GJ, So M, Yueng LW, et al. Historical trends of organic pollutants in sediment cores from Hong Kong. Marine Pollution Bulletin. 2008;57(6):758-766
  5. Ali M, Nelson AR, Lopez AL, Sack DA. Updated global burden of cholera in endemic countries. PLoS Neglected Tropical Diseases. 2015;9(6):e0003832
  6. Department of Water Affairs and Forestry. State of the Rivers Report: uMngeni River and Neighbouring Rivers and Streams. Pretoria: Department of Water Affairs and Forestry; 2003
  7. Umgeni Water. Umgeni Water Infrastructure Master Plan (2012/2013–2042/2043). Pietermaritzburg, South Africa: Umgeni Water, Planning Services; 2012
  8. Van der Zel DW. Umgeni River Catchment Analysis. Water SA. 1975;1(2)
  9. Hemens J, Simpson DE, Warwick RJ. Nitrogen and phosphorus input to the Midmar Dam, Natal. Water SA. 1977;3(4):193-201
  10. Statement UWUWP, Volume 2 F-YBP. Business Plan. 2012/13 to 2016/17. Pietermaritzburg: Umgeni Water; 2012
  11. Couillard D, Lefebvre Y. Analysis of water indices. Journal of Environmental Management. 1985;21(2):161-179
  12. Wepener V, Cyrus DP, Vermeulen LA, O’Brien GC, Wade P. Development of a Water Quality Index for Estuarine Water Quality Management in South Africa. Pretoria: Water Research Commission; 2006
  13. Yerel S, Ankara H. Application of multivariate statistical techniques in the assessment of water quality in Sakarya River, Turkey. Journal of the Geological Society of India. 2011;78(6):1-5
  14. Alkarkhi AM, Ahmad A, Easa A. Assessment of surface water quality of selected estuaries of Malaysia: Multivariate statistical techniques. Environmentalist. 2009;29(3):255-262
  15. Singh KP, Malik A, Mohan D, Sinha S. Multivariate statistical techniques for the evaluation of spatial and temporal variations in water quality of Gomti River (India): A case study. Water Research. 2004;38(18):3980-3992
  16. Simeonov V, Stratis J, Samara C, Zachariadis G, Voutsa D, Anthemidis A, et al. Assessment of the surface water quality in northern Greece. Water Research. 2003;37(17):4119-4124
  17. Boyacioglu H. Spatial differentiation of water quality between reservoirs under anthropogenic and natural factors based on statistical approach. Archives of Environmental Protection. 2014;40(1):41-50
  18. Varol M, Gökot B, Bekleyen A, Şen B. Water quality assessment and apportionment of pollution sources of Tigris River (Turkey) using multivariate statistical techniques: A case study. River Research and Applications. 2012;28(9):1428-1438
  19. Ouyang Y. Evaluation of river water quality monitoring stations by principal component analysis. Water Research. 2005;39(12):2621-2635
  20. Bengraïne K, Marhaba TF. Using principal component analysis to monitor spatial and temporal changes in water quality. Journal of Hazardous Materials. 2003;100(1–3):179-195
  21. Kazi TG, Arain MB, Jamali MK, Jalbani N, Afridi HI, Sarfraz RA, et al. Assessment of water quality of polluted lake using multivariate statistical techniques: A case study. Ecotoxicology and Environmental Safety. 2009;72(2):301-309
  22. Juntunen P, Liukkonen M, Lehtola M, Hiltunen Y. Cluster analysis by self-organizing maps: An application to the modelling of water quality in a treatment process. Applied Soft Computing. 2013;13(7):3191-3196
  23. Jolliffe I. Principal Component Analysis. Wiley Online Library; 2005
  24. Jackson JE. A User’s Guide to Principal Components. John Wiley & Sons; 2005
  25. Lawless HT, Heymann H. Data Relationships and Multivariate Applications. Springer; 2010
  26. Kim J-O, Mueller CW. Introduction to Factor Analysis: What it Is and How to do it. SAGE Publications; 1978
  27. Cadima J, Jolliffe IT. Loading and correlations in the interpretation of principle components. Journal of Applied Statistics. 1995;22(2):203-214
  28. Davis JC. Statistics and Data Analysis in Geology. John Wiley & Sons; 1986
  29. Razmkhah H, Abrishamchi A, Torkian A. Evaluation of spatial and temporal variation in water quality by pattern recognition techniques: A case study on Jajrood River (Tehran, Iran). Journal of Environmental Management. 2010;91(4):852-860
  30. Mazlum N, Ozer A, Mazlum S. Interpretation of water quality data by principal components analysis. Turkish Journal of Engineering & Environmental Sciences/Turk Muhendislik ve Cevre Bilimleri Dergisi. 1999;23(1):19-26
  31. Haag I, Westrich B. Processes governing river water quality identified by principal component analysis. Hydrological Processes. 2002;16(16):3113-3130
  32. Rangeti I. Determinants of key drivers for potable water treatment cost in uMngeni basin. Durban: Durban University of Technology; 2015
  33. Panda UC, Sundaray SK, Rath P, Nayak BB, Bhatta D. Application of factor and cluster analysis for characterization of river and estuarine water systems – A case study: Mahanadi River (India). Journal of Hydrology. 2006;331(3–4):434-445
  34. Ge J, Ran G, Miao W, Cao H, Wu S, Cheng L. Water quality assessment of Gufu River in Three Gorges Reservoir (China) using multivariable statistical methods. Advance Journal of Food Science and Technology. 2013;5(7):908-920
  35. Ragno G, Luca MD, Ioele G. An application of cluster analysis and multivariate classification methods to spring water monitoring data. Microchemical Journal. 2007;87:119-127
  36. Johnson RA, Wichern DW. Applied Multivariate Statistical Analysis. Englewood Cliffs, NJ: Prentice Hall; 1992
  37. Otto M. Multivariate Methods. In: Analytical Chemistry. Weinheim: Wiley-VCH; 1998
  38. Shrestha S, Kazama F. Assessment of surface water quality using multivariate statistical techniques: A case study of the Fuji river basin, Japan. Environmental Modelling & Software. 2007;22(4):464-475
  39. Basson MS, Rossow JD. Mvoti to Umzimkulu Water Management Area: Overview of Water Resource Availability and Utilisation. Department of Water Affairs (DWAF); 2003. Report No.: PWMA 11/000/00/0203
  40. Wilson AJ. Mvoti to Mzimkulu Water Management Area Situational Assessment. South Africa: A.J. Wilson & Associates International cc; 2001
  41. Reghunath R, Murthy TS, Raghavan B. The utility of multivariate statistical techniques in hydrogeochemical studies: An example from Karnataka, India. Water Research. 2002;36(10):2437-2442
  42. Williams B, Brown T, Onsman A. Exploratory factor analysis: A five-step guide for novices. Australasian Journal of Paramedicine. 2012;8(3):1
  43. Kaiser HF. An index of factorial simplicity. Psychometrika. 1974;39(1):31-36
  44. Parinet B, Lhote A, Legube B. Principal component analysis: An appropriate tool for water quality evaluation and management: Application to a tropical lake system. Ecological Modelling. 2004;178(3–4):295-311
  45. Jianqin M, Jingjing G, Xiaojie L. Water quality evaluation model based on principal component analysis and information entropy: Application in Jinshui River. Journal of Resources and Ecology. 2010;1(3):249-252
  46. Ryberg KR. Continuous Water-Quality Monitoring and Regression Analysis to Estimate Constituent Concentrations and Loads in the Red River of the North, Fargo, North Dakota, 2003–05. 2006
  47. Ryberg KR. Cluster Analysis of Water-Quality Data for Lake Sakakawea, Audubon Lake, and McClusky Canal, Central North Dakota, 1990–2003. 2007
  48. Schertz TL, Alexander RB, Ohe DJ. The Computer Program Estimate Trend (Estrend), a System for the Detection of Trends in Water-Quality Data. U.S. Geological Survey, Water-Resources Investigations: Reston, Virginia; 1991
  49. Lettenmaier DP, Conquest LL, Hughes JP. Routine streams and rivers water quality trend monitoring review. Washington: University of Washington; 1982. Contract No.: Technical Report 75
  50. Mateo-Sagasta J, Zadeh SM, Turral H, Burke J. Water Pollution from Agriculture: A Global Review. Executive Summary. Rome, Italy: FAO; Colombo, Sri Lanka: International Water Management …; 2017
  51. Dzwairo B. Modelling raw water quality variability in order to predict cost of water treatment [DTech thesis]. Pretoria: Tshwane University of Technology; 2011
  52. Walmsley RD, Butty M. Limnology of some selected South African impoundments. Pretoria: Water Research Commission/National Institute for Water Research, Council for Scientific and Industrial Research; 1980
  53. Neysmith J. Investigating Non-regulatory Barriers and Incentives to Stakeholder Participation in Reducing Water Pollution in Pietermaritzburg’s Baynespruit [Masters Degree]. Pietermaritzburg: University of KwaZulu-Natal; 2008
  54. National Geographic. [Online]. Available: http://www nationalgeographic com es/historia/grandes-reportajes/los-barcos-de-losfaraones_8270/3 [Last accessed: 20 January 2017]. 2017
