Open access peer-reviewed chapter

Interpretation of Water Quality Data in uMngeni Basin (South Africa) Using Multivariate Techniques

By Innocent Rangeti and Bloodless Dzwairo

Submitted: July 22nd 2020. Reviewed: October 29th 2020. Published: March 16th 2021.

DOI: 10.5772/intechopen.94845

Abstract

The major challenge with regular water quality monitoring programmes is making sense of the large and complex physico-chemical data-sets that are generated in a comparatively short period of time. Consequently, this presents difficulties for water management practitioners who are expected to make informed decisions based on information extracted from the large data-sets. In addition, the nonlinear nature of water quality data-sets often makes it difficult to interpret the spatio-temporal variations. These challenges call for effective methods of interpreting water quality results and drawing meaningful conclusions. Hence, this study applied multivariate techniques, namely Cluster Analysis and Principal Component Analysis, to interpret eight-year (2005–2012) water quality data that was generated from a monitoring exercise at six stations in uMngeni Basin, South Africa. The principal components extracted with eigenvalues greater than 1 were interpreted while considering the pollution issues in the basin. These extracted components explain 67–76% of the water quality variation among the stations. The derived significant parameters suggest that uMngeni Basin was mainly affected by the catchment’s geological processes, surface runoff, domestic sewage effluent, seasonal variation and agricultural waste. Cluster Analysis grouped the six sampling stations into two clusters, low (A) and heavy (B), based on the degree of pollution. Cluster A mainly consists of water sampling stations located at the dam outflows (NDO, IDO, MDO and NDI), and its water can be described as being of fairly good quality owing to dam retention and attenuation effects. Cluster B mainly consists of dam inflow water sampling stations (MDI and IDI), whose water can be described as polluted when compared to Cluster A. The poor water quality observed at Cluster B sampling stations could be attributed to natural and anthropogenic activities acting through point sources and runoff.
The findings could assist in determining an appropriate set of water quality parameters that would indicate variation of water quality in the basin, with minimum loss of information. It is, therefore, recommended that this approach be used to assist decision-makers regarding strategies for minimising catchment pollution.

Keywords

  • cluster analysis
  • multivariate technique
  • principal component analysis
  • uMngeni basin
  • water quality

1. Introduction

Water pollution is a global challenge undermining economic growth, the health of millions of people and the physical state of the environment in both developed and developing countries. The current global water scarcity challenge relates not only to inadequate quantity but also to the progressive deterioration of quality, which makes water unfit for some uses such as drinking. The deterioration of water quality is attributed to both natural factors (precipitation rate, weathering processes, soil erosion, etc.) and anthropogenic factors (urban, industrial and agricultural activities, etc.). Seasonal variations in precipitation, surface run-off, groundwater flow, interception and abstraction strongly affect the river discharge and the concentrations of water pollutants in a basin [1]. The effect of a contaminant on water depends on the characteristics of the water itself as well as the quantity and characteristics of the contaminant.

Water pollutants, which are usually introduced through surface runoff or direct discharge, may at higher concentrations exceed the capacity of rivers to attenuate them, so that catchments fail to meet the minimum quality requirements for uses such as potable water production. Furthermore, water quality deterioration is often a slow process that is not readily noticeable, owing to attenuation effects, until an apparent change occurs. Such situations are being exacerbated by the rapid increase in the demand for freshwater in many countries, including South Africa. In view of the limited quantity of freshwater resources worldwide and the effect of anthropogenic activities, protection of these resources has become a priority [2, 3, 4]. It has, therefore, become imperative to monitor the quality of water in freshwater systems in order to prevent further deterioration and thus ultimately ensure its continuous availability at a quality that meets various uses, including potable water production. Pollutants in water can cause acute or chronic illness in humans, especially when polluted water is consumed or when sewage is used to irrigate vegetables meant for human consumption. In specific cases this has resulted in loss of life. For example, as at 2015, the bacterium Vibrio cholerae caused between 1.3 and 4.0 million infections and 21,000 to 143,000 deaths worldwide [5].

Out of concern for the detrimental effects of pollution, various agencies have been monitoring the quality of raw water in the uMngeni, a 232 km river whose water is treated to supply almost 3.8 million people in and around Durban and Pietermaritzburg (South Africa) with potable water [6, 7, 8, 9, 10]. The primary objectives of such monitoring exercises have been to identify water quality problems, describe the spatio-temporal water quality trends, determine fitness for specific uses and develop monitoring tools such as water quality indices for enhancing information dissemination. Although such monitoring programmes are crucial to a better knowledge of hydrology and pollution problems in catchments such as uMngeni Basin, they tend to produce large and complicated data-sets covering various water parameters. It is often difficult to analyse these data-sets and extract meaningful information from them, which makes it difficult to keep the public, who are the custodians of the resource, informed. Keeping the public updated encourages their participation in policy formulation and decision making regarding protection of the water resource [11, 12].

The classification and interpretation of monitoring stations are the most important steps in the assessment of water quality. Numerous studies have confirmed multivariate statistical techniques (cluster analysis, principal component analysis, factor analysis and discriminant analysis) as excellent tools for exploring and presenting bulky and complex water quality data-sets [13, 14, 15]. These techniques allow for the determination of spatio-temporal water quality variability, classification of sampling stations and the identification of pollution sources [15, 16, 17, 18]. Furthermore, by eliminating subjective assumptions, multivariate techniques tend to reduce bias when selecting parameters for developing tools such as a water quality index. This ultimately assists in improving the accuracy of such monitoring tools. Nevertheless, the selection of a multivariate technique depends on the nature of the data-set and the research objectives. While there are a number of multivariate techniques, studies have extensively applied Principal Component Analysis (PCA) and Cluster Analysis (CA) because of their suitability for extracting information in various situations [15, 19, 20, 21, 22].

The application of principal component analysis (PCA) to the interpretation of a large and complex volume of data offers a better understanding of water quality and the ecological status of the basin being studied, while also allowing for the identification of possible factors or sources that influence surface water systems [16]. PCA aims to find combinations of the original variables that describe the variation in the data while retaining as much information as possible. This reduction is achieved by transforming the original variables into a new set of uncorrelated variables, known as principal components (PCs) [23, 24, 25]. The PCs are ordered so that the first few retain most of the variation present in the original variables. The few derived variables can therefore be used to provide a meaningful description of the entire data-set with minimal loss of original information. The eigenvalue indicates the significance of each PC, with a greater value indicating greater importance [26]. The correlations between the PCs and the original variables are given by the loadings [27]. While loadings reflect the relative importance of a variable within a component, it should be highlighted that these values do not show the importance of the component itself [28].

PCA has been successfully applied in hydrogeological and hydrogeochemical studies. The application of PCA by Razmkhah, Abrishamchi [29] distinguished the anthropogenic and natural polluting activities along Jairood River in Iran. The results identified five factors which explained 85% of the variation in water quality. Mazlum, Ozer [30] applied PCA to determine factors causing water quality variability along the Porsuk, a tributary in Turkey. The study identified four PCs which explained 70% of the total water quality variance. The factors were related to the discharge of domestic wastewater, nitrification, industrial wastewater and the seasonal effect. Haag and Westrich [31] applied PCA to analyse the water quality along Neckar River in Germany based on ten parameters monitored from 1993 to 1998. The four principal components extracted, accounting for 72% of total variance, were interpreted as (i) dilution by high discharge, (ii) biological activity, (iii) seasonal effects and (iv) wastewater impact [31].

The limitations of the PCA technique include ignoring the degree of data dispersion as well as a weakness in processing nonlinear data. The result of PCA can also be influenced by uneven sampling intervals, missing values or observations below the detection limits of analytical methods, and these limits can themselves change during the data collection period. It is thus important to treat water quality data before modelling in order to improve accuracy [32].

On the other hand, cluster analysis (CA) is an unsupervised pattern recognition multivariate technique which groups objects (e.g., water quality variables) based on either their similarities or dissimilarities [33, 34]. Its objective is to sort cases into groups or clusters, so that the degree of association is strong between members of the same cluster and weak between members of different clusters. Most studies have applied the hierarchical clustering (HC) technique to sequentially categorise objects [13, 15, 34, 35]. Based on hierarchical CA, the water quality characteristics of each sampling location can be classified according to pollution level. The results of an HC analysis are displayed graphically using a tree diagram commonly known as a dendrogram [13, 14, 36, 37]. The technique first groups the objects according to similarity. These groups are then merged according to their similarities or dissimilarities and eventually merge into a single cluster as the similarity among the subgroups decreases. The cluster analysis approach offers a reliable classification of surface water, making it possible to design a future spatial sampling strategy that is cost-effective, with a reduced number of sampling sites and without losing any significant information [38].

2. Study area and water monitoring stations

uMngeni Basin, the study area, is situated in KwaZulu-Natal (KZN) Province, which lies along the eastern seaboard of the Republic of South Africa (Figure 1). uMngeni River (the main river in the basin), at 232 km long, is the primary source of raw water, which is then treated to serve a population of almost 3.8 million (as at 2013), in and around Durban metro as well as the city of Pietermaritzburg (PMB).

Figure 1.

uMngeni Basin, KZN Province, Drakensberg Mountains and South Africa.

Key activities that generate point and non-point pollution within the catchment include agriculture and animal farming, while concentrated urban settlements provide a variety of supportive economic activities that generate solid and liquid waste. A consequence of the concentrated development in the catchment area has been the high levels of pollutants entering the water system, which are eventually flushed out to sea. The basin receives much of its rain in summer, with occasional snowfalls in some of its high-lying areas such as the Drakensberg Mountains [39]. The geology of uMngeni Basin varies from basalts, granites, sandstones, shale and tillites [40]. About half of uMngeni Basin sits on top of the Karoo in the KZN part of the Drakensberg Mountains. The other portion extends east on top of the South African Coastal Plate (Figure 2).

Figure 2.

Human settlements dominate from the central parts of the basin (PMB and its periphery) up to the Indian Ocean.

The 2009 Landuse map (Figure 2) indicates that there are mixed activities, where cultivation and plantations are located predominantly from central to the north-west of the basin.

2.1 Water quality monitoring points considered

Six water quality monitoring points shown in Figure 2, namely Midmar Dam Inflow (MDI) (upstream point), Midmar Dam Outflow (MDO), Nagle Dam Inflow (NDI), Nagle Dam Outflow (NDO), Inanda Dam Inflow (IDI) and Inanda Dam Outflow (IDO) (downstream point), were considered in this study. The dam inflow stations were assumed to give a reflection of the pollution activities along the river course while the dam outflow stations were expected to depict the dilution and retention effects.

2.1.1 Methods and materials

Multivariate statistical methods have been widely applied in environmental data reduction and in the interpretation of multi-constituent chemical, physical and biological measurements. These techniques have been applied to identify factors that influence water systems, to assist in reliable water resource management and to determine rapid solutions for pollution problems [16, 41]. This study applied PCA and CA techniques to extract information from the raw data regarding the significant parameters influencing the variation of water quality at each of the six stations studied. The Kaiser-Meyer-Olkin (KMO) measure, which tests sampling adequacy, was used to determine the suitability of the water quality data for PCA [42, 43, 44]. Kaiser [43] recommended 0.5 as a minimum (barely acceptable), values of 0.7–0.8 as acceptable, and values above 0.9 as excellent. The current study employed the PCA technique to determine the most significant parameters that would explain the variation in water quality.
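The KMO screening described above can be sketched as follows. This is a minimal illustration and not the authors' procedure: it assumes the common definition of the overall KMO statistic (squared correlations divided by squared correlations plus squared partial correlations, over the off-diagonal entries), and the function name `kmo_overall` and the synthetic 96-by-5 data matrix are hypothetical.

```python
import numpy as np


def kmo_overall(data):
    """Overall Kaiser-Meyer-Olkin measure for an (observations x variables) array."""
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)
    # Partial correlations are obtained from the inverse correlation matrix.
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale
    off_diagonal = ~np.eye(corr.shape[0], dtype=bool)
    r2 = np.sum(corr[off_diagonal] ** 2)
    p2 = np.sum(partial[off_diagonal] ** 2)
    return r2 / (r2 + p2)


# Hypothetical data: 96 monthly records of 5 parameters sharing one common driver.
rng = np.random.default_rng(0)
common = rng.normal(size=(96, 1))
data = common + 0.5 * rng.normal(size=(96, 5))

kmo = kmo_overall(data)
print(f"KMO = {kmo:.3f}")
```

A value above 0.5 would pass the minimum threshold quoted from Kaiser [43]; strongly inter-correlated parameters, as in this synthetic example, push the statistic towards 1.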

PCA is a powerful multivariate statistical technique used to reduce the dimensionality of a data set consisting of multiple inter-related variables, while retaining data variability [45]. The technique extracts primary information representative of the typical characteristics of the water environment from a large amount of data and then represents it as a new set of independent variables, the principal components. In this way PCA reduces the dimensionality of a multivariate data-set to a small number of independent principal components. Each principal component is a combination of all the original variables, which limits the loss of information.

The PCA method is composed of five main operational steps, as follows:

  1. Constructing the original data matrix, as shown in Eq. 1:

    $$X = (x_{ij})_{n \times p} = \begin{pmatrix} x_{11} & \cdots & x_{1p} \\ \vdots & \ddots & \vdots \\ x_{n1} & \cdots & x_{np} \end{pmatrix} \tag{1}$$

    where $x_{ij}$ is the originally measured data, $n$ represents the number of monitoring stations, and $p$ represents the number of water quality parameters.

  2. Standardising the original data with the Z-score standardisation formula to eliminate the impact of dimension (Eq. 2):

    $$x^{*}_{ij} = \frac{x_{ij} - \bar{x}_{j}}{s_{j}} \tag{2}$$

    where $x^{*}_{ij}$ is the standardised variable, $\bar{x}_{j}$ is the average value of the jth indicator, and $s_{j}$ is the standard deviation of the jth indicator.

  3. Calculating the correlation coefficient matrix, R, with the standardised data and determining the correlation between indicators (Eq. 3):

    $$R = (r_{ij})_{p \times p}, \qquad r_{ij} = \frac{1}{n-1} \sum_{t=1}^{n} x^{*}_{ti}\, x^{*}_{tj}, \qquad i, j = 1, 2, \ldots, p \tag{3}$$

  4. Calculating the eigenvalues and eigenvectors of the correlation coefficient matrix, R, to determine the number of principal components. The eigenvalues of R are denoted by $\lambda_{i}$ ($i = 1, 2, \ldots, p$) and their eigenvectors by $u_{i} = (u_{i1}, u_{i2}, \ldots, u_{ip})$ ($i = 1, 2, \ldots, p$). Each eigenvalue corresponds to the variance of its principal component, and this variance is positively correlated with the contribution rate of the principal component. Further, the cumulative contribution rate of the first m principal components should be more than 80%, as expressed in Eq. 4:

    $$\frac{\sum_{i=1}^{m} \lambda_{i}}{\sum_{i=1}^{p} \lambda_{i}} \geq 0.80 \tag{4}$$

    The principal component is represented by Eq. 5:

    $$F_{i} = u_{i1} x^{*}_{1} + u_{i2} x^{*}_{2} + \cdots + u_{ip} x^{*}_{p}, \qquad i = 1, 2, \ldots, p \tag{5}$$

    where $x^{*}_{i}$ is the standardised indicator variable, as shown in Eq. 6:

    $$x^{*}_{i} = \frac{x_{i} - \bar{x}_{i}}{s_{i}} \tag{6}$$

  5. Weighting and summing the obtained principal components to obtain a comprehensive evaluation function, as shown in Eq. 7:

    $$F = \frac{\lambda_{1}}{\lambda_{1} + \lambda_{2} + \cdots + \lambda_{p}} F_{1} + \frac{\lambda_{2}}{\lambda_{1} + \lambda_{2} + \cdots + \lambda_{p}} F_{2} + \cdots + \frac{\lambda_{p}}{\lambda_{1} + \lambda_{2} + \cdots + \lambda_{p}} F_{p} \tag{7}$$

    Principal components with an eigenvalue greater than 1 were related to the major pollution sources in uMngeni Basin. Water quality parameters with loadings of greater than 0.5 (highlighted in bold in the results tables) were regarded as significantly influencing water quality variation in uMngeni Basin.
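The five steps, together with the eigenvalue and loading thresholds just described, can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the software workflow used in the study: the function name `pca_from_correlation`, the synthetic 96-by-6 data matrix and the use of the absolute loading for the 0.5 cut-off are all assumptions.

```python
import numpy as np


def pca_from_correlation(data, loading_cutoff=0.5):
    """PCA on the correlation matrix of an (n observations x p parameters) array."""
    x = np.asarray(data, dtype=float)
    n, p = x.shape

    # Eq. 2: z-score standardisation using the sample standard deviation.
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)

    # Eq. 3: correlation matrix of the standardised data.
    r = z.T @ z / (n - 1)

    # Step 4: eigen-decomposition, sorted from largest to smallest eigenvalue.
    eigenvalues, eigenvectors = np.linalg.eigh(r)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]

    # Eq. 4: contribution rate and cumulative contribution rate.
    contribution = eigenvalues / eigenvalues.sum()
    cumulative = np.cumsum(contribution)

    # Eq. 5: component scores for every observation.
    scores = z @ eigenvectors

    # Eq. 7: eigenvalue-weighted composite score over all components.
    composite = scores @ contribution

    # Kaiser criterion applied in the study: retain eigenvalues greater than 1.
    retained = eigenvalues > 1.0
    # Loadings: correlations between parameters and the retained components.
    loadings = eigenvectors[:, retained] * np.sqrt(eigenvalues[retained])
    # Assumption: the 0.5 cut-off is applied to the absolute loading.
    significant = np.abs(loadings) > loading_cutoff

    return {
        "eigenvalues": eigenvalues,
        "cumulative": cumulative,
        "retained": retained,
        "loadings": loadings,
        "significant": significant,
        "composite": composite,
    }


# Hypothetical data: 96 monthly records, 6 parameters driven by 2 latent processes.
rng = np.random.default_rng(42)
runoff = rng.normal(size=(96, 1))
geology = rng.normal(size=(96, 1))
noise = 0.4 * rng.normal(size=(96, 6))
data = np.hstack([runoff, runoff, runoff, geology, geology, geology]) + noise

result = pca_from_correlation(data)
print("Eigenvalues:", np.round(result["eigenvalues"], 3))
print("Cumulative contribution:", np.round(result["cumulative"], 3))
print("Loadings of retained components:\n", np.round(result["loadings"], 3))
```

Because the decomposition is performed on a correlation matrix, the eigenvalues sum to the number of parameters, which is why an eigenvalue above 1 marks a component that explains more variance than any single standardised parameter.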

    Thereafter, cluster analysis (CA) was applied to determine the spatial similarity of the six water sampling stations studied. Hierarchical cluster analysis was employed using Ward’s method with Euclidean distances as a measure of dissimilarity [37, 46]. The number of subgroups for analysis was determined by drawing a line across the dendrogram and examining the main clusters branching out beneath that line [47]. Determination of the subgroups was subjective, based on available information regarding pollution activities along the uMngeni River.
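A minimal sketch of this clustering step, using SciPy's hierarchical clustering routines, is given below. The station profiles are invented numbers chosen purely so that two groups emerge; they are not the study's data, and the three columns stand for arbitrary standardised parameters.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

stations = ["MDI", "MDO", "NDI", "NDO", "IDI", "IDO"]

# Hypothetical standardised profiles (rows: stations, columns: parameters).
profiles = np.array([
    [1.2, 1.0, 0.9],     # MDI
    [-0.8, -0.7, -0.9],  # MDO
    [-0.6, -0.9, -0.5],  # NDI
    [-0.9, -0.6, -0.8],  # NDO
    [1.1, 1.3, 1.2],     # IDI
    [-0.7, -0.8, -0.6],  # IDO
])

# Ward's method with Euclidean distance as the dissimilarity measure.
tree = linkage(profiles, method="ward", metric="euclidean")

# Cutting the tree into two groups mimics drawing a line across the dendrogram.
labels = fcluster(tree, t=2, criterion="maxclust")

clusters = {}
for station, label in zip(stations, labels):
    clusters.setdefault(int(label), []).append(station)
print(clusters)
```

Passing `tree` to `scipy.cluster.hierarchy.dendrogram` with `labels=stations` would draw the tree diagram itself, provided matplotlib is installed.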

    Eight-year (2005–2012) water quality data-sets obtained from Umgeni Water were used in this study. Since monitoring generally depends on the pollution problem at any given time and place, the number and type of parameters monitored at each of the stations varied. As the study was data-driven, a monthly median was used for in-depth analysis. The median was adopted instead of the mean because the latter is normally influenced by outliers, which are common in water quality data-sets, while the former is resistant to them. The period studied was determined in consideration of the criteria explained by Schertz, Alexander [48] and Lettenmaier, Conquest [49]. These studies reported that at least five years of monthly data and two years of monthly data should be sufficient for a defensible monotonic trend study and step-trend (abrupt shift) study, respectively.
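The effect of choosing the median over the mean can be seen in a small example. The readings below are hypothetical turbidity values (in NTU), with one storm-driven spike in January 2005; they are not taken from the Umgeni Water records.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical (month, turbidity in NTU) samples for one station.
samples = [
    ("2005-01", 4.1), ("2005-01", 3.8), ("2005-01", 4.4),
    ("2005-01", 4.0), ("2005-01", 58.0),  # single outlier
    ("2005-02", 5.0), ("2005-02", 5.4), ("2005-02", 4.9),
]

# Group the individual readings by month.
by_month = defaultdict(list)
for month, value in samples:
    by_month[month].append(value)

monthly_median = {m: median(v) for m, v in by_month.items()}
monthly_mean = {m: mean(v) for m, v in by_month.items()}

for m in sorted(by_month):
    print(m, "median:", monthly_median[m], "mean:", round(monthly_mean[m], 2))
```

For January the single spike pulls the mean to about 14.9 NTU, while the median stays at 4.1 NTU, close to the typical readings.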

    3. Results

    3.1 PCA analysis results

    The Kaiser-Meyer-Olkin (KMO) results for the six stations ranged from 0.610 to 0.786, showing the fitness of the data-sets for PCA. The component matrix tables shown in the different sections of the results only depict PCs with eigenvalues greater than one (1). Only parameters with a correlation coefficient greater than 0.5 (highlighted in bold black) with their respective principal component were considered as significantly influencing water quality variability at any given station.

    3.1.1 Midmar dam inflow (MDI)

    Table 1 shows the seven extracted PCs with eigenvalues greater than 1, which explain 75% of the water quality variation at the Midmar Dam Inflow sampling station. Considering the high positive correlations of nutrient, metal ion and organic related parameters with component 1 (Table 2), it can be hypothesised that 20.6% of the water quality variation at this station is a result of both anthropogenic and natural processes. The nutrient and organic related parameters can be explained by piggery, dairy and maize farming activities surrounding Midmar Dam [6]. Animal manure enters surface water, both accidentally and deliberately, from households, villages, communal farms and feedlots. Without treatment, manure runoff tends to result in algal blooms, which can lead to human health problems if the water is consumed. The second component explained 15.7% (Table 1) of the total variance at MDI and showed a positive correlation with Suspended Solids (SS), iron (Fe) and turbidity (Table 2). Turbidity and suspended solids can be related to surface runoff from agricultural activities along uMngeni River, whilst iron (Fe) can be attributed to weathering processes.

    Midmar dam inflow: total variance explained

              | Initial eigenvalues                  | Extraction sums of squared loadings  | Rotation sums of squared loadings
    Component | Total | % of Variance | Cumulative % | Total | % of Variance | Cumulative % | Total | % of Variance | Cumulative %
    1         | 5.740 | 24.955        | 24.955       | 5.740 | 24.955        | 24.955       | 4.740 | 20.607        | 20.607
    2         | 5.002 | 21.750        | 46.705       | 5.002 | 21.750        | 46.705       | 3.621 | 15.742        | 36.349
    3         | 1.903 | 8.272         | 54.977       | 1.903 | 8.272         | 54.977       | 3.011 | 13.090        | 49.440
    4         | 1.325 | 5.760         | 60.737       | 1.325 | 5.760         | 60.737       | 1.787 | 7.771         | 57.211
    5         | 1.165 | 5.065         | 65.802       | 1.165 | 5.065         | 65.802       | 1.412 | 6.139         | 63.350
    6         | 1.103 | 4.795         | 70.598       | 1.103 | 4.795         | 70.598       | 1.390 | 6.043         | 69.393
    7         | 1.011 | 4.394         | 74.992       | 1.011 | 4.394         | 74.992       | 1.288 | 5.598         | 74.992

    Table 1.

    Extracted values of the significant components at Midmar dam inflow (MDI).

    Extraction Method: Principal Component Analysis.
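As a quick consistency check on Table 1, and assuming the PCA was run on the correlation matrix of the 23 parameters listed in Table 2 (so that the eigenvalues sum to the number of parameters, p), the share of variance attributed to a component is its eigenvalue divided by p:

```latex
\[
\text{\% of variance}_i = \frac{\lambda_i}{p} \times 100,
\qquad
\frac{\lambda_1}{p} \times 100 = \frac{5.740}{23} \times 100 \approx 24.96\%
\]
```

This agrees with the tabulated 24.955% for component 1 to within rounding of the eigenvalue.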

    Component matrix (a)

                                     | Component
    Parameter                        | 1      | 2      | 3      | 4      | 5      | 6      | 7
    Potassium (K)                    | 0.890  | −0.094 | −0.027 | 0.111  | 0.090  | −0.054 | −0.074
    Sulphate (SO4)                   | 0.852  | −0.027 | −0.213 | −0.078 | −0.128 | −0.018 | 0.072
    Chloride (Cl)                    | 0.824  | −0.322 | −0.169 | −0.099 | −0.129 | −0.008 | −0.071
    Total Dissolved Solid            | 0.743  | −0.199 | −0.161 | −0.197 | −0.161 | −0.001 | −0.106
    Nitrate (NO3)                    | 0.666  | −0.243 | −0.231 | −0.097 | −0.295 | 0.049  | −0.014
    Total Organic Carbon (TOC)       | 0.662  | 0.464  | 0.012  | −0.053 | 0.074  | 0.003  | −0.156
    Escherichia coli (E. coli)       | 0.440  | 0.356  | 0.035  | 0.431  | −0.153 | 0.135  | −0.085
    Suspended Solid                  | 0.416  | 0.741  | 0.005  | 0.172  | 0.181  | 0.113  | 0.319
    Iron (Fe)                        | 0.401  | 0.713  | −0.084 | 0.037  | 0.133  | −0.101 | 0.152
    Turbidity                        | 0.400  | 0.692  | 0.064  | 0.209  | 0.190  | 0.141  | 0.404
    Calcium (Ca)                     | 0.503  | −0.687 | 0.304  | 0.091  | 0.266  | 0.098  | 0.045
    Magnesium (Mg)                   | 0.602  | −0.681 | 0.168  | 0.025  | 0.089  | 0.073  | 0.009
    Sodium (Na)                      | 0.581  | −0.666 | 0.156  | 0.010  | 0.207  | 0.085  | 0.059
    Alkalinity (Alk)                 | −0.036 | −0.632 | 0.451  | 0.090  | 0.338  | 0.157  | 0.149
    Total Phosphate                  | 0.192  | 0.556  | 0.415  | −0.092 | −0.089 | 0.260  | −0.284
    Colour                           | 0.246  | 0.545  | 0.225  | −0.183 | 0.355  | −0.198 | 0.026
    Dissolved Oxygen (DO)            | −0.048 | −0.075 | −0.591 | 0.410  | 0.017  | 0.262  | 0.093
    Temperature (Temp)               | 0.230  | 0.453  | 0.460  | −0.383 | 0.017  | −0.222 | −0.137
    Silicon (Si)                     | 0.347  | 0.321  | −0.379 | −0.460 | −0.026 | 0.059  | 0.121
    pH                               | −0.059 | −0.147 | −0.433 | −0.345 | 0.550  | 0.222  | −0.199
    Ammonia (NH3)                    | 0.085  | 0.160  | 0.470  | −0.089 | −0.376 | 0.619  | 0.009
    Conductivity                     | 0.286  | −0.324 | 0.225  | 0.099  | −0.252 | −0.528 | 0.339
    Soluble Reactive Phosphate (SRP) | 0.313  | 0.250  | 0.019  | 0.513  | 0.131  | −0.223 | −0.599

    Table 2.

    The correlation among the parameters measured and the extracted significant components at MDI.

    a. 7 components extracted. Bold: significant contributors to the respective principal component in their respective column.


    Extraction Method: Principal Component Analysis.

    Agriculture, a sector responsible for 70% of the water abstracted globally, plays a major role in water pollution [50, 51]. Runoff from agricultural activities carries agrochemicals, organic matter, drug residues, sediments and saline drainage into water bodies, which can lead to nutrient enrichment and eutrophication. The resultant water pollution poses a risk to aquatic ecosystems, human health and productive activities. Poor land management practices and deforestation can also explain the water quality variation at MDI. It is important for communities to practise improved land management by planting trees and other vegetation to cover the ground. The negative relationship of metal ions with component 2 can be explained by the seasonality effect. Increased flow in the wet season tends to reduce the concentration of mineral salts in a river system as a result of the dilution effect.

    3.1.2 Midmar dam outflow (MDO)

    The results in Table 3 indicate that the first seven principal components with eigenvalues greater than one (1) account for 73.7% of the total variance in the water quality data-set at Midmar Dam Outflow station. Component 1, which explains 27.1% of the total variance at MDO (Table 3), is mainly influenced by parameters related to human activities (turbidity and ammonia) as well as natural geological processes (silicon and calcium) (Table 4). Silicon is part of various essential plant minerals and is released during weathering processes. Sodium and potassium, which showed a moderate positive correlation with component 5, reflect natural processes such as weathering.

    Midmar dam outflow: total variance explained

              | Initial eigenvalues                  | Extraction sums of squared loadings  | Rotation sums of squared loadings
    Component | Total | % of Variance | Cumulative % | Total | % of Variance | Cumulative % | Total | % of Variance | Cumulative %
    1         | 6.729 | 32.044        | 32.044       | 6.729 | 32.044        | 32.044       | 5.690 | 27.096        | 27.096
    2         | 2.042 | 9.726         | 41.770       | 2.042 | 9.726         | 41.770       | 1.871 | 8.910         | 36.006
    3         | 1.759 | 8.378         | 50.148       | 1.759 | 8.378         | 50.148       | 1.794 | 8.542         | 44.548
    4         | 1.399 | 6.660         | 56.808       | 1.399 | 6.660         | 56.808       | 1.769 | 8.424         | 52.971
    5         | 1.271 | 6.053         | 62.861       | 1.271 | 6.053         | 62.861       | 1.659 | 7.902         | 60.873
    6         | 1.196 | 5.694         | 68.555       | 1.196 | 5.694         | 68.555       | 1.508 | 7.181         | 68.054
    7         | 1.078 | 5.133         | 73.687       | 1.078 | 5.133         | 73.687       | 1.183 | 5.634         | 73.687

    Table 3.

    Extracted values of the significant components at Midmar dam outflow.

    Extraction Method: Principal Component Analysis.

    Component matrix (a)

                                     | Component
    Parameter                        | 1      | 2      | 3       | 4      | 5      | 6      | 7
    Calcium (Ca)                     | 0.906  | −0.100 | 0.117   | −0.076 | 0.134  | −0.065 | 0.054
    Conductivity                     | 0.869  | 0.028  | 0.083   | 0.114  | 0.030  | 0.060  | −0.050
    Magnesium (Mg)                   | 0.856  | −0.131 | 0.109   | −0.236 | 0.171  | −0.045 | 0.039
    Alkalinity                       | 0.821  | −0.208 | 0.103   | −0.103 | −0.092 | 0.080  | 0.075
    Ammonia (NH3)                    | 0.783  | −0.136 | 0.201   | −0.051 | −0.181 | 0.072  | −0.036
    Turbidity                        | 0.736  | 0.442  | −0.045  | −0.171 | −0.183 | −0.146 | 0.062
    Suspended Solids (SS)            | 0.679  | 0.533  | −0.039  | −0.187 | −0.118 | −0.163 | 0.079
    Sulphates (SO4)                  | −0.663 | 0.052  | −0.267  | 0.088  | 0.314  | 0.107  | 0.315
    Silicon (Si)                     | 0.613  | 0.253  | 0.099   | 0.390  | 0.123  | −0.164 | 0.086
    Dissolved Oxygen (DO)            | −0.544 | 0.260  | 0.062   | −0.368 | 0.195  | −0.143 | −0.019
    Nitrate (NO3)                    | 0.519  | −0.334 | 0.068   | 0.130  | −0.303 | 0.288  | −0.061
    Total Phosphate (TP)             | 0.205  | 0.532  | −0.285  | −0.332 | 0.016  | 0.291  | −0.048
    % NH3 (%)                        | −0.247 | 0.292  | 0.791   | 0.065  | 0.097  | 0.314  | 0.154
    pH                               | −0.336 | 0.463  | 0.732   | 0.005  | 0.061  | 0.103  | 0.130
    Total Organic Carbon (TOC)       | 0.212  | 0.412  | −0.169  | 0.459  | −0.116 | −0.355 | 0.306
    Potassium (K)                    | 0.399  | 0.046  | −0.1608 | 0.308  | 0.605  | −0.105 | −0.160
    Sodium (Na)                      | 0.278  | −0.360 | 0.245   | −0.421 | 0.533  | −0.148 | −0.084
    Chloride (Cl)                    | 0.384  | 0.277  | −0.303  | −0.104 | 0.390  | 0.211  | 0.193
    Temperature (°C)                 | 0.320  | −0.159 | −0.057  | 0.437  | 0.209  | 0.558  | 0.160
    SRP                              | 0.063  | 0.434  | −0.279  | −0.095 | −0.048 | 0.488  | −0.443
    Escherichia coli (E. coli)       | −0.031 | 0.230  | 0.225   | 0.355  | 0.127  | −0.195 | −0.718

    Table 4.

    The correlation among the parameters measured and the extracted significant components at MDO.

    a. 7 components extracted. Bold: significant contributors to the respective principal component in their respective column.


    Extraction Method: Principal Component Analysis.

    3.1.3 Nagle dam inflow

    The PCA technique identified five components, which cumulatively explained 67.4% of the total variance at NDI (Table 5). Component 1, which explained 24.7% of the total variance, is significantly affected by parameters (highlighted in bold black) which normally originate from surface runoff from agricultural areas and effluent from wastewater treatment plants (Table 6). The pollution of water bodies by agriculture is mainly due to fertiliser runoff after rainfall, nutrients (such as nitrogen) that percolate through the soil and contaminate groundwater, as well as sediment that is eroded from fields and washed into watercourses during and after rainfall. While studying the limnology of South Africa’s major impoundments, Walmsley and Butty [52] described Nagle Dam as a phosphate-limited oligotrophic system.

    Nagle dam inflow: total variance explained

              | Initial eigenvalues                  | Extraction sums of squared loadings  | Rotation sums of squared loadings
    Component | Total | % of Variance | Cumulative % | Total | % of Variance | Cumulative % | Total | % of Variance | Cumulative %
    1         | 4.960 | 30.998        | 30.998       | 4.960 | 30.998        | 30.998       | 3.962 | 24.760        | 24.760
    2         | 2.164 | 13.523        | 44.521       | 2.164 | 13.523        | 44.521       | 2.073 | 12.957        | 37.717
    3         | 1.429 | 8.934         | 53.455       | 1.429 | 8.934         | 53.455       | 1.973 | 12.332        | 50.049
    4         | 1.161 | 7.257         | 60.713       | 1.161 | 7.257         | 60.713       | 1.558 | 9.736         | 59.785
    5         | 1.074 | 6.710         | 67.422       | 1.074 | 6.710         | 67.422       | 1.222 | 7.638         | 67.422

    Table 5.

    Extracted values of the significant components at Nagle dam inflow (NDI).

    Extraction Method: Principal Component Analysis.

    Component matrix (a)

                                     | Component
    Parameter                        | 1      | 2      | 3      | 4      | 5
    Turbidity                        | 0.822  | −0.253 | −0.036 | −0.214 | 0.052
    Total Phosphate                  | 0.776  | −0.084 | −0.350 | 0.307  | 0.068
    Soluble Reactive Phosphate (SRP) | 0.757  | −0.207 | −0.323 | 0.299  | 0.080
    Suspended Solids (SS)            | 0.741  | 0.001  | −0.213 | −0.333 | 0.093
    Total Organic Carbon (TOC)       | 0.721  | 0.099  | −0.132 | −0.410 | 0.152
    Escherichia coli (E. coli)       | 0.718  | −0.404 | 0.047  | 0.273  | 0.234
    Nitrate (NO3)                    | 0.677  | −0.278 | 0.337  | −0.076 | −0.251
    Conductivity                     | 0.662  | 0.020  | 0.383  | −0.180 | −0.336
    Temperature                      | 0.566  | 0.395  | 0.205  | 0.131  | −0.325
    % NH3 (%)                        | 0.363  | 0.844  | −0.137 | −0.088 | 0.152
    pH                               | 0.145  | 0.819  | −0.094 | 0.010  | 0.228
    Scenedesmus                      | 0.160  | 0.336  | −0.119 | 0.246  | −0.283
    Nitzschia                        | 0.206  | 0.153  | 0.711  | −0.130 | 0.087
    Algal count cell                 | 0.253  | 0.301  | 0.093  | 0.486  | −0.334
    Navicula                         | 0.143  | 0.016  | 0.517  | 0.409  | 0.612
    Ammonia-N (NH3)                  | −0.051 | −0.185 | −0.106 | 0.183  | −0.206

    Table 6.

    The correlation among the parameters and the significant components at NDI.

    a. 5 components extracted. Bold: significant contributors to the respective principal component in their respective column.


    Extraction Method: Principal Component Analysis.

    3.1.4 Nagle dam outflow

    Seven significant components, which explained 75% of the total variance, were extracted at Nagle Dam Outflow station (Table 7). The first component, which contributed 23% (Table 8) of the total variance, is dominated by metal ions, which can be related to natural geological processes such as weathering. The second component, which explains 15.5% of the total variation, is dominated by E. coli, suspended solids and turbidity (highlighted in bold black) (Table 8). These pollutants can be related to the discharge of sewage effluent and runoff from a community practising open defecation.

    Nagle dam outflow: total variance explained

              | Initial eigenvalues                  | Extraction sums of squared loadings
    Component | Total | % of Variance | Cumulative % | Total | % of Variance | Cumulative %
    1         | 4.832 | 23.011        | 23.011       | 4.832 | 23.011        | 23.011
    2         | 3.262 | 15.532        | 38.542       | 3.262 | 15.532        | 38.542
    3         | 2.186 | 10.409        | 48.951       | 2.186 | 10.409        | 48.951
    4         | 1.879 | 8.946         | 57.897       | 1.879 | 8.946         | 57.897
    5         | 1.390 | 6.618         | 64.515       | 1.390 | 6.618         | 64.515
    6         | 1.185 | 5.643         | 70.158       | 1.185 | 5.643         | 70.158
    7         | 1.072 | 5.103         | 75.261       | 1.072 | 5.103         | 75.261

    Table 7.

    Extracted values of the significant components at Nagle dam outflow.

    Extraction Method: Principal Component Analysis.

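    Component matrix (a)

                                     | Component
    Parameter                        | 1      | 2      | 3      | 4      | 5      | 6      | 7
    % NH3 (%)                        | −0.158 | 0.051  | 0.364  | 0.690  | 0.483  | −0.092 | 0.164
    Alkalinity                       | 0.342  | −0.099 | 0.339  | −0.377 | 0.102  | 0.149  | 0.346
    Calcium (Ca)                     | 0.650  | −0.537 | −0.454 | 0.154  | 0.103  | −0.007 | 0.030
    Chloride (Cl)                    | 0.673  | −0.154 | 0.499  | −0.032 | 0.055  | 0.037  | −0.281
    Conductivity                     | 0.782  | −0.017 | 0.430  | −0.134 | 0.114  | 0.024  | −0.039
    Dissolved Oxygen (DO)            | −0.496 | −0.280 | 0.224  | −0.228 | 0.016  | 0.255  | −0.213
    Escherichia coli (E. coli)       | 0.330  | 0.630  | −0.300 | −0.071 | 0.287  | 0.227  | −0.175
    Potassium (K)                    | 0.707  | −0.235 | −0.219 | 0.259  | −0.269 | −0.213 | 0.031
    Magnesium (Mg)                   | 0.774  | −0.557 | −0.120 | 0.036  | 0.126  | 0.087  | 0.050
    Sodium (Na)                      | 0.549  | −0.591 | −0.482 | 0.257  | 0.017  | −0.040 | −0.050
    Ammonia (NH3)                    | −0.027 | 0.091  | −0.381 | 0.117  | −0.047 | 0.453  | 0.426
    Nitrate (NO3)                    | 0.498  | 0.053  | 0.511  | 0.232  | −0.337 | 0.048  | −0.408
    pH                               | −0.350 | −0.039 | 0.181  | 0.671  | 0.524  | −0.032 | −0.052
    Silicon (Si)                     | 0.442  | 0.404  | 0.255  | −0.162 | 0.043  | 0.011  | 0.364
    Sulphate (SO4)                   | 0.326  | −0.363 | 0.139  | −0.445 | 0.583  | 0.266  | 0.000
    Soluble Reactive Phosphate (SRP) | 0.056  | 0.148  | −0.166 | −0.385 | 0.302  | −0.597 | −0.028
    Suspended Solids (SS)            | 0.312  | 0.618  | −0.238 | 0.193  | 0.046  | 0.404  | −0.197
    Temperature                      | 0.506  | 0.429  | 0.292  | 0.184  | −0.183 | −0.177 | 0.441
    Total Organic Carbon             | 0.489  | 0.497  | 0.031  | 0.116  | −0.047 | −0.133 | −0.031
    Total Phosphate (TP)             | 0.127  | 0.354  | −0.303 | −0.227 | 0.283  | −0.347 | −0.130
    Turbidity                        | 0.402  | 0.692  | −0.282 | 0.006  | 0.084  | 0.109  | −0.165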

    Table 8.

    The correlation among the parameters and the significant components at NDO.

    7 Components extracted.


    Extraction Method: Principal Component Analysis.
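
    Reading a component matrix such as Table 8 amounts to picking, for each component, the parameters with the largest absolute loadings. The chapter does not state the cutoff used to call a loading significant, so the sketch below assumes an illustrative value of 0.6 (`THRESHOLD` is an assumption, not the authors' criterion) and applies it to the first two columns of Table 8.

```python
# Loadings on components 1 and 2 at Nagle Dam Outflow, copied from Table 8.
loadings = {
    "% NH3%": (-0.158, 0.051),
    "Alkalinity": (0.342, -0.099),
    "Calcium (Ca)": (0.650, -0.537),
    "Chloride (Cl)": (0.673, -0.154),
    "Conductivity": (0.782, -0.017),
    "Dissolved Oxygen (DO)": (-0.496, -0.280),
    "Escherichia coli (E. coli)": (0.330, 0.630),
    "Potassium (K)": (0.707, -0.235),
    "Magnesium (Mg)": (0.774, -0.557),
    "Sodium (Na)": (0.549, -0.591),
    "Ammonia (NH3)": (-0.027, 0.091),
    "Nitrate (NO3)": (0.498, 0.053),
    "pH": (-0.350, -0.039),
    "Silicon (Si)": (0.442, 0.404),
    "Sulphate (SO4)": (0.326, -0.363),
    "Soluble Reactive Phosphate (SRP)": (0.056, 0.148),
    "Suspended Solids (SS)": (0.312, 0.618),
    "Temperature": (0.506, 0.429),
    "Total Organic Carbon": (0.489, 0.497),
    "Total Phosphate (TP)": (0.127, 0.354),
    "Turbidity": (0.402, 0.692),
}

THRESHOLD = 0.6  # ASSUMED cutoff for illustration; not stated in the chapter


def dominant(table, component, threshold=THRESHOLD):
    """Parameters whose absolute loading on `component` (0-based) meets the cutoff."""
    return sorted(name for name, row in table.items() if abs(row[component]) >= threshold)


pc1 = dominant(loadings, 0)
pc2 = dominant(loadings, 1)
print("Component 1:", pc1)
print("Component 2:", pc2)
```

    With this assumed cutoff, component 2 selects E. coli, suspended solids and turbidity, the same three parameters named in the text, while component 1 selects calcium, chloride, conductivity, magnesium and potassium. A different cutoff would change the lists, so the result should be read as a sensitivity check rather than a reconstruction of the authors' selection.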

    3.1.5 Inanda dam inflow

    At Inanda Dam Inflow station, seven components which explained 76% of the total variance were extracted (Table 9). Component 1 (Table 10), which explains 31.8% of the total variance (Table 9), is dominated by metal ions (highlighted in bold), which suggests that geological processes in the area could be contributing significantly to the water quality variation. The high positive correlation between chloride and component 1 also reflects the effect of anthropogenic pollutants at this station. Legal and illegal effluent discharges from industrial areas such as Willowton are the predominant pollution sources affecting Inanda Dam [53]. In developing countries, 70 percent of industrial wastes are dumped untreated into waters, impairing the usability of the water resource [54].

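    Inanda dam inflow: total variance explained

    | Component | Initial eigenvalues: Total | Initial eigenvalues: % of variance | Initial eigenvalues: Cumulative % | Extraction sums of squared loadings: Total | Extraction sums of squared loadings: % of variance | Extraction sums of squared loadings: Cumulative % | Rotation sums of squared loadings: Total | Rotation sums of squared loadings: % of variance | Rotation sums of squared loadings: Cumulative % |
    |---|---|---|---|---|---|---|---|---|---|
    | 1 | 7.005 | 31.840 | 31.840 | 7.005 | 31.840 | 31.840 | 5.946 | 27.025 | 27.025 |
    | 2 | 2.888 | 13.127 | 44.967 | 2.888 | 13.127 | 44.967 | 2.443 | 11.104 | 38.129 |
    | 3 | 1.865 | 8.478 | 53.445 | 1.865 | 8.478 | 53.445 | 2.019 | 9.179 | 47.309 |
    | 4 | 1.427 | 6.485 | 59.930 | 1.427 | 6.485 | 59.930 | 1.922 | 8.737 | 56.045 |
    | 5 | 1.363 | 6.196 | 66.127 | 1.363 | 6.196 | 66.127 | 1.695 | 7.704 | 63.749 |
    | 6 | 1.171 | 5.325 | 71.451 | 1.171 | 5.325 | 71.451 | 1.385 | 6.297 | 70.046 |
    | 7 | 1.012 | 4.599 | 76.050 | 1.012 | 4.599 | 76.050 | 1.321 | 6.004 | 76.050 |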

    Table 9.

    Extracted values of the significant components at Inanda dam inflow.

    Extraction Method: Principal Component Analysis.
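
    Table 9 reports sums of squared loadings both before and after rotation. The short check below, using only the values in Table 9, illustrates that rotation redistributes variance among the retained components while leaving their combined total unchanged. This is consistent with an orthogonal rotation, although the rotation method is not named in this section. The parameter count (22) is taken from the rows of Table 10, and the helper name `percent` is illustrative.

```python
N_PARAMETERS = 22  # rows of Table 10; assumes PCA on the correlation matrix

# Sums of squared loadings at Inanda Dam Inflow, copied from Table 9.
extraction = [7.005, 2.888, 1.865, 1.427, 1.363, 1.171, 1.012]
rotation = [5.946, 2.443, 2.019, 1.922, 1.695, 1.385, 1.321]


def percent(values, n_vars=N_PARAMETERS):
    """Convert sums of squared loadings to percentages of total variance."""
    return [100.0 * v / n_vars for v in values]


total_before = sum(percent(extraction))
total_after = sum(percent(rotation))
print(f"cumulative before rotation: {total_before:.2f}%")
print(f"cumulative after rotation:  {total_after:.2f}%")
print(f"component 1 share: {percent(extraction)[0]:.1f}% -> {percent(rotation)[0]:.1f}%")
```

    Because the shares of individual components differ between the two sets of columns, any percentage quoted for a component should state which set it comes from.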

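    Component matrix (columns are components 1 to 7)

    | Parameter | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
    |---|---|---|---|---|---|---|---|
    | % NH3% | −0.155 | 0.269 | 0.718 | 0.193 | 0.209 | −0.185 | −0.170 |
    | Alkalinity | 0.746 | −0.186 | 0.182 | −0.241 | 0.254 | 0.045 | −0.065 |
    | Calcium (Ca) | 0.859 | 0.215 | −0.059 | −0.247 | 0.030 | 0.093 | 0.063 |
    | Chloride (Cl) | 0.911 | 0.140 | 0.089 | −0.054 | 0.122 | 0.024 | −0.090 |
    | Conductivity | 0.889 | 0.169 | 0.009 | −0.048 | 0.108 | 0.027 | −0.096 |
    | Dissolved Oxygen (DO) | 0.313 | −0.033 | 0.128 | 0.464 | −0.640 | 0.168 | 0.017 |
    | Escherichia coli | −0.297 | 0.400 | −0.055 | 0.080 | −0.122 | 0.538 | −0.355 |
    | Fluorine (F) | 0.204 | 0.307 | 0.318 | 0.329 | −0.067 | 0.236 | −0.319 |
    | Potassium (K) | 0.514 | 0.496 | −0.065 | −0.122 | −0.012 | −0.240 | 0.116 |
    | Magnesium (Mg) | 0.734 | 0.145 | 0.075 | −0.161 | 0.074 | 0.411 | 0.176 |
    | Sodium (Na) | 0.882 | 0.174 | 0.100 | −0.017 | 0.045 | 0.009 | −0.006 |
    | Ammonia (NH3) | 0.163 | 0.330 | 0.194 | 0.239 | −0.195 | 0.012 | 0.681 |
    | Nitrate (NO3) | 0.493 | 0.194 | −0.375 | 0.336 | −0.191 | −0.131 | 0.144 |
    | pH | −0.075 | 0.227 | 0.717 | 0.019 | −0.260 | −0.408 | −0.102 |
    | Silicon (Si) | −0.581 | 0.216 | 0.178 | 0.174 | 0.322 | 0.192 | 0.306 |
    | Sulphate (SO4) | 0.799 | 0.339 | 0.088 | 0.076 | 0.071 | 0.061 | 0.005 |
    | Soluble Reactive Phosphate (SRP) | 0.086 | 0.553 | −0.461 | 0.396 | 0.271 | −0.278 | −0.102 |
    | Suspended Solids | −0.522 | 0.569 | −0.048 | −0.382 | −0.219 | 0.006 | 0.108 |
    | Temperature | −0.506 | 0.187 | 0.303 | 0.088 | 0.544 | 0.184 | 0.215 |
    | Total Organic Carbon (TOC) | −0.190 | 0.557 | 0.036 | −0.516 | −0.192 | −0.249 | −0.049 |
    | Total Phosphate (TP) | −0.303 | 0.645 | −0.351 | 0.229 | 0.227 | −0.142 | −0.158 |
    | Turbidity | −0.585 | 0.620 | −0.028 | −0.241 | −0.184 | 0.328 | −0.010 |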

    Table 10.

    The correlation among the parameters and the significant components at IDI.

    7 components extracted. Bold: significant contributors to the respective principal component in their respective column.


    Extraction Method: Principal Component Analysis.
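
    The link between a component matrix and the variance-explained table can be checked by hand: the sum of the squared loadings in one column of the component matrix equals that component's eigenvalue. The sketch below does this for component 1 using the 22 loadings in Table 10; the function name is illustrative.

```python
# Loadings on component 1 at Inanda Dam Inflow, copied from Table 10
# (22 parameters, in table order).
pc1_loadings = [
    -0.155, 0.746, 0.859, 0.911, 0.889, 0.313, -0.297, 0.204,
    0.514, 0.734, 0.882, 0.163, 0.493, -0.075, -0.581, 0.799,
    0.086, -0.522, -0.506, -0.190, -0.303, -0.585,
]


def sum_of_squared_loadings(column):
    """Sum of squared loadings of one component across all parameters."""
    return sum(x * x for x in column)


ssl = sum_of_squared_loadings(pc1_loadings)
share = 100.0 * ssl / len(pc1_loadings)
print(f"sum of squared loadings = {ssl:.3f}")
print(f"share of total variance = {share:.1f}%")
```

    The result is close to 7.005, the extraction value for component 1 in Table 9, which suggests that the loadings in Table 10 are the unrotated ones.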

    Given the high positive correlations of Soluble Reactive Phosphate (SRP), Suspended Solids (SS), Total Organic Carbon (TOC), Total Phosphate (TP) and turbidity with component 2 (Table 10), it can be inferred that the pollutants in this group, which explains 13.1% of the variation in water quality (Table 9), emanate from agricultural activities. The negative correlation between dissolved oxygen and component 5 indicates deterioration in water quality. Since component 5 is also positively correlated with temperature, it can be deduced that climatic conditions could explain the 7.7% of variation attributed to this component after rotation (Table 9). Component 6, which is mainly influenced by E. coli, suggests the effect of sewage effluent entering the water system. This could be attributed to effluent from DV Wastewater Works, which treats domestic effluent from Pietermaritzburg.

    3.1.6 Inanda dam outflow

    At Inanda Dam Outflow station, eight components explaining 75% of the total variance were extracted (Table 11). We hypothesised that component 1 (Table 12, highlighted in bold), which explains 25.3% of the total variance (Table 11), is mainly contributed by metal ions, reflecting the geology of the catchment area. The positive correlation between sulphate and component 1 reflects the effect of anthropogenic polluting activities. Component 3 (Table 12) is mainly attributable to agricultural pollutant sources, given its moderate to high positive correlations with turbidity, nitrate and suspended solids. The turbidity and suspended solids most plausibly result from surface runoff following rainfall. Escherichia coli (E. coli), which dominates component 6, can be attributed to wastewater treatment plants in Pietermaritzburg.

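    Inanda dam outflow: total variance explained

    | Component | Initial eigenvalues: Total | Initial eigenvalues: % of variance | Initial eigenvalues: Cumulative % | Extraction sums of squared loadings: Total | Extraction sums of squared loadings: % of variance | Extraction sums of squared loadings: Cumulative % |
    |---|---|---|---|---|---|---|
    | 1 | 5.575 | 25.342 | 25.342 | 5.575 | 25.342 | 25.342 |
    | 2 | 2.597 | 11.804 | 37.146 | 2.597 | 11.804 | 37.146 |
    | 3 | 2.066 | 9.390 | 46.536 | 2.066 | 9.390 | 46.536 |
    | 4 | 1.582 | 7.192 | 53.728 | 1.582 | 7.192 | 53.728 |
    | 5 | 1.360 | 6.183 | 59.911 | 1.360 | 6.183 | 59.911 |
    | 6 | 1.217 | 5.534 | 65.445 | 1.217 | 5.534 | 65.445 |
    | 7 | 1.135 | 5.161 | 70.606 | 1.135 | 5.161 | 70.606 |
    | 8 | 1.009 | 4.589 | 75.194 | 1.009 | 4.589 | 75.194 |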

    Table 11.

    Extracted values of the significant components at Inanda dam outflow.

    Extraction Method: Principal Component Analysis.

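    Component matrix (columns are components 1 to 8)

    | Parameter | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
    |---|---|---|---|---|---|---|---|---|
    | Magnesium (Mg) | 0.903 | −0.053 | −0.074 | −0.034 | −0.141 | 0.023 | 0.013 | −0.014 |
    | Sodium (Na) | 0.881 | 0.007 | −0.096 | −0.078 | −0.102 | 0.030 | −0.107 | −0.029 |
    | Calcium (Ca) | 0.877 | −0.033 | 0.056 | 0.108 | −0.146 | 0.174 | −0.189 | 0.000 |
    | Chloride (Cl) | 0.841 | −0.050 | −0.031 | 0.029 | 0.071 | 0.170 | 0.257 | 0.001 |
    | Potassium (K) | 0.800 | −0.076 | −0.093 | −0.100 | −0.138 | −0.411 | −0.045 | −0.011 |
    | Conductivity | 0.671 | −0.096 | −0.186 | 0.042 | 0.241 | 0.182 | 0.046 | −0.131 |
    | Alkalinity | 0.664 | −0.334 | −0.367 | 0.268 | 0.138 | 0.059 | −0.009 | 0.158 |
    | Sulphate (SO4) | 0.612 | 0.374 | 0.080 | −0.262 | −0.118 | 0.055 | 0.095 | −0.324 |
    | % NH3% | 0.165 | 0.816 | −0.177 | −0.033 | −0.212 | −0.037 | −0.011 | 0.376 |
    | pH | 0.212 | 0.781 | −0.046 | −0.084 | −0.295 | 0.039 | −0.090 | 0.310 |
    | Temperature | 0.027 | 0.596 | −0.134 | −0.120 | 0.347 | −0.151 | 0.470 | 0.041 |
    | Turbidity | 0.327 | 0.099 | 0.785 | 0.153 | 0.143 | −0.057 | 0.028 | 0.046 |
    | Nitrate (NO3) | −0.009 | 0.192 | 0.721 | 0.039 | −0.279 | 0.294 | −0.290 | −0.197 |
    | Suspended Solids | 0.325 | 0.347 | 0.531 | 0.121 | 0.317 | −0.194 | 0.170 | −0.178 |
    | Ammonia (NH3) | 0.271 | −0.384 | 0.430 | −0.071 | −0.094 | 0.011 | 0.004 | 0.290 |
    | Total Phosphate (TP) | −0.008 | −0.226 | 0.106 | −0.636 | −0.032 | 0.172 | −0.051 | 0.217 |
    | Soluble Reactive Phosphate (SRP) | 0.017 | −0.366 | 0.232 | −0.605 | −0.017 | −0.043 | 0.160 | 0.371 |
    | Fluorine (F) | 0.469 | −0.136 | −0.003 | −0.103 | 0.538 | −0.089 | −0.300 | 0.130 |
    | Dissolved Oxygen (DO) | 0.041 | −0.213 | −0.176 | 0.398 | −0.508 | 0.042 | 0.083 | 0.045 |
    | Escherichia coli (E. coli) | −0.094 | 0.149 | −0.026 | 0.171 | 0.282 | 0.832 | 0.139 | 0.199 |
    | Silicon (Si) | 0.095 | −0.283 | 0.327 | 0.458 | −0.161 | −0.146 | 0.524 | 0.300 |
    | TOC | −0.047 | 0.169 | 0.064 | 0.407 | 0.283 | −0.203 | −0.496 | 0.363 |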

    Table 12.

    The correlation among the parameters and the significant components extracted at IDO.

    8 components extracted. Bold: significant contributors to the respective principal component in their respective column.


    Extraction Method: Principal Component Analysis.
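
    Table 12 can also be read row by row. The sum of a parameter's squared loadings across the retained components (its communality) gives the fraction of that parameter's variance captured by the eight components. Communalities are not reported in the chapter; the sketch below is an additional, standard PCA diagnostic computed from three rows of Table 12 purely as an illustration.

```python
# Loadings on the eight retained components at Inanda Dam Outflow,
# copied from three rows of Table 12.
rows = {
    "Magnesium (Mg)": (0.903, -0.053, -0.074, -0.034, -0.141, 0.023, 0.013, -0.014),
    "Dissolved Oxygen (DO)": (0.041, -0.213, -0.176, 0.398, -0.508, 0.042, 0.083, 0.045),
    "Escherichia coli (E. coli)": (-0.094, 0.149, -0.026, 0.171, 0.282, 0.832, 0.139, 0.199),
}


def communality(loadings):
    """Fraction of a parameter's variance captured by the retained components."""
    return sum(x * x for x in loadings)


communalities = {name: communality(vals) for name, vals in rows.items()}
for name, value in communalities.items():
    print(f"{name}: {value:.3f}")
```

    A parameter with a low communality is poorly represented by the retained components, which is worth checking before using the PCA results to decide which parameters to keep monitoring.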

    3.2 Cluster analysis

    Cluster analysis was used to detect similarities among the sampling stations in the study area. The dendrogram (Figure 3) shows that the six sampling stations could be grouped into two significant clusters (A and B). The two groups combine only at a relatively large linkage distance, measured as Euclidean distance, which indicates that they are clearly distinct [47]. Cluster A consists of four sampling stations (NDO, IDO, MDO and NDI); except for Nagle Dam Inflow (NDI), these are dam outflow stations. These stations can be described as less polluted owing to the dilution and retention effects of the dams. Cluster B, on the other hand, is composed of two dam inflow stations (MDI and IDI). These stations can be described as more polluted as a result of activities along the river course.

    The PCA results presented in Section 3.1 of this chapter showed that poor agricultural practices resulting in runoff of agrochemicals, organic matter, drug residues, sediments and saline drainage, as well as sewage and industrial effluent discharges, are key factors reflected in the poor water quality at the dam inflow stations (Cluster B). These practices pose a risk to aquatic ecosystems, human health and productive activities. The significant presence of E. coli, suspended solids and turbidity at both Cluster A and Cluster B sampling stations indicates that raw water along uMngeni Basin is not fit for potable use before treatment.

    Figure 3.

    Dendrogram of the stations along uMngeni basin.
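
    The grouping procedure behind a dendrogram such as Figure 3 can be sketched as agglomerative clustering on Euclidean distances: start with each station as its own cluster and repeatedly merge the two closest clusters. The example below is a minimal illustration, not the authors' computation. The station values are HYPOTHETICAL standardised numbers invented so that the example runs, and average linkage is an assumption because the linkage method is not stated in this section.

```python
from itertools import combinations
from math import dist

# HYPOTHETICAL standardised (z-score) station means for three parameters
# (E. coli, turbidity, conductivity). Invented for illustration only;
# these are NOT measurements from the study.
stations = {
    "NDI": (-0.6, -0.5, -0.4),
    "NDO": (-0.8, -0.7, -0.6),
    "MDO": (-0.7, -0.6, -0.3),
    "IDO": (-0.5, -0.8, -0.1),
    "MDI": (1.2, 1.3, 0.5),
    "IDI": (1.4, 1.3, 0.9),
}


def average_linkage(a, b):
    """Mean Euclidean distance over all pairs of members of clusters a and b."""
    total = sum(dist(stations[i], stations[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def agglomerate(names, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [frozenset([n]) for n in names]
    merges = []  # (linkage distance, merged cluster), in merge order
    while len(clusters) > n_clusters:
        a, b = min(combinations(clusters, 2), key=lambda pair: average_linkage(*pair))
        merges.append((average_linkage(a, b), a | b))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters, merges


clusters, merges = agglomerate(list(stations), n_clusters=2)
for distance, members in merges:
    print(f"merged {sorted(members)} at linkage distance {distance:.2f}")
for c in clusters:
    print("cluster:", sorted(c))
```

    The recorded merge distances are the heights at which branches join in a dendrogram; a large jump between the last within-group merge and the final merge is what marks two groups as well separated.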

    4. Conclusions

    Understanding the primary effects of anthropogenic activities and natural factors on river water quality is important for the study and efficient management of water resources. In this study, PCA assisted in identifying the significant parameters influencing water quality variation at the six stations studied in uMngeni Basin. The PCs extracted suggest that pollution along uMngeni Basin can be attributed to geological processes, sewage effluent, agricultural runoff and surface runoff. The findings could assist in reducing the number of parameters monitored at any station, and thus ultimately the associated monitoring cost. It is recommended that effluents be treated before discharge into the river and that buffer zone policies be enforced.

    The results of the cluster analysis should also assist in categorising sampling sites according to pollution level. Such a classification can assist in designing an optimal sampling strategy, which could reduce the number of sampling stations and the associated costs. This study highlights the usefulness of multivariate statistical techniques such as PCA and CA in analysing complex databases, especially in identifying pollution sources and in better comprehending spatial and temporal variations for effective river water-quality management. PCA and CA are useful tools for uncovering concealed information about parameter variance in such datasets. The study recommends the application of PCA and CA for interpreting bulk surface water quality data-sets.

    Acknowledgments

    The authors gratefully acknowledge Durban University of Technology for hosting and funding the main author during his Master’s Degree study.

    © 2020 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

    How to cite and reference

    Innocent Rangeti and Bloodless Dzwairo (March 16th 2021). Interpretation of Water Quality Data in uMngeni Basin (South Africa) Using Multivariate Techniques, River Basin Management - Sustainability Issues and Planning Strategies, José Simão Antunes Do Carmo, IntechOpen, DOI: 10.5772/intechopen.94845. Available from:
