Temporal Water Quality Assessment of Langat River from 1995-2006

Water


Introduction
Water quality is generally described according to biological, chemical and physical properties (Coke et al 2005). Based on these properties, the quality of water can be expressed via a numerical index (i.e. Water Quality Index, WQI) by combining measurements of selected water quality variables. The index is important in evaluating the water quality of different sources and in observing the changes in the water quality as a function of time and other influencing factors (Sarkar and Abbasi 2006). The time when samples are taken is one of the contributing factors that can influence the concentration of a particular water quality variable (Coke et al 2005). Thus, temporal assessment is a useful way to determine whether trend and seasonality are present, that is, how water quality responds over time to changes in the catchment.
However, assessments of the temporal effect on the water quality sub indices and WQI are rarely carried out by the Malaysian Department of Environment and the Malaysian Department of Irrigation and Drainage (DID). Several past studies of water quality in Langat River focused on spatial assessment, such as evaluating the polluting effects of various land use patterns (Suki et al 1988), the relationship between water quality, sewage discharge and location (Lee et al 2006) and the spatial variations of water quality variables (Juahir et al 2010a). Most of these studies did not consider temporal assessment in detail. Therefore, the influence of time on selected water quality variables and on the water quality index of Langat River is studied here by using box plots to examine the annual and quarterly patterns. Then, time series regression and decomposition analysis are carried out on normally distributed variables with no outliers at particular stations along the Langat River. Both methods are helpful in evaluating the changes with time and in determining the best fitted models of the selected variables.

Description of study area

Background
A pollution prevention improvement programme was introduced by the Malaysian Department of Environment (DOE) in 2001 to improve the condition of polluted rivers in Malaysia. Langat River, which is situated in the state of Selangor, Peninsular Malaysia, with a total catchment area of approximately 1,815 km², was chosen for this programme. The catchment area is shown in Fig. 1. Data used in the analysis for Langat River were collected from six monitoring stations as shown in Table 1.

Land use changes
The land use of the Langat catchment consists mainly of agriculture, forest, urban areas (commercial and residential) and water bodies. There are three types of forest in the Langat River catchment area, namely dipterocarp, peat swamp and mangrove. Agriculture is the dominant land use, followed by forest, urban areas and water bodies, as presented in Table 2 and Fig. 2. However, Langat River as a tropical catchment area is experiencing rapid urbanization (Amini et al 2009), with urban expansion occurring since 1981 (see Table 2 and Fig. 2). The gain in size of the urbanised area was also reported by Jaafar et al (2009), and the urban development in the Langat River catchment was due to extensive land conversion from agriculture to urban-industrial-commercial use.

Sources of pollution
Langat River is one of the most important raw water resources for drinking water, recreation, industry, fishery and agriculture. The river flows from the highest peak of Gunung Nuang (1,493 m) across the Langat Basin to Kuala Langat, and land use activities along the river banks contribute to the deterioration of river water quality (Charlie 2010). The sources of the Langat River pollution are identified as industrial discharge (58%), domestic sewage from treatment plants (28%), construction projects (12%) and pig farming (2%) (Khairuddin et al 2002). A study by Juahir et al (2010a) showed that the major sources of surface water quality variations in Langat River are industrial effluents, wastewater treatment plants, and domestic and commercial areas.
The declining quality of the river water is caused by two main sources of water pollution, i.e. point sources and non-point sources. Point source (PS) pollution comes from a single, identifiable source that discharges pollutants into the environment, such as manufacturing and agro-based industries, sewage treatment plants and animal farms. Non-point source (NPS) pollution, also known as polluted runoff, is pollution whose sources cannot be traced to a single point. NPS pollution is defined as pollution originating from diffuse sources such as agricultural activities and surface runoff, e.g. storm runoff from rainfall, snowmelt or irrigation flowing over land surfaces into the drainage system (Sapari et al 2009). The increase in PS and NPS pollution loading, such as discharges of surface runoff, domestic sewage, ship wastes and industrial effluents into coastal waters, may result from rapid urbanization along the river. NPS pollution is seen as the main contributor to the pollution load in Langat River compared to PS pollution (UPUM 2002). It is also evident that the Langat River ecosystem is under stress from the discharge of effluents, particularly domestic sewage (Lee et al 2006). Table 4 indicates some of the major sources of pollution in the study area (Department of Environment, DOE 2007).
Due to rapid urbanization and the change from undeveloped to developed areas, Langat River has experienced changes in discharge and direct runoff volume. A study by Juahir et al (2010b) has shown that there is a relationship between land use and discharge (flow rate) and runoff in Langat River. The annual mean discharge and direct runoff of Langat River at two selected gauging stations, i.e. Dengkil Station (station number 2816441) and Lui Station (station number 2917401), are shown in Fig. 5 and Fig. 6.
The Dengkil gauging station is located downstream on Langat River (2°51'20''N, 101°40'55''E) and the Lui gauging station is located upstream (3°10'25''N, 101°52'20''E). The increasing trend of discharge and direct runoff at Dengkil Station compared to Lui Station is consistent with the increasing trend of urban development and the decrease of agriculture and forest areas within the region (Juahir et al 2010b).

Water quality data
Water quality data used in this study were obtained from the Malaysian Department of Environment (DOE). The data, however, were not collected at regular time intervals, so quarterly data were used instead to facilitate the analysis. Time series data from September 1995 until December 2007 for selected parameters and stations were used in the present study. The value representing each quarter is taken from the last month of that quarter, e.g. the month of March for the first quarter and the month of June for the second quarter. However, if the last month of a quarter does not contain any data, then data from either the second or the first month of that quarter are used instead. For example, quarter 1 is then represented by data from February or January, and quarter 2 by data from May or April.
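The selection rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the function name `quarterly_series` and the sample values are hypothetical, the fallback order (third, then second, then first month) is inferred from the examples in the text, and taking the latest day when a month has several samples is an added assumption.

```python
from datetime import date

def quarterly_series(samples):
    """Pick one representative value per quarter from irregular samples.

    samples: iterable of (date, value) pairs.
    Returns {(year, quarter): value}. The latest available month in each
    quarter wins (assumed fallback order: 3rd, 2nd, then 1st month).
    """
    chosen = {}  # (year, quarter) -> (month, day, value)
    for day, value in samples:
        quarter = (day.month - 1) // 3 + 1
        key = (day.year, quarter)
        candidate = (day.month, day.day, value)
        # Keep the sample from the latest month (latest day within it: assumption).
        if key not in chosen or candidate[:2] > chosen[key][:2]:
            chosen[key] = candidate
    return {key: item[2] for key, item in sorted(chosen.items())}

# Hypothetical sub index readings, for illustration only.
samples = [
    (date(1996, 1, 15), 71.0),
    (date(1996, 3, 20), 68.5),  # last month of quarter 1: used
    (date(1996, 4, 11), 74.2),
    (date(1996, 5, 9), 70.1),   # June missing: May represents quarter 2
]
quarterly = quarterly_series(samples)
```

Quarters with no sample at all are simply absent from the result, which is where an imputation method would be needed.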
The six selected water quality variables used in this study are Suspended Solids (SS), Biochemical Oxygen Demand (BOD), Ammoniacal Nitrogen (AN), Chemical Oxygen Demand (COD), Dissolved Oxygen (DO) and pH. These variables were selected by a panel of experts as the variables that, when calculated and used collectively, give some indication of the water quality level or water quality index of a river (DOE 1997). According to the best-fit relationship for each of the six parameters, the six sub indices (SI) were determined as new variables, and the overall index for Langat River was obtained using the WQI formula. Generally, WQI is a unitless number that varies between 0 and 100. The resulting values are compared to a classification table (see Table 5), where the water is identified as excellent, good, fair, poor or very poor (DID 2009).
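For reference, the DOE index is commonly published as a weighted sum of the six sub indices. The weights below come from that commonly published form and are assumed, rather than confirmed, to be the ones applied in this study:

```latex
\[
\mathrm{WQI} = 0.22\,SI_{DO} + 0.19\,SI_{BOD} + 0.16\,SI_{COD}
             + 0.15\,SI_{AN} + 0.16\,SI_{SS} + 0.12\,SI_{pH}
\]
```

The weights sum to 1, so the index stays on the same 0 to 100 scale as the sub indices.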

Graphical analysis
Graphical analysis is very useful in data analysis and helps the researcher to see patterns, trends and other features that are not easily apparent from numerical summaries. Box-Whisker plots, normal probability plots and scatter plots were used to analyze the data graphically in this study. Displaying data using graphs allows for more effective visualization and presentation of large data sets in a small space (Cooke et al 2005). By using graphical analysis, we can visualize any gaps in the data, relationships between variables and trends that might exist in the selected water quality data.

Box-whisker plot
The Box-Whisker plot is a powerful exploratory data analysis tool. It is a graphical display of the five-number summary (Tukey, 1977). To plot, the given sub index data are ranked from the smallest to the largest value. Then, the five-number summary is obtained: the smallest and largest values, the median (the 50th percentile, a measure of central tendency that is robust to outlying values), and the lower and upper hinges (i.e. the 0.25 and 0.75 quantiles respectively). This information is then represented by the Box-Whisker plot. The box represents the inter-quartile range and the whiskers are lines that extend from the box to the highest and lowest values, excluding outliers. The outliers are individual points with values beyond the highest and lowest limits and are plotted with asterisks. A line across the box indicates the median of the data. If the median lies in the middle of the box and the upper and lower whiskers are of similar length, the data for the given year or quarter are symmetric. A lack of symmetry suggests a departure from normality. The plot also gives an immediate visual impression of the center, the spread and the overall range of the distribution. Additionally, the median confidence intervals can be plotted along the boxes in Minitab, as shown in Fig. 8-Fig. 10, and offer a rough guide to differences between medians. If the median confidence intervals of two boxes do not overlap, this offers evidence of a statistically significant difference between the medians (McGill et al 1978). In this study, the plots are used to ascertain how the sub indices and water quality index time series data are distributed, to reveal outliers and to evaluate the normality of the data. They are also used to visualize the median differences and to track the annual and quarterly changes in the water quality data.
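The quantities behind a box plot can be computed directly. The following is a minimal sketch, assuming the usual 1.5 × inter-quartile range rule for the outlier limits (the chapter does not state the limits Minitab applied) and Python's "inclusive" quantile method, which may differ slightly from Tukey's hinges and from Minitab's quartile convention. The function name `box_summary` and the data are hypothetical.

```python
import statistics

def box_summary(values, whisker=1.5):
    """Five-number summary plus whisker ends and outliers for one box."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed outlier limits: 1.5 x IQR beyond the quartiles.
    low_limit, high_limit = q1 - whisker * iqr, q3 + whisker * iqr
    inside = [v for v in data if low_limit <= v <= high_limit]
    return {
        "min": data[0], "q1": q1, "median": median, "q3": q3, "max": data[-1],
        "whiskers": (inside[0], inside[-1]),  # most extreme non-outlying values
        "outliers": [v for v in data if v < low_limit or v > high_limit],
    }

# Hypothetical sub index values for one year.
summary = box_summary([62, 70, 71, 73, 75, 76, 78, 80, 98])
```

In this example the upper whisker stops at 80 and the value 98 is flagged as an outlier, which is the kind of point that would be drawn with an asterisk.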

Normal probability plot
The normal probability plot is a graphical method for assessing normality. If the data follow the hypothesized normal distribution, then the plotted points fall approximately along a straight line. There are various test statistics for normality, and the Anderson-Darling statistic is a widely used one (Montgomery et al 2008). If the p-value is smaller than the chosen significance level (usually 0.05), the hypothesis that the underlying population is normal is rejected.
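To make the test concrete, here is a minimal sketch of the Anderson-Darling statistic for normality with mean and standard deviation estimated from the sample. The p-value uses the piecewise approximation usually attributed to D'Agostino and Stephens; that choice, the function name `anderson_darling_normal` and the two example samples are assumptions for illustration, and the result is not guaranteed to match Minitab's output digit for digit.

```python
import math
from statistics import NormalDist, mean, stdev

def anderson_darling_normal(values):
    """Return (A2, approximate p-value) for H0: the data are normal."""
    data = sorted(values)
    n = len(data)
    fitted = NormalDist(mean(data), stdev(data))
    cdf = [fitted.cdf(v) for v in data]
    total = sum(
        (2 * i + 1) * (math.log(cdf[i]) + math.log(1.0 - cdf[n - 1 - i]))
        for i in range(n)
    )
    a2 = -n - total / n
    # Small-sample adjustment, then piecewise p-value approximation (assumed).
    adj = a2 * (1 + 0.75 / n + 2.25 / n ** 2)
    if adj >= 0.6:
        p = math.exp(1.2937 - 5.709 * adj + 0.0186 * adj ** 2)
    elif adj >= 0.34:
        p = math.exp(0.9177 - 4.279 * adj - 1.38 * adj ** 2)
    elif adj > 0.2:
        p = 1 - math.exp(-8.318 + 42.796 * adj - 59.938 * adj ** 2)
    else:
        p = 1 - math.exp(-13.436 + 101.14 * adj - 223.73 * adj ** 2)
    return a2, min(p, 1.0)

# Two synthetic samples: one close to normal, one strongly right-skewed.
normal_like = [NormalDist().inv_cdf((i - 0.5) / 40) for i in range(1, 41)]
skewed = [math.exp(z) for z in normal_like]
a2_normal, p_normal = anderson_darling_normal(normal_like)
a2_skewed, p_skewed = anderson_darling_normal(skewed)
```

A small statistic with a large p-value, as for `normal_like`, gives no reason to reject normality, while the skewed sample yields a much larger statistic and a p-value below 0.05.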

Scatter plot
The water quality sub indices with a normal distribution are selected based on the results from the Box-Whisker plots and probability plots. Scatter plots of the selected parameters were then constructed to reveal any trend and seasonal patterns. A scatter plot shows the relationship between the dependent variable (a water quality sub index) and the independent variable (time). Each value of the dependent variable is plotted against its corresponding time. If the sub index values tend to increase or decrease in a straight-line fashion as time increases, with the (time, sub index) points scattered around the straight line, then it is reasonable to describe the relationship between the sub index and time by a simple linear regression model. This helps to distinguish an actual trend over time from other variation in the water quality variables.
Fig. 7 shows the analysis framework used in this study. The first step is to assess the temporal variations of the sub indices and water quality variables. The assessments were performed using Minitab 15.1. Then, an exploratory assessment is carried out to determine the annual and quarterly changes of the selected water quality variables. Any apparent trends or oscillations found in the selected water quality variables at this stage are evaluated further. To assess the statistical significance of the changes and to examine the uncertainty about the possible trend and seasonality, regression and additive decomposition analyses were performed. Only variables that are normally distributed and have no outliers are carried forward to these analyses. Models fitted for the selected variables at certain stations can be used to predict future values.

Linear regression analysis
Linear regression analysis is an important parametric method to identify the monotonic trend in a time series. It is useful to describe the relationship between variables. The method is often performed to determine the slope of selected variables. In this study, the regression analysis was used to investigate and to model the relationship between the selected water quality variables versus time. The slope indicates the mean temporal change of the variables. Positive values of the slopes show increasing trends in the mean temporal change while negative values of the slopes indicate decreasing trends.
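As an illustration of how the slope is obtained, the following minimal sketch fits an ordinary least-squares line to a series indexed by time period 1, 2, ..., n. The function name `fit_linear_trend` and the example series are hypothetical; the study itself used Minitab.

```python
def fit_linear_trend(values):
    """Least-squares fit of values against t = 1..n.

    Returns (intercept, slope, r_squared).
    """
    n = len(values)
    times = range(1, n + 1)
    t_mean = (n + 1) / 2
    y_mean = sum(values) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, values))
    sxx = sum((t - t_mean) ** 2 for t in times)
    slope = sxy / sxx                      # mean change per time period
    intercept = y_mean - slope * t_mean
    ss_res = sum((y - (intercept + slope * t)) ** 2 for t, y in zip(times, values))
    ss_tot = sum((y - y_mean) ** 2 for y in values)
    return intercept, slope, 1 - ss_res / ss_tot

# Hypothetical quarterly sub index series rising by 0.5 per quarter.
series = [60 + 0.5 * t for t in range(1, 13)]
intercept, slope, r_squared = fit_linear_trend(series)
```

A positive `slope` corresponds to an increasing trend and a negative one to a decreasing trend; `r_squared` indicates how much of the variation the straight line explains.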

Trend analysis
The trend model is

$$y_t = TR_t + \varepsilon_t$$

where $y_t$ = variable value, $TR_t$ = trend and $\varepsilon_t$ = error term, in time period $t$.
This model explains that the time series can be represented by an average level (denoted $\mu_t = \beta_0 + \beta_1 t$) that changes over time according to the equation $\mu_t = TR_t$ and by the error term, $\varepsilon_t$.

Additive decomposition analysis
The additive model is useful when the seasonal variation is relatively constant over time. In the present study, the selected parameters were separated into linear trend, seasonal and error components. The additive model is

$$y_t = TR_t + SN_t + CL_t + \varepsilon_t$$

where $TR_t$ = trend, $SN_t$ = seasonal, $CL_t$ = cyclical and $\varepsilon_t$ = error term in time period $t$.
Basic steps in the decomposition method are:
1. Estimate the trend using a centered moving average.
2. De-trend the series by subtracting the trend estimated in (1).
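The two steps listed above can be sketched for quarterly data as follows. Averaging the de-trended values by quarter and centering the result so the seasonal indices sum to zero is a common textbook continuation and is an assumption here, since the remaining steps are not given; Minitab's own routine may differ in detail. The function name `additive_decomposition` and the synthetic series are hypothetical.

```python
def additive_decomposition(values, period=4):
    """Additive decomposition sketch for an even seasonal period.

    Returns (trend, detrended, seasonal); trend and detrended hold None
    where the centered moving average is undefined (series ends).
    """
    n = len(values)
    half = period // 2
    # Step 1: centered moving average (half weight on the two end points).
    trend = [None] * n
    for t in range(half, n - half):
        window = values[t - half : t + half + 1]
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend[t] = total / period
    # Step 2: de-trend by subtracting the estimated trend.
    detrended = [y - tr if tr is not None else None
                 for y, tr in zip(values, trend)]
    # Assumed continuation: average by season, then center to sum to zero.
    raw = []
    for s in range(period):
        vals = [d for i, d in enumerate(detrended)
                if d is not None and i % period == s]
        raw.append(sum(vals) / len(vals))
    offset = sum(raw) / period
    seasonal = [r - offset for r in raw]
    return trend, detrended, seasonal

# Synthetic series: linear trend plus a fixed quarterly effect.
effect = [3.0, -1.0, -4.0, 2.0]
series = [10 + 2 * t + effect[t % 4] for t in range(16)]
trend, detrended, seasonal = additive_decomposition(series)
```

For this synthetic series the moving average recovers the linear trend exactly at interior points, and `seasonal` reproduces the quarterly effects that were built in.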

Annual changes
Box plots provide a visual impression of the location and shape of the underlying distributions (Vega et al 1998). In this section, the presence or absence of trends over time was examined graphically through Fig. 8. From Fig. 8, SIDO and WQI showed a positive trend except in certain years, i.e. 1997 and 1999 for SIDO and 1997 for WQI, when the medians were clearly lower than the medians for the rest of the data. For SICOD and SIpH, the medians were fairly stable, with little variation in most of the selected years. For SIBOD, however, the medians fluctuated strongly from 1995 to 1999 and stabilised after 2000. SISS showed a decreasing trend for three consecutive years from 1998 to 2000, followed by an increase in 2001. A similar decreasing pattern was observed again in 2004. The decreasing and increasing pattern in SISS was also observed in SIAN, but there the decrease started in 1995. The changes in the annual trend of the water quality variables were influenced by many factors, for example certain hydrological events and/or developments in the river basin (Ravichandran 2003).

Quarterly changes
To plot the graphs in Fig. 9, all quarterly data for each parameter and each station were combined over all selected years from 1995-2006. For example, the quarterly data for SIDO for all selected years were combined for all stations, i.e. Station 1 to Station 6. The distribution of each parameter in each quarter can then be examined as shown in Fig. 9. From Fig. 9, the quarterly median values of SIDO, SIBOD, SIAN, SISS and WQI did not differ much. The median values for SIDO in quarter 3 and quarter 4 were nearly equal. However, SISS showed a decrease in quarter 4. The quarterly medians of SICOD and SIpH were generally similar for all quarters, except in quarter 4, where SICOD showed a slight increase in the quarterly median.

Outliers detection
The box plots in Fig. 10 show that most of the variables are skewed and therefore depart from normality. Many variables also have outliers and extreme values. To simplify the analysis, only variables with no outliers were selected, i.e. SIDO at Station 1-5, SIBOD at Station 5, SICOD at Station 1 and 5, SIAN at Station 2-5, SISS at Station 2-3, SIpH at Station 2 and WQI at Station 1, 4 and 5.

Normality test
Normality tests were applied to the selected variables in Table 6 at particular sampling stations. From Fig. 11-Fig. 15, the normal probability plots showed that SIDO at Station 2, 4 and 5, SIBOD, SICOD and SIAN at Station 5, SISS at Station 2, SICOD at Station 1 and SIAN at Station 4 were not normally distributed based on the Anderson-Darling test. Table 6 lists the parameters with the Anderson-Darling test statistics and the corresponding p-values in brackets.


Trend and seasonal analyses
Trend and seasonal analyses using scatter plots were performed for the selected variables and stations in Fig. 16, i.e. SIDO at Station 1 and 3, SIAN at Station 2 and 3, SISS at Station 3 and WQI at Station 1, 4 and 5. The scatter plots in Fig. 16 showed positive trend patterns in SIDO and WQI as well as quarter-to-quarter variations; hence, the model should include both trend and seasonal variations. The magnitudes of the seasonal variations were fairly constant around the level of the series, so an additive model is appropriate. To check the significance of trend and seasonality, regression analysis and decomposition analysis were carried out. The trend-only models in the regression analysis exclude the seasonal variation for both variables. Notice that the R² values for the trend-only models in Table 7 are between 16.70 and 38.20. To improve the forecast accuracy, seasonal variations for both variables were taken into account. Four indicator variables representing quarter 1 to quarter 4 were used to model the seasonal variations and to test whether the seasonality was statistically significant. In Minitab, the last indicator variable is removed because it is highly correlated with the first three indicator variables for both SIDO and WQI. The results in Table 8 show that the R² values for the trend-and-seasonal models increased to between 39.70 and 50.30. Although the increase was not substantial, it is acceptable for quarterly time series data.

Table 8. Results of regression trend-and-seasonal models analysis for SIDO and WQI in selected stations

On the other hand, Fig. 17 shows the results from the Minitab time series decomposition analysis of WQI, with the original data (labelled "Actual") along with the fitted line ("Trend") and the predicted values ("Fits") from the additive model, which includes both the trend and seasonal components. Details of the seasonal analysis are shown in Fig. 18 and Table 9.
Estimates of the quarterly variation from the trend line for each season (seasonal indices) are shown in Fig. 18a, with box plots of the actual differences shown in Fig. 18b. The percentage of variation by seasonal period is illustrated in Fig. 18c and the model residuals by seasonal period in Fig. 18d. Unfortunately, many decomposition methods do not perform significance tests on seasonal indices. Such tests are important for establishing that the seasonal variations really exist, since they confirm whether the indices are statistically different from zero. Since decomposition methods do not perform such tests, the easiest way to test significance is to create indicator variables, as was done in the previous regression analysis. Additional details of the component analysis are shown in Fig. 19. Fig. 19a is the original time series. Fig. 19b is the plot of the time series with the trend removed. Fig. 19c is a plot of the time series with the seasonality removed (which should show a trend pattern) and Fig. 19d is a residual plot of the detrended and seasonally adjusted data. The wave-like pattern in Fig. 19d suggests constant variance over time in WQI at Station 5.

Forecasting model evaluation
The measures of forecast accuracy were evaluated as part of the model validation effort (Montgomery et al 2008). To evaluate the resulting model in Section 5.4, the accuracy measures of the model were determined as summarized in Table 9. Three measures are reported in this study: Mean Absolute Percentage Error (MAPE), Mean Absolute Deviation (MAD) and Mean Squared Error (MSE, labelled MSD in Minitab). Focusing on the WQI model at Station 5, the accuracy measures were 20% for MAPE, 9.05 for MAD and 116.88 for MSE. Small variability in forecast errors is preferred in all forecasting models, but whether a given forecast error (i.e. residual) counts as large or small is subjective. Therefore, the results of MAPE, MAD and MSE in this case can be reasonably accepted. Further, a normality test on the distribution of the forecast errors was carried out. From Fig. 20, the p-value is 0.211, so the hypothesis of normality for the forecast errors could not be rejected at the 0.05 level. Forecasts that adequately model all the structure in the data produce a sequence of forecast errors with no systematic or nonrandom pattern (Montgomery et al 2008), as shown in Fig. 21. From Fig. 22, all spikes of the sample autocorrelation function (ACF) at lower lags are inside the confidence interval limits. This suggests that there is no pattern in the forecast errors and supports the claim that the residuals are not correlated. Since quarterly data were used, the forecast values in Table 10, together with the MSD value, are considered reasonable for this model.
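The three accuracy measures and the sample autocorrelation of the forecast errors can be computed as in the following minimal sketch. The function names and the four data points are hypothetical and serve only to show the arithmetic; they are not values from Table 9 or Table 10.

```python
def forecast_accuracy(actual, fitted):
    """Return (MAPE in percent, MAD, MSE) for paired actual and fitted values."""
    errors = [a - f for a, f in zip(actual, fitted)]
    n = len(errors)
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    mad = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n  # Minitab reports this as MSD
    return mape, mad, mse

def autocorrelation(errors, lag):
    """Sample autocorrelation of the forecast errors at the given lag."""
    n = len(errors)
    m = sum(errors) / n
    denom = sum((e - m) ** 2 for e in errors)
    num = sum((errors[t] - m) * (errors[t + lag] - m) for t in range(n - lag))
    return num / denom

# Hypothetical actual and fitted index values.
actual = [50, 60, 80, 40]
fitted = [45, 66, 76, 44]
mape, mad, mse = forecast_accuracy(actual, fitted)
residuals = [a - f for a, f in zip(actual, fitted)]
r1 = autocorrelation(residuals, 1)
```

Here the errors are 5, -6, 4 and -4, giving a MAPE of 8.75%, a MAD of 4.75 and an MSE of 23.25. MAPE is undefined when an actual value is zero, which is worth checking before applying it to other variables.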

Conclusion
In this study, various techniques were utilized to evaluate temporal variations in the surface water quality of Langat River. From the box plot and median analysis, the annual changes of SIDO and WQI showed an oscillating pattern with a noticeable increase. The annual variation of SICOD, SIBOD, SIAN and SISS showed very little or no trend. SIpH exhibited no trend and its quarterly values did not differ much. The quarterly analyses showed that the medians of SIDO, SIBOD, SIAN, SIpH and WQI were lower in quarter 2 than the median of SISS in quarter 4. Significant trends in water quality were found in SIDO at Station 1 and 3 and in WQI at Station 1, 4 and 5. However, the effects of quarters appear to be prominent only in WQI at Station 5. WQI is the most significant variable contributing to water quality variations for all quarters. Therefore, further analysis should be carried out to study the relationship between the location of the station (i.e. Station 5), the sampling time and the factors that strongly influence WQI, such as urbanization, population density, water shortages and pollution (Cheng et al 2003). In addition, Sapari et al (2009) mentioned that urban NPS pollution has become a growing concern for most major towns in Malaysia due to the serious threat it poses to river water quality in urban environments. Since this study focuses only on the sub indices and the index, it would be additionally informative to repeat the same analyses on all available variables with longer data sets (i.e. monthly data in 1995-2006). Longer data sets could provide a clearer indication of the trend and seasonal patterns inherent in the time series data. Imputation methods should also be considered to overcome the problem of unequal spacing in the measurements.