## Abstract

In recent years, the discharge of synthetic dye waste from various industries has caused aquatic and environmental pollution and has become a serious global concern. Hence, the prediction of dye removal plays an important role in wastewater management and the conservation of nature. Artificial intelligence methods are popular owing to their ease of use and high level of accuracy. This chapter presents a detailed review of artificial intelligence-based methods for predicting dye removal, particularly multiple linear regression (MLR), artificial neural networks (ANNs), and least squares-support vector machine (LS-SVM). Furthermore, this chapter focuses on ensemble prediction models (EPMs) used for dye removal prediction. EPMs improve the prediction accuracy by integrating several prediction models. The principles, advantages, disadvantages, and applications of these artificial intelligence-based methods are explained in this chapter. Finally, future directions of research on artificial intelligence-based dye removal prediction methods are discussed.

### Keywords

- multiple linear regression (MLR)
- artificial neural networks (ANNs)
- least squares-support vector machine (LS-SVM)

## 1. Introduction

Recently, pollution of water sources by various contaminants has become a global environmental issue [1]. Among the different types of water contaminants, dyes, which are part of everyday human life, are a major contamination group [2]. Dyes are widely used as coloring agents in textiles, plastics, paper, leather, food, antiseptics, cosmetics, fungicides, and so forth, and can enter the environment through colored wastewater from these industries [3]. During the coloration processes, a significant amount (20–50%) of these dyes is lost and released into the environment as colored wastewater. Owing to their toxicity, carcinogenicity, mutagenic and teratogenic properties, and persistence in the environment, these contaminants have become a great environmental concern with potential adverse effects on human health [4]. They can enter the body via the pulmonary system or the digestive system through ingestion of contaminated water or food [5, 6, 7]. They can also impair photosynthesis by reducing light penetration, which lowers oxygen levels in water and, in severe cases, leads to the suffocation of aquatic flora and fauna.

Therefore, dyes must be removed from wastewater before it is discharged into bodies of water. A number of processes such as ozonation, filtration, membrane processes, coagulation, precipitation, adsorption, electrochemical techniques, and biosorption have been applied to treat colored textile wastewater [8, 9, 10, 11]. However, most of these methods have shown various restrictions, including the generation of huge amounts of sludge by electrochemical and chemical coagulation processes and the need for advanced, costly technology in membrane processes and advanced oxidation processes [12]. Among the different treatment methods, adsorption has attracted particular attention from researchers worldwide because it is an easy-to-operate, effective, simple, and cost-effective option for pollutant removal from aqueous environments.

Although adsorption is easy to operate, it is chemically a complicated process that depends on several factors with a direct impact on process performance. Thus, it is vital to select an appropriate mathematical model for optimizing and predicting the removal process. The modeling and optimization of adsorption are still at the research stage. Commonly used models for describing kinetic and/or equilibrium studies (e.g., second-order kinetic models or the Langmuir and Freundlich isotherms) may be inadequate for determining the relationship between factors and evaluating their effect on the adsorption process. Optimization is a way of determining the best solution in terms of certain quality criteria, such as process efficiency, and results in improved performance of the process or designed system [13, 14].

The optimization of the adsorption process tries to find out the design and/or environmental parameters at which the adsorption process would give the best efficiency (Figure 1) [15].

Typically, experiments are carried out so that one factor is varied and analyzed while the other factors are held constant. This method is called one variable at a time (OVAT) [16]. The approach is time-consuming: the researcher must screen all the variables independently, which requires a large number of tests and leads to a high cost of study. In addition, because only one variable changes at a time, the approach cannot capture interactions between the selected parameters, so the optimal conditions may not be reached.

Multivariate statistical techniques (MST) can significantly reduce the number of experiments and explain the effects of the independent variables (individually or in combination) on the process. MST helps to develop and optimize the operating system, significantly reducing the cost of testing [17].

Adsorption is a complex process; because of the complexity of the relationship between its input and output parameters, it is difficult to model using statistical approaches. Computational intelligence models are often more flexible than statistical models when modeling complex datasets with possible nonlinearities or missing data [18]. Recently, powerful AI prediction methods such as random forest (RF), adaptive neuro-fuzzy inference system (ANFIS), least square-support vector regression (LS-SVR), radial basis function neural network (RBF-NN), boosted regression tree (BRT), and artificial neural network (ANN) have been successfully used in modeling the adsorption process [18, 19, 20, 21, 22, 23, 24]. AI is the branch of computer science concerned with making computers behave like humans. The term was coined by John McCarthy in connection with the 1956 Dartmouth conference. AI can be employed to explain and model many complex chemical systems because of its reliability, robustness, simplicity, and ability to handle nonlinearity. AI approaches can learn from experimental data to resolve complex nonlinear, multidimensional functional relationships without any prior assumptions about their nature [25].

Previous reviews of adsorption procedures and of engineering applications of AI indicate that there is no specific review on the use of AI for the adsorption process.

The two main objectives of this study are (i) to summarize research on modeling the adsorption of dyes with AI models and (ii) to identify further research needs for AI in dye adsorption.

## 2. AI definition

AI is a subcategory of computer science. Its goal is to enable the development of computers that are able to do things normally done by people. John McCarthy, later a Stanford researcher, named the field in 1956 at the Dartmouth conference, where the mission of the AI field was defined. Starting from this definition, any program can be considered AI if it does something that we would usually regard as intelligent in humans.

A typical AI-based prediction method consists of four main steps. The first step is to acquire input and output data. Input data are those aspects that affect or relate to the output data. These aspects include, but are not limited to, pH, sonication time, adsorbent dose, temperature, and initial concentration of the contaminant. The output data is the removal percentage. The second step is to preprocess the collected data into an appropriate format before using them to train the prediction model. Data preprocessing techniques such as data normalization, data transformation, and data interpolation are applied at this stage to improve data quality and reduce the negative impact of poor-quality data. When the data are ready, the third step is to train the prediction model.
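The preprocessing step can be illustrated with a minimal sketch. It assumes min-max normalization as the normalization technique and a simple random hold-out split; both are choices made here for illustration, not something the reviewed studies prescribe. The function names and the numbers in `runs` are hypothetical.

```python
import random

def min_max_scale(column):
    """Rescale one column of values to the range [0, 1]."""
    low, high = min(column), max(column)
    if high == low:  # constant column: nothing to scale
        return [0.0 for _ in column]
    return [(value - low) / (high - low) for value in column]

def split_train_test(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows reproducibly and hold out a fraction for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical experimental runs (illustrative numbers, not measured data):
# (pH, sonication time in min, adsorbent dose in g, removal %)
runs = [
    (2.0, 2.0, 0.005, 41.0),
    (4.0, 3.0, 0.010, 58.0),
    (6.0, 4.0, 0.015, 77.0),
    (8.0, 5.0, 0.020, 85.0),
    (10.0, 6.0, 0.025, 80.0),
    (6.0, 2.0, 0.025, 69.0),
    (6.0, 6.0, 0.005, 63.0),
    (2.0, 6.0, 0.025, 72.0),
]

# Scale every input column; keep the output (removal %) unchanged.
columns = list(zip(*runs))
scaled_inputs = [min_max_scale(col) for col in columns[:3]]
dataset = list(zip(*scaled_inputs, columns[3]))

# Hold out part of the data for the later testing step.
train_rows, test_rows = split_train_test(dataset)
print(len(train_rows), "training rows,", len(test_rows), "test rows")
```

Scaling each input to a common range keeps a factor with large numerical values (such as pH) from dominating one with small values (such as adsorbent dose in grams).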

Since the crucial concept of empirical modeling is learning from historical data, a training process is essential for the development of the model. This step is achieved by selecting appropriate parameters for the model. The type of parameters is determined by the algorithm selected by the researcher, and selecting proper parameter values is what ensures good model performance. The last step is testing the model. At this stage, the test data are used to examine the prediction performance of the model. Performance indicators such as root mean square error (RMSE), squared correlation coefficient (R^{2}), mean absolute error (MAE), and absolute average deviation (AAD) are used to evaluate performance.
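The four performance indicators can be computed as in the following sketch. The formulas are the commonly used definitions and are assumptions here: R^{2} is taken as the coefficient of determination, and AAD as the mean absolute relative deviation expressed in percent. Individual studies may define them slightly differently. The example values are hypothetical.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def aad(y_true, y_pred):
    """Absolute average deviation in percent (assumed definition)."""
    n = len(y_true)
    return 100.0 / n * sum(abs((p - t) / t) for t, p in zip(y_true, y_pred))

# Hypothetical experimental and predicted removal percentages.
experimental = [90.0, 80.0, 70.0, 60.0]
predicted = [88.0, 82.0, 71.0, 59.0]
for name, metric in [("RMSE", rmse), ("MAE", mae), ("R2", r_squared), ("AAD", aad)]:
    print(name, round(metric(experimental, predicted), 4))
```

Lower RMSE, MAE, and AAD and an R^{2} closer to 1 indicate a better agreement between predicted and experimental removal percentages.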

AI-based prediction methods can be further classified into four types according to their learning algorithms: multiple linear regression (MLR), artificial neural network (ANN), least square-support vector machine (LS-SVM), and ensemble prediction models. The remainder of this section describes the main techniques used for AI-based prediction models.

### 2.1. Multiple linear regression

MLR is a statistical technique that uses several explanatory variables to predict a response variable. The goal of MLR is to model the relationship between the input and response variables [26, 27].

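The model for MLR, given n observations, is

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i, \qquad i = 1, 2, \ldots, n,$$

where $y_i$ is the response for observation $i$, $x_{i1}, \ldots, x_{ip}$ are the $p$ explanatory variables, $\beta_0, \ldots, \beta_p$ are the regression coefficients, and $\varepsilon_i$ is the error term.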

Response surface methodology (RSM) is the most popular method used in the adsorption research literature. RSM determines the mathematical relation between parameters and responses. Modeling, or model fitting, in RSM consists of two steps: coding of the experimental data and regression. In the first step, the experimental data are coded using a general coding equation, because RSM operates on coded input values such as +1, 0, and −1 instead of actual values [28].
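The coding step can be sketched as follows. The sketch assumes the usual linear coding rule, in which the low and high levels of a factor map to −1 and +1 and the center point maps to 0; the source does not reproduce its coding equation, so this rule and the function names are assumptions. The pH range 2–10 is used only as an illustration.

```python
def code_level(actual, low, high):
    """Map an actual factor value to the coded scale (low -> -1, high -> +1)."""
    center = (high + low) / 2.0
    half_range = (high - low) / 2.0
    return (actual - center) / half_range

def decode_level(coded, low, high):
    """Inverse mapping: coded value back to the actual factor value."""
    center = (high + low) / 2.0
    half_range = (high - low) / 2.0
    return center + coded * half_range

# Illustration with a pH factor studied between 2 and 10.
for ph in (2.0, 6.0, 10.0):
    print(ph, "->", code_level(ph, 2.0, 10.0))
```

Coding puts all factors on the same dimensionless scale, so the fitted regression coefficients can be compared directly.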

In the next step, the coded experimental data are fitted to a selected model using multiple linear regression (MLR). In spite of its simplicity, MLR has been used in recent studies of removal processes. The first reported work was conducted by Annadurai in 2000. Annadurai [29] developed regression models to predict the removal of Direct Scarlet B. The inputs for the regression models included the pH, the temperature, and the particle size. The proposed models showed promise as easy and efficient forecast tools for calculating the percentage of dye removed from aqueous solution. More recently, our group simplified the MLR model by introducing only four inputs, namely, pH, sonication time, adsorbent dose, and the initial dye concentration. The results indicated that the proposed method could not predict the dye removal percentage well [13, 30, 31, 32, 33, 34, 35].
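As an illustration of the regression step, the sketch below fits a first-order MLR model, response = b0 + b1·x1 + ... + bp·xp, by ordinary least squares using the normal equations. It is written in plain Python to stay self-contained; in practice a numerical library would normally be used. The helper names are hypothetical, and the data are synthetic values generated from a known linear rule, not experimental results.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_mlr(X, y):
    """Return [b0, b1, ..., bp] minimizing the sum of squared residuals."""
    design = [[1.0] + list(row) for row in X]  # leading 1 for the intercept
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(design, y)) for i in range(p)]
    return solve_linear(xtx, xty)

def predict_mlr(coefficients, row):
    return coefficients[0] + sum(c * v for c, v in zip(coefficients[1:], row))

# Synthetic coded design (two factors) with responses from 70 + 5*x1 - 3*x2.
X = [(-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0), (1.0, 1.0), (0.0, 0.0)]
y = [68.0, 78.0, 62.0, 72.0, 70.0]
coefficients = fit_mlr(X, y)
print([round(c, 6) for c in coefficients])
```

Because the synthetic responses are exactly linear, the fit recovers the generating coefficients; real adsorption data with curvature or interactions would leave systematic residuals, which is the limitation of MLR discussed in this section.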

Ease of use is one of the main advantages of the MLR method because no parameter needs to be adjusted. Moreover, since no detailed physical information is required, the method is efficient and cost-effective. Nonetheless, the main constraint of MLR is its inability to deal with nonlinear problems, although previous research has shown that MLR can be used as an efficient tool for predicting percentage removal [10, 25, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48].

### 2.2. Artificial neural network (ANN)

ANNs are computing systems inspired by the biological neural networks that constitute animal brains [25, 49]. An ANN is based on a collection of connected units or nodes called artificial neurons (analogous to biological neurons in an animal brain). Each connection between artificial neurons can transmit a signal from one to another. The artificial neuron (AN) that receives the signal can process it and then signal the artificial neurons connected to it. ANs and their connections typically have a weight that increases or decreases the strength of the signal at a connection. Signals travel from the first (input) layer to the last (output) layer, possibly after traversing the layers multiple times. The original goal of the ANN approach was to solve problems in the same way that a human brain would. Over time, attention has focused on matching specific mental abilities, leading to deviations from biology. ANNs have been used on a variety of tasks, including speech recognition, computer vision, social network filtering, machine translation, playing board and video games, and medical diagnosis.
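The signal flow described above can be sketched as a small feed-forward pass. The sketch assumes a fully connected network with a sigmoid hidden layer and a linear output neuron; the layer sizes and weights are arbitrary, untrained placeholders chosen for illustration only.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def identity(z):
    return z

def neuron(inputs, weights, bias, activation):
    """Weighted sum of the incoming signals followed by an activation."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(weighted_sum)

def forward(inputs, layers):
    """Propagate a signal from the input layer to the output layer.

    layers is a list of (weight_rows, biases, activation), one per layer;
    each weight row holds the incoming weights of one neuron.
    """
    signal = list(inputs)
    for weight_rows, biases, activation in layers:
        signal = [neuron(signal, w, b, activation)
                  for w, b in zip(weight_rows, biases)]
    return signal

# Hypothetical, untrained network: 3 inputs -> 2 hidden neurons -> 1 output.
example_layers = [
    ([[0.4, -0.2, 0.1], [-0.3, 0.5, 0.2]], [0.0, 0.1], sigmoid),
    ([[1.5, -0.7]], [0.2], identity),
]
scaled_inputs = [0.5, 0.25, 1.0]  # e.g. scaled time, concentration, adsorbent mass
print(forward(scaled_inputs, example_layers))
```

Training, for example by backpropagation, would adjust the weights and biases so that the output approaches the measured removal percentage; that step is omitted here.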

In the past two decades, many studies have applied ANNs to predict dye decolorization from aqueous solution by various processes, such as electrocoagulation [49], the Fenton process [50], and adsorption [51]. M. Ahmadi and Kh. Naderi applied a general regression neural network (GRNN) to predict the removal of methylene blue (MB) and Basic Yellow 28 (BY28) from aqueous solution. Their findings indicated that a well-designed GRNN is able to predict the removal of azo dye based on sonication time, initial dye concentration, and adsorbent mass. Ahmadi and J. Pooralhossini used a backpropagation neural network (BPNN) to predict the decolorization of sunset yellow (SY) and disulfine blue (DB) [52]. The obtained results show that the BPNN model outperforms the classical statistical model in terms of R^{2}, RMSE, MAE, and AAD for both dyes. Ahmadi and team used a BPNN to predict the efficiency of adsorption of two carcinogenic dyes (methylene blue (MB) and malachite green (MG)) onto Mn@CuS/ZnS nanocomposite-loaded activated carbon (Mn@CuS/ZnS-NC-AC) as a novel adsorbent and identified the model parameters in order to improve the prediction performance [35]. Ahmadi and Dastkhoon used a neural network to predict Safranin-O (SO) and indigo carmine (IC) adsorption onto Ni:FeO(OH)-NWs-AC. In this work, the influence of process variables (initial dye concentration, adsorbent mass, and sonication time) on the removal of both dyes was investigated by central composite rotatable design (CCRD) of RSM, a multilayer perceptron (MLP) neural network, and the Doolittle factorization algorithm (DFA). The ANN model was found to be more precise than the other models. A sensitivity analysis (based on the neuron weights) confirmed that sonication time is the essential factor affecting the removal of SO and IC [33]. Ahmadi et al. developed a BP neural network model and a partial least squares (PLS) model to predict the ultrasonic-assisted simultaneous removal of fast green (FG), eosin Y (EY), and quinoline yellow (QY) from aqueous media using MOF-5 as a metal organic framework and activated carbon hybrid (AC-MOF-5). The obtained results show that the ANN and PLS models are powerful tools for predicting the adsorption of the dyes under study by AC-MOF-5 [53].

The main advantage of the ANN method is its ability to detect complex nonlinear relationships between the inputs and outputs, which makes it applicable to real systems. However, the ANN method does not establish any explicit relationship between the physical parameters of the process and the removal percentage, which limits the model’s fitting ability.

### 2.3. Least square-support vector machine (LS-SVM)

SVM, a learning method developed by Vapnik, is a powerful tool [34, 54]. This supervised learning method can be used for classification, regression, and density estimation in nonlinear models, but it leads to complex optimization problems, typically quadratic programming. Consequently, SVM is often time-consuming and difficult to adapt, and it suffers from large memory and CPU time requirements when trained in batch mode. This limitation is overcome by LS-SVM, a modified version of SVM that solves a set of linear equations instead of a quadratic programming problem, which reduces the complexity of the optimization process. The theory and further details of SVM and LS-SVM can be found in the literature.
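The "set of linear equations" can be made concrete with a minimal LS-SVM regression sketch. It assumes the standard LS-SVM formulation, in which the bias b and the coefficients alpha solve the system [[0, 1ᵀ], [1, K + I/gamma]] · [b; alpha] = [0; y], and it assumes an RBF kernel. The values of gamma and sigma, the helper names, and the data are hypothetical; the code is plain Python for self-containment rather than speed.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def rbf_kernel(u, v, sigma):
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))

def fit_lssvm(X, y, gamma=100.0, sigma=1.0):
    """Build and solve the LS-SVM linear system; return (bias, alphas)."""
    n = len(X)
    system = [[0.0] + [1.0] * n]  # first row encodes sum(alpha) = 0
    for i in range(n):
        row = [1.0] + [rbf_kernel(X[i], X[j], sigma) for j in range(n)]
        row[i + 1] += 1.0 / gamma  # regularization on the diagonal
        system.append(row)
    solution = solve_linear(system, [0.0] + list(y))
    return solution[0], solution[1:]

def predict_lssvm(x, X, bias, alphas, sigma=1.0):
    return bias + sum(a * rbf_kernel(x, xi, sigma) for a, xi in zip(alphas, X))

# Hypothetical one-factor data (scaled input, scaled response).
X = [(0.0,), (1.0,), (2.0,), (3.0,)]
y = [0.0, 0.8, 0.9, 0.1]
bias, alphas = fit_lssvm(X, y)
print(round(predict_lssvm((1.5,), X, bias, alphas), 4))
```

Training therefore reduces to one linear solve, whereas a standard SVM requires a quadratic programming solver. The choice of kernel and of gamma and sigma still has to be made by the user.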

Liang and team first applied LS-SVM in the area of water quality measurements in 2011. They showed that the model fitted the training data relatively well and that its predictions were relatively satisfactory. The model has not only good learning accuracy but also good generalization ability. The predictive fitting precision on the test data set was more than 90%, with a minimum prediction error and an RMSE of 0.0028 [55].

Similarly, our group used LS-SVM for the optimization and/or modeling of pH, ZnS-NPs-AC mass, MB concentration, and sonication time to develop respective predictive equations for the simulation of the efficiency of MB adsorption. The results show that LS-SVM, as a nonlinear approach, performs better than central composite design (CCD) in predicting MB adsorption [34].

Niyaz Mohammad Mahmoodi and team used a least square-support vector machine (LS-SVM) to model dye removal. The graphical plots and the values of the statistical parameters showed LS-SVM to be an intelligent model suitable for modeling dye adsorption [56].

Our group compared LS-SVM with other AI-based prediction methods in dye removal prediction. We compared SVR with several ANN models for predicting the adsorption of methylene blue (MB) from aqueous solutions by zinc sulfide nanoparticles with activated carbon (ZnS-NPs-AC). Also, Ghaedi et al. used a multiple linear regression (MLR) model and an LS-SVM model, with principal component analysis (PCA) for preprocessing, to predict the efficiency of methylene blue adsorption onto copper oxide nanoparticles loaded on activated carbon (CuO-NP-AC). Both studies indicate that SVM performs better in model building and prediction than other AI-based prediction methods [57].

#### 2.3.1. Advantages and limitations

The main advantage of LS-SVM, as noted by Foucquier et al. [58], is that it is based on the structural risk minimization principle, which aims to minimize an upper bound on the generalization error consisting of the sum of the training error and a confidence term. Also, SVM provides a better balance between prediction accuracy and computation speed compared with ANNs and RSM [34]. The limitation of the SVM method is the determination of the kernel function. There is no uniform standard for determining which kernel will result in the most accurate SVM. Researchers have to determine the kernel function based on the characteristics of the data as well as their own experience.

### 2.4. Ensemble prediction models

Because each prediction method has its own limitations, a trend has evolved toward new mathematical methods called ensemble learning. These methods have been used increasingly in analytical chemistry for complex data analysis. The adaptability of data mining methods enables them to deal with typical problems: too many descriptors in the model, mixtures of different data types, complex data with missing values, multiple classes, or unbalanced data sets. For instance, data mining methods such as support vector machine [34] and artificial neural networks (ANNs) [59] have been applied in different realms of science and engineering. Ensemble prediction has become increasingly popular in chemistry [60, 61].

This method differs from the individual prediction methods because it creates a composite model that integrates different individual prediction models. Rather than being a prediction algorithm itself, it acts as a framework that reduces prediction errors by combining different prediction algorithms. Ensemble prediction methods have been successfully applied in several areas of chemistry that involve large data volumes, including spectroscopy [62, 63], quantitative structure–activity relationship (QSAR) modeling [64, 65], and omics sciences [66]. Ensemble prediction can handle many types of responses and predictors, such as categorical or numeric, and loss functions such as Laplace, Gaussian, Poisson, and Bernoulli [67]. De’ath [68] has shown that ensemble prediction methods, unlike many other regression methods, can be used for both prediction and explanation of the underlying relationships between response and predictors.

The two main outputs of an ensemble prediction model, the partial dependence plots and the variable importance rankings, can be used together for model interpretation. Friedman [69] has proposed that while these outputs might not offer a complete description, they can at least give insight into the relation between the response and the predictors.
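The idea behind a partial dependence plot can be sketched in a few lines: one predictor is fixed at each value of a grid, the model is evaluated for every observation with that value substituted, and the predictions are averaged. The function below works with any prediction function; `toy_model` is a hypothetical stand-in for a fitted ensemble, and the rows are illustrative values.

```python
def partial_dependence(predict, rows, feature_index, grid):
    """Average prediction over the data with one feature fixed at each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature_index] = value  # overwrite the feature of interest
            total += predict(modified)
        curve.append(total / len(rows))
    return curve

def toy_model(row):
    """Hypothetical fitted model: removal % from (sonication time, dye concentration)."""
    time_min, concentration = row
    return 50.0 + 4.0 * time_min - concentration ** 2

# Illustrative observations: (sonication time in min, dye concentration).
rows = [(2.0, 0.0), (4.0, 1.0), (6.0, 2.0)]
grid = [2.0, 3.0, 4.0, 5.0, 6.0]
print(partial_dependence(toy_model, rows, 0, grid))
```

Plotting the returned curve against the grid gives the partial dependence plot for that predictor. Variable importance rankings are computed differently, typically from how much each predictor contributes to reducing the error inside the fitted trees, and are not shown here.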

Some authors (e.g., Hastie et al. [70]) have shown that ensemble prediction is one of the most powerful machine/statistical learning ideas that were presented during the 1990s, and it has been suggested [71, 72] that applying ensemble prediction to individual classifiers (e.g., classification trees, regression trees) results in models that are generally competitive with any other method.

In addition, Breiman [71] shows that applying ensemble prediction to classification and regression trees can actually be much quicker than fitting a neural net classifier.

Owing to its high prediction accuracy, ensemble learning has become a popular topic in recent years and has already been applied successfully in many fields. For example, our group [73] developed a boosted regression tree (BRT) model; BRT is an ensemble method for fitting statistical models that differs fundamentally from conventional techniques that aim to fit a single parsimonious model. In this study, response surface methodology (RSM), artificial neural network (ANN), and BRT were used for the optimization and/or modeling of stirring time (min), pH, adsorbent mass (mg), and concentrations of MB and Cd^{2+} ions (mg L^{−1}) to develop respective predictive equations for simulating the efficiency of MB and Cd^{2+} adsorption based on an experimental data set obtained in a batch study. All three models showed good predictions in this study. However, the BRT model was more precise than the other models, which showed that BRT is a powerful tool for modeling and optimizing the removal of MB and Cd(II).
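How a boosted regression tree combines many simple models can be illustrated with a deliberately small sketch. It assumes squared-error loss and uses one-split trees (stumps) as base learners, each fitted to the residuals of the current ensemble and added with a small learning rate. This is a generic gradient boosting sketch, not the implementation used in [73]; the function names, settings, and data are hypothetical.

```python
def fit_stump(X, residuals):
    """Return (feature, threshold, left_mean, right_mean) with the lowest squared error."""
    mean_all = sum(residuals) / len(residuals)
    best_error, best_stump = None, (0, float("inf"), mean_all, mean_all)
    for feature in range(len(X[0])):
        values = sorted(set(row[feature] for row in X))
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2.0
            left = [r for row, r in zip(X, residuals) if row[feature] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[feature] > threshold]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            error = (sum((r - left_mean) ** 2 for r in left)
                     + sum((r - right_mean) ** 2 for r in right))
            if best_error is None or error < best_error:
                best_error = error
                best_stump = (feature, threshold, left_mean, right_mean)
    return best_stump

def predict_stump(stump, row):
    feature, threshold, left_mean, right_mean = stump
    return left_mean if row[feature] <= threshold else right_mean

def fit_brt(X, y, n_trees=50, learning_rate=0.1):
    """Fit stumps sequentially, each one on the residuals of the current ensemble."""
    base = sum(y) / len(y)
    predictions = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [t - p for t, p in zip(y, predictions)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, row)
                       for p, row in zip(predictions, X)]
    return base, learning_rate, stumps

def predict_brt(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)

# Hypothetical data: (pH, adsorbent mass in mg) -> removal %.
X = [(2.0, 5.0), (4.0, 10.0), (6.0, 15.0), (8.0, 20.0), (6.0, 5.0), (4.0, 20.0)]
y = [35.0, 55.0, 80.0, 90.0, 60.0, 75.0]
model = fit_brt(X, y)
print(round(predict_brt(model, (6.0, 15.0)), 2))
```

The small learning rate means that no single tree dominates, which is the sense in which BRT differs from fitting one parsimonious model. Practical implementations add deeper trees, random subsampling, and a validation-based choice of the number of trees.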

Similarly, our group [32] used adaptive network-based fuzzy inference system (ANFIS) ensemble models as a support tool for examining data and predicting the removal percentage in MB and SY dye solutions of different concentrations. The predictive capabilities of MLR and ANFIS were compared in terms of squared correlation coefficient (R^{2}), root mean square error (RMSE), mean absolute error (MAE), and absolute average deviation (AAD) against the empirical data. The ANFIS model was found to show better prediction accuracy than the CCD model.

In another work by our group [13], random forest (RF) and response surface methodology (RSM) were used to model and predict the efficiency of malachite green removal from aqueous solution by ultrasound-assisted adsorption onto silver hydroxide nanoparticles loaded on activated carbon (AgOH-NPs-AC). The parameters involved in the adsorption process, namely pH, initial MG concentration, sonication time, and adsorbent dosage, were set within the ranges 2.0–10, 4–20 mg L^{−1}, 2–6 min, and 0.005–0.025 g, respectively. The performance of the RF and CCD models in describing the experimental data was evaluated in terms of R^{2}, RMSE, MAE, and AAD. The obtained results showed that the RF model outperformed the classical statistical model for modeling the process of dye adsorption.

Also, ensemble prediction approaches (i.e., radial basis function neural network (RBF-NN) and random forest (RF)) were developed and evaluated against a quadratic response surface model to predict the maximum removal efficiency of brilliant green (BG) from aqueous media in relation to BG concentration (4–20 mg L^{−1}), sonication time (2–6 min), and ZnS-NP-AC mass (0.010–0.030 g) by ultrasound-assisted adsorption [31]. All three models (i.e., RBF network, RF, and polynomial) were compared against the experimental data using four statistical indices, namely, R^{2}, RMSE, MAE, and AAD. Graphical plots were also used for model comparison. The results obtained using the RBF network and RF exhibit better performance than MLR.

The main advantage of the ensemble method is the improvement in accuracy. These methods also offer important advantages such as accommodating missing data and handling different types of predictor variables. In addition, they have no need for prior data transformation or elimination of outliers, can fit complex nonlinear relationships, and can automatically handle interaction effects between predictors. However, compared with other predictive methods, ensemble models require more computation time and a high level of expertise because they combine different base models. Another disadvantage is that the predictive performance depends on the selection of base models. In previous studies, the researchers selected the base models based on their own prior knowledge. There is no established approach for determining which base models should be considered and included in the ensemble model.

## 3. Discussion

According to previous research, each type of AI-based prediction method has its own advantages and disadvantages; thus scientists have to select a suitable method for their problem. For example, MLR is appropriate for predicting dye removal when ease of use and high calculation speed are the priority, whereas LS-SVM and ANNs are more suitable for real systems with high nonlinearity because of their high level of prediction accuracy. In addition, some researchers have compared AI-based prediction methods with other methods in dye removal prediction. For example, Tanzifi [74] compared ANNs with RSM for predicting the removal of Amido Black 10B. The comparison of the adsorption efficiencies obtained by the ANN model with the experimental data showed that the ANN model could estimate the behavior of the Amido Black 10B dye adsorption process under various conditions. Their study proposed the ANN model as a simple and efficient tool for predicting dye removal. Based on this research, the advantages and disadvantages of AI-based prediction methods are summarized below.

### 3.1. Advantages

The advantages of AI models are:

- Compared with other engineering approaches, building and/or developing AI-based prediction methods does not need any detailed physical information, which in turn saves both cost and time in carrying out the prediction.
- Based on previous studies, AI methods give promising prediction accuracy if the model is well trained.
- The data acquisition and data loading process is relatively simple, which means the prediction model can be easily obtained.

## 4. Conclusion

There are many published papers on the prediction of dye adsorption using OVAT. In this chapter, however, we review the important research studies on dye adsorption forecasting using AI methods. The literature survey in this chapter showed that AI approaches can be successfully used for the modeling and prediction of the dye adsorption process with acceptable accuracy compared to conventional linear models such as RSM. The future research directions proposed for AI in the field of dye removal are as follows:

- The prediction capability of other AI models such as group method of data handling (GMDH), random forest (RF), neural gas network, regression tree, and radial basis function network (RBFN) for dye adsorption needs further study.
- The hybridization of AI methods, such as LS-SVM and GMDH forecasting methods, is proposed in view of its potential for predicting dye adsorption.
- A few studies have been reported on combining ANN approaches with optimization algorithms. However, it is necessary to extend the optimization of the network configuration for modeling the adsorption process using evolutionary computation methods such as ant colony optimization, particle swarm optimization (PSO), genetic algorithm (GA), tabu search, artificial bee colony, firefly algorithm, teaching-learning-based optimization, harmony search, shuffled frog-leaping algorithm, simulated annealing, and invasive weed optimization.

Based on the reported studies and the discussion presented in this chapter, it can be concluded that AI methods are excellent approaches for modeling the adsorption of dyes. The information offered in this chapter should be highly useful to scientists working in the field of dye adsorption and AI in their investigations.