Open access peer-reviewed chapter

Soft Sensors for Biomass Monitoring during Low Cost Cellulase Production

By Chitra Murugan

Submitted: June 1st 2020. Reviewed: January 15th 2021. Published: February 10th 2021.

DOI: 10.5772/intechopen.96027



Low-cost cellulase production has become a major challenge in recent years. The major hurdle in producing biofuel and other products from biomass is the lack of efficient, economically feasible cellulase. This hurdle can be addressed through proper monitoring and control of the bioprocess. To implement any control scheme, an accurate representation of the system in the form of a model is necessary. Modeling the fermentation process poses many challenges, such as inherently nonlinear dynamic behavior, process complexity due to the coexistence of viable and nonviable cells, and the presence of solid substrates. To meet these challenges, researchers have been developing new techniques for monitoring the process online and at-line. These newer techniques have paved the way for better control strategies that can be integrated with quality by design (QbD) and process analytical technology (PAT).


  • biomass
  • lignocellulosic substrates
  • hybrid model
  • NARX
  • LSTM

1. Introduction

Monitoring and control of biomass play a vital role during the fermentation process [1]. Low-cost production of ethanol or cellulase from lignocellulosic substances has been a topic of research for the past few years because it uses carbon sources from industrial or agricultural waste [2, 3]. These processes mostly involve fungi such as Aspergillus and Trichoderma. One of the main challenges in low-cost cellulase production is estimating biomass in the presence of insoluble solid substrates. Conventional methods such as dry weight filtration and optical density become unusable for fungal biomass estimation [4]. Methods such as DNA concentration monitoring and image analysis have been widely used. However, control of the fermentation process requires continuous monitoring of biomass.


2. Biomass estimation techniques

First principles models based on mass or energy balance equations are widely used in industry. Unstructured models such as the Monod model capture the process dynamics effectively only during the log phase, whereas structured models have more parameters and are difficult to use [5]. The kinetic parameters of a first principles model, such as specific growth rate and biomass yield, are found through laborious experimentation [6].

New process monitoring approaches use several online sensors to determine the concentration of viable biomass, which is useful for controlling the fermentation process. A simple on-line method for fungal biomass estimation based on agitation rate has been developed for DO-stat cultures. The estimator is based on changes in dissolved oxygen concentration during the initial transient and on the change in yield. The estimation parameters are found using the agitation rate at 20% DO concentration [7].

Dielectric spectroscopy has been widely used for monitoring biomass during submerged fermentation. The capacitance probe works on the principle of cell membrane polarization. When microbial cells in an ionic solution are subjected to an alternating electric field, they act as capacitors because the cell membrane restricts ion movement. The on-line capacitance value represents viable biomass, as dead cells do not polarize [8]. The Biomass Monitor™ model 214 M (Aber Instruments, Aberystwyth, UK), dual-frequency version (0.2–1.0 MHz and approximately 9.5 MHz), is a commonly used capacitance probe.

Another method for estimating viable biomass is radio-frequency (RF) impedance spectroscopy [9]. This method provides useful information on the live cell concentration in both fixed- and dual-frequency modes. On-line determination of the live cell concentration can be used to identify important changes in the process or to hold the biomass at a constant level. Electrochemical impedance spectroscopy (EIS) is another method used to monitor biomass during fermentation [10]. The increase in biomass during cultivation is proportional to the increase in the double layer capacitance (Cdl), determined at frequencies below 1 kHz. Cdl correlates well with cell density, and the method has been verified by comparison against different state-of-the-art biomass measurements. Since measurements in this frequency range are largely determined by the double layer region between the electrode and the medium, only minor interference from process parameters (aeration, stirring) is to be expected. Impedance spectroscopy at low frequencies is thus a powerful tool for cultivation monitoring. Although dielectric and impedance spectroscopy techniques have been reported in the literature, they require costly instrumentation.

In recent years, soft sensors have been widely used for biomass estimation. Soft sensors estimate an unknown state variable from other measured variables that influence it [11]. The data-driven methods widely used for soft-sensor modeling are support vector machines, multiple least squares support vector machines, neural networks, deep learning, fuzzy logic and probabilistic latent variable models.

Soft sensors based on Artificial Neural Networks (ANNs) can learn the nonlinearity of a process from experimental plant data and can therefore be used to estimate bioprocess states such as biomass concentration [12, 13]. The rapid development of algorithms and information technology is the major motivation behind the broad application of ANNs in research and development [14]. Currently, ANNs are employed for prediction in fields including process control, medicine, forensic science, biotechnology, weather forecasting, finance and investment, and food science. However, the use of ANNs in biofuel production is still in the early phases of its development. Microbial fermentations generally exhibit non-linear relationships, which can pose several problems during bioprocess modeling and optimization. Robust models such as ANNs help to capture this nonlinear behavior and thus provide a model that links the process inputs to the corresponding output parameters. Compared with other empirical models, neural networks are relatively less sensitive to noise and hence can be applied to process control systems with a higher level of uncertainty [15]. During batch and fed-batch operation, ANNs can be used effectively for estimation of biomass or product, optimization of the fed-batch run and online control of bioprocess systems [16, 17, 18]. ANNs are suitable for many applications, such as nonlinear filtering and prediction of outputs from inputs, and are widely used in the modeling of dynamic systems [19]. Several ANNs, such as the feed forward back-propagation neural network (FFBPNN) [20], Hopfield [21], radial basis function (RBF) [22], recurrent [23] and hybrid neural networks (HNN) [24], have found extensive application in bioprocess industries based on their functions. The BPNN with supervised learning has been reported in biofuel process modeling, as shown in Table 1.

Input parameters | Output parameters | ANN type | ANN structure | R2 value | References
pH | Biomass concentration | - | 1-2-1 | - | [3]
Sin (glucose) | Biomass concentration | HNN | 1-3-1 | - | [24]
Sin (glucose), sodium nitrate concentration, yeast extract concentration | Lipid productivity, biomass concentration | FFBPNN | 3-10-1 | 0.99 | [25]

Table 1.

Application of ANN in bioprocess industries.

Sin, initial substrate concentration; R2, coefficient of determination.

3. Biomass modeling methods

The United States Food and Drug Administration (FDA) has promoted online monitoring and closed-loop control of bioprocesses through its Process Analytical Technology (PAT) initiative. PAT emphasizes process understanding as the route to high-quality products, which can be achieved by designing accurate bioprocess models. Mathematical models are generally classified as black box, white box and gray box models.

3.1 Black box models

Black box models are input/output models that do not require a priori knowledge about the process and describe the system based on experimental data. An example of an input/output model whose output y depends on past and present inputs is given in Eq. (1) as follows:


where q is the backward shift operator for the polynomials A(q) and B(q) as given in Eq. (2), Eq. (3) and Eq. (4)


The input–output experimental data are used to determine the values of the coefficients a and b and the polynomial orders n and m. Artificial Neural Network (ANN) models are another type of black box model with wide application in bioprocess technology due to their ability to represent non-linear functions.
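The identification step described above can be sketched with ordinary least squares. The block below is a minimal illustration, not the chapter's own procedure: it assumes the common ARX form A(q)y(t) = B(q)u(t) with A(q) = 1 + a1 q^-1 + ... + an q^-n and B(q) = b0 + b1 q^-1 + ... + bm q^-m, and the function name `fit_arx` and the synthetic data are hypothetical.

```python
import numpy as np

def fit_arx(y, u, n, m):
    """Least-squares estimate of the ARX coefficients a (length n) and b (length m + 1).

    Assumed model: y(t) + a1*y(t-1) + ... + an*y(t-n) = b0*u(t) + ... + bm*u(t-m).
    """
    y = np.asarray(y, dtype=float)
    u = np.asarray(u, dtype=float)
    start = max(n, m)
    rows = []
    for t in range(start, len(y)):
        past_outputs = [-y[t - i] for i in range(1, n + 1)]
        inputs = [u[t - j] for j in range(0, m + 1)]  # present and past inputs
        rows.append(past_outputs + inputs)
    phi = np.array(rows)
    theta, *_ = np.linalg.lstsq(phi, y[start:], rcond=None)
    return theta[:n], theta[n:]

# Synthetic, noise-free data from a known first-order system (illustrative only).
rng = np.random.default_rng(0)
u = rng.normal(size=200)
y = np.zeros(200)
for t in range(1, 200):
    y[t] = 0.7 * y[t - 1] + 0.5 * u[t] + 0.2 * u[t - 1]

a_hat, b_hat = fit_arx(y, u, n=1, m=1)
```

With this sign convention the recovered a1 is the negative of the coefficient used to generate the data. In practice the orders n and m would be chosen by comparing the fit error for several candidate values.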

3.2 White box models

White box models are widely used mechanistic models that incorporate the available process knowledge in the form of first principles model equations. As an example, the first principles model widely used for fungal fermentation during cellulase production is represented in Eq. (5), Eq. (6) and Eq. (7) as follows [26].


where X represents the biomass concentration (g/l), S represents the substrate concentration (g/l), μm is the maximum specific growth rate (1/h), Ks is the substrate saturation constant (g/l), kd is the cell death constant (1/h), k1 and k2 are the rate constants for cellulase synthesis (IU/ml h) and cellulase decay (1/h), Ki is the substrate inhibition coefficient (g/l), Et is the total cellulase activity, YX/S is the stoichiometric biomass yield coefficient (g/g) and ms is the specific maintenance coefficient (1/h).

3.3 Hybrid models

Gray box models, also known as hybrid models, are considered an effective tool for model identification. These models combine a priori knowledge of the process with black box representations. The black box model can be an ANN, fuzzy, NARX or neuro-fuzzy model, among others. This approach provides the flexibility to develop a model from both process data and available knowledge about the process [27, 28]. Hybrid models provide higher estimation accuracy, interpretability and extrapolation. In particular, during fungal fermentation using lignocellulosic substrates, the kinetic parameters μm and YX/S are generally assumed to be constant throughout the batch/fed-batch process. However, their values may change slightly, and accurate estimation of these kinetic parameters aids greatly in process control and product enhancement. The multivariate interactions during process operation can be found using statistical design of experiments (DoE). Hybrid models are commonly represented in two configurations, parallel and cascade (serial), as shown in Figure 1.

Figure 1.

Configurations of a hybrid model. (a) Parallel form (b) Serial form.

The parallel configuration, Figure 1(a), uses the complete first principles model, and the error between the outputs of this model and the real-time process is modeled. The serial configuration, Figure 1(b), is used when there are fewer unknown parameters. This is the most frequently used hybrid model structure. The choice of hybrid configuration strongly depends on the first principles model structure. When the first principles model is not accurate, the parallel configuration is a better choice, as a parallel hybrid model can compensate for the first principles model mismatch [29]. When the first principles model is accurate, the serial configuration seems to be a better choice, as it offers better extrapolation [30].

3.3.1 Hybrid fuzzy models

A non-linear process such as a bioreactor can be modeled using fuzzy logic via rule base analysis. First, a model structure is chosen, which fixes the number of inputs and the number of fuzzy sets per variable. Next, the fuzzy membership function parameters (shape and position) are estimated. Finally, the rule base mapping from fuzzy sets to functions is carried out. The fuzzy logic system draws decisions with the help of a Fuzzy Inference System, which uses "If ... Then" rules along with OR and AND connectors. For example, if the specific growth rate is high, the biomass concentration is high; else if the specific growth rate is low, the biomass concentration is low. Fuzzy models are transparent and therefore have the advantage of interpretability compared with other data-driven methods. The popular fuzzy models in use are the Takagi-Sugeno and the Mamdani types. Takagi-Sugeno type fuzzy models are suitable for modeling the non-linearity of the process and can be represented as many linear models in parallel, wherein the sub-models are chosen based on a specified rule. They are generally well suited to mathematical analysis and can be applied to Multiple Input Single Output (MISO) systems. In the Takagi-Sugeno model, the output membership functions are either constant or linear. The Mamdani type fuzzy model has a fuzzy set as the output of each rule. It is well suited to manual input and can be applied to Multiple Input Single Output (MISO) and Multiple Input Multiple Output (MIMO) systems. An output membership function is present and the output is not continuous. Moreover, the Mamdani type fuzzy model is more accurate than the Takagi-Sugeno type, but it requires estimation of a large number of parameters.
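The Takagi-Sugeno idea of several linear sub-models blended by rule strengths can be illustrated with a very small example. The sketch below is only an illustration under stated assumptions: the two rules, the linear membership functions, the 0 to 10 g/l substrate range and all coefficients are hypothetical and are not taken from the chapter.

```python
import numpy as np

def low(s, s_max=10.0):
    """Membership of 'substrate is low': 1 at s = 0, falling linearly to 0 at s_max."""
    return float(np.clip(1.0 - s / s_max, 0.0, 1.0))

def high(s, s_max=10.0):
    """Membership of 'substrate is high': 0 at s = 0, rising linearly to 1 at s_max."""
    return float(np.clip(s / s_max, 0.0, 1.0))

# Each rule pairs a membership function with a linear consequent mu = slope * s + intercept.
# Rule 1: IF substrate is low  THEN mu = 0.02 * s
# Rule 2: IF substrate is high THEN mu = 0.20
RULES = [
    (low, (0.02, 0.00)),
    (high, (0.00, 0.20)),
]

def ts_infer(s):
    """Takagi-Sugeno output: firing-strength-weighted average of the rule consequents."""
    weights = np.array([membership(s) for membership, _ in RULES])
    outputs = np.array([slope * s + intercept for _, (slope, intercept) in RULES])
    return float(np.dot(weights, outputs) / weights.sum())

mu_mid = ts_infer(5.0)  # both rules fire with strength 0.5
```

At s = 5 both rules fire equally, so the result is the average of the two consequents (0.10 and 0.20). A Mamdani model would instead produce a fuzzy set per rule and require a defuzzification step.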

Hybrid fuzzy models comprise both first principles model equations and fuzzy models [31]. The parameters of hybrid fuzzy models are commonly estimated with a Kalman filter. Fuzzy models are commonly identified from experimental process data by the Fuzzy Clustering Method (FCM). The accuracy and performance of fuzzy models can be improved by implementing FCM based on Artificial Bee Colony (FCMABC), FCM based on Chicken Swarm Optimization (FCMCSO), etc.

As a case study of the hybrid fuzzy modeling approach, the low-cost cellulase production process is considered, where the objective is to find the product (cellulase) concentration based on the interactions between the biomass and substrate.

The following assumptions are made during the batch/fed-batch operation of the bioreactor. The reactor is completely mixed and the feed flow rate (F) is known. Measurements for biomass concentration (X), Substrate concentration (S), product concentration (P) and volume (V) are available. The sampling interval is 30 minutes for these measurements.

The first principles model consists of four state equations for biomass concentration X (g/l), substrate concentration S (g/l), product concentration P (g/l) and volume of the bioreactor V (l), as defined in Eq. (8)–(11).


where μ is the specific growth rate (h−1), F represents the substrate feed rate, Sin is the substrate concentration in the feed, qs is the substrate consumption rate (h−1), qp is the product formation rate (h−1) and K is the product decay constant (h−1). No direct measurements were made for these kinetic rates. Therefore, a fuzzy model structure is represented as in Eq. (12)–(14).


where μm, Yx/s, Yp/s, Ks and ms are constants.
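The four-state structure described for this case study can be sketched as one explicit Euler step of the usual fed-batch mass balances. This is an assumption about the form of the state equations (growth, consumption, formation and decay terms plus dilution by the feed), not a transcription of the chapter's equations; the function name and all numerical values are hypothetical, and the rates μ, qs and qp are passed in as if they had already been supplied by the fuzzy sub-models.

```python
def fed_batch_step(state, rates, F, S_in, K, dt):
    """Advance (X, S, P, V) by one explicit Euler step of length dt (hours).

    Assumed balances:
      dX/dt = mu*X - (F/V)*X
      dS/dt = -qs*X + (F/V)*(S_in - S)
      dP/dt = qp*X - K*P - (F/V)*P
      dV/dt = F
    """
    X, S, P, V = state
    mu, qs, qp = rates
    D = F / V  # dilution rate caused by the feed
    dX = mu * X - D * X
    dS = -qs * X + D * (S_in - S)
    dP = qp * X - K * P - D * P
    dV = F
    return (X + dt * dX, S + dt * dS, P + dt * dP, V + dt * dV)

# Illustrative values only; dt = 0.5 h matches the 30-minute sampling interval.
state = (1.0, 10.0, 0.0, 2.0)
state = fed_batch_step(state, rates=(0.1, 0.2, 0.05), F=0.1, S_in=50.0, K=0.01, dt=0.5)
```

In a serial hybrid model the `rates` tuple would be recomputed at every step from the current X and S before calling the balance equations again.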

The kinetic parameters μ and qp depend on X and S, and their values are unknown. Hence, fuzzy models are developed to estimate their values. An extended Kalman filter is designed to obtain the estimated values of the parameters. The filter is tuned by fixing the process noise covariance matrix Q as [0.001, 0.001, 0.001, 0.054, 0.003] and the measurement error covariance matrix R as [0.1, 0.05, 0.03, 0.1]. The filter performance is evaluated by the stability border criterion λ and the significance level Lα (Table 2). Smaller values of these two criteria indicate good tuning of the Kalman filter.

Lα for X | 5
Lα for S | 3
Lα for P | 7

Table 2.

Results of Kalman filter.

From the table, it is observed that the filter is tuned properly. In this work, the fuzzy sub-models for specific growth rate and product formation rate are identified with fuzzy clustering. The basic idea is to form clusters (groups of similar points) from the available experimental data. Each cluster yields an independent rule in the rule base. The advantage of the fuzzy clustering method is that the fuzzy model, with its independent rules, is developed directly from the experimental data. The fuzzy models for the kinetic parameters are represented in Figure 2, and the hybrid model output in comparison with experimental data is illustrated in Figure 3. Optimizing the fuzzy model parameters will improve the performance of the hybrid model.
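The clustering step can be illustrated with the standard fuzzy c-means algorithm, used here as a stand-in because the chapter does not spell out its clustering procedure. The fuzzifier m = 2, the iteration limits, the function name and the toy data are all assumptions made for this sketch.

```python
import numpy as np

def fuzzy_c_means(data, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means: returns cluster centers and the membership matrix.

    data has shape (n_samples, n_features); memberships has shape (n_samples, n_clusters).
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    memberships = rng.random((data.shape[0], n_clusters))
    memberships /= memberships.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        weights = memberships ** m
        # Centers are membership-weighted means of the data.
        centers = (weights.T @ data) / weights.sum(axis=0)[:, None]
        dist = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)  # avoid division by zero at a center
        inv = dist ** (-2.0 / (m - 1.0))
        updated = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(updated - memberships).max() < tol
        memberships = updated
        if converged:
            break
    return centers, memberships

# Toy (X, S)-like points forming two well-separated groups (illustrative only).
points = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                   [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
centers, memberships = fuzzy_c_means(points, n_clusters=2)
```

Each resulting cluster would then seed one rule of the fuzzy sub-model, with the membership values indicating how strongly a sample belongs to that rule.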

Figure 2.

Fuzzy model for kinetic parameters μ and Yp/s.

Figure 3.

Performance of hybrid model for biomass (a), substrate (b) and product (c) concentrations.

3.3.2 Hybrid ANN models

Hybrid ANN models are a combination of first principles and ANN models, wherein the ANNs (black box models) are used to estimate the kinetic parameters [32]. The hybrid model shown in Figure 4 combines a neural network estimator with the mass balance equations (mathematical model). The neural network estimator estimates the process parameters from real-time measurements, and these kinetic parameters (μ and Ys/x) are updated in the mass balance equations to give the values of the state variables at the next time instant.

Figure 4.

Hybrid model structure.

In general, the kinetic parameters are determined offline from experimental data. Because neural networks can learn and model non-linear relationships, the parameter values can be estimated after proper training. Neural networks with varying numbers of hidden neurons are trained, and the MSE between the actual and estimated data is calculated. The network with the lowest MSE is selected, which fixes the optimal number of hidden neurons. In this case study, a two-layer feed-forward network with sigmoid hidden neurons and a linear output neuron is used. The state variables Xt and St are the inputs, and the parameters μ̂ and ŶS/X are the outputs of the neural network. The parameters are found using Eq. (15) and Eq. (16) given below for every time instant.


where S(t) is the substrate concentration and kS is the saturation constant, which is found experimentally to be 0.01 g/L.


where St and St−1 are the present and past substrate concentrations, and Xt and Xt−1 are the present and past biomass concentrations, respectively. For generalization, the experimental data are divided into training, testing and validation sets in the ratio 70:15:15. The neural network is trained in MATLAB using the Levenberg–Marquardt algorithm with a single hidden layer. Sigmoid and linear activation functions are used for the hidden and output layers, respectively. Mean Square Error is the performance evaluation criterion, and on that basis the number of hidden neurons is chosen to be 15. The number of iterations is fixed at 1000.
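The construction of training targets and the 70:15:15 split can be sketched as follows. The sketch assumes that the growth rate target follows a Monod expression, μ = μmax S/(kS + S), and that the yield is the finite-difference ratio (Xt − Xt−1)/(St−1 − St); both forms are assumptions consistent with the symbol definitions above, and μmax, the function names and the random shuffling are hypothetical.

```python
import numpy as np

def monod_mu(S, mu_max, k_s=0.01):
    """Specific growth rate from substrate concentration (assumed Monod form)."""
    S = np.asarray(S, dtype=float)
    return mu_max * S / (k_s + S)

def yield_xs(X, S):
    """Biomass yield between consecutive samples: (X_t - X_{t-1}) / (S_{t-1} - S_t).

    Undefined where the substrate does not change between samples.
    """
    X = np.asarray(X, dtype=float)
    S = np.asarray(S, dtype=float)
    return (X[1:] - X[:-1]) / (S[:-1] - S[1:])

def split_70_15_15(n_samples, seed=0):
    """Shuffle sample indices and split them into training, testing and validation sets."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    n_train = (70 * n_samples) // 100
    n_test = (15 * n_samples) // 100
    return idx[:n_train], idx[n_train:n_train + n_test], idx[n_train + n_test:]

train_idx, test_idx, val_idx = split_70_15_15(100)
```

The arrays returned by `monod_mu` and `yield_xs` would serve as the network targets, with the corresponding Xt and St samples as inputs.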

The response from the hybrid model obtained for training input is shown in Figure 5(a) and test input is shown in Figure 5(b).

Figure 5.

Validation of hybrid model with training and testing data set.

The process parameters μ and YX/S change with respect to time and the corresponding state variable measurements. The time varying natures of the parameters are shown in Figure 6.

Figure 6.

Variation of kinetic parameters μ and YX/S with respect to time.

3.4 NARX models

Nonlinear autoregressive models with exogenous input (NARX) are effective for non-linear system identification, as they have good predictive capability. Model identification from input–output measurements is generally used to predict the system behavior without deep mathematical knowledge of the process [33, 34]. For closer approximation of the actual process, a NARX model is commonly employed [35]. The NARX network is a recurrent dynamic network, with feedback connections enclosing several layers of the network. The defining equation for the NARX model is given in Eq. (17).


where yt and ut are the output and input signals, f is a nonlinear function, ny and nu are the output and input delays of the nonlinear model and et is the error term. The next value of the dependent output signal is regressed on previous values of the output signal and previous values of an independent (exogenous) input signal. Successful estimation of the process state by a soft sensor depends greatly on the input–output data set. The input variables should be chosen such that they have a direct or indirect relation with the estimated variable. Microbial cell metabolism is influenced by the pH value, agitation speed and substrate concentration inside the bioreactor vessel; hence these variables are directly related to the biomass concentration [36].
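The regression structure described above can be made concrete by building the lagged regressor matrix that the nonlinear function f receives. This is a minimal sketch assuming the common convention in which y(t) is regressed on y(t−1), ..., y(t−ny) and u(t−1), ..., u(t−nu); the function name and the toy signals are hypothetical.

```python
import numpy as np

def narx_regressors(y, u, n_y, n_u):
    """Build NARX regressors and targets.

    Each row holds [y(t-1), ..., y(t-n_y), u(t-1), ..., u(t-n_u)] and the
    matching target is y(t). u may have several columns (one per input signal).
    """
    y = np.asarray(y, dtype=float)
    u = np.asarray(u, dtype=float)
    if u.ndim == 1:
        u = u[:, None]
    start = max(n_y, n_u)
    rows = []
    for t in range(start, len(y)):
        past_y = y[t - n_y:t][::-1]          # most recent output first
        past_u = u[t - n_u:t][::-1].ravel()  # most recent input sample first
        rows.append(np.concatenate([past_y, past_u]))
    return np.array(rows), y[start:]

# Toy signals: one output and one exogenous input (illustrative only).
regressors, targets = narx_regressors([0, 1, 2, 3, 4], [10, 11, 12, 13, 14], n_y=2, n_u=1)
```

Any nonlinear regressor, such as a feed-forward network, can then be trained on `regressors` and `targets`. During open-loop training the measured outputs fill the y columns; in closed-loop prediction the model's own past estimates are fed back instead.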

As a case study [37], a NARX model is developed for estimation of biomass concentration using the dataset of pH, agitation speed and substrate concentration values starting from the time of inoculation till the end of fed batch process. The experimental data is divided into training, testing and validation in the ratio 70:15:15. The inputs for the NARX model are present values of pH, substrate concentration (S), agitation speed and previous sampling instant biomass concentration, X(k-1) and the output is the estimated biomass concentration, X(k) as shown in Figure 7.

Figure 7.

NARX model for estimation of biomass concentration.

To obtain best performance from NARX model, two hidden layers are used and the numbers of hidden neurons in each layer are chosen based on Mean Square Error (MSE). ANN parameters used in the NARX model development are listed in Table 3.

Architecture | Dynamic neural network (NARX)
Training algorithm | Levenberg–Marquardt algorithm
Number of iterations | 1000
Number of hidden layers | 2
Number of hidden neurons in first layer | 18
Number of hidden neurons in second layer | 10
Activation function (hidden layer) | Sigmoid
Activation function (output) | Linear
Performance evaluation | Mean Square Error

Table 3.

ANN parameters for NARX model development.

Experimental validation of the NARX model is done with the help of a capacitance probe. The annular dielectric probe, when inserted into the bioreactor, gives a capacitance value that can be directly related to the concentration of biomass. The probe is particularly useful during fungal biomass cultivation because accurate offline measurements are not available [38]. As the probe measures only the viable cells, excluding dead cells and insoluble substrates, it is a suitable choice for process validation. The probe generates a dielectric spectrum at two different frequencies, chosen based on cell size, morphology, etc. [39]. In this case study, the capacitance probe is used in microbial mode and the biomass is measured at frequencies of 0.6 MHz and 15 MHz. The 15 MHz reading is used as a form of auto zero and is subtracted from the 0.6 MHz reading. Therefore, the capacitance of background matter is automatically subtracted from the signal. A resolution of 0.1 pF/cm on the instrument typically represents 10^6 cells/ml, or 0.5 grams per liter. The dynamic NARX network is trained with different sets of input–output data. The response of the NARX network to test inputs is shown in Figure 8.
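The dual-frequency correction and the quoted resolution can be turned into a small conversion routine. The sketch assumes that biomass scales linearly with the background-corrected capacitance, with 0.1 pF/cm corresponding to 0.5 g/l (that is, 5 g/l per pF/cm); the linearity, the function name and the example readings are assumptions, and a real probe would need its own calibration.

```python
def biomass_from_capacitance(c_low_freq, c_high_freq, g_per_l_per_pf_cm=5.0):
    """Estimate viable biomass (g/l) from dual-frequency capacitance readings (pF/cm).

    c_low_freq  : reading at the measuring frequency (0.6 MHz in the case study)
    c_high_freq : reading at the background frequency (15 MHz), used as auto zero
    """
    delta_c = c_low_freq - c_high_freq  # background-corrected capacitance
    return delta_c * g_per_l_per_pf_cm

# Illustrative readings only.
estimate = biomass_from_capacitance(1.2, 0.2)
```

With the assumed scale factor, a corrected signal of 1.0 pF/cm maps to about 5 g/l of viable biomass.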

Figure 8.

Comparison of NARX model output with experimental biomass concentration data.

Figure 8 indicates that the trained dynamic NARX network can be used in place of a biosensor. The error response of a single NARX network is shown in Figure 9.

Figure 9.

Error plot for the NARX model.

The performance of the NARX network is analyzed using two criteria, Root Mean Square Error (RMSE) and the coefficient of determination (R2). RMSE is calculated by Eq. (18).


where N is the length of the data, yi is the predicted value and ti is the target value (Rafsanjani et al. 2016). The RMSE and R2 values of the NARX network are obtained as 0.01 and 0.8789 respectively, and the correlation graph is shown in Figure 10.
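Both criteria are straightforward to compute from paired predictions and targets. The sketch below uses the usual definitions, RMSE as the square root of the mean squared difference and R2 as one minus the ratio of residual to total sum of squares; whether the chapter computes R2 this way or as a squared correlation coefficient is not stated, so that choice is an assumption, as are the function names and sample values.

```python
import numpy as np

def rmse(predicted, target):
    """Root mean square error between predicted values y_i and target values t_i."""
    predicted = np.asarray(predicted, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.sqrt(np.mean((predicted - target) ** 2)))

def r_squared(predicted, target):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    predicted = np.asarray(predicted, dtype=float)
    target = np.asarray(target, dtype=float)
    ss_res = np.sum((target - predicted) ** 2)
    ss_tot = np.sum((target - target.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Illustrative values only.
example_rmse = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
example_r2 = r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

RMSE carries the units of the measured variable, so it should be read against the range of the biomass data, while R2 is dimensionless.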

Figure 10.

Correlation graph between biomass concentrations monitored with real-time capacitance probe and estimated by the NARX model.

It is observed that the NARX model has a low RMSE, a high R2 and good correlation with the real-time probe data, which indicates that this dynamic neural network soft sensor performs well in estimating biomass concentration.

3.5 LSTM models

Recurrent Neural Networks (RNNs) are a type of deep network structured to capture the temporal dependencies of a process effectively [40]. Long Short-Term Memory (LSTM) networks are a type of recurrent neural network capable of learning order dependence in sequence prediction problems. The LSTM network was invented to address the vanishing gradient problem. The key insight in the LSTM design was to incorporate nonlinear, data-dependent controls into the RNN cell, which can be trained to ensure that the gradient of the objective function with respect to the state signal does not vanish [41]; hence LSTMs are well suited to classification and prediction problems. The LSTM model developed for the estimation of maximum specific growth rate is shown in Figure 11. The input gate, forget gate and output gate equations are given by Eq. (19), (20) and (21) respectively.

Figure 11.

LSTM model developed for maximum specific growth rate estimation.

The new memory cell and final memory cell equations are given as Eq. (22) and Eq. (23). The hidden state equation is represented in Eq. (24).



xt          Input word

it          Input state

ft          forget state

ct−1, c̄t, ct      Past, new and final memory

ht−1, ht        Previous and current hidden state

ot          Output state

Wi, Wf, Wc, Wo  Weight vectors connecting previous and current hidden layers

Uf, Uc, Uo      Vectors connecting inputs to the current hidden layer

σ           Sigmoid activation function
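One forward step of an LSTM cell, using the symbols listed above, can be sketched in NumPy. The sketch follows the standard LSTM formulation: W matrices act on the previous hidden state, U matrices act on the current input, and the bias terms b and the tanh nonlinearities are part of the standard cell and are assumed here. The dictionary layout and the function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step; W, U and b are dicts keyed by 'i', 'f', 'o' and 'c'."""
    i_t = sigmoid(W["i"] @ h_prev + U["i"] @ x_t + b["i"])    # input gate
    f_t = sigmoid(W["f"] @ h_prev + U["f"] @ x_t + b["f"])    # forget gate
    o_t = sigmoid(W["o"] @ h_prev + U["o"] @ x_t + b["o"])    # output gate
    c_new = np.tanh(W["c"] @ h_prev + U["c"] @ x_t + b["c"])  # new (candidate) memory
    c_t = f_t * c_prev + i_t * c_new                          # final memory
    h_t = o_t * np.tanh(c_t)                                  # current hidden state
    return h_t, c_t

# Tiny example: 3 input features, 2 hidden units, small random weights (illustrative only).
rng = np.random.default_rng(0)
n_in, n_hidden = 3, 2
W = {k: 0.1 * rng.normal(size=(n_hidden, n_hidden)) for k in "ifoc"}
U = {k: 0.1 * rng.normal(size=(n_hidden, n_in)) for k in "ifoc"}
b = {k: np.zeros(n_hidden) for k in "ifoc"}
h_t, c_t = lstm_step(rng.normal(size=n_in), np.zeros(n_hidden), np.zeros(n_hidden), W, U, b)
```

The forget gate scales the past memory and the input gate scales the candidate memory, which is the data-dependent control that keeps gradients from vanishing over long sequences.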

As with the NARX model, the development of an LSTM network includes data collection, parameter determination, training, testing and validation. The modeling can be done in either MATLAB or Python. As a case study, consider an LSTM network with two hidden layers, in which the number of neurons in each hidden layer is varied until the MSE reaches its minimum value. The parameters chosen for the LSTM network are listed in Table 4. The training to testing ratio is chosen as 67:33. The predicted values of maximum specific growth rate calculated from the LSTM model are presented in Figure 12.

Batch size | 50
No. of hidden layers | 2
No. of hidden neurons | Hidden layer 1; Hidden layer 2
Activation function | Sigmoid
Learning rate | 0.01
No. of epochs | 800

Table 4.

Parameters for LSTM model development.

Figure 12.

Comparison of LSTM predicted maximum specific growth rate data with experimental data.

The performance of the LSTM model is evaluated by the statistical measures RMSE, R2 and accuracy factor (Af). The Af averages the distance between every point and the line of equivalence as a measure of the closeness between predicted and observed values. The RMSE, R2 and Af values of the LSTM model are found to be 0.011, 0.994 and 1.024 respectively. The low RMSE and the Af value close to 1 suggest that the LSTM predictive model fits the experimental data well.
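The accuracy factor can be computed with the definition commonly used in predictive microbiology, Af = 10^(mean |log10(predicted/observed)|). The chapter does not state its formula, so this definition is an assumption, as are the function name and the sample values; it requires strictly positive predictions and observations.

```python
import numpy as np

def accuracy_factor(predicted, observed):
    """Accuracy factor: 10 ** mean(|log10(predicted / observed)|).

    Equals 1 for perfect agreement; larger values mean predictions lie
    further, on average, from the line of equivalence.
    """
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(10.0 ** np.mean(np.abs(np.log10(predicted / observed))))

# Illustrative values only: each prediction is off by a factor of 10, in opposite directions.
af_example = accuracy_factor([1.0, 100.0], [10.0, 10.0])
```

Because the absolute value is taken before averaging, over- and under-predictions do not cancel, so the factor in this example is 10 rather than 1.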

4. Conclusions

Several modeling techniques that will aid in the monitoring and estimation of fungal biomass in the presence of lignocellulosic substrates during fed-batch fermentation are discussed in this chapter. Moreover, the bioprocess models are validated with experimental data as discussed in case studies. The use of these soft sensors in industries with accompanying control system will improve the cellulase concentration yield.


The author acknowledges Anna University and the Department of Science and Technology for their funding and support.


Conflict of interest

The author declares no conflict of interest.

© 2021 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Chitra Murugan (February 10th 2021). Soft Sensors for Biomass Monitoring during Low Cost Cellulase Production, Biotechnological Applications of Biomass, Thalita Peixoto Basso, Thiago Olitta Basso and Luiz Carlos Basso, IntechOpen, DOI: 10.5772/intechopen.96027. Available from:
