
## Abstract

The fermentation of Saccharomyces cerevisiae has been investigated by many researchers seeking higher product quality and yield at lower cost. Operating parameters such as pH, dissolved oxygen (DO) concentration, temperature, substrate type and concentration, agitation speed, and air flow rate should be optimized to obtain valuable products. At this point, system identification and advanced control techniques offer solutions. Dynamic analyses of the pH and DO of the growth medium were performed under aerobic conditions in a batch bioreactor by applying step and square wave inputs to the base and air flow rates, respectively. Input–output data of the process and a linear Auto Regressive Moving Average with eXogenous input (ARMAX)-type model were used to determine the relationship between the controlled and manipulated variables in baker’s yeast production by system identification. The model parameters were estimated using the recursive least squares (RLS) method. The most suitable parametric model was determined by carrying out estimations with different values of the initial covariance matrix, the forgetting factor, and the order of the ARMAX model. Self-tuning generalized minimum variance (ST-GMV) control was performed with the ARMAX model for controlling pH and DO. Integrated square error (ISE) values were used as the performance criterion for the modeling and control studies.

### Keywords

- Baker’s yeast
- S. cerevisiae
- system identification
- ARMAX model
- RLS
- ST-GMV control
- dissolved oxygen control
- pH control

## 1. Introduction

*Saccharomyces cerevisiae*, also known as baker’s yeast, is used to produce ethanol, glycerol, β-glucan, invertase enzyme, and, most commonly, the yeast itself. Using molasses or glucose as the carbon source in batch or fed-batch operation, it is possible to produce ethanol under anaerobic conditions and the yeast itself under aerobic conditions on a commercial scale [1]. For reasons of process economy, cells should be obtained with high volumetric efficiency during the growth phase of the microorganisms [2]. Research on *S. cerevisiae* continues in biotechnology and genetics as well as in R&D work in the food and pharmaceutical industries [3]. Many studies on β-glucan production emphasize the importance of *S. cerevisiae* production for pharmaceutical use [4]. Production data must be examined carefully in order to understand the process better [3]. Microorganisms used as biocatalysts can be produced at high concentrations with high enzyme activity when bioreactor operating conditions such as pH, temperature, dissolved oxygen (DO) concentration, air flow rate, agitation speed, and substrate concentration are kept at suitable values [2].

Bioprocesses are inherently time-varying and nonlinear: the operating parameters of the systems change with time, and these changes are not linear. In addition, mathematical models that describe the complex reactions taking place during cell growth and product formation are often lacking [5, 6]. Furthermore, few online sensors can detect state variables such as cell, substrate, and product concentrations. The substrate, oxygen, or the formed product can also inhibit the activity of the biocatalyst. Alpbaz et al. emphasized that *S. cerevisiae* is very sensitive to changes in the growth environment [7]. Because of these constraints, it is necessary to determine the optimal operating conditions of a bioprocess and to control the operating parameters at these optimum values to ensure economic gain, a high-quality product, and safe operation [8]. Although installation, operation, and modeling studies for batch and fed-batch operation are present in the literature, these processes remain very difficult to control because of their biological nature and their changing dynamics throughout the process [9].

The control performance can be affected by the controller-tuning parameters and the process model parameters as well as by the choice of control algorithm and the structure of the process model. Parameters of parametric and nonparametric models are calculated using dynamic analyses of different disturbance effects and system identification algorithms. Parametric models are generally constructed using a discrete time polynomial model [10]. Step, square wave, pseudorandom binary sequence (PRBS), impulse, pulse, and random inputs are generally applied to the input variables (such as the acid/base flow rate for pH, the cooling/heating fluid flow rate for temperature, the air flow rate for DO, and the substrate feed for substrate concentration) to perform dynamic analyses and obtain input–output data from the process [11]. Various system identification algorithms such as Biermann, Levenberg–Marquardt, genetic, least squares (LS), and recursive least squares (RLS) have been studied for the calculation of the model parameters [10, 11]. The parameters of nonparametric models are calculated using the corresponding curves, such as the reaction curve and Bode diagrams [12, 13]. The actual system can be modeled with an approximated structure and its estimated parameters [3]. During control, the closed-loop performance of the system depends largely on the mismatch between the actual process and the model. Therefore, a particular model structure should include all the known information about the operating conditions and approximate the system to a chosen degree. It should also be flexible and lead to fast parameter estimation procedures [14]. The Auto Regressive Moving Average with eXogenous input (ARMAX)-type polynomial model has been widely used in the literature because its basic structure is able to describe the process dynamics [2, 3, 10–12, 15–18].

The need for self-tuning controllers stems from the desire to control processes whose parameters are unknown or change slowly over time [11]. The fundamentals of this method are based on the self-tuning regulator (STR) developed by Åström and Wittenmark [19]. The control objective of this method is to reduce the variance of the output variable (such as pH, temperature, DO, or substrate concentration) to a minimum. The STR predicts the future output variance and then tries to implement a control action that forces the estimated variance to zero [11]. However, some difficulties have been experienced in applications of the STR technique, such as the lack of online tuning parameters, weakness in controlling nonminimum phase systems, and poor control of systems with changing or unknown time delays [2]. Later, Clarke and Gawthrop modified the STR into the self-tuning generalized minimum variance (ST-GMV) controller to overcome these difficulties [20]. ST-GMV is an adaptive algorithm based on the GMV cost function and a predictive form of the process model. This formulation leads to easier tuning [21]. The ST-GMV method has become very popular and is widely used in industrial applications [22, 23].

In this chapter, dynamic analysis and system identification were carried out for the most important operating parameters of the baker’s yeast production process, namely the pH and DO of the growth medium. The dynamic analyses were conducted at 32°C, the optimal temperature for baker’s yeast production determined in previous studies, in order to explain the process behavior and obtain the data used in the system identification step. Such a study must be process specific because the data depend mostly on the physical structure of the process. This chapter focuses especially on how the success of the system identification step affects the control applications. To examine how the controller performance depends on the process model structure and the model parameters, two contrasting models, called the suitable and unsuitable models, were used in the ST-GMV simulation studies. Controller performances were evaluated for a constant set point trajectory under various noise conditions for both controlled variables. As in most studies in the literature, the simulations were conducted in the MATLAB environment and interpreted on the basis of integrated square error values.

## 2. Effects of operating parameters on baker’s yeast production

*S. cerevisiae* yeast is undoubtedly one of the most important microorganisms that have been consumed safely throughout human history. Yeast cells need both nutrients and energy for growth and product formation. They use sugars such as glucose, maltose, and sucrose; vitamins such as biotin, pantothenate, and inositol; and minerals such as Cu, Zn, Fe, Mo, and Mn to provide the necessary nutrients and energy [22]. High substrate concentrations may inhibit microorganism production because the substrate binds to a second, nonactive site on a form of the enzyme [22]. In the same case, ethanol is also produced, as it is under oxygen deficiency. This is known as the Crabtree effect, which is undesirable because fermentative growth gives a low cell yield [22]. The substrate level that inhibits yeast production depends essentially on the cell and substrate type. A glucose concentration over 200 g/L inhibits microorganism growth in yeast production [2].

Another essential requirement is oxygen. Yeast cells use oxygen together with sugar to grow without producing ethanol. As in beer and wine production, if there is not enough oxygen in the environment, yeast continues to grow by producing ethanol. To produce the yeast in the desired way, oxygen and sugar must be transferred well to the growth medium and ethanol formation must be kept low. Oxygen is a limiting substrate because of its low solubility in water, which is 8 mg/L at 30°C [24]. It is known that the lower the DO concentration, the lower the substrate consumption and the rate of carbon dioxide formation; this is called the Pasteur effect [25]. When working under aerobic conditions, it is not enough simply to feed an oxygen source to the system. Providing a homogeneous distribution of oxygen in the liquid medium is also important [24]. The transfer of oxygen from the gas phase to the microorganism in the feed medium is of great importance in determining bioreactor design and operating conditions. Depending on whether the medium is aerobic or anaerobic, the following reactions occur during yeast production.
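The overall stoichiometries usually written for these two cases are sketched below for glucose. They are the standard textbook forms, given here as an assumption about what the chapter's own numbered reactions express:

$$\mathrm{C_6H_{12}O_6 + 6\,O_2 \rightarrow 6\,CO_2 + 6\,H_2O} \qquad \text{(aerobic)}$$

$$\mathrm{C_6H_{12}O_6 \rightarrow 2\,C_2H_5OH + 2\,CO_2} \qquad \text{(anaerobic)}$$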

Carbon dioxide and water form in the medium when the yeast grows in a suitable environment under aerobic conditions. If the goal is to produce yeast, aerobic conditions are required [12]. Otherwise, ethanol is produced under anaerobic conditions [12]. Since ethanol is itself a carbon source, yeast cells can also use ethanol to grow, primarily during the diauxic phase [25]. Since ethanol is toxic, yeast prefers sugars to ethanol for growth. During growth, the yeast cell produces carbon dioxide. Because the yeast does not use oxygen while producing ethanol, another reliable variable for describing the state of the yeast is the ratio of the carbon dioxide production rate (CPR) to the oxygen uptake rate (OUR). This ratio is defined as the respiratory quotient (RQ) [2].
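Written out, the definition is as follows. The interpretation underneath is a brief worked illustration based on the standard glucose stoichiometry, not a result taken from the chapter:

$$\mathrm{RQ} = \frac{\mathrm{CPR}}{\mathrm{OUR}}$$

Complete aerobic oxidation of glucose releases six moles of CO₂ for every six moles of O₂ consumed, so RQ = 6/6 = 1. Ethanol formation releases CO₂ without consuming O₂, so an RQ noticeably above 1 suggests that part of the sugar is being fermented.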

The pH of the growth medium is an important operating parameter because it affects the activity of enzymes in the microorganisms. Yeast is resistant to acidic environments but very sensitive to alkaline environments. It is preferable for the growth medium to be between pH 3 and 6 [24]. When the pH is not appropriate for the microorganism, it is difficult for cations and anions to pass through the cell membrane. Disruption of cell permeability affects enzyme activity and causes protein synthesis to stop. In addition, cells become more susceptible to toxic substances. For these reasons, the pH of the growth medium influences the substrate consumption and production yield of the yeast.

Like all microorganisms, yeasts have minimum (5–25°C), maximum (40–50°C), and optimum (30–40°C) growth temperatures [26]. The activity of the enzymes in the microorganisms is greatly affected by temperature. In general, the temperature of the growth medium has a great influence on the growth, respiration, and product formation of microorganisms. According to chemical kinetics, a rise in temperature increases the reaction rate; enzymatic reactions, however, follow this rule only up to a certain temperature. The temperature increase caused by the heat generated during the production of baker’s yeast under aerobic conditions is undesirable in terms of product efficiency.

## 3. System identification

Before working on a system, it is necessary to know its upper and lower limits in order to be aware of the situations that may be encountered. To control the system, the relations between the input and output variables must first be obtained, and a model of the system must be developed. Systems can be modeled using input–output data obtained from experimental studies or using mass and energy balances. However, it is difficult to obtain a model from mass and energy balances in complex systems, and in some cases these balances may be insufficient to identify the system accurately. In such cases, it is more useful to create models from the experimental input–output data using system identification methods [27]. System identification can be described as a method in which a disturbance is applied to the input variable, the resulting output variable data are collected, and the model is determined from them. The unknown parameters in the parametric model are found by an appropriate method using the input–output data. The model is then compared with the experimental data, and the goodness of fit is tested.

### 3.1. Signals used in system identification

The first step of system identification is the selection of the input signals that will excite the system. Step, square wave, sinusoidal wave, PRBS, impulse, pulse, and random signals are generally applied as inputs [11]. Discrete time models are often used to describe the system [8–11]. For this purpose, the input and output variable signals are sampled and recorded at a suitable time interval. This chapter focuses on the step input as the input signal.

### 3.2. System models

Linear model of an open-loop discrete time system can be written in terms of u(t) as input variable and x(t) as output variable as shown in Eq. (3).

Backward shift operator is defined as Eq. (4).

Here, x(t) represents the x value at time t, x(t − 1) represents the x value at time (t − Δt) for Δt = 1, and x(t − i) represents the x value at time (t − iΔt). Eq. (5) is defined as a discrete time transfer function.

Here, polynomials of A and B can be written as Eqs. (6) and (7).

The roots of polynomials A and B are the poles and zeros of the system, respectively. If one of the poles lies outside the unit circle of the z-plane, the system is unstable; if one of the zeros lies outside the unit circle, the system is nonminimum phase [28]. In a self-tuning system, disturbance effects can occur in a wide variety of forms. The disturbance signal s(t) can be a part of the control system and is often treated as an additional disturbance acting at the output of the controlled process. In this case, the self-tuning controller will attempt to eliminate this disturbance effect. Such signals can be divided into two groups: defined signals and random signals. In general, the model of a constant random signal source is shown as follows.

Here, C polynomial is defined as Eq. (9).

Eq. (8) is also described as auto regressive moving average (ARMA) model. Whole system output can be written in various forms as Eqs. (10–12).

This model is obtained by adding a control input to the ARMA-type signal, and the system is called ARMAX model.
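For orientation, a standard textbook form of the model described in this section is sketched below. It assumes a one-sample input delay and a white noise sequence e(t); the delay convention of the chapter's own numbered equations may differ.

$$z^{-1}x(t) = x(t-1)$$

$$A(z^{-1})\,y(t) = B(z^{-1})\,u(t-1) + C(z^{-1})\,e(t)$$

$$A(z^{-1}) = 1 + a_1 z^{-1} + \dots + a_{n_a} z^{-n_a}, \quad B(z^{-1}) = b_0 + b_1 z^{-1} + \dots + b_{n_b} z^{-n_b}, \quad C(z^{-1}) = 1 + c_1 z^{-1} + \dots + c_{n_c} z^{-n_c}$$

In this form, the term with C is the ARMA-type disturbance and the term with B is the added control input.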

### 3.3. Estimation of model parameters

#### 3.3.1. LS method

Eq. (11) is written in matrix form, and transpose of data and parameter vectors are defined as (ϕ^{T}) and (θ^{T}), respectively.

Using Eqs. (13) and (14), output variable can be written as follows.

Eq. (15) is known as a linear-in-the-parameters model. There are N measurement values, where N is the number of data samples. Eq. (15) can be redefined as follows.

Eq. (16) can be written in vector form as follows.

The estimated parameter matrix (θ̂) is used to calculate the output variable from the model.

The difference between the measured output variable and the output variable calculated from the model is defined as the estimation error (ε(t)) and given in Eq. (19).

Eq. (19) is rearranged for N measurements, and Eq. (20) is obtained.

Eq. (20) can be written in vector form as follows.

In order to calculate θ̂, the sum of the squared estimation errors is minimized.

To find the value that minimizes Eq. (23), the derivative is taken, set equal to zero, and rearranged; the parameter vector estimated by the LS method is then found as follows.

Although LS is a widely used method, it is not suitable for self-tuning and predictive control methods because the parameters are not calculated in real time. In such control methods, the parameters must be recalculated and updated from the data at every sampling instant t.
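The standard derivation can be summarized as below. The notation is an assumption chosen to match the surrounding text: Φ is the N-row data matrix whose rows are the data vectors, and Y is the vector of the N measured outputs.

$$J = \varepsilon^{T}\varepsilon = (Y - \Phi\hat{\theta})^{T}(Y - \Phi\hat{\theta})$$

$$\frac{\partial J}{\partial \hat{\theta}} = -2\,\Phi^{T}Y + 2\,\Phi^{T}\Phi\,\hat{\theta} = 0 \quad\Rightarrow\quad \hat{\theta} = (\Phi^{T}\Phi)^{-1}\Phi^{T}Y$$

The inverse exists only when Φ^{T}Φ is nonsingular, which in practice requires an input signal that excites the process sufficiently.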

#### 3.3.2. RLS method

The RLS method makes real-time parameter estimation possible and can easily be applied in self-tuning and predictive control algorithms to calculate time-varying model parameters. In the RLS method, the new value of the output variable is predicted from the model parameters based on past data and the new input–output variable values. The actual value (y(t)) is compared with this estimated value, and the error (ε(t)) is found. The model parameters calculated in the previous step are then updated using this error [27]. If Eq. (25) is written for any instant t, the parameter calculation equation is as follows.

The terms in Eq. (26) are defined in vector form as follows.

If Eq. (26) is written for the next sampling time (t + 1), parameter calculation equation will be as follows.

Terms *φ*(*t* + 1) and *Y*(*t* + 1) of Eq. (28) are written in vector form as follows.

Using these equations, the terms in Eq. (28) are updated.

Covariance matrix (P(t)) is defined and written in Eq. (30) as follows.

Using covariance matrix definition, Eqs. (26) and (28) can be rewritten as follows.

The term *φ*^{T}(*t* + 1)*Y*(*t* + 1) in Eq. (31) is written in Eq. (35) and rearranged as follows.

Eq. (37) is obtained using Eq. (34) as follows.

Eq. (33) is combined with Eq. (37) and rearranged as follows.

Eq. (38) is substituted for the *φ*^{T}(*t*)*Y*(*t*) term of Eq. (36).

Estimation error at time (t + 1) is defined as Eq. (40), and model parameter vector at time (t + 1) is found as Eq. (41).

Matrix inversion is applied to Eq. (33), and future value of covariance matrix is obtained as Eq. (42).

RLS method consists of Eqs. (40–42), and the algorithm used is given below.

At time t + 1,
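The update carried out at each new sample can be sketched in Python as follows. This is a minimal illustration of a standard RLS recursion with a forgetting factor (prediction error, gain, parameter update, covariance update), not the MATLAB code used in the study. The function names, the first-order test plant, and its coefficients are hypothetical.

```python
def rls_step(theta, P, phi, y, lam=0.97):
    """One recursive least squares update with forgetting factor lam."""
    n = len(theta)
    # A priori error: measured output minus the prediction from old parameters
    eps = y - sum(phi[i] * theta[i] for i in range(n))
    # P(t) * phi(t+1)
    P_phi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(phi[i] * P_phi[i] for i in range(n))
    gain = [v / denom for v in P_phi]
    theta_new = [theta[i] + gain[i] * eps for i in range(n)]
    # Covariance update; P stays symmetric, so phi^T P equals (P phi)^T
    P_new = [[(P[i][j] - gain[i] * P_phi[j]) / lam for j in range(n)]
             for i in range(n)]
    return theta_new, P_new, eps


def identify(u, y, lam=0.97, p0=1000.0):
    """Estimate [a1, b0] of y(t) + a1*y(t-1) = b0*u(t-1) from recorded data."""
    theta = [0.0, 0.0]
    P = [[p0, 0.0], [0.0, p0]]  # initial covariance: p0 times the identity
    for t in range(1, len(y)):
        phi = [-y[t - 1], u[t - 1]]  # data vector for sample t
        theta, P, _ = rls_step(theta, P, phi, y[t], lam)
    return theta


# Hypothetical noise-free plant y(t) = 0.8*y(t-1) + 0.5*u(t-1), square wave input
u = [1.0 if (t // 10) % 2 == 0 else -1.0 for t in range(200)]
y = [0.0]
for t in range(1, 200):
    y.append(0.8 * y[t - 1] + 0.5 * u[t - 1])

a1_hat, b0_hat = identify(u, y)
print(round(a1_hat, 3), round(b0_hat, 3))
```

For this test plant the true values are a1 = −0.8 and b0 = 0.5, so the printed estimates can be checked directly. A larger initial covariance lets the first samples move the parameters faster, while a forgetting factor below 1 discounts old data.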

## 4. ST-GMV control

Cost function of STR, defined as the difference between the set point and the measured value for an input–output model, is as follows.

where y is the output variable, r is the set point, u is the manipulated variable (input), *Ξ* is the expectation, and k is the default time delay [16]. This cost function can be minimized by choosing u(t), the control output at time t, appropriately. At the next sampling time (t + Δt), a new deviation arises between y and r, and u must take a new value. If the default time delay is smaller than the time delay of the real system, the controller tries to remove noise components before the control action has actually reached the output through the real delay. This results in large feedback gains and an unrealizable controller that makes the system unstable. On the other hand, if the default time delay is greater than the time delay of the real system, the lowest possible noise level is not obtained because the fastest possible manipulation is not used [16]. Clarke and Gawthrop set out the ST-GMV method, building on the cost function of the STR of Åström and Wittenmark, to remove the difficulties of the STR altogether [29]. ST-GMV control is a one-step-ahead optimal control strategy. The cost function of this technique is expressed by the following equation.

This type of controller design can internally stabilize the system, and the stability depends on the selected λ values. ST-GMV algorithm has a good set point tracking characteristic and has the ability to control nonminimum phase systems. If the default time delay is implemented within the generalized system, then the control signal compensates the pseudo output φ(t) accordingly and directs the feed-forward path.

Using Eq. (44), ST-GMV method relies on maintaining closed-loop stability by taking λ as small as possible while maintaining a minimum output change to stay reasonably close to the expectation. Cost function can be generally expressed as follows.

ST-GMV method uses a system pseudo output φ(t + k) given by the following equation to minimize the cost function expressed in general by Eq. (45).
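A commonly quoted form of the pseudo output and of the cost it minimizes is given below. It follows the usual Clarke and Gawthrop formulation and is assumed, not confirmed, to correspond to the chapter's own numbered equations:

$$\varphi(t+k) = P\,y(t+k) + Q\,u(t) - R\,r(t)$$

$$J = \Xi\{\varphi^{2}(t+k)\}$$

With P = R = 1 and Q = 0, the pseudo output reduces to y(t + k) − r(t), and the cost becomes the expected squared deviation of the output from the set point.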

Here, r(t) is the set point, and P, Q, and R are the transfer functions with backward shift operator (z^{−k}). The pseudo output of the system includes a feed-forward term (Q) and filters (P, R) acting on the output and the set point. The ST-GMV algorithm uses the feed-forward polynomial Q to avoid the problem of attempting to remove output noise before the control signal has been transmitted. The φ(t + k) term of Eq. (46) can be expressed using Eq. (11) with the default time delay as follows.

According to this equation, the cost function to be minimized, given by Eq. (45), is the variance of the pseudo output. The ST-GMV control algorithm divides the system into parts. To do this, the error term is first split into past, current, and future data.

Both sides of Eq. (48) are multiplied by A and rearranged as follows.

Polynomials are written as follows.

AE term of Eq. (49) is expressed in terms of ARMAX model including offset as follows.

If both sides of Eq. (54) are multiplied by E and substituted into Eq. (49), Eq. (55) is obtained.

Both sides of Eq. (55) are divided by C, and Eq. (56) is obtained.

Eq. (56) is combined with Eq. (46) and rearranged as follows.

Eq. (57) is the sum of current and future terms. The current terms can be expressed as follows and represent the best estimate of φ(t + k) that can be made using the data available up to time t.

The second term is the estimation error caused by the noise source, e(t + 1), e(t + 2), …, e(t + k). As mentioned before, this term cannot be removed by the control signal u(t).

So, J is minimized by equalizing Eq. (58) to zero.

Eq. (60) is rearranged using following definitions.

ST-GMV control law can be expressed as follows.

Calculation of input variable using ST-GMV control law is made using following equation.

Application of ST-GMV algorithm consists of following steps [16, 29]:

1) Apply a PRBS to the system as a forcing function and obtain the plant output.

2) Estimate F, G, H from Eq. (63), implementing the RLS algorithm.

3) Employ Eq. (64) to evaluate the control signal.

4) Apply the control signal.

5) The system output is obtained.

6) Return to step 1.
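Steps 3 to 5 of the loop above can be sketched for the simplest case. The example assumes a first-order model y(t + 1) = −a1·y(t) + b0·u(t) with fixed, already estimated parameters, no noise, P = R = 1, and a constant control weighting Q = `lam_u`. It therefore illustrates the GMV control law only, not the full self-tuning loop, and all names and numbers are hypothetical.

```python
def gmv_control(a1, b0, lam_u, setpoint, n_steps=30):
    """Simulate y(t+1) = -a1*y(t) + b0*u(t) under a one-step GMV law.

    The pseudo output y(t+1) - r + lam_u*u(t) is predicted from the model
    and driven to zero at every sampling instant.
    """
    y, ise = 0.0, 0.0
    outputs = []
    for _ in range(n_steps):
        # Solve -a1*y + b0*u - r + lam_u*u = 0 for the control signal u
        u = (setpoint + a1 * y) / (b0 + lam_u)
        # Apply u; the model is assumed to match the plant exactly
        y = -a1 * y + b0 * u
        # Accumulate the squared set point error (unit sampling interval)
        ise += (setpoint - y) ** 2
        outputs.append(y)
    return outputs, ise


# Hypothetical plant y(t+1) = 0.8*y(t) + 0.5*u(t) and set point of 3.0
deadbeat, ise_deadbeat = gmv_control(a1=-0.8, b0=0.5, lam_u=0.0, setpoint=3.0)
damped, ise_damped = gmv_control(a1=-0.8, b0=0.5, lam_u=0.1, setpoint=3.0)
print(round(deadbeat[-1], 4), round(damped[-1], 4))
print(ise_deadbeat < ise_damped)
```

With `lam_u = 0` the output reaches the set point in one step at the price of a large first control move. A nonzero constant weighting gives gentler control action but leaves a steady-state offset for this model, which is the trade-off behind choosing the weighting as small as stability allows.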

## 5. Material and methods

### 5.1. Microorganism, inoculum preculture, and growth medium

*S. cerevisiae* NRRL Y-567 was obtained from the NRRL-Agricultural Research Service Culture Collection. The preculture and growth media consisted of 2% glucose, 0.6% yeast extract, 0.3% K_{2}HPO_{4}, 0.335% (NH_{4})_{2}SO_{4}, 0.376% NaH_{2}PO_{4}, 0.052% MgSO_{4}·7H_{2}O, and 0.0017% CaCl_{2}·4H_{2}O and were sterilized by autoclaving under 1.2 atm at 121°C for 20 min. Microorganisms were incubated for 8 hours at 32°C and 120 rpm, and an inoculum ratio of 1:10 was used for scale-up.

### 5.2. Experimental system

To observe the change of DO and pH over time during baker’s yeast production, a laboratory-scale bioreactor with a 2-L working volume was operated continuously; the input–output data were recorded, and the ARMAX model parameters were determined by the RLS method written in MATLAB. The experimental system is given in Figure 1. The most suitable parametric model was estimated using different values of α (initial value of the covariance matrix), λ (forgetting factor), and the order of the parametric model. During the experiments, DO and pH were measured with a WTW Oxi 340 meter with a polarographic DO sensor and a WTW pH340i pH meter, respectively. The DO and pH probes immersed in the bioreactor measure the DO and pH of the growth medium online; the DO and pH meters convert these values to electrical signals, which reach the I/O card in the computer via the carrier interface modules. The signals arriving at the card are interpreted by the algorithm written in Visual Basic in the ADVANTECH VISIDAQ package program, and the resulting signals are sent to the system online. The operating conditions of the bioreactor are given in Table 1.

Table 1. Operating conditions of the bioreactor.

Temperature (°C) | Air flow rate (vvm) | Cooling water flow rate (mL/min) | Cooling water temperature (°C) | Agitation speed (rpm) |
---|---|---|---|---|
32 | 1 | 55 | 21 | 600 |

## 6. Results and discussion

### 6.1. Dynamic analysis

#### 6.1.1. Step input given to air flow rate

In baker’s yeast production, the oxygen concentration must not fall below the critical value (0.7 mg/L); therefore, a manipulated variable must first be selected to control the DO [12]. The air flow rate was chosen as the manipulated variable for DO control. To verify that this variable provides effective control, the effect of the air flow rate on the DO in the bioreactor was investigated. While the system was at steady state at 1 mg/L DO with an air flow rate of 0.5 L/min, a positive step to 3.4 L/min was applied to the air flow rate, and the change in DO over time was observed. The DO increased to 3 mg/L, as can be seen in Figure 2.

#### 6.1.2. Step input given to base flow rate

During yeast growth, the degradation of glucose under aerobic conditions, which stores chemical energy in ATP molecules, increases the concentration of H^{+} ions and thus decreases the pH. A decrease in the pH of the medium affects not only cell division but also the cleavage rate, the production of many products from yeast, and the activity of enzymes. A manipulated variable therefore has to be determined to achieve pH control, and the base flow rate was selected for this purpose. To verify that this variable provides effective control, the effect of the base flow rate on the pH in the bioreactor was investigated. The bioreactor was stabilized at pH 3.90 under the specified operating conditions, and the microorganism was then added to the bioreactor. A positive step from 0.26 to 1.41 mL/min was applied to the flow rate of the base (0.05 M NaOH solution). The acid (H_{2}SO_{4}) flow rate was kept constant at 0.22 mL/min. The resulting change in pH over time is shown in Figure 3.

### 6.2. System identification results

#### 6.2.1. Determination of model parameters for controlled variable of DO

To find the most appropriate ARMAX model from the dynamic analysis data of the manipulated variable (air flow rate) and the controlled variable (DO), the RLS algorithm was run with various forgetting factors (0.96–1), initial values of the covariance matrix (1, 100, 1000, 10,000), and model orders (na = 2, nb = 1; na = 2, nb = 2; na = 3, nb = 1; na = 3, nb = 2), and the integrated square error (ISE) was used for comparison. The models with the lowest and highest ISE values were used in the ST-GMV control algorithm to demonstrate the effect of the model structure on the control performance, as explained in the next section. The estimation performance values are compared in Table 2.

λ | P (initial covariance) | ISE (na = 2, nb = 1) | ISE (na = 2, nb = 2) | ISE (na = 3, nb = 1) | ISE (na = 3, nb = 2) |
---|---|---|---|---|---|
0.96 | 1 | 1.8720 | 1.2037e+36 | 1.5937 | 1.9718e+61 |
0.96 | 100 | 1.6297 | 3.8648e+05 | 1.4280 | 2.2419e+34 |
0.96 | 1000 | 1.6110 | 9.9163e+23 | 1.4124 | 8.5341e+04 |
0.96 | 10,000 | 1.6080 | 7.4019e+25 | 1.4089 | 1.7541e+25 |
0.97 | 1 | 2.0045 | 1.9599 | 1.7180 | 1.6972 |
0.97 | 100 | 1.7544 | 1.7441 | 1.5485 | 1.5402 |
0.97 | 1000 | 1.7337 | 1.7304 | 1.5306 | 1.5278 |
0.97 | 10,000 | 1.7308 | 1.7295 | 1.5267 | 1.5254 |
0.98 | 1 | 2.1602 | 2.1154 | 1.8570 | 1.8365 |
0.98 | 100 | 1.9039 | 1.8920 | 1.6852 | 1.6752 |
0.98 | 1000 | 1.8803 | 1.8772 | 1.6641 | 1.6610 |
0.98 | 10,000 | 1.8776 | 1.8765 | 1.6597 | 1.6585 |
0.99 | 1 | 2.3344 | 2.2891 | 2.0055 | 1.9853 |
0.99 | 100 | 2.0702 | 2.0559 | 1.8307 | 1.8186 |
0.99 | 1000 | 2.0427 | 2.0395 | 1.8052 | 1.8018 |
0.99 | 10,000 | 2.0404 | 2.0392 | 1.8002 | 1.7991 |
1 | 1 | 2.4483 | 2.4060 | 2.1171 | 2.0993 |
1 | 100 | 2.1663 | 2.1490 | 1.9384 | 1.9243 |
1 | 1000 | 2.1336 | 2.1304 | 1.9077 | 1.9040 |
1 | 10,000 | 2.1317 | 2.1305 | 1.9020 | 1.9010 |

The ISE values decrease as the forgetting factor λ increases from 0.96 to 0.97, but they increase for λ values above 0.97. Increasing the initial value of the covariance matrix from 1 to 10,000 decreased the ISE values (Table 2).
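A screening of this kind can be sketched as a small grid search. The example below runs RLS on hypothetical noise-free data from a first-order plant for every combination of forgetting factor and initial covariance value and stores the ISE of the one-step prediction errors. The plant, the input signal, and the unit sampling interval are assumptions, so the numbers it produces are not comparable with the tabulated values.

```python
def rls_ise(u, y, lam, p0):
    """Run RLS on y(t) + a1*y(t-1) = b0*u(t-1); return the ISE of the a priori errors."""
    theta = [0.0, 0.0]
    P = [[p0, 0.0], [0.0, p0]]
    ise = 0.0
    for t in range(1, len(y)):
        phi = [-y[t - 1], u[t - 1]]
        eps = y[t] - (phi[0] * theta[0] + phi[1] * theta[1])
        ise += eps ** 2  # unit sampling interval assumed
        P_phi = [P[0][0] * phi[0] + P[0][1] * phi[1],
                 P[1][0] * phi[0] + P[1][1] * phi[1]]
        denom = lam + phi[0] * P_phi[0] + phi[1] * P_phi[1]
        gain = [P_phi[0] / denom, P_phi[1] / denom]
        theta = [theta[0] + gain[0] * eps, theta[1] + gain[1] * eps]
        P = [[(P[i][j] - gain[i] * P_phi[j]) / lam for j in range(2)]
             for i in range(2)]
    return ise


# Hypothetical data: square wave input to y(t) = 0.8*y(t-1) + 0.5*u(t-1)
u = [1.0 if (t // 10) % 2 == 0 else 0.0 for t in range(200)]
y = [0.0]
for t in range(1, 200):
    y.append(0.8 * y[t - 1] + 0.5 * u[t - 1])

# Screen every combination of forgetting factor and initial covariance value
results = {(lam, p0): rls_ise(u, y, lam, p0)
           for lam in (0.96, 0.97, 0.98, 0.99, 1.0)
           for p0 in (1.0, 100.0, 1000.0, 10000.0)}
best = min(results, key=results.get)
print(best, results[best])
```

The same loop extends to several model orders by lengthening the data vector; the combination with the lowest ISE is then taken as the candidate model.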

Consequently, the most suitable ARMAX model was obtained with the order na = 3, nb = 2, a forgetting factor of 0.97, and an initial covariance matrix value of 1000. The least successful ARMAX model had the order na = 2, nb = 1, a forgetting factor of 1, and an initial covariance matrix value of 1. On this basis, the ARMAX model selected for developing the GMV control of DO is the one given in Eq. (67).

The RLS estimation of the DO in the growth medium in response to changes in the air flow rate, obtained with the most suitable model (na = 3, nb = 2, forgetting factor of 0.97, initial covariance matrix value of 1000), is given in Figure 4.

#### 6.2.2. Determination of model parameters for controlled variable of pH

To find the most appropriate ARMAX model from the dynamic analysis data of the manipulated variable (base flow rate) and the controlled variable (pH), the RLS algorithm was run with various forgetting factors (0.96–1), initial values of the covariance matrix (1, 100, 1000, 10,000), and model orders (na = 2, nb = 1; na = 2, nb = 2; na = 3, nb = 1; na = 3, nb = 2), and the ISE values of the prediction were used for comparison. The models with the lowest and highest ISE values were used in the ST-GMV control algorithm, as explained in the next section. The estimation performance values are compared in Table 3.

λ | P (initial covariance) | ISE (na = 2, nb = 1) | ISE (na = 2, nb = 2) | ISE (na = 3, nb = 1) | ISE (na = 3, nb = 2) |
---|---|---|---|---|---|
0.96 | 1 | 1.2954 | 1.2948 | 1.1136 | 1.2699 |
0.96 | 100 | 1.2394 | 1.2278 | 1.0715 | 1.0658 |
0.96 | 1000 | 1.2096 | 1.2029 | 1.0521 | 1.0479 |
0.96 | 10,000 | 1.1952 | 1.1943 | 1.0410 | 1.0383 |
0.97 | 1 | 1.3644 | 1.3631 | 1.1858 | 1.1851 |
0.97 | 100 | 1.3160 | 1.3062 | 1.1474 | 1.1423 |
0.97 | 1000 | 1.2815 | 1.2731 | 1.1260 | 1.1204 |
0.97 | 10,000 | 1.2620 | 1.2595 | 1.1117 | 1.1091 |
0.98 | 1 | 1.4439 | 1.4428 | 1.2706 | 1.2699 |
0.98 | 100 | 1.4020 | 1.3940 | 1.2351 | 1.2312 |
0.98 | 1000 | 1.3670 | 1.3558 | 1.2149 | 1.2077 |
0.98 | 10,000 | 1.3405 | 1.3373 | 1.1961 | 1.1929 |
0.99 | 1 | 1.5302 | 1.5292 | 1.3676 | 1.3669 |
0.99 | 100 | 1.4910 | 1.4852 | 1.3326 | 1.3298 |
0.99 | 1000 | 1.4602 | 1.4468 | 1.3159 | 1.3079 |
0.99 | 10,000 | 1.4257 | 1.4213 | 1.2930 | 1.2887 |
1 | 1 | 1.6207 | 1.6202 | 1.4828 | 1.4820 |
1 | 100 | 1.5827 | 1.5785 | 1.4484 | 1.4460 |
1 | 1000 | 1.5582 | 1.5441 | 1.4355 | 1.4276 |
1 | 10,000 | 1.5171 | 1.5108 | 1.4104 | 1.4044 |

The ISE values increased as the forgetting factor λ increased. For a given forgetting factor, the ISE values decreased as the initial value of the covariance matrix increased, and the lowest ISE was obtained when this initial value was 10,000. As the order of polynomial A increases, the ISE values decrease, whereas increasing the order of polynomial B produces no significant change in the ISE values.

As a result, the most suitable ARMAX model was obtained with the order na = 3, nb = 2, a forgetting factor of 0.96, and an initial covariance matrix value of 10,000. In the least successful ARMAX model, the order was na = 2, nb = 1, the forgetting factor was 1, and the initial value of the covariance matrix was 1. At the end of this screening, the ARMAX model structure given in Eq. (68) was selected for developing the GMV control algorithm for pH control by manipulating the base flow rate.

The RLS estimation of pH obtained with the most suitable ARMAX model (na = 3, nb = 2, forgetting factor of 0.96, and initial covariance matrix value of 10,000) is given in Figure 5.
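The screening procedure used in this section can be illustrated with a minimal Python sketch. It is not the code used in this study: the plant is a hypothetical second-order system driven by a square wave, the function name `rls_arx` is invented for illustration, the noise polynomial C of the ARMAX model is omitted (so the regression is of ARX type), and nb is assumed to be the number of past input terms in the regressor. The sketch runs RLS with a forgetting factor over the same grid of λ, initial covariance values, and model orders, and accumulates the squared one-step prediction error as a discrete ISE.

```python
import numpy as np

def rls_arx(y, u, na, nb, lam, p0):
    """RLS with forgetting factor for an ARX-type regression (C polynomial omitted).

    Returns the final parameter vector [a1..a_na, b1..b_nb] and the sum of
    squared one-step prediction errors (a discrete ISE).
    """
    n = na + nb
    theta = np.zeros(n)          # initial parameter estimate
    P = p0 * np.eye(n)           # initial covariance matrix
    ise = 0.0
    for t in range(max(na, nb), len(y)):
        # Regressor: [-y[t-1], ..., -y[t-na], u[t-1], ..., u[t-nb]]
        phi = np.concatenate((-y[t - na:t][::-1], u[t - nb:t][::-1]))
        err = y[t] - phi @ theta                 # prediction error before update
        gain = P @ phi / (lam + phi @ P @ phi)
        theta = theta + gain * err
        P = (P - np.outer(gain, phi @ P)) / lam
        ise += err ** 2
    return theta, ise

# Hypothetical plant (NOT the bioreactor data), square-wave input, small noise.
rng = np.random.default_rng(0)
n_samples = 400
u = np.where((np.arange(n_samples) // 20) % 2 == 0, 1.0, -1.0)
y = np.zeros(n_samples)
for t in range(2, n_samples):
    y[t] = (1.5 * y[t - 1] - 0.7 * y[t - 2]
            + 0.5 * u[t - 1] + 0.2 * u[t - 2]
            + 0.01 * rng.standard_normal())

# Single run with the true model order.
theta_hat, _ = rls_arx(y, u, na=2, nb=2, lam=1.0, p0=1000.0)

# Grid screening over forgetting factor, initial covariance, and model order.
results = {}
for lam in (0.96, 0.97, 0.98, 0.99, 1.0):
    for p0 in (1.0, 100.0, 1000.0, 10000.0):
        for na, nb in ((2, 1), (2, 2), (3, 1), (3, 2)):
            _, ise = rls_arx(y, u, na, nb, lam, p0)
            results[(lam, p0, na, nb)] = ise

best = min(results, key=results.get)
print("estimated parameters:", np.round(theta_hat, 3))
print("lowest-ISE setting (lambda, P0, na, nb):", best)
```

Because the ISE here includes the initial transient of the estimator, the ranking of settings depends on the data record and on the plant. The sketch shows the mechanics of the comparison and should not be read as a reproduction of Table 3.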

### 6.3. ST-GMV control applications of baker’s yeast production

The system identification results in the previous section provided suitable and unsuitable ARMAX models of the yeast production process. These models express the relationship between the controlled variables (DO and pH) and the manipulated variables (air flow rate and base flow rate). In this step, ST-GMV control performance was evaluated with the suitable and unsuitable ARMAX models of each controlled variable for a constant set point trajectory at various noise levels. ISE was selected as the control performance criterion and was evaluated for the ST-GMV control simulations of both DO and pH. In this way, the control simulations demonstrate how strongly the system identification step, which includes the choice of model structure and parameter settings, affects the success of process control.

#### 6.3.1. DO control

In the baker’s yeast production process, with DO as the controlled variable and the air flow rate as the manipulated variable, the most suitable and unsuitable ARMAX models obtained from system identification were used in the ST-GMV control algorithm. For a positive step input from 0.5 to 3.4 L/min in the air flow rate, the most suitable model had na = 3, nb = 2, λ = 0.97, and P = 1000. Likewise, the model that failed to identify the system (largest ISE value) had na = 2, nb = 1, λ = 1, and P = 1. Both models were used in the ST-GMV control algorithm with the controller parameters P = 1, Q = 0.9975, and R = 2.0885 in the presence of two different noise levels. The suitable model, with calculated ISE values of 50.14 and 52.37, identified the system as expected and provided good control (Figure 6), in contrast to the unsuitable model, with ISE values of 502.01 and 609.56, respectively (Figure 7).
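To make the link between model quality and control performance concrete, the following minimal Python sketch applies a one-step GMV-type control law to a hypothetical first-order plant. It is an illustration under stated assumptions, not the ST-GMV algorithm of this chapter: the plant coefficients, the weights P = 1, Q = 1, R = 2, and the function names `gmv_input` and `run_loop` are invented for the example, the noise polynomial is ignored, and the model parameters are held fixed rather than updated by RLS at every step. The control input is chosen so that the generalized output P·ŷ(t+1) + Q·u(t) − R·w equals zero, where w is the set point.

```python
import numpy as np

def gmv_input(y_now, setpoint, a_hat, b_hat, P, Q, R):
    """Solve P*y_pred(t+1) + Q*u(t) - R*w = 0 for u(t),
    with the model prediction y_pred(t+1) = a_hat*y(t) + b_hat*u(t)."""
    return (R * setpoint - P * a_hat * y_now) / (P * b_hat + Q)

def run_loop(a_hat, b_hat, noise_std, n_steps=100, setpoint=1.0, seed=0):
    """Closed-loop simulation; returns the discrete ISE and the final output."""
    a_true, b_true = 0.8, 0.2    # hypothetical plant: y(t+1) = 0.8*y(t) + 0.2*u(t)
    P, Q, R = 1.0, 1.0, 2.0      # hypothetical weights, offset-free for the true plant
    rng = np.random.default_rng(seed)
    y, ise = 0.0, 0.0
    for _ in range(n_steps):
        ise += (setpoint - y) ** 2
        u = gmv_input(y, setpoint, a_hat, b_hat, P, Q, R)
        y = a_true * y + b_true * u + noise_std * rng.standard_normal()
    return ise, y

# Controller built on the correct model versus a deliberately wrong model.
ise_good, y_good = run_loop(a_hat=0.8, b_hat=0.2, noise_std=0.0)
ise_bad, y_bad = run_loop(a_hat=0.5, b_hat=0.2, noise_std=0.0)
ise_good_noisy, _ = run_loop(a_hat=0.8, b_hat=0.2, noise_std=0.05)
ise_bad_noisy, _ = run_loop(a_hat=0.5, b_hat=0.2, noise_std=0.05)

print(f"no noise:    ISE good = {ise_good:.3f}, ISE bad = {ise_bad:.3f}")
print(f"noise 0.05:  ISE good = {ise_good_noisy:.3f}, ISE bad = {ise_bad_noisy:.3f}")
```

In this toy setting the mismatched model leaves a steady-state offset, so its ISE keeps growing with the length of the run. A self-tuning variant would replace `a_hat` and `b_hat` with the RLS estimates at each sampling instant.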

#### 6.3.2. pH control

In the baker’s yeast production process, with pH as the controlled variable and the base flow rate as the manipulated variable, the most suitable and unsuitable ARMAX models obtained from system identification were used in the ST-GMV control algorithm. For a positive step input from 0.26 to 1.41 mL/min in the base flow rate, the most suitable model had na = 3, nb = 2, λ = 0.96, and P = 10,000. Likewise, the model that failed to identify the system (largest ISE value) had na = 2, nb = 1, λ = 1, and P = 1. Both models were used in the ST-GMV control algorithm with the controller parameters P = 1, Q = 0.9375, and R = 1.1885 in the presence of two different noise levels. The suitable model, with calculated ISE values of 110.2 and 113.51, identified the system as expected and provided good control (Figure 8), in contrast to the unsuitable model, with ISE values of 245.69 and 213.69, respectively (Figure 9). The ST-GMV control simulation results are summarized in Table 4.

Controlled variable | Noise level | ISE, control with suitable model | ISE, control with unsuitable model
---|---|---|---
DO | e = 0.005 | 50.1361 | 502.0056
DO | e = 0.05 | 52.3745 | 609.5629
pH | e = 0.005 | 110.2026 | 245.6958
pH | e = 0.05 | 113.5143 | 213.6950
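The size of the gap between the two model choices is easier to read as a ratio. The short Python sketch below, added here only as an arithmetic aid, divides the ISE obtained with the unsuitable model by the ISE obtained with the suitable model for each row of Table 4; the dictionary name `table4` is an illustrative choice.

```python
# ISE values copied from Table 4: (suitable model, unsuitable model).
table4 = {
    ("DO", 0.005): (50.1361, 502.0056),
    ("DO", 0.05): (52.3745, 609.5629),
    ("pH", 0.005): (110.2026, 245.6958),
    ("pH", 0.05): (113.5143, 213.6950),
}

# Ratio of unsuitable-model ISE to suitable-model ISE for each case.
ratios = {case: bad / good for case, (good, bad) in table4.items()}

for (variable, noise), ratio in ratios.items():
    print(f"{variable}, e = {noise}: unsuitable/suitable ISE ratio = {ratio:.2f}")
```

The ratio is roughly 10 to 12 for DO and roughly 2 for pH, so the penalty for a poorly identified model is considerably larger in the DO loop than in the pH loop.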

## 7. Conclusion

Understanding the dynamic behavior of biotechnological processes, in which living cells are used as biocatalysts, is highly challenging because thousands of biochemical reactions take place simultaneously. Process operation in batch mode is particularly difficult because the parameters vary with time. Estimation procedures are therefore of primary importance for expressing the real process behavior with mathematical models. Various methods and approaches exist for this purpose, and selecting the most appropriate system identification method is the next critical step, because it also affects how well the dynamic behavior of the process is estimated. In the aerobic batch production of baker’s yeast with *S. cerevisiae*, system identification was carried out with the RLS algorithm and was found successful in identifying the DO and pH variations caused by manipulating the air flow rate and the base flow rate. The prediction error, defined as the ISE, demonstrates that the estimation performance was good.

Selection of the model structure is crucial for expressing process behavior accurately. In this study, different model orders were examined, and the order na = 3, nb = 2 was found for both the DO and pH ARMAX models. As the order of polynomial A increases, the difference between the actual and predicted values decreases, which is desirable. However, increasing the order of polynomial B does not produce any significant change. The forgetting factor was found as 0.97 for DO and 0.96 for pH. The initial value of the covariance matrix was less influential than the forgetting factor: the lowest ISE values were obtained with 1000 for DO and 10,000 for pH, and a value of 1000 was observed to be appropriate for all experiments.

The theoretical ST-GMV control of DO and pH was successfully performed with the most suitable ARMAX models obtained from system identification. With these models, successful control under the constant set point condition was still achieved when the noise level was increased. In contrast, controllers based on the unsuitable models gave markedly higher ISE values at both noise levels, and in the DO case their performance deteriorated further as the noise level increased. In conclusion, successful control can only be accomplished with good system identification.