Details of metabolite and enzymes in case study 1.

## Abstract

This chapter presents an improved method for constrained optimisation of biochemical systems production. The aim of the proposed method is to maximise its production and, at the same time, to minimise the total amount of chemical concentrations involved in producing the best production. The proposed method models biochemical systems with ordinary differential equations. The optimisation process became complex for the large size of biochemical systems that contain many chemicals. In addition, several constraints as the steady-state constraint and the constraint of chemical concentrations also contributed to the computational complexity and difficulty in the optimisation process. This chapter considers the biochemical systems as a nonlinear equations system. To solve the nonlinear equations system, the Newton method was applied. Then, both genetic algorithm and cooperative co-evolutionary algorithm were applied to fine-tune the components in the biochemical systems to maximise the production and minimise the total amount of chemical concentrations involved. Two biochemical systems were used, namely the ethanol production in the Saccharomyces cerevisiae pathway and the tryptophan production in the Escherichia coli pathway. In evaluating the performance of the proposed method, several comparisons with other works were performed, and the proposed method demonstrated its effectiveness in maximising the production and minimising the total amount of chemical concentrations involved.

### Keywords

- biochemical systems production
- constrained optimisation
- computational intelligence
- cooperative co-evolutionary algorithm
- genetic algorithm
- Newton method

## 1. Introduction

Computational systems biology is a field of biological study that combines the knowledge of science and engineering. The objective of this field is to model the behaviour of biochemical reactions through a computational approach. Within this field, the structures and complexity of biological processes can be investigated as a system [1]. Therefore, computational systems biology enables the scientist to represent the biological process as a system. This allows the biochemical process in a living cell to be manipulated as a real factory and gives a way for scientists to improve the cell production (microbial production).

Integrating the knowledge of microbial production with genomic techniques and biotechnology processes creates the ability to manipulate a living cell to act like a real cell factory, thus opening new doors for researchers seeking to improve microbial productions [2]. One example of improving the microbial production is the optimisation of a biochemical systems production. Generally, biochemical systems can be defined as a series of chemical reactions found in a microorganism cell. With the knowledge of microbial production and genomic techniques, the biochemical systems can be represented as a dynamic mathematical model such as the Michaelis-Menten type [3], the stoichiometric approach [4], flux-balance analysis [5], metabolic control analysis [6] and biochemical systems theory (BST) [7]. Among these various choices, this work uses the BST representation to model the biochemical system. An advantage of using the BST is that prior knowledge of the mechanisms for each reaction is not required in order to build equations and the mathematical models can be designed by identifying the reactants and their interconnections [7]. For that reason, a canonical form that uses an ordinary differential equation (ODE) representation is suitable for modelling biochemical systems [1].

The optimisation of the biochemical systems production is a biotechnological process that aims to improve production by fine-tuning the chemical reaction. Besides that, the total amount of chemical concentrations involved also needs to be taken into account [8, 9]. To date, many studies have been carried out to develop methods for the optimisation of the biochemical systems production. Researchers tend to use the computational methods due to the flexibility of the mathematical models allowing to reduce the required costs and time. Popular methods used are the linear programming method (Vera et al. 2010; Xu 2012) and the geometric programming method [10, 11]. These methods depend on the definitions of the decision variables and the equality and inequality constraints, which could cause a convergence problem if the definition process is not performed well [12]. In order to overcome this problem, the present study was carried out using the stochastic method. The stochastic method operates on an evolving set of candidate solutions. In the evolving process, the candidate solutions are modified by the stochastic operator to produce the next generation. Using the stochastic operator, the search direction is determined by a random method, which makes it more efficient and robust [13]. In addition, the stochastic method does not rely on the manipulation of the objective function and constraints or the initialisation of a feasible point [14]. There are many stochastic methods that can be adopted for the optimisation process, among which is the genetic algorithm (GA) that has been widely found to be the most suitable method [15, 16, 17]. The GA works by representing the chemical reaction in the biochemical systems as a chromosome. The chromosome is then evolved and modified by a crossover and mutation process, with the intention to improve the solution.

As mentioned above, this chapter uses the BST method to model biochemical systems. Within the BST, two representations are typically used, namely, the S-system and generalised mass action (GMA). This study employs the GMA representation due to its ability to represent the nonlinearity of a biochemical systems and superior performance in optimisation [10]. The GMA uses the power law function, which is an ODE to model the biochemical systems. Applying only the GA for the optimisation of biochemical systems is not sufficient as the GA only fine-tunes the chemical concentrations. Therefore, a method is needed to deal with the biochemical systems. Implementing the Newton method for the biochemical systems is a good choice because the GMA model that represents the biochemical systems can be viewed as a nonlinear equations system [8, 18, 19, 20, 21, 22]. It also has been found that the Newton method is suitable for the nonlinear equations system due to the convergence speed, simplicity and ease of use [23, 24].

Using the Newton method with the GA in optimising the biochemical systems production is a good choice because the Newton method deals with the biochemical systems, while the GA is used to fine-tune the chemical concentrations by representing the chemical concentrations into a chromosome. However, several problems do occur when dealing with large biochemical systems that contain many chemicals and has complex structures where it makes the representation of the solution become complex and difficult to evaluate. Hence, a method is needed in order to overcome these problems by simplifying the representation of the solution. Using the cooperative co-evolutionary algorithm (CCA) is a good choice because it has the ability to simplify the representation of the candidate solution by decomposing a single chromosome into multiple sub-chromosomes [17, 25, 26].

In this chapter, a hybrid method known as the advanced Newton cooperative genetic algorithm (ANCGA) that combined the Newton optimisation method; the GA and the CCA were presented. This method models biochemical systems as a system of nonlinear equations and applies the Newton method to solve the system. In the optimisation process, the GA and the CCA were used to represent the variables in a nonlinear system in order to search the best solution. The GA was used to maximise the production, while the CCA was used to minimise the total amount of chemical concentrations involved. The ANCGA that proposed in this study is the improvement of the existing method [17]. The reason of proposing the ANCGA is due to the previous algorithm that takes longer time for the optimisation process. Moreover, the performance of the previous work can be improved in terms of maximising the production and minimising the total amount of chemical concentrations involved. In order to do that, this work introduces a concept of external population. The external population was used to store the best solution found in every generation. The reason of using this concept was to avoid the best solution found in every generation from being lost during the reproduction process. The methods used in this study are presented in the following order. Firstly, the proposed method is explained in detail. Case studies of the fermentation pathway in *Saccharomyces cerevisiae* (*S. cerevisiae*) and the *tryptophan* (*trp*) of biosynthesis in *Escherichia coli* (*E. coli*) are then presented. Following that, the results are discussed, and a brief conclusion is made.

## 2. The proposed method

This section describes the proposed ANCGA in detail. The ANCGA is proposed in order to improve the performance of the previous method [17] in terms of computational time. In addition, the ANCGA is hope to improve the performance of the previous method [17] in maximising the production and minimising the total amount of chemical concentrations involved. Figure 1 shows the flowchart of ANCGA. The ANCGA operates by treating the biochemical systems as a system of nonlinear equations and then uses the Newton method in solving the nonlinear equations system. Then, the GA and CCA were used in the optimisation process. The detailed operation of the ANCGA is described in the following steps:

Step 1—randomly generate the initial n sub-chromosomes in m sub-populations and create an empty external population. The number of sub-populations (m) must be the same to the number of variables in the nonlinear equations system. The sub-chromosomes represent the variables in the nonlinear equations system. The sub-chromosome is in the binary format.

Step 2—evaluate the sub-chromosome. The evaluation process starts when a representative from every sub-population is selected to produce a complete solution that is known as a cooperative chromosome. The selection of representatives is based on their fitness value, where the lowest values are selected first. This process is known as the sub-chromosome evaluation. The objective of this process is to minimise the total amount of chemical concentrations involved by letting representatives that have the lowest fitness values from every sub-population to be combined together.

Step 3—produce the cooperative chromosome. The cooperative chromosome is produced after all the selected representatives are combined together. The cooperative chromosome is the complete solution. The formation of the cooperative chromosome is depicted in Figure 2.

Step 4—evaluate the cooperative chromosome. In this step, the cooperative chromosome is tested. The evaluation process starts with an encoding of the cooperative chromosome into variables in the nonlinear equations system. Then, the Newton method is used to solve the nonlinear equations system. In the evaluation process, a condition might occur depending on whether or not the cooperative chromosome follows the set of constraints. If the cooperative chromosome follows the constraints, then the procedure goes ahead to Step 8; if not, it goes to Step 5.

Step 5—decompose the cooperative chromosome into sub-chromosomes. After solving the nonlinear equations system using the Newton method, the variables in the nonlinear equations system are decoded back into the cooperative chromosome form. Then, the cooperative chromosome is decomposed into multiple sub-chromosomes. After that, all the sub-chromosomes are sent back to their own sub-populations in order to perform selection and reproduction.

Step 6—select a pair of sub-chromosome for the reproduction process. The selection process is based on their fitness value, where the lowest fitness value is selected first.

Step 7—produce new generations. In this step, the genetic operators of crossover and mutation are applied on the selected sub-chromosomes in order to produce new generations. This process is performed up to the last sub-chromosome. Then, the new generation is processed again, starting from Step 2.

Step 8—copy the cooperative chromosome into the external population. The process is performed by copying selected cooperative chromosome that fulfil the constraints and put the selected cooperative chromosome into external population. This process is intended to keep the best solution in every generation and prevent it from being lost in the reproduction process (Step 7). At this stage, two conditions may occur: either the maximum number of generations is reached or the maximum number of cooperative chromosomes in the external population is achieved. If these two conditions are fulfilled, the procedure jumps to Step 10; otherwise, the procedure continues to the next step. During this process, if the maximum number of cooperative chromosomes in the external population is reached before the maximum number of generations is achieved, the cooperative chromosome that has the lowest fitness value is deleted and replaced by a newly copied cooperative chromosome. However, if the maximum number of generations is reached before the maximum number of cooperative chromosomes in the external population is achieved, the procedure moves to Step 10.

Step 9—select some of the cooperative chromosomes from the external population. This process refers to the elitism of external population concept. The elitism of external population concept works where some (with y probability) of the cooperative chromosomes from the external population are selected and combined with the current sub-chromosomes. The selection process is based on their fitness value, where the cooperative chromosomes from the external population that have the highest fitness value are selected first. Then, this process goes back to Step 5.

Step 10—choose the best solution. The best solution is chosen among all the cooperative chromosomes in the external population. The selection is based on the fitness values of the cooperative chromosomes, where the cooperative chromosome with the highest fitness value is chosen.

Step 11—return to the best solution. This step decodes the selected cooperative chromosome into its real value (the variable in the nonlinear equations system) and gives the best solution set.

## 3. Case studies

In this section, the effectiveness and efficiency of the ANCGA is demonstrated. The effectiveness of the proposed method refers to the ability of the ANCGA to obtain the best result, while the efficiency refers to the ability of the ANCGA to maintain its performance in producing the best result in several case studies. Two case studies were used, namely, the *S. cerevisiae* pathway and the *E. coli* pathway. In order to test the performance of the ANCGA, a Java program based on two Java libraries, namely, jMetal [27] and JAMA of the version 1.0.3, was developed. The jMetal library can be downloaded from

### 3.1 Case study 1: the ethanol production in *S. cerevisiae* pathway

In this case study, the ANCGA was used to optimise ethanol production in the *S. cerevisiae* pathway. The GA was used to represent the chemical reactions in the *S. cerevisiae* pathway, which were metabolites and enzymes. Details of the metabolites and enzymes, including the initial steady-state values, are presented in Table 1. The pathway was suspended in a cell culture at p. 4.5 and had the following ODE models [28].

Metabolite/enzyme | Symbol | Initial steady-state value |
---|---|---|

Glc_{in} | X_{1} | 0.0345 |

G6P | X_{2} | 1.0110 |

FDP | X_{3} | 9.1440 |

PEP | X_{4} | 0.0095 |

ATP | X_{5} | 1.1278 |

V_{in} | Y_{1} | 19.70 |

V_{HK} | Y_{2} | 68.50 |

V_{PFK} | Y_{3} | 31.70 |

V_{GAPD} | Y_{4} | 49.90 |

V_{PK} | Y_{5} | 3440.00 |

V_{Carb} | Y_{6} | 14.31 |

V_{Gro} | Y_{7} | 203.00 |

V_{ATPase} | Y_{8} | 25.10 |

Eq. (2) shows the fluxes at the steady-state condition.

For the total amount of chemical concentration involved, it can be formulated as follows:

In the optimisation process, the GMA model was treated as a nonlinear equations system, where all the GMA models were set to be equal to 0. This gave all the ODE models in Eq. (1) the following forms:

For the metabolite concentration constraint, the constraint was set to 20% from the steady-state value, which was in the range between 0.8 and 1.2 [8, 29]. Thus, the constraint for this case study became as follows:

Meanwhile, the enzyme concentration constraint was set in the range between 0 and 50 from the steady-state value [8, 29]. The enzyme concentration constraint can be formulated as follows:

### 3.2 Case study 2: the tryptophan biosynthesis in *E. coli* pathway

For case study 2, the ANCGA was used to optimise the end product of this pathway, which was *trp* production. The complete description of this pathway was provided by Xiu and colleagues [30]. The GMA models of this pathway are given as follows:

where *X1* is the mRNA concentration, *X2* is the enzyme concentration and *X3* is the *trp* concentration. The rates of all reactions in this pathway at steady state are given as follows:

The *trp* production in this case study is given by the reaction *V34* [31]. This leads to optimisation that can be formulated as follows:

For the total amount of chemical concentrations involved, it can be formulated as follows:

Similar to case study 1, the GMA model was set to be equal to 0, thus Eq. (8) became as follows:

In this case study, the GA and CCA only represent several chemical concentrations. This was because not all chemical concentrations were being tuned [1, 10, 11]. The chemical concentrations that tuned were *X1* up to *X6* and *X8*. These chemical concentrations including their initial steady states are summarised in Table 2. For the other chemical concentrations which were *X7* and *X9* up to *X13*, fixed values were used [1, 10, 11]. Eq. (12) lists the range of these chemicals.

Reaction | Initial steady-state value |
---|---|

X1 | 0.0345 |

X2 | 1.0110 |

X3 | 9.1440 |

X4 | 0.0095 |

X5 | 1.1278 |

X6 | 19.70 |

X8 | 25.10 |

## 4. Results and discussion

In performing the experiments, many parameter settings were used. The list of all parameter settings used in this study is listed in Table 3, whereas Table 4 presents the parameter settings in producing the best result. The binary coding was used to represent the chemical concentrations. For the Newton method, fixed parameters were used, namely, 50 for the maximum number of iterations and 10^{−6} for tolerance.

Parameter | Rate |
---|---|

Number of sub-populations | Depend on the number of variables in nonlinear equations system |

Number of sub-chromosomes in sub-population | [100,110,120,130,140,150] |

Number of chromosomes in external population P | [100,110,120,130,140,150] |

Maximum number of generations | [100,110,120,130,140,150] |

Crossover rate | [0.1,0.2,0.3,0.4,0.5] |

Mutation rate | [0.1,0.2,0.3,0.4,0.5] |

Elitism rate | [0.1,0.2,0.3,0.4,0.5] |

Parameter | Case study 1 | Case study 2 |
---|---|---|

Number of sub-populations | 11 | 7 |

Number of sub-chromosomes in sub-population | 150 | 140 |

Number of chromosomes in external population P | 100 | 100 |

Maximum number of generations | 150 | 130 |

Crossover rate | 0.3 | 0.4 |

Mutation rate | 0.1 | 0.1 |

Elitism rate | 0.2 | 0.2 |

The full results obtained by the ANCGA when applied on *S. cerevisiae* pathway are given in Table 5. At the best solution, the ANCGA was able to increase the *F1* (ethanol production) up to 53.02 bigger than its initial steady-state value. For the *F2* (total amount of chemical concentrations involved), the proposed method was able to reduce it to 293.5249. All metabolites and enzymes fulfilled their constraints, with all the metabolites staying in the range of 0.8–1.2, while all the enzymes were in the range of 0–50. The performance of the ANCGA was assessed by comparing the result obtained by ANCGA with other works, and the comparison results are listed in Table 6. As shown in the table, the ANCGA produced higher results as compared to other methods. In addition, to verify the results achieved by the ANCGA, an average of 100 independent runs was recorded. The results are summarised in Table 5. It shows that the average result for the metabolites and enzymes fulfilled their constraints, whereby they were in their optimum range, thus leading to the conclusion that the ANCGA is able to produce reliable results. It can be said that the ANCGA can produce higher production of ethanol as compared to the methods used in other studies.

Variables | Best solution 1 | Average |
---|---|---|

X1 | 1.1240 | 0.9951 |

X2 | 1.0322 | 1.0018 |

X3 | 0.9900 | 1.0053 |

X4 | 1.1407 | 1.1297 |

X5 | 1.0001 | 0.9831 |

Y1 | 49.8103 | 49.9793 |

Y2 | 45.3702 | 45.0767 |

Y3 | 45.3452 | 49.8103 |

Y4 | 48.5112 | 47.4064 |

Y5 | 49.4448 | 49.3426 |

Y8 | 49.7563 | 49.7876 |

F1 | 53.0200 | 52.7499 |

F2 | 293.5249 | 294.5178 |

The full results of the *E. coli* pathway are presented in Table 7. The ANCGA was able to improve the *F1* (production of *trp*) to 3.9774 from its initial steady state. Meanwhile, the proposed method was able to reduce the *F2* (total amount of chemical concentrations involved) to 6006.4280. All variables representing the chemical reaction followed their constraints and were in the optimum range. To assess the performance of the ANCGA, the results achieved were compared to the results of other methods, with the details of the comparison shown in Table 8. As presented in the table, the *F1* of the ANCGA was higher when compared to the methods employed in other works. Similar to the previous case study, 100 experiments were conducted, and the average result was calculated in order to validate the ANCGA results. Table 7 presents the average result. From the data in Table 7, it can be concluded that the ANCGA is reliable in performing the optimisation of this pathway because the average of all the variables follows their constraints. From the observations presented in Tables 7 and 8, it can be concluded that the ANCGA is effective in optimising the *trp* production as well as producing reliable results.

Variables | Best solution | Average |
---|---|---|

X1 | 0.8064 | 1.0742 |

X2 | 0.8046 | 1.1085 |

X3 | 0.8000 | 0.8000 |

X4 | 0.0054 | 0.0054 |

X5 | 4.0116 | 4.4694 |

X6 | 5000 | 5000 |

X8 | 1000 | 1000 |

F1 | 3.9774 | 3.9616 |

F2 | 6006.4280 | 6007.4575 |

The external population concept used by ANCGA can be validated by comparing it with the previous method proposed in [17]. The aim of the external population concept was to reduce the computational time and the number of generations. To learn the effect of the external population concept, several experiments were conducted. To investigate the decrease in the number of generations, *F1* was set to 52.5 for case study 1 and 3.90 for case study 2. After *F1* was achieved, the process was stopped. This helped to investigate which method required more generations in achieving the target production. Figures 3 and 4 illustrate the comparisons of all case studies. In both figures, the maximum number of the external population was smaller as compared to the maximum number of the previous method in achieving *F1*. This was caused by the concept of external population that was introduced in this study. By using this concept, the best solutions found in the iteration process could be maintained and thus enabled the number of generations to be reduced. In addition, it was found that this concept tended to converge faster than the previous method. This meant that the use of the external population concept allowed faster search of the best solution. In conclusion, the external population concept had an impact in reducing the number of generations and helped in faster convergence as compared to previous methods. To determine the statistical significance between the proposed method and previous methods, the paired t-test and the Wilcoxon signed-rank test were used. The result of the statistical tests showed that all p-values were <0.05, thus confirming that the proposed method significantly improved the previous method.

Meanwhile, to investigate the decrease in computational time, the maximum number of generations was not set, but *F1* was set to 52.5 for case study 1 and 3.9 for case study 2. After *F1* was achieved, the process was terminated. Table 9 lists the computational time results, and it was found that the ANCGA required less time as compared to the method in [17]. This situation occurred because the high-quality solutions were stored in the external population and then combined with the current solution in the optimisation process. Copying the high-quality solutions into the external population prevented them from being lost (because the optimisation process involving crossover and mutation operation could lose the high-quality solutions). By storing the high-quality solutions into the external population, it would be able to keep the best solution until the optimisation process stopped. To determine the significant improvement of the proposed method against the previous method, the paired t-test and the Wilcoxon signed-rank test were used. The p-value from both tests was <0.05. From this finding, the proposed method and the previous method were statistically different from each other, and the improvement of the proposed method could be accepted.

Method | Case study 1 | Case study 2 |
---|---|---|

ANCGA | 75.56 | 38.07 |

Previous method [17] | 80.40 | 40.45 |

## 5. Conclusion

Improving production has become an important issue in the optimisation of biochemical systems. Many factors need to be considered to ensure optimal production. In this work, a hybrid method for constraint optimisation of the biochemical systems production known as the ANCGA was presented. The ANCGA was developed based on a previous method [17], where the ANCGA combined the Newton method, GA and CCA. This study introduced a concept of external population. The aim of this concept was to reduce computational time. In this work, the biochemical system was modelled by a nonlinear equations system. In the optimisation process, the Newton method was employed to deal with a system of nonlinear equations. The GA and CCA were then applied to fine-tune the chemical concentration value in the nonlinear system in order to search for the best solution. During the optimisation process, the high-quality solutions were copied and stored into the external population. The purpose of this process was to avoid the loss of high-quality solutions during the optimisation process. Then, some solutions from the external population were mixed with the next generation of solutions. By doing this, the computational time and number of generations were reduced. In the present study, the proposed method was applied on two case studies, and better results were obtained as compared to the methods presented in other works. In addition, the results were validated, and they demonstrated that the constraints of all the components in the biochemical system were fulfilled. Thus, it can be concluded that the performance of the ANCGA is effective and reliable in producing the best result.

## Acknowledgments

Special appreciation to Universiti Malaysia Pahang for the sponsorship of this study by approving the RDU Grant Vot No. RDU180307. Special thanks to the reviewers and editor who reviewed this manuscript.