Strategies for Enhancing Product Yield: Design of Experiments (DOE) for Escherichia coli Cultivation

E. coli is considered one of the best model organisms for biopharmaceutical production by fermentation. In process development it is used to produce vaccines, metabolites, biofuels, antibiotics and synthetic molecules in large amounts, with yields assessed in shake flasks and in bioreactors operated in batch, fed-batch or continuous mode. Production of the desired molecule in the bioreactor is facilitated by strategies that increase biomass and optimise yield. Fermentation is a controlled process that uses media buffers, micronutrients and macronutrients with a degree of control that is not available in a shake flask. Controlling temperature, dissolved oxygen (aerobic), dissolved nitrogen (anaerobic), inducer concentration and the feed or supplementation of nutrients is the key to achieving an exponential growth rate and high biomass, and so to maximising production. Design of experiments (DOE) is critical for attaining maximum gain in a cost-effective manner. DOE comprises several strategies, such as Plackett-Burman, Box–Behnken and artificial neural network approaches; combining these strategies reduces the cost of production 2–8-fold, depending on the molecule to be produced. It also minimises downstream processing, allowing quick isolation, purification and enrichment of the final product.


Introduction
E. coli is the most studied bacterium and has been known for years to live in symbiosis with humans; the laboratory lineage was derived by culturing a 1922 isolate, carrying the F plasmid, from a diphtheria patient. E. coli has been used to produce biopharmaceuticals since 1965. It is also offered for the production of biopharmaceuticals such as recombinant proteins and metabolites by several companies, namely BPB Bioscience, Agilent Technologies, Promega, Takara, Tonbo Biosciences, New England Biolabs, Novagen and Lucigen. Their strains are optimised for challenging proteins that are difficult to express, purify and fold into their native conformation. Moreover, formation of proper disulphide bonds and refolding of membrane proteins is also achieved by using newly commercially

Critical fermentation ingredients
Critical fermentation ingredients are media components that cannot be replaced; their concentrations can only be standardised to maximise yield. Typically, buffers and nitrogen sources such as yeast extract and tryptone are not changed. E. coli is the most studied and most widely used system for producing enzymes, antibodies and other biological products. Bacteria require specific growth conditions, determined by factors such as oxygen, pH, temperature and light. Bacterial growth is divided into lag, exponential (log) and stationary phases. During the lag phase, cellular activity in a rich nutrient medium allows cells to synthesise proteins and increase in size, but no cell division occurs. During the exponential phase, metabolic activity is high as DNA, RNA, cell wall components and the machinery needed for division are generated. The stationary phase is triggered by the accumulation of waste products and the depletion of nutrients.
During the late log phase, protein expression is induced by adding the allolactose analogue isopropyl β-D-1-thiogalactopyranoside (IPTG) [63,64]. Expression of recombinant products is controlled by promoter systems such as those based on the T5 promoter and T7 RNA polymerase. Alternative promoter systems, such as the auto-inducible phoA promoter system [13], the salt-inducible promoter (proU), the arabinose-inducible promoter (pBAD) [65], the heat-inducible phage lambda promoters (pL and pR), the cumate-inducible T5 promoter-based system [66] and the cold-inducible cspA promoter-based system [67], are also valuable for biologics production.
The cost of biologics production is driven by the high cost of raw materials and fermentation media. In the biologics industry, simpler, cheaper and more reproducible processes are highly valued. The fermentation medium is a critical component, and a balance of nutrients is needed to increase productivity. Standardising E. coli fermentation requires identifying a suitable combination of the available media components, e.g. yeast extract, soybean meal, Bacto tryptone, meat extract and enzymatic digests of plant and animal protein (tryptic or casein digests). Various carbon sources are available (glucose, glycerol, sucrose, lactose, etc.). Additives for fermentation include vitamins, amino acids and trace elements. Designing a medium requires evaluating the requirement for each individual component along with the additives. A model is designed using a statistical approach that takes multiple parameters into consideration, and the defined parameters are then validated by fermentation. This is achieved using DOE (Figure 1).
Experiments are first carried out at shake-flask level with selected nutrients such as carbon and nitrogen sources. Small-scale studies are used to decide between batch and fed-batch fermentation. The next stage is to screen the components available for batch or fed-batch fermentation. Once the components are finalised, the feasibility of scale-up is evaluated based on their availability from the source. Media components are finalised in shake flasks using DOE of media, buffers, additives and inducers. The product-to-biomass ratio is evaluated and protein quality is validated with 3 to 5 selected designs.
Once the nutrients and components are finalised, pilot-scale batches are set up to study the biomass-to-product ratio. Dissolved oxygen and temperature are then optimised further using the DOE approach. If results are not reproducible under the selected conditions, other nearby designs are studied to finalise the medium and process for fermentation. Components are selected based on the experimental outcome, measured as biomass (OD), product output (g L⁻¹) and cost of ingredients. The overall process is summarised in Figure 1.

Batch fermentation
In microbial batch fermentation, cultivation is done in a fixed volume of medium in a fermenter. The standard inoculum is 50-200 mL of shake-flask culture in 2-5 L of fermentation medium. Batch fermentation typically reaches an OD600 of 20-40 in 8-12 h. Microbial growth depletes the nutrients and by-products accumulate, so the culture environment changes continuously. After completion of the batch, media and cells are harvested. The advantages of batch fermentation are ease of operation, low risk of contamination and a high yield of protein relative to biomass in a short fermentation time; it is used mainly for soluble or excreted proteins. Typical disadvantages are low cell/biomass densities and relatively long downtime between batches due to cleaning, vessel setup and sterilisation. DOE is needed to optimise the required nutrients and minimise by-product accumulation during fermentation. A typical batch fermentation medium contains yeast extract and Bacto tryptone (or soybean meal) at 10 g L⁻¹ to 24 g L⁻¹, respectively; sodium and potassium phosphate buffers combined to reach pH 6.8 to 7.0; 100x amino acid solution; and trace elements (400x), i.e. Fe(III) citrate (40 mg mL⁻¹) [54]. The typical medium components for batch fermentation are listed in Table 2 as a base design from which to start optimisation.

Fed-batch fermentation
Fed-batch fermentation is a standard mode of fermentation in the bioprocess industry. Typically, fed-batch fermentation starts at the end of batch fermentation. E. coli is adapted and cultivated in defined media. In fed-batch fermentation, cells are inoculated and grown in batch mode for 10-15 h. Once all the nutrients are depleted, as shown by analysing the amount of glucose in the medium, dissolved oxygen levels rise to 60-80%. The fed-batch phase is initiated by starting a feed of glucose, vitamins, amino acids and trace elements. The feed is added to the medium so that the cell mass concentration increases exponentially. The specific growth rate (μ) is set to 0.12-0.22 h⁻¹ during the fed-batch stage. These equations determine the growth rate in the medium.
In the first equation, m_s(t_i) is the feed rate at the initiation of the fed-batch phase at time t_i; μ_set is the set specific growth rate, m is the specific maintenance coefficient, Y_X/S is the yield coefficient, V is the bioreactor volume and S_0 is the initial glucose concentration. In the second equation, m_s(t) is the rate of substrate addition (g h⁻¹). After induction of protein expression, the specific growth rate of E. coli is typically reduced to 0.1 h⁻¹. The cells are harvested after completion of the run. If the growth rate is not controlled during fermentation, toxic metabolites produced during the fed-batch process (acetate, formate, succinate and lactate) accumulate steadily, resulting in oxygen limitation. It is therefore recommended to wash the cells with Tris-EDTA buffer; after washing, the E. coli cells are stored or lysed for downstream processing. Resuspending E. coli after completion of the batch has the distinct advantage of reducing protein degradation by metalloproteases [46].
The start of the feed is determined by measuring the substrate concentration in the fermentation broth, typically after 10-12 h of batch growth. The feeding strategy should be designed so that the growth rate is held at a level that limits the production of toxic formate, acetate and other metabolic compounds, thereby enhancing bacterial growth. Bacterial growth and conversion of feed to biomass are at their maximum when the exponential growth phase is maintained. The value of fed-batch fermentation lies in obtaining high cell density and biomass, which leads to higher product yield. Fed-batch fermentation increases product yield by limiting the growth rate and controlling substrate utilisation. The media for batch and fed-batch fermentation are listed in Table 2.
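The exponential feeding scheme described in this section can be illustrated with a short Python sketch. It assumes the commonly used form in which the substrate mass flow at feed start is (μ_set / Y_X/S + m) multiplied by the biomass present, and then grows as exp(μ_set (t − t_i)); this form, the function names and all numerical values are illustrative assumptions, not the chapter's own equations.

```python
import math


def initial_feed_rate(mu_set, y_xs, m, biomass_g):
    """Substrate mass flow (g/h) at feed start.

    mu_set    : set specific growth rate (1/h)
    y_xs      : biomass yield on substrate (g biomass / g substrate)
    m         : specific maintenance coefficient (g substrate / g biomass / h)
    biomass_g : total biomass in the vessel at feed start (g), i.e. X(t_i) * V(t_i)
    """
    return (mu_set / y_xs + m) * biomass_g


def feed_rate(t, t_i, mu_set, y_xs, m, biomass_g):
    """Substrate mass flow (g/h) at time t >= t_i for exponential feeding."""
    start = initial_feed_rate(mu_set, y_xs, m, biomass_g)
    return start * math.exp(mu_set * (t - t_i))


if __name__ == "__main__":
    # Hypothetical values: 10 g/L biomass in 2 L after a 12 h batch phase.
    mu_set, y_xs, m = 0.15, 0.5, 0.025
    biomass = 10.0 * 2.0
    for hours in (12, 14, 16, 18):
        rate = feed_rate(hours, 12, mu_set, y_xs, m, biomass)
        print(f"t = {hours} h: feed {rate:.2f} g glucose per h")
```

Holding μ_set below the maximum specific growth rate is what keeps overflow metabolites such as acetate in check, so μ_set is the natural factor to vary in a DOE study of the feed phase.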

One factor at a time-classical media optimisation methods
One-factor-at-a-time (OFAT) is the traditional method for optimising media. In this strategy only one factor is varied while all other parameters are kept constant. Its ease and convenience make OFAT the most preferred choice for formulating, designing, optimising and scaling up the fermentation medium [68]. The method is still popular among many research groups for developing fermentation media. Physical parameters, supplementation, removal, replacement and feedback experiments are the primary considerations during OFAT. Physical parameters comprise growth temperature, operating pressure and the size of nutrients or extracts. The fermentation bioprocess maintains a constant supply of nutrients, removes metabolic products and toxic compounds, and constantly distributes the nutrient solids, buffers and salts across the liquid and gaseous phases. Designs for agitation and aeration are constantly being improved; these allow better control over flow dynamics, minimal effect of viscosity and even circulation of components and nutrients. Healthy growth of the culture in the batch is maintained by supplementing nutrients, for example M9 medium with FeSO4·7H2O; M63 medium with KOH; or A medium with MgSO4·7H2O, 20% glucose or sugar, vitamins and casamino acids or L-amino acids [69]. Removal experiments are required to identify the critical components needed in the medium. Certain media affect the reduction of formate and acetate and the reduction of pH. There are a few examples of removing glucose from complex media to prevent inhibition of bacterial growth. Replacement experiments identify the correct nutrient complexes for the nitrogen source: yeast, soy peptone, Bacto tryptone, meat extract and protein powder. The carbon sources used are glucose, glycerol, sucrose, lactose and others. Using OFAT to design fermentation media limits the number of experiments, and the approach is suitable for metabolite production.
In one study, genes encoding the carbohydrate phosphotransferase system (PTS) were deleted, which enriched phosphoenolpyruvate, a precursor feeding the DXP pathway, which is vital for isoprenoid synthesis. Together with the growth medium and culture conditions, this maximised the production of lycopene (a C40 isoprenoid) [70].
Defined media recipes used in the fermentation of E. coli include nine mineral salts, usually salts of ammonium, potassium and sodium cations with carbonate, chloride, nitrate and sulphate anions. Glucose and glycerol, and ammonia, were identified as potential additional sources of carbon and nitrogen, respectively. EDTA is a chelating agent, and the seven trace elements are iron chloride, zinc chloride, cobalt chloride, sodium molybdate, calcium chloride, cuprous chloride and boric acid. The vitamins included in the experimental design solutions were riboflavin, pantothenic acid, nicotinic acid, pyridoxine, biotin and folic acid. In complex media, the yeast extract concentration is varied from 1 to 1.6 g L⁻¹ during preliminary experiments. Screening designs often involve many factors and allow an initial differentiation of significant and non-significant factors and an estimate of the magnitude of the critical factors. A two-level full factorial design with 24 factors would require 2^24, roughly 17 million, experimental treatments. The fractional factorial platform of the Design-Expert or JMP software can instead be used to generate 32 experimental treatments, randomly distributed into eight blocks. Each block comprised eight treatments and provided information on technical error, a positive control and a negative control. These results are calculated using standard algorithms in the software.
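The run counts in the paragraph above can be checked with a few lines of Python. The sketch assumes two-level designs throughout; reading the 32-treatment design as a 2^(24−19) fraction is an arithmetic observation made here, not a statement from the cited study, and the function names are illustrative.

```python
def full_factorial_runs(n_factors, levels=2):
    """Number of runs in a full factorial design: levels ** n_factors."""
    return levels ** n_factors


def fractional_factorial_runs(n_factors, p):
    """Number of runs in a two-level 2^(k-p) fractional factorial design."""
    return 2 ** (n_factors - p)


if __name__ == "__main__":
    print(full_factorial_runs(24))            # 16777216 two-level treatments
    print(fractional_factorial_runs(24, 19))  # 32 treatments
```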
E. coli growth was studied with nine continuous factors. The media ingredients in the first iteration were found to influence optical density over time. For this study, the custom design platform of the Design-Expert software was used to construct a design that balanced the need to maximise the information gathered from the experiment against the need to minimise resources and time. Across a total of 50-60 experiments, the optimal yeast extract concentration was determined to be 10 g L⁻¹ [15].

Statistical designs for E. coli media optimisation
Statistical medium optimisation typically improves overall product output and reduces the time and cost of process development. Microbial processes involve complex reactions. Evaluating results statistically increases their reliability and further reduces the number of experiments. In one study, the GDP mannose pyrophosphate yield was improved by up to 100% after conducting 33 experiments [61]. Improving media by DOE involves understanding the test variables, running multiple investigations and following a uniform pattern. The results of the experiments are used to predict media improvements using mathematical models. Current advances in statistical techniques provide rapid analysis of experimental findings. Meticulously planned experiments using DOE strategies can enhance the desired outcomes. To design a full factorial, all possible combinations of the relevant factors, e.g. temperature, pH, buffers, carbon and nitrogen sources and strain, are considered. Similarly, a partial factorial design is considered if knowledge about a few components is not available. If these experiments are planned and their output is studied properly, they lead to quick, definitive and reproducible processes.

Identification of critical components: Plackett-Burman design
Cultivation at a large scale requires a medium that produces the maximum yield of product per gram of substrate, maximum biomass and minimum undesirable by-products. It should also give consistent results with minimal problems during media preparation and sterilisation. When biomass is considered in isolation, it must be recognised that biomass grown efficiently at an optimised, highly productive growth rate is not necessarily best suited to its ultimate purpose, such as synthesising the desired product. Different combinations and sequences of process conditions need to be investigated to determine the phases and specific sets of conditions during optimisation. OFAT media optimisation, using traditional replacement experiments that vary one factor at a time for nutrient, antifoam, pH and temperature, is highly time-consuming and expensive. A minimum number of experiments and process development in a short time are prerequisites for media optimisation. Alternative strategies that allow more than one variable to be changed at a time must therefore be considered; such methods were described in earlier studies by Plackett and Burman (1946) and Hendrix (1980) (Table 3).
The Plackett-Burman algorithm is a rapid statistical approach that identifies the physicochemical parameters and factors influencing the fermentation process with a limited number of planned experiments [71]. For a given number of observations, the linear effects of all factors are screened with maximum accuracy. The design is practical when a large number of factors must be investigated to produce an optimal or near-optimal response. A statistically optimised media design, together with kinetic models, characterises the fermentation behaviour more rapidly so that maximum productivity can be achieved. Also, when complex carbon-nitrogen substrates such as yeast extract or peptone are used together with carbohydrate substrates, the change in dissolved oxygen (DO) is less pronounced when the carbon source is depleted, as cells continue to utilise the complex substrates [72]. A good, reliable model is essential for developing better strategies to optimise the fermentation process [73]. In one study on the production of succinic acid [71], output was increased by combining Plackett-Burman design (PBD), the steepest ascent method (SA) and Box-Behnken design (BBD) for the fermentation medium. PBD identified glucose, yeast extract and MgCO3 as critical components, with optimal concentrations of 84.6 g L⁻¹ glucose, 14.5 g L⁻¹ yeast extract and 64.7 g L⁻¹ MgCO3 [2]. Also, the productivity was enhanced by 67.3% and 111.1%, respectively. Microbial fermentation for L-methionine (L-Met) production was enhanced by Plackett-Burman (PB) design and Box-Behnken design (BBD), which estimated the optimal medium as glucose 37.43 g/L, yeast extract 0.95 g/L, KH2PO4 1.82 g/L and MgSO4·7H2O 4.51 g/L. The L-Met titre increased to 3.04 g/L from less than 2.0 g/L, an increase of 38.53% and 30.0% compared with those of the basal medium, respectively.
Furthermore, higher L-Met productivity of 0.261 g/L/h was obtained, representing 2.13-fold higher in comparison to the original medium [14].
In another study, yield of O-succinyl-l-homoserine (OSH) was improved through multilevel fermentation optimisation; Plackett-Burman design was used to screen out three factors (glucose, yeast and threonine) from the original 11 factors that improved the titre of OSH.
Plackett-Burman randomisation is an excellent tool for determining the effect of variables during optimisation. One such approach, for the preparation of bacterial ghosts (BGs), has been established using these methods. Twelve experiments, each containing either the +1 or the −1 value of every variable in a random arrangement, were conducted simultaneously to obtain the best results and enable the best possible comparison. The BGQ was scored so that 100% quality corresponds to 10, with ten cells each evaluated as either bad or good; this gives a narrower range of differences than using %. Unexpectedly, E. coli, which is more sensitive to the SDS than E. coli BL21 (DE3), gives better results in most of the experiments. Nine of the twelve experiments scored 10, two scored 8, and only one scored 0, which means inferior preparation. The experimental design is based mainly on determining the minimum inhibition concentration (MIC) and the minimum growth concentration (MGC) as critical concentrations of the chemical compounds able to convert viable cells to BGs. The mean of the +1 experiments was calculated as $\sum(+1)/n(+1)$, and the mean of the −1 experiments as $\sum(-1)/n(-1)$. The main effect of each variable was calculated as: Main effect $= \sum(+1)/n(+1) - \sum(-1)/n(-1)$. Multiple linear regression analysis with an ANOVA test of the Plackett-Burman design was performed with the BGQ as the response, to study the relationship between the different variables and their level of significance. From the analysis, the coefficient, standard error, t statistic, P-value and confidence level (%) were calculated for each variable.
The confidence level was calculated as: Confidence level (%) $= 100 \times (1 - P\text{-value})$. The P-value from the ANOVA for the BGQ response was used to analyse the relationship between the variables at the 90% or higher confidence level. The model created from the Plackett-Burman experimental design using multiple regression analysis is the first-order model $Y = \beta_0 + \sum_i \beta_i X_i$, where $Y$ is the predicted response, $\beta_0$ is the model intercept and $\beta_i$ are the linear coefficients of the variables. An ANOVA test was generated for each response to determine the relationship between the variables at the 90% or higher confidence level [74]. This approach was applied to optimise the production of the chimeric protein PfMSP3-MSP1 19; the critical concentrations calculated are listed in Table 4.
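The main-effect and confidence-level formulas above can be made concrete with a small Python sketch. It builds the standard 12-run Plackett-Burman design for up to 11 two-level factors from the usual cyclic generator row and then computes main effects as the difference between the mean response at +1 and at −1. The response values are synthetic and the function names are illustrative; this is not the analysis code of the cited study.

```python
# Standard generator row for the 12-run Plackett-Burman design.
PB12_GENERATOR = [1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1]


def plackett_burman_12():
    """Return the 12 x 11 design: 11 cyclic shifts plus one all -1 run."""
    n = len(PB12_GENERATOR)
    rows = [[PB12_GENERATOR[(j - i) % n] for j in range(n)] for i in range(n)]
    rows.append([-1] * n)
    return rows


def main_effects(design, responses):
    """Main effect per factor: mean(y at +1) - mean(y at -1)."""
    effects = []
    for col in range(len(design[0])):
        high = [y for row, y in zip(design, responses) if row[col] == 1]
        low = [y for row, y in zip(design, responses) if row[col] == -1]
        effects.append(sum(high) / len(high) - sum(low) / len(low))
    return effects


def confidence_level(p_value):
    """Confidence level in percent, 100 * (1 - P-value)."""
    return 100.0 * (1.0 - p_value)


if __name__ == "__main__":
    design = plackett_burman_12()
    # Synthetic response: only factors 0 and 4 matter (hypothetical data).
    y = [5 + 2 * row[0] - 3 * row[4] for row in design]
    for index, effect in enumerate(main_effects(design, y)):
        print(f"factor {index}: main effect {effect:+.2f}")
    print(confidence_level(0.04))
```

Because the columns are balanced and mutually orthogonal, the effect of each factor is estimated independently of the others, which is why twelve runs are enough to screen eleven factors for linear effects.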

Optimisation of fermentation conditions: Box-Behnken response surface methodology
George E. P. Box and Donald Behnken developed the Box-Behnken response surface method in 1960. It is commonly compared with central composite, three-level full factorial and Doehlert designs for optimising fermentation conditions. In one example of such optimisation, the titre of O-succinyl-L-homoserine (OSH) reached 102.5 g L⁻¹, which is 5.6 times higher than before (15.6 g L⁻¹) [5]. Similarly, with a combination of Plackett-Burman and Box-Behnken designs, optimised further by response surface methodology, O-acetylhomoserine (OAH) production reached up to 9.42 and 7.01 g/L. The effects of glycerol, ammonium chloride and yeast extract were screened for the fermentation conditions [3].
Production of the scFv anti-HIV-1 p17 protein was optimised by the sequential simplex method. Plackett-Burman design (PBD) and sequential simplex were combined to improve the feed medium for enhanced cell biomass, protein-to-biomass ratio and scFv anti-p17 activity, which were 4.43, 1.48 and 6.5 times higher than in batch cultivation, respectively [29].
Production of the therapeutic DNA vaccine pcDNA-CCOL2A1 in E. coli DH5alpha was increased using the response surface method (RSM): the yield rose markedly from 223.37 mg/L to 339.32 mg/L under optimal conditions, a 51.9% increase compared with the original medium [21]. A statistical experimental design methodology for fermentation conditions (dissolved oxygen, IPTG and temperature) improved rPDT production by E. coli. A 15-run Box-Behnken design augmented with centre points revealed that IPTG and DO at the centre point together with low temperature would result in high yield. The optimal conditions for rPDT production were found to be 100 mM IPTG, 30% DO and 20°C [23]. In another application, on-chip bacterial culture conditions for E. coli drug susceptibility testing were optimised using Box-Behnken response surface methodology: optimal growth parameters were determined within 6-8 h and the MICs of individual drugs (antibiotics and TCMs) within 2-6 h, improving the clinical management of bacterial infection [75].

Functional characteristics with minimum experiments-Taguchi design
There are several challenges associated with PBD and Box-Behnken designs. To overcome them, a new design based on orthogonal arrays (OAs) was developed. With this method, fewer experiments are run than in a full factorial. The technique provides control over three stages: system strategy, tolerance design and parameter design. The strategy design helps in determining the tolerance and the factors affecting product output. Taguchi design uses a number of OAs to set up the experiments, chosen to suit the number of experimental iterations. The second step is to conduct all the tests defined by the orthogonal array. The number of trial experiments is decided by the Taguchi design, and the experiments are then randomised to determine the output. The design analyses main effects and two-factor interactions. Noise from uncontrolled experimental variables is taken into account and is the focal point of the two-point analysis. Taguchi methodology removes the effect of noise due to uncontrolled variables, which is an advantage over PBD [76]. The Taguchi method helps to define functional characteristics and capture acceptable deviations. In one study, human insulin-like growth factor I (hIGF-I) was produced in E. coli using 32y media, 32°C and 0.05 mM IPTG. The unoptimised hIGF-I yield was 0.694 g L⁻¹, which improved to 1.26 g L⁻¹ under the optimum conditions [77].
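The orthogonal-array idea can be illustrated with the standard L9 array, which covers up to four three-level factors in nine runs instead of the 81 runs of a full factorial. The Python sketch below checks the orthogonality property and computes the larger-the-better signal-to-noise ratio that is conventionally used in Taguchi analysis; the choice of L9, the S/N form and the replicate yields are assumptions for illustration and are not taken from the cited hIGF-I study.

```python
import math
from itertools import combinations

# Standard L9 orthogonal array: 9 runs, up to 4 factors, levels 1-3.
L9 = [
    [1, 1, 1, 1],
    [1, 2, 2, 2],
    [1, 3, 3, 3],
    [2, 1, 2, 3],
    [2, 2, 3, 1],
    [2, 3, 1, 2],
    [3, 1, 3, 2],
    [3, 2, 1, 3],
    [3, 3, 2, 1],
]


def is_orthogonal(array, levels=3):
    """True if every pair of columns contains each level pair equally often."""
    for a, b in combinations(range(len(array[0])), 2):
        pairs = [(row[a], row[b]) for row in array]
        counts = {p: pairs.count(p) for p in set(pairs)}
        if len(counts) != levels ** 2 or len(set(counts.values())) != 1:
            return False
    return True


def sn_larger_is_better(replicates):
    """Taguchi S/N ratio (dB) when a larger response is better."""
    mean_inverse_square = sum(1.0 / (y * y) for y in replicates) / len(replicates)
    return -10.0 * math.log10(mean_inverse_square)


def level_means(array, factor, values):
    """Average of `values` over the runs at each level of one factor."""
    means = {}
    for level in sorted({row[factor] for row in array}):
        picked = [v for row, v in zip(array, values) if row[factor] == level]
        means[level] = sum(picked) / len(picked)
    return means


if __name__ == "__main__":
    print(is_orthogonal(L9))
    # Hypothetical duplicate yields (g/L) for the nine runs.
    yields = [[0.7, 0.8], [0.9, 1.0], [1.1, 1.0], [0.8, 0.8], [1.2, 1.3],
              [0.9, 0.9], [0.6, 0.7], [1.0, 1.1], [0.9, 1.0]]
    sn = [sn_larger_is_better(r) for r in yields]
    print(level_means(L9, 0, sn))
```

The level with the highest mean S/N ratio for each factor is usually taken as the preferred setting, since it combines a high response with low sensitivity to noise.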

Deciphering outcome with-central composite design
This design is widely used to build second-order (quadratic) models in response surface methodology (RSM). It consists of a factorial design with two levels (+1 and −1); centre points, i.e. experimental runs with every factor at its median value; and star points, which are identical to the centre points except for one factor that takes values above and below the median. The number of star points is double the number of factors used. CCDs are classified by the levels of the factors as face-centred CCD (CCF), inscribed CCD (CCI) and circumscribed CCD (CCC). CCD was used to find the optimum design for extracellular production of recombinant human epidermal growth factor (rhEGF) in E. coli BL21(DE3). This resulted in an rhEGF concentration of 122.40 μg mL⁻¹ in the medium 20 h after induction. In 2 L fermentation, the optimised medium raised the yield 1.5-fold and brought the induction time to 3 h [78].
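The three building blocks of a central composite design (factorial, star and centre points) can be generated in coded units with a short Python sketch. The function name, the default rotatable star distance and the number of centre points are illustrative choices; setting alpha to 1 gives a face-centred design.

```python
from itertools import product


def central_composite(k, alpha=None, n_center=1):
    """Coded design points of a central composite design for k factors.

    alpha    : distance of the star points from the centre; the default
               (2**k) ** 0.25 gives a rotatable circumscribed design,
               alpha = 1.0 gives a face-centred design.
    n_center : number of replicated centre points.
    """
    if alpha is None:
        alpha = (2 ** k) ** 0.25
    factorial = [list(point) for point in product((-1.0, 1.0), repeat=k)]
    star = []
    for factor in range(k):
        for value in (-alpha, alpha):
            point = [0.0] * k
            point[factor] = value  # only one factor leaves the centre
            star.append(point)
    center = [[0.0] * k for _ in range(n_center)]
    return factorial + star + center


if __name__ == "__main__":
    design = central_composite(3, n_center=6)
    print(len(design))  # 8 factorial + 6 star + 6 centre runs
    for run in design:
        print(run)
```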

Predicting effects of responses-partial least squares modelling
The effect of media ingredients and their interactions with the system (X) on the culture response ΔOD600 (Y) is described by the partial least squares (PLS) model. PLS models the covariance between the design matrix and the outcome through a small number of underlying factors that are not measured directly. The number of latent variables (LVs) used must be chosen carefully to avoid overfitting the training data. For DoE iterations with multiple candidate values, prediction accuracy is assessed with the root mean predicted residual error sum of squares (root mean PRESS), and the smallest number of LVs giving the lowest error is selected. The significance of the LVs is calculated with the van der Voet T² test. For the score of each media design and component determined by OD600, a threshold of 0.8 is accepted. Components scoring below the threshold are removed from subsequent designs; these threshold values are used to study the positive and negative contributions of the various factors to the growth of the culture. In one study, 2D spectrofluorometry was used to monitor the fermentation process online during production of extracellular 5-aminolevulinic acid (ALA). The chemometric methods used to analyse the spectral data were principal component analysis (PCA), partial least squares regression (PLS) and principal component regression (PCR). The PCA results were visualised and considered for online fermentation monitoring. When PCR and PLS were compared for the correlation between the 2D fluorescence spectra, PLS showed slightly better calibration and prediction performance than PCR [79].

Minimal product trial experiments-definitive screening designs
Definitive screening designs (DSDs) require a low number of experiments and trials to determine a positive outcome. Jones and Nachtsheim developed the DSD methodology. These designs are popular in biopharma because of the relatively small number of experiments. They use three levels for each factor, which allows nonlinear effects to be estimated. For X factors, the number of runs is 2X + 1 when X is even and 2X + 3 when X is odd. Typically, the design for X = 6 is used if X < 6. A few dummy runs, in the form of additional factorial or centre points, are added to the initial design to determine the experimental error precisely. One example of DSD is upstream process development for cell growth and increased product output in fed-batch high-cell-density fermentation. The expression of the desired gene cloned in the plasmid was under the control of the phoA promoter [13]. Simultaneous evaluation of phosphate concentrations from 2.79 mM to 86.4 mM was designed using DoE. Several parameters (phosphate content, temperature, pH and DO) were evaluated using a definitive screening design, which determined the impact of each parameter on product formation. Similarly, a 24-bioreactor ambr250™ fermentation system used a 10-factor DSD to characterise the process, demonstrating a reproducible 16-batch workflow for recombinant protein production. This strategy was further evaluated by QbD approaches to assess techniques for late-stage characterisation in small experiments, subsequently leading to improved large-scale fermentation parameters [80].
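The run-count rule for definitive screening designs stated above is easy to tabulate. The following Python sketch simply encodes 2X + 1 runs for an even number of factors and 2X + 3 for an odd number; the function name is illustrative and any extra centre or dummy runs are left out.

```python
def dsd_runs(n_factors):
    """Minimum number of runs of a definitive screening design.

    Even factor counts need 2X + 1 runs; odd counts need 2X + 3 runs,
    which is the same as the design for the next even number of factors.
    """
    if n_factors % 2 == 0:
        return 2 * n_factors + 1
    return 2 * n_factors + 3


if __name__ == "__main__":
    for x in (4, 5, 6, 10):
        print(x, "factors:", dsd_runs(x), "runs")
```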

Stepwise regression and artificial neural network modelling
Artificial neural networks (ANNs) are known for their parallel and continuous learning capabilities and for their ability to represent nonlinear functions. They are used to predict steady-state and dynamic processes. One variant, the multi-layer perceptron (MLP), is well known; it places hidden layers between the input and output layers. Using this method, dissolved oxygen (DO), feeding (F), biomass, glucose, acetate and the production of γ-interferon were simulated. Several DoE iterations were modelled using stepwise regression; these models were fitted by linear regression, with six terms allowed per model under a heredity restriction. The goodness of fit of the resulting model was evaluated using the corrected Akaike information criterion (AICc). An ANN was then used to create a weighted ensemble of the regression models. The ANN had three nodes in a single hidden layer and used sigmoid activation functions. For cross-validation, 19 of the media formulations defined in the second DOE iteration were randomly selected and withheld from the ANN training set for validation studies [59].
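The network architecture described above (a single hidden layer of three sigmoid nodes) can be sketched as a forward pass in plain Python. The weights, the inputs and the linear output node are hypothetical; in practice the weights would be fitted to the training formulations, and this sketch does not reproduce the model of the cited study.

```python
import math


def sigmoid(z):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def mlp_predict(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a one-hidden-layer perceptron with a linear output.

    x        : list of input values (e.g. predictions of regression models)
    w_hidden : one weight list per hidden node, each as long as x
    b_hidden : one bias per hidden node
    w_out    : one output weight per hidden node
    b_out    : output bias
    """
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
        for weights, bias in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


if __name__ == "__main__":
    # Hypothetical weights for two inputs and three hidden nodes.
    w_hidden = [[0.8, -0.4], [0.1, 0.9], [-0.5, 0.3]]
    b_hidden = [0.0, -0.2, 0.1]
    w_out = [1.2, 0.7, -0.6]
    print(mlp_predict([0.5, 1.5], w_hidden, b_hidden, w_out, 0.05))
```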

Evaluation of production and process-response surface methodology (RSM)
RSM is simple, robust and efficient, in predicting processes of metabolite or product production. Also, this method helps in the determination of factors for specifications, changes in levels of the elements, response with specified levels, quantitative understanding of system behaviour, predict product properties, factor combinations not run and stability of the designed process.
RSM methodology consists of different phases. Typically, performed in three steps, First is the screening factors by steepest ascent/descent, secondly by quadratic regression model fitting, third optimization using canonical regression analysis.
For a cost-effective and robust process, improvising parameters related to medium, productivity, safety and usefulness are desired outputs. The interdependency of factors associated with productivity is difficult to understand, and this slows downs the enhancement process and yield evaluation.
Response surface methodology (RSM) is based on factorial designs and is used to improve the process and the final product yield. RSM is considered a sturdy, robust and efficient mathematical approach. It includes experimental statistical methods and multiple regression design and analysis, resulting in the best strategies guided by constrained equations. RSM is typically applied to study the response to different media ingredients [21]. One example from E. coli fermentation is the standardisation of production of human interferon-gamma (hIFN-γ). The response is commonly described by a second-order polynomial model:
\[ Y = \beta_0 + \sum_{i} \beta_i X_i + \sum_{i} \beta_{ii} X_i^2 + \sum_{i<j} \beta_{ij} X_i X_j + \varepsilon \]
where Y is the predicted response, β0 is the constant, βi the linear coefficient, βii the quadratic coefficient and βij the cross-product coefficient. Xi and Xj are levels of the independent variables, while ε is the residual error. RSM predicted optimal conditions of 7.81 g L⁻¹ glucose, a fermentation temperature of 30°C and induction at OD600 1.66; combined with BBD, this gave 95.50% acetate and 97.96% productivity of rhIFN-β [81].
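To make the second-order model concrete, the sketch below fits the constant, linear, quadratic and cross-product coefficients for two coded factors by ordinary least squares and then locates the stationary point of the fitted surface. The 3 × 3 factorial design, the "true" coefficients and the noise-free response are made-up values chosen for illustration; they are not data from [81].

```python
def quad_terms(x1, x2):
    # Model terms: constant, linear, quadratic and cross-product
    return [1.0, x1, x2, x1 * x1, x2 * x2, x1 * x2]

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting for numerical stability
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_quadratic(points, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    X = [quad_terms(x1, x2) for x1, x2 in points]
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

def stationary_point(b):
    """Set the gradient of the fitted surface to zero and solve for (x1, x2)."""
    _, b1, b2, b11, b22, b12 = b
    return solve([[2 * b11, b12], [b12, 2 * b22]], [-b1, -b2])

# Hypothetical 3 x 3 factorial design in coded units (-1, 0, +1)
levels = (-1.0, 0.0, 1.0)
design = [(x1, x2) for x1 in levels for x2 in levels]

# Made-up coefficients: b0, b1, b2, b11, b22, b12
true_b = [10.0, 1.0, 1.0, -2.0, -1.0, 0.5]
response = [sum(c * t for c, t in zip(true_b, quad_terms(x1, x2)))
            for x1, x2 in design]

coef = fit_quadratic(design, response)
x_opt = stationary_point(coef)
print(coef, x_opt)
```

Because both quadratic coefficients are negative and the cross-product term is small, the stationary point of this toy surface is a maximum inside the design region. With real, noisy data the nature of the stationary point would need to be checked, for example by canonical analysis.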
The Plackett–Burman design and response surface methodology are used together to increase the production of the desired product several-fold. This combination is commonly employed to enhance the product outcome of microbial processes, including batch and fed-batch fermentation. RSM is widely used with Plackett–Burman, CCD and Box–Behnken designs. Despite much success, RSM has several limitations. Predicting responses from a second-order polynomial equation [82] can give a poor estimate of the optimum, leading to low yield. It is also difficult to build a model for many physical and chemical input variables, because biochemical networks interact nonlinearly and are only partially understood [83]. A further limitation is the study of multiple interactions and significant variation, which results in error, bias or a lack of reproducibility. These challenges are better handled using artificial neural networks (ANNs) [84].

Study of interaction of pathways and multiple parameters-artificial neural network
An artificial neural network (ANN) is a computing system, within the field of artificial intelligence (AI), designed to process information in a manner similar to the human brain. It solves problems that are impossible or difficult by human or statistical standards. Its processing units have inputs and outputs; from the inputs, the ANN produces the desired or defined output. ANNs are built from artificial neurons that are interconnected like the web of neurons in the human brain. There may be hundreds or thousands of artificial neurons, or processing units, interconnected by nodes [85].
Like humans, ANNs follow a set of rules for learning. One such rule is backpropagation, an abbreviation for backward propagation of error, which is used to refine the output. A typical ANN workflow begins with a training phase in which the network learns to recognise patterns in data, whether visual, aural or textual. In supervised learning, the actual output is compared with the desired output and the difference is reduced using backpropagation: the error is propagated backwards through the network and the weights are adjusted until the difference between the actual and expected output reaches the minimum possible error. ANNs are highly suitable for the design of media or metabolic processes, which generate large amounts of data. The architecture of an ANN consists of three layers: an "input" layer connected to "hidden" units, which are in turn connected to the "output" layer. ANNs learn in three ways: supervised, unsupervised and reinforcement learning. In supervised learning, the network is given input training data together with the corresponding experimental output. In unsupervised learning, the output units are trained to respond to clusters or patterns present in the input data. Reinforcement learning is an intermediate form in which the actions of the learning system are rated good or bad according to the response of the environment, and the parameters are adjusted until an equilibrium state is attained. These systems are applied to system design, modelling and optimisation. Through training, they can handle noisy signals and generalise. ANNs are employed in various fermentation processes to optimise nutrients and predict biomass under different culture conditions. ANNs also have limitations: they need proper training, and the quality of the output depends on the input data [86]. To overcome some of these challenges, an ANN combined with a genetic algorithm (GA) has been applied to improve the concentration and shelf life of aspartate-β-semialdehyde dehydrogenase protein [87].
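The supervised training loop with backpropagation can be sketched in a few lines of plain Python. The example below is a minimal illustration only: the network has one hidden layer of three sigmoid units, and the training data (two coded culture factors mapped to a scaled yield between 0 and 1), the learning rate and the number of epochs are all made-up assumptions rather than values from the cited studies.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_mlp(data, hidden=3, lr=0.5, epochs=2000, seed=0):
    """Train a one-hidden-layer network by stochastic gradient descent.

    data -- list of (inputs, target) pairs with targets scaled to [0, 1]
    Returns a prediction function and the mean loss per epoch.
    """
    rng = random.Random(seed)
    n_in = len(data[0][0])
    w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        h = [sigmoid(sum(w * xi for w, xi in zip(w1[j], x)) + b1[j])
             for j in range(hidden)]
        out = sigmoid(sum(w2[j] * h[j] for j in range(hidden)) + b2)
        return h, out

    history = []
    for _ in range(epochs):
        total = 0.0
        for x, target in data:
            h, out = forward(x)
            err = out - target              # actual minus desired output
            total += 0.5 * err * err
            # Backward pass: error signal at the output unit
            d_out = err * out * (1.0 - out)
            for j in range(hidden):
                # Error signal at hidden unit j (uses w2[j] before its update)
                d_h = d_out * w2[j] * h[j] * (1.0 - h[j])
                w2[j] -= lr * d_out * h[j]
                for i in range(n_in):
                    w1[j][i] -= lr * d_h * x[i]
                b1[j] -= lr * d_h
            b2 -= lr * d_out
        history.append(total / len(data))
    return (lambda x: forward(x)[1]), history

# Hypothetical training set: (coded factor 1, coded factor 2) -> scaled yield
training = [([0.0, 0.0], 0.1), ([0.0, 1.0], 0.3), ([1.0, 0.0], 0.6),
            ([1.0, 1.0], 0.9), ([0.5, 0.5], 0.5)]
predict, history = train_mlp(training)
print(history[0], history[-1], predict([1.0, 1.0]))
```

In practice, part of the data would be withheld from training and used only for validation, since a network this flexible can fit a handful of points without generalising.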

Study of biological process of evolution-genetic algorithm (GA)
The genetic algorithm (GA), developed in 1975 by Holland and Long, is based on Charles Darwin's theory of natural selection. A GA models biological evolution by applying crossover, recombination and mutation in adaptive and artificial systems, and works as a problem-solving strategy built on these essential genetic operators. Several GAs have been designed to deal with complex problems and parallelism for functions that are stationary or non-stationary, linear or nonlinear, continuous or discontinuous, or subject to random noise. For example, the yield of recombinant G-CSF was improved in auto-induction medium: the backpropagation (BP) algorithm and the radial basis function (RBF) algorithm were each combined with a GA. The yields of the two models were 72.24% and 76.09%, respectively, higher than those obtained using non-optimised auto-induction media.
Genetic algorithms also have some disadvantages. The formulation of the fitness function, the population size, the choice of factors for mutation and crossover, and the selection criteria for these factors need to be chosen carefully. Despite these drawbacks, the GA is one of the most widely used algorithms in modern nonlinear optimisation [88].
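The design choices listed above (fitness function, population size, crossover and mutation settings, selection criterion) appear explicitly in even a very small GA. The following Python sketch is one possible real-coded variant with tournament selection, arithmetic crossover, Gaussian mutation and elitism; the fitness function `toy_yield` and every parameter value are hypothetical and would be replaced by a measured or modelled response in a real study.

```python
import random

def genetic_maximise(fitness, bounds, pop_size=30, generations=60,
                     crossover_rate=0.8, mutation_rate=0.2, seed=1):
    """Maximise `fitness` over box-bounded real variables with a simple GA."""
    rng = random.Random(seed)

    def tournament(pop, scores):
        # Selection: the fitter of two randomly drawn individuals wins
        a, b = rng.randrange(len(pop)), rng.randrange(len(pop))
        return pop[a] if scores[a] >= scores[b] else pop[b]

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    best_history = []
    for _ in range(generations):
        scores = [fitness(ind) for ind in pop]
        elite = pop[max(range(pop_size), key=lambda i: scores[i])]
        best_history.append(max(scores))
        children = [elite[:]]  # elitism: the best individual always survives
        while len(children) < pop_size:
            p1, p2 = tournament(pop, scores), tournament(pop, scores)
            if rng.random() < crossover_rate:
                w = rng.random()  # arithmetic crossover between two parents
                child = [w * a + (1 - w) * b for a, b in zip(p1, p2)]
            else:
                child = p1[:]
            for g, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    # Gaussian mutation, clipped to the allowed range
                    step = rng.gauss(0.0, 0.1 * (hi - lo))
                    child[g] = min(hi, max(lo, child[g] + step))
            children.append(child)
        pop = children
    scores = [fitness(ind) for ind in pop]
    best = pop[max(range(pop_size), key=lambda i: scores[i])]
    return best, best_history + [max(scores)]

def toy_yield(x):
    # Made-up response surface with its maximum (value 0) at (0.3, -0.5)
    return -(x[0] - 0.3) ** 2 - (x[1] + 0.5) ** 2

best, history = genetic_maximise(toy_yield, bounds=[(-1.0, 1.0), (-1.0, 1.0)])
print(best, history[-1])
```

Because of elitism, the best fitness recorded in `history` can never decrease from one generation to the next; without it, crossover and mutation could lose the best solution found so far.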

Geometric function evaluation-Nelder-Mead simplex algorithm
Nelder and Mead published this algorithm in 1965. Its objective is to solve the classical unconstrained optimisation problem of minimising a given nonlinear function; no derivatives are needed, only numerical evaluation of the objective function. The algorithm is geometric: in two dimensions the simplex is a triangle, and in three dimensions it is a tetrahedron determined by four points (vertices) and their interconnecting line segments. The objective function is evaluated at every vertex, and the vertex with the highest value is reflected through the centroid of the opposite face, generating a reflection [89]. The reflection can be followed by an expansion, to take larger steps, or by a contraction, to shrink the simplex when the floor of an optimisation valley is reached. The procedure continues until the termination criteria are met, usually a maximum number of reflections and contractions or a tolerance on the optimisation variables. The algorithm can be implemented in N dimensions, where the simplex is a polytope with N + 1 vertices. The NM method provides significant improvement in the first iterations and improves the outcome. NM has been combined with ANN to optimise the production of metabolites [90].
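The reflection, expansion and contraction moves can be illustrated with a compact Python sketch. This is a simplified variant (it uses only an inside contraction and a shrink step, with the customary coefficients 1, 2 and 0.5), and the objective function `toy_cost`, the starting point and the tolerances are made-up assumptions rather than settings taken from [89] or [90].

```python
def nelder_mead(f, x0, step=0.5, tol=1e-8, max_iter=500):
    """Minimise f without derivatives using a simplex of N + 1 vertices."""
    n = len(x0)
    # Initial simplex: the start point plus one point displaced along each axis
    simplex = [list(x0)] + [
        [x0[j] + (step if j == i else 0.0) for j in range(n)] for i in range(n)
    ]
    for _ in range(max_iter):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        # Termination: tolerance on the optimisation variables
        size = max(abs(v[j] - best[j]) for v in simplex[1:] for j in range(n))
        if size < tol:
            break
        # Centroid of the face opposite the worst vertex
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]

        def along(t):
            # Point on the line through the worst vertex and the centroid
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(1.0)
        if f(reflected) < f(best):
            expanded = along(2.0)          # try a larger step
            simplex[-1] = expanded if f(expanded) < f(reflected) else reflected
        elif f(reflected) < f(simplex[-2]):
            simplex[-1] = reflected        # accept the plain reflection
        else:
            contracted = along(-0.5)       # pull the worst vertex inwards
            if f(contracted) < f(worst):
                simplex[-1] = contracted
            else:
                # Shrink every vertex halfway towards the best one
                simplex = [best] + [
                    [b + 0.5 * (x - b) for b, x in zip(best, v)]
                    for v in simplex[1:]
                ]
    return min(simplex, key=f)

def toy_cost(x):
    # Made-up objective with its minimum at (1, 2)
    return (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2

x_min = nelder_mead(toy_cost, [0.0, 0.0])
print(x_min)
```

To maximise a yield instead of minimising a cost, the same routine can be applied to the negative of the response.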

Problems and bottlenecks in E. coli media optimisation
Medium optimisation involves many experiments irrespective of the medium chosen; it adds labour cost and is open-ended. Many experiments are carried out in shake flasks, yet even after a large amount of data has been generated in this way, the results are often not reproducible in pilot-scale batch fermentation. Shake flasks do not allow precise control of pH, oxygen transfer and evaporation, so shake-flask results may or may not be replicated during fermentation. Expression of soluble proteins may also lead to inclusion body formation. Media optimisation is time-consuming because it requires rigorous experimental planning. Moreover, the media used for recombinant products face challenges from batch-to-batch variability, availability, cost, bulk storage and transport time. Among biotherapeutics, enzymes and probiotics, the cost of media needs to be lowest for probiotics, followed by enzymes and then biotherapeutics. The choice of fermentation mode, batch or fed-batch, depends on the solubility of the protein. E. coli cells are dynamic, and every product requires a different medium from that of an earlier optimised process. Media optimisation must therefore take dynamic internal control mechanisms into account. For metabolite production in engineered bacterial strains, metabolic pathways need to be optimised so that the choice of medium regulates formation of the desired product. The choice of strain for production depends on the toxicity and complexity of the product (for example, disulphide bonds in the sequence) and on AT-rich sequences. In our previous study, production of the Plasmodium falciparum MSP-3 and MSP-1₁₉ fusion protein in E. coli Shuffle 3030H cells was successfully optimised to generate protective antibodies [91]. Improvement in the production of recombinant products is also guided by downstream processing of the protein.
Therefore, a series of experiments designed to ensure correct folding and conformation is most important. Significant amounts of protein can be achieved by adjusting pH, fermentation time, oxygen transfer and fermentation temperature. The inducer and harvest time are also critical for increasing output. The critical factors differ between batch and fed-batch fermentation. Therefore, the choice of medium (defined, semi-defined or complex, with vitamins, minerals and trace elements) needs to be evaluated in DOE experiments. To evaluate the large number of outputs and variable combinations, various algorithms are applied, and advanced algorithms such as artificial neural networks and genetic algorithms help to achieve the desired output efficiently. In line with the United Nations 2030 sustainable development goals (SDGs), innovation is needed to extend the reach of these technologies to low-income countries. The application of DOE can improve yield and cost, leading to improved access to biomolecules, biopharmaceuticals, enzymes and metabolites.

Future strategies
The selection of host cells for industrial application has some technical difficulties, even though many genetic manipulations are theoretically available in various organisms. The availability of a genetic map, a gene exchange system, useful vectors and transformation procedures, and metabolic pathways leading from raw material to the desired product are essential criteria for selecting a suitable host strain. The most popular organisms used to date for the expression of recombinant proteins are E. coli, Bacillus subtilis, Bacillus stearothermophilus, Streptomyces spp., Corynebacterium, Saccharomyces cerevisiae, Pichia pastoris, Hansenula polymorpha and various animal and plant cells. E. coli remains an important host system for industrial protein production from cloned genes, one of the main applications of genetic engineering in biotechnology. Various efficient expression vector systems have been developed, and a variety of mutants are available as host strains for different purposes [92,93]. Overexpression of heterologous proteins is possible in E. coli, making it suitable for industrial production. Fermentation DOE is an essential tool for basic research that greatly facilitates efficient purification and analysis of such proteins [94].
For the successful production of a recombinant protein-based vaccine, producing biologically active protein in a process that can be scaled up is an essential requirement. Production of a biologically active recombinant protein depends on the host cell's microenvironment for expression and on the compatibility of codon usage. E. coli has been widely used as an expression host for the high-level production of heterologous proteins. Differences in codon usage between prokaryotes (E. coli) and eukaryotes, such as Chinese hamster ovary (CHO) cells, can substantially affect heterologous protein production, and compatible codon usage can significantly increase protein expression [95,96]. Moreover, the presence of rare codons in cloned genes affects the protein expression level and the stability of mRNA and plasmid. An excess of rare codons may result in ribosome stalling, slow translation and translation errors [96,97]. In some cases, rare codons inhibit protein synthesis and cell growth [98]. Earlier studies of codon usage patterns in E. coli established that a clear codon bias exists in the mRNA, and the level of each cognate tRNA seems to be directly proportional to the codon frequency [99,100]. The most widely used strategy is to change rare codons in the target gene to codons favoured by E. coli without affecting the encoded amino acid sequence [101,102]. The second approach is to expand the intracellular tRNA pool by introducing a plasmid encoding additional copies of tRNAs for codons rarely used in E. coli [103]. The co-presence of the RIG plasmid, which encodes three tRNAs (for AG(A/G), ATA and GGA), in the host cells significantly increases the expression level of dihydropteroate synthase, aldolase, phosphatase and orotidine-5′-monophosphate decarboxylase of P. falciparum [104][105][106]. Codon optimisation, in which codons are changed to those favoured by the host cell, is beneficial for maximum expression of foreign proteins and crucial for large-scale protein production [107].
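The first strategy, replacing rare codons with synonymous codons favoured by E. coli, can be sketched as a simple lookup over the reading frame. The mapping below covers only the rare codons named in the text (AGA, AGG, ATA and GGA); the replacement codons are an illustrative assumption based on codons commonly used in E. coli, and the input sequence is made up. A real optimisation would use a full codon usage table and would also consider mRNA structure and other sequence features.

```python
# Illustrative subset only: rare codon -> synonymous codon commonly used in
# E. coli (assumed mapping; a complete codon usage table is needed in practice)
RARE_TO_PREFERRED = {
    "AGA": "CGT",  # Arg
    "AGG": "CGT",  # Arg
    "ATA": "ATT",  # Ile
    "GGA": "GGT",  # Gly
}

def replace_rare_codons(cds):
    """Swap listed rare codons for synonymous ones, keeping the protein unchanged.

    cds -- coding sequence (DNA) whose length is a multiple of three
    Returns the edited sequence and the number of codons changed.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    swapped = [RARE_TO_PREFERRED.get(codon, codon) for codon in codons]
    n_changed = sum(a != b for a, b in zip(codons, swapped))
    return "".join(swapped), n_changed

# Made-up example: Met-Arg-Ile-Gly-stop
optimised, n_changed = replace_rare_codons("ATGAGAATAGGATAA")
print(optimised, n_changed)
```

Only codons in the lookup table are touched, so the start codon, the stop codon and all other codons are left as they are.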
A recombinant plasmid carrying a cloned gene behaves differently from the original vector plasmid. This is easily understood, as the plasmid is maintained in a delicate quasi-equilibrium state in the host cell. There are several reasons for the instability of recombinants. The higher the plasmid gene expression, the more segregants (plasmid-free cells) tend to appear. The recombinant plasmid is relatively unstable when the cloned gene products are inhibitory to the host cells. Phenotypic instability of a plasmid is due to the disappearance of the entire plasmid or the deletion of a specific region [108]. Both plasmid copy number and plasmid loss rate are affected by factors such as media composition, growth rate and culture strategy [109], as well as temperature, agitation rate and pH [110].
Therefore, future strategies for optimising cultivation need to shift towards conclusions drawn during the experimental phases before actual fermentation, in order to identify the role of batch or fed-batch mode and of different media components. The utilisation of carbon, nitrogen and other minimally required nutrients during batch and fed-batch cultivation is critical for delivering output and achieving the sustainable development goals (SDGs) for technological innovation. Method design and modelling approaches are future strategies for increasing output during process development. One-factor-at-a-time experiments and statistical media optimisation can be improved by combining several approaches, such as Plackett–Burman, Box–Behnken, Taguchi and central composite designs and partial least squares modelling, to determine optimal factors. Response surface methodology with artificial neural networks (ANNs) can be applied to fermentation processes that are difficult to model. A free artificial neural network is applicable for fitting nonlinear regression models to optimise metabolic processes. These algorithms are combined and applied to increase productivity and optimise product output while reducing the cost of fermentation and product development.

Conclusions
Optimisation of critical factors and nutrient sources is an essential step for metabolites and recombinant proteins before pilot fermentation. In this chapter, conventional and advanced strategies for process design are reviewed in detail. DOE approaches with statistical evaluation are critical for process development and essential for saving experimentation time. The strategies and examples shared in this review have been analysed for ease of implementation and time consumption. The conditions and media designed need to be tested further under realistic conditions, in a full-scale process, with replication in the production setup.
Overall, this chapter detailed the need to identify critical factors and their significant contribution to enhancing metabolite production. Recently, co-fermentation of glycerol and glucose in engineered E. coli increased the production of 1,3-propanediol [1]. Similarly, O-acetylhomoserine production was increased by suitable fermentation designs and modification of the glycerol-oxidative pathway [3]. For recombinant protein production, one study used response surface methodology for the production of reteplase and improved the yield to 188 mg L⁻¹ of fermentation [15]. The approaches discussed in this chapter have several advantages for improving yield and reducing resource utilisation. They are efficient means of making biotechnologically produced products accessible to a larger population across the globe.