The *M1* matrix of GSA.

## 1. Introduction

The field of computational biologyhas grown significantly in recent years, allowing researchers to investigatecomplex biological systems *in silico.* In this chapter, a new method of combining cellular automata (CA) with global sensitivity analysis (GSA) is introduced. This method has the potential to determine which mechanisms of a regulated biological network contribute to characteristics such as stability and responsiveness. For dynamic models of biochemical reactions and networks, determining the correct values ofthe kineticparameters that govern the system is often problematic[12, 13, 19, 29, 30, 33, 36, 46, 47, 49, 50, 55, 57]. This isoften due to the difficulty ofobtaining accurate experimental measurements ofvital kineticconstants[6, 10, 17, 21, 32, 37]. This is currently the case for biochemical reactions occurring in complex environments *in vivo* that cannot be approximated through more simple experiments*in vitro*.Complex gene regulatory networks are a prime example.Modelling these systems using aCA approach allows researchers to easily change kineticparameter values (individuallyand in combination) to study their effects on system function[4, 7, 8, 25, 38, 51]. Since the overall system function (i.e., the output molecule or action) is something that can be measured easily, CA modelling provides a method for determining “difficult-to-measure” parameters using “easy-to-measure” observations of the system being studied.However, the real challenge after conducting a CA study with combinatorial parameter variation is interpreting the results. It has been found that GSAis an extremely useful tool for identifying the parameters that most significantly affect overall model performance[9, 11, 16, 24, 28, 31, 44, 45, 56]. In this chapter,a new approach that combinesCA and GSA is applied for analysing regulatory mechanisms of the L-arabinose (*ara*) operon. In particular, the influence of the negative autoregulatory (NAR) action of the transcription factor AraC on system dynamics and stability is calculated. The purpose of this chapter is to provide instruction through a detailed example of how to apply CA and GSA simultaneously to analysea biological network that must be studied *in vivo*. An in-depth explanation of GSA and the regulatory elements of the *ara* operon are provided in the Introduction. Detailed descriptions of CA model building and system parameters as well as a comprehensive GSA tutorial are presented in the Materials and Methods section.Computational experiments illuminating the influence of the NAR mechanism on the studied regulatory network are presented and discussed throughout the rest of thechapter.

### 1.1. Global sensitivity analysis

GSA uses Monte Carlo simulations to calculate the outputs of a model over the entire range of all input parameters [39, 40]. This variance-based method calculates the contribution of each input parameter to the total variability of the model output. In other words, GSA is used to determine which inputs most significantly influence the output. This provides the investigator one or multiple targets that can be manipulated to effectively engineer the system. GSA differs significantly from the traditionally used method of partial gradient-based sensitivity analysis (SA). With traditional SA, the change in a model output is calculated by allowing only one parameter to vary, while keeping others constant. This variance in the model output is likely to change if all other parameters are held constant at different values. The GSA approach takes this into account and enables the consideration of multiple parameters simultaneously over the entire range of each parameter. To consider the model output variance caused by only a single parameter is a “first-order” analysis. Two parameters may be considered simultaneously to develop a measure of their interactions in a “second-order” analysis. Or, a single parameter can be considered with is interactions with all other parameters in the model. This is called the “total effect index” [39, 40]. Thus, another significant advantage of GSA over SA is that GSA accounts for the influence and interactions between input parameters over the entire input space. A simplified tutorial of GSA has been published for a deterministic ammonia emissions model [35]. The GSA methods are also presented in detail in this chapter for the *ara* operon model system.

### 1.2. The L-arabinose operon

Transcriptional regulation networks are largely made up of recurring regulatory patterns called network motifs. These network motifs have been shown to carry out many signal transduction functions[2, 3, 27, 48]. One of the most abundantly found network motifs is thenegative autoregulation (NAR) motif. In an NAR motif, a transcription factor (TF) negatively regulates the promoter of its own gene or operon. This has been found to

dramatically increase response acceleration and

increase the stability of the gene product concentration response to noise [26, 41].

The L-arabinose system is an example of an NAR network.L-arabinose,which is a five-carbon sugar foundin plant cell walls, is used as a carbon source by many organisms. The *ara* operon contains genes encoding enzymes leading to L-arabinose catabolism. The selective usage of the *ara* operon isone of the best-studied gene regulation systems and is well-characterized[5, 15, 22, 23, 34, 42, 52, 59]. The entire arabinose system consists of the following:

the system specific transcription factor (TF)

*araC*,the arabinose transporters (

*araE, araFGH,*and*araJ*), andthe

*ara*operon containing the arabinose catalytic enzymes,*araBAD*.

Ultimately, the *ara* operon is responsible for the conversion of arabinose to D-xylulose-5-phosphate, which then enters the pentose phosphate pathway. The AraC TF regulates the *ara* operon [1, 14, 18, 20, 26, 52, 54, 59].Recent studies have shown that *araC*and the *ara* operon share a common regulatory protein,cAMP Receptor Protein (CRP),which is activated by cAMP. AraC both activates and represses the *ara* operonusing a DNA looping mechanism. As a negative regulator, AraC isde-activated by L-arabinose, allowing transcription of the *ara* operon. AraC represses its own promoter through a NAR motif[1, 14, 26, 43, 58].

### 1.3. Goals of the Modelling Effort

In this research, the overall influence of NAR on the dynamics of a regulated biological network was studied by applying a unique combination of CA and GSA. Here, the expression of *araBAD* was calculated in the

presence and

absence of NAR by AraC.

The CA approach was used to simulate this network given altered kinetic rate constants and initial concentrations. Then, the GSA approach was applied to determine which of these parameters most directly influence *araBAD* expression. When applied to the

presence and

absence of NAR, the difference in GSA results give clues to the influence of the NAR mechanism in regulating the system dynamics.

In the case studied in this research, NAR was found to equally distribute model sensitivity across all input parameters. This dramatically increases stability and responsivenessof the regulatory network. The approach presented in this chapter of combining CA and GSA can be applied to virtually any biologicalnetwork using the methods presented in this chapter.

## 2. Methods

### 2.1. Model Construction

The *ara* operon model was constructed using NetLogo simulation environment [53]. To perform a CA simulation, individual (agents) (i.e., interacting molecules) were allowed to move among (cells) (i.e., spatial locations) inside the simulation environment and undergobiochemical reactions with other agents in their Von Neumann neighbourhood. In all simulations, a two-dimensional16 x 16 matrix of cells was used as the simulation environment, and 10 time steps were executed. Whether a reaction occurs between interacting agents is governed by probability. The agents of the *ara* operon model are:

L-arabinose,

cAMP,

AraC (the TF regulator),

CRP, and

AraBAD (representing gene products of the

*ara*operon).

Characteristics and reaction rules for individual agents were predefined at model initialization in NetLogo. Basal expression levels for CRP, AraC, and AraBAD were set at 10, 0, and 0 cells respectively.The number of agents occupying cells represents concentration in agent based modelling. For example, CRP was present in 10 cells of the simulation environment upon initializing the simulation. Specifics of the varied model parameters are discussed in detail in the next section. These included

the concentrations of L-arabinose and cAMP,

the probabilities of biochemical reactions, and

NAR by ArgC.

Monte Carlo methods were used to select 2000 values of each parameter to perform CA simulations. This was followed by 1000 independent iterations of the model to perform GSA. The CA simulation records activation of the *ara* operon measured as number of AraBAD agents present in cells at the end of the simulation. Simulations are also often run to record the number of model “events” required to reach a specified concentration of an agent of interest (e.g., AraBAD). Independent simulations use different values of the varied parameters, resulting in different values of the targeted agent. Two common approaches use

a set number of model events to derive a target agent or

a different number of model events required to reach a specified concentration of the target agent.

The first approach was used in this study. Two different scenarios for NAR byAraC regulation were simulated in this research:

AraC is not allowed to negatively autoregulate its own promoter and

NAR by AraC is allowed.

These simulations seek to understand the influence of the NAR mechanism on overall rigidity and robustness of the *ara* operon regulatory network.

### 2.2. System parameters

A mathematical model was created to simulate the dynamics of *ara* operon activation in the presence and absence of NAR by AraC. CA was applied by allowing critical parameters to vary over thousands of simulations of the system. GSA was then applied to the results to determine which system parameters most influence *ara* operon activation.The following parameters were taken into consideration while building and simulating the model. The upper and lower bounds of the parameters and a description of their functions are described in detail below.

Simulation | 1 | 2 | … | 1000 |

Parameter | ||||

conc_cAMP | 105 | 188 | … | 32 |

conc_arabinose | 159 | 179 | … | 244 |

rate_CRP_activate | 0.019 | 0.103 | … | 0.500 |

rate_araC_activate | 0.418 | 0.391 | … | 0.406 |

rate_araBAD_activate | 0.577 | 0.326 | … | 0.445 |

rate_araC_autoreg | 0 | 0 | … | 0 |

Output ( Y 1 ) | 9 | 3 | ... | 296 |

*conc_arabinose*: The initial concentration of L-arabinose was allowed to vary between 1 and 250 cells. Upon binding,L-arabinoseactivates the TF and autoregulatorAraC.Unbound AraC inhibits transcription of the*ara*operon.*conc_cAMP*: Initial concentration of the second messenger cAMP was allowed to vary between 1 to 250cells. cAMP binds to CRP causing its activation. The cAMP-CRP positively regulates transcription of the*ara*operon.*rate_CRP_activate*: This parameter controls the probability of CRP activation by cAMP. This parameter was varied between 0 and 1 for the simulations described in this chapter. This probability parameter represents the rate of activation of CRP.*rate_araC_activate*: This parameter controls the probability of AraC activation by L-arabinose and was allowed to vary between 0 and 1.This probabilityultimately controlsthe activity of the*araC*gene and transcription of the*ara*operon.*rate_araBAD_activate*: This parameter controls the probability of*araBAD*activation by CRP and was allowed tovary between 0 and 1.*rate_araC_autoreg*: This parameter controls the NARbyAraC protein and was allowed tovary between 0 and 1.Thisrepresents the rate at which AraC supresses its promoter.

### 2.3. Global sensitivity analysis

GSA was performed on the *ara* operon activation model described in this chapter. The following step-by-step tutorial is given to combine CA with GSA.

The procedure starts with the derivation of the *M1* and *M2* matrices shown in Tables 1 and 2, respectively. To build each table, 1000 CA simulations were run given random values of the system parameters. For each simulation, the model output (activated *araBAD*or expressed AraBAD) was calculated and recorded. The estimated unconditional means (*N* is the number of simulations (i.e., 1000 for this study).

Next, the *P* matrix was created for the calculation of the first-order sensitivity index for each model parameter. To illustrate this example, the model parameter *conc_cAMP*was used. The *P* matrix for this case is shown in Table 3.

The *P* matrix consists of the *conc_cAMP*parameter values from matrix *M2*, and all other parameters are assigned their values from *M1*. Then model outputs were calculated for the *P* matrix using these new inputs. Thus, 1000 more simulations are required for each parameter a first-order sensitivity index is desired. The first-order sensitivity index (

Next, the total effect index was calculated by creation of the *R* matrix for each parameter. The first-order index describes the influence of a single parameter on the model output directly. The total effect index takes into account all interactions of a parameter with all other parameters when determining the effect on model output.

Simulation | 1 | 2 | … | 1000 |

Parameter | ||||

conc_cAMP | 190 | 189 | … | 227 |

conc_arabinose | 67 | 212 | … | 208 |

rate_CRP_activate | 0.887 | 0.281 | … | 0.671 |

rate_araC_activate | 0.638 | 0.413 | … | 0.693 |

rate_araBAD_activate | 0.530 | 0.620 | … | 0.891 |

rate_araC_autoreg | 0 | 0 | … | 0 |

Output ( Y 2 ) | 608 | 93 | ... | 3043 |

Simulation | 1 | 2 | … | 1000 |

Parameter | ||||

conc_cAMP (M2) | 190 | 189 | … | 227 |

conc_arabinose (M1) | 159 | 179 | … | 244 |

rate_CRP_activate (M1) | 0.019 | 0.103 | … | 0.500 |

rate_araC_activate (M1) | 0.418 | 0.391 | … | 0.406 |

rate_araBAD_activate (M1) | 0.577 | 0.326 | … | 0.445 |

rate_araC_autoreg (M1) | 0 | 0 | … | 0 |

Output ( Y P ) | 12 | 196 | ... | 6 |

The *R* matrix is shown in Table 4 for the *conc_cAMP*parameter example. To build the *R* matrix, the parameter values from *M1* for *conc_cAMP*were used along with parameter values from *M2* for all other parameters. An additional 1000 CA simulations are required to calculate the model outputs for the *R* matrix. The calculation of the total effect index for rcAMP (

## 3. Results

The CA modeling of the *ara* operon was performed for two cases

without NAR (i.e., negative autoregulation) by AraC and

with NAR by AraC (as is observed experimentally).

GSA was applied to both cases in order to determine how the NAR mechanism impacts overall system dynamics. To simulate the model without NAR, the parameter *rate_araC_autoreg* was held constant at 0. The results of the GSA calculations derived from Eqs. 1-6 and the values in Tables 1-4 are given in Table 5. This case was simulated without NAR by AraC.

Simulation | 1 | 2 | … | 1000 |

Parameter | ||||

conc_cAMP (M1) | 105 | 188 | … | 32 |

conc_arabinose (M2) | 67 | 212 | … | 208 |

rate_CRP_activate (M2) | 0.887 | 0.281 | … | 0.671 |

rate_araC_activate (M2) | 0.638 | 0.413 | … | 0.693 |

rate_araBAD_activate (M2) | 0.530 | 0.620 | … | 0.891 |

rate_araC_autoreg (M2) | 0 | 0 | … | 0 |

Output ( Y R ) | 288 | 519 | ... | 324 |

The first-order sensitivity indices for each system parameter for the *ara* operonmodel without NAR by AraCare reported in Fig. 1. These values are reported as a percentage of the summation of all first-order index values. When interactions between single parameters are not taken into consideration, the probability of CRP activation was found to be the single most important parameter significantly influencing the *araBAD*activation(29.58%).All other parameters show similar influence (~17%). The total effect indices for the *ara* operon model without NAR by AraC is shown in Fig. 2. When all the interactions between all parameters were considered, probability of CRP activation (*rate_CRP_activate* parameter) was shown to have most influence (29.47%) over *araBAD*activation. The more noticeable result is the small contribution from the initial concentration of L-arabinose (*conc_arabinose* parameter). This is significant because the *ara* operon is known to require the presence of L-arabinose to be active in the cell.

Calculation | Value |

M1) | 256.78 |

M2) | 228.49 |

M1) | 309411.56 |

M2) | N/A |

4210.45 | |

0.18 | |

83451.88 | |

0.94 |

The first-order indices for each parameter for the *ara* operonmodel with NAR by AraC are reported in Fig. 3. By activating the NAR role of AraC, the first-order indices show very close index values for all parameters (~16.5%). In other words, the NAR reduced the excessive influence of CRP activation over *araBAD*activation. The total effect indices are shown in Fig. 4. A pattern similar to that revealed by first-order indices was obtained. All total effect indices were also similar for all parameters (~16.5%).Adding the NAR by AraCto the regulation network dramatically increased the influence of L-arabinose concentration on *araBAD*activation.

## 4. Discussion

In this study, a unique combination of CA and GSA were used to study the parameters that influence the dynamics of the *ara* operon regulatory network. The results of the GSA study revealed the degree to which individual parameters affect the output of a biological model.GSA was used to explorethe influence of NAR on the regulatory network by calculating the impact of parameter variance on model output.Comparing first-order and total effect sensitivity indices with and without NAR by AraC elucidates the roles NAR plays in the signaling network. These include

increasing network stability and

increasing the response of the network to L-arabinose concentrations.

Equal distribution of variation among all parameters suggests that the NAR mechanism increases network robustness,providing protection against random perturbations (both biological and environmental) of the system.

GSA has shown that parameter sensitivity indices can provide useful insight in interpreting the results of CA simulations.Thus, the combination of CA and GSA provides a valuable tool for the identification of source of output variability. In the case of the *ara*operon, all model parameters showed to contribute equally to the variance of *araBAD*activationlevel. While all the systemparameters are important and can significantly influence *araBAD*activation, those parameters with higher first-order sensitivities can have profound effects on regulation of the *ara* operon if NAR function by AraC is lost. This scenario demonstrates the potential of CA and GSA for identifying targets for manipulating highly interconnected gene regulatory networks.