Using Cellular Automata and Global Sensitivity Analysis to Study the Regulatory Network of the L-Arabinose Operon

The field of computational biologyhas grown significantly in recent years, allowing re‐ searchers to investigatecomplex biological systems in silico. In this chapter, a new method of combining cellular automata (CA) with global sensitivity analysis (GSA) is introduced. This method has the potential to determine which mechanisms of a regulated biological network contribute to characteristics such as stability and responsiveness. For dynamic models of bi‐ ochemical reactions and networks, determining the correct values ofthe kineticparameters that govern the system is often problematic[12, 13, 19, 29, 30, 33, 36, 46, 47, 49, 50, 55, 57]. This isoften due to the difficulty ofobtaining accurate experimental measurements ofvital ki‐ neticconstants[6, 10, 17, 21, 32, 37]. This is currently the case for biochemical reactions occur‐ ring in complex environments in vivo that cannot be approximated through more simple experimentsin vitro.Complex gene regulatory networks are a prime example.Modelling these systems using aCA approach allows researchers to easily change kineticparameter val‐ ues (individuallyand in combination) to study their effects on system function[4, 7, 8, 25, 38, 51]. Since the overall system function (i.e., the output molecule or action) is something that can be measured easily, CA modelling provides a method for determining “difficult-tomeasure” parameters using “easy-to-measure” observations of the system being stud‐ ied.However, the real challenge after conducting a CA study with combinatorial parameter variation is interpreting the results. It has been found that GSAis an extremely useful tool for identifying the parameters that most significantly affect overall model performance[9, 11, 16, 24, 28, 31, 44, 45, 56]. In this chapter,a new approach that combinesCA and GSA is ap‐ plied for analysing regulatory mechanisms of the L-arabinose (ara) operon. In particular, the influence of the negative autoregulatory (NAR) action of the transcription factor AraC on


Introduction
The field of computational biologyhas grown significantly in recent years, allowing researchers to investigatecomplex biological systems in silico.In this chapter, a new method of combining cellular automata (CA) with global sensitivity analysis (GSA) is introduced.This method has the potential to determine which mechanisms of a regulated biological network contribute to characteristics such as stability and responsiveness.For dynamic models of biochemical reactions and networks, determining the correct values ofthe kineticparameters that govern the system is often problematic [12,13,19,29,30,33,36,46,47,49,50,55,57].This isoften due to the difficulty ofobtaining accurate experimental measurements ofvital kineticconstants [6,10,17,21,32,37].This is currently the case for biochemical reactions occurring in complex environments in vivo that cannot be approximated through more simple experimentsin vitro.Complex gene regulatory networks are a prime example.Modelling these systems using aCA approach allows researchers to easily change kineticparameter values (individuallyand in combination) to study their effects on system function [4,7,8,25,38,51].Since the overall system function (i.e., the output molecule or action) is something that can be measured easily, CA modelling provides a method for determining "difficult-tomeasure" parameters using "easy-to-measure" observations of the system being studied.However, the real challenge after conducting a CA study with combinatorial parameter variation is interpreting the results.It has been found that GSAis an extremely useful tool for identifying the parameters that most significantly affect overall model performance [9,11,16,24,28,31,44,45,56].In this chapter,a new approach that combinesCA and GSA is applied for analysing regulatory mechanisms of the L-arabinose (ara) operon.In particular, the influence of the negative autoregulatory (NAR) action of the transcription factor AraC on system dynamics and stability is calculated.The purpose of this chapter is to provide instruction through a detailed example of how to apply CA and GSA simultaneously to analysea biological network that must be studied in vivo.An in-depth explanation of GSA and the regulatory elements of the ara operon are provided in the Introduction.Detailed descriptions of CA model building and system parameters as well as a comprehensive GSA tutorial are presented in the Materials and Methods section.Computational experiments illuminating the influence of the NAR mechanism on the studied regulatory network are presented and discussed throughout the rest of thechapter.

Global sensitivity analysis
GSA uses Monte Carlo simulations to calculate the outputs of a model over the entire range of all input parameters [39,40].This variance-based method calculates the contribution of each input parameter to the total variability of the model output.In other words, GSA is used to determine which inputs most significantly influence the output.This provides the investigator one or multiple targets that can be manipulated to effectively engineer the system.GSA differs significantly from the traditionally used method of partial gradient-based sensitivity analysis (SA).With traditional SA, the change in a model output is calculated by allowing only one parameter to vary, while keeping others constant.This variance in the model output is likely to change if all other parameters are held constant at different values.The GSA approach takes this into account and enables the consideration of multiple parameters simultaneously over the entire range of each parameter.To consider the model output variance caused by only a single parameter is a "first-order" analysis.Two parameters may be considered simultaneously to develop a measure of their interactions in a "second-order" analysis.Or, a single parameter can be considered with is interactions with all other parameters in the model.This is called the "total effect index" [39,40].Thus, another significant advantage of GSA over SA is that GSA accounts for the influence and interactions between input parameters over the entire input space.A simplified tutorial of GSA has been published for a deterministic ammonia emissions model [35].The GSA methods are also presented in detail in this chapter for the ara operon model system.

The L-arabinose operon
Transcriptional regulation networks are largely made up of recurring regulatory patterns called network motifs.These network motifs have been shown to carry out many signal transduction functions [2,3,27,48].One of the most abundantly found network motifs is thenegative autoregulation (NAR) motif.In an NAR motif, a transcription factor (TF) negatively regulates the promoter of its own gene or operon.This has been found to 1. dramatically increase response acceleration and 2. increase the stability of the gene product concentration response to noise [26,41].
The L-arabinose system is an example of an NAR network.L-arabinose,which is a five-carbon sugar foundin plant cell walls, is used as a carbon source by many organisms.The ara operon contains genes encoding enzymes leading to L-arabinose catabolism.The selective usage of the ara operon isone of the best-studied gene regulation systems and is well-characterized [5,15,22,23,34,42,52,59].The entire arabinose system consists of the following: 1. the system specific transcription factor (TF) araC, 2. the arabinose transporters (araE, araFGH, and araJ), and 3. the ara operon containing the arabinose catalytic enzymes, araBAD.
Ultimately, the ara operon is responsible for the conversion of arabinose to D-xylulose-5phosphate, which then enters the pentose phosphate pathway.The AraC TF regulates the ara operon [1,14,18,20,26,52,54,59].Recent studies have shown that araCand the ara operon share a common regulatory protein,cAMP Receptor Protein (CRP),which is activated by cAMP.AraC both activates and represses the ara operonusing a DNA looping mechanism.As a negative regulator, AraC isde-activated by L-arabinose, allowing transcription of the ara operon.AraC represses its own promoter through a NAR motif [1,14,26,43,58].

Goals of the Modelling Effort
In this research, the overall influence of NAR on the dynamics of a regulated biological network was studied by applying a unique combination of CA and GSA.Here, the expression of araBAD was calculated in the 1. presence and

absence of NAR by AraC.
The CA approach was used to simulate this network given altered kinetic rate constants and initial concentrations.Then, the GSA approach was applied to determine which of these parameters most directly influence araBAD expression.When applied to the 1. presence and 2. absence of NAR, the difference in GSA results give clues to the influence of the NAR mechanism in regulating the system dynamics.
In the case studied in this research, NAR was found to equally distribute model sensitivity across all input parameters.This dramatically increases stability and responsivenessof the regulatory network.The approach presented in this chapter of combining CA and GSA can be applied to virtually any biologicalnetwork using the methods presented in this chapter.

Model Construction
The ara operon model was constructed using NetLogo simulation environment [53].To perform a CA simulation, individual (agents) (i.e., interacting molecules) were allowed to move among (cells) (i.e., spatial locations) inside the simulation environment and undergobio-chemical reactions with other agents in their Von Neumann neighbourhood.In all simulations, a two-dimensional16 x 16 matrix of cells was used as the simulation environment, and 10 time steps were executed.Whether a reaction occurs between interacting agents is governed by probability.The agents of the ara operon model are: The first approach was used in this study.Two different scenarios for NAR byAraC regulation were simulated in this research: 1. AraC is not allowed to negatively autoregulate its own promoter and 2. NAR by AraC is allowed.
These simulations seek to understand the influence of the NAR mechanism on overall rigidity and robustness of the ara operon regulatory network.

System parameters
A mathematical model was created to simulate the dynamics of ara operon activation in the presence and absence of NAR by AraC.CA was applied by allowing critical parameters to vary over thousands of simulations of the system.GSA was then applied to the results to determine which system parameters most influence ara operon activation.The following parameters were taken into consideration while building and simulating the model.The upper and lower bounds of the parameters and a description of their functions are described in detail below.

conc_arabinose:
The initial concentration of L-arabinose was allowed to vary between 1 and 250 cells.Upon binding,L-arabinoseactivates the TF and autoregulatorAraC.Unbound AraC inhibits transcription of the ara operon.

conc_cAMP:
Initial concentration of the second messenger cAMP was allowed to vary between 1 to 250cells.cAMP binds to CRP causing its activation.The cAMP-CRP positively regulates transcription of the ara operon.

rate_CRP_activate:
This parameter controls the probability of CRP activation by cAMP.This parameter was varied between 0 and 1 for the simulations described in this chapter.This probability parameter represents the rate of activation of CRP.

rate_araC_activate:
This parameter controls the probability of AraC activation by L-arabinose and was allowed to vary between 0 and 1.This probabilityultimately controlsthe activity of the araCgene and transcription of the ara operon.

rate_araBAD_activate:
This parameter controls the probability of araBAD activation by CRP and was allowed tovary between 0 and 1.

rate_araC_autoreg:
This parameter controls the NARbyAraC protein and was allowed tovary between 0 and 1.Thisrepresents the rate at which AraC supresses its promoter.

Global sensitivity analysis
GSA was performed on the ara operon activation model described in this chapter.The following step-by-step tutorial is given to combine CA with GSA.
The procedure starts with the derivation of the M1 and M2 matrices shown in Tables 1 and  2, respectively.To build each table, 1000 CA simulations were run given random values of the system parameters.For each simulation, the model output (activated araBADor expressed AraBAD) was calculated and recorded.The estimated unconditional means ( E ^Y ) and estimated unconditional variances ( V ^Y ) of the model outputs were calculated for both matrices according to the following, where N is the number of simulations (i.e., 1000 for this study).
( ) ( ) Next, the P matrix was created for the calculation of the first-order sensitivity index for each model parameter.To illustrate this example, the model parameter conc_cAMPwas used.The P matrix for this case is shown in Table 3.
The P matrix consists of the conc_cAMPparameter values from matrix M2, and all other parameters are assigned their values from M1. Then model outputs were calculated for the P matrix using these new inputs.Thus, 1000 more simulations are required for each parameter a first-order sensitivity index is desired.The first-order sensitivity index ( S conc_cAMP ) was calculated by the following.
( ) ( ) Emerging Applications of Cellular Automata Next, the total effect index was calculated by creation of the R matrix for each parameter.The first-order index describes the influence of a single parameter on the model output directly.The total effect index takes into account all interactions of a parameter with all other parameters when determining the effect on model output.The R matrix is shown in Table 4 for the conc_cAMPparameter example.To build the R matrix, the parameter values from M1 for conc_cAMPwere used along with parameter values from M2 for all other parameters.An additional 1000 CA simulations are required to calculate the model outputs for the R matrix.The calculation of the total effect index for rcAMP ( S T (conc_cAMP ) ) was calculated as follows.

Results
The CA modeling of the ara operon was performed for two cases 1. without NAR (i.e., negative autoregulation) by AraC and 2. with NAR by AraC (as is observed experimentally).
GSA was applied to both cases in order to determine how the NAR mechanism impacts overall system dynamics.To simulate the model without NAR, the parameter rate_araC_autoreg was held constant at 0. The results of the GSA calculations derived from Eqs. 1-6 and the values in Tables 1-4 are given in Table 5.This case was simulated without NAR by AraC.The first-order sensitivity indices for each system parameter for the ara operonmodel without NAR by AraCare reported in Fig. 1.These values are reported as a percentage of the summation of all first-order index values.When interactions between single parameters are not taken into consideration, the probability of CRP activation was found to be the single most important parameter significantly influencing the araBADactivation(29.58%).All other parameters show similar influence (~17%).The total effect indices for the ara operon model without NAR by AraC is shown in Fig. 2. When all the interactions between all parameters were considered, probability of CRP activation (rate_CRP_activate parameter) was shown to have most influence (29.47%) over araBADactivation.The more noticeable result is the small contribution from the initial concentration of L-arabinose (conc_arabinose parameter).This is significant because the ara operon is known to require the presence of L-arabinose to be active in the cell.The first-order indices for each parameter for the ara operonmodel with NAR by AraC are reported in Fig. 3.By activating the NAR role of AraC, the first-order indices show very close index values for all parameters (~16.5%).In other words, the NAR reduced the excessive influence of CRP activation over araBADactivation.The total effect indices are shown in Fig. 4. A pattern similar to that revealed by first-order indices was obtained.All total effect indices were also similar for all parameters (~16.5%).Adding the NAR by AraCto the regulation network dramatically increased the influence of L-arabinose concentration on ara-BADactivation.

Discussion
In this study, a unique combination of CA and GSA were used to study the parameters that influence the dynamics of the ara operon regulatory network.The results of the GSA study revealed the degree to which individual parameters affect the output of a biological model.GSA was used to explorethe influence of NAR on the regulatory network by calculating the impact of parameter variance on model output.Comparing first-order and total effect sensitivity indices with and without NAR by AraC elucidates the roles NAR plays in the signaling network.These include 1. increasing network stability and 2. increasing the response of the network to L-arabinose concentrations.
Equal distribution of variation among all parameters suggests that the NAR mechanism increases network robustness,providing protection against random perturbations (both biological and environmental) of the system.GSA has shown that parameter sensitivity indices can provide useful insight in interpreting the results of CA simulations.Thus, the combination of CA and GSA provides a valuable tool for the identification of source of output variability.In the case of the araoperon, all model parameters showed to contribute equally to the variance of araBADactivationlevel.While all the systemparameters are important and can significantly influence araBADactivation, those parameters with higher first-order sensitivities can have profound effects on regulation of the ara operon if NAR function by AraC is lost.This scenario demonstrates the potential of CA and GSA for identifying targets for manipulating highly interconnected gene regulatory networks.

. cAMP, 3 . 2 . 3 .
AraC (the TF regulator),4.CRP, and5.AraBAD (representing gene products of the ara operon).Characteristics and reaction rules for individual agents were predefined at model initialization in NetLogo.Basal expression levels for CRP, AraC, and AraBAD were set at 10, 0, and 0 cells respectively.The number of agents occupying cells represents concentration in agent based modelling.For example, CRP was present in 10 cells of the simulation environment upon initializing the simulation.Specifics of the varied model parameters are discussed in detail in the next section.These included 1. the concentrations of L-arabinose and cAMP, the probabilities of biochemical reactions, and NAR by ArgC.Monte Carlo methods were used to select 2000 values of each parameter to perform CA simulations.This was followed by 1000 independent iterations of the model to perform GSA.The CA simulation records activation of the ara operon measured as number of AraBAD agents present in cells at the end of the simulation.Simulations are also often run to record the number of model "events" required to reach a specified concentration of an agent of interest (e.g., AraBAD).Independent simulations use different values of the varied parameters, resulting in different values of the targeted agent.Two common approaches use 1. a set number of model events to derive a target agent or 2. a different number of model events required to reach a specified concentration of the target agent.

Figure 1 .
Figure 1.First-order indices calculated by GSA for the case without NAR by AraC.

Figure 2 .
Figure 2. Total effect indices calculated by GSA for the case without NAR by AraC.

Figure 3 .
Figure 3. First-order indices calculated by GSA for the case with NAR by AraC.

Figure 4 .
Figure 4. Total effect indices calculated by GSA for the case with NAR by AraC.

Table 1 .
The M1 matrix of GSA.

Table 2 .
The M2 matrix of GSA.

Table 3 .
The P matrix of GSA.

Table 4 .
The R matrix of GSA.

Table 5 .
GSA calculations for the rcAMP parameter