Open access peer-reviewed chapter

Developing Network-Based Systems Toxicology by Combining Transcriptomics Data with Literature Mining and Multiscale Quantitative Modeling

By Alain Sewer, Marja Talikka, Florian Martin, Julia Hoeng and Manuel C Peitsch

Submitted: November 15th 2017
Reviewed: February 26th 2018
Published: June 20th 2018

DOI: 10.5772/intechopen.75970


Abstract

We describe how genome-wide transcriptional profiling can be used in network-based systems toxicology, an approach that leverages biological networks to assess the health risks of exposure to chemical compounds. Driven by technological advances that are changing the ways in which data are generated, systems toxicology has allowed traditional toxicity endpoints to be enhanced with far deeper levels of analysis. Together, new experimental and computational methods offer the potential for more effective, efficient, and reliable toxicological testing strategies. We illustrate these advances with the “network perturbation amplitude” methodology, which quantifies the effects of exposure treatments on biological mechanisms represented by causal networks. We also describe recent developments in the assembly of high-quality causal biological networks using crowdsourcing and text-mining approaches. We further show how network-based approaches can be integrated into the multiscale modeling framework of the response to toxicological exposure. Finally, we combine biological knowledge assembly and multiscale modeling to report on the promising developments of the “quantitative adverse outcome pathway” concept, which spans multiple levels of biological organization, from molecules to population, and is directly relevant to the “Toxicity Testing in the 21st Century” vision of the US National Research Council.

Keywords

  • omics data
  • systems toxicology
  • biological networks
  • backward reasoning
  • literature mining
  • multiscale modeling
  • adverse outcome pathways

1. Introduction to network-based systems toxicology

The ongoing public debates on the impact of glyphosate, bisphenol A, or electronic cigarettes on human health have underlined the importance of performing reliable toxicological assessments [1, 2, 3]. In this context, regulatory authorities must require evidence packages to assess the health risks associated with chemical compounds of uncertain safety that are contained in consumer products or present in the environment. To make the authorities’ decisions persuasive to the public, it is critical to support them with objective evidence obtained using the latest scientific and technological advances. The US National Research Council’s (NRC) “Toxicity Testing in the 21st Century: A Vision and a Strategy” manifesto, issued in 2007, was a noteworthy response to this critical need [4]. It fostered innovative, interdisciplinary approaches (i) to scale up experimental capacity by favoring in vitro screening over whole-animal testing, (ii) to deepen the interpretation of experiments in terms of biological mechanisms by integrating the pathway-based approaches used in biomedical research, and (iii) to process the extensive data generated using adequate statistical and modeling tools that provide quantitative answers and informative predictions.

Developments in systems toxicology during the last 10 years have been driven largely by the goal of realizing the NRC’s vision. Simply stated, systems toxicology can be seen as the application of the systems biology mindset and approaches to toxicity testing. Thus, an essential feature of systems toxicology is the holistic perspective of systems biology, in which a biological system is viewed as a complex assembly of interacting, often numerous parts rather than as the simple union of individual elements, which corresponds to the reductionist standpoint [5, 6]. The first consequence of the holistic perspective is the fundamental role played by molecular omics profiling technologies, as they enable the simultaneous quantification of the abundances of all the (detectable) elements of a given class of biomolecules. The most frequently used technology is transcriptome profiling, which has become an almost routine operation thanks to its numerous technical, practical, and economic advantages. In the current post-genomic era, its coverage exceeds 20,000 genes, and the resulting large data volume requires proper bioinformatics processing to be exploited adequately. The second consequence of the holistic perspective is the introduction of a modeling approach describing how the interactions between the system parts produce the system-level properties. When transcriptome profiles are available, this modeling approach builds upon the relationships between genes to achieve a bottom-up description of the biological mechanisms taking place in cells, tissues, or organs. This inherent modeling aspect places systems toxicology at the far end of the “gene sets < pathways < networks” sequence, which orders the approaches for interpreting transcriptomics data by increasing structural complexity and informational richness [7].
In that sense, systems toxicology can be distinguished from toxicogenomics, for which modeling gene interactions is not an essential component. It is important to stress, however, that complete descriptions in terms of interacting genes are not (yet) available for all system-level biological mechanisms. Conversely, not all genes measured by transcriptomics have been shown to be involved in system-level biological mechanisms.

In this chapter, we focus on the developments of network-based systems toxicology [8], as networks have proved to be the most suitable descriptive framework for systems biology [5, 6]. In this case, the complex map of interactions between system parts implied by the holistic view reduces to a (large) series of pairwise relationships, each encoded by a network edge connecting two nodes that represent interacting system parts. Importantly, networks have been shown to be a suitable framework not only for representing but also for understanding system-level biological mechanisms [9]. Network-based approaches have subsequently been extended to achieve a novel understanding of disease effects in healthy systems [10, 11, 12] and to examine the main and secondary effects of drugs in an integrated way (Figure 1a) [13, 14]. From the point of view of toxicological assessment, it is reasonable to expect that network-based approaches provide an appropriate framework for examining system responses to a test exposure in terms of perturbed biological mechanisms, in line with the NRC’s vision. As system-level biological mechanisms result from the interactions of multiple nodes, the network-based modeling framework elegantly collects the distributed effects of a test exposure on individual nodes into the perturbation of a single entity (Figure 1a) [8, 15].

Figure 1.

Features of network-based systems toxicology. (a) Schematic representation of the network-based view of disease, drug, and exposure effects. (b) The iterative discovery cycle in systems biology [5]. (c) The linear five-step assessment workflow underlying network-based systems toxicology [8]. (d) The three-dimensional representation “biological systems-exposure treatments-biological networks” illustrating the mechanism-based comparative assessment of exposure effects (blue arrow) and in vitro-in vivo or interspecies translatability (red arrow).

In the remainder of this section, we briefly explain several essential features of network-based systems toxicology. First, it is important to note the fundamental difference between systems biology and systems toxicology or, more broadly, between investigating novel biological mechanisms and assessing exposure effects on the basis of biological mechanisms in a toxicity testing context. When a biological system is investigated to discover novel mechanisms, the goal of experimental data analysis and interpretation is to identify the most promising candidate mechanisms compatible with the observations, which would eventually lead to a novel, refined hypothesis to be tested. Systems biology has facilitated the implementation of this iterative process, as rich system-wide omics data allow for both confirmatory and exploratory investigations (Figure 1b) [5]. In the toxicity testing context, on the other hand, the biological mechanisms and their models must be determined a priori and remain “locked” while the experimental data from the test exposure are evaluated in a systematic and minimally subjective manner. The outcome of the experiment is therefore a comparative assessment of the effects of the test exposure across varied parameters, such as the tested compounds, their doses, and the exposure durations. This can be represented by a linear assessment workflow (Figure 1c) [8], in contrast to the circular systems biology discovery cycle mentioned above. Interestingly, this difference between discovery and assessment approaches has an analogue in transcriptomics gene set analysis: the competitive “Q1” statistic, which identifies the best-associated gene sets, corresponds to the discovery mode, whereas the self-contained “Q2” statistic, which quantifies the relevance of a given gene set, corresponds to the assessment mode [16, 17].
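To make the Q1/Q2 distinction concrete, the following minimal sketch contrasts the two null hypotheses on gene-level scores (e.g., moderated t-statistics). It is a simplified illustration under stated assumptions, not the procedures of [16, 17]: the self-contained test builds its null by randomly flipping the signs of the set’s own scores, whereas practical Q2 tests usually permute sample labels to preserve gene-gene correlations, and the competitive test compares the set with random gene sets of equal size. The function names and the toy data are hypothetical.

```python
import random
from statistics import mean


def self_contained_pvalue(set_scores, n_perm=2000, seed=0):
    """Q2-style test (assessment mode): H0 is 'no gene in the set responds'.

    The null is approximated by randomly flipping the signs of the set's own
    gene-level scores (a simplification of sample-label permutation)."""
    rng = random.Random(seed)
    observed = abs(mean(set_scores))
    hits = sum(
        abs(mean(s * rng.choice((-1, 1)) for s in set_scores)) >= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)


def competitive_pvalue(set_genes, all_scores, n_perm=2000, seed=0):
    """Q1-style test (discovery mode): H0 is 'the set responds no more than
    random sets of the same size drawn from all measured genes'."""
    rng = random.Random(seed)
    observed = abs(mean(all_scores[g] for g in set_genes))
    genes = list(all_scores)
    hits = sum(
        abs(mean(all_scores[g] for g in rng.sample(genes, len(set_genes)))) >= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data: an exposure that shifts every measured gene upward similarly.
all_scores = {f"g{i}": 2.0 + (i % 5) * 0.1 for i in range(100)}
gene_set = [f"g{i}" for i in range(10)]

p_q2 = self_contained_pvalue([all_scores[g] for g in gene_set])
p_q1 = competitive_pvalue(gene_set, all_scores)
print(f"self-contained (Q2) p = {p_q2:.4f}, competitive (Q1) p = {p_q1:.4f}")
```

With a global, genome-wide shift, the self-contained test flags the set as responding, while the competitive test does not, because the set is no more affected than any other genes. This is why a self-contained quantity lends itself to comparing the same mechanism across exposures.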

Another advantage of network-based systems toxicology is that it offers an explicit framework for “mechanistic translatability” between test systems. As the resemblance between exposure responses in human subjects and in test systems (in vivo animal or, more recently, in vitro human) is fundamental in toxicity testing, the network-based approach makes it possible to establish the validity of intersystem associations using mappings of the mechanism-specific biological networks (red arrow in Figure 1d). This intersystem mechanistic translatability supports the use of in vitro test systems, such as cell cultures, organotypic tissues, or organ-on-a-chip models, to reduce animal testing (typically in rodents), in agreement with the NRC vision and the “3Rs” principles (i.e., “reduce the number of animals,” “refine the experiments,” and “replace the animals with nonanimal systems”) [18, 19, 20, 21]. The three-dimensional “system-exposure-network” representation also contains the setup for a comparative, mechanism-based assessment of exposure responses, which remains the primary goal of network-based systems toxicology (blue arrow in Figure 1d, which results from completing the workflow in Figure 1c) [8, 15, 22]. A biologically sound comparison of the impact of two exposures therefore consists of multicriteria comparisons between the mechanism-specific responses, or “network perturbations,” based on an appropriate selection of biological mechanisms.

The two concepts of network perturbation quantification and biological network selection are central to network-based systems toxicology and will be explored further in this chapter. The methodology for calculating network perturbation amplitudes (NPAs) will be presented as a biologically driven complexity reduction scheme that delivers valuable, structured information about the impact of toxicological exposure (Section 2). The related effort to ensure the quality of the biological networks will be discussed afterward and illustrated by two innovative approaches based on crowdsourcing and literature mining (Section 3). The modeling perspective will then be broadened beyond networks of interacting molecules to present other components of the multiscale modeling framework of an organism’s response to exposure (Section 4). Finally, emerging concepts from quantitative adverse outcome pathways (qAOPs) will illustrate how extended multiscale modeling and biological knowledge assembly can be combined to develop the predictive aspect of network-based systems toxicology. Throughout this chapter, our intention is not to present a comprehensive review or an abstract synthesis; rather, we coherently pick out concepts that are relevant to the past, current, and future developments of network-based systems toxicology and that are appealing in the context of “Bioinformatics in the Era of Post Genomics and Big Data.”

2. Quantification of network perturbation amplitudes

In this section, we describe in more detail a core element of network-based systems toxicology: the quantification of NPAs, which amounts to using transcriptomics data to calculate the exposure-induced response of biological mechanisms modeled by a network. As shown in Figure 1c, it is a key ingredient of the five-step workflow for toxicity assessment and constitutes a concrete application of network-based systems toxicology [8, 15, 22]. Here we focus on the particular type of “causal networks” for which a mathematically and statistically sound methodology has recently been developed [23, 24]. Given a suitably organized collection of causal networks selected for a priori relevant biological mechanisms, the structure of the associated NPA results can be seen as a complexity reduction scheme that starts from large experimental transcriptomics datasets. It quantifies the exposure-induced impact on the considered biological mechanisms, which is used to assess toxicity comparatively in concrete applications. Additionally, it constitutes the starting point for the network-based systems toxicology developments discussed later in this chapter.

Concretely, the implementation of the NPA methodology applicable to causal networks requires three distinct inputs in terms of experimental data and biological knowledge:

  1. The differential gene expression values obtained from the transcriptomics data. Although we will consider them as resulting from “treatment versus control” pairwise comparisons, other types of contrasts can be used for more complex designs. These data are obtained by applying suitable statistical models at the individual gene level and extend over the full transcriptome, in line with the first aspect of the holistic perspective of systems biology discussed above. We have referred to them as “systems response profiles” [8, 22, 25].

  2. A suitably organized collection of causal networks covering the essential biological mechanisms of the test system’s response to the applied exposure treatment. Unlike other types of networks, causal biological networks contain nodes that not only describe molecular concentrations but also represent functions such as transcriptional, enzymatic, or kinase activities. The network edges encode causal (i.e., directed) relationships between their nodes, each of which is assigned a positive sign when the activities of the connected nodes change in the same direction (e.g., an increase in the start node causes an increase in the end node) or a negative sign when they change in opposite directions (e.g., an increase in the start node causes a decrease in the end node). The underlying biological knowledge in these networks has been manually extracted from the scientific literature and encoded in the Biological Expression Language (BEL), a language developed specifically for causal biological networks. The current version of the causal biological network collection is publicly available on the Causal Biological Network (CBN) database website [26, 27]. Recent developments around the causal networks are discussed in Section 3.

  3. “Transcriptional footprints” for a large fraction of the nodes contained in the causal networks. Transcriptional footprints are transcript abundance nodes that are connected to the causal network nodes via signed directed edges, similar to those in the causal networks. They follow the “backward reasoning” approach, in which changes in molecular mechanisms encoded by causal network nodes (e.g., the activity of a transcription factor) are deduced from the expression changes of their downstream-regulated genes. These edges connect the transcriptomics data to the mechanistic networks, and the NPA calculations consist of “propagating” the experimental differential gene expression values through the networks to obtain the corresponding node- and network-level perturbations. In our assessment applications, we licensed the Selventa Knowledgebase to obtain good coverage of the nodes of the causal network collection in terms of transcriptional footprints [28]. Other options are possible: the small “BEL corpus” derived from the Selventa Knowledgebase [29], the networks contained in our publications [23, 24, 30], or the commercial IPA® “causal analysis” knowledgebase [31].
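The “backward reasoning” on transcriptional footprints can be illustrated with a minimal sketch covering inputs 1 and 3. It assumes log2 fold changes as differential expression values and uses a plain, unweighted average of sign-corrected fold changes; the published NPA methodology uses an edge-based weighted average [23]. All node and gene names (TF_X, KIN_Y, G1 to G5) are hypothetical.

```python
from statistics import mean

# Input 1 (hypothetical): log2 fold changes from a "treatment versus control" contrast.
fold_changes = {"G1": 1.2, "G2": -0.8, "G3": 0.9, "G4": 0.3, "G5": -0.4}

# Input 3 (hypothetical): transcriptional footprints, i.e. signed edges from a
# causal network node to the transcripts it regulates.
# +1: gene expected to increase when the node is activated; -1: expected to decrease.
footprints = {
    "TF_X": [("G1", +1), ("G2", -1), ("G3", +1)],
    "KIN_Y": [("G4", -1), ("G5", +1)],
}


def backward_reasoning(node):
    """Infer the change of an upstream node from its downstream genes.

    Each fold change is multiplied by its footprint edge sign so that all genes
    'vote' in the same direction; the mean of these votes is returned.
    Positive values suggest activation, negative values suggest inhibition."""
    return mean(sign * fold_changes[gene] for gene, sign in footprints[node])


for node in footprints:
    print(node, round(backward_reasoning(node), 3))
```

Here the three genes downstream of TF_X all point toward activation, whereas both genes of KIN_Y point toward inhibition.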

Given these three inputs, the NPA methodology performs the following computational steps to quantify the treatment-induced perturbations across a network (Figure 2):

  1. Calculation of the “raw” perturbations for the nodes connected to the transcriptional footprints. Essentially, this consists of computing an edge-based, weighted average of the differential gene expression values attached to the transcriptional footprint nodes [23]. Optionally, this calculation can be applied to a complete “aggregated” network if it is “causally consistent” (or “balanced” in graph-theoretic language). This property means that the edge-based relative sign between any two nodes must be unambiguous (i.e., must not depend on the specific path relating the two nodes). As most networks do not satisfy this condition (e.g., negative feedback loops are not causally consistent), the aggregation option requires additional processing to be applicable [32].

  2. Calculation of the perturbations for all network nodes based on a constrained optimization problem. This is achieved by searching for node values that are “smooth” over the network and the transcriptional footprint edges (i.e., that minimize the edge sign-corrected differences between connected nodes) while matching the differential gene expression values at the transcriptional footprint nodes. This problem has an exact solution that can be expressed in terms of the inverse of the adapted, signed Laplacian matrix of the network graph and of the “raw” node perturbations obtained previously.

  3. Calculation of the NPAs using an edge-based summation. The summed values are the squared, edge sign-corrected means of the (smoothed) perturbation values of the two nodes connected by each edge. As the NPA is therefore always non-negative, it is important to examine the node-level perturbation values to determine whether the underlying biological mechanism is activated or inhibited as a consequence of the exposure treatment.

  4. Calculation of three accompanying statistics to decide whether the obtained NPA value represents a true or a false positive. The first statistic is based on the biological variability propagated from the uncertainties of the differential gene expression values: the 95% confidence interval around the NPA value should not contain zero. The other two statistics test the relevance of the biological mechanism(s) encoded in the network by randomly reshuffling the network edges or the transcriptional footprints. This yields two null distributions for the network-level perturbation values. If the actual NPA value lies above the 95% quantile of a null distribution, it is considered to be statistically significant and labeled as “K-specific” or “O-specific,” respectively. Significant network perturbations correspond to the cases where all three statistical tests are successful.
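Steps 2 to 4 can be sketched on a toy network as follows. This is a minimal illustration under stated assumptions, not the published implementation [23]: the smoothing objective is taken as the sum over all edges of (f(x) - s*f(y))^2, whose minimizer with the gene values held fixed solves a linear system in the signed Laplacian; the gene fold changes are used directly instead of first being aggregated into raw node perturbations (step 1); the NPA is the plain sum of squared sign-corrected edge means over the backbone edges (normalization constants may differ in [23]); and only the footprint-reshuffling (“O”-type) test of step 4 is shown. The network, node names, and fold changes are hypothetical, and NumPy is assumed to be available.

```python
import random

import numpy as np

# Hypothetical toy causal network: backbone edges (source, target, sign)
# and transcriptional-footprint edges from backbone nodes to measured genes.
BACKBONE = ["A", "B", "C"]
BACKBONE_EDGES = [("A", "B", +1), ("B", "C", -1)]
FOOTPRINT_EDGES = [("A", "g1", +1), ("A", "g2", -1), ("C", "g3", +1), ("C", "g4", +1)]


def smooth_backbone(fold_changes):
    """Step 2: hold gene nodes at their fold changes and choose backbone values
    minimizing sum over edges of (f(x) - s * f(y))**2. Zeroing the gradient gives
    L_BB f_B = -L_BG f_G, with L the signed Laplacian of the two-layer graph."""
    genes = sorted({g for _, g, _ in FOOTPRINT_EDGES})
    nodes = BACKBONE + genes
    idx = {n: i for i, n in enumerate(nodes)}
    lap = np.zeros((len(nodes), len(nodes)))
    for x, y, s in BACKBONE_EDGES + FOOTPRINT_EDGES:
        i, j = idx[x], idx[y]
        lap[i, i] += 1.0
        lap[j, j] += 1.0
        lap[i, j] -= s
        lap[j, i] -= s
    nb = len(BACKBONE)
    f_genes = np.array([fold_changes[g] for g in genes])
    # Solvable when every backbone component is anchored by at least one footprint.
    f_back = np.linalg.solve(lap[:nb, :nb], -lap[:nb, nb:] @ f_genes)
    return dict(zip(BACKBONE, f_back))


def npa(f_back):
    """Step 3: sum over backbone edges of the squared, sign-corrected mean of the
    two node values. Non-negative, so node signs must be inspected separately."""
    return sum(((f_back[x] + s * f_back[y]) / 2.0) ** 2 for x, y, s in BACKBONE_EDGES)


def o_like_null_quantile(fold_changes, n_perm=1000, q=0.95, seed=0):
    """Step 4 (one of the three tests): reshuffle fold changes across footprint
    genes, recompute the NPA, and return the q-quantile of this null distribution."""
    rng = random.Random(seed)
    genes, values = list(fold_changes), list(fold_changes.values())
    null = []
    for _ in range(n_perm):
        rng.shuffle(values)
        null.append(npa(smooth_backbone(dict(zip(genes, values)))))
    null.sort()
    return null[int(q * (n_perm - 1))]


fold_changes = {"g1": 1.0, "g2": -1.0, "g3": -1.0, "g4": -1.0}
nodes = smooth_backbone(fold_changes)
observed = npa(nodes)
print({k: round(float(v), 3) for k, v in nodes.items()}, round(float(observed), 3))
print("O-like 95% null quantile:", round(float(o_like_null_quantile(fold_changes)), 3))
```

In this toy case the genes consistently indicate that A is activated and C inhibited, and smoothing propagates this to B (A→B positive, B→C negative), giving node values of about 1, 1, and -1. With only four genes the permutation null is very coarse; the test becomes informative only for realistic footprint sizes.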

Figure 2.

The calculations of the network perturbation amplitudes (NPAs) and biological impact factor (BIF) in a bottom-up representation. The six layers correspond to the six steps (1–6) explained in the main text. Their respective inputs, mathematical processing, and results are schematically displayed from left to right. The “complexity” column gives an order of magnitude of the corresponding data size and illustrates the associated complexity reduction scheme.

By extending the calculation of NPAs to the full network collection contained in the CBN database, we take advantage of its hierarchical structure to complete a useful, pyramidal, bottom-up complexity reduction scheme (Figure 2). Grouping networks into network families, which together constitute the overarching collection, allows the exposure-induced biological impact to be quantified and displayed more concisely, which is particularly useful in a comparative approach to toxicity assessment. The final two steps are the following:

  5. Calculation of network family-level biological impact factors (BIFs). The ~50 networks are distributed into five families based on their biological similarities: cell proliferation, cellular stress, cell fate, pulmonary inflammation, and tissue repair/angiogenesis. Evaluating the family-level BIF consists first of filtering out the networks that are not significantly perturbed and then summing the remaining NPA values with weights that account for the number of networks in each family and for the nodes overlapping between networks.

  6. Calculation of the network collection-level BIF. This step aims to provide balanced relative weights between the five network families so that the main features of the biological system’s response to the exposure treatment can be perceived easily. In that sense, the BIF is a pan-mechanistic, quantitative metric for the exposure-induced effects measured at the molecular transcript level and “shaped” by the network collection chosen a priori from the CBN database [15, 30]. It is the starting point for investigating the impacted biological mechanisms in a top-down approach.
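The two aggregation steps can be sketched as follows. The text states only that the family-level weights account for the number of networks per family and for node overlap; the specific weight used here, (1 - overlap fraction) / family size, and the representation of the collection-level result as a total plus each family’s relative share, are hypothetical choices for illustration and not the published BIF formula [15, 30]. Network names and values are invented.

```python
from collections import Counter


def family_bif(networks):
    """Family level (sketch): keep significantly perturbed networks only and sum
    their NPAs per family, using a hypothetical weight that down-weights large
    families and networks sharing many nodes with other networks."""
    sizes = Counter(n["family"] for n in networks)
    bif = {family: 0.0 for family in sizes}
    for n in networks:
        if n["significant"]:
            weight = (1.0 - n["overlap"]) / sizes[n["family"]]
            bif[n["family"]] += weight * n["npa"]
    return bif


def collection_bif(family_values):
    """Collection level (sketch): total impact plus the relative share of each family."""
    total = sum(family_values.values())
    shares = {f: (v / total if total > 0 else 0.0) for f, v in family_values.items()}
    return total, shares


networks = [  # hypothetical NPA results for one exposure
    {"name": "net_1", "family": "cell proliferation", "npa": 0.8, "significant": True, "overlap": 0.5},
    {"name": "net_2", "family": "cell proliferation", "npa": 0.3, "significant": False, "overlap": 0.0},
    {"name": "net_3", "family": "cellular stress", "npa": 1.2, "significant": True, "overlap": 0.5},
]
families = family_bif(networks)
total, shares = collection_bif(families)
print(families, round(total, 3), {f: round(s, 3) for f, s in shares.items()})
```

The non-significant network net_2 contributes nothing, but it still counts toward the size of its family, which is one possible reading of “taking into account the number of networks in each family.”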

Having presented the NPA methodology, it is instructive to compare it with existing approaches that provide network-level quantification. The causal biological networks used in the NPA calculations are usually composed of several molecular signaling pathways assembled around common nodes. Pathways generally have a simpler and somewhat more linear structure than networks, so their structure matters less. As a consequence, structure has often been disregarded in published methodologies, which were primarily designed for pathways. In a recent review of the network- and pathway-based methodologies developed over the last decade, only one category out of three takes the structure into account: the so-called pathway topology (PT) group [7]. We further observe a recurrent difference between the NPA and most PT methodologies: for the latter, the goal of the quantification is to determine the most relevant pathways or networks (compared with the others in the collection) to support biological interpretation. This is achieved by ranking either abstract scores or enrichment p-values [33]. This approach corresponds to the abovementioned competitive Q1 statistic, which suits the discovery perspective rather than the assessment perspective of systems toxicology [16, 17]. In contrast, the NPA methodology provides an explicitly network-based quantification scheme that inherently incorporates the self-contained Q2 statistic, allowing meaningful comparisons between the effects of several exposures on the same biological mechanism.

The NPA approach has been successfully employed across a range of toxicological questions of concern:

  • Comparative assessment of biologically active substances to complement the standard toxicological endpoints. This was used for the preclinical assessment of a candidate modified-risk tobacco product in comparison with conventional cigarettes [34, 35, 36].

  • In vitro screening of multiple compounds in combination with the capacity of the high-content screening technologies. This was applied to selections of environmental toxicants and nutraceuticals [37, 38].

  • Investigation of in vivo-in vitro translatability (red arrow in Figure 1d). The case of the xenobiotic metabolism response to cigarette smoke exposure was investigated and supported the validity of in vitro testing [39].

  • Classification of individual human subjects. A proof-of-principle application of the NPA methodology to individual subjects has been published [24], and the approach was benchmarked during the sbvIMPROVER diagnostic signature challenge [40].

  • Exploratory investigations of transcriptomics data. Examining the biological process activities contained in the collection of causal networks provides an additional point of view that has already been used several times [41, 42, 43].

In this section, we explained the NPA methodology as a core element of network-based systems toxicology. However, its validity also depends on the quality of its input, the causal network collection available in the CBN database. In the following section, we discuss several innovative ways of ensuring consistently high quality in order to consolidate the acceptance of network-based systems toxicology.

3. Enhancements of the causal network collection

The application of network-based systems toxicology requires the a priori identification of the biological mechanisms involved in the test system’s response to the applied exposure (Figure 1c and d). This led us to gradually assemble a structured collection of high-quality causal networks, which has been deposited in the CBN database so that it can be used to run NPA calculations in concrete situations. The validity of the whole network-based systems toxicology approach depends heavily on the biological pertinence of the retained mechanisms and of the networks encoding them. In this section, we examine these validity conditions more closely and describe two recent efforts aimed at increasing the biological pertinence and extending the biological contexts of the causal networks: a crowdsourced review of their content and the use of semi-automated text-mining tools.

Over the last two decades, the ever-increasing use of transcriptomics technologies has resulted in a number of pathway resources aimed at associating biological insight with sets of differentially expressed genes: KEGG [44], Reactome [45], BioCarta [46], WikiPathways [47], SPIKE [48], UCSD signaling gateway [49], NCI pathway interaction database [50], or NetPath [51]. The parallel assembly of the CBN database was motivated by the requirement to satisfy higher quality standards, which were not always met by the available pathway resources (Table 1 in [27]):

  1. Explicitly accounting for the biological context by setting mechanistic boundaries in terms of species, tissue or cell type, and disease state

  2. Supporting all the causal relationships encoded in the network edges by (at least) one explicit, literature-based statement

  3. Using BEL to encode the manually curated literature statements in a format that is both human-readable and computable and that accurately stores the rich mechanistic and contextual information

  4. Applying data-driven enhancement by analyzing suitable public or dedicated datasets using a complementary source of prior biological knowledge, such as the Selventa Knowledgebase, which contains more than two million curated relationships [28]
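To show how computable BEL statements can be turned into the signed edges used in the NPA calculations, here is a deliberately simplified sketch. It handles only single “subject relation object” statements with the four BEL causal relationships increases, directlyIncreases, decreases, and directlyDecreases; real BEL documents (annotations, nested statements, namespace definitions) require a full parser, such as PyBEL. The example statements are hypothetical.

```python
import re

# Sign attached to each BEL causal relationship (+1 same direction, -1 opposite).
SIGNS = {"increases": +1, "directlyIncreases": +1, "decreases": -1, "directlyDecreases": -1}
RELATION = re.compile(r"\s+(directlyIncreases|directlyDecreases|increases|decreases)\s+")


def to_signed_edge(statement):
    """Return (subject, object, sign) for a simple causal BEL statement, or None
    for non-causal statements such as associations or correlations."""
    parts = RELATION.split(statement.strip(), maxsplit=1)
    if len(parts) != 3:
        return None
    subject, relation, obj = parts
    return subject, obj, SIGNS[relation]


statements = [  # hypothetical examples
    'act(p(SFAM:"PRKC Family")) increases r(HGNC:GENE_A)',
    "p(HGNC:GENE_B) directlyDecreases act(p(HGNC:GENE_C))",
    "p(HGNC:GENE_D) positiveCorrelation r(HGNC:GENE_E)",
]
for s in statements:
    print(to_signed_edge(s))
```

Correlative statements are deliberately dropped here, because only directed, signed relationships can serve as causal network edges.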

Note that the last feature is also relevant for augmenting the set of transcriptional footprint edges, which were likewise extracted from the Selventa Knowledgebase in our assessment applications (see Section 2). For example, the public dataset GSE44747 investigates gene expression regulation upon activation of protein kinase C (“PKC”), and the molecular changes in this dataset can be causally related to the node act(p(SFAM:“PRKC Family”)) [52]. Whenever a sizable fraction of the genes regulated in this dataset change in response to an exposure treatment, the activation or inhibition of PKC can be inferred [28]. This example illustrates the transcriptional footprint-based “backward reasoning” needed to connect the causal biological networks to the transcriptomics data in order to apply the NPA methodology.
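The PKC example of backward reasoning can be sketched as a concordance count with a sign test. This is an illustrative simplification, not the scoring used with the Selventa Knowledgebase [28]: the signature gives the expected direction of each gene when the upstream node (here, PKC) is activated, genes whose absolute log2 fold change does not exceed a threshold are ignored, and a two-sided binomial sign test (success probability 0.5) indicates whether concordant or discordant genes dominate. Gene names, the threshold, and the data are hypothetical.

```python
from math import comb


def infer_direction(signature, observed, threshold=0.5):
    """signature: gene -> +1/-1, expected direction if the upstream node is activated.
    observed: gene -> log2 fold change measured after the exposure treatment."""
    concordant = discordant = 0
    for gene, sign in signature.items():
        fc = observed.get(gene, 0.0)
        if abs(fc) <= threshold:
            continue  # gene not considered changed
        if sign * fc > 0:
            concordant += 1
        else:
            discordant += 1
    n = concordant + discordant
    if n == 0:
        return "undetermined", 1.0
    k = max(concordant, discordant)
    # Two-sided sign test: probability of a split at least this unbalanced.
    p = min(1.0, 2 * sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n)
    if concordant == discordant:
        return "undetermined", p
    return ("activated" if concordant > discordant else "inhibited"), p


# Hypothetical PKC-like signature of 10 genes; the exposure reverses 9 of them.
signature = {f"GENE_{i}": (+1 if i % 2 else -1) for i in range(10)}
observed = {g: -1.5 * s for g, s in signature.items()}
observed["GENE_0"] = 0.2  # below the threshold, hence ignored
print(infer_direction(signature, observed))
```

Because all nine changed genes move against the activation signature, the sketch infers inhibition of the upstream node.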

In 2011, we published our first biological networks, describing “Cell Proliferation,” which are still part of the collection that serves as input for NPA and BIF calculations [53]. The initial mechanistic interest focused on lung biology, and version 1.0 of the collection consisted of 108 assembled causal networks grouped into five high-level functional families (cell proliferation 15 [53], cellular stress 7 [54], cell fate 34 [55], pulmonary inflammation 24 [56], and tissue repair/angiogenesis 9 [57]). The design and assembly processes were the same for all the networks, each of which was defined by biological boundaries chosen to collectively cover all the essential biological processes and responses of healthy lung tissues (Figure 3). Since 2015, the CBN database website has provided free access to the full collection [27]. In addition to the original focus on inhalation toxicology covering non-diseased respiratory tract tissues, causal networks for non-diseased vascular tissues, chronic obstructive pulmonary disease, and atherosclerotic plaque destabilization have been assembled and published to enrich the covered biological contexts [58, 59, 60].

Figure 3.

Overview of the causal biological network assembly and enhancements. The CBN database website contains the initial hierarchically structured collection of biological networks describing the essential biological processes and responses of healthy lung tissues. The enhanced network versions resulting from the sbvIMPROVER network verification challenges are integrated in CBN, as well as the networks describing relevant response mechanisms in other biological contexts, which were obtained by the BELIEF semi-automated literature mining workflow.

As mentioned above, scientific acceptability was the main requirement during the assembly of the causal network collection, which is freely available to the scientific community in the CBN database. This motivated additional, innovative crowdsourced verification initiatives to consolidate the accuracy of the biological mechanisms encoded in the networks. They took place in the framework of the network verification challenges of the systems biology verification initiative (sbvIMPROVER NVC) [59, 61, 62, 63]. These challenges were based on a novel crowdsourcing approach involving a large community of more than 50 contributors, who were given tools to vote on various edges and nodes of the causal networks via a dedicated web interface [64]. A moderator supervised the votes for each network and decided whether to include or exclude nodes and edges based on the community’s choices. The resulting 46 causal networks were made publicly available through the CBN website and constituted version 2.0 of the causal network collection, organized along the same five high-level functional families as version 1.0 (cell proliferation 15, cellular stress 7, cell fate 34, pulmonary inflammation 26, and tissue repair/angiogenesis 11). Currently, the NVC platform supports a third crowdsourced network verification challenge for liver xenobiotic metabolism [64]. The new models will eventually be shared via the CBN website [26].

As the original network assembly process involved significant efforts in manual literature curation (Figure 3), the development of text-mining-based capabilities appeared as an appropriate solution to increase the quantity of assembled causal networks while preserving their quality. A novel, semi-automated biological knowledge extraction workflow called the “BEL information extraction workflow” (BELIEF) was developed, which incorporates state-of-the-art linguistic tools for recognition of specific entities [65, 66]. It mines preselected, unstructured scientific literature and enables its users to extract causal and correlative relationships that are subsequently transcribed into the computable and human-readable BEL format used in the CBN network collection. A web interface has been developed, as well, to facilitate its practical application [67]. The usefulness of the BELIEF workflow was assessed during the assembly of a network describing atherosclerotic plaque destabilization and containing 304 nodes and 743 edges supported by 33 PubMed literature references [65]. The comparison between the semi-automated and conventional curation processes showed similar results but with significantly reduced curation effort for the semi-automated process. It is currently applied to a variety of biological mechanisms extending beyond the initial focus of pulmonary biology (e.g., vascular tissues).

The high quality of the CBN causal network collection provides a solid foundation for the network-based systems toxicology approach. Supplementing the essentially manual assembly process, innovative crowdsourced verification initiatives have consolidated and updated the biological content of the networks. The semi-automated BELIEF workflow has been beneficial not only directly, by speeding up the maintenance of the CBN collection, but also indirectly, by popularizing the use of causal networks in biomedical contexts beyond toxicological assessment [27].

4. Integration into the multiscale modeling of exposure responses

In the previous section, we saw that enhancements of the causal network collection opened new development opportunities for the approaches used in network-based systems toxicology. This invites a similar reconsideration of the holistic molecular view underlying the systems biology approach from a broader perspective, namely that of modeling an organism's response to exposure in the toxicological context. Indeed, the organism's response to exposure is a complex process spanning multiple space and time scales, which has been modeled with approaches of diverse complexity. In this section, we discuss how the causal networks used in our holistic systems biology approach can be integrated into the quantitative toxicology/pharmacology frameworks of absorption, distribution, metabolism, excretion, and toxicity (ADMET) and of physiologically based toxicokinetic (PBTK)/physiologically based pharmacokinetic (PBPK) modeling. This will reveal the approximations and limitations of the respective approaches and indicate where bridges between causal molecular networks and other modeling approaches can be built, as well as the effort required to build them. Paving the road for multiscale approaches is a promising development perspective for improving our understanding of how potentially toxic substances interact with the human body.

ADMET is one of the basic principles of pharmacology and toxicology; it describes the kinetics, dynamics, and toxicity of compounds within the human body following an exposure, with the objective of estimating their toxicokinetic and metabolic profiles (Figure 4a). A molecular dynamics approach resolving the trajectories of individual molecules from absorption to excretion is not achievable, both because of our insufficient understanding of the interplay between the numerous molecular mechanisms involved and, practically, because of limited computational power. Consequently, replacing the individual molecular trajectories by the corresponding mean density distributions and velocity fields, the so-called continuum approximation, appeared to be the most suitable approach for quantitative modeling in the toxicokinetic context. In the specific case of inhalation toxicology, additional assumptions about the interplay between liquid, vapor, and aerosol phases lead to a well-defined computational fluid dynamics (CFD) scheme, which quantitatively describes the deposition of aerosol particles in the nasal and other respiratory cavities by calculating the airflows and velocities [68]. A fine description of the dose reaching respiratory tissues (Figure 4b) can therefore be achieved by solving CFD systems of partial differential equations in space and time.
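The continuum approximation can be illustrated on a far simpler problem than a respiratory CFD model. The sketch below is a toy, not a CFD scheme: instead of individual particle trajectories, it tracks a one-dimensional particle concentration c(x, t) along an idealized airway, assuming a constant air velocity u, a diffusion coefficient D, and a first-order wall deposition rate k (all values hypothetical). It solves dc/dt = -u dc/dx + D d2c/dx2 - k c with an explicit finite-difference scheme and records the amount deposited in each cell.

```python
def simulate_deposition(n_cells=50, length=0.1, u=1.0, D=1e-4, k=2.0,
                        c_in=1.0, t_end=0.5):
    """Explicit finite-difference solution of a 1D advection-diffusion
    equation with first-order wall deposition:

        dc/dt = -u dc/dx + D d2c/dx2 - k c

    Returns the final concentration profile and the amount deposited per cell.
    """
    dx = length / n_cells
    # Time step chosen to respect both the advective (CFL) and diffusive limits.
    dt = 0.4 * min(dx / u, dx * dx / (2 * D))
    c = [0.0] * n_cells
    deposited = [0.0] * n_cells
    t = 0.0
    while t < t_end:
        new = c[:]
        for i in range(n_cells):
            left = c_in if i == 0 else c[i - 1]              # inlet boundary
            right = c[i] if i == n_cells - 1 else c[i + 1]   # zero-gradient outlet
            advection = -u * (c[i] - left) / dx              # first-order upwind (u > 0)
            diffusion = D * (right - 2 * c[i] + left) / (dx * dx)
            loss = k * c[i]                                  # deposition onto the wall
            new[i] = c[i] + dt * (advection + diffusion - loss)
            deposited[i] += dt * loss * dx
        c = new
        t += dt
    return c, deposited


profile, deposited = simulate_deposition()
print(f"outlet/inlet concentration: {profile[-1]:.2f}")
print(f"deposition in first vs last cell: {deposited[0]:.2e} vs {deposited[-1]:.2e}")
```

Real respiratory CFD solves the three-dimensional airflow itself; the point here is only that a density field governed by a partial differential equation in space and time replaces the individual trajectories.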

Figure 4.

The multiscale modeling framework of the human body response to toxicological exposure. The sequence from panels (a) to (e) spans several space and time scales, for which multiple modeling approaches are used. In order to make them applicable, simplifying assumptions are necessary at each step (blue arrows), and the resulting model parameters must be determined experimentally. From this perspective, the signed directed graphs (SDGs) underlying the network-based systems toxicology approach can be integrated into a broader multiscale response modeling framework.

When a molecule reaches a cell, its dynamics can be described, for example, by a stochastic description of enzymatic activities through the chemical master equation (Figure 4b). While appealing at the local level, such approaches are not straightforward to apply globally to a whole-body model. For that purpose, PBTK/PBPK modeling simplifies the complex human body into a limited number of connected compartments and is commonly used to evaluate the levels of a given substance in various tissues or organs (Figure 4c). PBTK/PBPK models can also be linked to CFD deposition models, as discussed by several authors [69, 70, 71]. Metabolism is further simplified by assuming a well-stirred volume or an instantaneous conversion of the enzyme-substrate complex into the enzyme-product complex. Such a description reduces the enzymatic and metabolic dynamics to a set of ordinary differential equations (ODEs) in time.
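The reduction of whole-body kinetics to ODEs can be sketched with a minimal PBTK-style model. The example below is a hypothetical two-compartment system (blood and a well-stirred liver) with saturable Michaelis-Menten metabolism in the liver, integrated with a fixed-step fourth-order Runge-Kutta method. The compartment structure, the parameter names (V_blood, V_liver, Q, P_liver, Vmax, Km), and all values are illustrative assumptions rather than a validated model.

```python
def pbtk_rhs(state, p):
    """Right-hand side of a hypothetical blood/liver model (concentrations in mg/L)."""
    c_blood, c_liver, metabolized = state
    c_venous = c_liver / p["P_liver"]  # well-stirred liver: venous = tissue / partition
    rate_met = p["Vmax"] * c_venous / (p["Km"] + c_venous)  # Michaelis-Menten, mg/h
    d_blood = p["Q"] * (c_venous - c_blood) / p["V_blood"]
    d_liver = (p["Q"] * (c_blood - c_venous) - rate_met) / p["V_liver"]
    return [d_blood, d_liver, rate_met]  # last entry: cumulative amount metabolized


def rk4(rhs, state, p, dt, n_steps):
    """Classical fixed-step fourth-order Runge-Kutta integration."""
    for _ in range(n_steps):
        k1 = rhs(state, p)
        k2 = rhs([s + 0.5 * dt * k for s, k in zip(state, k1)], p)
        k3 = rhs([s + 0.5 * dt * k for s, k in zip(state, k2)], p)
        k4 = rhs([s + dt * k for s, k in zip(state, k3)], p)
        state = [s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4)]
    return state


params = {"V_blood": 5.0, "V_liver": 1.5, "Q": 90.0,   # L, L, L/h
          "P_liver": 2.0, "Vmax": 10.0, "Km": 1.0}     # -, mg/h, mg/L
dose = 100.0  # mg, given as an intravenous bolus into blood
c_blood, c_liver, metabolized = rk4(
    pbtk_rhs, [dose / params["V_blood"], 0.0, 0.0], params, dt=0.005, n_steps=4800)
print(f"after 24 h: {metabolized:.1f} mg metabolized")
```

Tracking the metabolized amount as a third state makes the mass balance explicit: the amounts in blood and liver plus the metabolized amount should always add up to the dose.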

In general, PBTK/PBPK-derived ODE systems involve many parameters that are not necessarily accessible to researchers, and an analytical study of the system may be required to estimate them. For that purpose, assuming a steady state, the ODEs can be represented semiquantitatively by a signed directed graph (SDG) derived from the Jacobian matrix of the ODE system evaluated at its steady-state solution (Figure 4d) [72]. In this context, the organism's response to an exposure is viewed as a perturbation of its steady state, characterized by the associated SDG, which encodes the time directionality and the relative signs of the perturbations between all pairs of connected nodes. Although this SDG is derived in the PBTK/PBPK context, other SDGs can in principle be obtained similarly [73], namely when the scientific literature provides a higher-resolution description of the molecular mechanisms involved in the response. This is precisely the case for the biological processes contained in the causal networks presented in the previous section, which shows how these networks can be integrated into the broad quantitative toxicology/pharmacology modeling frameworks built around ADMET and aimed at describing the organismal response to exposure in its full complexity.
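The derivation of an SDG from a steady-state Jacobian can be made concrete on a toy system. The sketch assumes a hypothetical two-variable negative-feedback model, dx/dt = s/(1 + y) - d1*x and dy/dt = a*x - d2*y, in which x promotes y and y represses the production of x. It relaxes the system to its steady state by forward Euler integration, estimates the Jacobian J by central finite differences, and reads an edge j -> i with sign(J[i][j]) from every nonzero off-diagonal entry, since J[i][j] = df_i/dx_j measures how a perturbation of node j changes the rate of change of node i. Self-loops (the diagonal) are omitted for readability.

```python
def model(state, s=2.0, d1=1.0, a=1.0, d2=1.0):
    """Hypothetical negative-feedback loop: x activates y, y represses x."""
    x, y = state
    return [s / (1.0 + y) - d1 * x, a * x - d2 * y]


def relax_to_steady_state(f, state, dt=0.01, n_steps=20000):
    """Forward Euler integration until transients have died out."""
    for _ in range(n_steps):
        state = [v + dt * dv for v, dv in zip(state, f(state))]
    return state


def numerical_jacobian(f, state, h=1e-6):
    """Central-difference estimate of J[i][j] = d f_i / d x_j."""
    n = len(state)
    jac = [[0.0] * n for _ in range(n)]
    for j in range(n):
        up, down = list(state), list(state)
        up[j] += h
        down[j] -= h
        f_up, f_down = f(up), f(down)
        for i in range(n):
            jac[i][j] = (f_up[i] - f_down[i]) / (2 * h)
    return jac


def signed_digraph(jac, names, tol=1e-8):
    """Edges (source, target, sign) from the off-diagonal Jacobian entries."""
    edges = []
    for i, target in enumerate(names):
        for j, source in enumerate(names):
            if i != j and abs(jac[i][j]) > tol:
                edges.append((source, target, 1 if jac[i][j] > 0 else -1))
    return edges


steady = relax_to_steady_state(model, [0.0, 0.0])
jac = numerical_jacobian(model, steady)
print(steady)                           # close to [1.0, 1.0] for these parameters
print(signed_digraph(jac, ["x", "y"]))  # [('y', 'x', -1), ('x', 'y', 1)]
```

Only the signs of the Jacobian entries survive in the SDG, which is what makes the representation semiquantitative: the graph no longer depends on the exact parameter values, only on the direction of each influence at the steady state.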

In this short excursion beyond molecular systems biology, we saw several approaches to quantifying the response to exposure. Each is valid over specific space and time scales, and higher complexity often requires more parameters to be determined experimentally to make the model applicable. Building bridges between modeling scales therefore represents an appealing development direction toward a more integrated understanding of an organism's response to exposure. However, the effort required to preserve the applicability of the resulting models is substantial, and in the last section, we examine a tentative multiscale approach that is attracting increasing interest in the context of modern (twenty-first-century) toxicology.

5. Development of quantitative adverse outcome pathways

In the previous sections, we described NPA as a core element of network-based systems toxicology and then presented two extensions: new network contexts and an extended modeling framework. In this final section, we propose to combine these elements into a network-based approach to quantitative adverse outcome pathways (qAOPs). This direction offers a novel development opportunity that needs to incorporate predictive capacity at the population level, in contrast to the a posteriori, test-system data-driven assessment discussed so far (Figure 1c). This requires an adapted approach for selecting the relevant biological mechanisms, as well as the development of quantitative, multiscale modeling approaches of suitable complexity.

Originating in ecotoxicology and quickly gaining popularity in human toxicology, adverse outcome pathways (AOPs) have become a valuable means of modeling exposure effects. Similar to the network models, AOPs organize existing, scattered literature knowledge into a structured representation, with the aim of constructing a linear sequence of “key events” (KEs) from the initial interaction between a chemical and the biological system, called the molecular initiating event (MIE), to the adverse outcome at the individual and population levels [74] (Figure 5). We have contributed to the development of two AOPs for common disorders resulting from long-term smoking and published them in the AOP wiki [75]. The first AOP maps the events from epidermal growth factor receptor (EGFR) activation by oxidative stress to decreased lung function [76], and the second illustrates the steps required for oxidative stress to disrupt endothelial nitric oxide bioavailability and, finally, lead to hypertension [77]. These AOPs were built following the requirements of the Organization for Economic Co-operation and Development (OECD) [78]. One avenue for network-based systems toxicology is to build BEL models that represent these events. The first BEL model suite, describing the biological processes involved in impaired mucociliary clearance, is underway and is expected to be published through an sbvIMPROVER NVC in 2018 [64].
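The linear structure of an AOP maps naturally onto a simple data structure. The sketch below is a hypothetical representation, not the AOP wiki data model: an AOP is an ordered list of key events whose first element is the MIE and whose last element is the adverse outcome, and the key event relationships (KERs) are the pairs of consecutive events. The example uses the endpoints of the first AOP mentioned above, with a placeholder standing in for its intermediate key events.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class KeyEvent:
    name: str
    level: str  # level of biological organization, e.g. "molecular", "organ"


@dataclass
class AdverseOutcomePathway:
    """A linear AOP: MIE first, adverse outcome last, key events in between."""
    events: List[KeyEvent]

    def __post_init__(self):
        if len(self.events) < 2:
            raise ValueError("an AOP needs at least an MIE and an adverse outcome")

    @property
    def mie(self) -> KeyEvent:
        return self.events[0]

    @property
    def adverse_outcome(self) -> KeyEvent:
        return self.events[-1]

    def relationships(self) -> List[Tuple[str, str]]:
        """Key event relationships (KERs): each event leads to the next one."""
        return [(up.name, down.name) for up, down in zip(self.events, self.events[1:])]


aop = AdverseOutcomePathway([
    KeyEvent("EGFR activation", "molecular"),
    KeyEvent("intermediate key events (placeholder)", "cellular"),
    KeyEvent("decreased lung function", "organ"),
])
for upstream, downstream in aop.relationships():
    print(f"{upstream} -> {downstream}")
```

Each KER in such a structure is a natural attachment point for a causal network describing the underlying biology.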

Figure 5.

The structure of an adverse outcome pathway (AOP).

While the above effort aims to identify the mechanistic biological knowledge underlying the chosen AOPs, the development of the associated quantitative modeling approaches needs to move forward in parallel. This development has been anticipated to follow three steps, yielding a “dynamic adverse outcome pathway” [79]:

  1. Assembly and quantification of causal mechanistic networks

  2. Development of dynamic models linking exposure to the organ-level responses

  3. Simulation of the population-level effects of an exposure

The importance of this endeavor was underlined by the fact that achieving Step 3 was explicitly promoted as “the ultimate goal of systems toxicology.” As Step 1 has been completed with the CBN database and the development of the NPA methodology, attention now focuses on Steps 2 and 3, which must incorporate the predictive capacity of future qAOPs. Useful resources in this context include the BioModels database, which contains hundreds of computational models of biological processes (Step 2 [80]), and the “mechanistic axes population ensemble linkage” algorithm, which creates large sets of mechanistically distinct virtual humans that, upon simulated exposure, statistically match the prevalence of phenotypic variability reported in human population sample studies (Step 3 [81]).
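The population-level step can be illustrated with a deliberately simple reweighting of a simulated virtual cohort. This is not the mechanistic axes population ensemble linkage algorithm of [81]; it only shows the general idea of weighting virtual individuals so that a simulated phenotype matches a reported prevalence, under assumed inputs: a lognormally distributed clearance factor per virtual human, a hypothetical exposure threshold defining the “affected” phenotype, and a hypothetical target prevalence.

```python
import random


def virtual_population(n, seed=0):
    """Hypothetical virtual humans, each summarized by one clearance factor."""
    rng = random.Random(seed)
    return [rng.lognormvariate(0.0, 0.5) for _ in range(n)]


def reweight_to_prevalence(phenotypes, threshold, target_prevalence):
    """Weights making the weighted fraction above threshold equal the target."""
    affected = [value > threshold for value in phenotypes]
    p_sim = sum(affected) / len(affected)
    if p_sim in (0.0, 1.0):
        raise ValueError("the simulated cohort contains a single phenotype class")
    w_affected = target_prevalence / p_sim
    w_unaffected = (1.0 - target_prevalence) / (1.0 - p_sim)
    return [w_affected if flag else w_unaffected for flag in affected]


clearances = virtual_population(5000)
internal_dose = [10.0 / cl for cl in clearances]  # simulated exposure, arbitrary units
weights = reweight_to_prevalence(internal_dose, threshold=15.0, target_prevalence=0.10)
weighted = sum(w for w, d in zip(weights, internal_dose) if d > 15.0) / sum(weights)
print(f"weighted prevalence: {weighted:.3f}")  # 0.100
```

Because the weights are chosen per phenotype class, they sum to the cohort size and the weighted prevalence equals the target by construction; richer schemes would match several phenotypes or continuous distributions at once.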

Given the network-based systems toxicology components presented in this chapter, several directions could be considered to support qAOP development. Typically, appropriate transcriptomics datasets could be identified and used to apply NPA quantification to causal networks representing the biological mechanisms underlying one or more KEs and their relationships. Although time dependence is not explicit in the SDGs associated with the networks, their causal nature provides information about the direction of time through the sequence of causally related perturbations. As during the assembly of the CBN network collection, the use of transcriptomics data is expected to improve the accuracy of the networks. In the qAOP context, the usual “treatment vs. control” experimental design might advantageously be replaced by a time-course design, which can reveal (part of) the time evolution of the relevant perturbed mechanisms. We may also consider calculating NPA at the individual level, which, owing to its complexity reduction property, yields better between-class separation in classification contexts [24]. This could be used to model the population-level distributions of exposure-induced perturbations more accurately.

To conclude on a more concrete note, we present a “real-life” example of a simple qAOP developed for risk assessment in ecotoxicology: the connection between inhibition of cytochrome P450 19A aromatase (the MIE) and population-level decreases of the fathead minnow (the adverse outcome) [82]. Easily collected measures of chemical inhibition of the rate-limiting steroidogenic enzyme aromatase are used to predict reductions in egg production and, subsequently, in the population size of the fish. The associated sequence of events was modeled quantitatively by linking three discrete models describing different components of the AOP, from the MIE (aromatase inhibition) through five intermediate KEs to impacts of regulatory interest (fecundity, population size). Although the qAOP was developed from experiments with fish exposed to the aromatase inhibitor fadrozole, a toxic equivalence calculation allowed the effects of another, untested aromatase inhibitor, iprodione, to be predicted.
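The structure of this qAOP (three linked models plus a toxic equivalence step) can be sketched as follows. Everything quantitative here is hypothetical: the functional forms (a hyperbolic inhibition curve, a power-law link from estradiol to fecundity that lumps the intermediate KEs together, and a Beverton-Holt-type population model) and all parameter values are illustrative assumptions, not the models or estimates of [82]. The toxic equivalence step converts a concentration of an untested inhibitor into reference-inhibitor (e.g., fadrozole) equivalents using an assumed ratio of IC50 values, so that the chain calibrated on the reference compound can be reused.

```python
def aromatase_inhibition(conc, ic50):
    """Model 1 (MIE): fractional aromatase inhibition, hyperbolic form."""
    return conc / (conc + ic50)


def relative_fecundity(inhibition, steepness=2.0):
    """Model 2 (intermediate KEs lumped): estradiol is assumed to fall with
    inhibition, and fecundity with estradiol (via vitellogenin)."""
    relative_estradiol = 1.0 - inhibition
    return relative_estradiol ** steepness


def population_trajectory(fecundity, years=20, n0=1000.0, growth=2.0, scale=1000.0):
    """Model 3 (adverse outcome): density-dependent (Beverton-Holt-type) growth."""
    sizes = [n0]
    for _ in range(years):
        n = sizes[-1]
        sizes.append(growth * fecundity * n / (1.0 + n / scale))
    return sizes


def reference_equivalent(conc, ic50_chemical, ic50_reference):
    """Toxic equivalence: express a concentration in reference-inhibitor units."""
    return conc * ic50_reference / ic50_chemical


IC50_REFERENCE = 1.0  # hypothetical potency of the reference inhibitor
IC50_UNTESTED = 50.0  # hypothetical potency of the untested inhibitor
for conc in (0.0, 10.0, 50.0):
    equivalent = reference_equivalent(conc, IC50_UNTESTED, IC50_REFERENCE)
    fecundity = relative_fecundity(aromatase_inhibition(equivalent, IC50_REFERENCE))
    final_size = population_trajectory(fecundity)[-1]
    print(f"untested inhibitor at {conc:>4}: fecundity {fecundity:.2f}, "
          f"population {final_size:.0f}")
```

With this hyperbolic form, converting to reference equivalents and then applying the reference IC50 gives the same inhibition as applying the untested compound's own IC50, which is what keeps the equivalence step consistent with the rest of the chain.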

This example shows that, as long as their main elements are well chosen, qAOPs need not be “complicated,” as might be expected for a pathway covering multiple levels of biological organization (i.e., from molecules to populations). It illustrates the trade-off that must be found during qAOP development between biological accuracy, modeling complexity, and practical value in terms of predictive capacity. All three aspects are equally important for the validity of the outcome, as qAOPs are meant to play a central role in regulatory decision-making based on twenty-first-century toxicology approaches to risk assessment.

6. Conclusions

In this overview of network-based systems toxicology, we have seen how original approaches were used and developed to provide innovative tools for assessing the health risks associated with exposure to chemical compounds of uncertain safety. Applying systems biology principles to the assessment of exposure-induced responses involved generating genome-wide transcription profiles. These large datasets were processed using a combination of standard bioinformatics tools and ad hoc methodologies within a network-based framework reflecting the holistic perspective of systems biology. This approach provided an implementation of the NSR principles and, in particular, supported the 3Rs initiative aimed at reducing animal use in research. We described in more detail the NPA methodology, which is suited to the particular type of causal networks that use the “backward reasoning” approach. Combined with the collection of causal networks available on the CBN website, NPA enables the quantification of exposure-induced perturbations of the mostly molecular biological mechanisms described by the networks. This provided a quantitative assessment of the biological impact of toxicological exposure treatments and offered multiple application possibilities. Turning to current developments in network-based systems toxicology, we first described the quality improvement of the CBN causal network collection through crowdsourcing initiatives (sbvIMPROVER) and its extension to new biological contexts enabled by literature mining tools that partially replace the manual curation needed to assemble high-quality causal networks. After integrating the network-based systems biology approach into the multiscale modeling of exposure responses, we discussed the qAOP as a promising development avenue for network-based systems toxicology. Its expected use in regulatory decision-making is an appealing perspective that justifies the past, current, and certainly future efforts deployed in developing and applying systems toxicology.

Acknowledgments

We thank our colleagues from the Systems Toxicology Department and from the Research Technologies group of PMI R&D Biomedical, our long-standing collaborators from the Fraunhofer Institute for Algorithms and Scientific Computing, and all the members of the SBVimprover community. We also acknowledge Nicholas Karoglou for the editing of the manuscript.

Conflict of interest

All authors are employees of Philip Morris International.

How to cite and reference


Alain Sewer, Marja Talikka, Florian Martin, Julia Hoeng and Manuel C Peitsch (June 20th 2018). Developing Network-Based Systems Toxicology by Combining Transcriptomics Data with Literature Mining and Multiscale Quantitative Modeling, Bioinformatics in the Era of Post Genomics and Big Data, Ibrokhim Y. Abdurakhmonov, IntechOpen, DOI: 10.5772/intechopen.75970.
