Metabolomics for soil contamination assessment



Introduction
The evaluation of biological responses to assess and predict the impact of environmental changes on ecosystem functioning is receiving increasing attention, and current research focuses on overcoming concerns about the specificity of biomarkers (Amiard-Triquet et al., 2012). Generally, biomarkers are chemicals, metabolites, susceptibility characteristics or physiological changes that relate to the exposure of an organism to a chemical. Accordingly, a selected biomarker, i.e. a biological response, can be linked to a specific environmental exposure and be representative of the health status of the ecosystem studied. The identification of biomarker profiles has become possible with the development of metabolomics. Such profiles allow the genuine identification of the relevant biological response(s) associated with a particular exposure, whereas the assessment of a single biomarker can only estimate the potential response of the ecosystem to a particular pollutant.
Metabolomics is the generic name assigned to the scientific field that addresses the characterization of low molecular weight organic metabolites produced by living organisms in response to environmental stimuli. Morrison et al. (2007) provided an extended definition of environmental metabolomics: "the application of metabolomics to the investigation of both free-living organisms obtained directly from the natural environment (whether studied in that environment or transferred to a laboratory for further experimentation) and of organisms reared under laboratory conditions (whether studied in the laboratory or transferred to the environment for further experimentation), where any laboratory experiments specifically serve to mimic scenarios encountered in the natural environment".
The methodological approach of metabolomics relies on a comprehensive analysis of the set of metabolites, or "metabolome", produced in response to particular environmental stimuli. The metabolome is thus the pool of metabolites (small molecules) within a cell, tissue or organism. Bundy et al. (2009) highlight the challenge of identifying a large number of metabolites and the need for metabolite databases specifically dedicated to environmental issues. Multivariate statistical analysis has proved highly effective for identifying the metabolites that distinguish groups of samples. Thus, principal component analysis (PCA) is used to identify differences between the metabolic profiles of organisms exposed to organic or inorganic pollutants (Jones et al., 2014; Kwon et al., 2012; Lankadurai et al., 2011). In addition, the association between the metabolic profile and biological factors evaluated as markers of exposure to pollutants can be modelled by partial least squares (PLS) regression analysis (Ellis et al., 2012).
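As a minimal sketch of how PCA can separate the metabolic profiles of control and exposed organisms, the Python/NumPy code below autoscales a samples × metabolites matrix and projects it onto its first principal components through a singular value decomposition. The data are synthetic, the function name `pca_scores` is hypothetical, and autoscaling is one common preprocessing choice, not necessarily the one used in the studies cited above.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """PCA of a samples x metabolites matrix; returns scores and explained-variance ratios."""
    # Autoscale each metabolite (zero mean, unit variance) so abundant metabolites do not dominate
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Thin SVD of the scaled matrix: scores = U * s, loadings are the rows of Vt
    U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]

# Synthetic data (hypothetical): 10 control and 10 exposed organisms, 20 metabolites
rng = np.random.default_rng(seed=1)
control = rng.normal(loc=50.0, scale=1.0, size=(10, 20))
exposed = rng.normal(loc=50.0, scale=1.0, size=(10, 20))
exposed[:, :8] -= 8.0  # simulated exposure effect: 8 metabolites depleted

X = np.vstack([control, exposed])
scores, explained = pca_scores(X)
pc1_control, pc1_exposed = scores[:10, 0], scores[10:, 0]
print("PC1 explained variance:", round(float(explained[0]), 2))
print("groups separated on PC1:",
      bool(pc1_control.max() < pc1_exposed.min() or pc1_control.min() > pc1_exposed.max()))
```

Because the simulated effect is shared by several correlated metabolites, it dominates the first component, and the two groups fall on opposite sides of PC1 in the scores plot.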
The implementation of metabolomics for the assessment of soil contamination is nevertheless at an early stage (Viant, 2009). A basic search of the Web of Science for "soil pollution-metabolomics" returns circa 100 items, with a marked increase from 2011 onwards (Figure 1); narrowing the search with the term "biomarkers" reduces the results to 21 records, published in the period 2007-2013. However, emerging regulatory challenges demand advances in toxicity testing. Toxicogenomics tools have been presented as an advance over the current methodologies used for regulatory decision making in ecotoxicology, which rely entirely on whole-animal exposures and adverse effects on survival, growth and reproduction (Ankley et al., 2006). From the acquisition of reproducible metabolic profiles in response to specific pollutants in soil (Jones et al., 2008) to the application of metabolomics to the response of an entire soil community to factors such as pollution and climate change (Jones et al., 2014), the implementation of metabolomics in ecotoxicology is a sound answer to the current needs of society and the environment (van Ravenzwaay et al., 2012).
During the last decade a number of general reviews on the application of metabolomics in environmental health assessment have been published (Miller, 2007; Snape et al., 2004; van Ravenzwaay et al., 2007; Viant et al., 2003). The present review specifically summarizes the most significant research on the implementation of metabolomics in soil contamination assessment. The main objectives of this review are i) to provide a systematic outline for the application of metabolomics in risk assessment of soil contamination, ii) to provide a rapid guide to the methodological approaches currently optimized and iii) to unify and simplify the knowledge currently available on the topic, providing an accessible tool for further advances in the implementation of metabolomics in risk assessment.

Metabolite isolation
Generally, metabolites are extracted by moderate chemical extraction from intact organisms (occasionally from selected relevant tissues) that have been exposed to the toxicant studied (Baylay et al., 2012; Yuk et al., 2010). The organisms most commonly selected for toxicity testing are earthworms (Table 1), particularly the genus Eisenia, which is a classic model organism for toxicity assays (Sanchez-Hernandez, 2006; Van Gestel et al., 1992; van Gestel et al., 1989) and has long been included in official guidelines (OECD, 1984, 2004). Earthworms ingest large amounts of soil and take up a significant amount of contaminant through the skin; they are therefore continuously exposed to contaminants. Extractions with methanol-chloroform (Baylay et al., 2012) or phosphate buffer solution (Yuk et al., 2010) performed on pulverized or lyophilised organisms have been reported to extract the maximum number of metabolites while still allowing reliable analyses.
The isolated extracts usually do not require further treatment prior to analysis, which minimizes the introduction of artefacts and also facilitates the development of low-cost, rapid methodologies.

Figure 1. Published items in each year and citations in each year.


Metabolite determination: chromatography, spectroscopy and spectrometry
The leading analytical techniques in metabolomics for soil contamination assessment are proton nuclear magnetic resonance spectroscopy (¹H NMR) and gas chromatography-mass spectrometry (GC-MS), as thoroughly reported in Table 1.

Metabolomics data analysis
The general approach to data analysis in metabolomics can be summarized in three main stages: explorative, supervised and biological interpretation (Smilde et al., 2010). The explorative phase aims to find groups, clusters and outliers among the metabolites and samples studied, while the supervised phase discriminates between two or more groups in order to build predictive models and to find biomarkers (…) purposes (Kammenga et al., 2000). Finally, the biological interpretation phase seeks the links between metabolome data and the underlying metabolic networks through metabolite set enrichment, pathway analysis and metabolic network inference (Trygg et al., 2006). Finding relationships between metabolites is therefore essential to determine comprehensive and meaningful metabolic changes as a biological response to environmental stimuli (Ellis et al., 2012; Morrison et al., 2007). Such an extensive evaluation of the impact of pollutants on the metabolism of target organisms is the approach that can add value to the assessment of soil health and of the viability of soil organisms undergoing stress from pollution.
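Metabolite set enrichment in the interpretation stage is often run as an over-representation test: given the metabolites flagged as altered, is a pathway hit more often than chance alone would predict? The sketch below implements that test with the hypergeometric distribution using only the Python standard library. It is one common form of enrichment analysis, not necessarily the one used in the studies cited; the pathway memberships, the background list and the function names are simplified, hypothetical illustrations, and no multiple-testing correction is applied.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K pathway members, n altered)."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

def enrichment(altered, background, pathways):
    """Over-representation p-value of each pathway among the altered metabolites."""
    background = set(background)
    altered = set(altered) & background
    N, n = len(background), len(altered)
    results = {}
    for name, members in pathways.items():
        members = set(members) & background
        k = len(members & altered)           # pathway members that were altered
        results[name] = (k, len(members), hypergeom_sf(k, N, len(members), n))
    return results

# Hypothetical background: all metabolites quantified in the study
BACKGROUND = ["citrate", "succinate", "fumarate", "malate", "glucose", "pyruvate",
              "lactate", "alanine", "glycine", "valine", "leucine", "isoleucine",
              "proline", "histidine", "glutamate", "glutamine", "choline",
              "betaine", "myo-inositol", "trehalose"]
PATHWAYS = {  # simplified, hypothetical metabolite sets
    "TCA cycle": ["citrate", "succinate", "fumarate", "malate"],
    "BCAA metabolism": ["valine", "leucine", "isoleucine"],
}
altered = ["citrate", "succinate", "fumarate", "malate", "alanine"]
res = enrichment(altered, BACKGROUND, PATHWAYS)
for name, (k, size, p) in res.items():
    print(f"{name}: {k}/{size} altered, p = {p:.4g}")
```

Restricting both the altered list and the pathway sets to the measured background matters: testing against every metabolite in a database, rather than those actually detectable, inflates apparent enrichment.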

Metabolomics bioinformatics
Information processing by bioinformatics tools and computational biology methods has become essential for solving complex biological problems in genomics, proteomics and metabolomics. Understanding "omics" data requires both conventional statistical and computational methods because of the high dimensionality and complexity of the data.
Data-analytical methods developed in computational biology for the study of biological systems provide a suite of indispensable tools to survey the outcome of metabolomics studies. First, computational biology allows fast screening of the large biological and chemical data sets generated (Shulaev, 2006) and therefore the identification of the most relevant metabolites, i.e. compounds specifically representative of the metabolic changes in the model system following exposure to different concentrations of organic and inorganic toxicants. Because of the large number of variables (metabolites) studied, metabolomics studies offer considerable statistical power for the systematic detection of biological responses to environmental changes (van Ravenzwaay et al., 2012). Second, the mathematical models developed in computational biology allow the identification of relationships between the external stimuli and the metabolic response (Zhang et al., 2010). Third, the application of computational algorithms to structural biology makes it possible to discover the structure-function relationships of new macromolecular compounds, their enzymatic conversions and changes in their activity, as well as their molecular interactions and relationships with other compounds in the pathways in which they are involved (Jimenez-Lopez et al., 2013). Moreover, it is possible to detect patterns in such biological responses and to establish significant dose-response relationships. Pattern recognition also reduces metabolomics data from hundreds of variables to two or three mutually orthogonal components. Overall, these advances in computational biology have been possible thanks to three significant technological breakthroughs: high-information-content data streams, novel biostatistical methods, and the computational power to analyse these data.
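One simple way to test whether a metabolite shows a dose-response relationship is to regress its level on the logarithm of the soil concentration and assess the slope with a permutation test, which avoids distributional assumptions on the small replicate numbers typical of exposure assays. The sketch below assumes a log-linear trend, strictly positive concentrations (untreated controls with concentration 0 cannot be log-transformed) and hypothetical data and function names; nonlinear dose-response models are a common alternative.

```python
import numpy as np

def ols_slope(x, y):
    """Least-squares slope of y on x."""
    xc = x - x.mean()
    return float(np.dot(xc, y - y.mean()) / np.dot(xc, xc))

def dose_response_trend(conc, response, n_perm=5000, seed=0):
    """Slope of a metabolite level against log10(concentration) with a two-sided permutation p-value."""
    x = np.log10(np.asarray(conc, dtype=float))   # concentrations must be > 0
    y = np.asarray(response, dtype=float)
    slope = ols_slope(x, y)
    rng = np.random.default_rng(seed)
    # Null distribution: break the dose-response link by shuffling responses across doses
    null = np.array([ols_slope(x, rng.permutation(y)) for _ in range(n_perm)])
    # Add-one correction keeps the p-value strictly positive
    p_value = (np.sum(np.abs(null) >= abs(slope)) + 1) / (n_perm + 1)
    return slope, float(p_value)

# Hypothetical data: metabolite level in 3 replicates at 4 soil concentrations (mg/kg)
conc = [1, 1, 1, 10, 10, 10, 100, 100, 100, 1000, 1000, 1000]
level = [101, 99, 100, 90, 91, 89, 80, 79, 81, 70, 71, 69]
slope, p = dose_response_trend(conc, level)
print(f"slope per log10 unit: {slope:.2f}, permutation p = {p:.4f}")
```

A negative slope here corresponds to the decreasing production of metabolites with increasing exposure; with thousands of metabolites, the resulting p-values would still need a multiple-testing correction.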
Although metabolomics studies mostly use multivariate statistics, univariate statistical analyses can add to the information gained from a study. Thus, t-tests can be used to assess the significance of the separation between control and stressed organisms in PCA and partial least squares discriminant analysis (PLS-DA) score plots. Likewise, t-tests can be used to determine which metabolites in the ¹H NMR spectra of the treatment class increased or decreased significantly relative to the controls.
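The per-metabolite comparison can be sketched as a Welch t-test on each column of a samples × metabolites table. Because many metabolites are tested at once, the sketch also applies a Benjamini-Hochberg false discovery rate adjustment; that correction is an added, commonly used step and is not stated in the text above. It assumes SciPy is available (`scipy.stats.ttest_ind` with `equal_var=False`); the data and the function names are hypothetical.

```python
import numpy as np
from scipy import stats

def benjamini_hochberg(p):
    """Benjamini-Hochberg adjusted p-values (false discovery rate)."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value
    q_sorted = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)
    return q

def univariate_screen(control, treated, alpha=0.05):
    """Welch t-test per metabolite (columns); returns direction, raw p, BH q-values and calls."""
    _, p = stats.ttest_ind(treated, control, axis=0, equal_var=False)
    q = benjamini_hochberg(p)
    direction = np.where(treated.mean(axis=0) > control.mean(axis=0), "up", "down")
    return direction, p, q, q < alpha

# Hypothetical relative intensities: 5 replicates x 4 metabolites per class
control = np.array([[10.1, 20.3, 5.0, 8.2], [9.8, 19.7, 5.2, 7.9], [10.0, 20.1, 4.9, 8.1],
                    [10.3, 20.0, 5.1, 8.0], [9.9, 19.9, 5.0, 7.8]])
treated = np.array([[13.0, 15.2, 5.1, 8.1], [12.7, 14.8, 4.9, 7.9], [13.2, 15.1, 5.0, 8.2],
                    [12.9, 15.0, 5.2, 7.8], [13.1, 14.9, 4.9, 8.0]])
direction, p, q, significant = univariate_screen(control, treated)
for i in range(control.shape[1]):
    print(f"metabolite {i}: {direction[i]}, q = {q[i]:.3g}, significant = {significant[i]}")
```

Welch's variant is used because exposure often changes the variance of a metabolite as well as its mean.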

Biomarkers
Biological responses have customarily played a somewhat secondary role in soil contamination assessment because biomarkers, as measurable responses to contaminants, could classically provide only an indication of exposure to contaminants in soil (Sanchez-Hernandez, 2006). The development of metabolomics, still considered an "emerging field" as late as mid-2010, has provided the tools to determine multiple biomarkers across different levels of biological organization, and therefore to better assess the ecological consequences of contamination. Since the creation of the first metabolomics web database, METLIN (Smith et al., 2005), 60,000 metabolites have been incorporated, a rapid development closely related to the evolution of mass spectrometry instrumentation and data analysis tools. The number of databases and registered metabolites continues to increase. Table 2 summarizes some of the most relevant operational databases together with their websites. Further information on metabolomics databases can be obtained from the Metabolomics Society (http://www.metabolomicssociety.org). For instance, ChemSpider is an aggregated database of organic molecules containing more than 20 million compounds from many different providers. At present the database contains information from sources as diverse as a marine natural products database, ACD-Labs chemical databases, the EPA's DSSTox databases and a series of chemical vendors. It has extensive search utilities, and most compounds have a large number of calculated physicochemical property values.
One of the goals of bioinformatics is to establish automated and efficient ways to integrate large biological datasets from multiple sources. This objective is challenging because data sources are heterogeneous in their functions, structures, data access methods and dissemination formats. In addition, the enormous quantity of information produced by "omics" is handled by computers that systematically analyze and store the accumulating sequence, structure and function data. Databases are essential in metabolomics because they provide a rapid and specific tool to identify the compounds isolated from an organism exposed to a particular environmental challenge. For example, the KNApSAcK package provides tools for analysing datasets of mass spectra and for retrieving information on metabolites by entering the name of a metabolite, the name of an organism, a molecular weight or a molecular formula. A list of metabolites associated with a taxonomic class can be obtained by searching with the taxonomic name, and information on individual metabolites can then be retrieved from that list. The NIST Chemistry WebBook provides access to chemical and physical property data for chemical species. Its data come from collections maintained by the NIST Standard Reference Data Program and outside contributors, and can be found by direct searches for chemical species or by indirect searches based on related data. Specialized databases are also being developed, such as LIPID MAPS, currently the largest database of lipid molecular structures. In addition, SetupX combines mass spectrometric and biological metadata, a step forward in the organization of the information generated by metabolomics analyses.
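Database searches by molecular weight or formula rest on accurate-mass matching: the theoretical mass of each candidate formula is compared with the measured m/z within a tolerance expressed in parts per million (ppm). The sketch below reproduces that idea against a small local table; it is not the KNApSAcK (or any other database) interface. It assumes singly protonated [M+H]+ ions, a 5 ppm tolerance and simple formulas without brackets or charges; the library and function names are hypothetical.

```python
import re

# Monoisotopic masses of the most abundant isotopes (u)
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "S": 31.97207100, "P": 30.97376163}
PROTON = 1.007276466812

def monoisotopic_mass(formula):
    """Monoisotopic neutral mass of a simple formula such as 'C6H12O6' (no brackets or charges)."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONO[element] * (int(count) if count else 1)
    return mass

def match_mz(mz, library, ppm_tol=5.0, adduct_mass=PROTON):
    """Library entries whose [M+H]+ m/z lies within ppm_tol of the observed m/z."""
    hits = []
    for name, formula in library.items():
        theo = monoisotopic_mass(formula) + adduct_mass
        ppm = (mz - theo) / theo * 1e6
        if abs(ppm) <= ppm_tol:
            hits.append((name, formula, round(ppm, 2)))
    return hits

LIBRARY = {  # hypothetical local library: name -> molecular formula
    "proline": "C5H9NO2",
    "histidine": "C6H9N3O2",
    "tyrosine": "C9H11NO3",
    "glucose": "C6H12O6",
    "putrescine": "C4H12N2",
    "spermidine": "C7H19N3",
    "glutathione": "C10H17N3O6S",
}
print(match_mz(116.0706, LIBRARY))
```

Accurate mass alone cannot separate isomers with the same formula (for example leucine and isoleucine), which is one reason why databases that add fragmentation spectra, retention data or biological context remain necessary.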
Metabolomics databases are thus accompanied by an accurate description of the biological study design and by metadata reporting the laboratory workflow from sample preparation to data processing.
Currently, standard analyses focus on the determination of amino acids, mono- and disaccharides, lipids/fatty acids, short-chain fatty acids and small phenolics. It is therefore already possible to begin standardizing metabolomics analyses. For instance, the Northwest Metabolomics Research Center (University of Washington) has established a list of target compounds relevant to evaluating biological responses to environmental changes. The list of compounds is summarized in Table 3.

Table 3. Metabolic pathways and number of target metabolites
Metabolic pathway | Number of metabolites
Valine, leucine and isoleucine biosynthesis | 11
Valine, leucine and isoleucine degradation | 5

According to the research results summarized in Table 1, the implementation of metabolomics in the assessment of soil contamination indicates that contaminants in soil affect several of the major metabolic pathways in living organisms (Table 3), including glycolysis, the tricarboxylic acid cycle and amino acid metabolism. Moreover, data analysis indicates an overall reduction in the production of the associated metabolites. For instance, interference with specialized amino acid pathways results in decreased synthesis of purine and pyrimidine nucleotides (Brown et al., 2010; McKelvie et al., 2011). These nucleotides are essential for the production of energy (ATP), which drives most enzymatic reactions in living organisms; protein synthesis is consequently hampered as well, which would explain the negative effect on processes such as antioxidant activity. … (Lankadurai et al., 2011). Relatedly, earthworm esterases have been proposed as biomarkers of pesticide contamination in soil (Sanchez-Hernandez, 2010). Esterases are directly involved in the natural tolerance of earthworms to pesticides and can therefore be used as specific biomarkers; furthermore, their characterization by a metabolomics approach might help to select the appropriate earthworm species for regulatory toxicity testing. Overall, the increasing specificity of research in ecotoxigenomics will allow a realistic and meaningful incorporation of biological responses into ecological risk assessment.

Oxidative stress in contaminated soil
The induction of the oxidative stress response by toxic compounds in the environment is a primary defence mechanism, although prolonged exposure to contaminants is likely to overwhelm this short-term defence (Regoli et al., 2002).
Metabolites such as proline may detoxify reactive oxygen species (ROS) under stress in vivo (Smirnoff, 1993). Exposure of plants both to redox-active metals, for example Cu and Hg, and to other metals, for example Cd and Zn, induces the generation of free radicals that leads to oxidative stress. Oxidative stress is one of the major causes of metal toxicity, particularly for redox-active metals. Cells are equipped with an elaborate network of antioxidative enzymes and low molecular weight metabolites that mitigate oxidative stress. In certain in vitro radical generation and detection systems, proline scavenges various free radicals.
Proline quenches ROS and reactive nitrogen species (RNS), which relieves the oxidative burden on the glutathione system. Polyamines also have an antioxidative role, quenching the accumulation of O₂•⁻, probably through inhibition of NADPH oxidase (Paschalidis and Roubelakis-Angelakis, 2005). This may facilitate phytochelatin synthesis and enhance metal tolerance (Siripornadulsil et al., 2002).
Overall, the oxidative defence response to toxicity or other environmental stress involves the generation of oxygenated metabolites by exposed organisms and the activation or inhibition of the production of antioxidant enzymes and metabolites such as glutathione. Depletion of antioxidants during prolonged exposure may reduce the effectiveness of the response and eventually cause an imbalance between the generation and elimination of reactive oxygen species.
Depletion of glutathione (GSH) appears to be a major mechanism of short-term heavy metal toxicity (Schutzendubel and Polle, 2002). In accordance with this hypothesis, a good correlation between glutathione content and tolerance index was observed among 10 pea genotypes differing in Cd sensitivity (Metwally et al., 2005). High GSH concentrations in the hyperaccumulator T. goesingense coincided with high constitutive activity of serine acetyltransferase (SAT); SAT catalyses the acetylation of L-Ser to O-acetylserine (OAS), which in turn provides the carbon skeleton for Cys biosynthesis. Elevated GSH levels in T. goesingense also coincided with the ability both to hyperaccumulate Ni and to resist its damaging oxidative effects.
The significance of glutathione and of the metal-induced phytochelatins (PCs) in heavy metal tolerance has been studied intensely (Rauser, 1995). However, PCs are important for the detoxification of only a limited set of metals such as Cd²⁺, Cu²⁺ and AsO₂²⁻, whereas Zn²⁺ and Ni²⁺ are poor inducers of PCs and exhibit low binding affinity. Most other metals show no significant binding.
Evaluation of metabolites related to the oxidative response therefore provides a relevant group of target compounds for risk assessment. Although the oxidative response to soil contamination has classically been addressed in plants, its study in soil organisms is already being introduced in ecotoxicology as a fundamental part of their biological response to soil contamination (Boer et al., 2013; Tremaroli et al., 2009). For example, Boer et al. (2013) describe the attenuation of the oxidative response of springtails in laboratory tests, which constitutes an early indicator of soil pollution, and standardized tests have been developed.

Metabolites related to soil contamination with organic compounds
The importance of identifying biomarkers and metabolic pathways specifically related to soil contamination with a particular pollutant or group of pollutants has already been highlighted throughout this chapter. From the information summarized in Table 1 and Table 3 it is possible to infer that soil contamination with organic compounds, namely pesticides or polycyclic aromatic hydrocarbons, depresses essential metabolic pathways such as the tricarboxylic acid cycle and the oxidative stress response, while lipid metabolism appears to be enhanced. In addition, advances in the application of bioinformatics are providing further progress in the identification of specific biomarkers for the risk assessment of individual target compounds. Thus, endosulfan toxicity has been directly related to alterations of the GABA-glutamine cycle (Yuk et al., 2013), while chlorpyrifos depresses the Cori cycle and reduces the production of phospholipids, as indicated by lower levels of choline. Baylay et al. (2012) specifically relate chlorpyrifos toxicity to increased levels of … (Brown et al., 2010; Lankadurai et al., 2012), confirming the capability of metabolomics to discriminate the metabolic pathways involved in the response to a particular toxic compound. Moreover, the results strongly suggest that sets of biomarkers may soon be reliable enough for implementation in standardized toxicity tests.
The relevance of these and future studies to the development of risk assessment strategies is heightened by the inherent risk of soil contamination to human health. Soil contaminants may be responsible for health effects costing millions of euros. Health problems range from cancer (arsenic, asbestos, dioxins) to neurological damage and lower IQ (lead, arsenic), kidney disease (lead, mercury, cadmium), and skeletal and bone diseases (lead, fluoride, cadmium).
Overall, few studies have been conducted on the toxicity of complex chemical mixtures in soils, and the effects of the soil and the organisms within it on organic pollutants are unknown. The data currently available come mostly from short-term, high-level exposure studies, which are less relevant to the potential low-level, long-term health impacts on living organisms near contaminated soil.

Metabolites related to soil contamination with heavy metals
The uptake of excess metal ions is toxic to most organisms, and the biochemical impact of metal ions on cells varies with the chemical nature of the element. In plants, heavy metal phytotoxicity can mostly be attributed to accumulation of the metals in symplastic compartments such as the cytosol and the chloroplast stroma. Metal-induced changes in development result either from a direct and immediate impairment of metabolism or from signaling processes that initiate adaptive or toxicity responses, which need to be considered as active processes of the organism. Transport processes have been recognized as a central mechanism of metal detoxification and tolerance (Hall, 2002; Hall and Williams, 2003). Some metals, for example Zn and Cu, are essential for normal plant growth and development, as they serve as structural and functional components of specific proteins. Other metals, for example Cd and Pb, have no known function in plants, although a Cd requirement for carbonic anhydrase from marine diatoms has been reported (Lane and Morel, 2000).
Upon exposure to metals, organisms often synthesize a set of diverse metabolites that accumulate to millimolar concentrations, particularly specific amino acids such as proline and histidine, peptides such as glutathione and phytochelatins (PCs), and the amines spermine, spermidine, putrescine, nicotianamine and mugineic acids, which can be detected as a response to metal exposure. Toxicogenomics research on organic contaminants is significantly ahead of the equivalent research on metal-contaminated soil (Table 1). Nevertheless, research conducted to date has yielded a number of biomarkers representative of the biological response of soil organisms to metal toxicity. Thus, soil contamination with Pb has been related to an enhancement of lipid metabolism (Sanchez-Hernandez, 2006) and, more directly, to a reduction of tyrosine levels (Wu et al., 2013). Furthermore, Ni hyperaccumulation has been specifically linked to histidine production (Krämer, 2005), particularly for Saccharomyces cerevisiae (Pearce and Sherman, 1999). The beneficial role of high histidine levels has been shown in transgenic Arabidopsis thaliana, which accumulated about 2-fold higher histidine levels than wild-type plants and showed more than 10-fold increased biomass production in the presence of toxic Ni in the growth medium (Wycisk et al., 2004). Moreover, cell surface-engineered yeast displaying a histidine oligopeptide (hexa-His) has been shown to adsorb 3-8 times more copper ions than the parent strain and to be more resistant to Cu (Kuroda et al., 2002).
In addition, polyamine contents are altered in response to heavy metal exposure. Weinstein et al. (1986) showed an increase in putrescine content in Cd-treated oat seedlings and detached oat leaves, with a marginal rise in spermidine and spermine content. Polyamines influence a variety of growth and development processes in plants and have been suggested to be a class of plant growth regulators and to act as second messengers (Kakkar and Sawhney, 2002). It has also been suggested that they could stabilize and protect membrane systems against the toxic effects of metal ions, particularly redox-active metals.
Overall, the number of studies remains scarce, and the preliminary results available in the literature merely constitute a launching platform for this promising research field.

Future perspectives
The main objective of implementing metabolomics in soil risk assessment is to meet the continuously increasing demand for safety data from human and ecological risk assessments. Accordingly, regulatory programs worldwide are currently incorporating tests with endpoints that involve the effects of chemicals on specific metabolic pathways (Ankley et al., 2006). Toxicological endpoints can be general biological responses such as survival or weight loss (Baylay et al., 2012), but specific biomarkers provide the accuracy that was classically elusive in tests with living organisms. Several issues immediately arise from the summary presented here, such as the need to perform toxicological tests in the field with natural rather than artificial soils, since artificial soils were used in some of the studies listed in Table 1. Ecotoxigenomics can also benefit from the incorporation of further analytical techniques. Techniques based on mass spectrometry are certainly required to understand the mechanisms involved in the alteration of metabolic pathways in response to toxicants. However, for screenings that merely require the detection of differences between metabolic phenotypes, optical methods such as FT-IR would be suitable, particularly if extremely high sample throughput is required. Although no data were available in the existing literature, Figure 2 illustrates the change in the fingerprint of organic compounds in a soil amended with different sources of carbon, collected 10 after the application. While some groups of compounds might merely be related to the carbon sources added, the variations in the signal associated with polysaccharides (600-1000 cm⁻¹) can be associated with changes in the metabolic fingerprint of the soil system and therefore linked to microbiological activity in soil.
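A change in the polysaccharide-associated signal can be quantified by integrating each FT-IR spectrum over the 600-1000 cm⁻¹ window and comparing the band areas between treatments. The sketch below does this with the trapezoidal rule on synthetic Gaussian spectra; the band shapes, intensities and function names are hypothetical, and real spectra would typically be baseline-corrected and normalized before integration.

```python
import numpy as np

def band_area(wavenumbers, absorbance, low, high):
    """Trapezoidal area under a spectrum between two wavenumbers (cm^-1)."""
    wn = np.asarray(wavenumbers, dtype=float)
    ab = np.asarray(absorbance, dtype=float)
    order = np.argsort(wn)            # FT-IR spectra are often stored from high to low wavenumber
    wn, ab = wn[order], ab[order]
    mask = (wn >= low) & (wn <= high)
    x, y = wn[mask], ab[mask]
    return float(np.sum((y[1:] + y[:-1]) / 2.0 * np.diff(x)))

def gaussian_band(wn, center, height, width):
    """Synthetic absorption band with a Gaussian profile."""
    return height * np.exp(-0.5 * ((wn - center) / width) ** 2)

# Synthetic spectra (hypothetical): same 1650 cm^-1 band, stronger band near 800 cm^-1 after amendment
wn = np.arange(4000, 399, -2)
control = gaussian_band(wn, 1650, 0.6, 40) + gaussian_band(wn, 800, 0.3, 60)
amended = gaussian_band(wn, 1650, 0.6, 40) + gaussian_band(wn, 800, 0.5, 60)
ratio = band_area(wn, amended, 600, 1000) / band_area(wn, control, 600, 1000)
print(f"600-1000 cm^-1 area ratio (amended/control): {ratio:.2f}")
```

Integrating a fixed window is deliberately crude and suits the high-throughput screening role suggested above; assigning the change to specific compounds would still require mass spectrometry.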
Overall, the introduction of these results seeks to encourage further characterization of families of compounds in intact soil (or in functional pools such as aggregates) in relation to soil processes, an approach that can find immediate application in the assessment of biological responses to toxic compounds in soil. The variability of biological responses has been one of the main obstacles to their implementation in standardized risk assessment. However, the examination of changes in biological processes by accurate analytical techniques and powerful statistical tools has launched a new era in our understanding of soil processes. The ultimate goal is to identify the most sensitive metabolites for a given toxicant and to develop tailored standardized tests.
Environmental Risk Assessment of Soil Contamination