Plants show a remarkable ability to respond adaptively to their environment. They have evolved a long list of complex and unique processes, including photosynthesis, totipotency, long-distance signaling, the ability to restore structural and metabolic memory, recognition, and communication via emission of selected classes of volatiles. In recent years, the use of metabolite profiling techniques for the detection, unambiguous identification, quantification, and rapid analysis of minute quantities of cellular micromolecules has increased considerably. Metabolomics is key to understanding the chemical footprints of plants during different phases of growth and development. Feeding an ever-increasing population with limited inputs in a rapidly changing environment is the biggest challenge that world agriculture faces today. To achieve the projected genetic gains, breeding strategies that employ marker-assisted selection for high-yielding varieties and identify germplasm resistant to abiotic and biotic stresses are already in vogue. Even so, new approaches are needed to discover and deploy agronomically important gene(s) that can help crops better withstand weather extremes and growing pest prevalence worldwide. In this context, metabolic engineering looks like a viable option, with immense potential to deliver future crops.
- mass spectroscopy
- metabolic engineering
Metabolomics is one of the most fascinating disciplines in the ‘-omics’ field, spanning plants, animals, and microorganisms. Since its adoption in plant biology in the mid-1990s, this approach has been used successfully to identify important gene(s) in plants [1, 2]. The model plant Arabidopsis thaliana (henceforth referred to as Arabidopsis) has been extensively researched using a plethora of genomic tools and technologies, facilitating functional genomics analyses. In recent years, the metabolomics approach has been extended to crop plants to ascertain gene functions [3, 4]. Because the metabolome serves as the ultimate phenotype of a cell, it holds great promise for advancing crop-breeding gains. For instance, delineating metabolite quantitative trait loci (mQTL) in crop plants offers information about the genomic target regions or genes that hold great relevance to breeding [6, 7]. Also, food and agronomical traits of crops improved through genetic modification (GM) could be better evaluated in terms of the metabolites present [8, 9].
Over the last few decades, the techniques used to analyze metabolites have been refined to an unprecedented degree, with improvements in mass spectrometry (MS) and nuclear magnetic resonance (NMR) spectroscopy accompanied by the growing capabilities of bioinformatics. In this chapter, we present the application of metabolomics for functional genomics in crops as well as its possible integration with crop breeding to deliver future crops.
2. Different platforms to gather metabolomic data
Let us take tomato as an example of a model system that contains different categories of chemical compounds contributing to fruit quality. These include sugars, organic acids, amino acids, fatty acids, isoprenoids, and polyphenolic compounds. A variety of separation approaches have been used to investigate the tomato metabolome, using both targeted and nontargeted metabolomics, leading to a wide range of quality biomarkers. Targeted metabolomics is by far the most common approach, as most research programs focus on understanding or improving a single target trait. A great deal of information exists that explains the phenotypic variation; however, this information may not be easily accessible.
Small molecules can have large effects. For example, variation in the ratio between sweetness and acidity causes tomatoes to taste sharp, sweet, insipid, or lovely. Accelerating improvements through breeding programs demands large-scale and low-cost assays that allow analysis of thousands of samples within a short period of time. Phenotypic surveys of diverse germplasm have a very broad scope and help define the range of acceptable phenotypic variation, although they are limited in depth. These kinds of data on organic acids and sugars can be combined with gene expression analysis to discover the genetic causes underlying fruit quality. Information on carbohydrates and organic acids can also be obtained using more sophisticated tools such as nuclear magnetic resonance (NMR) spectroscopy, which detects more compounds per assay than enzymatic or colorimetric methods but at far lower throughput. NMR spectroscopy is also used for structural determination of a novel metabolite of particular interest. Alternatively, gas chromatography (GC) paired with mass spectrometry (MS) (GC–MS) permits broad-scope metabolomic profiling with higher throughput than NMR. On the flip side, the chemical derivatization required for GC–MS may exclude some metabolites from the analysis, and GC–MS may not yield sufficient information for the unambiguous identification of a particular metabolite. However, combining multiple datasets from complementary analytical platforms offers a powerful strategy to analyze metabolomes.
In tomato, color and aroma are other targets for improvement. A majority of pigments in tomato are isoprenoids, such as carotenoids, while others are polyphenolics (e.g., flavonoids). Traditionally, liquid chromatography (LC) with commercial standards is used for carotenoid profiling. However, LC–MS is needed for a more complete estimate of the metabolome, especially for isoprenoids. The MS analysis is done either inline with the LC or in an offline mode [17, 18]. Inline MS simplifies the workflow, while offline MS may enhance sensitivity due to the greater reduction of sample complexity. NMR spectroscopy can also be used for isoprenoid profiling; it is effective in distinguishing E and Z isomers, which is not possible with MS analysis. This is important because different carotenoid isomers may have different biological activities and, hence, nutritive qualities. Carotenoid composition may change during food preparation and processing, both in quality (i.e., isomerization) and identity (i.e., degradation by heat). Therefore, analysis of both raw and cooked samples is necessary for a complete description of the isoprenoids [21, 22]. In addition to color, carotenoids also contribute to fruit aroma, as do fatty acid and amino acid derivatives. Because all three give rise to volatile compounds, GC and GC–MS are used for their separation and identification [23, 24]. A metabolite survey of approximately 100 Dutch tomato cultivars was conducted using LC–MS and MS/MS.
The need for a highly curated database to help interpret the spectra produced during an experiment is one of the challenges routinely faced when analyzing MS or NMR data. Fortunately, recent developments in tomato metabolomics have led to the creation of such community-oriented resources.
In the recent past, several software packages and analysis tools have been developed for processing and analyzing metabolite data, but so far no single platform fully meets user expectations. In this context, the Department of Biotechnology, Government of India, has initiated a project to develop the Computational Core for Plant Metabolomics (CCPM), a web-based collaborative platform for researchers in the field of metabolomics to store, analyze, and share their data.
3. Gene identification
Metabolomics studies help identify mQTL that correspond to gene(s) related to a particular trait. The method is gaining recognition because, once an mQTL is identified, it becomes easier to pinpoint the gene(s) responsible for that particular metabolite.
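The idea of linking a metabolite level to a genomic region can be illustrated with a single-marker scan. The following is a minimal sketch, not the procedure used in any of the cited studies: it assumes biallelic markers coded 0/1 for each line and one metabolite measurement per line, and the function names, marker names, and toy data are all hypothetical. Real mQTL mapping typically relies on interval mapping and permutation-based significance thresholds.

```python
from statistics import mean, variance


def marker_effect(genotypes, levels):
    """Welch-type t statistic comparing metabolite levels of the two allele classes."""
    a = [y for g, y in zip(genotypes, levels) if g == 0]
    b = [y for g, y in zip(genotypes, levels) if g == 1]
    if len(a) < 2 or len(b) < 2:
        return 0.0  # too few lines in one class to estimate a variance
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(b) - mean(a)) / se if se > 0 else 0.0


def scan_markers(marker_table, levels):
    """Score every marker and return the one with the largest absolute statistic."""
    scores = {name: marker_effect(g, levels) for name, g in marker_table.items()}
    best = max(scores, key=lambda name: abs(scores[name]))
    return best, scores


# Hypothetical toy data: three markers scored in eight lines.
markers = {
    "M1": [0, 0, 0, 0, 1, 1, 1, 1],
    "M2": [0, 1, 0, 1, 0, 1, 0, 1],
    "M3": [0, 0, 1, 1, 0, 0, 1, 1],
}
# Hypothetical metabolite levels for the same eight lines.
levels = [1.0, 1.2, 0.9, 1.1, 2.0, 2.2, 1.9, 2.1]

best, scores = scan_markers(markers, levels)
print(best, round(scores[best], 2))
```

In this toy dataset only marker M1 separates the low and high metabolite classes, so it receives by far the largest statistic and would be the candidate mQTL region to follow up.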
4. Breeding program
Researchers and breeders are interested in selecting desirable genotypes from a large plant population. Early selection procedures relied solely on the phenotypic appearance of the plants, and releasing an improved variety required information on the entire breeding cycle (nearly 10 years). To reduce this time, marker-based technologies such as enzyme-based markers and marker-assisted selection (MAS) have been employed, shortening the entire process to about 6 years. mQTL-based selection may further reduce this time to about 4 years, given that most metabolites are directly related to a particular phenotype and that selection based on mQTL is easier and faster than MAS.
5. Metabolomic approaches to improve rice quality
Rice is an important staple crop worldwide. The crop has benefited considerably from developments in the field of genomics. For example, the rice genome has been sequenced and is found to encode approximately 32,000 genes. However, the biological functions of more than half of these genes are yet to be determined. Novel genes in rice have been identified using gain- and loss-of-function approaches. Genetic linkage and association analyses with genetic core collections and segregating populations have been employed to investigate the direct relationships between metabolic composition, genotypes, and phenotypes as representatives of agronomical traits. These strategies can also be applied to other crops and vegetables (Figure 1). In the following section, we describe some of these approaches.
5.1. Approaches to collate metabolite, phenotypic, and genotypic data: some examples in rice are as follows
5.1.1. Gain-of-function approach
Construction of the rice full-length (FL) cDNA collection (Oryza sativa L. ssp. japonica “Nipponbare”) was possible due to the development of the FOX hunting system (FL-cDNA overexpressor gene hunting system). The FOX hunting system is unique in that it permits ectopic expression of any plant FL-cDNA library even in heterologous plant systems, thereby allowing the functional analysis of genes. More than 30,000 transgenic Arabidopsis lines overexpressing rice FL-cDNAs, called “rice FOX Arabidopsis lines,” have been generated. Metabolic fingerprinting and metabolic profiling have been used with these FOX lines to identify functional genes in rice.
To screen a large number of rice FOX Arabidopsis lines, a nondestructive analytical method was developed using Fourier transform-near-infrared (FT-NIR) spectroscopy. Unlike MS techniques, FT-NIR analysis avoids destructive sample preparation and allows data acquisition within a very short span of time (<1 min). The authors analyzed approximately 3000 FOX seeds with FT-NIR to obtain their metabolite fingerprints. Assessment of the changes in the metabolite fingerprints of the re-transformants led to the discovery of seven lines with altered metabolite fingerprints in seeds. Five of these seven lines have annotations for the inserted FL-cDNAs. The association of the genes with biological processes highlighted the role of complex networks underlying metabolomic responses in plants.
A detailed metabolite composition can be obtained in a non-targeted manner by metabolite profiling based on gas chromatography-time-of-flight-MS (GC-TOF-MS), particularly for primary metabolites and intermediates of secondary metabolites. A set of 26 candidate lines for gene characterization was identified by surveying 350 rice FOX Arabidopsis lines with GC-TOF-MS. These candidate lines included a rice FOX Arabidopsis line that overexpressed the FL-cDNA of the rice Lateral Organ Boundaries (LOB) Domain (LBD)/Asymmetric Leaves2-like (ASL) gene LBD37/ASL39 (Os-LBD37/ASL39) and showed significant changes in nitrogen metabolism. The aerial parts of the rice FOX Arabidopsis plants exhibited hyponastic leaves and early flowering. The Arabidopsis At-LBD37/ASL39-overexpressor plants showed similar morphological leaf changes (i.e., hyponastic leaves) and had increased levels of amino acids and metabolites related to nitrogen metabolism. Subsequent profiling of metabolites and transcriptomes of the rice Os-LBD37/ASL39-overexpressing lines ascertained that Os-LBD37/ASL39 has the same function in rice and Arabidopsis. The analysis revealed notable features in rice overexpressor plants, including early heading, metabolite alterations (related to nitrogen metabolism), and advanced leaf senescence. These findings established a close association between Os-LBD37/ASL39 and nitrogen metabolism in rice.
The above studies suggest that the FOX hunting system can quickly and efficiently identify and characterize genes from available cDNA libraries whose altered expression influences metabolite profiles in crops and vegetables.
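Screening fingerprints for altered lines, as described above, can be pictured as an outlier search. The sketch below is only an illustration under stated assumptions: each fingerprint is taken to be a short vector of spectral intensities, control (untransformed) replicates define the expected mean and spread of each feature, and a line is flagged when any feature deviates by more than a chosen z-score threshold. The function name, line names, threshold, and data are hypothetical, and the cited study may have used a different statistical procedure.

```python
from statistics import mean, stdev


def flag_altered_lines(controls, candidates, threshold=3.0):
    """Flag candidate fingerprints whose largest per-feature |z| exceeds threshold."""
    n_features = len(controls[0])
    # Per-feature mean and standard deviation estimated from the control replicates.
    mu = [mean([c[j] for c in controls]) for j in range(n_features)]
    sd = [stdev([c[j] for c in controls]) for j in range(n_features)]
    flagged = {}
    for name, fingerprint in candidates.items():
        z = [
            (fingerprint[j] - mu[j]) / sd[j] if sd[j] > 0 else 0.0
            for j in range(n_features)
        ]
        worst = max(abs(v) for v in z)
        if worst > threshold:
            flagged[name] = worst
    return flagged


# Hypothetical control replicates: three spectral features each.
controls = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 2.9],
    [0.9, 1.9, 3.1],
    [1.0, 2.0, 3.0],
]
# Hypothetical candidate lines.
candidates = {
    "FOX_A": [1.05, 2.0, 3.0],  # close to the controls
    "FOX_B": [1.0, 2.8, 3.0],   # second feature strongly shifted
}

flagged = flag_altered_lines(controls, candidates)
print(sorted(flagged))
```

A per-feature z-score is the simplest possible choice; with real spectra of hundreds of correlated wavelengths, a multivariate distance or a principal component projection would usually be preferred.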
5.1.2. Loss-of-function approach
The Tos17 retrotransposon- and Ds-transposon-inserted mutant lines have served as loss-of-function resources for the characterization of novel genes in rice [37, 38]. Tos17-knockout lines were used to characterize glutamine synthetase (GS), which catalyzes the key step of ammonium assimilation. Tabuchi et al. (2005) used the Tos17-retrotransposon-inserted lines to study the three genes (OsGS1;1, OsGS1;2, and OsGS1;3) encoding cytosolic GS (GS1) in rice and showed that the OsGS1;1 gene is critical for normal growth and grain filling. The metabolomic changes and metabolite-to-metabolite correlations of the mutants were further investigated with a GC-TOF-MS-based assay. In comparison to wild-type rice, the mutants showed a dramatic increase in the levels of sugars and sugar phosphates and reduced levels of amino acids and TCA cycle intermediates in leaves. Changes in the metabolite profiles differed between roots and leaves in the presence of ammonium. Interestingly, an overabundance was noted for nitrogen-containing secondary metabolites. The study uncovered new correlations between the over-accumulated metabolites and some primary metabolites in the mutant roots. These findings demonstrated that OsGS1;1 plays a crucial role in regulating the global metabolic network in rice plants grown with ammonium as the nitrogen source.
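Metabolite-to-metabolite correlations of the kind mentioned above are commonly computed as pairwise Pearson coefficients across samples. The following is a minimal sketch under that assumption; the metabolite names, the values, and the 0.8 cutoff are hypothetical and are not taken from the cited study.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile has no defined correlation
    return sxy / sqrt(sxx * syy)


def correlated_pairs(profiles, cutoff=0.8):
    """Return (metabolite_1, metabolite_2, r) for all pairs with |r| >= cutoff."""
    pairs = []
    for m1, m2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[m1], profiles[m2])
        if abs(r) >= cutoff:
            pairs.append((m1, m2, r))
    return pairs


# Hypothetical levels of four metabolites measured in five samples.
profiles = {
    "sucrose": [1.0, 2.0, 3.0, 4.0, 5.0],
    "glucose-6-P": [2.0, 4.0, 6.0, 8.0, 10.5],
    "glutamine": [5.0, 4.0, 3.0, 2.0, 1.0],
    "citrate": [1.0, 3.0, 1.0, 3.0, 1.0],
}

pairs = correlated_pairs(profiles)
for m1, m2, r in pairs:
    print(m1, m2, round(r, 3))
```

Comparing the list of strongly correlated pairs between a mutant and the wild type is one simple way to spot correlations that appear or disappear when a gene is knocked out.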
5.2. Association analysis between trait and metabolites
Modern crop-breeding practices have been highly successful in improving some important traits, for example, field performance and yield. However, genetic bottlenecks develop due to slow selection processes and a narrow genetic base. Strategies to determine relationships between metabolic composition, genotypes, and phenotypes in rice are discussed below.
5.2.1. Untargeted high-coverage metabolomic characterization of the rice diversity research set (RDRS)
The vast reservoir of rice seed banks provides a rich opportunity to identify genotypes possessing useful agronomical traits. However, large-scale characterization of this vast germplasm demands considerable time and resources. As a result, genetic core collections have been developed as a manageable representation of the genetic diversity. One example is the rice diversity research set (RDRS), comprising 67 varieties, which was created from the analysis of 332 varieties of O. sativa using restriction fragment length polymorphism (RFLP) markers. To investigate the direct relationship between metabolite and phenotype in the RDRS, untargeted high-coverage metabolomic characterization was performed and predictive metabolome-trait models were constructed using multivariate regression analysis. Combined datasets of rice kernels were obtained from four types of MS platforms: GC-TOF-MS for small compounds, including primary metabolites; ultra-performance liquid chromatography-quadrupole-TOF-MS (UPLC-Q-TOF-MS) for hydrophilic compounds; capillary electrophoresis-TOF-MS (CE-TOF-MS) for ionic compounds; and liquid chromatography-ion trap-TOF-MS (LC-IT-TOF-MS) for polar lipids. The study precisely defined a correlation between genetic diversity and metabolite abundance. After the removal of covariance between the trait data and the population membership, a multi-block orthogonal projection to latent structures (MB-OPLS) regression analysis was conducted. Traits such as amylose/total starch ratio and ear emergence day can be predicted from the metabolic composition by using the MB-OPLS model. The model for the amylose/total starch ratio showed a tight and negative correlation with fatty acids and lysophosphatidylcholines (Figure 2). Evaluation of the model using an external set of RDRS samples, other rice varieties, and the two mutants showed high-, middle-, and low-amylose/total starch ratios, respectively.
The amylose/total starch ratio was found to be associated with metabolites in rice kernels of the cultivars. However, this association was not observed in the mutants. The two loss-of-function mutants (e1, a starch synthase IIIa (SSIIIa)-deficient mutant, and 4019, an SSIIIa/starch branching enzyme (BE) double-knockout mutant) showed a high amylose/total starch ratio [42, 44]. Examination of starch granules with scanning electron microscopy (SEM) showed that the starch granules of the mutants were loosely packed in rice kernels. Thus, fatty acids and lysophosphatidylcholines most likely play a role in packing normal starch granules into rice kernels.
5.2.2. mQTL analysis using backcross inbred lines (BILs)
Matsuda et al. (2012) investigated 85 BILs generated by backcrossing O. sativa L. ssp. japonica “Sasanishiki” and O. sativa L. ssp. indica “Habataki” to find an association between genotype and metabolic composition. The genotypic data recorded on such mapping populations are useful for QTL mapping of various agronomical traits. The genotypic data of the BILs cover the 12 rice chromosomes, and the genotype of each BIL was analyzed with 236 RFLPs. Metabolite profiling using multi-MS-based pipelines yielded a dataset comprising 759 metabolite signals. Of these, 131 metabolites were identified or annotated. The lower heritability of mQTL than of expression QTL (eQTL) in yeast, mice, humans, and Arabidopsis [47, 48] could be attributable to the greater susceptibility of metabolite accumulation to environmental factors. Therefore, the authors evaluated the effects of heritable factors on the 759 metabolic traits. Although more than half of the metabolic traits showed relatively low broad-sense heritability (H²), high H² values were observed for some of the secondary metabolites, such as lysophosphatidylcholines, oryzanols, and flavone glycosides. Notably, heritability profiles obtained in rice were not similar to those of tomato fruits and Arabidopsis leaves [49, 50]. The QTL mapping results identified 802 mQTL from 759 metabolic traits and suggested coordinated control of some metabolites, such as amino acids and triacylglycerols, through an mQTL hotspot on chromosome 3. The extent of genetic control was determined for the annotated flavone glycoside level. The authors determined the structure of the flavone glycoside by using multi-step chromatography, MS, and NMR. mQTL analysis thus provides a fast and efficient technique for dissecting useful metabolic traits of both primary and secondary metabolites in rice.
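The principle of predicting a trait from metabolite levels can be sketched with a much simpler relative of MB-OPLS: a single-block partial least squares regression with one latent component (PLS1). The example below is an illustrative simplification, not the model used in the study. The function names and the toy data are hypothetical, with the first feature standing in for a fatty acid signal that falls as the amylose/total starch ratio rises.

```python
def fit_pls1(X, y):
    """Fit a one-component PLS regression of trait y on metabolite matrix X (rows = samples)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: direction in metabolite space with maximal covariance with the trait.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5
    if norm == 0:
        raise ValueError("no covariance between metabolites and trait")
    w = [v / norm for v in w]
    # Latent scores of the samples and regression of the trait on those scores.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return {"x_mean": x_mean, "y_mean": y_mean, "weights": w, "q": q}


def predict(model, sample):
    """Predict the trait for one new metabolite profile."""
    score = sum(
        (sample[j] - model["x_mean"][j]) * model["weights"][j]
        for j in range(len(sample))
    )
    return model["y_mean"] + model["q"] * score


# Hypothetical data: column 0 mimics a fatty acid signal, column 1 an unrelated metabolite.
X = [[1.0, 5.0], [2.0, 5.1], [3.0, 4.9], [4.0, 5.0]]
y = [0.30, 0.25, 0.20, 0.15]  # hypothetical amylose/total starch ratios

model = fit_pls1(X, y)
print(round(predict(model, [1.5, 5.0]), 3))
```

In this toy fit the first weight is negative, mirroring the negative relationship described in the text. A multi-block model would fit separate blocks for each analytical platform and remove variation orthogonal to the trait, which this sketch does not attempt.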
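Broad-sense heritability expresses the share of the total variance in a metabolite level that is attributable to genetic differences between lines. A minimal sketch follows, assuming a balanced design with the same number of replicate measurements per line and using the one-way ANOVA variance-component estimator H² = σ²G / (σ²G + σ²E). This is one common estimator and not necessarily the one used by the authors; the function name, line names, and values are hypothetical.

```python
def broad_sense_heritability(replicates):
    """Estimate H2 from {line: [replicate values]} with equal replicate numbers."""
    lines = list(replicates.values())
    g = len(lines)     # number of lines
    r = len(lines[0])  # replicates per line (assumed balanced)
    line_means = [sum(v) / r for v in lines]
    grand_mean = sum(line_means) / g
    # Mean squares between and within lines from a one-way ANOVA.
    ms_between = r * sum((m - grand_mean) ** 2 for m in line_means) / (g - 1)
    ms_within = sum(
        (x - m) ** 2 for v, m in zip(lines, line_means) for x in v
    ) / (g * (r - 1))
    var_g = max(0.0, (ms_between - ms_within) / r)  # genetic variance component
    total = var_g + ms_within
    return var_g / total if total > 0 else 0.0


# Hypothetical metabolite strongly determined by the line.
heritable = {"BIL1": [1.0, 1.2], "BIL2": [2.0, 2.2], "BIL3": [3.0, 3.2]}
# Hypothetical metabolite varying only between replicates.
noisy = {"BIL1": [1.0, 3.0], "BIL2": [3.0, 1.0], "BIL3": [2.0, 2.0]}

h2_heritable = broad_sense_heritability(heritable)
h2_noisy = broad_sense_heritability(noisy)
print(round(h2_heritable, 3), round(h2_noisy, 3))
```

Traits with values near 1, like the first toy metabolite, are promising targets for mQTL mapping, whereas values near 0 indicate that environmental or technical variation dominates.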
6. Metabolomic approach to improve legume crops
Forage and grain legumes contribute 27% of the world's gross primary crop production. Grain legumes alone supply 33% of the dietary protein required by humans, thus contributing to global food security and environmental sustainability [51, 52]. Barring a few extensively investigated model legumes, metabolomics studies in other legumes remain limited. Studies in model legumes demonstrate a decrease in oxylipins in response to the rhizobial nodulation (Nod) factor in Medicago and metabolic adjustments of shoot constituents that support survival in salt-tolerant Lotus species.
Stress conditions such as salinity and anoxia cause an accumulation of alanine and its biosynthesis co-substrates, such as glutamate and GABA, as well as succinate in soybean. Differential expression was also observed for genes involved in nitrogen fixation and fermentation in roots. Interestingly, a negative correlation was observed for amino acids derived from glycolysis and the TCA cycle during waterlogging; several TCA cycle enzymes were induced upon exposure to waterlogging. Likewise, a study on metabolic changes associated with flooding stress in soybean revealed a set of 81 mitochondria-associated metabolites, suggesting a boost in the concentrations of metabolites involved in respiration and glycolysis, such as amino acids, NAD, and NADH, coupled with the depletion of free adenosine triphosphate (ATP). Under drought and salinity conditions, metabolite phenotyping of four different Mediterranean accessions of lentil suggested a decrease in intermediates of the TCA cycle and glycolytic pathway. Importantly, the study yielded metabolite markers for specific stresses, such as threonate, asparagine/ornithine, and alanine/homoserine for NaCl, drought, and salinity, respectively. Another study, aimed at assessing the impact of water deficiency on Lupinus albus, demonstrated that the plant stem served as a storage organ for sugars and amino acids. Importantly, tolerant plants accumulated high levels of metabolites such as asparagine, proline, sucrose, and glucose in the stem stelar region. This suggests a reorganization of nitrogen and carbon metabolism pathways in plants in order to tolerate salinity stress. In soybean, a consistent increase in pinitol (a sugar alcohol and osmoprotectant) was reported in tolerant plants under both normal and drought-stressed conditions. Similarly, accumulation of sucrose, free amino acids, and soluble proteins was observed in tolerant soybean in response to water stress.
7. Metabolomic approaches to evaluate GM crops
GM crops are now widely used worldwide. The International Service for the Acquisition of Agri-Biotech Applications (ISAAA) reported that in 2011, 160 million hectares of arable land was used to grow biotech crops, including GM crops.
Metabolism refers to the processes involved in maintaining life, such as the synthesis and breakdown of proteins, nucleic acids, and carbohydrates. Metabolomics offers a snapshot of the current biochemical status, including important nutritional and toxicological characteristics. Furthermore, the metabolite composition is reported to be closely associated with the organism’s phenotype. Hence, metabolomics is a useful tool for investigating the metabolic composition of GM crops. The application of metabolomic technology could generate a database of metabolites in both GM crops and traditional varieties. For instance, a metabolomics approach was employed to assess the chemical composition of GM tomatoes in order to compare the modified crops with traditional varieties. The authors used GM tomatoes overexpressing a foreign gene encoding miraculin, a glycoprotein found in tropical plants but normally absent in tomatoes. The multiple MS-based platforms detected 86% of the total chemical diversity in the tomato cultivars used in the study. Subsequently, a statistical “proof-of-safety” approach, rather than a “proof-of-hazard” approach, was used to evaluate “similarities” and “differences” between GM tomatoes and six traditional cultivars, including the control line Moneymaker. The results suggested that the GM tomatoes had a reproducible metabolic signature; moreover, more than 92% of the compounds showed an acceptable variation in both the green and red stages of the tomato, highlighting the striking similarity between the metabolite profiles of the GM tomatoes and the control line Moneymaker.
Furthermore, the metabolite profiles obtained from two independent experiments were compared. The study determined the levels of the most commonly altered metabolites in the GM tomatoes, such as proline, 4-hydroxy-proline, spermidine, asparagine, arginine, serine, and inositol-1-phosphate, across all growth conditions. The levels of these metabolites were not altered by the genetic modification and were not associated with the expression of the foreign gene. This approach could be useful for evaluating GM crops and assessing their metabolomic equivalence with traditional crops.
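The comparison of a GM line against traditional cultivars described in this section can be pictured as a per-metabolite range check. The sketch below assumes, purely for illustration, that "acceptable variation" means the GM level falls within the minimum-to-maximum range observed across the traditional cultivars, optionally widened by a tolerance. The source does not state the criterion used in the study, and the function name, cultivar data, and values are hypothetical.

```python
def fraction_within_reference_range(gm_profile, reference_profiles, tolerance=0.0):
    """Return the fraction of metabolites within the reference range, plus the outliers."""
    within, outside = [], []
    for metabolite, gm_level in gm_profile.items():
        ref = [p[metabolite] for p in reference_profiles if metabolite in p]
        if not ref:
            continue  # no reference data, so the metabolite cannot be compared
        low, high = min(ref), max(ref)
        margin = tolerance * (high - low)  # optional widening of the accepted range
        if low - margin <= gm_level <= high + margin:
            within.append(metabolite)
        else:
            outside.append(metabolite)
    compared = len(within) + len(outside)
    fraction = len(within) / compared if compared else 0.0
    return fraction, outside


# Hypothetical metabolite levels in a GM line and two traditional cultivars.
gm = {"proline": 1.2, "serine": 0.8, "asparagine": 5.0, "citrate": 2.0}
traditional = [
    {"proline": 1.0, "serine": 0.7, "asparagine": 1.0, "citrate": 1.8},
    {"proline": 1.5, "serine": 0.9, "asparagine": 1.4, "citrate": 2.4},
]

fraction, outside = fraction_within_reference_range(gm, traditional)
print(fraction, outside)
```

Metabolites reported as outside the range are the ones that would then be examined across growth conditions, to judge whether the difference is tied to the introduced gene or to natural variation.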
8. Conclusions and future perspective
The growing attention that metabolomics is receiving in plant research can be ascribed to the ability of plants to produce a vast array of metabolites, far greater than that produced by animals and microorganisms. Achieving comprehensive coverage of the metabolome calls for multiparallel complementary technologies instead of reliance on a single analytical technology. Increasing the annotation rate of unknown signals still poses a big challenge. The co-occurrence principle of transcripts and metabolites, particularly transcriptome co-expression network analysis, is powerful for decoding the functions of genes not only in model plants but also in crops and medicinal plants. mQTL analysis, along with scoring of gene expression and agronomical traits, is emerging as a promising technique to support crop breeding. In addition to expediting the development of improved cultivars, metabolomics plays a key role in the evaluation of GM crops.
Combining de novo transcriptome assembly and metabolomic techniques enables us to adopt a systems biology approach to investigate genetic populations, as neither technique requires a reference genome sequence. These post-genomics tools and techniques can considerably shorten the time required for selection in plant breeding and accelerate the discovery of novel genes in crops, vegetables, and medicinal plants [67, 68]. In summary, systems biology, metabolomics, and other omics will play a key role in understanding plant systems and developing novel biotechnology applications for crop improvement.
The authors are grateful to DBT, India, for funding the Computational Core for Plant Metabolomics (CCPM) project, run jointly at IIIT-Hyderabad and JNU-Delhi (No. BT/PR14715/PBD/16/903/2010). The authors thank Prof. Indira Ghosh for her consistent guidance and support. K.S and S.S are grateful to DBT for the fellowship support.