Open access peer-reviewed chapter

Metabolomics: Basic Principles and Strategies

By Sinem Nalbantoglu

Submitted: May 9th 2019; Reviewed: July 12th 2019; Published: August 7th 2019

DOI: 10.5772/intechopen.88563



Metabolomics is the study of the metabolome within cells, biofluids, tissues, or organisms. It aims to comprehensively identify and quantify, in a high-throughput manner, all endogenous and exogenous low-molecular-weight (<1 kDa) small molecules (metabolites) in a biological system. Metabolomics has several applications in health and disease, including precision/personalized medicine, single-cell analysis, epidemiologic population studies, metabolic phenotyping, metabolome-wide association studies (MWAS), and precision metabolomics; it is also combined with other omics disciplines in integrative omics, biotechnology, and bioengineering. Mass spectrometry (MS)-based metabolomics/lipidomics is useful both for identifying disease-related metabolites in biofluids or tissue and for classifying and/or characterizing disease- or treatment-associated molecular patterns generated from metabolites. In this review, we provide a brief overview of the current status of promising MS-based metabolomics strategies, their emerging roles, and possible challenges.


  • metabolomics
  • untargeted metabolomics
  • targeted metabolomics
  • omics
  • mass spectrometry

1. Introduction

Metabolomics is an evolving field that aims to comprehensively identify and quantify all endogenous and exogenous low-molecular-weight (<1 kDa) small molecules (metabolites) in a biological system in a high-throughput manner. The composition of these endogenous compounds is shaped upstream by the proteome and genome as well as by environmental factors, lifestyle, medication, and underlying disease. Metabolomics has therefore been described as a reflection of the phenotype. The metabolome lies downstream of the transcriptome and proteome and is considered complementary to genomics, transcriptomics, and proteomics. Because the metabolome is closely related to the genotype, physiology, and environment of an organism, genotype-phenotype as well as genotype-envirotype relationships can be documented by metabolomics [1, 2, 3].

Metabolomics is the study of the metabolome within cells, biofluids, tissues, or organisms, and it is applied in molecular and personalized medicine in areas such as clinical chemistry, transplant monitoring, newborn screening, pharmacology, and toxicology. The metabolome can be defined as the set of small molecules, and their interactions, within a biological system under given genetic, nutritional, and environmental conditions; its global metabolite profile has been estimated to comprise 3000–20,000 metabolites. Because the metabolome is the final downstream product, changes in and interactions between gene expression, protein expression, and the environment are directly reflected in it, making it more physically and chemically complex than the other "omes." Of all omics layers, the metabolome is closest to the phenotype, and metabolomics best reflects the molecular phenotype of health and disease [4]. Studies linking genotype (genomics) with phenotype (metabolomics) have associated specific gene variations with resulting metabolite changes, which ultimately provide information about genetic, epigenetic, and phenotypic changes [1, 5, 6, 7, 8]. In this regard, metabolomics is a rich source for biomarker discovery, with advantages over other omics approaches.

The first metabolomics experiments, which quantified metabolites in biofluids, date to 1971 [9, 10]. The term "metabolic profiling" was first used in the same year [9, 11], whereas "metabolome" was first used in 1998 [9, 12]. In 1999, "metabonomics" was defined as the quantitative measurement of the dynamic multiparametric metabolic response of living systems to pathophysiological stimuli or genetic modification [13].

Because of the complexity of the metabolome, which spans chemically diverse compound classes such as lipids, organic acids, carbohydrates, amino acids, nucleotides, and steroids, many metabolites have not yet been validated and reported; thus, the complete number of metabolites present in humans is unknown [14]. The Human Metabolome Database (HMDB, version 4.0, 2018) listed 114,098 metabolite entries (April 2019), including both water-soluble and lipid-soluble metabolites and both abundant (>1 µM) and relatively rare (<1 nM) ones. The small molecules that have been identified and experimentally confirmed in human tissues and biofluids are suggested to represent only about 20% of the total metabolome [15]. In addition, 5702 protein sequences are linked to these metabolite entries, and the database lists normal and abnormal concentrations of metabolites for 23 different biospecimens.

Metabolomics strategies fall into two primary analysis platforms, "untargeted-discovery-global" and "targeted-validation-tandem," chosen according to the objective of the study (Figure 1). To systematically identify and quantify metabolites in a biological sample and comprehensively characterize biomarker targets, the analysis considers both the endometabolome and the exometabolome. Untargeted discovery metabolomics is hypothesis-generating: it allows full scanning of the metabolome, pattern identification, and "metabolic fingerprinting" for the global classification of phenotypes and their interacting pathways. Targeted metabolomics is hypothesis-testing and is generally performed to validate the results of an untargeted analysis. In targeted approaches (tandem MS, MS/MS), also known as "biased" or "directed" metabolomics or "metabolic profiling," specific small molecules/metabolites or perturbations along a metabolic pathway are quantified using known standards [9]. Hypothesis-generating metabolomics comprises (i) nontargeted profiling, (ii) fingerprinting, and (iii) footprinting [16], whereas hypothesis-testing strategies comprise target analysis and diagnostic analysis. Nontargeted global profiling refers to comprehensive analysis of metabolites/small molecules; it is semiquantitative and yields putative identifications of the detected features. Metabolic fingerprinting provides a global snapshot of the intracellular metabolome, enabling classification and screening, whereas metabolic footprinting provides a global snapshot of the extracellular fluid metabolome (metabolites secreted by cells or consumed from the exometabolome). Fingerprinting and footprinting do not aim to identify or quantify individual features.

Metabolomics strategies for validation purposes comprise quantitative targeted (tandem) analysis and diagnostic analysis of a known clinically associated compound/biomarker [16]. Untargeted and targeted approaches should be performed consecutively to achieve accurate identification and absolute quantitation of metabolites [9]. In this review, we provide a brief overview of the current status of promising MS-based metabolomics strategies, their emerging roles, and possible challenges.

Figure 1.

Metabolomics workflow with main bulleted points.

2. Basic workflow of MS-based metabolomics

MS is used to identify and quantify metabolites, even at very low concentrations (femtomolar to attomolar), with high resolution, sensitivity, and dynamic range [17]. An MS-based analysis basically comprises sample preparation and extraction; separation by capillary electrophoresis (CE) and/or chromatography; introduction of the sample into the ion source, where molecules are ionized (charged); and detection of candidate metabolites on the basis of their mass-to-charge ratio (m/z). The main steps of MS-based untargeted metabolomics are briefly outlined below:

  1. Sample acquisition: Metabolome analysis can be performed on various biological samples, including tissue, biofluids (blood, urine, feces, seminal fluid, saliva, bile, cerebrospinal fluid), and cell culture [18] (Table 1). Careful sampling, sample preparation and management, biobanking in biorepositories, and sample labeling are essential for an optimal, reproducible, high-throughput analysis with high recovery, efficient extraction, and broad metabolite coverage.

  2. Sample preparation/extraction: Different extraction solvents and methods are used to achieve high recovery of both polar and nonpolar compounds, depending on the approach (nontargeted or targeted) and on the biological sample (tissue, blood, plasma, serum, cells, urine, etc.). Commonly applied approaches use optimized methanol-water-chloroform combinations to extract both hydrophilic and hydrophobic compounds, although separate extractions give better recovery of each class. After centrifugation, the upper (aqueous) and lower (organic) layers of the resulting biphasic mixture are collected separately. In sequential two-phase extractions optimized for both polar and nonpolar metabolites such as lipids, an aqueous extraction with a polar organic solvent (e.g., methanol or acetonitrile) mixed with water is followed by an organic (lipid) extraction of the centrifuged pellet with a nonpolar solvent such as dichloromethane or chloroform-methanol [16, 19].

  3. Separation: Chromatographic separation techniques, including liquid chromatography (LC) and gas chromatography (GC), are coupled to MS systems (GC-MS, HPLC-MS, UPLC-MS), whereas direct-injection techniques include direct infusion MS [20] and direct analysis in real time MS (DART-MS) [21, 22]. In addition, capillary electrophoresis coupled to MS (CE-MS) is an important technique for separating and profiling polar metabolites in biological samples. Reversed-phase LC with C18 columns is used to separate nonpolar compounds, whereas hydrophilic interaction chromatography (HILIC) is used to separate polar compounds [23].

    Gas chromatography-mass spectrometry (GC-MS) is one of the most widely used platforms for untargeted and targeted metabolomics and offers high chromatographic resolution. Among other compounds, volatile organic compounds (VOCs) and compounds such as fatty acids and organic acids, which are important biomarker candidates in biological samples, can be analyzed successfully by GC-MS. GC-MS analysis of nonvolatile metabolites requires derivatization by alkylation, acylation, or silylation to increase their volatility and improve detection or retention of the compound [24].

    Volatile organic compounds are important components of the metabolome and include alcohols, alkanes, aldehydes, furans, ketones, pyrroles, and terpenes. Volatilomics, together with adductomics, is a new field within metabolomics. VOCs are extracted with solid-phase microextraction (SPME), a solvent-free sample preparation/extraction method that can extract organic compounds from gaseous, aqueous, and solid materials [24].

    LC-MS and GC-MS differ in their sensitivity for detecting metabolites and in the recovery/coverage they achieve. LC-MS can analyze a wider range of molecular features than GC-MS. GC-MS is suited to less polar biomolecules, including alkylsilyl derivatives, eicosanoids, essential oils, esters, perfumes, terpenes, waxes, volatiles, carotenoids, flavonoids, and lipids. LC-MS is suited to more polar biomolecules, including organic acids, organic amines, nucleosides, ionic species, nucleotides, and polyamines, and does not require derivatization. Both LC-MS and GC-MS can analyze alcohols, alkaloids, amino acids, catecholamines, fatty acids, phenolics, polar organics, prostaglandins, and steroids [25].

  4. Ionization: After chromatographic separation, the sample is pumped through the MS capillary to produce positively or negatively charged ions in the gas phase. Heat and dry nitrogen introduced into the MS cause the charged droplets to evaporate, and the evaporating droplets transfer their charge to the analytes, ionizing them in positive or negative mode [26]. The choice of ionization polarity and ion source is important for avoiding metabolite losses. Depending on the polarity of the molecule, applied ionization sources include electron ionization (EI), chemical ionization (CI), electrospray ionization (ESI), atmospheric pressure chemical ionization (APCI), atmospheric pressure photoionization (APPI), and matrix-assisted laser desorption/ionization (MALDI) [27, 28, 29, 30, 31].

  5. Detection: The MS detects, at sub-femtomole levels, a high-resolution mass spectrum composed of the m/z ratios of the ions and fragment ions generated from the ionized biomolecules. Mass analyzers include time of flight (TOF), quadrupole time of flight (QTOF), quadrupole, ion trap, and Orbitrap. For targeted metabolomics, tandem MS (MS/MS) is performed to validate metabolites potentially discovered during untargeted analysis. For MS/MS, an ion trap or a triple quadrupole (QQQ) operated in multiple reaction monitoring (MRM) mode is generally used, offering high sensitivity, mass resolution, and accuracy (<1 ppm mass error) [9].

  6. Data analysis and metabolite identification: The large, complex raw data sets containing metabolite signals are exported from the MS and analyzed with specialized software to interpret the data and identify the metabolites of interest. Commercial and free bioinformatics tools automatically perform peak picking, assessment, and relative quantitation. Preprocessing of the raw spectra includes background filtering (noise elimination), retention time correction, grouping of peaks from the same compound (matching m/z values and assigning adducts), peak detection, peak alignment (matching peaks across samples), peak normalization (adjusting peak intensities and reducing analytical drift), and chromatogram alignment. Data preparation then includes data integrity checking, data normalization, and compound identification, together with univariate, multivariate, clustering, and classification statistical analyses [32]. After data processing, data interpretation and metabolite identification from the mass spectra can be performed through functional interpretation, enrichment analysis, pathway analysis, and metabolic pathway network mapping. Commonly used tools, with different analysis capabilities, include XCMS [33], MetaboAnalyst [34], Progenesis [35], MetaCore [36], and 3Omics [37]. The software processes raw mass spectral data, performs statistical analyses to find significantly altered ions/features, and, for metabolite identification, searches metabolite databases such as the Human Metabolome Database (HMDB) [14], the Metabolite and Tandem MS Database (METLIN) [38], LIPID MAPS [39], the Madison Metabolomics Consortium Database (MMCD) [40], BiGG [41], SetupX [42], KNApSAcK [43], and MetaboLights [44].

    Compound databases include PubChem [45], Chemical Entities of Biological Interest (ChEBI) [46], ChemSpider [47], KEGG GLYCAN [48], KEGG COMPOUND [49], and the In Vivo/In Silico Metabolites Database (IIMDB) [50]. Metabolic pathway databases include the Kyoto Encyclopedia of Genes and Genomes (KEGG) [51], BiKEGG [52], KEGG PATHWAY [53], MetaCyc [54], BioCyc [54], Model SEED [55], Reactome [56], and Ingenuity Pathway Analysis (IPA) [57].
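The identification step (matching an observed m/z to database entries within a mass tolerance) can be illustrated with a minimal Python sketch. It assumes singly charged [M+H]+ or [M-H]- ions only, standard monoisotopic element masses, a 5 ppm tolerance chosen for illustration, and a hypothetical three-entry candidate dictionary standing in for a query against a database such as HMDB or METLIN; real tools also consider other adducts, isotope patterns, retention time, and MS/MS spectra.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "P": 30.97376163,
    "S": 31.97207100,
}
PROTON = 1.007276466812  # proton mass (Da), added for [M+H]+, removed for [M-H]-


def monoisotopic_mass(formula: str) -> float:
    """Neutral monoisotopic mass of a simple formula such as 'C6H12O6'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass


def ion_mz(neutral_mass: float, mode: str = "positive") -> float:
    """m/z of the singly charged [M+H]+ (positive) or [M-H]- (negative) ion."""
    if mode == "positive":
        return neutral_mass + PROTON
    if mode == "negative":
        return neutral_mass - PROTON
    raise ValueError("mode must be 'positive' or 'negative'")


def ppm_error(observed: float, theoretical: float) -> float:
    """Mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def annotate(observed_mz, candidates, tol_ppm=5.0, mode="positive"):
    """Return (name, theoretical m/z, ppm error) for candidates within tol_ppm."""
    hits = []
    for name, formula in candidates.items():
        theoretical = ion_mz(monoisotopic_mass(formula), mode)
        error = ppm_error(observed_mz, theoretical)
        if abs(error) <= tol_ppm:
            hits.append((name, round(theoretical, 5), round(error, 2)))
    return hits


# Hypothetical candidate list standing in for a database query.
candidates = {"glucose": "C6H12O6", "fructose": "C6H12O6", "citric acid": "C6H8O7"}
print(annotate(181.0707, candidates))  # glucose and fructose both match
```

Because glucose and fructose share the formula C6H12O6, both match the same m/z; accurate mass alone cannot distinguish isomers, which is why retention time and MS/MS fragmentation are needed for confident identification.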

Type of biospecimen: sample acquisition and preparation
  • Mechanical and nonmechanical homogenization can be performed for tissues, cells, and other biological samples

  • Mechanical homogenization methods include homogenizers, ultrasound, microwave, manual grinding, ball mills, and grinding in a liquid nitrogen-cooled mortar and pestle

  • Aliquoting

  • Storage at −80°C

  • Freeze-thaw cycles avoided

Suspension-cultured mammalian cells: cells (intracellular, fingerprint metabolite profiling)
  • Quenching with ice-cold 50 mM ammonium bicarbonate + methanol

  • Centrifuge at 1000×g for 1 min at −20°C

  • Removal of the media/quenching solution from the cells

  • Snap freeze in liquid nitrogen

  • Aliquoting

  • Storage at −80°C

  • Freeze-thaw cycles avoided

Cell medium (extracellular-footprint-metabolite profiling)
  • Centrifuge the media at 500×g for 5 min (remove cells)

  • Remove supernatant

  • Snap freeze in liquid nitrogen

  • Aliquoting

  • Storage at −80°C

  • Freeze-thaw cycles avoided

Serum
  • Blood samples should be collected into serum separator tubes and left to clot for at least 30 min and no longer than 60 min, on ice rather than at room temperature, to minimize residual metabolic activity

  • At the end of the clotting time, the blood sample is centrifuged for 20 min at 1100–1300×g at ambient temperature

  • The serum visible in the upper layer of the tube as supernatant is collected and stored at −80°C, and freeze-thaw cycles are avoided

Plasma
  • Blood samples are collected into heparin-, citrate-, or EDTA-containing tubes under fasting conditions

  • Centrifuge at 13,000×g, 15 min, 4°C

  • Supernatant isolation and removal

  • Aliquoting

  • Storage at −80°C

  • Freeze-thaw cycles avoided

  • The anticoagulants heparin, citrate, and EDTA give rise to different retention time peaks

Cerebrospinal fluid (CSF)
  • CSF sampling via lumbar puncture

  • Aliquoting

  • Storage at −80°C

  • Freeze-thaw cycles avoided

  • Careful handling of blood-contaminated CSF samples is essential to obtain highly resolved chromatograms

  • Sodium azide is added to the sample as a bacteriostatic agent during storage

  • Each sample is filtered (0.2 μm) to remove particulates

  • Centrifuge at 1000–3000 rcf for 5 min

  • Removing supernatant

  • Aliquoting

  • Storage at −80°C

  • Freeze-thaw cycles avoided

Saliva
  • 3 mL saliva samples are collected into sample tubes under fasting conditions

  • Aliquoting

  • Storage at −80°C

Sweat
  • Sweat secretion is stimulated by exercise, heat, or chemicals such as pilocarpine

  • The skin area is cleaned with ethanol and then with distilled water

  • Sweat is collected with a micropipette (20–200 μL; at least 100 μL)

  • The minimum sweat rate required for valid sweat sampling is 1 g/m² per min

  • Deodorants, perfumes, and cosmetics must not be used for at least 1 day before sweat collection

  • Aliquoting

  • Storage at −80°C

  • Freeze-thaw cycles avoided

Feces
  • A single spot sample of 1–2 g of feces

  • Storage at −80°C

Exhaled breath condensate (EBC)
  • EBC is collected after 8 h of fasting, including abstinence from smoking

  • Collecting time for EBC is 20 min, producing approximately 1 mL condensate sample

  • Commercial collection samplers, bags, and devices are used

  • Compatible with GC, GC-MS

  • Aliquoting

  • Storage at −80°C

  • Freeze-thaw cycles avoided

Seminal plasma
  • Semen is centrifuged at 700×g, 4°C, 10 min

  • Supernatant (seminal plasma) is separated

  • Supernatant seminal plasma is recentrifuged at 10,000×g, 60 min, 4°C

  • Aliquoting

  • Storage at −80°C

  • Freeze-thaw cycles avoided

Bile
  • 2 mL of gallbladder bile is collected intraoperatively

  • Aliquoting

  • Storage at −80°C

  • Freeze-thaw cycles avoided

Table 1.

Systematic collection and storage of human samples for metabolomics/lipidomics strategies.
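Table 1 states centrifugation conditions as relative centrifugal force (×g, also written rcf), whereas many centrifuges are set in revolutions per minute. A small helper can convert between the two using the standard relation RCF = 1.118 × 10⁻⁵ × r × rpm², where r is the rotor radius in cm. The 8.5 cm radius in the example is a hypothetical value; substitute the radius specified for your own rotor.

```python
import math

# Standard conversion between relative centrifugal force (RCF, "x g") and
# rotor speed: RCF = 1.118e-5 * r * rpm**2, with r the rotor radius in cm.
K = 1.118e-5


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) produced at a given speed and radius."""
    return K * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Rotor speed (rpm) needed to reach a target RCF at a given radius."""
    return math.sqrt(rcf / (K * radius_cm))


# Hypothetical rotor radius of 8.5 cm; check your own rotor's specification.
for target in (500, 1000, 13000):
    print(f"{target} x g -> {rpm_from_rcf(target, 8.5):.0f} rpm")
```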

3. Challenges and affecting factors

Although extensive tools and databases exist for analyzing and identifying metabolites, challenges remain in untargeted metabolomics in data analysis/integration, pathway analysis, and metabolite identification. The high-throughput, heterogeneous omics data require improved bioinformatics and computational techniques to evaluate metabolomic profiles comprehensively and to complete the human metabolome [32, 58, 59, 60, 61, 62].

Chromatographic resolution, absolute MS signals, and compound identification can be affected by many factors: during ionization, by polarity, ion source, ion suppression, flow rate, and MS vacuum; by the sample preparation strategy and the purity and temperature (prechilling) of reagents and solvents; by typical sources of variance such as different laboratory staff, techniques, and analysis days (day-to-day variation); by different mass analyzers, chromatographic columns, and mobile-phase compositions, which can produce different fragments of the same molecule; and by the preferential detection of the most abundant molecules. Another challenge is the existence of isomers with identical masses and highly similar spectra, which complicates distinguishing them when assigning metabolites to spectral features [63, 64].

During the analytical process, calibration curves are obtained by plotting the analyte/internal standard (IS) peak area response ratio against analyte concentration within each batch; these curves are needed to control the accuracy and reproducibility of the system. Performance monitoring using plots of absolute MS signals, retention times, and mass patterns/chromatographic peak shapes of the analytes and ISs is required to maintain the sensitivity, integrity, and robustness of the analytical results. High system pressure and short columns with small particle sizes (<2 μm) have been reported to give better signal-to-noise ratios than columns with larger particles. Small-particle columns also affect the screening speed of MS [65, 66, 67, 68].
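The calibration step can be sketched in Python: the analyte/IS peak-area ratio of each standard is regressed on its nominal concentration, and the fitted line is inverted to back-calculate the concentration of an unknown. The unweighted least-squares fit and the five calibration points are assumptions for illustration (the numbers are synthetic, not measured data); many laboratories use weighted (e.g., 1/x) regression instead.

```python
def fit_calibration(concentrations, analyte_areas, is_areas):
    """Unweighted least-squares line of (analyte/IS area ratio) vs concentration.

    Returns (slope, intercept, r_squared).
    """
    x = [float(c) for c in concentrations]
    y = [a / i for a, i in zip(analyte_areas, is_areas)]  # IS response ratios
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


def quantify(analyte_area, is_area, slope, intercept):
    """Back-calculate the concentration of an unknown from its response ratio."""
    return (analyte_area / is_area - intercept) / slope


# Synthetic calibration standards (µM) with a slightly varying IS signal.
conc = [0.5, 1, 2, 5, 10]
is_area = [100000, 98000, 102000, 100000, 99000]
analyte_area = [11000, 20580, 41820, 101000, 198990]

slope, intercept, r2 = fit_calibration(conc, analyte_area, is_area)
print(f"slope={slope:.4f} intercept={intercept:.4f} R^2={r2:.4f}")
print(f"unknown: {quantify(30300, 100000, slope, intercept):.3f} µM")
```

Dividing by the IS area before fitting is what makes the curve robust to injection-to-injection signal changes: the IS areas above vary by a few percent, yet the ratios fall on one straight line.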

Plots of analytical parameters of quality control (QC) samples, such as retention times and peak shapes, during calibration, sample preparation, and analysis of each batch are essential quality checks for optimal standardization. The pooled QC is created from small aliquots of every sample; all QC samples within a batch should therefore be identical, and they should be placed once every 5–10 samples in the analytical sequence [16]. During sample preparation, general contamination by matrix compounds occurs and degrades the analytical system. To counter this, monitoring and randomization should be performed using technical QC samples, pooled QC samples (small aliquots of each biological sample), batch samples, and blanks, with attention to peak shapes and retention times. Method performance should be monitored with a set of ISs, control samples, and blanks. In addition, both the sample preparation order and the analysis (injection) order should be randomized.
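A minimal sketch of a randomized injection sequence with a pooled QC interspersed every 5–10 samples, following the placement rule above. The function name, the labels "blank" and "pooled_QC", and the choice to start with a blank and a QC and to end with a QC are illustrative assumptions rather than a prescribed protocol.

```python
import random


def build_run_order(sample_ids, qc_every=5, n_blanks=1, seed=None):
    """Randomized injection order with a pooled QC after every `qc_every` samples.

    The sequence starts with blank(s) and a pooled QC and ends with a pooled QC
    (an assumed bracketing convention, not a fixed rule).
    """
    if not 5 <= qc_every <= 10:
        raise ValueError("qc_every should be between 5 and 10")
    rng = random.Random(seed)
    order = list(sample_ids)
    rng.shuffle(order)  # randomize analysis order so groups are not confounded with run order
    run = ["blank"] * n_blanks + ["pooled_QC"]
    for position, sample in enumerate(order, start=1):
        run.append(sample)
        if position % qc_every == 0:
            run.append("pooled_QC")
    if run[-1] != "pooled_QC":
        run.append("pooled_QC")  # close the sequence with a QC
    return run


samples = [f"S{i:02d}" for i in range(1, 13)]
print(build_run_order(samples, qc_every=5, seed=42))
```

Passing a fixed `seed` makes the randomized order reproducible, so the same sequence can be documented and re-created later.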

To eliminate or minimize methodological challenges, systematic errors, and bias due to analytical drift such as batch effects, and to achieve analytical sensitivity and specificity, reproducibility, accurate quantitation, and high recovery with low metabolite losses, each analysis should be carried out under optimized method conditions with performance monitoring. Furthermore, QC samples and ISs are essential throughout sample preparation, extraction, fractionation, injection, separation, detection, and normalization. Stable isotope-labeled compounds are generally used as ISs because they most closely match the compounds of interest, enabling accurate identification and correction for metabolite losses and ion suppression.
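One simple way to use pooled QC injections against batch effects is to rescale each feature so that its pooled-QC median is the same in every batch. The sketch below assumes that scheme (a per-batch median scaling, not a drift-curve method such as LOESS fitted to QC injection order) and uses made-up intensities for a single feature; in practice it would be applied to each feature column separately.

```python
from statistics import median


def qc_median_batch_correct(intensities, batches, is_qc):
    """Rescale one feature so its pooled-QC median is the same in every batch.

    intensities: peak intensities of one feature, one value per injection
    batches:     batch label of each injection
    is_qc:       True where the injection is a pooled QC sample
    """
    qc_medians = {}
    for batch in set(batches):
        qc_values = [x for x, b, q in zip(intensities, batches, is_qc) if b == batch and q]
        if not qc_values:
            raise ValueError(f"batch {batch!r} has no pooled QC injections")
        qc_medians[batch] = median(qc_values)
    # Target level: median of all pooled-QC injections across batches.
    overall = median(x for x, q in zip(intensities, is_qc) if q)
    return [x * overall / qc_medians[b] for x, b in zip(intensities, batches)]


# Made-up example: batch B reads twice as high as batch A.
intensities = [100, 50, 100, 200, 100, 200]
batches = ["A", "A", "A", "B", "B", "B"]
is_qc = [True, False, True, True, False, True]
print(qc_median_batch_correct(intensities, batches, is_qc))
# -> [150.0, 75.0, 150.0, 150.0, 75.0, 150.0]
```

After correction both study samples read 75, whereas their raw values differed twofold only because batch B ran at twice the sensitivity of batch A.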

4. Metabolomics in health and disease

Metabolomics has several applications in health and disease, including precision/personalized medicine, single-cell analysis, epidemiologic population studies, metabolic phenotyping and metabolome-wide association studies (MWAS), precision metabolomics, and integrative omics in combination with other omics disciplines [69]. Single-cell metabolomics and lipidomics technologies enable high-dimensional characterization of individual cells and of disease heterogeneity and complexity, as well as identification and measurement of the expression and abundance of disease-associated small molecules and metabolites, by LC-MS/MS, GC-MS/MS, and live single-cell mass spectrometry (LSCMS) [70]. Imaging MS analysis of human breast cancer samples at the single-cell level revealed cell-cell interactions and tumor heterogeneity [71].

Clinical biomarkers and metabotypes of disease severity correlated with exposures [72] and biological outcomes [73] have been studied and identified in individuals and populations through metabolomic profiling, MWAS, and metabolic fingerprinting and footprinting, which will enable precision medicine and public healthcare [74, 75, 76, 77, 78, 79]. The move from genome-wide association studies (GWAS) to metabolome-wide association studies (MWAS) was first described in 2008 in terms of "environmental and genomic influences to investigate the connections between phenotype variation and disease risk factors" [78, 79, 80]. Rattray and colleagues discussed exposotypes in individual phenotypes and populations, their use in epidemiologic research and in assessing disease risk with MWAS, and their impact on precision medicine [79].
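In its simplest form, an MWAS tests every metabolite feature for association with an outcome and then corrects for the large number of tests. The sketch below assumes a two-group (case/control) design, a permutation test on the difference in group means, and Benjamini-Hochberg false discovery rate adjustment; these choices and the two toy features are illustrative only, and published MWAS typically use regression models that adjust for covariates such as age and sex.

```python
import random
from statistics import mean


def permutation_pvalue(values, labels, n_perm=2000, rng=None):
    """Two-sided permutation p-value for a difference in group means."""
    if rng is None:
        rng = random.Random(0)
    cases = [v for v, is_case in zip(values, labels) if is_case]
    controls = [v for v, is_case in zip(values, labels) if not is_case]
    observed = abs(mean(cases) - mean(controls))
    pooled = list(values)
    n_cases = len(cases)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of cases and controls
        diff = abs(mean(pooled[:n_cases]) - mean(pooled[n_cases:]))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one avoids reporting p = 0


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def mwas(feature_table, labels, n_perm=2000, seed=0):
    """Test every feature; return (name, p, adjusted p) sorted by p-value."""
    rng = random.Random(seed)
    names = list(feature_table)
    pvals = [permutation_pvalue(feature_table[n], labels, n_perm, rng) for n in names]
    qvals = benjamini_hochberg(pvals)
    return sorted(zip(names, pvals, qvals), key=lambda row: row[1])


labels = [True] * 6 + [False] * 6  # hypothetical cases, then controls
features = {
    "metabolite_A": [10, 11, 12, 13, 14, 15, 1, 2, 3, 4, 5, 6],
    "metabolite_B": [5, 3, 6, 4, 5, 4, 4, 6, 3, 5, 4, 5],
}
for name, p, q in mwas(features, labels):
    print(f"{name}: p={p:.4f} q={q:.4f}")
```

The multiple-testing step matters because an MWAS may test thousands of features; without it, many features would pass a nominal p < 0.05 threshold by chance alone.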

In conclusion, MS-based metabolomics/lipidomics is a useful approach both for identifying disease-related metabolites in biofluids or tissue and for classifying and/or characterizing disease- or treatment-associated molecular patterns generated from metabolites [81, 82].

© 2019 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite and reference


Sinem Nalbantoglu (August 7th 2019). Metabolomics: Basic Principles and Strategies, Molecular Medicine, Sinem Nalbantoglu and Hakima Amri, IntechOpen, DOI: 10.5772/intechopen.88563. Available from:
