Metabolomics: Basic Principles and Strategies

Metabolomics is the study of the metabolome within cells, biofluids, tissues, or organisms. It aims to comprehensively identify and quantify all endogenous and exogenous low-molecular-weight (<1 kDa) small molecules/metabolites in a biological system in a high-throughput manner. Metabolomics has several applications in health and disease, including precision/personalized medicine, single-cell analysis, epidemiologic population studies, metabolic phenotyping, metabolome-wide association studies (MWAS), and precision metabolomics, as well as integrative omics (in combination with other omics disciplines), biotechnology, and bioengineering. Mass spectrometry (MS)-based metabolomics/lipidomics provides a useful approach both for identifying disease-related metabolites in biofluids or tissue and for classifying and/or characterizing disease- or treatment-associated molecular patterns generated from metabolites. In this review, we provide a brief overview of the current status of promising MS-based metabolomics strategies and their emerging roles, as well as possible challenges.


Introduction
Metabolomics is an evolving field that aims to comprehensively identify and quantify all endogenous and exogenous low-molecular-weight (<1 kDa) small molecules/metabolites in a biological system in a high-throughput manner. The composition of these compounds is affected by the upstream influence of the proteome and genome as well as by environmental factors, lifestyle factors, medication, and underlying disease. The metabolome is therefore regarded as a reflection of the phenotype. It is downstream of the transcriptome and proteome, and metabolomics is considered complementary to genomics, transcriptomics, and proteomics. Because the metabolome is closely related to the genotype, physiology, and environment of an organism, genotype-phenotype as well as genotype-envirotype relationships can be documented by metabolomics [1][2][3].
Metabolomics is the study of the metabolome within cells, biofluids, tissues, or organisms and is applied in molecular and personalized medicine, including clinical chemistry, transplant monitoring, newborn screening, pharmacology, and toxicology. The metabolome can be defined as the small molecules and their interactions within a biological system, and it has been estimated at 3000-20,000 metabolites in global profiles under given genetic, nutritional, and environmental conditions. Since the metabolome is the final downstream product, changes in and interactions between gene expression, protein expression, and the environment are directly reflected in the metabolome, making it more physically and chemically complex than the other "omes." The metabolome is closer to the phenotype than the other omes, and metabolomics most directly represents the molecular phenotype of health and disease [4]. Relationships between genotype (genomics) and phenotype (metabolomics) link specific gene variations to the resulting metabolite changes, which ultimately give information about genetic, epigenetic, and phenotypic changes [1][5][6][7][8]. In this regard, metabolomics is a valuable source for biomarker discovery, with advantages over other omics approaches.
Quantitation of metabolites in biofluids was first reported in 1971 [9,10]. In the same year, the term "metabolic profiling" was first used [9,11], while "metabolome" was first used in 1998 [9,12]. In 1999, the term "metabonomics" was described as the quantitative measurement of the dynamic multiparametric metabolic response of living systems to pathophysiological stimuli or genetic modification [13].
Because of the complexity of the metabolome, which comprises a wide variety of chemically diverse compounds such as lipids, organic acids, carbohydrates, amino acids, nucleotides, and steroids, among others, not all metabolites have yet been validated and reported; thus, the complete number of metabolites present in humans is not known [14]. The Human Metabolome Database (HMDB; http://www.hmdb.ca, version 4.0, 2018) listed 114,098 metabolite entries as of April 2019, including both water-soluble and lipid-soluble metabolites as well as metabolites regarded as either abundant (>1 μM) or relatively rare (<1 nM). The small molecules that have been identified and experimentally confirmed in various human tissues and biofluids have been suggested to represent only 20% of the total metabolome [15]. Additionally, 5702 protein sequences are linked to these metabolite entries. The database also lists normal and abnormal concentrations of different metabolites for 23 different biospecimens.
Metabolomics strategies cover two primary analysis platforms, "untargeted-discovery-global" and "targeted-validation-tandem," chosen according to the objective of the study (Figure 1). To systematically identify and quantify metabolites from a biological sample and achieve comprehensive characterization of biomarker targets, the analysis considers both the endometabolome and the exometabolome. Untargeted discovery metabolomics is hypothesis generating and allows full scanning of the metabolome, pattern identification, and "metabolic fingerprinting" for the global classification of phenotypes and their interacting pathways. Targeted metabolomics is hypothesis testing and is generally performed to validate an untargeted analysis. In the targeted approaches (tandem MS/MS), also known as "biased or directed metabolomics" or "metabolic profiling," a quantitative analysis is performed, using a known standard, on specific small molecules/metabolites or on perturbations along a metabolic pathway [9]. Hypothesis-generating metabolomics covers three strategies, (i) nontargeted profiling, (ii) fingerprinting, and (iii) footprinting [16], while the hypothesis-testing strategies are target analysis and diagnostic analysis. Nontargeted global metabolomics profiling refers to comprehensive metabolite/small molecule analysis; it is semiquantitative, with putative identifications of the detected features. Metabolomics fingerprinting examines a global snapshot of the intracellular metabolome, enabling classification and screening, while metabolomics footprinting explores a global snapshot of the extracellular fluid metabolome (secretions from cells or changes in metabolites consumed from the exometabolome). Fingerprinting and footprinting do not involve quantification or identification of the features.
Metabolomics strategies for validation purposes refer to quantitative tandem/targeted analysis and diagnostic analysis of a known, clinically associated compound/biomarker [16]. Untargeted and targeted approaches should be performed consecutively in order to achieve accurate identification and absolute quantitation of the metabolites [9]. In this review, we provide a brief overview of the current status of promising MS-based metabolomics strategies and their emerging roles, as well as possible challenges.

Basic workflow of MS-based metabolomics
MS is used to identify and quantify metabolites even at very low concentrations (femtomolar to attomolar) with high resolution, sensitivity, and dynamic range [17]. MS-based analyses basically include sample preparation, extraction, capillary electrophoresis (CE) and/or chromatographic separation, introduction of the sample for ionization (generation of charged molecules), and detection of possible metabolites on the basis of their mass-to-charge ratio (m/z). The main aspects of MS-based untargeted metabolomics strategies are briefly outlined below:
1. Sample acquisition: Metabolome analysis can be performed in various biological samples including tissue, biofluids (blood, urine, feces, seminal fluid, saliva, bile, cerebrospinal fluid), and cell culture [18] (Table 1). Careful sampling, sample preparation and management, and sample biobanking/biorepositories together with sample labeling are critical and essential for an [...] different biological samples involving tissue, blood, plasma, serum, cells, urine, etc. Basically applied approaches include optimized methanol-water-chloroform combinations to extract both hydrophilic and hydrophobic compounds. For high recovery of both hydrophilic and hydrophobic compounds, separate extractions give better results. After centrifugation during sample preparation, the upper (aqueous) and lower (organic) layers of the biphasic mixture are collected separately. In the two sequential or two-phase extraction applications optimized for both polar and nonpolar metabolites such as lipids, an aqueous extraction using polar organic solvents (e.g., methanol or acetonitrile) mixed with water is followed by an organic extraction (lipid extraction) with dichloromethane or chloroform-methanol [16,19]. The first of the two-phase extractions uses an aqueous solvent (e.g., methanol-water) and is followed by extraction of the centrifuged pellet with a nonpolar solvent (e.g., chloroform).
Volatile organic compounds (VOCs) are important components of the metabolome and include metabolites such as alcohols, alkanes, aldehydes, furans, ketones, pyrroles, and terpenes. Volatilomics, together with adductomics, is a new field within metabolomics. For the extraction of VOCs, the solvent-free sample preparation/extraction method "solid-phase microextraction (SPME)" is used, which enables extraction of organic compounds from gaseous, aqueous, and solid materials [24].
LC-MS and GC-MS have different sensitivities for detecting metabolites with high recovery/coverage. Compared to GC-MS, LC-MS can analyze a wider range of molecular features. GC-MS is capable of analyzing less polar biomolecules, including alkylsilyl derivatives, eicosanoids, essential oils, esters, perfumes, terpenes, waxes, volatiles, carotenoids, flavonoids, and lipids. LC-MS is capable of analyzing more polar biomolecules, including organic acids, organic amines, nucleosides, ionic species, nucleotides, and polyamines, and does not require derivatization. Both LC-MS and GC-MS are able to analyze alcohols, alkaloids, amino acids, catecholamines, fatty acids, phenolics, polar organics, prostaglandins, and steroids [25].
6. Data analysis and metabolite identification: The large amounts of complex raw data containing specific metabolic signals are extracted from the MS and analyzed in specialized software to properly interpret the data and identify the metabolites of interest. Commercially available and free bioinformatic analysis tools automatically perform peak selection, assessment, and relative quantitation. Preprocessing of the raw signal spectra includes background spectral filtering (noise elimination), retention time correction, peak detection, appropriate peak assignment for the same compound (identification of matching m/z values and appropriate assignment of adducts), peak alignment (matching peaks across multiple samples), peak normalization (adjusting peak intensities and reducing analytical drift), and chromatogram alignment. Following this, data preparation includes data integrity checking, data normalization, and compound name identification, supported by univariate, multivariate, clustering, and classification statistical analyses [32].
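The peak normalization step mentioned above can be illustrated with a minimal Python sketch. It assumes total-intensity normalization followed by a log2 transform, which is only one of several possible choices and is not prescribed by the tools cited in this review. The feature names and intensity values are hypothetical.

```python
import math


def total_intensity_normalize(sample):
    """Scale the feature intensities of one sample so that they sum to 1.

    Assumption: total-intensity normalization is used here for illustration;
    other schemes (e.g., internal-standard or QC-based) are equally possible.
    """
    total = sum(sample.values())
    if total <= 0:
        raise ValueError("sample has no signal to normalize")
    return {feature: value / total for feature, value in sample.items()}


def log2_transform(sample, offset=1e-9):
    """Log2-transform intensities; the small offset avoids log(0)."""
    return {feature: math.log2(value + offset) for feature, value in sample.items()}


# Hypothetical peak table: sample -> {feature label: peak intensity}.
# sample_B is sample_A measured at half the overall signal level.
peak_table = {
    "sample_A": {"m181.07_rt2.1": 2000.0, "m90.05_rt1.3": 6000.0, "m193.03_rt3.4": 2000.0},
    "sample_B": {"m181.07_rt2.1": 1000.0, "m90.05_rt1.3": 3000.0, "m193.03_rt3.4": 1000.0},
}

normalized = {name: total_intensity_normalize(f) for name, f in peak_table.items()}
log_normalized = {name: log2_transform(f) for name, f in normalized.items()}

for name, features in normalized.items():
    print(name, features)
```

After normalization the two hypothetical samples have the same relative profile, which is the behavior expected when the only difference between injections is the overall signal level.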
Following data processing, data interpretation and metabolite identification from the mass spectrum can be performed through functional interpretation, enrichment analysis, pathway analysis, and mapping of metabolite pathway networks. Commonly used tools include XCMS [33], MetaboAnalyst [34], Progenesis [35], MetaCore [36], and 3Omics [37], with different analysis capabilities. The software processes raw mass spectrum data, performs various statistical analyses to find significantly altered ions/features, and, for metabolite identification, connects to metabolite databases such as the Human Metabolome Database (HMDB) [14], [...] [54], BioCyc [54], Model SEED [55], Reactome [56], and Ingenuity Pathway Analysis (IPA) [57].
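The database search step can be sketched in Python as a mass match within a ppm tolerance. This is a simplified illustration, not the algorithm of any of the tools named above: it assumes that only the protonated adduct [M+H]+ is considered, that the tolerance is 10 ppm, and that the candidate list is a small hypothetical stand-in for a real database query.

```python
PROTON_MASS = 1.007276  # Da; added to the neutral mass for the [M+H]+ adduct


def mz_protonated(monoisotopic_mass):
    """Theoretical m/z of the singly charged [M+H]+ ion."""
    return monoisotopic_mass + PROTON_MASS


def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def match_feature(observed_mz, candidates, tol_ppm=10.0):
    """Return (name, theoretical m/z, ppm error) for candidates within tolerance."""
    hits = []
    for name, mass in candidates.items():
        theoretical = mz_protonated(mass)
        error = ppm_error(observed_mz, theoretical)
        if abs(error) <= tol_ppm:
            hits.append((name, theoretical, error))
    # Smallest absolute mass error first
    return sorted(hits, key=lambda hit: abs(hit[2]))


# Hypothetical candidate list: compound name -> neutral monoisotopic mass (Da)
candidates = {
    "glucose": 180.06339,
    "fructose": 180.06339,
    "alanine": 89.04768,
    "citric acid": 192.02700,
}

hits = match_feature(181.0707, candidates)
for name, theoretical, error in hits:
    print(f"{name}: theoretical m/z {theoretical:.5f}, error {error:.2f} ppm")
```

In this sketch the isomers glucose and fructose both match the same observed m/z, so accurate mass alone gives only a putative identification; retention time or MS/MS spectra are needed to distinguish them.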

Challenges and affecting factors
Although extensive tools and databases exist for the analysis and identification of metabolites, challenges remain in data analysis/integration, pathway analysis, and metabolite identification in untargeted metabolomics. These challenges arise from the high-throughput, heterogeneous nature of omics data, which requires improved bioinformatics and computational techniques to comprehensively evaluate metabolomic profiles and to complete the human metabolome [32][58][59][60][61][62].
Chromatographic resolution, absolute MS signals, and compound identification can be affected by many factors: conditions during the ionization process (polarity, ion source, ion suppression, flow rate, and MS vacuum); sample preparation strategies; the purity and temperature (prechilled) of reagents and solvents; different laboratory staff and techniques on different analysis days (typical sources of variance); different mass analyzers, chromatographic separation columns, and compositions, which cause different fragments of the same molecule to form; frequent detection of the most abundant molecules; and day-to-day variation. Another challenge is the existence of isomers with identical masses and highly similar spectra, which complicates the assignment of metabolites to spectral features [63,64].
During the analytical process, the analyte-to-internal-standard (IS) peak area response ratio is plotted against the analyte concentration for each sample within a batch to obtain calibration curves, which are needed to control the accuracy and reproducibility of the system. Performance monitoring, in the form of plots of absolute MS signals, retention times, and mass patterns/chromatographic peak shapes of the analytes and ISs, is required to maintain the sensitivity, integrity, and robustness of the analytical results. It has been reported that high system pressure and short columns with small particle sizes (<2 μm) lead to better signal-to-noise ratios than columns with larger particle sizes. Columns with small particle sizes also affect the speed of MS screening [65][66][67][68].
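The calibration curve described above can be illustrated with a short Python sketch. It assumes an unweighted linear least-squares fit of the response ratio against concentration; weighted or nonlinear fits are also used in practice. All peak areas and concentrations are hypothetical.

```python
def response_ratio(analyte_area, is_area):
    """Peak area of the analyte divided by the peak area of the internal standard."""
    return analyte_area / is_area


def fit_line(x, y):
    """Unweighted least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def back_calculate(ratio, slope, intercept):
    """Concentration of an unknown sample from its measured response ratio."""
    return (ratio - intercept) / slope


# Hypothetical calibrators: concentration (e.g., in uM) and measured peak areas
concentrations = [1.0, 2.0, 5.0, 10.0]
analyte_areas = [1200.0, 2200.0, 5200.0, 10200.0]
is_areas = [10000.0, 10000.0, 10000.0, 10000.0]

ratios = [response_ratio(a, s) for a, s in zip(analyte_areas, is_areas)]
slope, intercept = fit_line(concentrations, ratios)

# Hypothetical unknown sample measured in the same batch
unknown_conc = back_calculate(response_ratio(3200.0, 10000.0), slope, intercept)
print(f"slope={slope:.4f}, intercept={intercept:.4f}, unknown={unknown_conc:.2f}")
```

Dividing by the IS area before fitting is what makes the curve robust to injection-to-injection changes in absolute signal, since such changes affect the analyte and the IS together.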
Plots of analytical parameters, retention times, and peak shapes for the quality control (QC) samples of each batch, obtained during calibration, sample preparation, and analysis, are essential as a quality check for optimum standardization. Pooled QC samples, created by combining an aliquot of each sample in the batch, should be identical to each other and should be injected once every 5 to 10 samples during the analysis [16]. During sample preparation, general contamination by matrix compounds occurs and degrades the analytical system. To counter this, monitoring and randomization should be performed using technical QC samples, pooled QC samples (which contain small aliquots of each biological sample), batch samples, and blanks, with attention to peak shapes and retention times. The performance of the method has to be monitored using a set of ISs, control samples, and blanks. In addition, both the sample preparation order and the analysis order have to be randomized. To eliminate or minimize methodological challenges, systematic errors, and bias due to analytical drift such as batch effects, and to obtain analytical sensitivity and specificity, reproducibility, accurate quantitation, high recovery, and low metabolite losses, each analysis should be carried out under optimized method conditions with performance monitoring. Furthermore, it is essential to use QC samples and ISs during sample preparation, injection, extraction, fractionation, separation, detection, and normalization. Generally, stable isotope-labeled compounds are used as ISs because they are the most similar to the compounds of interest, which helps to obtain accurate identification and to correct for metabolite losses and ion suppression.
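The randomization and QC placement described above can be sketched in Python. The function names, the choice to begin and end the batch with a pooled QC injection, and the use of the relative standard deviation (RSD) of a feature across QC injections as a reproducibility measure are assumptions made for illustration, not a protocol taken from the cited references.

```python
import random
import statistics


def build_run_order(sample_ids, qc_every=5, seed=0):
    """Randomize the sample order and insert a pooled QC after every `qc_every` samples.

    Assumption: the batch starts and ends with a pooled QC injection.
    """
    if qc_every < 1:
        raise ValueError("qc_every must be at least 1")
    rng = random.Random(seed)  # fixed seed so the order can be reproduced
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    order = ["QC"]
    for position, sample in enumerate(shuffled, start=1):
        order.append(sample)
        if position % qc_every == 0:
            order.append("QC")
    if order[-1] != "QC":
        order.append("QC")
    return order


def qc_rsd(intensities):
    """Relative standard deviation (%) of one feature across pooled QC injections."""
    mean = statistics.mean(intensities)
    return statistics.stdev(intensities) / mean * 100.0


samples = [f"S{i}" for i in range(1, 13)]  # 12 hypothetical biological samples
run_order = build_run_order(samples, qc_every=5, seed=1)
print(run_order)

# Hypothetical intensities of one feature in the pooled QC injections of the batch
print(f"QC RSD: {qc_rsd([1000.0, 1040.0, 980.0, 1010.0]):.1f}%")
```

Because the pooled QC injections are nominally identical, a low RSD across them indicates stable system performance, while a trend in their intensities along the run order points to analytical drift.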

Metabolomics in health and disease
Metabolomics has several applications in health and disease, including precision/personalized medicine, single-cell analysis, epidemiologic population studies, metabolic phenotyping, metabolome-wide association studies (MWAS), and precision metabolomics, as well as integrative omics in combination with other omics disciplines [69]. Single-cell metabolomics and single-cell lipidomics technologies allow high-dimensional characterization of individual cells and of disease heterogeneity and complexity, as well as the identification, expression, and abundance of disease-associated small molecules and metabolites, by LC-MS/MS, GC-MS/MS, and live single-cell mass spectrometry (LSCMS) [70]. Imaging MS analysis of human breast cancer samples at the single-cell level revealed cell-cell interactions and tumor heterogeneity [71].
Clinical biomarkers and different metabotypes of disease severity, correlated to exposures [72] and biological outcomes [73], have been studied and identified in individuals and populations through metabolomics profiling, MWAS, and metabolomics fingerprinting and footprinting techniques, which will enable precision medicine and public healthcare [74][75][76][77][78][79]. The move from genome-wide association studies (GWAS) to metabolome-wide association studies (MWAS) was first described in 2008 as "environmental and genomic influences to investigate the connections between phenotype variation and disease risk factors" [78][79][80]. Rattray and colleagues discussed the use of metabolome-wide association studies to examine exposotypes in individual phenotypes and populations, in epidemiologic research, and in disease risk, together with their impact on precision medicine [79].
In conclusion, MS-based metabolomics/lipidomics provides a useful approach both for identifying disease-related metabolites in biofluids or tissue and for classifying and/or characterizing disease- or treatment-associated molecular patterns generated from metabolites [81,82].