Volatilomics studies the volatile compounds emitted by living organisms such as plants, flowers, animals, fruits, and microorganisms, using metabolomics tools to characterize the analytes. It is a complex process involving several steps: sample preparation, extraction, instrumental analysis, and data processing. In this chapter, we provide balanced coverage of the theoretical and practical aspects of the study of the volatilome. Static and dynamic headspace techniques for volatile capture are discussed first. Then, the main techniques for volatilome profiling, separation, and detection are addressed, with emphasis on gas chromatographic separation, mass spectrometric detection, and non-separative techniques based on mass spectrometry. Finally, volatilome data pre-processing and multivariate statistics for data interpretation are introduced. We hope that this chapter provides the reader with an overview of the research process in the study of volatile organic compounds (VOCs) and serves as a guide in the development of future volatilomics studies.
- volatile organic compounds (VOCs)
- microbial volatile organic compounds (mVOCs)
- static headspace
- dynamic headspace
- metabolomics workflow
Volatilomics is the qualitative and quantitative study of the volatilome, defined as the complex blend of volatile organic compounds (VOCs) originating from different biosynthetic pathways and emitted by living organisms. VOCs are small molecules (below 500 Da) with hydrophobic character, low boiling points, and high vapor pressure at ambient temperature. Unconjugated volatiles can diffuse freely across membranes to be released from flowers, fruits, and vegetative tissues into the atmosphere, and from roots into the soil, where they are perceived at short and long distances. Plants and animals therefore use VOCs for chemical communication with the surrounding ecosystem, and plants also use them to attract pollinators and to defend themselves against herbivory and biotic and abiotic stress [2, 3, 4, 5].
The study of plant VOCs has focused not only on the qualitative and quantitative composition of the volatile fraction but also on bioactive compounds, flavors, and fragrances [6, 7]. Similarly, understanding the sensory attributes of fruits is of great interest for quality control, for the determination of origin marks, and for ecological studies that aim to relate the ripening stage to the incidence of fruit diseases caused by insect or microorganism attack [8, 9, 10].
Microorganisms produce a plethora of microbial volatile organic compounds (mVOCs) that play an essential role in inter- and intra-kingdom interactions. The study of mVOCs has made it possible, for example, to detect terpenes, compounds normally associated with plants, in fungi and bacteria as well. These compounds are also involved in ecological interactions between living organisms in the soil, including the rhizosphere.
In addition, studies of VOCs from animals have not only helped to decode the signals of animal chemical communication but have also demonstrated the potential of that knowledge for early disease diagnosis. For example, recent studies have used unusual biological fluids such as ear wax to detect biomarkers of intoxication. This kind of bioanalysis is fast, economical, noninvasive, and versatile, requires minimal sample preparation, and can reveal the first signs of intoxication [13, 14].
Unlike the genome, the volatilome changes continuously over time, and its composition depends on external and internal factors, such as environmental conditions and the physiological state. Therefore, the study of the volatilome is not a simple task, and researchers in this area face multiple challenges arising from the chemical complexity of the samples and the overlap of VOC signals that is typical of ecosystems. Thus, sensitive yet unbiased methodologies are needed to provide researchers with comprehensive and accurate representations of a plant species’ volatile metabolome.
However, current methodologies are limited in their ability to isolate, and even more critically to identify, many of the compounds present in each sample. In volatile metabolomics, the emitted metabolites are already isolated from tissues, but they need to be temporarily trapped, and possibly preconcentrated, in a way that allows them to be released unadulterated for separation and identification.
A variety of technologies have been developed for this purpose. In these methods, the sample of interest is enclosed in a collection chamber, and the volatiles released into the airspace surrounding the sample, the headspace (HS), are trapped on an adsorbent. They are subsequently analyzed by gas chromatography in combination with mass spectrometry (GC–MS), the method of choice for volatilomics.
Hence, in the next sections of this chapter, we will provide an overview of the volatilome study process, including the main practical and theoretical aspects of volatiles capture, sample preparation, and the main analytical techniques employed to monitor VOCs, together with the chemoinformatics tools used for volatilome dereplication, elucidation, annotation, and interpretation of data.
2. Volatiles collection
Sample acquisition in volatilomics experiments requires consistency. Because of the high variability in chemical structures, concentration levels, sample types, and physiological states, variables other than the metabolites themselves (referred to as meta-variables from now on) should be controlled, or at least carefully monitored, in order to evaluate their effect on the study outcome. Important variables to record include replicate number, taxonomic identification, geographic location, phenotypic or phylogenetic variant, sample weight, phenotypic characteristics, sex, developmental stage, health status, and collection date and time. Photographs should also be taken. A useful reference for registering meta-variables is the ReDU Sample Information Template (https://docs.google.com/spreadsheets/d/1v71bnUd8fiXX51zuZIUAvYETWmpwFQj-M3mu4CNsHBU/edit?usp=sharing), built by the collaborative Global Natural Products Social Networking (GNPS) platform (https://gnps.ucsd.edu/ProteoSAFe/static/gnps-splash.jsp), where researchers can add new meta-variables and share their data in an open-source and collaborative environment.
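As an illustration of how the meta-variables listed above might be recorded consistently, the following minimal Python sketch stores one record per sample and exports the records as CSV. The class name, field names, and units are hypothetical choices made for this example; they are not the column names of the ReDU template.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields
from datetime import datetime


@dataclass
class SampleRecord:
    """One row of meta-variables for a collected sample (hypothetical field names)."""
    sample_id: str
    replicate: int
    taxon: str
    location: str
    sample_weight_g: float
    sex: str
    developmental_stage: str
    health_status: str
    collected_at: datetime


def records_to_csv(records):
    """Serialize records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(SampleRecord)])
    writer.writeheader()
    for record in records:
        row = asdict(record)
        # Store date and time in ISO 8601 so that the table sorts unambiguously.
        row["collected_at"] = record.collected_at.isoformat()
        writer.writerow(row)
    return buffer.getvalue()
```

Keeping the same fields for every sample makes it straightforward to test later whether a meta-variable, rather than the biological factor of interest, explains the observed variation.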
The plant volatilome is defined as the complex blend of essential oils (EOs) and VOCs fed by different biosynthetic pathways and emitted by plants, constitutively and/or after induction, as a defense strategy against biotic and abiotic stress. Plants show a vast diversity in their range of metabolites and their concentrations, with hundreds of thousands of metabolites in different categories. As such, no single analytical technique is capable of extracting and detecting the whole metabolome.
Plant volatile emissions are linked to the physiological status of the emitter. Special care must therefore be taken to control the plant-growing environment, as well as all variables concerning the developmental stage of the plant, to limit unwanted fluctuations in metabolism that might affect the collected volatiles. These variables include the time of day, photoperiod, temperature, humidity, water conditions, collection site altitude, plant age, climate, and soil type, so a careful experimental design is recommended. Whenever possible, growth chambers should be used for plant cultivation and volatile collection [19, 20]. EOs and VOCs can be extracted and analyzed from both fresh and dried plant materials. When using fresh material, particular attention must be paid to the health status of the plants, since microbial and other infections may alter metabolite production. Plants must not show necrotic areas and must be at the same developmental stage if comparative analyses are needed. Since the water content may vary, it is good practice to use some of the fresh material to calculate the dry matter percentage.
Since volatile emissions from many plant species vary with respect to the time of day, and different organs in the plant are known to produce and/or accumulate different profiles of secondary metabolites, collection strategies should consider volatile sampling over an extended period of time and from the investigated organ or entire plant, to prevent unintentional exclusion of volatile components in the sampled mixture. Also, when running VOCs analyses from living plants it must be remembered that rooted plants in pots respond differently than cuttings, and that soil in pots may contain microorganisms that can produce VOCs [22, 23]. Once a plant part is collected, at least two herbarium samples should be prepared and identified or authenticated by a taxonomist. One of these voucher specimens should be deposited in a local national herbarium. A card with details of the place, altitude, environment, and photographs should be attached to the herbarium sample, in case a recollection of the plant material is necessary. Although depositing herbarium samples is a basic step in performing phytochemical investigation, many researchers in the past neglected this step and thus were unable to reproduce their work [23, 24, 25].
Living flowers change their volatile profile continuously, depending on intrinsic and extrinsic factors. Once cut, flowers undergo rapid deterioration and lose volatiles. When released, flower volatiles allow discrimination between different plants and attract insects for pollination. The amount emitted is not uniform over time, with differences between diurnal and nocturnal emission levels and between reproductive phases. The volatile compounds emitted by flowers are mainly aliphatics, terpenoids, benzenoids, and phenylpropanoids. Flower volatiles require special isolation methods with preconcentration and can be obtained from the air surrounding the living or excised flower, or from the flower tissues themselves. The selected extraction technique determines the composition of the isolated volatile mixture [26, 27].
Fruits are very complex samples, rich in many different classes of metabolites, including volatile, semi-volatile, and non-volatile compounds. Flavor is one of the most important characteristics for assessing fruit quality. Volatile and semi-volatile compounds are usually responsible for fruit aroma, and their study has helped to identify both positive and negative sensory attributes. VOCs are produced in trace amounts, and although they are easily perceived by the human nose, their sampling and monitoring can be analytically challenging. The volatile fraction of fruits is composed of hundreds of different chemical substances that vary with the type of fruit, but the emitted compounds can be grouped by chemical function mainly into esters, alcohols, aldehydes, ketones, lactones, and terpenoids. Moreover, the VOCs emitted by fruit depend on the production conditions (cultivar, state of maturity, post-harvest treatment, and storage), the sample format (whole fruit, sliced, wet, dry), and the type of analysis (in-field or in-lab). Capturing volatiles in situ is a challenge, because small amounts of VOCs are released and diffuse into a large volume of air, which requires highly efficient sampling techniques. Solid-phase microextraction (SPME) and solid-phase extraction (SPE) are usually the most effective techniques for capturing fruit volatiles in situ. Once the volatile compounds are retained on an adsorbent material, their storage and transport are facilitated. In the laboratory, on the other hand, VOCs from fruits can be captured efficiently by solvent- or gas-based extraction techniques, such as Soxhlet extraction, simultaneous distillation extraction, purge and trap, and headspace, among others.
Analysis of mVOCs is commonly performed under controlled culture medium, temperature, and agitation. The percentage of humidity and the exposure to UV–visible light, among other growing conditions, should also be taken into account. To ensure reproducibility of the experiments, laboratory tests on microorganisms must be performed using international reference strains, e.g., from the American Type Culture Collection (ATCC), instead of clinical or field isolates, or strains isolated and stored in the research group for a long time. Because the emission of VOCs can vary in terms of presence or absence, and in terms of concentration, throughout the life span of the microorganisms (which can range from a few hours to days), it is advisable to perform analyses both in the exponential (logarithmic) growth phase and in the stationary phase [12, 30, 31]. During the exponential phase, the microorganism is reactivating its biosynthetic pathways after having been in a state of latency. In this stage, there is generally a high concentration of some metabolites that belong to the first stages of the biosynthetic pathways, which can later diminish or disappear as the culture progresses. The stationary phase is reached when the initial metabolic processes have been completed, and it occurs when the survival process of the species begins. The metabolic changes produced in these two stages of microbial culture are fundamental to understanding and solving research questions [33, 34]. For liquid growth medium, the culture phases are commonly determined by measuring the absorption of light in the visible region between 500 and 650 nm. For solid medium, they are determined by counting the colony-forming units (CFU). The sampling times for analysis of mVOCs must coincide with the phases observed in the growth curves, correctly differentiating the exponential and stationary phases.
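As a minimal sketch of how sampling times could be matched to growth phases, the following Python code estimates the specific growth rate between consecutive optical density readings and labels each interval. The function names, the 50% threshold, and the three-phase labeling rule are assumptions made for illustration; the decline phase is not handled, and real growth curves usually need smoothing and replicate measurements.

```python
import math


def specific_growth_rates(times_h, od):
    """Growth rate (1/h) between consecutive readings: d ln(OD) / dt."""
    return [
        (math.log(od[i + 1]) - math.log(od[i])) / (times_h[i + 1] - times_h[i])
        for i in range(len(od) - 1)
    ]


def label_phases(times_h, od, fraction=0.5):
    """Label each interval as 'lag', 'exponential', or 'stationary'.

    An interval counts as exponential when its growth rate is at least
    `fraction` of the maximum rate (an illustrative threshold).
    """
    rates = specific_growth_rates(times_h, od)
    mu_max = max(rates)
    peak = rates.index(mu_max)
    labels = []
    for i, mu in enumerate(rates):
        if mu >= fraction * mu_max:
            labels.append("exponential")
        elif i < peak:
            labels.append("lag")
        else:
            labels.append("stationary")
    return labels


# Hypothetical optical density readings taken every 2 h.
times = [0, 2, 4, 6, 8, 10, 12]
readings = [0.05, 0.055, 0.11, 0.22, 0.44, 0.46, 0.46]
print(label_phases(times, readings))
```

With labels of this kind, headspace sampling can be scheduled once inside the exponential intervals and once inside the stationary intervals of the same culture.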
For volatile sampling from animals, the specimens can be either raised in captivity in controlled vivaria or collected from their natural environments. Proper training in animal handling is required before performing animal experimentation, as is a permit from the institution in charge that validates the procedures. When animals are to be collected in their habitats, it is also necessary to check whether a Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES) permit is needed for protected species. A specialist should validate the taxonomic identification. In cases where sample collection involves euthanasia, the specimens should be registered at a recognized museum, and the voucher numbers should be annotated and published in the research paper. As with other organisms, animals can be sampled by different methods. Almost all animals can be sampled in vivo, but in some cases tissue extraction may be preferred to guarantee the detection of less abundant metabolites. Techniques applied for VOC analysis from terrestrial arthropods [35, 36, 37, 38, 39], aquatic organisms [40, 41, 42], mammals [43, 44, 45, 46, 47, 48], birds [49, 50], reptiles [51, 52], fishes, and amphibians [37, 38, 54, 55, 56, 57] include headspace-adsorbent traps, polydimethylsiloxane (PDMS) patches, swabs, and stir bar sorptive extraction (SBSE).
3. Volatiles extraction
Sample preparation is one of the most important steps in the analytical process. The goal of sample preparation is to efficiently isolate target analytes from potential interferences and to extract as many VOCs as possible to provide a true representation of the studied system.
Some sample pre-treatment steps are necessary. They should minimize manipulation of the sample and avoid modifying it, clean up the sample efficiently, and quench metabolic reactions that could cause degradation and decomposition. To date, two types of headspace sampling, static and dynamic, are widely used in volatilomics investigations.
Static headspace sampling is a passive technique for VOC collection, in which no air is circulated to concentrate the volatiles on a sorbent matrix. As a result, the background noise is drastically reduced, because there is no continuous airflow carrying impurities that could mask compounds released in trace amounts. In static headspace methods, samples are typically sealed inside a container or bag, where the volatiles are released. In the more traditional version of the technique, the headspace is sampled directly with a gas-tight syringe and transferred to the gas chromatography (GC) injection port. When the analytes are present at trace levels, it may be necessary to combine static headspace methods with special techniques that concentrate the volatiles during collection and reduce the dilution of the sample during desorption in the GC inlet. In this context, SPME stands out as the most versatile strategy for volatile capture from the sample headspace in static mode. Nowadays, SPME is the leading technique in the analysis of volatiles of biological origin because it uses a fiber coated with a sorbent phase to combine extraction and pre-concentration of compounds. SPME fibers are available in a wide range of coatings that allow the sampling of compounds of different polarities and volatilities. Considering that the goal of volatilomics profiling is to analyze as many metabolites as possible, divinylbenzene/carboxen/polydimethylsiloxane (DVB/CAR/PDMS) fibers are the best suited to increase the number of analytes trapped on the fiber, because they capture VOCs over a wide range of polarities and molecular weights.
This type of coating contains a layer of CAR particles underneath a layer of DVB particles. Because the ability of adsorbent coatings to extract a particular analyte strongly depends on the size of the pores, larger analytes are retained in the outer DVB layer, while smaller analytes migrate through this layer and are retained by the inner CAR layer. Conversely, if the study targets only the most volatile fraction, PDMS/CAR would be an appropriate choice of coating, since the micropores of the CAR retain smaller analytes better than other coatings, although at the cost of a high degree of discrimination against high-molecular-weight compounds.
On the other hand, although other coatings, such as PDMS, polyacrylate (PA), and Carbowax (CW), are also commercially available, their use in volatilomics is quite scarce because of their higher selectivity towards compounds of certain polarities [58, 59].
From a practical point of view, SPME is a versatile technique for in-field sampling as a non-destructive strategy for the study of the volatiles emitted ex-vivo, for example, by grapes. In this case, an aluminum wire cage can be used to support a polymeric film to enclose a whole cluster of grapes, and SPME fiber is introduced through a port fitted with a silicone septum (Figure 1a).
Also, an interesting strategy for speeding up the uptake of volatiles is vacuum-assisted SPME. For example, for in-field sampling of volatiles from a single grape berry, a modified screw top and a 2 mL glass vial can be used for fiber exposure. A syringe is usually used to create a negative pressure that holds the sampling device, with the SPME fiber, sealed onto the sample surface (Figure 1b).
Although SPME generally exhibits better extraction efficiency as the polarity of the compound decreases, these three coatings can provide balanced metabolome coverage as long as the most polar analytes are present at reasonable concentration levels. Absorbent coatings, such as PDMS, PA, and CW, are rarely employed in profiling studies. These coatings display selectivity based on polarity, resulting in poor metabolomic coverage. The second approach is dynamic headspace sampling, which yields a highly concentrated sample that can be desorbed into a solvent at volumes suitable for multiple analyses. To date, it is the most frequently used technique in all areas of plant volatile analysis. Dynamic headspace sampling collects a much larger quantity of compounds at higher concentrations because the continuous stream of air allows the sorbent to act as a filter that traps the volatiles.
Push and pull headspace sampling methods, two examples of dynamic headspace sampling, avoid problems often encountered with the sealed systems used in static headspace and closed-loop stripping methods, including the accumulation of heat, water vapor, and, in the case of plants, ethylene, which can affect not only sampling efficiency but also plant physiology. Among the several methods, closed-loop stripping systems have broad utility for the collection of volatiles: volatiles are collected during continuous circulation of HS air inside closed chambers in which air circulation pumps are connected to supporting columns or coated supports.
As an alternative to SPME, some liquid-phase microextraction (LPME) techniques, such as single drop microextraction (SDME) or hollow fiber liquid-phase microextraction (HF-LPME), can also provide efficient recovery of volatiles in static headspace mode. For example, SDME uses only a few microliters of solvent: volatiles are captured in a small drop of extraction solvent exposed to the headspace of the sample [20, 59]. To address the drawback of drop instability, the extraction solvent can instead be placed in the lumen of a porous hollow fiber (HF-LPME), which improves the extraction kinetics through a larger transfer surface or through the incorporation of an acceptor solvent into the membrane pores (supported liquid membrane, SLM). Although the use of hazardous organic solvents can be considered a drawback, these solvent-based extractions can nowadays be performed with environmentally friendly alternatives, such as ionic liquids, deep eutectic solvents, or supramolecular solvents, among others.
The second type of headspace sampling is the dynamic headspace (DHS) method. It encompasses strategies in which VOCs are captured in a sorbent-packed trap by passing a continuous flow of inert dry gas through the sample. In this way, the emission of VOCs is accelerated by the continuous renewal of the headspace fraction. After extraction, the concentrated VOCs can be desorbed from the sorbent-packed trap with a suitable solvent or via thermal desorption. In addition, DHS addresses some drawbacks of the static modes, such as the accumulation of water vapor or highly concentrated compounds, whose presence can affect extraction efficiency. Two examples of dynamic headspace sampling are closed-loop stripping and push and pull methods, which avoid the heat and water vapor accumulation that can affect not only sampling efficiency but also plant physiology. These systems collect VOCs in sorbent-packed traps or coated devices via the continuous circulation of gas inside closed circuits.
In addition to headspace sampling techniques, some sui generis approaches combine two methods from different groups, for example, solvent-assisted flavor evaporation (SAFE). SAFE is an exhaustive extraction technique based on the high volatility, rather than the polarity, of the target compounds. In this case, a crude extract from dry sample pieces is prepared with an appropriate solvent, such as dichloromethane, added to the dropping funnel, and passed through a dedicated distillation chamber. Extraction takes place under high vacuum and at low temperature (20–30°C), and the VOCs are collected in a cooled extraction vessel. Other techniques included in this group are simultaneous distillation extraction (SDE) and liquid–liquid extraction (LLE). Nevertheless, these are subject to some drawbacks, such as the use of hazardous solvents and the requirement for high temperatures and long extraction times, with potential formation of artifacts and degradation of some compounds.
Finally, volatile compounds can also be obtained by direct collection of the secretions of odoriferous glands or via non-invasive strategies using PDMS patches or swabs. These techniques are especially useful for monitoring VOCs from animals. For example, collecting the animal skin volatilome on PDMS patches is an excellent option. Patches can be prepared by cutting a Silicone Elastomer Sheet (Goodfellows mfr. No. 942-965-49, Coraopolis, PA) and then carefully fixing them on the animal skin with Tegaderm® dressings or water block clear Band-aids®. Alternatively, the procedure can be modified by gently swabbing the skin, with or without previous stress-induced secretion. PDMS patches can also be placed in an animal enclosure and used without direct contact to capture the volatiles that emanate into the headspace.
4. Volatilome profiling: separation and detection
Currently, gas chromatography coupled to mass spectrometry (GC–MS) is the primary analytical technique for elucidating the volatilome profile of natural sources. In gas chromatography, the sample is vaporized in the injection system before it enters the column. The analytes are carried by a gas, usually helium, through a coated fused silica capillary under a temperature gradient and elute according to their volatility. Separation is based on the differential partition between the gas phase and the coating, and the eluting peaks give a response in the detector.
Several injection systems can be used to introduce the sample onto the column. Split injection transfers only a controlled amount of sample to the column and prevents overloading, thanks to a split valve at the base of the hot injector that divides the flow between column and waste in an adjustable ratio. High-concentration samples can easily overload the GC column: all active sites on the column become occupied, additional analytes are not retained, and chromatographic resolution is poor. For trace analysis, the injector can be used in splitless mode, which allows the entire volume of sample vaporized in the injector to reach the column. An alternative to the split/splitless interface is the programmed temperature vaporizer (PTV). Samples are injected into a cool (40–60°C) PTV, where they are trapped and concentrated on different sorbent materials before the inlet is rapidly heated to desorb the sample onto the column.
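The effect of the split valve can be illustrated with a short calculation. The sketch below assumes the usual convention that the split ratio is the split vent flow divided by the column flow, so that a fraction 1/(1 + ratio) of the vaporized sample reaches the column. The function names and the numerical values are illustrative only.

```python
def split_ratio(column_flow_ml_min, split_vent_flow_ml_min):
    """Split ratio, taken here as split vent flow divided by column flow."""
    return split_vent_flow_ml_min / column_flow_ml_min


def on_column_amount(injected_amount, ratio):
    """Amount reaching the column; a ratio of 0 corresponds to splitless mode."""
    return injected_amount / (1.0 + ratio)


# Hypothetical flows: 1 mL/min through the column, 50 mL/min to the split vent.
ratio = split_ratio(1.0, 50.0)
print(ratio)                          # 50.0
print(on_column_amount(51.0, ratio))  # 1.0 (same unit as the injected amount)
print(on_column_amount(51.0, 0.0))    # 51.0 in splitless mode
```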
Columns of different selectivities and dimensions have been used for GC–MS-based metabolomic analysis. The most widely used phase is 5% phenyl, 95% methyl siloxane, which offers a sufficiently generic selectivity and is well suited to metabolomic applications in which analytes with a wide range of volatilities have to be separated. Capillary columns of 25 to 30 m provide the highest resolution and are available in most phases. An important point for all capillary GC–MS work is the need to condition the column before running valuable samples. Sangster et al. have recommended that several quality control samples be run at the beginning of a sample batch to condition the column. Care also needs to be taken to randomize the injection sequence so as not to compromise subsequent statistical analysis.
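The two recommendations above, conditioning injections of quality control (QC) samples followed by a randomized injection sequence, can be sketched in a few lines of Python. The function name and default values are hypothetical, and the periodic QC injections inserted between samples are an additional assumption of this example that is not stated in the text.

```python
import random


def build_run_order(sample_ids, n_conditioning_qc=5, qc_every=5, seed=42):
    """Return an injection sequence: conditioning QCs, then shuffled samples.

    A QC injection is also inserted after every `qc_every` samples
    (an assumption of this sketch, useful for monitoring drift).
    """
    rng = random.Random(seed)  # fixed seed keeps the sequence reproducible
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)

    order = ["QC"] * n_conditioning_qc
    for position, sample in enumerate(shuffled, start=1):
        order.append(sample)
        if position % qc_every == 0:
            order.append("QC")
    return order


samples = [f"S{i:02d}" for i in range(1, 11)]
print(build_run_order(samples, n_conditioning_qc=3))
```

Recording the seed together with the sequence allows the injection order to be reconstructed later and included as a meta-variable in the statistical analysis.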
In GC–MS, ionization of analytes is mainly achieved by electron ionization (EI) or chemical ionization (CI), while ion separation is performed by mass analyzers operating on different principles. In EI, analytes that elute from the GC column enter the ion source in the vapor phase and collide with an electron beam at 70 eV. As a result of the high energy imparted by the electrons to the vaporized molecules, characteristic fragmentation occurs, providing structural information. EI is very robust and highly reproducible between instruments, and spectral libraries are available that can be used to search for the identities of unknown compounds based on the m/z values and intensity ratios of the observed fragment ions. A disadvantage of EI is that fragmentation is usually so efficient that the intensity of the molecular ion can be extremely low, or the ion can be absent altogether. For CI, a reagent gas, such as methane or ammonia, is introduced into the source of the mass spectrometer. Protonated gas ions, produced by collision with electrons from an electron beam, ionize the analytes eluting from the column into the ion source. Significantly less energy is transferred to the analytes than in EI, and as a result, the dominant ion is usually the protonated molecule, which preserves the molecular mass information.
Mass spectrometer-based detectors are the ones mainly used in metabolomic analysis and can be grouped according to the spectral information they provide. Low-resolution instruments, such as the quadrupole mass spectrometer (qMS), ion-trap mass spectrometer (IT-MS), and high-speed time-of-flight mass spectrometer (TOF-MS), give nominal molecular weights and fragmentation patterns of an analyte, while high-resolution instruments (high-resolution TOF-MS and hybrids) give accurate masses from which the elemental composition can be derived. The single quadrupole mass analyzer is widely used and relatively inexpensive. The ions move along the axis of four parallel rods to which a direct current (DC) and an alternating current (AC) voltage are applied. These voltages affect the trajectory of the ions traveling between the rods in such a way that only ions of a given m/z are transmitted at a given point in time. Scan speeds are rather low on quadrupole instruments. Considering the very high separation power of GC, with peak widths of only a few seconds, it can therefore be difficult to acquire several spectra across the width of a typical peak on a single quadrupole instrument. Time-of-flight (TOF) instruments are the most common mass analyzers in GC–MS-based metabolomics. The ions are accelerated in an electric field in which ions with the same charge acquire the same kinetic energy but different velocities depending on their mass-to-charge ratio (m/z). Subsequently, the ions enter a field-free region (flight tube), where they separate based on their m/z. TOF instruments have the fastest acquisition rate among all mass analyzers: a significant number of spectra can be acquired across each peak, leading to higher sensitivity and better spectral quality.
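The TOF principle described above follows from equating the energy gained in the accelerating field with the kinetic energy of the ion, zeU = mv²/2, so that the flight time over a field-free length L is t = L·sqrt(m / (2zeU)). The Python sketch below evaluates this relation and estimates how many spectra fall across a chromatographic peak. The flight length, accelerating voltage, peak width, and acquisition rates are illustrative assumptions and do not describe any particular instrument.

```python
import math

AMU_KG = 1.66053906660e-27    # atomic mass unit in kg
E_CHARGE_C = 1.602176634e-19  # elementary charge in C


def tof_flight_time_us(mz, flight_length_m=1.0, accel_voltage_v=2000.0):
    """Flight time in microseconds for an ion of given m/z (Da per charge).

    The default length and voltage are assumed values for illustration.
    """
    velocity = math.sqrt(2.0 * E_CHARGE_C * accel_voltage_v / (mz * AMU_KG))
    return flight_length_m / velocity * 1e6


def spectra_per_peak(peak_width_s, acquisition_rate_hz):
    """Number of spectra acquired across a peak of the given width."""
    return peak_width_s * acquisition_rate_hz


# Heavier ions arrive later: flight time scales with sqrt(m/z).
print(tof_flight_time_us(100.0), tof_flight_time_us(400.0))

# A 3 s wide peak sampled at 2 spectra/s versus 100 spectra/s.
print(spectra_per_peak(3.0, 2.0), spectra_per_peak(3.0, 100.0))
```

The second calculation shows why acquisition rate matters for narrow GC peaks: only a handful of spectra describe the peak at low rates, whereas a fast analyzer records enough points for deconvolution and reliable peak integration.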
GC–MS has very high sensitivity and can therefore be used for the analysis of less commonly encountered samples that might only be available in trace amounts. One-dimensional GC–MS analysis provides suitable resolving power for the analysis of relatively simple mixtures of VOCs. Nevertheless, volatilome samples can be very complex mixtures, involving a plethora of chemical structures over a wide range of polarities, so that the restricted chromatographic resolution commonly limits identification via MS to the more abundant compounds. Complex mixtures can be better resolved by comprehensive two-dimensional gas chromatography–mass spectrometry (GC × GC–MS), which has been defined as “…an orthogonal two-column separation, with complete transfer of a solute from the separation system 1 (column 1) to the separation system 2 (column 2), such that the separation performance from each system (column) is preserved”. In GC × GC, two columns of different polarity (usually a nonpolar column in the first dimension and a moderately polar column in the second) are run in series. Analytes eluting from the first dimension (1D) column are trapped, focused, and then rapidly injected, as a narrow band a few milliseconds wide, into the second dimension (2D) column, and the eluting peaks are detected by MS. The transfer process is actuated by a modulator, a thermal or valve-based focusing system. Each modulator cycle takes a fixed time (4–8 s), and each fraction injected online into the second column must be analyzed within the duration of the following modulation cycle. The challenge is to avoid continuously transmitting analyte onto the second column, which would lead to a loss of resolution. A solution to this problem is to make the separation on the second column much faster than the separation on the first column. The volume of data generated is significantly larger than that obtained in a one-dimensional analysis.
However, this approach separates a larger number of the components in the sample. Although single qMS instruments are cheaper, can provide very low LODs via selected ion monitoring (SIM), and can reach maximum acquisition rates (20,000 amu/s) suitable for metabolic profiling, TOF has become the preferred MS analyzer for GCxGC volatilome analysis. TOF-MS instruments are capable of full-spectrum collection rates up to 500 Hz with improved sensitivity. In addition, high-resolution mass spectrometry (HRMS) provides accurate mass data, which increases identification confidence and allows molecular formulas to be annotated for unknown compounds; this is especially useful in untargeted metabolomic studies.
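The practical meaning of these acquisition rates can be illustrated with a small calculation. The sketch below uses the 20,000 amu/s and 500 Hz figures quoted above; the scan window (m/z 50 to 450) and the 0.2 s second-dimension peak width are assumptions chosen only for illustration, and inter-scan delays are ignored.

```python
def spectra_per_second(scan_rate_amu_per_s, mz_low, mz_high):
    """Full-scan spectra per second for a scanning analyzer.

    Ignores inter-scan delay, so the result is an upper bound.
    """
    mass_range = mz_high - mz_low
    if mass_range <= 0:
        raise ValueError("mz_high must be greater than mz_low")
    return scan_rate_amu_per_s / mass_range


def spectra_across_peak(acquisition_rate_hz, peak_width_s):
    """Number of spectra collected while one chromatographic peak elutes."""
    return acquisition_rate_hz * peak_width_s


# Assumed values: scan window m/z 50-450 and a 0.2 s second-dimension peak.
PEAK_WIDTH_S = 0.2
qms_rate_hz = spectra_per_second(20_000, 50, 450)
qms_points = spectra_across_peak(qms_rate_hz, PEAK_WIDTH_S)
tof_points = spectra_across_peak(500, PEAK_WIDTH_S)

print(f"qMS: {qms_rate_hz:.0f} spectra/s, {qms_points:.0f} spectra per peak")
print(f"TOF: 500 spectra/s, {tof_points:.0f} spectra per peak")
```

With these assumed values the quadrupole collects about 10 spectra across the peak, against about 100 for the TOF, which is one way to see why TOF analyzers are favored for the narrow peaks produced by GCxGC.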
Metabolite identification remains a major complication. Although EI generates highly reproducible fragmentation spectra, only a relatively small percentage of metabolites can be identified by searching databases, mainly because these have traditionally been repositories of EI spectra of synthetic organic compounds. Only recently has the number of metabolite spectra started to increase. A more powerful identification method involves comparing both the EI/CI spectra and the retention indices with those obtained by analyzing a reference compound under identical analytical conditions. If commercial standards are not available, metabolite identification can be cumbersome.
Retention indices (RIs) were first introduced by Kováts for isothermal analysis and then by Van den Dool for temperature-programmed analysis (linear retention indices, LRIs). They are calculated against a homologous series of linear hydrocarbons run under the same GC conditions as the samples. RIs can also be calculated automatically using the Automated Mass Spectral Deconvolution and Identification System (AMDIS), freely available from the National Institute of Standards and Technology (NIST) (http://www.amdis.net/).
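For temperature-programmed runs, the linear retention index of an analyte is obtained by linear interpolation between the two n-alkanes that bracket it. The following is a minimal sketch of that calculation; the function name, the alkane retention times, and the analyte retention time are invented for illustration and are not data from the chapter.

```python
def linear_retention_index(t_analyte, alkane_times):
    """Linear retention index by interpolation between bracketing n-alkanes.

    alkane_times maps carbon number -> retention time (same units as
    t_analyte), measured under the same GC conditions as the sample.
    """
    ladder = sorted(alkane_times.items())  # ordered by carbon number
    for (n, t_n), (n_next, t_next) in zip(ladder, ladder[1:]):
        if t_n <= t_analyte <= t_next:
            fraction = (t_analyte - t_n) / (t_next - t_n)
            # (n_next - n) equals 1 when consecutive alkanes are used.
            return 100.0 * (n + (n_next - n) * fraction)
    raise ValueError("analyte elutes outside the alkane ladder")


# Hypothetical alkane ladder (C8-C11), retention times in minutes.
alkanes = {8: 5.20, 9: 7.10, 10: 9.00, 11: 10.85}
lri = linear_retention_index(8.05, alkanes)
print(round(lri))  # the analyte elutes halfway between C9 and C10
```

An analyte eluting midway between nonane and decane therefore receives an index close to 950, which can be compared with tabulated values for the same stationary phase.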
To identify unknown compounds, their background-subtracted EI spectra are searched against EI libraries (such as the NIST library). Values of
The high variability of the data, which depends on the composition of the investigated matrix, makes it hard to recommend a universal approach for quantitatively evaluating the volatilome composition. The most widely used approaches are: (a) relative percentage abundance, (b) internal standard normalized percentage abundance, and (c) “absolute” or true quantitation of one or more target components, with or without a validated method. Relative percentage abundance can be applied only to evaluate relative component ratios within the same sample. Internal standard normalized percentage abundance is the ideal approach when a group of samples is compared: raw data must first be corrected for the response factors of the analytes at the detector and then normalized vs. an internal standard. Percentage abundance must be calculated vs. the sum of the areas of a fixed number of selected components found in all the samples. The quantitation of marker components is obtained from the chromatographic area in SIM mode vs. an internal (or external) standard and calculated via a calibration curve constructed from known amounts of pure standards in the selected concentration range.
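The three approaches can be sketched in a few lines. This is only an illustration under stated assumptions: the response factor is taken here as detector response per unit amount (so areas are divided by it), the calibration curve is assumed to be linear, and all compound names, areas, and function names are invented.

```python
def relative_percent(areas):
    """(a) Each peak area as a percentage of the summed areas of one sample."""
    total = sum(areas.values())
    return {name: 100.0 * area / total for name, area in areas.items()}


def istd_normalized(areas, response_factors, istd_area):
    """(b) Response-factor-corrected areas relative to the internal standard."""
    return {
        name: (area / response_factors.get(name, 1.0)) / istd_area
        for name, area in areas.items()
    }


def fit_calibration(amounts, responses):
    """(c) Least-squares line through calibration points: (slope, intercept)."""
    n = len(amounts)
    mean_x = sum(amounts) / n
    mean_y = sum(responses) / n
    sxx = sum((x - mean_x) ** 2 for x in amounts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts, responses))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def quantify(response, slope, intercept):
    """Invert the calibration line to obtain the amount of analyte."""
    return (response - intercept) / slope


# Hypothetical peak areas for two analytes in one sample.
areas = {"hexanal": 300.0, "limonene": 700.0}
percent = relative_percent(areas)
ratios = istd_normalized(areas, {"hexanal": 1.0, "limonene": 1.4}, istd_area=200.0)
slope, intercept = fit_calibration([1, 2, 4, 8], [0.5, 1.0, 2.0, 4.0])
amount = quantify(1.5, slope, intercept)
```

The percentages from approach (a) are only meaningful within one sample, whereas the ratios from approach (b) can be compared across samples that received the same amount of internal standard.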
Common non-separative techniques used to study the volatilome by mass spectrometry are selected-ion flow-tube mass spectrometry (SIFT-MS) and proton-transfer-reaction mass spectrometry (PTR-MS). These techniques rely on soft chemical ionization and allow on-line detection of VOCs at low detection limits without the need for pre-concentration or sample preparation, which facilitates obtaining reproducible results. For example, Vendel and co-workers used SIFT-MS and HS-SPME-GC–MS for the analysis of strawberry aroma. Although both techniques provided similar results in the study of fruit ripening, the SIFT-MS analysis was about 11 times faster than HS-SPME-GC–MS. Moreover, SIFT-MS showed low detection limits, so that postharvest analysis can easily be performed on individual fruits. Capellin and collaborators carried out a similar study using PTR-TOF-MS to study the volatilome of clones belonging to three types of apple. They concluded that PTR-TOF-MS is a very useful tool for volatilome studies because it provides a rapid and non-invasive fingerprint of the VOC profile of single apple fruits.
With an alternative focus, the chromatographic system can be coupled to an olfactometric detector to identify the aroma-active compounds present in a given volatilome. This type of analysis determines which compounds generate a positive response at the electronic nose detector; they are then identified by comparing their mass spectrum, retention index, and odor description with those of reference compounds. Using gas chromatography-olfactometry-mass spectrometry (GC-O-MS), Zhu and co-workers studied the volatile profile of three cultivars of mulberries, establishing benzaldehyde, ethyl butanoate, (E)-2-nonenal, 1-hexanol, hexanal, methional, 3-mercaptohexyl acetate, and 3-mercapto-1-hexanol as the main compounds responsible for the characteristic aroma of mulberry.
5. Volatilome data processing
Once the raw data have been acquired following chromatographic separation and mass spectrometry analysis, the large amount of data generated needs to be processed following a standardized procedure that includes data conversion, pre-processing, pre-treatment, and metabolite annotation. An additional step, sharing the data derived from any metabolomics analysis, is currently optional for researchers but highly recommended.
5.1 Extract raw files from instruments and proceed to data conversion
Data processing starts with a set of raw data files for the different samples. Usually, the default vendor formats from instruments need to be converted. A useful toolkit compatible with several instrument formats is ProteoWizard (http://proteowizard.sourceforge.net/download.html). Open formats supported by many software packages are Network Common Data Form (NetCDF), Extensible Markup Language (mzXML), and Mass Spectrometry Markup Language (mzML). Each file is processed into an easily accessible and more informative data table, where rows represent samples and columns represent different features of the volatilome. The values in this matrix are peak areas or heights, which stand for relative concentration. The data should be checked for missing values and possible outliers.
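A quick check of the feature table for missing values and outliers can be sketched as follows. This is an illustration only: the table is invented, missing values are represented by None, and flagging samples whose total signal lies more than three median absolute deviations from the median is one assumed criterion among many possible ones.

```python
from statistics import median


def missing_fraction(table, features):
    """Fraction of samples in which each feature has no recorded value."""
    n_samples = len(table)
    return {
        feature: sum(1 for row in table.values() if row.get(feature) is None)
        / n_samples
        for feature in features
    }


def outlier_samples(table, k=3.0):
    """Samples whose total signal is more than k MADs away from the median."""
    totals = {
        sample: sum(v for v in row.values() if v is not None)
        for sample, row in table.items()
    }
    center = median(totals.values())
    mad = median(abs(total - center) for total in totals.values())
    if mad == 0:
        return []
    return [s for s, total in totals.items() if abs(total - center) / mad > k]


# Hypothetical table: rows are samples, columns are features (peak areas).
table = {
    "S1": {"F1": 100, "F2": 50, "F3": None},
    "S2": {"F1": 110, "F2": 55, "F3": 20},
    "S3": {"F1": 95, "F2": None, "F3": 22},
    "S4": {"F1": 105, "F2": 52, "F3": 18},
    "S5": {"F1": 900, "F2": 600, "F3": 300},
}
missing = missing_fraction(table, ["F1", "F2", "F3"])
outliers = outlier_samples(table)
```

In this invented example, features F2 and F3 are each missing in one of the five samples, and sample S5 is flagged because its total signal is far above that of the others.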
5.2 Set parameters to perform data pre-processing
Pre-processing involves setting different filters to distinguish signals from noise, selecting masses or intensities to perform feature detection, and finally adjusting the retention-time shift parameters needed to align features across all samples. The aim of pre-processing is to minimize the number of false-positive features and to establish quantitative procedures for discarding less reliable signals with a low signal-to-noise ratio or low prevalence within a similar set of samples.
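The filtering step can be illustrated with a simple rule: keep a feature only if, in at least one group of similar samples, enough samples show it above a signal-to-noise threshold. The sketch below is hypothetical; the thresholds, the single noise estimate, the sample names, and the function name are assumptions and do not reproduce the behavior of any particular software package.

```python
def filter_features(intensities, groups, noise, min_snr=3.0, min_prevalence=0.5):
    """Keep features detected reliably in at least one sample group.

    intensities: feature -> {sample: intensity}
    groups:      sample -> group label
    noise:       noise level used to compute signal-to-noise ratios
    """
    members = {}
    for sample, label in groups.items():
        members.setdefault(label, []).append(sample)

    kept = []
    for feature, values in intensities.items():
        for samples in members.values():
            detected = sum(
                1 for s in samples if values.get(s, 0.0) / noise >= min_snr
            )
            if detected / len(samples) >= min_prevalence:
                kept.append(feature)
                break  # one qualifying group is enough
    return kept


# Hypothetical data: two groups (A, B) with two samples each.
groups = {"A1": "A", "A2": "A", "B1": "B", "B2": "B"}
intensities = {
    "F1": {"A1": 100, "A2": 80, "B1": 0, "B2": 5},   # consistent in group A
    "F2": {"A1": 12, "A2": 0, "B1": 15, "B2": 0},    # never above S/N of 3
    "F3": {"A1": 40, "A2": 0, "B1": 0, "B2": 35},    # sporadic detections
}
kept = filter_features(intensities, groups, noise=10.0, min_prevalence=0.75)
```

Here only F1 is retained: F2 never exceeds the signal-to-noise threshold and F3 is detected in just half of the samples of each group.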
5.3 Choose the best method to perform data pre-treatment
Pre-treatment, or data correction, is one of the most important steps in data analysis because systematic and technical variation can obscure relevant biological patterns. The variation in the data resulting from a metabolomics experiment is the sum of the induced variation and the total uninduced variation. Some sources of variation can be controlled by researchers through careful experimental design. In other cases, this variation is very difficult to control: natural variation in the metabolism of an organism can cause 5000-fold differences in signal intensities for different metabolites, sampling may not be performed under exactly the same conditions for all samples, sample work-up varies naturally between batches, and analytical errors are always present. This variation can be accounted for using different classes of corrections that include centering, scaling, transformation, and normalization of raw data, and several methods are available to do so (e.g., autoscaling, pareto scaling, range scaling, vast scaling, log transformation, power transformation, normalization by sum, and normalization by a reference sample). The selection of the most appropriate method depends on the hypothesis to be tested and the statistical behavior of the data matrix. Before applying pre-treatment methods, it is necessary to check whether the data are fit for analysis. For example, a given treatment may enhance the results of a clustering method (if the hypothesis relates to the comparison of similarities) while obscuring the results of a principal component analysis (PCA) (if, in contrast, the hypothesis relates to determining redundancy between metabolites).
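A few of the corrections listed above can be written compactly. The sketch below shows autoscaling, pareto scaling, log transformation, and normalization by sum for plain Python lists; the function names and example values are illustrative, the sample standard deviation (n - 1 denominator) is assumed, and the offset added before the logarithm is an assumed way of handling zeros.

```python
from math import log10, sqrt
from statistics import mean, stdev


def autoscale(values):
    """Center one feature on its mean and divide by its standard deviation."""
    m, s = mean(values), stdev(values)
    if s == 0:
        raise ValueError("feature has zero variance")
    return [(v - m) / s for v in values]


def pareto_scale(values):
    """Center one feature and divide by the square root of its standard deviation."""
    m, s = mean(values), stdev(values)
    if s == 0:
        raise ValueError("feature has zero variance")
    return [(v - m) / sqrt(s) for v in values]


def log_transform(values, offset=1.0):
    """Log10 transformation; the offset avoids taking the logarithm of zero."""
    return [log10(v + offset) for v in values]


def normalize_by_sum(sample_row):
    """Divide every intensity of one sample by that sample's total signal."""
    total = sum(sample_row)
    return [v / total for v in sample_row]


feature = [10.0, 20.0, 30.0]           # one feature measured in three samples
auto = autoscale(feature)              # unit variance after scaling
pareto = pareto_scale(feature)         # large features are shrunk less strongly
normalized = normalize_by_sum([1.0, 3.0])
```

Scaling operates on one feature across samples (a column of the data matrix), whereas normalization operates on one sample across features (a row), which is why the two are usually applied as separate steps.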
5.4 Metabolite annotation
Analysis of pure standards from different compound families is advisable so that the retention times of the compounds can be compared. When no pure standard is available for a metabolite, it can be characterized by comparison with homologues from the same compound family, together with a detailed analysis of the fragmentation pattern. Metabolite annotation is still challenging despite all the efforts made to establish specialized databases with the mass spectral properties of different metabolites. Annotation and identification levels for metabolites were defined by the Chemical Analysis Working Group of the Metabolomics Standards Initiative (MSI). Level 1 indicates confidently identified compounds, level 2 is used for putatively annotated compounds, level 3 is used for putatively characterized compound classes, and level 4 is used for unidentified or unclassified metabolites that can still be differentiated and quantified based upon spectral data. Dark matter, also called “unknown unknowns”, represents the majority of the metabolites analyzed in a metabolomics experiment, because instruments collect much more information than it is currently possible to annotate. It is estimated that, on average, only 2% of the data can be annotated. This problem is even more common in metabolomics analyses of animals, because many databases specialize in human-derived metabolites and some molecular structures from animals have been solved but are absent from the reference databases. Analyses of non-model organisms tend to yield a higher number of truly novel compounds, the so-called “unknown unknowns”.
As it is impossible to collect spectra for every molecule in the universe, computer-generated (in silico) spectral prediction algorithms are also recommended during metabolite annotation such as CSI:FingerID (https://www.csi-fingerid.uni-jena.de/) and Competitive Fragmentation Modeling-ID (CFM-ID, https://cfmid.wishartlab.com/) for analyzing fragmentation patterns. For volatilome analysis NIST (https://www.mswil.com/software/spectral-libraries-and-databases/nist20/) and Wiley (https://www.mswil.com/software/spectral-libraries-and-databases/wiley-spectral-libraries/wiley-gcms-libraries/) electronic collections are the most used mass spectra databases. The Dictionary of Natural Products (DNP) (http://dnp.chemnetbase.com/faces/chemical/ChemicalSearch.xhtml;jsessionid=DBE98AD72918A1607A7E739064D0DB21), Pherobase (https://www.pherobase.com/), Human Metabolome Database (HMDB) (https://hmdb.ca/), METLIN (https://metlin.scripps.edu/landing_page.php?pgcontent=mainPage), MassBank Japan (http://www.massbank.jp/), MassBank Europe (https://massbank.eu/MassBank/), MassBank North America (https://mona.fiehnlab.ucdavis.edu/), Supernatural II (http://bioinf-applied.charite.de/supernatural_new/index.php), ChEMBL (https://www.ebi.ac.uk/chembl/), Mass Spectral and GC Data of Drugs, Poisons, Pesticides, Pollutants, and Their Metabolites (https://www.wiley.com/en-gb/Mass+Spectral+and+GC+Data+of+Drugs%2C+Poisons%2C+Pesticides%2C+Pollutants%2C+and+Their+Metabolites%2C+5th+Edition-p-9783527342877) and vocBinBase (https://bitbucket.org/fiehnlab/binbase/src/master/) are other useful resources. When compound annotation is not possible and only chemical class could be assigned to a metabolite it is recommended to employ the comprehensive, and computable chemical taxonomy from Classyfire (http://classyfire.wishartlab.com/). See  for a review focused on mass spectral databases for LC/MS- and GC/MS-based metabolomics. 
For the analysis of mVOCs, a software tool that allows the characterization of mass spectra obtained from microorganisms was developed in 2014. Its 2018 update, called mVOC database 2.0 (http://bioinformatics.charite.de/mvoc), contains more than 2000 compounds from more than 1000 species. This tool allows a more precise characterization of the volatilomes of the microbes studied to date.
6. Select the statistical analysis best suited to the research question and coherent with data pre-treatment
Select the univariate statistics according to the variables of interest. The t-test, U-test, and analysis of variance (ANOVA) are the most common univariate statistics employed for data mining in volatilomics. As datasets usually include a large number of features, the significance level should be set appropriately to limit the number of false positives and false negatives. To reduce false positives, family-wise error rate (FWER) correction, such as the Bonferroni correction, is a conservative approach in which the p-values are multiplied by the number of comparisons. In contrast, to reduce false negatives, false discovery rate (FDR) correction is a highly sensitive method.
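The difference between the two corrections is easy to see on a small set of p-values. The sketch below implements the Bonferroni adjustment described above and, as one common example of an FDR procedure, the Benjamini-Hochberg adjustment; the chapter does not name a specific FDR method, so that choice is an assumption, and the p-values are invented.

```python
def bonferroni(pvalues):
    """FWER control: multiply each p-value by the number of tests (capped at 1)."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """FDR control: Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four features.
pvals = [0.01, 0.04, 0.03, 0.20]
p_bonf = bonferroni(pvals)
p_bh = benjamini_hochberg(pvals)
significant_bonf = [p <= 0.05 for p in p_bonf]
```

In this invented example only the first feature remains below 0.05 under either correction, but the Benjamini-Hochberg values for the second and third features (about 0.053) are much closer to the threshold than their Bonferroni counterparts (0.16 and 0.12), which illustrates why FDR control yields fewer false negatives.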
7. Select the most suitable multivariate statistics
Multivariate statistical methods are very powerful at summarizing the large, multidimensional datasets generated in volatilomics. As with pre-treatment methods, multivariate analyses should be chosen carefully and coherently with the hypothesis of interest and the methods used for data pre-treatment. Unsupervised and supervised approaches differ in how samples are grouped within the multivariate calculations. Unsupervised methods have access only to the data matrix to find features useful for grouping and categorizing the samples. Clustering methods such as hierarchical clustering analysis (HCA), K-means clustering, and self-organizing maps, as well as principal component analysis (PCA), belong to this group. Once the data have been analyzed by unsupervised methods, supervised methods (e.g., partial least squares discriminant analysis (PLS-DA), artificial neural networks, and evolutionary algorithms) should be applied for further evaluation. Supervised methods have access to qualitative or quantitative traits (e.g., species, location, body size, tissue type) in addition to the matrix of measurements, and can classify samples. Volcano plots have also recently been used to identify significantly covarying metabolites in binary comparisons. A volcano plot shows each feature's statistical significance (p-value) on the y-axis and its fold change along the x-axis.
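The coordinates of a volcano plot can be computed directly from group means and p-values. The sketch below follows the usual convention of plotting log2 fold change against -log10(p), which is an assumption about presentation rather than something specified in the chapter; the feature names, means, p-values, and thresholds are invented, and the p-values are taken as given rather than computed.

```python
from math import log2, log10


def volcano_point(mean_a, mean_b, p_value):
    """Return (x, y) = (log2 fold change of B over A, -log10 p-value)."""
    return log2(mean_b / mean_a), -log10(p_value)


def select_features(features, min_abs_log2fc=1.0, alpha=0.05):
    """Features that pass both the fold-change and the significance threshold."""
    selected = []
    for name, (mean_a, mean_b, p_value) in features.items():
        x, _ = volcano_point(mean_a, mean_b, p_value)
        if abs(x) >= min_abs_log2fc and p_value < alpha:
            selected.append(name)
    return selected


# Hypothetical features: (mean in group A, mean in group B, p-value).
features = {
    "F1": (100.0, 400.0, 0.001),   # large change, significant
    "F2": (100.0, 110.0, 0.0005),  # significant but small change
    "F3": (200.0, 50.0, 0.2),      # large change, not significant
}
x1, y1 = volcano_point(*features["F1"])
selected = select_features(features)
```

Only F1 falls in the upper outer region of the plot, where features are both strongly changed and statistically significant; F2 and F3 each fail one of the two criteria.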
7.1 Determine if network inference provides better insights for data interpretation
Correlation networks are a visualization tool that summarizes the positive and negative correlations found between samples that represent different biological processes. Molecular networking organizes metabolite features from a volatilomics analysis into a connectivity network based on similarities in the molecular fragmentation patterns obtained by mass spectrometry. This analysis clusters families of molecules through vector correlations between fragment ions and enhances the interpretation of volatilome differentiation using a chemically informed visualization. It also enhances the annotation process with experimental and in silico databases. When volatilomic and genomic analyses can be combined, molecular networking can also be useful to prioritize features by linking observed natural products to their cognate biosynthetic gene clusters and gene cluster families.
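The idea of connecting features by the similarity of their fragmentation patterns can be sketched with a plain cosine score between binned spectra. This is a simplification for illustration: published molecular networking tools use more elaborate scoring, and the spectra, bin width, similarity threshold, and function names below are all assumptions.

```python
from math import sqrt


def bin_spectrum(peaks, bin_width=1.0):
    """Sum the intensities of (m/z, intensity) peaks that fall in the same bin."""
    binned = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(a, b):
    """Cosine of the angle between two binned spectra (0 = unrelated, 1 = identical)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_network(spectra, threshold=0.7):
    """Return edges (feature_1, feature_2, score) for pairs above the threshold."""
    names = sorted(spectra)
    binned = {name: bin_spectrum(spectra[name]) for name in names}
    edges = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            score = cosine_similarity(binned[first], binned[second])
            if score >= threshold:
                edges.append((first, second, score))
    return edges


# Invented spectra: A and B share fragments, C has a different pattern.
spectra = {
    "A": [(41, 50), (69, 100), (93, 80)],
    "B": [(41, 45), (69, 100), (93, 70)],
    "C": [(57, 100), (71, 60), (85, 40)],
}
edges = build_network(spectra)
```

Features A and B end up connected by a single edge with a score close to 1, while C remains an isolated node, mirroring how families of structurally related molecules cluster together in a molecular network.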
7.2 Whenever possible, share data in public repositories
Recently, many researchers have shared raw data files in open repositories, and this has motivated computer scientists to develop modern algorithms that facilitate the comparison of MS spectra obtained under different conditions. This comparison still requires inspection by experts trained in mass spectrometry fragmentation patterns, because it is not an automatic process. Some examples of sites that allow raw experimental data to be shared in public repositories include MetaboLights (http://www.ebi.ac.uk/metabolights/), the Metabolomics Workbench (https://www.metabolomicsworkbench.org/), XCMS Online (https://xcmsonline.scripps.edu/landing_page.php?pgcontent=mainPage), MetabolomeExpress (https://www.metabolome-express.org/), GNPS (https://gnps.ucsd.edu/ProteoSAFe/static/gnps-splash.jsp) and the Metabolomic Repository Bordeaux (http://services.cbib.u-bordeaux.fr/MERYB/).
Current technological advances in sample collection, extraction techniques, volatile profiling, and data processing make it possible to analyze an invisible world in which VOCs mediate different ecological processes, and to recover a more accurate picture of the complex chemical communication that occurs in nature. Researchers need to follow different combinations of procedures to answer specific scientific questions or hypotheses. Microextraction techniques are emerging as tools that increase extraction efficiency while shortening extraction times and avoiding the environmental impact of large volumes of solvent waste. Gas chromatography has played a fundamental role in detecting volatile compounds that are often present at trace levels. Mass spectrometry has proved to be the preferred technique for the structure elucidation of new compounds and the annotation of known VOCs. Current improvements in data analysis make it possible to extract more biologically relevant information from a single study and to standardize procedures for evaluating hypotheses properly. All these steps are of paramount importance to evaluate both the ecological function of these compounds and their economic value in the medical, agricultural, flavor, and fragrance industries.
The authors thank the Department of Chemistry and Vicerrectoria de Investigaciones at Universidad de los Andes, Bogotá, Colombia for financial support. We wish to thank Ministerio de Ciencia, Tecnología e Innovación (MinCiencias) for Julie Paulin Garcia Rodriguez (No 679), Mabel Gonzalez (No 757) and Gerson-Dirceu López (No 785), as well as the support to No. 44842-058-2018 and No. 80740-532-2019 projects. Also, the Faculty of Sciences of the Universidad de los Andes forgivable loan and research funds (INV-2018-2033-1259, INV-2019-2067-1747, INV-2018-2048-1338, and INV-2019-2086-1843). Scholarship granted by Fulbright to Mabel González as a Visiting Scholar at the Dorrestein Laboratory at Skaggs School of Pharmacy & Pharmaceutical Sciences, University of California, San Diego, United States.
Conflict of interest
The authors declare no conflict of interest.
|AMDIS||Automated Mass Spectral Deconvolution and Identification System|
|ANOVA||Analysis of variance|
|ATCC||American Type Culture Collection|
|CFM-ID||Competitive Fragmentation Modeling-Id|
|CITES||Convention on International Trade in Endangered Species of Wild Fauna and Flora|
|DNP||Dictionary of Natural Products|
|FWER||Family Wise Error Rate|
|FDR||False Discovery Rate|
|GC–MS||Gas Chromatography–Mass Spectrometry|
|GC-O-MS||Gas Chromatography-Olfactometry-Mass Spectrometry|
|GNPS||Global Natural Products Social Networking|
|GCxGC||Comprehensive Two-Dimensional Gas Chromatography|
|HF-LPME||Hollow Fiber Liquid-Phase Microextraction|
|HMDB||Human Metabolome Database|
|HRMS||High-Resolution Mass Spectrometry|
|IT-MS||Ion-Trap Mass Spectrometer|
|LRI||Linear Retention Indices|
|MSI||Metabolomics Standards Initiative|
|mVOCs||Microbial Volatile Organic Compounds|
|mzML||Mass Spectrometry Markup Language|
|mzXML||Extensible Markup Language|
|NetCDF||Network Common Data Form|
|NIST||National Institute of Standards and Technology|
|PCA||Principal Component Analysis|
|PLS-DA||Partial Least Squares Discriminant Analysis|
|PTR-MS||Proton-Transfer-Reaction Mass Spectrometry|
|PTV||Programmed Temperature Vaporizer|
|qMS||Quadrupole Mass Spectrometer|
|SAFE||Solvent Assisted Flavor Evaporation|
|SBSE||Stir Bar Sorptive Extraction|
|SDME||Single Drop Microextraction|
|SIFT-MS||Selected-Ion Flow-Tube Mass Spectrometry|
|SIM||Selected Ion Monitoring|
|SLM||Supported Liquid Membrane|
|TOF-MS||Time-of-Flight Mass Spectrometer|
|VOCs||Volatile Organic Compounds|