Quality Control of Biomarkers: From the Samples to Data Interpretation



Introduction
Recent advances in biotechnology and an improved understanding of disease mechanisms and pathophysiology have strongly shifted the treatment paradigm from empirical knowledge to targeted therapy. Science has enhanced its ability to guide the application of new and existing treatments through the development, assay verification, biological validation and application of biomarkers; however, success requires a thorough understanding of the relationship between the choice of a biomarker and its influence on treatment effects. [1] Current biochemical and molecular biological knowledge holds that genetic information flows from genomic DNA to mRNA transcripts, which are then translated into proteins; this class of molecules, which includes enzymes, directly influences the concentrations of their substrates and products, which are integral parts of several tightly controlled metabolic pathways. Finally, the existence and multiple interactions of these low-molecular-weight metabolites within a cell, tissue or organism generate a phenotype. [2] The metabolome, the link between genotype and phenotype, is the most downstream comprehensive grouping of the products of the genome and comprises the total complement of all low-molecular-weight molecules (metabolites) in a cell, tissue or organism that are required for growth, maintenance or basal function in a given physiological state. [3] The potential size of the metabolome is debatable, as studies increasingly suggest that the resident microflora and its metabolic products play an important role. [2] Monitoring changes in metabolites has been a primary indicator of disease and has made it possible to diagnose disease in individuals. For that reason, the measurement of metabolites has become an essential part of clinical practice.
The use of a wide range of biological fluids and tissues, such as blood (both plasma and serum), saliva, cerebrospinal fluid (CSF), synovial fluid, urine, semen and tissue homogenates, has made metabolites a very powerful and widely used diagnostic tool. [4] Despite significant advances in analytical technologies in the past few years, the discovery of metabolomic biomarkers in biological fluids remains a challenge. As discussed, metabolites play an important role in biological systems and are therefore attractive candidates for understanding disease phenotypes. [5][6] The metabolome comprises a diverse group of low-molecular-weight structures, including lipids, amino acids, peptides, nucleic acids, organic acids, vitamins, thiols and carbohydrates, among others. [7] Biomarkers are defined as "characteristics that are objectively measured and evaluated as indicators of normal biological processes, pathogenic processes or pharmacological responses to therapeutic intervention". They can be categorized as biomarkers of exposure, biomarkers of effect and biomarkers of susceptibility. [8] These characteristics are informative for clinical outcome and can broadly be classified as prognostic or predictive biomarkers. [9][10] Given the variety of chemical classes and physical properties of metabolites, as well as concentrations spanning many orders of magnitude, it becomes clear why metabolomic research, which aims at comprehensive metabolite assessment, must employ an extensive array of analytical techniques. [11][12] Because it enables the parallel assessment of the levels of a broad range of endogenous and exogenous metabolites, metabolomics has been shown to have a great impact on the investigation of physiological status, disease diagnosis, biomarker discovery and the identification of pathways disrupted by disease or treatment. [13][14]

Mass spectrometry in metabolomics
Nowadays, mass spectrometry is one of the most promising approaches for identifying and quantifying known and unknown molecules within very complex samples, and for elucidating the structure and chemical properties of different compounds. A mass spectrometer consists of three major components: (1) an ion source, which produces gaseous ions from the substance being studied, for example by electron impact (EI), chemical ionization (CI), electrospray ionization (ESI), atmospheric pressure chemical ionization (APCI), atmospheric pressure photoionization (APPI) or thermospray ionization (TSI), among others; (2) an analyzer, which resolves or separates ions according to their mass-to-charge ratios, for example a quadrupole, time-of-flight, ion trap, Fourier transform ion cyclotron resonance or orbitrap analyzer, among others; and (3) a detector system, which detects the ions and records the relative abundance of each resolved ionic species, for example an electron multiplier, microchannel plate detector, Daly detector or Faraday cup, among others. The mass spectrometry technique relies on converting neutral molecules into gaseous ions, with or without fragmentation, which are then characterized by their mass-to-charge ratios (m/z) and relative abundances. Samples are often introduced through a gas chromatography or liquid chromatography system, which separates their components before ionization and allows different structures and ionic forms to be studied.
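As a minimal illustration of how the mass-to-charge ratio depends on both mass and charge, the Python sketch below computes m/z for protonated ([M+zH]z+) and deprotonated ([M-zH]z-) ions, the species typically formed by soft ionization such as ESI. The function name `ion_mz` is hypothetical, adducts (Na+, K+, NH4+) and isotopes are ignored, and the proton mass is taken as about 1.007276 Da.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton (hydrogen atom minus one electron)


def ion_mz(neutral_mass: float, charge: int) -> float:
    """Return m/z of [M+zH]^z+ for charge > 0 or [M-|z|H]^|z|- for charge < 0."""
    if charge == 0:
        raise ValueError("a neutral molecule has no m/z")
    n = abs(charge)
    # Adding (or removing) n protons changes the mass by n proton masses
    if charge > 0:
        shifted = neutral_mass + n * PROTON_MASS
    else:
        shifted = neutral_mass - n * PROTON_MASS
    return shifted / n


glucose = 180.063388  # monoisotopic mass of C6H12O6 in Da
print(f"[M+H]+  {ion_mz(glucose, 1):.4f}")   # 181.0707
print(f"[M-H]-  {ion_mz(glucose, -1):.4f}")  # 179.0561
```

The same neutral mass therefore appears at different m/z values depending on polarity and charge state, which matters when matching observed ions to candidate metabolites.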
Historically, most metabolite studies have been performed with high-resolution capillary gas chromatography combined with electron impact ionization mass spectrometry (GC-MS). For decades, this configuration allowed the separation and identification of key small molecules from complex mixtures, including fatty acids, amino acids and organic acids in biofluids, generating qualitative and quantitative diagnostic information for several metabolic disorders [15][16].
Despite its age, GC is still a very useful and informative technique that seems far from retirement; however, it has limitations regarding the size and types of metabolites that can be analyzed, and it requires extensive sample preparation. These limitations led to the use of nuclear magnetic resonance (NMR) for metabolite profiling; however, despite the rich structural information this approach provides, NMR has low sensitivity, so only the most abundant compounds can be identified. In contrast to GC-MS and NMR, mass spectrometry coupled to high-performance liquid chromatography (LC-MS), together with tandem mass spectrometry (LC-MS/MS) for post-source fragmentation, especially after soft ionization, makes it possible to analyze a wide range of polar and medium-polarity compounds with good quantification, sensitivity and reproducibility [16].
According to Birkemeier et al. (2005) [17], metabolomic approaches are in dynamic development, and a diversity of synonyms has been suggested, such as metabonomics and metabolite profiling (fingerprinting), among others. Several analytical platforms have been introduced, including spectroscopic methods at various electromagnetic wavelengths, such as infrared (IR), near-infrared (NIR) and ultraviolet (UV) spectroscopy, as well as gas chromatography-mass spectrometry (GC-MS), liquid chromatography-electrospray ionization mass spectrometry (LC-ESI-MS), capillary electrophoresis-mass spectrometry (CE-MS) and liquid chromatography-nuclear magnetic resonance (LC-NMR); these are only a few examples of the technologies used in metabolomic studies. No single approach can analyze the whole range of chemically different biomolecules, so it is important to choose the technology that best fits the target molecules [17]. Hollywood et al. (2006) [18] have summarized the main metabolomic strategies:
1. Metabolite target analysis, a more restricted approach, for example the analysis of the metabolites originating from a particular enzymatic system after any kind of biotic or abiotic disturbance.
2. Metabolite profiling, which focuses on a group of specific metabolites, for example lipids associated with a given metabolic pathway, or compounds relevant to clinical and pharmaceutical analyses, such as those used to map drug metabolism in an organism. This strategy can also be applied together with other approaches, e.g.:
   a. "Metabolite fingerprinting", used to classify samples based both on their biological relevance to the organism and on their origin. Fingerprinting is fast but does not necessarily give specific information about metabolites.
   b. "Metabolite footprinting" (exometabolome or secretome), similar to fingerprinting but aimed at a non-invasive analysis of extracellular metabolites. This technique is generally employed in the study of cultured cells, with the advantage that metabolites need not be extracted and metabolism need not be interrupted at a given moment before the analysis. It can also be used to analyze the secretions of any organism, including the secretome of human embryos in in vitro fertilization, in order to find viable embryos and general disease biomarkers.
3. Metabolomics itself, the comprehensive analysis of the whole metabolome (all measurable metabolites) under specific analytical conditions. This term is frequently confused with metabonomics, which focuses on a wider profile of metabolites involved in different metabolic pathways that interact under the effect of external stimuli, including diseases, drugs and toxins, among others.
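Fingerprinting compares whole spectra rather than identified metabolites. One common way to do this, assumed here rather than taken from [18], is to sum peak intensities into fixed-width m/z bins and normalize each vector to its total intensity, so that samples can then be compared or classified. The function name `bin_spectrum` and its default bin settings are illustrative choices only.

```python
def bin_spectrum(peaks, bin_width=1.0, mz_min=50.0, mz_max=1000.0, normalize=True):
    """Turn a peak list [(mz, intensity), ...] into a fixed-length fingerprint vector."""
    n_bins = int(round((mz_max - mz_min) / bin_width))
    vector = [0.0] * n_bins
    for mz, intensity in peaks:
        if mz_min <= mz < mz_max:  # peaks outside the window are ignored
            # Clamping guards against floating-point rounding at the upper edge
            idx = min(int((mz - mz_min) // bin_width), n_bins - 1)
            vector[idx] += intensity
    if normalize:
        total = sum(vector)
        if total > 0:
            vector = [v / total for v in vector]  # total-intensity normalization
    return vector


fingerprint = bin_spectrum([(100.2, 5.0), (100.7, 3.0), (250.0, 2.0), (1200.0, 9.0)])
print(len(fingerprint), fingerprint[50], fingerprint[200])  # 950 0.8 0.2
```

Narrower bins preserve more mass detail but make the fingerprint more sensitive to small calibration shifts between runs.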

MALDI AND MALDI-Imaging
Matrix-assisted laser desorption/ionization (MALDI) is an ionization method commonly applied to high-mass biomolecules; it is a key technique in mass spectrometry (MS), traditionally in the field of proteomics. MALDI-MS is extremely sensitive, easy to apply and relatively tolerant to contaminants [19]. Its high-speed data acquisition and large-scale, off-line sample preparation have made it once again a focus of high-throughput proteomic analyses. These and other unique properties of MALDI offer new possibilities in applications such as rapid molecular profiling and imaging by MS [19].
More recently, there has been a growing focus on the use of MALDI for the analysis of small molecules. However, coupling LC to MALDI is more delicate than coupling HPLC to other ionization sources such as ESI, because MALDI, which is based on the desorption of molecules from a solid surface layer, is a priori not compatible with LC or CE [20]. A simple solution to this limitation is the automatic deposition of fractions from a chromatographic separation onto a MALDI-TOF target. More advanced techniques have been developed recently: electrospray deposition, electrically mediated deposition, rotating ball inlet, continuous vacuum deposition and continuous off-line atmospheric-pressure deposition. These interfacing improvements are expected to expand the use of LC-MALDI in metabolomics [20,21].
Another advantage of MALDI ionization is the possibility of tissue imaging. This new technology allows the simultaneous investigation of the content and the spatial/temporal distribution of molecules within a tissue section, making it possible to find the exact localization of any biomarker of interest for the prediction of pathologies and for the discovery of secondary complications arising from different metabolic diseases [22].
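In MALDI imaging, a spectrum is recorded at each (x, y) position of the tissue section, and the distribution of one molecule is displayed as an ion image: the intensity near its m/z at every pixel. The Python sketch below shows that step under simplifying assumptions (peak-picked spectra, integer pixel coordinates starting at 0, a fixed ppm tolerance); the data layout, the function name `ion_image` and the target m/z 760.5850 are hypothetical and do not correspond to any vendor format.

```python
def ion_image(pixels, target_mz, tol_ppm=5.0):
    """Build a 2D ion image for target_mz from per-pixel peak lists.

    pixels maps (x, y) -> [(mz, intensity), ...]; coordinates start at 0.
    """
    tol = target_mz * tol_ppm * 1e-6  # convert the ppm window to Da
    width = max(x for x, _ in pixels) + 1
    height = max(y for _, y in pixels) + 1
    image = [[0.0] * width for _ in range(height)]  # pixels without data stay 0
    for (x, y), peaks in pixels.items():
        # Sum every peak that falls within the tolerance window around target_mz
        image[y][x] = sum((i for mz, i in peaks if abs(mz - target_mz) <= tol), 0.0)
    return image


section = {
    (0, 0): [(760.5850, 100.0)],
    (1, 0): [(760.5900, 50.0), (734.5694, 20.0)],  # 760.5900 is about 6.6 ppm away
    (0, 1): [],
    (1, 1): [(760.5851, 10.0)],
}
for row in ion_image(section, 760.5850):
    print(row)
# [100.0, 0.0]
# [0.0, 10.0]
```

The tolerance window is the key design choice: too wide and neighboring species bleed into the image, too narrow and small calibration drifts across the section create artificial gaps.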
Besides the well-described proteomic applications, one of the most common uses of this approach is the identification of membrane lipids, which have been successfully analyzed by different authors in several biological tissues. MS imaging of cryosections of mature cotton embryos revealed a distinct, heterogeneous distribution of molecular species of triacylglycerols and phosphatidylcholines, the major storage and membrane lipid classes in cotton embryos. Other lipids were also imaged, including phosphatidylethanolamines, phosphatidic acids, sterols and gossypol, illustrating the broad range of metabolites and applications for this chemical visualization approach [23].
MALDI imaging technology offers several possibilities, and its application to the study of small-molecule biomarkers is becoming an interesting new one, particularly with the development of new matrices that generate low noise levels in the low m/z range of the spectra. Benabdellah et al. (2009) [24] described the detection and identification of 13 primary metabolites (AMP, ADP, ATP and UDP-GlcNAc, among others) directly from rat brain sections by chemical mass spectrometry imaging. In this study, matrix-assisted laser desorption/ionization tandem mass spectrometry (MALDI-MS/MS) was combined with 9-aminoacridine as a powerful matrix.
Imaging of metabolite distribution by imaging mass spectrometry (IMS) is an increasingly used tool in neurochemistry. Because most previous IMS studies analyzed the relative abundances of larger metabolite species, it is important to extend its application to smaller molecules such as neurotransmitters [25]. However, two technical problems must be resolved to achieve neurotransmitter imaging. First, the concentrations of bioactive molecules are lower than those of membrane lipids, which requires higher sensitivity and/or signal-to-noise (S/N) ratios in signal detection. Second, neurotransmitters undergo rapid molecular turnover, so tissue preparation must be performed carefully to minimize postmortem changes [25].
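Because detection limits are often judged by the signal-to-noise ratio, a minimal Python sketch of one common definition follows: the peak height above the mean of a signal-free baseline window, divided by the standard deviation of that window. Instrument software also uses other definitions (for example, peak-to-peak noise), so this is an illustration rather than the procedure of [25]; the function name `signal_to_noise` and the example trace are hypothetical.

```python
import statistics


def signal_to_noise(intensities, peak_index, noise_window):
    """S/N = (peak height - mean baseline) / baseline standard deviation.

    noise_window is a (start, stop) slice of points assumed to contain no signal.
    """
    start, stop = noise_window
    noise = intensities[start:stop]
    baseline = statistics.mean(noise)
    sd = statistics.stdev(noise)  # sample standard deviation of the baseline
    return (intensities[peak_index] - baseline) / sd


trace = [10, 12, 8, 11, 9, 10, 60, 10]
print(round(signal_to_noise(trace, peak_index=6, noise_window=(0, 6)), 1))  # 35.4
```

Halving the peak height above baseline halves the S/N, which is why low-abundance neurotransmitters approach detection limits long before membrane lipids do.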
Furthermore, MALDI imaging mass spectrometry (MALDI-IMS) has attracted great interest for monitoring drug delivery and metabolism. Since this emerging technique enables simultaneous imaging of many types of molecules, MALDI-IMS can visualize and distinguish a parent drug and its metabolites. Another important advantage is that changes in endogenous metabolites in response to drug administration can be mapped and evaluated in tissue sections [26].
Other applications of MALDI and MALDI imaging to the study of small-molecule biomarkers include the detection of drug-related degradation products [27]; the analysis of drugs in intact biological samples and crude extracts, a method that can be applied to rapid drug screening and to the precise identification of toxic substances in poisoning cases and postmortem examinations [28]; the study of elevated nigral levels of dynorphin neuropeptides in L-DOPA-induced dyskinesia in a rat model of Parkinson's disease [29]; recent advances in lipidomics and oxidative lipidomics based on mass spectrometry and imaging mass spectrometry, as they relate to studies of phospholipids in traumatic brain injury [30]; and the use of proteomic or lipidomic signatures for the discovery and spatial mapping of molecular disturbances within the microenvironment of chronic wounds using MALDI imaging technology [31].

Orbitrap
The orbitrap mass analyzer is a powerful and relatively new technology that operates without any magnetic or RF fields. In this analyzer, ions are trapped solely because they orbit around an axial electrode. The orbiting ions also perform harmonic oscillations along the electrode with a frequency proportional to (m/z)^(-1/2). These oscillations are detected by image current detection and transformed into mass spectra by fast Fourier transform (FT), similarly to FT-ICR [32]. In an orbitrap, ions are injected tangentially into the electric field between the electrodes and are trapped because their electrostatic attraction to the inner electrode is balanced by centrifugal forces. Thus, ions cycle around the central electrode in rings while also moving back and forth along its axis, so ions of a specific mass-to-charge ratio move in rings that oscillate along the central spindle. The frequency of these axial harmonic oscillations is independent of the ion velocity and inversely proportional to the square root of m/z. The complete instrument operates in LC/MS mode (1 spectrum/s) with a nominal resolving power of 60,000 and uses automatic gain control to provide high-accuracy mass measurements, within 2 ppm with internal standards and within 5 ppm with external calibration. The maximum resolving power exceeds 100,000 (full width at half maximum, FWHM). Rapid, automated data-dependent capabilities enable real-time acquisition of up to three high-mass-accuracy MS/MS spectra per second [32,33].
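The (m/z)^(-1/2) dependence described above has two practical consequences that can be checked with a few lines of Python: heavier ions oscillate more slowly, and, for a fixed detection (transient) time, the resolving power R = m/Δm (FWHM) also falls as (m/z)^(-1/2). The reasoning is that m/z is proportional to ω^-2, so Δ(m/z)/(m/z) = 2Δω/ω; with a fixed frequency resolution Δω, R grows with ω. The sketch assumes a resolving power specified at a reference m/z (400 is used here only as an example; manufacturers quote different reference points), and the function names are hypothetical.

```python
import math


def relative_frequency(mz, mz_ref):
    """Axial frequency at mz relative to mz_ref, using omega ~ (m/z)^(-1/2)."""
    return math.sqrt(mz_ref / mz)


def resolving_power(mz, r_ref, mz_ref):
    """Resolving power at mz, given r_ref at mz_ref, for a fixed transient length.

    m/z ~ omega^-2, so d(m/z)/(m/z) = 2 d(omega)/omega; with a fixed frequency
    resolution d(omega), R = (m/z)/d(m/z) scales like omega, i.e. (m/z)^(-1/2).
    """
    return r_ref * math.sqrt(mz_ref / mz)


def fwhm_peak_width(mz, r):
    """Peak width at half maximum in m/z units: delta m = m / R."""
    return mz / r


print(relative_frequency(400, 100))              # 0.5 (m/z 400 oscillates at half the frequency of m/z 100)
print(round(resolving_power(100, 60000, 400)))   # 120000
print(round(resolving_power(1600, 60000, 400)))  # 30000
print(round(fwhm_peak_width(400, 60000), 4))     # 0.0067
```

The falling resolving power at high m/z is one reason a quoted resolving power is only meaningful together with the m/z at which it was specified.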
Recent applications of this mass analyzer in the search for biomarkers include the on-tissue digestion of proteins followed by detection of the resulting peptides, taking advantage of the high resolution obtained. In MALDI imaging experiments on an LTQ-Orbitrap mass spectrometer, trypsin was applied with a spraying device, and the mass accuracy under imaging conditions was better than 3 ppm RMS. This allowed confident identification of tryptic peptides by comparison with liquid chromatography/electrospray ionization tandem mass spectrometry (LC/ESI-MS/MS) measurements of an adjacent mouse brain section [34].
Another application of this mass analyzer is the monitoring of metabolites in human urine: approximately 970 metabolite signals with repeatable peak areas could be putatively identified by elemental composition assignment within a 3 ppm mass error. The methodology was also shown to recognize non-molecular ions arising from adduct formation and to distinguish isomers. Careful examination of the raw data and the use of masses of predicted metabolites further extended the metabolite list [35].
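The elemental-composition step described above can be sketched as a brute-force search: enumerate CHNO formulas, compute the theoretical [M+H]+ m/z from monoisotopic masses, and keep those whose error, (measured - theoretical) / theoretical × 10^6, is within the ppm tolerance. This is a simplified illustration, not the workflow of [35]: it assumes protonated ions only, limits element counts arbitrarily, and applies only a basic ring-plus-double-bond check, whereas real pipelines add isotope-pattern and further chemical-rule filters. The function names and the measured value are hypothetical.

```python
from itertools import product

# Monoisotopic masses (Da) of the most abundant isotopes
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}
PROTON = 1.007276  # Da


def ppm_error(measured, theoretical):
    """Mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def candidate_formulas(measured_mz, tol_ppm=3.0, max_c=30, max_h=60, max_n=5, max_o=10):
    """Return (formula, ppm error) for CHNO formulas whose [M+H]+ matches measured_mz."""
    hits = []
    for c, h, n, o in product(range(1, max_c + 1), range(max_h + 1),
                              range(max_n + 1), range(max_o + 1)):
        # Rings plus double bonds, (2C + 2 + N - H) / 2, must be a non-negative integer
        twice_rdbe = 2 * c + 2 + n - h
        if twice_rdbe < 0 or twice_rdbe % 2:
            continue
        neutral = c * MONO["C"] + h * MONO["H"] + n * MONO["N"] + o * MONO["O"]
        err = ppm_error(measured_mz, neutral + PROTON)
        if abs(err) <= tol_ppm:
            counts = (("C", c), ("H", h), ("N", n), ("O", o))
            formula = "".join(f"{el}{k if k > 1 else ''}" for el, k in counts if k)
            hits.append((formula, round(err, 2)))
    return sorted(hits, key=lambda hit: abs(hit[1]))


hits = candidate_formulas(114.0662)  # hypothetical measured [M+H]+ value
print(any(formula == "C4H7N3O" for formula, _ in hits))  # True (creatinine)
```

As the m/z increases, the number of formulas that fit a fixed ppm window grows quickly, which is why such assignments are described as putative.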
The orbitrap mass analyzer has also been successfully applied to monitoring environmental contamination. The use of pharmaceuticals in livestock production is a potential source of surface water, groundwater and soil contamination. A rapid, versatile and selective multi-method was developed and validated for screening pharmaceuticals and fungicides in surface water and groundwater in a single full-scan MS method, using a benchtop U-HPLC-Exactive Orbitrap MS at 50,000 (FWHM) resolution. The study demonstrated that the ultrahigh resolution and reliable mass accuracy of the Exactive Orbitrap MS allow pharmaceutical residues to be detected at concentrations of 10-100 ng/L using a post-target screening approach under the multi-method conditions [36].
Other recent applications of the orbitrap mass analyzer in the search for biomarkers include: the analysis of serotonin and related compounds in urine and the identification of a potential biomarker for attention deficit hyperactivity/hyperkinetic disorder [37,38]; the quantitative profiling in human blood of phosphatidylethanol molecular species, a group of aberrant phospholipids formed in cell membranes in the presence of ethanol through the catalytic action of phospholipase D on phosphatidylcholine, by liquid chromatography-high resolution mass spectrometry on an LTQ-Orbitrap XL hybrid mass spectrometer equipped with an electrospray ionization source operated in negative ion mode [39]; and the analysis of frozen sections (12 μm thick) of an ex vivo tissue sample set comprising primary colorectal adenocarcinoma and colorectal adenocarcinoma liver metastasis samples by negative-ion desorption electrospray ionization (DESI) imaging at a spatial resolution of 100 μm, using a computer-controlled DESI imaging stage mounted on a high-resolution orbitrap mass spectrometer. The DESI-IMS data predominantly featured complex lipids, including phosphatidylinositols, phosphatidylethanolamines, phosphatidylserines, phosphatidylethanolamine plasmalogens, phosphatidic acids, phosphatidylglycerols, ceramides, sphingolipids and sulfatides, among others, which were identified based on their exact mass and MS/MS fragmentation spectra [40]. These are only some of the many applications of this promising technology to the discovery of important biomarkers in different biological systems, taking advantage of its high resolution and speed in LC-MS.

Gas Chromatography
Gas chromatography (GC) is the chromatographic technique in which a gas is the mobile phase. Since 1952, when the first paper in this field was published, GC has been considered simple, fast and applicable to the separation of many volatile materials, especially petrochemicals, for which distillation was the preferred separation method at that time. Today GC is a very important technique, with a global instrument market estimated at around US$1 billion, or over 30,000 instruments, annually [41].
Chromatography is the process of separating a mixture into its individual components; through this separation, each component in the sample can be identified (qualitatively) and measured (quantitatively). There are several kinds of chromatographic techniques, each with its corresponding instruments, and gas chromatography is one of them. GC is used for compounds that are thermally stable and volatile, or that can be made volatile. Because of its simplicity, sensitivity and effectiveness in separating components, GC is one of the most important tools in chemistry. The basic operating principle of the instrument involves evaporation of the sample in a heated inlet port (injector), separation of the components of the mixture in a specially prepared column, and detection of each component by a specific detector. At the end of the process, the amplified detector signals are usually recorded and evaluated by integrator software, which calculates the analytical results. The sample is introduced into a stream of inert gas, the carrier gas, and transported through the column by its flow. The column can be a packed or a capillary column, depending on the properties of the sample. As the gas flows through the column, the components of the sample move at velocities that depend on the degree of interaction of each component with the stationary phase; consequently, the different components are separated. Since these processes are temperature-dependent, the column is usually placed in a thermostatically controlled oven. Once the components elute from the column, they can be quantified by a suitable detector and/or collected for further analysis. Several types of detectors exist, and the choice depends on the type of components to be detected and measured.
The most common detectors are flame ionization detectors (FIDs), thermal conductivity detectors (TCDs), electron capture detectors (ECDs), alkali flame ionization detectors, also called nitrogen/phosphorus detectors (NPDs), flame photometric detectors (FPDs) and photoionization detectors (PIDs). Several of these are described in more detail in [41,42].
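The integration step performed by the data system can be illustrated with a minimal Python sketch: the area of a baseline-corrected peak is approximated with the trapezoidal rule, and the separation of two neighboring peaks is expressed with the usual resolution formula Rs = 2(t2 - t1)/(w1 + w2), where t are retention times and w are baseline peak widths (Rs ≥ 1.5 is commonly taken as baseline separation). A constant, known baseline is assumed, the function names and data are hypothetical, and commercial integrators also detect peak start and end points automatically.

```python
def peak_area(times, signal, baseline=0.0):
    """Trapezoidal area of a detector signal above a constant baseline."""
    area = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        # Average of two consecutive baseline-corrected heights times the time step
        area += dt * ((signal[i] - baseline) + (signal[i - 1] - baseline)) / 2
    return area


def resolution(t1, w1, t2, w2):
    """Chromatographic resolution from retention times and baseline peak widths."""
    return 2 * (t2 - t1) / (w1 + w2)


times = [0.0, 0.5, 1.0, 1.5, 2.0]    # min
signal = [2.0, 7.0, 12.0, 7.0, 2.0]  # detector response, baseline at 2.0
print(peak_area(times, signal, baseline=2.0))        # 10.0
print(round(resolution(5.0, 0.2, 5.5, 0.2), 2))      # 2.5
```

For quantification, the peak area would then be converted to an amount through a calibration with standards, since detector response differs between compounds.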
GC is a widely used method for separating and analyzing organic compounds, with applications in virtually every laboratory and in many processes across several industries. In the chemical, petrochemical and pharmaceutical industries, it is used to measure many kinds of organic compounds, both for process control and for product control. It is also used for environmental measurements, such as aromatic pollutants in air and water and the detection and measurement of pesticides. Beyond these broad uses, there are examples in which this technique plays an important role in more specific fields [43,44,45,46].
The detection of reliable biomarkers is a major research activity within proteomics and a growing trend in metabolomics. A biomarker can be a single molecule or a set of molecules that can be used to differentiate between normal and diseased states, and it can be separated and detected by gas chromatography-mass spectrometry (GC/MS), a combined technique used to identify the substances present in a given sample. Kuhara et al. (2011) [47] used a GC/MS-based approach to investigate the urine metabolome of patients who had previously been diagnosed with citrin deficiency. This noninvasive urine metabolic profiling should assist in the rapid and more reliable differential chemical diagnosis of citrin deficiency from other hyperammonemic syndromes.
Another application of GC/MS in biomarker analysis is the study of volatile organic compounds (VOCs). These compounds are exhaled in breath and provide valuable information about a person's health status. Breath composition varies and depends on the characteristics of the disease; for example, a sweet smell indicates diabetes, while the odor of rotten eggs, which is caused by sulfur-containing compounds, suggests liver problems [48,49]. Rudnicka (2011) [50] employed solid-phase microextraction and gas chromatography coupled to time-of-flight mass spectrometry (GC-TOF/MS) to analyze VOCs in exhaled air from patients with lung cancer and from healthy persons. A total of 55 compounds were identified in the breath samples, and isopropyl alcohol was the compound that served as an indicator of lung cancer.
These studies show how important and relevant chromatographic techniques are for biomarker analysis and identification. They cover a wide range of applications in a field that is not yet fully developed and that may remain a very suitable area for new ideas and uses over the next couple of decades.

Statistical and chemometrical analysis of biomarkers
In metabolomics, as in other branches of science and technology, there is a steady trend towards the use of more variables (properties) to characterize observations (e.g., samples, experiments, time points). These measurements can often be arranged into a data table in which each row is an observation and the columns represent the measured variables or factors (e.g., wavelength, mass number or chemical shift). This development generates huge and complex data tables that are hard to summarize and overview without appropriate tools. With the recent development of "omics" technologies (metabolomics, proteomics, foodomics, genomics, etc.), chemometric methods have come to play a very important role in planning experiments and analyzing their results. These include efficient and robust methods for modeling and analyzing complex chemical or biological data tables, which produce interpretable and reliable models capable of handling incomplete, noisy and collinear data structures, such as principal component analysis (PCA) and partial least squares (PLS). It is also important to emphasize that chemometrics provides a straightforward way to collect relevant information through statistical experimental design (SED) [51,52,53].
Multivariate statistical analysis such as principal component analysis (PCA) is probably the most widely used technique for analyzing metabolomics data. PCA is an objective, unsupervised technique and an appropriate way to reduce data sets containing large numbers of variables. By reducing the original variables to a smaller number of uncorrelated components, this approach highlights fundamental differences between groups of samples. PCA has been used extensively in the metabonomics literature. Despite apparently satisfactory published results, the known high sensitivity of PCA to noise suggests that improvements can be expected from more robust methods for identifying biomarkers in noisy data. Moreover, the traditional use of PCA remains highly questionable: biomarkers are identified from the loadings of the first two principal components, although these components do not necessarily contain the most relevant variation between altered and normal spectra. Sometimes the results of the initial unsupervised analysis are confirmed by a second, supervised analysis that employs classification methods such as partial least squares (PLS), SIMCA and neural networks, first to separate normal from altered spectra and then to identify more robust biomarkers [54,55].
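To make the unsupervised step concrete, the sketch below performs PCA on a small observations × variables table in plain Python: the data are mean-centered, the covariance matrix is formed, and the leading eigenvectors (loadings) are found by power iteration with deflation; the scores are the projections of each sample onto the loadings. The data values and the function name are invented for illustration, and in practice a numerical library (typically SVD-based) would be used instead.

```python
import math
import random


def pca(X, n_components=2, iterations=500, seed=0):
    """PCA of an observations x variables table given as a list of rows.

    Returns (scores, loadings, explained_variance_ratio).
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]  # mean-centering
    # Sample covariance matrix of the variables
    C = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(p)] for i in range(p)]
    total_var = sum(C[i][i] for i in range(p))
    rng = random.Random(seed)
    loadings, explained = [], []
    for _ in range(n_components):
        v = [rng.random() for _ in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iterations):  # power iteration converges to the top eigenvector
            w = [sum(C[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        eigval = sum(v[i] * C[i][j] * v[j] for i in range(p) for j in range(p))
        loadings.append(v)
        explained.append(eigval / total_var)
        # Deflation: subtract the component just found before extracting the next one
        C = [[C[i][j] - eigval * v[i] * v[j] for j in range(p)] for i in range(p)]
    # Scores: projection of each centered observation onto each loading vector
    scores = [[sum(r[j] * v[j] for j in range(p)) for v in loadings] for r in Xc]
    return scores, loadings, explained


# Hypothetical intensities of 3 metabolites in 3 control and 3 disease samples
X = [
    [10.0, 1.0, 5.0], [11.0, 1.2, 5.1], [10.5, 0.9, 4.9],
    [20.0, 1.1, 5.0], [21.0, 1.0, 5.2], [19.5, 1.1, 4.8],
]
scores, loadings, explained = pca(X)
print("PC1 scores:", [round(s[0], 2) for s in scores])
print("PC1 loadings:", [round(x, 3) for x in loadings[0]])
print("explained variance:", [round(e, 3) for e in explained])
```

Here PC1 is driven almost entirely by the first metabolite, which separates the two groups. If an uninformative variable had a larger variance, it would dominate the first components instead, which is the caveat raised above about selecting biomarkers only from the loadings of the first two components.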
Other data analysis methods frequently employed for disease diagnosis and biomarker identification in metabolomics are univariate testing, soft independent modeling of class analogy (SIMCA), linear discriminant analysis (LDA), partial least squares discriminant analysis (PLS-DA), orthogonal projections to latent structures discriminant analysis (OPLS-DA), neural networks (NN), self-organizing maps (SOM) and support vector machines (SVM). Regardless of the chosen method, both statistical and biological validation are critical. Multivariate methods are of special importance in metabolomics, since a single biomarker will often not be sufficiently specific for a given condition by itself. The wide range of available methods can understandably seem confusing to the non-specialist. Previous work has shown that using the chosen method correctly matters more than the choice of method itself. This is because all methods are data-driven and work on the same features, which are fixed during pre-processing; as a result, many statistical methods will highlight the same metabolites with similar classification ability. It is clear, however, that pre-processing and scaling of the data can lead to dramatically different results, both in the biomarkers chosen and in the classification ability of the model [53].
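The effect of scaling mentioned above is easy to see on a toy table. The sketch below compares mean-centering only, autoscaling (unit variance) and Pareto scaling (division by the square root of the standard deviation), three scalings widely used in metabolomics, and reports the share of the total variance that each variable contributes afterwards, which is what variance-based methods such as PCA respond to. The data and function names are hypothetical.

```python
import statistics


def scale(X, method="auto"):
    """Column-wise scaling of an observations x variables table (list of rows).

    'center': mean-centering only; 'auto': divide by the standard deviation
    (unit variance); 'pareto': divide by the square root of the standard deviation.
    """
    scaled_columns = []
    for column in zip(*X):
        mean = statistics.mean(column)
        sd = statistics.stdev(column)
        if method == "center":
            factor = 1.0
        elif method == "auto":
            factor = sd
        elif method == "pareto":
            factor = sd ** 0.5
        else:
            raise ValueError(f"unknown scaling method: {method}")
        if factor == 0:
            raise ValueError("cannot scale a constant variable")
        scaled_columns.append([(value - mean) / factor for value in column])
    return [list(row) for row in zip(*scaled_columns)]


def variance_share(X):
    """Fraction of the total variance contributed by each variable."""
    variances = [statistics.variance(column) for column in zip(*X)]
    total = sum(variances)
    return [v / total for v in variances]


# Hypothetical table: one abundant metabolite (column 1) and one trace metabolite (column 2)
X = [[1000.0, 1.0], [1100.0, 2.0], [900.0, 1.5], [1050.0, 0.5]]
for method in ("center", "pareto", "auto"):
    print(method, [round(s, 4) for s in variance_share(scale(X, method))])
```

With centering alone the abundant metabolite accounts for almost all of the variance, autoscaling gives both variables equal weight, and Pareto scaling lies in between, so the same data can point a subsequent model towards different candidate biomarkers.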