Systems Glycobiology: Past, Present, and Future

Glycobiology is the study of the structure, function, and biology of carbohydrates, and glycomics is a subfield of glycobiology that aims to define the structure and function of glycans in living organisms. As glycobiology and glycomics have grown in popularity, the application of computational modeling in glycobiology has expanded over the last decades. The recent availability of advanced wet-lab methods in glycobiology and glycomics is promising for the impact of systems biology on glycome research, an emerging field termed “systems glycobiology.” This chapter summarizes the current state of the art in the use of bioinformatics tools in glycobiology. It provides basic knowledge both for glycobiologists interested in applying bioinformatics tools and for computational biologists interested in studying the glycome.


Introduction
Glycans are carbohydrate-based polymers composed of monosaccharide units joined by glycosidic linkages. Complex and diverse glycans are ubiquitous macromolecules in all cells in nature and are essential to all biological systems. Glycans play physical, structural, and metabolic roles in living organisms [1]. In the last century, knowledge of the biochemistry and biology of nucleic acids and proteins increased rapidly. It has been much more difficult, however, to understand the biology of glycans, which are a main component of the cell surface [2]. The biosynthesis of glycans differs fundamentally from that of nucleic acids and proteins. Its complexity makes glycans extremely difficult to analyze and limits our understanding of the mechanisms responsible for their biological functions [3]. After the genomics revolution and the development of high-throughput technologies, scientific interest grew in the characterization, function, and interaction of other significant biomolecules of the cell (e.g., transcripts, proteins, lipids, and glycans). This interest resulted in the emergence of other omics fields such as transcriptomics, proteomics, metabolomics, lipidomics, and glycomics [4]. In terms of evolutionary conservation, these fields rank in decreasing order as genomics, transcriptomics, proteomics, metabolomics, lipidomics, and glycomics. The order is reversed for informational diversity (Figure 1) [5].
With progress in high-throughput technologies, a growing number of glycobiology studies screen cells quickly and generate large glycomics data sets. Moreover, advanced analytical techniques and data analysis tools make it possible to improve high-throughput techniques for screening glycans as disease markers and for classifying the structures of glycans in therapeutic proteins [6].

Glycans
Glycans are linear or branched sugar macromolecules composed of monosaccharides joined by glycosidic linkages. Alongside nucleic acids and proteins, glycans are known as the third dimension of molecular biology [7,8]. These macromolecules occur as heteropolysaccharides or homopolysaccharides. Furthermore, glycoconjugates (glycolipids, glycoproteins, and proteoglycans) can also be considered glycans, even though their carbohydrate parts are only oligosaccharides [9]. In glycoproteins, oligosaccharides can be linked to the protein in different ways, giving N-linked and O-linked glycans. In N-linked glycans, N-acetylglucosamine is linked to the amide side chain of asparagine. In O-linked glycans, C-1 of N-acetylgalactosamine is linked to the hydroxyl group of serine or threonine [10].
With increasing research in glycoscience, many different roles of glycans in biological systems have been revealed in the last decades. Significant functions of glycans have been determined in numerous research areas, such as immunity, development and differentiation, biopharmaceuticals, cancer, fertilization, blood types, and infectious diseases. Glycans are called the "clothes of cells" because they are present on the cell surface and are responsible for signaling and communication between cells. Glycans can be classified in several ways. Varki, for example, divided the biological roles of glycans into four main categories: (1) structural and modulatory roles, (2) extrinsic (interspecies) recognition of glycans, (3) intrinsic (intraspecies) recognition of glycans, and (4) molecular mimicry of host glycans. A total of 50 distinct roles are defined under these main categories [1].
Because of their diversity, glycans perform a huge range of biological functions, and they have significant roles in several physiological and pathological events, such as cell growth, cell signaling, cell-cell interactions, differentiation, and tumor growth [11][12][13]. Glycans carry information in biological systems and are significant biomarker candidates for many diseases, such as cardiovascular diseases, immune deficiencies, genetically inherited disorders, several cancer types, and neurodegenerative diseases [14][15][16]. Glycan expression is altered during the development and progression of these diseases as a result of misregulated enzymes such as glycosyltransferases and glycosidases. Altered glycan structures can therefore potentially be used to identify these diseases at an early stage. Besides their significant role in the diagnosis and management of disease, glycans can be used as therapeutics, as markers for the identification and isolation of specific cell types, and as targets in drug discovery [17][18][19]. Moreover, glycans can be considered ideal targets for vaccines because they are present on the surface of many pathogens and malignant cells. The high affinity and exquisite specificity with which other molecules recognize glycans are vital to developments in glycan research and in related diagnostic and therapeutic applications.

Glycomics
Glycosylation plays significant roles in many biological processes, including cell growth and development, tumor growth and metastasis, immune recognition and response, cell-cell communication, and microbial pathogenesis. Glycosylation is one of the most common and significant posttranslational modifications of proteins [20,21], and more than half of all proteins are glycosylated [6]. Many factors, such as genetic factors, sugar nucleotide levels, cytokines, metabolites, hormones, and environmental factors, can affect and change the glycosylation process [20][21][22][23][24]. Thus, integrating omics approaches (e.g., proteomics, genomics, transcriptomics, and metabolomics) into glycobiology is essential to see the big picture of the whole biological system [20,21,25]. Many glycoinformatics tools and databases are now accessible for the analysis of glycans and glycosylation pathways [6].
Glycomics is one of the most recent omics fields and deals with evaluating the structure and function of glycans in biological systems [26]. Integrating glycomics with other omics fields provides new system-scale insights in integrative biology [27].
Moreover, glycomics informs other crucial disciplines such as systems glycobiology and personalized glycomedicine, which collectively aim to explain the role of glycans in person-to-person and between-population variations in disease susceptibility and in response to health interventions such as drugs, nutrition, and vaccines. Glycosylation is present in both healthy and diseased individuals [1], and abnormal glycosylation is observed in a variety of diseases. Differences between the glycosylation patterns of healthy and diseased individuals can be used as glycobiomarkers in personalized medicine [28]. As a result, glycobiology and glycopathology will enable many new medical applications [29]. A holistic approach combining functional and structural glycomics can contribute to the development of glycomedicine, with applications in therapy development, fine-tuning of immunological responses and of the performance of therapeutic antibodies, and boosting of immune responses [28,30]. Glycan arrays have applications in many fields, from basic biochemical research to biomedicine [31]. In addition to shotgun glycan microarrays [32], a cell-based array resource has been developed [33]. These developments enable a deeper understanding of the many biological roles of the glycome. Nevertheless, multiplatform and multiomics technologies are expected to further extend knowledge of the molecular mechanisms of glycans.

Major glycomics techniques
Monosaccharides present four free hydroxyl groups for linkage to another monosaccharide. As a result, glycans have more complex structures than peptides and nucleic acids. Glycans are more than a sequence of monosaccharides: the monomer types, their modifications, the positions of the modifications around the sugar ring, the branching of the glycopolymer, and the stereochemistry of the linkages all contribute to this complexity. Consequently, sequencing techniques used for peptides or DNA (Edman or Sanger sequencing, respectively) are not appropriate for glycans. Moreover, most glycans are present as part of a glycoconjugate. The glycan part must therefore be released from the lipid or protein part by enzymatic or chemical methods and isolated for analysis.
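The structural complexity described in this paragraph can be made concrete with a small data structure. The following is a minimal sketch, not a standard glycan encoding: the `Residue` class, its field names, and the helper functions are hypothetical names chosen for illustration. It represents a glycan as a tree in which every residue records its monosaccharide type, anomeric configuration, and the hydroxyl position of the parent to which it is attached, and it builds the common N-glycan core, Manα1-6(Manα1-3)Manβ1-4GlcNAcβ1-4GlcNAc, as an example. The isomer count assumes, as a simplification, a linear chain of identical residues with four available hydroxyl positions and two anomeric configurations per linkage.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class Residue:
    """One monosaccharide in a glycan tree (illustrative encoding only)."""
    name: str                 # monosaccharide type, e.g. "Man" or "GlcNAc"
    anomer: str = ""          # "a" or "b"; empty for the reducing-end residue
    position: int = 0         # parent hydroxyl carrying this residue; 0 for the root
    children: List["Residue"] = field(default_factory=list)

    def add(self, child: "Residue") -> "Residue":
        """Attach a child residue and return it so that chains can be built."""
        self.children.append(child)
        return child


def composition(root: Residue) -> Counter:
    """Count residues of each type by walking the whole tree."""
    counts = Counter()
    stack = [root]
    while stack:
        node = stack.pop()
        counts[node.name] += 1
        stack.extend(node.children)
    return counts


def linear_isomer_count(n_residues: int, positions: int = 4, anomers: int = 2) -> int:
    """Linear isomers of n identical residues under the stated simplification."""
    return (positions * anomers) ** (n_residues - 1)


# N-glycan core: Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc
core = Residue("GlcNAc")
second_glcnac = core.add(Residue("GlcNAc", "b", 4))
branch_point = second_glcnac.add(Residue("Man", "b", 4))
branch_point.add(Residue("Man", "a", 3))
branch_point.add(Residue("Man", "a", 6))

print(dict(composition(core)))
print(linear_isomer_count(3))
```

Even under this simplification, a linear trisaccharide of a single monomer type already has 64 possible isomers, which illustrates why a linear sequence string of the kind used for peptides or DNA is not sufficient for glycans.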
In the last decades, a number of techniques have been developed and applied to determine the structure of glycans with different degrees of detail [34]. A traditional method is to label the glycoconjugates radioactively and then apply anion exchange, gel filtration, or paper chromatography before and after enzymatic or chemical treatments. Even so, the actual structure is difficult to define; consequently, in earlier studies, gas chromatography coupled with mass spectrometry (GC-MS) and/or nuclear magnetic resonance (NMR) studies were performed when adequate amounts of material were available. However, these analyses require special expertise to perform and to interpret, particularly when no standards are available for comparison.
In recent years, HPLC and UPLC have superseded simple chromatography systems, and fluorescent labeling has replaced radioactive labeling. Nowadays, a variety of columns, such as graphitized carbon, reversed-phase (RP), anion exchange, normal phase, or hydrophilic interaction resins, can be used along with suitable enzymatic or chemical treatments. A less common alternative is to analyze glycans at elevated pH. Under these conditions the hydroxyl groups are deprotonated, which enables the use of high-performance anion-exchange chromatography with pulsed amperometric detection (HPAEC-PAD). However, a glycan structure cannot be defined by HPLC retention times alone, and analyses of unknown structures performed without standards should be interpreted with caution [35].
With improvements in instrument types and sensitivity, the contribution of mass spectrometry to studies of glycans and glycoconjugates has increased in the last decades [36,37]. At first, researchers used fast atom bombardment mass spectrometry (FAB-MS) to analyze various types of glycopolymers from different sources. Analyses with FAB-MS required chemical modifications such as methylation and acetylation. Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) was later developed as an alternative, and it can analyze both permethylated and native glycans. Furthermore, numerous electrospray techniques with many detector types are now significant in glycomics. A key point in MS-based analysis is the ability to obtain glycan fragments. The preparation and separation techniques are also of great importance for obtaining the best results. Consequently, some form of liquid chromatography-mass spectrometry (LC-MS) is generally necessary, since glycans with low abundance or poor ionization capacity can be suppressed when the whole glycome is examined at once. Moreover, reanalysis after chemical or enzymatic treatment maximizes the ability to obtain clear results from the existing data.
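Because MS-based glycomics ultimately compares measured masses with masses expected for candidate compositions, a short calculation may help readers from outside the field. The sketch below is an illustration under stated assumptions rather than a description of any particular software: it handles only native (underivatized) glycans with a free reducing end, uses standard monoisotopic residue masses, and the function and dictionary names are hypothetical. Permethylated glycans, which the text also mentions, would require different residue and end-group masses.

```python
# Monoisotopic masses (Da) of monosaccharide residues, i.e. the
# monosaccharide minus one water lost on glycosidic bond formation.
RESIDUE_MASS = {
    "Hex": 162.052824,     # hexose, e.g. mannose or galactose
    "HexNAc": 203.079373,  # N-acetylhexosamine, e.g. GlcNAc
    "dHex": 146.057909,    # deoxyhexose, e.g. fucose
    "NeuAc": 291.095417,   # N-acetylneuraminic acid
}
WATER = 18.010565          # added once for the free reducing-end glycan
ADDUCT_MASS = {"H": 1.007276, "Na": 22.989221}  # masses of H+ and Na+


def neutral_mass(comp):
    """Monoisotopic mass of a native glycan given its composition."""
    return sum(RESIDUE_MASS[res] * n for res, n in comp.items()) + WATER


def ion_mz(comp, adduct="Na", charge=1):
    """m/z of the positive ion carrying `charge` copies of the adduct."""
    return (neutral_mass(comp) + charge * ADDUCT_MASS[adduct]) / charge


# The N-glycan core, Man3GlcNAc2, written as a composition.
core = {"Hex": 3, "HexNAc": 2}
print(round(neutral_mass(core), 4))
print(round(ion_mz(core, "Na"), 4))
```

A composition of this kind does not distinguish isomers: mannose, glucose, and galactose are all counted as Hex, which is one reason fragmentation and separation remain essential.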
A glycan is generally part of a glycoconjugate; thus, glycoproteomics and glycolipidomics, which consider both the peptide/lipid part and the glycan part, are significant fields. Mass spectrometry comes into prominence here [38], because both the glycan and the polypeptide/lipid parts can be studied with this technique. However, the glycan parts of glycoproteins and glycolipids can take various forms even when the polypeptide/lipid part is the same, a phenomenon defined as microheterogeneity. Microheterogeneity arises because glycan modification is not template driven [39].
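The practical consequence of microheterogeneity is combinatorial: one polypeptide can exist as many distinct glycoforms. The following sketch enumerates them for a hypothetical protein. The site labels, glycan names, and the function `enumerate_glycoforms` are invented for illustration and do not describe any real glycoprotein.

```python
from itertools import product


def enumerate_glycoforms(site_glycans):
    """Return every combination of one glycan (or none) per glycosylation site.

    `site_glycans` maps a site label to the list of states observed at that
    site; each glycoform is returned as a dict from site label to state.
    """
    sites = sorted(site_glycans)
    options = [site_glycans[site] for site in sites]
    return [dict(zip(sites, combo)) for combo in product(*options)]


# Hypothetical protein with two N-glycosylation sites.
observed = {
    "N45": ["Man5", "FA2", "FA2G1"],
    "N102": ["unoccupied", "FA2"],
}

glycoforms = enumerate_glycoforms(observed)
print(len(glycoforms))
for form in glycoforms:
    print(form)
```

The number of glycoforms is the product of the number of states per site, so it grows quickly with the number of sites, even though the polypeptide sequence is unchanged.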
Blotting techniques can be used for simple screens. Reagents such as lectins and anti-carbohydrate antibodies with low specificity are often used for this technique; as a result, misleading results are often obtained [40]. Still, lectins and antisera are significant for immune responses in animals. New array-based systems can provide essential clues on proteins bound to glycans [41].

Systems glycobiology and integration of omics data sets
Developments in integrative informatics and systems biology of glycans, based on a holistic approach, can make a more comprehensive analysis available. Such an analysis brings together glycan annotation, enzyme levels, glycan abundances, biosynthesis pathways, and other complementary omics data sets. Although several tools based on standard bioinformatics approaches have been developed for proteomics and genomics data sets, most existing bioinformatics tools do not consider the complex relationships between the diverse components of the glycosylation process (such as glycans, enzymes, transporters, and sugar nucleotides). Consequently, the use of these tools for glycomics data sets has some limitations. The genome does not encode glycans directly; unlike proteins, glycans are assembled by the interconnected action of many enzymes. Because of these limitations, the development of glycan analysis tools and methods has been delayed, and most present glycoinformatics tools are specific to the analysis of a single data type [42][43][44][45]. For instance, a common method for MS-based glycoprofiling is to annotate individual peaks by matching the obtained MS results against specific glycans in a glycan library [46,47]. To account for the complexity of glycosylation, the glycan structures used to annotate the spectrum should be generated from the enzymes of the organism that synthesizes the studied glycans [48]. With this alignment, the enzyme activities and the structures assigned to each peak in the same spectrum will be consistent.
Although many omics approaches have made significant progress in the last decades, existing bioinformatics techniques are still unsatisfactory for integrating varied data sets [49][50][51]. For instance, statistical database-driven approaches investigate relations between gene expression levels and the abundance of specific glycan linkages, but these approaches cannot predict detailed quantitative glycan distributions [50,51]. This indicates the need to integrate glycoinformatics and systems biology tools for the identification of glycan structures, and to link these structures to expression data for the genes encoding the glycosylation enzymes that synthesize them. Mathematical modeling of glycosylation is considered a promising method for understanding how mRNA levels relate to the distribution and quantity of glycans present in healthy and diseased cells [48,49,52].
A data integration approach can reduce the variability between analytical high-throughput platforms. Confidence in biomarker predictions and recommendations increases when different experimental data, such as glycogene expression information or mass spectral profiles, confirm the results from integrative glycoinformatics and systems glycobiology tools. Although integrated glycoinformatics tools are limited in analytical sensitivity, they enable results from various platforms to be analyzed and compared [6].
The integration of glycomics with other omics data is promising for further innovation in the diagnosis and treatment of diseases [30]. The starting point of multiomics data integration is to sort the data by omics level. The following sections describe the association between glycomics and other omics levels.
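The library-matching strategy for peak annotation mentioned above can be sketched in a few lines. This is a simplified illustration, not the algorithm of any cited tool: the library holds theoretical [M+Na]+ values for three native N-glycan compositions, the observed peak list is hypothetical, and the 10 ppm tolerance and the function names are assumptions made for the example.

```python
# Theoretical monoisotopic [M+Na]+ m/z values of native glycans.
LIBRARY = {
    "Hex3HexNAc2": 933.3170,
    "Hex5HexNAc2": 1257.4227,
    "Hex3HexNAc4dHex1": 1485.5337,
}


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def annotate_peaks(peaks, library, tol_ppm=10.0):
    """Map each observed m/z to the library entries within the tolerance."""
    annotations = {}
    for mz in peaks:
        annotations[mz] = [
            name
            for name, theoretical in library.items()
            if abs(ppm_error(mz, theoretical)) <= tol_ppm
        ]
    return annotations


# Hypothetical peak list; the last peak has no library match.
observed_peaks = [933.3200, 1257.4250, 1500.0000]
result = annotate_peaks(observed_peaks, LIBRARY)
for mz, names in result.items():
    print(mz, names)
```

In this scheme the library decides which annotations are possible. Restricting it to structures that the enzymes of the studied organism can actually produce, as the text recommends, would amount to changing the contents of `LIBRARY` rather than the matching step.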

Genomics
Glycomics can be integrated with genetic sequence data in a number of ways. For instance, a glycosylation site can be gained or lost through sequence variation. A single-nucleotide polymorphism (SNP) affects the glycosylation of prostate-specific antigen (PSA), and the resulting altered function increases the risk of prostate cancer. Functional analysis indicated that the stability and structural conformation of PSA are affected by the missense variant rs61752561, which creates an additional glycosylation site [53]. Furthermore, computational studies revealed that somatic variants in cancer cells have the potential to cause gain or loss of glycosylation. In addition to SNPs, structural variations and cytogenetic abnormalities could be integrated with glycomics. Cytogenetic abnormalities have been associated with glycome expression [54]. A particular glycosyltransferase can glycosylate numerous proteins, so its genetic variants are of particular significance: a single difference in enzyme activity can affect the function of many glycoproteins. A genetic or epigenetic variant can thus affect several downstream pathways and cell metabolism, which is called the pleiotropic effect of genetic or epigenetic variants on glycosylation [55].
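Gain or loss of an N-glycosylation site through a missense variant can be screened for computationally, because N-linked glycans attach at the consensus sequon Asn-X-Ser/Thr, where X is any residue except proline. The sketch below compares a reference and a variant sequence of equal length. The sequences are short hypothetical peptides, not PSA or any real protein, and the function names are illustrative. A sequon is necessary but not sufficient for glycosylation, and O-linked sites have no comparably simple consensus, so the output should be read as candidate sites only.

```python
import re

# N-X-S/T with X != P; the lookahead allows overlapping sequons to be found.
SEQUON = re.compile(r"N(?=[^P][ST])")


def sequon_positions(sequence):
    """Return the 1-based positions of Asn residues in N-glycosylation sequons."""
    return {match.start() + 1 for match in SEQUON.finditer(sequence.upper())}


def compare_sequons(reference, variant):
    """List sequons gained or lost by a substitution variant of equal length."""
    if len(reference) != len(variant):
        raise ValueError("positions are only comparable for equal-length sequences")
    ref_sites = sequon_positions(reference)
    var_sites = sequon_positions(variant)
    return {
        "gained": sorted(var_sites - ref_sites),
        "lost": sorted(ref_sites - var_sites),
    }


# Hypothetical examples: D3N creates a sequon, T5A destroys one.
gain = compare_sequons("MKDASLTQNPSAKV", "MKNASLTQNPSAKV")
loss = compare_sequons("AANGTK", "AANGAK")
print(gain)
print(loss)
```

In the first example the asparagine at position 9 is never reported because it is followed by proline, which shows why the excluded residue has to be part of the pattern.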

Transcriptomics
Most glycomics research has been done at the level of the transcriptome, either at a particular locus or with microarray technology. In colorectal cancer (CRC), the glycosyltransferase ST6GAL1 is associated with the disease, and altered ST6GAL1 expression was found by mining The Cancer Genome Atlas (TCGA) [56]. Moreover, Saravanan et al. [57] used GLYCOv2 glycogene microarray technology to identify differential expression of glycosylation-related genes. In a further study, myeloma was compared with normal plasma cell samples, and 60 upregulated and 20 downregulated genes were found among 243 genes in the glycan biosynthesis pathway [54]. A meta-analysis of gene expression in prostate cancer revealed a novel molecular signature enriched for glycosylation enzymes [58]. Additionally, hepatocellular carcinoma was investigated by reviewing the expression of genes related to core fucosylation in the disease [59]. More systematic reviews and meta-analyses are required to develop reliable biomarkers.

Glycoproteomics
Studies on glycoproteomics cover peptide structures, glycan structures, and sites of glycosylation [30]. A single site on the peptide chain can be glycosylated by different glycans, and in this way glycans can modulate the function of the protein [60]. In the literature, diverse techniques have been associated with different phenotypes, for instance, breast, colon, liver, skin, ovarian, and bladder cancer and neurodegenerative diseases, and with a number of structural variations, including sialylation, fucosylation, degree of branching, and expression of specific glycosyltransferases [61][62][63]. For instance, cerebrospinal fluid N-glycoproteomics is of great importance in the early diagnosis of Alzheimer's disease. Glycosylation patterns were assessed in patients, and therapeutic targets such as glycoenzymes were suggested [63]. For pancreatic cancer, specific glycoforms should be measured together with protein levels to improve diagnostic potential [64]. Glycoproteins constitute the majority of the protein tumor markers approved by the Food and Drug Administration (FDA) and are currently used in clinical practice. Many of these glycoproteins show altered glycosylation in cancer [60]. MUC-1 (CA15-3/CA27.29) [65] and plasminogen activator inhibitor (PAI-1) [66] are biomarkers of breast cancer; beta-human chorionic gonadotropin (Beta-hCG) [67] is a biomarker of colorectal cancer; alpha-fetoprotein (AFP) [68] is a biomarker of liver cancer and germ cell tumors; chromogranin A (CgA) [69] is a biomarker of neuroendocrine tumors; MUC16 (CA-125) [70] and HE4 [71] are biomarkers of ovarian cancer; and many other biomarkers exist for a variety of cancer types. Most of the results in existing publications are heterogeneous; thus, systematic integrative reviews of the literature are required for the further development of glycoproteomics.

Metabolomics
Metabolomics is the large-scale study of small molecules that investigates variations in the metabolites within cells, biofluids, tissues, or organisms. Metabolomics and glycomics were investigated together in research on posttraumatic stress [72]. According to the authors of this study, these biomarkers should be integrated with other omics markers to understand the biological differences responsible for this stress. For the discovery of liver cancer biomarkers, proteomics, glycomics, and metabolomics were integrated, and this integration enhanced performance compared with the separate omics data [73]. Metabolomic and glycomic data reflect physiological and pathological conditions in individuals. Like metabolites, small glycans can be quantified easily [74]. The Human Metabolome Database (HMDB) is the most comprehensive metabolite resource and offers a significant resource for the discovery of biomarkers in glycomics [75].

Glycolipidomics
Glycolipidomics is the scientific field that identifies and quantifies glycolipids. Glycolipids can be used as specific biomarkers for determining the physiological and pathological condition of an individual. They play a role in the development of neurological and neurodegenerative diseases, such as Lewy body dementia, Alzheimer's disease, Parkinson's disease, and frontotemporal dementia [30]. Furthermore, glycosphingolipids are associated with cancer and are promising as diagnostic biomarkers and as targets for immunotherapy of malignant tumors [76]. More recently, Dehelean et al. [77] reviewed trends in MS-based discovery of glycolipid biomarkers.

Interactomics
Interactomics is the research field that investigates the whole set of interactions between molecules, including glycans. The interaction of glycans with glycan-binding proteins (GBPs) is of great importance in immune response, signaling, cell recognition, infections, neurodegenerative diseases, and cancer. High-throughput technologies also facilitate studies in interactomics [78]. UniLectin3D is a database that catalogs lectins, the most studied GBPs. It consists of curated information on 3D structures and interacting ligands [79]. Lectin-glycan interaction on the cell surface is a significant regulatory factor in corneal biology (e.g., corneal infection) and pathophysiology (e.g., inflammation) [80]. The complete protein-glycan interactome has not yet been obtained [41]. The estimated number of interactions is important for future studies. GenProBiS is a bioinformatics tool that analyzes peptide-peptide, peptide-nucleic acid, and peptide-compound binding sites as well as sites of glycosylation and other posttranslational modifications. Furthermore, it maps sequence variations onto protein structure. Further development of bioinformatics tools for analyzing large data sets will help prioritize candidates for experimental verification and contribute to the development of interactomics.

Other omics fields
In future studies, many other omics fields, such as comparative genomics, epigenomics, regulomics, ncRNomics, miRNomics, and lncRNomics, should be associated with glycomics. Although glycomics is a significant field related to molecular interactions, information about how regulatory networks control these complex processes is still inadequate. In addition to the classic omics fields, omics applications such as iatromics, environmental omics, pharmacogenomics, and nutrigenomics should also be reviewed.

Bioinformatics tools and databases
Glycoinformatics applies bioinformatics tools to the glycome. These tools and databases collect glycomics data so that it can be investigated and associated with other repositories of related proteomics, genomics, and interactomics data. Commonly used tools and databases are summarized in Table 1.

Current bottlenecks for systems glycobiology
System-based analyses have been applied smoothly to signaling networks, metabolic processes, and physiological modeling; however, applications in systems glycobiology still face problems in computational and analytical studies, which arise from three prominent bottlenecks [81]: (i) there is no accepted standard for model building; (ii) glycoinformatics databases are underdeveloped; and (iii) quantitative data from glycoproteomics experiments are insufficient.
In recent years, many systems-based models have been developed to simulate the biosynthesis of glycans. Nevertheless, it remains difficult to incorporate glycan structures and the specificity data of glycosylation-related enzymes into mathematical models. As a result, systematic model building is still absent from this field. Moreover, only a limited number of current models are available in Systems Biology Markup Language (SBML) format [82], which is an obstacle to developing, sharing, and validating computational models.
In the last decades, many databases related to glycoscience have emerged. Nevertheless, functional information is limited compared with glycan structure and taxonomy data. In the future, the relation of glycan structures to the specific enzymes that synthesize them, the rates of their synthesis, and their functions will be required in order to build models. Two main approaches are common for measuring the glycome. In the first approach, enzymes or mild hydrolysis are used to separate the glycans from the peptide backbone. Next, permethylation of the glycans and MS analysis are used to obtain information about the composition and relative abundance of the carbohydrate structures [83]. The bottleneck is the lack of well-developed software. More sophisticated computational tools are required for the analysis of glycoproteomics data and, correspondingly, for accelerating system-based model building and validation.

Mathematical modeling of biochemical reaction networks
Mathematical models of glycosylation are developed in three main stages: (i) biological information gathering; (ii) model formulation; and (iii) simulation and postsimulation analysis. The first stage defines the components (enzymes, substrates, and products) crucial for the model. All components present in the biochemical network and the connections between them are cataloged in this stage. The process relies on biochemical and cell biological information and on analytical tools. In the next stage, the steady-state behavior of the system is investigated using simple linear algebra and principles of optimization. If time is a variable, the computer model can incorporate ordinary differential equations (ODEs) or Boolean networks. Appropriate kinetic, thermodynamic, stochastic, or optimization parameters are collated depending on the formulation of the model and on whether the specified processes are enzymatic or nonenzymatic. The last stage simulates the experimental system in the computer and determines unknown model parameters by fitting experimental data [81]. Visualization of multidimensional results is important because numerous diverse models may be fitted to a single data set obtained from time- and concentration-dependent experiments. Consequently, network analysis strategies are required to consolidate the findings of complex reaction network simulations and to generate hypotheses that can be tested experimentally.
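As an illustration of the ODE formulation mentioned above, the following sketch simulates a toy two-step glycosylation pathway, S0 → S1 → S2, in which each step is catalyzed by one enzyme obeying Michaelis-Menten kinetics. Everything specific here is an assumption made for the example: the pathway topology, the parameter values, the arbitrary concentration and time units, and the use of a simple forward Euler integrator in place of the more robust solvers a real model would use.

```python
def michaelis_menten(vmax, km, substrate):
    """Reaction rate of an enzyme obeying Michaelis-Menten kinetics."""
    return vmax * substrate / (km + substrate)


def simulate(initial, params, t_end=20.0, dt=0.01):
    """Integrate dS0/dt = -v1, dS1/dt = v1 - v2, dS2/dt = v2 by forward Euler.

    Returns a list of (time, S0, S1, S2) tuples, one per time step.
    """
    s0, s1, s2 = initial
    trajectory = [(0.0, s0, s1, s2)]
    steps = int(round(t_end / dt))
    for i in range(1, steps + 1):
        v1 = michaelis_menten(params["vmax1"], params["km1"], s0)
        v2 = michaelis_menten(params["vmax2"], params["km2"], s1)
        s0 += dt * (-v1)
        s1 += dt * (v1 - v2)
        s2 += dt * v2
        trajectory.append((i * dt, s0, s1, s2))
    return trajectory


# Hypothetical kinetic parameters in arbitrary units.
PARAMS = {"vmax1": 1.0, "km1": 0.5, "vmax2": 0.5, "km2": 0.5}

trajectory = simulate((1.0, 0.0, 0.0), PARAMS)
final = trajectory[-1]
print("t = %.1f  S0 = %.4f  S1 = %.4f  S2 = %.4f" % final)
```

In the third modeling stage, parameters such as `vmax1` and `km1` would not be fixed in advance but adjusted until the simulated trajectory fits measured glycan abundances. Total glycan mass is conserved in this toy pathway, which offers a simple sanity check on the integration.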

Conclusions
Glycomics is a very comprehensive research area that interacts with several other omics fields. Like many other omics fields, it involves a huge number of genomic components. In the future, techniques in high-throughput analysis and bioinformatics will be developed that enable the integration of all available glycomics data into a single scheme, which will make it possible to develop biomarkers and identify potential new therapeutic targets. Moreover, progress in the field shows that an integrative multiomics approach should include glycomics in order to develop robust new biomarkers for diseases. Systems glycobiology is one of the specific fields of systems biology. It is based on a holistic approach that describes the complex process of glycosylation and the associations between its constituents. It aims at a more complete overview of the glycome by using enzyme levels, glycan abundances, biosynthesis pathways, glycan annotation, and related omics data sets.
A systems glycobiology approach combines glycomics data sets with those of other omics fields, using glycoinformatics tools, to clarify our understanding of the glycosylation process. This chapter has summarized the main aspects of glycobiology, glycomics, and systems glycobiology. These fields are still developing, however, and further developments will provide more insight into this research area.