
"Computational Biology and Applied Bioinformatics", book edited by Heitor Silverio Lopes and Leonardo Magalhães Cruz, ISBN 978-953-307-629-4, Published: September 2, 2011 under CC BY-NC-SA 3.0 license. © The Author(s).

Chapter 4

In Silico Analysis of Golgi Glycosyltransferases: A Case Study on the LARGE-Like Protein Family

By Kuo-Yuan Hwa, Wan-Man Lin and Boopathi Subramani
DOI: 10.5772/22549





1. Introduction

Glycosylation is one of the major post-translational modification processes essential for expression and function of many proteins. It has been estimated that 1% of the open reading frames of a genome is dedicated to glycosylation. Many different enzymes are involved in glycosylation, such as glycosyltransferases and glycosidases.

Traditionally, glycosyltransferases are classified by the Enzyme Commission (EC) according to their enzymatic activities. They are named after the activated donor type, for example glucosyltransferases, mannosyltransferases and N-acetylglucosaminyltransferases. However, classifying glycosyltransferases on biochemical evidence is difficult because most of these enzymes are membrane proteins. Reconstituting an enzymatic assay for a membrane protein is intrinsically more difficult than for a soluble protein, so purifying a membrane-bound glycosyltransferase is a demanding task. On the other hand, with the recent advancement of genome projects, the DNA sequences of many organisms are readily available. Furthermore, bioinformatics annotation tools are now commonly used by life science researchers to identify the putative function of a gene. Hence, new approaches based on in silico analysis have been used successfully to classify glycosyltransferases. The best-known database for in silico classification of glycosyltransferases is CAZy (Carbohydrate-Active enZymes) (Cantarel et al., 2009).

Glycosyltransferases are enzymes that synthesize sugar moieties by transferring saccharides from activated donors to various macromolecules such as DNA, proteins, lipids and glycans. More than 100 glycosyltransferases are localized in the endoplasmic reticulum (ER) and Golgi apparatus and are involved in glycan synthesis (Narimatsu, 2006). Structural studies on the ER and Golgi glycosyltransferases have revealed several domains and motifs that they share. The glycosyltransferases are grouped into functional subfamilies based on sequence similarity, enzyme characteristics, donor specificity, acceptor specificity and the specific donor and acceptor linkages (Ishida et al., 2005). Glycosyltransferase sequences are 330-560 amino acids long and share the same type II transmembrane protein structure with four functional domains: a short cytoplasmic domain, a targeting/membrane-anchoring domain, a stem region and a catalytic domain (Fukuda et al., 1994). Mammals utilize only nine sugar nucleotide donors for glycosyltransferases: UDP-glucose, UDP-galactose, UDP-GlcNAc, UDP-GalNAc, UDP-xylose, UDP-glucuronic acid, GDP-mannose, GDP-fucose and CMP-sialic acid. Other organisms have a more extensive range of nucleotide sugar donors (Varki et al., 2008). Based on these structural studies, we have designed an intelligent platform for the LARGE protein, a Golgi glycosyltransferase. LARGE is a glycosyltransferase that has been studied in protein glycosylation (Fukuda & Hindsgaul, 2000). It was originally isolated from a region of human chromosome 22 that is frequently deleted in meningiomas with altered glycosphingolipid composition. This led to the suggestion that LARGE may have a role in complex lipid glycosylation (Dumanski et al., 1987; Peyrard et al., 1999).


LARGE is one of the largest genes in the human genome: it spans 660 kb of genomic DNA and contains 16 exons encoding a 756-amino-acid protein. It shows 98% amino acid identity to the mouse homologue and a similar genomic organization. LARGE is expressed ubiquitously, but the highest levels of LARGE mRNA are found in heart, brain and skeletal muscle (Peyrard et al., 1999).

LARGE encodes a protein with an N-terminal transmembrane anchor, a coiled-coil motif and two putative catalytic domains, each carrying a conserved DXD (Asp-any-Asp) motif typical of many glycosyltransferases that use nucleoside diphosphate sugars as donors (Longman et al., 2003; Peyrard et al., 1999). The proximal catalytic domain of LARGE is most homologous to members of the bacterial glycosyltransferase family 8 (GT8 in the CAZy database) (Coutinho et al., 2003). The members of this family are mainly involved in the synthesis of bacterial outer membrane lipopolysaccharide. The distal domain resembles the human β1,3-N-acetylglucosaminyltransferase (iGnT), a member of the GT49 family. The iGnT enzyme is required for the synthesis of the poly-N-acetyllactosamine backbone, which is part of the erythrocyte i antigen (Sasaki et al., 1997). The presence of two catalytic domains in LARGE is extremely unusual among glycosyltransferases.

2.1. Functions of LARGE

2.1.1. Dystroglycan glycosylation

Dystroglycan (DG) is an important constituent of the dystrophin-glycoprotein complex (DGC). This complex plays an essential role in maintaining the stability of the muscle membrane, and glycosylation of some of its components is required for their correct localization and/or ligand-binding activity (Durbeej et al., 1998). DG comprises two subunits, the extracellular α-DG and the transmembrane β-DG (Barresi, 2004). Various components of the extracellular matrix, including laminin (Smalheiser & Schwartz, 1987), agrin (Gee et al., 1994), neurexin (Sugita et al., 2001) and perlecan (Peng et al., 1998), interact with α-DG. The carbohydrate moieties of α-DG are essential for binding laminin and other ligands. α-DG is modified by three different types of glycosylation: mucin-type O-glycosylation, O-mannosylation and N-glycosylation. Glycosylation of α-DG is essential for the protein’s ability to bind the laminin globular domain-containing proteins of the extracellular matrix (Kanagawa, 2005). LARGE is required for the generation of functional, properly glycosylated forms of α-DG (Barresi, 2004).

2.1.2. Human LARGE and α-Dystroglycan

Functional glycosylation of α-DG by LARGE is likely to involve the generation of a glycan polymer, which gives rise to the broad molecular weight range observed for α-DG detected by the VIA4-1 and IIH6 antibodies. The C-terminal glycosyltransferase domain of both human and mouse LARGE is similar to β3GnT6, which adds GlcNAc to Gal to generate linear polylactosamine chains (Sasaki et al., 1997); the chain formed by LARGE might therefore also be composed of GlcNAc and Glc.

Myodystrophy (myd) was first described in 1963 (Lane et al., 1976) as a recessive myopathy mapping to chromosome (Chr) 8, and was subsequently identified as an intragenic deletion within the glycosyltransferase gene LARGE. In Largemyd and enr mice, the hypoglycosylation of α-DG in the DGC is due to the mutation in LARGE (Grewal et al., 2001). Transfer of the LARGE gene restored α-DG function in Largemyd skeletal muscle and ameliorated the muscular dystrophy, indicating that adjusting the glycosylation status of α-DG can improve the muscle phenotype.

Abnormal O-linked glycosylation of α-DG is linked to a clinical spectrum ranging from severe congenital muscular dystrophy (CMD) with structural brain and eye abnormalities [Walker-Warburg syndrome (WWS), MIM 236670] to a relatively mild form of limb-girdle muscular dystrophy (LGMD2I, MIM 607155) (van Reeuwijk et al., 2005). A study by Barresi et al. (2004) revealed dual, concentration-dependent functions of LARGE. At physiological concentrations, LARGE may be involved in regulating the α-DG O-mannosylation pathway. When LARGE is overexpressed, however, it may trigger alternative pathways for the O-glycosylation of α-DG that generate a repeating polymer of variable length, such as glycosaminoglycan-like or core 1 or core 2 structures. This alternative glycan mimics the O-mannose glycan in its ability to bind α-DG ligands and can compensate for the defective tetrasaccharide. Functional LARGE protein is also required for neuronal migration during CNS development, and it rescues α-DG in MEB fibroblasts and WWS cells (Barresi et al., 2004).

2.1.3. LARGE in visual signal processing

The role of LARGE in proper visual signal processing was studied through the retinal pathology of Largemyd mice. Functional abnormalities of the retina were investigated with a sensitive tool, the electroretinogram (ERG). In Largemyd mice, the normal a-wave indicated that the mutant glycosyltransferase does not affect photoreceptor function.

However, the altered b-wave may reflect altered signal processing in the downstream retinal circuitry (Newman & Frishman, 1991). The DGC may also have a role in this aspect of the phenotype. An abnormal b-wave similar to that of Largemyd mice is associated with the loss of retinal isoforms of dystrophin in humans and mice.

2.2. LARGE homologues

A gene homologous to LARGE has been identified and named LARGE2. Like LARGE, it is involved in α-DG maturation (Fujimura et al., 2005). It is still not well understood whether these two proteins are compensatory or cooperative. Co-expression of LARGE and LARGE2 did not increase the maturation of α-DG compared with either one alone, which suggests that the function of LARGE2 in α-DG maturation is compensatory rather than cooperative. Gene therapy for muscular dystrophy using the LARGE gene is a current topic of research (Barresi et al., 2004; Braun, 2004). The LARGE2 gene may be more effective than LARGE because it glycosylates α-DG more heavily and also prevents the production of harmful, immature α-DG.

Closely related homologues of LARGE are found in the human genome (glycosyltransferase-like 1B; GYLTL1B), the mouse genome (Gyltl1b; also called LARGE-Like or LargeL) and some other vertebrate species (Grewal & Hewitt, 2002). The human homologue is located on chromosome 11p11.2 and encodes a 721-amino-acid protein with 67% identity to LARGE, suggesting that the two genes may have arisen by gene duplication. Like LARGE, it is predicted to have two catalytic domains, though it lacks the coiled-coil motif present in LARGE. Hyperglycosylation of α-dystroglycan through overexpression of GYLTL1B increased its ability to bind laminin, and both genes produced the same level of increase in laminin-binding ability (Brockington et al., 2005).

3. Bioinformatics workflow and platform design

Many public databases and bioinformatics tools have been developed and are currently available (Ding & Berleant, 2002). The primary goal of bioinformaticians is to develop reliable databases and effective analysis tools capable of handling large amounts of biological data. Laboratory researchers, by contrast, study specific areas within the life sciences and need only a limited set of databases and analysis tools. The existing free bioinformatics tools are therefore sometimes too numerous and complicated for biologists to choose among. One solution is an expert team that is familiar with the bioinformatics databases and tools and also knows the needs of a research group in a particular field. Such a team can recommend a workflow built from selected bioinformatics tools and databanks, and can help scientists with the complicated issues of tool and database usage. Moreover, it can organize a large number of heterogeneous sources of biological information into a specific, expertly annotated databank.

The team can also regularly and systematically update the information essential to help biologists overcome the problems of integrating and keeping up-to-date with heterogeneous biological information (Gerstein, 2000).

We have built a novel information management platform, LGTBase. This composite knowledge management platform includes the “LARGE-like GlcNAc Transferase Database”, which integrates specific public databases such as CAZy, and a workflow analysis that combines specific public and purpose-built bioinformatics tools to identify the members of the LARGE-like protein family.

4. Tools and database selection

To analyze a novel protein family, biologists need to understand many different types of information. Moreover, the speed of discovery in biology has been increasing exponentially in recent years, so biologists have to pick the right information from vast resources. To overcome these obstacles, a bioinformatics workflow can be designed for analysing a specific protein family. In our study, a workflow was designed based on the structure and characteristics of the LARGE protein, as shown in Figure 1 (Hwa et al., 2007). Unknown DNA/protein sequences are first identified as members of known gene families with the Basic Local Alignment Search Tool (BLAST). The blastp search tool is used to look for new LARGE-like proteins in different organisms. Researchers who wish to use our platform can obtain protein sequences either from experimental data or from the blastp results. The search results are then analyzed with the following tools. To begin with, the sequences are searched for the aspartate-any residue-aspartate (DXD) motif. The DXD motifs present in some glycosyltransferase families are essential for enzyme activity.


Figure 1. Bioinformatics workflow of LGTBase.

The DXD motif prediction is followed by transmembrane domain prediction with the TMHMM program (version 2.0; Center for Biological Sequence Analysis, Technical University of Denmark). The transmembrane domain is a characteristic feature of the Golgi enzymes.

The sequence motifs are then identified with the MEME (Multiple Expectation-maximization for Motif Elicitation) program (version 3.5.4; San Diego Supercomputer Center, UCSD).

This program finds motif homology between the target sequence and other known glycosyltransferases. In addition to the tools mentioned above, the Pfam search (Sanger Institute), which draws on multiple sequence alignments and hidden Markov models of many existing protein domains and families, can also be used. The Pfam results indicate which protein family the peptide belongs to. If it is a desired protein, investigators can then identify its evolutionary relationships by phylogenetic analysis.

4.1. LARGE-like GlcNAc transferase database

The annotation entries in LGTBase are currently compiled from information retrieved from several databases.

In the CAZy (Carbohydrate-Active enZymes) database, glycosyltransferases are classified into families, clans and folds based on their structural and sequence similarities and on mechanistic investigation. The other databases used in this platform are listed in Table 1.

| Database | Description |
|---|---|
| EntrezGene | NCBI's repository for gene-specific information |
| GenBank | NIH genetic sequence database, an annotated collection of all publicly available DNA sequences |
| Dictybase | Database for the model organism Dictyostelium discoideum |
| … | High-quality, manually annotated, non-redundant protein sequence database |
| InterPro | Database of protein families, domains and functional sites |
| MGI | Database providing integrated genetic, genomic and biological data on the laboratory mouse |
| Ensembl | Provides genome annotation information |
| HGMD | Human Gene Mutation Database; provides comprehensive data on human inherited disease mutations |
| UniGene | NCBI database of the transcriptome |
| GeneWiki | Transfers information on human genes to Wikipedia articles |
| TGDB | Database with information about the genes involved in cancers |
| HUGE | Provides the results of the Human cDNA project at the Kazusa DNA Research Institute |
| RGD | Database with a collection of genetic and genomic information on the … |
| OMIM | Provides information on human genes and genetic disorders |
| CGAP | Information on gene expression profiles of normal, precancer and cancer cells |
| PubMed | Database with 20 million citations for biomedical literature from medical journals, life science journals and related books |
| GO | Representation of … and … attributes across all … |

Table 1. The information sources of LARGE-like GlcNAc Transferase Database

All the information related to the LARGE-like protein family was retrieved from these biological databases. To confirm that the information was reliable, the data were scrutinized at two levels. First, the information was selected from the above-mentioned databases with customized programs that use Perl-compatible regular expressions. The extracted information was then annotated and validated by experts in glycobiology and bioinformatics.
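The first, automated level of this process can be illustrated with a minimal sketch. It uses Python's `re` module in place of the authors' Perl-based programs, and the record layout, field names and the `extract_fields` helper are hypothetical, chosen only to show how regular expressions can pull annotation fields out of a flat-text database record.

```python
import re

# Hypothetical flat-file record; the field layout is an assumption
# made for illustration and is not taken from any specific database.
RECORD = """\
LOCUS       EXAMPLE_0001
DEFINITION  hypothetical LARGE-like glycosyltransferase.
ORGANISM    Homo sapiens
CHROMOSOME  22
"""

# One pattern per annotation field; each captures the rest of its line.
FIELD_PATTERNS = {
    "definition": re.compile(r"^DEFINITION\s+(.+)$", re.MULTILINE),
    "organism": re.compile(r"^ORGANISM\s+(.+)$", re.MULTILINE),
    "chromosome": re.compile(r"^CHROMOSOME\s+(\S+)$", re.MULTILINE),
}


def extract_fields(record):
    """Return a dict of the fields found in one record; missing fields are omitted."""
    found = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(record)
        if match:
            found[name] = match.group(1).strip()
    return found


if __name__ == "__main__":
    print(extract_fields(RECORD))
```

Fields that are absent from a record are simply left out of the result, which makes gaps visible to the expert curators who carry out the second level of validation.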

The annotated data in the LGTBase database are divided into nine categories (Figure 2). The first category, genomic location, displays the chromosome, the cytogenetic band and the map location of the gene. The second, aliases and descriptions, displays synonyms and aliases for the relevant gene and descriptions of its function, cellular localization and effect on phenotype. The third category, proteins, provides annotated information about the proteins encoded by the relevant genes. The fourth, protein domains and families, provides annotated information about protein domains and families, and the fifth, protein function, provides annotated information about gene function. The sixth category, pathways and interactions, provides links to pathways and interactions. The seventh, disorders and mutations, draws its information from OMIM and UniProt. The eighth category, expression in specific tissues, shows the tissue expression values available for a given gene. The last category, research articles, lists the references related to the proteins under study. In addition, the investigator can use DNA or protein sequences to assemble the dataset for analysis with this workflow.


Figure 2. The contents of LGTBase database

4.2. LARGE-like GlcNAc transferase workflow

4.2.1. Reference sequences search

Unknown DNA/protein sequences are identified as members of known gene families using the Basic Local Alignment Search Tool (BLAST). BlastP is one of the BLAST programs; it searches protein databases using a protein query. We used BlastP to look for new LARGE-like proteins from different species, gathered the protein sequences of LARGE-like GlcNAc transferases and built a protein database of ‘LARGE-like protein’. This database assists the search for more reference sequences of LARGE-like proteins.
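As an illustration of how BlastP hits might be collected for such a database, the sketch below parses BLAST's standard 12-column tabular output and keeps hits below an E-value cutoff. The chapter does not state which output format or cutoff the platform uses, so the tabular format, the `1e-5` threshold and the names `Hit` and `parse_blast_tabular` are assumptions.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    percent_identity: float
    evalue: float
    bitscore: float


def parse_blast_tabular(text: str, max_evalue: float = 1e-5) -> List[Hit]:
    """Parse 12-column tabular BLAST output and keep hits with E-value <= max_evalue.

    Expected columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    hits = []
    for line in text.splitlines():
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 12:
            continue  # not a complete tabular row
        hit = Hit(cols[0], cols[1], float(cols[2]), float(cols[10]), float(cols[11]))
        if hit.evalue <= max_evalue:
            hits.append(hit)
    return hits
```

The subject identifiers of the retained hits can then be used to fetch the corresponding sequences for the downstream motif and domain analyses.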

4.2.2. DXD motif search

In several glycosyltransferase families, the DXD motif is essential for enzymatic activity (Busch et al., 1998). We therefore first search for the aspartate-any residue-aspartate (DXD) motif commonly found in glycosyltransferases, and designed the ‘DXD Motif Search’ tool for this purpose. The input protein sequences are loaded or pasted into the tool, and the results indicate the presence or absence of the DXD motif.
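A DXD search of this kind reduces to a simple pattern match. The following is a minimal sketch, not the LGTBase implementation: the function names are hypothetical, and the pattern accepts any uppercase letter in the middle position.

```python
import re

# A lookahead is used so that overlapping occurrences (e.g. in "DADAD")
# are all reported rather than only the first of each overlapping pair.
DXD_PATTERN = re.compile(r"(?=(D[A-Z]D))")


def find_dxd_motifs(sequence):
    """Return a list of (1-based position, motif) for every D-x-D occurrence."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in DXD_PATTERN.finditer(seq)]


def has_dxd_motif(sequence):
    """Report presence or absence of the motif, as the tool's result page does."""
    return bool(find_dxd_motifs(sequence))


if __name__ == "__main__":
    print(find_dxd_motifs("MKDADADGG"))
```

Because three-residue patterns occur often by chance, a positive result is best read as a filter for further analysis rather than as evidence of glycosyltransferase activity on its own.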

4.2.3. Transmembrane helices search

The LARGE protein is a member of the N-acetylglucosaminyltransferase family, and the presence of a transmembrane domain is a characteristic feature of this family. The TMHMM program predicts transmembrane helices using a hidden Markov model. The prediction gives the most probable location and orientation of the transmembrane alpha helices in the sequence, together with the location of the intervening loop regions and whether each loop lies inside or outside the cell or organelle. The model rests on the observation that an alpha helix of about 20 hydrophobic amino acids can span a cell membrane.
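The idea that a stretch of roughly 20 hydrophobic residues can span a membrane can be shown with a much simpler technique than TMHMM's hidden Markov model: a sliding-window average of Kyte-Doolittle hydropathy values. The sketch below is this simpler stand-in, not TMHMM; the window length of 20 and the threshold of 1.6 are rule-of-thumb assumptions, and the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy scale (higher values are more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(sequence, window=20, threshold=1.6):
    """Return 1-based (start, end) spans covered by windows whose mean
    hydropathy is at least `threshold`; overlapping windows are merged."""
    seq = sequence.upper()
    spans = []
    for i in range(len(seq) - window + 1):
        segment = seq[i:i + window]
        # Unknown residue codes contribute a neutral score of 0.
        mean = sum(KYTE_DOOLITTLE.get(aa, 0.0) for aa in segment) / window
        if mean >= threshold:
            start, end = i + 1, i + window
            if spans and start <= spans[-1][1] + 1:
                spans[-1] = (spans[-1][0], end)  # extend the previous span
            else:
                spans.append((start, end))
    return spans
```

Unlike TMHMM, this approach says nothing about helix orientation or loop topology, and the merged spans are usually wider than the helix itself; it only flags candidate membrane-spanning regions.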

4.2.4. MEME analysis

A motif is a sequence pattern that occurs repeatedly in a group of related protein or DNA sequences. MEME (Multiple Expectation-maximization for Motif Elicitation) represents motifs as position-dependent letter-probability matrices which describe the probability of each possible letter at each position in the pattern. The program can search for homologous sequences among the input protein sequences.
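To make the matrix representation concrete, the sketch below builds a position-dependent letter-probability matrix from motif occurrences that are already aligned. MEME itself discovers such occurrences by expectation maximization, which is not reproduced here; the function name and the optional pseudocount parameter are assumptions.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def letter_probability_matrix(sites, alphabet=AMINO_ACIDS, pseudocount=0.0):
    """Build one {letter: probability} row per motif position from
    equal-length, pre-aligned motif occurrences written in `alphabet`."""
    if not sites:
        raise ValueError("at least one site is required")
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(alphabet)
    matrix = []
    for pos in range(width):
        counts = Counter(site[pos] for site in sites)
        matrix.append(
            {aa: (counts.get(aa, 0) + pseudocount) / total for aa in alphabet}
        )
    return matrix


if __name__ == "__main__":
    pm = letter_probability_matrix(["DAD", "DVD", "DAD", "DGD"])
    print(pm[1]["A"], pm[1]["V"], pm[1]["G"])
```

In the example, the first and last columns give aspartate a probability of 1, while the middle column spreads probability over several residues, which is the matrix form of a DXD-like motif.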

4.2.5. Protein families search

The Pfam HMM search was used to identify the protein family to which the input protein sequences belong. The Pfam database contains the information about most of the protein domains and families. The results from the Pfam HMM search will show the relation of input protein sequences with the existing protein families and domains.

4.2.6. Phylogenetic analysis

Phylogenetic analysis was performed to find any significant evolutionary relationship between the new protein sequences and the LARGE protein family, and to support our previous findings. ClustalW is a multiple alignment program that aligns two or more sequences to determine any significant consensus sequences between them (Thompson et al., 1994). This approach can also be used to search for patterns in the sequences. The phylogenetic tree was constructed with the PHYLIP program (v.3.6.9) and viewed with Treeview software (v.1.6.6). In the GlcNAc-transferase phylogenetic analysis, once the multiple alignment of all GlcNAc-transferases is complete, it is used to construct the phylogenetic tree. About 25 protein sequences were identified as members of the LARGE-like protein family. Using the neighbor-joining distance method, the phylogenetic tree showed that these proteins can be divided into 6 groups (Figure 3). The evolutionary history inferred from phylogenetic analysis is usually depicted as a branching, tree-like diagram that represents an estimated pedigree of the inherited relationships among the protein sequences from different species. These evolutionary relationships can be viewed either as cladograms (Chenna et al., 2003) or phylograms (Radomski & Slonimski, 2001).
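Distance methods such as neighbor joining start from a matrix of pairwise distances computed from the multiple alignment. The sketch below computes the simplest such measure, the proportion of differing residues (p-distance) over ungapped columns. PHYLIP's own protein distance programs use substitution models rather than raw p-distances, so this is a simplified stand-in with hypothetical function names.

```python
def p_distance(a, b):
    """Proportion of differing residues between two aligned sequences,
    ignoring columns where either sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def distance_matrix(aligned):
    """Return {(name1, name2): distance} for every ordered pair of sequences
    in `aligned`, a dict mapping sequence names to aligned sequences."""
    names = list(aligned)
    return {(p, q): p_distance(aligned[p], aligned[q]) for p in names for q in names}
```

A matrix of this form is what a neighbor-joining routine consumes: it repeatedly joins the pair of sequences or clusters that minimizes the total branch length until a single tree remains.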


Figure 3. Phylogenetic tree of LARGE-like Protein Family

4.3. Organization of the LGTBase platform

The data obtained from the analyses were stored in a MySQL relational database, and the web interface was built with PHP and CGI/Java scripts. The workflow was designed around the characteristics of LARGE-like GlcNAc transferase proteins and developed in Java together with several open source bioinformatics programs. Tools written in different languages (C, Perl and Java) were integrated using Java (Figure 4). The adjustable parameters of the tools were retained to meet future needs.

5. Application with LARGE protein family

A protein sequence (FASTA format) can be entered into the BlastP assistant interface, so that other known proteins with similar sequences can be identified (Figure 5). The investigator can select all the resulting sequences or only some of them. The data can then be transferred to the DXD analysis page (Figure 6). The DXD analysis was chosen because DXD motifs are present in many families of glycosyltransferases, which makes it easier to narrow putative protein sequences down to particular protein families or domains. Many online tools are available for the identification and characterization of unknown protein sequences, so one can pick the tools that suit the target protein under study.
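Since every step of the workflow starts from FASTA-formatted input, a minimal parser is sketched here. It is an illustration rather than the platform's code: the function name is hypothetical, and taking the first whitespace-delimited token of the header line as the sequence name is an assumption.

```python
def parse_fasta(text):
    """Parse FASTA text into {name: sequence}.

    The name is the first whitespace-delimited token after '>';
    sequence lines are concatenated and upper-cased.
    """
    records = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            if not header:
                raise ValueError("empty FASTA header")
            name = header.split()[0]
            if name in records:
                raise ValueError("duplicate sequence name: " + name)
            records[name] = ""
        elif name is None:
            raise ValueError("sequence data found before the first header")
        else:
            records[name] += line.upper()
    return records


if __name__ == "__main__":
    print(parse_fasta(">seq1 example\nMKDAD\nLLV\n>seq2\nggg\n"))
```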


Figure 4. Database selected for construction of the knowledge management platform

The sequences are analyzed with the DXD motif search tool (Figure 6), which selects the sequences containing the DXD motif for TMHMM analysis. The transmembrane helices are then predicted with TMHMM (Figure 7). Transmembrane domains are predicted from the hydrophobic nature of the protein and are mainly used to identify its cellular location. Like transmembrane domains, several other domains can be predicted from properties of the protein such as hydrophobicity or hydrophilicity. The sequences containing both DXD motifs and transmembrane helices are then selected for MEME (Figure 8) and Pfam analysis (Figure 9). MEME predicts sequence motifs that occur repeatedly in the data set and are conjectured to have biological significance. It plays a significant role in characterizing putative protein sequences after the initial studies with the DXD motif, transmembrane domain and other tools, and it can be used for all kinds of protein sequences because its prediction is based on the sequence patterns present in the data set. Pfam analysis assigns the protein sequences in the dataset to known protein families. The Pfam classification can be used for almost all putative protein sequences because of its large collection of protein domain families, represented by multiple sequence alignments and hidden Markov models. After the MEME and Pfam analyses, the ClustalW and PHYLIP programs are used for phylogenetic analysis (Figure 10) to examine the evolutionary relationships among the data sets. Finally, these results can be used to design experiments to be performed in the laboratory.
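The successive selection steps described above amount to a chain of filters in which each stage passes only its survivors to the next. The sketch below shows that control flow under simplifying assumptions: the two predicates are crude, hypothetical stand-ins (a regular expression for DXD and a run of hydrophobic residues in place of TMHMM), and `run_filters` is not part of LGTBase.

```python
import re


def has_dxd(seq):
    """Stand-in for the DXD motif search: any Asp-x-Asp tripeptide."""
    return re.search(r"D.D", seq) is not None


def has_hydrophobic_stretch(seq, length=18):
    """Crude stand-in for TMHMM: an uninterrupted run of at least
    `length` residues drawn from a small hydrophobic set."""
    return re.search(r"[AVLIMFWC]{%d,}" % length, seq) is not None


def run_filters(sequences, stages):
    """Apply (label, predicate) stages in order.

    Returns the final survivors and a per-stage report listing the
    names of the sequences that remain after each stage.
    """
    survivors = dict(sequences)
    report = []
    for label, predicate in stages:
        survivors = {n: s for n, s in survivors.items() if predicate(s)}
        report.append((label, sorted(survivors)))
    return survivors, report


if __name__ == "__main__":
    seqs = {
        "p1": "MK" + "L" * 20 + "GDADG",
        "p2": "MKDADGGGSS",
        "p3": "MK" + "L" * 20 + "GGG",
    }
    stages = [("DXD", has_dxd), ("TM", has_hydrophobic_stretch)]
    print(run_filters(seqs, stages)[1])
```

Keeping the stages as a list of interchangeable predicates mirrors the way the platform lets investigators choose which tools to apply to a given target protein.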


Figure 5. The BlastP tool of the LGTBase platform to find similar sequences to LARGE


Figure 6. DXD motif search tool of the LGTBase platform for DXD motif prediction


Figure 7. TMHMM analysis tool of the LGTBase platform for Transmembrane domain prediction


Figure 8. MEME analysis tool of the LGTBase platform to predict the sequence motifs


Figure 9. Pfam analysis tool of the LGTBase platform to identify the known protein family of the target protein which is studied


Figure 10. Phylogenetic analysis tool of the LGTBase platform to study the evolutionary relationship of the target protein

6. Future direction

We have described how to construct a computational platform to analyze the LARGE protein family. Since the platform is built on several commonly shared protein domains and motifs, it can also be modified to analyze other Golgi glycosyltransferases. Furthermore, the phylogenetic analysis (Figure 3) revealed that the LARGE protein family is related to β-1,3-N-acetylglucosaminyltransferase 1 (β3GnT). β3GnT is a group of enzymes belonging to the glycosyltransferase family. Some β3GnT enzymes catalyze the transfer of GlcNAc from UDP-GlcNAc to Gal in the Galβ1-4Glc(NAc) structure with a β-1,3 linkage. These enzymes are grouped into GT families 31 and 49 in the CAZy database. The enzyme uses two substrates, UDP-N-acetyl-D-glucosamine and D-galactosyl-β-1,4-N-acetyl-D-glucosaminyl-R, and forms the products UDP and N-acetyl-β-D-glucosaminyl-β-1,3-D-galactosamine. These enzymes participate in the formation of keratan sulfate, glycosphingolipid biosynthesis, the neo-lacto series and N-linked glycans. Nine members of the β3GnT family are currently known.

β3GnT1 (iGnT) was the first of these enzymes to be isolated, when the cDNA of a human β-1,3-N-acetylglucosaminyltransferase essential for poly-N-acetyllactosamine synthesis was studied (Zhou et al., 1999). The poly-N-acetyllactosamine synthesized by iGnT provides the critical backbone structure for the addition of functional oligosaccharides such as sialyl Lewis X. It has recently been reported that β3GnT1 attenuates prostate cancer cell locomotion by regulating the synthesis of laminin-binding glycans on α-DG (Bao et al., 2009). Since several domains are shared with the LARGE protein, a new platform for the β3GnT protein family can be constructed on the basis of the original platform. Apart from β3GnT1, the β3GnT2 enzyme is responsible for the elongation of polylactosamine chains. This enzyme was isolated based on its structural similarity to the β3GalT family. Studies on a panel of invasive and noninvasive fresh transitional cell carcinomas (TCCs) showed strong downregulation of β3GnT2 in the invasive lesions, suggesting a decline in the expression levels of some members of the glycosyltransferase family (Gromova et al., 2001).

The β3GnT3 and β3GnT4 enzymes were subsequently isolated based on their structural similarity to the β3GalT family. β3GnT3 is a type II transmembrane protein and contains a signal anchor that is not cleaved. It prefers lacto-N-tetraose and lacto-N-neotetraose as substrates, and it is involved in the biosynthesis of poly-N-acetyllactosamine chains and of the backbone structure of dimeric sialyl Lewis A. It plays a dominant role in L-selectin ligand biosynthesis, lymphocyte homing and lymphocyte trafficking. The β3GnT3 enzyme is highly expressed in non-invasive colon cancer cells. β3GnT4 is involved in the biosynthesis of poly-N-acetyllactosamine chains and prefers lacto-N-neotetraose as its substrate. It is a type II transmembrane protein and is more highly expressed in bladder cancer cells (Shiraishi et al., 2001). β3GnT5 is responsible for the synthesis of lactosyltriaosylceramide, an essential component of lacto/neolacto-series glycolipids (Togayachi et al., 2001). The expression of the HNK-1 and Lewis x antigens on the lacto/neolacto series of glycolipids is developmentally and tissue-specifically regulated by β3GnT5. Overexpression of β3GnT5 in human gastric carcinoma cell lines led to increased sialyl-Lewis X expression and increased H. pylori adhesion (Marcos et al., 2008).

β3GnT6 synthesizes the core 3 O-glycan structure and is thought to play an important role in the synthesis and function of mucin O-glycans in the digestive organs. In addition, the expression of β3GnT6 is markedly downregulated in gastric and colorectal carcinomas (Iwai et al., 2005). Expression of β3GnT7 has been reported to be downregulated upon malignant transformation (Kataoka et al., 2002). Elongation of the carbohydrate backbone of keratan sulfate proteoglycan is catalyzed by β3GnT7 and β1,4-galactosyltransferase 4 (Hayatsu et al., 2008). β3GnT7 can transfer GlcNAc to Gal to synthesize a polylactosamine chain, with each enzyme differing in its acceptor molecule preference. Polylactosamine and related structures play crucial roles in cell-cell interaction, cell-extracellular matrix interaction, immune response and the determination of metastatic capacity. The β3GnT8 enzyme extends a polylactosamine chain specifically on tetraantennary N-glycans: it transfers GlcNAc to the non-reducing terminus of the Galβ1-4GlcNAc of a tetraantennary N-glycan in vitro. Intriguingly, β3GnT8 is significantly upregulated in colon cancer tissues compared with normal tissue (Ishida et al., 2005). Co-transfection of β3GnT8 and β3GnT2 resulted in synergistic enhancement of polylactosamine synthesis activity, which indicates that these two enzymes interact and complement each other’s function in the cell. In summary, the members of the β3GnT protein family are important in human cancer biology.

Our initial motif analysis showed that three predicted functional domains are common to the β3GnT enzymes. The first is a structural motif necessary for maintaining the protein fold. The second is the DXD motif, which is found in many glycosyltransferases and is involved in binding the nucleotide-sugar donor substrate, both directly and indirectly through coordination of metal ions such as magnesium or manganese in the active site. The third is a glycine-rich loop found at the bottom of the active site cleft. This loop is likely to play a role in the recognition of both the GlcNAc portion of the donor and the substrate. Since the three common domains of β3GnT are similar to those of the LARGE protein family, it is feasible to modify the current LARGE platform to analyze other Golgi glycosyltransferases such as β3GnT.
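As an illustration of the kind of scan a DXD motif search performs, the following is a minimal Python sketch. It is not the LGTBase implementation. It assumes the motif in its simplest form (Asp, any residue, Asp), and the function name `find_dxd_motifs` and the toy sequence are hypothetical.

```python
import re

# Assumption: simplest form of the motif, Asp - any residue - Asp.
# The lookahead lets overlapping occurrences (e.g. "DADAD") be reported.
DXD_PATTERN = re.compile(r"(?=(D[A-Z]D))")


def find_dxd_motifs(sequence):
    """Return (1-based start position, tripeptide) for every D-x-D match."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in DXD_PATTERN.finditer(seq)]


# Hypothetical toy sequence, not a real glycosyltransferase.
toy_sequence = "MKLDVDAAGDDDLLR"
for position, tripeptide in find_dxd_motifs(toy_sequence):
    print(position, tripeptide)
```

For the toy sequence this prints `4 DVD` and `10 DDD`. A short pattern like this matches by chance in many proteins, so hits are only candidates and would normally be checked against the alignment and the other conserved domains.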


1 - X. Bao, M. Kobayashi, S. Hatakeyama, K. Angata, D. Gullberg, J. Nakayama, M. N. Fukuda, M. Fukuda. Tumor suppressor function of laminin-binding α-dystroglycan requires a distinct β-3-N-acetylglucosaminyltransferase. Proceedings of the National Academy of Sciences USA, Vol. 106, No. 29 (July 2009), pp. 12109-12114.
2 - R. Barresi, D. E. Michele, M. Kanagawa, H. A. Harper, S. A. Dovico, J. S. Satz, S. A. Moore, W. Zhang, H. Schachter, J. P. Dumanski, R. D. Cohn, I. Nishino, K. P. Campbell. LARGE can functionally bypass alpha-dystroglycan glycosylation defects in distinct congenital muscular dystrophies. Nature Medicine, Vol. 10, No. 7 (July 2004), pp. 696-703.
3 - S. Braun. Naked plasmid DNA for the treatment of muscular dystrophy. Current Opinion in Molecular Therapeutics, Vol. 6 (October 2004), pp. 499-505.
4 - M. Brockington, S. Torelli, P. Prandini, C. Boito, N. F. Dolatshad, C. Longman, S. C. Brown, F. Muntoni. Localization and functional analysis of the LARGE family of glycosyltransferases: significance for muscular dystrophy. Human Molecular Genetics, Vol. 14, No. 5 (March 2005), pp. 657-665.
5 - C. Busch, F. Hofmann, J. Selzer, S. Munro, D. Jeckel, K. Aktories. A common motif of eukaryotic glycosyltransferases is essential for the enzyme activity of large clostridial cytotoxins. Journal of Biological Chemistry, Vol. 273, No. 31 (July 1998), pp. 19566-19572.
6 - B. L. Cantarel, P. M. Coutinho, C. Rancurel, T. Bernard, V. Lombard, B. Henrissat. The Carbohydrate-Active EnZymes database (CAZy): an expert resource for Glycogenomics. Nucleic Acids Research, Vol. 37 (January 2009), pp. D233-D238.
7 - R. Chenna, H. Sugawara, T. Koike, R. Lopez, T. J. Gibson, D. G. Higgins, J. D. Thompson. Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Research, Vol. 31, No. 13 (July 2003), pp. 3497-3500.
8 - P. M. Coutinho, E. Deleury, G. J. Davies, B. Henrissat. An evolving hierarchical family classification for glycosyltransferases. Journal of Molecular Biology, Vol. 328 (April 2003), pp. 307-317.
9 - J. Ding, D. Berleant, D. Nettleton, E. Wurtele (2002). Mining MEDLINE: abstracts, sentences, or phrases? Pacific Symposium on Biocomputing, Vol. 7, pp. 326-337.
10 - J. P. Dumanski, E. Carlbom, V. P. Collins, M. Nordenskjold. Deletion mapping of a locus on human chromosome 22 involved in the oncogenesis of meningioma. Proceedings of the National Academy of Sciences USA, Vol. 84 (December 1987), pp. 9275-9279.
11 - M. Durbeej, M. D. Henry, K. P. Campbell. Dystroglycan in development and disease. Current Opinion in Cell Biology, Vol. 10 (October 1998), pp. 594-601.
12 - K. Fujimura, H. Sawaki, T. Sakai, T. Hiruma, N. Nakanishi, T. Sato, T. Ohkura, H. Narimatsu. LARGE2 facilitates the maturation of α-dystroglycan more effectively than LARGE. Biochemical and Biophysical Research Communications, Vol. 329, No. 3 (April 2005), pp. 1162-1171.
13 - M. Fukuda, O. Hindsgaul, B. D. Hames, D. M. Glover (1994). In Molecular Glycobiology, Oxford Univ. Press, Oxford.
14 - M. Fukuda, O. Hindsgaul (2000). Molecular and Cellular Glycobiology (2nd ed.), Oxford Univ. Press, Oxford.
15 - S. H. Gee, F. Montanaro, M. H. Lindenbaum, S. Carbonetto. Dystroglycan-α, a dystrophin-associated glycoprotein, is a functional agrin receptor. Cell, Vol. 77 (June 1994), pp. 675-686.
16 - M. Gerstein. Integrative database analysis in structural genomics. Nature Structural Biology, Vol. 7 (November 2000), Suppl., pp. 960-963.
17 - P. K. Grewal, P. J. Holzfeind, R. E. Bittner, J. E. Hewitt. Mutant glycosyltransferase and altered glycosylation of alpha-dystroglycan in the myodystrophy mouse. Nature Genetics, Vol. 28 (June 2001), pp. 151-154.
18 - P. K. Grewal, J. E. Hewitt. Mutation of Large, which encodes a putative glycosyltransferase, in an animal model of muscular dystrophy. Biochimica et Biophysica Acta, Vol. 1573 (December 2002), pp. 216-224.
19 - I. Gromova, P. Gromov, J. E. Celis. A Novel Member of the Glycosyltransferase Family, β3Gn-T2, highly down regulated in invasive human bladder Transitional Cell Carcinomas. Molecular Carcinogenesis, Vol. 32, No. 2 (October 2001), pp. 61-72.
20 - N. Hayatsu, S. Ogasawara, M. K. Kaneko, Y. Kato, H. Narimatsu. Expression of highly sulfated keratan sulfate synthesized in human glioblastoma cells. Biochemical and Biophysical Research Communications, Vol. 368, No. 2 (April 2008), pp. 217-222.
21 - K. Y. Hwa, T. L. Pang, M. Y. Chen (2007). Classification of LARGE-like GlcNAc-Transferases of Dictyostelium discoideum by Phylogenetic Analysis. Frontiers in the Convergence of Bioscience and Information Technologies, pp. 289-293.
22 - H. Ishida, A. Togayachi, T. Sakai, T. Iwai, T. Hiruma, T. Sato, R. Okubo, N. Inaba, T. Kudo, M. Gotoh, J. Shoda, N. Tanaka, H. Narimatsu. A novel beta1,3-N-acetylglucosaminyltransferase (beta3Gn-T8), which synthesizes poly-N-acetyllactosamine, is dramatically upregulated in colon cancer. FEBS Letters, Vol. 579, No. 1 (January 2005), pp. 71-78.
23 - H. Ishida, A. Togayachi, T. Sakai, T. Iwai, T. Hiruma, T. Sato, R. Okubo, N. Inaba, T. Kudo, M. Gotoh, J. Shoda, N. Tanaka, H. Narimatsu. A novel beta1,3-N-acetylglucosaminyltransferase (beta3Gn-T8), which synthesizes poly-N-acetyllactosamine, is dramatically upregulated in colon cancer. FEBS Letters, Vol. 579, No. 1 (January 2005), pp. 71-78.
24 - T. Iwai, T. Kudo, R. Kawamoto, T. Kubota, A. Togayachi, T. Hiruma, T. Okada, T. Kawamoto, K. Morozumi, H. Narimatsu. Core 3 synthase is down-regulated in colon carcinoma and profoundly suppresses the metastatic potential of carcinoma cells. Proceedings of the National Academy of Sciences USA, Vol. 102, No. 12 (March 2005), pp. 4572-4577.
25 - M. Kanagawa, D. E. Michele, J. S. Satz, R. Barresi, H. Kusano, T. Sasaki, R. Timpl, M. D. Henry, K. P. Campbell. Disruption of Perlecan Binding and Matrix Assembly by Post-Translational or Genetic Disruption of Dystroglycan Function. FEBS Letters, Vol. 579, No. 21 (August 2005), pp. 4792-4796.
26 - K. Kataoka, N. H. Huh. A novel β1,3-N-acetylglucosaminyltransferase involved in invasion of cancer cells as assayed in vitro. Biochemical and Biophysical Research Communications, Vol. 294, No. 4 (June 2002), pp. 843-848.
27 - P. W. Lane, T. C. Beamer, D. D. Myers. Myodystrophy, a new myopathy on chromosome 8 of the mouse. Journal of Heredity, Vol. 67, No. 3 (May-June 1976), pp. 135-138.
28 - C. Longman, M. Brockington, S. Torelli, C. Jimenez-Mallebrera, C. Kennedy, N. Khalil, L. Feng, R. K. Saran, T. Voit, L. Merlini, C. A. Sewry, S. C. Brown, F. Muntoni. Mutations in the human LARGE gene cause MDC1D, a novel form of congenital muscular dystrophy with severe mental retardation and abnormal glycosylation of alpha dystroglycan. Human Molecular Genetics, Vol. 12, No. 21 (November 2003), pp. 2853-2861.
29 - N. T. Marcos, A. Magalhães, B. Ferreira, M. J. Oliveira, A. S. Carvalho, N. Mendes, T. Gilmartin, S. R. Head, C. Figueiredo, L. David, F. Santos-Silva, C. A. Reis. Helicobacter pylori induces β3GnT5 in human gastric cell lines, modulating expression of the SabA ligand Sialyl-Lewis X. Journal of Clinical Investigation, Vol. 118, No. 6 (June 2008), pp. 2325-2336.
30 - H. Narimatsu. Human glycogene cloning: focus on beta 3-glycosyltransferase and beta 4-glycosyltransferase families. Current Opinion in Structural Biology, Vol. 16, No. 5 (October 2006), pp. 567-575.
31 - E. A. Newman, L. J. Frishman (1991). The b-wave. In G. B. Arden (ed.), Principles and Practice of Clinical Electrophysiology of Vision, Mosby-Year Book, St Louis, MO.
32 - H. B. Peng, A. A. Ali, D. F. Daggett, H. Rauvala, J. R. Hassell, N. R. Smalheiser. The relationship between perlecan and dystroglycan and its implication in the formation of the neuromuscular junction. Cell Adhesion and Communication, Vol. 5, No. 6 (September 1998), pp. 475-489.
33 - M. Peyrard, E. Seroussi, A. C. Sandberg-Nordqvist, Y. G. Xie, F. Y. Han, I. Fransson, J. Collins, I. Dunham, M. Kost-Alimova, S. Imreh, J. P. Dumanski. The human LARGE gene from 22q12.3-q13.1 is a new, distinct member of the glycosyltransferase gene family. Proceedings of the National Academy of Sciences USA, Vol. 96, No. 2 (January 1999), pp. 589-603.
34 - J. P. Radomski, P. P. Slonimski. Genomic style of proteins: concepts, methods and analyses of ribosomal proteins from 16 microbial species. FEMS Microbiology Reviews, Vol. 25, No. 4 (August 2001), pp. 425-435.
35 - K. Sasaki, K. Kurata-Miura, M. Ujita, K. Angata, S. Nakagawa, S. Sekine, T. Nishi, M. Fukuda. Expression cloning of cDNA encoding a human beta-1,3-N-acetylglucosaminyl transferase that is essential for poly-N-acetyllactosamine synthesis. Proceedings of the National Academy of Sciences USA, Vol. 94, No. 26 (December 1997), pp. 14294-14299.
36 - N. Shiraishi, A. Natsume, A. Togayachi, T. Endo, T. Akashima, Y. Yamada, N. Imai, S. Nakagawa, S. Koizumi, S. Sekine, H. Narimatsu, K. Sasaki. Identification and characterization of three novel β1,3-N-acetylglucosaminyltransferases structurally related to the β1,3-galactosyltransferase family. Journal of Biological Chemistry, Vol. 276, No. 5 (February 2001), pp. 3498-3507.
37 - N. R. Smalheiser, N. B. Schwartz. Cranin: a laminin-binding protein of cell membranes. Proceedings of the National Academy of Sciences USA, Vol. 84, No. 18 (September 1987), pp. 6457-6461.
38 - S. Sugita, F. Saito, J. Tang, J. Satz, K. Campbell, T. C. Sudhof. A stoichiometric complex of neurexins and dystroglycan in brain. Journal of Cell Biology, Vol. 154, No. 2 (July 2001), pp. 435-445.
39 - J. D. Thompson, D. G. Higgins, T. J. Gibson. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research, Vol. 22, No. 22 (November 1994), pp. 4673-4680.
40 - A. Togayachi, T. Akashima, R. Ookubo, T. Kudo, S. Nishihara, H. Iwasaki, A. Natsume, H. Mio, J. Inokuchi, T. Irimura, et al. Molecular cloning and characterization of UDP-GlcNAc:Lactosylceramide β1,3-N-acetylglucosaminyltransferase (β3Gn-T5), an essential enzyme for the expression of HNK-1 and Lewis X epitopes on glycolipids. Journal of Biological Chemistry (March 2001), pp. 22032-22040.
41 - J. van Reeuwijk, H. G. Brunner, H. van Bokhoven. Glyc-O-genetics of Walker-Warburg syndrome. Clinical Genetics, Vol. 67, No. 4 (April 2005), pp. 281-289.
42 - A. Varki, R. D. Cummings, J. D. Esko, H. H. Freeze, P. Stanley, C. R. Bertozzi, G. W. Hart, M. E. Etzler (2008). Essentials of Glycobiology (2nd ed.), Plainview (NY): Cold Spring Harbor Laboratory Press.
43 - D. Zhou, A. Dinter, R. Gutiérrez Gallego, J. P. Kamerling, J. F. Vliegenthart, E. G. Berger, T. Hennet. A β-1,3-N-acetylglucosaminyltransferase with poly-N-acetyllactosamine synthase activity is structurally related to β-1,3-galactosyltransferases. Proceedings of the National Academy of Sciences USA, Vol. 96, No. 2 (January 1999), pp. 406-411.