In recent years, the field of structural biology has seen many advances in technology for the production of recombinant proteins, mainly led by the high-throughput techniques of the structural genomics community. These technologies have largely focussed on expression in Escherichia coli, with around 88 % of the protein chains in the Protein Data Bank (PDB) in 2010 having been produced using this bacterium. However, many proteins require post-translational modification for correct folding and/or activity. By far the most common modification is glycosylation, and it has been estimated that over half of all human proteins, including many membrane proteins, are glycosylated [2-3]. Therefore, the production of correctly glycosylated proteins for functional and structural studies requires the use of eukaryotic systems.
Glycoproteins represent a unique challenge to the structural biologist owing to the size and heterogeneity of their oligosaccharide chains. Glycans can constitute from 1 % to over 80 % of the total protein mass, and the type and number of sugars attached at a glycosylation site, as well as the occupancy of that site, can all vary. The heterogeneity introduced by glycosylation can hinder crystallization of a glycoprotein, thus limiting the use of X-ray crystallography, one of the major techniques used in protein structure solution.
The following chapter reviews options for producing glycoproteins in different types of expression system, both prokaryotic and eukaryotic, along with methods for addressing glycan heterogeneity in order to produce homogeneous glycoprotein samples amenable to structural studies. The use of such homogeneous glycoprotein samples in both crystallization and nuclear magnetic resonance (NMR) experiments is discussed.
2. Glycosylation in mammalian cells
In eukaryotes, glycans are added to the polypeptide chain in the endoplasmic reticulum (ER) and the Golgi apparatus as the protein is secreted (see Figure 1). There are two types of glycosylation: linkage of an oligosaccharide to an asparagine residue (N-glycosylation) and linkage to a serine or threonine residue (O-glycosylation).
N-glycosylation occurs co-translationally in the ER at the consensus sequon Asn-X-Ser/Thr/Cys-X, where X is any amino acid except proline. The exact sequence of the sequon affects the occupancy of the glycosylation site. N-glycosylation is often essential for the expression and folding of a glycoprotein, and the initial glycan formed in the ER is further modified and decorated in the Golgi apparatus. This elaboration of the glycan core leads to a large number of possible structures, which are classified as high mannose, complex or hybrid (Figure 2).
O-glycosylation (see Figure 3) usually occurs in regions containing large numbers of sequential serine, threonine and proline residues, known as mucin domains. These regions show little secondary structure and are therefore often excluded from structural studies, as proteins containing such disorder are unlikely to crystallize. O-glycosylation occurring outside mucin regions is difficult to predict accurately and so is usually only detected after the protein has been produced. As O-glycosylation occurs in the Golgi, it has little bearing on the early stages of protein folding; sites found to be O-glycosylated can therefore be engineered out of the protein by site-directed mutagenesis of the acceptor Ser/Thr residues.
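The sequon rule above is simple enough to scan for computationally. The sketch below is a minimal illustration, not NetNGlyc or any published tool: it reports candidate Asn positions matching Asn-X-Ser/Thr/Cys-X with X not proline, and says nothing about whether a site is actually occupied. The function name `find_sequons` is hypothetical.

```python
def find_sequons(seq: str) -> list[int]:
    """Return 1-based positions of Asn residues in Asn-X-Ser/Thr/Cys-X sequons.

    Neither X position may be proline. Sequons that run off the C-terminus
    (no residue after the Ser/Thr/Cys) are not reported.
    """
    seq = seq.upper()
    sites = []
    for i in range(len(seq) - 3):
        if (
            seq[i] == "N"
            and seq[i + 1] != "P"
            and seq[i + 2] in "STC"
            and seq[i + 3] != "P"
        ):
            sites.append(i + 1)  # convert 0-based index to residue number
    return sites


# Hypothetical sequence: N2 is a sequon; N7 is not (Pro at X); N12 is not (Pro after Thr).
print(find_sequons("MNGTAVNPSLKNGTP"))  # prints [2]
```

A scan like this only lists candidates; as discussed in Section 4, whether each site is occupied has to be established experimentally.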
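Mucin-like stretches can be flagged crudely before expression by looking for windows rich in Ser, Thr and Pro. The sketch below is a hypothetical heuristic: the window length (20 residues) and the 60 % Ser/Thr/Pro threshold are arbitrary illustrative assumptions, not published parameters, and it does not predict individual O-glycosylation sites.

```python
def mucin_like_regions(seq: str, window: int = 20, threshold: float = 0.6) -> list[tuple[int, int]]:
    """Return 1-based (start, end) spans whose Ser+Thr+Pro content is >= threshold.

    window and threshold are illustrative defaults, not validated parameters.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        fraction = sum(aa in "STP" for aa in segment) / window
        if fraction >= threshold:
            hits.append((start + 1, start + window))
    # Merge overlapping or adjacent windows into contiguous regions.
    merged: list[tuple[int, int]] = []
    for s, e in hits:
        if merged and s <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


# Hypothetical construct: a 30-residue Ser/Thr/Pro repeat between two 30-residue Ala stretches.
construct = "A" * 30 + "STP" * 10 + "A" * 30
print(mucin_like_regions(construct))  # prints [(23, 68)]
```

The repeat itself spans residues 31-60. The reported boundaries extend up to window × (1 − threshold) residues (here 8) beyond it, so they should be treated as approximate construct-design guides.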
3. Expression systems
For the correct production and folding of glycoproteins, the expression host needs to post-translationally modify the protein chain by adding sugars at the glycosylation sites. Therefore, glycoproteins are most often produced in eukaryotic cells, although the aglycosylated protein can also be expressed as inclusion bodies in E. coli and refolded, or produced using cell-free systems. Recent advances in glycosylation pathway engineering have resulted in both E. coli and cell-free systems capable of introducing N-glycosylation onto a protein; these methods are discussed in the relevant sections below.
In addition to finding an appropriate system for over-expression of the target glycoprotein, the ability of this system to incorporate selenomethionine into the protein must be assessed. Selenomethionine labelling enables phasing of X-ray diffraction data by multi-wavelength anomalous diffraction (MAD). Proteins expressed in E. coli routinely incorporate 100 % selenomethionine, whereas incorporation is more variable for proteins produced in higher eukaryotic systems.
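Incorporation levels such as those quoted in this chapter are commonly estimated from intact-protein mass spectrometry. Each Met to SeMet substitution adds the mass difference between Se and S, about 46.9 Da using average atomic masses. The sketch below is minimal and assumes a homogeneous, deglycosylated sample whose unlabelled and labelled average masses have both been measured. The function name is hypothetical.

```python
# Standard atomic weights (rounded) of selenium and sulphur, in Da.
SE_AVG = 78.971
S_AVG = 32.06
SHIFT_PER_MET = SE_AVG - S_AVG  # ~46.91 Da added per Met replaced by SeMet


def semet_incorporation(mass_unlabelled: float, mass_labelled: float, n_met: int) -> float:
    """Percentage of Met positions occupied by SeMet, from average intact masses."""
    if n_met <= 0:
        raise ValueError("protein must contain at least one methionine")
    observed_shift = mass_labelled - mass_unlabelled
    return 100.0 * observed_shift / (n_met * SHIFT_PER_MET)


# Hypothetical example: 8 Met residues and an observed shift of 300 Da.
print(round(semet_incorporation(25000.0, 25300.0, 8), 1))  # prints 79.9
```

For partially labelled protein the spectrum shows a distribution of species. Using the intensity-weighted mean mass gives the average incorporation across the sample.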
3.1. Escherichia coli
E. coli is an attractive host because it is fast, simple to use, robust and cost-effective, and it therefore remains the dominant host for the production of recombinant proteins. For glycoprotein production, the protein is usually expressed as inclusion bodies, because bacteria lack the endoplasmic reticulum and Golgi apparatus needed to secrete and post-translationally modify glycoproteins. Because the protein is produced in inclusion bodies, refolding is needed to produce samples for structural biology, a process which can be time consuming and inefficient. However, this method has proved useful for a number of targets. For instance, Watson et al. used refolded protein produced in E. coli to obtain crystals of the extracellular domain of human UL16-binding protein ULBP1. Here the authors investigated various refolding strategies before arriving at the optimum protocol: slow dilution of guanidine-solubilised protein over 48 hours into a solution containing arginine, ethylenediaminetetraacetic acid, reduced and oxidized glutathione and phenylmethylsulphonyl fluoride. More recently, the crystal structures of the murine class I major histocompatibility complex H-2Kb (PDB entry 3ROL) and of human β-secretase I with bound inhibitors (PDB entries 3S7M and 3S7L), and the NMR structure of the sterile alpha motifs of EphA2 and SHIP2 (PDB entry 2KSO), have been solved using production in E. coli followed by refolding.
Another approach is to express the glycoprotein in the periplasm of E. coli using a bacterial signal sequence such as that of OmpA. For example, human α1-microglobulin was produced in the periplasm of E. coli and used to obtain a crystal structure at 2.3 Å resolution, which revealed a potential heme binding site (PDB entry 3QKG).
Recently, an engineered eukaryotic protein glycosylation pathway has been inserted into E. coli, resulting in cells capable of producing glycoproteins carrying Man3GlcNAc2 sugars. Four glycosyltransferases from Saccharomyces cerevisiae, the uridine diphosphate-N-acetylglucosamine transferases Alg13 and Alg14 and the mannosyltransferases Alg1 and Alg2, are used to generate the glycan, which is then transferred onto an N-glycosylation site by the oligosaccharyltransferase PglB from Campylobacter jejuni. Valderrama-Rincon et al. tested production of three eukaryotic glycoproteins (the Fc domain of human IgG1, bovine RNaseA and the placental variant of human growth hormone) and detected expression of their glycosylated forms. Although only ~1 % of the expressed protein was found to be glycosylated in these experiments, this technology has great potential for the cost-effective production of glycoproteins with a defined glycosylation pattern.
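A slow-dilution refolding step can be planned with simple mass-balance arithmetic. Each pulse of denatured protein adds guanidine to the refolding vessel, and the residual concentration must stay low enough to permit folding. The sketch below assumes, hypothetically, a 6 M guanidine stock added in equal pulses to a fixed buffer volume. The numbers are illustrative and are not those used by Watson et al.

```python
def residual_denaturant(stock_m: float, pulse_ml: float, buffer_ml: float, n_pulses: int) -> list[float]:
    """Guanidine concentration (M) in the refolding vessel after each pulse.

    Assumes instantaneous mixing and no removal of denaturant between pulses.
    """
    mmol = 0.0          # mL x M = mmol of guanidine in the vessel
    volume = buffer_ml  # current total volume in mL
    profile = []
    for _ in range(n_pulses):
        mmol += stock_m * pulse_ml
        volume += pulse_ml
        profile.append(mmol / volume)
    return profile


# Hypothetical plan: four 1 mL pulses of 6 M guanidine into 200 mL of refolding buffer.
for i, conc in enumerate(residual_denaturant(6.0, 1.0, 200.0, 4), start=1):
    print(f"after pulse {i}: {conc * 1000:.1f} mM")
```

Spacing the pulses over many hours, as in a 48 hour protocol, is intended to give each batch of protein time to fold before more denaturant and unfolded protein are added.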
3.2. Insect cells
Insect cells, such as those of Spodoptera frugiperda, Trichoplusia ni and Drosophila melanogaster, produce glycoproteins whose glycans are oligomannose and paucimannose in nature, mainly of the form α1-6 fucosylated Man3GlcNAc2 (Figure 4A). This compact, relatively homogeneous glycoform is compatible with protein crystallization, and a number of structures have been determined of glycoproteins produced in insect cells. For example, the crystal structure of FcγRIIa was solved to 1.5 Å resolution using baculovirus-infected S. frugiperda (Sf) 21 cells (PDB entry 3RY4), and its structure bound to human IgG1-Fc was determined at 3.8 Å (PDB entry 3RY6).
The glycoforms produced by insect cells can be trimmed by treatment with endoglycosidases: endoglycosidase (endo) H or endo F1 removes oligomannose sugars and endo D removes paucimannose sugars, in each case leaving one GlcNAc residue on the N-glycosylation site. In addition, endo F2, which cleaves oligomannose and biantennary complex sugars, can be used in combination with endo F3, which cleaves core-fucosylated bi- and triantennary complex glycans. Using this de-glycosylation strategy, Fan et al. reported the successful crystallization and structure determination of a complex of human follicle-stimulating hormone with its receptor (PDB entry 1XWD). Here crystals of the fully glycosylated complex diffracted to 9 Å, whereas 2.9 Å resolution was obtained after de-glycosylation with endo F2 and endo F3. Similarly, the structure of HIV-1 envelope glycoprotein gp120 was solved to 2.2 Å after de-glycosylation with endo H and endo D (PDB entry 1G9N). Although less common, tunicamycin has been used during production to block all N-glycosylation in order to promote crystallization. For example, the evasin-1 structure was solved to 1.63 Å in its non-glycosylated form and to 2.7 Å in its glycosylated form (PDB entries 3FPR and 3FPT respectively).
Selenomethionine labelling in insect cell systems can be difficult owing to the toxicity of selenomethionine and the long incubation times needed, as late baculovirus promoters such as polyhedrin or p10 are normally used. Incorporation levels for secreted glycoproteins are higher than for intracellular proteins, because unlabelled protein is removed during the media exchange process. For example, 85 % selenomethionine incorporation was achieved for envelope glycoprotein D from HSV-1 and 76 % for palmitoyl protein thioesterase 1 (PDB entries 1EI9 and 1EH5). In 2007, Cronin et al. addressed this variability in incorporation levels and developed a method which consistently gave 70-75 % selenomethionine incorporation with a yield of around 20 % of that of the unlabelled protein.
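When trimming is monitored by intact-mass spectrometry, the expected mass change depends on where each enzyme cuts. Endo H, D and F enzymes cut between the two core GlcNAc residues. PNGase F, discussed in Section 3.4, releases the whole glycan and converts the Asn to Asp. The sketch below is minimal and uses standard monoisotopic residue masses. It assumes that any deoxyhexose in a composition is core fucose on the innermost GlcNAc, as for the insect paucimannose glycans above. The function names are hypothetical.

```python
# Monoisotopic masses (Da) of monosaccharide residues as they occur in a glycan,
# i.e. free monosaccharide minus H2O.
RESIDUE_MASS = {"Hex": 162.05282, "HexNAc": 203.07937, "dHex": 146.05791}
ASN_TO_ASP = 0.98402  # mass gained when PNGase F converts Asn to Asp


def glycan_mass(composition: dict[str, int]) -> float:
    """Summed residue mass of an N-glycan attached to a protein."""
    return sum(RESIDUE_MASS[res] * n for res, n in composition.items())


def loss_after_endo(composition: dict[str, int], core_fucose: bool = False) -> float:
    """Protein mass decrease after an endo-beta-N-acetylglucosaminidase cut.

    One GlcNAc, plus any core fucose attached to it, stays on the Asn.
    """
    retained = RESIDUE_MASS["HexNAc"] + (RESIDUE_MASS["dHex"] if core_fucose else 0.0)
    return glycan_mass(composition) - retained


def loss_after_pngase_f(composition: dict[str, int]) -> float:
    """Protein mass decrease after PNGase F removes the whole glycan."""
    return glycan_mass(composition) - ASN_TO_ASP


man5 = {"Hex": 5, "HexNAc": 2}              # Man5GlcNAc2
pauci = {"Hex": 3, "HexNAc": 2, "dHex": 1}  # alpha1-6 fucosylated Man3GlcNAc2
print(f"{loss_after_endo(man5):.2f}")                     # endo H on Man5GlcNAc2
print(f"{loss_after_endo(pauci, core_fucose=True):.2f}")  # endo D on paucimannose
print(f"{loss_after_pngase_f(man5):.2f}")                 # PNGase F on Man5GlcNAc2
```

Per site, these calculations give losses of about 1013.34 Da for endo H on Man5GlcNAc2, 689.24 Da for endo D on core-fucosylated Man3GlcNAc2, and 1215.44 Da for PNGase F on Man5GlcNAc2. Multiply by the number of occupied sites for the whole protein.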
3.3. Yeast cells
Yeasts have been used for the production of human glycoproteins, the most popular expression hosts being Saccharomyces cerevisiae and Pichia pastoris [24-25]. Recombinant proteins are secreted and post-translationally glycosylated to give glycoforms that are sensitive to endo H or endo F1 treatment and are therefore suitable for crystallization (Figure 4B). For example, de-glycosylation of protein produced in P. pastoris was used to prepare the G-protein coupled receptor (GPCR) human β2 adrenergic receptor for crystallization. Similarly, gastric intrinsic factor with bound cobalamin (vitamin B12) (IF-Cbl) was produced in P. pastoris, complexed with the IF-Cbl-binding region of cubilin and treated with endo H prior to crystallization, giving a 3.3 Å resolution crystal structure of the complex (PDB entry 3KQ4).
There are examples in the literature where retaining the N-glycans on glycoproteins produced in yeast is important, such as the structures of human and mouse glutaminyl cyclases (QC), enzymes linked with Alzheimer's disease (PDB entry 3SI0). The structure of human QC had already been solved using protein produced in E. coli (PDB entry 2AFO). However, the glycosylated enzyme was shown to be more stable and to contain a disulphide bond not present in the non-glycosylated structure. The reduced stability of the aglycosylated human QC was associated with ready loss of the catalytic zinc ion.
Production of selenomethionine-labelled proteins is possible in yeast, with incorporation levels of around 50 % routinely reported for both P. pastoris and S. cerevisiae. However, higher incorporation has been achieved. For instance, ~98 % selenomethionine incorporation was reported for the carboxy-terminal fragment of the RNA-dependent RNA polymerase QDE-1 from Neurospora crassa produced in S. cerevisiae. This labelled protein gave crystals which diffracted to 3.2 Å and allowed phasing of the native data, giving a 2.3 Å resolution crystal structure (PDB entry 2J7N). More recently, Malkowski et al. described the generation of a selenomethionine-resistant strain of S. cerevisiae by blocking the S-adenosylmethionine synthesis pathway. This strain was used to produce tryptophanyl-tRNA synthetase with >95 % selenomethionine incorporation, which led to determination of the structure of this yeast enzyme. Use of the strain to produce and label heterologous proteins has not so far been reported.
3.4. Mammalian cells
Mammalian cells are a popular choice for the expression of glycoproteins, particularly human proteins, because they have the correct machinery to fold and post-translationally modify glycoproteins. The two main cell lines used are human embryonic kidney (HEK) 293 cells and Chinese hamster ovary (CHO) cells, both of which are readily transfected with polyethylenimine (PEI), calcium phosphate or commercial lipids, giving expression in 60-80 % of the cells [32-33]. There are two main variants of HEK 293 cells: (i) 293T, which expresses the SV40 large T antigen, and (ii) 293E, which expresses the Epstein-Barr virus (EBV) nuclear antigen 1 (EBNA1). Plasmids containing the SV40 or EBV origin of replication are amplified within these variants, giving more copies of the plasmid per cell and therefore higher levels of protein expression. An alternative method of introducing genes into mammalian cells is viral-mediated transduction, such as the BacMam system, which has been shown to give milligram quantities of protein for structural studies.
Mammalian cells can be grown in either adherent or suspension culture, and automation of the culturing processes is possible in both cases [37-38]. Structural studies have been performed on proteins produced by transient transfection, by stable cell lines and by stable pools. Transient transfection is attractive because of the short timeframe between transfection and purified protein. Yields of non-antibody glycoproteins of up to 40 mg/L have been obtained from adherent HEK cells and 36 mg/L from HEK cells in suspension, with yields of 27 mg/L from CHO cells in suspension.
Formation of a stable cell line producing the glycoprotein of interest often gives higher yields than transient transfection; however, establishing a stable cell line can take 2-6 months. Automation can give a three-fold increase in throughput, although the timeline remains the same. An advantage of this approach is that, once the stable cell line has been established, production of the glycoprotein is fast and robust.
The use of stable transfection pools for expression is becoming increasingly popular, as the protein yield is usually higher than for transient transfection but the timeline is shorter than for making a stable cell line. After transfection, the cells are sorted, often using a fluorescent marker, to enrich the pool for high producers. Using such a method, highly productive pools of cells can be obtained in 3 weeks, although yields may decline over time in culture because the transfected cells are not clonal. With this approach, monoclonal antibody expression levels of 100 mg/L to 1 g/L can be achieved in CHO cells within 2 months of transfection.
Selenomethionine labelling in mammalian cells is not as efficient as in E. coli; however, levels of up to 90 % have been reported for stable HEK cell lines and 78 % for transient glycoprotein production. Generally, cells are grown for about 12 hours in medium lacking methionine to deplete the intracellular methionine pools before selenomethionine is added. Keeping the selenomethionine concentration at or below 60 mg/L is important, because its toxicity otherwise lowers the yield of glycoprotein.
Unlike insect and fungal cells, mammalian cells produce glycoproteins with complex oligosaccharide chains (Figure 2), which are very heterogeneous and therefore not readily amenable to crystallization. Complete removal of N-linked glycans, which aids crystallization, can be achieved by treating the purified glycoprotein with peptide N-glycosidase (PNGase) F. In practice, two problems limit this approach: firstly, incomplete removal of glycans, leading to partial de-glycosylation of the product; and secondly, insolubility due to protein aggregation after all the sugars have been removed. Alternatively, N-glycosylation can be completely blocked in cells by treatment with tunicamycin. However, if glycosylation is required for proper folding and/or solubility in situ, this approach will compromise synthesis of the product. Two methods have been developed to get round these problems. Both manipulate N-glycosylation during glycoprotein biosynthesis by blocking the action of processing enzymes, using either chemical inhibitors or null mutant cell lines.
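The yields quoted above translate directly into culture volumes. The back-of-envelope sketch below assumes, hypothetically, a 10 mg target and a 50 % purification recovery. The quoted figures are best-case reports rather than typical values, and the antibody stable-pool figure is not directly comparable to non-antibody glycoproteins.

```python
# Best-case yields quoted in the text (mg of glycoprotein per litre of culture).
REPORTED_YIELDS_MG_PER_L = {
    "HEK, adherent, transient": 40.0,
    "HEK, suspension, transient": 36.0,
    "CHO, suspension, transient": 27.0,
    "CHO stable pool (antibody, low end)": 100.0,
}


def culture_volume_l(target_mg: float, yield_mg_per_l: float, recovery: float = 1.0) -> float:
    """Litres of culture needed to obtain target_mg of purified protein."""
    if not 0.0 < recovery <= 1.0:
        raise ValueError("recovery must be in (0, 1]")
    return target_mg / (yield_mg_per_l * recovery)


# Hypothetical target: 10 mg of purified protein at 50 % recovery.
for system, y in REPORTED_YIELDS_MG_PER_L.items():
    print(f"{system}: {culture_volume_l(10.0, y, recovery=0.5):.2f} L")
```

The recovery factor is kept as an explicit parameter because purification losses can easily outweigh the differences between the expression systems listed.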
3.4.1. Glycosylation inhibitors
Three inhibitors have been used in the production of glycoproteins to manipulate the glycosylation pathway: N-butyldeoxynojirimycin (NB-DNJ), swainsonine and kifunensine. NB-DNJ inhibits α-glucosidase, thus blocking the early stages of N-glycan processing and giving products which contain high mannose or hybrid type sugars (Figure 4A). NB-DNJ has mainly been used in combination with the mutant CHO cell line CHO Lec3.2.8.1 (Section 3.4.2). One example is the crystallization of the human costimulatory molecule B7-1, which is important in the human immune response; after treatment with endo H, this protein gave crystals diffracting to 2.7 Å. Swainsonine blocks α-mannosidase II, resulting in high mannose or hybrid type sugars (Figure 4D). Kifunensine strongly inhibits α-mannosidase I, resulting in sugars of the form Man9GlcNAc2 (Figure 4D). Treatment of cells with any of these inhibitors results in relatively simple and chemically uniform glycoforms. Further, the high mannose and hybrid glycans resulting from the use of these drugs are cleavable with endo H or endo F1, leaving one GlcNAc residue attached at each N-glycosylation site.
In practice, kifunensine is the most commonly used inhibitor in the production of glycoproteins for structural studies, as the resulting glycans are the most homogeneous. Glycoprotein structures can even be solved following kifunensine treatment without the use of an endoglycosidase (for example, PDB entry 2WAH); more commonly, however, the sugars are trimmed with endo H before crystallization studies. For example, Bishop et al. used transient transfection of HEK 293T cells in the presence of kifunensine to produce a number of glycoproteins involved in the human Hedgehog signalling pathway. These were treated with endo H before crystallization and co-crystallization. The resulting structures were of the Hedgehog-interacting protein (Hhip) ectodomain and Desert hedgehog in isolation, and of Hhip in complex with Desert hedgehog and Sonic hedgehog (PDB entries 2WFT, 2WFR, 2WFQ, 2WG3, 2WFX and 2WG4). Production in the presence of kifunensine followed by endo H treatment has also been used with a stable CHO cell line for crystallization of the extracellular region of cytotoxic T-lymphocyte antigen 4 (CTLA-4), giving crystals which diffracted to 1.8 Å.
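The logic of these interventions can be captured in a deliberately simplified, linear model of N-glycan processing. The precursor Glc3Man9GlcNAc2 is trimmed and extended step by step, and blocking one enzyme leaves the glycan at the last product made before that step. This sketch is an illustration under strong assumptions. The real pathway branches and blocks are often incomplete, so the step list and names are hypothetical simplifications. The same model also describes the GnTI-deficient cell lines covered in Section 3.4.2.

```python
# Simplified, strictly linear model of mammalian N-glycan processing (an assumption).
PRECURSOR = "Glc3Man9GlcNAc2"  # transferred en bloc to Asn in the ER
PATHWAY = [
    # (processing enzyme, glycan present after that enzyme has acted)
    ("alpha-glucosidase I/II", "Man9GlcNAc2"),
    ("alpha-mannosidase I", "Man5GlcNAc2"),
    ("GnTI", "GlcNAcMan5GlcNAc2"),               # hybrid
    ("alpha-mannosidase II", "GlcNAcMan3GlcNAc2"),
    ("later Golgi enzymes", "complex"),
]

# Enzyme blocked by each intervention discussed in the text (None = untreated).
BLOCKS = {
    "none": None,
    "NB-DNJ": "alpha-glucosidase I/II",
    "kifunensine": "alpha-mannosidase I",
    "swainsonine": "alpha-mannosidase II",
    "GnTI-deficient cells": "GnTI",
}

# Glycans lacking the mannose residues that endo H needs in order to cleave.
ENDO_H_RESISTANT = {"GlcNAcMan3GlcNAc2", "complex"}


def final_glycan(blocked_enzyme):
    """Glycan reached when processing stops at blocked_enzyme (None = no block)."""
    glycan = PRECURSOR
    for enzyme, product in PATHWAY:
        if enzyme == blocked_enzyme:
            break
        glycan = product
    return glycan


for treatment, enzyme in BLOCKS.items():
    glycan = final_glycan(enzyme)
    sensitive = glycan not in ENDO_H_RESISTANT
    print(f"{treatment:22s} -> {glycan:18s} endo H sensitive: {sensitive}")
```

Under this model, every block that acts before α-mannosidase II leaves an endo H-sensitive glycan, which is why each of these treatments can be combined with endo H trimming. Untreated cells, by contrast, end in endo H-resistant complex glycans.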
3.4.2. Mutant cell lines
CHO and HEK cell lines have been generated with mutations in their glycosylation pathways which disrupt the action of GlcNAc transferase I (GnTI) [50-51]. CHO Lec3.2.8.1 cells carry four mutations, in the Gne, Slc35a1, Slc35a2 and Mgat1 open reading frames, which reduce the activity of several components of the glycosylation pathway, including GnTI. CHO Lec3.2.8.1 cells produce proteins mainly carrying high mannose glycans, which can be trimmed to the last GlcNAc residue with endo H or endo F1 (Figure 4C). However, the mutations in CHO Lec3.2.8.1 cells do not completely inhibit the formation of complex glycans, and in some cases as little as 10 % of the glycoprotein produced is endoglycosidase sensitive. Structural studies have been carried out using CHO Lec3.2.8.1 cells both in the absence and in the presence of NB-DNJ, whose inhibitory effect complements that of the mutant cell line to give endoglycosidase-sensitive sugars (for examples, see [46, 52-54]).
The GnTI-deficient cell line HEK 293 GnTI- (also known as HEK 293S GnTI-) produces glycoproteins with high mannose glycans (Figure 4C), which are endo H and endo F1 sensitive. HEK 293 GnTI- cells give products with a very uniform glycosylation pattern of the form Man5GlcNAc2, with only traces of other glycans detected [51, 55]. This cell line has proved popular. It has recently facilitated structure solution of NetrinG1 and NetrinG2 in complex with their respective ligands at 3.25 Å and 2.6 Å resolution respectively (PDB entries 3ZYJ and 3ZYJ), and of the human glutamate receptor GluR2 amino terminal domain at 1.8 Å resolution (PDB entries 2WJW and 2WJX). It has also been used in a small angle X-ray scattering (SAXS) study of the orphan domain of the membrane glycoprotein endoglin.
Two other expression systems, although not currently in mainstream use for the production of glycoproteins, are worth mentioning: cell-based production in the protozoan Leishmania tarentolae and cell-free synthesis using coupled transcription-translation systems.
The structure of human Cu/Zn superoxide dismutase has recently been published and represents the first structure solved using L. tarentolae as the expression host (PDB entry 3KH3). The advantages of the L. tarentolae system (commercialized by Jena Bioscience as LEXSY) are that it is simple to use, gives high yields and is inexpensive compared with higher eukaryotic systems. L. tarentolae has eukaryotic folding and post-translational modification machinery and has been used in a proof-of-principle experiment to express glycosylated human erythropoietin. These data suggest that L. tarentolae is a suitable host for expressing glycoproteins for structural studies.
Cell-free synthesis systems have been used for structural studies for a number of years and are commonly based on lysates from E. coli, wheat germ or rabbit reticulocytes. Glycoproteins can be produced by supplementing lysates with microsomal fractions, or by using extracts of eukaryotic cells such as insect cells, hybridomas and mammalian cells, although the yields are often poor. These systems give glycosylation patterns native to the host. In the case of insect cells these can be modified using endoglycosidases, and in the case of mammalian cells glycan processing can be blocked using inhibitors (Section 3.4.1). Recently, both the E. coli lysate cell-free synthesis system and the PURE system for in vitro translation using purified components of E. coli have been adapted for production of glycoproteins. In this work, Guarino and DeLisa supplemented both systems with the protein glycosylation locus from Campylobacter jejuni and produced glycoprotein carrying the C. jejuni glycan GlcGalNAc5Bac, where Bac represents bacillosamine (Figure 5). Cell-free systems have yet to yield a glycosylated protein whose structure has been deposited in the PDB. However, the ability to micro-engineer the components of the glycosylation pathway in an "open system" is an interesting and useful addition to the structural biologist's toolbox.
4. Variable occupancy of glycosylation sites
In addition to variation in the type of oligosaccharide chain attached to a glycosylation site (the glycoform), heterogeneity also occurs in the occupancy of the site. The occupancy of an N-glycosylation site depends upon the recognition sequon and on the structure of the protein in the vicinity of the site. Some N-glycosylation sites have very low occupancy as a result of the local sequence composition, the secondary structure or proximity to the C-terminus [67-69]. Bioinformatics programs such as NetNGlyc and NetOGlyc (http://www.cbs.dtu.dk/services/NetNGlyc/ and http://www.cbs.dtu.dk/services/NetOGlyc/) [70-71] can predict whether N-glycosylation and O-glycosylation sites respectively are likely to be occupied. However, these are sequence-based predictions and so need to be verified experimentally. Once glycosylation site occupancy has been determined experimentally, variably occupied sites can be removed. Removal of variably occupied sites should not affect the activity of a glycoprotein: because these glycans are present on only a proportion of the sample, they are unlikely to be essential for folding and solubility.
4.1. Identification methods
Analysis of glycosylation site occupancy is usually carried out by mass spectrometry. For a review of recent developments in glycoprotein analysis and glycomics, see Zaia. In the most straightforward experiment, the occupancy of an N-glycosylation site is determined by a combination of tryptic digestion and PNGase F treatment, followed by analysis of the resulting peptide. The peptide contains asparagine at the site if no glycan was present, and aspartic acid if sugars were attached, because PNGase F converts the asparagine to aspartic acid as it releases the glycan. As glycopeptides are not readily ionized in a mass spectrometer, they can be enriched using hydrophilic interaction liquid chromatography (HILIC), increasing the likelihood that a peptide containing a glycosylation site is detected [74-75]. Mapping of glycoproteins, including analysis of both the glycosylation sites and the glycans themselves, has been miniaturized and automated. Examples include methods using functionalized magnetic nanoparticles or integrated chromatography steps, which allow rapid acquisition of results.
Determination of O-glycosylation site occupancy is more difficult than for N-glycosylation sites, as there is no endoglycosidase which universally releases O-linked glycans in the way that PNGase F does for N-linked glycans. However, Halfinger et al. have developed a robust protocol using partial exoglycosidase digestion and β-elimination with a mild alkylamine base, followed by Michael addition and analysis by collision-induced dissociation (CID) MS/MS.
Variable occupancy of glycosylation sites has also been detected by mutation studies, in which the potential sites are mutated and the resulting proteins analysed for activity. For example, Garman et al. analysed human IgE-FcεRIα and identified three glycosylation sites, N74, N135 and T142, which were not essential for protein folding, secretion and activity. These sites were later confirmed by mass spectrometric analysis to be variably occupied. However, mapping glycosylation occupancy by systematic mutation is labour intensive and is likely to generate many mutants that do not express or fold correctly because essential glycosylation sites have been removed.
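The Asn to Asp readout above can be sketched computationally: digest the sequence in silico, compute the two expected peptide masses, and estimate occupancy from their relative signal. This is a minimal sketch assuming monoisotopic residue masses, trypsin cleavage after Lys/Arg except before Pro with no missed cleavages, a single Asn per peptide, and equal ionization of both forms. The last of these is a strong assumption, and the function names are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def tryptic_digest(seq):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])
    return peptides


def mh_plus(peptide):
    """Monoisotopic [M+H]+ of an unmodified peptide."""
    return sum(RESIDUE[aa] for aa in peptide) + WATER + PROTON


def occupancy_from_intensities(asn_intensity, asp_intensity):
    """Fraction of the site glycosylated, assuming equal ionization of both forms."""
    return asp_intensity / (asn_intensity + asp_intensity)


# Hypothetical protein fragment containing an N-X-T sequon.
for pep in tryptic_digest("MKVNGTAYRPLEK"):
    if "N" in pep:
        asn = mh_plus(pep)
        asp = mh_plus(pep.replace("N", "D", 1))  # only the first Asn is converted
        print(pep, f"Asn form {asn:.4f}", f"Asp form {asp:.4f}", f"shift {asp - asn:.4f}")
```

A shift of under 1 Da is easily confused with spontaneous deamidation, which also converts Asn to Asp. Performing the PNGase F step in H2(18)O, which makes the shift about 2.99 Da, is a common way to tell the two apart.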
4.2. Removal of glycosylation sites
Glycosylation sites found to be non-essential are often removed by site-directed mutagenesis of the asparagine codon to a glutamine codon, as glutamine is similar in size and charge but is not an attachment site for a glycan chain.
Formation of an N174Q mutant of human ephrinA2 was essential for obtaining crystals of the EphA4:ephrinA2 complex, whose structure was solved to 2.3 Å resolution (PDB entry 2WO3). In contrast, previous crystallization attempts using wild-type ephrinA2 with EphA4 were unsuccessful. Although N174 was shown to be invariably glycosylated, this site is not conserved across the ephrin family (Nettleship, Bowden, unpublished results) and was therefore presumed to be non-essential.
In the case of human alpha-N-acetylgalactosaminidase, which has five N-glycosylation sites, mutation of N201 to glutamine gave crystals diffracting to 1.9 Å resolution (PDB entry 3H53), compared with 8 Å resolution data collected from crystals of the wild-type glycoprotein.
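At the DNA level, the Asn to Gln change means replacing an AAT or AAC codon with CAA or CAG. Every such swap needs exactly two base changes in the mutagenic primer. The sketch below is minimal. Using CAG as the default replacement codon is an assumption, since codon preference depends on the host. It also assumes the coding sequence starts at residue 1 of the numbering scheme in use. The function names are hypothetical.

```python
ASN_CODONS = {"AAT", "AAC"}
GLN_CODONS = {"CAA", "CAG"}


def asn_to_gln(cds: str, residue: int, gln_codon: str = "CAG") -> str:
    """Return cds with the Asn codon at `residue` (1-based) replaced by a Gln codon.

    Assumes cds starts with the codon for residue 1 of the numbering used.
    """
    cds = cds.upper()
    if gln_codon not in GLN_CODONS:
        raise ValueError(f"{gln_codon} does not encode glutamine")
    start = 3 * (residue - 1)
    codon = cds[start:start + 3]
    if codon not in ASN_CODONS:
        raise ValueError(f"codon {residue} is {codon!r}, not an Asn codon")
    return cds[:start] + gln_codon + cds[start + 3:]


def mismatches(a: str, b: str) -> int:
    """Number of differing bases between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


wild_type = "ATGAACGGTACC"  # hypothetical: Met-Asn-Gly-Thr
mutant = asn_to_gln(wild_type, 2)
print(mutant, mismatches(wild_type, mutant))  # prints ATGCAGGGTACC 2
```

Checking that the target codon really encodes Asn before editing guards against off-by-one errors between mature-protein and full-length numbering, a common source of mistakes when designing such mutants.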
5. Structural studies
5.1. Crystallization and X-ray crystallography
Glycoproteins crystallize in a variety of conditions, and trials are usually set up using the same screens as for non-glycosylated proteins, such as those in kits sold by Hampton Research, Molecular Dimensions and Emerald Biosystems. A number of reviews have addressed the problems associated with crystallization of glycoproteins, such as the increase in surface entropy associated with large post-translational modifications and the microheterogeneity of glycans [82-83]. However, the presence of glycans can be an advantage for crystallization, as they can form essential intermolecular contacts in crystal lattices. Indeed, glycoprotein crystallization has a success rate of around 50 %, which is comparable to that for non-glycosylated proteins. Glycoprotein crystallization methods now include automation and miniaturization, with microscale techniques requiring as little as 65 μg of protein sample.
The strategy unique to the crystallization of glycoproteins is manipulation of the glycoform, as described above. Such manipulations affect both the propensity of a glycoprotein to crystallize and the quality of the resulting crystals. As shown in Figure 6, the type of oligosaccharide attached to the protein affects crystal morphology. Interestingly, in the case of human IgE-FcεRIα, the Man5GlcNAc2 glycoform gave better diffracting crystals than the shorter Man1GlcNAc2 glycoform (Figure 6E and 6D respectively); setting up crystallization trials with more than one glycoform can therefore be advantageous.
X-ray crystal structures of glycoproteins often show only the initial GlcNAc residue, even when more sugar residues are known to be attached, because the glycan chains are flexible and electron density for further sugar residues is therefore weak or absent. In rare cases, more of the oligosaccharide chain is resolved because it either forms crystal contacts or interacts with the polypeptide chain. For example, Figure 7 shows that the oligosaccharide chain attached to N297 of human IgG1-Fc could be fully resolved (PDB entry 2WAH).
In solving a glycoprotein structure, it is important that the carbohydrate chains are built into the relevant electron density taking into account the restrictions on sugar conformation, including torsion angles, and the linkages found in nature [87-88]. Recently, Crispin et al. have described the biosynthesis of glycans, with emphasis on how these pathways lead to the glycosidic linkages found in glycoprotein structures. If the production host gives sugars of a defined composition (for example, HEK 293 GnTI- cells give sugars of the form Man5GlcNAc2), modelling the glycans is relatively straightforward. For other cases, there are a number of databases of known glycan structures. GlycomeDB (www.glycome-db.org) is the most comprehensive of these, a unified carbohydrate database containing information from all public carbohydrate databases and the PDB [89-90]. Such databases are a useful resource for checking the oligosaccharides and linkages likely to be attached to a glycoprotein produced in a given host cell. For building a three-dimensional model of an oligosaccharide chain, bioinformatics resources such as Sweet-II (www.glycosciences.de/spec/sweet2) and the Glycam Biomolecule Builder (glycam.ccrc.uga.edu/CCRC/biombuilder/biomb_index.jsp) are available. After modelling the glycan, the Glycam Biomolecule Builder attaches the sugar chain to the protein structure at the glycosylation site, whereas glycans generated by Sweet-II are attached using the GlyProt software (www.glycosciences.de/modeling/glyprot/php/main.php). On deposition of the glycoprotein structure in the PDB, pdb-care (PDB carbohydrate residue check) can help to identify problems within glycan structures [94-95]. Overall, these web-based tools allow sugar chains to be built correctly into glycoprotein structures where electron density is available.
5.2. NMR spectroscopy
Nuclear magnetic resonance (NMR) spectroscopy can be used to study protein structure, but the technique faces some obstacles when applied to glycoproteins. These are the overlap of 1H chemical shifts between carbohydrate and protein signals, and the difficulty of obtaining sufficient yields of labelled protein from eukaryotic systems. As a result, the majority of glycoprotein structures solved by NMR are of aglycosylated protein chains expressed in E. coli.
Labelling with 15N or with both 15N and 13C has been demonstrated in mammalian cells, insect cells and yeast. Such eukaryotic expression systems have been used to solve a few structures of recombinant glycoproteins with their glycans attached, such as fragments of human fibronectin (PDB entry 1E8B), human thrombomodulin (PDB entry 1DQB) and the extracellular domain of human cytotoxic T lymphocyte-associated protein (CTLA)-4 (PDB entry 1AH1) [100-102].
The problem of signal overlap between proteins and carbohydrates has been tackled by Slynko et al. They used in vitro N-glycosylation technology to add unlabelled glycan chains to 15N- and 13C-labelled AcrA from C. jejuni produced in E. coli (PDB entry 2K32). This allowed a NOESY (nuclear Overhauser effect spectroscopy) experiment to be performed without confusion between protein and glycan resonances, a confusion which often complicates structure determination of unlabelled or 15N-labelled glycoproteins.
In solution NMR, the protein chain and attached glycans are free to move. Unlike crystal structures, solution NMR structures therefore reveal the flexibility of both the oligosaccharide chain and the protein chain around a glycosylation site. In the case of AcrA (PDB entry 2K32), flexibility was observed in the α-helical loop region containing the N42 glycosylation site (Figure 8A), with the heptasaccharide glycan adopting a well-defined rod-like structure. Human CTLA4 (PDB entry 1AH1) contains two N-glycosylation sites, N78 and N111, which are occupied by partially deglycosylated glycans of the form Man1GlcNAc2 (Figure 8B). The glycan attached to N78 interacts extensively with the side chains of a nearby β-sheet and is therefore ordered, whereas the glycan at N111 has limited interaction with the protein chain, so only its initial GlcNAc residue is well defined.
6. Conclusions and future prospects
This chapter has reviewed some of the recent developments in the field of glycoprotein structural biology. Increasingly, the expression system of choice for producing recombinant glycoproteins is the mammalian cell, in particular HEK cells in the presence of kifunensine, which modifies glycan processing. This enables treatment of the purified glycoprotein with an endoglycosidase to reduce the sugar "load" to a single GlcNAc at each N-linked attachment site. Experience shows that preparing glycoproteins with different glycoforms significantly increases the chances of obtaining diffraction-quality crystals. Combining this approach with novel additives in the crystallization experiment adds another dimension to crystallization. Examples of such additives are "smart materials" such as the molecularly imprinted polymers (MIPs) recently reported by Saridakis et al.
Future developments in simpler, low-cost expression technology, as exemplified by the Leishmania tarentolae system, will also have an impact on the structural biology of glycoproteins. It is anticipated that the number of structures of this important class of proteins solved by X-ray crystallography and NMR will increase rapidly in the next few years.
The author would like to thank Ray Owens and Max Crispin for critical reading of the manuscript. The OPPF-UK is funded by the UK Medical Research Council and Biotechnology and Biological Sciences Research Council.