Application of Quality by Design Paradigm to the Manufacture of Protein Therapeutics

© 2012 Kontoravdi et al., licensee InTech. This is an open access chapter distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Introduction
N-linked glycosylation is of paramount importance for the biopharmaceutical sector given that approximately 40% of all approved therapeutic proteins and eight of the top ten selling biologics of 2010 contain N-linked oligosaccharides [1,2]. With specific cell productivities in biotechnological processes estimated to reach their theoretical maximum in the near future, and despite anticipated further increases in culture density, the scope for improvement of biotherapeutics is limited in terms of production volume [3]. As a result, there is an increased focus on biotherapeutic efficacy in the development and production stages. Modification of the glycoform is a target for drug design as it can enhance efficacy, mode of action and half-life significantly [4]. Simultaneously, N-linked glycosylation plays a key role in the safety of biotherapeutics. Certain N-linked oligosaccharides bound to therapeutic proteins have been found to trigger undesired effects in patients [5][6][7], which makes them a safety concern during process development [8]. These elements make N-linked glycosylation a key target for quality control from both the therapeutic efficacy and safety standpoints. A well-defined product may have consistent protein backbones but still a glycoform distribution of more than a hundred detectable isoforms [9]. Narrowing and targeting the glycan profile is expected to improve efficacy and safety significantly. An example of the variety of reported glycoforms found on the crystallisable fragment (Fc) of monoclonal antibodies (mAbs) is presented in panel D of Figure 1. It is worth noting that single monosaccharides (e.g. core fucose) on the complex glycans present on the Fc of mAbs may alter the therapeutic efficacy of these products.

Figure 1. Classification of N-linked oligosaccharides. Panels A, B and C present a high-mannose, a hybrid and a complex tetra-antennary oligosaccharide, respectively. Highlighted by a box is the Man3GlcNAc2 core structure present in all N-linked glycans [10]. Panel D shows the most common glycans observed on the Fc of mAbs [11][12][13].

Interferon-β
In 2001, when the human genome was fully sequenced, it was found that it contains fewer genes than expected. However, this might be compensated by the diversity found at the level of protein architecture, structure, transcription patterns, and post-translational modifications, of which glycosylation is the most complex [14]. Glycan moieties have a very diverse range of roles in biological systems, making them relevant for biotherapeutics. Starting with the co-translational attachment of glycan structures, it was suggested by Petrescu et al. that glycan structures act as nucleation sites for remote aromatic parts in an amino acid sequence in order to begin the folding process [15]. Although glycans play an important role in local as well as global folding of proteins, it has to be pointed out that they are often not essential to maintain structure [16][17][18], as is the case for Interferon-β (IFN-β), for example. This naturally occurring glycoprotein is produced in the human body in response to viral infections or other biological threats [19]. Human IFN-β contains a conserved Asn-80 glycan attachment site and, as a biotherapeutic, is marketed for the treatment of multiple sclerosis.

formation of the membrane attack complex, which disrupts the target cell's membrane and eventually leads it to lysis.
• Antibody-dependent cell-mediated cytotoxicity (ADCC): In this mechanism the antibody Fab binds to a target cell and, through the Fc, binds to Fcγ receptors on the surface of natural killer cells (FcγRIII) and neutrophils (FcγRII-A). Through this antibody-mediated interaction, the effector cells release cytokines and cytotoxic granules, which attack the target cell and drive it towards apoptosis.
The biological in vivo mechanisms that are triggered by antibodies are highly dependent on the glycan moiety on the Cγ2 domains. It has been shown that the Asn297 glycans directly impact the three-dimensional structure of the Fc [30], thus greatly influencing the affinity FcγRs have for the IgG Fc [31,32]. The close proximity of both Cγ2 domains in the Fc generates considerable steric hindrance, which limits the oligosaccharide structures found there to the complex bi-antennary structures shown in panel D of Figure 1. Depending on the host cell line, the potential carbohydrate residues will have specific features such as bisecting GlcNAc residues or core fucosylation. Studies into the effect of bisecting structures present in antibody glycan moieties have shown that bisecting GlcNAc sugar residues lead to increased ADCC [33]. However, the presence (or absence) of other monosaccharides has been shown to have a more profound effect on in vivo modes of action. Previous reports show that absence of core fucose can increase ADCC up to 50-fold [34,35]. This finding has been translated into the development of third generation mAbs, which are currently undergoing many promising clinical trials [36]. On the other hand, galactose terminating structures in the Fc fragment of antibodies, (the most abundant terminating saccharides on the Fc glycans of human polyclonal antibodies [37]) increase the affinity of the IgG Fc for the C1q protein substantially. Their removal results in decreased complement activation [31]. Sialic acid terminating glycan moieties can also strongly modulate the efficacy of mAbs. Interestingly, only a small proportion of human IgG is naturally sialylated [38], which can be directly related to the spatial constraints in the Cγ2 domain of IgG antibodies [39]. 
Sialic acid can be found in either an α2,3- or an α2,6-linkage to the galactose moieties of IgGs [40], and although not very abundant, these residues are crucial for immune response modulation (anti-inflammatory activity) in vivo, where the α2,6-linkage is preferred over the α2,3-linkage [41]. In fact, anti-inflammatory activity has been abrogated after removal of IgGs containing α2,6-linked sialic acid on their Fc glycan moiety. Similarly, a 10-fold enrichment in sialic acid containing IgGs induced an anti-inflammatory response at 10-fold reduced doses. In contrast, IgG with α2,3-linked sialic acid terminating glycan structures failed to induce an anti-inflammatory response even at 4-fold higher doses, thus demonstrating the specificity of in vivo bioactivity not only to structural but also conformational differences in glycan moieties. In addition to modulating the in vivo mechanisms, glycan moieties can also have an effect on the pharmacokinetics of biotherapeutic antibodies. It has been reported that high mannose structures (oligosaccharides with five mannose residues or more) increase plasma clearance and thus decrease in vivo half-life, with a significant negative impact on drug efficacy [42].

Fab region glycans
It is well established that the glycan moieties in the Fc region of therapeutic antibodies have considerable effects on in vivo mechanism and pharmacokinetics. However, the Fab region can also be used to enhance characteristics of therapeutic antibodies. It has been estimated that between 20-30% of antibodies also carry a glycan structure in the Fab region, where a potential glycosylation site can be found on both the heavy as well as the light chain [43,44]. Although the role of variable region carbohydrates is not clear, it is suggested that glycan moieties may influence antigen affinity, specificity, antibody solubility and stability hence, limiting aggregation [45,46]. A further biological role of Fab region glycans is believed to be in vivo half-life modulation. This has been tested by injecting mice with humanized IgG with a variety of glycan moieties at the Asn56 position of the heavy chain variable region [47]. The results indicated that while sialic acid and galactose terminating glycan moieties had very limited effect on clearance, exposed GlcNAc residues showed slightly faster clearance rates. It was proposed that the latter are recognized by Man/GlcNAc receptors and binding interactions may be sufficiently strong to allow for greater clearance. It was later determined that half-life was dependent on the tissue where the antibodies accumulated based on their Fab region glycans, thus providing further scope for improvement in targeting of mAb biotherapeutics. On the other hand, Fab glycosylation has also been reported to produce negative effects in patients. A study has shown that α-1,3 galactose residues on the Fab glycans of the commercial antibody Cetuximab generated anti-α-1,3 galactose IgE-induced anaphylaxis in patients treated with this drug [5]. This effect highlights the relevance of Fab glycans for mAb safety. 
It has to be noted at this point that the glycoform on the Fab region can have a much more complex structure than the equivalent Fc glycoform including increased sialylation and occurrence of tri-antennary structures [43]. Since the glycan tripeptide sequences will vary within a population of antibodies and its location within the sequence can change it may not come as a surprise that different glycoforms will arise at different positions due to changing accessibility [48]. It was shown that shifting the tripeptide sequence in the variable region of a mAb may change the glycan moiety from a high Man structure to a complex structure. This would suggest that while the glycan precursor is added co-translationally to the protein backbone, the local conformation around the glycan may not allow for further enzymatic processing of the attached high Man structure in the Golgi apparatus. Furthermore changing the position of glycan structure within the polypeptide chain can impact antigen affinity significantly by either contributing to it or by blocking it altogether. Thus, glycan engineering of antibody variable regions represents a strategy towards improving antigen affinity, antibody targeting as well as extending half-life.

Erythropoietin
A further glycoprotein of biopharmaceutical importance is Erythropoietin (EPO), a hormone that binds to the receptors of red blood cell precursors in the bone marrow leading to their survival, proliferation and differentiation, thus increasing the red blood cell count [49][50][51]. EPO is used in the treatment of anaemia associated with a number of disease states such as chronic renal failure, cancer and HIV infection [52][53][54][55]. EPO has a total of four glycan attachment sites: three N-linked sites at amino acid positions 24, 38 and 83, and one O-linked site at amino acid position 126 [56,57]. The four glycan moieties have been estimated to contribute approximately 40% to the total molecular mass of EPO and probably cover much of the surface of the molecule [58]. The hypothesis of full surface exposure is further supported by an analysis of the glycan moieties, which revealed predominantly fucosylated and tetra-antennary complex glycans at the N-linked glycosylation sites [59] and, as such, suggests surface exposure during enzymatic processing. The glycan structures have been shown to play an important role in biological activity. For example, it is known that higher glycan antennarity leads to an increase in EPO's in vivo activity [60]. Interestingly, it has been demonstrated that all three N-linked glycan structures are necessary for biological activity in vivo, while the O-linked glycan structure does not appear to be required for in vivo activity [61]. For recombinant human EPO, it has been shown that the N-linked glycan moieties are required for product secretion by the cell as well as for ensuring solubility [61,62].

Desired characteristics dependent on application
Erythropoietin is also a very good example to demonstrate how desired safety and efficacy characteristics can be enhanced through glycan engineering of existing biotherapeutics. The plasma clearance of EPO is regulated by sialic acid containing carbohydrates [58,63], where an increase in sialic acid moieties is known to increase glycoprotein half-life, which would also explain why higher antennarity is linked to increased plasma half-life. An investigation into EPO glycan structures and biological activity showed that desialylated structures had increased in vitro activity but a much reduced in vivo activity, which is a consequence of increased EPO clearance by asialoglycoprotein receptors in the liver [64,65]. In order to increase protein half-life, increase drug efficacy and thus decrease dosing rates in patients, a glyco-engineered darbepoetin was created [58]. Darbepoetin features two additional N-linked glycosylation sites that were introduced by changing five amino acid residues through site-directed mutagenesis. The resulting biopharmaceutical, commercially marketed as Aranesp®, showed a three-fold lower plasma clearance rate and increased in vivo potency compared with epoetin, which has only three N-linked glycan sites [66].
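The link between clearance and half-life discussed above can be made concrete with a small numerical sketch. It assumes a simple one-compartment pharmacokinetic model, in which half-life equals ln(2) times the volume of distribution divided by clearance, and it assumes that the volume of distribution is unchanged by glyco-engineering. Neither assumption is taken from the cited studies, the numerical values are hypothetical, and the real pharmacokinetics of EPO and darbepoetin are more complex.

```python
import math

def half_life(volume_of_distribution, clearance):
    """Elimination half-life for an assumed one-compartment model:
    t1/2 = ln(2) * Vd / CL."""
    return math.log(2) * volume_of_distribution / clearance

# Hypothetical values in arbitrary units, chosen only to show the scaling.
vd = 5.0
cl_reference = 0.6
cl_reduced = cl_reference / 3.0  # three-fold lower plasma clearance

# Ratio of half-lives when only the clearance changes.
fold_change = half_life(vd, cl_reduced) / half_life(vd, cl_reference)
```

Under these assumptions, a three-fold reduction in clearance corresponds to a three-fold longer half-life, independent of the particular values chosen for the volume of distribution and the reference clearance.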

Current methodology and future application of quality by design initiative
Drug development begins with the discovery of molecules that have shown the biochemical potential to treat illnesses. Based on manufacturability and potential profitability, drug candidates are then selected for optimization and, eventually, for preclinical and clinical trials. To ensure that sufficient material is available for the different phases, manufacturing is concurrently scaled up during this stage in process development. Also throughout these stages, all the data regarding drug safety, efficacy and manufacture is reviewed for approval by the corresponding regulatory agencies. Approval requires that the drug product is produced consistently and that it is safe and efficacious for its indication. Despite decades of advances in drug product manufacturing, pharmaceutical process development and approval is still extremely lengthy, highly expensive and uncertain [67].
In order to reduce their losses, manufacturers have traditionally relied on the so-called quality by testing (QbT) approach for drug development and approval [68]. In QbT, product quality attributes (the ranges for drug substance properties that yield acceptable safety and efficacy) are linked with a specific manufacturing process and its corresponding set of inputs (raw materials and process parameters) during clinical phases of development. The process inputs that are empirically shown to yield acceptable product quality are defined and are often maintained unchanged after phase II clinical trials so that manufacturers avoid costs associated with additional testing for regulatory compliance [69]. During manufacturing, the process inputs are controlled to remain at their pre-defined set points, and at the end of each batch, the product is tested for compliance with the desired quality [68][69][70][71]. This black box approach does not require mechanistic knowledge that relates process inputs with product quality, and because of this, the QbT approach uncouples product end quality from the manufacturing process. The main drawbacks of QbT are [10,69,72]:
• Development and approval of pharmaceutical processes under QbT has proven to be extremely time-consuming (between 7 and 10 years [73,74]) and very expensive (US$1.2 to 1.8 billion per approved drug when the risk of failure is included [67,75]). This could be attributed, in part, to limited understanding of the relationship between product quality characteristics, therapeutic mechanisms and the effect process inputs have on quality.
• Process control is not established by mechanistic links between inputs and desired product quality. Due to this, the process is susceptible to generating off-spec product, and when this occurs, identifying the source(s) of failure is difficult.
• The range of process conditions approved under QbT is narrow and changes to process conditions outside this range require additional approval, which eventually translates into further delays and expense. This discourages pharmaceutical companies from modifying or optimizing current processes and implementing innovative ones.
• Overall, processes developed and approved under QbT generate a limited amount of knowledge because mechanistic relations between inputs and outputs are not apparent. This greatly restricts transferability of the knowledge gained from one process/product to the next. Moreover, the lack of generated mechanistic information increases the likelihood of suboptimal process performance.
Therapeutic proteins currently undergo the same QbT development and approval process as their small molecule counterparts and suffer the same caveats. In the QbT context, bioprocess inputs are empirically defined so that the quality properties (e.g. aggregation, folding, methylation and glycosylation) of the protein lie within ranges that yield acceptable safety and therapeutic efficacy. As occurs with small molecules, QbT disjoins therapeutic protein quality from the bioprocess. This has led to inadequate understanding of the relationship between process inputs and product quality and has greatly limited the potential for bioprocess optimization.
The inefficiencies associated with the QbT approach, along with more stringent regulatory requirements, have led manufacturers to invest more in drug discovery than in process understanding and optimization, which, in turn, has translated into a decrease in the cost-effectiveness, number and quality of innovative drugs and pharmaceutical manufacturing processes [67,69]. In order to overcome these limitations, regulators and industry specialists have proposed the implementation of Quality by Design (QbD) concepts to the manufacture of all new drugs, including therapeutic proteins, in the development pipeline [68,72,76].

Quality by design initiative in pharmaceuticals and how it is expected to affect biopharmaceuticals in the future
Pharmaceutical QbD is a conceptual framework for the development and approval of pharmaceutical manufacturing processes that aims to build quality (particularly with respect to safety and therapeutic efficacy) into the product at every stage of process development [10,71,72]. Application of QbD principles to pharmaceutical process development is outlined in the Process Analytical Technology (PAT) guideline "PAT - A Framework for Innovative Pharmaceutical Manufacturing and Quality Assurance" [77] by the US Food and Drug Administration (USFDA) and in the guidance documents "ICH Q8 Pharmaceutical Development" [71], "ICH Q9 Quality Risk Management" [78] and "ICH Q10 Pharmaceutical Quality Systems" [12] from the International Conference on Harmonization (ICH), which is an association constituted by the USFDA, the European Medicines Agency (EMA), the Pharmaceuticals and Medical Devices Agency of Japan (PMDA) and several experts from the pharmaceutical industry.
Implementation of QbD to pharmaceutical manufacturing processes is an information-driven process where all available knowledge on the drug product including, but not limited to, its therapeutic mechanisms, its process of manufacture and potential sources of variability is used to define a range of manufacturing conditions that will ultimately ensure product safety and efficacy when administered to patients. More specifically, the ICH guidelines [12,71,78] define the following requirements for the implementation of QbD to pharmaceutical processes:

Definition of the quality target product profile (QTPP)
The QTPP is defined as the set of quality characteristics that would ideally be achieved to ensure that the drug product is safe and efficacious. Considerations to define the QTPP include the route of administration, dosage form, delivery systems, dosage strength, sterility, purity and stability [71].

Identification of the critical quality attributes (CQAs) of the drug product
A CQA is defined in ICH Q8 as "a physical, chemical, biological, or microbiological property or characteristic that should be within an appropriate limit, range, or distribution to ensure the desired product quality" [71]. As the definition implies, identification of CQAs requires thorough physicochemical and biological characterization of the drug product and in-depth knowledge on which of its properties have a higher influence on its safety and efficacy.

Identification of process inputs that affect product CQAs
Once CQAs are identified, it is necessary to determine not only which process inputs (raw materials and process conditions) impact CQAs, but also how these inputs interact to affect the drug product's CQAs. Much of the required knowledge may not be available for certain processes (particularly novel ones) and should be established through the combination of prior knowledge, mechanistic modelling, experimentation and, finally, a risk assessment so that the influence of material attributes and process conditions on CQAs is ranked according to likelihood and extent of impact. Under QbD, there is special emphasis on design of experiments (DoE) so that the interaction between individual process inputs and their impact on product CQAs is represented. Crucially, all elements involved in this section directly couple manufacturing process conditions with the CQAs of the drug product in a robust, systematic and information-driven manner.
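As an illustration of the ranking and DoE steps described above, the following Python sketch scores process inputs by likelihood and extent of impact and enumerates a two-level full factorial design. The input names, the 1 to 5 scoring scale, the multiplicative risk score and the factor levels are all hypothetical choices made for illustration; they are not prescribed by the ICH guidelines and do not describe any real process.

```python
from itertools import product

def rank_inputs(scores):
    """Rank process inputs by an assumed risk score = likelihood x impact.

    `scores` maps an input name to a (likelihood, impact) pair,
    each on a hypothetical 1-5 scale.
    """
    ranked = sorted(scores.items(),
                    key=lambda item: item[1][0] * item[1][1],
                    reverse=True)
    return [(name, likelihood * impact)
            for name, (likelihood, impact) in ranked]

def full_factorial(levels):
    """Enumerate every combination of factor levels (full factorial design)."""
    names = list(levels)
    return [dict(zip(names, combo))
            for combo in product(*(levels[name] for name in names))]

# Hypothetical risk scores for three process inputs.
scores = {
    "culture pH": (4, 5),
    "temperature": (3, 4),
    "feed glucose": (2, 3),
}
ranking = rank_inputs(scores)

# Two-level design for the two highest-ranked inputs (hypothetical levels).
levels = {"culture pH": [6.8, 7.2], "temperature": [35.0, 37.0]}
design = full_factorial(levels)
```

A full factorial design covers every combination of levels, which is what allows interactions between inputs to be estimated. For larger numbers of inputs, fractional designs are commonly used instead to limit the number of runs.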

Selection of the appropriate manufacturing process
With the ranking obtained through the risk assessment mentioned above, a multidimensional design space of allowable process input values (and combination thereof) is defined. ICH Q8 defines the design space as "an established multidimensional combination and interaction of material attributes and/or process parameters demonstrated to provide assurance of quality" [71]. Using the design space as a guide, the manufacturing process which is most capable of maintaining process conditions within the ranges that ensure product quality is defined. The selected manufacturing process must be robust such that it minimizes the risk of process conditions falling outside the design space, thus increasing the likelihood of the CQAs being within the range that ensures safety and efficacy.
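The idea of a design space as a multidimensional combination of allowable input values can be sketched in code. The example below represents a design space as per-parameter ranges plus optional interaction constraints and checks whether an operating point lies inside it. The parameter names, ranges and the interaction rule are hypothetical and purely illustrative; a real design space would be derived from experimental and modelling data.

```python
def in_design_space(point, ranges, constraints=()):
    """Return True if every parameter lies within its range and
    every interaction constraint holds for the operating point."""
    for name, (low, high) in ranges.items():
        if not (low <= point[name] <= high):
            return False
    return all(check(point) for check in constraints)

# Hypothetical univariate ranges.
ranges = {"pH": (6.8, 7.2), "temperature_C": (35.0, 37.0)}

# Hypothetical interaction: high temperature is acceptable only at lower pH.
constraints = [
    lambda p: not (p["temperature_C"] > 36.5 and p["pH"] > 7.1),
]

inside = in_design_space({"pH": 7.0, "temperature_C": 36.0}, ranges, constraints)
interaction_fail = in_design_space({"pH": 7.15, "temperature_C": 36.8}, ranges, constraints)
range_fail = in_design_space({"pH": 7.3, "temperature_C": 36.0}, ranges, constraints)
```

The second point lies within both individual ranges but violates the interaction rule, which illustrates why a design space is more than a set of independent univariate limits.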

Definition of a control strategy
With the appropriate manufacturing process in place, a strategy to mitigate the risk of materials and process conditions falling outside acceptable ranges must be established. By applying this risk management strategy, raw material specifications are monitored and controlled such that no impact is observed on the drug product's CQAs. In addition, the process parameters that influence the CQAs are controlled (ideally through online measurements and tight control systems) at every stage of the manufacturing process so that the desired product quality is met.
Conceptually underlying all the elements that constitute QbD implementation is process analytical technology (PAT). PAT is defined by the USFDA and the ICH as "a system for designing, analyzing and controlling manufacturing through timely measurements of CQAs and performance attributes of raw and in-process materials and processes with the goal of ensuring final product quality" [77]. The elements of PAT concerning process design and control have been presented in the description of QbD above. However, more discussion must be provided on the "timely measurement of CQAs and performance attributes" [77] throughout the implementation of QbD to manufacturing process development. It is clear that, from very early stages of QbD-driven process development, methods for accurately identifying and measuring the physical, chemical and biological properties of drug products are essential for defining QTPPs and CQAs. Further along in the implementation of QbD, it is also necessary to accurately measure the material attributes and process parameters that affect product CQAs. Finally, and as the definition of PAT states, once the manufacturing process is selected, appropriate analytical technologies are necessary to measure process parameters in a timely fashion (ideally online) so that the generated data can be used for process control. From this, it is clear that analytical methods constitute a core element throughout process development under the QbD scope [79].
Adoption of the QbD in pharmaceutical process development aims to address all of the limitations described above. The three major regulatory bodies (USFDA, EMA and PMDA) are encouraging implementation of the QbD approach for the development of all new drugs in the pipeline [68,70,72,80]. QbD is expected to reduce process approval time and costs, reduce regulatory intervention and encourage optimization and innovation by building processes around the mechanistic relationships between inputs and product quality. Because these relationships should be based on sound science and engineering principles, process outputs are more predictable and require less regulation which, in turn, would considerably reduce approval time and development costs. In addition, the more ample design space created through QbD would allow inputs to vary more without the need for additional approval. Predictability would also translate into much tighter control systems that would dramatically reduce the likelihood of generating product with unacceptable quality. Finally, the wealth of knowledge generated by the QbD approach along with the broader design space and more flexible regulatory approval characteristics would encourage process optimization and could potentially contribute to the development of novel processes as well as the discovery and design of next generation drugs.
Since the guidelines for QbD were first drafted, the framework has been implemented in the field of small-molecule therapeutics (SMTs) with relative ease. In contrast, the implementation of QbD to protein therapeutics (PTs) has met more resistance. This is likely due to the fact that the physical and chemical processes underlying the manufacture of SMTs are better understood and the mechanisms relating process inputs with SMT quality are easier to define. Conversely, the mechanisms by which living organisms produce PTs are less well understood, and the structural complexity of PTs makes their isolation, separation, purification and overall quality control much more challenging. Despite this, the regulatory agencies and several authors believe that sufficient knowledge is available or can be gained through current experimental and modelling methodologies to elucidate mechanistic relations between bioprocesses and PT quality, thus allowing for implementation of QbD principles to the development of therapeutic protein manufacturing processes in the near future [68,72].
Implementation of the QbD paradigm in biopharma should increase knowledge on the therapeutic mechanisms of biotherapeutics considerably. This could lead to the improvement of previously existing products and may contribute to the discovery and development of new biologics. It will also generate a rich and systematized knowledge base relating manufacturing conditions with drug products which may lead to more robust process control, process optimization and, potentially, development of novel and efficient platforms for the manufacture of biological therapeutics. Implied in this is the considerable reduction in approval times which would heavily reduce costs of product and process development and eventually translate into lower costs for healthcare providers and patients.

Other critical quality attributes of protein-based therapeutics
Several drug characteristics other than glycosylation are considered critical quality attributes due to their impact on biological activity, pharmacokinetics or pharmacodynamics, and safety in terms of immunogenicity and toxicity. These can arise from variations in protein structure or from the presence of other adventitious molecules in the product formulation. Protein aggregation is a common protein-related CQA. It can occur at any stage of protein production or processing owing to the protein structure (which may leave hydrophobic patches exposed), the protein concentration in the preparation or the process conditions. Temperature and pH extremes or physical stress can increase a protein's propensity to aggregate. Although protein aggregation is known to affect efficacy, the main concern is the immunogenicity of aggregates [81,82]. Protein aggregation may also be the result of modification reactions, such as oxidation, which is caused by increased levels of reactive oxygen species. This can additionally affect function, as, for example, in the case of antibodies where oxidation can alter the Fc structure and thus reduce its binding affinity [83]. Conformational changes can also occur at refolding steps of bacterial production platforms and are potentially immunogenic.
Other possible structural changes include protein fragmentation, due to proteolytic enzymes present in the cell culture supernatant or in human plasma, extreme pH or temperature conditions, or because of chemical disruption of peptide bonds, C- and N-terminal truncation and deamidation. The effect of fragmentation is product-dependent. However, it is known that it can impact biological activity, serum half-life and immunogenicity due to the generation of novel epitopes [84,85], whereas the remaining aforementioned changes do not appear to adversely affect product potency or safety [86][87][88][89], with the exception of deamidation within a complementarity-determining region, which can affect biological activity [86]. Finally, glycation is another post-translational modification that involves the chemical addition of a monosaccharide on the side chain of a lysine residue. It occurs when a protein is incubated in the presence of reducing sugars, especially fructose and galactose and to a lesser extent glucose, in cell culture [90], and can affect its biological activity [91].
In addition to the product-related critical quality attributes described above, the host cell line, raw materials and process operation can introduce impurities or contaminants with adverse effects on the formulation's suitability for in vivo use. Host cell proteins are released mostly at later stages of the cell culture due to cell lysis and can be immunogenic [92], particularly when originating from microbial systems [93]. Additionally, host cell DNA poses considerable risk due to its potential integration and the possibility of a resulting carcinogenic effect. For this reason, the host cell DNA level cannot exceed 10 ng per dose [94]. Impurities and contaminants can further be introduced from raw materials and lack of aseptic conditions. The most significant of these in terms of the risk they pose to product integrity are viruses, microbial cells and their products, such as endotoxins, which are highly toxic to humans [95]. Due to the severity of the effects of human injection with such contaminants, sufficient clearance must be demonstrated to regulatory authorities for approval to be gained. A thorough review of the above critical quality attributes and their effects on product safety and efficacy is presented in [96].

Challenges in implementing QbD in biopharma
Implementation of QbD to bioprocesses first requires thorough characterization of the drug substance (i.e. the therapeutic protein) in order to determine the attributes that define its safety and efficacy. Drug substance characterization, which in many cases is a challenging task, is usually done with liquid chromatography peptide mapping combined with mass spectrometry for amino acid sequencing [97][98][99], X-ray crystallography for three-dimensional structure, and different analytical chromatographic or electrophoretic techniques coupled to mass spectrometry for the analysis of N- and O-glycans, which have been extensively reviewed by del Val et al. [10]. The propensity for protein aggregation and fragmentation is usually measured through size exclusion chromatography [100][101][102]. By coupling these techniques to non-clinical (in vitro assays), preclinical and clinical trials, the CQAs for protein therapeutics are defined. However, despite the available methods for characterizing proteins and their CQAs, these are not always deemed sufficient to ensure identity. This is evidenced by current US and European legislation concerning biosimilar products. Both the USFDA and the EMA have required phase I and phase III clinical trials in order to establish that a follow-on product is similar to its brand-named counterpart even when they have been shown to have equivalent CQAs according to the available analytical methods [103]. Lack of confidence in the current analytical techniques for drug substance characterization is a critical challenge that must be overcome for appropriate implementation of QbD to biopharmaceutical processes. Development of additional characterization methods is required to compare the product CQAs with the QTPPs defined during the early stages of process development. Furthermore, development of additional non-clinical studies for determining product safety and efficacy is required. In vitro assays for product safety and efficacy would dramatically reduce clinical trial associated costs and streamline data acquisition for determination of product CQAs.
The next step under the QbD scope is to define the manufacturing process, the process inputs that affect product CQAs and the mechanisms by which this occurs. In contrast to their small-molecule counterparts which are largely produced in vitro, therapeutic proteins are produced by living organisms. Cellular metabolism is extremely complex and several mechanisms by which cells produce therapeutic proteins are yet to be fully described. Moreover, very little information that quantitatively relates process conditions with cell metabolism, protein synthesis and product CQAs is available. According to QbD guidelines, quantitative mechanistic models are ideal tools for relating process inputs with product CQAs. The challenge of relating process inputs with product CQAs may be overcome by additional data generation through an iterative process of DoE-aided experimentation and mechanistic modelling. Through this, sufficient data would be generated such that the effects of process inputs on product CQAs are used to define an enhanced design space that could lead to higher assurance of product safety and efficacy.
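As a purely illustrative sketch of the DoE step mentioned above, the following Python snippet enumerates a two-level full factorial design. The factor names and levels are hypothetical placeholders, not values taken from this chapter, and a real study would typically rely on dedicated DoE software and possibly fractional or response-surface designs.

```python
from itertools import product

# Hypothetical process inputs and levels (illustrative values only,
# not recommendations and not taken from the chapter).
factors = {
    "temperature_C": [33.0, 37.0],
    "pH": [6.8, 7.2],
    "glucose_feed_mM": [20.0, 40.0],
}


def full_factorial(levels_by_factor):
    """Return every combination of factor levels as a list of dicts."""
    names = list(levels_by_factor)
    return [
        dict(zip(names, combo))
        for combo in product(*levels_by_factor.values())
    ]


runs = full_factorial(factors)
for run_id, run in enumerate(runs, start=1):
    print(run_id, run)
```

Each run would be carried out experimentally, the CQAs measured, and the results used to refit the mechanistic model before the next round of experiments is designed.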
After defining the design space, a control strategy must be defined so that process inputs are maintained within the range that ensures product quality. A crucial challenge in defining the control strategy is the ability to measure the process inputs that influence product CQAs. The QbD and PAT guidance documents suggest that control should be established through timely measurements which, ideally, should be performed online. This, again, is not trivial in a biological system. The interior of a bioreactor is a complex environment mainly composed of culture medium, cells, product and co-products of cell culture. The culture medium itself is a complex mixture of nutrients. Many of the common process parameters, such as pO2, pCO2, temperature, cell density and certain metabolite concentrations, are readily measurable. However, many of the single components that influence product CQAs are difficult to measure in such a complex mixture. Several analytical methods capable of tracking single components in bioreactors are currently being explored. Some of the most promising technologies are based on infrared spectroscopy, and have been reviewed recently by Landgrebe et al. [104]. Successful implementation of these techniques may lead to absolute online monitoring and process control. In parallel, methods to measure the intracellular concentration of key nutrients and metabolites online are being developed with promising results [105]. In addition, mathematical models are being developed to describe complex biological processes such as N-linked glycosylation. This work has yielded encouraging results for bioprocess control and optimization, and could potentially aid in cell line development for third generation therapeutic proteins [106].
Almost by definition, QbD is a self-catalytic process because it relies on, and generates, a wealth of information. The more QbD is implemented in bioprocess development and approval, the more understanding will be gained and fed into the development of newer processes, which will eventually culminate in a near-complete description of therapeutic mechanisms, drug product CQAs and therapeutic protein bioprocessing.

Overview of production organisms and manufacturing environment
The majority of approved biopharmaceuticals are produced in mammalian cell culture systems, since they are the sole means to deliver proteins with desired glycosylation patterns and thus ensure reduced immunogenicity and higher in vivo efficacy and stability [32,107,108]. However, mammalian cell culture delivers a heterogeneous mixture of glycan structures which do not all have the same properties. Product half-life and activity are therefore compromised, while higher doses are required for efficacy.
As of April 2012, there are 77 therapeutic glycoproteins out of the total 642 drugs approved by the European Medicines Agency (EMA). Host systems for their production include mammalian cells (65 drugs) and transgenic animals (2 drugs), while several are isolated from the blood plasma of healthy donors (10 drugs), as depicted in Figure 4A. The therapeutic classes of the glycoprotein drugs are presented in Figure 4B and mainly comprise: hereditary diseases (Haemophilia A and B, Fabry disease, Gaucher disease, and others; 29% of EMA-approved glycoproteins), cancer (leukemia; cancer in ascites; thyroid, stomach, breast, colorectal and other cancers, or anaemia caused by chronic cancer; 26% of EMA-approved glycoproteins), and autoimmune disorders (rheumatoid arthritis, multiple sclerosis, Crohn's disease, lupus erythematosus; 18%). Other therapeutic areas for which glycoprotein drugs are prescribed include infertility, acquired injuries/disease (tibial fractures, spondylolisthesis, myocardial infarction), immunosuppression during transplantation, hemostasis after surgery, anaemia caused by chronic kidney disorders, postmenopausal diseases (osteoporosis) and infection with the respiratory syncytial virus (one drug, containing a monoclonal antibody as active substance: Synagis®).
Two other glycoproteins (Leukoscan® and Scintinum®) have been approved by the EMA, but are used for radionuclide imaging rather than therapeutics and thus have not been included in the statistics shown in Figure 4A. Furthermore, denosumab, a monoclonal antibody produced by Amgen, is approved under two brand names, Prolia® (for postmenopausal osteoporosis) and Xgeva® (for bone metastasis in cancer), and hence only one has been included in the final list of approved drugs. Two more mAbs (Orthoclone OKT3® and Reopro®) have been approved for use in specific EU countries but are not approved by the EMA and hence have also not been included. Finally, two drugs that contain recombinant factor VIII as active ingredient are produced by Bayer Pharma AG using baby hamster kidney (BHK) cells as host cells in an identical fermentation procedure, but since they are purified through slightly different downstream processes, both have been included separately in the statistics shown in Figure 4. Vaccines are also not included in the list of glycoproteins because their production mainly involves the propagation of viruses and not specific glycoprotein production. Moreover, all recombinant vaccines are produced in microbial organisms and hence do not involve glycoprotein production, but rather amino acid sequences of antigens that are not glycosylated.
CHO (Chinese hamster ovary) cells are the dominant host cells for the production of glycoproteins as far as approvals in the European Union are concerned, with 47 of the 77 EMA-approved therapeutic glycoprotein drugs using them as host cells. Five drugs produced in CHO cells are biosimilars of recombinant erythropoietin (the reference product being a recombinant epoetin produced by Janssen-Cilag GmbH under the brand names Eprex® in the UK and Erypo® in Germany). All of these biosimilars were approved in 2007. Biosimilars are defined by the EMA as drugs similar to a biological drug already on the market that are used to treat the same disease and within the same range of doses. Hybridoma cells are the second most frequently used hosts for recombinant glycoprotein production (12 drugs), followed by glycoproteins purified from the blood plasma of healthy donors (10 drugs). BHK cells and HT-1080 cells, a human sarcoma cell line, are joint fourth with three approved drugs each. All approved drugs produced in BHK cells involve blood factors. All of the drugs produced in the HT-1080 cell line belong to the Shire Pharmaceuticals portfolio. The active substances of these drugs are therapeutic enzymes, and the drugs are also classified as orphans. Orphan drugs, according to the EMA, are prescribed to treat diseases that do not affect more than 5 in 10,000 individuals in the European Union at the time that the drug was submitted for approval. Finally, the host system classification list is completed with transgenic animals. Currently, two drugs are produced in transgenic animals; both are serpins (serine protease inhibitors): antithrombin III (Atryn®) and C1 esterase inhibitor (conestat alfa; Ruconest®), isolated from goat and rabbit milk, respectively.
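The counts quoted in this section can be cross-checked with a few lines of Python. This is only a bookkeeping sketch: it assumes that the four cell types named in the text (CHO, hybridoma, BHK and HT-1080) account for all 65 mammalian-cell products, which the text implies but does not state explicitly.

```python
# Counts as quoted in the text (EMA approvals as of April 2012).
mammalian_hosts = {"CHO": 47, "hybridoma": 12, "BHK": 3, "HT-1080": 3}
other_sources = {"blood plasma": 10, "transgenic animals": 2}

# Assumption: the four named cell types cover all mammalian-cell products.
mammalian_total = sum(mammalian_hosts.values())
overall_total = mammalian_total + sum(other_sources.values())

# Share of all therapeutic glycoproteins that are made in CHO cells.
cho_share_percent = 100 * mammalian_hosts["CHO"] / overall_total

print(f"Mammalian cell products: {mammalian_total}")
print(f"All therapeutic glycoproteins: {overall_total}")
print(f"CHO share: {cho_share_percent:.0f}%")
```

Under that assumption the per-host counts add up to the 65 mammalian-cell products and the 77 therapeutic glycoproteins stated above, with CHO cells accounting for roughly 61% of the total.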

Mammalian cell systems
Mouse myeloma cell lines SP2/0 and NS0 are frequently used for research purposes [110], while human embryonic kidney 293 (HEK293) cells are often utilized to produce material for pre-clinical trials or research [111]. CHO cells are the workhorse of industrial production because of their capacity for gene amplification, which increases the level of product specificity, and their ability to grow in serum-free suspension conditions [112]. Similarly, the myeloma-derived rodent NS0 cells have a high efficiency in producing recombinant immunoglobulin proteins and can be cultured in either serum-containing or serum/protein-free suspension, which reduces manufacturing costs during large-scale protein production [113]. Since CHO and NS0 originate from different mammalian species, the amount and type of glycosidases and transferases they have vary. Studies have reported that NS0 cells can produce glycoproteins that are highly immunogenic to humans [114].
Human cells, on the other hand, are alternative hosts. The fact that human cells generate recombinant products with PTMs similar to the native counterparts secreted inside the human body gives them an advantage over CHO and NS0, which lack some essential glycosylation enzymes, e.g. bisecting N-acetylglucosamine transferase and α-2,6 sialyltransferase. HEK 293 cells are the most widely used human cell line for research into recombinant protein production, and studies show that HEK 293 cells are capable of manufacturing Xigris (activated protein C) with proper γ-carboxylation and propeptide digestion at its glutamate amino acid residues, which CHO cells fail to offer [115]. In addition, the main advantage of protein expression in human cells is the low level of immunogenic reactions: erythropoietin (EPO) produced in the human HT1080 fibrosarcoma cell line carries less N-glycolylneuraminic acid (Neu5Gc) than CHO-derived EPO [116]. Some new human-based expression cell lines have been developed and are increasingly adopted by industry. For example, the Per.C6 cell line, derived from human retina cells, has been shown to give a protein titre of more than 2 g/L in simple fed-batch culture [117]. Unlike CHO cells, which often require product selection, the Per.C6 cell line does not require any gene amplification or selection strategy, nor does it need a large gene copy number for stable protein expression [118]. Per.C6-derived EPO is free of Neu5Gc, thus limiting immunogenicity [119].
Notes to Figure 4 [109]: * The actual total number of drugs is 80, but two drugs are used for radionuclide imaging and hence are not therapeutic proteins, and one active substance has been approved twice; for details see main text. † Hereditary and autoimmune diseases are in some cases difficult to distinguish; here, hereditary diseases are taken to be those in which patients are born with the symptoms of the disease. In this graph, the drugs Prolia and Xgeva, which contain the same active substance but are authorised to treat different diseases, have been counted individually; hence, the number of EMA-approved therapeutic glycoprotein drugs in this case is 78.

Microbial cell systems
Microbial cell systems of both eukaryotic and prokaryotic origin are commonly used for the expression of heterologous proteins. They have long been established for use in the production of recombinant proteins. This is due to their ease of use, the vast wealth of genetic and biochemical knowledge which has been amassed, high protein yield and inexpensive production costs. The wide availability of well-characterised host strains, coupled with extensive customized expression vectors, can provide manufacturers with functional protein in high yields.
However, these microbial expression systems generally lack the ability to perform human-like PTMs. Prokaryotic systems have difficulties in forming disulphide bonds and many species, such as Escherichia coli, lack the machinery to perform glycosylation. Glycosylation was long thought to be exclusive to the eukaryotic domain; however, the discovery that the pathogenic Campylobacter jejuni, associated with gastroenteritis in humans, can perform N-linked glycosylation [120] has changed this notion. Nevertheless, there are several notable and important differences between prokaryotic and eukaryotic N-glycosylation (Table 1).
Wacker and collaborators identified the native glycosylation capability in Campylobacter jejuni and demonstrated that its N-linked glycosylation pathway was also functional when transferred to Escherichia coli [121]. The C. jejuni glycosylation machinery is encoded by the gene locus pgl, consisting of 12 genes encoding various glycosyltransferases (GTases), enzymes involved in sugar biosynthesis and an oligosaccharyltransferase (OT), all of which share similarity with their eukaryotic counterparts [122]. As in eukaryotes, a lipid-bound sugar chain is transferred en bloc onto an asparagine residue in a specific sequence. Their work shows that it is perhaps possible to clone a universal N-linked glycosylation cassette in E. coli.
A research team from Cornell University has used a bottom-up engineering approach to assemble a synthetic glycosylation pathway within E. coli to produce human-like glycoproteins. The challenge was to introduce a pathway which would lead to an Asn-GlcNAc linkage, then couple this with the mammalian glycosylation pathway which would generate an acceptable sugar moiety; this was achieved using metabolic engineering [123][124][125]. The E. coli was then further engineered to synthesise a mannose3-N-acetylglucosamine2 (Man3GlcNAc2) glycan chain, a common core structure shared by all eukaryotes [126]. However, many challenges remain. Most notably, the yield of glycosylated protein in this system is extremely low (~1%). In addition, the currently identified bacterial oligosaccharide transferases and known homologs do not recognise the triplet eukaryotic consensus amino acid sequence, but rather a longer sequence, which means that protein engineering of glycoproteins will be required in order to use a microbial system. Finally, further sugar addition reactions to the core structure Man3GlcNAc2 are required to produce viable glycoproteins and each of these must be engineered separately into the host system. Genetic engineering of E. coli is an ambitious task because the species does not naturally contain any N-glycosylation machinery, meaning everything must be inserted from the ground up. Non-mammalian eukaryotes are also seen as attractive expression systems and possess an N-glycosylation pathway with a common core structure, although the end product is remarkably different. Although yeast can perform many PTMs, similar to higher eukaryotic cells, they produce a non-human N-glycosylation profile of high mannose content, which can elicit an immune response in humans. Therefore microbial expression systems are only employed in the production of therapeutic proteins if the protein is functional without PTM, such as human insulin which is produced in E. coli [128], or where the PTM is required for folding and stability but does not affect drug efficacy, such as in several vaccines which are produced in Pichia pastoris [129,130] (examples are presented in Table 2).
Recently, steps have been taken to genetically engineer humanized N-glycosylation pathways in non-mammalian eukaryotic expression systems such as yeast (Pichia pastoris) and the insect cell line Sf9. A synthetic glycosylation pathway has been established within the methylotrophic yeast species Pichia pastoris [131][132][133]. The challenge has been to remove hypermannosylation and replace it with glycosylation machinery that produces a more human-like glycan profile. This has been achieved by removing (knocking out) the α-1,6-mannosyltransferase enzyme (OCH1 in P. pastoris) and replacing it with various mammalian GTases which synthesise a human-like glycan chain.
The glyco-engineered P. pastoris developed by GlycoFi (a subsidiary of Merck & Co. Inc.) is at a more advanced stage than the E. coli expression systems. The humanized P. pastoris contains all the machinery to produce an N-glycan chain of complex type, including the gene for the most complex step of human N-glycosylation, terminal sialylation. In total the engineered P. pastoris strain contains a set of 14 mammalian genes integrated into its genome. These include glycosyltransferases, enzymes involved in sugar biosynthesis and sugar transporters. The result has been the successful production of a glycoprotein with a highly homogeneous oligosaccharide of the human complex type [131][132][133].
The N-glycosylation pathway in insects is more similar to that of mammalian species than the P. pastoris pathway is. Insect cell lines do not hypermannosylate but trim the N-glycan to the core structure before adding GlcNAc to a mannose sugar, therefore halting N-glycosylation maturation at an earlier stage than mammalian cells and not producing a complex or hybrid glycan chain [134].
The glycoengineering of insect cell lines does not require all the mammalian genes added in P. pastoris to reach an oligosaccharide of the human complex type.
There are several examples of glycoengineered lepidopteran insect cell lines [135][136][137][138]. Specifically, a research group from the University of Wyoming has transformed Spodoptera frugiperda (Sf9) cells with higher eukaryotic genes encoding the N-glycosylation pathway, including gene products that enable terminal sialylation [136,139]. Unlike previous glycoengineered species, in which the genes were incorporated into the genome, the glycosylation genes are placed on inducible plasmids in order to reduce metabolic overload, negative impact on growth rate, and long-term instability [139]. The mammalian glycosylation genes are present on three vectors encoding six mammalian genes in total. These plasmids were transformed into the Sf9 cell line, which resulted in the sialylated N-glycosylation of a glycoprotein.

Manufacturing conditions
From the discussion relating the glycan moieties of IFN-β, EPO and both the Fc and the Fab region of antibodies to in vivo function, it becomes apparent that exerting control over the glycoform of a biotherapeutic would be highly desirable. Many process conditions are known to affect the glycoform. For example, high CO2 concentration leads to increased osmolality and can limit growth and antibody production as well as affect the glycoform [140]. Similarly, culture mode, growth phase and temperature, among other factors, have significant effects on the quality of biotechnology products. This has been extensively summarised by del Val et al. [10] and an overview is presented in Table 3.
Table 3. Effect of bioprocess conditions on therapeutic protein glycosylation. The upward arrow denotes increase and the downward arrow represents decrease.
Table 3 entry: Concentration of metabolites inhibits GalT and SiaT activity and mislocalizes these compounds [143].
From Table 3, it is evident that many process parameters influence protein N-linked glycosylation. These effects could initially be seen as potential sources for variability. However, if their underlying mechanisms were to be fully understood quantitatively, they could serve as variables for the modulation and control of glycosylation-associated quality attributes of therapeutic proteins.

Medium formulation
When formulating media for mammalian cell culture for the production of secreted glycoproteins, the carbon source is of paramount importance. The rate at which glycosyltransferases can process glycans is subject to changes in substrate concentrations. The substrates of the glycosyltransferase reactions are the antibodies as well as nucleotide sugar donors (NSDs). In the case of biotherapeutics, the important NSDs are UDP-GlcNAc, UDP-Gal, GDP-Fuc, CMP-NeuAc and CMP-NeuGc, as well as UDP-GalNAc when murine cell lines are employed instead of CHO cells. High levels of CMP-NeuGc terminating carbohydrate moieties are unfavorable in a therapeutic context as they are oncofetal and potentially immunogenic [163,164]. The metabolic pathways and biosynthesis of sugars in mammalian cells are well known and are graphically represented in Figure 5, where the transport into the Golgi is also indicated. Building on this, the glycoform can be controlled: the concept of feeding strategies is based on the hypothesis that addition of specific metabolic intermediates of nucleotide-sugar biosynthesis to the culture medium will drive metabolic flux towards the desired NSD and eventually influence the glycoform by increasing the desired rates of reaction [165]. Attention must be paid, though, to a number of inhibitory mechanisms which naturally regulate the depicted metabolic network.
In contrast to supplementing the culture medium with glycosylation substrates to increase certain reaction rates, other strategies have consisted of adding glycosylation reaction inhibitors to achieve desired glycoforms. More specifically, non-reactive fucose analogues have been added to mammalian culture medium to avoid core fucosylation of mAb Fc glycans [166]. Others have added mannosidase inhibitors to prevent mAb Fc oligosaccharides from reaching more processed states [167].

In-process analysis of glycoproteins
In general, the complete analysis of oligosaccharide structures and linkages requires a series of steps, for example enzymatic fragmentation, chromatographic separation steps and either mass spectrometry (MS) or nuclear magnetic resonance spectroscopy (NMR) to determine the chemical structure of the fractions [168]. The technology for these offline analyses has improved in terms of the amount of material required, the sensitivity of the method, and the speed of analysis, but there is still room for improvement in miniaturization, speed, and throughput when it comes to analytics for bioprocessing. In particular, if analyses are going to be used to inform manufacturing operations in real time, then faster methods with lower sample requirements will be required. Preferably, these should also be automatable to reduce the requirement for human expertise. Early attempts at the high throughput analysis of post-translational modifications included a microtitre plate-based assay, where a series of steps including capture, desalting and reduction on beads, followed by elution, tryptic digestion, fractionation, and isolation of N-linked glycopeptides were performed in low volume before MALDI-TOF MS was used to identify structures [169]. While this method had the advantage of small sample volume, it was still a lengthy procedure and, while technically automatable, would be difficult to implement in-process.
One of the most important advances is the introduction of online methods for product analysis, which will eventually enable real-time or quasi-real-time control over the bioreactor environment in order to influence the product quality. Towards this end, a number of new systems have been developed, usually based on automated sampling followed by desalting via HPLC and analyses that have significantly shorter timescales from sampling to information.
For example, a two-dimensional system capable of analyzing up to 6 fractions from a separations operation was recently reported. Using a single HPLC system capable of running two columns simultaneously with independent gradients and switching between columns coupled to ESI-MS, fractions from size exclusion or ion exchange chromatography steps were analyzed for charge heterogeneity and size variation. An online concentration step was used to analyse dilute fractions to elucidate minor size variants. While unable to give complete peptide mapping of glycoforms, this represents a first step towards monitoring some of the important QbD properties online, including glycoform heterogeneity and N-cyclisation [170]. In another recent report, a method for the rapid detection and differentiation of sialic acids by HPLC was developed that is capable of analysing the content of N-acetylneuraminic acid versus N-glycolylneuraminic acid in about five minutes simply by using a shorter column [171].
In a further example, Mittermayr et al. made a serendipitous discovery which might lead to more rapid sample processing times when analysing glycans from the Fc region of mAbs. They initially sought to compare the analysis capabilities of a newly developed hydrophilic interaction chromatography method with capillary electrophoresis coupled to laser induced fluorescence analysis, but found that the techniques were actually highly complementary in their resolving power. Thus, using a combination of the two in a 2D analysis, separation time for each could be reduced to 20 minutes. However, some very similar structures could not be resolved from each other, making this method incomplete [172].
Since measuring the glycoform profile in real time will remain a challenge in the near future, another approach is to measure surrogate markers that correlate with the glycan structure (and ultimately make a link using metabolic modelling). This idea is very similar to finding biomarkers as diagnostic indicators for disease: in essence, one or more 'biomarkers' for particular glycan structures of interest (e.g. high mannose, highly branched, or high levels of sialic acid endcapping) should exist. Once these are identified, a fluorescent, in vivo biosensor can be designed to monitor each 'biomarker' non-invasively in real time. A variety of fluorescence monitoring equipment is available to accurately determine fluorescence levels in small volumes and in high throughput, and this can then be exploited for process design, medium formulation, and cell line engineering experiments. We have demonstrated the utility of FRET-based biosensors for monitoring essential metabolites such as glucose and glutamine [173]. These can then be paired with metabolic models to predict the trajectory of glycoforms given the current nutrient availability.
Also of interest is a suite of tools developed initially for glycomic analysis of whole cell samples (often in the context of disease investigations), but which might be adaptable to monitoring proteins secreted during bioprocessing.
One of the most promising of these techniques is the lectin microarray. Lectins are naturally occurring carbohydrate binding proteins which show some degree of specificity and have been shown to detect glycoproteins in amounts of less than 1 pg [174]. Previous work has used lectins conjugated to chromatography resin (e.g. [175]) or to magnetic beads for recovering glycan-containing proteins from a complex mixture, for example serum [176]. The process can be done in small volumes using microtitre plates and the glycoproteins can later be analysed by MS in a process that can be automated by the use of liquid handling apparatus [177].
Lectins immobilised on beads and fluorescence quenching of quantum dots have also been used to quantify specific types of glycoforms, suggesting a proof-of-principle for lectin-based analysis [178]. The main issue with lectin-based detection as a standard is that the specificity of individual lectins is broad and, therefore, specific detection of individual glycoforms is currently not possible. However, protein engineering or synthetic biology techniques could be used to evolve panels of lectins with precise and varying specificity, which could result in a platform amenable to high throughput in-process detection [179].

Experimental strategies for cell line modification
Another strategy to reduce glycan heterogeneity is genetic modification of cell lines. This can be through the overexpression or knockout of the genes encoding glycosyltransferase enzymes. Such attempts aim to produce mAbs with specific biological functions or to avoid the expression of mAbs with glycan structures that are potentially immunogenic to humans. Umaña's group overexpressed N-acetylglucosaminyltransferase III and V (GnT III & V) in a tetracycline-regulated manner, with the aim of introducing bisecting GlcNAc and tri-antennary structures, respectively, in CHO-DUKX cells. Despite the successful production of mAb with the desired glycan structure, overexpression of the GnT III & V enzymes greatly impeded cell growth [180]. In the same year, Weikert and collaborators suggested the possibility of genetically engineering CHO cells for terminal galactose or sialic acid addition in order to encourage CDC and modulate inflammation. This group showed that overexpression of human β-1,4 galactosyltransferase (GT) or α-2,3 sialyltransferase (ST) genes reduced the level of terminal GlcNAc and that more than 90% of IgG Fc-oligosaccharides were sialylated [181]. Knockdown of the α-1,6 fucosyltransferase (FUT8) gene, achieved via constitutive expression of small interfering RNA (siRNA), produced mAbs of which around 60% were defucosylated. This increased ADCC activity up to 100-fold in in vitro assays [182]. Genetic engineering is therefore a potential approach to produce mAbs under the QbD strategy, but optimisation is required to minimise possible side-effects.
Chaperone engineering can be another key strategy for boosting productivity. ER chaperones are responsible for correct protein folding, and co-overexpression of chaperone genes and other regulatory elements (e.g. ERp57, calnexin/calreticulin, and/or protein disulfide isomerase) exhibited positive effects on productivity. Disulfide isomerase in particular increased specific mAb productivity by 55% in transient gene expression, but had no effect or negative effects in stable gene expression [33]. In addition, targeting the unfolded protein response (UPR) pathway is another approach to enhancing recombinant protein yield. XBP-1s is the spliced form of the parental X-box binding protein 1 (XBP-1) and only exists upon the induction of ER stress. Studies showed that overexpressing XBP-1s in mAb-expressing CHO-T cells under hypothermic conditions increased total mAb concentration by 36% [26].

In silico protein glycosylation studies
Mathematical modeling is a powerful tool for in silico studies of complex phenomena. A high-fidelity model of protein production and glycosylation would be useful for bioprocess design, culture media formulation, and the design of genetic engineering strategies. Early computational studies concerning intracellular glycosylation highlighted particular aspects of the biological machinery. Monica et al. investigated the role of diffusion in the trans-Golgi network on limited sialylation [183]. The model assumed an isotropic compartment with respect to both substrate and enzyme concentration and concluded that diffusion limitations are not significant with respect to the sialyltransferase-catalysed reactions. This finding is particularly significant with respect to other glycosyltransferase reactions as sialic acid is known to be at the lowest abundance compared to other nucleotide sugar species. Shelikoff et al. presented a first approach towards the mathematical modelling of macroheterogeneity in glycoproteins [184]. The work focused on the attachment of the glycan precursor to the Asp-X-Ser/Thr tripeptide sequence, which takes place in the ER. While site occupancy currently receives little attention in the production of mAbs, this aspect of glycosylation may be of greater significance in the near future as cellular antibody productivity keeps increasing and thus, placing more strain on the ER, which may lead to increased macroheterogeneity.
The first mathematical investigation of glycosylation microheterogeneity was carried out by Umaña et al. in 1997 as part of a study into the effect of glycosyltransferase overexpression in a mammalian cell line as a means of exerting control over the glycoform [185]. As part of this study, a Central Reaction Network (CRN) was proposed that tracks a total of 33 species, with reactions catalysed by mannosidases and GlcNAc-transferases (GnTs) and terminating upon the addition of the first galactose, which prevents further processing of the glycan structure by GnTs or ManII [186]. The calculations of species abundance were based on enzyme concentrations and distribution, kinetic constants of reaction, protein half-life in the Golgi, the Golgi volume and the specific glycoprotein productivity. The underlying mode of operation for the Golgi is assumed to be the vesicular transport model, which states that vesicles bud off their respective Golgi compartment at its bulk concentration and fuse with the next compartment in series. This mode of operation can be idealised and viewed as four continuously stirred tank reactors (CSTRs) in series, representing the cis-, medial- and trans-Golgi cisternae and the trans-Golgi network. The work paid particular attention to GnTIII, which catalyses the transfer of a bisecting GlcNAc to an agalactosylated glycan moiety, after which no further GnTs can act, thus capping antennarity. The authors examined the overexpression of this glycosyltransferase in a number of in silico experiments and confirmed their hypothesis that antennarity was reduced and hybrid glycan content increased. This study provided an important first insight into the power of mathematical modelling as an approach towards glycan engineering.
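The CSTR-in-series idealisation described above can be illustrated with a minimal Python sketch. It is not the model of Umaña et al.: it assumes a linear chain of glycan species with pseudo-first-order rate constants in place of the enzyme kinetics used in the published work, and the function names, rate constants and residence time are hypothetical.

```python
def cstr_steady_state(c_in, k, tau):
    """Steady-state outlet concentrations of one CSTR for a linear chain
    S0 -> S1 -> ... -> Sn (assumed pseudo-first-order steps).

    c_in : inlet concentration of each species
    k    : k[i] is the rate constant for S_i -> S_{i+1}; the last species
           is not consumed, so len(k) == len(c_in) - 1
    tau  : residence time in the compartment (same time unit as 1/k)
    """
    c_out = []
    for i, feed in enumerate(c_in):
        formed = k[i - 1] * c_out[i - 1] if i > 0 else 0.0
        consumed = k[i] if i < len(k) else 0.0
        # Species balance: 0 = (feed - c)/tau + formed - consumed * c
        c_out.append((feed + tau * formed) / (1.0 + tau * consumed))
    return c_out


def golgi_cstr_series(c_feed, k_per_tank, tau):
    """Pass the feed through one CSTR per entry of k_per_tank, each tank
    having its own set of rate constants (a proxy for enzyme distribution)."""
    c = list(c_feed)
    for k in k_per_tank:
        c = cstr_steady_state(c, k, tau)
    return c


# Hypothetical example: one processing step, four identical compartments.
outlet = golgi_cstr_series([1.0, 0.0], [[1.0]] * 4, tau=1.0)
```

Because each tank is perfectly mixed, some unprocessed species always leaves every compartment; with identical tanks the unconverted fraction after four of them is 1/(1 + k·tau)^4.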
Krambeck and Betenbaugh extended the above-described work by including further glycosyltransferases, expanding the network from 33 to 7,565 structures linked by a total of 22,871 reactions and accounting for core fucosylation, galactosylation and sialylation of carbohydrate moieties [187]. The model was also not specific to a single glycan site and viewed the Golgi compartments as four CSTRs in series. Building on previous work, enzyme dissociation constants from experimental investigations were employed for each glycosyltransferase and competitive product inhibition was taken into account. Furthermore, the model was evaluated and fitted against experimental data, where the glycoform data were obtained from recombinant human thrombopoietin (TPO), for which an average of 5.4 occupied N-glycan sites per molecule has been reported [188]. Model optimisation was based on an averaged TPO glycan site, with enzyme concentrations adjusted to give the closest fit to experimental data. Krambeck and Betenbaugh argue that while dissociation constants and kinetic data for the glycosyltransferases exist in the literature, the enzyme concentration in the Golgi is cell line-dependent and subject to change based on culture conditions. Adjusting the Golgi-resident enzyme concentrations to match the data therefore improved the model simulation results.
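To illustrate how dissociation constants and competitive inhibition enter such a rate expression, the following Python sketch uses the generic textbook rate law for several species competing for the same enzyme. It is an assumption made for illustration, not the exact expression of Krambeck and Betenbaugh, and the function name, argument names and numerical values are hypothetical.

```python
def competitive_rate(kcat, enzyme, conc, kd, target):
    """Conversion rate of `target` when several species bind one enzyme.

    Generic competitive form (assumed here, not taken from the source):
        v = kcat * E * (S_t / K_t) / (1 + sum_j X_j / K_j)
    where the sum runs over every species that occupies the active site,
    i.e. competing substrates and inhibiting products alike.

    conc : dict of species name -> concentration
    kd   : dict of species name -> dissociation constant (same units)
    """
    occupancy = sum(conc[name] / kd[name] for name in conc)
    return kcat * enzyme * (conc[target] / kd[target]) / (1.0 + occupancy)


# Hypothetical numbers: substrate "S" alone, then with product "P" present.
v_alone = competitive_rate(2.0, 0.5, {"S": 3.0}, {"S": 3.0}, "S")
v_inhibited = competitive_rate(
    2.0, 0.5, {"S": 3.0, "P": 4.0}, {"S": 3.0, "P": 4.0}, "S"
)
```

With a single substrate the expression reduces to the Michaelis-Menten form; every additional bound species enlarges the denominator and lowers the rate, which is why fitted enzyme concentrations and dissociation constants are strongly coupled in models of this type.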
Krambeck et al. extended the model further in a follow-up study, where it was tailored to analyse glycoforms from mass spectrometric data [189]. Mass spectrometry is based on the mass-to-charge ratio of species and, thus, cannot distinguish between different structures of the same molecular mass. The presented model attempts to resolve this issue by predicting the content of alternative structures contributing to the same mass spectrometric peak, with the eventual aim of screening for glycan disease markers in humans. The model was extended through the addition of further glycosylation enzymes to make up a total of 19 glycosyltransferases, and enzyme activities were subsequently adjusted to match normal and malignant human monocyte N-glycan mass spectra. Through the application of limiting conditions based on prior knowledge, glycans of probable negligible abundance can be omitted, resulting in a total of 10,000-20,000 structures. The model gives valuable insight into changes in enzyme activity as a result of different diseases, but is rather limited in its application to the development and production of biotherapeutics, where high specificity with respect to the glycan structure and its accessibility on the protein backbone is required for a highly accurate predictive model.
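The ambiguity of mass spectrometric peaks can be made concrete with a short Python sketch that groups candidate structures by their calculated mass. The structure labels are illustrative, the masses are commonly tabulated monoisotopic residue masses rounded to four decimals, and free, underivatised glycans are assumed; none of this is taken from the model of Krambeck et al.

```python
from collections import defaultdict

# Monoisotopic residue masses in Da (rounded; assumed for illustration).
RESIDUE_MASS = {"Hex": 162.0528, "HexNAc": 203.0794,
                "Fuc": 146.0579, "NeuAc": 291.0954}
WATER = 18.0106  # added once for the free reducing end


def glycan_mass(composition):
    """Neutral monoisotopic mass of a free glycan from its composition."""
    return WATER + sum(RESIDUE_MASS[res] * n for res, n in composition.items())


def group_by_mass(structures, decimals=3):
    """Map each rounded mass to the list of structures sharing it."""
    peaks = defaultdict(list)
    for name, composition in structures.items():
        peaks[round(glycan_mass(composition), decimals)].append(name)
    return dict(peaks)


# Illustrative structures: the first two differ in branching only.
structures = {
    "triantennary, agalactosylated": {"Hex": 3, "HexNAc": 5},
    "biantennary with bisecting GlcNAc": {"Hex": 3, "HexNAc": 5},
    "biantennary, agalactosylated": {"Hex": 3, "HexNAc": 4},
}
peaks = group_by_mass(structures)
```

The first two entries share one monosaccharide composition and therefore fall into a single peak, which is the situation in which a mechanistic model is used to estimate the relative contribution of each isomer.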
Hossler et al. attempted to improve the predictive ability of glycan distribution models through the variation of reaction-related variables [190]. This model was the first to discriminate between reaction mechanisms for different glycosylation enzymes. ManI and ManII were modelled assuming Michaelis-Menten kinetics with substrate competition; the remaining transferases were modelled using a rapid equilibrium, random Bi-Bi mechanism. While all previous models assumed the vesicular transport regime, which is modelled as a series of CSTRs, this study explored the hypothesis of the Golgi maturation model, which states that each compartment undergoes a maturation process that transforms it from an early into a late cisterna. In an idealised case this can be described by a plug flow reactor (PFR), with the glycoprotein travelling through a tubular reactor that represents the Golgi cisternae. Modelling the Golgi apparatus as a single reactor of constant enzyme concentration showed that a long enough residence time leads to highly processed glycan structures and that changes in glycosyltransferase concentration can lead to a targeted glycoform for most glycan species. However, the authors argued that a total of four reactors is required to accurately simulate changes in the enzyme concentration along the length of the Golgi apparatus. The results show a greater appearance of under-processed glycan structures in the final product and generally less deviation from the data than was obtained with the CSTR-in-series model. A decrease in protein residence time was shown to have a much larger impact on the four-PFR model than on the CSTR model, with many more under-processed glycans observed for the PFR case. It was further shown that modifications of the enzyme concentrations for the PFR-in-series model could lead to the most targeted glycoform, thus demonstrating enzyme localisation to be a very potent approach in glycan engineering. Hossler et al. concluded that while the actual biological mechanism will be less idealised than assumed in the study, the PFR-in-series model gives a truer distribution, as demonstrated by comparison with experimental data.
Recently, Jimenez del Val et al. developed a model specifically tailored to the glycan found at the Asn297 position of the IgG antibody constant region [191]. The model adopts a cisternal maturation approach and expands on the rate expressions for the various enzymes to include Michaelis-Menten, sequential Bi-Bi and random-order Bi-Bi kinetics for specific glycosyltransferases, as reported in previous literature for each enzyme. The model considers the Golgi apparatus to be a single PFR of constant diameter with no axial dispersion, constant flow and no mass transfer limitation, in which enzyme recycling along the length of the reactor leads to changes in glycosyltransferase concentrations. As a result it proposes a novel representation of enzyme concentrations along the length of a PFR as normal distribution functions. The unknowns of the three-parameter normal functions for the spatial distribution of the enzymes were found through optimisation-based methods, which sought the minimum amount of total enzyme necessary to achieve terminal oligosaccharide processing, including 50% sialylation. A further extension of previous mathematical models was the incorporation of protein-mediated nucleotide sugar donor transport into the Golgi cisternae. Again, due to the assumption of Golgi-resident protein recycling, a distribution of transport proteins is expected along the length of the PFR, which was estimated using an optimisation-based method. The optimisation was based on the assumption that by-product dephosphorylation is much faster than nucleotide sugar donor accumulation, which allowed parameter values for the minimum transport protein concentration to be determined.
Further, it was argued that while the above distributions should not change significantly, the dissociation constants will differ between individual glycoproteins as well as between glycan sites within a glycoprotein. This accounts for steric hindrance and the much reduced sialylation in the antibody Fc region; hence, a glycoform was obtained by taking dissociation constants for various commercial mAbs from the literature. A comparison with experimental data and with the models of Krambeck and Betenbaugh and of Hossler et al. showed that the resulting model gave the closest fit to experimental data for most glycan species analysed, as shown in Figure 6. Furthermore, the model was shown to fit experimental data well under gene silencing, as presented in Figure 7.
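The idea of a single PFR with a normally distributed enzyme concentration, as described above, can be sketched in Python for one irreversible processing step. This is a deliberately reduced illustration rather than the model of Jimenez del Val et al.: it assumes first-order dependence on substrate, a dimensionless reactor length z between 0 and 1, and hypothetical parameter names and values.

```python
import math


def enzyme_profile(z, e_max, z_peak, width):
    """Three-parameter normal function for enzyme concentration along the
    reactor: peak height e_max, peak position z_peak, spread width."""
    return e_max * math.exp(-((z - z_peak) ** 2) / (2.0 * width ** 2))


def fraction_remaining(k, tau, e_max, z_peak, width, n_steps=2000):
    """Fraction of substrate left at the PFR outlet for
        dS/dz = -tau * k * E(z) * S,   0 <= z <= 1,
    where tau is the total residence time. The equation is linear in S, so
    the outlet value is exp(-tau * k * integral of E), with the integral
    evaluated here by the trapezoidal rule.
    """
    dz = 1.0 / n_steps
    integral = 0.0
    for i in range(n_steps):
        left = enzyme_profile(i * dz, e_max, z_peak, width)
        right = enzyme_profile((i + 1) * dz, e_max, z_peak, width)
        integral += 0.5 * (left + right) * dz
    return math.exp(-tau * k * integral)


# Hypothetical comparison: the same enzyme peaking early or late.
early = fraction_remaining(k=2.0, tau=1.5, e_max=1.0, z_peak=0.4, width=0.1)
late = fraction_remaining(k=2.0, tau=1.5, e_max=1.0, z_peak=0.95, width=0.1)
```

For a single first-order step only the integral of the profile matters, so a peak placed near the outlet loses part of its tail and converts less substrate. In a multi-step network the peak positions also set the order in which enzymes encounter their substrates, which is what makes the spatial distribution a useful degree of freedom for optimisation.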

Conclusions and outlook
The main goal of QbD is to ensure product quality by building it into the manufacturing process. The initiative provides practical incentives for industrial production, such as shorter approval times and higher flexibility towards changes in manufacturing process conditions. Despite the volume of data required for QbD, the benefits far outweigh the effort. The first goal of QbD for biopharmaceuticals should be to narrow the glycomic profile of glycoprotein-based drugs, based on existing knowledge of the desired structures for the application at hand and of the effect of manufacturing conditions and media/feed formulation on the availability of nucleotide sugars and the resulting glycan profile. This can be achieved more effectively through the combination of fundamental biological techniques for cell engineering, methodologies allowing rapid glycan analysis (in particular in vivo biosensors for 'biomarkers' of the desired glycan structure), and rational engineering design of manufacturing conditions. Rapid analytical tools will allow us to examine more samples in-process with the aim of controlling and potentially optimising conditions in real time. An enabling tool is mathematical modelling, which has made tremendous progress in successfully simulating the modification of complex glycoproteins. In the future it could allow us to collect in-process information about a fermentation run from various at-line analyses, use it to infer the current state of the system, and design improved operation strategies in terms of supplementation of nutrients or precursors, or adjustment of DOT, culture pH or other key conditions. At the same time, while CHO cells clearly remain the dominant host system, research on alternative cell lines that provide better glycosylation or higher yield remains active. Yeast, insect and plant cells, and transgenic animals are amongst the more likely host systems to replace CHO cells.
However, a significant amount of research on CHO cells is still ongoing, both on further bioprocess optimisation and on how the cells can be engineered to produce complex therapeutic molecules such as heparin. Given the continued investment in CHO research, the number of existing production platforms and pending patents, as well as the stringency of the pharmaceutical regulatory agencies, CHO cells will likely remain a relevant industrial production host for at least another few decades.
That being said, the ideal expression system would achieve high-level expression of recombinant protein at a low cost. This implies that microbial hosts would offer a significant cost advantage. Advances in humanising the PTM machinery in microbial hosts, P. pastoris in particular, offer significant promise. However, the aforementioned research developments have been achieved in lab-scale fermentation. To reap their benefits, the glycosylation profile needs to be demonstrated to be consistently homogeneous in large-scale fermentations. Overall, it is clear that the QbD initiative dictates a unified engineering and scientific approach to potentiate control over the glycomic profile of cell culture-derived protein-based drugs.