Preparing Proteoforms of Therapeutic Proteins for Top-Down Mass Spectrometry
DOI: http://dx.doi.org/10.5772/intechopen.89644

Many proteoforms derived from a single gene have very similar atomic compositions, which makes their analysis very challenging. Many overexpressed recombinant proteins are strongly associated with this problem, especially recombinant therapeutic glycoproteins from large-scale production. In contrast to small-molecule drugs, which consist of a single defined molecule, therapeutic protein preparations are heterogeneous mixtures of dozens or even hundreds of very similar species. With mass spectrometry, high-quality spectra of intact proteoforms can currently be obtained only if the complexity of the mixture of individual proteoform ions entering the gas phase at the same time is low. Thus, prior to mass spectrometric analysis, an effective separation is required to obtain fractions with a low number of individual proteoforms. This is especially true for recombinant therapeutic proteins because of their huge heterogeneity, but it is also relevant for top-down proteomics in general. Purification of proteoforms is the bottleneck in analyzing intact proteoforms with mass spectrometry. This review focuses on the current state of the art, especially of liquid chromatography, for preparing proteoforms for top-down mass spectrometric analysis. Therapeutic proteins were chosen as the topic because this group of proteins is the most challenging with regard to proteoform analysis.


Introduction
The analysis of proteoforms, often also termed protein species or isoforms, is the next level in proteomics. The first comprehensive definition of this subgroup of proteins was published by Jungblut et al. [1] and Schlüter et al. [2], using the term "protein species". In 2013, Smith and Kelleher [3] introduced the term "proteoform", which today is widely accepted in the proteomics community. The concept of "proteoform" is nearly identical with the concept of "protein species". The only difference is that the proteoform concept is gene-centric, whereas the protein species concept is chemistry-centric.
The group of therapeutic proteins is a suitable training ground for developing methods for the comprehensive analysis of proteoforms. Therapeutic proteins are known to comprise a large number of proteoforms. Although a therapeutic protein product contains only trace amounts of impurities such as host cell proteins, which are difficult to detect because of their very low concentration, the analysis of the proteoforms of the therapeutic protein itself is very challenging because of their large number, their similarity and their low concentration compared to the main proteoform.

Analysis of proteoforms: challenges
The most common method in proteomics is the bottom-up or shotgun approach. It relies on the proteolytic cleavage of proteins by proteases such as trypsin. The resulting peptide mixture is subjected to liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS). Proteins are identified from the LC-MS/MS data by comparing the peptide fragment spectra against in-silico fragment spectra generated from a protein database [4]. As a rule of thumb, a protein is considered identified if at least two unique peptides representing parts of its sequence are identified. Thus, a sequence coverage of 100% is often not obtained. In this case, it can only be stated that one or several products (proteoforms) of a defined gene have been identified. No information about the identity of the underlying proteoform is obtained. It can even be assumed that the identified tryptic peptides are products of several different proteoforms. For the characterization of a therapeutic protein, bottom-up proteomics is a standard method. The signals in the LC-MS chromatograms represent tryptic peptides of all proteoforms of the therapeutic protein. A defined tryptic peptide that is present in all proteoforms will give one single monoisotopic signal, whose intensity represents the sum of this peptide from the different species. The presence of an individual proteoform can only be detected if this proteoform yields a tryptic peptide, for example a defined phosphopeptide, that is unique to this proteoform. However, it cannot be excluded that several proteoforms contain that peptide. As a result, bottom-up proteomics is helpful for obtaining LC-MS chromatograms that can be used as fingerprints of a therapeutic protein, but it gives no information about the number and composition of the proteoforms within the therapeutic protein product.
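The digestion and coverage reasoning above can be illustrated with a minimal Python sketch. It assumes a simplified trypsin rule (cleavage after K or R unless the next residue is P, no missed cleavages); the function names and the example sequence are hypothetical and are not taken from any of the cited software tools.

```python
import re


def tryptic_digest(sequence: str) -> list[str]:
    """Split a protein sequence after K or R, except when followed by P.

    Simplified rule: no missed cleavages, no modifications.
    """
    # Zero-width split point: preceded by K/R and not followed by P.
    return [pep for pep in re.split(r"(?<=[KR])(?!P)", sequence) if pep]


def sequence_coverage(sequence: str, identified_peptides: list[str]) -> float:
    """Fraction of residues covered by at least one identified peptide."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    covered = [False] * len(sequence)
    for peptide in identified_peptides:
        start = sequence.find(peptide)
        while start != -1:
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = sequence.find(peptide, start + 1)
    return sum(covered) / len(sequence)
```

For the hypothetical sequence MKTAYIAKQRQISFVK, identifying only the peptides TAYIAK and QISFVK gives a coverage of 75%. A modification located on the uncovered residues would go unnoticed, which is why two peptides can identify a gene product but not a specific proteoform.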
The detection of a low-abundance proteoform is especially difficult, since a unique tryptic peptide of such a proteoform is present in a low amount, so that its signal in a bottom-up LC-MS chromatogram has a low intensity. Thus, if the detection of different proteoforms is of interest, top-down mass spectrometry (TDMS) is the method of choice because it analyzes the intact proteoform instead of proteolytic peptides.
For a TDMS analysis, a purified individual intact proteoform is transferred into the mass spectrometer. From the MS spectrum of the intact ions, the molecular weight can be determined. Various techniques are available for fragmentation of the intact proteoform, such as HCD, CID, ETD, EThcD, ECD, UVPD and IRMPD, yielding different types of fragments that complement each other [5]. After fragmentation, the proteoform can be identified by interpreting the fragment spectrum. Several software tools are available for analyzing TDMS data of intact proteoforms [6][7][8]. The review of Schaffer et al. is recommended as an introduction to TDMS [9]. Robust protocols for mass analysis of intact proteins with TDMS were recently published by Donnelly et al. [10]. TDMS requires sample mixtures of low complexity to obtain high-quality spectra of proteoforms. Aebersold et al. estimated the number of proteoforms present in the human organism to be in the range of approximately a billion [11]. Thus, very efficient purification steps prior to TDMS are required to tackle the huge number of individual proteoforms in cells, tissues or body fluids. Besides the excessive number of individual proteoforms, their dynamic range is a further challenge.
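How the molecular weight follows from a spectrum of intact ions can be sketched as follows. With electrospray ionization in positive mode, a proteoform of neutral mass M typically appears as a series of ions carrying z protons at m/z = (M + z·m_p)/z. The code below is a minimal sketch that assumes two neighbouring peaks of the same proteoform; the function names and the 30 kDa example mass are hypothetical.

```python
PROTON_MASS = 1.007276  # Da


def mz_of_ion(neutral_mass: float, charge: int) -> float:
    """m/z of a proteoform carrying `charge` protons."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def charge_from_adjacent_peaks(mz_high: float, mz_low: float) -> int:
    """Charge z of the peak at mz_high, given that mz_low carries z + 1."""
    return round((mz_low - PROTON_MASS) / (mz_high - mz_low))


def neutral_mass(mz_value: float, charge: int) -> float:
    """Neutral mass from one peak and its assigned charge."""
    return charge * (mz_value - PROTON_MASS)
```

For a hypothetical 30 kDa proteoform, the peaks with 20 and 21 charges lie roughly 71 m/z units apart. If several similar proteoforms enter the instrument together, their charge-state series overlap, which is one reason why low sample complexity matters for TDMS.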

Analysis of proteoforms of recombinant therapeutic proteins: challenges
Similar challenges are associated with recombinant therapeutic proteins. The importance of therapeutic proteins has been increasing continually over the past years [12,13]. Currently, several types of therapeutic proteins [14] are available on the market, including monoclonal antibodies (mAbs), erythropoietin (EPO), insulin, human growth hormone and many more. The therapeutic protein market is dominated by monoclonal antibodies, with sales of approximately $123 billion in 2017, and is expected to grow further with the upcoming biosimilar market [13]. Therapeutic proteins possess several advantages over small-molecule drugs owing to their higher specificity towards drug targets, which in most cases are also proteins [15]. This enables therapeutic proteins to target specific key steps in disease pathology [16].
This group of man-made proteins presumably has a significantly higher number of proteoforms per gene than is found in vivo, resulting in a huge number of proteoforms within a single recombinant therapeutic protein (rTP) product. The heterogeneity develops during the production of an rTP, mainly in upstream processing. The first event increasing the heterogeneity is alternative splicing [17][18][19]. The second critical step is protein biosynthesis at the ribosomes, in which errors can occur. Proteolytic cleavage may happen at any stage after the protein has left the ribosome, not only within the host cell but also extracellularly, if host cell proteases have not been removed during purification of the target protein.
Many therapeutic proteins, such as conventional monoclonal antibodies or erythropoietin [20], are posttranslationally modified by glycans. The glycan chains in particular are an additional factor multiplying the heterogeneity of proteoforms. An example of a therapeutic glycoprotein is etanercept, which is decorated with O- and N-glycans. Commercial preparations of etanercept used as drugs show a very high degree of complexity [21]. It can be assumed that therapeutic fusion proteins administered to patients, such as etanercept, contain even hundreds of species that differ in their exact composition of atoms. In addition to glycosylation, all other forms of posttranslational modification are possible, depending on the nature of the protein, the type of host cell and the upstream parameters.
Why is the heterogeneity of recombinant therapeutic proteins much higher than the heterogeneity of gene products in vivo? Host cells used for the production of recombinant therapeutic proteins are optimized to synthesize a large excess of recombinant protein [22]. However, increased expression does not usually correlate with an increase in the correctly processed, bioactive form of the recombinant protein [22]. Consequently, the probability increases that these overexpressed recombinant proteins are subject to errors during synthesis, side reactions of enzymes and spontaneous chemical reactions. As a result, the number of low-quality recombinant species is much higher than in a native cell in an intact organism [23]. It was reported that overexpression of recombinant therapeutic proteins is also accompanied by an increase in high molecular weight aggregates and misfolded forms [24]. Thus, it can be assumed that the cellular systems that usually remove low-quality or incorrectly processed proteins are swamped by these inadequate proteins [25], so that these species are neither further processed in the cell nor eliminated. Besides the enzymatic reactions, which mainly take place during upstream processing, chemical reactions that modify the recombinant therapeutic proteins can occur during the whole production process, including even the final fill and finish of the product or its storage [26,27]. A very common reaction is the oxidation of methionine, which can happen at nearly every stage of production and can affect the efficacy of the product.
Is any risk associated with the large number of species? Fortunately, severe side effects associated with species that are not exactly identical with the target protein have been reported very rarely. An unfortunate case with dramatic consequences for a few patients was reported by Seidl et al. [28]. In this case, tungsten ions, a contaminant that got into the glass vials during their production, induced the dimerization of erythropoietin. As a result, a few patients developed autoantibodies against erythropoietin, thereby destroying the remaining cells in these patients that were producing the native hormone. Since therapy with erythropoietin was no longer possible, these patients depended on blood transfusions for survival. Non-human glycan structures bound to therapeutic proteins, which can occur when the proteins are produced in mouse cells, can induce hypersensitivity reactions [29,30].
More common than severe side effects is the phenomenon that species showing even small differences in their composition of atoms compared with the target species are less potent than the target species. For example, deamidation, which causes a +1 Da shift of the molecular weight, can decrease the efficacy of a therapeutic protein [31], as observed with recombinant human interleukin (rhIL)-15 [32]. Deamidation converts asparagine or glutamine to aspartic acid or glutamic acid, respectively. As a result, the polar, uncharged amides are changed into negatively charged carboxylic acids, which affects the surface-charge density and surface hydrophobicity of the protein and thereby explains the change in efficacy. Deamidation of asparagine can occur spontaneously at the physiological pH of 7.4 [32]. A further important modification of proteins is the disulfide bond (S-S), a covalent bond formed by the oxidation of the thiol groups (SH) of two cysteine residues [33], which decreases the molecular weight of a protein by 2 Da. Disulfide bonds have an impact on protein stability as well as on activity [33]. Du et al. stated that, during the manufacturing process, extensive reduction of antibodies has been observed after the harvest operation or Protein A affinity chromatography, and that multiple process parameters correlate with the extent of the reduction [34]. The topic of disulfide bonds of therapeutic proteins is discussed in depth by Lakbub et al. [35].
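To give a feeling for how small these mass differences are relative to an intact protein, the following minimal Python sketch adds the monoisotopic mass shifts of deamidation, disulfide bond formation and methionine oxidation (the latter mentioned earlier in this section) to an unmodified mass. The dictionary keys, function names and the 30 kDa example mass are hypothetical; the shift values are the commonly tabulated monoisotopic deltas.

```python
# Monoisotopic mass shifts in Da.
MASS_SHIFTS_DA = {
    "deamidation": 0.984016,       # -NH2 replaced by -OH
    "disulfide_bond": -2.015650,   # loss of two hydrogen atoms
    "met_oxidation": 15.994915,    # gain of one oxygen atom
}


def modified_mass(unmodified_mass: float, modifications: dict[str, int]) -> float:
    """Mass after applying the given number of each modification."""
    shift = sum(MASS_SHIFTS_DA[name] * count for name, count in modifications.items())
    return unmodified_mass + shift


def ppm_difference(mass_a: float, mass_b: float) -> float:
    """Relative difference of mass_a with respect to mass_b in ppm."""
    return (mass_a - mass_b) / mass_b * 1e6
```

For a hypothetical 30 kDa proteoform, a single deamidation corresponds to a relative mass difference of only about 33 ppm, so the two species are hard to tell apart at the intact level unless they have been separated beforehand or are measured with high resolving power.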
More details about sources and effects of microheterogeneity are described in the excellent reviews of Beyer [36] and Ambrogelly [37].
How large are the differences between the individual proteoforms of a therapeutic protein? Proteoforms can vary in all known chemical properties, such as size, isoelectric point (pI) [38] and hydrophobicity [39]. The pIs of recombinant erythropoietin vary from 3.5 to 6 [38,40]. Therapeutic proteins are characterized by the presence of size variants arising from the manufacturing process or from storage conditions involving chemical, physical or conformational stress [41]. These size variants may include N-terminally clipped proteins, truncated forms, fragments of lower molecular weight and improperly assembled therapeutic proteins. The formation of dimers or multimers, in which more than two monomers form a complex, is a problem associated with many therapeutic proteins [42]. Such aggregates can induce adverse immune responses in patients [43]. The proteoforms of recombinant erythropoietin vary within a range of 4-6 kDa [20]. Besides these larger differences in size, the composition of atoms of many proteoforms derived from one single gene can be very similar within subtypes of proteoforms, such as the family of acidic proteoforms. As a result, the separation of charge variants by ion exchange is usually successful, but a single fraction may contain not just one but multiple proteoforms [44].

Separation of proteoforms of therapeutic proteins with liquid chromatography
Liquid chromatography (LC) is the most common method for the purification and fractionation of therapeutic proteins [37]. The proteoforms are separated either by size-exclusion chromatography (SEC), which makes use of the different path lengths through chromatographic particles in relation to the size of the proteins, or by adsorption chromatography. The latter separates molecules by their different velocities while passing through a column filled with chromatographic particles. The velocities depend on the affinities of the molecules towards the stationary phase: the higher the affinity, the more slowly a molecule migrates. Depending on the chemistry of the functional groups of the stationary phase, different forms of adsorption-based liquid chromatography are possible; these are highlighted in bold in Table 1. Table 1 gives an overview of the different types of separation methods and their frequency of application, with a focus on therapeutic proteins and, in addition, on proteoforms. Comparing the numbers in column 2 with those in column 3 clearly shows that the topic of proteoforms has not yet been addressed very often. The selected reviews give deeper insights into the different separation methods.
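The link between affinity and migration velocity in adsorption chromatography can be expressed with the retention factor k = (t_R - t_0)/t_0, where t_R is the retention time of the analyte and t_0 the hold-up time of an unretained compound. The average velocity of the analyte band is then u_0/(1 + k), with u_0 the mobile phase velocity. The following minimal Python sketch uses these textbook relations; the function names and the numbers in the note below are hypothetical.

```python
def retention_factor(retention_time: float, holdup_time: float) -> float:
    """k = (t_R - t_0) / t_0; grows with the affinity to the stationary phase."""
    return (retention_time - holdup_time) / holdup_time


def band_velocity(mobile_phase_velocity: float, k: float) -> float:
    """Average migration velocity of an analyte band with retention factor k."""
    return mobile_phase_velocity / (1.0 + k)
```

With a hypothetical hold-up time of 2 min, a proteoform eluting at 6 min has k = 2 and travels at one third of the mobile phase velocity. Two proteoforms are separated only if their retention factors differ under the chosen conditions.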
Affinity chromatography using chromatographic material derivatized with protein-A is the most common and effective method for the purification of recombinant monoclonal antibodies [45]. For the separation of proteoforms of recombinant monoclonal antibodies, it is not very relevant.
Ion exchange chromatography (IEX): charge variants of therapeutic proteins, such as acidic or basic species, can be separated with IEX [46]. IEX of proteins is performed as either anion exchange or cation exchange chromatography, with ionic groups on the stationary phase that carry a charge opposite to that of the protein. Elution buffers decrease the electrostatic interactions of the proteins with the IEX material, thereby decreasing the affinity of the proteins towards the stationary phase. Elution can be either pH based or salt based [47]. Salt-based elution is used for IEX with online ultraviolet (UV) detection. Coupling IEX directly with MS is only possible if the elution buffer system is volatile [48]. Acidic species are often related to PTMs such as sialylation or deamidation of asparagine, while basic variants are formed by aspartate isomerization, succinimide formation, and variants of C-terminal lysine and N-terminal glutamine [49]. IEX gives relative quantitative information about charge variants, which can be important for the qualification of manufacturing batches [50].
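Why deamidation produces an acidic charge variant that IEX can resolve may be illustrated with a rough net-charge estimate based on the Henderson-Hasselbalch equation. The sketch below is a strong simplification: the pKa values are generic textbook approximations chosen for illustration (an assumption, not values from the cited work), cysteine and tyrosine side chains, glycans and the three-dimensional structure are ignored, and the example peptide is hypothetical.

```python
# Approximate pKa values (illustrative assumptions only).
POSITIVE_PKA = {"K": 10.5, "R": 12.5, "H": 6.0, "N_TERM": 9.0}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C_TERM": 3.0}


def net_charge(sequence: str, ph: float) -> float:
    """Estimated net charge of an unmodified linear sequence at a given pH."""
    charge = 0.0
    for group, pka in POSITIVE_PKA.items():
        count = 1 if group == "N_TERM" else sequence.count(group)
        charge += count / (1.0 + 10 ** (ph - pka))   # protonated fraction
    for group, pka in NEGATIVE_PKA.items():
        count = 1 if group == "C_TERM" else sequence.count(group)
        charge -= count / (1.0 + 10 ** (pka - ph))   # deprotonated fraction
    return charge
```

Replacing one asparagine by aspartic acid in the hypothetical peptide GKNDEHR lowers the estimated net charge at pH 7 by almost exactly one unit. In this simplified picture, that single additional negative charge is what changes the retention of the deamidated variant on an ion exchanger.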
Hydroxyapatite chromatography (HAP) is based on a material consisting of crystals of calcium hydroxyapatite, described by the formula Ca₅(PO₄)₃(OH). HAP can be described as mixed-mode chromatography. The Ca²⁺ ions can act as an anion exchanger via electrostatic interactions. In addition, metal coordination bonds can form between carboxylic groups and the Ca²⁺ ions. The anionic phosphate groups of HAP adsorb positively charged molecules by electrostatic interactions. Phosphate, chloride-ion and calcium-ion gradients are common, as are multi-component gradients [39]. Therefore, finding appropriate eluents is more difficult than with anion-exchange chromatography. However, systematic screening of the parameters of eluent systems should offer the chance to separate proteoforms. As indicated in Table 1, HAP is not applied very often for the chromatography of therapeutic proteins, which may be due to the fact that finding optimal elution systems is more complex.
Hydrophilic interaction chromatography (HILIC) makes use of the high affinity of polar and hydrophilic molecules for a hydrophilic stationary phase [51,52]. The sample application buffer usually has a high content (>80%) of an organic solvent such as acetonitrile. HILIC therefore works well for glycans. Proteins, however, may precipitate under these conditions. If the proteoforms of interest do not precipitate, HILIC is an interesting alternative to other forms of adsorption chromatography, especially if precipitating proteoforms are thereby removed from the proteoforms of interest.
Hydrophobic interaction chromatography (HIC) is another method that can be used for separating different proteoforms of a therapeutic protein. These separations rely on differences in hydrophobicity profiles due to changes in the conformation of the protein. HIC separations use reverse salt gradients and can operate in nondenaturing mode [53]. HIC was presented as a reliable method for monitoring the oxidation of tryptophan residues in the complementarity-determining region (CDR) of recombinant mAbs [54]. HIC is effective in resolving the proteoforms of antibody-drug conjugates that vary in their drug-to-antibody ratio [55]. Charge variants coeluting in IEX can be resolved with HIC in a second dimension of separation. Douglas and colleagues demonstrated the separation with HIC of carboxy-terminal variants and isomerization variants that could not be resolved at the IEX level [56]. Quantitative information on succinimide variants was obtained by HIC with a TSKgel butyl-NPR column [57]. Similar applications can also be found in the detection of impaired disulfide bonding. Typical HIC buffers containing salts such as ammonium sulfate require desalting of the proteins prior to MS [58]. Recently, direct coupling of HIC with MS for the detailed characterization of mAbs was demonstrated by applying a volatile ammonium acetate buffer [53].
Immobilized metal-affinity chromatography (IMAC) is widely used for enriching recombinant proteins with histidine tags from protein extracts of host cells. For the production of therapeutic proteins, IMAC is not used very often because metal ions bleed into the product, and metal ions such as nickel or copper are critical for patients. For the separation of subgroups of proteoforms for analytical purposes, IMAC is also an option.
Mixed-mode chromatography (MM) is performed with stationary phases that carry at least two different functional groups [59], like hydroxyapatite (see above). Consequently, an MM material offers two or more types of chromatography. HAP combines anion exchange (AEX), cation exchange (CEX) and IMAC. Mixed-mode chromatography is also possible with SEC, as described by Schlüter et al. [60]. In that study, the electrostatic interactions induced by anionic sugars, which are part of a dextran polymer, were used to separate vanillylmandelic acid, glycine and phenylalanine from each other on a SEC column that is usually applied for the separation of proteins in the range of 10-100 kDa. Mixed-mode chromatography is not described very often for the chromatography of therapeutic proteins (Table 1), but it has a huge potential for the separation of proteoforms. For successful separations, a rational screening of appropriate parameters is recommended.
Size exclusion chromatography (SEC) is a gold standard for monitoring the presence of aggregates of therapeutic proteins. SEC uses a porous stationary phase in which size variants are separated based on their differential access to the pores of the SEC material, resulting in different path lengths in relation to size [61,62]. SEC effectively separates low molecular weight and high molecular weight species of mAbs [63]. SEC has found many applications, such as stability testing [64], quality control during manufacturing [65], in-depth characterization of antibody-drug conjugates (ADCs) [66] and assessment of aggregate content in biosimilarity studies [67]. However, the resolution of SEC is rather poor and often insufficient to clearly distinguish individual size variants. Non-specific adsorption to the SEC material can result in peak broadening, thereby decreasing resolution. This problem may be minimized by using organic modifiers in the mobile phase or by adjusting the pH in relation to the pI of the therapeutic protein [61]. Advances in the chemistries of stationary phases, incorporating very small core-shell particles or sub-micron particles, are improving the resolution of SEC columns [61].
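The size dependence of SEC elution is commonly described with the partition coefficient K_av = (V_e - V_0)/(V_t - V_0), where V_e is the elution volume of the analyte, V_0 the void volume and V_t the total column volume. Within the selective range of a column, log10 of the molecular weight is often approximately linear in K_av, which allows a rough size estimate from a calibration with standards. The following minimal Python sketch assumes such a linear calibration; the function names and all volumes and calibration points used with it are hypothetical.

```python
def k_av(elution_volume: float, void_volume: float, total_volume: float) -> float:
    """Partition coefficient: 0 = fully excluded, 1 = fully included."""
    return (elution_volume - void_volume) / (total_volume - void_volume)


def fit_calibration(k_values: list[float], log10_mw_values: list[float]) -> tuple[float, float]:
    """Least-squares line log10(MW) = slope * K_av + intercept."""
    n = len(k_values)
    mean_k = sum(k_values) / n
    mean_y = sum(log10_mw_values) / n
    sxx = sum((k - mean_k) ** 2 for k in k_values)
    sxy = sum((k - mean_k) * (y - mean_y) for k, y in zip(k_values, log10_mw_values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_k


def estimate_mw(k: float, slope: float, intercept: float) -> float:
    """Molecular weight estimated from K_av with a fitted calibration line."""
    return 10 ** (slope * k + intercept)
```

Because the calibration is logarithmic, a monomer and its dimer are usually well separated, whereas proteoforms differing by a few kDa or less elute at nearly the same K_av. This is consistent with the limited resolution of SEC for individual size variants.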
Reversed phase liquid chromatography (RPLC) mainly exploits differences in the hydrophobic properties of molecules for their separation. Samples are applied onto RPLC columns in eluents with a high water content, which supports a high affinity of the sample molecules towards the hydrophobic stationary phase. Elution is achieved with gradients of increasing concentration of organic solvent in the eluent. Coupling RPLC with high-sensitivity detectors can provide qualitative and quantitative information on cleaved and modified proteoforms along with the main form [68]. Ambrogelly et al. reported RPLC as a method giving a first-hand check of product quality that helps in optimizing the purification strategy [69]. When coupled to high-resolution mass spectrometric detection, RPLC also allows the major glycoforms to be distinguished. More than a decade ago, Dillion presented RPLC for determining the glycosylation profile of intact mAbs, but only with the use of high temperature and organic solvents with high eluotropic strength coefficients [70]. Many advancements to conventional RPLC columns have emerged recently that improve the separation of large therapeutic proteins under milder conditions [71].
The major concern in the use of RPLC for protein separations is the presence of organic solvents, which may precipitate proteins. Since the precipitation occurs on the column, it is very difficult to recognize. In the case of proteoforms, it can be assumed that some are more prone to precipitation than others. As a result, a chromatogram in which signals from some but not all proteoforms are present may be misinterpreted, since it gives no information about the proteoforms lost by precipitation. TDMS protocols often apply RPLC for the analysis of proteoforms because the species that elute are present in a liquid that is optimal for electrospray ionization (ESI). Because of the problem of protein precipitation in RPLC, the question in all TDMS approaches is how representative the TDMS chromatogram is of the original composition of proteoforms or, vice versa, how many proteoforms were lost during RPLC.
Elution modes of liquid chromatography: besides the different types of stationary phases, different elution modes exist that have an impact on the separation of molecules, namely isocratic elution, gradient elution (GE) and displacement elution (DE). DE typically uses the same sample application buffer and adsorption chromatography materials as GE. In contrast to GE, DE does not use a gradient with an increasing concentration of a salt that has a low affinity towards the stationary phase; instead, the elution buffer of DE consists of the sample application buffer to which a displacer has been added. The displacer should ideally have a higher affinity to the stationary phase than any of the sample components. As soon as sample application onto the column is finished, the eluent containing the displacer is pumped onto the column. At the beginning, the displacer molecules bind strongly at the top of the column, thereby displacing the sample component with the highest affinity. This component then displaces the sample components with lower affinity, and so on. By this process, bands are formed that move down the column, driven by the displacer. DE is finished as soon as the displacer has completely saturated the stationary phase of the column. Within a band, a high purity of the component is achieved [72]. DE has been shown to be suitable for the separation of complex mixtures of tryptic peptides [73][74][75] and proteins [76][77][78][79]. One characteristic of DE is that its selectivity differs from that of GE [77]. This is an important argument for using DE for the separation of proteoforms. Thus, it is not surprising that DE has been applied successfully to the separation of proteoforms of therapeutic proteins [46][80][81][82][83][84].
Rational screening of the parameters of liquid chromatography is recommended for optimal separation of proteoforms. The first method describing multi-parallel high-throughput screening for parameters of liquid chromatography was published in 2002 by the group of Cramer [85]. In this case, the authors screened for displacers for ion-exchange systems. In the following year, the group reported a multi-parallel high-throughput screening for displacers based on batch chromatography [86]. Thiemann et al. published a similar approach termed protein-purification parameter screening system (PPS), which did not focus on the identification of appropriate displacers but, more generally, on any kind of parameter for adsorption chromatography, independent of the elution mode [87]. The PPS was successfully applied for the purification and identification of an angiotensin-II generating enzyme [88] and for screening for parameters for optimal displacement chromatography of proteins [78,79]. Rational screening was also used for developing a displacement chromatography of proteoforms of a recombinant protein with HIC [89].
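The idea of a multi-parallel parameter screen can be sketched as a full-factorial grid of conditions that are tested side by side in batch mode. The following minimal Python sketch only enumerates the conditions and maps them to the wells of a plate. The parameter names, their values and the 96-well layout are assumptions for illustration and are not taken from the cited screening systems.

```python
from itertools import product


def build_screening_grid(parameters: dict[str, list]) -> list[dict]:
    """All combinations of the given parameter values (full-factorial design)."""
    names = list(parameters)
    return [dict(zip(names, combo)) for combo in product(*(parameters[n] for n in names))]


def assign_wells(conditions: list[dict], rows: str = "ABCDEFGH", columns: int = 12) -> dict[str, dict]:
    """Map each condition to one well, filling the plate row by row."""
    wells = [f"{row}{col}" for row in rows for col in range(1, columns + 1)]
    if len(conditions) > len(wells):
        raise ValueError("more conditions than wells on one plate")
    return dict(zip(wells, conditions))
```

A hypothetical screen of two stationary phases, three pH values and two salt concentrations yields 12 conditions, which occupy wells A1 to A12. The number of combinations grows multiplicatively with every added parameter, which is why such screens are run in parallel rather than column by column.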

Separation of proteoforms of therapeutic proteins with capillary electrophoresis
Compared with liquid chromatography, capillary electrophoresis (CE) offers better resolving power. CE techniques such as capillary zone electrophoresis (CZE), capillary gel electrophoresis (CGE) and capillary isoelectric focusing (CIEF) have been adapted for the separation and characterization of proteins [90,91]. These are basic techniques routinely used for quality control [91]. With CGE, proteins are characterized according to their size, while in CIEF, proteins are separated according to their isoelectric point (pI). CIEF uses pH gradients formed by carrier ampholytes in a capillary [92]. It is important to note that pH plays a major role in CZE and should be well controlled [93]. Protein adsorption must be taken into account when performing CIEF and CZE. With conventional bare fused-silica capillaries, the interaction of the analytes with the capillary surface may compromise resolution, peak widths and peak shapes. Adsorption can be minimized by using better coating materials or reagents that reduce adsorption [94]. A penetrated surface layer protein A from bacteria was reported as a capillary coating; the coating could be used for over 100 injections without loss of separation performance [95]. Another study reported that adsorption still occurred when an LPA-coated capillary was used [96].
CZE and CIEF are more often used for separations of charge variants induced by C-terminal lysine truncation, N-terminal pyroglutamate formation, sialylation and deamidation [97].
The direct coupling of CE with MS is technically challenging with regard to the CE-MS interface [98]. One study demonstrated the successful direct coupling of CIEF with mass spectrometry for the characterization of trastuzumab, bevacizumab, cetuximab and infliximab by optimizing the reagents and the liquid composition and by adding glycerol to the sample mixture to reduce non-CIEF electrophoretic mobility and band broadening [99]. A CZE method was developed for the analysis of intact recombinant human interferon-β1 (rhIFN-β1). The charged species due to deamidation and sialylation were sufficiently separated. Instead of using dynamic polymeric coatings such as polybrene or hydroxypropyl-methylcellulose, the authors covalently coated the bare fused-silica capillary with cross-linked polyethyleneimine (CPEI) to obtain a positively charged surface, thus reducing the possibility of protein interaction with the coating. They then coupled this CZE to ESI-MS/MS and identified 138 proteoforms, of which 55 were quantified.
For the in-depth characterization of the composition of proteoforms of a therapeutic protein CE online-coupled to MS is a good option, if prior to the CE, the mixture of proteoforms has already been fractionated by LC using separation mechanisms orthogonal to the CE separation mechanism.

Conclusion
Huge progress has been made in the field of TDMS, allowing the identification and comprehensive analysis of the composition of atoms of proteoforms, especially if they are smaller than 30 kDa. TDMS analysis of larger proteoforms is still more challenging. However, the most critical point to date is the purification of a proteoform to near homogeneity, or at least a significant reduction of the complexity of the sample that is desorbed and ionized into a tandem mass spectrometer for TDMS. A low complexity of the protein mixture entering the MS is still mandatory for obtaining high-quality spectra. Thus, efficient separation methods are needed to obtain fractions of low complexity. Therapeutic proteins are well suited, although challenging because of their heterogeneity, for developing strategies for separating proteoforms. In-depth separation of the proteoforms of a therapeutic protein requires the combination of fractionation techniques based on orthogonal mechanisms. In addition, the combination of gradient chromatography and displacement chromatography will add further opportunities for successful separations.

Conflict of interest
The authors declare no conflict of interest.