Open access peer-reviewed chapter

Proteoforms: General Concepts and Methodological Process for Identification

By Jucélia da Silva Araújo and Olga Lima Tavares Machado

Submitted: May 23rd 2019; Reviewed: September 25th 2019; Published: December 23rd 2019

DOI: 10.5772/intechopen.89914


The term proteoform denotes all the molecular forms in which the protein product of a single gene can be found. This chapter discusses the most frequent processes that lead to transcript modification and the biological implications of the resulting changes in the final protein product, covering proteoforms that arise from genetic variations, alternatively spliced RNA transcripts and post-translational modifications. It traces the evolution of the techniques used to identify proteoforms and explains why this identification matters for understanding biological processes. It also highlights the fundamental concepts of top-down mass spectrometry (TDMS) and provides numerous examples of how knowledge obtained from proteoform identification is used. The identification of mutant proteins is one of the emerging areas of proteogenomics; it has the potential to reveal novel disease biomarkers and may point to useful therapeutic targets.


  • post-translational modifications
  • top-down
  • mass spectrometry
  • proteomic experiments
  • clinical application of proteoform

1. Introduction

A surprise from the Human Genome Project was the identification of only about 23,000 genes, far fewer than the estimated 100,000. Several events nevertheless create distinct proteins that take part in biological processes ranging from cell signaling to genetic regulation. Through allelic variation, alternative splicing and other pre-translational mechanisms, as well as post-translational modifications (PTMs) and conformational dynamics, a single gene may generate specific molecular forms of a protein, named “proteoforms,” with different structures and different functions. Proteoforms, or protein species as previously defined [1], can be identified by proteomics experiments, which include quantification of protein abundance, investigation of changes in protein expression, characterization of PTMs, identification of protein-protein interactions, and measurement of isoform expression, turnover rate and subcellular localization [2]. Frequent modifications that produce proteoforms are presented in Figure 1.

Figure 1.

Types of proteoforms: RNA splicing and mutations.

2. Proteomic experiments

The advance of genomics enabled the sequencing of the genes of an organism, but a genome sequence does not reveal which proteins are present or how they are modified in specific situations. Proteomic analyses combine multidimensional separation, including chromatographic and gel electrophoresis techniques, with the ability of mass spectrometry to identify and precisely quantify proteins. Many different technologies have been, and are still being, developed to extract the information contained in proteoforms. High-precision mass spectrometric measurements such as tandem mass spectrometry (MS/MS or MS2) can provide structural information on molecular ions that are isolated and fragmented [3, 4]. Mass spectrometry-based proteomics can be carried out with a bottom-up or a top-down approach.

2.1 Bottom-up proteomics

Bottom-up proteomics, also termed “shotgun proteomics” when it is performed on a mixture of proteins, is the traditional approach. Proteins, in a simple or complex mixture, are digested chemically or enzymatically to generate peptides that are analyzed by MS2. The approach is generally applied to identify and characterize many peptides in a mixture and to deduce from them the identity of the proteins present in the sample [5, 6]. In the bottom-up strategy, the peptide mixture resulting from digestion is fractionated by multidimensional liquid chromatography: peptides are first prefractionated according to their net charge by strong cation exchange chromatography and then separated according to their hydrophobicity by reversed phase liquid chromatography (RP-LC) coupled online with a mass spectrometer [7]. The peptides fragmented within the mass spectrometer yield product-ion mass spectra, which are compared with MS/MS spectra generated in silico for peptides of the same mass encoded in a protein database. Proteins present in the sample are then inferred from the identified peptides [8].

This approach has several disadvantages. Protein inference can be complicated because proteins often contain homologous sequence regions, so that some peptides cannot be uniquely assigned to a single protein; the same peptide might have originated from multiple protein isoforms and/or from distinct functional pools of the same protein [2, 9]. Digestion can also lose sequence variations and other information about the original amino acid sequence, and it breaks the relationship between the amino acid sequence and the PTMs belonging to a specific proteoform; bottom-up analysis is therefore not capable of identifying proteoforms [6, 10]. Introducing the intact protein into the mass spectrometer, the strategy used by top-down mass spectrometry, eliminates these problems.
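The inference problem can be illustrated with a small sketch. The two sequences below are hypothetical isoforms invented for this example (they differ by a single N to D substitution), and the cleavage rule is a simplified model of trypsin (cleave after K or R unless the next residue is P), not the digestion protocol of any cited study.

```python
def tryptic_digest(sequence, min_length=5):
    """Simplified in silico trypsin digestion.

    Cleaves after K or R unless the next residue is P, and drops peptides
    shorter than min_length (an assumed detectability cut-off).
    """
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return [p for p in peptides if len(p) >= min_length]


# Hypothetical isoforms (illustrative only): one residue differs (N -> D).
isoform_a = "MKTAYIAKQLEDGSRVFNEWLKGHTPSAR"
isoform_b = "MKTAYIAKQLEDGSRVFDEWLKGHTPSAR"

peptides_a = set(tryptic_digest(isoform_a))
peptides_b = set(tryptic_digest(isoform_b))

shared = peptides_a & peptides_b      # cannot be assigned to one isoform
unique_a = peptides_a - peptides_b    # evidence specific to isoform A
unique_b = peptides_b - peptides_a    # evidence specific to isoform B

print("shared:", sorted(shared))
print("unique to A:", sorted(unique_a))
print("unique to B:", sorted(unique_b))
```

In this toy case only the peptide carrying the substitution distinguishes the two isoforms; the three shared peptides cannot be assigned to either one, which is the ambiguity the paragraph describes.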

2.2 Top-down proteomics

Unlike bottom-up proteomics, the “top-down” approach involves direct separation and MS analysis of intact proteins, without prior proteolytic digestion. With this method proteoforms can be characterized because the relationship between the amino acid sequence and the PTMs is preserved, and such characterization provides a proteoform-specific understanding of biological phenomena [10]. In top-down proteomics, a specific proteoform of interest can be directly isolated and subsequently fragmented in the mass spectrometer by MS/MS strategies, both to map amino acid variations and to obtain information on protein masses [11]. For successful identification, the masses must account for the complete amino acid sequence, including all post-translational modifications and structural features [12, 13].

In this type of analysis, proteoforms are identified using precursor mass and fragmentation data. A precursor mass spectrum (MS1) of intact proteins is recorded, the most intense peaks are selected for fragmentation, and mass spectra (MS2) of the resulting fragment ions are acquired. The masses of both the intact proteoform and its fragment ions are thus measured, and the precursor masses and their isotopic distributions provide a complex but detailed set of information. Upon fragmentation, terminal fragments reveal potential cleavage sites or truncated proteoforms, while internal fragment ions can indicate modifications and, depending on the sequence coverage achieved, their possible location. This approach can provide up to 100% sequence coverage and full characterization of proteoforms [6, 13]. Top-down mass spectrometry has become the approach of choice for analyzing single proteins or simple mixtures of significant biological interest. Complex proteomic samples must be fractionated before introduction into the mass spectrometer. Many separation strategies can be applied upstream; usually only the last step, reversed phase chromatography (RPC), is coupled directly to the mass spectrometer [6, 14].
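How a precursor mass and fragment masses together characterize a proteoform can be sketched numerically. The example below is a minimal illustration under stated assumptions: the sequence is a hypothetical short peptide standing in for an intact protein, the modification is a phosphorylation placed at an arbitrary position, and only singly charged b and y ions (typical of collisional fragmentation) are computed. It is not the scoring procedure of any software tool named in this chapter.

```python
# Monoisotopic residue masses in Da (standard values).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276
PHOSPHO = 79.96633  # mass shift of phosphorylation (HPO3)


def intact_mass(sequence, mods=None):
    """Neutral monoisotopic mass; mods maps 1-based position -> mass shift."""
    mods = mods or {}
    return sum(MONO[aa] for aa in sequence) + WATER + sum(mods.values())


def fragment_ions(sequence, mods=None):
    """Singly charged b and y ion m/z values (b1..b(n-1), y1..y(n-1))."""
    mods = mods or {}
    residues = [MONO[aa] + mods.get(pos, 0.0)
                for pos, aa in enumerate(sequence, start=1)]
    n = len(residues)
    b_ions = [sum(residues[:i]) + PROTON for i in range(1, n)]
    y_ions = [sum(residues[-i:]) + WATER + PROTON for i in range(1, n)]
    return b_ions, y_ions


sequence = "PEPTSIDEK"        # hypothetical sequence, for illustration only
phospho_site = {5: PHOSPHO}   # assumed phosphorylation on the serine

delta = intact_mass(sequence, phospho_site) - intact_mass(sequence)
print(f"intact mass shift: {delta:.5f} Da")

b_plain, y_plain = fragment_ions(sequence)
b_mod, y_mod = fragment_ions(sequence, phospho_site)
for i, (plain, mod) in enumerate(zip(b_plain, b_mod), start=1):
    print(f"b{i}: shift {mod - plain:.3f}")
```

The intact mass reports that a modification is present, while the first fragment in each ion series that carries the shift (b5 and y5 here) brackets its position. This is why preserving the link between sequence and PTMs in the intact proteoform matters.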

Top-down proteomics may be denaturing or native. Denaturing top-down proteomics (dTDP) is a powerful technique for characterizing individual proteins <30 kDa. In these studies, the proteins are denatured prior to their introduction into the mass spectrometer [15]. Protein interactions and quaternary conformations are disrupted by substances such as organic solvents, reducing agents and strong detergents, by non-physiological pH, and/or by physical methods such as heat and pressure [16]. dTDP is the most widespread mode, and the scoring systems used for identification and characterization of proteoforms have naturally been developed and tested using datasets derived from denatured top-down mass spectrometry experiments [17].

Native top-down proteomics (nTDP) has been used to characterize intact, biologically relevant protein complexes held together by non-covalent protein-protein and protein-ligand interactions; it provides stoichiometry and structural information because the tertiary and quaternary structures of proteins are maintained [6, 18]. The technique uses non-denaturing and non-reducing buffer conditions during the electrospray ionization process, which helps preserve the primary and quaternary compositions of proteins and their complexes for MS [19, 20]. In the native proteoform approach, analytical platforms for high-resolution, liquid-phase separation of protein complexes are required prior to native mass spectrometry (MS) and MS/MS [21]. In an analysis of the Escherichia coli proteome, 144 proteins, 672 proteoforms and 23 protein complexes were identified by coupling size-exclusion chromatography to capillary zone electrophoresis-MS/MS [21]. Other separation techniques have also been combined, such as off-line ion-exchange chromatography or gel-eluted liquid fraction entrapment electrophoresis (GELFrEE); this work demonstrated the compatibility of native GELFrEE with native and tandem mass spectrometry [21, 22]. An illustrative scheme of the elucidation of proteoform structures by the bottom-up and top-down techniques is presented in Figure 2.

Figure 2.

Comparative scheme between top-down and bottom-up proteomics, showing the best indication of top-down use for the identification of proteoforms in complex protein mixtures.

3. Separation techniques applied to proteoforms

The number of proteoform species in a proteome can be vast. Separating proteoforms is essential because many high-resolution mass spectrometers, owing to their limited charge capacity, have a finite ability to detect proteoforms. High-resolution separation of complex protein samples remains one of the significant challenges of top-down proteomics. To optimize proteome coverage, separation strategies and their multidimensional combinations are employed to reduce sample complexity [23]. These strategies exploit intrinsic physicochemical properties of proteoforms, such as mass/size, isoelectric point and hydrophobicity. Advances in instrumentation and in chromatographic and electrophoretic strategies have made it possible to separate intact proteins [20], from two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) [24, 25] to gel-eluted liquid fraction entrapment electrophoresis (GELFrEE) and capillary zone electrophoresis (CZE) [20, 24, 25, 26]. Specific columns have also been developed for classical separation methods such as hydrophobic interaction chromatography (HIC), hydrophilic interaction chromatography (HILIC), reversed phase liquid chromatography (RPLC), ion exchange chromatography (IEX) and size exclusion chromatography (SEC) [21, 22, 23, 27].

Some separations, such as chromatography and capillary electrophoresis, can be run on-line with a mass spectrometer, but many others can only be applied off-line [26]. The off-line approach, being independent of the mass spectrometer, is flexible and allows diverse separation techniques to be used, although it is more laborious because of the time needed to collect and treat the fractions. An off-line separation system consists of three steps: separation of the sample compounds in the first dimension; collection of fractions for subsequent sample treatment; and injection of each fraction into the second dimension for analysis [20]. On-line separations, coupled directly to mass spectrometry, increase throughput and substantially reduce sample handling, but impose limitations on sample loading, data acquisition and separation conditions [6, 26]. Many fractionation and separation techniques can be combined to reduce sample complexity, and an off-line approach coupled with an on-line separation may be necessary, since most proteomic samples are so complex that they need multiple separation steps combined in multidimensional separations [27].

3.1 Sample preparation for top-down proteomics

Sample preparation is one of the most critical steps in top-down proteomics. Conventional protein extraction buffers containing detergents such as sodium dodecyl sulfate (SDS) and Triton X-100 are not compatible with MS [12]. Many lysis methods use saline buffers, reducing agents, and protease and phosphatase inhibitors to extract proteoforms while avoiding alteration or degradation. After extraction, nonvolatile salts must be removed or replaced, because they suppress the MS signal by forming adducts with protein ions and increase the chemical noise [26]. Most protein solubilization strategies used in proteoform studies, despite preserving many covalent interactions, denature proteins prior to MS and destroy important non-covalent interactions such as protein-protein interactions. In general, these interactions are essential for many different cellular processes [28]. In native top-down procedures, the pH must be kept neutral, and the solutions used to isolate and fractionate proteoform complexes cannot contain denaturing agents such as strong detergents, reducing agents and organic solvents [29]. For proteins to remain in their native states, a buffer that maintains physiological ionic strength and neutral pH is generally used. To minimize noise from common buffers and the numerous other interfering components, top-down sample cleanup methods should be applied, for example protein precipitation and molecular weight cut-off ultrafiltration. The guide by Donnelly et al. is one of the best-practice protocols for MS of intact proteins from mixtures of varying complexity [30].

3.2 Two-dimensional gel electrophoresis (2DGE)

2D-PAGE is an electrophoretic technique still used to separate intact proteins; separation is based on the isoelectric point and molecular weight (MW) of the proteins. The technique was introduced by O’Farrell [31]; it separates cellular proteins under denaturing conditions and resolves hundreds of proteins. In the first dimension, proteoforms are separated according to their net electric charge (isoelectric point); in the second dimension, in the presence of sodium dodecyl sulfate (SDS), they are separated according to their molecular mass [31, 32]. The denaturing conditions introduced by O’Farrell [31] for the first dimension involve preparing the sample with a high concentration of urea (9 mol/L), a nonionic detergent (Nonidet NP-40) and a thiol reagent (2-mercaptoethanol), which gives an efficient separation of the proteins in a complex sample [33]. The tube gels and carrier ampholytes originally used to establish the pH gradient in the first dimension were later replaced by immobilized pH gradients (IPG strips). The development of IPGs [34], available in various pH ranges and sizes, was a significant advance for 2D-PAGE, allowing efficient and reproducible separation of proteins. In IPGs, the buffering groups are attached to acrylamide molecules and cast into the gel to form a fixed pH gradient, and the gel is covalently bound to a film backing. Because the buffering groups are grafted to the acrylamide gel matrix, the gradients cannot drift, and the gel slabs can be cut into narrow strips, usually 3 mm wide. With IPG strips, first-dimension separations are more reproducible and offer high throughput and high resolution. IPGs are also much easier to handle, and commercially produced IPG strips are conveniently available [33, 35, 36].
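Why proteoforms of one gene product can land at different positions in the first dimension can be sketched with a simple isoelectric point estimate. The example below is an illustration under explicit assumptions: the pKa values are approximate textbook-style values (published scales differ and give somewhat different results), the sequences are hypothetical, and the model ignores neighbouring-residue effects and PTMs.

```python
# Approximate side-chain and terminal pKa values (assumed; scales differ).
POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
N_TERM, C_TERM = 8.6, 3.6


def net_charge(sequence, ph):
    """Net charge at a given pH from the Henderson-Hasselbalch equation."""
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM))      # protonated N-terminus
    charge -= 1.0 / (1.0 + 10 ** (C_TERM - ph))     # deprotonated C-terminus
    for aa in sequence:
        if aa in POSITIVE:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE[aa]))
        elif aa in NEGATIVE:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE[aa] - ph))
    return charge


def isoelectric_point(sequence, tol=1e-4):
    """Bisection for the pH at which the net charge is zero.

    Works because net charge decreases monotonically with pH.
    """
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(sequence, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2


# Hypothetical sequence and a variant with one K -> E substitution.
reference = "MKTAYIAKQLEDGSRVFNEWLKGHTPSAR"
variant = "MKTAYIAEQLEDGSRVFNEWLKGHTPSAR"

print(f"reference pI ~ {isoelectric_point(reference):.2f}")
print(f"variant   pI ~ {isoelectric_point(variant):.2f}")
```

Replacing a basic residue with an acidic one lowers the estimated pI, so the two forms would focus at different positions on the pH gradient even though their masses are nearly identical.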

3.3 Difference gel electrophoresis (DIGE)

Conventional 2D gels were revolutionized by the introduction of difference gel electrophoresis (DIGE), which allows accurate and reproducible quantification of multiple samples from the relative intensity of fluorescently labeled protein spots within the same gel. DIGE thereby enables accurate quantification of changes in the proteome, including proteoforms [37]. The strategy was developed for the quantitative analysis of intact proteins and provides important information about changes caused by events such as truncation, degradation, genetic code variation, alternative splicing, post-translational processing and PTMs [37, 38]. The proteins in each sample are covalently tagged with fluorescent dyes of different colors, known as CyDye DIGE fluorescent dyes. Labeling is performed prior to 2D-DIGE, and minimal labeling is often used, such that <5% of proteins are labeled, which reduces interference with downstream mass spectrometric analysis [39]. 2D-DIGE uses a reference sample, known as an internal standard, which comprises equal amounts of all biological samples in the experiment [39]. The major advantages of DIGE are the high sensitivity and linearity of the dyes, its straightforward protocol, and its significant reduction of inter-gel variability, which makes it easier to identify biological variability unambiguously and reduces bias from experimental variation [39, 40].

3.4 Gel-eluted liquid fraction entrapment electrophoresis (GELFrEE)

GELFrEE is an approach developed to overcome the difficulties of gel-based protein separation. It is a robust strategy that provides high-resolution, size-based separation of proteoforms (applied to proteins of 10–250 kDa) in the liquid phase. This form of electrophoresis accommodates separation over a broad mass range; it can be performed under denaturing conditions or, as later adapted, in native mode for size separations in which the tertiary and quaternary structures of the proteins are maintained [16, 22, 41]. In GELFrEE, a gel column is used to achieve electrophoretic separation of proteins, analogous to SDS-PAGE. The proteins are loaded onto the top of a tube containing polyacrylamide gel, and a voltage is applied between the anode and cathode reservoirs; the separated proteins are then eluted into the liquid phase for manual collection, which ensures that higher-molecular-weight proteins are not continually diluted and dispersed across many fractions [6]. Detergents incompatible with MS can be removed by organic solvent precipitation before online LC-MS. Spin columns coupled to an in-line matrix removal platform enable the direct analysis of samples containing SDS, and of the salts and detergents used in native mode [6, 42, 43]. This technique has the advantage of separating proteoforms over a wide mass range in a short time and at high load, but has the disadvantage of a loss of resolution in the detergent removal stage; an acid-labile surfactant may be an alternative to SDS [44]. Many on-line or off-line combinations of GELFrEE with other fractionation techniques have been applied to optimize workflows for large-scale intact protein analysis. The fractions obtained from the GELFrEE electrophoretic step (molecular-weight-based fractionation) are submitted to a second separation dimension. Li et al. [18] identified 30 proteins in the mass range of 30–80 kDa from Pseudomonas aeruginosa, fractionated by GELFrEE and analyzed on a CZE-ESI-MS platform. However, the most commonly used workflow with an additional separation step is GELFrEE-LC-MS/MS [45, 46, 47].

3.5 Capillary zone electrophoresis

CZE has been the most common capillary electrophoresis (CE) mode applied to the mass spectrometry of intact proteins. It separates proteoforms according to differences in electrophoretic mobility and does not require a stationary phase [20, 48]. The approach provides fast and efficient separations. The sample is injected into a capillary filled with an aqueous solution, in which a potential difference applied between the two ends generates an electroosmotic flow, and the molecules are separated by their differences in electrophoretic mobility. CZE offers an alternative, high-capacity separation of proteoforms based on their size and charge and, having no stationary phase, is useful for the separation of high-mass proteoforms [49, 50]. CZE has been used as an alternative to RPLC; for example, CZE-MS interfaces can offer better detection sensitivity and can produce more protein identifications from complex proteome samples than typical RPLC-MS [51]. Combining CZE with other methods has led to efficient separation and highly sensitive detection of intact proteoforms, with the benefit of requiring only small sample amounts, including for native proteomics, but some challenges still need to be overcome [20, 52].

3.6 Liquid chromatography systems

Liquid chromatography is the main approach used for protein separation in proteomics, in mono- or multidimensional modes, and is ideally suited to the task because it can be interfaced directly with MS [53, 54]. The basic principle of chromatographic separation is the different affinity of analytes for the stationary and mobile phases. Orthogonal separation techniques, which use stationary phases with different interaction selectivities, are often combined in two-dimensional LC (2D LC) and multidimensional LC (MDLC) to improve intact protein separation and proteoform coverage and to increase the dynamic range of detection. The orthogonal modes include reversed phase (RP), ion exchange (IEX), size-exclusion (SEC), hydrophilic interaction liquid chromatography (HILIC) and hydrophobic interaction liquid chromatography (HIC).

3.6.1 Reversed phase liquid chromatography separation (RPLC)

The separation of proteoforms in RPLC is based on their hydrophobicity, using a non-polar stationary phase and a polar mobile phase; the analytes are eluted with increasing concentrations of organic solvent. RPLC is widely used for the separation and fractionation of complex intact protein samples and, when coupled online with MS, is the most prevalent approach for studying such samples in top-down proteomics [13, 55]. Efficient separations with improved peak capacity have been achieved with longer columns and smaller particle sizes in ultra-high pressure LC systems, such as long-column ultrahigh-pressure liquid chromatography (UPLC). The stationary phases are generally silica-based or polymeric particles bonded with octadecyl (C18), octyl (C8) or shorter alkyl chains; the shorter chains, such as C4 and C5, are used for intact protein separation [55, 56, 57, 58]. Owing to the extreme complexity, limited sample loading amounts and large dynamic range of intact protein samples, RPLC alone may not provide sufficient proteome coverage for top-down proteomics. One common way to increase peak capacity and proteome coverage is to include 2D RPLC or multiple orthogonal separation steps in the analysis. Some high-resolution, mass spectrometry-compatible combinations with RPLC used to separate proteins and proteoforms are, for example, IEX-HILIC-RPC/MS and two-dimensional high-pH/low-pH RPLC (2D pH-RPLC-RPLC) [58, 59].

3.6.2 Ion exchange chromatography (IEX)

Ion-exchange chromatography is an LC technique that separates proteins for top-down proteomics on the basis of differences in analyte charge. IEX can be applied in cation- and anion-exchange modes. Analytes are eluted from the charged stationary phase by increasing the ionic strength of the mobile phase. The efficiency of protein separation in IEX depends on the salt concentration and pH conditions of the elution process, which makes the technique versatile. IEX is often used as the first dimension, followed by RP chromatography in the second dimension, in 2D LC, or as part of a 3D LC strategy such as IEX-HILIC-RPC/MS.

3.6.3 Size-exclusion chromatography (SEC)

Size-exclusion chromatography is part of the intact protein analysis workflow. Fractionation arises from differences in the accessibility of proteins to the intraparticle pore volume of the resin, in a non-adsorptive mode, that is, without solute interactions with the stationary-phase surface. The proteins migrate through a porous polymeric column and are separated by their hydrodynamic volume, with larger proteins eluting before smaller ones because of their lower accessibility to the interior of the packing material. The selectivity is provided by the column and defined by the intraparticle pore diameter, whereas the efficiency of the SEC separation is mainly governed by the particle diameter [60, 61]. SEC has the advantage that it can be carried out in several types of solution; however, it is not a high-resolution separation method, and it dilutes the sample. To increase the performance of SEC, different aspects of column technology and instrumentation have been addressed. Huang et al. [62] developed a simple and efficient method for SEC-based separation of proteins using RP columns (RP-based SEC). They applied high concentrations of acetonitrile with trifluoroacetic acid as an acid modifier, which prevented interactions between proteins and the stationary phase and allowed the RP column to act as an SEC column, separating proteins by molecular weight. In their work, RP-based SEC performed better than conventional SEC. Cai et al. [63] developed a serial size exclusion chromatography (sSEC) strategy to enable high-resolution size-based fractionation of intact proteins. They connected SEC columns with different pore sizes in series, increasing the effective separation length, which extended the fractionation range and gave higher-resolution separation of the protein pool. This sSEC strategy, coupled to RPLC quadrupole-time-of-flight mass spectrometry, provided improved proteome coverage [63].

3.6.4 Hydrophobic interaction chromatography (HIC)

HIC is a technique that separates intact proteins with high resolution on the basis of hydrophobicity, mainly under native conditions, and is an alternative MS-compatible LC mode if an appropriate salt is used in the mobile phase [14, 59]. In this approach, the protein, with its tertiary structure retained, binds to a hydrophobic surface material in the presence of salt and then elutes in order of increasing surface hydrophobicity. The stationary phases used for HIC generally feature a low density of moderately hydrophobic ligands, on resins that are less hydrophobic than their counterparts used in RPLC.

3.6.5 Hydrophilic interaction chromatography (HILIC)

Hydrophilic interaction chromatography has been successfully applied to the separation of proteins and proteoforms. HILIC can retain and resolve highly polar compounds through a complex retention mechanism involving hydrophilic partitioning and polar interactions; in other words, the analytes are eluted according to their hydrophilicity [64, 65]. In HILIC, the stationary phase is polar and often consists of a silica support that is either unmodified or modified with a polar surface chemistry, such as zwitterionic sulfoalkylbetaine, amide, diol or aminopropyl; the mobile phase consists of water and 60–95% of an aprotic, water-miscible organic solvent, usually acetonitrile (ACN) or acetone, with at least 3% water. An organic solvent is used when loading HILIC columns to drive the hydrophilic portions of proteins to interact with the hydrophilic stationary phase. A gradient from organic solvent to aqueous buffer then desorbs and elutes the proteins from the column [66, 67, 68]. HILIC is an MS-compatible LC technique for protein analysis, and it is complementary and orthogonal to RPLC; coupling HILIC in online or off-line two-dimensional LC workflows has therefore increased the efficiency of LC-MS analysis of complex protein samples [65, 69]. Gargano et al. [70] implemented a capillary HILIC-MS method that can be used as a high-resolution approach to separate complex mixtures of proteins using wide mobile-phase gradients. In another study, salt-free pH-gradient IEX-HILIC was used as the second dimension to separate differentially acetylated/methylated intact protein isoforms of the histone family, combined online with RPLC in the first dimension for better separation and characterization of intact histones [71].

4. Mass spectrometry

MS-based proteomic experiments that aim at a comprehensive characterization of the proteoforms of a biological system require, besides efficient separation, sensitive detection and accurate mass measurement of intact proteins. MS technology for identification in top-down proteomics has gained momentum. Accurate mass spectrometric characterization of polypeptides depends on improvements in ionization, fragmentation and detection conditions. Tandem MS can confirm the identification of a protein on the basis of the daughter ions and the characteristics of the resulting peptide map and primary structure, which in turn provide the exact localization of post-translational or other modification sites. Data-independent acquisition (DIA) methods have been used as an alternative to analyze proteoforms and are particularly suited to the study of PTMs [72]. DIA focuses on the identification and quantitation of fragment ions generated from multiple peptides contained in the same selection window, several to tens of m/z wide; that is, the fragmentation spectra of all the peptides are acquired in each cycle without any preselection of precursor ions [73].
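The windowing idea behind DIA can be sketched in a few lines. The m/z range, the 25 m/z window width and the precursor values below are hypothetical numbers chosen for illustration; real instruments offer vendor-specific schemes (overlapping or variable-width windows) that this sketch does not model.

```python
def dia_windows(mz_start, mz_end, width):
    """Tile the precursor m/z range with contiguous fixed-width windows."""
    windows = []
    low = mz_start
    while low < mz_end:
        windows.append((low, min(low + width, mz_end)))
        low += width
    return windows


def group_precursors(precursors, windows):
    """Assign each precursor m/z to the window that would co-isolate it.

    Every precursor in a window is fragmented together, without any
    intensity-based preselection.
    """
    groups = {window: [] for window in windows}
    for mz in precursors:
        for low, high in windows:
            if low <= mz < high:
                groups[(low, high)].append(mz)
                break
    return groups


# Hypothetical acquisition scheme and precursor list (illustrative only).
windows = dia_windows(400, 500, 25)
groups = group_precursors([402.3, 410.1, 455.7], windows)

for (low, high), members in groups.items():
    print(f"{low}-{high} m/z: {members}")
```

Precursors that fall in the same window yield one mixed fragment spectrum, which is why DIA data analysis has to attribute fragment ions back to their precursors after acquisition.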

The mass spectrometers are compounded basically into a sample inlet, an ion source, a mass analyzer and a detector [74, 75]. Although MS appeared more than a century ago, its application to protein analysis began in the 1990s, because existing ion sources only allowed the ionization and analysis of inorganic molecules and small organic molecules and proteins are not easily transferred to the gas phase and ionized by the size [76, 77]. Advancement of mass spectrometry technology occurred with the new instrumentation ionizer, matrix-assisted laser desorption/ionization (MALDI) and electrospray ionization (ESI) [78, 79, 80]. The development of the mass analyzer applied to analyze intact proteins contributed to the mass spectrometry identification of the proteoforms. Mass analyzers with a high level of resolving power and sensitivity as time-of-flight (TOF), Orbitrap, Fourier Transform Ion Cyclotron Resonance (FT-ICR), or the combination of multiple mass analyzers in series, created a powerful tool for top-down MS characterization of proteoforms [78, 81, 82]. Most top-down proteomics (TDP) studies have used some form of tandem-MS fragmentation techniques, for intact proteins sequencing with greatly resolving power and high mass accuracy as: collisionally activated dissociation (CAD), collision-induced dissociation (CID), electron transfer dissociation (ETD), electron-capture dissociation (ECD), higher-energy collisional dissociation (HCD), infrared multiphoton dissociation (IRMPD) and ultraviolet photodissociation (UVPD). These examples of fragmentation strategies can provide additional information on the amino acid sequence and PTMs for identification of proteoforms [74, 75, 79, 83]. The mass spectrometer sample introduction can be through the traditional RPLC-MS, by CZE-MS or embedded in a matrix on a target plate [74, 84]. 
Mass spectrometers that use different types of analyzers for the first and second stages of mass analysis (hybrid MS instruments) are employed to maximize top-down MS-based proteoform characterization. Still, software tools for the identification and quantification of proteoforms need to be developed continuously to keep up with the demand for fast, automated analysis of the data generated. Many comprehensive software tools for proteoform identification and for the construction of proteoform families are freely available, including MASH Suite, MetaMorpheus, MSPathFinder, Proteoform Suite, TDPortal, TopMG and TopPIC [13, 20]. They can be implemented in current top-down workflows in conjunction with complete and accurate databases.
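
As a rough illustration of the idea behind a proteoform family, the sketch below links intact masses that differ by the mass of a common modification and groups the linked masses together. It is a toy example with hypothetical masses and a hypothetical tolerance, and it does not reproduce the algorithm of any of the tools listed above.

```python
from itertools import combinations

# Monoisotopic mass shifts (Da) of a few common modifications.
PTM_DELTAS = {"methylation": 14.01565, "oxidation": 15.99491,
              "acetylation": 42.01057, "phosphorylation": 79.96633}


def link_masses(masses, tol=0.02):
    """Return (lighter, heavier, ptm) for pairs that differ by one known PTM shift."""
    links = []
    for a, b in combinations(sorted(masses), 2):
        for name, delta in PTM_DELTAS.items():
            if abs((b - a) - delta) <= tol:
                links.append((a, b, name))
    return links


def families(masses, links):
    """Group masses into connected components (families) with a small union-find."""
    parent = {m: m for m in masses}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path compression
            m = parent[m]
        return m

    for a, b, _ in links:
        parent[find(a)] = find(b)
    groups = {}
    for m in masses:
        groups.setdefault(find(m), []).append(m)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical deconvoluted intact masses (Da), for illustration only.
observed = [11300.00, 11342.01, 11421.98, 15000.00]
links = link_masses(observed)
print(links)
print(families(observed, links))
```

Here the first three masses form one family (an acetylation step followed by a phosphorylation step), while the fourth mass has no link and remains a family of its own.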

A common material used is surface-modified silica or polymeric particles coated with short aliphatic groups, such as n-alkyl chains (propyl, butyl, hexyl or octyl), phenyl groups and others [61, 85]. HIC separation methods have been evaluated and optimized to provide selectivity complementary to RPLC, which allows efficient, highly orthogonal HIC-RPLC separation for top-down proteomics [14, 27].

5. Clinical applications for proteoform identification

Several studies aim to find markers of the pathophysiological processes of Alzheimer’s disease (AD), cancer [86], type 2 diabetes and chronic alcohol abuse, among other diseases. The identification of proteoforms associated with different diseases will undoubtedly be an essential milestone for early diagnosis, prevention and treatment. Examples of applications of proteoform identification that have been discussed include apolipoprotein proteoforms, B-type natriuretic peptide (BNP), disorders of glycosylation, detection of structural changes in transthyretin, hemoglobin proteoforms, truncated cystatin C proteoforms, C-reactive protein, vitamin D-binding protein, transferrin and immunoglobulin G (NISTmAb) [86, 87]. In the 30 years since the MALDI and ESI approaches were developed, only about a dozen mass spectrometry-based protein identification tests have been described. Here, we present studies involving Alzheimer’s disease, alterations in the levels of apolipoproteins associated with lipid metabolism, and cancer.

5.1 Alzheimer’s disease

In the diagnosis of Alzheimer’s disease (AD), quantification of total Tau protein (T-tau), the Tau form phosphorylated at threonine 181 (P-Tau181) and the 42-amino-acid amyloid-β peptide isoform (Aβ) in cerebrospinal fluid (CSF) is well established. However, there is a constant need for new diagnostic markers to identify the disease at a very early stage [87]. The role of proteoforms in the pathophysiological process of Alzheimer’s disease is reviewed in [88]. The mass spectrometric analysis of three canonical proteins, clusterin, secretogranin-2 and chromogranin A, was also presented. Variations in the serum or CSF levels of Apo A-1, a protein with antioxidant and anti-inflammatory properties, are also indicated as a potential marker for AD diagnosis and progression. Apo A-1 [86] inhibits the aggregation and neurotoxicity of the amyloid-β peptide in AD [89]. Increased Apo A-1 levels were correlated with a decreased risk of dementia [87], raising the possibility of a novel role of Apo A-1 in protection against neurological disorders [87, 89].

5.2 Apolipoprotein and lipid metabolism

Possible correlations between apolipoprotein levels (Apo C-III, Apo C-I and Apo C-II) and dyslipidemia and cardiovascular disease were presented in [86]. Apolipoproteins function as structural components of lipoprotein particles, cofactors for enzymes and ligands for cell-surface receptors. Apolipoproteins exhibit proteoforms associated with single-nucleotide polymorphisms (SNPs) and with post-translational modifications such as glycosylation, oxidation and sequence truncation [86]. The human Apo Cs are protein constituents of chylomicrons, VLDL and HDL. Apo C-III has 79 amino acids and can be glycosylated at threonine 74. Initially, four Apo C-III isoforms were identified by mass spectrometry, and later 12 proteoforms. These proteoforms differ by the absence of glycosylation (Apo C-III0a), glycosylation without sialic acid (Apo C-III0b), the addition of one or two sialic acid residues (Apo C-III1 and Apo C-III2) or the addition of fucose at glycosylation sites. There are also truncated proteoforms due to amino acid substitution. Increases in Apo C-III2 levels are associated with a reduction in triglyceride (TG) and LDL levels, which may be a mechanism linking these proteoforms to dyslipidemia and to a reduced risk of cardiovascular disease (CVD) [86].
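
The mass differences between these glycoforms can be estimated from monosaccharide residue masses. The sketch below assumes that the glycan of Apo C-III0b is a HexNAc-Hex disaccharide to which one or two sialic acid (NeuAc) residues are added in Apo C-III1 and Apo C-III2; these compositions are an assumption made for illustration and are not stated in [86]. Only mass shifts relative to the non-glycosylated form are computed, not absolute protein masses.

```python
# Monoisotopic residue masses (Da) of monosaccharides as incorporated in a glycan.
SUGAR = {"HexNAc": 203.07937, "Hex": 162.05282,
         "NeuAc": 291.09542, "Fuc": 146.05791}

# Assumed glycan compositions (illustrative, see the text above).
GLYCOFORMS = {
    "Apo C-III0a": {},
    "Apo C-III0b": {"HexNAc": 1, "Hex": 1},
    "Apo C-III1": {"HexNAc": 1, "Hex": 1, "NeuAc": 1},
    "Apo C-III2": {"HexNAc": 1, "Hex": 1, "NeuAc": 2},
}


def glycan_shift(composition):
    """Mass added to the polypeptide by a glycan of the given composition."""
    return sum(SUGAR[sugar] * count for sugar, count in composition.items())


shifts = {name: glycan_shift(comp) for name, comp in GLYCOFORMS.items()}
for name, shift in shifts.items():
    print(f"{name}: +{shift:.4f} Da")
```

Under these assumptions each added sialic acid increases the intact mass by about 291.1 Da, a difference that intact-mass measurement can resolve; a fucosylated form could be modeled in the same way by adding a Fuc entry to a composition.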

5.3 Cancer

The identification of novel biomarkers for early clinical-stage cancer detection, targeted molecular therapies, disease monitoring and drug development could have an impact on the future care of cancer patients. Systematic studies of cancer samples using omics technologies (oncoproteomics) are in progress. He et al. summarize the advantages and limitations of the critical technologies used in (onco)proteogenomics [90]. In other studies, Zhan et al. [91] compared MALDI-MS, LC-Q-TOF MS and LC-Orbitrap Velos MS for the identification of proteins within a single 2DE spot. They described the importance of developing stable isotope labeling coupled with 2DE-LC/MS for large-scale studies of human proteoforms. This powerful platform identified at least 42 and 63 proteins per blue-stained 2DE spot in analyses of a human glioblastoma proteome and a human pituitary adenoma proteome, respectively. A critical study to detect new proteomic markers of medullary thyroid carcinoma, combining MALDI-MSI and nLC-ESI-MS/MS, was carried out in [92]. The authors identified proteins such as moesin, versican and lumican, as well as intratumoural amyloid components, including calcitonin, apolipoprotein E, apolipoprotein IV and vitronectin, with a potential role in medullary thyroid carcinoma pathogenesis [92].

6. Conclusion

In conclusion, proteoform identification using a proteomic approach can advance diagnostic routines and the development of precision/personalized medicine. From now on, efforts should be concentrated on clinical studies; one aspect that hinders them is the cost and complexity of these tests. Therefore, studies to simplify sample preparation steps and MS platforms are needed to reduce the cost per test.

Conflict of interest

The authors declare no conflict of interest.

© 2019 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite and reference

Jucélia da Silva Araújo and Olga Lima Tavares Machado (December 23rd 2019). Proteoforms: General Concepts and Methodological Process for Identification, Proteoforms - Concept and Applications in Medical Sciences, Xianquan Zhan, IntechOpen, DOI: 10.5772/intechopen.89914. Available from:
