Identification of Peptides and Proteins in Illegally Distributed Products by MALDI-TOF-MS
DOI: http://dx.doi.org/10.5772/intechopen.95335

An analytical strategy based on matrix-assisted laser desorption/ionization (MALDI) time-of-flight (TOF) mass spectrometry (MS) for identification of peptides and proteins in illegally distributed products is presented. The identified compounds include human growth hormone (hGH), human somatoliberin, anti-obesity drug (AOD), growth hormone releasing peptides (GHRP-2 and GHRP-6), Glycine-GHRP-2 and Glycine-GHRP-6, ipamorelin, insulin aspart and porcine, delta sleep-inducing peptide (DSIP), thymosin β 4, insulin like growth factor (IGF), mechano growth factor (MGF), human chorionic gonadotropin (hCG), melanotan II, bremelanotide, dermorphin and body protecting compound (BPC 157). The identification of proteins was mainly based on peptide mass fingerprinting, i.e., bottom up approach, while the smaller peptides were identified through de-novo sequencing. In cases when a reference standard was available, complementary identification was performed by capillary electrophoresis in double-injection mode (DICE), where a suspicious product was compared with the reference standard through two consecutive injections within the same electrophoretic run.


Introduction
A broad range of proteins and peptides used for various kinds of enhancement, such as human growth hormone (hGH), i.e., somatropin, can be obtained from the illicit market. These products are mainly marketed as lyophilized formulations in small glass containers, often without labelling. Besides the effects of the active components, customers are exposed to a range of potential harms, including bacterial, fungal or viral infections, which may arise because the products are administered parenterally. Figure 1A illustrates the total number of injection vials containing a white lyophilized product cake that were seized by Swedish Customs over the nine-year period 2010-2018. A large proportion of these samples, i.e., 64%, contained human growth hormone or melanotan II. About a quarter of the seized vials, i.e., 27%, did not contain any active peptide or protein, while the remaining 9% contained other compounds (Figure 1B).
The concept of a proteolytic peptide pattern, i.e., protein peptide mapping (PPM), being characteristic of a protein was first demonstrated by SDS-PAGE [1]. In 1989, peptide sequencing by automated Edman degradation had a cycle time of nearly one hour per amino acid residue. Samples of interest often contained complex mixtures of proteins, which usually required separation by SDS-PAGE followed by electroblotting onto a polyvinylidene fluoride (PVDF) membrane [2]. A more rapid alternative to peptide sequencing is "peptide mass fingerprinting" (PMF). In PMF, proteins are enzymatically cleaved in a predictable manner, and the sizes of the generated peptide fragments are specific for different proteins. Subsequent analysis of the obtained peptides by mass spectrometry (MS) generates mass-to-charge ratio (m/z) values in the mass spectrum, which in turn give rise to a characteristic "peptide mass fingerprint" of the protein [3,4]. The fingerprint serves to identify the protein by comparison with in silico digests, i.e., search engines attempt to match peptides from in silico digested proteins to those measured by the mass spectrometer [5][6][7][8][9]. Peptide mass fingerprinting with MS, which was first demonstrated with fast atom bombardment ionization in 1981, makes it possible to identify a protein at the nanogram level [5,10,11,12]. Trypsin is a commonly used proteolytic enzyme for PMF, since it is relatively cheap, highly selective, and generates peptides with an average size of about 8-10 amino acids, which are ideally suited for analysis by MS. It cleaves principally on the C-terminal side of arginine and lysine, with the exception of Arg-Pro and Lys-Pro bonds [2]. Limitations to protein identification by PMF include:
I) The protein sequence must be present in a database for a successful protein identification.
II) Proteins with extensive post-translational modifications may fail to yield good matches [13].
III) Different isoforms of a protein or alternatively spliced proteins may not be distinguished if the unique sequence regions are not observed in the peptide map.
IV) Incomplete proteolytic digestion and differences in peptide ionization provide an incomplete mass fingerprint of the protein.
Therefore, a complementary approach to PMF for protein identification is the use of tandem mass spectrometry (MS/MS), whereby tryptic peptide ions from the first stage of MS are dissociated along the backbone and then separated and detected in a second stage of MS to identify primary amino acid sequences [14][15][16]. Tandem mass spectrometry in conjunction with PMF provides even more specificity, thereby facilitating the identification [17,18].
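The trypsin cleavage rule and the in silico digest described above can be illustrated with a minimal Python sketch. The sequence used here is a made-up toy example, not one of the analytes in this chapter, and the function names `tryptic_digest` and `peptide_mh` are hypothetical. The residue masses are standard monoisotopic values; missed cleavages and modifications are deliberately ignored.

```python
# Standard monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O added for the free N- and C-termini
PROTON = 1.00728   # charge carrier for a singly protonated ion


def tryptic_digest(sequence):
    """Cleave C-terminal to K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        following = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and following != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):          # C-terminal peptide without K/R
        peptides.append(sequence[start:])
    return peptides


def peptide_mh(peptide):
    """Monoisotopic m/z of the singly protonated peptide, [M+H]+."""
    return sum(MONO[r] for r in peptide) + WATER + PROTON


toy_protein = "AGKPLRSTKVVR"           # hypothetical sequence
for pep in tryptic_digest(toy_protein):
    print(pep, round(peptide_mh(pep), 4))
```

In this toy sequence the Lys-Pro bond is not cleaved, so the first peptide keeps its internal lysine. A search engine compares a list of such theoretical [M+H]+ values against the measured peaks within a chosen mass tolerance.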
Since the introduction of sensitive commercial instrumentation based on MALDI-TOF-MS in 1992, the technique has been widely used for protein identification owing to its high sensitivity and mass accuracy, speed, extremely low material consumption, absence of signals from multiply charged ions and relatively high tolerance toward additives and contaminants such as salts, matrix components and excipients [19][20][21][22][23][24][25][26]. Furthermore, MALDI is a micro-destructive analytical technique, and the material remaining on the MALDI target plate can be archived for later analysis. The high sensitivity of MALDI implies that only a small aliquot of the digested protein is required for mass analysis, and the remainder can be used for alternative measurements. MALDI provides additional information regarding the primary structure of the protein by sequencing of selected tryptic peptide ions in post-source decay (PSD) mode [27][28][29][30][31][32][33][34]. MALDI in-source decay (ISD) is another attractive method, which generates partial sequence information for intact proteins covering up to 20-50 amino acid residues [35] (Figure 2).
The sequence information from MALDI-PSD or MALDI-ISD analyses can be used to validate protein identification. The singly charged fragment ions generated by MALDI-TOF-MS are a mixture of b-, y- and a-ions accompanied by ions resulting from neutral loss of ammonia or water [36][37][38][39].
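As an illustration of how such fragment ion series relate to a sequence, the following Python sketch computes the m/z values of singly charged a-, b- and y-ions. It is a simplified example under stated assumptions: the tripeptide is hypothetical, the function name `fragment_ions` is invented for this sketch, and neutral losses of ammonia or water are not included.

```python
# Standard monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728
CO = 27.99491      # carbon monoxide; an a-ion is the b-ion minus CO


def fragment_ions(peptide):
    """Return m/z of singly charged a-, b- and y-ions of a peptide."""
    masses = [MONO[r] for r in peptide]
    ions = {}
    for i in range(1, len(peptide)):
        b = sum(masses[:i]) + PROTON             # N-terminal fragment
        y = sum(masses[-i:]) + WATER + PROTON    # C-terminal fragment
        ions[f"b{i}"] = b
        ions[f"a{i}"] = b - CO
        ions[f"y{i}"] = y
    return ions


for name, mz in sorted(fragment_ions("GAK").items()):  # toy peptide
    print(name, round(mz, 4))
```

In de novo sequencing the logic is applied in reverse: the mass differences between consecutive ions of one series are matched against residue masses to read out the sequence.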
PMF-based protein identification is accomplished by searching a protein sequence database using search engines such as ProFound [40], Mascot [41], or SEQUEST [15]. A value-based scoring system has been developed that facilitates the identification without accompanying amino acid sequence data [42,43]. Parameters considered important for the identification include molecular mass, protein sequence coverage and the number of matching peptides [42]. In addition, the presence of a signature peptide that is unique to a protein facilitates PMF-based identification [44]. Prior reports suggest that a minimum of four matching peptides and a sequence coverage of at least 20% are necessary for positive PMF-based protein identification [45,46]. An alternative strategy for protein identification is the top-down approach, where intact molecular ions are subjected to gas-phase fragmentation [47].
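The two acceptance criteria quoted above (at least four matching peptides and at least 20% sequence coverage) can be expressed as a short Python check. This is only a sketch: the function names are hypothetical, the peptide positions in the usage example are invented for illustration, and real search engines apply more elaborate probabilistic scoring.

```python
def sequence_coverage(protein_length, matched_ranges):
    """Percentage of residues covered by matched peptides.

    matched_ranges holds (start, end) positions, 1-based and inclusive.
    Overlapping peptides are counted only once.
    """
    covered = set()
    for start, end in matched_ranges:
        covered.update(range(start, end + 1))
    return 100.0 * len(covered) / protein_length


def meets_pmf_criteria(protein_length, matched_ranges,
                       min_peptides=4, min_coverage=20.0):
    """Apply the minimum-peptide and minimum-coverage thresholds."""
    enough_peptides = len(matched_ranges) >= min_peptides
    enough_coverage = sequence_coverage(protein_length,
                                        matched_ranges) >= min_coverage
    return enough_peptides and enough_coverage


# Invented peptide positions on a 191-residue protein (illustration only).
matches = [(1, 8), (9, 16), (20, 38), (42, 64)]
print(round(sequence_coverage(191, matches), 1), meets_pmf_criteria(191, matches))
```

The thresholds are exposed as parameters because the text presents them as values suggested by prior reports rather than as fixed limits.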
Proteins with posttranslational modifications, such as glycosylation, present additional challenges since the masses of the modified peptides are different and thus do not contribute to the identification. In such cases, the protein can be analyzed by capillary electrophoresis (CE), in order to explore the heterogeneity of the protein followed by comparison of its electropherogram with that of the corresponding reference standard [13,48].

Sample preparation
MALDI-TOF-MS is very tolerant to salts and sample matrices; hence, it is seldom necessary to desalt the sample. However, it is sometimes necessary to use a C18 micro-column in order to fractionate a complex sample or to increase the concentration of the target analyte.
The sample to be analyzed is mixed with a matrix solution (1:1, v:v), e.g. sinapinic acid (SA) or alpha-cyano-4-hydroxycinnamic acid (ACHCA). One μl of the mixture is deposited on the MALDI target plate and allowed to air-dry (i.e., the dried-droplet method) before being placed in the mass spectrometer [19,49].

Proteolysis
The analyte to be digested is dissolved in ammonium bicarbonate (50 mM, pH 7.9). The intact sample is first analyzed directly by MALDI in order to determine the molecular mass of the analyte. Then, 200 μl of the solution is digested by addition of 2-10 μl trypsin (200 μg/ml in 10 mM HCl). The reaction is carried out at room temperature or at 37°C for 30 minutes up to 24 hours, depending on the peptide or protein in question. It has been found that digestion of somatropin for 30 minutes at room temperature generated enough tryptic fragments for the MALDI analyses [50]. For more complex proteins, such as human chorionic gonadotropin, the time required for proteolysis was found to be 24 hours at 37°C. Insulin porcine is digested at 37°C for 12 hours, while other peptides are digested at 37°C for 4 hours. In order to enable alkylation of the cysteine residues in a protein or peptide, it is first reduced using dithiothreitol (DTT) or 2-mercaptoethanol (ME), followed by labelling of the free thiol groups with 2-iodoacetamide. The alkylation is carried out through the following procedure:
1. 2.5 μl 100 mM ME is added to 10 μl of the protein solution.
2. The protein is then incubated at 50°C for 15 minutes to reduce the S-S linkages.
3. 2.5 μl 2-iodoacetamide (100 mM) is added to the mixture to react with the free thiol groups of the cysteine residues at +4°C for 15 to 60 minutes in darkness.
4. 2.5 μl (10 μg/mL) trypsin is added to the mixture for the digestion. The reaction is performed at room temperature or at 37°C [13,50].
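The effect of the reduction and alkylation procedure on the observed peptide masses can be sketched in Python as follows. The sketch assumes complete carbamidomethylation, i.e., that every cysteine gains C2H3NO (+57.02146 Da) upon reaction with 2-iodoacetamide; the peptide in the usage example and the function name `alkylated_mh` are hypothetical.

```python
# Standard monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728
CARBAMIDOMETHYL = 57.02146   # C2H3NO added per alkylated cysteine


def peptide_mh(peptide):
    """[M+H]+ of the reduced peptide with free thiol groups."""
    return sum(MONO[r] for r in peptide) + WATER + PROTON


def alkylated_mh(peptide):
    """[M+H]+ assuming every cysteine is carbamidomethylated."""
    return peptide_mh(peptide) + CARBAMIDOMETHYL * peptide.count("C")


toy = "ACK"   # hypothetical cysteine-containing peptide
print(round(peptide_mh(toy), 4), round(alkylated_mh(toy), 4))
```

A shift of 57.02 Da per cysteine between the reduced and the alkylated sample therefore gives the number of cysteine residues in a peptide.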

Apparatus and operating conditions
MALDI-TOF analyses are performed using either an Autoflex or an Autoflex Max (Bruker Daltonics, Bremen, Germany) reflector type time-of-flight mass spectrometer, equipped with a pulsed nitrogen laser working at 337 nm and a smartbeam II laser working at 355 nm, respectively. The Autoflex instrument is operated in the positive ion mode with delayed extraction at an accelerating voltage of 20 kV and a variable voltage reflectron. The parameter settings are optimized to analyze peptides in reflectron mode. Before analysis, the instrument is externally calibrated with Bruker Daltonics standard peptide or protein mixtures. Peptide mass peaks occurring due to autolysis of trypsin (porcine) such as 842.51

Results and discussion
Illegally distributed lyophilized or liquid products suspected of containing pharmacologically active peptides were seized by Swedish Customs. The analyte to be identified is analyzed in both reflectron and linear modes in order to determine its molecular mass (Figure 3). Large peptides and proteins are then subjected to trypsin digestion in order to obtain a peptide mass map upon MALDI analysis in reflectron mode. Small peptides, on the other hand, are analyzed directly in reflectron mode and/or PSD mode. This strategy was applied to the identification of the following peptides and proteins (Figure 4 and Table 1).

Identification of somatropin (hGH)
Recombinant hGH or somatropin consists of 191 amino acids with two disulfide bridges (Cys53-Cys165 and Cys182-Cys189) and promotes proteinogenesis as well as fat mobilization and oxidation [51][52][53]. Recombinant hGH is used as a prescription drug to treat children's growth disorders and adult growth hormone deficiency. In the belief that the beneficial impact of somatropin on growth can be extrapolated to healthy individuals, it is abused by bodybuilders and athletes [54]. However, many users are unaware of the correct dosage and of how to prepare the solution for injection. It has been demonstrated that supra-physiological dosages can have fatal consequences [55]. Apart from the undesired consequences following the abuse of somatropin, our investigations have shown that the illegally marketed products contained high levels of impurities such as endotoxins [50]. Endotoxins are associated with Gram-negative bacteria and can cause severe immune responses and disease in humans [56,57]. Somatropin was identified through PMF and MALDI-ISD (see Figure 2).
Figure 3. The sample to be identified is analyzed in both reflectron and linear modes in order to determine the molecular mass of the analyte. Depending on the size of the molecule, it is subjected to enzymatic digestion in order to be identified through PMF. Small peptides are identified by de novo sequencing in PSD mode.

Table 1. Illegally distributed peptides and proteins that have been analyzed by MALDI-TOF-MS and DICZE. The monoisotopic mass (M mass) of the analytes and the employed analytical methodology are indicated.
The availability of a reference standard has made it possible to apply double-injection capillary zone electrophoresis (DICZE) for both identification and impurity determination of somatropin products [50,58,59]. The DICZE method provided complementary information on the native protein, allowing a side-by-side comparison between the electrophoretic patterns of the reference standard and the analyte to be identified [50].

Identification of human somatoliberin
Human somatoliberin, i.e., growth hormone-releasing hormone (GHRH), consists of 44 amino acids without any post-translational modification or disulfide bridge. Somatoliberin was first isolated from two pancreatic islet cell tumors, and subsequently from normal human hypothalamus [60][61][62]. The MALDI results from determination of the molecular mass, PMF and amino acid sequence revealed that the Asn8 (N), Gly15 (G) and Met27 (M) residues had been replaced by Gln8 (Q), Ala15 (A) and Leu27 (L), respectively, during the synthesis (see Figures 4 and 5). The peptide was successfully identified by PMF and de novo sequencing of three of the tryptic peptides.
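The mass differences expected from these three substitutions can be estimated with a few lines of Python. The sketch uses standard monoisotopic residue masses and takes the substitutions exactly as listed above; the variable and function names are hypothetical, and the result is a theoretical shift, not a value reported in this chapter.

```python
# Monoisotopic residue masses (Da) of the residues involved.
RESIDUE_MASS = {
    "N": 114.04293, "Q": 128.05858,
    "G": 57.02146, "A": 71.03711,
    "M": 131.04049, "L": 113.08406,
}

# (position, native residue, residue found in the seized product)
SUBSTITUTIONS = [(8, "N", "Q"), (15, "G", "A"), (27, "M", "L")]


def mass_shift(native, found):
    """Mass change caused by replacing one residue with another."""
    return RESIDUE_MASS[found] - RESIDUE_MASS[native]


total_shift = 0.0
for position, native, found in SUBSTITUTIONS:
    shift = mass_shift(native, found)
    total_shift += shift
    print(f"{native}{position}{found}: {shift:+.5f} Da")

print(f"Total shift of the intact peptide: {total_shift:+.5f} Da")
```

Because each tryptic peptide carries only the substitutions that fall within it, the individual shifts also indicate which peptides in the mass map deviate from the native sequence.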

Identification of an anti-obesity drug (AOD)
The AOD peptide is a fragment of the C-terminus of human growth hormone (fragment 177-191) to which a tyrosine has been added at the N-terminus. It is a cyclic peptide consisting of 16 amino acids with a disulfide bridge between the cysteine residues at positions 7 and 14 in the peptide chain [63] (Figure 4 and Table 1). The fragment is the minimum length of the hGH sequence that retains the lipolytic and antilipogenic properties of hGH [63][64][65]. The molecular masses of its tryptic peptides agreed with the peptide map of hGH fragment 177-191. The existence of the disulfide bridge between C7 and C14 was confirmed by analysis of the non-reduced tryptic sample (Figure 6). This peptide has also been employed as a signature peptide for the identification of hGH [48,50]. The amino acid sequences of three selected tryptic peptides were also confirmed.
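The mass relationship that underlies the confirmation of a disulfide bridge can be sketched in Python as follows. Forming one S-S bond removes two hydrogen atoms, so a bridged species is 2.01565 Da lighter than the corresponding reduced species. The function names and the numerical inputs are hypothetical and serve only to illustrate the arithmetic.

```python
H_ATOM = 1.007825   # monoisotopic mass of a hydrogen atom (Da)


def disulfide_linked_mass(mass_a, mass_b):
    """Neutral mass of two reduced peptides joined by one S-S bond."""
    return mass_a + mass_b - 2 * H_ATOM


def intrachain_disulfide_mass(reduced_mass, n_bridges=1):
    """Neutral mass of a peptide after forming intra-chain S-S bonds."""
    return reduced_mass - 2 * H_ATOM * n_bridges


# Illustrative neutral masses only, not measured values.
print(round(disulfide_linked_mass(1000.0, 500.0), 5))
print(round(intrachain_disulfide_mass(1000.0), 5))
```

In a non-reduced tryptic digest, two cysteine-containing peptides held together by a bridge therefore appear as a single signal at the combined mass, which disappears upon reduction.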

Identification of insulin porcine and insulin aspart
Insulin regulates the cellular uptake, utilization, and storage of glucose, amino acids, and fatty acids and inhibits the breakdown of glycogen, protein, and fat. The illegal use of insulin has been noted for more than a decade [78]. However, misuse and incorrect administration of insulin can cause the so-called dead-in-bed syndrome [79]. In bodybuilding, insulin is used, like testosterone or hGH, to consolidate muscle tissue. Insulin also prevents the breakdown of muscle and is rapidly cleared from the body, since it has a very short half-life (t1/2) [80]. Several illegal products containing insulin porcine or insulin aspart have been analyzed. Insulin is composed of two peptide chains, i.e., A and B, which are joined by two inter-chain disulfide bonds. The A chain also contains an intra-chain disulfide bond (Figure 4). The results summarized in Table 3 demonstrate the strategy applied for the identification of insulin porcine and insulin aspart. The insulin molecules were reduced using a potent reducing agent, i.e., 2-mercaptoethanol (ME). MS analysis of the reduced samples resulted in a mass spectrum consisting of several signals from both the reduced A and B chains. The A and B chains generated three and four signals, respectively, corresponding to the ME-modified peptides as described in Table 3. It should be noted that the amino acid residues P and A at positions 28 and 30 in the B chain of insulin porcine have been replaced by D and T, respectively, in insulin aspart. These insulin molecules can therefore be distinguished on the basis of these differences. The tryptic digestion of the B chain yielded three peptide fragments of different sizes (Figure 8 and Table 3). The molecular masses of these peptides were determined accurately, and the amino acid sequences of the tryptic peptides were determined in PSD mode.
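How the two B chains differ after tryptic digestion can be illustrated with the Python sketch below. The porcine B-chain sequence is taken from general knowledge rather than from this chapter, so it should be checked against a sequence database before use; the aspart chain is derived from it through the two substitutions described above. Function names are hypothetical, and the masses are theoretical [M+H]+ values of the reduced, unmodified peptides.

```python
# Standard monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.01056, 1.00728

# Assumed porcine insulin B chain (30 residues); verify before relying on it.
PORCINE_B = "FVNQHLCGSHLVEALYLVCGERGFFYTPKA"


def substitute(sequence, changes):
    """Return a copy of the sequence with 1-based positions replaced."""
    residues = list(sequence)
    for position, new_residue in changes.items():
        residues[position - 1] = new_residue
    return "".join(residues)


def tryptic_digest(sequence):
    """Cleave C-terminal to K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        following = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and following != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mh(peptide):
    """Monoisotopic [M+H]+ of an unmodified peptide."""
    return sum(MONO[r] for r in peptide) + WATER + PROTON


ASPART_B = substitute(PORCINE_B, {28: "D", 30: "T"})
for name, chain in (("porcine", PORCINE_B), ("aspart", ASPART_B)):
    for pep in tryptic_digest(chain):
        print(name, pep, round(peptide_mh(pep), 4))
```

Under these assumptions both chains give three tryptic peptides. The large N-terminal peptide is common to both, whereas the shorter C-terminal peptides differ and thus discriminate between the two insulins.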

Table 3. MALDI-TOF-MS analysis of insulin porcine and aspart.
Double-injection capillary electrophoresis has also been applied for the identification of insulin molecules [81].

Identification of delta sleep-inducing peptide (DSIP)
The nonapeptide DSIP was first isolated in the mid-1970s from the cerebral venous blood of rabbits in an induced state of sleep [82]. It was initially believed to be involved in sleep regulation due to its apparent ability to induce slow-wave sleep in rabbits. However, it has been demonstrated that short-term treatment of chronic insomnia with DSIP is not likely to be of major therapeutic benefit [83]. The peptide is marketed illegally, presumably for the treatment of insomnia. The peptide was subjected directly to PSD analysis in order to confirm its molecular mass and amino acid sequence (Figure 4 and Table 1).

Identification of thymosin β 4
Synthetic thymosin is a peptide consisting of 43 amino acids with artificial acetylation of the N-terminus (see Figure 4 and Table 1). Thymosin has the potential to play a significant role in tissue development, maintenance, repair, pathology and other important biological activities [84]. Some important biological activities of thymosin are related to the peptide sequence L17KKTET22 [85]. Illegally distributed thymosin products are claimed to promote a variety of beneficial biological functions, such as muscle building. The peptide was identified through PMF and de novo sequencing of the tryptic peptides (Table 4).

Identification of human chorionic gonadotropin (hCG)
Human chorionic gonadotropin (hCG) is a glycoprotein hormone consisting of noncovalently associated α (92 amino acids) and β (145 amino acids) subunits [86]. These subunits are, however, highly cross-linked internally through disulfide bridges, i.e., the α-subunit has five disulfide bridges [87], while the β-subunit has six [87,88]. The protein is heavily glycosylated; the oligosaccharides are attached to the protein backbone through asparagine and serine residues and constitute approximately 30% of the molecular mass [89]. The protein has been identified using MALDI-TOF-MS and DICZE [13,50]. Approximately 40% of the amino acid sequence of hCG was confirmed by PMF (Table 5) [13]. The identification was confirmed by DICZE analysis of illegal samples together with the corresponding reference standard [13,50].

Identification of melanotan II (MII) and bremelanotide
Skin-tanning products that claim to contain MII are being advertised and sold on the illicit drug market. Injection of MII can result in systemic toxicity and rhabdomyolysis [90]. Bremelanotide (formerly PT-141) is an active metabolite of MII (Table 1).
These peptides were identified through the top-down approach by MALDI in PSD mode as illustrated in Figure 9.

Identification of dermorphin
Dermorphin is a μ-opioid receptor-binding peptide that causes both central and peripheral effects [92] (Figure 4 and Table 1). This peptide, originally isolated from the skin of the South American tree frog Phyllomedusa sauvagii, is classified as one of the strongest mammalian endogenous analgesic opioids [93,94]. Dried frog skin containing dermorphin has been used as a therapeutic agent by the Matses tribes of the upper Amazonian basin to treat cuts during hunting expeditions [95]. The analgesic effects of dermorphin have been demonstrated in rat, horse, dog and white sea cod [92,94]. It has been used illegally in horse racing as a pain-killing agent, allowing horses to run even if injured.
This peptide, which was detected in several samples, was identified by MALDI in the PSD mode (Figure 10). The molecular structure was confirmed by NMR spectroscopy.

Identification of body protecting compound 157 (BPC 157)
BPC 157, a partial sequence of body protecting compound (BPC) (M mass = 40 kDa), is a synthetic peptide composed of fifteen amino acids (Figure 4 and Table 1). BPC was discovered and isolated from mouse gastric juice in response to stress stimuli in the gut mucosa [96]. BPC 157 is also known as Bepcin, PL 14736 or PL 10 [97]. This peptide fragment was speculated to be responsible for the physiological and protective effects of BPC [96]. However, it is unclear whether this peptide is endogenous to humans. BPC 157 is suggested to aid in tendon, ligament and muscle healing, and its use as a rapid injury-healing agent in the sporting world is therefore appealing. However, no proper clinical trials in human subjects have yet been performed to investigate the healing capability and the harmful effects of this compound [97]. BPC 157 was recently identified in several confiscated vials for injection. The identification was carried out by MALDI in both PSD and reflectron modes (Figure 11). The amino acid sequence of the peptide was confirmed by NMR spectroscopy and LC-QTOF-MS.

Conclusions
The proposed methods, based on PMF by MALDI-TOF-MS as well as analysis with DICZE, provided an efficient procedure for the identification of peptides and proteins in illegally distributed samples. The use of trypsin as a proteolytic enzyme generated peptide fragments which covered 40 to 80% of the amino acid sequences of the analyzed proteins. The presence of a signature peptide in the peptide map facilitated the analyte identification considerably. MALDI-TOF-MS was also applied in the PSD mode for the amino acid sequencing of selected tryptic peptides as well as small peptides, such as ipamorelin.
The double-injection CE method provided complementary information on the native protein in the presence of a reference standard. This made it possible to compare the electrophoretic patterns of the reference standard and the analyte to be identified. In addition, the double-injection based identifications were carried out by comparing the corrected migration time of the analyte with the observed migration time of the reference standard.