Determination of Gluten Peptides Associated with Celiac Disease by Mass Spectrometry

Gluten is a large protein network composed of a monomeric fraction (prolamins) and a polymeric fraction (glutelins), occurring in many cereal-based products, especially those containing wheat. Gluten peptides can trigger food allergies and intolerances, including inflammatory reactions such as celiac disease, an autoimmune disorder of the small intestine characterized by mucosal degeneration and villous atrophy. The treatment is the permanent exclusion of gluten from the diet. However, gluten analysis is a very difficult task, owing to the high complexity of the polypeptides and the lack of consensus on the most appropriate analytical method. Proteomic approaches combining liquid chromatography and tandem mass spectrometry (LC-MS/MS) have been identified as the most promising non-immunological techniques for gluten detection. LC-MS analyses combined with bioinformatics and prolamin-specific databases can overcome methodological limitations, since they are based on the accurate molecular mass of peptide biomarkers. One of the major contributions of proteomics has been the identification of the epitopes of gluten peptides responsible for wheat-related diseases. Recent works have defined grain-specific gluten peptides and also the lowest concentration at which these peptides can be confidently detected. Proteomic methods for gluten quantification should support not only regulatory limits in processed foods but also the safety of consumers regarding foods labeled as gluten-free.


Introduction
Gluten is defined as a complex protein network present in the cereal endosperm, responsible for conferring viscoelasticity to dough. It is composed of the cereal storage proteins, divided into two fractions: monomers, formed by alcohol-soluble prolamins, and polymers, formed by alcohol-insoluble glutelins [1]. This insoluble complex forms when the gluten proteins are hydrated and subjected to mechanical force. Dry gluten consists of about 75-85% protein and 5-10% lipids, the rest being residual starch and non-starch carbohydrates [1].
The wheat gluten network has unique rheological properties, such as viscosity, extensibility, and elasticity, conferred by the storage proteins gliadins and glutenins [2]. An appropriate proportion of both protein fractions in the dough is essential to guarantee its viscoelastic properties and end-product quality [1]. Owing to these properties, wheat is recognized as the most suitable raw material for bread and pasta-making. Vital wheat gluten, obtained by washing the viscoelastic dough to remove the water-soluble components, is a raw material widely added to gluten-based food products to improve their quality and sensory properties [3,4].
Beyond their technological role, gluten proteins can trigger food allergies and intolerances, including inflammatory reactions in patients with celiac disease (CD). CD is a gluten-sensitive enteropathy, defined as an immune-mediated disorder triggered by gluten in genetically predisposed individuals.
Gluten storage proteins occur in wheat (Triticum spp.; gliadins and glutenins), barley (Hordeum vulgare; hordeins), rye (Secale cereale; secalins), and oats (Avena sativa; avenins). In the context of gluten intolerance, one of the most common definitions of gluten is the one provided by the European Commission Regulations: "protein fraction from wheat, rye, barley, oats or their crossbred varieties and derivatives thereof, to which some persons are intolerant and which is insoluble in water and 0.5 M sodium chloride solution" [5].
Gluten proteins are present in various cereal-based food products, mainly wheat-based ones. However, because gluten is also incorporated as an ingredient in foods that traditionally do not contain wheat proteins, there is growing concern about gluten allergenicity arising from hidden sources of gluten, incorrect labeling, and cross-contamination during manufacturing, transportation, and storage [3]. Hence, because of their nutritional and economic importance, there is a major effort to characterize these proteins. Since the treatment for gluten sensitivity is the exclusion of gluten from the diet, the detection and quantification of these proteins are extremely important, not only because of their direct effect on food quality but also for food safety reasons.
Nevertheless, gluten analysis in food products is a very difficult task, both because the proteins must be properly extracted before analysis and because the polypeptides are highly complex and homologous. The first point to be addressed is therefore appropriate protein extraction, which uses sequential buffers to extract the prolamins and then to reduce the disulfide bonds of the otherwise insoluble glutenins, releasing their polypeptides [6].
The second point is the lack of consensus on the most appropriate analytical method to identify and quantify gluten in food. The most commonly used methods are based on enzyme-linked immunosorbent assay (ELISA), PCR, and electrophoresis, but these methods differ in sensitivity and have several drawbacks. The main problem is the lack of certified reference material [7]. In addition, immunological methods rely on antibodies developed to detect gliadins and are therefore not suitable for all classes of gluten proteins. Current methods are also unable to distinguish the cereal source.
The protein composition of the grain varies among species and varieties, which creates methodological difficulties in the allergen analysis of food. In this context, modern proteomic approaches based on sensitive and reliable techniques, combining liquid chromatography (LC) with tandem mass spectrometry (MS/MS), have been identified as the most promising non-immunological techniques for identifying and quantifying gluten proteins, even at trace levels [7][8][9][10].
The most representative cereal species are rice, wheat, rye, barley, and corn. Wheat is one of the most important and most consumed cereals in the world and is considered the most suitable raw material for baking and pasta-making. Its production and consumption have remained stable over the years, ranking second among cereals, after corn and ahead of rice [13].
Rye, barley, and oats are also produced and consumed in significant amounts. They are used mainly for baking, especially rye; barley malt is an important ingredient in beer production, but barley is also found as meal, flakes, or flour, whereas oat bran and other oat-based products are widely available for direct consumption [14].
Following Osborne [15], cereal proteins are classically divided into four groups according to their solubility: albumins, soluble in water; globulins, soluble in dilute saline solutions; prolamins, soluble in alcoholic solutions; and glutelins, soluble in dilute acids or bases. Albumins and globulins are metabolic proteins, which represent 20% of the total protein content and participate in important functions in plant development and responses to the environment [16], while prolamins and glutelins, collectively referred to as gluten, are the major class of storage proteins (80% of total protein), whose function is to store nutrients and provide nitrogen during seed germination [12].
Gluten proteins share common structural characteristics. Their primary structure is subdivided into distinct domains, some of which contain repeated sequences of specific amino acids [2]. These proteins have a unique amino acid composition, characterized by high levels of proline (P) and glutamine (Q)¹ and low levels of amino acids with charged side chains. Glutamine generally predominates (15-31%), followed by proline in wheat, rye, and barley (12-14%) [18]. Cysteines represent only 2% of the amino acids of gluten proteins but are extremely important for the structure and functionality of gluten [1]. The nutritionally essential amino acids tryptophan (0.2-1.0%), methionine (1.3-2.9%), histidine (1.8-2.2%), and lysine (1.4-3.3%) are present only at very low levels [18].
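As a rough illustration of this proline and glutamine bias, the short Python sketch below counts residue percentages in a single sequence. It uses the widely cited α-gliadin 33-mer as its example. A short, epitope-rich fragment like this one is far richer in proline and glutamine than the whole-protein averages quoted above, so the two sets of numbers should not be compared directly. The function name is hypothetical.

```python
from collections import Counter

# Example sequence (one-letter codes): the frequently cited alpha-gliadin 33-mer.
GLIADIN_33MER = "LQLQPFPQPQLPYPQPQLPYPQPQLPYPQPQPF"


def residue_percentages(sequence: str) -> dict[str, float]:
    """Percentage of each amino acid, counted by number of residues (not by mass)."""
    counts = Counter(sequence.upper())
    total = sum(counts.values())
    return {aa: 100.0 * n / total for aa, n in counts.items()}


composition = residue_percentages(GLIADIN_33MER)
for aa in ("Q", "P", "K", "R"):
    # Amino acids absent from the sequence are reported as 0.0%.
    print(f"{aa}: {composition.get(aa, 0.0):.1f}%")
```

In this fragment, glutamine and proline together account for roughly 70% of the residues, and lysine and arginine are absent.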
Breeding and genetic engineering have been successfully applied to increase the content of essential amino acids, as in high-lysine barley and corn. These approaches could also be used to develop celiac-safe wheat, but this remains a formidable challenge because of the complex multigenic control of gluten protein composition and the need to retain acceptable technological properties for bread and pasta-making [19,20].
Cereals contain variable levels of Osborne's fractions (albumins, globulins, prolamins, and glutelins). The amino acid composition of prolamins can be correlated with the botanical genealogy of cereals: wheat, rye, and barley belong to the tribe Triticeae and oats to the tribe Aveneae [21]. The amino acid composition is similar in wheat, rye, and barley, whereas oat prolamins are intermediate between those of the Triticeae and those of other cereals. The glutamine content of oat prolamins is similar to that of the Triticeae, while their proline and leucine contents are lower and higher, respectively, than those of the Triticeae [21].
Gliadins are the monomeric proteins of wheat gluten, with molecular weights (MW) ranging from 30 to 75 kDa. They are grouped into α/β-gliadins, γ-gliadins, and ω-gliadins on the basis of their electrophoretic mobility and structural similarity. Like the other cereal prolamins, they are all soluble in alcohol, the defining characteristic of this group [22]. The α/β- and γ-gliadins (30-60 kDa) are smaller than the ω-gliadins (up to 75 kDa) [2]. The α/β- and γ-gliadins have very similar primary sequences, consisting of an N-terminal domain of repetitive P/Q-rich sequences of 7-11 amino acids and a homologous C-terminal domain containing 6-8 cysteines that form intrachain disulfide bonds [17]. The ω-gliadins have the highest levels of proline and glutamine, with repetitive sequences of 8-10 amino acids rich in these residues.
Wheat glutenins are a heterogeneous mixture of high-molecular-weight polypeptides whose polymers can reach up to 1 million Da, making them among the largest proteins found in nature [23]. Depending on their degree of polymerization, these polymers remain insoluble even in denaturing buffers such as SDS, which makes their solubilization difficult. Glutenin polymers are formed by monomeric glutenin subunits (GS), classified according to MW and stabilized by interchain disulfide bonds. High-molecular-weight glutenin subunits (HMW-GSs) have MW between 65 and 90 kDa and are subdivided into x-type and y-type, whereas low-molecular-weight glutenin subunits (LMW-GSs) have MW of 30-60 kDa and are subdivided into B, C, and D groups according to their electrophoretic mobility [22,24].
¹ In all cereal flours, glutamic acid occurs almost entirely in its amidated form, glutamine.
In other cereals, the HMW group comprises the HMW secalins of rye and the D-hordeins of barley. These are polymers (glutelins) with around 600-800 amino acid residues, MW of 70-90 kDa, and a high content of glutamine, glycine, and proline, which together account for around 60% of residues [18]. HMW and medium-molecular-weight (MMW) proteins are absent in oats. The MMW group consists of the monomeric ω-secalins and C-hordeins, with 300-400 amino acid residues and MW of around 40 kDa. They are characterized by high contents of glutamine, proline, and phenylalanine, which together account for 80% of residues.
The LMW group includes not only monomers, such as γ-40k-secalins, γ-hordeins, and oat avenins, but also polymers, including γ-75k-secalins and B-hordeins. These proteins have between 200 and 430 amino acid residues, with MW ranging from 23 to 50 kDa. Their amino acid composition is dominated by glutamine and proline, with relatively high levels of the hydrophobic amino acids leucine and valine [25].
Wheat gluten is of great importance in the food industry because it enables dough to retain the carbon dioxide produced during fermentation, so that the dough rises and has good gas-holding properties. Barley and rye flours can also form gluten, because their proteins are similar to gliadins and glutenins. However, the gluten network they form is more fragile, since these proteins are present in smaller amounts than in wheat flour [21]. Owing to the unique viscoelastic characteristics conferred by wheat gluten proteins, wheat flour is an essential ingredient in food production [3].

Celiac disease (CD)
CD is an autoimmune disorder of the small intestine characterized by mucosal degeneration and villous loss, mainly affecting the capacity for nutrient absorption. It is associated with the human leukocyte antigen (HLA) genes HLA-DQ2 and HLA-DQ8, both of which confer predisposition to the disease [26], although 95% of CD patients carry the DQ2 serotype [25]. In predisposed individuals, CD can manifest at any stage of life once exposure to the protein fraction of wheat, barley, or rye has occurred [27].
Diagnosed patients cannot consume foods containing gluten or even traces of it, because a minimal amount can trigger the reaction. Symptoms range from abdominal pain, bloating, and diarrhea to osteoporosis and infertility in the long term. The severity of the reaction may depend on each individual's degree of intolerance [28,29].
Current knowledge of CD pathogenesis relates it to the length and amino acid composition of the peptides generated during gastrointestinal digestion of gluten proteins [20]. Gluten proteins contain few lysine and arginine residues, which are the cleavage sites of trypsin, and their high proline content also hinders cleavage by other proteases such as chymotrypsin and pepsin, so proteolysis is largely ineffective. Because they resist cleavage, these long proline- and glutamine-rich polypeptides persist and act as mediators of immune reactions in the intestinal epithelial cells of predisposed subjects [25].
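This resistance to proteolysis can be illustrated with a simplified in silico digestion. The Python sketch below assumes textbook cleavage rules: trypsin cuts after K or R, and chymotrypsin after F, Y, or W, with neither enzyme cutting before P. It ignores missed cleavages and does not model pepsin, and the names are hypothetical. When the sketch is applied to the α-gliadin 33-mer, neither enzyme finds a usable site, because the sequence contains no K or R and every aromatic residue is followed by proline.

```python
import re

# Simplified specificity rules (assumption): cleave C-terminal to the listed
# residues unless the next residue is proline.
CLEAVAGE_RULES = {
    "trypsin": r"(?<=[KR])(?!P)",
    "chymotrypsin": r"(?<=[FYW])(?!P)",
}

GLIADIN_33MER = "LQLQPFPQPQLPYPQPQLPYPQPQLPYPQPQPF"


def digest(sequence: str, enzyme: str) -> list[str]:
    """Split a sequence at every site allowed by the enzyme rule (no missed cleavages)."""
    # re.split accepts zero-width patterns (Python 3.7+); an empty piece appears
    # only when a site falls at the very end of the sequence, so it is dropped.
    return [piece for piece in re.split(CLEAVAGE_RULES[enzyme], sequence) if piece]


for enzyme in CLEAVAGE_RULES:
    print(enzyme, digest(GLIADIN_33MER, enzyme))

# A hypothetical K/R-containing sequence, for comparison.
print(digest("AAKGGRPCCKDD", "trypsin"))  # ['AAK', 'GGRPCCK', 'DD']: no cut after R because P follows
```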
The most celiac-active T-cell epitopes are found in the α-gliadins, but T-cell epitopes derived from γ- or ω-gliadins, as well as from HMW- and LMW-GSs, have also been reported [19,30]. T-cell epitopes from hordeins and secalins have been described too, which can be explained by their high homology with wheat proteins [30]. While wheat, rye, and barley have been proven to harm CD patients, the safety of oat consumption by CD patients is still debated.
There is controversy about the reactivity of oat gluten, since only a small number of celiac patients have been shown to be affected by oat consumption [28,31]. Recent reports suggest that most celiac patients tolerate oats, including during long-term feeding [32,33]. Although some authors consider oats a gluten-free cereal, the main problem is the risk of cross-contamination by gluten-containing cereals during harvest, milling, or industrial processing [5,34,35]. For this reason, oats cannot be completely ruled out as a CD trigger, and their consumption by celiac patients is still considered unsafe [36,37].
In 2008, the Codex Alimentarius proposed an international labeling standard under which products labeled "gluten-free" must not exceed 20 ppm of wheat, barley, or rye gluten, which corresponds to 1 mg of gluten in 50 g of food [38]. The maximum amount of gluten tolerated by celiac patients is not completely known, because gluten reactivity varies among species and individual sensitivity is unpredictable. However, several studies have indicated that 10 mg of gluten per day is well tolerated, while damage to the intestinal mucosa has been observed with doses of around 50 mg (as reviewed in Ref. [39]).
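The correspondence between the 20 ppm limit and 1 mg in 50 g follows directly from 1 ppm = 1 mg/kg. The small Python helper below, whose name is hypothetical, makes this arithmetic explicit. The 300 g/day intake of gluten-free products used in the second call is an assumed figure, chosen only to compare against the ~10 mg/day tolerance mentioned above.

```python
def gluten_mg(concentration_ppm: float, food_g: float) -> float:
    """Gluten mass in mg: 1 ppm = 1 mg per kg of food, and food_g / 1000 converts g to kg."""
    return concentration_ppm * food_g / 1000.0


print(gluten_mg(20, 50))   # 1.0 mg, the Codex example
print(gluten_mg(20, 300))  # 6.0 mg for an assumed 300 g/day at the 20 ppm limit
```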
The differences in the amino acid composition of the prolamins and glutelins of each cereal have been proposed as responsible for their different reactivity in CD [11,21]. Compared with other cereals, grains of the tribe Triticeae (wheat, barley, and rye) contain significantly higher levels of glutamine and proline, the amino acids mainly responsible for triggering the immune response in CD [25]. A direct correlation between the immunogenicity of different oat varieties and the presence of specific peptides with differential reactivity has been proposed to explain the wide variation in potential immunotoxicity among oat cultivars [40].
Triticum species exhibit considerable genetic variability, resulting in different toxicities, which could be exploited to obtain varieties suitable for consumption by celiac patients [19,41,42]. Higher levels of CD-related immunogenic peptides were found in a modern Canadian wheat than in old varieties of common wheat and tetraploid wheat [43]. Despite the importance of genotypic variation among species and cultivars, knowledge of CD, especially of the structure of the allergens and their immunoreactive epitopes, is still incomplete and requires further research.

Gluten detection techniques
Several methods have been developed to guarantee the safety of foods labeled or expected to be gluten-free for celiac patients. However, there is no consensus on the most appropriate analytical method to identify and quantify gluten in foods [37]. The main methods detect either DNA sequences or proteins, for example by enzyme-linked immunosorbent assay (ELISA) and polyacrylamide gel electrophoresis (PAGE), or, more recently, digested peptides by liquid chromatography and mass spectrometry (LC-MS).
These methods differ widely, especially in sensitivity, specificity, and cost. Results may also diverge because of food processing (heat or hydrolysis steps), matrix type, polymorphic variants of wheat, rye, and barley, the type of extraction, and possible cross-reaction with other prolamins.

Enzyme-linked immunosorbent assay (ELISA)
Currently, ELISA is the most common and recognized approach for gluten detection, because it is inexpensive, easy to perform, and fast. It is the technique recommended by the Codex Alimentarius for detecting gluten in industrialized foods [44]. It is based on the immunological reaction between known toxic peptides of gluten proteins and mono- or polyclonal antibodies.
There are two variants of the method: sandwich R5 ELISA and competitive R5 ELISA. In sandwich ELISA, the antigens in the sample are captured by an antibody to form an antibody-antigen complex. A labeled antibody is then added that binds to another epitope of the same antigen, so that the antigen is held between two layers of antibodies. This format requires at least two antibody binding sites (epitopes) on the antigen and is therefore only suitable for quantifying large peptides or intact proteins; it cannot detect partially hydrolyzed gluten (e.g., in fermented foods).
Competitive ELISA requires only one epitope and is indicated for detecting smaller antigens, such as those present in partially degraded gluten. In this format, the antigen in the sample and an added reference antigen compete for a limited number of antibody binding sites, so the signal decreases as the concentration of sample antigen increases. When reference proteins are available, quantification is performed with calibration curves built from them [45].
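Calibration curves in competitive immunoassays are commonly fitted with a four-parameter logistic (4PL) model, in which the signal falls as analyte concentration rises. The Python sketch below assumes a 4PL model with already-fitted, hypothetical parameters; in practice, these would be estimated from the reference-protein standards. It inverts the curve to read an unknown sample's concentration from its signal, in the units of the standards. It does not reproduce the procedure of any particular commercial kit.

```python
from dataclasses import dataclass


@dataclass
class FourPL:
    """y = d + (a - d) / (1 + (x / c) ** b).

    a: signal at zero analyte; d: signal at saturating analyte;
    c: concentration at the curve midpoint; b: slope factor.
    In a competitive ELISA a > d, so the signal decreases with concentration.
    """
    a: float
    b: float
    c: float
    d: float

    def signal(self, x: float) -> float:
        return self.d + (self.a - self.d) / (1.0 + (x / self.c) ** self.b)

    def concentration(self, y: float) -> float:
        """Invert the curve; only valid strictly between the two asymptotes."""
        if not min(self.a, self.d) < y < max(self.a, self.d):
            raise ValueError("signal outside the calibrated range")
        return self.c * ((self.a - self.d) / (y - self.d) - 1.0) ** (1.0 / self.b)


# Hypothetical fitted parameters (absorbance units, concentration in ng/mL).
curve = FourPL(a=2.0, b=1.2, c=10.0, d=0.1)
sample_signal = 0.8
print(f"{curve.concentration(sample_signal):.2f} ng/mL")
```

Signals outside the asymptotes are rejected rather than extrapolated, mirroring the usual practice of diluting and re-measuring samples that fall outside the calibrated range.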
ELISA has been successfully applied in some studies to detect wheat, barley, and rye contamination, with the results confirmed by MS and PCR [34,35]. However, measurements with commercial ELISA kits are inconsistent and require standardization, because of the lack of certified reference material and the diversity of kits, which use different test conditions [7,46].
Current methods rely on antibodies that are not fully accurate and may give false-negative results. These antibodies were developed specifically to detect gliadins and are therefore not suitable for all classes of gluten proteins, especially in matrices that are difficult to analyze [7,47-49]. The accuracy of ELISA is further compromised because the result is converted into gluten by multiplying by two, which assumes a constant gliadin/glutenin ratio of 1:1. Moreover, current methods cannot distinguish the cereal source (wheat, barley, rye) or cultivar [50,51].
The development of standardized gluten reference material would represent significant progress toward the accurate analysis of gluten at low levels. However, this is a challenging task because of the polymorphism of gluten proteins, which vary from sample to sample [7,46,48]. Previous work comparing LC-MS and ELISA found no correlation between ELISA results and the relative peptide content determined by MS [48]. The authors concluded that ELISA methods are no longer sufficient for gluten quantification and should eventually be replaced by MS-based methods.
In this context, MS-based methods have been proposed as an alternative for gluten quantification, since they can detect specific peptides comprehensively and with good sensitivity and precision, owing to their high-throughput data analysis capacity [10,46]. A growing number of MS-based approaches have been developed, offering great potential in this area [9,37,46,52].
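The effect of the fixed conversion factor can be made concrete with a few lines of Python. Like the conventional conversion, the sketch assumes that gluten = 2 × gliadin. It then evaluates the relative error when the gliadin share of gluten in a sample differs from 50%. The shares used below are hypothetical values chosen only for illustration.

```python
def gluten_from_gliadin(gliadin: float) -> float:
    """Conventional ELISA conversion: gluten = 2 x gliadin (assumes gliadin:glutenin = 1:1)."""
    return 2.0 * gliadin


def relative_error(gliadin_share: float) -> float:
    """Relative error of the x2 conversion when gliadins make up `gliadin_share` of the true gluten."""
    true_gluten = 1.0  # any positive amount gives the same relative error
    estimate = gluten_from_gliadin(gliadin_share * true_gluten)
    return (estimate - true_gluten) / true_gluten  # equals 2 * gliadin_share - 1


for share in (0.4, 0.5, 0.65):  # hypothetical gliadin shares
    print(f"gliadin share {share:.0%}: error {relative_error(share):+.0%}")
```

The error grows linearly with the departure from a 50% gliadin share. Samples with an unusually high gliadin content are therefore overestimated, and those with a low gliadin content are underestimated, by a proportional amount.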

Proteomic tools for gluten detection
Proteomics is the large-scale analysis of the set of proteins encoded by the genome, which control almost all biological processes in a particular biological system at a given time.
Proteomics includes not only the structural and functional characterization of proteins but also the study of their modifications, interactions, localization, and quantification. The proteome of an organism is dynamic: it reflects the momentary response of its cells to specific stimuli. As a result, a single genome can give rise to a practically unlimited number of different proteomes [53].
In practice, proteomics is most often applied to the analysis of target proteins rather than entire proteomes [53]. In food analysis, proteomics has become a key tool for characterizing and quantifying proteins and peptides, especially for the evaluation of biological markers [54].
The protein composition of cereals varies among species and varieties, which creates methodological difficulties for food allergen analysis and for the selection of genotypes. The high similarity of the amino acid sequences of the different prolamins, together with the limitations of the available methodologies, makes it difficult to identify the allergens and immunoreactive epitopes related to CD exactly, as well as their genotypic frequency, variability, and stability [55].
In this context, proteomic approaches based on reliable and sensitive techniques such as high-resolution LC-MS have proven to be important tools for the identification, quantification, and discrimination of gluten proteins, since they rely on the accurate molecular mass of peptide biomarkers.
In recent years, MS techniques have overcome some limitations of antibody-based methods, such as cross-reactivity and the inability to discriminate among gluten protein sources, which MS can achieve in a single run [46]. Recently, label-free MS experiments have been refined to quantify CD epitopes specifically [43].
This type of research is very important, since accurate quantification and identification of the cereal source and protein type of a contamination are critical to the health and well-being of celiac patients [8]. Furthermore, food products labeled "gluten-free" have been found to contain gluten protein fractions above the acceptable limit (20 ppm) [56].
One of the major contributions of proteomics to gluten-related diseases, especially CD, has been the identification of the epitope sequences of gluten peptides with known immunogenic action. A number of gluten T-cell epitopes restricted by CD-associated HLA-DQ molecules have been characterized over the last few years, and a compiled list of gluten peptide epitopes able to activate the immune system has been proposed (Table 1) (as reviewed in Ref. [30]). Notably, the identified sequences derive not only from prolamins but also from glutelins. A website dedicated to these epitopes was created to keep the list updated, but it has received no recent additions [30]. More recently, a database built from in silico results (ProPepper™) was proposed to assist the identification of epitopes, peptides, and prolamins associated with CD and other wheat- and cereal-related disorders [55]. This database contains sequences of specific peptides obtained by in silico digestion of prolamins available in public databases (UniProtKB, NCBI GenBank) and currently holds 37,914 peptides and 833 epitopes.
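At its simplest, screening a sequence against an epitope list such as the one in Table 1 amounts to exact substring matching. The Python sketch below finds every occurrence of each motif, including overlapping ones. For illustration, it uses the 33-mer sequence and the native (non-deamidated) forms of three α-gliadin epitopes commonly designated α-I, α-II, and α-III. These are examples only, not a reproduction of Table 1. Real screening also has to consider sequence variants, which this sketch ignores.

```python
def find_epitopes(sequence: str, epitopes: dict[str, str]) -> dict[str, list[int]]:
    """Return the 0-based start position of every exact, possibly overlapping, match of each epitope."""
    hits: dict[str, list[int]] = {}
    for name, motif in epitopes.items():
        hits[name] = [
            i
            for i in range(len(sequence) - len(motif) + 1)
            if sequence[i : i + len(motif)] == motif
        ]
    return hits


# Native forms of three alpha-gliadin epitopes (illustrative subset only).
EPITOPES = {
    "alpha-I": "PFPQPQLPY",
    "alpha-II": "PQPQLPYPQ",
    "alpha-III": "PYPQPQLPY",
}
GLIADIN_33MER = "LQLQPFPQPQLPYPQPQLPYPQPQLPYPQPQPF"

print(find_epitopes(GLIADIN_33MER, EPITOPES))
# {'alpha-I': [4], 'alpha-II': [6, 13, 20], 'alpha-III': [11, 18]}
```

Six overlapping matches fall within this single 33-residue fragment.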

Liquid chromatography coupled with mass spectrometry (LC-MS)
LC-MS is an analytical technique that couples chromatographic separation, based on the differential interaction of the components of a mixture with the stationary and mobile phases, with mass spectrometry, in which molecules are ionized and the resulting ions are separated and detected according to their mass-to-charge ratio (m/z) [57].
Nowadays, tandem designs (also referred to as MS/MS) make up most of the instruments in research laboratories. In this configuration, selected precursor ions are fragmented by applying high energy, and the fragment spectra are used to elucidate the amino acid sequence, allowing sequences that differ by a single amino acid to be confirmed and identified [53,58]. LC-MS/MS is considered a gold standard for the analysis of biomolecules in complex samples because of its high sensitivity and specificity, and it has been used in food analysis and forensic science [59][60][61].
The main current strategies for identifying gluten markers combine discovery (shotgun) and targeted proteomic approaches. Typically, gluten proteins are first fractionated by RP-HPLC or SE-HPLC. The resulting protein fractions are then digested with multiple enzymes, and the digests are measured by high-resolution MS or MS/MS [7][8][9]. Candidate gluten marker peptides can be identified by comparing experimental results (e.g., de novo peptide sequencing) with theoretical (in silico) data from protein databases (NCBI, UniProtKB) or from databases of cereal prolamin epitopes involved in CD pathogenesis [55].
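At the precursor level, comparing experimental and theoretical data reduces to matching measured masses against in silico peptide masses within a tolerance expressed in parts per million (ppm). The Python sketch below assumes neutral monoisotopic masses and a 5 ppm tolerance, and it uses hypothetical peptide identifiers and masses. Real search engines also score fragment spectra and control false discoveries, which this sketch does not attempt.

```python
def ppm_error(observed: float, theoretical: float) -> float:
    """Mass error in parts per million relative to the theoretical mass."""
    return (observed - theoretical) / theoretical * 1e6


def match_masses(
    observed: list[float], theoretical: dict[str, float], tol_ppm: float = 5.0
) -> list[tuple[float, str, float]]:
    """Return (observed mass, peptide id, ppm error) for every candidate within tol_ppm."""
    matches = []
    for mass in observed:
        for peptide_id, theo in theoretical.items():
            error = ppm_error(mass, theo)
            if abs(error) <= tol_ppm:
                matches.append((mass, peptide_id, round(error, 2)))
    return matches


# Hypothetical in silico masses (Da) and measured masses.
in_silico = {"pep_A": 1000.5000, "pep_B": 1500.7000}
measured = [1000.5030, 1200.0000]
print(match_masses(measured, in_silico))
```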
For the targeted analysis of selected gluten markers, the most widely used MS technique is selected or multiple reaction monitoring (SRM or MRM), which is particularly suited to quantification, even at trace levels. MRM uses a triple quadrupole (QqQ) mass spectrometer. The precursor ions are selected and focused in the first quadrupole (Q1). The second quadrupole (q2) is actually a collision cell, into which a collision gas (usually argon) is introduced to fragment the ions. The third quadrupole (Q3) acts as a mass filter that transmits only the fragments of defined m/z generated in the collision cell [62].
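An MRM method is defined by a list of transitions, each pairing a precursor m/z filtered in Q1 with a fragment m/z filtered in Q3. The Python sketch below computes candidate transitions for one peptide from standard monoisotopic residue masses, assuming doubly charged precursors and singly charged y ions. The peptide is an arbitrary proline- and glutamine-rich example, not a validated gluten marker. Keeping only y3 and longer fragments is an illustrative assumption; in practice, transitions are selected and optimized experimentally.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O added to the residue sum for the intact peptide
PROTON = 1.007276   # mass of a proton


def neutral_mass(peptide: str) -> float:
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def precursor_mz(peptide: str, charge: int) -> float:
    """m/z of the [M + zH]z+ ion selected in Q1."""
    return (neutral_mass(peptide) + charge * PROTON) / charge


def y_ions(peptide: str) -> dict[str, float]:
    """Singly charged y ions (C-terminal fragments) y1 .. y(n-1)."""
    return {
        f"y{i}": sum(RESIDUE_MASS[aa] for aa in peptide[-i:]) + WATER + PROTON
        for i in range(1, len(peptide))
    }


def transitions(peptide: str, charge: int = 2, min_length: int = 3) -> list[tuple[float, str, float]]:
    """(Q1 m/z, fragment label, Q3 m/z) for y ions of at least min_length residues."""
    q1 = precursor_mz(peptide, charge)
    return [
        (round(q1, 4), label, round(mz, 4))
        for label, mz in y_ions(peptide).items()
        if int(label[1:]) >= min_length
    ]


for q1, label, q3 in transitions("LQPFPQPQ"):  # arbitrary example peptide
    print(f"Q1 {q1:.4f} -> Q3 {q3:.4f} ({label})")
```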
In a recent study, gluten peptide markers in beers were evaluated by MRM [48]. The authors demonstrated the superiority of LC-MS over ELISA for quantifying low levels of gluten peptides. Because MS quantification relies on specific, unique peptides, it allows individual hordein isoforms to be quantified.
Seeking more reliable results for celiac patients, other studies have defined gluten-specific peptides in an attempt to validate MS as a high-sensitivity analytical method for gluten detection. Fiedler et al. [9] used MS to identify grain-specific peptide markers for wheat, barley, rye, and oats and to assess gluten contamination in various types of commercial flours. Martinez-Esteso et al. [7] identified a set of unique wheat gluten peptides and proposed their use as markers for the presence of gluten related to CD. These authors also argued that this strategy can be applied to other food allergens and may be considered the first step toward developing certified reference materials and defining a new methodology, more sensitive than ELISA, for detecting gluten in foods.
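Conceptually, a grain-specific marker is a peptide observed (or predicted) in one cereal and in none of the others. The Python sketch below expresses this definition as set differences. The peptide identifiers are hypothetical placeholders, and real marker selection involves additional criteria, such as consistent detectability across cultivars.

```python
def grain_specific(peptides_by_grain: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each grain, keep the peptides that appear in no other grain."""
    specific = {}
    for grain, peptides in peptides_by_grain.items():
        others = set().union(*(p for g, p in peptides_by_grain.items() if g != grain))
        specific[grain] = peptides - others
    return specific


# Hypothetical peptide identifiers per grain.
observed = {
    "wheat": {"pep1", "pep2", "pep3"},
    "barley": {"pep2", "pep4"},
    "rye": {"pep3", "pep5"},
    "oats": {"pep6"},
}
print(grain_specific(observed))
```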
For complex samples such as gluten proteins, multiplexed acquisition methods, known as data-independent acquisition (DIA), of which MS^E is one example, record fragment data for all ions and minimize data loss (e.g., precursors that are never fragmented) [10,63]. In MS^E, all ions generated in the ionization source are transmitted to the collision cell, which alternates between low and high collision energy (high-energy ramp of ca. 15 to 55 eV). The TOF analyzer therefore receives peptide precursors and their fragments in quasi-simultaneous alternating scans [64].
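Because precursors and fragments are acquired in alternating scans rather than as isolated precursor spectra, DIA/MS^E data must be deconvoluted. One way to do this is to associate each fragment with the precursors whose chromatographic apex occurs at about the same retention time. The Python sketch below illustrates only this time-alignment idea, using a hypothetical tolerance and made-up retention times. Processing software uses considerably more elaborate criteria, such as peak-shape correlation.

```python
def group_fragments(
    precursors: dict[str, float],
    fragments: list[tuple[float, float]],
    tol_min: float = 0.05,
) -> dict[str, list[float]]:
    """Assign each fragment (m/z, apex RT in min) from high-energy scans to every
    precursor (id -> apex RT in min) from low-energy scans within tol_min."""
    groups: dict[str, list[float]] = {pid: [] for pid in precursors}
    for frag_mz, frag_rt in fragments:
        for pid, prec_rt in precursors.items():
            if abs(frag_rt - prec_rt) <= tol_min:
                groups[pid].append(frag_mz)
    return groups


# Made-up apex retention times (min) and fragment m/z values.
precursors = {"precursor_1": 12.30, "precursor_2": 15.10}
fragments = [(300.1, 12.31), (450.2, 15.12), (600.3, 20.00)]
print(group_fragments(precursors, fragments))
```

A fragment that co-elutes with no precursor (here, m/z 600.3) is left unassigned, and a fragment close to two apexes would be assigned to both, which is one reason real software adds further criteria.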
Modern technologies can overcome the cross-reactivity problems of antibody recognition, which are particularly challenging in gluten analysis because of the high homology between different prolamins. To analyze highly homologous primary structures consistently, peptides can additionally be separated by ion-mobility spectrometry (IMS). IMS is an orthogonal separation technique that adds a drift-time (dt) dimension to each m/z value. The dt is the time an ion takes to cross the ion-mobility cell, which is filled with an inert gas, and it allows collision cross sections to be determined [65].
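In a uniform-field drift tube, the relation between drift time and mobility is simple. The drift velocity L/t_d equals K·E, with E = V/L, so K = L²/(V·t_d). This value is then normalized to standard temperature and pressure to give the reduced mobility K0. The Python sketch below encodes these two relations using hypothetical instrument values. Travelling-wave cells, used in many MS^E instruments, do not follow this direct relation and require calibration with standards of known collision cross section.

```python
def mobility(drift_length_cm: float, voltage_v: float, drift_time_s: float) -> float:
    """Ion mobility K (cm^2 V^-1 s^-1) in a uniform-field drift tube: K = L^2 / (V * t_d)."""
    return drift_length_cm**2 / (voltage_v * drift_time_s)


def reduced_mobility(k: float, temperature_k: float, pressure_torr: float) -> float:
    """Normalize K to 273.15 K and 760 Torr: K0 = K * (273.15 / T) * (P / 760)."""
    return k * (273.15 / temperature_k) * (pressure_torr / 760.0)


# Hypothetical drift-tube settings and measured drift time.
k = mobility(drift_length_cm=78.0, voltage_v=1500.0, drift_time_s=0.030)
k0 = reduced_mobility(k, temperature_k=300.0, pressure_torr=3.95)
print(f"K = {k:.1f} cm^2/(V s), K0 = {k0:.3f} cm^2/(V s)")
```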
Integrating IMS into MS^E workflows provides an additional dimension of separation, increasing peak capacity while reducing chimeric and composite interferences, since ions can be distinguished by size, shape, and charge in addition to m/z [66]. MS^E can also provide absolute quantification by comparing analyte signals with the response of a known amount of internal standard spiked into the sample [10]. Developing MS^E methods to quantify gluten peptides could advance the understanding of natural variability in the expression of clinically relevant wheat grain allergens. Proteomic methods for gluten quantification should support not only regulatory limits in processed foods but also the safety of consumers of foods labeled as gluten-free.
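In its simplest single-point form, internal-standard quantification scales the known spiked amount by the ratio of analyte signal to standard signal. The Python sketch below assumes an equal response factor for analyte and standard unless a different one is supplied, and it uses hypothetical peak areas and amounts. It does not reproduce any specific MS^E quantification workflow.

```python
def quantify(analyte_area: float, standard_area: float, standard_amount: float,
             response_factor: float = 1.0) -> float:
    """Analyte amount = (A_analyte / A_standard) * n_standard / RF.

    RF is the analyte response relative to the standard (1.0 assumes equal response).
    The result has the same units as standard_amount.
    """
    if standard_area <= 0 or response_factor <= 0:
        raise ValueError("standard area and response factor must be positive")
    return analyte_area / standard_area * standard_amount / response_factor


# Hypothetical peak areas with 50 fmol of internal standard spiked in.
print(quantify(analyte_area=2.0e6, standard_area=1.0e6, standard_amount=50.0))  # 100.0 (fmol)
```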