Protein homology of Bet v 1.
Bet v 1 is a highly immunogenic protein that is the main cause of sensitivity to birch pollen and is described as the major birch allergen. Despite their structural similarity, Bet v 1 homologs show different properties and immunoreactivity. Here, bioinformatic algorithms were applied to known Bet v 1 homologous nucleic acid sequences to find homology and conserved regions. Genomic sequences of PR proteins from two different fruit species, whose allergens belong to the same PR protein type as Bet v 1, were selected to design degenerate primers. Subsequently, 45 clinically relevant plant species were screened for the presence of the conserved Bet v 1 genomic sequence.
- genomic sequences
- Bet v 1
- conserved region
- degenerate primers
- PCR screening
The major birch pollen allergen has been well characterized at the genomic level for a long time. Over the last 30 years, many different Bet v 1 homologs have been cloned, and many of their products have been characterized from the allergenic point of view. In recent decades, molecular profiling of allergic sensitization has helped to elucidate the immunological basis of allergen cross-reactivity, whereas advances in biochemistry have revealed structural and functional aspects of allergenic proteins. Based on sequence similarity, the Bet v 1 family has been divided into three subfamilies. The best characterized member is the major birch allergen Bet v 1, first identified in Betula verrucosa. Bet v 1-like proteins are reported to be common in vascular plants. The first subfamily, pathogenesis-related protein family 10 (PR-10), is expressed in connection with pathogen attack or abiotic stress. The highest concentrations of PR-10 proteins were found in reproductive tissues (pollen, seeds and fruits), and these proteins have been described as highly similar to human lipocalin 2. Birch Bet v 1 and human lipocalin 2 possess specific structures that allow them to bind iron. Bet v 1 acts as an allergen when it does not bind iron, which subsequently affects Th2 cells of the human immune system. The ribonuclease activity of PR-10 proteins is thought to be linked to their function in the antiviral pathway. The other subfamilies of Bet v 1-like proteins are the major latex proteins, reported in the latex of opium poppy, and the ripening-related proteins [6, 7]. The last of these reportedly contains members with S-norcoclaurine synthase activity that are involved in alkaloid biosynthesis.
Bet v 1 belongs to the panallergens, specifically to the PR-10 proteins. The IgE-binding sites of Bet v 1 arise from the folding of the protein chain, which brings into close proximity residues that lie far apart in the stretched (linear) chain. Because heat disrupts this conformation, Bet v 1 is defined as thermolabile, and its homologs in fruit become nonallergenic after cooking or heat processing. Different denaturation temperatures exist for the individual isolated homologs and their isoforms. Besides temperature, pH and other thermodynamic and physicochemical factors also have a denaturing effect. In general, PR-10 proteins are more labile than most other food allergens. Naturally occurring Bet v 1 consists of several isoforms with a molecular weight of about 17.5 kDa. These isoforms share a high percentage of sequence identity but may have very different allergenic potential [11, 12]. There are currently more than 20 isoforms listed on the IUIS Allergen Nomenclature subcommittee website (
Cross-reactivity due to Bet v 1 homology is referred to as the birch-fruit-vegetable-nut syndrome. Plants from the families Rosaceae and Apiaceae are most commonly involved. Similarities in amino acid sequences have been found in different plants and foods [20, 21], but fruit homologs predominate (Table 1). Most often, these allergens are located in the fruit pulp. With respect to homology to the major birch allergen Bet v 1, in areas where birch is uncommon, for example in southern Europe, sensitivity to Bet v 1 homologs is associated with allergens of related trees such as alder, hazel and beech, and with grass allergens. Pomegranate, sweet chestnut, raspberries and spices may also be mentioned. Hrubiško et al. reported cross-reactivity of birch pollen with walnut, almond, avocado, cherry, plum, peas and asparagus.
|Plant||Bet v 1 homolog||Protein similarity|
|Apple||Mal d 1||56–63%|
|Hazelnut||Cor a 1||67%|
|Peach||Pru p 1||70–73%|
|Kiwi||Act d 8||53%|
|Carrot||Dau c 1||37%|
|Apricot||Pru ar 1||56%|
|Cherry||Pru av 1||59–70%|
|Pear||Pyr c 1||57%|
|Peanuts||Ara h 8||46%|
|Celery||Api g 1||41%|
|Soy||Gly m 4||45%|
|Strawberries||Fra a 1||53%|
|Raspberries||Rub i 1||55%|
Silver birch is native to most of Europe, northwest Africa and western Siberia, but absent in the southern parts of Europe. It is the most common tree in Scandinavia and the Alps and a potent pollen producer in those areas. In all of those areas, birch is the most relevant source of spring pollen allergens during the period from March to May (Figure 2).
Atmospheric concentrations of birch pollen grains and of the major birch pollen allergen Bet v 1 were monitored simultaneously across Europe. Bet v 1 was determined with an allergen-specific ELISA. The average European allergen release from birch pollen was 3.2 pg Bet v 1 per pollen grain, and the average allergen release in 2009 did not differ substantially between countries. However, a >10-fold difference in daily allergen release per pollen grain was detected in all countries. Aeropalynological observations in Kiev, carried out with a gravimetric method, have also been reported. The most abundant pollen types were Betulaceae (21%), Chenopodiaceae (10%), Ambrosia (10%), Artemisia (9%), Pinaceae (8%) and Poaceae (6%).
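The per-grain release figure is a ratio of two simultaneous measurements. As a minimal sketch, assuming that the daily release per pollen grain is the airborne Bet v 1 concentration (pg/m³) divided by the airborne pollen count (grains/m³), the mean release and the fold difference between days can be computed as follows. The function names and the daily values are hypothetical and are not data from the cited monitoring study.

```python
from statistics import mean


def release_per_pollen(bet_v1_pg_per_m3: float, pollen_per_m3: float) -> float:
    """Bet v 1 released per pollen grain (pg/grain) for one sampling day.

    Assumption: release = airborne allergen / airborne pollen count.
    """
    if pollen_per_m3 <= 0:
        raise ValueError("pollen count must be positive")
    return bet_v1_pg_per_m3 / pollen_per_m3


def summarize(days: list[tuple[float, float]]) -> tuple[float, float]:
    """Return (mean pg/grain, max/min fold difference) over the sampling days."""
    per_grain = [release_per_pollen(allergen, pollen) for allergen, pollen in days]
    return mean(per_grain), max(per_grain) / min(per_grain)


# Hypothetical daily measurements: (Bet v 1 in pg/m3, pollen grains/m3)
daily = [(640.0, 200.0), (1500.0, 1000.0), (6000.0, 375.0)]
avg, fold = summarize(daily)
print(f"mean release {avg:.2f} pg/grain, {fold:.1f}-fold daily range")
# mean release 6.90 pg/grain, 10.7-fold daily range
```

The fold difference uses the extreme days only, so a single unusual day is enough to produce a >10-fold spread even when the mean is stable.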
A real-time PCR method based on SYBR Green technology was developed to analyze differences in Bet v 1 expression level. Expression of the Bet v 1 allergen gene was analyzed in birch pollen samples collected at various growth sites around Kiev. A sample from a forest site was chosen as the calibrator for the expression analyses. qRT-PCR showed variation in the abundance of allergen transcripts among samples from different growth sites (Figure 3). In samples from urbanized areas, Bet v 1 expression was on average 1.5 times that of the forest calibrator (range 0.77 to 2 times). In samples from the borders of the urbanized area, Bet v 1 expression was only 0.55 times that of the forest sample.
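The expression ratios above are relative to a calibrator sample, but the quantification model is not stated. One common choice for SYBR Green data is the comparative Ct (2^-ΔΔCt) method, sketched below under the assumptions that a reference gene was measured in every sample and that amplification efficiency is close to 100% (a per-cycle factor of 2). The function name and all Ct values are hypothetical.

```python
def relative_expression(
    ct_target: float,
    ct_reference: float,
    ct_target_calibrator: float,
    ct_reference_calibrator: float,
    amplification_factor: float = 2.0,
) -> float:
    """Fold expression of the target gene in a sample relative to the calibrator.

    Comparative Ct method: ratio = E ** -(dCt_sample - dCt_calibrator),
    where dCt = Ct(target) - Ct(reference gene) and E is the per-cycle
    amplification factor (2.0 corresponds to 100% efficiency).
    """
    delta_sample = ct_target - ct_reference
    delta_calibrator = ct_target_calibrator - ct_reference_calibrator
    delta_delta = delta_sample - delta_calibrator
    return amplification_factor ** (-delta_delta)


# Hypothetical Ct values as (Bet v 1, reference gene); forest is the calibrator
forest = (24.0, 18.0)
urban = (23.0, 18.0)   # Bet v 1 crosses threshold one cycle earlier -> 2-fold
border = (25.0, 18.0)  # one cycle later -> 0.5-fold
for name, (ct_t, ct_r) in {"urban": urban, "border": border}.items():
    print(name, relative_expression(ct_t, ct_r, *forest))
# urban 2.0
# border 0.5
```

Under this model a ratio below 1, such as 0.55, means lower expression than in the calibrator, and the calibrator compared with itself always gives exactly 1.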
These findings are interesting in comparison with a report that extracts from pollen collected in urban areas had higher chemotactic activity on human neutrophils than pollen from rural sites, although the allergen content remained unchanged. The exact correlation between expression level and allergenic potential remains to be determined in further research.
Primary genomic data are available for the various Bet v 1 isoforms from birch (Table 2), whereas only limited information exists about their transcriptomic characteristics.
|Bet v 1 variant||GenBank accession||Bet v 1 variant||GenBank accession|
|Bet v 1.0101||X15877; Z80098; Z80099||Bet v 1.0115||Z72438.1|
|Bet v 1.0102||X77266; X77270||Bet v 1.0116||AJ001555.1|
|Bet v 1.0103||X77267||Bet v 1.0117||AJ006908.1|
|Bet v 1.0104||X77268; X77274||Bet v 1.0118||AJ006914.1|
|Bet v 1.0105||X77269||Bet v 1.0119||DQ296603.1|
|Bet v 1.0106||X77271||Bet v 1.0201||X77200|
|Bet v 1.0107||X77273||Bet v 1.0202||X77265|
|Bet v 1.0108||Z80100||Bet v 1.0203||X77272|
|Bet v 1.0109||Z80101||Bet v 1.0204||X81972; X82028|
|Bet v 1.0110||Z80102||Bet v 1.0205||Z72431.1|
|Bet v 1.0111||Z80103||Bet v 1.0206||AJ001556.1|
|Bet v 1.0112||Z80104||Bet v 1.0207||EU526193.1|
|Bet v 1.0113||Z80105||Bet v 1.0301||X77601.1|
|Bet v 1.0114||Z80106||—||—|
Besides Bet v 1, the major allergen component of birch pollen, minor components exist as well, and some of them are also clinically relevant. Allergens with molecular weights of 29.5, 17, 12.5 and 13 kDa have been isolated from birch pollen. Apart from Bet v 1, the following allergens have been characterized: Bet v 2, a 15 kDa profilin; Bet v 3, a 24 kDa calcium-binding protein; Bet v 4, a 9 kDa calcium-binding protein; Bet v 5, a 35 kDa isoflavone reductase-related protein; Bet v 6, a 30–35 kDa phenylcoumaran benzylic ether reductase; Bet v 7, an 18 kDa cyclophilin; and Bet v 11 (
2. Bet v 1 genomics and in silico analysis
From a theoretical point of view, for any protein antigen a similar or analogous protein exists in nature, not only in the plant but also in the animal kingdom. Antigens, particularly those with allergenic potential, provide evidence for this. In practice, such similarity exists for almost every protein and is called homology. Groups of similar proteins are referred to as protein families or superfamilies. There is a huge number of protein families, many of which have been confirmed to have allergenic activity. Homology is the result of a common evolutionary origin: homologous genes can be characterized as two or more genes derived from a common ancestral DNA sequence. When identifying genes in a model species and related species, it is often important to distinguish genes related directly through speciation from genes that have been duplicated independently. These are the two types of homologous genes, orthologs and paralogs, for which many definitions exist. When homology is the result of gene duplication, so that the two copies have descended side by side during the history of the organism (e.g., alpha and beta hemoglobin), the genes are called paralogs (para = parallel, alongside). When homology is the result of speciation, that is, the process of generating species, so that the history of the gene reflects the history of the species (e.g., alpha hemoglobin in both humans and mice), the genes are orthologs (ortho = exact). Orthologs, genes of different species that have evolved from a common ancestral gene, are called "true" homologs. They tend to retain the same function as the ancestral gene during evolution. The identification of orthologs is crucial for a reliable prediction of gene function in newly identified genes. Paralogs are genes related by duplication within a genome.
They may acquire new functions, even when they remain associated with the original function. They diverge from each other within the species. Unlike an ortholog, a paralog is a new gene that may have a new function. After gene duplication, one copy of the gene can accumulate mutations and give rise to a new gene with a new function, although this function is often related to the role of the ancestral gene [27, 28, 29]. Paralogs may result from different types of gene duplication: unequal crossing-over, transposon-mediated duplication or polyploidy, that is, an increase in the number of chromosome sets in the cell nucleus above the normal diploid state. In the molecular systematics of organisms, it is desirable that the studied sequences be homologous in the strict sense, that is, orthologous.
Orthologs exist in genomes as single copies that perform the same function in all organisms examined. Series of evolutionarily conserved genes can also be paralogs: during evolution they arose through one or more duplications, followed by divergence of structure and function and by the loss of some copies. In some organisms (e.g., higher plants), distinguishing orthologs from paralogs is problematic because their genomes have undergone a series of gene duplications and losses of individual gene copies. Gene duplication is understood as the source of new genes with new features, but it does not always fundamentally transform gene function. Duplicated genes often retain a degree of functional overlap that, under certain conditions, can manifest as redundancy.
Several naturally occurring Bet v 1 isoforms are relevant: a, b, c, d, e, f, g, j and l. In the induction of type I allergy, they differ in their reactivity with patients' IgE, and it has been reported that differences in their in vitro and in vivo IgE-binding activity are influenced by six amino acid residues at different positions of the Bet v 1 molecule. Bet v 1 of Betula verrucosa (B. pendula) is also well characterized at the nucleotide level: 47 mRNA sequences of its isoallergens, with different levels of sequence identity, are stored in the NCBI database. A dendrogram of the phylogenetic similarity of the Bet v 1 isoallergen sequences to the Bet v 1 coding sequence X15877.1 is shown in Figure 4.
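The text does not state which clustering method produced the dendrogram in Figure 4. As one illustration of how such a tree can be derived, the sketch below computes uncorrected pairwise p-distances between pre-aligned sequences of equal length and clusters them with UPGMA (average linkage), returning a Newick topology without branch lengths. The choice of UPGMA, the helper names and the short sequences are assumptions made for illustration only.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)


def upgma(sequences: dict[str, str]) -> str:
    """Cluster aligned sequences with UPGMA and return a Newick topology (no branch lengths)."""
    # Each cluster is keyed by its Newick label and stores its number of leaves
    sizes = {name: 1 for name in sequences}
    distances = {
        frozenset((a, b)): p_distance(sequences[a], sequences[b])
        for a, b in combinations(sequences, 2)
    }
    while len(sizes) > 1:
        # Merge the closest pair of clusters
        closest = min(distances, key=distances.get)
        a, b = sorted(closest)
        merged = f"({a},{b})"
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        del distances[closest]
        # Distance to the new cluster = leaf-count-weighted average of old distances
        for other in sizes:
            d_a = distances.pop(frozenset((a, other)))
            d_b = distances.pop(frozenset((b, other)))
            distances[frozenset((merged, other))] = (d_a * size_a + d_b * size_b) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes)) + ";"


# Hypothetical pre-aligned fragments (not real Bet v 1 isoform sequences)
fragments = {
    "iso_a": "ACGTACGT",
    "iso_b": "ACGTACGA",
    "iso_c": "TCGTACCA",
}
print(upgma(fragments))  # ((iso_a,iso_b),iso_c);
```

UPGMA assumes a roughly constant rate of change across lineages; for closely related isoallergens this is a reasonable first approximation, while maximum-likelihood or neighbor-joining trees are common alternatives.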
3. Technical approaches and methodologies for PCR screening of Bet v 1 sequences in plants
Bioinformatics provides interdisciplinary tools that are used to manage and analyze biological data, including known nucleic acid sequences. Many features of nucleic acids can be used as motifs in bioinformatic algorithms to describe their genomic variability and to understand it better. Individual sequence motifs are recognized by their order and nucleotide preference, and many motif discovery algorithms have been used in different molecular and bioinformatic studies [31, 32, 33, 34].
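A motif defined by base order and nucleotide preference can be written with IUPAC ambiguity codes and searched as a regular expression. The sketch below assumes the standard IUPAC nucleotide codes and reports overlapping hits; the W-box-like pattern TTGACY and the test sequence are used only as an example and are not taken from this study.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes
IUPAC_CLASSES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_to_regex(motif: str) -> str:
    """Translate an IUPAC motif such as 'TTGACY' into a regex pattern."""
    return "".join(IUPAC_CLASSES[base] for base in motif.upper())


def find_motif(sequence: str, motif: str) -> list[int]:
    """Return 0-based start positions of all (overlapping) motif hits."""
    # A zero-width lookahead lets consecutive hits overlap
    pattern = re.compile(f"(?={motif_to_regex(motif)})")
    return [m.start() for m in pattern.finditer(sequence.upper())]


print(find_motif("ATTGACCGGTTGACT", "TTGACY"))  # [1, 9]
```

Only the forward strand is scanned here; a full search would repeat it on the reverse complement.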
Here, bioinformatic tools such as BLAST were applied to known Bet v 1 homologous sequences to find homology and conserved regions. The first step was to align the individual isoforms and their variants with each other. First, isoforms that exist in two or three variants in the database were compared with each other, namely Bet v 1.0101, Bet v 1.0102, Bet v 1.0104 and Bet v 1.0204. Table 3 shows the results of the sequence alignment of the variants of the Bet v 1.0101 isoform. All three variants are linear mRNAs, differing only in the number of base pairs. Records Z80099.1 and Z80098.1 have the same number of base pairs. Their overlap and query cover reach 100% and their identity is 99%.
|Name, type of nucleic acid, number of nt||Accession||Query cover||Identity|
|Birch mRNA for pollen allergen BetvI, mRNA linear, 691 bp|
|B. verrucosa mRNA for pollen allergen Betv1 (clone 224), mRNA linear, 483 bp||Z80098.1||Z80099.1||Z80099.1|
|B. verrucosa mRNA for pollen allergen Betv1 (clone 2230), mRNA linear, 483 bp||Z80099.1||x||x|
Bet v 1.0102 can be found in NCBI under the names B. verrucosa Bet v 1d mRNA (linear mRNA, 677 bp) and B. verrucosa Bet v 1h mRNA (also linear mRNA, 677 bp). With the megablast algorithm, they show 100% query cover and 99% identity. Similarly, 100% query cover and 99% identity were found for Bet v 1.0104 (B. verrucosa Bet v 1f mRNA and B. verrucosa Bet v 1i mRNA, both 572 bp). Both NCBI records for Bet v 1.0204 are linear mRNAs: B. verrucosa mRNA for the Bet v 1m isoform has 687 bp, whereas B. verrucosa mRNA for Bet v 1n, an isoform of the birch pollen allergen, has 737 bp. Their overlap is 91% with 99% identity. Because these comparisons showed very high consistency among the variants of each isoform, the variant with the highest number of base pairs was used in the next part of the analysis.
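The query cover and identity values above come from BLAST reports. As a simplified sketch, not BLAST's own implementation, identity can be taken as the share of identical columns in an alignment (gap columns count toward the length), and query cover as the share of query positions spanned by at least one aligned segment. The helper names, alignment and coordinates are hypothetical.

```python
def percent_identity(aligned_query: str, aligned_subject: str) -> float:
    """Identical, non-gap columns as a percentage of the alignment length."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(
        1
        for q, s in zip(aligned_query.upper(), aligned_subject.upper())
        if q == s and q != "-"
    )
    return 100.0 * identical / len(aligned_query)


def query_cover(query_length: int, segments: list[tuple[int, int]]) -> float:
    """Percentage of query positions covered by aligned segments.

    Segments are 1-based, inclusive (start, end) query coordinates;
    overlapping segments are counted once.
    """
    covered: set[int] = set()
    for start, end in segments:
        covered.update(range(start, end + 1))
    return 100.0 * len(covered) / query_length


print(round(percent_identity("ACGT-ACGT", "ACGTTACGA"), 1))  # 77.8
print(query_cover(100, [(1, 50), (41, 80)]))  # 80.0
```

The two measures are independent: two records can share 99% identity over the aligned part while the query cover is only 91%, as for the Bet v 1.0204 variants of different length.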
Using the BLAST algorithm, individual isoforms corresponding to genomic DNA or mRNA sequences were aligned with each other. The following isoforms are DNA sequences: Bet v 1.0115, Bet v 1.0116, Bet v 1.0119, Bet v 1.0205, Bet v 1.0206 and Bet v 1.0207. These accessions differ in the number of base pairs. One exception exists with respect to the source organism: Bet v 1.0207 (EU526193.1) originates from Betula lenta (sweet birch). The remaining aligned sequences originate from Betula pendula (syn. B. verrucosa). As isoforms of one allergen, they are very similar to each other (Table 4).
|NCBI accession||Query cover % / Identity %|
|Bet v 1.0115 (Z72438.1)||100%|
|Bet v 1.0116 (AJ001555.1)||99%|
|Bet v 1.0119 (DQ296603.1)||94%|
|Bet v 1.0205 (Z72431.1)||100%|
|Bet v 1.0206 (AJ001556.1)||99%|
|Bet v 1.0207 (EU526193.1)||94%|
The numbers of nucleotide differences among Bet v 1 isoforms in the conserved part, based on NCBI data, are summarized in Figure 5.
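Counting nucleotide differences among isoforms presupposes that the conserved part has been aligned. A minimal sketch, assuming gap-free aligned sequences of equal length, lists the positions at which each isoform differs from a chosen reference. The function name and sequences are hypothetical fragments, not the Bet v 1 data behind Figure 5.

```python
def differences_to_reference(reference: str, isoforms: dict[str, str]) -> dict[str, list[int]]:
    """Map each isoform name to the 1-based positions where it differs from the reference."""
    reference = reference.upper()
    result: dict[str, list[int]] = {}
    for name, sequence in isoforms.items():
        if len(sequence) != len(reference):
            raise ValueError(f"{name}: not aligned to the reference length")
        result[name] = [
            position
            for position, (ref_base, base) in enumerate(zip(reference, sequence.upper()), start=1)
            if ref_base != base
        ]
    return result


diffs = differences_to_reference(
    "ACGTACGTAC",
    {"isoform_1": "ACGTACGTAC", "isoform_2": "ACTTACGTAA"},
)
for name, positions in diffs.items():
    print(name, len(positions), positions)
# isoform_1 0 []
# isoform_2 2 [3, 10]
```

Keeping the positions, not only the counts, shows whether differences cluster in a region that would be unsuitable for primer placement.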
The design of degenerate primers and their subsequent application in the analysis aim at the basic description and molecular classification of allergens, and at finding correlations between sequence and structural similarities and cross-reactivity among homologous allergens. Genomic knowledge of allergens also helps to define their common properties and will help to clarify factors that may cause allergenic potential in the future. The basic prerequisite for degenerate primer design is an alignment of selected nucleotide sequences. Here, Bet v 1 was used as a model to analyze the functionality of degenerate primers in clinically relevant cross-species screening of genomic sequences of allergens (Table 5).
|High rate cross-reactions||Low rate cross-reactions||Supposed cross-reactions|
|Species||Genomic data||Species||Genomic data||Species||Genomic data|
|Corylus avellana||DNA/mRNA||Carpinus betulus||mRNA||Ulmus spp.||N/A|
|—||—||Fraxinus excelsior||mRNA||Artemisia absinthium||mRNA|
|—||—||Fagus sylvatica||mRNA||Secale cereale||mRNA|
|—||—||Quercus robur||N/A||Triticum aestivum||N/A|
|Malus domestica||DNA/mRNA||Prunus armeniaca||DNA/mRNA||Litchi chinensis||N/A|
|Prunus avium||mRNA||Prunus domestica||mRNA||Mangifera indica||mRNA|
|Prunus persica||mRNA*||Pyrus communis||mRNA||Citrus sinensis||mRNA*|
|Prunus persica v. nucipersica||N/A||Actinidia chinensis||mRNA||Castanea sativa||DNA/mRNA|
|Apium graveolens||mRNA||—||—||Capsicum annuum||mRNA*|
|Daucus carota||mRNA||—||—||Spinacia oleracea||mRNA*|
|Petroselinum crispum||DNA/RNA||—||—||Pastinaca sativa||N/A|
|Corylus avellana||DNA/mRNA||Arachis hypogaea||DNA/mRNA||—||—|
|Juglans regia||mRNA*||Prunus dulcis||DNA/mRNA||—||—|
|—||—||Foeniculum vulgare||N/A||Matricaria recutita||N/A|
|—||—||Coriandrum sativum||N/A||Piper nigrum||N/A|
|Solanum tuberosum||mRNA*||—||—||Glycine max||mRNA*|
Bet v 1 is routinely used as a model PR-10 pollen allergen in various types of research. Genomic sequences of PR proteins from two different fruit species, whose allergens belong to the same PR protein type as Bet v 1, were selected to design degenerate primers and to find a conserved sequence, that is, a base sequence of the DNA molecule that has remained essentially unchanged and thus maintained during evolution. Malus domestica was selected as a typical fruit causing cross-allergy, and Vitis vinifera as a species with a homologous allergenic protein but without high clinical relevance. Based on the alignment analysis reported above, the Bet v 1 sequence covering the promoter region and exons, together with the grapevine and apple PR-10 genes (Table 6), was used in the in silico analysis.
|Accession number in NCBI||Description||Type of nucleic acid||Length|
|AJ289770.1||Betula pendula ypr10b gene, promoter region and exons 1–2.||DNA linear||2687 bp|
|AJ291705.2||Vitis vinifera PR10.1 gene for class 10 pathogenesis-related protein.||DNA linear||1235 bp|
|AF020542.1||Malus domestica major allergen Mal d 1 gene, complete cds.||DNA linear||2253 bp|
These sequences were aligned with BLAST, because conserved sequences can be identified by homology searching with this tool. Specifically, blastn was used for inter-species comparisons; the result is shown in Figure 6, which presents the alignment of sequences AJ289771.1, AJ291705.2 and AF020542.1 to sequence AJ289770.1.
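Here the conserved region was located with blastn. An alternative way to express the same idea on a multiple alignment is to slide a window of the desired length along the alignment and score the share of fully conserved columns. The sketch below assumes pre-aligned sequences of equal length with '-' marking gaps; the function names and sequences are hypothetical, and this is an illustration rather than the procedure used in the study.

```python
def is_conserved(column: list[str]) -> bool:
    """A column is conserved if it has no gap and all sequences share one base."""
    return "-" not in column and len(set(column)) == 1


def best_conserved_window(aligned: list[str], window: int) -> tuple[int, float]:
    """Return (0-based start, fraction of conserved columns) of the best window."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    if not 0 < window <= length:
        raise ValueError("window must be between 1 and the alignment length")
    flags = [is_conserved([seq[i].upper() for seq in aligned]) for i in range(length)]
    # Sliding-window sum: add the column entering the window, drop the one leaving
    score = sum(flags[:window])
    best_start, best_score = 0, score
    for start in range(1, length - window + 1):
        score += flags[start + window - 1] - flags[start - 1]
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score / window


# Hypothetical three-sequence alignment
alignment = ["TACGTA", "AACGTA", "CACGTT"]
print(best_conserved_window(alignment, 4))  # (1, 1.0)
```

For real primer design the window would match the intended amplicon or primer length (e.g., a few hundred columns for the amplicon), and partially conserved columns could be allowed if degenerate bases are acceptable there.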
The design of degenerate primers for optimal PCR amplification should be based on a conserved region approximately 200–500 bp long, and a region of this length was identified in the screened Bet v 1 homologs. A degenerate primer is a mixture of oligonucleotides with slightly different sequences, that is, several possible bases occur at some positions. This extends the range of sequences that can be amplified. Each primer is approximately 20–25 nt long, and the forward and reverse primers must be sufficiently far apart from each other; this condition was also satisfied in the aligned sequences. Based on these results, degenerate primers were designed in this region (Figure 7), yielding an amplicon of approximately 365 bp.
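The degeneracy of a primer is the product of the number of bases allowed at each position, and the amplicon length follows from where the forward primer and the reverse complement of the reverse primer bind. The sketch below illustrates both with standard IUPAC codes. The primers and template are short hypothetical sequences, since the actual primer sequences appear only in Figure 7, and the helper names are illustrative.

```python
from itertools import product
from typing import Optional

# Standard IUPAC nucleotide ambiguity codes and the bases each one allows
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides in a degenerate primer mix."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC_BASES[base])
    return count


def expand(primer: str) -> list[str]:
    """All concrete sequences encoded by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC_BASES[b] for b in primer.upper()))]


def reverse_complement(sequence: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


def amplicon_length(template: str, forward: str, reverse: str) -> Optional[int]:
    """Length from the first forward-primer site to the first downstream
    reverse-primer site (top-strand coordinates), or None if either is absent."""
    template = template.upper()
    forward_sites = [p for p in (template.find(v) for v in expand(forward)) if p >= 0]
    if not forward_sites:
        return None
    start = min(forward_sites)
    # The reverse primer anneals to the top strand as its reverse complement
    reverse_sites = [
        p
        for p in (template.find(reverse_complement(v), start) for v in expand(reverse))
        if p >= 0
    ]
    if not reverse_sites:
        return None
    return min(reverse_sites) + len(reverse) - start


print(degeneracy("ACGRT"), amplicon_length("CCACGGTAAAAGGAACC", "ACGRT", "TTYC"))  # 2 13
```

High degeneracy dilutes each exact variant in the primer mix, which is one reason to place degenerate primers in the most conserved part of the alignment. Expanding all variants is fine for illustration but grows exponentially with the number of ambiguous positions.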
Plant material from clinically relevant Bet v 1 cross-reactive plant species (high-rate and low-rate cross-reactions) and spices was used for the PCR screening analysis. Birch DNA was used as a positive control. Total genomic DNA was extracted following the instructions of the GeneJET Plant Genomic DNA Purification Mini Kit (Thermo Scientific) or NucleoSpin® Food (Macherey-Nagel). A Nanodrop Nanophotometer™ was used for quantity and quality analysis of the extracted DNA. PCR amplifications were performed in a Bio-Rad C1000™ Thermocycler with the following program: initial denaturation at 95°C for 5 min, followed by 40 cycles of 95°C for 45 s, 54°C for 45 s and 72°C for 35 s, and a final step at 72°C for 10 min. The amplified products were inspected by electrophoresis in 1.5% agarose gel in 1× TBE buffer, visualized after GelRed™ staining and photographed under UV light.
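Writing the thermocycler program as data makes its total duration easy to check. The sketch below counts only hold times and ignores ramp rates and lid heating, which depend on the instrument; the Step structure and function name are hypothetical and do not represent a Bio-Rad program file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temperature_c: float
    seconds: int


def hold_time_seconds(initial: list[Step], cycle: list[Step], cycles: int, final: list[Step]) -> int:
    """Total hold time of a PCR program; ramping between temperatures is ignored."""
    return (
        sum(step.seconds for step in initial)
        + cycles * sum(step.seconds for step in cycle)
        + sum(step.seconds for step in final)
    )


# Program as described in the text
initial = [Step(95, 5 * 60)]
cycle = [Step(95, 45), Step(54, 45), Step(72, 35)]
final = [Step(72, 10 * 60)]
total = hold_time_seconds(initial, cycle, 40, final)
print(total, "s =", round(total / 60, 1), "min")  # 5900 s = 98.3 min
```

Each cycle holds for 125 s, so the 40 cycles account for 5000 of the 5900 s; actual run time will be longer by the ramping time.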
With the degenerate primer pair designed on the identified conserved region of Bet v 1 sequences, PCR was positive in all screened plant species except two samples: curry and black pepper spice (Table 7). For curry, only a very weak amplicon was visible in the agarose gel (Figure 8), so further optimization of the degenerate PCR may also give a positive result. For black pepper spice, alternative DNA extraction protocols should be tested.
|High rate cross-reactions||Low rate cross-reactions||Supposed cross-reactions|
|Species||Bet v 1 PCR||Species||Bet v 1 PCR||Species||Bet v 1 PCR|
|M. domestica||+||C. betulus||+||Ulmus spp.||+|
|P. avium||+||F. excelsior||+||A. absinthium||+|
|P. persica||+||F. sylvatica||+||S. cereale||+|
|P.p. v. nucipersica||+||Q. robur||+||T. aestivum||+|
|A. graveolens||+||P. armeniaca||+||P. pratense||+|
|D. carota||+||P. domestica||+||L. perenne||+|
|P. crispum||+||P. communis||+||L. chinensis||N/A|
|C. avellana||+||A. chinensis||+||M. indica||+|
|J. regia||+||Musa spp.||+||C. sinensis||+|
|S. tuberosum||+||A. hypogaea||+||C. sativa||+|
|—||—||P. dulcis||+||C. annuum||+|
|—||—||F. vulgare||+||S. oleracea||+|
|—||—||C. carvi||+||P. sativa||+|
|—||—||P. anisum||+||B. napus||+|
|—||—||C. sativum||+||C. pepo||+|
Sequence homology search algorithms have become commonly used and efficient tools in molecular genetics [39, 40]. Nowadays, so many motif-finding algorithms are available that a comprehensive review of all of them is hardly possible. Each algorithm has its own advantages and disadvantages. One aim of pattern discovery is to find specific motifs in nucleotide or protein sequences in order to better understand their structure and function or to identify them. For allergens, describing existing polymorphism is relevant not only as a static description but, moreover, for its biological and clinical implications. Changes in allergen expression have been reported in response to pollution or abiotic stress [26, 43, 44]. Detailed knowledge has been obtained on the variability of allergenic molecules with respect to the genetic origin of different plant species, such as olive, date palm and apple [45, 46, 47, 48]. In birch, 13 putative Bet v 1 alleles have been characterized, and their occurrence in different cultivars is a matter for future study. Allergen identification has become an integral part of the characterization of many foodstuffs. Research in this area is important not only from the scientific point of view but also because of its impact on health, as the number of people suffering from allergies is increasing.
A variety of allergens from different fruits have been identified using experimental immunology and molecular biology, that is, by sequencing leading to gene and protein identification. Although allergens are typically described in particular plant species, each of them shares a high degree of sequence identity with other proteins of its group. Among the different fruit allergens are the pathogenesis-related (PR) proteins, which are classified into 17 families based on sequence, structure, function and biological activity and are produced in response to different biotic and abiotic stresses. Allergens of individual plant food sources are very well described; their structural details are known, as is their interaction with the immune system of patients. However, our knowledge of the regulation and expression of the genes encoding the known allergenic proteins in plants is very limited. Basic genomic and transcriptomic analyses of allergens will help to understand their natural genomic background in individual plant varieties and will lead to better personalized allergy management in the future.
This work was supported by grant KEGA 007SPU-4/2017 and by the Danube Strategy project DS-2016-0051.
Conflict of interest
The authors declare no conflict of interest.
The authors would like to thank Ing. Beáta Kováčová for her technical assistance and colleagues from the Department of Biochemistry and Biotechnology, namely Dr. Martin Vivodík, for providing DNA of rye and buckwheat for this analysis.