Application of Genomic Data for PCR Screening of Bet v 1 Conserved Sequence in Clinically Relevant Plant Species

Bet v 1 is a highly immunogenic protein that is the main cause of sensitivity to birch pollen and is described as the major birch allergen. Despite their structural similarity, Bet v 1 homologs show different properties and immunoreactivity. Here, bioinformatic algorithms were applied to known Bet v 1 homologous nucleic acid sequences to find homology and conserved regions. Genomic sequences of PR proteins from two different fruit species, whose allergens belong to PR proteins of the same type as Bet v 1, were selected to design degenerate primers. Subsequently, 45 clinically relevant plant species were screened for the presence of the Bet v 1 conserved genomic sequence.


Introduction
Genomic knowledge about the major birch pollen allergen has been available for quite a long time. In the last 30 years, many different homologs of Bet v 1 have been cloned, and many of their products have been characterized from the allergenic point of view. Molecular profiling of allergic sensitization has helped to elucidate the immunological connections of allergen cross-reactivity, whereas advances in biochemistry have revealed structural and functional aspects of allergenic proteins in the last decades [1]. Based on sequence similarity, the Bet v 1 family has been divided into three subfamilies. The most precise characterization exists for the major birch allergen Bet v 1, which was first identified in Betula verrucosa [2]. Bet v 1 homologs are reported to be common in vascular plants. The first class, the pathogenesis-related protein family (PR-10), is expressed in connection with pathogen attack or abiotic stress. The highest concentrations of PR-10 proteins were found in reproductive tissues (pollen, seeds and fruits) [3], and these proteins were described as highly similar to human lipocalin 2. Birch Bet v 1 and human lipocalin 2 possess specific structures that allow them to bind iron. Bet v 1 turns into an allergen when it is not binding iron. This subsequently affects Th2 cells of the human immune system [4]. The ribonuclease activity of PR proteins is known to be activated as part of their function in the antiviral pathway [5]. The other subfamilies of […]

DOI: http://dx.doi.org/10.5772/intechopen.80312

Bet v 1 homology is referred to as birch-fruit-vegetables-nuts syndrome. The plants most commonly involved belong to the families Rosaceae and Apiaceae. Similarities in amino acid sequences were found in different plants and foods [20,21], but similarity with fruits prevails (Table 1). Most often, the allergens are located in the fruit pulp.
With respect to homology to the main birch allergen Bet v 1, it can be noted that in areas where birch is not typical, for example, in southern Europe, sensitivity to Bet v 1 homologs arises from related trees such as alder, hazel and beech, and from grass allergens [22]. Pomegranate, edible chestnuts, raspberries and spices may also be mentioned. Hrubiško et al. [23] mentioned the cross-reactivity of birch pollen with walnuts, almonds, avocados, cherries, plums, peas and asparagus.
Silver birch is native to most of Europe, northwest Africa and western Siberia, but is absent from the southern parts of Europe. It is the most common tree in Scandinavia and the Alps and a potent pollen producer in those areas. In all of those areas, birch is the most relevant source of spring pollen allergens during the period from March to May (Figure 2).
Atmospheric concentrations of birch pollen grains and of the matched major birch pollen allergen Bet v 1 were monitored simultaneously across Europe [24]. Bet v 1 was determined with an allergen-specific ELISA. The average European allergen release from birch pollen was 3.2 pg Bet v 1 per pollen grain, and the average allergen release in 2009 did not differ substantially between countries. However, a >10-fold difference between daily allergen releases per pollen grain was detected in all countries. Aeropalynological observations in Kiev, carried out with a gravimetric method, were reported in [25]. The most abundant pollen types were as follows: Betulaceae (21%), Chenopodiaceae (10%), Ambrosia (10%), Artemisia (9%), Pinaceae (8%) and Poaceae (6%).
A real-time PCR method based on SYBR Green technology was developed to analyze differences in the Bet v 1 expression level [26]. The expression of the Bet v 1 allergen gene was analyzed in birch pollen samples collected at various growth sites around Kiev. The sample from forest growth conditions was chosen as the calibrator for the expression analyses. qRT-PCR showed variation in the abundance of allergen transcripts among the samples from different growth sites (Figure 3). In samples from the urbanized area, the expression of the Bet v 1 allergen was on average 1.5× (ranging from 0.77× to 2×) that of the forest sample that served as the calibrator. In samples from the borders of the urbanized area, the expression of the Bet v 1 allergen was only 0.55× that of the forest sample. These findings are interesting when compared with the findings of [24], which reported that extracts from pollen collected in urban areas had higher chemotactic activity on human neutrophils than pollen from rural sites, although the allergen content remained unchanged. Questions about the exact correlations between the expression level and allergenic potential need to be answered in further research.
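The text does not state how the relative expression values were calculated. A common way to express qRT-PCR results against a calibrator sample is the 2^-ΔΔCt method, and the sketch below illustrates only that arithmetic. The function name, the reference gene and all Ct values are hypothetical and are not taken from the study; equal amplification efficiency of about 100% for both genes is assumed.

```python
def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Fold change of a sample versus the calibrator by the 2^-ddCt method.

    Assumes about 100% amplification efficiency for target and reference gene.
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_calibrator = ct_target_cal - ct_reference_cal
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values for illustration only (not measured data).
forest = {"target": 24.0, "reference": 18.0}  # calibrator sample
urban = {"target": 23.0, "reference": 18.0}   # sample of interest

fold_urban = relative_expression(
    urban["target"], urban["reference"],
    forest["target"], forest["reference"],
)
print(f"urban vs forest: {fold_urban:.2f}x")
```

With this convention, the calibrator itself always has a value of 1, values above 1 indicate higher transcript abundance than in the calibrator, and values below 1 indicate lower abundance.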
Currently, various primary genomic data are available for Bet v 1 isoforms originating from birch (Table 2), but only limited information exists about their transcriptomic characteristics.
Besides Bet v 1, the basic allergen component of birch pollen, minor components exist as well, and some of them are also clinically relevant. Allergens with molecular weights of 29.5, 17, 12.5 and 13 kDa have been isolated from birch pollen. Apart from Bet v 1, the following allergens have been characterized: Bet v 2, a 15 kDa profilin; Bet v 3, a 24 kDa calcium-binding protein; Bet v 4, a 9 kDa calcium-binding protein; Bet v 5, a 35 kDa isoflavone reductase-related protein; Bet v 6, a 30-35 kDa phenylcoumaran benzylic ether reductase; Bet v 7, an 18 kDa cyclophilin; and Bet v 11 (www.phadia.com).

Bet v 1 genomics and in silico analysis
From a theoretical point of view, for any protein antigen a similar or analogous protein exists in nature, not only in the plant but also in the animal kingdom. Antigens, particularly those with allergenic potential, are evidence of this. From a practical point of view, such a similarity exists for almost every protein and is called homology. Proteins that are similar are referred to as a protein family or superfamily. There is a huge number of protein families, many of which have been confirmed to have allergenic activity [18]. Homology is the result of a common evolutionary origin. Homologous genes can be characterized as two or more genes derived from a common original DNA sequence [27]. When identifying genes in a model species and in related species, it is often important to distinguish genes that are linked directly through speciation from genes that have been duplicated independently of it. These are the two types of homologous genes, orthologs and paralogs, for which many definitions exist. Where homology is the result of gene duplication, that is, the two copies have remained side by side throughout the organism's history (e.g., alpha and beta hemoglobin), the genes are called paralogs (para = parallel, analogous, concurrent). Where homology is the result of speciation, that is, the process of generating species, and the history of the gene reflects the history of the species (e.g., alpha hemoglobin in both humans and mice), the genes are orthologs (ortho = exact). Orthologs are genes of different species that have evolved from a common ancestral gene; they are called "true" homologs. These genes tend to maintain the same function as the gene they evolved from. The identification of orthologs is crucial for a reliable prediction of gene function in novel genes. Paralogs are genes associated with duplication in the genome.
They develop new features even when they remain associated with the original function, and they diverge from each other within the species. Unlike an ortholog, a paralog is a new gene that has a new function. During gene duplication, one copy of the gene is mutated to produce a new gene with a new function, although this function often relates to the role of the original gene [27][28][29]. Paralogs may result from different types of gene duplication: unequal crossing-over, transposon-mediated duplication or polyploidy, that is, an increase in the number of chromosomes in the cell nucleus above the normal diploid state [29]. For the molecular systematics of organisms, it is desirable that the studied sequences are homologous in the strict sense; such sequences are called orthologous [30]. Orthologs exist in genomes in a single copy that performs the same function in all organisms examined. Series of evolutionarily conserved genes are paralogs: during evolution, they arose through one or more duplications, followed by divergence of structure and function, up to the loss of some copies. In some organisms (e.g., in higher plants), the determination of orthologs and paralogs is problematic because their genomes have undergone a series of gene duplications and losses of individual gene copies. Gene duplication is understood as the source of new genes with new features, but it does not always involve a fundamental transformation of gene function. Duplicated genes often retain a certain degree of functional overlap that can, under certain conditions, manifest as redundancy.
Different Bet v 1 isoforms are known to exist naturally: a, b, c, d, e, f, g, j and l. In the induction of type I allergy, they differ in their reaction with IgE from patients, and a comparison of in vitro and in vivo IgE-binding activity reported in [7] showed that it is influenced by six amino acid residues at different positions of the Bet v 1 molecule. Betula verrucosa (pendula) Bet v 1 is well known at the nucleotide level, too; 47 isoallergen mRNA sequences with different levels of sequence identity are stored in the NCBI database. A dendrogram of the phylogenetic similarity of the Bet v 1 isoallergen sequences with the gene coding for Bet v 1 (X15877.1) is shown in Figure 4.

Technical approaches and methodologies for PCR screening of Bet v 1 sequences in plants
Bioinformatics provides interdisciplinary tools that are used to manage and analyze biological data and known nucleic acid sequences. Many features of nucleic acids can be used in bioinformatic algorithms as motifs to describe their genomic variability and to understand them better. Individual sequence motifs are recognized by their order and nucleotide preference, and many motif discovery algorithms have been used in different molecular or bioinformatic studies [31][32][33][34].
Here, bioinformatic algorithms were applied to known Bet v 1 homologous sequences, which are suitable for bioinformatic tools such as BLAST [35] that find homology or conserved regions. The first step was to align the individual isoforms and their variants with each other. First, isoforms that exist in two or three variants in the database were compared to each other, namely Bet v 1.0101, Bet v 1.0102, Bet v 1.0104 and Bet v 1.0204. Table 3 shows the results of the sequence alignment of the variants of the Bet v 1.0101 isoform. All three sequences are linear mRNAs, differing only in the number of base pairs. Records Z80099.1 and Z80098.1 have the same number of base pairs. Their overlaps and query cover are up to 100% and the identity is 99%.
Bet v 1.0102 can be found in the NCBI database under the names B. verrucosa Bet v 1d mRNA (linear mRNA, 677 bp) and B. verrucosa Bet v 1h mRNA (also linear mRNA, 677 bp). They show 100% query cover and 99% identity when using the megablast algorithm. Similarly, 100% query cover and 99% identity exist for the Bet […] (Table 4).
The numbers of nucleotide differences among Bet v 1 isoforms in the conserved part, based on the NCBI data, are summarized in Figure 5.
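Counting nucleotide differences between already aligned sequences can be sketched in a few lines. The example below is only an illustration under stated assumptions: the three sequences are short invented strings, not NCBI records, and alignment columns containing a gap are skipped, which is a simplification compared with the identity value reported by BLAST.

```python
from itertools import combinations


def count_differences(seq_a, seq_b):
    """Return (mismatches, compared columns) for two aligned sequences.

    Columns in which either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    mismatches = 0
    compared = 0
    for base_a, base_b in zip(seq_a.upper(), seq_b.upper()):
        if base_a == "-" or base_b == "-":
            continue
        compared += 1
        if base_a != base_b:
            mismatches += 1
    return mismatches, compared


def percent_identity(seq_a, seq_b):
    """Percentage of identical bases among the compared columns."""
    mismatches, compared = count_differences(seq_a, seq_b)
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * (compared - mismatches) / compared


# Invented toy alignment (hypothetical names and sequences).
alignment = {
    "variant_1": "ATGGGTGTTTTC",
    "variant_2": "ATGGGTGTCTTC",
    "variant_3": "ATGGGAGTCTT-",
}

difference_table = {
    (name_a, name_b): count_differences(alignment[name_a], alignment[name_b])[0]
    for name_a, name_b in combinations(alignment, 2)
}
for pair, n_diff in difference_table.items():
    print(pair, n_diff)
```

A table of this kind, computed for every pair of isoforms, is the sort of summary that a difference matrix presents.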
The aims of designing degenerate primers and applying them in the analysis are the basic description and molecular classification of allergens and the finding of correlations between sequence and structural similarities and cross-reactivity between homologous allergens. Genomic knowledge of allergens also helps to define their common properties and will help to clarify possible factors that cause allergenic potential in the future [37]. A basic prerequisite for designing degenerate primers is an alignment of selected nucleotide sequences [38]. Here, Bet v 1 was used as a model to analyze the functionality of degenerate primers in clinically relevant cross-species screening of genomic sequences of allergens (Table 5).
Bet v 1 is routinely used as a model PR-10 pollen allergen in different types of research [12]. Genomic sequences of PR proteins from two different fruit species, whose allergens belong to PR proteins of the same type as Bet v 1, were selected to design degenerate primers and to find a conserved sequence, that is, a base sequence of the DNA molecule that has remained essentially unchanged and has thus been maintained during evolution [39]. Malus domestica was selected as a typical fruit causing cross-allergy, and Vitis vinifera was selected as a species with allergenic protein homology but without high clinical relevance. Based on the alignment analysis reported above, Bet v 1 promoter exons (Table 6) were used in the in silico analysis. These sequences were aligned by BLAST, as conserved sequences can be identified by homology searching using this tool [39]. Specifically, blastn was used for inter-species comparisons; the result is shown in Figure 6, where the alignment of sequences AJ289771.1, AJ291705.2 and AF020542.1 to sequence AJ289770.1 can be seen.
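The idea of locating a conserved region in a set of aligned sequences can be illustrated with a simple sliding-window scan. This is a sketch under assumptions, not the BLAST-based procedure used in the study: the function names, the window size, the threshold and the toy sequences are all hypothetical, and a column counts as conserved only if every sequence has the same base and no gap.

```python
def column_is_conserved(column):
    """True if all sequences share the same base and none has a gap."""
    return "-" not in column and len(set(column)) == 1


def conserved_windows(aligned, window, min_fraction):
    """Return (start, fraction) for windows meeting the conservation threshold.

    aligned      -- list of equally long, already aligned sequences
    window       -- window length in alignment columns
    min_fraction -- minimum fraction of conserved columns in the window
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have the same length")
    flags = [column_is_conserved(col) for col in zip(*(s.upper() for s in aligned))]
    hits = []
    for start in range(length - window + 1):
        fraction = sum(flags[start:start + window]) / window
        if fraction >= min_fraction:
            hits.append((start, fraction))
    return hits


# Invented toy alignment; real input would be the aligned genomic sequences.
toy_alignment = [
    "ACGTACGTAAGGCT",
    "ACGTACGTTAGG-T",
    "ACGTACGTCAGGCT",
]
print(conserved_windows(toy_alignment, window=8, min_fraction=1.0))
```

In practice the threshold would be relaxed below 1.0, because degenerate primers can tolerate a limited number of variable positions.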
The design of degenerate primers for optimal PCR amplification should be based on a conserved region with a length of approximately 200-500 base pairs [38], which is the length that was identified in the screened Bet v 1 homologs. A degenerate primer is a mixture of oligonucleotides, each with a slightly different sequence, that is, several bases are possible at some positions. This extends the range of sequences that can be amplified. Each primer is approximately 20-25 bp in length, and the forward and reverse primers must be sufficiently distant from each other, another requirement that was met in the aligned sequences. Based on the obtained results, degenerate primers were designed in this region (Figure 7) that provide an amplicon approximately 365 bp in length.
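How a degenerate primer follows from an alignment can be shown with the standard IUPAC nucleotide ambiguity codes. The sketch below is illustrative only: the three 20-nt sequences are invented, the function names are hypothetical, and the primers actually designed in the study are not reproduced here. Each alignment column is collapsed into one IUPAC symbol, the degeneracy is the number of distinct oligonucleotides in the resulting mixture, and the primer is converted into a regular expression to check which templates it matches.

```python
import re
from math import prod

# Standard IUPAC ambiguity codes, keyed by the set of bases they represent.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}
CODE_TO_BASES = {code: "".join(sorted(bases)) for bases, code in IUPAC.items()}


def degenerate_consensus(aligned):
    """Collapse each column of a gap-free alignment into one IUPAC symbol."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have the same length")
    symbols = []
    for column in zip(*aligned):
        bases = frozenset(base.upper() for base in column)
        if not bases <= set("ACGT"):
            raise ValueError("gaps and ambiguous bases are not handled here")
        symbols.append(IUPAC[bases])
    return "".join(symbols)


def degeneracy(primer):
    """Number of distinct oligonucleotides in the degenerate mixture."""
    return prod(len(CODE_TO_BASES[symbol]) for symbol in primer)


def primer_regex(primer):
    """Regular expression matching every sequence the primer represents."""
    parts = []
    for symbol in primer:
        bases = CODE_TO_BASES[symbol]
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


# Invented priming-site sequences from three hypothetical homologs.
sites = [
    "ATGGGTGTTTTCAATTACGA",
    "ATGGGTGTCTTCACTTACGA",
    "ATGGGAGTTTTCAATTATGA",
]
primer = degenerate_consensus(sites)
print(primer, degeneracy(primer))
print(all(re.fullmatch(primer_regex(primer), site) for site in sites))
```

Keeping the degeneracy low matters in practice, because every additional ambiguous position dilutes the fraction of the primer mixture that matches a given template exactly.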
Table 6. Genomic sequences used for the conserved region identification.

Plant material of clinically relevant Bet v 1 cross-reactive plant species and spices (with high-rate and low-rate cross-reactions) was used for the PCR screening analysis. Birch DNA was used as a positive control. Total genomic DNA was extracted following the instructions of the GeneJET Plant Genomic DNA Purification Mini Kit (Thermo Scientific) or NucleoSpin® Food (Macherey-Nagel). A Nanodrop Nanophotometer™ was used for quantity and quality analysis of the extracted DNA. PCR amplifications were performed in a Bio-Rad C1000™ Thermocycler with the following program: an initial denaturation step at 95°C for 5 min, followed by 40 cycles of 95°C for 45 s, 54°C for 45 s and 72°C for 35 s, with a final step at 72°C for 10 min. The amplified products were inspected by electrophoresis in 1.5% agarose in 1× TBE buffer, visualized after GelRed™ staining and photographed under UV light. Using the degenerate primer pair designed on the basis of the identified conserved region of Bet v 1 sequences, PCR was positive in all of the screened plant species with the exception of two samples, curry and black pepper spice (Table 7). In the case of curry, only a very weak amplicon is visible in the agarose gel (Figure 8), so it can be supposed that further optimization of the degenerate PCR will give a positive result, too. In the case of black pepper spice, alternative DNA extraction protocols should be tested further.
Sequence homology search algorithms have become commonly used and efficient tools in molecular genetics [39,40]. Nowadays, so many different motif-finding algorithms are available that it is impossible to provide a comprehensive report of all of them. Each algorithm has its own advantages and disadvantages. One of the aims of pattern discovery is to find specific motifs in nucleotide or protein sequences for a better understanding of their structure and function [41].
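The cycling program described above can be written down as a small data structure, which makes the total programmed hold time easy to check. The temperatures and durations are those given in the text; the dictionary layout and the function name are illustrative assumptions, and the ramp times between temperatures, which depend on the instrument, are not included.

```python
# Cycling program from the text: (temperature in degrees C, hold time in seconds).
PROGRAM = {
    "initial_denaturation": (95, 5 * 60),
    "cycles": 40,
    "cycle_steps": [
        (95, 45),  # denaturation
        (54, 45),  # primer annealing
        (72, 35),  # extension
    ],
    "final_step": (72, 10 * 60),
}


def total_hold_seconds(program):
    """Sum of all programmed hold times, excluding temperature ramping."""
    per_cycle = sum(seconds for _, seconds in program["cycle_steps"])
    return (
        program["initial_denaturation"][1]
        + program["cycles"] * per_cycle
        + program["final_step"][1]
    )


seconds = total_hold_seconds(PROGRAM)
print(f"{seconds} s, about {seconds / 60:.0f} min of hold time")
```

The real run time is longer than this figure, because heating and cooling between the three steps of each cycle add up over 40 cycles.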