Primers used in semi-quantitative PCR.
Native “Mexicano” avocado (Persea americana var. drymifolia) is an important species in agriculture and indigenous medicine. In agriculture, it has been the germplasm source for economically important cultivars such as Hass, and it is the main source of rootstocks for world production of Hass avocado fruit. In spite of its importance, little is known about the molecular network of seed-fruit development. The aim of this work was to identify the genes expressed (as ESTs) during early seed development in native “Mexicano” avocado. Using total RNA, we constructed cDNA libraries from seeds at 4 months of development, followed by sequencing, assembly and bioinformatic analysis. For validation, semi-quantitative PCR experiments were performed with the most abundant genes. About 5005 ESTs sequenced from the 5′ end, representing 1653 possible unigenes, were isolated. After assembly, we obtained 171 genes that are closely related to Nelumbo nucifera sequences. The transcriptome is dominated by one bHLH transcription factor, three metallothioneins and snakin, suggesting a main role for these genes in seed development. To our knowledge, there are no previous molecular studies of avocado seed development.
- avocado fruit
- transcription factor
- metal homeostasis
- seed development
- gene expression
In principle, the frequency with which the sequence of a given gene is read in ESTs sequencing projects should reflect the relative abundance of the corresponding mRNA. This approach uses EST counts to infer the relative level of expression of a gene [17–19].
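The counting idea described above can be sketched in a few lines. This is a minimal illustration rather than the pipeline used in this work: the function name `relative_abundance` and the read-to-gene assignments are hypothetical, and the example data are invented for the sketch.

```python
from collections import Counter


def relative_abundance(est_to_gene):
    """Return {gene: fraction of all ESTs assigned to that gene}."""
    counts = Counter(est_to_gene.values())
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}


# Hypothetical assignment of EST reads to genes (not data from this study).
est_to_gene = {
    "est001": "geneA",
    "est002": "geneA",
    "est003": "geneB",
    "est004": "geneA",
}

for gene, fraction in sorted(relative_abundance(est_to_gene).items()):
    print(f"{gene}: {fraction:.2f}")
```

A gene read three times out of four ESTs gets a relative abundance of 0.75 under this scheme; with real libraries the estimate is only as good as the sampling depth and the assembly.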
In this work, we report the analysis of an EST collection from immature avocado nativo mexicano seeds and the expression analysis of the most abundantly expressed genes: a bHLH transcription factor, metallothioneins (MTs) and snakin.
2. Materials and methods
2.1. Biological material
Seeds from avocado nativo mexicano fruits at three stages of development (1, 4 and 8 months) were excised from the fruits, frozen immediately in liquid nitrogen and kept frozen until use. The material was collected at the avocado Germplasm Bank of the Instituto Nacional de Investigaciones Forestales y Agropecuarias (INIFAP; Uruapan, Michoacán, México).
2.2. cDNA library construction and sequencing by Sanger method
Total RNA from frozen seed tissue at 4 months of development was extracted using the protocol of López-Gómez et al. with some modifications. The cDNA library was built from 1 μg of total RNA using the SMART™ cDNA Library Construction Kit (Clontech). The resulting cDNA sequences were cloned into the pTriplEx2 vector. Cleavage experiments were made using
2.3. Assembly and identification of full-length cDNA
Sanger sequences were assembled using default parameters of the 454 Newbler-Assembler v1.1.03.24 (454 Life Science, Branford, CT) using 16,526 generated by the University of Florida and the Washington University Genome Sequencing Center.
The unigene set was generated by combining all assembled contigs and non-assembled reads (singlets). The consensus sequences of the unigenes were analyzed with EuGeneHom to identify those that contained the components of a complete cDNA (5′ UTR, ORF and 3′ UTR): http://genoweb.toulouse.inra.fr/eugene/EuGeneHom/cgi-bin/EuGeneHom.pl
2.4. Functional annotation
Stand-alone BLAST software was obtained from the National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov). The unigenes were compared by BLAST against plant nucleotide and protein databases. The BLAST results from the different databases were used for gene ontology (GO) mapping and annotation. Blast2GO software was used to perform the GO functional classification.
2.5. Transcript characterization and homologous search
The transcripts were analyzed with UGENE v1.26.1 software to identify the ORF, CDS and hypothetical protein sequence and to estimate physicochemical parameters, using the standard genetic code.
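As an illustration of what ORF identification under the standard genetic code involves, the following minimal sketch searches the forward strand of a nucleotide sequence for the longest stretch that starts at ATG and ends at the first in-frame stop codon. It is not the UGENE algorithm; the function name `longest_forward_orf` and the test sequence are hypothetical, and the reverse strand and alternative start codons are deliberately ignored.

```python
# Stop codons of the standard genetic code; ATG is taken as the only start.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_forward_orf(seq):
    """Return the longest ATG...stop ORF on the forward strand, or ""."""
    seq = seq.upper()
    best = ""
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk codon by codon from the start until the first in-frame stop.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                orf = seq[start:pos + 3]
                if len(orf) > len(best):
                    best = orf
                break
    return best


# Hypothetical toy sequence, not a sequence from this study.
print(longest_forward_orf("CCATGAAATGCTGTTAAGG"))
```

Because every ATG is tried as a start, all three forward reading frames are covered without handling them separately.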
2.6. Alignment and phylogeny
MEGA 7.0.14 software was used for alignment (ClustalW algorithm with the BLOSUM62 matrix) and phylogenetic reconstruction (Neighbor-Joining method with the JTT matrix-based model for distance computation and a bootstrap test of 1000 replicates).
2.7. Semi-quantitative RT-PCR
RNA from the different seed stages (1, 4 and 8 months of development) was digested with DNase I, Amplification Grade (Invitrogen), and cDNA was synthesized with the First Strand cDNA Synthesis Kit (Thermo Scientific); this material was used as the template in sqPCR reactions. Reactions of 10 μl (100 ng/μl cDNA, 10X Reaction Buffer, 2 mM dNTPs, 50 mM MgCl2, 1 U Taq DNA polymerase) were run in a Veriti 96-well Thermal Cycler (Applied Biosystems). The sqPCR primers (Table 1) were designed with the Primer3 web tool (http://bioinfo.ut.ee/primer3-0.4.0/) for the bHLH transcription factor, the three metallothioneins, snakin and ubiquitin as the reference gene. The amplification program was 95°C for 10 min; 30 cycles of 95°C for 45 s, annealing for 30 s and extension at 72°C for 45 s; and a final step of 72°C for 7 min. The expression ratio between each selected gene and the endogenous control was calculated from band intensities measured with GelAnalyzer 2010. The semi-quantitative PCR was performed with three replicates.
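The ratio calculation described above can be sketched as follows. This is an assumed, simplified form of the calculation (target band intensity divided by reference band intensity in the same replicate, then summarized over three replicates); the function name `expression_ratios` and all intensity values are hypothetical and are not measurements from this work.

```python
from statistics import mean, stdev


def expression_ratios(target_intensities, reference_intensities):
    """Per-replicate ratio of target band intensity to reference band intensity."""
    if len(target_intensities) != len(reference_intensities):
        raise ValueError("each target band needs a matching reference band")
    return [t / r for t, r in zip(target_intensities, reference_intensities)]


# Hypothetical band intensities (arbitrary units) for three replicates.
target = [1200.0, 1100.0, 1300.0]
reference = [1000.0, 1000.0, 1000.0]

ratios = expression_ratios(target, reference)
print(f"mean ratio = {mean(ratios):.2f}, sd = {stdev(ratios):.2f}")
```

Normalizing each lane to its own reference band before averaging keeps loading differences between lanes from being read as expression differences.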
|GenBank Accession||Gene name||Primer sequence (5′–3′)||Tm (°C)|
3.1. Sequencing and assembly of Persea americana var. drymifolia seed transcriptome
A total of 5005 sequences were processed for assembly, which yielded 3328 sequences (171 contigs derived from 3222 sequences, plus 106 singletons that represent unique transcripts); these sequences were used in the BLAST analysis. Contig length ranged from 84 to 1149 bp (Figure 1); the most frequent EST length class was 84–195 bp, with 478 sequences in this range. The shortest sequence examined (84 bp) matched known genes such as GTP-binding nuclear protein.
3.2. Functional annotation
Functional interpretation is an important step in transcriptome analysis and cannot be done without functional annotation. The most widespread and probably most extensive functional annotation schema for gene and protein sequences is the Gene Ontology (GO), which is used as a standard in all public databases. Automatic functional annotation methods rely on sequence, structure, phylogenetic or co-expression relationships between known and novel sequences. A total of 277 uniEST sequences were manually annotated for a closer understanding of gene expression in avocado seed. The annotation proceeds through three basic steps: homolog search, GO term mapping and the annotation itself. In the first step, NCBI BLASTX and BLASTN are typically used; for this work, we used an E-value threshold of 1 × 10−5, a cut-off of 33 and a maximum of 20 retrieved BLAST hits. These uniESTs were classified into five functional categories (number of sequences, percentage of the 3328 assembled sequences): antioxidative protection (677, 20.34%), transcription regulatory (1013, 30.44%), defense (507, 15.23%), cellular structure and organization (287, 8.62%) and unknown (844, 25.36%).
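The category percentages quoted above follow directly from the sequence counts. The short check below uses only the five counts given in the text; the variable names are chosen for this sketch.

```python
# Sequence counts per functional category, as quoted in the text.
category_counts = {
    "antioxidative protection": 677,
    "transcription regulatory": 1013,
    "defense": 507,
    "cellular structure and organization": 287,
    "unknown": 844,
}

total = sum(category_counts.values())
percentages = {
    name: round(100 * count / total, 2)
    for name, count in category_counts.items()
}

print(f"total sequences: {total}")
for name, pct in percentages.items():
    print(f"{name}: {pct}%")
```

The counts add up to 3328, the number of sequences retained after assembly, which is the denominator behind the reported percentages.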
Gene ontology annotation and functional analysis of the avocado seed transcriptome were carried out with the automated software Blast2GO. Sequences were assigned to the three standard classifications (biological process, molecular function and cellular component) and summarized according to GO criteria. Most GO annotations were for biological process (65.34%), followed by cellular component (19.34%) and molecular function (15.33%). In addition, the organisms most closely related to the avocado seed sequences were identified in the databases, and the largest share of the analyzed sequences (23.47%) (Figure 2) were closely related to
|arlgES1||Transcription factor bHLH96||942||1.35E-103||GO:0046983||123|
|arlgES153||Metallothionein type 3||195||4.52E-28||GO:0043167||123|
|arlgES154||Stress response nst1||696||1.68E-34||GO:0009507; GO:0010207||113|
|arlgES2||40S ribosomal S29||171||1.77E-32||GO:0003735; GO:0043167; GO:0005829; GO:0005840; GO:0006412||100|
|arlgES3||PREDICTED: uncharacterized protein LOC104611921||282||4.53E-37||N/A||96|
|arlgES4||Vesicle-associated membrane 726||579||5.71E-130||GO:0016192; GO:0005575||90|
|arlgES6||ras-related RIC2||525||2.61E-118||GO:0007165; GO:0043167; GO:0005622||78|
|arlgES8||Isocitrate dehydrogenase [NADP]||588||1.02E-132||GO:0044281; GO:0006091; GO:0016491; GO:0043167||74|
|arlgES156||PREDICTED: uncharacterized protein LOC103701850 isoform X3||552||2.83E-19||GO:0005739||66|
|arlgES157||Potassium transporter 12 isoform X1||366||2.97E-55||GO:0022857; GO:0009536||64|
|arlgES12||60S ribosomal L24||492||5.32E-63||GO:0003735; GO:0022618; GO:0005829; GO:0042254; GO:0005840; GO:0006412||58|
|arlgES13||60S ribosomal L7–2-like||738||1.65E-136||GO:0003735; GO:0005829; GO:0042254; GO:0005840; GO:0006412||57|
|arlgES15||Ethanolamine utilization eutQ||291||3.63E-61||N/A||54|
|arlgES158||Programmed cell death 4||636||1.00E-80||N/A||52|
|arlgES16||Type 2 metallothionein||243||5.56E-24||GO:0043167||51|
3.3. Avocado seed abundant genes and validation
The most abundant sequences match metallothionein genes. This result suggests that metallothionein genes dominate the avocado seed transcriptome, as they do in avocado fruit. Metallothioneins (MTs) were discovered by Margoshes and Vallee as cadmium-binding proteins isolated from equine kidney cortex. These proteins were named for their high sulfur content and the metals they bind; depending on the metal species, metal ions may account for more than 20% of the protein [27, 28]. Mammalian metallothioneins are peptides of about 60 amino acids with 20 Cys residues and a molecular mass of about 6–7 kDa. Mammalian MTs can bind up to 7 divalent metal ions via mercaptide (sulfur-metal) bonds with the Cys residues. By convention, any peptide or protein that shares several characteristics with mammalian metallothioneins can be classified as a metallothionein. Plant metallothioneins have two highly conserved regions of sequence similarity, corresponding to the two Cys-rich terminal domains, joined by a less conserved “spacer” (about 40 aa without Cys residues). The most distinctive feature of plant MTs is this long spacer; in animal MTs the Cys-rich domains are separated by a short spacer of fewer than 10 amino acids that does not include aromatic residues. The distribution of Cys residues and the length of the spacer region are used to classify most plant MTs into four types, namely types 1, 2, 3 and 4.
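The two features used for classification, Cys content and spacer length, are easy to compute from a protein sequence. The sketch below is illustrative only: the function name `cys_profile` is hypothetical, the longest Cys-free run is used as a simple stand-in for the spacer, and the toy sequence is invented (it is not PaMT2a, PaMT2b or PaMT3).

```python
def cys_profile(protein):
    """Summarize Cys content and the longest Cys-free run of a protein."""
    protein = protein.upper()
    if not protein:
        raise ValueError("empty sequence")
    n_cys = protein.count("C")
    # Splitting on Cys leaves the Cys-free stretches; the longest one is
    # used here as a rough proxy for the spacer between Cys-rich domains.
    longest_run = max(protein.split("C"), key=len)
    return {
        "length": len(protein),
        "cys": n_cys,
        "percent_cys": round(100 * n_cys / len(protein), 1),
        "longest_cys_free_run": len(longest_run),
    }


# Hypothetical toy sequence: two Cys-rich ends joined by a 40-residue spacer.
toy_mt = "MSCCGGNCGC" + "A" * 40 + "CKCGSNCTCDPC"
print(cys_profile(toy_mt))
```

On real plant MT sequences a long Cys-free run of about 40 residues would be consistent with the spacer described above, whereas a mammalian MT would show only short Cys-free runs.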
From the analysis of these abundant transcripts, we found three MT genes in avocado seed: PaMT2a, PaMT2b and PaMT3, which were registered in the GenBank database under the accession codes shown in Table 3. According to the characteristics predicted in silico, two of the sequences belong to the Metallothionein-2 superfamily because of their two highly conserved Cys-rich motifs and the long spacer between them. PaMT3 retains the spacer, but its Cys-rich motifs are less conserved, which places it in the third family of plant metallothioneins (MT3). The alignment showed that, for both the MT2 and MT3 groups, the most conserved amino acids lie around the Cys residues associated with the “metal binding clusters” (Figures 4 and 5); these conserved Cys sequences are located in the amino- and carboxyl-terminal regions. The intermediate spacer of about 40 amino acids is much more variable and has no cysteines. In mammalian metallothioneins, this spacer is very small (8 amino acids) and has no aromatic amino acids. However, in family 2, to which the plant sequences belong, conserved tyrosine residues are found in the spacer, as well as several less conserved phenylalanines in the metallothionein sequences of types 2 and 3 (Figures 4 and 5). The seed metallothioneins correspond to the three reported NnMT2a, NnMT2b and NnMT3; the avocado seed metallothioneins are closely related with
|Name||GenBank||Length (aa)||Weight (kDa)||Isoelectric Point||Cys Residues||% Cys|
Sequences of related proteins were obtained from Zhou et al.  and downloaded from NCBI-GenBank: BAD18376.1 GmMT1 (
Another abundant transcript is an mRNA encoding a bHLH-type transcription factor. The basic helix-loop-helix (bHLH) domain is a highly conserved amino acid motif that defines a group of transcription factors; it was initially described in animals and was soon found in all major eukaryotic lineages. Proteins containing a bHLH domain (referred to as bHLH proteins) are involved in a variety of regulatory processes; their functions include the regulation of neurogenesis, myogenesis and heart development in animals [34, 35], the control of phosphate uptake and glycolysis in yeast, and the modulation of secondary metabolism pathways, epidermal differentiation and environmental responses in plants. The bHLH domain comprises 50–60 amino acids in two distinct segments: a stretch of 10–15 mostly basic amino acids (the basic region) and approximately 40 amino acids that form two amphipathic helices separated by a loop (the helix-loop-helix region). Structural analysis of yeast and mammalian bHLH proteins showed that the basic region makes contact with DNA, while the two helices promote the formation of heterodimers between bHLH proteins. These bHLH transcription factors are generally classified into six major groups (A–F) based on their ability to bind DNA [35, 38, 39]. Most bHLH proteins fall into group A or B. Group A proteins are expected to bind E-box consensus sequences (CACCTG or CAGCTG), and group B proteins bind specifically to G-box consensus sequences (CACGTG or CATGTTG). Group C bHLH proteins share a PAS domain and bind sequences that do not require an E-box (ACGTG or GCGTG). Group E includes bHLH proteins that contain a conserved Pro or Gly residue at a key position within the basic region, bind preferentially to sequences referred to as N-boxes (CACGCG or CACGAG) and share an additional WRPW motif. Groups D and F comprise proteins that differ in the basic region.
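The binding-site consensus sequences listed above can be located in a DNA sequence with a simple string search. The sketch below is a minimal illustration, not an analysis performed in this work: the function name `find_boxes` and the test sequence are hypothetical, only the E-box, G-box and N-box motifs exactly as written in the text are included (the shorter group C sites are left out because they overlap the G-box), and only the given strand is scanned.

```python
# Consensus motifs as listed in the text, keyed by box type.
BHLH_BOXES = {
    "E-box": ("CACCTG", "CAGCTG"),
    "G-box": ("CACGTG", "CATGTTG"),
    "N-box": ("CACGCG", "CACGAG"),
}


def find_boxes(dna):
    """Return sorted (position, box type, motif) hits on the given strand."""
    dna = dna.upper()
    hits = []
    for box, motifs in BHLH_BOXES.items():
        for motif in motifs:
            start = dna.find(motif)
            while start != -1:
                hits.append((start, box, motif))
                start = dna.find(motif, start + 1)
    return sorted(hits)


# Hypothetical promoter fragment, not a sequence from this study.
for position, box, motif in find_boxes("ttCACGTGaaCAGCTG"):
    print(position, box, motif)
```

Positions are 0-based offsets from the start of the input; a fuller scan would also search the reverse complement, since not all of these motifs are palindromic.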
Some group D proteins have been described as unable to bind DNA; they can form heterodimers that act as negative regulators of bHLH binding to DNA. Group F includes the so-called COE proteins. A phylogenetic study indicated that group A contained mammalian bHLH proteins and lacked plant bHLH proteins. The other groups contained a mixture of species, and most plant bHLH proteins belonged to group B [41, 42]. It has been shown that the bHLH protein family in plants is monophyletic and underwent significant radiation before the evolution of mosses; the bHLH groups established in land plants during the first 400 million years were conserved during later plant evolution, although there were many gene duplications. Transcription factors vary widely because few amino acids are conserved along their sequences; nevertheless, at DNA-binding sites such as the bHLH domain, the great majority of amino acids are conserved within the main motif. Because of their propensity to form homodimers or heterodimers, bHLH proteins can participate in an extensive set of combinatorial interactions that regulate multiple transcriptional programs. The development of fleshy fruits involves complex physiological and biochemical changes. Recent studies have described the involvement of bHLH proteins in the determination of plant organ size. The SPATULA protein was shown to control cotyledon, leaf and petal expansion by affecting cell proliferation in
However, since no additional information is available on their function and/or the tissue in which that function is performed, we compared our sequence with bHLH sequences whose function or organ specificity is known; two bHLH factors were chosen with these characteristics of
Multicellular organisms produce small cysteine-rich antimicrobial peptides (AMPs) as an innate defense against pathogens. Native Mexican avocado seed abundantly expresses the snakin (PaSn) gene. These kinds of AMPs were initially isolated from potato but were later found to be ubiquitous. Novel plant APs isolated include in
We identified a single cDNA sequence for snakin/GASA (gibberellic acid-stimulated), which contains a coding sequence of 318 bp and encodes a predicted 106 amino acid peptide. This molecule comprises a 26 amino acid signal peptide (residues 1–26), identified with SignalP (http://www.cbs.dtu.dk/services/SignalP/), and a 79 amino acid mature peptide (Figure 8). An amino acid alignment of avocado snakin with similar APs (Figure 9) showed that PaSn is longer than the previously reported StSN1 and StSN2 from potato. PaSn has the 12 Cys residues characteristic of this type of AP; apart from these highly conserved residues, the other motifs in the PaSn protein consist mostly of polar, non-polar and basic residues (Figure 9). From these analyses, we hypothesize that the Mexican avocado snakin gene could be involved in plant defense in a way similar to the StSN1 and StSN2 genes in potato. To our knowledge, this is the first snakin gene isolated from a seed.
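The lengths quoted above can be cross-checked with simple arithmetic. The text does not state whether the 318 bp figure includes the stop codon, so the sketch below computes both readings and the signal-plus-mature total for comparison; the variable names are chosen for this sketch and only the three quoted numbers are used.

```python
# Values quoted in the text.
cds_bp = 318      # coding sequence length in base pairs
signal_aa = 26    # signal peptide length in residues
mature_aa = 79    # mature peptide length in residues

codons = cds_bp // 3                    # number of codons in 318 bp
residues_if_stop_included = codons - 1  # assumption: last codon is the stop
signal_plus_mature = signal_aa + mature_aa

print(f"codons in CDS: {codons}")
print(f"residues if the stop codon is counted in the CDS: {residues_if_stop_included}")
print(f"signal + mature peptide: {signal_plus_mature}")
```

Comparing these three numbers shows which reading of the CDS length is compatible with the reported signal and mature peptide lengths.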
3.4. Expression patterns of selected genes measured by sqPCR
The expression patterns of five genes from the seed library (the bHLH transcription factor, the three metallothioneins and the antimicrobial peptide snakin) were studied by semi-quantitative PCR during avocado seed development, with SUMO as the reference gene (Figure 10). Expression was examined at three stages defined by the growth time of the seed in the avocado fruit. The bHLH gene showed an expression pattern comparable to that of the endogenous gene SUMO (ubiquitin), with similar expression levels from the first month of formation (E1) to ripening (E8); this suggests a role throughout the formation and development of the avocado seed, possibly in modulating the biogenesis of the seed or embryo. Among the metallothionein genes, PaMT3 showed constant expression across the three stages of seed development, with its peak slightly above that of the endogenous gene. PaMT2a, in contrast, had a low initial expression level in the first month of development and reached its maximum at 4 months. PaMT2b had its maximum expression at the beginning of fruit formation in the first month, declined by month 4 and recovered at ripening. Metallothioneins have been implicated in a variety of functions, some of them as carriers or facilitators of metal ions for processes of defense, synthesis or hydrolysis of reserve components to make them more bioavailable; however, authors have not reached agreement on whether the different types or families of metallothioneins play specific roles. The snakin gene showed similar expression throughout fruit development, consistent with a role in defense against pathogens, as a first barrier of protection or in signaling of attack; maintaining its expression throughout development, and possibly afterwards, may therefore be important for fruit protection (Figure 10).
The expression patterns of the selected genes identified by sqPCR, together with the different expression patterns in the avocado seed transcriptome, suggest various roles for these genes in seed development and protection in avocado fruit.
In this work, we identified and characterized three novel metallothioneins and one transcription factor gene from avocado nativo mexicano seeds, which are expressed abundantly during seed development. This suggests that they may play a leading role during seed development and probably form a network that protects the embryo from drought stress. More studies are necessary to elucidate the role of these genes during avocado seed-fruit development.