Primers used in semi-quantitative PCR.
The native “Mexicano” avocado (Persea americana var. drymifolia) is an important species in agriculture and indigenous medicine. In agriculture, it has been the germplasm source for economically important cultivars such as Hass, and it is the main source of rootstocks for the world production of Hass avocado fruit. In spite of its importance, little is known about the molecular network of seed-fruit development. The aim of this work was to identify the genes expressed (as ESTs) during early seed development of the native “Mexicano” avocado. Using total RNA, we constructed cDNA libraries from seeds at four months of development, followed by sequencing, assembly and bioinformatic analysis. For validation, semi-quantitative PCR experiments were performed with the most abundant genes. About 5005 ESTs from the 5′ end, representing 1653 possible unigenes, were isolated. After assembly, we obtained 171 genes that are closely related to Nelumbo nucifera sequences. The transcriptome is dominated by one bHLH transcription factor, three metallothioneins and snakin, suggesting that they play a main role in seed development. Until now, there have been no molecular studies of avocado seed development.
- avocado fruit
- transcription factor
- metal homeostasis
- seed development
- gene expression
Avocado (Persea americana Mill.) is an oleaginous fruit produced by a tree belonging to the magnoliid clade, a basal lineage of flowering plants. It belongs to the large plant family Lauraceae, with approximately 2500–3000 species [1, 2]. Avocado has been rapidly incorporated into the human diet in many countries. Because of low cost, vigorous seedling growth and easy propagation, most countries still use seeds to produce rootstocks for grafted avocado trees despite their genetic variability. Several Mexican varieties are derived from seeds that are resistant to attack by Phytophthora cinnamomi [5, 6] and are adapted to the soil and environmental conditions of the region. The Mexican state of Michoacán is the primary avocado-producing region in the world, and all the rootstocks used for the commercial production of cultivar Hass are obtained from P. americana var. drymifolia (“nativo mexicano”). Avocado is consumed mainly as fresh fruit, but it is also important in the cosmetic industry. The avocado plant has medicinal properties, including cancer prevention [9–11]. There is ethnopharmacological information on the use of avocado seeds for the treatment of health-related conditions, especially in America. Recent research has shown that avocado seeds are rich in phenolic compounds, which may play a role in the putative health effects. The avocado fruit is a berry of one carpel containing a single seed. This large and very conspicuous seed is made up of two fleshy cotyledons and a central attached plumule, hypocotyl and radicle, the whole surrounded by two papery seed coats closely adherent to each other. There is no endosperm left in the seed at maturity. The cotyledons are formed of undifferentiated parenchyma tissue interspersed with occasional idioblasts. Starch is the main storage material of the cotyledons and is present in great abundance. Despite its importance, avocado seed development remains uncharacterized.
To date, little information is available regarding the molecular biology of the seed. Analysis of expressed sequence tags (ESTs) is a rapid and effective method to identify novel genes or to investigate gene expression in different tissues, organs and plants [14, 15]. Furthermore, EST libraries and databases could provide valuable resources for functional genomic studies.
In principle, the frequency with which the sequence of a given gene is read in ESTs sequencing projects should reflect the relative abundance of the corresponding mRNA. This approach uses EST counts to infer the relative level of expression of a gene [17–19].
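The counting idea described above can be illustrated with a minimal sketch. The read names, unigene names and assignments below are hypothetical and are not data from this study; the function name `relative_abundance` is likewise an assumption made for illustration.

```python
from collections import Counter

def relative_abundance(est_to_unigene):
    """Return {unigene: fraction of all ESTs assigned to that unigene}."""
    counts = Counter(est_to_unigene.values())
    total = sum(counts.values())
    return {unigene: n / total for unigene, n in counts.items()}

# Hypothetical read-to-unigene assignments (illustrative only).
assignments = {
    "read_001": "unigene_A",
    "read_002": "unigene_A",
    "read_003": "unigene_B",
    "read_004": "unigene_A",
}

for unigene, fraction in sorted(relative_abundance(assignments).items()):
    print(f"{unigene}: {fraction:.2f}")
```

Such fractions are only a proxy for mRNA abundance; they assume that clones were picked at random and that the library was not normalized.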
In this work, we report the analysis of an EST collection from immature avocado nativo mexicano seeds and the expression analysis of the most abundantly expressed genes: a bHLH transcription factor, metallothioneins (MTs) and snakin.
2. Materials and methods
2.1. Biological material
Seeds from avocado nativo mexicano fruits at three stages of development (1, 4 and 8 months) were excised from the fruits, frozen immediately in liquid nitrogen and kept frozen until use. The materials were collected at the avocado Germplasm Bank of the Instituto Nacional de Investigaciones Forestales y Agropecuarias (INIFAP; Uruapan, Michoacán, México).
2.2. cDNA library construction and sequencing by Sanger method
Total RNA was extracted from frozen seed tissue at 4 months of development using the protocol of López-Gómez et al. with some modifications. The cDNA library was built from 1 μg of total RNA using the SMART™ cDNA Library Construction Kit (Clontech). The cDNA sequences obtained were cloned into the pTriplEx2 vector. Cleavage was carried out in E. coli BM25.8 cells to obtain the plasmid pTriplEx2. Sequencing reactions were performed from the 5′ end on plasmids extracted from random clones, using the ABI PRISM BigDye Terminators v3.0 kit (Applied Biosystems). The sequences obtained were quality-filtered using PHRED; vector masking and poly A/T trimming were performed using LUCY2 software, resulting in 5002 high-quality reads.
2.3. Assembly and identification of full-length cDNA
Sanger sequences were assembled using default parameters of the 454 Newbler-Assembler v1.1.03.24 (454 Life Science, Branford, CT) using 16,526 generated by the University of Florida and the Washington University Genome Sequencing Center.
Unigene set was generated by combining all assembled contigs and non-assembled reads (singlets). The consensus sequences of the Unigenes were analyzed with EuGeneHom  to identify the Unigenes that contained the components of a complete cDNA (5′ UTR, ORF and 3′ UTR):
2.4. Functional annotation
Stand-alone BLAST software was obtained from the National Center for Biotechnology Information (NCBI,
2.5. Transcript characterization and homologous search
The transcripts were analyzed by UGENE V1.26.1 software for the identification of ORF, CDS and hypothetical protein sequence and physicochemical parameters, using the standard genetic code.
2.6. Alignment and phylogeny
MEGA 7.0.14 software was used for alignment (ClustalW algorithm with the BLOSUM62 matrix) and phylogenetic reconstruction (neighbor-joining method with the JTT matrix-based model for distance computation and a bootstrap test of 1000 replicates).
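To make the bootstrap test concrete, the sketch below resamples alignment columns with replacement 1000 times, which is the resampling step behind a bootstrap test. It is not the MEGA implementation and it does not build neighbor-joining trees: the toy alignment is hypothetical, and a simple p-distance is used as a stand-in for the JTT model.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate: alignment columns resampled with replacement."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}

def p_distance(seq_a, seq_b):
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    return sum(a != b for a, b in pairs) / len(pairs) if pairs else 0.0

# Hypothetical toy alignment (not sequences from this study).
toy_alignment = {
    "seq1": "MSCCGGKCGC",
    "seq2": "MSCCGGNCGC",
    "seq3": "MSCC-GKCAC",
}

rng = random.Random(0)  # fixed seed so the run is repeatable
replicates = [bootstrap_alignment(toy_alignment, rng) for _ in range(1000)]
distances = [p_distance(rep["seq1"], rep["seq2"]) for rep in replicates]
print(f"mean seq1-seq2 p-distance over {len(replicates)} replicates: "
      f"{sum(distances) / len(distances):.3f}")
```

In a full analysis, a tree would be built from each replicate and the bootstrap support of a branch would be the fraction of replicate trees that contain it.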
2.7. Semi-quantitative RT-PCR
RNA from the different seed stages (1, 4 and 8 months of development) was digested with DNase I, amplification grade (Invitrogen), and cDNA was synthesized with the First Strand cDNA Synthesis Kit (Thermo Scientific) for use as a template in sqPCR reactions. Reactions of 10 μl (100 ng/μl cDNA, 10X Reaction Buffer, 2 mM dNTPs, 50 mM MgCl2, 1 U Taq DNA polymerase) were run in a Veriti 96-well Thermal Cycler (Applied Biosystems). sqPCR primers (Table 1) were designed by Primer3 webtool (
|GenBank Accession||Gene name||Primer sequence (5′–3′)||Tm (°C)|
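As a rough illustration of the Tm column, the sketch below applies the Wallace rule, Tm = 2(A+T) + 4(G+C), a common rule-of-thumb approximation for short primers. This is an assumption made for illustration: it is not the calculation Primer3 performs, and the primer sequence is hypothetical rather than one of the primers used in this work.

```python
def wallace_tm(primer):
    """Approximate melting temperature (°C) of a short primer by the Wallace rule."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    if at + gc != len(primer):
        raise ValueError("primer contains characters other than A, C, G, T")
    return 2 * at + 4 * gc

# Hypothetical 20-nt primer (10 A/T and 10 G/C bases).
example_primer = "ATGCATGCATGCATGCATGC"
print(wallace_tm(example_primer))  # 2*10 + 4*10 = 60
```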
3.1. Sequencing and assembly of Persea americana var. drymifolia seed transcriptome
A total of 5005 sequences were submitted to assembly, yielding 3328 assembled sequences (171 contigs comprising 3222 sequences, plus 106 singletons); these were used for BLAST analysis. Contig length ranged from 84 to 1149 bp (Figure 1); the most frequent EST length class was 84–195 bp, with 478 sequences in that range. The shortest sequence examined, at 84 bp, matched known genes such as GTP-binding nuclear protein.
3.2. Functional annotation
Functional interpretation is an important step in transcriptome analysis and cannot be done without functional annotation. The most widespread and probably most extensive functional annotation scheme for gene and protein sequences is the Gene Ontology (GO), which is standard in all public databases. Automatic functional annotation methods basically rely on sequence, structure, phylogenetic or co-expression relationships between known and novel sequences. A total of 277 uniEST sequences were manually annotated for a closer understanding of gene expression in avocado seed. Annotation proceeds through three basic steps: homolog search, GO term mapping and the annotation itself. For the first step, NCBI BLASTX and BLASTN are typically used; in this work, an e-value threshold of 1E−5, a cut-off of 33 and a maximum of 20 retrieved BLAST hits were used. The sequences represented by these uniESTs were classified into five functional categories: antioxidative protection (677, 20.34%), transcription regulatory (1013, 30.44%), defense (507, 15.23%), cellular structure and organization (287, 8.62%) and unknown (844, 25.36%).
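The category percentages can be checked directly from the counts given above. The short sketch below uses only those five counts; it shows that they add up to 3328, the number of assembled sequences, and that each percentage is the count divided by that total.

```python
# Counts per functional category, as reported in the text.
categories = {
    "antioxidative protection": 677,
    "transcription regulatory": 1013,
    "defense": 507,
    "cellular structure and organization": 287,
    "unknown": 844,
}

total = sum(categories.values())
percentages = {name: round(100 * n / total, 2) for name, n in categories.items()}

print(f"total sequences: {total}")
for name, pct in percentages.items():
    print(f"{name}: {pct:.2f}%")
```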
Gene Ontology annotation and functional analysis of the avocado seed transcriptome were carried out with the automated software Blast2GO. Annotations were assigned to the three standard classifications (biological process, molecular function and cellular component) and summarized according to GO criteria. Most GO annotations were for biological process (65.34%), followed by cellular component (19.34%) and molecular function (15.33%). In addition, the organisms with the sequences most similar to those of avocado seed were identified in the databases (Figure 2); the largest share of the analyzed sequences (23.47%) was most closely related to Nelumbo nucifera sequences (Figure 3). The 20 most abundant uniESTs and their annotations are shown in Table 2.
|arlgES1||Transcription factor bHLH96||942||1.35E-103||GO:0046983||123|
|arlgES153||Metallothionein type 3||195||4.52E-28||GO:0043167||123|
|arlgES154||Stress response nst1||696||1.68E-34||GO:0009507; GO:0010207||113|
|arlgES2||40S ribosomal S29||171||1.77E-32||GO:0003735; GO:0043167; GO:0005829; GO:0005840; GO:0006412||100|
|arlgES3||PREDICTED: uncharacterized protein LOC104611921||282||4.53E-37||N/A||96|
|arlgES4||Vesicle-associated membrane 726||579||5.71E-130||GO:0016192; GO:0005575||90|
|arlgES6||ras-related RIC2||525||2.61E-118||GO:0007165; GO:0043167; GO:0005622||78|
|arlgES8||Isocitrate dehydrogenase [NADP]||588||1.02E-132||GO:0044281; GO:0006091; GO:0016491; GO:0043167||74|
|arlgES156||PREDICTED: uncharacterized protein LOC103701850 isoform X3||552||2.83E-19||GO:0005739||66|
|arlgES157||Potassium transporter 12 isoform X1||366||2.97E-55||GO:0022857; GO:0009536||64|
|arlgES12||60S ribosomal L24||492||5.32E-63||GO:0003735; GO:0022618; GO:0005829; GO:0042254; GO:0005840; GO:0006412||58|
|arlgES13||60S ribosomal L7–2-like||738||1.65E-136||GO:0003735; GO:0005829; GO:0042254; GO:0005840; GO:0006412||57|
|arlgES15||Ethanolamine utilization eutQ||291||3.63E-61||N/A||54|
|arlgES158||Programmed cell death 4||636||1.00E-80||N/A||52|
|arlgES16||Type 2 metallothionein||243||5.56E-24||GO:0043167||51|
3.3. Avocado seed abundant genes and validation
The most abundant sequences match metallothionein genes. This result suggests that metallothionein genes dominate the avocado seed transcriptome, as they do in avocado fruit. Metallothioneins (MTs) were discovered by Margoshes and Vallee as cadmium-binding proteins isolated from the equine kidney cortex. These proteins were named for their high content of sulfur and of the metals they are able to bind; depending on the metal species, metal ions may make up more than 20% of the protein [27, 28]. Mammalian metallothioneins are 60 amino acid peptides with 20 Cys residues and a molecular mass of about 6–7 kDa. Mammalian MTs are capable of binding up to 7 divalent metal ions via mercaptide (sulfur-metal) bonds with the Cys residues. By convention, any peptide or protein that shares several characteristics with mammalian metallothioneins can be classified as a metallothionein. Plant metallothioneins have two highly conserved regions of sequence similarity, corresponding to the two Cys-rich terminal domains, joined by a less conserved “spacer” (about 40 aa without Cys residues). The most distinctive feature of plant MTs is this long spacer; in animal MTs the Cys-rich domains are separated by a short spacer of fewer than 10 amino acids that does not include aromatic residues. The distribution of Cys residues and the length of the spacer region are used to classify most plant MTs into four types: 1, 2, 3 and 4.
From the analysis of these abundant transcripts, we found three MT genes in avocado seed: PaMT2a, PaMT2b and PaMT3, which were registered in the GenBank database under the accession codes shown in Table 3. According to the characteristics predicted in silico for the avocado metallothioneins, two sequences belong to the Metallothionein-2 superfamily because of their two highly conserved Cys-rich motifs and the long spacer between them. PaMT3 retains the spacer, but its Cys-rich motifs are less conserved, which places it in the third family of plant metallothioneins (MT3). The alignments showed that the most conserved amino acids, for both the MT2 and MT3 groups, surround the Cys residues in the amino- and carboxyl-terminal regions, which correspond to the “metal binding clusters” (Figures 4 and 5). The intermediate spacer of about 40 amino acids is much more variable and has no cysteines. In mammalian metallothioneins, this spacer is very small (8 amino acids) and has no aromatic amino acids. In plant metallothioneins of types 2 and 3, however, the spacer contains conserved tyrosine residues as well as several less conserved phenylalanines (Figures 4 and 5). The avocado seed metallothioneins correspond to the three reported for Nelumbo nucifera (NnMT2a, NnMT2b and NnMT3) and are closely related to them, with 79% for NnMT2a-PaMT2a, 66% for NnMT2b-PaMT2b and 71% for NnMT3-PaMT3 (Figure 6). NnMT2a and NnMT3 have been associated with seed germination and with tolerance to accelerated aging and salt in Arabidopsis. However, avocado seeds have a low tolerance to aging. The basic function of MTs is metal homeostasis, and roles under biotic and abiotic stress conditions have also been reported.
Recently, it has been reported that one metallothionein interacts with a cytoskeleton protein in the nucleus of rice cells in response to salt stress .
|Name||GenBank||Length (aa)||Weight (kDa)||Isoelectric Point||Cys Residues||% Cys|
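Three of the columns in this table (length, number of Cys residues and % Cys) follow directly from the protein sequence. The sketch below shows one way to compute them; the function name and the short peptide are hypothetical and do not correspond to any avocado metallothionein. Molecular weight and isoelectric point need residue mass and pKa tables and are not covered here.

```python
def cys_content(protein):
    """Return (length in aa, number of Cys residues, % Cys) for a protein sequence."""
    protein = protein.upper()
    if not protein:
        raise ValueError("empty sequence")
    n_cys = protein.count("C")
    return len(protein), n_cys, 100 * n_cys / len(protein)

# Hypothetical 12-residue peptide (illustrative only).
length, n_cys, pct_cys = cys_content("MSCCGGKCGCGS")
print(f"length={length} aa, Cys={n_cys}, %Cys={pct_cys:.2f}")
```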
Sequences of related proteins were obtained from Zhou et al.  and downloaded from NCBI-GenBank: BAD18376.1 GmMT1 (Glycine max); BAD18378.1 VaMT1 (Vigna angularis); AAF04584.1 MsMT1 (Medicago sativa); BAD18382.1 PsMT1 (Pisum sativum); BAD18380.1 VfMT1 (Vicia faba); ABL10085.1 LbMT2 (Limonium bicolor); BAD18383.1 PsMT2 (Pisum sativum); Q39459.2 CaMT2 (Cicer arietinum); BAD18379.1 VaMT2 (Vigna angularis); ABA08415.1 AhMT2 (Arachis hypogaea); AAL76147.1 AtMT2a (Arabidopsis thaliana); NP_195858.1 AtMT2B (Arabidopsis thaliana); ABN46987.1 NnMT2a (Nelumbo nucifera); ABN46988.1 NnMT2b (Nelumbo nucifera); CAA10232.1 FsMT2 (Fagus sylvatica); ABR92329.1 SmMT2a (Salvia miltiorrhiza); NP_566509.1 AtMT3 (Arabidopsis thaliana); AAS99234.1 NcMT3 (Noccaea caerulescens); BAB85599.1 BjMT3 (Brassica juncea); ACB10219.2 EgMT3 (Elaeis guineensis); ACV51811.1 TaMT3 (Typha angustifolia); Q40256.1 MaMT3 (Musa acuminata); ABN46986.1 NnMT3 (Nelumbo nucifera); ACC77568.1 PjMT3 (Prosopis juliflora); CAH59438.1 PmMT3 (Plantago major); NP_181731.2 AtMT4 (Arabidopsis thaliana); AAB65792.1 GmMT4 (Glycine max); P30570.2 TaMT4 (Triticum aestivum); AAS78805.1 OsMT4 (Oryza sativa Japonica Group) and NP_001105499.1 ZmMT4 (Zea mays), .
Another abundant transcript encodes a bHLH-type transcription factor. The basic helix-loop-helix (bHLH) domain is a highly conserved amino acid motif that defines a group of transcription factors; it was initially described in animals and was soon found in all major eukaryotic lineages. Proteins containing a bHLH domain (referred to as bHLH proteins) are involved in a variety of regulatory processes; their functions include the regulation of neurogenesis, myogenesis and heart development in animals [34, 35], control of phosphate uptake and glycolysis in yeast, and modulation of secondary metabolism pathways, epidermal differentiation and environmental responses in plants. The bHLH domain comprises 50–60 amino acids in two distinct segments: a stretch of 10–15 mostly basic amino acids (the basic region) and approximately 40 amino acids that form two amphipathic helices separated by a loop (the helix-loop-helix region). Structural analysis of yeast and mammalian bHLH proteins showed that the basic region makes contact with the DNA, while the two helices promote the formation of heterodimers between bHLH proteins. These bHLH transcription factors are generally classified into six major groups (A–F) based on their ability to bind DNA [35, 38, 39]. Most bHLH proteins are classified into group A or B. Group A proteins are expected to bind E-box consensus sequences (CACCTG or CAGCTG), and group B proteins bind specifically to G-box consensus sequences (CACGTG or CATGTTG). Group C bHLH proteins share a PAS domain and bind sequences that are not E-boxes (ACGTG or GCGTG). Group E includes bHLH proteins that contain a conserved Pro or Gly residue at a key position within the basic region, bind preferentially to sequences referred to as N-boxes (CACGCG or CACGAG), and share an additional WRPW motif. Groups D and F comprise proteins that diverge in the basic region.
Some group D proteins have been described as unable to bind DNA and could form heterodimers that act as negative regulators of bHLH binding to DNA. Group F includes the so-called COE proteins. A phylogenetic study indicated that group A contained mammalian bHLH proteins and lacked plant bHLH proteins. The other groups contained a mixture of species, and most plant bHLH proteins belonged to group B [41, 42]. It has been shown that the bHLH protein family in plants is monophyletic and underwent significant radiation before the evolution of mosses; the bHLH groups established in land plants during the first 400 million years were conserved during later plant evolution, although there were many gene duplications. Transcription factors are highly variable because few amino acids are conserved along their sequences; nevertheless, at DNA-binding sites such as the bHLH domain, the great majority of amino acids within the main motif are conserved. Owing to their propensity to form homodimers or heterodimers, bHLH proteins can participate in an extensive set of combinatorial interactions leading to the regulation of multiple transcriptional programs. The development of fleshy fruits involves complex physiological and biochemical changes. Recent studies have described the involvement of bHLH proteins in determining plant organ size. The SPATULA protein was shown to control cotyledon, leaf and petal expansion by affecting cell proliferation in Arabidopsis thaliana. Nicolas et al. described a bHLH transcription factor that is preferentially expressed in grape berry fruit but weakly detected in seeds. This gene is involved in cell size determination.
Three basic helix-loop-helix (bHLH) transcription factors were also found to be involved in the Arabidopsis fruit dehiscence process: ALCATRAZ (ALC), SPATULA (SPT) and INDEHISCENT (IND); they form a regulatory network that orchestrates the differentiation of the valve margin, allowing seed dispersal. A protein BLAST search was performed, and the sequences selected for multiple alignment were those with the greatest coverage and identity with the bHLH transcription factor of Mexican native avocado seed. High sequence variability was found near the bHLH motif, which itself presents a large number of conserved amino acids (Figure 7).
However, since no additional information is available on the function of these sequences or the tissue in which they act, we compared our sequence with bHLH sequences of known function or organ specificity. Two Arabidopsis thaliana bHLH factors with these characteristics were chosen: ZHOUPI (GenBank Accession: OAP16519), involved in seed development, and SPATULA (GenBank Accession: AT4G36930), involved in cell development; however, no conserved amino acids were found with the avocado bHLH (data not shown). To infer the function of this transcription factor, which is highly expressed in avocado seed, it will be necessary to study the DNA recognition boxes it binds, so that it can be assigned to one of the groups. It is probable that the bHLH transcription factor of the avocado seed has a principal role in seed differentiation and development.
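A first computational step toward studying recognition boxes could be a simple scan of candidate target sequences for the consensus motifs listed for groups A, B, C and E. The sketch below is only an illustration under stated assumptions: the function name and the test sequence are hypothetical, only the forward strand is searched, and a motif match does not demonstrate binding.

```python
# Consensus motifs as listed in the text for each bHLH group.
BOXES = {
    "E-box (group A)": ("CACCTG", "CAGCTG"),
    "G-box (group B)": ("CACGTG", "CATGTTG"),
    "group C site": ("ACGTG", "GCGTG"),
    "N-box (group E)": ("CACGCG", "CACGAG"),
}

def find_boxes(dna):
    """Return sorted (position, motif, label) hits on the forward strand (0-based)."""
    dna = dna.upper()
    hits = []
    for label, motifs in BOXES.items():
        for motif in motifs:
            start = dna.find(motif)
            while start != -1:
                hits.append((start, motif, label))
                start = dna.find(motif, start + 1)
    return sorted(hits)

# Hypothetical sequence (illustrative only). Note that every G-box CACGTG
# also contains the shorter group C site ACGTG, so both are reported.
for position, motif, label in find_boxes("TTCACGTGAACAGCTGTT"):
    print(position, motif, label)
```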
Multicellular organisms produce small cysteine-rich antimicrobial peptides (AMPs) as an innate defense against pathogens. Native Mexican avocado seed abundantly expresses the Snakin (PaSn) gene. This kind of AMP was initially isolated from potato but was later found to be ubiquitous. Novel plant AMPs include a family of 12-cysteine peptides in Arabidopsis.
We identified a single cDNA sequence for snakin/GASA (gibberellic acid-stimulated), which contains a coding sequence of 318 bp and encodes a predicted 106 amino acid peptide. This molecule comprises a 26 amino acid signal peptide (residues 1–26), identified by SignalP (
3.4. Expression patterns of selected genes measured by sqPCR
Expression patterns of five genes from the seed library (the bHLH transcription factor, three metallothioneins and the antimicrobial peptide snakin), with SUMO as the reference gene, were studied by semi-quantitative PCR during avocado seed development (Figure 10). Expression was compared across three stages defined by the time of seed growth in the avocado fruit. The bHLH gene has an expression pattern comparable to that of the endogenous gene SUMO (Ubiquitin), with similar expression levels from the first month of formation (E1) to ripening (E8); this suggests a role throughout the formation and development of the avocado seed, possibly in modulating the biogenesis of the seed or embryo. Within the metallothionein gene group, PaMT3 showed constant expression across the three stages of seed development, with a peak slightly above that of the endogenous gene. PaMT2a, in contrast, had a low initial expression level in the first month of development and reached its maximum at 4 months. PaMT2b had its maximum expression at the beginning of fruit formation in the first month, declined by month 4 and recovered at ripening. Metallothioneins have been directly implicated in various functions, some of them as carriers or facilitators of metal ions for processes of defense, or for the synthesis or hydrolysis of reserve components to make them more bioavailable; however, authors have not reached agreement on whether the different types or families of metallothioneins play specific roles. The snakin gene shows similar expression throughout fruit development, which emphasizes its role in defense against pathogens, as a first barrier of protection or in signaling of attack; maintaining its expression levels throughout development, and possibly afterwards, is therefore important for fruit protection (Figure 10).
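Comparisons against a reference gene in semi-quantitative PCR are often expressed as a ratio of band intensities. The sketch below assumes that band intensities were measured by densitometry, which the text does not state, and uses hypothetical intensity values; it is not the analysis performed in this work.

```python
def normalize_to_reference(target, reference):
    """Divide target band intensity by reference band intensity at each stage."""
    return {stage: target[stage] / reference[stage] for stage in target}

# Hypothetical band intensities in arbitrary units (illustrative only).
stages = ("1 month", "4 months", "8 months")
target_gene = dict(zip(stages, (50.0, 200.0, 100.0)))
reference_gene = dict(zip(stages, (100.0, 100.0, 100.0)))

relative = normalize_to_reference(target_gene, reference_gene)
for stage in stages:
    print(f"{stage}: {relative[stage]:.2f}")
```

A ratio near 1 at every stage would correspond to the pattern described as comparable to the reference gene, whereas a ratio that rises and then falls would correspond to a stage-specific peak.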
The expression patterns of the selected genes identified by sqPCR, together with the different expression patterns in the avocado seed transcriptome, suggest various roles for these genes in seed development and protection in avocado fruit.
In this work, we identified and characterized three novel metallothioneins and one transcription factor gene from avocado nativo mexicano seeds, which are expressed abundantly during seed development. This suggests that they may play a leading role during seed development and probably form a network that protects the embryo against drought stress. More studies are necessary to elucidate the role of these genes during avocado seed-fruit development.