The Evolutionary History of CBF Transcription Factors: Gene Duplication of CCAAT – Binding Factors NF-Y in Plants


Introduction
Eukaryotic gene expression is often controlled by complex and refined combinatorial transcription factor networks composed of multiprotein complexes that derive their gene regulatory capacity both from intrinsic properties and from their trans-acting partners (Singh, 1998; Wolberger, 1998; Remenyi et al., 2004). Participation in such higher-order complexes allows an organism to use single transcription factors to control multiple genes with different temporal and spatial expression patterns (Siefers et al., 2009). In this chapter, we provide a synopsis of the genetic and genomic mechanisms that might be responsible for the gene copy diversification observed in the eukaryotic NF-Y transcription factor family. We identify the genes coding for NF-Y transcription factors in eukaryotes, with an emphasis on the duplication of the NF-Y family in the plant lineage, and discuss the important consequences of this gene diversification.

The CCAAT cis-element promoter
Eukaryotic genes contain numerous cis-regulatory elements that mediate their induction, repression or basal transcription (Dynan and Tjian, 1985; Myers et al., 1986; Maity and de Crombrugghe, 1998). These regulatory elements can be found in the proximity of transcribed genes, such as in the promoter region, and/or in distant regions where they may act as enhancers (de Silvio et al., 1999). The transcriptional regulation of several eukaryotic genes is coordinated through sequence-specific binding of proteins to the promoter region located upstream of the gene. During evolution, many of these protein-binding sequences, which are found in a wide variety of organisms, have shown a high degree of conservation (Edwards et al., 1998). The CCAAT box is one of the most common upstream elements, found in approximately 25-30% of eukaryotic promoters (Bucher, 1990; Mantovani, 1998). It is typically located 60-100 bp upstream of the transcription start site and can function in direct or inverted orientation (Dorn et al., 1987b; Bucher, 1990; Edwards et al., 1998; Mantovani, 1998; Stephenson et al., 2007), with possible cooperative interactions between multiple boxes (Tasanen et al., 1992) or with other conserved motifs (Muro et al., 1992; Rieping and Schoffl, 1992; Edwards et al., 1998). CCAAT boxes are highly conserved within homologous genes across species in terms of position, orientation, and flanking nucleotides (Mantovani, 1998). In addition, the spacing between the CCAAT box and other promoter-specific cis-elements is also conserved among species (Dorn et al., 1987a; Chodosh et al., 1988; Maity and de Crombrugghe, 1998). The expression of genes under the control of promoters that contain CCAAT boxes may be ubiquitous or tissue/stage specific, suggesting that the gene expression pattern is also determined by other cis and trans elements (Stephenson et al., 2007).
In Saccharomyces cerevisiae, CCAAT boxes are found in the promoters of cytochrome genes, in genes coding for proteins that are activated by non-fermentable carbon sources (McNabb et al., 1995) and in genes involved in nitrogen metabolism (Dang et al., 1996). In the filamentous fungus Aspergillus nidulans, CCAAT boxes are present in genes involved in penicillin biosynthesis (Steidl et al., 1999). In higher eukaryotes, a multitude of promoters contain CCAAT boxes, including those of developmentally controlled and tissue-specific genes (Berry et al., 1992), housekeeping and inducible genes (Roy and Lee, 1995) and cell-cycle regulated genes (Mantovani, 1998). In addition, many cell-cycle regulated promoters lack a recognizable TATA box but contain more than one CCAAT box in a position close to, and sometimes overlapping with, the transcription start site (Zwicker and Muller, 1997).

The CBF/NF-Y transcription factor
Several CCAAT-binding proteins have been isolated and described, including CBF/NF-Y (CCAAT Binding Factor/Nuclear Factor of the Y box), CTF/NF1 (CCAAT Transcription Factor/Nuclear Factor 1), C/EBP (CCAAT/Enhancer Binding Protein) and CDP (CCAAT Displacement Protein) (Mantovani, 1999). Among them, NF-Y is the most ubiquitous and specific, acting as a key proximal promoter factor in the transcriptional regulation of an array of different eukaryotic genes. Unlike other CCAAT-binding proteins, NF-Y requires a high degree of conservation of the CCAAT pentanucleotide sequence and shows a strong preference for specific flanking sequences (Dorn et al., 1987a; Stephenson et al., 2007). Therefore, the NF-Y transcription factor can be distinguished from the other CCAAT-binding proteins based on its DNA sequence requirements (Maity and de Crombrugghe, 1998). The CBF/NF-Y transcription factor, which will be referred to in this chapter as NF-Y, is a conserved oligomeric transcription factor found in all eukaryotes that is involved in the regulation of diverse genes (McNabb et al., 1995; Edwards et al., 1998; Mantovani, 1998; Siefers et al., 2009). NF-Y typically acts in concert with other regulatory factors to modulate gene expression in a highly controlled manner (Nelson et al., 2007). In many eukaryotic promoters, the functional NF-Y-binding sites are relatively close to the TATA motif (Bucher, 1990) and are invariably flanked by at least one additional functionally important cis-element. Several reports have shown that various factors, including transcription factors, co-activators, and TATA-binding proteins, interact with NF-Y or its subunits in promoting transcriptional regulation (Mantovani, 1999; Yazawa and Kamada, 2007). NF-Y was originally identified as the protein that recognizes the MHC class II conserved Y box in Ea promoters (Dorn et al., 1987a; Matuoka and Chen, 2002).
It specifically recognizes the consensus sequence 5'-CTGATTGGYYRR-3' or 5'-YYRRCCAATCAG-3' (Y = pyrimidine and R = purine) present in the promoter region of eukaryotic genes.
Bioinformatic analyses indicate that about 30% of mammalian promoters have predicted NF-Y binding sites (Bucher, 1990; Testa et al., 2005), and chromatin immunoprecipitation data have demonstrated additional widespread NF-Y binding in nonpromoter sites. Suggesting the importance of binding context, NF-Y-regulated gene expression can be tissue specific, developmentally regulated, or constitutive (Maity and de Crombrugghe, 1998; Siefers et al., 2009). The transcriptional activity of NF-Y can be regulated by differential expression, alternative splicing, protein-protein interactions, and cellular redox potential (Matuoka and Yu Chen, 1999). NF-Y has been shown to be involved in the regulation of some G1/S genes whose expression is attenuated during the senescence process (Matuoka and Yu Chen, 1999). NF-Y plays a pivotal role in the cell cycle regulation of the mammalian cyclin A, cdc25C, and cdc2 genes, in the S-phase of the cell cycle (Currie, 1998). Additionally, there are a number of genes involved in the cellular response to damage and stress, including the phospholipid hydroperoxide glutathione peroxidase genes (Huang et al., 1999), which are regulated by NF-Y, indicating its pivotal role in the removal of damaging agents from cells (Matuoka and Chen, 2002). Although NF-Y functions basically as a transactivator of gene expression, it is also involved, directly or indirectly, in the downregulation of transcription. For instance, NF-Y binds to the mouse CCAAT box renin enhancer and blocks the binding of positive regulatory elements (Shi et al., 2001). In this case, NF-Y dysfunction would lead to the damage of systems that control blood pressure (Matuoka and Chen, 2002). NF-Y is composed of three different subunits named NF-YA (also known as HAP2 or CBF-B), NF-YB (HAP3 or CBF-A), and NF-YC (HAP5 or CBF-C) that interact to form a complex that can bind CCAAT DNA motifs and control the expression of target genes (Figure 1).
Each subunit is required for DNA binding, subunit association and transcriptional regulation in both vertebrates and plants (Sinha et al., 1995;Stephenson et al., 2007). Yeast possesses a fourth subunit, called HAP4, which provides a transcriptional activation domain to the complex (Forsburg and Guarente, 1989;Lee et al., 2003). The yeast HAP4 protein is not needed for DNA-binding but contains an acidic domain that is essential to promote transactivation when associated with the HAP2/HAP3/HAP5 complex (Olesen and Guarente, 1990;Serra et al., 1998). In vertebrates, the function of this fourth domain was incorporated into other subunits (Forsburg and Guarente, 1989;Yazawa and Kamada, 2007). Despite the wide cellular distribution and functional variability of NF-Y-regulated genes, most eukaryotic genomes have only one or two genes encoding each NF-Y subunit (Maity and de Crombrugghe, 1998;Riechmann and Ratcliffe, 2000). Fungi and animals, for example, present single genes encoding each protein subunit. Thus, there is minimal combinatorial diversity in the subunit composition of the heterotrimeric NF-Y in these organisms (Siefers et al., 2009). In contrast, the NF-Y complex in vascular plants is generally encoded by gene families (Riechmann and Ratcliffe, 2000).
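The consensus quoted above can be turned into a simple sequence scan. The following is a minimal sketch in Python, not a method used in this chapter: it assumes that the 12-nucleotide consensus is matched exactly (Y = C or T, R = A or G) and that only one strand is read, so the inverted orientation is found through the reverse-complement pattern 5'-CTGATTGGYYRR-3'. The function names and the test sequence are hypothetical.

```python
import re

# IUPAC codes used in the consensus quoted in the text: Y = C/T, R = A/G.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "[CT]", "R": "[AG]"}

# The two consensus strings are reverse complements of each other.
DIRECT = "YYRRCCAATCAG"     # CCAAT read on the scanned strand
INVERTED = "CTGATTGGYYRR"   # CCAAT lies on the opposite strand (ATTGG here)


def consensus_to_pattern(consensus):
    """Translate an IUPAC consensus string into a regular-expression pattern."""
    return "".join(IUPAC[base] for base in consensus)


def find_nfy_sites(sequence):
    """Return sorted (start, orientation, matched_text) tuples, 0-based starts."""
    sequence = sequence.upper()
    hits = []
    for orientation, consensus in (("direct", DIRECT), ("inverted", INVERTED)):
        # The lookahead reports overlapping occurrences as well.
        pattern = re.compile("(?=(" + consensus_to_pattern(consensus) + "))")
        for match in pattern.finditer(sequence):
            hits.append((match.start(), orientation, match.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical test sequence containing one site in each orientation.
    promoter = "GGG" + "CTAGCCAATCAG" + "TTT" + "CTGATTGGCTAG" + "AAA"
    for start, orientation, site in find_nfy_sites(promoter):
        print(start, orientation, site)
```

An exact-match scan of this kind is stricter than real binding-site prediction, which typically relies on position weight matrices; it is shown only to make the Y/R notation and the two orientations concrete.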

NF-Y subunits
NF-Y is the only transcription factor thus far identified for which the interaction of three heterologous subunits creates the DNA binding domain (McNabb et al., 1995; Sinha et al., 1995). All three NF-Y subunits are essential for the DNA binding activity and one molecule of each subunit forms the NF-Y-DNA complex. Each NF-Y subunit contains a conserved domain with identities greater than 70% across species. This highly conserved domain is located at the C-terminus of NF-YA, in the central part of NF-YB, and at the N-terminus of NF-YC (Li et al., 1992).

The NF-YA conserved domain can be divided into two functionally distinct regions: an N-terminal region that is required for NF-YB and NF-YC association and a C-terminal region required for DNA binding. Additionally, NF-YA usually contains a glutamine (Q)-rich and a serine/threonine (S/T)-rich region. There are numerous variants of NF-YA due to alternative splicing at the Q-S/T domains (Li et al., 1992) and, although the expression of these isoforms varies depending on tissue and cell type, they all seem intact in terms of transcriptional function (Matuoka and Chen, 2002). Both NF-YB and NF-YC subunits possess the highly conserved histone-fold motif (HFM) and are structurally similar to core histone subunits H2B and H2A, respectively, and to the archaebacterial histone-like protein Hmf-2 (Baxevanis et al., 1995; Mantovani, 1998). In terms of identity, NF-YB is 30% identical to H2B, 14% to H2A, 17% to H4 and 18% to H3; NF-YC is 21% identical to H2A, 15% to H4 and H3 and 20% to H2B (Liberati et al., 1999). Other proteins showing a remarkable identity (25-30%) to both NF-YB and NF-YC are present in Archaea. These proteins homodimerize and associate with DNA, forming nucleosome-like structures (Sandman et al., 1990). The NF-YB and NF-YC subunits also contain residues that are important for their contact with DNA (Romier et al., 2003; Stephenson et al., 2007). In contrast, the conserved segment of NF-YA has no homology with the histone-fold motif, or with any of the known dimerization motifs present in other heteromeric DNA-binding proteins. Some portions of NF-YA, NF-YB and NF-YC present a high degree of identity with yeast HAP2, HAP3 and HAP5, respectively. These HAP genes, which encode components of the yeast CCAAT-binding protein, are necessary for the expression of genes encoding components of the electron transport chain.
Yeast strains mutated in any of the three genes fail to grow on media containing a non-fermentable carbon source such as lactate or glycerol, a characteristic respiratory-defect phenotype (McNabb et al., 1995). Assembly of the NF-Y heterotrimer in mammals (where this complex is better studied) follows a strict, stepwise pattern (Sinha et al., 1995; Sinha et al., 1996) (Figure 1). Initially, the NF-YB and NF-YC subunits form a tight heterodimer (Figure 1a) similar to those formed through the HFM, a conserved protein-protein and DNA-binding interaction module (Luger et al., 1997) consisting of a 65-amino-acid stretch common to all histones that is required for nucleosome formation (Baxevanis et al., 1995; Luger et al., 1997; de Silvio et al., 1999). This dimer then moves to the nucleus, where the third subunit (NF-YA, Figure 1b) is recruited to generate the complete, heterotrimeric NF-Y (Figure 1c). Interestingly, NF-YA is unable to interact with NF-YB or NF-YC alone, interacting only with the NF-YB-NF-YC heterodimer (Serra et al., 1998). The complete NF-Y is able to bind promoters containing the core pentamer nucleotide sequence CCAAT (Figure 1d) with high specificity and affinity, resulting in either positive or negative transcriptional regulation (Figure 1e) (Peng and Jahroudi, 2002; 2003; Ceribelli et al., 2008; Siefers et al., 2009). Because the NF-Y transcription factor contains H2B-like and H2A-like molecules (NF-YB and NF-YC, respectively), the complex presents all the core histone components and could mimic the interaction of the nucleosome core with genomic DNA (Struhl and Moqtaderi, 1998). In this scenario, it has been demonstrated that the NF-YA/NF-YB/NF-YC trimer or the NF-YB/NF-YC dimer can bind to the H3/H4 tetramer during nucleosome assembly (Caretti et al., 1999). In addition, the NF-Y complex can also bind to chromatin after nucleosome formation, indicating the ability of NF-Y to interact with genomic DNA assembled in the nucleosome.
The interaction between the NF-Y transcription factor and the DNA molecule causes local disruption of the nucleosomal architecture (Coustry et al., 2001).
This disruption results in a partial dissociation of DNA from the histone core, which might enable the access of the general transcription machinery to initiate the transcription process.

Fig. 1. Assembly of NF-Y subunits and binding of the complex to DNA. Initially, the NF-YB and NF-YC subunits form a tight heterodimer via protein-protein interactions (a). The dimer then moves to the nucleus, where the third subunit (NF-YA) is recruited (b) to generate the complete, heterotrimeric NF-Y (c), which is able to bind promoters containing the core pentamer nucleotide sequence CCAAT (d), resulting in either positive or negative transcriptional regulation (e). Adapted from Mantovani (1999). White circles and oblong black circles within each NF-Y subunit represent the DNA-binding domain and the NF-Y interaction domain, respectively.

Gene duplication and evolution
DNA duplication acts as one of the main forces driving the evolution of organisms by creating the raw genetic material that natural selection can subsequently modify. Gene duplications arise in eukaryotes at a rate of 0.01 paralogs per gene per million years (Lynch and Conery, 2000), the same order of magnitude as the mutation rate per nucleotide per year (De Grassi et al., 2008). Duplications of individual genes, chromosomal segments, or entire genomes represent the primary source for the origin of evolutionary novelties, including new gene functions and expression patterns (Holland et al., 1994; Sidow, 1996; Lynch and Conery, 2000). However, how duplicated genes successfully evolve from an initial state of complete redundancy, wherein one copy is likely to be expendable, to a stable situation in which both copies are maintained by natural selection, is unclear (Sidow, 1996; Lynch and Conery, 2000; Ober, 2010). In the evolutionary history of plants, genome duplications have been relatively common, leading to the hypothesis that most angiosperms are to some extent polyploid (Soltis, 2005). The genome of Arabidopsis, for example, possesses traces of at least three polyploidy events (Vision et al., 2000; Simillion et al., 2002), followed by subsequent gene loss (Bowers et al., 2003; Ober, 2010). Similar to a point mutation, a duplication that occurs in an individual can be fixed or lost in the population. If a new allele carrying the duplicate gene is selectively neutral compared with pre-existing alleles, it has only a small probability (1/2N) of being fixed in a diploid population (where N is the effective population size). This suggests that the majority of duplicated genes will be lost. For those duplicated genes that do become fixed, the fixation time averages 4N generations (Kimura, 1989; Zhang, 2003).
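To give a sense of scale for the two neutral-theory quantities just cited, the short Python sketch below evaluates the fixation probability 1/(2N) and the mean fixation time 4N for a chosen effective population size. The function names and the example value N = 10,000 are illustrative assumptions and are not taken from the studies cited above.

```python
def neutral_fixation_probability(effective_size):
    """Probability that a new neutral duplicate is fixed in a diploid population: 1/(2N)."""
    if effective_size <= 0:
        raise ValueError("effective population size must be positive")
    return 1.0 / (2 * effective_size)


def mean_fixation_time(effective_size):
    """Average number of generations to fixation, for duplicates that do fix: 4N."""
    if effective_size <= 0:
        raise ValueError("effective population size must be positive")
    return 4 * effective_size


if __name__ == "__main__":
    n_e = 10_000  # hypothetical effective population size
    print("fixation probability:", neutral_fixation_probability(n_e))
    print("mean fixation time (generations):", mean_fixation_time(n_e))
```

With this hypothetical N, a new neutral duplicate has a chance of 1 in 20,000 of reaching fixation, and the rare duplicates that do fix take on average 40,000 generations to do so, which illustrates why most duplicates are expected to be lost.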
On an evolutionary scale, gene duplication may result in new functions via different scenarios. Although the most likely outcome is a loss of function in one of the two gene copies (nonfunctionalization, Figure 2a), in rare instances one copy may acquire a novel evolutionarily advantageous function and become preserved by natural selection (neofunctionalization, Figure 2b), while the other copy retains the original function. Alternatively, after duplication, mutations may occur in both genes leading to specialization to perform complementary functions (subfunctionalization, Figure 2c) (Lynch and Conery, 2000;Lynch and Force, 2000). This process produces novel genetic variants that drive genetic innovation (Lynch and Conery, 2000;Conrad and Antonarakis, 2007). Because gene duplication generates functional redundancy, it is often not advantageous to the organism to possess two identical genes. In nonfunctionalization (Figure 2a), the accumulation of deleterious mutations might lead to the loss of the original function of one paralogue. Alternatively, instead of being completely lost, many duplicated genes are silenced or become pseudogenes and are thus either unexpressed or functionless (Gallagher et al., 2004;Nicole et al., 2006;Yang et al., 2006;Beisswanger and Stephan, 2008;Xiong et al., 2009). Pseudogenization is the most frequent fate of duplicated genes. In Caenorhabditis elegans, for example, genomic analyses have identified 2168 pseudogenes or approximately one pseudogene for every eight functional genes (Harrison et al., 2001). In humans, one pseudogene was identified for approximately every two functional genes (Harrison et al., 2002). As pseudogenes generally do not confer a selective advantage, they have a low probability of being fixed in large populations (Ober, 2010).
Unless the presence of an extra amount of gene product is advantageous, it is unlikely that two genes with the same function will be stably maintained in the genome of the organism (Nowak et al., 1997). In subfunctionalization (Figure 2c), both duplicated copies may become, by accumulation of mutations, partially compromised to the point at which their total capacity is reduced to the level of the single-copy ancestral gene (Force et al., 1999; Stoltzfus, 1999; Lynch and Force, 2000). Subfunctionalization can occur through the modification of regulatory elements by mutations (Force et al., 1999; Hinman and Davidson, 2007) or by epigenetic silencing (Rodin and Riggs, 2003). On an evolutionary scale, one of the most important forms of subfunctionalization is the division of gene expression after duplication (Force et al., 1999). For example, zebrafish ENGRAILED 1 and ENGRAILED 1-B are a pair of transcription factor genes generated by a chromosomal segmental duplication that occurred in the lineage of ray-finned fish. While ENGRAILED 1 is expressed in the pectoral appendage bud, ENGRAILED 1-B is expressed in a specific set of neurons in the hindbrain/spinal cord (Force et al., 1999). In yeast, more than 40% of gene pairs exhibit significant expression divergence (Gu et al., 2002). Also, the comparison of 17 fungal genomes revealed that duplicated genes rarely diverge with respect to biochemical function, but typically diverge with respect to regulatory control (Wapinski et al., 2007). On the other hand, if two redundant gene copies are retained in the genome without significant functional divergence, the organism may acquire increased genetic robustness against harmful mutations (Figure 2d) (Conrad and Antonarakis, 2007). In neofunctionalization (Figure 2b), the ancestral gene keeps its ancestral function, while the duplicated gene gains a new function under positive selection for advantageous mutations (De Grassi et al., 2008).
However, in many cases, rather than an entirely new function, a related function evolves after gene duplication. For example, the red- and green-sensitive opsin genes of humans were the result of a gene duplication that occurred in hominoids and Old World monkeys (Yokoyama and Yokoyama, 1989). After the duplication, functional divergence of the two opsins resulted in a 30-nanometer difference in their maximum absorption wavelength. This difference conferred sensitivity to a wide range of colors on humans and related primates (Zhang, 2003). The fate of a gene that undergoes duplication seems to be the result of diverse and, in some cases, interdependent factors (Taylor et al., 2001). These variables include its functional category (Papp et al., 2003; Kondrashov and Koonin, 2004; Marland et al., 2004), degree of conservation (Conant and Wagner, 2002; Davis and Petrov, 2004; Jordan et al., 2004; Braybrook and Harada, 2008), sensitivity to dosage effects (Kondrashov and Koonin, 2004), as well as its regulatory and architectural complexity (He and Zhang, 2005). Some observations indicate that natural selection created a preferential association of duplications with certain gene categories. For example, genes encoding proteins that interact with the environment are more frequently retained after duplication than genes whose products interact within intracellular compartments (Li et al., 2003; Marland et al., 2004). In addition, genomes tend to retain duplicated genes involved in signal transduction and transcription, but to lose duplicated DNA repair genes (Blanc and Wolfe, 2004; Maere et al., 2005; Paterson et al., 2010). It has been shown that shortly after duplication the protein-coding sequence and cis-regulatory regions of some duplicated genes can evolve independently (Figure 3a) (Wagner, 2000).
This independent evolution can generate protein sequence divergence of duplicated genes (Figure 3b1) or protein network divergence (Figure 3b2), where the protein interaction domains (cis-regulatory elements) of the original sequence evolve by maintenance, gain, or loss of interacting partners. Alternatively, the divergence of cis-regulatory motifs in the promoter-proximal region (Figure 3c) can generate expression divergence between the duplicated genes (Conrad and Antonarakis, 2007).

Gene duplication of NF-Y in plants
While duplication of NF-Y genes is poorly understood in the plant lineage, many of the functional mechanistic details are likely conserved across plant, animal and fungal lineages. This inference comes from the strong cross-kingdom conservation of functionally important amino acid residues identified in mammalian and yeast NF-Ys (Sinha et al., 1995; Coustry et al., 1996; Kim et al., 1996; Sinha et al., 1996; Mantovani, 1998; Romier et al., 2003). CCAAT-like motifs are found in several plant promoters, and binding activity to CCAAT sequences has been identified in plant nuclear extracts (Yazawa and Kamada, 2007). In addition, at least some plant NF-YA and NF-YB subunits have been shown to complement yeast mutant strains lacking the corresponding NF-Y subunit. Several groups have also demonstrated that each of the three plant NF-Y proteins can substitute for its yeast counterpart in gene expression assays (Edwards et al., 1998; Masiero et al., 2002; Ben-Naim et al., 2006; Siefers et al., 2009). These observations indicate that plant NF-Y subunits might act as general transcription factors, as in mammals (Yamamoto et al., 2009). Although a complete functional plant NF-Y complex has not yet been described, the individual subunits are known to be involved in a number of important physiological processes, such as specific developmental processes and responses to environmental stimuli (Lotan et al., 1998; Kusnetsov et al., 1999; Miyoshi et al., 2003; Ben-Naim et al., 2006; Combier et al., 2006; Wenkel et al., 2006; Cai et al., 2007; Nelson et al., 2007; Warpeha et al., 2007; Siefers et al., 2009). A well-established example is the NF-YB subunit gene called LEAFY COTYLEDON-1 (LEC1), which specifically controls embryo development, especially the maturation phase.
LEC1 plays specialized roles not only because of its developmentally regulated expression but also because of its distinct molecular activity, as the in vivo function of LEC1 cannot be replaced by other NF-YB subunits, except for the most closely related one, Leafy Cotyledon 1 Like (L1L) (Kwong et al., 2003; Lee et al., 2003; Yamamoto et al., 2009). In Arabidopsis, many NF-Y subunit genes are expressed ubiquitously, although some are differentially expressed. For example, while the AtNF-YC-4 transcript accumulates in seeds 7 days after germination, AtNF-YB-9 is only expressed in green siliques (Gusmaroli et al., 2001). Plant NF-Y function also appears to be important for responses to drought stress. Although a specific mechanism of action remains unclear, overexpression of the AtNF-YB1 subunit and of its orthologue in maize (Zea mays), ZmNF-YB2, leads to enhanced drought resistance (Nelson et al., 2007). Another study showed that overexpression of Arabidopsis NF-YA5 reduced drought susceptibility, anthocyanin production and stomatal aperture, while nf-ya5 mutants had the expected opposite phenotype in each situation (Li et al., 2008). In addition, several publications strongly suggest that NF-Y transcription factors are also involved in photoperiod-regulated flowering (Ben-Naim et al., 2006; Wenkel et al., 2006; Siefers et al., 2009). We adopted a high-throughput comparative genomic approach to conduct a broad survey of fully sequenced genomes, including representatives of amoebozoa, yeasts, fungi, algae, mosses, plants, and vertebrate and invertebrate species, to identify homologous genes coding for each of the three subunits that form the NF-Y transcription factor (Table 1).
NF-Y gene and protein sequences were obtained through BLAST searches (blastp, blastx and tblastx) with default parameters against the Protein and Genome databases at the NCBI (National Center for Biotechnology Information, http://www.ncbi.nlm.nih.gov) and against the completed genome projects database at the JGI (Joint Genome Institute, http://www.jgi.doe.gov). The results point to a scenario where all fungi and the majority of metazoa possess single genes coding for each of the NF-Y subunits (Table 1). The metazoan exceptions include the amphioxus Branchiostoma floridae, the nematode Caenorhabditis elegans and the gastropod Lottia gigantea, all of which present a proportional duplication of the three subunits, possessing two genes for each subunit (Table 1). In contrast, plants possess gene families coding for each NF-Y subunit (Table 1). For instance, in the model plant Arabidopsis thaliana, 10 genes coding for NF-YA, 13 for NF-YB, and 13 for NF-YC were identified. Because of the heterotrimeric composition, the 36 Arabidopsis NF-Y subunits could theoretically combine to generate 1,690 unique transcription factors (Siefers et al., 2009). This NF-Y expansion is not restricted to Arabidopsis but is a general feature of the plant lineage, including monocots and eudicots. In rice (Oryza sativa), for example, 11 genes were identified coding for the NF-YA subunit, 10 for NF-YB and 8 for NF-YC. Four of the rice NF-YB subunits have been characterized and at least one of these genes is involved in chloroplast development (Miyoshi et al., 2003; Yazawa and Kamada, 2007). Interestingly, the moss Physcomitrella patens and the lycophyte Selaginella moellendorffii possess single genes coding for NF-YA subunits, whereas the other subunits are encoded by multiple genes (Table 1).
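The theoretical number of heterotrimers follows directly from the gene counts given above: one NF-YA, one NF-YB and one NF-YC are chosen independently, so the counts multiply. The Python sketch below reproduces this arithmetic for Arabidopsis and rice. It assumes, as the theoretical figure does, that every subunit can associate with every member of the other two families; the function names and the numbered labels produced by the enumeration are placeholders rather than curated gene identifiers.

```python
from itertools import product
from math import prod

# Gene counts per subunit family as reported in the text (Table 1).
ARABIDOPSIS = {"NF-YA": 10, "NF-YB": 13, "NF-YC": 13}
RICE = {"NF-YA": 11, "NF-YB": 10, "NF-YC": 8}


def count_heterotrimers(gene_counts):
    """Number of A/B/C combinations if any subunit can pair with any other."""
    return prod(gene_counts.values())


def enumerate_heterotrimers(gene_counts):
    """List every combination using placeholder labels such as 'NF-YA#1'."""
    pools = [
        [f"{family}#{index}" for index in range(1, count + 1)]
        for family, count in gene_counts.items()
    ]
    return list(product(*pools))


if __name__ == "__main__":
    print("Arabidopsis subunit genes:", sum(ARABIDOPSIS.values()))
    print("Arabidopsis heterotrimers:", count_heterotrimers(ARABIDOPSIS))
    print("Rice heterotrimers:", count_heterotrimers(RICE))
```

The same calculation gives a single possible complex for any organism with one gene per subunit, which is the contrast drawn with fungi and most metazoa.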
Since evolutionary rates can be species dependent, the differences observed in the number of genes encoding NF-Y subunits among eukaryotic groups (Table 1), especially in vascular plants, may be the result of recent duplication processes that contributed to the establishment of gene families coding for each NF-Y subunit. However, some duplicated genes might have undergone a high level of diversification, which could have prevented their identification in our analyses. Representative plant genes (from monocots and eudicots) were selected to perform phylogenetic analyses of the NF-Y subunits. The phylogenies were reconstructed from protein sequence alignments using a Bayesian approach in MrBayes 3.1.2 (Ronquist and Huelsenbeck, 2003). The mixed amino acid substitution model plus gamma and invariant sites was used in two independent runs of 5,000,000 generations each, with two Metropolis-coupled Monte Carlo Markov chains (MCMCMC) run in parallel (each starting from a random tree). Markov chains were sampled every 100 generations, and the first 25% of the trees were discarded as burn-in. The remaining trees were used to compute the majority rule consensus tree, the posterior probability of clades and branch lengths (Figures 4 to 6). Phylogenetic analysis showed that the gene diversification of all NF-Y subunits likely resulted from several duplication events during the evolution and diversification of plants (Figures 4 to 6). Four independent, highly supported clusters were observed for the NF-YA subunit (I to IV, Figure 4), three for NF-YB (I to III, Figure 5) and five for NF-YC (I to V, Figure 6). Based on these results, we suggest that each cluster might possess an independent ancestral subunit from which the duplicated members of the group originated. However, independent duplication events have occurred in many species after the divergence of monocots and eudicots.
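The sampling scheme described for the Bayesian analysis implies a definite number of trees behind each consensus. The following Python sketch works through that arithmetic from the stated settings (5,000,000 generations, one sample every 100 generations, 25% burn-in, two runs). It is a simplification under stated assumptions: it ignores the starting tree that MCMC programs may also record at generation 0, and the function name is hypothetical.

```python
def retained_samples(generations, sample_every, burnin_fraction, runs):
    """Summarize how many sampled trees remain after burn-in is discarded."""
    per_run = generations // sample_every          # trees sampled in one run
    discarded = int(per_run * burnin_fraction)     # trees dropped as burn-in
    kept = per_run - discarded                     # trees kept from one run
    return {
        "sampled_per_run": per_run,
        "discarded_per_run": discarded,
        "kept_per_run": kept,
        "kept_total": kept * runs,
    }


if __name__ == "__main__":
    summary = retained_samples(
        generations=5_000_000, sample_every=100, burnin_fraction=0.25, runs=2
    )
    for key, value in summary.items():
        print(key, value)
```

Under these assumptions each run contributes 37,500 post-burn-in trees, so roughly 75,000 trees would underlie each consensus tree and its clade posterior probabilities.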
For example, the soybean and Arabidopsis genomes have experienced a series of recent duplication events (red squares in Figures 4 to 6) that could be the result of chromosome duplication or could be derived from polyploidization events (soybean is a good example of a recent polyploidization). These duplications can help to explain the differences observed in the number of genes coding for the NF-Y subunits in plants (Table 1). Additionally, these duplications seem to be relatively recent and can provide the raw material for neofunctionalization (Figure 2b) and functional divergence of duplicated genes (Figure 3). With few exceptions (genes with an unresolved position in the phylogenetic tree are marked with black arrows, Figures 4 to 6), all clusters of a specific NF-Y subunit are formed by well-defined subclusters of monocot and eudicot representatives (Figures 4 to 6). Duplication events within a specific plant family were also observed for the two Fabaceae species Glycine max and Medicago truncatula (green square, Figure 4), which could indicate concerted evolution of duplicated genes in these related species (Figure 2e). This is similar to the clade-specific shifts in selective constraint following concerted duplication events observed for MADS box transcription factors in angiosperms (Shan et al., 2009). The duplication process is a prominent feature of plant genomic architecture (Figure 1). This has led many researchers to speculate that gene duplication may have played an important role in the evolution of phenotypic novelty within the plant lineage (Flagel and Wendel, 2009). As a result of pervasive and recurring small-scale duplications, which may be followed by functional divergence, many nuclear genes in plants are members of gene families and may exhibit copy number variation among lineages (Blanc and Wolfe, 2004; Schlueter et al., 2004), as can be observed in Table 1.
Evidence for frequent gene duplication has also been observed in the evolutionary history of numerous gene families that have expanded during the diversification of the angiosperms (Zahn et al., 2005; Duarte et al., 2010). In multigene families descended from a common ancestor, individual genes in the group exert similar functions and have similar DNA sequences (Conrad and Antonarakis, 2007). One concept, concerted evolution, applies particularly to localized and typically tandem copies of a gene. The concept posits that all genes in a given group evolve coordinately, and that homogenization is the result of gene conversion (Figure 2e) (Conrad and Antonarakis, 2007). The emerging picture points to plant NF-Y complexes acting as essential regulatory hubs for many processes. Multiple NF-Y subunits in vascular plants may associate with each other in various combinations that regulate the expression of specific gene sets and might provide similar levels of combinatorial diversity for transcriptional fine-tuning (Siefers et al., 2009). The amplification observed in the plant lineage (Table 1) raises the possibility that new and divergent functions of heterotrimeric complexes have evolved in plants (Nelson et al., 2007), indicating a more complex regulatory role for the various NF-Y proteins in plants than in other organisms (Stephenson et al., 2007).
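To give a sense of the combinatorial diversity discussed above, the sketch below counts the heterotrimers (one NF-YA, one NF-YB and one NF-YC member) that could in principle form, and then restricts them to subunits expressed in the same tissue. The gene counts, gene names and expression sets are invented placeholders rather than values from Table 1, and the function names are hypothetical. The sketch also assumes that any three co-expressed subunits can assemble, which is exactly the point that remains open.

```python
from itertools import product


def count_heterotrimers(n_a, n_b, n_c):
    """Upper bound on distinct NF-YA/NF-YB/NF-YC trimers."""
    return n_a * n_b * n_c


def coexpressed_trimers(expression, tissue):
    """Trimers whose three subunits are all expressed in `tissue`."""
    present = {
        family: [gene for gene, tissues in genes.items() if tissue in tissues]
        for family, genes in expression.items()
    }
    return list(product(present["NF-YA"], present["NF-YB"], present["NF-YC"]))


# One gene per subunit gives a single possible complex.
single_copy = count_heterotrimers(1, 1, 1)

# Hypothetical genome with 10 genes per subunit (placeholder numbers).
expanded = count_heterotrimers(10, 10, 10)

# Invented expression data: gene name -> tissues where it is expressed.
expression = {
    "NF-YA": {"A1": {"leaf", "seed"}, "A2": {"seed"}},
    "NF-YB": {"B1": {"leaf"}, "B2": {"seed"}, "B3": {"leaf", "seed"}},
    "NF-YC": {"C1": {"leaf", "seed"}},
}
leaf_trimers = coexpressed_trimers(expression, "leaf")
seed_trimers = coexpressed_trimers(expression, "seed")
```

Under these placeholder values, tissue-specific expression alone reduces the six theoretically possible trimers to two in leaf and four in seed, which illustrates how expression patterns could constrain the complexes that actually form.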
The existence of multiple genes for each subunit in plant genomes indicates that the specificity of subunit interaction may be determined by preferential protein-protein interaction, by tissue- or cell-specific expression of each gene, or by a combination of both (Yazawa and Kamada, 2007). The large number of possible combinations has hindered the analysis of plant NF-Y complexes and suggests that they might act in a more intricate system than in vertebrates and yeast, which have only one gene encoding each HAP subunit (Yazawa and Kamada, 2007). Additionally, the multiple copies of each NF-Y subunit raise the question of whether a given NF-Y subunit can interact with any member of the other two subunit families or only with specific members (Thirumurugan et al., 2008). Although the presence of many genes encoding NF-Y subunits suggests a high degree of genetic redundancy in plants, mutations in single NF-Y genes in Arabidopsis have been associated with defects in development and enhanced stress sensitivity, suggesting a specialized function for each member (Lotan et al., 1998; Kwong et al., 2003; Lee et al., 2003; Zanetti et al., 2010). This could indicate that duplicated genes have undergone neofunctionalization (Figure 2b). Some proteins may require several key substitutions before acquiring a new function, while others may be more mutationally labile. One example is the terpene synthase gene family in Norway spruce (Picea abies). These genes appear to have undergone repeated rounds of neofunctionalization (Figure 2b) (Keeling et al., 2008), and a small number of key amino acid substitutions among paralogs was sufficient to alter substrate specificity and terpenoid product profiles (Flagel and Wendel, 2009).
Another example of neofunctionalization (Figure 2b) in plants is observed in Arabidopsis, where a specific amino acid residue identified in LEC1 and LEC1-LIKE (L1L) is responsible for differentiating their functions (seed development) from those of other NF-YB members (Kwong et al., 2003; Lee et al., 2003; Yamamoto et al., 2009). In addition, the analysis of amino acid substitution rates in plants has pointed to asymmetric evolution of certain duplicates of the NF-YB and NF-YC subunits, which appears to be coupled with asymmetric divergence in gene function (Yang et al., 2005; Yamamoto et al., 2009). With respect to expression patterns, the Arabidopsis NF-Y gene family includes some members that are ubiquitously expressed and others that are tissue specific or induced only after the switch to reproductive growth, in flowers and siliques (Gusmaroli et al., 2001; Yazawa and Kamada, 2007). The differences observed in the expression patterns of these genes could represent an example of cis-regulatory divergence (Figure 3c), where the cis-element of a gene evolves independently from those of the other members of the gene family and becomes regulated by different stimuli and/or trans-activators. Because genes that harbor NF-Y binding sites include genes that are constitutive, inducible and cell-cycle dependent, the regulation of their expression cannot be due exclusively to NF-Y binding to DNA. In this scenario, interaction with other transcription factors, either functional or physical, will contribute to NF-Y action (Matuoka and Chen, 2002). In addition, the independent evolution of protein-binding domains in duplicated genes can contribute to protein network divergence (Figure 3b2), increasing the number of possible interacting partners of NF-Y subunits.
When compared with other forms of mutation, a notable feature of duplication is that it creates genetic redundancy. This redundancy fosters evolutionary innovation, creating the opportunity for duplicates to explore new evolutionary terrain (Flagel and Wendel, 2009).
The most important contribution of gene duplication to evolution is to supply new genetic material for mutation, drift and selection to act upon. This leads to the creation of new genes and new gene functions (Hurley et al., 2005; Woollard, 2005; Schmidt and Davies, 2007), two important factors in the origin of genomic and organismal complexity (Gu et al., 2002; Taylor and Raes, 2004; Sterck et al., 2007). The plasticity of a genome or species in adapting to environmental changes would be severely limited without gene duplication, because no more than two variants (alleles) exist at any locus within a diploid individual. A good example is the dozens of duplicated immunoglobulin genes that constitute the vertebrate adaptive immune system. It seems difficult to imagine how this system could have acquired this high complexity level without gene duplication (Zhang, 2003). Plant gene families are largely conserved even over evolutionary time scales that encompass the diversification of all angiosperms and nonflowering plants (Rensing et al., 2008). This property of plant genomes indicates that plants have not created new gene families, but have been endowed with a basic genetic toolkit of ancient origin. Despite the evolutionary conservation of gene families, lineage-specific fluctuations in gene family size are frequently observed among taxa (Velasco et al., 2007; Ming et al., 2008; Rensing et al., 2008), which suggests that the diversity and lineage-specific phenotypic variation observed in land plants may not be explained by an equally diverse set of entirely novel genes. Indeed, much of plant diversity may have arisen from the duplication and adaptive specialization processes of pre-existing genes (neofunctionalization and subfunctionalization, Figure 2b and c, respectively). This perspective assigns gene duplication a central role in plant diversification, being a key process that generates the raw material necessary for adaptive evolution (Flagel and Wendel, 2009).

Conclusions
Whereas various classes of structural and metabolic genes preferentially return to a single-copy state following whole-genome duplication (Paterson et al., 2010), transcription factors tend to be preferentially retained among the duplicated genes in A. thaliana (Flagel and Wendel, 2009). Based on the number of genes identified for each NF-Y subunit, our findings support the hypothesis that this preference may extend to plant species in general. Further studies encompassing functional assays are required to ascertain the role of these genes in plant metabolism. The number of interacting partners of a particular gene in a molecular network (connectivity) also influences the probability that duplicated genes are retained (Flagel and Wendel, 2009). In this scenario, the high number of genes coding for the three subunits of the NF-Y transcription factor in higher plants leads to numerous possible interactions among the different genes of each subunit and between these genes and other transcription factors, which could contribute to the retention of genes of the NF-Y transcription factor family in plants.