Plastome numbers and characteristics (average size, number of proteins, and structural RNAs) among the Archaeplastida. The minimum and maximum genome sizes are indicated in italic.
Photosynthetic eukaryotic cells arose more than a billion years ago through the engulfment of a cyanobacterium that was then converted into a chloroplast, enabling plants to perform photosynthesis. Since this event, chloroplast DNA has been massively transferred to the nucleus, sometimes leading to the creation of novel genes, exons, and regulatory elements. In addition to these evolutionary novelties, most cyanobacterial genes have been relocated to the nucleus, greatly reducing the size, gene content, and autonomy of the chloroplast genome. In this chapter, we first present our current knowledge of the origin and evolution of the plant plastome in the different Archaeplastida lineages (Glaucophyta, Rhodophyta, and Viridiplantae), focusing on its gene content, genome size, and structural evolution. Second, we present the factors influencing the rate of DNA transfer from the chloroplast to the nucleus, the evolutionary fates of the nuclear integrants of plastid DNA (nupts) in their new eukaryotic environment, and the drivers of chloroplast gene functional relocation to the nucleus. Finally, we discuss how cytonuclear interactions led to the intertwined coevolution of nuclear and chloroplast genomes and the impact of hybridization and allopolyploidy on cytonuclear interactions.
- plastome evolution
- functional gene transfer
- nuclear integrant of plastid DNA (nupt)
- nucleo-cytoplasmic interactions
Photosynthetic eukaryotic organisms harbor a chloroplast genome (also called ‘plastome’) within their cells. This genome derives from the endosymbiosis of a prokaryotic organism, which was then gradually converted into the chloroplast. With the increasing number of sequences in publicly available databases and the emergence of sophisticated phylogenetic and phylogenomic analyses, the origin of this primary endosymbiotic event can be inferred much more precisely. In addition, these comparative analyses allow investigation of plastome evolutionary dynamics in the different plant lineages and of the extent of nuclear influence over the chloroplast genome. Overall, plant plastomes harbor a very low gene content compared to their prokaryotic ancestor, which appears to result either from gene loss due to redundant functions in both chloroplast and nuclear genomes or from functional transfer and relocation of chloroplast genes into the nucleus. The relocation of thousands of genes from the chloroplast to the nucleus was made possible by the massive transfer of DNA from the chloroplast to the nucleus. However, chloroplast genes that have been integrated into the nucleus are not immediately functional and have to adapt to their new eukaryotic environment by acquiring various regulatory elements (i.e., promoter, polyadenylation signal, and target peptide). Although most of these functional transfers occurred soon after the endosymbiotic event, real-time experiments using a selectable marker have shown how readily, and by which molecular mechanisms, DNA is transferred from the chloroplast to the nucleus. Such experiments have also permitted the study of the subsequent evolution of chloroplast DNA in the nuclear genome, and of how a chloroplast gene becomes functional in the nucleus.
2. Chloroplast origin and evolution
Photosynthetic eukaryotic cells arose through the engulfment of a cyanobacterium that was then converted into the chloroplast, enabling plants to use sunlight to fix carbon. This major functional innovation allowed eukaryotes to transition from heterotrophy to autotrophy. This primary endosymbiotic event is at the origin of the astonishing biodiversity visible today in plants, including the Glaucophyta, Rhodophyta, and Viridiplantae lineages (Figure 1). With the advent of next-generation sequencing technologies, the number of fully sequenced plastomes has expanded hugely, providing insight into chloroplast evolution in the different plant lineages. In this part, we present our current knowledge of chloroplast origin and what has been learned about chloroplast genome evolution with regard to genome size, gene content, structure, and mutation rate.
2.1. Primary endosymbiosis event and origin of chloroplasts
The first hypothesis of the endosymbiotic origin of chloroplasts is commonly credited to the Russian botanist K. Mereschkowsky, who observed similarities between cyanobacteria and the chloroplasts of plants and algae. This hypothesis was then reaffirmed by Margulis in the 1970s. The timing of this primary endosymbiosis event is still debated. While fossil-based phylogeny estimated the origin of chloroplasts to be around 1.4–1.7 billion years ago, gene-based approaches dated it around 0.9 billion years ago. Different phylogenetic analyses aimed at determining the cyanobacterial lineage from which the chloroplast was derived and revealed that chloroplasts were closely related to the nitrogen-fixing cyanobacteria Chroococcales,
It is now widely accepted that this primary endosymbiotic event has a single origin [6, 7, 8]; however, it is still unclear how long the conversion of the bacterial endosymbiont into a fully integrated organelle took. This transition from endosymbiont to organelle certainly involved many steps. The first steps corresponded to the loss of the bacterial wall and the early acquisition by the endosymbiont of a transport system to transfer proteins and metabolites from the cytosol to the chloroplast. This transport system consists of two protein complexes: the translocon of the outer chloroplast membrane (TOC) and the translocon of the inner chloroplast membrane (TIC) [9, 10, 11]. The TIC/TOC complexes transport pre-proteins (proteins with a cleavable chloroplast target peptide) from the cytosol, where they are synthesized, to the chloroplast, where the target peptide is cleaved (reviewed in ). The presence of the same protein import apparatus in the different Archaeplastida lineages is the best evidence of the single origin of chloroplasts. Finally, the transition also required the gradual functional transfer of endosymbiont genes to the nucleus, leading to the massive reduction of plastome size and gene content.
2.2. Evolution of chloroplast genomes
2.2.1. An unequal sequencing effort
Most of our current knowledge of the conversion from endosymbiont to organelle has been obtained by comparing contemporary Archaeplastida organelles with their closest bacterial relatives. During the last few years, advances in high-throughput sequencing and bioinformatic methods have greatly facilitated the assembly, analysis, and publication of complete plastomes. To date, more than 2300 plastomes have been fully assembled and deposited in the GenBank database, a number that has doubled in the last 2 years. However, the number of sequenced plastomes varies greatly between the different Archaeplastida lineages: almost 80% of them belong to Angiosperms. The sequencing effort is therefore highly unequal, and plastome sequencing in plant lineages outside of the Angiosperms needs to be improved to fully understand chloroplast genome evolution in plants. Some efforts to fill this gap have been made in the last 2–5 years, but they are still insufficient. In the Glaucophyta, only one chloroplast genome is available (NC_001675), and another is sequenced but not yet published (Lang et al., unpublished). In contrast, the sequencing of Rhodophyta and Chlorophyta (green algae
2.2.2. Gene content evolution
As mentioned previously, the conversion of the cyanobacterial endosymbiont into a chloroplast required the functional transfer of most cyanobacterial genes to the nucleus, or their replacement there. Compared to the thousands of genes (at least 2000) thought to have been once present in the cyanobacterial genome, Archaeplastida plastomes encode a maximum of around 250 genes [13, 14]. This observation indicates that most genes (including protein-coding and structural RNA genes) present in the cyanobacterial ancestor were functionally transferred relatively soon after the endosymbiotic event. Although gene content is relatively well conserved among modern chloroplast genomes, there are important variations. Thus, Rhodophyta have the highest number of genes (237 on average; minimum 207; up to 266 in
These variations in gene content reveal the divergent evolution of plastomes in the different lineages. As an example, Rhodophyta gene content is characterized by the complete absence of the NADPH dehydrogenase complex. Conversely, some genes are Rhodophyta-specific or rare in other Archaeplastida, such as RNase P RNA, tmRNA, or signal recognition particle RNA [16, 17, 18]. Rhodophyta chloroplasts generally have a large genome size (see later) characterized by a high number of genes and other features such as the presence of bacteria-like operons, suggesting that Rhodophyta plastomes are closer to the ancestral cyanobacterial genome than those of any other algae. Gene content variations are also well documented in the Angiosperm family in which multiple independent gene losses have been found such as
2.2.3. Size variation
Among plants, chloroplast genomes range from less than 100 kb to more than 1 Mb, again excluding the non-chlorophyll species that exhibit significantly smaller chloroplast genomes (Table 1). The largest chloroplast genome ever sequenced has very recently been found in the red algae
Several factors can explain the important size variations found among the Archaeplastida. In the case of the red algae
2.2.4. Structural evolution
Among plants, most plastomes seem to exhibit a conserved quadripartite structure, with a large and a small single-copy region separated by two inverted repeats (IRs) (Palmer 1983). However, multiple rearrangements have occurred in diverse lineages and modified this conserved structure. One of the most striking examples is the loss of one IR, which occurred multiple times in the different chloroplast-bearing lineages, such as in the Fabaceae and the Geraniaceae [30, 33, 34]. This has also been reported for different Gymnosperm species such as
Chloroplast genome structure and gene order are also highly affected by inversions. Many inversions have been described in the literature, especially in legumes, with, for instance, fragments of 50 kb in the Papilionoideae, 36 kb in the Genistoids, 29 kb in Sophoreae, or 7 kb in
2.2.5. Evolution rates of plastomes
Chloroplast genomes are known to be highly conserved, with relatively low mutation rates, especially when compared to the plant nuclear genome. Indeed, the chloroplast genome evolves on average 10 times slower than the nuclear genome, with about 1 mutation or fewer per kb per million years compared with approximately 7 mutations per kb per million years for the nuclear genome. However, there are some exceptions, especially in three Angiosperm families (i.e., Fabaceae, Campanulaceae, and Geraniaceae) that are known to have accelerated evolutionary rates of their plastomes along with multiple structural rearrangements and size variations [19, 28, 30, 42, 44, 48, 49]. For example, the
To sum up this first section on the origin and evolution of plant plastomes originating from the primary endosymbiosis event, recent progress in sequencing and bioinformatics has significantly increased the number of chloroplast genomes available to the scientific community. These advances have greatly improved our knowledge of the evolutionary dynamics of plastomes. Despite the diversity of organisms that harbor chloroplasts, plastomes in general seem to be relatively well conserved among the Archaeplastida (in terms of structure, size, and gene content); however, multiple independent alterations of these features have been observed in the different lineages. In addition, a few plant families (or groups of species) seem to present an atypical evolution of the chloroplast genome. The continuing effort to sequence many more plastomes (especially in the Glaucophyta and Rhodophyta) should allow the identification of new examples of such atypical evolution and provide a better understanding of the causes and molecular mechanisms that limit or accelerate plastome evolution.
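As a rough illustration of the rate difference quoted in Section 2.2.5, the sketch below converts the two approximate rates (about 1 mutation/kb/million years for the plastome and about 7 for the nuclear genome) into expected numbers of mutations. The function name, the 150 kb sequence length, and the 10-million-year interval are illustrative assumptions, and the linear model ignores multiple hits at the same site and rate variation among lineages.

```python
# Approximate rates quoted in the text (mutations per kb per million years).
PLASTOME_RATE = 1.0   # upper value quoted for the chloroplast genome
NUCLEAR_RATE = 7.0    # approximate value quoted for the nuclear genome


def expected_mutations(rate_per_kb_per_my, length_kb, time_my):
    """Expected mutation count under a simple linear model (no saturation)."""
    return rate_per_kb_per_my * length_kb * time_my


# Hypothetical example: a 150 kb sequence evolving for 10 million years.
length_kb, time_my = 150, 10
cp = expected_mutations(PLASTOME_RATE, length_kb, time_my)
nu = expected_mutations(NUCLEAR_RATE, length_kb, time_my)
print(f"plastome: {cp:.0f}, nuclear: {nu:.0f}, ratio: {nu / cp:.1f}")
```

With these two point values the ratio is 7, which is of the same order as the roughly tenfold average difference cited in the text.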
3. Impact of the cyanobacterial endosymbiosis on plant nuclear genome evolution and origin of chloroplast proteins
Since the endosymbiotic event, the host (nuclear) genome has acquired most of the cyanobacterial genes, leading to the gradual loss of autonomy of the endosymbiont and the reduction of its genome. In this part, we present our current knowledge of the mechanisms and the numerous cases of chloroplast DNA transfer to the nucleus, and of where this DNA is now integrated in the nuclear genome. We then detail the subsequent evolution and adaptation of chloroplast DNA in its new eukaryotic environment. We also discuss which factors can influence the relocation of a chloroplast gene to the nucleus, and how a chloroplast gene transferred to the nucleus may become functional. Finally, we discuss the important role that the transfer of chloroplast DNA to the nucleus plays in diversifying the plant nuclear gene content.
3.1. DNA transfer from the chloroplast to the nucleus
Much earlier than the complete sequencing and assembly of the first chloroplast genome (
3.2. Short-term and long-term evolution of chloroplast DNA transferred to the nucleus
Some of the chloroplast DNA fragments that were experimentally shown to insert in the nuclear genome were characterized [55, 60] and were often large in size (usually greater than 10 kb in length). Considering the massive transfer of chloroplast DNA to the nucleus, one would expect that some of these
3.3. Functional replacement of hundreds to thousands of chloroplast proteins in the nucleus
Following endosymbiosis, the symbiont to organelle transition involved many steps. These include the loss of the bacterial cell wall, the acquisition of a protein machinery that transfers nuclear-encoded proteins from the cytosol to the chloroplast (the TIC and TOC complexes [75, 76]), and finally, the functional relocation of most chloroplast genes to the nucleus. As detailed below, a chloroplast gene may be replaced either after its functional transfer to the nucleus or directly by a gene of mitochondrial or eukaryotic origin.
Since the endosymbiosis event, thousands of genes have relocated to the nuclear genome. Indeed, cyanobacterial genomes encode a minimum of 2000 proteins, whereas current plant plastomes encode only 80–200 proteins, although 800 to more than 2000 proteins have been found in some algae and plant chloroplasts, respectively. Apart from some genes that had redundant functions in both chloroplast and nuclear genomes, most chloroplast genes have been functionally relocated to the nucleus with their proteins targeted back to the organelle. Thus, the spectrum of proteins required for the function and biogenesis of the organelle has not changed greatly since its origin.
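The scale of this reduction can be made concrete with a short calculation. The sketch below uses only the figures quoted above (a minimum of 2000 proteins encoded by cyanobacterial genomes and 80–200 encoded by current plant plastomes); the function name is a hypothetical helper, and because 2000 is a lower bound for the ancestor, the retained fractions are upper estimates.

```python
ANCESTRAL_GENES = 2000          # minimum quoted for cyanobacterial genomes
PLASTOME_PROTEINS = (80, 200)   # range quoted for current plant plastomes


def retained_fraction(retained, ancestral=ANCESTRAL_GENES):
    """Fraction of the assumed ancestral gene set still plastome-encoded."""
    return retained / ancestral


for n in PLASTOME_PROTEINS:
    frac = retained_fraction(n)
    # The remainder was either lost outright or relocated to the nucleus.
    print(f"{n} proteins: {frac:.1%} retained, {1 - frac:.1%} lost or relocated")
```

Under these assumptions, at most 4 to 10% of the ancestral protein-coding set remains in the plastome.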
3.3.1. Functional transfer and relocation of a chloroplast gene to the nucleus
The current plastome of most plants encodes a maximum of 200 proteins, whereas more than 2000 proteins are found in the chloroplast, suggesting the functional transfer and relocation of most chloroplast genes to the nucleus. As chloroplast genes are of prokaryotic origin, they are not readily functional in the nuclear genome. To function in this novel environment, a chloroplast gene has to acquire or hijack nuclear gene regulatory elements (eukaryotic promoter and terminator), as well as a transit peptide to target the protein back to the chloroplast [60, 79]. However, the acquisition of all these nuclear elements does not have to take place right after the transfer of the chloroplast gene to the nucleus, as transferred genes can retain their open reading frames for several million years. In addition, some chloroplast genes can become functional relatively easily as a few chloroplast promoters (i.e.,
To date, the number of chloroplast-encoded proteins (about 80) is relatively well conserved among flowering plants. However, a few chloroplast genes have been independently lost in various plant lineages, making it possible to study how they became functional in the nucleus. Such chloroplast gene losses were most particularly observed in the Fabaceae, for which the plastome has been extensively reorganized and contains localized accelerated mutation rates. Some of these genes, such as
3.3.2. Functional replacement of a chloroplast gene by a gene of mitochondrial (prokaryotic) or eukaryote origin
The functional replacement of a chloroplast gene does not necessarily necessitate its functional transfer from the chloroplast to the nucleus. In the case of the chloroplast RPS16 protein, the chloroplast
Another evolutionary mechanism enabling the functional replacement of a chloroplast gene may occur
The continuous deluge of organellar DNA to the nucleus has facilitated the functional transfer of almost all chloroplast genes to the nucleus, extensively reducing plastome size. Additionally, this organellar DNA was not only used to replace organellar genes but also helped diversify the plant nuclear gene content.
3.4. Importance of chloroplast DNA transferred to the nucleus in diversifying the plant nuclear gene content
Chloroplast gene sequences transferred to the nucleus may present different fates. As presented in the two previous sections: (i) they may remain non-functional, decay, and ultimately be lost; (ii) they may acquire all the necessary elements to conserve the same function and have the protein targeted back to the chloroplast; or (iii) they may acquire new subcellular locations and functions. As mentioned earlier, Martin
4. Cytonuclear interactions, coadaptation processes, and incompatibilities
The conversion of the cyanobacterial endosymbiont into the chloroplast partly results from the gradual transfer of hundreds to thousands of endosymbiont genes to the nuclear host. Across all lineages, more than 90% of the plant chloroplast proteins are now encoded in the nucleus. Of the few chloroplast-encoded proteins, about 40% are involved in chloroplast protein complexes that are made up of proteins encoded in both the chloroplast and the nucleus. These complexes, such as photosystems I and II, carry out functions that are vital for the plant. This raises the question of how the stoichiometry between the two compartments is maintained. Indeed, one cell might contain hundreds to thousands of chloroplast genome copies compared with only one copy in the nucleus. Furthermore, chloroplast inheritance is often maternal, whereas the nuclear genome is inherited biparentally during sexual reproduction in angiosperms. Therefore, coevolving interactions between cytoplasmic and nuclear genomes have been necessary and have resulted in significant coadaptation processes. When these fine-tuned coevolutionary interactions are disrupted, for instance after intra- or interspecific hybridization and/or genome doubling, incompatibilities and deleterious phenotypes can be observed. These evolutionary processes will be discussed in the light of previous work on synthetic and natural hybrids, as well as on polyploid species.
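The copy-number imbalance described above can be sketched numerically. The example below is a minimal illustration, assuming a hypothetical cell with 1000 chloroplast genome copies and comparing it with one, two, or four copies of a nuclear gene (for instance before and after genome doubling); the function name and all numerical values are assumptions made for illustration, not measurements.

```python
def copy_ratio(plastome_copies, nuclear_copies):
    """Chloroplast gene copies per nuclear gene copy within one cell."""
    if nuclear_copies <= 0:
        raise ValueError("nuclear_copies must be positive")
    return plastome_copies / nuclear_copies


# Hypothetical cell with 1000 chloroplast genome copies.
plastome_copies = 1000
for nuclear_copies in (1, 2, 4):
    ratio = copy_ratio(plastome_copies, nuclear_copies)
    print(f"{nuclear_copies} nuclear copies -> ratio {ratio:.0f}:1")
```

If the number of chloroplast genome copies stays constant, each doubling of the nuclear copy number halves the ratio, which is one way to picture the stoichiometric challenge that genome doubling poses for cytonuclear complexes.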
4.1. Hybridization and cytonuclear intergenomic complexes
Several evolutionary scenarios can explain coadaptation between chloroplast and nuclear genomes after intraspecific hybridization. First, cytoplasmic genomes lack sexual reproduction and are more prone to fixing and accumulating deleterious mutations through genetic drift. Only positive selection for compensatory nuclear alleles will allow optimal organelle function to be regained. This mechanism of
Second, some mutations in the organelles could also be adaptive in specific environments and fixed in the population by natural selection. Subsequently, coadaptation process may favor specific nuclear variants to preserve intergenomic interactions. This mechanism is called
As mentioned above, the examples for intergenomic coadaptation and incompatibilities are scarce, and we are still very far from unraveling the molecular processes underlying such interactions. Applications of genome-wide studies in association with high-throughput sequencing would greatly improve our understanding of cytonuclear coevolution.
4.2. Effects of whole genome doubling and interspecific hybridization on cytonuclear complex stability
As shown above, cytonuclear interactions are extremely fine-tuned coevolved molecular processes that are still largely understudied. However, in recent years, efforts have been made, especially in neo-polyploid plant species (natural and resynthesized) to better apprehend the consequences of whole genome duplication (WGD) and interspecific hybridization on cytonuclear interactions and stability. In this last section, we will review our knowledge on such systems and elaborate on the many future issues to address.
Although largely overlooked, a WGD event may have numerous and drastic consequences for copy number variation and stoichiometry in these cytonuclear complexes. The impacts of WGD on genomic structure and functional changes have been extensively studied in a large variety of plant systems. Genome redundancy can lead to changes in epigenetic patterns (including transposable element dynamics), altered gene expression (changes in global gene expression but also possible biased contribution of redundant copies), and fractionation processes (gene loss, homologous and non-homologous exchanges). However, to date, very few studies have investigated how the duplication of nuclear genes affects the assembly dynamics of the multi-subunit cytonuclear complexes. Different hypotheses predict the fate of nuclear and cytoplasmic genes implicated in cytonuclear complexes. They are based on the prediction that selection will favor compensatory mechanisms to maintain coordinated expression between cytoplasmic and nuclear genes leading
Only a handful of studies have looked at the consequences of WGD on a longer time scale. In that case, subfunctionalization and pseudogenization of duplicated copies are to be expected. Coate et al. stated that the sensitivity of cytonuclear complexes to gene dosage imbalance might have a considerable influence on whether duplicated genes return to single-copy status or are retained in duplicate. More specifically, Coate et al. demonstrated that in
All of these processes could be enhanced through allopolyploidy, where divergent parental species first hybridized before genome doubling. In that case, the nuclear genome is redundant and a mixture of two more or less divergent parental genomes, whereas the organelles (usually) have a uniparental origin. Therefore, as chloroplast inheritance is usually maternal, selection should favor maintenance of maternal nuclear copies over the paternally inherited homoeolog so as to preserve pre-existing coadaptive cytonuclear interactions. In allopolyploids, different scenarios leading to pseudogenization of paternal copies can be envisioned and have been tested in a limited set of genes and species. The first scenario involves downregulation and relaxed selection of the paternally inherited homoeolog. An alternative scenario involves preferential gene conversion to the maternal homoeolog, resulting in the loss of the paternal-like copy. It is important to note that the two scenarios are not mutually exclusive but could be part of a dynamic and gradual process, with overexpression of the maternal copies first leading to paternal homoeolog pseudogenization and maternally biased gene conversion. These hypotheses have only been tested in the Rubisco nuclear-encoded gene
These few studies already highlight the complexity of the different model systems, which can be strongly influenced by various evolutionary processes such as pre-existing coadaptive mechanisms, natural selection, and divergence between parental individuals (from different populations to different species). As all Angiosperms have experienced at least one round of genome duplication and most of them multiple WGDs (e.g., Triticum and Brassica), paleopolyploid species are perfect candidates to elucidate the long-term impact of diploidization and biased genome fractionation on rates of asymmetric gene loss and pseudogenization. Additionally, it seems essential to include plant families that have contrasting rates of chloroplast evolution (such as Geraniaceae, Campanulaceae, and Fabaceae) and taxa with paternally inherited chloroplast genomes (such as Actinidia, Medicago, and most Conifers). Finally, life history features such as reproductive strategy (perennial vs. annual), mating system (selfer vs. outcrosser), population-level dynamics, and effective population size will also affect the fixation rate of mutations.
We acknowledge funding from the European Union Seventh Framework Program (FP7-CIG-2013-2017; Grant No. 333709 to Mathieu Rousseau-Gueutin) and an Agreenskills Plus fellowship to Julie Ferreira de Carvalho. We would also like to thank Dr. Christina Richards (Department of Integrative Biology, University of South Florida) for her careful and critical reading of the manuscript.
Conflict of interest
No conflict of interest.