Elastin is the extracellular matrix protein that provides large arteries, lung parenchyma and skin with extensibility and elastic recoil. Within these tissues, elastin is found as a polymer of assembled and cross-linked tropoelastin monomers. In addition to specific regions that support the covalent cross-links, tropoelastin is characterized by highly repetitive sequences rich in proline and glycine, which make up the so-called hydrophobic domains. These segments promote structural flexibility and protein disorder, a fundamental aspect of its elastomeric behavior. Unlike other matrix proteins such as collagens or laminins, elastin emerged relatively late in evolution, appearing at the divergence of jawed and jawless fishes; it is therefore present in all species from sharks to humans but absent in lampreys, other lower chordates and invertebrates. Despite intense investigation of the key aspects of elastin evolution, its origin remains elusive, and no ancestral protein that could have given rise to a primordial elastin is known. In this chapter, I review the main molecular features of tropoelastin and the available knowledge on its evolutionary history, and propose hypotheses for its origin. Considering the remarkable similarities between the hydrophobic domains of the first recognizable elastin gene, from the cartilaginous fish Callorhinchus milii, and certain fibrillin regions from related fish species, I raise the possibility that fibrillins provided protein domains to an ancestral elastin that thereafter underwent significant evolutionary changes to give the elastin forms found today.
- extracellular matrix
Elastin is an extracellular matrix (ECM) component of tissues such as the large arteries and lung parenchyma, among others. Although it is considerably less abundant than other matrix proteins, such as collagens, it has a major impact on tissue biomechanics, as it is ultimately responsible for extensibility and elastic recoil. Several aspects make elastin an unusual protein, for example its molecular structure, which is fundamental to understanding its function, or the complexity of the mechanisms giving rise to elastin-based polymers in the ECM. In the genomics era, in which countless genomes have been decoded and extraordinarily valuable information on phylogenetic relationships has been established, it is remarkable that the evolutionary origin of elastin remains unclear. This looks anomalous considering that its roots are relatively close to us on an evolutionary scale, compared with other cell components that originated at the dawn of life. The main objective of this work is to review the main molecular features of elastin and the current knowledge on its evolutionary history, as well as to explore new avenues to explain its origin. Two hypotheses are put on the table to stimulate (and provoke) discussion and further research. In this regard, I must acknowledge the contributions of Professor Fred W. Keeley to the overall understanding of the biology of elastin, and particularly of its evolutionary relationships.
2. Molecular features of tropoelastin
Most ECM components, such as collagens or fibronectin, are large polypeptides with numerous domains that include repetitive motifs allowing multimerization into supramolecular assemblies. Tropoelastin, the monomer making up polymeric elastin, shares some of these characteristics but is nonetheless an unusual ECM protein. While fibril-forming collagens such as types I and II contain almost 1500 residues and fibronectin goes far beyond 2000, tropoelastin rarely exceeds 800 amino acids, with some species such as the dog harboring elastin chains of about 500 residues. Moreover, crystal structures of particular domains of collagens or fibronectin have been determined by X-ray diffraction, whereas those of elastin have persistently remained elusive [4, 5]. This is a consequence of the disordered nature of the elastin polypeptide, a key feature for understanding its elastomeric properties [6, 7]. Analysis of all known elastins shows an alternation between lysine-rich and hydrophobic domains. The former contain the lysine residues destined to take part in the covalent cross-linking catalyzed by members of the lysyl oxidase (LOX) family, and are therefore named cross-linking domains. The latter are consistently rich in hydrophobic residues and have been shown to be essential for the elastomeric behavior. With noticeable variations, these domains contain stretches of the sequence VPGVG in multiple combinations, either forming repeats or included within glycine/proline-rich segments. Molecular dynamics simulations and experimental studies have shown that the hydrophobic motifs support the disordered state by promoting structural flexibility [6, 8, 9]. This plasticity is likely not limited to the hydrophobic domains but extends to the whole molecule, considering that lysine oxidation and further condensation of cross-linking domains occur in a random manner, as recently reported.
The quasi-stochastic assembly of highly flexible tropoelastin monomers results in aggregates that exhibit the elastomeric properties and are the basis for the formation of elastic fibers in numerous tissues, including the lung, skin and blood vessels.
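The sequence features described in this section lend themselves to a simple computational check. The following Python sketch is only an illustration and not a method used in this chapter: it counts occurrences of the VPGVG motif and computes the fraction of selected residues in a sequence. The function names and the two toy sequences are hypothetical and were invented for the example; they are not taken from any tropoelastin entry.

```python
from collections import Counter


def count_motif(seq: str, motif: str = "VPGVG") -> int:
    """Count occurrences of `motif` in `seq`, allowing overlapping matches."""
    seq = seq.upper()
    last_start = len(seq) - len(motif)
    return sum(1 for i in range(last_start + 1) if seq.startswith(motif, i))


def residue_fraction(seq: str, residues: str = "GP") -> float:
    """Fraction of positions in `seq` occupied by any residue in `residues`."""
    seq = seq.upper()
    if not seq:
        return 0.0
    counts = Counter(seq)
    return sum(counts[r] for r in residues) / len(seq)


# Toy sequences (assumptions for illustration, NOT real tropoelastin domains).
toy_hydrophobic = "GVGVPGVGVPGVGVPGVG"   # mimics a VPGVG-containing segment
toy_crosslinking = "AAAKAAAKAAKA"        # mimics a lysine/alanine-rich segment

print(count_motif(toy_hydrophobic))                 # number of VPGVG motifs
print(round(residue_fraction(toy_hydrophobic), 2))  # Gly + Pro fraction
print(residue_fraction(toy_crosslinking, "K"))      # Lys fraction
```

Applied to real sequences retrieved from a database, the same two measures would give a rough first indication of whether a segment looks more like a hydrophobic or a cross-linking domain, although they cannot replace a proper domain annotation.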
3. Evolutionary history of elastin
It might be inferred that such a sophisticated natural material required thousands of millions of years to be shaped by evolution. However, unlike collagens or laminins, whose origin dates back 770–880 million years (Myr) to the emergence of the Metazoa, tropoelastin appeared late on the stage [2, 11]. In fact, it did not even emerge with the blood vascular system, a feature of vertebrates and many invertebrates, but made its debut 400 Myr ago with the jawed vertebrates, and is therefore absent in jawless fishes such as the lamprey and hagfish. Elastin has been repeatedly invoked as the vascular component that allowed the development of closed circulatory systems. However, these are present in a wide variety of invertebrates, including annelids, cephalopods and non-vertebrate chordates. Here, the magic word is blood pressure. It is the presence of elastin that led to closed systems with high pressures (from 30 to 200 mmHg) in jawed vertebrates, in contrast to the non-elastin-based systems of invertebrates and lower vertebrates, with blood pressure values ranging from one or a few mmHg up to 20–30 mmHg. The most ancient elastin reported so far is that of the cartilaginous fish Callorhinchus milii (elephant shark), which displays the characteristic alternating pattern of hydrophobic and cross-linking domains (Figure 1). Amphibians such as the western clawed frog (Xenopus tropicalis) and teleost fishes such as zebrafish (Danio rerio) and fugu (Takifugu rubripes) feature two versions of tropoelastin, elastin a (elna) and elastin b (elnb), whereas other vertebrate genomes possess a single gene. The case of zebrafish is particularly interesting, as specialization of elnb contributed to the smooth muscle-like characteristics of the bulbus arteriosus, a chamber of the zebrafish heart that is homologous to the aortic trunk of higher vertebrates.
The search for and identification of elastin sequences in genome databases from different organisms is crucial to delineate its evolutionary history, and a significant number of sequences are known today [2, 14, 15]. Nevertheless, an accurate phylogenetic reconstruction of elastin evolution is still quite incomplete. Attention has focused on different parts of the gene, including a central conserved region, the C-terminus, the 3′-untranslated region and a region presumably resulting from exon replication. However, reaching a unified picture has been difficult. Being an intrinsically disordered protein (IDP) does not make things easier, as IDPs lack strict structural constraints and are therefore more permissive to substitutions. With the only restriction that conformational flexibility not be altered, IDPs evolve faster than well-folded proteins, which adds complexity to their phylogenetic analysis. In this respect, the soluble monomer of lamprin, the major non-collagen/non-elastin connective tissue component of the lamprey annular cartilage, contains tandem repeats of the sequence GGLGY that are recognized by anti-elastin antibodies targeting the VPG repeats of elastin, in a remarkable example of evolutionary convergence. In fact, more distant polypeptides such as some insect proteins or spider silks have also acquired these repeats.
Despite the difficulties, phylogenetic trees such as that shown in Figure 2 based on the central conserved region have been generated.
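Tandem repeats such as the GGLGY units of lamprin mentioned in this section can be located with a short pattern search. The Python sketch below is a minimal illustration under stated assumptions: the helper name `longest_tandem_run` and the toy sequence are hypothetical, and the sequence is not the real lamprin sequence. It reports the largest number of back-to-back copies of a repeat unit.

```python
import re


def longest_tandem_run(seq: str, unit: str) -> int:
    """Return the largest number of consecutive copies of `unit` in `seq`."""
    if not unit:
        raise ValueError("unit must be a non-empty string")
    # One or more back-to-back copies of the unit, matched greedily.
    pattern = re.compile(f"(?:{re.escape(unit.upper())})+")
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(seq.upper())]
    return max(runs, default=0)


# Toy sequence (an assumption for illustration, NOT the real lamprin sequence):
# four tandem GGLGY units, a short spacer, then two more units.
toy_repeat_protein = "MKT" + "GGLGY" * 4 + "AAS" + "GGLGY" * 2

print(longest_tandem_run(toy_repeat_protein, "GGLGY"))
print(longest_tandem_run(toy_repeat_protein, "VPG"))
```

A search of this kind only detects the presence of repeats. As the lamprin example shows, similar repeats can arise by convergence, so a positive hit says nothing by itself about common ancestry.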
4. Hypotheses on the evolutionary origin of elastin
As mentioned above, the genomic roots of tropoelastin trace back to the elephant shark and related species. Recently published, openly accessible whole-genome sequences and assemblies of lower vertebrates and chordates have not revealed any trace of tropoelastin-related sequences, and the same is true for invertebrate genomes. These findings (or the lack of them) raise questions as to the origin of tropoelastin and the existence of an ancestral protein. Here, two main hypotheses are proposed to explain its emergence and further evolution: (1) tropoelastin appeared de novo; and (2) like other ECM components, it emerged from the assembly of preexisting proteins that eventually gained novel capabilities. The evidence and arguments for and against these hypotheses are discussed below.
4.1 Tropoelastin as a de novo protein
Following Darwin’s postulates, the general assumption is that new genes evolve from existing ones in an endless, slow-paced journey since the beginning of life. However, recent studies show that this has not always been the case and that new genes can arise from the dark depths of the non-coding genome. By gaining the capability of being transcribed and translated, stretches of “junk” DNA can give rise to de novo protein products. Interestingly, when studied in depth, de novo genes are found to produce initially dysfunctional or disordered proteins, in many cases with repetitive sequences. Therefore, it is not unreasonable to consider that an ancestral tropoelastin might have emerged as a de novo gene. The identification of de novo genes is mostly based on the comparison of syntenic regions. This type of analysis has revealed, for instance, that the gene FLJ33706, overexpressed in Alzheimer’s disease, appeared in humans after the divergence from chimpanzee. Another three human genes of unknown function have been described as originating from chimpanzee non-coding DNA. Unfortunately, synteny of genetic loci is often lost over long evolutionary timescales, so distant genomic events resulting in de novo products are difficult, if not impossible, to identify. Its emergence at the crossroads of jawed vertebrates places tropoelastin in this unfavorable scenario. Studies performed so far have not found evidence for a de novo origin.
4.2 Reorganization or assembly from pre-existing components
It is a recurring theme in the evolutionary history of ECM proteins that the gradual appearance of specific gene families and domains, often in pre-metazoan lineages, later allowed their assembly into matrix components genuine to animals. This has been the case for matrix proteins such as fibrillar or basement membrane collagens, and for matrix-remodeling enzymes like LOX (see Figure 2). The late emergence of tropoelastin does not fit this pattern. As mentioned above, not a single tropoelastin-related sequence has been found in the genomes of lineages that diverged before the elephant shark. Or has it? Before the onset of tropoelastin, microfibrils were largely responsible for tissue elasticity in many species. Extracellular matrix structures such as the mesoglea of cnidarian jellyfish or the blood vessels of invertebrates are functionally elastic due to microfibrils [25, 26]. These supramolecular structures, visualized as beaded filaments under electron microscopy, contain numerous proteins, with fibrillins as the major constituent. Like many other ECM components, fibrillins, of which three isoforms exist in humans (fibrillin-1, -2 and -3), are multidomain proteins that extend along a large polypeptide sequence of almost 3000 amino acids. The large size and the variety of domains explain the existence of multiple diseases caused by defects in fibrillins, named fibrillinopathies, including various forms of Marfan syndrome, isolated ectopia lentis, kyphoscoliosis, Shprintzen-Goldberg syndrome and stiff skin syndrome, among others. Epidermal growth factor-like domains (EGF and calcium-binding EGF) dominate the structure, with 46–47 repeats, followed by transforming growth factor (TGF)-β binding protein domains (TB) and hybrid domains, with 8 and 2 repeats, respectively (Figure 3). TB domains are shared with latent TGF-β binding proteins (LTBP) and have served to compute phylogenetic reconstructions for these proteins.
Using this approach, a TB domain-containing protein was identified in cnidarians, dating the emergence of an ancestral fibrillin to 600 Myr ago. This ancestral fibrillin, present not only in cnidarians but also in molluscs, annelids, arthropods, echinoderms, urochordates, cephalochordates and lower vertebrates such as the lamprey, underwent a duplication event at the divergence of jawed and jawless fishes, giving rise to fibrillin-1 and an ancestral fibrillin-2/3. Interestingly, just before this branching, the ancestral fibrillin gained (or reshaped) a domain characterized by a high content of proline and/or glycine, termed the “unique region”, which is claimed to provide flexibility and for which a specific function has not yet been demonstrated (see also Figure 3). In fact, a careful look at these domains in different species, including jawed and jawless fishes, reveals a clear evolution from a short sequence with just a few proline and glycine residues, as seen in the ascidian Ciona intestinalis or the lancelet Branchiostoma floridae, to a longer segment with an increased proline/glycine content in the lamprey, and on to extra-long fragments with a significant number of VPG-containing repeats, as in the Japanese pufferfish Takifugu rubripes. These sequences remarkably resemble the hydrophobic domains of tropoelastin, particularly those seen in the first jawed fishes such as the elephant shark (see Figure 1), and, intriguingly, their appearance is evolutionarily coincident with that of elastin and the high-pressure closed circulatory systems in these organisms. It is tempting to speculate that the VPG-containing repeats from fibrillin-1 contributed to the assembly of an ancestral elastin (Figure 4). Subsequent changes giving rise to KA or KP domains, or their incorporation from an unknown ancestor, as well as extensive domain duplication and expansion, might have ended up sculpting the elastin backbone as it is found in elastin-expressing species living today.
In fact, in these organisms, microfibrils provide the scaffolding platform where elastogenesis takes place, making the entire microfibril-elastic fiber unit the material responsible for the biomechanical properties of tissues such as the lung, blood vessels and skin. Within this context, it has been speculated that the unique region in fibrillin-1 evolved to support the interaction with elastin [29, 31]. Considering that intrinsically disordered regions can use their flexibility to allow the association between two (or more) IDPs, it is not unreasonable to think that the acquisition of this domain by an ancestral tropoelastin may have served to establish the mutual binding between fibrillin-1 and elastin. Constraints to keep the structural conformation rather than the primary sequence may have then blurred the phylogenetic relationships, making it difficult to trace back the origin of this genomic event. Curiously, the unique region of fibrillin-1 in higher vertebrates evolved dynamically, losing the VPG-containing repeats while still keeping a high content of proline residues, perhaps reflecting novel requirements of the fibrillin-1/elastin interaction during elastogenesis in these species.
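The kind of comparison described above for the unique region (a proline/glycine content that increases across species, together with the appearance of VPG-containing repeats) can be approximated with a sliding-window scan. The Python sketch below is a hedged illustration only, not the analysis behind Figures 3 and 4: the function name `richest_window`, the window size and the toy sequence are assumptions made for the example, and the toy sequence does not come from any of the fibrillin entries listed at the end of the chapter.

```python
def richest_window(seq: str, window: int = 30, residues: str = "PG"):
    """Return (start, fraction) of the window richest in `residues`.

    `start` is the 0-based position of the first window reaching the
    maximum; `fraction` is the share of its positions that are in `residues`.
    """
    seq = seq.upper()
    if window <= 0 or len(seq) < window:
        raise ValueError("window must be positive and no longer than the sequence")
    hits = [1 if aa in residues else 0 for aa in seq]
    current = sum(hits[:window])
    best_start, best = 0, current
    for start in range(1, len(seq) - window + 1):
        # Slide by one residue: add the entering position, drop the leaving one.
        current += hits[start + window - 1] - hits[start - 1]
        if current > best:
            best_start, best = start, current
    return best_start, best / window


# Toy sequence (an assumption for illustration, NOT a real fibrillin):
# a Pro/Gly-rich core with VPG repeats between two Pro/Gly-free flanks.
toy_flank = "ACDEFHIKLM" * 2
toy_core = "VPGVPGVPGVPG"
toy_fibrillin_like = toy_flank + toy_core + toy_flank

start, fraction = richest_window(toy_fibrillin_like, window=12)
segment = toy_fibrillin_like[start:start + 12]
print(start, round(fraction, 2), segment.count("VPG"))
```

Running the same scan on orthologous sequences from several species would give, for each one, the position and Pro/Gly content of its most enriched segment, plus a count of VPG triplets within it. Such numbers could support, but not prove, the trend described in the text, since composition alone does not establish homology.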
5. Concluding remarks
Whether elastin evolved as a de novo protein or derived from a pre-existing fibrillin-1 (or any other unknown) gene remains an enigma with the genomic information currently available. The intention of this chapter was to bring together the current knowledge about the evolutionary history of elastin and to discuss the hypotheses that may eventually explain its origin. While this is certainly in the realm of speculation, it is hoped that the sequencing and annotation of more genomes, as well as the advent of further molecular and genomic analyses, will provide more insight into the evolutionary roots of this fascinating protein.
Sequences used in this work are:
Elephant shark predicted elastin isoform X1 (Callorhinchus milii) [GenBank XP_007894595].
Human fibrillin-1 preproprotein (Homo sapiens) [GenBank NP_000129].
Zebrafish fibrillin-1 (Danio rerio) [GenBank XP_017207479].
Stickleback fibrillin-1 (Gasterosteus aculeatus) [UniProtKB G3PX14].
Fugu fibrillin-1 isoform X1 (Takifugu rubripes) [GenBank XP_003969883.1].
Pufferfish fibrillin-1 (Tetraodon nigroviridis) [UniProtKB H3C692].
Elephant shark predicted fibrillin-1-like, partial (Callorhinchus milii) [GenBank XP_007909428].
Sea lamprey putative fibrillin-1 (Petromyzon marinus) [UniProtKB S4RBV9].
Lancelet putative fibrillin-1 (Branchiostoma floridae) [GenBank XP_002601550].
Fibrillin-1 (Ciona intestinalis) [GenBank XP_009858101].
We thank M. Mar Alba (Universitat Pompeu Fabra, Barcelona, Spain) for helpful comments.
I acknowledge support of the publication fee by the CSIC Open Access Publication Support Initiative through its Unit of Information Resources for Research (URICI).