The story of DNA structure is as varied as it is interesting, the most famous tale being the “discovery” of B-DNA by Watson and Crick. For many biologists, this simple but elegant structure is all that is needed for a basic, albeit superficial, understanding of cellular genetics. A deeper appreciation for how DNA functions comes from recognizing that it is a highly malleable molecule, providing the cell with a plethora of conformations to exploit in replication and transcription. Some of these conformations can give rise to mistakes in the genetic code, while others help to repair those mistakes. In this chapter, we dive into the cellular pot and find a literal alphabet soup of DNA structures. We start our journey by presenting the fundamental principles that serve as the vocabulary to analyze and describe the features of nucleic acid structures. We then explore the conformational variations that lead from double-helices to complexes composed of three or four strands, and consider how conformations interconvert through various intermediates. Although B-DNA is the standard form in the cell, we suggest that this dance away from the norm is essential for cellular function, giving the cell life and, hence, its genetic soul.
Replication is the process by which the cell creates an exact copy of the genetic information encoded in DNA; it is thus natural to be interested in the actual structure of DNA as a molecule. One might think that, for replication, we need only be concerned with the DNA duplex at the beginning, the single-stranded intermediate state, and the final duplex, since these structures generally tell us how the information is stored and read, and what the resulting product is. However, it is becoming clear that, although the general structure of DNA is important to the overall mechanism of replication (Watson and Crick, 1953a), the conformational details are important for understanding how proteins recognize their cognate DNA sequences, and how mutations are introduced and repaired. Thus, we must explore and dissect these details in terms of the variations that define the particular sequence-dependent shape of DNA.
We will not attempt the impossible task of covering every aspect of DNA structure, only those that may be relevant to replication. Also, as crystallographers, we will have a bias towards studies derived from X-ray diffraction and other physical methods, although we will always attempt to relate these back to the biology of replication. In the process, we will explore the details of DNA structure that help elucidate structural principles that contribute to our understanding of the mechanism and fidelity of the replicative process.
2. A brief history of DNA structure
DNA structure has over 55 years of history and, in that time, has undergone periods of discovery that pushed the field forward in spurts. The evidence that DNA is the genetic molecule in the cell came from the studies of Avery, MacLeod, and McCarty (Avery et al., 1944), and was confirmed by Hershey and Chase (Hershey and Chase, 1952). The seminal experiments of Meselson and Stahl (Meselson and Stahl, 1958) using heavy-isotope (¹⁵N) labeled DNA demonstrated that replication is semiconservative, with each newly replicated daughter strand being paired with one of the two parental strands. These classic studies from the 1940s and 1950s set the stage for a race to determine the molecular structure of DNA, a now familiar story that helps to bring perspective to the discussions in this chapter.
2.1. The race for the structure of DNA: X-ray fiber diffraction studies.
The key element in the race towards the structure of DNA was the availability of X-ray diffraction photographs of DNA fibers, the best of which came from the work of Franklin and Gosling in the lab of John Randall. It was clear at the time that DNA could adopt two different forms, an A-form under low humidity and a B-form at higher humidity. The A-DNA form gave the highest resolution data (Franklin and Gosling, 1953a), but it was the lower resolution photograph of the “wetter” B-form (Franklin and Gosling, 1953b) (Fig. 1) that was more readily interpretable. From this photograph, DNA was clearly seen to be a helical structure (showing the characteristic “helical-X” in the diffraction pattern), with a repeat of 10 units (reflected in the pattern converging after 10 layer lines), and with a distance between repeating units of 3.4 Å (from the
Often missing from this story is that the Watson-Crick model depended not only on the large amount of biochemical and X-ray diffraction data being generated at the time, but also on a proper understanding of the chemical properties of DNA. One of the most important aspects of the Watson-Crick model was the proposal that guanines pair with cytosines and adenines with thymines. For this to occur, however, the nucleotide bases must be drawn in their proper tautomeric forms, and up to that point it was not clear, even to organic chemists, what those forms should be. The initial assignment of guanine and thymine bases in their enol forms had led to an early parallel model for DNA (Watson, 1968). Only once the proper tautomers of the common nucleotides were assigned did the now familiar G-to-C and A-to-T base pairs make sense, thus providing a rationale both for the well-known Chargaff rules for the complementary composition of nucleotides in the DNA of higher organisms (Chargaff, 1950) and for a mechanism by which copying the sequence information along each strand could yield an exact copy of the duplex through semiconservative replication (Watson and Crick, 1953a).
2.2. The single-crystal structures of DNA oligonucleotides
At this point, it should be stressed that Watson and Crick did not “discover” or “solve” the structure of DNA, but presented a plausible and basically correct model that made important predictions which ultimately led to the birth of modern molecular biology. However, several decades would pass before high-resolution single-crystal structures of synthetic DNAs emerged to support the essential elements of this model. For example, it was not immediately obvious that the Watson-Crick scheme, particularly for A=T base pairs, was correct: at the time, the single-crystal structures of adenine bases paired with thymine or uracil bases showed the geometries of Hoogsteen-type base pairs (defined in Section 3). It was not until the crystal structure of the RNA dinucleotide phosphate ApU was determined to a remarkable 0.89 Å resolution (in crystallography, lower numbers refer to higher resolution) by Alexander Rich’s group (Rosenberg et al., 1973) that the Watson-Crick form of the A=U (and, thus, the analogous A=T) base pair was confirmed. The concurrent structure of GpC also confirmed the Watson-Crick form of the G
In the late 1970s, it became possible to chemically synthesize “long” stretches of a defined DNA sequence for crystallographic studies. In 1979, Rich’s group (Wang et al., 1979) determined the single-crystal structure of the DNA sequence CGCGCG (we write only one strand and drop the “p” for the phosphates for the sake of efficiency, even for double-helical structures). This structure showed DNA to be an antiparallel double-helix with Watson-Crick type base pairs, consistent with the 1953 model. However, it came with a new twist: this double-helix was left-handed and was called Z-DNA (for its zig-zagged backbone). It was not until 1981, with the single-crystal structure of the sequence CGCGAATTCGCG (known as the Drew-Dickerson dodecamer (Drew et al., 1981)), that the Watson-Crick structure for B-DNA was finally “proven” to be correct.
So, what of the dehydrated A-DNA form that Franklin had worked so hard on and struggled with? Soon after the Watson and Crick model of B-DNA, Franklin and Gosling published the structure of the fiber A-DNA form (Franklin and Gosling, 1953a), and a large number of A-DNA single-crystal structures were determined and published in the 1980s and 1990s (the “heyday” of DNA crystallography (Mirkin, 2008)). The A-form was subsequently shown to be the native form of RNA duplexes, while DNA/RNA hybrids (such as the primers for replication initiation) can interconvert between the A- and B-forms.
Although it is well accepted that the B-DNA form is the most prevalent form in solution and in the cell, there is now a myriad of single-crystal DNA structures, including those assembled as double-, triple-, quadruple-, and even hexa- and octa-stranded complexes. There are hairpins from single-strands, structures with overhangs, etc., and a plethora of forms seen in complexes with proteins. We will discuss some of these in greater detail in Section 4 along with their relevant cellular functions, focusing on replication and the associated processes. First, we must delve into the detailed vocabulary used to describe DNA structure and provide a common language for the remainder of the chapter.
3. A vocabulary lesson for DNA structure
As with any description of a biopolymer, we will start the discussion of DNA structure at the simplest unit (the nucleotide building block), then develop the concepts of structure with increasing size and complexity. In order to reach this stage of complexity, we must first define terms that will be used in discussing DNA structure at all levels.
3.1. General principles
Almost every student today knows that DNA is composed of four basic building blocks, each defined by the unique chemical structure of its aromatic base, with each base attached to a phosphodeoxyribose backbone. The four common deoxyribonucleotides are categorized as purine (deoxyadenosine, dA, and deoxyguanosine, dG) or pyrimidine (deoxythymidine, dT, and deoxycytidine, dC) nucleotides. The atoms of the sugar are distinguished from those of the base by a “prime” added to the atom name, so that the sugar carbons are C1’, C2’, C3’, C4’, and C5’ (Fig. 2), starting with the carbon at the glycosidic bond that attaches the base to the sugar and continuing around the ring. The deoxynucleotides of DNA lack an O2’ oxygen, which distinguishes them from ribonucleotides (RNA). For simplicity, we will assume the deoxy form from this point on and drop the “deoxy” and “d” prefixes (Hendrickson et al., 1988).
3.2. What defines a stable DNA structure?
DNA in its functional form is not a collection of isolated nucleotides, but a polymer built from the mononucleotides (G, C, A, T). A DNA polymer is constructed through condensation reactions that form a phosphodiester linkage bridging the O3’ and O5’ oxygens of sequential nucleotides (Fig. 2A). The primary structure, or sequence, of a DNA strand is written in the direction in which it is synthesized in the cell, starting at the free O5’ end (5’-end) and progressing to the free O3’ end (3’-end). Two complementary strands are brought together in a sequence-specific manner to form an antiparallel double strand, aligning one strand in the 5’ to 3’ direction and its complement 3’ to 5’. Nearly all functional secondary structures of DNA are multi-stranded, most commonly double-stranded. As the sequence of one strand dictates that of its complement, double-stranded DNA is often considered a single biological molecule, even though the strands are not covalently linked.
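Because the two strands are antiparallel and both are conventionally written 5’ to 3’, the sequence of the complementary strand is obtained by complementing each base and then reversing the order. A minimal Python sketch of this rule follows; the function name `complementary_strand` is ours, for illustration only, and only the four standard bases are handled.

```python
# Watson-Crick partner of each base: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(seq: str) -> str:
    """Return the complementary strand of `seq`, both written 5'->3'.

    The input is read 5'->3'. Because the strands are antiparallel, each
    base is complemented and the result is reversed, so that the new
    strand also reads from its own 5'-end to its 3'-end.
    """
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


print(complementary_strand("ATGC"))    # a non-self-complementary strand
print(complementary_strand("CGCGCG"))  # the Z-DNA crystal sequence
```

A self-complementary sequence such as CGCGCG returns itself, which is why writing a single strand is enough to specify such a duplex.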
3.2.1. Base pairing
Unlike proteins and RNA, the functional forms of DNA are typically complexes composed of two or more strands, which are stabilized by base pairing, base stacking, and solvent interactions. Of these, base pairing is the best understood, owing to its important role in specifying the sequence of newly synthesized DNA during replication and in general sequence recognition, but it is perhaps the most misunderstood in terms of its contribution to DNA stability.
The most commonly recognized form of DNA, B-DNA, is the double-stranded duplex stabilized by Watson-Crick base pairing (Fig. 2B). In standard Watson-Crick G
Non-standard base pairs play critical roles in the varied structures observed in DNA and RNA. Wobble, mismatched, and reverse base pairs still use the Watson-Crick edges for hydrogen bonding. Reverse Watson-Crick base pairs are found in parallel duplexes, but are not immediately relevant to DNA replication. Wobble base pairing (Fig. 3A) is seen in mismatches between G
Hoogsteen base pairs take advantage of the Hoogsteen edge of a purine base, which is orthogonal to the Watson-Crick base pairing edge and, thus, can be accessed without disrupting it (Fig. 3B). Consequently, Hoogsteen interactions allow the assembly of multi-stranded DNA complexes, including triple helices and G-quadruplexes.
3.2.2. Base stacking
Although less intuitive, the stacking of bases into a column is as critical as, or more critical than, base pairing to the stability of multistranded DNAs (duplexes, triplexes, tetraplexes, etc.). It is estimated that base stacking contributes as much as half of the total stabilizing free energy of a base pair in duplex DNA (Kool, 2001). Van der Waals interactions, electrostatic interactions, and solvent effects define the geometry and associated energies of stacked bases. Van der Waals forces drive bases to stack in a way that best complements their surface topologies. In addition, individual atoms carry permanent partial charges that contribute to either Coulombic attraction or repulsion between bases. This can be modeled as interactions between permanent dipoles, and it is this dipolar interaction, in conjunction with shape complementarity, that helps to define the orientation of the stacked bases. The specific orientation of stacked base pairs contributes to the conformational stability of a DNA duplex. Likewise, deformations associated with specific base stacking geometries contribute to the mechanism of indirect sequence-specific binding and recognition by proteins. Finally, since the nucleotide bases are aromatic and, therefore, largely hydrophobic, stacking minimizes the solvent exposure of the base surfaces, leading to the familiar face-to-face stacking of bases and base pairs. It is not surprising, therefore, that DNA conformations that increase the exposure of bases are stabilized by organic solvents.
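The electrostatic part of this picture can be sketched as a sum of Coulomb terms between the partial charges on two stacked bases. The example below is a toy model under stated assumptions: each "base" is reduced to a single hypothetical dipole of two point charges (+0.5 e and -0.5 e, 1.2 Å apart), the two planes are 3.4 Å apart, and a uniform dielectric is used. The charges are not real base parameters, and the van der Waals and solvent contributions described above are omitted.

```python
import math

# Coulomb constant in kcal*Å/(mol*e^2): converts e^2/Å into kcal/mol.
COULOMB_K = 332.06


def coulomb_energy(charges_a, charges_b, dielectric=1.0):
    """Electrostatic energy (kcal/mol) between two sets of point charges.

    Each set is a list of (q, (x, y, z)) tuples, with q in elementary charges
    and coordinates in Å. Only pairs spanning the two sets are summed, i.e. the
    interaction between the two 'bases', not the energy within each one.
    """
    energy = 0.0
    for q_i, r_i in charges_a:
        for q_j, r_j in charges_b:
            energy += COULOMB_K * q_i * q_j / (dielectric * math.dist(r_i, r_j))
    return energy


# Hypothetical toy 'bases': one dipole each, stacked 3.4 Å apart along z.
base_1 = [(+0.5, (0.0, 0.0, 0.0)), (-0.5, (1.2, 0.0, 0.0))]
parallel = [(+0.5, (0.0, 0.0, 3.4)), (-0.5, (1.2, 0.0, 3.4))]
antiparallel = [(-0.5, (0.0, 0.0, 3.4)), (+0.5, (1.2, 0.0, 3.4))]

print(coulomb_energy(base_1, parallel))      # like charges stacked over each other
print(coulomb_energy(base_1, antiparallel))  # opposite charges stacked over each other
```

Stacking the second dipole antiparallel to the first gives a negative (attractive) energy, while the parallel arrangement is repulsive; this is the kind of orientational preference that, together with shape complementarity, helps set the orientation of stacked bases.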
3.2.3. The phosphodeoxyribose backbone
The functional form of DNA links nucleotides together by phosphodiester bonds to form a continuous DNA strand. Phosphodiesters are highly acidic (
The overall charge of DNA in solution is not simply a sum of -1 for each nucleotide; the backbone charges are counterbalanced by cations that accumulate around the DNA. These counterions include simple ions (monovalent Na+ and K+, or divalent Mg2+ and Ca2+, being the most prevalent in a cell), but also cationic polyamines (spermine and spermidine), drugs (ethidium or
When a protein, such as DNA polymerase, binds to DNA, it must competitively displace the counterions associated with the DNA backbone. Nucleosome formation, which helps compact DNA in eukaryotes, is likewise driven primarily by nonspecific interactions of the positively charged histones with the negatively charged DNA backbone. To replicate or transcribe the information in the DNA, the respective polymerase and all of its associated proteins must compete against these nonspecific interactions. Thus, the negative charge of the backbone is a platform for sequence-independent electrostatic interactions with proteins in the cell (Rohs et al., 2009).
3.2.4. Solvent effects
As with any biological molecule, solvent interactions directly influence DNA structure and function. Base pairing and stacking are in part stabilized by the hydrophobic effect. We have already seen how solvent (considered to consist primarily of water and salts) induces base pairs to stack and defines the effective charge of the phosphoribose backbone. Even base pairing is affected by solvent interactions. In forming a base pair, the hydrogen bond donor and acceptor groups of each base must first break their hydrogen bonds with water molecules. If the enthalpy of a hydrogen bond from one base to another is essentially the same as that of a hydrogen bond from a base to water, why then do bases pair and exclude water (present at 55.5 M)? The primary answer is that sequestering hydrogen-bonding groups from the competing interactions of water increases their hydrogen bonding potential (Klotz, 1962). One can see from this why base stacking is so important in stabilizing double-, triple-, and other multistranded DNA forms that are assembled through hydrogen bonding.
Water, however, is not entirely excluded from DNA; rather, it plays an important role in its structure. Even in a fully base-paired duplex, numerous hydrogen bond donor and acceptor groups of the backbone and bases must be hydrated. There are classes of waters that can, in fact, be considered integral components of a DNA’s structure. In a G
Finally, we must briefly discuss how solvent plays a role in DNA function. DNA is a hydrated molecule, until it is bound to a protein, at which point the DNA becomes dehydrated—
3.3. Conformations of the deoxyribose sugar
In addition to charge effects, the phosphoribose backbone helps to define the conformation of DNA
The bonds in the furanose ring are distinguished from those that flow linearly from one nucleotide to the next, and are designated as ν
The base of each nucleotide is attached
3.4. Helical parameters
Now that we have assembled well-defined helical structures, how do we describe them? We can certainly do this in a descriptive and qualitative manner, using the classical A- and B-forms as examples. For instance, we can characterize the standard B-form of DNA as a right-handed double-helix held together by Watson-Crick type base pairs that stack directly along a helical axis, resulting in two well-defined grooves. However, this raises numerous questions: for example, at what point does a distorted Watson-Crick base pair become a wobble base pair, how far may a base pair lie off the helix axis and still fit this definition, and what if the helix axis is not straight? To address these and other questions, a set of quantitative measures called the “helical parameters” was developed to characterize the regular secondary structures of nucleic acids (both DNA and RNA) (Lavery, 1998).
The most commonly recognized parameters for DNA include the helical repeat (number of base pairs in one complete turn) and the helical rise (distance between adjacent base pairs measured along the helical axis). The repeat defines the angle relating each base pair along the helix axis (the helical twist = 360°/repeat), while the product of repeat and rise is the pitch (the length of one complete turn) of the DNA. These parameters restrict the geometries of the DNA. Indeed, if we consider only the closest physical approach between base pairs (the rise = 3.4 Å, as defined by the thickness of a base), the maximum phosphate-phosphate distance along a strand (measured at ~7.5 Å by single-molecule stretching (Allemand et al., 1998)), and the effective radius of a duplex (9.5 Å), we see that the largest twist angle between stacked base pairs is ~42°, resulting in a smallest theoretical repeat of 8.5 base pairs per turn. This would be the most tightly, or over-wound, form of a DNA double-helix. If the phosphate-to-phosphate distance is relaxed to ~7 Å (for a C2’-
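The relations between repeat, twist, rise, and pitch, and the upper bound on twist, can be checked numerically. The sketch below assumes a simple cylinder model: phosphates lie on a cylinder of radius r, so successive phosphates on a strand are separated axially by the rise and laterally by a chord of length 2r sin(θ/2), and their straight-line distance may not exceed the ~7.5 Å limit. Reading the 9.5 Å value as the radius of that cylinder, and the chord model itself, are our assumptions; the function names are illustrative.

```python
import math


def helical_twist(bp_per_turn):
    """Twist per base-pair step, in degrees: 360 / repeat."""
    return 360.0 / bp_per_turn


def helical_pitch(bp_per_turn, rise):
    """Length of one full turn, in Å: repeat x rise."""
    return bp_per_turn * rise


def max_twist(pp_max, rise, radius):
    """Largest twist (degrees) for which successive phosphates stay within pp_max.

    Phosphates are assumed to lie on a cylinder of the given radius. Between
    neighbours, the axial separation is `rise` and the lateral separation is a
    chord of length 2 * radius * sin(twist / 2). All lengths are in Å.
    """
    chord = math.sqrt(pp_max**2 - rise**2)
    return math.degrees(2.0 * math.asin(chord / (2.0 * radius)))


# Fiber B-DNA: 10 bp per turn and a 3.4 Å rise.
print(helical_twist(10), helical_pitch(10, 3.4))

# Upper bound on twist for a 3.4 Å rise, a 7.5 Å P-P limit and a 9.5 Å radius.
theta = max_twist(7.5, 3.4, 9.5)
print(round(theta, 1), round(360.0 / theta, 1))
```

With these inputs the cylinder model gives a maximum twist of roughly 41° and a minimum repeat of roughly 8.7 bp per turn, close to the ~42° and ~8.5 bp quoted above; the small difference reflects the idealized geometry.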
The helical parameters can be categorized into two general classes to describe the absolute and relative conformations in nucleic acids (Fig. 6): base-pair parameters (for single base pairs) and base-step parameters (for adjacent base pairs). We note that these classes are not mutually exclusive, but are interrelated. Twist and rise are clearly base-step parameters, since they describe the relative angle and distance between two adjacent stacked base pairs. The other base-step parameters that are generally considered relevant include slide, roll, tilt, and shift. It is easy to see that slide can effectively increase the diameter of a DNA duplex and, consequently, affect the helical twist and repeat. A-DNA, for example, shows a large slide between base pairs, while B-DNAs have small slides, placing the base pairs essentially on top of each other. Not surprisingly, therefore, A-DNA has a larger overall diameter and, in fact, appears to have a hole down the middle when viewed down its helical axis.
A conundrum in A-DNA is that it has a rise of ~2.5 Å, which would appear to violate the closest approach between stacked base pairs. In this case, the inclination associated with the roll and tilt of the base pairs, in conjunction with the helical twist, results in a shortening of the vertical distance between base pairs along the helical axis, even though the stacking distance remains 3.4 Å. Indeed, A-like DNAs that have little or no roll and tilt have helical rises of ~3.4 Å, as expected (Ng et al., 2000; Vargason et al., 2001).
Base pair parameters include those that relate the position or orientation of the base pair relative to the helical axis (inclination,
Each of these base-pair and base-step parameters is defined relative to the helical axis that runs down the center of the DNA. However, defining this axis is not entirely straightforward, particularly if the DNA trajectory is bent or curved. There are two approaches to defining helical axes: the global axis and the local axis. The global axis is essentially the continuous curve that best runs down the center of all base pairs in a structure, while a local axis is the line that best defines the center of any two adjacent base pairs (local axes need not be continuous). Helical parameters must therefore be analyzed in the context of either global or local axes; the two sets of values are not interchangeable and may be very different.
Two distinguishing features of double-helical DNAs are the grooves. The widths of the major and minor grooves are measured as the phosphate-to-phosphate distance across the two strands in a direction perpendicular to the trajectory of the strands. These groove widths provide an important means for proteins to interact with the base pairs of the DNA. The wide major groove of B-DNA allows direct read-out of the bases, while the narrow major groove of A-DNA does not—there is, however, an advantage to A-DNA having a wider minor groove, which we will discuss in the next section. It should be immediately obvious from the earlier discussion that the base pair and base step parameters described above conspire to define the groove widths for each form of DNA.
Finally, we can see how a parameter such as twist has such a strong effect on the overall behavior of genomic DNAs. DNA confined in the cell or the cell’s nucleus must be packaged into a compacted, supercoiled form and, in the process, stresses are induced that perturb its secondary structure. For simplicity, a set of terms has been defined for supercoiled DNA in the context of closed-circular double-stranded DNAs such as those found in plasmids, bacterial chromosomes, and viral genomes. These terms can also be applied to linear eukaryotic DNAs that are spatially anchored and stressed through protein binding, DNA unwinding, and DNA compaction. In double-stranded DNA, the number of times the strands wrap around each other along the helical axis is defined as the twist (
Together, the twist and writhe define the topological properties of DNA. In truly closed-circular DNA that is unconstrained, twist and writhe are entirely correlated through the linking number (
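The coupling between twist and writhe is commonly written as Lk = Tw + Wr, and the degree of supercoiling is often expressed as the superhelical density σ = (Lk - Lk0)/Lk0, where Lk0 is the linking number of the same DNA when relaxed. A minimal sketch follows, assuming a relaxed B-DNA repeat of 10.5 bp per turn in solution (an assumed default, not a value given in this chapter) and a hypothetical 5,250 bp plasmid; the function names are illustrative.

```python
def relaxed_linking_number(n_bp, bp_per_turn=10.5):
    """Lk0 for relaxed closed-circular DNA: number of base pairs / helical repeat."""
    return n_bp / bp_per_turn


def superhelical_density(lk, n_bp, bp_per_turn=10.5):
    """sigma = (Lk - Lk0) / Lk0; negative values indicate underwound DNA."""
    lk0 = relaxed_linking_number(n_bp, bp_per_turn)
    return (lk - lk0) / lk0


def writhe(lk, tw):
    """Because Lk = Tw + Wr is fixed for a closed circle, Wr = Lk - Tw."""
    return lk - tw


# Hypothetical 5,250 bp plasmid with 25 fewer links than its relaxed state.
n_bp = 5250
lk = 475
print(relaxed_linking_number(n_bp))    # relaxed linking number, Lk0
print(superhelical_density(lk, n_bp))  # negative: the circle is underwound
# If the duplex keeps its relaxed twist, the whole deficit appears as writhe.
print(writhe(lk, relaxed_linking_number(n_bp)))
```

Because Lk cannot change without breaking a strand, any local reduction in twist (for example, a region converting to a more underwound form) makes the writhe less negative, relieving some of the supercoiling stress.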
4. The alphabet soup of DNA structures
DNA is highly polymorphic and, at least at the level of the helical structures, more variable than either proteins or RNA. The various forms of DNA have traditionally been named using the letters of the English alphabet and, from a survey of the literature, it was found that all but four letters have been assigned to at least one unique structural form (Ghosh and Bansal, 2003). We will, in this section, briefly describe a subset of DNA conformations that have been structurally characterized (Fig. 8 and 9) and the sequence propensities of these structures, starting with B-DNA and working our way through the variations on the double-helix and various multi-stranded conformations. Along the way, we will discuss their potential biological functions, particularly in DNA replication, as appropriate.
4.1. B-DNA: The standard form
B-form DNA is the most recognized and common structural form of DNA in the cell, being considered the conformation adopted by nearly all sequences within a genome. Interestingly, while B-DNA has a distinguishing set of structural properties, it is now understood to be highly variable and malleable. B-DNA is a right-handed, antiparallel double-helix in which the Watson-Crick base pairs are stacked directly along and perpendicular to the helical axis, giving rise to major and minor grooves that are similar in depth. The bases are all in the
Although these properties are general for B-DNA, the structure is highly variable from one sequence to the next and for the same sequence under different conditions. The concept of
Variations of the B-form have been elucidated primarily by detailed structural studies, particularly X-ray diffraction and NMR, of short oligonucleotides. The question that is often raised is whether these short lengths of DNA are in fact representative of sequences embedded in a genomic context (and, in the case of crystals, whether they are otherwise distorted (Dickerson et al., 1994)). Studies by Tullius’ group using hydroxyl-radical footprinting (Greenbaum et al., 2007) have shown significant sequence-dependent variation in the solvent accessibility and, thus, the helical structure of protein-free genomic DNA. These structural variations at the genomic level are highly correlated with variations in helical parameters measured in DNA crystal structures (unpublished results) derived from a self-consistent data set (Hays et al., 2005). In short, there is growing recognition that even B-DNA is a highly variable structural form of the DNA double-helix, and that sequence-dependent structural variations play a critical role in protein recognition and binding.
4.2. A-DNA: Underwinding for replication fidelity
A-form DNA is also a right-handed antiparallel helical duplex, but is characterized as an underwound structure that is more compact along the helix axis and broader overall across the helix relative to B-DNA. The nucleotide bases, all
A-DNA is involved in ensuring the fidelity of DNA replication. An analysis of the structure of the
4.3. Z-DNA: The left-handed duplex
Z-form DNA is noteworthy as the only characterized left-handed form of the double-helix. The zig-zagged backbone, its namesake, results from the alternation between
The biological function of Z-DNA has been widely debated and underappreciated; however, several cellular functions for the Z-form are now supported by experimental evidence (Rich and Zhang, 2003). Z-DNA was initially characterized as a structure induced by high salt conditions (3 M NaCl) (Pohl and Jovin, 1972), leading many to wonder whether it could exist in a cell. It has subsequently been shown that cytosine methylation, as well as other cations such as spermine and spermidine at millimolar concentrations, also stabilize Z-DNA (Rich and Zhang, 2003). Most importantly, as a left-handed structure, Z-DNA is the most underwound form of the double-helix and, consequently, serves as a sink for the torsional tension in negatively supercoiled DNA (Rich and Zhang, 2003). This expands the range of cellular situations that could support the formation, at least transiently, of Z-DNA. In one model, RNA polymerase, as it transcribes through a gene, would generate negative supercoils in its wake (Liu and Wang, 1987) and, in the process, drive Z-DNA formation upstream of the transcribing gene. A detailed study of the promoter of the human CSF-1 gene showed that up-regulation by the chromatin-remodeling BAF complex involves a Z-DNA element (Liu et al., 2001). The authors suggested that Z-DNA upstream of the nuclear factor-1 binding site helped to maintain the gene in its activated, nucleosome-free state (nucleosomes do not bind to the very rigid Z-DNA form (Ausio et al., 1987)). In support of a potential role for Z-DNA in the regulation of eukaryotic genes, we have found that Z-forming sequences accumulate near the transcription start sites of genes in humans and other eukaryotes (Khuu et al., 2007; Schroth et al., 1992), and that ~80% of the genes on human chromosome 22 have at least one Z-DNA-forming sequence in the vicinity of their transcription start sites (Champ et al., 2004).
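Z-DNA forms most readily in sequences in which purines and pyrimidines alternate, such as the CGCGCG of the first Z-DNA crystal. As a rough illustration only, the sketch below scans a sequence for runs of strictly alternating purines (A, G) and pyrimidines (C, T). The function names and the 8-nucleotide minimum run length are hypothetical choices, and this simple pattern match is not the thermodynamic scoring used in genome-wide surveys such as those cited above.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def is_alternating_step(a, b):
    """True if one base of the step is a purine and the other a pyrimidine."""
    return (a in PURINES and b in PYRIMIDINES) or (a in PYRIMIDINES and b in PURINES)


def alternating_runs(seq, min_length=8):
    """Return (start, end) of maximal purine/pyrimidine-alternating runs.

    `end` is exclusive. The 8-nt default is an arbitrary illustrative cutoff.
    """
    seq = seq.upper()
    runs = []
    start = 0
    for i in range(1, len(seq) + 1):
        # Close the current run at the end of the sequence or at a non-alternating step.
        if i == len(seq) or not is_alternating_step(seq[i - 1], seq[i]):
            if i - start >= min_length:
                runs.append((start, i))
            start = i
    return runs


print(alternating_runs("AAAACGCGCGCGTTTT"))
```

The run reported for this example extends into the flanking A and T, because the A-C and G-T steps also alternate purine and pyrimidine; a real Z-DNA propensity score would weight CG, CA/TG, and other steps differently.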
The discovery of protein domains having very high specificity for Z-DNA (Rich and Zhang, 2003), in some cases with nanomolar
4.4. H-DNA: Three’s a crowd
When a single DNA strand invades the major groove of a DNA duplex, a triple helical structure is generated (Fig. 9). In order for the duplex to accommodate this third strand, it must unwind to broaden the major groove; thus, such triple-stranded helices are favored in negatively supercoiled DNA (Mirkin, 2008). The invading third strand can be intermolecular or intramolecular.
The interaction between strands involves the Hoogsteen edge of the Watson-Crick base pairs (Fig. 3) of the duplex, forming base triplets; hence the name H-DNA for such triplex structures. H-DNA forms primarily at mirror-repeat sequences (sequences with dyad symmetry within a single strand, as in …AGAGGGnnnGGGAGA…), as dictated by the sequence preferences for forming base triplets. Mirror repeats occur randomly in prokaryotes, but are three to six times more frequent in eukaryotic genomes (Schroth and Ho, 1995). Specific H-DNA-forming sequences have been identified in multiple promoter regions, with documented effects on the expression of several disease-related genes, including c-myc (Kinniburgh, 1989) and c-Ki-ras (Pestov et al., 1991). As with Z-DNA, the repeating sequence motif of H-DNA appears to be a source of genetic instability resulting from double-strand breaks. Wang and Vasquez (2004) reported a ~20-fold increase in mutation frequency upon incorporation of an H-DNA-forming sequence from the c-myc promoter region into mammalian cells. These results suggest that naturally occurring DNA sequences can cause increased mutagenesis via non-standard DNA structure formation.
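The mirror-repeat definition can be made concrete. A mirror repeat reads the same 5’ to 3’ on one arm as 3’ to 5’ on the other, with no complementation involved, unlike an inverted repeat, which equals its own reverse complement. The sketch below checks only this within-strand symmetry for a given arm length and spacer length (both hypothetical parameters); it does not assess base composition or any other determinant of triplex stability.

```python
def is_mirror_repeat(seq, arm, spacer):
    """True if seq = X + spacer + reverse(X), where X has length `arm`.

    The two arms are compared directly, without complementing, because a
    mirror repeat has its dyad symmetry within a single strand.
    """
    seq = seq.upper()
    if len(seq) != 2 * arm + spacer:
        return False
    left, right = seq[:arm], seq[arm + spacer:]
    return left == right[::-1]


def find_mirror_repeats(seq, arm, spacer):
    """Return start positions of every window of seq that is a mirror repeat."""
    width = 2 * arm + spacer
    return [
        i
        for i in range(len(seq) - width + 1)
        if is_mirror_repeat(seq[i:i + width], arm, spacer)
    ]


# The example from the text, with a hypothetical TTT spacer and CC flanks.
print(find_mirror_repeats("CCAGAGGGTTTGGGAGACC", arm=6, spacer=3))
```

An inverted repeat built from the same arm (AGAGGG…CCCTCT) fails this test, which illustrates why mirror and inverted repeats favor different non-B structures.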
4.5. HJ, G, and I: The four-stranded DNAs
There are several conformations of DNA that can be assembled from four strands. The three structures discussed here show very different and unique helical forms, starting with a conformation that is most similar to standard B-DNA, and leading through forms that differ dramatically from the original Watson-Crick model (Fig. 9).
4.5.1. The four-stranded Holliday junction
Robin Holliday proposed in 1964 that a four-stranded junction would be involved as an intermediate to allow reciprocal exchange of genetic information through recombination across two homologous DNA duplexes (Holliday, 1964). These intermediates, now referred to as Holliday junctions, are essential to several cellular processes including recombination dependent DNA lesion repair, viral integration, restarting of stalled replication forks, and proper segregation of homologous chromosomes during meiosis
Around the end of the 20th century, two groups almost simultaneously solved the single-crystal structures of the DNA Holliday junction (Ortiz-Lombardía et al., 1999; Eichman et al., 2000). Both structures strongly resembled the model derived from solution studies (McKinney et al., 2003), showing the junction to be essentially two B-DNA double-helices, with standard Watson-Crick type base pairs, linked by two crossing strands that connect the duplexes. A unique set of hydrogen bonds helps to stabilize the tight U-turns at the crossover points (Eichman et al., 2002) and imposes a strong sequence dependence on the formation of Holliday junctions, with the inverted repeats ranked GGTACC > GGCGCC > (GATATC = GGGCCC) in their stability as four-stranded stacked-X junctions (Hays et al., 2005). In addition, these interactions define an ~40° angle relating the two linked duplexes; the structure of an asymmetric junction showed no interactions at the junction center and an interduplex angle of ~60° (Khuu and Ho, 2009), similar to that determined in solution for analogous constructs (McKinney et al., 2003). The structure of the junction has now been determined with the drug psoralen (Eichman et al., 2001), with methylated cytosines (Vargason and Ho, 2002), and with various types of cations (Thorpe et al., 2003), all showing effects on the detailed geometry of this four-stranded intermediate (Watson et al., 2004). The effect of sequence on the formation and geometry of junctions led to a model in which even non-sequence-specific resolvases may show sequence preference, not as a result of any specific recognition motif between the protein and the DNA, but from the thermodynamic propensity of certain sequences to promote formation of the junction (Khuu, 2006).
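An inverted repeat equals its own reverse complement, so both strands of the duplex carry the same 5’ to 3’ sequence. A short sketch confirms this property for the four junction-forming sequences ranked above; it checks only the symmetry, and the stability order itself is the experimental result of Hays et al. (2005), not something the code computes. The function names are illustrative.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Complementary strand of seq, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def is_inverted_repeat(seq):
    """True if the sequence equals its own reverse complement."""
    return seq.upper() == reverse_complement(seq)


# Junction-forming inverted repeats, listed in the stability order given in the text.
RANKED_REPEATS = ["GGTACC", "GGCGCC", "GATATC", "GGGCCC"]
for core in RANKED_REPEATS:
    print(core, is_inverted_repeat(core))
```

All four sequences pass, whereas a mirror-repeat arm such as AGAGGG does not, underscoring that sequence symmetry alone cannot explain the observed differences in junction stability.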
In replication, Holliday junctions are essential intermediates in double-strand break repair (Cox et al., 2000), in which RecA facilitates invasion of a single strand into a homologous double-stranded sequence, followed by branch migration of the junction (by RuvAB or RecG) and its resolution (by RuvC). Homologous recombination also plays a crucial role in rescuing replication forks that stall because of DNA damage. Recombination proteins repair double-stranded ends produced when a replication fork encounters a single-strand interruption, and they help restart replication at stalled forks by converting blocked forks into Holliday junctions. Thus, DNA junctions are involved in the repair of damaged DNA both during and after replication.
The four-stranded structures assembled from guanine-rich sequences are called G-quadruplexes or G-quartets. Such sequences are found primarily in telomeric DNA repeats (the 3’-overhangs at chromosome ends; Patel et al., 2007), but have recently been identified in various other regions of the genome, including centromeric sequences (Brooks et al., 2010) and the immunoglobulin switch region. The strands are held together by pairing the Watson-Crick edge of each guanine with the Hoogsteen edge of an adjacent guanine, creating a cyclic arrangement of four guanines called a G-tetrad. These tetrads are stacked with a right-handed helical twist and are stabilized by monovalent cations (Na+ or K+) that are coordinated to the O6 carbonyl oxygens of the guanines and sandwiched between the stacked tetrads.
G-quartets can be formed from the association of one, two, or four G-rich DNA strands, with various topologies (Mirkin, 2008). Of these, the topologies that can be adopted by single strands are perhaps the most important for the G-rich sequences at the 3’-ends (telomeric ends) of chromosomes, which are characterized by a single-stranded, guanine-rich overhang that assembles into a nucleoprotein structure. Such sequences have been shown to form G-quadruplex structures, from the DNA in the macronucleus of a ciliate (Mergny et al., 2002) to the exceptionally stable G-quartet formed under physiological conditions by the human telomeric repeat sequence (GGGTTA)3GGG (Parkinson et al., 2002). Telomere ends are replicated through the reverse transcriptase function of telomerase, which is itself a protein-RNA complex (Zakian, 2009). The precise length of each telomere controls the cell’s ability to replicate, suggesting a regulatory role for telomeric G-quadruplex structures. In normal cells, the telomeric region shortens with each round of replication until the Hayflick limit is reached, at which point the cell stops dividing and enters senescence or apoptosis (Zakian, 2009). Misregulation of telomerase activity can lead to cell immortality and associated tumorigenesis.
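The human telomeric sequence (GGGTTA)3GGG contains four tracts of three guanines. In a simplified counting model, assumed here for illustration, each tract contributes one guanine to every tetrad of an intramolecular quadruplex, so the number of stacked tetrads is limited by the shortest of the four tracts used, and one monovalent cation sits in each cavity between adjacent tetrads. The sketch below (hypothetical names; it ignores loop lengths, bulges, and alternative folds) applies that model.

```python
import re
from typing import NamedTuple


class QuadruplexEstimate(NamedTuple):
    tracts: int        # number of G-runs of at least min_run guanines
    tetrads: int       # stacked G-tetrads under the simple counting model
    cation_sites: int  # cavities between adjacent tetrads


def estimate_quadruplex(seq: str, min_run: int = 3) -> QuadruplexEstimate:
    """Crude upper bound on an intramolecular G-quadruplex.

    Assumes each of four G-tracts donates one guanine per tetrad, so the
    tetrad count equals the shortest of the four longest tracts.
    """
    runs = [len(r) for r in re.findall(rf"G{{{min_run},}}", seq.upper())]
    if len(runs) < 4:
        return QuadruplexEstimate(len(runs), 0, 0)  # too few tracts to fold alone
    tetrads = min(sorted(runs, reverse=True)[:4])
    return QuadruplexEstimate(len(runs), tetrads, tetrads - 1)


HUMAN_TELOMERIC = "GGGTTA" * 3 + "GGG"

if __name__ == "__main__":
    print(estimate_quadruplex(HUMAN_TELOMERIC))
```

For the human repeat this gives four tracts, three tetrads, and two inter-tetrad cation sites; a sequence with only two G-tracts would need additional strands to form a quadruplex.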
Although it is easy to envision formation of a G-quartet structure at the single-stranded end of a chromosome, G-rich repeating sequences with the potential ability to form G-quadruplexes have also been identified at internal sites within genomes (Brooks et al., 2010). Indeed, a recent study by Sarkies,
For a double-stranded G-rich region to extrude into a G-quartet structure, the complementary C-rich strand must also be extruded. The structure now associated with C-rich sequences is the four-stranded, intercalated i-motif. The i-motif, or I-form DNA, is fashioned from two parallel-stranded C-rich duplexes intercalated in a head-to-tail fashion (Mills et al., 2002). The two duplexes of poly(dC) are stabilized by base pairing the Watson-Crick edges of two cytosines to form hemi-protonated C
5. Getting from here to there: Structural transitions in DNA
B-DNA is recognized as the “standard” form in the cell; however, if everything were standard and static, life would not be as rich, and might not exist at all. DNA is thus not only polymorphic but also dynamic. In this section, we explore the mechanisms that drive DNA away from the B-DNA norm, focusing on two transitions that offer interesting and important insights into how DNA transforms between structural forms.
5.1. Going from B to A
As we have seen, A-type DNA plays an important role in replication as the form induced in the active site of DNA polymerase, allowing the non-sequence-specific recognition of base mispairs in the template/daughter duplex. The transition from B- to A-DNA was one of the earliest characterized, with dehydration of DNA fibers producing a distinct shortening of the helical rise, unwinding of the helical twist, and broadening of the diameter (Franklin and Gosling, 1953a). The transition can also be induced in solution by alcohol (a dehydrating agent), as well as by methylation of cytosines (which affects the water structure around the base pairs). The question is: what are the structural and energetic steps involved in this transition? Although this is basically a transition from one right-handed antiparallel double-helix to another, several dramatic structural rearrangements must take place, including a conversion of the sugar pucker along with large sliding and inclination of the base pairs. The details of this conformational shift were observed crystallographically at the atomic level in the short DNA sequence GGCGCC (Vargason et al., 2001), which is primarily B-form but, upon cytosine methylation or bromination, adopts a number of conformational states, including true A-DNA forms and a set of logical intermediates between the B- and A-forms (Fig. 11). This study generated a structural map of how the sugar conformation works its way around the ring (Fig. 5), the order of translational and rotational distortions of the stacked base pairs, and the direction in which a structural transition propagates once initiated.
The transition involves conversion of the sugar from the B-DNA C2’-
Associated with the changes in sugar pucker are perturbations to base stacking. As the sugars go through a transition from B- toward A-type puckers, the B-A chimeric intermediate (which is half B- and half A-type along each strand) induces a large buckle in the base pairs at the point of transition, which partially unstacks one of the two bases of the pair. The unstacking becomes complete when the sugars assume the full A-type pucker, resulting in an ~10% extension of the spacing between bases, or a rise of ~3.7 Å (Vargason et al., 2000), thereby allowing the large slide and subsequent displacement of the base pairs away from the helical axis that are characteristic of A-DNA. Thus, large shifts between base pairs are predicated on breaking the base stacking interactions, as one would expect. The structural map also shows the transition to A-DNA propagating back toward the 5’-end of each strand. The tilt and roll that cause the inclination and the resulting shortened rise of A-DNA are the final steps. The B- to A-DNA transition is unique in that specific intermediates have been trapped to provide an atomic-level map of the transition; this is perhaps the most detailed description of a complete structural transition of any biological macromolecule.
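A quick arithmetic check connects the two numbers in the paragraph above. Assuming a typical B-DNA rise of about 3.4 Å per base-pair step (a standard value, not one given in this section), a ~10% extension gives roughly the quoted spacing:

```latex
\Delta z_{\mathrm{ext}} \approx 1.10 \times \Delta z_{\mathrm{B}}
  \approx 1.10 \times 3.4~\text{\AA} \approx 3.7~\text{\AA}
```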
5.2. Switching hands: The B- to Z-DNA transition
A more dramatic transition is that from right-handed B- to left-handed Z-DNA (Fig. 12), which has been studied extensively in solution and in plasmids. The B-Z transition, however, does not simply twist a right-handed double-helix in the opposite direction. The sugars of alternating nucleotides along a strand change from C2’-
To accommodate all of these radical changes, a junction with an overall twist of zero (the B-Z junction) splices together the right- and left-handed duplexes (Peck and Wang, 1983). The structure of this junction was determined in a clever way, using a Z-DNA binding protein to stabilize half of the DNA in the left-handed form while allowing the other half to remain in its relaxed B-form (Ha et al., 2005). The structure shows that the bases at the B-Z junction itself have flipped out, which would allow for the transition of the sugar pucker and rotation of the bases. It also allows the bases, when they pair again, to change the direction of the grooves while maintaining stacking between the left- and right-handed columns. The B-Z transition can therefore be thought of as initiating with the melting of two base pairs (creating two B-Z junctions, with a nucleation energy of ~10 kcal/mol (Peck and Wang, 1983)), with each junction subsequently migrating in the opposite direction to allow the left-handed DNA to propagate between them (the propagation energy per base pair being sequence dependent and lowest in alternating GC dinucleotides (Ellison et al., 1985)).
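The nucleation-then-propagation picture can be written as a simple cooperative free-energy model: a one-time cost for creating the two junctions plus a per-base-pair term for each pair converted as the junctions migrate apart. In this sketch the ~10 kcal/mol nucleation cost comes from the text, while the per-base-pair values (treated as a net term that is negative when Z-DNA is favored) are hypothetical placeholders, as are the function names; real propagation energies depend on sequence and conditions.

```python
from typing import Optional

# Nucleation cost of forming the two B-Z junctions, ~10 kcal/mol
# (Peck and Wang, 1983), as quoted in the text.
DG_NUCLEATION = 10.0  # kcal/mol


def z_segment_free_energy(n_bp: int, dg_prop: float,
                          dg_nuc: float = DG_NUCLEATION) -> float:
    """Free energy (kcal/mol) of converting an n_bp stretch from B to Z.

    Nucleation-propagation model: a one-time junction cost dg_nuc plus a
    net per-base-pair term dg_prop (negative when Z-DNA is favored).
    """
    if n_bp <= 0:
        return 0.0  # no Z-DNA, so no junctions and no cost
    return dg_nuc + n_bp * dg_prop


def minimum_z_length(dg_prop: float,
                     dg_nuc: float = DG_NUCLEATION) -> Optional[int]:
    """Shortest Z-DNA stretch whose total free energy is negative.

    Returns None when dg_prop >= 0, because propagation can then never
    repay the junction cost.
    """
    if dg_prop >= 0:
        return None
    n = 1
    # Grow the left-handed segment one base pair at a time (the junctions
    # migrating apart) until the total free energy becomes favorable.
    while z_segment_free_energy(n, dg_prop, dg_nuc) >= 0:
        n += 1
    return n


if __name__ == "__main__":
    # Hypothetical net propagation energies in kcal/mol per bp.
    for dg_prop in (-0.5, -0.25, 0.1):
        print(dg_prop, minimum_z_length(dg_prop))
```

With these placeholder values, the break-even length is 21 bp at -0.5 kcal/mol per bp and 41 bp at -0.25 kcal/mol per bp, illustrating why sequences with the most favorable propagation term (such as alternating GC) need the shortest stretch to flip.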
In this review, we have discussed a plethora of structures revealed by physical biochemical studies, and have shown how these structures are defined by sequence and how they transform. Throughout this history, there has always been a nagging question: “Is this structure relevant?” Clearly, the B-DNA double-helix is relevant, not only to replication but also to nearly all genetic processes. However, a clearer understanding of the biological roles of non-B-type DNAs will require detailed mapping of such structures (Ho, 2009), either experimentally or computationally, across the genomes of various organisms.