Knowing the three-dimensional structure of biological macromolecules, such as proteins and DNA, is crucial for understanding how life functions. Biological crystallography is the main method of structural biology, the branch of biology that studies the structure and spatial organization of biological macromolecules. It is based on the study of X-ray diffraction by crystals of macromolecules. This article presents the principle, methodology and limitations of solving biological structures by crystallography.
- biological macromolecules
- X-ray diffraction
- tridimensional structure
In 1953, James Watson and Francis Crick revealed the double-helical structure of DNA using the results that Rosalind Franklin obtained by X-ray scattering on natural filaments formed by DNA molecules. Proteins, the nanomachines essential to living organisms, have their “manufacturing plan” encoded in the DNA sequence of their gene. During their synthesis, proteins adopt a specific three-dimensional structure that allows them to perform their functions within the cell. “Seeing” the structure of biological macromolecules, such as proteins or nucleic acids (RNA or DNA), allows researchers to elucidate the mechanisms of life in all organisms and, among many other applications, to design new drugs.
“Seeing” proteins or nucleic acids in three dimensions: a dream or a reality? Could microscopy, a technique known for more than 350 years that makes it possible to visualize biological cells, be the right approach? The dimensions of these two kinds of objects, macromolecules and cells, are very different: cell sizes generally range from 10 to 100 microns (1 micron = 10⁻⁶ m), whereas the dimensions of biological macromolecules, proteins or nucleic acids, are of the order of tens of angstroms (1 angstrom = 10⁻¹⁰ m) (Figure 1). To reach atomic detail, the method of choice is crystallography, whose principle is based on bombarding crystals composed of biological macromolecules with X-rays.
In only 50 years, crystallography has become the technique of choice for determining the structures of biological macromolecules at the atomic scale, taking advantage of major advances in fields as diverse as molecular biology, biochemistry, computer science, physics and, more recently, robotics. Today, crystallography can determine the three-dimensional structures of increasingly complex macromolecules, increasingly quickly. Currently, more than 25 crystal structures are deposited daily in the Protein Data Bank (http://www.rcsb.org). The Protein Data Bank (PDB) is a databank that contains 120,262 entries of macromolecular structures (proteins, nucleic acids, complexes), of which 107,455 were solved by X-ray crystallography (July 2016).
The physical principle of crystallography is based on X-ray diffraction by all the electrons of the atoms of all the macromolecules contained in the crystal (Figure 2). The analysis of these diffraction data then allows the crystallographer to calculate the electron density, which is the distribution of the electron cloud of the macromolecule in the crystal. Provided it is sufficiently precise (this precision depends on the resolution of the diffraction data), the electron density allows each atom of the molecule to be located, and thus its coordinates in three-dimensional space to be determined.
Obtaining this three-dimensional structure requires several steps that fall within multiple disciplines (Figure 3). Each of these steps is a potential bottleneck that needs to be overcome. They are the production and purification of the macromolecule, its crystallization, and the collection and processing of diffraction data. Another crucial step is the determination of the phases of the measured signal, which are absolutely required to calculate the electron density. The last step is the refinement of the built structure, called the model, which is then interpreted in the context of its biological function. The analysis of the model thus raises new questions that lead to the resolution of other crystal structures, such as the structure of a complex between the studied protein and its partners. The following sections describe each of these steps.
2. Steps upstream of the structure determination
The first step, which falls within biology and includes molecular biology and biochemistry techniques, is the production of highly pure macromolecule in large quantities. Once the sequence of the macromolecule to be studied has been identified and characterized by bioinformatics analyses, the sequence corresponding to the gene of the macromolecule is cloned in an expression vector and produced classically in a bacterial organism (typically …). The macromolecule is obtained in an aqueous solution containing a buffer, salt and various additives.
The next bottleneck is based on physical chemistry, specifically crystallization, which involves concepts such as the solubility of molecules and their transition from a soluble state to an ordered, solid crystalline state. This step, built on statistical screening, varies parameters such as temperature, pH, the concentration of the biological macromolecule, and the nature and concentration of crystallizing agents and various additives. Obtaining a single homogeneous crystal that yields high-quality diffraction data is a crucial step in the process of determining a macromolecular structure. To increase the success rate, crystallization robots are now used to screen several thousand combinations of parameters. The size (from tens to hundreds of microns) and the morphology of the crystals are highly variable (Figure 4) and are not necessarily related to their diffracting power and quality.
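The combinatorial nature of such a screen can be illustrated with a minimal sketch. All parameter values and names below are hypothetical and chosen only for illustration; they are not taken from the article or from any commercial screen.

```python
from itertools import product

# Hypothetical screening axes; none of these values come from the article.
temperatures_c = [4, 20]
ph_values = [5.0, 6.0, 7.0, 8.0]
protein_mg_per_ml = [5, 10, 20]
crystallizing_agents = ["PEG 4000", "ammonium sulfate", "sodium chloride"]
agent_concentration_levels = ["low", "medium", "high"]


def build_screen():
    """Enumerate every combination of the screening parameters."""
    keys = ("temperature_c", "ph", "protein_mg_per_ml", "agent", "agent_level")
    axes = (temperatures_c, ph_values, protein_mg_per_ml,
            crystallizing_agents, agent_concentration_levels)
    return [dict(zip(keys, combo)) for combo in product(*axes)]


screen = build_screen()
print(len(screen))  # 2 * 4 * 3 * 3 * 3 = 216 conditions
print(screen[0])    # first condition of the grid
```

Even this small grid gives 216 conditions; adding one or two more axes (additives, drop ratios) quickly reaches the thousands of conditions that motivate the use of robots.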
3. The diffraction data
The crystals obtained during the previous step are fished out with a small loop (Figure 4), cryo-cooled to protect them from radiation damage, and then placed in a monochromatic X-ray beam produced by an appropriate source: either a rotating anode generator, available in crystallography laboratories, or a synchrotron, which produces significantly more intense beams. Under these conditions, the waves scattered by the electrons of the macromolecules, which are ordered in three dimensions in the crystal, add up in given directions (each diffracted beam is characterized by a structure factor, Figure 7) and generate diffraction spots on the screen of the detector (Figure 5A). All the regularly spaced spots together form the diffraction pattern (Figure 5A). This diffraction pattern is reconstituted from several hundred images, each corresponding to one orientation of the crystal, which rotates on itself during the measurement of the diffraction data (Figure 2 and Figure 5B). The information contained in each diffraction spot is characterized by the amplitude and the phase of the structure factor of the corresponding scattered wave.
The three-dimensional distribution of the spots is directly related to the cell parameters, for example the three lengths of the parallelepiped that constitutes the volume element (the cell). The cell is regularly repeated in space (Figure 6), which allows the crystal to be described. The distribution of the spot intensities is directly related to the distribution of the electron density (the macromolecules) in the cell. Mathematically, this means that the diffraction pattern is the Fourier transform of the electron density (Figure 7).
The electron density contained in one cell can thus be calculated by inverse Fourier transform, a mathematical property of this transformation, provided the amplitude and the phase of all the diffracted beams are known (Figure 7). Whereas the amplitude is obtained directly from the measured intensity of each diffraction spot (it is proportional to the square root of the intensity), the phase information is not experimentally measurable.
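The relation between electron density and structure factors can be sketched in one dimension. The following is a toy illustration only, under stated assumptions: the cell is one-dimensional, the two Gaussian “atoms”, their positions, weights and width are invented, and the function names are hypothetical.

```python
import cmath
import math


def toy_density(n=64, atoms=((0.20, 8.0), (0.55, 6.0)), width=0.03):
    """Two Gaussian 'atoms' at fractional positions in a one-dimensional cell."""
    density = []
    for i in range(n):
        x = i / n
        value = 0.0
        for position, weight in atoms:
            d = abs(x - position)
            d = min(d, 1.0 - d)  # the cell repeats periodically
            value += weight * math.exp(-(d / width) ** 2)
        density.append(value)
    return density


def structure_factors(density, h_max):
    """Discrete Fourier transform of the sampled density for h = 0..h_max."""
    n = len(density)
    return {
        h: sum(rho * cmath.exp(2j * math.pi * h * x / n)
               for x, rho in enumerate(density)) / n
        for h in range(h_max + 1)
    }


rho = toy_density()
for h, f in structure_factors(rho, 5).items():
    # Each "spot" h carries an amplitude and a phase (printed in degrees).
    print(h, round(abs(f), 4), round(math.degrees(cmath.phase(f)), 1))
```

Shifting both atoms by the same amount changes the phases but leaves the amplitudes unchanged, which is one way to see that intensities alone do not fix where the atoms are.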
In summary, the crystal “realizes” a Fourier analysis producing diffraction data, and the crystallographer will calculate a Fourier synthesis to get the electron density contained in one cell (Figure 7).
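The analysis and synthesis steps, and the consequence of losing the phases, can be sketched on a toy example. This is a hedged one-dimensional illustration, not a crystallographic program: the grid size, the two point-like “atoms” and all function names are assumptions. It relies on the fact that the measured intensity is proportional to the squared amplitude of the structure factor.

```python
import cmath
import math

N = 32  # grid points sampling one cell (arbitrary choice)


def analysis(density):
    """Fourier analysis: density -> complex structure factors for all h."""
    n = len(density)
    return [
        sum(rho * cmath.exp(2j * math.pi * h * x / n)
            for x, rho in enumerate(density)) / n
        for h in range(n)
    ]


def synthesis(factors):
    """Fourier synthesis: complex structure factors -> density."""
    n = len(factors)
    return [
        sum(f * cmath.exp(-2j * math.pi * h * x / n)
            for h, f in enumerate(factors)).real
        for x in range(n)
    ]


def max_error(a, b):
    """Largest pointwise difference between two sampled densities."""
    return max(abs(u - v) for u, v in zip(a, b))


# Toy density: two point-like "atoms" of different weight.
true_density = [0.0] * N
true_density[5] = 8.0
true_density[20] = 6.0

factors = analysis(true_density)
intensities = [abs(f) ** 2 for f in factors]      # what the detector records
amplitudes = [math.sqrt(i) for i in intensities]  # recoverable from intensities
phases = [cmath.phase(f) for f in factors]        # not measurable in practice

with_phases = synthesis([cmath.rect(a, p) for a, p in zip(amplitudes, phases)])
without_phases = synthesis([complex(a, 0.0) for a in amplitudes])

print(max_error(with_phases, true_density))     # numerically close to zero
print(max_error(without_phases, true_density))  # large: the map is wrong
```

With the correct phases the synthesis returns the original density; with the same amplitudes and all phases set to zero it does not, which is why the phases must be estimated by other means.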
4. From the diffraction data to the electron density
Three main methods exist for estimating the phases. It should be remembered that the number of phases to be estimated is typically several tens to hundreds of thousands, since a phase has to be estimated for each spot whose intensity has been measured.
The first method is
The second method is
The third method is
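The order of magnitude quoted above for the number of phases can be checked with a rough estimate. The sketch below assumes a hypothetical cell of 60 × 70 × 80 angstroms with right angles and data extending to 2.0 angstroms; these values and the function names are illustrative and do not come from the article. It uses the fact that the reciprocal lattice contains one point per 1/V of reciprocal volume, where V is the cell volume.

```python
import math


def reflections_within_resolution(cell_volume_a3, d_min_a):
    """Approximate number of reciprocal-lattice points with spacing >= d_min.

    A sphere of radius 1/d_min in reciprocal space holds about
    (4/3) * pi * V / d_min**3 lattice points.
    """
    return (4.0 / 3.0) * math.pi * cell_volume_a3 / d_min_a ** 3


def unique_reflections(cell_volume_a3, d_min_a, symmetry_equivalents=2):
    """Divide by the number of equivalent spots (2 = Friedel pairs only)."""
    total = reflections_within_resolution(cell_volume_a3, d_min_a)
    return total / symmetry_equivalents


# Hypothetical cell and resolution limit (angstroms); not from the article.
volume = 60.0 * 70.0 * 80.0
print(round(unique_reflections(volume, 2.0)))  # several tens of thousands
```

Because the count scales with 1/d³, improving the resolution limit from 2.0 to 1.0 angstrom multiplies the number of spots, and therefore of phases, by eight. Crystal symmetry beyond Friedel pairs reduces the number of unique spots further.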
5. From the electron density to the structural model
Once a first set of phases has been estimated, a first electron density map is calculated. If this map is sufficiently interpretable, the macromolecule can be built step by step in it (Figure 8). A combination of automated algorithms and manual methods, available through interactive graphics software, is used, leading to a final model composed of the three-dimensional coordinates of each atom of the cell content, which consists of one or several macromolecules.
From this first model, the diffraction intensities are calculated by Fourier transform and compared with the experimentally measured intensities. This comparison allows the step-by-step improvement of the model. This cyclical process, called crystallographic refinement, alternates between the search for the global minimum of energy functions and the manual rebuilding of the model.
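The comparison between calculated and measured data can be made concrete with a simple agreement index. The sketch below uses the conventional crystallographic R-factor, which the text above does not name and which is computed on amplitudes rather than intensities; the five amplitude values are invented for illustration and the scale factor is deliberately simplistic.

```python
def r_factor(observed_amplitudes, calculated_amplitudes):
    """R = sum(| |Fobs| - k * |Fcalc| |) / sum(|Fobs|), with a simple scale k."""
    scale = sum(observed_amplitudes) / sum(calculated_amplitudes)
    numerator = sum(
        abs(f_o - scale * f_c)
        for f_o, f_c in zip(observed_amplitudes, calculated_amplitudes)
    )
    return numerator / sum(observed_amplitudes)


# Hypothetical amplitudes for five reflections (not real data).
f_obs = [120.0, 85.0, 40.0, 66.0, 15.0]
f_calc_initial = [100.0, 95.0, 55.0, 50.0, 30.0]  # from a first, rough model
f_calc_refined = [118.0, 86.0, 42.0, 64.0, 16.0]  # after cycles of refinement

print(round(r_factor(f_obs, f_calc_initial), 3))
print(round(r_factor(f_obs, f_calc_refined), 3))
```

A decreasing value from one cycle to the next indicates that the model reproduces the measured data better; a model that matched the data exactly would give zero.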
6. Steps downstream of the structure determination
The final step, downstream of the structure determination by X-ray diffraction, concerns the interpretation of the structure and its integration into the biological context [27–29]. It consists of understanding the structural result as a three-dimensional object and appreciating its function at the cellular or evolutionary level. The analysis describes the interatomic interactions, the secondary structures (Figure 9), and the domains and their arrangement, which defines the fold or tertiary structure (Figure 9). It also characterizes the shape, the electrostatic properties and the quaternary structure, based on the content of the cell in the crystal packing. This description is often complemented by the study of the macromolecule in solution, to better characterize its oligomeric state (Figure 9) and its dynamic behavior, alone or in the presence of interactors, if known. These studies use a variety of biophysical methods, such as mass spectrometry, analytical ultracentrifugation, light scattering, microcalorimetry or surface plasmon resonance (Biacore® technology). In the case of enzymes, these studies are coupled with enzymological approaches to determine the activity and the catalytic constants.
An analysis based on bioinformatics tools places the determined structure in the context of the structural and evolutionary knowledge available at a given time. The lessons learned from these studies, often of primary importance, bear on the classification of the structure and its sequence within a family of counterparts; on the distribution and evolution of the fold across the different domains of life (viruses, bacteria, archaea, eukaryotes); on the possible function when it is unknown; on the catalytic site and its conservation in space and in sequence; on the degree of oligomerization; and on the existence of interactions with other partners (proteins, nucleic acids or ligands). A final type of study seeks to place the three-dimensional object in the context of knowledge of the major biological mechanisms of life, such as gene expression (transcriptomics) or complex formation (interactomics). This information includes the characterization of the partners of the studied macromolecule at the scale of the cell or the whole organism.
All these steps, from the structure determination to the biological interpretation, far from being the end of the story, are often the beginning of new structural studies (Figure 3). These can be built around analyses of the relative importance of the components of the macromolecule, the amino acids, by determining the structures of mutants, or around studies of the interactions with partners by determining the structures of macromolecular complexes.
Watson J.D., Crick F.H.C., Molecular structure of nucleic acids: a structure for deoxyribose nucleic acid. Nature, 1953, 171, p. 737-738.
Anfinsen C., The formation and stabilization of protein structure. Biochem. J., 1972, 128, p. 737–749.
Scapin G., Structural biology and drug discovery. Curr. Pharm. Des., 2006, 12, p. 2087–2097.
Blundell T.L., Johnson L.N., Protein Crystallography. Academic Press, New York, 1976.
Berman H.M., Westbrook J., Feng Z., Gilliland G., Bhat T.N., Weissig H., Shindyalov I.N., Bourne P.E., The Protein Data Bank. Nucl. Acids Res., 2000, 28, p. 235-242.
Wlodawer A., Minor W., Dauter Z., Jaskolski M., Protein crystallography for non-crystallographers, or how to get the best (but not more) from published macromolecular structures. FEBS J., 2008, 275, p. 1-21. Review.
Wlodawer A., Minor W., Dauter Z., Jaskolski M., Protein crystallography for aspiring crystallographers or how to avoid pitfalls and traps in macromolecular structure determination. FEBS J., 2013, 280, p. 5705–5736. Review.
Chernov A.A. Protein crystals and their growth. J. Struct. Biol., 2003, 142, p. 3–21.
Luft J.R., Collins R.J., Fehrman N.A., Lauricella A.M., Veatch C.K., DeTitta G.T. A deliberate approach to screening for initial crystallization conditions of biological macromolecules. J. Struct. Biol., 2003, 142, p. 170-179.
Garman E.F., Schneider T.R. Macromolecular cryocrystallography. J. Appl. Crystallogr., 1997, 30, p. 211-237.
Moffat K, Ren Z., Synchrotron radiation applications to macromolecular crystallography. Curr. Opin. Struct. Biol., 1997, 7, p. 689–696.
Taylor G., The phase problem. Acta crystallogr. D., 2003, 59, p. 1881-1890.
Rossmann M.G. The molecular replacement method. Acta Crystallogr. A., 1990, 46, p. 73-82.
Tickle I.J., Driessen H.P., Molecular replacement using known structural information. Methods Mol. Biol., 1996, 56, p. 173-203. Review.
Abergel C., Molecular replacement: tricks and treats. Acta Crystallogr. D Biol. Crystallogr., 2013, 69, p. 2167-2173. Review.
Perutz, M.F., Isomorphous replacement and phase determination in non-centrosymmetric space groups. Acta Cryst. 1956, 9, p. 867–873.
Perutz M.F., Rossmann M.G., Cullis A.F., Muirhead H., Will G., North A.C., Structure of hemoglobin: a three-dimensional Fourier synthesis at 5.5 Å resolution, obtained by X-ray analysis. Nature, 1960, 185, p. 416-422.
Kendrew J.C., Bodo G., Dintzis H.M., Parrish R.G., Wyckoff H., Phillips D.C., A three-dimensional model of the myoglobin molecule obtained by X-ray analysis. Nature, 1958, 181, p. 662-666.
Patterson A.L., A direct method for the determination of the components of interatomic distances in crystals. Zeitschrift für Kristallographie, 1935, 90, p. 517-542.
Read R.J., As MAD as can be. Structure, 1996, 4, p. 11-14. Review.
Blow D.M., How Bijvoet made the difference: the growing power of anomalous scattering. Methods Enzymol., 2003, 374, p. 3–22.
Ealick S.E., Advances in multiple wavelength anomalous diffraction crystallography. Curr. Opin. Chem. Biol., 2000, 4, p. 495–499.
Doublie S., Preparation of selenomethionyl proteins for phase determination. Methods Enzymol., 1997, 276, p. 523–530.
Anderson A.C., O'Neil R.H., Filman D.J., Frederick C.A., Crystal structure of a brominated RNA helix with four mismatched base pairs: an investigation into RNA conformational variability. Biochemistry, 1999, 38, p. 12577-12585.
Emsley P., Debreczeni J.E., The use of molecular graphics in structure-based drug design. Methods Mol. Biol., 2012, 841, p. 143-159.
Tronrud D.E., Introduction to macromolecular refinement. Methods in Molecular Biology, 2007, vol. 364: Macromolecular Crystallography Protocols, vol. 2: Structure Determination, p. 231–253, Humana Press Inc., Totowa, NJ.
Baker E.N., Seeing atoms: the rise and rise of crystallography in chemistry and biology. Chemistry in New Zealand, January 2011 (New Zealand Institute of Chemistry), New Zealand.
Shi Y., A glimpse of structural biology through X-ray crystallography. Cell, 2014, 159, p. 995-1014.
Yonath A., X-ray crystallography at the heart of life science. Curr. Opin. Struct. Biol., 2011, 21, p. 622–626.
Malik S.S., Shrivastava T., Protein characterization using modern biophysical techniques. Advances in Protein Chemistry, 2013, OMICS Group eBooks. (http://www.esciencecentral.org/ebooks/ebooks-about.php)
Lecompte O., Thompson J.D., Plewniak F., Thierry J., Poch O., Multiple alignment of complete sequences (MACS) in the post-genomic era. Gene, 2001, 270, p. 17. Review.