Investigating Macromolecular Complexes in Solution by Small Angle X-Ray Scattering

Macromolecular complexes are of great interest in molecular biology. Understanding the biological processes in living systems depends directly on knowing the shape and structure of the complexes and how they form. X-ray diffraction, Nuclear Magnetic Resonance and cryo-electron microscopy (cryoEM) can provide information on these structures, but in several cases none of these techniques can be applied. Limits on molecular weight, the need for a well-ordered crystal and difficulties in sample preparation are some of the bottlenecks of those techniques (Svergun, 2007; Oliveira et al, 2010). Most importantly, in several cases the studies have to be performed directly in solution, with minimal interaction with the sample, to avoid biased results. Scattering techniques are highly recommended here, since they allow a study directly in solution in a very non-invasive way. Small angle X-ray scattering (SAXS) is a standard technique for the study of particles in solution. It provides information on size, shape, polydispersity, flexibility, oligomerization and aggregation state. It also allows real-time measurements in which the system is monitored directly in solution, enabling in situ studies of particle formation (Oliveira et al, 2009). SAXS and microscopy techniques are complementary and have been combined in several applications (Oliveira et al, 2010; Andersen et al, 2009). This chapter presents some general aspects of small angle X-ray scattering and state-of-the-art modeling methods, together with several applications.


photon interacts with only one atom, and the resulting scattered photon has the same energy as the incident photon (elastic scattering). This effect is mostly related to the so-called Thomson scattering. The complete solution for the scattered beam is the sum of a plane wave and a spherical wave (Jackson, 1988). The information about the scattering process is contained only in the spherical wave, so only this part is considered when investigating the structure of the particle (Guinier and Fournet, 1955; Glatter and Kratky, 1982; Feigin and Svergun, 1987). One way to understand the scattering process is to start from the scattering by a single particle fixed in space. This is sketched in Fig. 1, where an incident beam with wave vector $\vec{k}_0$ strikes the particle at the points O and P, separated by the vector $\vec{r}$.
Fig. 1. Representation of the scattering process for a fixed particle.
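The scattering is assumed to be elastic. The scattered wave, with wave vector $\vec{k}$, therefore has the same modulus as the incident wave, $|\vec{k}| = |\vec{k}_0| = 2\pi/\lambda$. The difference between the incident and the scattered beam is then given by:

$$ \vec{q} = \vec{k} - \vec{k}_0, \qquad q = |\vec{q}| = 2|\vec{k}_0|\sin\theta = \frac{4\pi}{\lambda}\sin\theta \qquad (1) $$

This defines the reciprocal space momentum transfer vector $\vec{q}$, where $2\theta$ is the scattering angle and $\lambda$ the wavelength. The scattering amplitude $f(\vec{q})$ is given by the Fourier transform of the particle electron density $\rho(\vec{r})$:

$$ f(\vec{q}) = \int_V \rho(\vec{r})\, e^{i\vec{q}\cdot\vec{r}}\, d\vec{r} \qquad (2) $$

The measurable quantity is the scattering intensity

$$ I_1(\vec{q}) = f(\vec{q})\, f^*(\vec{q}) = \int_V \int_V \rho(\vec{r})\, \rho(\vec{r}\,')\, e^{i\vec{q}\cdot(\vec{r}-\vec{r}\,')}\, d\vec{r}\, d\vec{r}\,' \qquad (3) $$

The index "1" indicates that, so far, this intensity refers to a single particle with fixed orientation. A usual mathematical procedure is to take the convolution integral in $\vec{r}\,'$ and define the so-called self-correlation function $\gamma(\vec{r})$:

$$ \gamma(\vec{r}) = \int_V \rho(\vec{r}\,')\, \rho(\vec{r}\,' + \vec{r})\, d\vec{r}\,' \qquad (4) $$

The scattering from a single fixed particle can now be rewritten as

$$ I_1(\vec{q}) = \int_V \gamma(\vec{r})\, e^{i\vec{q}\cdot\vec{r}}\, d\vec{r} \qquad (5) $$

The self-correlation function $\gamma(\vec{r})$ has several properties and asymptotic limits from which general parameters can be retrieved. The interested reader is referred to the seminal book by Guinier and Fournet (Guinier and Fournet, 1955) and to the articles by Ciccariello (Ciccariello, 1985), among others. Fig. 2 shows a theoretical calculation of a 2D scattering profile for a particle fixed in space.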
Fig. 2. Theoretical calculation of a two-dimensional scattering profile for a fixed ellipsoidal particle. The intensity is given on a logarithmic scale. Insets: vertical and horizontal 1D profiles of the intensity.
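The Python sketch below is a small numerical illustration of the definition q = (4π/λ) sin θ. It converts a radial position on the detector into q. It assumes a flat detector perpendicular to the direct beam, so that tan 2θ = x/L for a radial position x and a sample-to-detector distance L. The function name and the 100 mm example position are hypothetical. The values λ = 1.49 Å and L = 745 mm are those quoted later in this chapter for the lysozyme experiments.

```python
import math


def q_from_detector(radial_mm: float, distance_mm: float, wavelength_A: float) -> float:
    """Momentum transfer q = (4*pi/lambda) * sin(theta), in 1/Angstrom.

    Assumes a flat detector perpendicular to the direct beam, so that the
    scattering angle satisfies tan(2*theta) = radial / distance.
    """
    two_theta = math.atan2(radial_mm, distance_mm)
    return 4.0 * math.pi / wavelength_A * math.sin(0.5 * two_theta)


if __name__ == "__main__":
    # 1.49 A and 745 mm are the values quoted for the lysozyme experiments;
    # the 100 mm radial position is a hypothetical example.
    print(f"q = {q_from_detector(100.0, 745.0, 1.49):.4f} 1/A")
```

For small angles the expression reduces to q ≈ 2πx/(λL). For the example position the result is roughly 0.56 Å⁻¹.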
Equation (5) is still too general to be used in practice. In real systems the particles are not fixed in space but are randomly oriented. The orientational average can be performed either in real space or in reciprocal space. Mathematically, it is easier to average in reciprocal space, through an extra integration over the solid angle $\Omega_q$:

$$ I_1(q) = \langle I_1(\vec{q}) \rangle_{\Omega} = \frac{1}{4\pi} \int I_1(\vec{q})\, d\Omega_q \qquad (6) $$

Substituting (5) into (6) gives

$$ I_1(q) = \int_0^{\infty} r^2\, dr \int d\Omega_r\, \gamma(\vec{r}) \left[ \frac{1}{4\pi} \int e^{i\vec{q}\cdot\vec{r}}\, d\Omega_q \right] \qquad (7) $$

Here the volume element $d\vec{r}$ was written as $r^2\, dr\, d\Omega_r$ in real-space spherical coordinates, and the terms were rearranged. The angular integral over $\Omega_q$ can be performed directly:

$$ \frac{1}{4\pi} \int e^{i\vec{q}\cdot\vec{r}}\, d\Omega_q = \frac{\sin(qr)}{qr} \qquad (8) $$

Fig. 3 shows the theoretical intensity of an ellipsoidal particle randomly oriented in space. The 2D pattern is now independent of angle, so any cut from the center in a radial direction has the same profile.
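The angular integral (1/4π)∫exp(i q·r) dΩ_q = sin(qr)/(qr) can be checked numerically. The minimal sketch below writes q·r = qrμ, with μ the cosine of the angle between the two vectors. The solid-angle average then becomes (1/2)∫_{-1}^{1} exp(iqrμ) dμ. The sketch compares a trapezoidal estimate of this integral with the closed form. The function names are hypothetical.

```python
import numpy as np


def orientational_average(q: float, r: float, n_mu: int = 2001) -> float:
    """Solid-angle average of exp(i q.r) for fixed moduli q and r.

    With mu = cos(angle between q and r), dOmega = dmu dphi, so the average is
    (1/2) * integral_{-1}^{1} exp(i q r mu) dmu. The imaginary part is odd in mu
    and integrates to zero, leaving only cos(q r mu). Trapezoidal rule.
    """
    mu = np.linspace(-1.0, 1.0, n_mu)
    f = np.cos(q * r * mu)
    h = mu[1] - mu[0]
    integral = h * (f.sum() - 0.5 * (f[0] + f[-1]))
    return 0.5 * float(integral)


def sinc_qr(q: float, r: float) -> float:
    """Closed form sin(qr)/(qr), equal to 1 at qr = 0."""
    return float(np.sinc(q * r / np.pi))   # np.sinc(x) = sin(pi x)/(pi x)


if __name__ == "__main__":
    for qr in (0.5, 3.0, 10.0):
        print(qr, orientational_average(qr, 1.0), sinc_qr(qr, 1.0))
```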

Some interpretations of the p(r) function are given in the next sections. The result in equation (10) was derived for a single particle randomly oriented in space. In real systems the particles are dispersed in a matrix with electron density $\rho_0$, so this result has to be extended to a system of particles. One expression for the intensity of a system of particles is

$$ I(q) = n\, P(q)\, S(q) \qquad (11) $$

where $n$ is the number density of particles, $P(q)$ is the so-called particle form factor and $S(q)$ is the structure factor of the system. For systems of identical particles (monodisperse systems), the form factor equals the orientationally averaged scattering intensity of a single particle, $P(q) = I_1(q)$. For polydisperse systems the form factor is an average over the different sizes, electron densities and particle shapes. A usual procedure in this case is to assume a known shape and electron density and to average over a distribution of sizes (Glatter and Kratky, 1982). The structure factor, on the other hand, is related to particle interactions, and there are several approaches to describe its behavior (Pedersen, 2002). For very dilute systems the particle interactions can be neglected and the structure factor equals 1. Therefore, for a system of identical particles in the dilute regime,

$$ I(q) = n\, I_1(q) \qquad (12) $$

Although the measured intensity comes from a large number of particles, it therefore contains the scattering of a single, randomly oriented particle. When the intensity I(q) is measured in a real case, it may thus be possible to obtain information about the shape and conformation of a single particle.
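The sketch below makes the link between p(r) and the measured intensity concrete. It assumes the usual convention of Glatter and Kratky (1982), I_1(q) = 4π ∫_0^{D_MAX} p(r) sin(qr)/(qr) dr, with p(r) = r²γ(r) and γ(r) the spherically averaged self-correlation function. For a homogeneous sphere of radius R, γ(r) ∝ 1 - 3r/(4R) + r³/(16R³) up to r = 2R. The transform should therefore reproduce the analytical sphere form factor. The radius of 30 Å and the function names are hypothetical.

```python
import numpy as np


def sphere_pr(r: np.ndarray, radius: float) -> np.ndarray:
    """Unnormalized p(r) = r^2 * gamma0(r) of a homogeneous sphere (zero beyond 2R)."""
    x = r / radius
    gamma0 = 1.0 - 0.75 * x + x**3 / 16.0
    return np.where(x <= 2.0, r**2 * gamma0, 0.0)


def intensity_from_pr(q: np.ndarray, r: np.ndarray, pr: np.ndarray) -> np.ndarray:
    """I(q) = 4*pi * integral_0^Dmax p(r) sin(qr)/(qr) dr, trapezoidal rule on a uniform r grid."""
    kernel = np.sinc(np.outer(q, r) / np.pi)        # sin(qr)/(qr), equal to 1 at qr = 0
    g = pr[None, :] * kernel
    dr = r[1] - r[0]
    return 4.0 * np.pi * dr * (g.sum(axis=1) - 0.5 * (g[:, 0] + g[:, -1]))


def sphere_form_factor(q: np.ndarray, radius: float) -> np.ndarray:
    """Analytical normalized form factor P(q) = [3 (sin qR - qR cos qR) / (qR)^3]^2, for q > 0."""
    x = q * radius
    return (3.0 * (np.sin(x) - x * np.cos(x)) / x**3) ** 2


if __name__ == "__main__":
    R = 30.0                                   # hypothetical radius in Angstrom
    r = np.linspace(0.0, 2.0 * R, 2001)
    q = np.linspace(0.01, 0.4, 40)
    pr = sphere_pr(r, R)
    i0 = intensity_from_pr(np.array([0.0]), r, pr)[0]
    p_numeric = intensity_from_pr(q, r, pr) / i0
    print("max deviation:", np.max(np.abs(p_numeric - sphere_form_factor(q, R))))
```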
As mentioned above, the particles can be considered to be immersed in a matrix of constant electron density. It can be shown that, in this case, scattering occurs only if the electron density of the particles differs from that of the matrix.
The electron density ρ(r) in equations (2)-(5) should therefore be replaced by the electron density contrast Δρ(r) = ρ(r) - ρ0. To make this point clearer, a usual approach is to write the particle form factor in normalized form and show the excess scattering (contrast times volume) explicitly:

$$ I(q) = n\, (\Delta\rho)^2\, V^2\, P_1(q) \qquad (13) $$

where Δρ = ρ - ρ0 is the scattering contrast between the particles and the matrix, V is the particle volume and P_1(q) is the normalized form factor (P_1(q=0) = 1).

Experimental aspects and absolute calibration
Fig. 4 shows a schematic setup for a typical SAXS experiment. Specific technical details about geometries and configurations can be found in several sources (Lindner and Zemb, 2002) and are not presented in this chapter. However, some general characteristics have to be addressed. Only a small fraction of the incident beam is scattered, so the detectors must be able to measure rather low intensities. The part of the beam that passes through the sample without interacting therefore has to be blocked by a beam stop, to avoid damaging the detector. The size of the beam stop depends on the geometry of the instrument. In a typical experiment, the intensity from the system (sample = matrix + particles) is measured, and the intensity from the matrix in which the particles are immersed (blank) is subtracted from it. Scattering standards are needed to put the data on absolute scale. Two procedures were applied in the applications described in this chapter. In one procedure, a known protein is measured under the same sample conditions (buffer, temperature, etc.), and its forward scattering is used to normalize the other, unknown data. In the other procedure, water at 20 ºC is used as a primary standard. Water is a convenient standard because its scattering cross section can be calculated with very high accuracy from its macroscopic properties. In both cases the data have to be normalized by the value obtained from the standard under the same experimental conditions and multiplied by the theoretical intensity value. Assuming that the sample and the blank are measured in the same cell, the treated intensity normalized to absolute scale is given by:

$$ I_{Treated}(q) = \frac{ \dfrac{I_{sample}(q)}{\Phi_s T_s t_s} - \dfrac{I_{blank}(q)}{\Phi_b T_b t_b} }{ I(0)_{std} } \left( \frac{d\Sigma}{d\Omega} \right)_{std} \qquad (14) $$

The symbols in equation (14) are defined as follows:
- q is the modulus of the scattering vector, q = (4π/λ) sin θ, where 2θ is the scattering angle shown in Fig. 1 and Fig. 4 and λ is the wavelength of the monochromatic beam.
- I_Treated(q) is the treated scattering intensity of the sample on absolute scale, i.e. the scattering cross section of the sample.
- I_sample(q) is the raw data measured for the sample, and I_blank(q) is the raw data from the matrix scattering.
- I(0)_std is the value at q = 0 of the treated standard data (background subtracted and normalized by flux, transmission and acquisition time).
- Φ_i is the flux of the incident beam, T_i is the transmission and t_i is the exposure time, with the index i = s (sample) or b (blank).
- (dΣ/dΩ)_std is the theoretical scattering cross section of the standard.

For water at 20 ºC this cross section is 0.01632 cm⁻¹. Consider proteins in typical buffers without large amounts of salt, glycerol or other additives, at mass concentration c (in mg/mL) and with molecular weight M_W (in kDa). Their theoretical cross section is (dΣ/dΩ)_std = 6.645×10⁻⁴ c M_W cm⁻¹ (see equation 15 below).
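The normalization described above can be sketched as follows. The sample and the blank are each divided by flux, transmission and exposure time, and the blank is subtracted. The result is divided by the treated forward intensity of the standard and multiplied by its theoretical cross section. The function names and the tuple layout are illustrative assumptions. The two cross sections are the values quoted in the text.

```python
import numpy as np

WATER_20C = 0.01632  # theoretical dSigma/dOmega of water at 20 C, in cm^-1 (value from the text)


def protein_standard(c_mg_ml: float, mw_kda: float) -> float:
    """Theoretical forward cross section (cm^-1) of a protein standard, 6.645e-4 * c * M_W."""
    return 6.645e-4 * c_mg_ml * mw_kda


def absolute_intensity(i_sample, i_blank, sample_norm, blank_norm, i0_std, dsdo_std):
    """Put background-subtracted data on absolute scale.

    sample_norm and blank_norm are (flux, transmission, exposure_time) tuples;
    i0_std is the treated standard intensity at q = 0 measured in the same cell,
    and dsdo_std is the theoretical cross section of that standard.
    """
    flux_s, trans_s, time_s = sample_norm
    flux_b, trans_b, time_b = blank_norm
    sample = np.asarray(i_sample, dtype=float) / (flux_s * trans_s * time_s)
    blank = np.asarray(i_blank, dtype=float) / (flux_b * trans_b * time_b)
    return (sample - blank) / i0_std * dsdo_std
```

When water is the standard, pass WATER_20C as dsdo_std. When a protein is the standard, pass protein_standard(c, M_W) for that protein's concentration and molecular weight.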
Once the data are on absolute scale, the contrast, particle volume or particle concentration can be obtained, depending on what is known about the system. One very important parameter when studying proteins in solution is the molecular weight, which directly indicates the oligomerization state of the protein.
Equation (13) can be rewritten by multiplying and dividing by the particle specific volume v, so that Δρ_M = vΔρ, and applying some simple algebra:

$$ I(q) = \frac{c\, \Delta\rho_M^2\, M_W}{N_A}\, P(q) \qquad (15) $$

The symbols in equation (15) are:
- c is the concentration in mg/mL.
- Δρ_M is the excess scattering length density per unit mass (cm/g).
- M_W is the molecular weight in kDa.
- N_A is Avogadro's number.
- P(q) is the normalized form factor (P(0) = 1).

With these units the conversion factors 10⁻³ and 10³ cancel, so I(q) is obtained in cm⁻¹. For proteins, a good approximation for Δρ_M is 2×10¹⁰ cm/g (Oliveira and Pedersen, unpublished). The above equation shows that the molecular weight of a protein can be estimated directly from the forward intensity I(0):

$$ M_W = \frac{I(0)\, N_A}{c\, \Delta\rho_M^2} \qquad (16) $$

In general, molecular weights determined in this way have an uncertainty of 10%-20%. This is enough to check the monodispersity of the sample or to indicate its oligomeric state. However, the approach depends strongly on knowing the scattering contrast and the sample concentration.
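A short sketch can check the arithmetic relating I(0), c and M_W given above. With c in mg/mL and M_W in kDa, the conversion factors 10⁻³ and 10³ cancel. With Δρ_M = 2×10¹⁰ cm/g, the prefactor Δρ_M²/N_A is about 6.64×10⁻⁴. Within the rounding of N_A, this matches the 6.645×10⁻⁴ used above for protein standards. The function names are hypothetical. The 14.3 kDa in the usage note is approximately the molecular weight of hen egg-white lysozyme.

```python
AVOGADRO = 6.02214076e23   # 1/mol
DRHO_M_PROTEIN = 2.0e10    # cm/g, the approximation for proteins quoted in the text


def forward_intensity(c_mg_ml: float, mw_kda: float, drho_m: float = DRHO_M_PROTEIN) -> float:
    """I(0) in cm^-1 from I(0) = c * drho_M^2 * M_W / N_A (P(0) = 1)."""
    c_g_cm3 = c_mg_ml * 1e-3   # mg/mL -> g/cm^3
    mw_g_mol = mw_kda * 1e3    # kDa -> g/mol
    return c_g_cm3 * drho_m**2 * mw_g_mol / AVOGADRO


def molecular_weight_kda(i0_cm: float, c_mg_ml: float, drho_m: float = DRHO_M_PROTEIN) -> float:
    """M_W = I(0) * N_A / (c * drho_M^2), returned in kDa."""
    c_g_cm3 = c_mg_ml * 1e-3
    return i0_cm * AVOGADRO / (c_g_cm3 * drho_m**2) / 1e3


if __name__ == "__main__":
    i0 = forward_intensity(10.0, 14.3)          # lysozyme-like example at 10 mg/mL
    print(f"I(0) = {i0:.4f} 1/cm, M_W back = {molecular_weight_kda(i0, 10.0):.1f} kDa")
```

Given the 10%-20% uncertainty mentioned above, a ratio of about 2 between the estimated and the monomer molecular weight would point to a dimer rather than to a monomer.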

Modeling methods
The above considerations show that the analysis of SAXS data may provide structural information about the system studied. Several methods can be used, depending on what is known about the system. The desired information is usually the scattering length density distribution ρ(r), which may give the particle shape, size, etc. This is the so-called "inverse scattering problem", i.e., retrieving real-space information from data in reciprocal space. The modeling is based on comparing a given model with the experimental SAXS data. Given the characteristics of scattering experiments, the χ² (chi-square) test is a good objective function for the optimization. Consider a set of N experimental points I_exp(q_i) with standard deviations σ(q_i), and the theoretical intensity I_teo(q_i) calculated at the same q_i values. The χ² function is defined as:

$$ \chi^2 = \frac{1}{N - M} \sum_{i=1}^{N} \left[ \frac{I_{exp}(q_i) - I_{teo}(q_i)}{\sigma(q_i)} \right]^2 \qquad (17) $$

where N is the number of experimental points and M is the number of independent parameters in the theoretical model. For a good fit, the differences between the model and the experimental data should be comparable to or smaller than the standard deviations σ(q_i). Each term of the sum is then of order 1. Since N is usually much larger than M, χ² for a good fit should be close to 1. Values considerably larger than 1 may indicate important discrepancies between the model and the experimental data, but they can also indicate underestimated uncertainties.
Conversely, values considerably lower than 1 can indicate overestimated uncertainties.
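The sketch below is a minimal implementation of the reduced χ², χ² = [1/(N - M)] Σ [(I_exp(q_i) - I_teo(q_i))/σ(q_i)]². The function name is hypothetical. It also illustrates the statement above: if every residual equals one standard deviation, then χ² = N/(N - M), which approaches 1 when N is much larger than M.

```python
import numpy as np


def reduced_chi2(i_exp, i_teo, sigma, n_params: int) -> float:
    """Reduced chi-square of a model against data with standard deviations sigma."""
    i_exp, i_teo, sigma = (np.asarray(a, dtype=float) for a in (i_exp, i_teo, sigma))
    n_points = i_exp.size
    if n_points <= n_params:
        raise ValueError("need more data points than model parameters")
    return float(np.sum(((i_exp - i_teo) / sigma) ** 2) / (n_points - n_params))
```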

Indirect Fourier transform - model independent approach
In the theoretical description above, the pair distance distribution function p(r) was introduced as a natural step in manipulating the equations. As indicated in equation (10), it forms a Fourier pair with the scattering intensity of a single particle, I_1(q). The total intensity of a system is proportional to the scattering of a single particle (equation (12)), so this relation can in principle be used to calculate the real-space function p(r) from measured scattering data. The procedure has intrinsic limitations, however. The Fourier transformation involves integrals from 0 to infinity, while the scattering data cover only a very small region of reciprocal space. Direct calculations of p(r) from the integral of I(q) are therefore usually unsuccessful, because truncating the integral produces strong oscillations in p(r). Glatter introduced an alternative method, known as the Indirect Fourier Transformation (programs ITP and GIFT; Glatter, 1977; Bergmann et al, 2000; Fritz and Glatter, 2006). In this approach p(r) is described by a set of basis functions (spline functions, in the Glatter method). These functions are Fourier transformed to give a corresponding set of basis functions in reciprocal space. Since all operations are linear, the coefficients of the p(r) basis functions are the same as those of the I(q) basis functions. Fitting the experimental data therefore directly gives the best set of coefficients and, consequently, the best p(r) function. Because the I(q) interval is still limited, this operation also produces oscillating p(r) functions. To avoid this problem, Glatter introduced a damping parameter, selected during the fit, that makes the p(r) function smooth. Svergun and co-workers used a similar approach (Semenyuk and Svergun, 1991) in the program package GNOM. In both cases the fitting process is iterative, and the user has to find the maximum particle dimension D_MAX that gives the best fit and p(r) function. In an interesting development, Hansen (Hansen, 2000) proposed a method in which the maximum dimension is obtained from Bayesian probabilities. More recently, building on the Glatter method (Pedersen et al, 1994), Oliveira and Pedersen developed a procedure that calculates p(r) for both dilute systems (program WIFT) and concentrated systems (program WGIFT), with structure factors included in the optimization (Oliveira et al, 2009). Glatter also included the calculation of p(r) for concentrated systems in a new implementation of his approach (program GIFT), optimized by simulated annealing.
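The idea of the indirect transformation can be illustrated with a deliberately simplified sketch. The sketch replaces Glatter's splines with histogram bins (a swapped-in basis, chosen for brevity). Each bin is transformed to reciprocal space with the kernel 4π Δr sin(qr)/(qr). A penalty α on second differences of p(r) plays the smoothing role of the damping parameter, and the problem is solved by linear least squares. The default choice of α, the function names and the synthetic sphere data are assumptions for illustration. This is not the algorithm of ITP, GIFT or GNOM.

```python
import numpy as np


def pr_kernel(q, d_max, n_r):
    """Bin centres r_j and kernel K_ij = 4*pi*dr*sin(q_i r_j)/(q_i r_j) for a histogram p(r)."""
    dr = d_max / n_r
    r = (np.arange(n_r) + 0.5) * dr
    return r, 4.0 * np.pi * dr * np.sinc(np.outer(q, r) / np.pi)


def ift_fit(q, intensity, sigma, d_max, n_r=40, alpha=None):
    """Regularized indirect Fourier transform (minimal sketch).

    Minimizes sum(((I - K p) / sigma)^2) + alpha * ||D2 p||^2, where D2 takes
    second differences of p(r), by stacking both terms into one least-squares problem.
    """
    q, intensity, sigma = (np.asarray(a, dtype=float) for a in (q, intensity, sigma))
    r, kernel = pr_kernel(q, d_max, n_r)
    second_diff = np.diff(np.eye(n_r), n=2, axis=0)      # rows (1, -2, 1)
    a = kernel / sigma[:, None]
    b = intensity / sigma
    if alpha is None:
        alpha = 1e-6 * np.trace(a.T @ a)                  # heuristic default (assumption)
    lhs = np.vstack([a, np.sqrt(alpha) * second_diff])
    rhs = np.concatenate([b, np.zeros(n_r - 2)])
    p, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)
    fit = kernel @ p
    chi2 = float(np.sum(((intensity - fit) / sigma) ** 2) / (q.size - n_r))
    return r, p, fit, chi2


def sphere_pr(r, radius):
    """p(r) of a homogeneous sphere, r^2 (1 - 3x/4 + x^3/16) with x = r/R, zero beyond 2R."""
    x = r / radius
    return np.where(x <= 2.0, r**2 * (1.0 - 0.75 * x + x**3 / 16.0), 0.0)


if __name__ == "__main__":
    R, n_r = 30.0, 40                       # hypothetical sphere radius (Angstrom) and bin count
    q = np.linspace(0.005, 0.3, 150)
    r, kernel = pr_kernel(q, 2 * R, n_r)
    p_true = sphere_pr(r, R) / sphere_pr(r, R).max()
    i_syn = kernel @ p_true                 # noise-free synthetic data
    sigma = 0.01 * np.abs(i_syn) + 1e-3 * i_syn[0]
    r_fit, p_fit, fit, chi2 = ift_fit(q, i_syn, sigma, 2 * R, n_r)
    print(f"reduced chi2 = {chi2:.2e}, p(r) maximum at r = {r_fit[np.argmax(p_fit)]:.1f} A")
```

In practice α and D_MAX are scanned and selected by the criteria of the respective programs. Here the noise-free data are generated with the same kernel, so the example only checks that the recovered p(r) has the expected bell shape, with its maximum near D_MAX/2.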
All the above program packages produce the pair distance distribution function p(r). As mentioned above, this function is a histogram of the pair distances inside the particle. It is weighted by the distance and by the product of the electron densities of the infinitesimal volume elements that form each pair. A particle of finite size has a maximum distance beyond which p(r) is zero, and this distance is the maximum dimension of the particle. Because the histogram is weighted by the distance, p(r) also starts at zero. Hence p(r) starts at zero and returns to zero at the maximum particle dimension. The shape of the function follows from the particle shape and its electron density distribution. Fig. 5, Fig. 6 and Fig. 7 show a set of theoretical calculations of p(r):
- Globular particles have a bell-shaped p(r) function, with the maximum close to r = D_MAX/2 (Fig. 5). Any anisotropy moves the maximum to the left, towards lower r/D_MAX values.
- Elongated (prolate) particles with constant cross-section, such as cylinders or prisms, have p(r) functions with linearly decreasing regions.
- Flat (oblate) particles have p(r) functions with shapes different from the two previous cases.
- Hollow particles have p(r) functions with the maximum shifted to the right, towards higher r/D_MAX values.
- Dimeric particles have p(r) functions with shoulders (Fig. 6). Interestingly, differences in the opening angle of a dimeric particle are easier to detect in p(r) than in the intensity I(q).
- Particles with variations in the scattering length contrast may have p(r) functions with negative portions (Fig. 7).

For a broader and deeper review of the interpretation of p(r), the reader is referred to the literature (Glatter, 1979; Glatter and Kratky, 1982). The important point of this modeling approach is that it assumes only that the system consists of identical particles; no other hypotheses are made. The p(r) function then provides direct insight into the particle shape and dimensions. The approach is widely used in the analysis of SAXS data because it gives a first estimate of the particle shape.
Fig. 7. p(r) functions for a core-shell particle with different scattering length contrasts.
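Since p(r) is a histogram of pair distances, it can be estimated for any homogeneous body by sampling points uniformly inside it and histogramming their mutual distances. The sketch below does this for two cases, using random points rather than a real structure:
- A sphere, whose p(r) maximum should lie close to D_MAX/2.
- Two touching spheres, whose maximum distance approaches twice that of a single sphere. Their p(r) can be compared with the dimer behavior described above.

The helper names are hypothetical.

```python
import numpy as np


def points_in_sphere(n, radius, rng, center=(0.0, 0.0, 0.0)):
    """n points distributed uniformly inside a sphere."""
    v = rng.normal(size=(n, 3))
    v /= np.linalg.norm(v, axis=1, keepdims=True)          # random directions
    rad = radius * rng.random(n) ** (1.0 / 3.0)            # uniform in volume
    return np.asarray(center, dtype=float) + v * rad[:, None]


def pair_distance_histogram(points, n_bins=40):
    """Histogram of all pair distances, normalized to a maximum of 1.

    For uniformly sampled points in a homogeneous particle this histogram is
    proportional to p(r); the largest distance estimates D_MAX.
    """
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff**2).sum(axis=-1))[np.triu_indices(len(points), k=1)]
    counts, edges = np.histogram(dist, bins=n_bins, range=(0.0, dist.max()))
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, counts / counts.max(), dist.max()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    R = 1.0
    sphere = points_in_sphere(1000, R, rng)
    dimer = np.vstack([points_in_sphere(500, R, rng, (-R, 0.0, 0.0)),
                       points_in_sphere(500, R, rng, (R, 0.0, 0.0))])
    for name, pts in (("sphere", sphere), ("dimer", dimer)):
        r, p, d_max = pair_distance_histogram(pts)
        print(name, round(d_max, 2), round(r[np.argmax(p)] / d_max, 2))
```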

Model dependent approach - assuming a known form factor
For simple particle shapes, equation (2) can be integrated to give the scattering amplitude f(q). The angular average of equation (6) then gives an analytical or semi-analytical expression for P(q).
A few examples of semi-analytical expressions for the scattering intensity of particles with simple shapes.
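A homogeneous cylinder of radius R and length L, the elongated shape applied to glucagon fibers in Fig. 8, is an example of a semi-analytical expression. Its orientationally averaged form factor can be written as P(q) = ∫_0^{π/2} [2J₁(qR sin α)/(qR sin α) · sin(qL cos α/2)/(qL cos α/2)]² sin α dα. The sketch below evaluates this integral with the midpoint rule. To stay dependency-free, J₁ is computed from its integral representation J₁(x) = (1/π)∫_0^π cos(τ - x sin τ) dτ; scipy.special.j1 could be used instead. The function names and the dimensions in the usage note are hypothetical.

```python
import numpy as np


def bessel_j1(x, n_tau=200):
    """J1(x) from (1/pi) * int_0^pi cos(tau - x sin tau) dtau, midpoint rule."""
    tau = (np.arange(n_tau) + 0.5) * np.pi / n_tau
    x = np.asarray(x, dtype=float)
    return np.cos(tau - x[..., None] * np.sin(tau)).mean(axis=-1)


def cylinder_form_factor(q, radius, length, n_alpha=400):
    """Normalized, orientationally averaged form factor of a homogeneous cylinder (q > 0).

    The midpoint rule in the angle alpha keeps sin(alpha) away from zero.
    """
    q = np.atleast_1d(np.asarray(q, dtype=float))
    da = 0.5 * np.pi / n_alpha
    a = (np.arange(n_alpha) + 0.5) * da
    x_rad = np.outer(q, np.sin(a)) * radius             # argument of the cross-section term
    x_ax = np.outer(q, np.cos(a)) * 0.5 * length        # argument of the axial term
    radial = 2.0 * bessel_j1(x_rad) / x_rad
    axial = np.sinc(x_ax / np.pi)                       # sin(x)/x
    return ((radial * axial) ** 2 * np.sin(a)).sum(axis=1) * da
```

For a cylinder with R = 30 Å and L = 300 Å, -3 ln P(q)/q² at small q approaches the Guinier value R_g² = R²/2 + L²/12. This gives a quick consistency check of the integration.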
The main advantage of analytical or semi-analytical expressions for the form factor is that they usually have few parameters to adjust against the experimental data. This allows structural information to be determined with reasonable reliability. In addition, a model that does not fit the data directly indicates that the particle shape differs from the one assumed. Fig. 8 shows an example, where an elongated cylinder model was used to describe the SAXS data of mature glucagon fibers. In several cases the possible particle shape is known, but calculating the integrals is impractical. The finite element method can then be used, in which the particle shape is built up from known subunits. One approach is to use spherical subunits and calculate the intensity with the Debye formula (Debye, 1915; Glatter, 1980):

$$ I(q) = \sum_{i=1}^{N} \sum_{j=1}^{N} f_i(q)\, f_j(q)\, \frac{\sin(q r_{ij})}{q r_{ij}} \qquad (18) $$

where f_i(q) is the scattering amplitude of subunit i and r_{ij} is the distance between subunits i and j. This procedure can handle very complicated models, whose parameters can then be optimized against experimental data (Oliveira et al, 2009, 2010).
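The Debye sum can be sketched for spherical subunits, each with the amplitude of a homogeneous sphere, f_i(q) = V_i · 3[sin(qR_i) - qR_i cos(qR_i)]/(qR_i)³. The sketch assumes a unit contrast for every subunit; in a real model the contrast multiplies f_i. The function names are hypothetical. The diagonal terms i = j use sin(0)/0 = 1. A single subunit therefore gives back f_1(q)², and at q = 0 the intensity is (Σ V_i)².

```python
import numpy as np


def sphere_amplitude(q, radius):
    """Amplitude of a homogeneous sphere with unit contrast; equals its volume at q = 0."""
    q = np.asarray(q, dtype=float)
    x = q * radius
    volume = 4.0 / 3.0 * np.pi * radius**3
    with np.errstate(invalid="ignore", divide="ignore"):
        shape = np.where(x > 0, 3.0 * (np.sin(x) - x * np.cos(x)) / x**3, 1.0)
    return volume * shape


def debye_intensity(q, centers, radii):
    """I(q) = sum_ij f_i f_j sin(q r_ij)/(q r_ij) for spherical subunits."""
    q = np.atleast_1d(np.asarray(q, dtype=float))
    centers = np.asarray(centers, dtype=float)
    rij = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    f = np.stack([sphere_amplitude(q, rad) for rad in radii], axis=1)    # (n_q, N)
    kernel = np.sinc(q[:, None, None] * rij[None, :, :] / np.pi)          # 1 on the diagonal
    return np.einsum("qi,qj,qij->q", f, f, kernel)
```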

Deconvolution square root - obtaining the electron density profile
In some applications the particle shape is known, but the electron density profile and overall dimensions have to be determined. Amphiphilic molecules such as surfactants and several types of diblock copolymers self-assemble into structures that can be analyzed in this way. Several versions of the deconvolution square root procedure have been proposed in the literature (Pape and Kreutz, 1978; Nagle and Wiener, 1989). An early approach was to take the square root of the scattering intensity, which gives the absolute value of the amplitude function f(q). Fourier transforming this function then gives the electron density distribution ρ(r). This procedure has serious problems. The signs of f(q) have to be guessed, and the very short interval of data in reciprocal space prevents a reliable inverse Fourier transformation. Glatter proposed a more stable procedure in which the deconvolution uses the p(r) function (Glatter, 1981; Glatter, 1984; Bergmann et al, 2000). Apart from the overall sign of the electron density profile (a ±1 factor), this procedure estimates the electron density profile correctly, and it has been used in several applications (Rathgeber et al, 2002). Fig. 9 shows an example, in which the radial electron density profile of SDS micelles was obtained.
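The forward problem that the deconvolution inverts is easy to write for a centrosymmetric particle: f(q) = 4π∫Δρ(r) r² sin(qr)/(qr) dr and I(q) = f(q)². The sketch below uses a hypothetical two-step core-shell profile (negative core, positive shell, arbitrary units and radii) of the kind found for micelles. It is not Glatter's deconvolution procedure. It only shows why the overall sign is lost: replacing Δρ(r) by -Δρ(r) changes the sign of f(q) but leaves I(q) unchanged.

```python
import numpy as np


def radial_amplitude(q, r, drho):
    """f(q) = 4*pi * int drho(r) r^2 sin(qr)/(qr) dr for a centrosymmetric particle (trapezoidal rule)."""
    g = drho[None, :] * r[None, :] ** 2 * np.sinc(np.outer(q, r) / np.pi)
    dr = r[1] - r[0]
    return 4.0 * np.pi * dr * (g.sum(axis=1) - 0.5 * (g[:, 0] + g[:, -1]))


def core_shell_profile(r, r_core, r_shell, drho_core, drho_shell):
    """Step profile: drho_core inside r_core, drho_shell up to r_shell, zero outside (hypothetical)."""
    return np.where(r <= r_core, drho_core, np.where(r <= r_shell, drho_shell, 0.0))


if __name__ == "__main__":
    r = np.linspace(0.0, 30.0, 3001)                        # Angstrom
    q = np.linspace(0.0, 0.5, 101)                          # 1/Angstrom
    drho = core_shell_profile(r, 15.0, 22.0, -0.5, 1.0)     # hypothetical micelle-like contrasts
    i_plus = radial_amplitude(q, r, drho) ** 2
    i_minus = radial_amplitude(q, r, -drho) ** 2
    print(np.allclose(i_plus, i_minus))                     # prints True
```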

Ab initio modeling - an overview
The shape of the scattering curve is directly related to the three-dimensional shape of the particle. However, the particles are randomly oriented (equation 6) and only a limited region of reciprocal space can be measured, so the information content of a SAXS curve is very low (Patel and Schimidt, 1971). Despite these limitations, important developments in recent decades have shown that a 3D model can be obtained from 1D SAXS curves.

Multipole expansion. Building on the seminal work of Stuhrmann in the 1970s (Stuhrmann, 1973; Stuhrmann and Miller, 1978), Svergun and co-workers used a set of spherical harmonics (multipole expansion) to describe the particle electron density. A nonlinear minimization procedure then finds the set of spherical harmonic coefficients that best fits the scattering data. Details of the calculations and of the spherical harmonics representation of the scattering intensity can be found in the original articles (Svergun and Stuhrmann, 1991). The success of this method showed that fitting the experimental scattering data gives a direct ab initio determination of the three-dimensional shape, even though the solution for the particle shape is not unique. The spherical harmonics representation can only build smooth shapes without sharp edges or corners, so this approach gives a rough representation of the particle shape. The method is therefore a very low resolution approximation and usually fits only the initial part of the scattering curve. Program packages for fitting experimental data with this approach are available (programs ASSA and SASHA, Kozin et al, 1997). Fig. 10 shows an example, in which the experimental data for lysozyme in solution were fitted with a multipole expansion using the program SASHA. The correct anisotropy and overall shape are clearly obtained with this approach. In all the examples shown in Figs. 10-14, the measurements were performed at the SAXS beamline of the Brazilian Synchrotron Light Laboratory.

Dummy atom method. In the dummy atom method, the particle is built with the finite element approach, from a close-packed arrangement of spherical subunits. The number of possible solutions is very large, so Monte Carlo based optimization methods are used to find the set of spherical subunits that best fits the scattering data. The widely cited program DAMMIN starts by creating a spherical search space with a diameter equal to or slightly larger than the particle maximum dimension D_MAX (obtained from the p(r) curve). A simulated annealing procedure, constrained by penalty functions that enforce particle compactness and smoothness (Volkov and Svergun, 2002), then selects a subset of the initial search space. This subset is a three-dimensional model of the particle shape. Because Monte Carlo approaches are intrinsically random, independent runs of the procedure lead to different models. However, the models generally share features such as overall anisotropy and size. This approach represents the particle shape better than the multipole expansion, since it is free of the limitations on shape description mentioned above. However, the internal structure is not represented, so the method cannot describe the data up to high q values (Volkov and Svergun, 2002). Fig. 11 shows an example of this so-called "dummy atom modeling", with the ab initio model obtained from SAXS data of lysozyme in solution.

Dummy chain method. For SAXS data from proteins, one additional and very useful constraint can be used to build the model. Proteins are composed of a sequence of amino acids, which forms their backbone,
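The essence of dummy atom modeling can be illustrated with a toy version. The sketch below is not DAMMIN, and it simplifies in several ways:
- It uses a simple cubic grid instead of a close-packed lattice.
- All beads are identical points.
- It uses a misfit on normalized intensities instead of χ².
- It has no compactness or looseness penalties.

The search volume is a sphere of beads. Beads are flipped one at a time between particle and solvent, and the Debye formula gives the bead-model intensity. Moves are accepted with the Metropolis rule at a slowly decreasing temperature, and the best model found is kept. All names and parameters are hypothetical.

```python
import numpy as np


def grid_in_sphere(radius, spacing=1.0):
    """Bead centres of a simple cubic grid inside a sphere (the search volume)."""
    n = int(radius // spacing)
    axis = np.arange(-n, n + 1) * spacing
    pts = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1).reshape(-1, 3)
    return pts[np.linalg.norm(pts, axis=1) <= radius + 1e-9]


def debye_kernel(points, q):
    """sin(q r_ij)/(q r_ij) for all bead pairs, shape (n_q, n, n)."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    return np.sinc(q[:, None, None] * d[None, :, :] / np.pi)


def bead_intensity(kernel, mask):
    """Debye sum over the beads selected as particle (unit bead form factor)."""
    m = mask.astype(float)
    return kernel @ m @ m


def misfit(kernel, mask, target):
    """Mean squared difference of intensities normalized at the lowest q."""
    if not mask.any():
        return np.inf
    model = bead_intensity(kernel, mask)
    return float(np.mean((model / model[0] - target / target[0]) ** 2))


def anneal(kernel, target, n_steps=2000, t0=1e-4, cooling=0.9965, seed=0):
    """Flip one bead per step, Metropolis acceptance, keep the best model seen."""
    rng = np.random.default_rng(seed)
    mask = np.ones(kernel.shape[1], dtype=bool)       # start from the full search sphere
    score = start = misfit(kernel, mask, target)
    best_mask, best = mask.copy(), score
    temperature = t0
    for _ in range(n_steps):
        trial = mask.copy()
        k = rng.integers(mask.size)
        trial[k] = not trial[k]                       # bead switches between particle and solvent
        new = misfit(kernel, trial, target)
        if new <= score or rng.random() < np.exp((score - new) / temperature):
            mask, score = trial, new
            if score < best:
                best_mask, best = mask.copy(), score
        temperature *= cooling
    return best_mask, best, start


if __name__ == "__main__":
    beads = grid_in_sphere(3.0)                       # search sphere of diameter ~ D_MAX
    q = np.linspace(0.02, 1.0, 15)                    # in units of 1/bead spacing
    kernel = debye_kernel(beads, q)
    x, y, z = beads.T
    target_mask = (x**2 + y**2) / 1.5**2 + z**2 / 3.0**2 <= 1.0   # hypothetical prolate particle
    target = bead_intensity(kernel, target_mask)
    model, best, start = anneal(kernel, target)
    print(f"misfit {start:.2e} -> {best:.2e} with {model.sum()} of {beads.shape[0]} beads")
```

Running it with different seeds generally gives different bead models, which mirrors the non-uniqueness of Monte Carlo solutions discussed above.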
known as the primary structure. The primary sequence folds into specific patterns, such as α-helices, β-sheets and turns, which make up the secondary structure. The secondary structure in turn folds into a specific three-dimensional arrangement, known as the tertiary structure. In some cases the protein is part of a supramolecular complex, which forms the quaternary structure (Voet et al, 2008). SAXS data have intrinsically low resolution and information content, so they give access only to the overall shape and size, not to the atomic or secondary structure. However, sequence continuity can be used as a constraint to model proteins in solution better. Svergun, Petoukhov and co-workers implemented this constraint in the "dummy chain model" approach (program GASBOR, Svergun et al, 2001). In this method a chain of interconnected spheres represents the protein backbone. Each sphere corresponds to one amino acid, so the total number of spheres equals the number of protein residues. The program starts from a spherical arrangement of the backbone. It then performs a simulated annealing optimization, changing the three-dimensional arrangement of the backbone to best fit the scattering data. As in the dummy atom approach, the theoretical intensity is calculated with a variation of the Debye formula (equation 18). The continuity of the backbone acts as a natural constraint and leads to better representations of protein structures. This approach can also fit experimental data up to higher q values than the previous ones, since the backbone partly represents the internal structure of the protein. Fig. 12 shows an example of this so-called "dummy chain modeling", with the ab initio model obtained from SAXS data of lysozyme in solution.
The previous examples showed that ab initio methods can retrieve the three-dimensional structure. Although the models obtained with the dummy atom and dummy chain methods are not unique, due to the heuristic nature of the optimization methods, the  Svergun et al, 1995), where it was demonstrated that a hydration shell around the protein, with slightly higher electron density than the bulk, had to be considered. Fig. 13 shows an example of this comparison between experimental data and the theoretical SAXS intensity calculated from atomic coordinates: the crystallographic structure of lysozyme was compared with experimental data of lysozyme in solution. A major application of combining SAXS data with atomic resolution models is the case in which only part of the structure is known. In these situations the SAXS data can be used in two ways:
- To generate the missing amino acid loops of the known structure, using the dummy chain approach (program BUNCH, Petoukhov and Svergun, 2005).
- To find the spatial arrangement of known domains that forms the full structure (program SASREF, Petoukhov and Svergun, 2005).

Both the generation of missing loops and the optimization of domain positions use Monte Carlo methods. As in the previous cases, these do not lead to a unique solution. Even so, the resulting model is a very good representation of the overall structure. Fig. 14 and Fig. 15 show test examples. In Fig. 14 part of the lysozyme structure was clipped. As the curve shows, the atomic model without the loop cannot fit the experimental data correctly. Adding a dummy chain loop and optimizing it gives a very good fit of the experimental data. The generated loop (blue loop in the model) is a reasonable approximation of the real loop superposed on it. Fig. 15 shows a hypothetical heterodimer. The optimized structure components do not agree perfectly with the initial structure, but the similarity is remarkable, indicating that SAXS data can also be applied in these cases.
The situations presented here are only a small sample of the possible applications of these modeling tools. Advanced modeling examples based on these procedures can be found in several articles in the literature (Svergun, 2007). An intrinsic problem of any SAXS modeling is the ambiguity of the results: in general, the modeling procedure cannot give a unique solution. Any scattering modeling must therefore be complemented with additional information to reduce the number of possible solutions. There are several ways of doing this. When available, information about binding sites or specific arrangements of domains can be used as constraints in the modeling. Results from biochemical/biophysical techniques can provide useful information about structural changes or binding. For example, fluorescence spectroscopy and isothermal titration calorimetry can provide important information on binding and stoichiometry. Recent applications have attempted to model scattering data and other experimental data simultaneously. Mareuil and co-workers proposed a simultaneous modeling of SAXS and NMR data in the program DADIMODO (Mareuil et al, 2007). Automatic tools that exploit the complementarity between SAXS and NMR are also being developed in connection with the SAXIER project (Svergun, 2007; Svergun, 2009).

Applications
Two applications of SAXS analysis are presented. The first is an in situ study of lysozyme aggregation. The second is the structural characterization of a giant protein complex. Both cases are good examples of the use of SAXS to investigate biological systems.

Lysozyme denaturation and aggregation induced by heat
The structure of proteins is intrinsically related to their shape. The protein shape, in turn, results from protein folding. In the native state, proteins are known to adopt hierarchical structures, which may result from a multistep folding process. One way to investigate this characteristic is to induce protein denaturation. Denaturation, or unfolding, can be induced by changes in temperature or pH, or by adding denaturing agents such as sodium dodecyl sulfate (SDS). A study of heat-induced denaturation is presented here.
The experiments were performed at the SAXS beamline of the Brazilian Synchrotron Light Laboratory, Campinas, Brazil. The wavelength selected for the experiments was λ = 1.49 Å and the sample-to-detector distance was 745 mm. The measurements were performed using a 1D Gabriel-type detector. The samples were placed in a 1.5 mm capillary tube in a thermally controlled sample holder directly connected to the evacuated beam path. The experiments were performed with lysozyme samples at 10 mg/mL and pH 7.0 in 10 mM phosphate buffer with 50 mM NaCl. Indirect Fourier transformations were performed using the program package GNOM, which also enabled the correction of smearing effects, and ab initio models were built using the program DAMMIN. The results are shown in Fig. 16. As can be seen, when the protein solution is held at 80 °C, the SAXS profiles evolve as a function of time. As shown in equations 13 and 14, the forward scattering I(0) provides an estimate of the molecular weight. For a system in which aggregates form over time, the molecular weight obtained is an average value, since a distribution of sizes can be present in the system. Moreover, because the forward scattering of a particle is proportional to the square of its volume, large particles contribute more to the measured intensity. If one assumes that at each stage the aggregates have similar sizes, each made of n monomers, and since the total mass of protein is constant, the N_mon monomers in the irradiated volume form N_mon/n aggregates of volume nV_mon. With Δρ the contrast, it is possible to write

$$ I(0)_{agg} \propto \frac{N_{mon}}{n}\left(\Delta\rho\, n V_{mon}\right)^2 = n\, N_{mon}\left(\Delta\rho\, V_{mon}\right)^2 \propto n\, I(0)_{mon} $$

If we normalize I(0)_agg by the forward scattering of lysozyme measured at room temperature (native state) at the same concentration, this ratio is a good estimate of the average number of monomers per aggregate:

$$ n \approx \frac{I(0)_{agg}}{I(0)_{native}} $$
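The aggregation estimate can be sketched numerically. The chapter obtains I(0) from GNOM; as a simpler stand-in, this sketch assumes I(0) is taken from a Guinier fit, ln I(q) = ln I(0) - q²Rg²/3, restricted to qRg ≤ 1.3 (a common rule of thumb for globular particles). The curves are synthetic, not the lysozyme data of Fig. 16, and the helper names are hypothetical.

```python
import numpy as np


def guinier_fit(q, intensity, q_max):
    """Fit ln I(q) = ln I(0) - (q Rg)^2 / 3 on q <= q_max; returns (I0, Rg)."""
    q = np.asarray(q, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    sel = q <= q_max
    slope, intercept = np.polyfit(q[sel] ** 2, np.log(intensity[sel]), 1)
    return np.exp(intercept), np.sqrt(-3.0 * slope)


def monomers_per_aggregate(i0_aggregate, i0_native):
    """Average aggregation number n = I(0)_agg / I(0)_native.

    Assumes both curves share the same concentration and intensity scale and
    that the aggregates present at a given time have similar sizes.
    """
    return i0_aggregate / i0_native


# Synthetic Guinier curves (hypothetical values): a native monomer and a 5-mer.
q = np.linspace(0.01, 0.2, 100)                  # Å^-1
native = 1.0 * np.exp(-(q * 14.0) ** 2 / 3)
aggregate = 5.0 * np.exp(-(q * 30.0) ** 2 / 3)

# Rg is known here; with real data one would iterate the q_max = 1.3 / Rg choice.
i0_nat, rg_nat = guinier_fit(q, native, q_max=1.3 / 14.0)
i0_agg, rg_agg = guinier_fit(q, aggregate, q_max=1.3 / 30.0)
n_agg = monomers_per_aggregate(i0_agg, i0_nat)
print(f"Rg native = {rg_nat:.1f} Å, Rg aggregate = {rg_agg:.1f} Å, n = {n_agg:.2f}")
```

Because I(0) is weighted by the square of the particle volume, the ratio is a weight-averaged aggregation number; a broad size distribution pushes it toward the largest aggregates present.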

Shape and low resolution structure of extracellular hemoglobins calculated from SAXS data
Given the inherent difficulties in crystallizing proteins of high molecular weight, low-resolution studies in solution have been the main tool for the structural study of extracellular hemoglobins. The physicochemical properties of extracellular hemoglobins (erythrocruorins) have been under study since the 1930s. In particular, different oxygen affinities and cooperativities were reported for molecules with very similar heme content, dimensions and molecular weight. This led investigators to focus on possible structural differences that could explain this diverse functional behavior. Two very comprehensive reviews on the structure of extracellular hemoglobins have been published by Chung (1979) and Weber (2001). The challenge has always been to elucidate the interactions among the more than 200 subunits of these respiratory proteins, which lead to their spontaneous, self-limited assembly and to cooperative oxygen binding, processes that are not yet completely understood. In this section, the results of a study of the extracellular hemoglobin of Glossoscolex paulistus, with a molecular weight of ~3,100 kDa, are presented. Advanced methods of shape restoration from X-ray scattering data allowed a description of the subunit arrangement of these molecules, as well as the determination of dimensional parameters, which were also confirmed by hydrodynamic measurements and by calculations for the proposed models. There are only minor differences between the properties already reported for the subunit structure of Lumbricus terrestris hemoglobin (Fushitani et al., 1991) and those found in previous work on the structural subunits of G. paulistus by pH-induced and high-pressure dissociation (Bonafe et al., 1991; Silva et al., 1989), indicating the similarity of these proteins in spite of their differences in molecular weight.

Samples of G. paulistus hemoglobin were purified according to a standard procedure (Silva et al., 1989; Bonafe et al., 1991) at several concentrations. SAXS measurements were made using synchrotron radiation at the Brazilian Synchrotron Light Laboratory, with hemoglobin in 0.05 M Tris-HCl buffer, pH 7.5. The hemoglobin concentrations used in the experiments varied from 0.5 to 40 mg/mL, and the final combination of the frames enabled extrapolation to zero concentration. The scattered intensities were recorded with a linear position-sensitive detector, and the primary data correction was done using standard procedures. The q range was from 0.005 to 0.1882 Å⁻¹, with a radiation wavelength λ = 1.74 Å. To collect the low- and high-angle scattering data, two sample-detector distances were used (1.74 m and 0.84 m). The samples were kept in a 1.5 mm diameter capillary sample holder at a constant temperature (20 °C). Indirect Fourier transformation was performed using the GNOM program package, and ab initio calculations were performed using the program DAMMIN. The experimental scattered intensity was normalized to absolute scale using water as a primary standard, which enabled the calculation of the protein molecular weight and volume. Finally, the hydrodynamic properties of molecular models can be calculated using an approach initially developed for crystallographic structures (program HYDROPRO, de la Torre et al., 2000), which can easily be extended to dummy atom models when the molecular mass and partial specific volume of the protein are known (Arndt et al., 2002). Several hydrodynamic parameters can thus be calculated and compared with the values obtained by other experimental methods. This comparison is very useful for checking the validity of the molecular conformation represented by the proposed 3D models.

From the p(r) function we obtain a radius of gyration of 113.6 ± 0.7 Å and a maximum dimension of 300 ± 10 Å. Fig. 18 shows the excellent fit of the intensity curve and the p(r) function from which the values of Rg and Dmax were calculated. The molecular mass and particle volume of G. paulistus hemoglobin were calculated from the I(0) value, giving 3.1 ± 0.2 MDa and (3.8 ± 0.1) × 10⁶ Å³, respectively. These values are compatible with the dimensions obtained from SAXS and from electron micrographs (EM) of G. paulistus hemoglobin (Souza, 1990). The overall shape of the particle in the EM analysis shows P62 symmetry, which can be used as a constraint in the model calculation. The introduction of symmetry constraints decreases the number of degrees of freedom and consequently leads to a better restoration of the three-dimensional model (Svergun, 2000; Oliveira, 2001). P62 symmetry was therefore imposed in the model optimization. Fig. 19B presents one of the best three-dimensional models. Several runs of the optimization program were performed; for each model obtained, hydrodynamic parameters were calculated, and the models whose values disagreed with the experimental ones were excluded. As a result, a model based on the SAXS results and also in agreement with other experimental data could be selected (Table 2). For comparison, Fig. 19A shows the structure obtained by Royer et al. (2000) for the hemoglobin of Lumbricus terrestris using protein crystallography and electron microscopy. These results show that the two proteins are quite similar in quaternary structure. Given the inherent difficulty of crystallizing proteins, particularly large ones such as G. paulistus hemoglobin, these results demonstrate the capability of the SAXS technique and the new optimization methods to provide a fast and reliable procedure for investigating the shape and quaternary structure of large protein complexes. In addition, correlating SAXS results with hydrodynamic properties increases the reliability of the results and makes it possible to perform a model search integrated with hydrodynamic calculations.
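The absolute-scale route from I(0) to molecular mass can be sketched as follows. The sketch assumes the standard relation I(0) = c M (Δρ v̄)²/N_A, with I(0) in cm⁻¹, c in g/cm³, a partial specific volume v̄ = 0.73 cm³/g and a scattering-length-density contrast Δρ ≈ 2.8 × 10¹⁰ cm⁻²; these are typical values for proteins in water, not values given in the chapter. The volume is estimated as V = M v̄/N_A. The chapter does not state which route was used for its volume, so this is only a consistency check: with M = 3.1 MDa it gives about 3.76 × 10⁶ Å³, close to the reported value. Function names are illustrative.

```python
AVOGADRO = 6.02214076e23  # mol^-1

# Assumed typical values for a protein in aqueous buffer (not taken from the chapter).
DELTA_RHO = 2.8e10  # scattering length density contrast, cm^-2
VBAR = 0.73         # partial specific volume, cm^3/g


def molecular_mass_from_i0(i0, conc, delta_rho=DELTA_RHO, vbar=VBAR):
    """M [g/mol] from absolute I(0) [cm^-1] and concentration [g/cm^3].

    I(0) = c * M * (delta_rho * vbar)**2 / N_A, solved for M.
    """
    return i0 * AVOGADRO / (conc * (delta_rho * vbar) ** 2)


def i0_from_molecular_mass(mass, conc, delta_rho=DELTA_RHO, vbar=VBAR):
    """Forward relation, used here to check the inversion."""
    return conc * mass * (delta_rho * vbar) ** 2 / AVOGADRO


def dry_volume_a3(mass, vbar=VBAR):
    """V = M * vbar / N_A, converted from cm^3 to Å^3 (1 Å^3 = 1e-24 cm^3)."""
    return mass * vbar / AVOGADRO * 1e24


# Round trip with the mass reported above and a hypothetical 1 mg/mL sample.
mass = 3.1e6   # g/mol
conc = 1.0e-3  # g/cm^3 (= 1 mg/mL)
i0 = i0_from_molecular_mass(mass, conc)
print(f"I(0) = {i0:.3f} cm^-1 -> M = {molecular_mass_from_i0(i0, conc) / 1e6:.2f} MDa")
print(f"V = {dry_volume_a3(mass):.3e} Å^3")
```

In practice the uncertainty of M is dominated by the concentration and by the assumed v̄ and Δρ, which is one reason the chapter cross-checks the models against hydrodynamic data.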

Conclusions
General aspects of Small Angle X-ray Scattering applied to the study of colloidal particles in solution were presented. Two application examples were shown, demonstrating the versatility of the SAXS technique. One of the main strengths of this technique is the possibility of investigating systems directly in solution, close to the native state, over a broad range of sizes and molecular weights. On the other hand, owing to the low information content of typical SAXS data, the scattering data have to be correlated with and supported by additional information obtained from other experimental techniques. Thus, even though SAXS data can provide valuable structural information, the technique and the modeling methods have to be applied with great caution and always cross-checked against additional results in order to provide relevant, unambiguous information and, most importantly, to avoid wrong interpretation of the data. As shown in this chapter, absolute-scale calibration and the comparison of the hydrodynamic properties of the models with experimentally obtained values are two very useful tools for checking results and validating models.

Acknowledgment
The author acknowledges FAPESP for financial support (Proj. #2000/15087-4 and #2010/09277-7). The University of São Paulo is acknowledged for the financial support of this book chapter. Prof. Carlos Bonafé is kindly acknowledged for his help and support in the preparation of the G. paulistus samples. The author is grateful to Prof. Iris Torriani for several valuable discussions.
www.intechopen.com

average self-correlation function. With these substitutions, the intensity for a single randomly oriented particle is given by

Fig. 3. Theoretical calculation of a two-dimensional scattering profile for a randomly oriented ellipsoidal particle. The intensity is given on a logarithmic scale. Inset: 1D profiles of the intensity.

One usual procedure is to define the so-called pair distance distribution function p(r),

$$ p(r) = r^2\,\gamma(r), $$

which is a histogram of pair distances inside the particle, weighted by the distance and by the product of the electron densities of the infinitesimal elements of the pair (Glatter, 2002). The p(r) function permits the definition of the Fourier pair:

$$ I(q) = 4\pi \int_0^{\infty} p(r)\,\frac{\sin(qr)}{qr}\,dr, \qquad p(r) = \frac{1}{2\pi^2} \int_0^{\infty} qr\, I(q)\,\sin(qr)\,dq $$
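The Fourier pair can be checked numerically for a homogeneous sphere of radius R, whose p(r) is known analytically: p(r) ∝ r²(1 − 3r/4R + r³/16R³) for 0 ≤ r ≤ 2R. The sketch computes I(q) from p(r) by direct quadrature and the radius of gyration from Rg² = ∫r²p(r)dr / (2∫p(r)dr). The normalized intensity should match the sphere form factor [3(sin x − x cos x)/x³]² with x = qR, and Rg should equal √(3/5) R. Function names are illustrative.

```python
import numpy as np


def trapezoid(y, x):
    """Trapezoidal rule, written out to avoid NumPy version differences."""
    return 0.5 * np.sum((y[1:] + y[:-1]) * np.diff(x))


def intensity_from_pr(q, r, p):
    """I(q) = 4 pi * integral p(r) sin(qr)/(qr) dr; np.sinc(x) = sin(pi x)/(pi x)."""
    return np.array([4 * np.pi * trapezoid(p * np.sinc(qk * r / np.pi), r) for qk in q])


def rg_from_pr(r, p):
    """Rg^2 = integral r^2 p(r) dr / (2 * integral p(r) dr)."""
    return np.sqrt(trapezoid(r ** 2 * p, r) / (2 * trapezoid(p, r)))


def sphere_pr(r, radius):
    """Unnormalized p(r) of a homogeneous sphere, valid for 0 <= r <= 2R."""
    x = r / radius
    return r ** 2 * (1 - 0.75 * x + x ** 3 / 16)


def sphere_form_factor(q, radius):
    """Normalized form factor of a homogeneous sphere (q > 0)."""
    x = q * radius
    return (3 * (np.sin(x) - x * np.cos(x)) / x ** 3) ** 2


R = 20.0
r = np.linspace(0.0, 2 * R, 4001)
p = sphere_pr(r, R)
q = np.array([0.05, 0.1, 0.2])
# Normalize by I(0), i.e. the same transform evaluated at q = 0.
i_q = intensity_from_pr(q, r, p) / intensity_from_pr(np.array([0.0]), r, p)[0]
print(np.round(i_q, 5), np.round(sphere_form_factor(q, R), 5))
print(rg_from_pr(r, p), np.sqrt(3 / 5) * R)
```

The same two functions apply unchanged to a p(r) returned by an indirect Fourier transformation, which is how Rg and D_max are read off in practice.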

Fig. 6.
Fig. 5. Theoretical calculations for scattering intensities and corresponding p(r) functions for bodies with simple shapes.The form factors were normalized to one.

Fig. 7. Theoretical calculations for scattering intensities and corresponding p(r) functions for a core-shell particle with different scattering length contrasts.
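Core-shell intensities such as those in Fig. 7 can be computed with the textbook amplitude for concentric spheres, F(q) = (ρ_core − ρ_shell) V_core Φ(qR_core) + (ρ_shell − ρ_solvent) V_total Φ(qR_total), where Φ(x) = 3(sin x − x cos x)/x³ and I(q) = F(q)². The sketch assumes this expression; the radii and densities are hypothetical, not those used for the figure. Setting the shell density equal to the core density must recover the homogeneous sphere, which gives a convenient sanity check.

```python
import numpy as np


def sphere_amplitude(q, radius):
    """Normalized sphere amplitude Phi(qR) = 3 (sin x - x cos x) / x^3, for q > 0."""
    x = q * radius
    return 3 * (np.sin(x) - x * np.cos(x)) / x ** 3


def core_shell_intensity(q, r_core, r_total, rho_core, rho_shell, rho_solvent):
    """I(q) of a spherical core (radius r_core) inside a shell of outer radius r_total."""
    v_core = 4 / 3 * np.pi * r_core ** 3
    v_total = 4 / 3 * np.pi * r_total ** 3
    amplitude = ((rho_core - rho_shell) * v_core * sphere_amplitude(q, r_core)
                 + (rho_shell - rho_solvent) * v_total * sphere_amplitude(q, r_total))
    return amplitude ** 2


q = np.linspace(0.005, 0.5, 200)
# Hypothetical micelle-like particle: low-density core, high-density shell.
i_micelle = core_shell_intensity(q, 15.0, 25.0, rho_core=-0.5, rho_shell=1.0, rho_solvent=0.0)

# Sanity checks: equal core/shell densities give a homogeneous sphere,
# and a shell matched to the solvent leaves only the core.
i_uniform = core_shell_intensity(q, 15.0, 25.0, 1.0, 1.0, 0.0)
i_sphere = (4 / 3 * np.pi * 25.0 ** 3 * sphere_amplitude(q, 25.0)) ** 2
i_core_only = core_shell_intensity(q, 15.0, 25.0, 1.0, 0.0, 0.0)
print(np.allclose(i_uniform, i_sphere))
```

Because the two terms can have opposite signs, the contrast choice shifts and deepens the minima of I(q), which is the behavior Fig. 7 illustrates.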

Fig. 8. Example of the use of a known form factor. Experimental data (open circles) for a mature glucagon fiber (Oliveira et al, 2009) and the theoretical fit (solid line), assuming the form factor of a cylinder with radius R and length L. The SAXS data were measured on the laboratory SAXS instrument Nanostar™ of Professor Jan Skov Pedersen at the University of Aarhus, Denmark.
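A cylinder form factor like the one fitted to the glucagon fibers is an orientational average. A sketch of the standard expression, normalized so that P(0) = 1, is P(q) = ∫₀^{π/2} [2J₁(qR sin α)/(qR sin α) · sin(qL cos α/2)/(qL cos α/2)]² sin α dα. To stay within NumPy, J₁ is evaluated from Bessel's integral J₁(x) = (1/π)∫₀^π cos(τ − x sin τ) dτ; scipy.special.j1 would be the usual choice if SciPy is available. R and L below are hypothetical, not the fitted values of Fig. 8. At small q the result should approach the Guinier law with Rg² = R²/2 + L²/12.

```python
import numpy as np


def bessel_j1(x, n_tau=200):
    """J1(x) = (1/pi) * integral_0^pi cos(tau - x sin tau) d tau (trapezoidal rule)."""
    x = np.asarray(x, dtype=float)
    tau = np.linspace(0.0, np.pi, n_tau)
    integrand = np.cos(tau - np.multiply.outer(x, np.sin(tau)))
    h = tau[1] - tau[0]
    total = integrand.sum(axis=-1) - 0.5 * (integrand[..., 0] + integrand[..., -1])
    return h * total / np.pi


def cylinder_form_factor(q, radius, length, n_alpha=400):
    """Orientationally averaged form factor of a solid cylinder, P(0) = 1 (q > 0)."""
    d_alpha = 0.5 * np.pi / n_alpha
    alpha = (np.arange(n_alpha) + 0.5) * d_alpha  # midpoint rule avoids alpha = 0 and pi/2
    result = []
    for qk in np.atleast_1d(q):
        a = qk * radius * np.sin(alpha)          # cross-section argument
        b = 0.5 * qk * length * np.cos(alpha)    # argument along the cylinder axis
        amp = 2 * bessel_j1(a) / a * np.sin(b) / b
        result.append(np.sum(amp ** 2 * np.sin(alpha)) * d_alpha)
    return np.array(result)


R, L = 30.0, 300.0                               # Å, hypothetical
q = np.array([0.001, 0.01, 0.05])                # Å^-1
p_q = cylinder_form_factor(q, R, L)
rg2 = R ** 2 / 2 + L ** 2 / 12
print(p_q, np.exp(-q[0] ** 2 * rg2 / 3))
```

In a fit such as Fig. 8, R and L (plus a scale factor and background) would be the free parameters passed to a least-squares routine.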

Fig. 9. Example of application of the deconvolution method. A) Experimental SDS data (open circles) and the IFT fit (solid line). B) p(r) function calculated by IFT (open circles) and the theoretical p(r) obtained from the deconvolution method (solid line). C) Restored radial electron density profile presented as step functions (dashed line) and as a smooth approximation (solid line). The SAXS data were measured on the laboratory SAXS instrument Nanostar™ of Professor Jan Skov Pedersen at the University of Aarhus, Denmark.

Fig. 10. Ab initio modeling of experimental SAXS data using multipolar expansion. Left: the obtained model (red) superposed on the backbone of the protein from its known crystallographic structure (pdb entry 6lyz.pdb). Right: fit of experimental data. Open circles - experimental data. Solid line - model fit.

A further improvement of the ab initio procedure for modeling SAXS data was proposed initially by Chacón (program DALAI, Chacón et al, 1998), and later by Svergun (program DAMMIN, Svergun, 1999) and Doniach (program SAXS3D, Walter et al, 2000), among others. In this method the particle is built using a finite-element approach, as a close-packed arrangement of spherical subunits. Since the number of possible solutions is very large, Monte Carlo-based optimization methods are used to find the set of spherical subunits that gives the best fit to the scattering data. The program DAMMIN, widely cited in the literature, starts by creating a spherical search space with a diameter equal to or slightly larger than the particle maximum dimension D_MAX (obtained from the p(r) curve). By applying a simulated annealing procedure, constrained by penalty functions that ensure particle compactness and smoothness (Volkov and Svergun, 2002), a subset of the initial search space is obtained, providing a three-dimensional model that represents the particle shape. Owing to the intrinsic randomness of Monte Carlo approaches, several independent runs of this modeling procedure lead to different models. However, these models usually share common features such as overall anisotropy and size. This approach permits a better representation of the particle shape than the multipolar expansion, since it does not suffer from the limitations of the shape description mentioned above. However, since the internal structure is not represented, this method cannot describe the data up to high q values (Volkov and Svergun, 2002). One example of this so-called

Fig. 11.Ab initio modeling of experimental SAXS data using dummy atom modeling.Left: model results.Semitransparent spheres -initial search space.Solid spheres -selected subset that gives the best fit.Blue backbone -protein backbone obtained from its known crystallographic structure (pdb entry 6lyz.pdb).Right: Fit of experimental data.Open circles -experimental data.Solid line -model fit.
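The dummy-atom idea can be illustrated with a deliberately small toy; this is not DAMMIN itself. Beads on a simple cubic grid (DAMMIN uses a denser close packing) fill a sphere of diameter D_max, the intensity is computed with the Debye formula over occupied beads, and single-bead flips are accepted with the Metropolis criterion while the temperature is lowered. The energy is a reduced χ² with an analytically fitted scale factor plus a hypothetical "looseness" penalty that stands in for DAMMIN's compactness penalties. The target curve is generated from a hypothetical prolate body, so no experimental data are involved; all names and parameter values are illustrative.

```python
import numpy as np


def make_grid(d_max, spacing):
    """Simple cubic grid of bead centres inside a sphere of diameter d_max (search space)."""
    n = int(np.ceil(d_max / (2 * spacing)))
    axis = np.arange(-n, n + 1) * spacing
    pts = np.array(np.meshgrid(axis, axis, axis, indexing="ij")).reshape(3, -1).T
    return pts[np.linalg.norm(pts, axis=1) <= d_max / 2 + 1e-9]


def pair_distances(points):
    return np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)


def sinc_kernel(dist, q):
    """K[k, i, j] = sin(q_k d_ij) / (q_k d_ij); np.sinc(x) = sin(pi x) / (pi x)."""
    return np.sinc(q[:, None, None] * dist[None, :, :] / np.pi)


def bead_intensity(kernel, x):
    """Debye sum over occupied beads (x_i = 1): I(q_k) = sum_ij x_i x_j K[k, i, j]."""
    return kernel @ x @ x


def scaled_chi2(i_model, i_exp, sigma):
    """Reduced chi^2 after fitting one scale factor by weighted least squares."""
    w = 1.0 / sigma ** 2
    scale = np.sum(w * i_model * i_exp) / np.sum(w * i_model ** 2)
    return np.mean(((scale * i_model - i_exp) / sigma) ** 2)


def anneal(dist, kernel, i_exp, sigma, spacing, n_steps=2000, loose_weight=1.0,
           t_factor=0.1, cooling=0.998, min_beads=5, seed=0):
    """Toy simulated annealing over bead occupancies; returns best model and energy."""
    rng = np.random.default_rng(seed)
    contact = ((dist > 0) & (dist < 1.01 * spacing)).astype(float)

    def energy(x):
        neighbours = contact @ x
        loose = np.sum(x * (neighbours < 2)) / x.sum()  # fraction of weakly connected beads
        return scaled_chi2(bead_intensity(kernel, x), i_exp, sigma) + loose_weight * loose

    x = np.ones(len(dist))           # start from the full search volume
    e = energy(x)
    best_x, best_e = x.copy(), e
    t = t_factor * e                 # initial temperature relative to the initial energy
    for _ in range(n_steps):
        trial = x.copy()
        k = rng.integers(len(x))
        trial[k] = 1.0 - trial[k]    # move one bead between particle and solvent
        if trial.sum() < min_beads:
            continue
        e_trial = energy(trial)
        # Metropolis criterion: accept improvements, accept worse moves with prob exp(-dE/T)
        if e_trial <= e or rng.random() < np.exp((e - e_trial) / t):
            x, e = trial, e_trial
            if e < best_e:
                best_x, best_e = x.copy(), e
        t *= cooling
    return best_x, best_e


# Hypothetical target: a prolate body cut out of a 30 Å search sphere.
spacing = 5.0
points = make_grid(30.0, spacing)
dist = pair_distances(points)
q = np.linspace(0.02, 0.3, 20)
kernel = sinc_kernel(dist, q)
target = ((points[:, 0] / 15) ** 2 + (points[:, 1] / 8) ** 2
          + (points[:, 2] / 8) ** 2 <= 1).astype(float)
i_exp = bead_intensity(kernel, target)
sigma = 0.02 * i_exp + 1e-3 * i_exp[0]
model, e_best = anneal(dist, kernel, i_exp, sigma, spacing)
print(f"{int(model.sum())} beads in model, {int(target.sum())} in target, energy {e_best:.3f}")
```

Running it with several seeds mimics the independent runs discussed above: the models differ bead by bead, while overall size and anisotropy tend to be shared.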

Fig. 12. Ab initio modeling of experimental SAXS data using dummy chain modeling. Left: optimized backbone structure (solid spheres) superimposed on the backbone from the known crystallographic structure of the protein (pdb entry 6lyz.pdb). Right: fit of experimental data. Open circles - experimental data. Solid line - model fit.

overall size, shape and anisotropy can be obtained from these approaches. Another very useful application in the study of proteins in solution is the use of known atomic-resolution data in connection with SAXS data. If the full atomic model of the protein is known, comparing the theoretical scattering intensity with the experimental data provides direct information on the conformation of the protein in solution relative to the atomic-resolution structure. A good fit indicates that the structure of the protein in solution is similar to that given by the atomic-resolution model; discrepancies in the fit indicate differences between the atomic-resolution model and the structure of the protein in solution. A widely cited procedure that enabled a successful comparison between experimental data and atomic-resolution structures was developed by Svergun and co-workers (program CRYSOL, Svergun et al, 1995), who demonstrated that a hydration shell around the protein, with an electron density slightly higher than that of the bulk solvent, had to be taken into account. One example of this comparison between experimental data and the theoretical SAXS intensity calculated from atomic coordinates is shown in Fig. 13, where the crystallographic structure of lysozyme was compared with experimental data for lysozyme in solution.
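Computing I(q) from atomic coordinates is the core of such comparisons. A minimal sketch uses the Debye formula I(q) = Σᵢ Σⱼ fᵢ fⱼ sin(q rᵢⱼ)/(q rᵢⱼ) with constant (q-independent) weights fᵢ, followed by a χ² with the scale factor fitted analytically. Unlike CRYSOL, it ignores atomic form factors, the excluded volume of the solvent and the hydration shell, so it is only a conceptual stand-in; function names are illustrative.

```python
import numpy as np


def debye_intensity(coords, weights, q):
    """I(q) = sum_ij w_i w_j sin(q r_ij)/(q r_ij) for a randomly oriented particle in vacuum."""
    coords = np.asarray(coords, dtype=float)
    w = np.asarray(weights, dtype=float)
    r = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    ww = np.outer(w, w)
    # np.sinc(x) = sin(pi x)/(pi x), so sinc(q r / pi) = sin(qr)/(qr), equal to 1 at r = 0
    return np.array([np.sum(ww * np.sinc(qk * r / np.pi)) for qk in np.atleast_1d(q)])


def scaled_chi2(i_model, i_exp, sigma):
    """Fit I_exp ~ c * I_model by weighted least squares; return (c, reduced chi^2)."""
    sigma = np.asarray(sigma, dtype=float)
    w = 1.0 / sigma ** 2
    c = np.sum(w * i_model * i_exp) / np.sum(w * i_model ** 2)
    chi2 = np.mean(((c * i_model - i_exp) / sigma) ** 2)
    return c, chi2


# Two unit scatterers 10 Å apart: I(q) = 2 + 2 sin(10 q)/(10 q).
pair = [[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]]
i_check = debye_intensity(pair, [1.0, 1.0], np.array([0.0, np.pi / 10]))

# A "measured" curve that is the model times 3 should give c = 3 and chi^2 = 0.
q_grid = np.linspace(0.01, 0.5, 50)
i_model = debye_intensity(pair, [1.0, 1.0], q_grid)
i_exp = 3.0 * i_model
c, chi2 = scaled_chi2(i_model, i_exp, sigma=0.01 * i_exp)
print(i_check, c, chi2)
```

A χ² close to 1 with realistic error bars would indicate agreement between the atomic model and the solution data; systematic deviations point to conformational differences or to missing hydration and solvent terms.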

Fig. 13. Comparison of experimental SAXS data with the theoretical intensity calculated from atomic coordinates. Left: representation of the crystallographic structure of lysozyme (pdb entry 6lyz.pdb). Right: fit of experimental data. Open circles - experimental data. Solid line - theoretical fit.

Fig. 14. Ab initio modeling of a missing loop of a hypothetical structure using experimental SAXS data. Left: crystallographic structure of lysozyme (pdb entry 6lyz.pdb) and the restored loop. Semitransparent structure - lysozyme structure with a missing part. Blue backbone - restored loop superposed on the real, clipped loop. Right: fit of the experimental data. Open circles - experimental SAXS data for lysozyme in solution. Dotted line - fit of the scattering data for the structure without the loop. Solid line - fit of the scattering data for the structure with the optimized loop.

Fig. 15. Rigid body modeling of a hypothetical structure using calculated SAXS data. Left: hypothetical heterodimer built from two atomic-resolution structures. Semitransparent spheres: original structure. Blue and green strands: optimized heterodimer. Right: fit of the generated data. Open circles - SAXS data generated for the heterodimer with the program CRYSOL from the built model; standard deviations were added to mimic experimental uncertainties. Solid line - fit of the scattering data for the optimized structure.

Fig. 16. Aggregation of lysozyme induced by heat. Top left: scattering data (open circles) and desmeared IFT fits (solid lines). The frames were collected at 80 °C at 5-minute intervals (first frame at 5 min, last at 60 min). A frame of lysozyme at room temperature (open triangles and dotted line) was added for comparison. Top right: pair distance distribution functions p(r) for each dataset; the p(r) obtained from the SAXS data for lysozyme at room temperature (dotted line) was added for comparison. Bottom: ab initio models restored for each frame, showing the increase in size of the average model. For comparison, the crystal structure of lysozyme is shown on the left as ribbons.

Fig. 17. Average number of monomers per aggregate as a function of time. At least two aggregation rates can be identified. The lines are only guides to the eye.

The forward scattering of the native protein and that of the first 5-min frame at 80 °C are almost identical, indicating that at this stage the protein is still monomeric. However, the differences in the scattering curve and in the p(r) function compared with the native state (Fig. 16) indicate that the protein has adopted a different conformation. This state is known as the molten globule, a partially denatured state of the protein. Interestingly, the protein starts to denature only at 80 °C, being stable at lower temperatures (data not shown). Using equation 18, the average number of monomers per aggregate was calculated; it is shown as a function of time in Fig. 17. At least two aggregation rates can be identified in this graph, which may indicate that the aggregation process is initially slow but, after 30 minutes at 80 °C and with around 5 monomers per aggregate, accelerates, reaching around 45 monomers per aggregate after 1 h at 80 °C. A visualization of the aggregates obtained is shown in Fig. 16, which agrees with the above conclusions. These results confirm other results from the literature, which indicate that the denaturation of lysozyme can be understood as a one-stage process (Hirai et al, 1998).

Fig. 19. A) Crystallographic structure of Lumbricus terrestris hemoglobin, from Royer et al. (2000). B) Dummy atom models calculated for the hemoglobin of G. paulistus with the program DAMMIN using P62 symmetry. The models are on the same scale.

Table 1 .
A more complete list of analytical expressions for form factors can be found in 1I q .Some examples are shown in