Unified oligonucleotide
1. Introduction
Microarrays make use of the hybridization properties of nucleic acids to monitor deoxyribonucleic acid (DNA) or ribonucleic acid (RNA) abundance on a genomic scale in different types of cells. Hybridization takes place between surface-bound DNA sequences, the probes, and the DNA or RNA sequences in solution, the targets. Hybridization is the process of combining complementary, single-stranded nucleic acids into a single molecule. Nucleotides bind to their complements under normal conditions, so two perfectly complementary strands bind to each other readily. Conversely, because of the different geometries of the nucleotides, even a single mismatch between the two strands destabilizes the duplex and can prevent them from binding.
In oligonucleotide microarrays hundreds of thousands of oligonucleotides are synthesized
The dynamics of the hybridization process underlying genomic expression are complex: the thermodynamic factors influencing molecular interaction are still an active field of research [1], and their effects are not taken into account by the algorithms currently used to estimate gene expression.
2. State of the art
Many techniques have been developed to identify trends in the expression levels inferred from DNA microarray data, and attention has recently turned to methods that obtain accurate expression levels from raw data on the basis of the underlying thermodynamics and hybridization kinetics. The development of DNA chips for rapidly screening and sequencing unknown DNA segments relies mainly on the ability to predict the thermodynamic stability of the complexes formed by the oligonucleotide probes.
The thermodynamics of nucleic acids have been studied from different points of view. Wu
An early study on DNA microarray hybridization [4] found that it depended strongly on the rate constants for DNA adsorption and desorption in the regions of the surface not covered by probes, on the two-dimensional diffusion coefficient, and on the size of probes and targets. The study also suggested that sparse probe coverage may give results equal to or better than those obtained with a surface fully covered with DNA probes. A theoretical analysis of the kinetics of DNA hybridization demonstrated that diffusion was important in determining the time required to reach equilibrium, and that this time was proportional to the equilibrium binding constant and to the concentration of binding sites [5].
Newer studies on hybridization kinetics and thermodynamics reveal that perfect match sequences require less time to reach saturation than mismatches. The experimental results of Dai
The hybridization of nucleic acids was modelled in [9] on the assumption that hybridization passes through an intermediate state in which an initial short contact region has a single-stranded conformation prior to binding.
Hybridization theory made it possible to develop models that yield improved measures of expression for data analysis. Naef and Magnasco [10] proposed a simple model of the probe effect that considers only the sequence composition of the probes. They demonstrated that the interactions between nearest neighbours add much predictive power for specific signal probe effects. The stochastic model proposed by Wu and Irizarry [11] can be used to improve the expression measure or in the normalization and summarization of the data.
3. DNA hybridization
DNA is a nucleic acid that contains the genetic instructions directing the biological development of all cellular forms of life and of many viruses. It is a long polymer of nucleotides and encodes the sequence of amino-acid residues in proteins using the genetic code, a triplet code of nucleotides. DNA is organized as two complementary strands running in opposite directions (head to tail), held together by hydrogen bonds. Each strand is a chain of chemical “building blocks” called nucleotides, of which there are four types: adenine (A), cytosine (C), guanine (G) and thymine (T). Between the two strands, each base can bond with only one predetermined partner: A with T and C with G.
Hybridization refers to the annealing of two nucleic acid strands following the base-pairing rule. As shown in Figure 1, at high temperatures of approximately 90°C to 100°C the complementary strands of DNA separate (denature), yielding single-stranded molecules. Under appropriate conditions of time and temperature, e.g. 65°C, two single strands renature to form the double-stranded molecule. Nucleic acid hybrids can be formed between two strands of DNA, two strands of RNA, or one strand of DNA and one of RNA. Nucleic acid hybridization is useful for detecting DNA or RNA sequences that are complementary to any isolated nucleic acid.
Finding the location of a gene or gene product by adding specific radioactive or chemically tagged probes for the gene and detecting the location of the radioactivity or chemical on the chromosome or in the cell after hybridization is called in situ hybridization.
In the same way, in microarray technology, hybridization is used in comparing mRNA abundance in two samples, or in one sample and a control. RNA from the sample and control are extracted and labeled with two different fluorescent labels,
In oligonucleotide microarrays hybridization occurs in the same way. The only difference is that the sequences laid over the chip are 25 nucleotides long: perfect match (PM) probes, perfectly complementary to a sequence of the same length in the gene, and mismatch (MM) probes, designed to correspond to the PM probes but with the middle base, the 13th, replaced by its complementary base, as in Figure 2. The MM probes give an estimate of the random hybridization and cross-hybridization signals. One principle to follow in the design of oligonucleotide arrays is to ensure that the probes bind to their targets with high accuracy. When two strands are completely complementary they bind by specific hybridization, as can be seen in Figure 3. If, on the contrary, the strands bind despite mismatches between their nucleotides, the process is called non-specific hybridization or cross-hybridization.
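The PM/MM construction described above can be sketched in a few lines of Python. This is only an illustration of the rule stated in the text, not an array-design pipeline: the function name `make_probe_pair` and the example 25-mer are hypothetical.

```python
# Watson-Crick complements of the four DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def make_probe_pair(pm, mismatch_position=13):
    """Return (PM, MM), where MM is PM with one base swapped for its complement.

    mismatch_position is 1-based, so 13 is the middle base of a 25-mer.
    """
    if not 1 <= mismatch_position <= len(pm):
        raise ValueError("mismatch position lies outside the probe")
    i = mismatch_position - 1
    mm = pm[:i] + COMPLEMENT[pm[i]] + pm[i + 1:]
    return pm, mm

# Hypothetical 25-nucleotide probe sequence, used only for illustration.
pm, mm = make_probe_pair("ATGCGTACGTTAGCCTAGGATCCAT")
print(pm)
print(mm)
```

The two printed sequences differ only at the 13th base, which is what lets the MM probe act as a rough control for non-specific binding.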
The hybridization process has been studied from the point of view of the interaction between base pairs, the interaction with unintended targets, and its kinetics. Because in practice DNA chips are immersed in the target solution for a relatively short time, reaching equilibrium is not guaranteed. Yet a full analysis of the reaction kinetics requires knowledge of the equilibrium state, which is also needed to identify the relative importance of kinetic controls on the performance of DNA microarrays. The effect of cross-hybridization on probe intensity is predictable in oligonucleotide microarrays, and models for avoiding it have been developed [14], [15], [16]; some aspects of these are described in the following section.
4. Technical factors affecting gene expression
4.1. Thermodynamics parameters
Black and Hartley [18] define enthalpy as the sum of the internal energy of a thermodynamic system and the energy associated with the work done by the system on the atmosphere, which is the product of the pressure and the volume, as in equation (1): H = U + pV, with H the enthalpy, U the internal energy, p the pressure and V the volume.
Because enthalpy is a property, its value can be determined for a simple compressible substance once two independent, intensive thermodynamic properties of the substance are known, and the change in enthalpy is independent of the path followed between two equilibrium states
In [18] the entropy,
where
The models described below use the state functions enthalpy and entropy. State functions define the properties of a thermodynamic state. In a change between two thermodynamic states, the change in value of the state function is given by the symbol Δ.
The standard enthalpy change,
The standard entropy change,
4.2. Interaction between pairs
The stability of a nucleic acid duplex is affected by the interaction between the nucleotide bases. The thermodynamics of double helix formation for DNA/DNA, RNA/RNA or DNA/RNA duplexes can be estimated with nearest-neighbour parameters. Enthalpy change,
The nearest-neighbour model for nucleic acids, known as the NN model, assumes that the stability of a given base pair depends on the identity and orientation of neighbouring base pairs [3]. Earlier studies of NN model parameters were reported in [15] and [19].
In the NN model, sequence-dependent stability is considered in terms of nearest-neighbour doublets. In duplex DNA there are 10 such unique internal nearest-neighbour doublets. Listed in the 5’-3’ direction, these are AT/AT, TA/TA, AA/TT, AC/GT, CA/TG, TC/GA, CT/AG, CG/CG, GC/GC and GG/CC. Dimer duplexes are represented with a slash separating strands in antiparallel orientation
The total difference in free energy between the folded and unfolded states of a DNA duplex can be approximated at 37°C with a nearest-neighbour model:
where
For a specific temperature one can compute the total free energy using the values from Table 1. As described in [19] the melting temperature
For self-complementary oligonucleotides, the
where
Sequence | ΔH° (kcal/mol) | ΔS° (cal/(K·mol)) |
AA/TT | -7.9 | -22.2 |
AT/TA | -7.2 | -20.4 |
TA/AT | -7.2 | -21.3 |
CA/GT | -8.5 | -22.7 |
GT/CA | -8.4 | -22.4 |
CT/GA | -7.8 | -21.0 |
GA/CT | -8.2 | -22.2 |
CG/GC | -10.6 | -27.2 |
GC/CG | -9.8 | -24.4 |
GG/CC | -8.0 | -19.9 |
Init. w/term G•C | 0.1 | -2.8 |
Init. w/term A•T | 2.3 | 4.1 |
Symmetry correction | 0 | -1.4 |
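As an illustration of how the Table 1 parameters combine, the sketch below sums the nearest-neighbour, initiation and symmetry terms for a short duplex and evaluates ΔG = ΔH − TΔS. Several points are assumptions rather than statements from the text: ΔH° is taken in kcal/mol and ΔS° in cal/(K·mol); one initiation term is added per duplex end, chosen by the terminal base pair; and the melting temperature uses the standard two-state expression Tm = ΔH°/(ΔS° + R ln(C_T/x)), with x = 4 for non-self-complementary strands and x = 1 for self-complementary ones. The function names and the example sequence are hypothetical.

```python
import math

# Table 1 values as (dH in kcal/mol, dS in cal/(K*mol)), keyed by the 5'->3'
# dinucleotide of one strand, e.g. "AA" stands for the doublet AA/TT.
NN = {
    "AA": (-7.9, -22.2), "AT": (-7.2, -20.4), "TA": (-7.2, -21.3),
    "CA": (-8.5, -22.7), "GT": (-8.4, -22.4), "CT": (-7.8, -21.0),
    "GA": (-8.2, -22.2), "CG": (-10.6, -27.2), "GC": (-9.8, -24.4),
    "GG": (-8.0, -19.9),
}
INIT_GC = (0.1, -2.8)    # initiation with terminal G.C
INIT_AT = (2.3, 4.1)     # initiation with terminal A.T
SYMMETRY = (0.0, -1.4)   # correction for self-complementary duplexes
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
R = 1.987                # gas constant in cal/(K*mol)

def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def duplex_thermo(seq):
    """Return (dH, dS) for the duplex of seq (5'->3') with its complement."""
    dh = ds = 0.0
    for i in range(len(seq) - 1):
        step = seq[i:i + 2]
        if step not in NN:               # e.g. "TT" is the same doublet as "AA"
            step = reverse_complement(step)
        dh += NN[step][0]
        ds += NN[step][1]
    for end in (seq[0], seq[-1]):        # one initiation term per duplex end
        init = INIT_GC if end in "GC" else INIT_AT
        dh += init[0]
        ds += init[1]
    if seq == reverse_complement(seq):   # self-complementary duplex
        dh += SYMMETRY[0]
        ds += SYMMETRY[1]
    return dh, ds

def free_energy(seq, temp_c=37.0):
    """dG = dH - T*dS in kcal/mol at temp_c degrees Celsius."""
    dh, ds = duplex_thermo(seq)
    return dh - (temp_c + 273.15) * ds / 1000.0

def melting_temp_c(seq, total_strand_conc):
    """Two-state melting temperature in Celsius (assumed standard formula)."""
    dh, ds = duplex_thermo(seq)
    x = 1.0 if seq == reverse_complement(seq) else 4.0
    return dh * 1000.0 / (ds + R * math.log(total_strand_conc / x)) - 273.15

example = "CGTTGA"                       # hypothetical short duplex
dg37 = free_energy(example)
tm = melting_temp_c(example, 1e-4)       # 0.1 mM total strand concentration
print(dg37, tm)
```

Because the parameters were measured in 1 M Na+, the numbers from such a sketch apply to that salt condition only.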
The nearest-neighbour parameters of Delcourt et al. (1991) [20], SantaLucia et al. (1996) [19], Sugimoto et al. (1996) [15] and Allawi et al. (1997) [21] were evaluated from the analysis of optical melting curves of a variety of short synthetic DNA duplexes in 1 M Na+.
The observed trend in nearest-neighbour stabilities at 37°C is GC/CG = CG/GC > GG/CC > CA/GT = GT/CA = GA/CT = CT/GA > AA/TT > AT/TA > TA/AT, as in Table 2. This trend suggests that both sequence and base composition are important determinants of DNA duplex stability. It has long been recognized that DNA stability depends on the percent G-C content.
Sequence | Delcourt et al. [20] | SantaLucia et al. [19] | Sugimoto et al. [15] | Allawi et al. [21] |
AA/TT | -0.67 | -1.02 | -1.20 | -1.00 |
AT/TA | -0.62 | -0.73 | -0.90 | -0.88 |
TA/AT | -0.70 | -0.60 | -0.90 | -0.58 |
CA/GT | -1.19 | -1.38 | -1.70 | -1.45 |
GT/CA | -1.28 | -1.43 | -1.50 | -1.44 |
CT/GA | -1.17 | -1.16 | -1.50 | -1.28 |
GA/CT | -1.12 | -1.46 | -1.50 | -1.30 |
CG/GC | -1.87 | -2.09 | -2.80 | -2.17 |
GC/CG | -1.85 | -2.28 | -2.30 | -2.24 |
GG/CC | -1.55 | -1.77 | -2.10 | -1.84 |
Average | -1.20 | -1.39 | -1.64 | -1.42 |
Init. w/term G•C | NA | 0.91 | 1.70 | 0.98 |
Init. w/term A•T | NA | 1.11 | 1.70 | 1.03 |
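The two tables can be cross-checked against each other with the relation ΔG = ΔH − TΔS. The sketch below recomputes the free energy at 37°C (310.15 K) from the Table 1 values and compares it with the last column of Table 2. It assumes that ΔS° in Table 1 is expressed in cal/(K·mol) and that the last column of Table 2 belongs to the same parameter set; the variable names are hypothetical.

```python
T37 = 310.15  # 37 degrees Celsius in kelvin

# Table 1: (dH in kcal/mol, dS in cal/(K*mol)) per nearest-neighbour doublet.
TABLE1 = {
    "AA/TT": (-7.9, -22.2), "AT/TA": (-7.2, -20.4), "TA/AT": (-7.2, -21.3),
    "CA/GT": (-8.5, -22.7), "GT/CA": (-8.4, -22.4), "CT/GA": (-7.8, -21.0),
    "GA/CT": (-8.2, -22.2), "CG/GC": (-10.6, -27.2), "GC/CG": (-9.8, -24.4),
    "GG/CC": (-8.0, -19.9),
}
# Table 2, last column: free energy at 37 degrees Celsius in kcal/mol.
TABLE2_LAST = {
    "AA/TT": -1.00, "AT/TA": -0.88, "TA/AT": -0.58, "CA/GT": -1.45,
    "GT/CA": -1.44, "CT/GA": -1.28, "GA/CT": -1.30, "CG/GC": -2.17,
    "GC/CG": -2.24, "GG/CC": -1.84,
}

def dg37(dh, ds):
    """Free energy at 37 degrees Celsius from dH (kcal/mol) and dS (cal/(K*mol))."""
    return dh - T37 * ds / 1000.0

differences = {nn: dg37(*TABLE1[nn]) - TABLE2_LAST[nn] for nn in TABLE1}
max_difference = max(abs(d) for d in differences.values())
print(max_difference)
```

The recomputed values stay within a few hundredths of a kcal/mol of the tabulated ones, which is the size of deviation expected from rounding in the tables.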
On the other hand, the nearest neighbour
4.3. Interaction with unintended targets
As seen in the previous sections, the major issue in oligonucleotide microarray technology is the selection of probe sequences with high sensitivity and specificity. It has been shown [22] that using MM probes to assess non-specific binding is unreliable. Since duplex formation in solution has been studied using the nearest-neighbour model [3], [15], probe selection in microarray design has been addressed with a model based on it [16]. The model of Zhang
According to their method, the observed signal
where
where
The positional-dependent-nearest-neighbour model indicates, through its weight factors, that the two ends of a probe contribute less to binding stability; see Figure 4a. There is also a dip in the gene-specific binding weight factors of MM probes around the mismatch position, probably because the mismatch destabilizes the duplex structure. Figure 4b shows that the stacking energies in the positional-dependent-nearest-neighbour model can explain the presence of negative probe pair signals.
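A rough sketch of how position-dependent weights enter such a model is given below. It assumes the form in which this kind of model is usually written, since the equations are not reproduced here: a binding energy E = Σ_k ω_k ε(b_k, b_k+1) summed over adjacent base pairs, and a signal N/(1 + exp(E)) + N*/(1 + exp(E*)) + B made of gene-specific binding, non-specific binding and background. The weights and stacking energies in the code are placeholders invented for illustration, not fitted values from [16], and all function names are hypothetical.

```python
import math

def position_weights(n_steps):
    """Placeholder weights: smallest at both probe ends, largest in the middle."""
    mid = (n_steps - 1) / 2
    return [1.0 - 0.8 * ((k - mid) / mid) ** 2 for k in range(n_steps)]

def stacking_energy(pair):
    """Placeholder stacking energy: more negative for each G or C in the pair."""
    return -0.5 - 0.5 * sum(base in "GC" for base in pair)

def binding_energy(probe, weights):
    """Weighted sum of stacking energies over adjacent bases of the probe."""
    if len(weights) != len(probe) - 1:
        raise ValueError("need one weight per adjacent base pair")
    return sum(w * stacking_energy(probe[k:k + 2]) for k, w in enumerate(weights))

def pdnn_signal(n_specific, e_specific, n_nonspecific, e_nonspecific, background):
    """Gene-specific term + non-specific term + background (assumed form)."""
    return (n_specific / (1.0 + math.exp(e_specific))
            + n_nonspecific / (1.0 + math.exp(e_nonspecific))
            + background)

probe = "ATGCGTACGTTAGCCTAGGATCCAT"          # hypothetical 25-mer
weights = position_weights(len(probe) - 1)   # 24 nearest-neighbour steps
energy = binding_energy(probe, weights)
# For brevity the same energy is reused for both terms; a fitted model would
# use separate weights and energies for specific and non-specific binding.
signal = pdnn_signal(1000.0, energy, 200.0, energy, 50.0)
print(energy, signal)
```

With weights shaped like this, changing a base near the middle of the probe shifts the energy far more than changing one at either end, which mirrors the behaviour described for the weight factors above.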
This model, together with the nearest-neighbour model, addresses the problem of binding on microarrays, but other factors still affect the measurement of gene expression. One of them is the process of competing adsorption and desorption of target RNA to form probe-target duplexes at the chip surface.
4.4. Kinetic processes in hybridization thermodynamics
4.4.1. Derivation of the Langmuir isotherm
For molecules in contact with a solid surface at a fixed temperature, the Langmuir isotherm, developed by Irving Langmuir in 1916, describes the partitioning between the gas phase and adsorbed species as a function of applied pressure.
The adsorption process between gas-phase molecules, A, vacant surface sites, S, and occupied surface sites, SA, can be represented by the chemical equation S + A ⇌ SA, assuming that there is a fixed number of surface sites on the surface, as in Figure 5.
When considering adsorption isotherms it is conventional to adopt a definition of surface coverage (
4.4.2. Thermodynamic derivation
An equilibrium constant
where:
[SA] is proportional to the surface coverage of adsorbed molecules, or proportional to
[S] is proportional to the number of vacant sites, (1 –
[A] is proportional to the pressure of gas,
Thus it is possible to define another equilibrium constant,
Rearranging the equations (10) and (11) one can obtain the expression for surface coverage:
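In its usual notation the Langmuir isotherm reads θ = bP / (1 + bP), where θ is the fractional surface coverage, P the gas pressure and b the pressure-based equilibrium constant. The symbol names are assumed here to follow that convention. A minimal Python sketch of the relation, with a hypothetical function name and arbitrary example values:

```python
def langmuir_coverage(b, pressure):
    """Fractional surface coverage theta = b*P / (1 + b*P)."""
    if b < 0 or pressure < 0:
        raise ValueError("b and pressure must be non-negative")
    return b * pressure / (1.0 + b * pressure)

# Coverage grows almost linearly at low pressure and saturates towards 1.
for p in (0.0, 0.1, 1.0, 10.0, 100.0):
    print(p, langmuir_coverage(2.0, p))
```

The same functional form reappears for microarrays when pressure is replaced by target concentration, which is why the isotherm is derived here.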
4.4.3. Kinetic derivation
The equilibrium that may exist between gas adsorbed on a surface and molecules in the gas phase is a dynamic state,
The rate of adsorption will be proportional to the pressure of the gas and the number of vacant sites for adsorption. If the total number of sites on the surface is
The rate of change of the coverage due to the adsorbate leaving the surface (desorption) is proportional to the number of adsorbed species:
In these equations,
4.4.4. Dynamic adsorption model
Burden
For the initial condition
where
Using equation (16) Burden
At equilibrium, the intensity
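The time course implied by a dynamic adsorption model of this kind can be sketched as follows. The code assumes a Langmuir-type rate equation dθ/dt = k_f·x·(1 − θ) − k_b·θ with θ(0) = 0, where x is the target concentration and k_f, k_b are forward and backward rate constants. These symbol names and all numerical values are assumptions made for illustration, since the equations are not reproduced in the text. The closed-form solution is compared with a simple Euler integration of the same rate equation.

```python
import math

def coverage_closed_form(x, t, k_f, k_b):
    """theta(t) = x/(x+K) * (1 - exp(-(x+K)*k_f*t)), with K = k_b/k_f."""
    K = k_b / k_f
    return x / (x + K) * (1.0 - math.exp(-(x + K) * k_f * t))

def coverage_euler(x, t, k_f, k_b, steps=20000):
    """Integrate d(theta)/dt = k_f*x*(1-theta) - k_b*theta from theta(0) = 0."""
    theta = 0.0
    dt = t / steps
    for _ in range(steps):
        theta += dt * (k_f * x * (1.0 - theta) - k_b * theta)
    return theta

# Arbitrary illustrative values for concentration, rates and time.
x, k_f, k_b = 1.0, 0.5, 0.25
early = coverage_closed_form(x, 2.0, k_f, k_b)
numeric = coverage_euler(x, 2.0, k_f, k_b)
late = coverage_closed_form(x, 1000.0, k_f, k_b)
print(early, numeric, late)
```

At long times the coverage tends to x/(x + K), the Langmuir isotherm with concentration in place of pressure; at short times it grows almost linearly in t.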
5. Hybridization dynamics compensation
5.1. Modelling hybridization by thermodynamics
Hybridization processes can be viewed in terms of general thermodynamic conditions [23]: the hybridization probability of a given test segment is determined by its thermodynamic conditions,
where
where
Recent studies [24], [25] confirm the hypothesis that the hybridization process for each of the probe pairs follows a time model like the one in Figure 7. This model predicts that the probability of hybridization is almost zero if too little time is allowed for the experiment, and that saturation is reached in the limit of long times.
The differing hybridization dynamics can be handled in practice by using multiple regression to bring PM-MM probe pairs to equivalent thermodynamic conditions, processing diachronic hybridization experiments [26].
This procedure is explained in more detail in the following paragraphs.
5.2. Exponential regression model
From equation (20) one can assume that a model to solve the multiple regression problem implicit in this study will have the following form:
where
Vertical least-squares fitting proceeds by finding the sum of the squares of the vertical deviations
where:
is the estimation error incurred for each component.
With this notation equation (22) becomes:
The condition of
From equations (24), (25) and (26) one will obtain:
A solution for equations (27) and (28) can be found using the gradient method. In this case the parameters are computed adaptively:
where
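The gradient-based fit can be illustrated with a small Python sketch. It assumes a saturating-exponential model y = a·(1 − exp(−b·s)), which matches the time behaviour described earlier but is not necessarily the exact form of equation (21), and it minimizes the mean of the squared vertical deviations with a fixed step size. Time is normalized to s = t / t_max so that both gradient components have comparable scale, and the data are assumed to be scaled to order one. The function name, starting values, step size and synthetic data are all hypothetical.

```python
import math

def fit_saturating_exponential(times, values, a=0.5, b=1.0,
                               step=0.3, iterations=20000):
    """Fit y = a*(1 - exp(-b*s)), s = t/max(times), by gradient descent.

    Returns (a, tau), where tau = max(times)/b is the time constant
    expressed in the same units as `times`.
    """
    t_max = max(times)
    s = [t / t_max for t in times]
    n = len(s)
    for _ in range(iterations):
        grad_a = 0.0
        grad_b = 0.0
        for s_i, y_i in zip(s, values):
            decay = math.exp(-b * s_i)
            error = y_i - a * (1.0 - decay)          # vertical deviation
            grad_a -= 2.0 * error * (1.0 - decay) / n
            grad_b -= 2.0 * error * a * s_i * decay / n
        a -= step * grad_a                           # adaptive parameter update
        b -= step * grad_b
    return a, t_max / b

# Synthetic, noise-free data: amplitude 1 and a 10-minute time constant,
# sampled every minute from 0 to 30 minutes.
times = list(range(31))
values = [1.0 - math.exp(-t / 10.0) for t in times]
amplitude, tau = fit_saturating_exponential(times, values)
print(amplitude, tau)
```

A fixed step size keeps the sketch short; amplitude and time constant are strongly correlated in this model, so convergence along that direction is slow and many iterations are needed.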
5.3. Application for experimental data
The experimental part was complemented with artificially simulated test probes used for algorithmic validation. A diachronic database was also produced to estimate hybridization time constants for different gene segments.
Under these assumptions, data records were created from experimental data fitted by the above-mentioned models. The time dynamics of hybridization for both probe sets and their profiles were evaluated at certain time intervals.
Firstly, the diachronic data distribution for an evolution from 0 to 30 minutes is shown in Figure 8 in both cases, for the PM probe set and the MM probe set, and in the following figures,
The next step in the data analysis was to look at the probe profiles at certain times. Figure 11 shows the regression parameters obtained for the time constants. The profiles of the perfect match and mismatch probes were extracted at two different times, underlining the fact that, if enough time is allowed, the mismatches of some probes will also hybridize completely.
Applying the regression algorithm, we observed that it searches for matching values of the expression levels of the probe sets and for the estimated values of the perfect match and mismatch probes. One step of this iterative algorithm can be seen in Figure 12.
Once the iterative process was complete, certain probes had reached their target. In the expression level estimation most of the perfect match probes reached the expected values, while some of the mismatch probes did not reach their target (Figure 13). Similar results were obtained when matching the hybridization time constants.
6. Conclusions
This chapter studied the thermodynamics of oligonucleotide hybridization processes in which PM-MM results do not show the expected behaviour, which affects the reliability of expression estimation. The following conclusions were drawn:
Modelling the hybridization process through thermodynamic principles reproduces exponential-like behaviour for each P-T segment pair.
The hybridization process should be confined to the time interval where linear growth is guaranteed, that is, to the beginning of the exponential curve shown in Figure 6.
Adaptive fitting may be used to predict and regress expression levels on a specific test probe to common thermodynamic conditions. Time constants may be inferred from the regression parameters adaptively.
The main features of the PM-MM probe sets may be reproduced from probabilistic modelling.
It may be expected that more precise and robust estimates could be produced by applying this technique to diachronic hybridization experiments.
Acknowledgement
This work was supported by the project "Development and support of multidisciplinary postdoctoral programmes in major technical areas of national strategy of Research - Development - Innovation" 4D-POSTDOC, contract no. POSDRU/89/1.5/S/52603, project co-funded by the European Social Fund through Sectoral Operational Programme Human Resources Development 2007-2013.
References
1. Malutan R., Gómez Vilda P., Berindan Neagoe I., Borda M. (2011). 93: 255-261.
2. Wu P., Nakano S., Sugimoto N. (2002). 269: 2821-2830.
3. SantaLucia J. Jr. (1998). PNAS on Biochemistry 95: 1460-1465.
4. Chan V., Graves D. J., McKenzie S. E. (1995). The Biophysics of DNA Hybridization with Immobilized Oligonucleotides Probes. Biophysical Journal 69: 2243-2255.
5. Livshits M. A., Mirzabekov A. D. (1996). 71: 2795-2801.
6. Dai H., Meyer M., Stepaniants S., Ziman M., Stoughton R. (2002). e86.1-e86.8.
7. Dorris D. R. et al. (2003). 6.
8. Binder H., Preibisch S. (2005). Biophysical Journal 89: 337-352.
9. Wang J. Y., Drlica K. (2003). Modelling hybridization kinetics. Mathematical Bioscience 183: 37-47.
10. Naef F., Magnasco M. O. (2003). Physical Review E 68: 011906-1-011906-4.
11. Wu Z., Irizarry R. A. (2004). Proc. of the 8th Annual International Conference on Research in Computational Molecular Biology: 98-106.
12. www.accessexcellence.org/RC/VL/GG/index.html
13. Lipshutz R. L., Fodor S. P. A., Gingeras T. R., Lockhart D. J. (1999). High density synthetic oligonucleotide arrays. Nature Genetics Supplement 21: 20-24.
14. Burden C., Pittelkow Y. E., Wilson S. R. (2004).
15. Sugimoto N. et al. (1996). Nucleic Acids Research 24: 4501-4505.
16. Zhang L., Miles M. F., Aldape K. D. (2003). 21 (7): 818-821.
17. Huang J. C., Morris Q. D., Hughes T. R., Frey B. J. (2005). i222-i231.
18. Black W. Z., Hartley J. G. (1991). Thermodynamics. Second Edition, SI Version. Harper Collins Publisher.
19. SantaLucia J. Jr., Allawi H. T., Seneviratne P. A. (1996). 35 (11): 3555-3562.
20. Delcourt S. G., Blake R. D. (1991). 266: 15160-15169.
21. Allawi H. T., SantaLucia J. Jr. (1997). Thermodynamics and NMR of Internal G•T Mismatches in DNA. Biochemistry 36: 10581-10594.
22. Li C., Wong W. H. (2001). PNAS USA 98 (1): 31-36.
23. El Samad H., Khammash M., Petzold L., Gillespie D. (2005). Int. Journal of Robust and Nonlinear Control 15 (15): 691-711.
24. Dai H., Meyer M., Stepaniants S., Ziman M., Stoughton R. (2002). e86.1-e86.8.
25. Zhang Y., Hammer D. A., Graves D. J. (2005). Biophysical Journal 89: 2950-2959.
26. Diaz F., Malutan R., Gomez P., Martinez R., Stetter B., Fe Paz M., Garcia E., Pelaez J. (2006). Estimating Oligonucleotide Microarray Expression by Hybridization Process Modelling. Proc. of IEEE/NLM Life Science Systems and Applications Workshop: 1-2.