## 1. Introduction

Many DNA biotechnological applications, such as PCR or cDNA expression profiling, depend on sequence-dependent thermodynamic parameters; the strand melting temperature is one example. More generally, physical properties of DNA or RNA sequences can be calculated in a very simple form within nearest-neighbor (NN) models, whose core characteristic is to provide linear representations of experimental measurements on nucleotide chains in terms of pairwise (dimer) sequence contributions.

However, NN dimer parameters cannot be assigned from experiments by solving a set of simultaneous linear equations. This has been known since the beginning of the development of these models in the context of polynucleotide thermodynamic studies [1]. In fact, when intrinsic composition closure constraints are considered, the number of degrees of freedom of the model is effectively reduced.

Dimer occurrence relations are well known and allow sequence properties to be decomposed into dimer contributions. For that reason, many authors have preferred dimers as fundamental units, since they provide the most straightforward decomposition scheme [2–6]. Although dimer set values fit easily into the theoretical NN model approximation, the dimer composition is overstated: the dimer set size, 16 for a single chain and 10 for double chains [2–7], is greater than the number of degrees of freedom of the problem. As a corollary, so-far-unknown constraints must link the full dimer set properties in some hidden way to restore full set unity. Alternative approaches have considered decompositions into irreducible, and hence smaller, sets of short sequences or dimer combinations [8–11]. Comparison between sets from different laboratories and physical interpretation of set values then become difficult because of the arbitrariness of the possible renderings. The extraction of simpler and more direct dimer contributions from such sets has remained an ill-posed problem with no unique solution, yet the dimer formulation is still embraced by a large community of biochemists [2–6]. To retain it, different authors have adopted ad hoc regularization hypotheses, such as the singular value decomposition method [4, 12].

In this review, among other objectives, we present an approach to this problem based on the analysis of how the intrinsic intermolecular symmetries of nucleotides contribute to the structure of NN sets, as proposed by Licinio and Guerra [13]. To that end, we first introduce a general quantum mechanics statement that gives physical properties for a sequence of heterogeneous molecules, treated as subsystems that assume any of a given complete set of molecular states. The four-nucleotide set has a corresponding four-state representation. At this point, a careful choice of the number of degrees of freedom is made in order to project the representation into a three-dimensional molecular class space. Conveniently, the three independent molecular classes are readily associated with the main biochemical classification of nucleotides into purine–pyrimidine, amino–keto, and strong–weak bases. The representation of the four-nucleotide set as a tetrahedron in three-dimensional space is at the heart of the approach. This representation has been used to generate DNA walks for sequence composition analysis or display. The corresponding proper space metrics have also recently been used for phylogenetic sequence comparisons [14]. In the following, we contract the original quantum mechanics statement into an irreducible formulation using the four-nucleotide tetrahedron representation. This molecular symmetrical decomposition is found to provide the right number of fundamental properties (free parameters), which is equal to 8 for DNA double strands. We shall refer to these fundamental properties as constituting a symmetrical set of irreducible tensorial parameters. Next, we relate this decomposition to the dimer set formulation. The comparison uncovers useful and so far hidden self-consistency relations among dimers.

However, an important point still needs to be clarified. Many publications include datasets with experimental values for duplex oligonucleotides in which end effects were believed to be important [2–6]. Nevertheless, such initiation and termination parameters seem to be very sensitive to the modeling and have changed considerably even within the same research group [3–6]. In fact, Xia et al. had already argued that data from melting experiments of RNA duplexes are of insufficient accuracy to distinguish end effects [15]. With this motivation, as a second step in the development of the approach presented in this review, we extended the irreducible model to investigate how it would accommodate end effects. Guerra and Licinio performed this extension and calculated the irreducible parameters for free energy, entropy, and enthalpy, together with the respective end contributions [16]. A detailed algorithm for performing such calculations is described later. At this point, however, it is necessary to anticipate some conclusions. Guerra and Licinio obtained values for the end effects with relatively large errors. In addition, specifically for free energy, they could not distinguish between weak and strong terminal base pairs. In the light of their findings, a simple statistical mechanics treatment of the melting transition shows that the approach based on end effects, within the NN approach, proves to be naive, even heuristic. In fact, since the end effects were initially (wrongly) identified with the nucleation free energies, they should depend on the mean global composition of the chain. However, an only slightly more detailed statistical mechanics approach shows that, in addition to the eight (polymeric) irreducible parameters for free energy already mentioned, there are two further parameters related to the initiation of the double helix (one for each of the two possible base pairings). That is, in the light of the NN approach, 10 parameters expand the free energy of any DNA oligomer [17].

Before continuing the discussion in the forthcoming sections, we note that all the theoretical results obtained were applied to the analysis of DNA free energy. The formulation of end contributions is introduced into the model first, as presented later in this chapter. A simple statistical mechanics approach is then applied to the problem, yielding a second set of parameters that this time includes the initiation parameters. In both cases, a self-consistent set has been fit to free energy data from 108 short duplex oligomer sequences available in the literature. We will show that, with either modeling, the first based on end effects and the second on double helix initiation parameters, the more compact and symmetrical self-consistent set models oligomer free energy at least as well as standard NN dimer models. The far-reaching strength of the proposed theoretical frame for DNA or RNA sequences resides in its compactness and symmetry. As will be discussed later in this review, one of the immediate and practical consequences of the tetrahedral model is the disclosure of the initially hidden dimer self-consistency relations.

## 2. A quantum mechanics formulation for sequence properties

Complexity in biological phenomena represents an enormous challenge and a rich field for the application and development of physical methods. To unfold simple biopolymer phenomena, we start from a biochemically meaningful representation of nucleotides in molecular classes and rely on tools provided by quantum mechanics, specifically the formulation based on the matrix representation. What is needed from the start is a base set for the description of the states of the system, which, for us, is a DNA or RNA sequence. The ensemble of sequence states is given by the allowable sequence composition alone. We want to describe and isolate gross composition states. Inner electronic states or molecular conformation contributions, which would require a much finer level of quantum description, are intrinsically averaged. State transitions are of course forbidden if one neglects mutations. The sequence state is given in terms of its molecular constitution, and the nucleotide set representation conditions the sequence representation.

The quantum mechanics expectation for any observable is given in terms of the corresponding operator $\Theta$ and the system state $|\psi\rangle$ as

$$\langle\Theta\rangle = \langle\psi|\Theta|\psi\rangle .$$

The state of a system of *N* particles or molecules is usually expressed as the tensorial product of their component states $|e_i\rangle$ ($1 \le i \le N$):

$$|\psi\rangle = |e_1\rangle \otimes |e_2\rangle \otimes \cdots \otimes |e_N\rangle .$$

For *d*-dimensional component states, this would lead a priori to the specification of $(Nd)^2$ operator matrix elements. In the NN approximation, only the self-submatrices and the cross-submatrices between adjacent components contribute (Eq. 2).

Here, submatrix elements pertaining to the same component at position *i* (diagonal or self-matrices, $i \ne 1, N$) should be halved because they are counted twice in this formulation (see Fig. 1). Further reduction can be expected from the implicit symmetries of the Hermitian $\Theta$ matrix and its invariants under orthonormal base representations.

## 3. Nucleotide class-state representation

The most straightforward representation for a four-nucleotide set is, obviously, a four-dimensional vector. This “independent-nucleotide” representation has been implicitly adopted by many authors and leads to 4 × 4 matrices, or 16-parameter sets, when considering nucleotide pairwise properties [11]. This representation, however, overstates the nucleotide composition problem from the beginning. The set representation should be more concisely established in a three-dimensional space. Thus, a complete and symmetrical representation for the usual DNA (or RNA) four-nucleotide set can be given within a tetrahedral decomposition scheme in a three-dimensional orthonormal base set $\{\hat{x}, \hat{y}, \hat{z}\}$.

The nucleotides themselves are represented as a nonorthogonal (tetrahedral) set of vectors in this base (Eq. 3), in which the *z*-component discriminates weak (two bridges, AT) versus strong (three bridges, CG) hydrogen bonding for Watson–Crick (WC) pairing; the *x*-component discriminates purine (double ring, AG) versus pyrimidine (single ring, CT) nucleotide sizes; and the *y*-component discriminates amino (nitrogen containing, AC) versus keto (oxygen containing, GT) nucleotide radicals.

In quantum mechanics language, a

Each possible nucleotide pair shares one of its fundamental molecular structural characteristics as a group in a given class, which differs from the complementary pair as another group in the same class. This is apparent in Eq. 3, which reflects the intrinsic cubic symmetry of the tetrahedron. Our approach therefore uses a complete nucleotide representation and, on that basis, decomposes the properties associated with each molecule in terms of three differential affinity groups or classes. The choice of a tetrahedral set is thus natural and convenient for its intrinsic orthogonality and symmetry properties, which are related to common molecular group classifications. Its main advantage, however, is that it fulfills the need for a three-dimensional bijective representation of a four-set composition.
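The class-state picture can be checked numerically with a minimal sketch. The sign convention below (+1 for purine, amino, and weak) is an assumption made only for illustration; the source's Eq. 3 may use different signs, and the names `NUCLEOTIDE`, `COMPLEMENT`, and `flip_xy` are hypothetical.

```python
import itertools
import numpy as np

# Assumed sign convention (illustrative only): +1 on each axis marks
# purine (x), amino (y) and weak (z); -1 marks pyrimidine, keto, strong.
NUCLEOTIDE = {
    "A": np.array([+1, +1, +1]),  # purine, amino, weak
    "C": np.array([-1, +1, -1]),  # pyrimidine, amino, strong
    "G": np.array([+1, -1, -1]),  # purine, keto, strong
    "T": np.array([-1, -1, +1]),  # pyrimidine, keto, weak
}
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# The four vectors sum to zero, as expected for the vertices of a
# tetrahedron centered at the origin.
centroid = sum(NUCLEOTIDE.values())

# Every pair of distinct nucleotides has the same dot product, so the set
# is symmetric but nonorthogonal.
pair_dots = {
    a + b: int(NUCLEOTIDE[a] @ NUCLEOTIDE[b])
    for a, b in itertools.combinations("ACGT", 2)
}

# Watson-Crick complementation inverts the x and y coordinates and keeps z.
flip_xy = np.array([-1, -1, +1])

print(centroid, pair_dots)
```

With this choice, four class labels on three binary axes pick out exactly four of the eight cube corners, which is one way to see the cubic symmetry mentioned above.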

## 4. Irreducible representation

Returning to the quantum mechanics formulation, our intention is to exploit the remaining invariants and redundancies in the structure of the matrix operator in Eq. 2 in order to further reduce its number of parameters, keeping the three-dimensional nucleotide basis in mind. The sequence-dependent states of an observable then assume discrete values given by the most compact expansion of its expectation:

$$\langle\Theta\rangle = N S + \sum_{i} \langle V | e_i \rangle + \sum_{i} \langle e_i | M | e_{i+1} \rangle \qquad (4)$$

in substitution for Eq. 2. In Eq. 4, $|e_i\rangle$ denotes the state of the nucleotide at position *i*, given in terms of class states by Eq. 3.

The bracket notation indicates vector and dyadic contractions as usual. The expansion in Eq. 4 is quite intuitive, in the sense that the first two terms represent linear contributions to a property from the sequence composition, whereas the third term comprises nonlinear effects due to NN interference or differential stacking interactions. Comparison with Eq. 2 allows the identification of its components. The first term, *S*, is a constant or mean contribution to the observable, given as the invariant trace of the square expectation periodic matrix; *V* is a vector of composition contributions; and *M* is a second-rank tensor whose elements are given from the cross expectation matrix.

Decomposition of the nucleotide sequence observable expectation as given in Eq. 4 naturally leads to an irreducible 13-parameter description of physical properties (*S*, *V*, *M*), which can be compared with the usual 5′-3′ NN dimer set (Eq. 5).

However, the NN dimer set is overspecified, that is, only a smaller set of NN combinations can be a priori obtained from inversions of Eq. 5 because Eq. 5 is supplemented by independent composition closure relations. For implicit circular sequences (or for very long sequences, i.e., polynucleotides), these can be taken as any three of the following:

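$$\sum_{Y} N_{XY} = \sum_{Y} N_{YX}, \qquad X, Y \in \{A, C, G, T\}, \qquad (6)$$

with $N_{XY}$ the number of 5′-XY-3′ dimers in the sequence,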

reducing the number of independent dimers in the set to an arbitrary 13. Similar arguments hold for linear oligomers. In comparison, the decomposition of physical properties in the symmetrical set proposed here is at a fundamental level: from the beginning, it includes only a priori linearly independent terms and gives contributions to the observable in the hierarchic form of three expectation tensors of increasing rank, corresponding to different levels of analysis. The 16 NN expectations can otherwise be easily obtained as linear combinations of the 13 symmetrical-set tensor components. In that case, it is useful to rewrite Eq. 4 in a form appropriate for NN dimer decomposition as follows:

where, to correctly account for additivity, as given by Eq. 5 for each dimer in a sequence, the two nucleotide linear contributions are halved. Explicitly, applying Eq. 3 to Eq. 7 gives:

(8) |

and so on. Tensor elements can be either conversely determined from reported dimer values or self-consistently derived from fits to raw polynucleotide data using Eqs. 8 and 5, or directly from Eq. 4, while from a theoretical point of view, molecular symmetry arguments or ab initio calculations could be used to guess tensor structure and values.
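The parameter counting above can be illustrated with a short sketch. It assumes a per-dimer expansion of the form S + ⟨V|e1 + e2⟩/2 + ⟨e1|M|e2⟩ (the halved linear terms described in the text) and an illustrative sign convention for the tetrahedral coordinates; the names `NUC`, `dimer_features`, and `circular_dimer_counts`, and the example sequence, are hypothetical.

```python
import itertools
import numpy as np

# Assumed tetrahedral coordinates (x: purine/pyrimidine, y: amino/keto,
# z: weak/strong); the sign convention is an illustrative choice.
NUC = {"A": (1, 1, 1), "C": (-1, 1, -1), "G": (1, -1, -1), "T": (-1, -1, 1)}
DIMERS = ["".join(p) for p in itertools.product("ACGT", repeat=2)]

def dimer_features(dimer):
    """Coefficients multiplying (S, V, M) in the assumed per-dimer expansion
    S + <V|e1 + e2>/2 + <e1|M|e2>: 1 + 3 + 9 = 13 numbers."""
    e1, e2 = np.array(NUC[dimer[0]]), np.array(NUC[dimer[1]])
    return np.concatenate(([1.0], (e1 + e2) / 2, np.outer(e1, e2).ravel()))

# One row per dimer, one column per tensor component.
design = np.array([dimer_features(d) for d in DIMERS])
rank = np.linalg.matrix_rank(design)

def circular_dimer_counts(seq):
    """Count 5'-3' dimers in a sequence treated as circular."""
    counts = dict.fromkeys(DIMERS, 0)
    for a, b in zip(seq, seq[1:] + seq[0]):
        counts[a + b] += 1
    return counts

# Closure check: each nucleotide starts as many dimers as it ends.
counts = circular_dimer_counts("ACGTTGCAAGCT")
closure = {
    x: sum(counts[x + y] for y in "ACGT") - sum(counts[y + x] for y in "ACGT")
    for x in "ACGT"
}

print(design.shape, rank, closure)
```

Under these assumptions the 16 dimer values span only 13 independent directions, and the three missing ones match the three independent closure relations.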

### 4.1. Double strands

For measurements concerning double strands, aside from end effects, it is well known that complementary strand symmetry further reduces the problem to the statement of only 10 conjugated NN dimer pair values (see the expressions in Eq. 12) linked through two independent composition closure relations as follows:

so that only eight independent parameters should result, while the difficulties in defining a 10-dimer set of parameters from a given set of experimental data persist. In that case, complementary strand A/T and C/G pairing symmetry in a dimer, as expressed in Eq. 3, gives the conjugate NN base component relations as follows:

where primed bases correspond to the complementary dimer and numerals correspond to the first and second nucleotides along the 5′-3′ direction of each strand; that is, both the order and the *x*, *y* coordinates are inverted for the conjugate pair.
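The reduction from 16 dimers to 10 conjugated pairs and 8 independent parameters can be reproduced with a small sketch. It assumes that the duplex value of a dimer is the sum of the single-strand expansion over both strands, with the same hypothetical feature layout and sign convention as an illustration; `conjugate`, `strand_features`, and `duplex_features` are names introduced here.

```python
import itertools
import numpy as np

# Assumed tetrahedral coordinates (illustrative sign convention).
NUC = {"A": (1, 1, 1), "C": (-1, 1, -1), "G": (1, -1, -1), "T": (-1, -1, 1)}
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
DIMERS = ["".join(p) for p in itertools.product("ACGT", repeat=2)]

def conjugate(dimer):
    """Dimer read 5'-3' on the complementary strand (order inverted)."""
    return COMPLEMENT[dimer[1]] + COMPLEMENT[dimer[0]]

def strand_features(dimer):
    """13 coefficients of the assumed single-strand dimer expansion."""
    e1, e2 = np.array(NUC[dimer[0]]), np.array(NUC[dimer[1]])
    return np.concatenate(([1.0], (e1 + e2) / 2, np.outer(e1, e2).ravel()))

def duplex_features(dimer):
    """Add the contributions of both strands of the paired dimer."""
    return strand_features(dimer) + strand_features(conjugate(dimer))

# A dimer and its conjugate describe the same duplex step.
unique_pairs = sorted({min(d, conjugate(d)) for d in DIMERS})
design = np.array([duplex_features(d) for d in unique_pairs])
rank = np.linalg.matrix_rank(design)

print(len(unique_pairs), rank)
```

In this sketch the *x* and *y* linear terms cancel between the strands, and only symmetric or antisymmetric combinations of the tensor elements survive, which is why the rank drops from 13 to 8.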

The double-strand expansion can be given as a function of a single-strand sequence by taking into account the aforementioned implicit symmetries: contributions from both strands are added to Eq. 7, taking Eq. 10 into account, and the tensor set is then redefined, that is,

correctly reducing the number of independent elementary tensor set values to 8. From Eqs. 7 and 11, the decomposition for the 10 paired NNs gives a self-consistent set of expectations obeying

(12) |

while the symmetrical set of eight tensor parameters can be inferred from the inverse relations

(13) |

This decomposition enlightens the meaning of the composition-free *S* term as the 16-dimer ensemble mean expectation value and of

Note that, analogous to the composition closure relations (Eq. 9), the dimer expectation self-consistency relations (Eq. 14) may also be combined to read as follows:

## 5. The modeling based on end effects

We now extend the irreducible model to investigate how it accommodates end effects. For circular DNA, or even for a DNA polymer, knowing the eight (polymeric) irreducible parameters (*S*, *M* matrix) is sufficient for the prediction of additive physical properties. For an oligomer, additional end effects become important and need to be accounted for. To account for such effects correctly, consider the following duplex sequence:

where, according to the notation introduced by Gray [10, 11], *E* is a pseudo-base indicating the terminations of the sequence. Pseudo-base *E* simply represents one of the NNs of the end base pairs and, from this viewpoint, accounts for interactions between the end base pairs and the surrounding solvent. Following the line of reasoning suggested by Licinio and Guerra [13] and introduced in Section 3 of this review,

Then, applying Eq. 7, and, considering Eq. 11, for the duplex dimer

where *A*, *B*, *C*, and *D* are parameters that determine the property under consideration. And for the pseudo-duplex dimer

where, in Eqs. 18 and 19, *x*-component of the vector

Finally, we can conclude that the four possible end base pairs in Eq. 20 can be expanded in terms of four parameters, namely *A*, *B*, *C*, and *D*. Consequently, for a duplex oligomer, the additional four parameters related to the ends should be added to the eight polymeric parameters already known, producing a total of 12 irreducible parameters, in the light of the modeling based on the end effects.

## 6. Results and discussion for the modeling based on the end effects

From now on, the thermodynamical property of a duplex sequence of *N* bases in the NN approximation can be calculated as the pairwise sum, including end effects, as a function of 12 irreducible parameters from Eqs. 12, 18, and 19, as follows:

where

A simultaneous least-mean-square-deviation fit of this model to the 108-sequence data compiled by Allawi and SantaLucia [2] gave the values for the free energies listed in Table 1. Guerra and Licinio [16] calculated irreducible parameters for free energy, entropy, and enthalpy, but Table 1 shows only the irreducible parameters for free energy. Table 1 lists the end contributions rather than *A*, *B*, *C*, and *D* as defined in Eq. 18 or 19; there is no loss of generality in this choice, since the former are linear combinations of the latter.

For comparison, we performed another calculation, supposing that the contributions from the ends do not depend on the orientation of the end base pairs, that is, an A/T end pair would contribute in the same way as a T/A end pair, as it is usually found in the literature [2–6]. As a result, we obtained Table 2.
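The fitting step can be sketched as an ordinary least-squares problem. The example below is a sketch under stated assumptions, not the procedure of [16]: it uses eight strand-symmetric dimer terms built from an illustrative tetrahedral sign convention, models orientation-independent ends as simple counts of weak and strong terminal pairs, and fits noise-free synthetic data in place of the 108 measured sequences. All names (`duplex_terms`, `design_row`, `true_params`) are hypothetical.

```python
import numpy as np

# Assumed tetrahedral coordinates (x: purine/pyrimidine, y: amino/keto,
# z: weak/strong); the sign convention is an illustrative choice.
NUC = {"A": (1, 1, 1), "C": (-1, 1, -1), "G": (1, -1, -1), "T": (-1, -1, 1)}

def duplex_terms(e1, e2):
    """Eight strand-symmetric terms for one dimer (assumed basis)."""
    (x1, y1, z1), (x2, y2, z2) = e1, e2
    return [1.0, (z1 + z2) / 2, x1 * x2, y1 * y2, z1 * z2,
            x1 * y2 + y1 * x2, x1 * z2 - z1 * x2, y1 * z2 - z1 * y2]

def design_row(seq):
    """Dimer terms summed along the strand, plus weak/strong end-pair counts."""
    vecs = [NUC[b] for b in seq]
    polymer = np.sum([duplex_terms(a, b) for a, b in zip(vecs, vecs[1:])], axis=0)
    n_weak_ends = sum(b in "AT" for b in (seq[0], seq[-1]))
    return np.concatenate((polymer, [n_weak_ends, 2 - n_weak_ends]))

# Synthetic stand-in for the experimental set: 108 random oligomers.
rng = np.random.default_rng(0)
sequences = ["".join(rng.choice(list("ACGT"), size=rng.integers(4, 17)))
             for _ in range(108)]
X = np.array([design_row(s) for s in sequences])
true_params = rng.normal(size=X.shape[1])  # made-up parameters, not Table 2
y = X @ true_params                        # noise-free synthetic free energies

# Least-squares solution of X p = y.
fitted, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
residual = X @ fitted - y

print(X.shape, rank, np.abs(residual).max())
```

With real measurements, `y` would hold the experimental free energies and the residuals would no longer vanish; the oriented-end model of Table 1 would need four end columns instead of two.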

Considering the values obtained for the irreducible parameters for free energy presented in Tables 1 and 2, some observations are in order:

The free energy irreducible parameters in Tables 1 and 2 are those that minimize *χ*. The quantity *χ* defines a global minimal deviation between the theoretical values, calculated from the irreducible parameter set for the free energies of the 108 sequences, and the experimental values. In Eq. 22, the index *i* refers to the *i*th sequence. The value of *χ* obtained with 10 (or 12) parameters is precisely the same, namely 0.14 kcal/mol per dimer [16], which also coincides with that of the 12-parameter model using values reported by SantaLucia for the free energies of the 10 duplex dimers [3–6]. This means that, considering only the overall data ensemble quality, there is no practical reason to prefer a model with a greater number of parameters.

The intrinsic errors obtained for the end contributions are appreciably larger than the errors for the other irreducible parameters. Thus, in all the decomposition schemes, the end contributions are not well defined; that is, we could not differentiate their orientation (for example, A/T from T/A). Either the available experimental data are not yet sufficiently precise, or this modeling is still inadequate to account for end effects.

It is also found that the C/G or G/C end pairing is only slightly more stable than the A/T or T/A end pairing. However, the intrinsic errors in the data shown in Tables 1 and 2 are considerable, so that the ranges of possible values of the end parameters overlap. Thus, strictly speaking, the modeling based on end effects does not differentiate between the terminal base pairs.

The errors of the irreducible parameters for free energy were estimated in the following way: Guerra and Licinio selected 100 sets of 80 sequences chosen randomly and then calculated the mean deviation for the parameters obtained from each set [16].
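The subsampling error estimate described above can be sketched generically. The function below refits a linear model on random subsets and reports the spread of the fitted parameters; the design matrix and data are synthetic placeholders, and the use of the standard deviation as the "deviation" is an assumption, since the source does not specify the exact statistic.

```python
import numpy as np

def subsample_errors(X, y, n_sets=100, set_size=80, seed=0):
    """Refit on random subsets (drawn without replacement) and return the
    mean and standard deviation of each fitted parameter."""
    rng = np.random.default_rng(seed)
    fits = []
    for _ in range(n_sets):
        idx = rng.choice(len(y), size=set_size, replace=False)
        params, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        fits.append(params)
    fits = np.array(fits)
    return fits.mean(axis=0), fits.std(axis=0)

# Synthetic placeholder data: 108 "sequences", 10 parameters, small noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(108, 10))
true_params = rng.normal(size=10)
y = X @ true_params + rng.normal(scale=0.1, size=108)

mean_params, param_errors = subsample_errors(X, y)
print(mean_params.round(2), param_errors.round(3))
```

Because the subsets overlap heavily, the spread obtained this way is best read as a relative indicator of how well each parameter is constrained rather than as an independent-sample standard error.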

As shown, end contributions are fit to the experimental data with large errors compared with the fits of the other NN or dimer contributions. Ending contributions of A/T could not be differentiated from those of T/A, nor those of C/G from those of G/C. Moreover, we could not distinguish between weak and strong terminal base pairs. However, using either set, one can calculate free energies for DNA oligomers at least as well as standard models that use a larger set of parameters [3–6]. Guerra and Licinio [16] also extended their analysis and obtained equivalent sets of irreducible parameters for enthalpy and entropy. By simultaneously minimizing the deviations from melting temperatures and entropies of the chains, they obtained the most precise set, which predicts melting temperatures for DNA chains with a standard deviation of 2.2°C per sequence, against a deviation of 2.5°C for previous parameters found in the literature [3–6].

In the light of these findings, the formulation based on end effects, within the NN approach, proves to be naive, even heuristic. The extra parameters (so far, the end parameters), which must be added to the eight (polymeric) irreducible parameters to predict thermodynamical properties of duplex oligomers, seem not to depend on the composition of the terminal base pairs. We now invoke a new hypothesis, which is detailed later in this review. Based on this hypothesis, we will conclude that, in the light of the NN model, 10 parameters expand the free energy of any DNA oligomer: the eight (polymeric) irreducible parameters for free energy already described, plus two parameters related to the initiation of the double helix (one for each of the two possible base pairings).

## 7. The modeling based on double helix initiation parameters

Equation 21 establishes how to calculate the total free energy of a sequence of length *N*, according to the NN model, using the methodology based on the modeling by end effects.

On the other hand, in the statistical mechanics viewpoint, the free energy of the duplex formation

Whenever nucleation is the limiting process, the two-state model establishes that once the process is initiated, the helix extends to both ends of the chain [7]. The partition function or the equilibrium constant

where σ is the nucleation equilibrium constant and *i*th base pair to the preexisting duplex. For heteropolymers, *σ* and

that is,

Equation 26 can be conveniently rewritten as follows:

Eqs. 26 and 27 have the same meaning, but in writing Eq. 27 in the form shown, we suppose that the first base pair of the duplex forms at the *k*th site. Comparing Eq. 27 with Eq. 21, we see that the nucleation free energy corresponds to the end effects in the NN approach, except for the term

Quantity

Recently, Guerra and Licinio connected the two approaches, namely the NN and the statistical mechanics approaches, and calculated the equilibrium constants and free energies for nucleation and propagation of a double helix in the following transition reactions [16]:

For the above homopolymers, they obtained the following nucleation free energies, at standard 1 mol concentration:

These values were obtained using the end effects calculated from the simultaneous least-mean-square-deviation fit of the NN model to the 108-sequence data compiled by Allawi and SantaLucia [2], listed in Tables 1 and 2, together with values obtained experimentally for A/T and C/G base pairings compiled by the Frank-Kamenetskii Group [18]. Since they obtained intrinsically large errors for the end effects, the nucleation free energies for

Since a more appropriate modeling is needed, we look for a more precise interpretation of the nucleation free energy term in the expansion of the free energy of a duplex oligomer. To this end, we first write the free energy for the formation of a duplex oligomer as found in some approaches in the literature [4, 6, 19]:

where, according to such references,

The question posed in the last paragraph will guide us throughout this section. To answer it, consider, initially, the general reaction of formation of a double helix of length *N.* Such duplex is formed from two separated and complementary strands *S* and *S’*. This process is the chemical reaction *≤ k ≤ N*). We suppose, with this, that the nucleation occurs in the *k*th site of the double chain. Finally, after the nucleation, the formation of the WC first base pair occurs, that is, the base pair

that is,

In order to consider the propagation of the double helix from the nucleating base pair

In Eq. 35, σ is the nucleation equilibrium constant, *κ =* *i < k*) and *i > k*) are the propagation equilibrium constants related to the propagation of the double helix, by stacking of the base pair

As the propagation equilibrium constant depends on the local composition, we associate with the propagation equilibrium constant for the addition of the *i*th base pair, in the downward direction, a value such that

Analogously, the propagation equilibrium constant for the addition of the (*i*+1)th base pair, in the upward direction, assumes a value such that

Thus, from Eqs. 37 and 38, the propagation equilibrium constants would be, in the light of the NN approach, given by

The first summation in Eq. 36,

that is,

where

Equation 42 shows that the free energy for the duplex formation can be written in terms of either the initiation or the nucleation free energy, giving two completely equivalent approaches (the two equalities in Eq. 42). We prefer the first, because it gives the initiation free energy for the duplex formation directly, as shown in the next section. In addition, the nucleation free energy can be calculated from the initiation free energy, as shown in Eq. 34. To apply Eq. 42, we assume that nucleation can occur when the strands approach each other through the juxtaposition of the bases at any site *k* (1 ≤ *k* ≤ *N*), with equal probability. The “nucleating” base pair, in turn, can be an A/T or a C/G base pair. Thus, if the first base pair can form at any site along the double chain with the same probability, we can write the observable initiation free energy as follows:

In Eq. 43, *X*/*Y* base pairs occurring along the duplex oligomer in question. Equation 34 shows how the nucleation free energy can be calculated from the initiation free energy. Therefore, the observable nucleation free energy can be written as

The equilibrium constant

where the summation is over all the possible duplex dimers occurring along the chain, that is,

Equation 44 can be rewritten as follows:

From Eq. 47, it becomes clear that the nucleation free energy depends on the composition of the DNA double strand due to the presence of the terms

where

Inserting Eq. 49 into Eq. 48, we can obtain:

Equation 42 can be used to predict the free energy of any duplex oligomer if we know the values of all the polymeric irreducible parameters for free energy plus the free energy changes associated to the formation of the first base pair. Now, we can return to the set of 108 sequences compiled by Allawi and SantaLucia to obtain the set of eight polymeric irreducible parameters together with these two additional parameters. This will be done in the following section.
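How such a prediction might be assembled is sketched below. This is not a transcription of Eq. 42 or Eq. 43: it assumes that the observable initiation free energy is the composition-weighted mean of the A/T and C/G initiation terms (one reading of the equal-probability hypothesis), that propagation is a sum over the 10 conjugated dimer values, and that the temperature is 37 °C. All numerical values are placeholders, not the fitted parameters of Table 3, and the function names are hypothetical.

```python
import math

R = 1.987e-3   # gas constant in kcal/(mol K)
T = 310.15     # assumed temperature (37 degrees C) in K

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def canonical(dimer):
    """Representative of a dimer and its conjugate on the other strand."""
    conj = COMPLEMENT[dimer[1]] + COMPLEMENT[dimer[0]]
    return min(dimer, conj)

def duplex_free_energy(seq, dimer_dg, init_dg):
    """Initiation (composition-weighted mean, an assumption) plus the sum
    of NN dimer free energies along the strand."""
    f_weak = sum(b in "AT" for b in seq) / len(seq)
    initiation = f_weak * init_dg["AT"] + (1 - f_weak) * init_dg["CG"]
    propagation = sum(dimer_dg[canonical(seq[i:i + 2])]
                      for i in range(len(seq) - 1))
    return initiation + propagation

def equilibrium_constant(dg):
    """K = exp(-dG / RT)."""
    return math.exp(-dg / (R * T))

# Placeholder values only: every dimer is given the same made-up free energy.
dimer_dg = {canonical(a + b): -1.0 for a in "ACGT" for b in "ACGT"}
init_dg = {"AT": 2.0, "CG": 1.0}

dg = duplex_free_energy("ACGT", dimer_dg, init_dg)
print(dg, equilibrium_constant(dg))
```

Because free energies add while equilibrium constants multiply, the constant for the whole duplex in this sketch factorizes into one initiation factor and one propagation factor per dimer.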

## 8. Results and discussion for the modeling based on double helix initiation parameters

The free energy for a duplex sequence of *N* bases in the NN approximation can be calculated as the pairwise sum, using the initiation free energy, as a function of the 10 parameters for free energy from Eqs. 12 and 43 as follows:

Simultaneous least-mean-square-deviation fit of this model to the 108 sequence data set compiled by Allawi and SantaLucia [2] gave the values for the free energy parameters, as listed in Table 3 [17].

Given the root-mean-square deviation per dimer, as defined in Eq. 22, the parameters for free energy in Table 3 are those that minimize *χ*. The value obtained for *χ* was 0.14 kcal/mol per dimer [17], which coincides precisely with that obtained for the 12-parameter model using values reported by SantaLucia for the free energies of the 10 duplex dimers [2, 4–6]. Thus, as for the modeling by end effects, the overall quality of the data ensemble gives no practical reason to prefer a model with a greater number of parameters. The mean values and the errors of the parameters for free energy, as listed in Table 3, were estimated by Guerra in the following way [17]: he selected 1000 sets of 70 randomly chosen sequences and then calculated the mean and the deviation of the parameters obtained from each set. Some immediate conclusions can be drawn from the data in Table 3. First, the intrinsic errors of the free energies related to the formation of the first base pairings are only a little larger than the errors of the other irreducible parameters for free energy. Second, within the error bars, the free energy changes for the initiation of a double chain through the formation of an A/T or a C/G base pair are essentially the same. Thus, if the hypothesis is correct that duplex formation can be initiated by the formation of a base pair at any site along the double helix with equal probability (independently of the local composition), then the initiation free energy is essentially independent of the local composition. Finally, having obtained the initiation free energy parameters listed in Table 3, we can estimate the nucleation free energy of any duplex oligomer using Eq. 50. Equation 50 establishes that observable nucleation free energies depend on the mean global composition of the DNA double strand and vary within a range that goes from

for a poly *A*⋅*T* homopolymer to

for a poly *C*⋅*G* homopolymer. Observe that the difference between these values for nucleation free energies, which is ∼1.3 kcal/mol, is greater than the error bar estimated for nucleation free energies, which is ∼0.7 kcal/mol. On the other hand, the results obtained above for the poly *A*⋅*T* and poly *C*⋅*G* homopolymers disagree entirely with the results obtained previously using the modeling by end effects [16], as was to be expected. In fact, a heteropolymer must have a nucleation free energy inside this interval, with a value that depends on its composition. Finally, the mean observable nucleation free energy is

Comparing the eight polymeric irreducible free energy parameters listed in Table 3 of this section with the results obtained recently using the end effects [16], given in Tables 1 and 2 in Section 6 of this review, we conclude that the irreducible parameters are not essentially affected by the change of model. In other words, when one model is substituted for the other, the end effects, which do not depend essentially on the composition of the two terminal base pairs, are replaced by the initiation free energies, which do not depend essentially on the global composition of the chain. Therefore, the dimer free energies, which depend only on the irreducible free energy parameters, are not essentially affected either.
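The resampling procedure used to estimate the mean values and errors in Table 3 (1000 sets of 70 randomly chosen sequences) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the function name `resample_fit`, the synthetic design matrix `X` (occurrence counts of each parameter per sequence), the synthetic free energies `y`, the total of 100 sequences, drawing each set without replacement, and the use of an ordinary least-squares fit in place of the minimization of *χ* from Eq. 22. It is not the code or the data of [17].

```python
import numpy as np


def resample_fit(X, y, n_sets=1000, subset_size=70, seed=0):
    """Fit a linear model on random subsets of sequences.

    Returns the mean and the sample standard deviation of each fitted
    parameter across the subsets.
    """
    rng = np.random.default_rng(seed)
    n_seq, n_par = X.shape
    fits = np.empty((n_sets, n_par))
    for i in range(n_sets):
        # Assumption: each set is drawn without replacement.
        idx = rng.choice(n_seq, size=subset_size, replace=False)
        # Ordinary least squares stands in for the chi minimization.
        params, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        fits[i] = params
    return fits.mean(axis=0), fits.std(axis=0, ddof=1)


# Synthetic stand-in data (hypothetical, NOT the measurements of [17]).
rng = np.random.default_rng(1)
n_sequences, n_parameters = 100, 10
X = rng.integers(0, 4, size=(n_sequences, n_parameters)).astype(float)
true_params = rng.normal(-1.0, 0.5, size=n_parameters)
y = X @ true_params + rng.normal(0.0, 0.1, size=n_sequences)

mean_params, err_params = resample_fit(X, y)
for p, e in zip(mean_params, err_params):
    print(f"{p:+.3f} +/- {e:.3f}")
```

In this sketch the spread of each parameter over the subsets plays the role of the error quoted for that parameter, so two parameters whose means differ by less than this spread would be regarded as indistinguishable, which is the criterion applied above to the A/T and C/G initiation free energies.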

Free energy changes associated with the formation of the second base pair are given by the following equation:

depending on whether the second base pair formed is located at the (*k*+1)th site or at the (*k*−1)th site of the chain. Using Eq. 43 for

and

The values listed above correspond to the base pairing contributions to the dimer free energies, which were measured experimentally by the Frank-Kamenetskii Group [18]. Yakovchuk et al. obtained base pairing free energies of 0.57 kcal/mol and −0.11 kcal/mol for the A/T and C/G base pairings, respectively [18]. Therefore, we have obtained values that agree reasonably well with those of the Frank-Kamenetskii Group. In addition, the values of the base pairing free energies are reasonably well defined because their ranges of allowable values have only one common intercept.

## 9. Conclusions

A geometrical representation of four-nucleotide sets as a tetrahedron (Eq. 3 and Fig. 2) allows for the association of the three most distinctive molecular group classifications with corresponding orthogonal cubic axes. Physical properties of nucleotide sequences may be calculated with an optimal set of tensor coefficients (Eq. 4), assuming projections within this tetrahedral representation. The coefficients are expressed in hierarchical differential form, so lower levels of approximation are explicitly embodied in the description. These include an ensemble mean expectation from the scalar coefficient *S* alone and a global composition approximation, as expressed through the *V*-component contributions. The symmetrical set is shown to provide a frame for the analysis of DNA duplex free energy that is fully compatible with experimental data. Such a symmetrical set of coefficients allows for translation among different decomposition frames. It also gives a proper irreducible representation for dimer properties (Eqs. 8 and 12), and it resolves an old indeterminacy of dimer sets by establishing self-consistency relations among the dimer coefficients (Eqs. 14 and 15).

Using the modeling based on end effects to predict physical properties of duplex oligomers, we saw that the end contributions are fit to experimental data with large errors compared with the fits of the other NN or dimer contributions. Moreover, we could not distinguish between weak and strong terminal base pairs. However, with the sets of either two or four end parameters, one calculates free energies for DNA oligomers at least as well as standard models that use a larger set of parameters [2, 4–6].

The modeling based on the double helix initiation parameters replaces the end effects with the initiation parameters. The free energy changes associated with the formation of the first base pair in duplex formation are fit to experimental data with errors only slightly larger than those of the NN or dimer contributions. Furthermore, we found that the values of the first base pairing free energies are essentially the same, because the difference between them is smaller than the estimated error bar. This could indicate an invariance of the initiation free energy with respect to the composition of the chain. The nucleation free energy, however, depends on the composition, and it can be calculated from the initiation free energy by using Eq. 34. This statement is supported by the fact that the difference between its maximal and minimal values is larger than the error bars. The model based on the double helix initiation parameters is constructed under the simplifying hypothesis that nucleation can occur at any site of the chain with equal probability, independently of the local composition. An important result that makes this hypothesis quite reasonable is that the base pairing contributions to the dimer free energies seem to agree well with the values obtained experimentally by the Frank-Kamenetskii Group. Finally, this modeling uses a set of 10 parameters, consisting of the eight polymeric irreducible parameters already known plus two parameters related to the two possible base pairings (the initiation free energy parameters). With this set, one calculates free energies for DNA oligomers at least as well as standard models that use a larger set of parameters.