Decoding the Building Blocks of Life from the Perspective of Quantum Information

Physical theories often begin as descriptions that capture only the essential features of the macroscopic world. Their predictions depend on parameters that must be either assumed or taken from experiment, so the theories themselves cannot predict those parameters. To understand why the parameters have the values they do, we have to go one level deeper, typically to smaller scales, where the easiest processes to study are the ones at the lowest level. When the deeper level reduces the number of unknown parameters, we consider the theory to be complete and satisfactory. The level below conventional molecular biology is spanned by atomic and molecular structure and by quantum dynamics. However, at this lowest level it becomes very difficult to grasp all the features of the molecular processes that occur in living systems, and the complexity of the numerous parameters involved makes the endeavour a very intricate one. Information theory provides a powerful framework for extracting the essential features of the complicated processes of life and then analyzing them in a systematic manner. In this connection, quantum information biology is a new field of scientific inquiry in which information-theoretical tools and concepts give insight into some of the most basic and yet unsolved questions of molecular biology.


Introduction
Chirality is often glossed over in theoretical or experimental discussions concerning the origin of life, but the ubiquity of homochiral building blocks in known biological systems demands explanation. Information theory can provide a quantitative framework for understanding the role of chirality in biology. So far it has been thought that the genetic code is "unknowable" [...] quantitatively assess the complexity of the system. However, recent proposals have formulated this quantity as a product of two factors, taking into account order/disequilibrium and delocalization/uncertainty. This is the case of the López-Ruiz-Mancini-Calbet (LMC) shape complexity [9][10][11][12] that, like others, satisfies the boundary conditions by reaching its minimal value in the extreme ordered and disordered limits. The LMC complexity measure has been criticized (Anteneodo & Plastino, 1996), modified (Catalán et al., 2002; Martin et al., 2003) and generalized (López-Ruiz, 2005), leading to a useful estimator which satisfies several desirable properties of invariance under scaling transformations, translation, and replication (Yamano, 2004; Yamano, 1995). The utility of this improved complexity has been verified in many fields [8], and it allows reliable detection of periodic, quasiperiodic, linear stochastic, and chaotic dynamics (Yamano, 2004; López-Ruiz et al., 1995; Yamano, 1995). The LMC measure is constructed as the product of two important information-theoretic quantities (see below): the so-called disequilibrium D (also known as self-similarity (Carbó-Dorca et al., 1980) or information energy (Onicescu, 1996)), which quantifies the departure of the probability density from uniformity, i.e., equiprobability (Catalán et al., 2002; Martin et al., 2003), and the Shannon entropy S, which is a general measure of the randomness/uncertainty of the probability density (Shannon & Weaver, 1948) and quantifies the departure of the probability density from localizability.
Both global quantities are closely related to the measure of spread of a probability distribution.
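The boundary behaviour described above can be illustrated with the original discrete form of the LMC measure, C = H · D, where H is the Shannon entropy of an N-state distribution and D its squared distance from equiprobability. The following is only a minimal sketch: the function name, the normalization of H by ln N, and the three test distributions are illustrative assumptions, and the discrete form is not the continuous shape complexity used later in this chapter.

```python
import math

def lmc_complexity(probs):
    """Discrete LMC complexity C = H * D (illustrative helper, not from the chapter)."""
    n = len(probs)
    # Shannon entropy normalized by ln N (assumed normalization);
    # states with p = 0 contribute nothing to the sum.
    h = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
    # Disequilibrium: squared distance from the uniform distribution.
    d = sum((p - 1.0 / n) ** 2 for p in probs)
    return h * d

# Hypothetical four-state distributions.
ordered = [1.0, 0.0, 0.0, 0.0]        # perfect order: H = 0
disordered = [0.25, 0.25, 0.25, 0.25]  # equiprobability: D = 0
intermediate = [0.7, 0.1, 0.1, 0.1]    # neither limit

for name, p in [("ordered", ordered), ("disordered", disordered),
                ("intermediate", intermediate)]:
    print(name, lmc_complexity(p))
```

The product vanishes in both limits for different reasons (H = 0 for perfect order, D = 0 for equiprobability) and is positive in between, which is the qualitative behaviour expected of a complexity measure.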
The Fisher-Shannon product FS has been employed as a measure of atomic correlation (Romera & Dehesa, 2004) and has also been defined as a statistical complexity measure (Angulo et al., 2008a; Sen et al., 2007a). The product of the power entropy J (explicitly defined in terms of the Shannon entropy, see below) and the Fisher information measure I combines both the global character (depending on the distribution as a whole) and the local one (in terms of the gradient of the distribution), so as to preserve the general complexity properties. Compared to the LMC complexity, and aside from the explicit dependence on the Shannon entropy, which serves to measure the uncertainty (localizability) of the distribution, the Fisher-Shannon complexity replaces the global disequilibrium factor D by the local Fisher factor, which quantifies the departure of the probability density from disorder (Fisher, 1925; Frieden, 2004) through the gradient of the distribution. The Fisher information I itself plays a fundamental role in different physical problems, such as the derivation of the non-relativistic quantum-mechanical equations by means of the minimum I principle (Fisher, 1925; Frieden, 2004), as well as the time-independent Kohn-Sham equations and the time-dependent Euler equation (Nagy, 2003; Nalewajski, 2003). More recently, the Fisher information has also been employed as an intrinsic accuracy measure for specific atomic models and densities (Nagy & Sen, 2006; Sen et al., 2007b), as well as for general quantum-mechanical central potentials (Romera et al., 2006; Dehesa et al., 2007). The concept of phase-space Fisher information, in which both position and momentum variables are included, has been analyzed for hydrogen-like atoms and the isotropic harmonic oscillator (Hornyak & Nagy, 2007).
Several applications concerning atomic distributions in position and momentum spaces have been performed, in which the FS complexity is shown to provide relevant information on atomic shell structure and ionization processes (Angulo et [...]
In line with the aforementioned developments we have undertaken multidisciplinary research projects so as to employ IT at different levels, classical (Shannon, Fisher, complexity, etc.) and quantum (von Neumann and other entanglement measures), on a variety of chemical processes and on organic and nanostructured molecules. Recently, significant advances in chemistry have been achieved by use of Shannon entropies through the localized/delocalized features of the electron distributions, allowing a phenomenological description of the course of elementary chemical reactions by revealing important chemical regions that are not present in the energy profile, such as the ones in which bond forming and bond breaking occur. Further, the synchronous reaction mechanism of an SN2-type chemical reaction and the nonsynchronous mechanistic behavior of the simplest hydrogenic abstraction reaction were predicted by use of Shannon entropy analysis (Esquivel et al., 2010a). In addition, a recent study on the three-center insertion reaction of silylene has shown that the information-theoretical measures provide evidence to support the concept, due to Zewail and Polanyi, of a continuum of transient states for the transition state rather than a single state, which is also in agreement with other analyses (Esquivel et al., 2010b). While the Shannon entropy has remained the major tool in IT, there have been numerous applications of Fisher information through the "narrowness/disorder" features of electron densities in conjugated spaces. Thus, in chemical reactions the Fisher measure has been employed to analyze its local features (Esquivel et al., 2010c) and also to study the steric effect of the conformational barrier of ethane (Esquivel et al., 2011a).
The complexity of physical, chemical and biological systems is a topic of great contemporary interest. The quantification of the complexity of real systems is a formidable task, although various single and composite information-theoretic measures have been proposed. For instance, the Shannon entropy (S) and the Fisher information measure (I) of the probability distributions are becoming increasingly important tools of scientific analysis in a variety of disciplines. Overall, these studies suggest that both S and I can be used as complementary tools to describe the information behavior, pattern, or complexity of physical and chemical systems and the electronic processes involving them. Besides, the disequilibrium (D), defined as the expectation value of the probability density, is yet another complementary tool to study complexity, since it measures the departure of the density from equiprobability. Thus, measuring the complexity of atoms and molecules represents an interesting area of contemporary research which has its roots in information theory (Angulo et al., 2010d). In particular, complexity measures defined as products of S and D or of S and I have proven useful to analyze complexity features such as order, uncertainty and pattern in molecular systems (Esquivel et al., 2010f) and chemical processes (Esquivel et al., 2011b). On the other hand, the most interesting technological implications of quantum mechanics are based on the notion of entanglement, which is the essential ingredient for the technological implementations foreseen in the twenty-first century. Up to now it has remained an open question whether entanglement can be realized with molecules or not, and hence it is evident that the new quantum techniques enter the sphere of chemical interest. Generally speaking, entanglement shows up in cases where a former unit dissociates into simpler sub-systems; the corresponding processes are quite well known in chemistry.
Although information entropies have been employed in quantum chemistry, applications of entanglement measures in chemical systems are very scarce. Recently, von Neumann measures in Hilbert space have been proposed and applied to small chemical systems (Carrera et al., 2010; Flores-Gallegos and Esquivel, 2008), showing that entanglement can be realized in molecules. For nanostructures, we have been able to show that IT measures can be successfully employed to analyse the growing behaviour of PAMAM dendrimers, supporting the dense-core model against the hollow-core one (Esquivel et al., 2009b, 2010g, 2011c).

Advances in Quantum Mechanics 644
In this Chapter we will present arguments based on the information content of L- and D-aminoacids to explain the biological preference toward homochirality. Besides, we present benchmark results for the information content of codons and aminoacids based on information-theoretical measures and statistical complexity factors, which help to elucidate the coding links between these building blocks and their selectivity.

Information-theoretical measures and complexities
In the independent-particle approximation, the total density distribution in a molecule is a sum of contributions from the electrons in each of the occupied orbitals. This is the case in both r-space and p-space, position and momentum respectively. In momentum space, the total electron density, γ(p), is obtained through the molecular momentals (momentum-space orbitals) ϕ_i(p), and similarly for the position-space density, ρ(r), through the molecular position-space orbitals ϕ_i(r). The momentals can be obtained by three-dimensional Fourier transformation of the corresponding orbitals (and conversely), which in atomic units reads $\phi_i(\mathbf{p}) = (2\pi)^{-3/2} \int d\mathbf{r}\, e^{-i\mathbf{p}\cdot\mathbf{r}}\, \phi_i(\mathbf{r})$. Standard procedures for the Fourier transformation of position-space orbitals generated by ab-initio methods have been described (Rawlings & Davidson, 1985). The orbitals employed in ab-initio methods are linear combinations of atomic basis functions, and since analytic expressions are known for the Fourier transforms of such basis functions (Kaijser & Smith, 1997), the transformation of the total molecular electronic wavefunction from position to momentum space is computationally straightforward (Kohout, 2007).
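As we mentioned in the introduction, the LMC complexity is defined through the product of two relevant information-theoretic measures. For a given probability density in position space, ρ(r), the LMC complexity is given by $C_r(\mathrm{LMC}) = D_r\, e^{S_r}$, where $D_r = \int \rho^2(\mathbf{r})\, d^3r$ is the disequilibrium and $S_r = -\int \rho(\mathbf{r}) \ln \rho(\mathbf{r})\, d^3r$ is the Shannon entropy; analogous expressions hold in momentum space for the density γ(p). The FS complexity is defined in terms of the Fisher information, $I_r = \int \rho(\mathbf{r})\, |\nabla \ln \rho(\mathbf{r})|^2\, d^3r$ (7), and the power entropy, $J_r = \frac{1}{2\pi e}\, e^{\frac{2}{3} S_r}$ (8), which depends on the Shannon entropy defined above. The FS complexity in position space is thus given by $C_r(\mathrm{FS}) = I_r \cdot J_r$, and similarly $C_p(\mathrm{FS}) = I_p \cdot J_p$ in momentum space.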

Let us remark that the factors in the power Shannon entropy J are chosen to preserve the invariance under scaling transformations, as well as the rigorous relationship (Dembo et al., 1991) $I \cdot J \geq n$, with n being the space dimensionality, thus providing a universal lower bound to the FS complexity. The definition in Eq. (8) corresponds to the particular case n = 3, the exponent containing a factor 2/n for arbitrary dimensionality.
It is worth noting that the aforementioned inequalities remain valid for distributions normalized to unity, which is the choice employed throughout this work for the three-dimensional molecular case.
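As a numerical illustration of these measures, the sketch below evaluates S, D, I, J and the two complexity products for a spherically symmetric Gaussian density normalized to unity. It assumes the standard definitions S = -∫ρ ln ρ, D = ∫ρ², I = ∫|∇ρ|²/ρ, J = e^{2S/3}/(2πe), C(LMC) = D·e^S and C(FS) = I·J; the Gaussian test density, the radial grid and all function names are illustrative choices, not the molecular densities or the programs used in this chapter.

```python
import numpy as np

def trapezoid(y, x):
    """Plain trapezoidal rule, written out to avoid NumPy version differences."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def information_measures(rho, r):
    """Measures for a spherically symmetric density rho(r) normalized to unity."""
    w = 4.0 * np.pi * r**2                      # radial volume element
    S = -trapezoid(w * rho * np.log(rho), r)    # Shannon entropy
    D = trapezoid(w * rho**2, r)                # disequilibrium
    drho = np.gradient(rho, r)                  # radial derivative of the density
    I = trapezoid(w * drho**2 / rho, r)         # Fisher information
    J = np.exp(2.0 * S / 3.0) / (2.0 * np.pi * np.e)  # power entropy, n = 3
    return {"S": S, "D": D, "I": I, "J": J,
            "C_LMC": D * np.exp(S), "C_FS": I * J}

# Hypothetical test density: isotropic Gaussian of width sigma.
sigma = 1.0
r = np.linspace(0.0, 10.0 * sigma, 20001)
rho = (2.0 * np.pi * sigma**2) ** -1.5 * np.exp(-r**2 / (2.0 * sigma**2))

m = information_measures(rho, r)
for key, value in m.items():
    print(f"{key:6s} {value:.6f}")
```

For a Gaussian the FS product sits at the lower bound n = 3, and the LMC product equals (e/2)^{3/2} independently of sigma, which makes this density a convenient check of an implementation before applying it to molecular densities.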
Aside from the analysis of the position and momentum information measures, we have considered it useful to study these magnitudes in the product rp-space, characterized by the probability density f(r, p) = ρ(r)γ(p), where the complexity measures are defined as $C_{rp}(\mathrm{LMC}) = D_{rp}\, e^{S_{rp}} = C_r(\mathrm{LMC})\, C_p(\mathrm{LMC})$ and $C_{rp}(\mathrm{FS}) = I_{rp}\, J_{rp}$, with $I_{rp} = I_r + I_p$ and $J_{rp}$ the power entropy of the product density. From the above two equations, it is clear that the features and patterns of both LMC and FS complexity measures in the product space will be determined by those of each conjugated space. However, the numerical analyses carried out in the next section reveal that the momentum-space contribution plays a more relevant role than the one in position space.
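The way the product-space quantities follow from those of each conjugated space can be made explicit. The short derivation below assumes the standard definitions D = ∫f², S = -∫f ln f, I = ∫|∇f|²/f and C(LMC) = D·e^S, together with densities ρ and γ normalized to unity; it is offered as a consistency check rather than as the authors' own derivation.

```latex
\begin{align*}
D_{rp} &= \iint \rho^2(\mathbf{r})\,\gamma^2(\mathbf{p})\, d^3r\, d^3p = D_r\, D_p, \\
S_{rp} &= -\iint \rho(\mathbf{r})\,\gamma(\mathbf{p})
          \left[\ln\rho(\mathbf{r}) + \ln\gamma(\mathbf{p})\right] d^3r\, d^3p = S_r + S_p, \\
C_{rp}(\mathrm{LMC}) &= D_{rp}\, e^{S_{rp}}
          = \left(D_r\, e^{S_r}\right)\left(D_p\, e^{S_p}\right)
          = C_r(\mathrm{LMC})\, C_p(\mathrm{LMC}), \\
I_{rp} &= \iint \frac{\gamma^2\,|\nabla_{\mathbf{r}}\rho|^2 + \rho^2\,|\nabla_{\mathbf{p}}\gamma|^2}
          {\rho\,\gamma}\, d^3r\, d^3p = I_r + I_p .
\end{align*}
```

The LMC complexity is therefore multiplicative over the two spaces, while the Fisher information is additive, so the product-space FS complexity does not reduce to a simple product of the position and momentum values.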
We have also evaluated some reactivity parameters that may be useful to analyze the chemical reactivity of the aminoacids: the ionization potential (IP), the hardness (η) and the electrophilicity index (ω). These properties were obtained at the Hartree-Fock (HF) level of theory in order to employ Koopmans' theorem (Koopmans, 1933; Janak, 1978), which relates the first vertical ionization energy and the electron affinity to the HOMO and LUMO energies needed to calculate the conceptual DFT properties. Parr and Pearson proposed a quantitative definition of hardness (η) within conceptual DFT (Parr & Yang, 1989): $\eta = \frac{1}{2S} \approx \frac{\varepsilon_{LUMO} - \varepsilon_{HOMO}}{2}$ (14), where ε denotes the frontier molecular orbital energies and S stands for the softness of the system. It is worth mentioning that the factor 1/2 in Eq. (14) was originally introduced to make the hardness definition symmetrical with respect to the chemical potential, $\mu \approx (\varepsilon_{LUMO} + \varepsilon_{HOMO})/2$ (Chattaraj et al., 2006). The chemical hardness η is a central quantity in the study of reactivity through the hard and soft acids and bases principle (Pearson, 1973; Pearson, 1997).
The electrophilicity index ω (Parr et al., 1999) allows a quantitative classification of the global electrophilic nature of a molecule within a relative scale. The electrophilicity index of a system is given in terms of its chemical potential and hardness by the expression $\omega = \mu^2 / (2\eta)$. The electrophilicity is also a good descriptor of chemical reactivity, since it quantifies the global electrophilic power of a molecule, i.e., its predisposition to acquire additional electronic charge (Parr & Yang, 1989).
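The reactivity descriptors discussed above follow directly from the two frontier orbital energies once Koopmans' theorem is invoked. The sketch below assumes the usual finite-difference forms IP ≈ -ε_HOMO, EA ≈ -ε_LUMO, μ ≈ (ε_LUMO + ε_HOMO)/2, η ≈ (ε_LUMO - ε_HOMO)/2, S = 1/(2η) and ω = μ²/(2η); the function name and the orbital energies in the example are hypothetical and are not values computed for any aminoacid.

```python
def reactivity_descriptors(e_homo, e_lumo):
    """Conceptual-DFT descriptors from frontier orbital energies (same units in and out)."""
    ip = -e_homo                      # Koopmans: first vertical ionization potential
    ea = -e_lumo                      # Koopmans-type estimate of the electron affinity
    mu = 0.5 * (e_lumo + e_homo)      # chemical potential
    eta = 0.5 * (e_lumo - e_homo)     # hardness, including the factor 1/2
    softness = 1.0 / (2.0 * eta)      # softness S = 1 / (2 * eta)
    omega = mu**2 / (2.0 * eta)       # electrophilicity index
    return {"IP": ip, "EA": ea, "mu": mu, "eta": eta, "S": softness, "omega": omega}

# Hypothetical HOMO and LUMO energies in hartree, for illustration only.
example = reactivity_descriptors(e_homo=-0.40, e_lumo=0.10)
for name, value in example.items():
    print(f"{name:6s} {value: .4f}")
```

Because every descriptor is a function of only two numbers, differences in ω between molecules reflect both the position of the frontier levels (through μ) and the size of the gap (through η).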

Aminoacids
The exact origin of homochirality is one of the great unanswered questions in evolutionary science; the homochirality of molecules has remained a mystery for many years, since Pasteur. The natural amino acids, except glycine, contain one or more asymmetric carbon atoms. Therefore, the molecules exist as two nonsuperposable mirror images of each other, i.e., right-handed (D enantiomer) and left-handed (L enantiomer) structures. It is considered that equal amounts of D- and L-amino acids existed on the primal earth before the emergence of life. Although the chemical and physical properties of L- and D-amino acids are extremely similar except for their optical character, the reason for the exclusion of D-amino acids, and why all living organisms are now composed predominantly of L-amino acids, is not well known; however, homochirality is essential for the development and maintenance of life (Breslow, 2011; Fujii et al., 2010; Tamura, 2008). The essential property of α-aminoacids is to form linear polymers capable of folding into 3-dimensional structures, which form catalytic active sites that are essential for life. In the process, aminoacids behave as hetero-bifunctional molecules, forming polymers via head-to-tail linkage. In contrast, industrial nylons are often prepared from pairs of homo-bifunctional molecules (such as diamines and dicarboxylic acids); the use of a single molecule containing both linkable functionalities is somewhat simpler (Cleaves, 2010; Weber and Miller, 1981; Hicks, 2002).
The concept of chirality in chemistry is of paramount interest because living systems are formed of chiral molecules; biochemistry is chiral (proteins, DNA, amino acids, sugars and many natural products such as steroids, hormones, and pheromones possess chirality). Indeed, amino acids are largely found to be homochiral (Stryer, 1995) in the L form. Moreover, since most biological receptors and membranes are chiral, many drugs, herbicides, pesticides and other biological agents must themselves possess chirality. Synthetic processes ordinarily produce a 50:50 (racemic) mixture of left-handed and right-handed molecules (so-called enantiomers), and often the two enantiomers behave differently in a biological system.
A major topic of research has been the origin of homochirality. In this respect, biomembranes may have played an important role in the homochirality of biopolymers. One of the most intriguing problems in the life sciences is the mechanism of symmetry breaking. Many theories have been proposed on these topics in the attempt to explain the amplification of a first enantiomeric imbalance to the enantiopurity of biomolecules (Bombelli et al., 2004). In all theories on symmetry breaking and on enantiomeric excess amplification, little attention has been paid to the possible role of biomembranes, or of simple self-aggregated systems that may have acted as primitive biomembranes. Nevertheless, it is possible that amphiphilic boundary systems, which are considered by many scientists as intimately [...]

In order to perform an information-theoretical analysis of L- and D-aminoacids we have employed the corresponding L-enantiomers reported in the Protein Data Bank (PDB), which provides a standard representation for macromolecular structure data derived from X-ray diffraction and NMR studies. In a second stage, the D-type enantiomers were obtained from the L-aminoacids by interchanging the corresponding functional groups (carboxyl and amino) of the α-carbon so as to represent the D-configuration of the chiral center, provided that steric impediments are taken into account. The latter is achieved by employing the Ramachandran map (Ramachandran et al., 1963), which represents the phi-psi torsion angles for all residues in the aminoacid structure, so as to avoid steric hindrance. Hence, the backbones of all of the studied aminoacids represent possible biological structures within the allowed regions of the Ramachandran map. In the third stage, an electronic-structure geometry optimization was performed on all the enantiomers of the twenty standard (proteinogenic) aminoacids so as to obtain structures of minimum energy which preserve the backbone (see above). In the last stage, all of the information-theoretic measures were calculated by use of a suite of programs which have been discussed elsewhere (Esquivel et al., 2012).
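The second stage above relies on the fact that interchanging two substituents on a tetrahedral center inverts its handedness, exactly as a mirror reflection does. The sketch below checks this with the sign of the signed volume spanned by the substituent positions. The idealized tetrahedral coordinates, the fixed substituent order and the function name are assumptions made for illustration; this is a parity check, not a CIP (R/S) assignment and not the procedure actually used to build the D-enantiomers.

```python
import numpy as np

ORDER = ("NH2", "COOH", "R", "H")  # fixed substituent order (assumed convention)

def handedness(positions):
    """Sign (+1 or -1) of the signed volume spanned by the four substituents."""
    a, b, c, d = (np.asarray(positions[name], dtype=float) for name in ORDER)
    return int(np.sign(np.linalg.det(np.array([a - d, b - d, c - d]))))

# Idealized tetrahedral arrangement around an alpha-carbon at the origin (hypothetical).
l_form = {
    "NH2":  ( 1.0,  1.0,  1.0),
    "COOH": ( 1.0, -1.0, -1.0),
    "R":    (-1.0,  1.0, -1.0),
    "H":    (-1.0, -1.0,  1.0),
}

# Interchange the amino and carboxyl groups, as described in the text.
d_form = dict(l_form, NH2=l_form["COOH"], COOH=l_form["NH2"])

# Mirror image obtained by reflecting every position through the yz-plane.
mirror = {name: (-x, y, z) for name, (x, y, z) in l_form.items()}

print(handedness(l_form), handedness(d_form), handedness(mirror))
```

In a real structure the swap must be followed by a check of the phi-psi torsion angles and a geometry optimization, since exchanging the groups changes the local steric environment even though it yields the correct configuration at the chiral center.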
In Figures 2 through 4 we have depicted some selected information-theoretical measures and complexities in position space versus the number of electrons and the energy. For instance, it may be observed from Fig. 2 that the Shannon entropy increases with the number of electrons, and interesting properties can be noted, e.g., the aromatic aminoacids possess more delocalized densities than the rest (see Figure 1B), which confers specific chemical properties. On the other hand, the disequilibrium diminishes as the number of electrons increases (see Fig. 2), which can be related to the chemical stability of the aminoacids, e.g., cysteine and methionine show the largest values (see Fig. 2), in agreement with the biological evidence that both molecules play multiple functions in proteins, chemical as well as structural, conferring the higher reactivity that is recognized in both molecules. In contrast, aromatic aminoacids (see Fig. 1B) are the least reactive, in agreement with the lower disequilibrium values that are observed. In Figure 4, note that the LMC complexity characterizes two different groups of aminoacids, where the most reactive (Cys and Met) possess the largest values and incidentally hold the largest energies (negatively). A different behavior is observed for the FS complexity, in that the smaller values correspond to the less energetic aminoacids. It is worth mentioning that the FS complexity is related to the Fisher information measure (Eq. 7), which depends on the local behavior of the position-space density, i.e., simpler molecules present more ordered chemical structures, and hence these kinds of aminoacids are expected to be less complex, e.g., the small and the tiny ones (Ser, Ala, Thr).

Codons
The genetic code refers to a nearly universal assignment of codons of nucleotides to amino acids. The codon to amino acid assignment is realized through: (i) the code adaptor molecules of transfer RNAs (tRNAs), with a codon's complementary replica (anticodon) and the corresponding amino acid attached to the 3' end, and (ii) aminoacyl tRNA synthetases (aaRSs), the enzymes that actually recognize and connect the proper amino acid and tRNAs. The origin of the genetic code is an inherently difficult problem (Crick, 1976), taking into account that the events determining the genetic code took place a long time ago, and given the relative compactness of the present genetic code. The degeneracy of the genetic code implies that one or more similar tRNAs can recognize the same codon on a messenger RNA (mRNA). The number of amino acids and codons is fixed at 20 amino acids and 64 codons (4 nucleotides, A, C, U and G, taken three per codon), but the number of tRNA genes varies widely, from 29 to 126, even between closely related organisms. The frequency of synonymous codon use differs between organisms, within genomes, and along genes, a phenomenon known as codon usage bias (CUB) (Thiele et al., 2011).
Sequences of bases in the coding strand of DNA or in messenger RNA possess coded instructions for building protein chains out of amino acids. There are 20 amino acids used in making proteins, but only four different bases to be used to code for them. Obviously one base can't code for one amino acid: that would leave 16 amino acids with no codes. Taking two bases to code for each amino acid would still only give 16 possible codes (TT, TC, TA, TG, CT, CC, CA and so on), that is, still not enough. However, taking three bases per amino acid gives 64 codes (TTT, TTC, TTA, TTG, TCT, TCC and so on). That's enough to code for everything with lots to spare. You will find a full table of these below. A three-base sequence in DNA or RNA is known as a codon. The codes in the coding strand of DNA and in messenger RNA aren't, of course, identical, because in RNA the base uracil (U) is used instead of thymine (T). Table 1 shows how the various combinations of three bases in the coding strand of DNA are used to code for individual amino acids, shown by their three-letter abbreviations. The table is arranged in such a way that it is easy to find any particular combination you want. It is fairly obvious how it works and, in any case, it doesn't take very long just to scan through the table to find what you want. The colours are to stress the fact that most of the amino acids have more than one code. Look, for example, at leucine in the first column. There are six different codons, all of which will eventually produce a leucine (Leu) in the protein chain. There are also six for serine (Ser). In fact there are only two amino acids which have only one sequence of bases to code for them: methionine (Met) and tryptophan (Trp). Note that three codons don't have an amino acid but "stop" instead. For obvious reasons these are known as stop codons.
The stop codons in the RNA table (UAA, UAG and UGA) serve as a signal that the end of the chain has been reached during protein synthesis. The codon that marks the start of a protein chain is AUG, which codes for the amino acid methionine (Met). That ought to mean that every protein chain must start with methionine.
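The counting argument and the degeneracy figures quoted above can be checked with a few lines of code. The sketch below builds the standard genetic code for the DNA coding strand from the conventional one-letter amino acid string ordered over the bases T, C, A, G, with `*` marking stop codons; the variable names are illustrative, and the layout follows the usual textbook ordering, which may differ from that of Table 1.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# One-letter amino acid codes for the 64 codons in TCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Number of distinct "words" available with 1, 2 and 3 bases per amino acid.
word_counts = [len(BASES) ** k for k in (1, 2, 3)]

# Map every three-base codon to its amino acid (or stop signal).
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
genetic_code = dict(zip(codons, AMINO))

# Degeneracy: how many codons specify each amino acid.
degeneracy = Counter(genetic_code.values())

print("words per length:", word_counts)
print("start codon ATG ->", genetic_code["ATG"])
for amino_acid, count in sorted(degeneracy.items(), key=lambda kv: -kv[1]):
    print(amino_acid, count)
```

Replacing T by U in `BASES` gives the corresponding messenger RNA table without changing any of the counts.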

Physical and information-theoretical properties
An important goal of the present study is to characterize the biological units which codify aminoacids by means of information-theoretical properties. To accomplish this, we have depicted the corresponding measures for the codons in the Figures discussed below. The entropy values for the aminoacids (see Figure 2) lie between 4.4 and 5.6, whereas the corresponding values for the codons (see Figure 9) lie between 6.66 and 6.82; therefore this information measure serves to characterize all these biological molecules, providing in this way the first benchmark informational results for the building blocks of life. Further, it is interesting to note from Figures 9 and 10 that the entropy increases with the number of electrons (Fig. 9), whereas the opposite behavior is observed for the disequilibrium measure. Besides, we may note from these Figures an interesting codification pattern within each isoelectronic group of codons, in which an exchange of one nucleotide seems to occur, e.g., as the entropy increases in the 440-electron group the following sequence is found: UUU to (UUC, UCU, CUU) to (UCC, CUC, CCU) to CCC. Similar observations can be made from Figures 10 and 11 for D and I, respectively. In particular, the Fisher information deserves special analysis (see Figure 11): one may observe a more intricate behavior in which all codons seem to be linked across the plot, i.e., within each isoelectronic group codons exchange only one nucleotide, e.g., in the 440 group codons change from UUU to (UUC, UCU, CUU) to (UCC, CUC, CCU) to CCC as the Fisher measure decreases. Besides, as the Fisher measure and the number of electrons increase linearly a similar exchange is observed, e.g., from AAA to (AAG, AGA, GAA) to (AGG, GAG, GGA) to GGG. We believe that the above observations deserve further study, since a codification pattern seems to be apparent.
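The isoelectronic grouping discussed above can be reproduced from the molecular formulas of the four nucleobases alone. The sketch below counts the electrons of the neutral free bases (uracil C4H4N2O2, cytosine C4H5N3O, adenine C5H5N5, guanine C5H5N5O) and groups the 64 codons by their total. This is a simplification made for illustration: the sugar-phosphate part of each nucleotide is ignored, so the totals differ from the 440 electrons quoted in the text by an offset that is assumed to be the same for every codon, and all names in the code are hypothetical.

```python
from collections import defaultdict
from itertools import product

ATOMIC_NUMBER = {"C": 6, "H": 1, "N": 7, "O": 8}

# Molecular formulas of the neutral free nucleobases.
BASE_FORMULA = {
    "U": {"C": 4, "H": 4, "N": 2, "O": 2},
    "C": {"C": 4, "H": 5, "N": 3, "O": 1},
    "A": {"C": 5, "H": 5, "N": 5},
    "G": {"C": 5, "H": 5, "N": 5, "O": 1},
}

def base_electrons(base):
    """Electron count of a neutral nucleobase from its molecular formula."""
    return sum(ATOMIC_NUMBER[el] * n for el, n in BASE_FORMULA[base].items())

def codon_electrons(codon):
    """Sum of the base electron counts; the backbone contribution is not included."""
    return sum(base_electrons(b) for b in codon)

# Group the 64 RNA codons by their base-only electron count.
groups = defaultdict(list)
for triplet in product("UCAG", repeat=3):
    codon = "".join(triplet)
    groups[codon_electrons(codon)].append(codon)

for n_electrons in sorted(groups):
    print(n_electrons, " ".join(sorted(groups[n_electrons])))
```

Under this simplification uracil and cytosine carry the same number of electrons, so the eight codons built only from U and C fall into a single group, while each replacement of A by G adds the same increment, which is consistent with the UUU-to-CCC and AAA-to-GGG sequences described in the text.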
In Figures 12 and 13 we have depicted the LMC and FS complexities, respectively, where we can note that as the number of electrons increases the LMC complexity decreases, and the opposite is observed for the FS complexity. It is worth mentioning that codification patterns similar to the ones discussed above are observed for both complexities. Furthermore, we have found it interesting to show similar plots in Figures 14 and 15, where the behavior of both complexities is shown with respect to the total energy. It is observed that as the energy increases (negatively) the LMC complexity decreases, whereas the FS complexity increases. Note that similar codification patterns are observed in Figure 15 for the FS complexity.

Concluding remarks
We have shown throughout this Chapter that the information-theoretical description of the fundamental biological pieces of the genetic code, aminoacids and codons, can be carried out in a simple fashion by employing Information Theory concepts such as local and global information measures and statistical complexities. In particular, we have provided, for the first time in the literature, benchmark information-theoretical values for the 20 standard aminoacids and the 64 codons (nucleotide triplets). Through these studies, we believe that information science may provide a new scientific language to explain essential aspects of biological phenomena. These new aspects are not accessible through other standard methodologies in quantum chemistry, and they help to reveal the intricate mechanisms through which chemical phenomena occur. This points to a new area of research that looks very promising as a standalone and robust science. The purpose of this research is to provide fertile soil on which to build this nascent scientific area of chemical and biological inquiry through information-theoretical concepts, towards the science of the so-called Quantum Information Biology.