The COVID-19 DNA-RNA Genetic Code Analysis Using Double Stochastic and Block Circulant Jacket Matrix

Sung Kook Lee; Moon Ho Lee

doi:10.5772/intechopen.102342

Abstract

We present a COVID-19 DNA-RNA genetic code with A=T=U=31% and C=G=19%, developed from the base matrix [C U; A G], where C, U, A, and G are the RNA bases and C, T, A, and G are the DNA bases. E. Chargaff found experimentally that the bases are complementary, with A=T=U=30% and C=G=20%, which implied the double-helix structure of DNA and its complementary base pairing. These ratios, however, have not yet been explained mathematically. In this paper, we give a simple explanation based on the information theory of a doubly stochastic matrix over the Shannon symmetric channel and prove it mathematically. We also show that the DNA-RNA genetic code is a kind of block circulant Jacket matrix. Moreover, we present general patterns obtained by the block circulant, upper-lower, and left-right schemes; these correspond to correct communication and to the healthy condition, because they consist exactly of the four bases. Finally, we provide abnormal patterns obtained by the same three schemes, which cover distorted signals as well as COVID-19.

Keywords

COVID-19 DNA-RNA
E. Chargaff
DNA-RNA genetic code
double stochastic matrix
symmetric channel
block circulant jacket matrix
general pattern
abnormal pattern

Author Information


Sung Kook Lee
- Jeju International School, Republic of Korea
Moon Ho Lee*
- Department of Electronics, Jeonbuk National University, Republic of Korea

*Address all correspondence to: moonho@jbnu.ac.kr

1. Introduction

In 1950, Chargaff’s two rules [1] were presented. The first states that the percentage of adenine equals that of thymine and the percentage of guanine equals that of cytosine, which hints at the base-pair composition of the double-stranded DNA molecule. The second states that base complementarity also holds for each DNA strand, which explains the overall characteristics of the fundamental bases. For example, the four bases of COVID-19 satisfy these two rules approximately, with A=T=31% and C=G=19%. In 1953, it was discovered that DNA has a double helix structure [2, 3], which results in an optimal and economical genetic code [4].

The RNA base matrix [C U; A G] is based on stochastic matrices [5] and leads to the genetic code [6, 7]. A symmetric capacity is calculated by applying the Markov process to these doubly stochastic matrices, which suggests a symmetry between the Shannon channel [8] and the RNA stochastic transition matrix [C U; A G], defined below. A square matrix P = [p_ij] is stochastic if its entries are non-negative and each of its rows sums to one (or to a constant). If, in addition, each of its columns sums to the same value, the matrix is doubly stochastic, and it can describe the time-invariant binary symmetric channel. For the input x_n and the output x_{n+1}, the two states e_0 and e_1 of the Markov process are represented by the binary symbols “0” and “1”, respectively. The output is determined by the input up to a certain error probability. Assume that the channel error probabilities α and β are less than one half and remain constant over the time-invariant channel for all transmitted symbols, such that

$$P(x_{n+1}=1 \mid x_n=0) = p_{01} = \alpha, \qquad P(x_{n+1}=0 \mid x_n=1) = p_{10} = \beta. \tag{1}$$

In addition, the Markov chain is homogeneous. P denotes the 2 × 2 homogeneous probability transition matrix defined as

$$P = \begin{bmatrix} p_{00} & p_{01} \\ p_{10} & p_{11} \end{bmatrix} = \begin{bmatrix} 1-\alpha & \alpha \\ \beta & 1-\beta \end{bmatrix} = \left.\begin{bmatrix} 1-p & p \\ p & 1-p \end{bmatrix}\right|_{p=0.5} = \frac{1}{2}\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}, \tag{2}$$

where the two error probabilities are identical, α=β=p, over a binary symmetric channel. The rest of this paper is organized as follows. Section 2 derives the RNA stochastic entropy from the Shannon entropy. Section 3 estimates the variance of the RNA base matrix. Section 4 derives the entropy and capacity of the binary symmetric channel. Section 5 estimates the two-user capacity over the symmetric interference channel. Section 6 proposes a construction scheme that generates RNA genetic codes. Section 7 examines the symmetric genetic Jacket block matrix. Section 8 investigates general patterns of block circulant symmetric genetic Jacket matrices. Section 9 concludes the paper.

Table 1 lists the base ratios of several organisms [1, 9, 10, 11] and shows that the ratios A/T and G/C are nearly constant across species.

Organism	Taxon	%A	%G	%C	%T	A / T	G / C	%GC	%AT
Maize	Zea	26.8	22.8	23.2	27.2	0.99	0.98	46.1	54.0
Octopus	Octopus	33.2	17.6	17.6	31.6	1.05	1.00	35.2	64.8
Chicken	Gallus	28.0	22.0	21.6	28.4	0.99	1.02	43.7	56.4
Rat	Rattus	28.6	21.4	20.5	28.4	1.01	1.00	42.9	57.0
Human	Homo	29.3	20.7	20.0	30.0	0.98	1.04	40.7	59.3
Grasshopper	Orthoptera	29.3	20.5	20.7	29.3	1.00	0.99	41.2	58.6
Sea urchin	Echinoidea	32.8	17.7	17.3	32.1	1.02	1.02	35.0	64.9
Wheat	Triticum	27.3	22.7	22.8	27.1	1.01	1.00	45.5	54.4
Yeast	Saccharomyces	31.3	18.7	17.1	32.9	0.95	1.09	35.8	64.4
E. coli	Escherichia	24.7	26.0	25.7	23.6	1.05	1.01	51.7	48.3
φX174	PhiX174	24.0	23.3	21.5	31.2	0.77	1.08	44.8	55.2
Covid-19	SARS-CoV-2	29.9	19.6	18.4	32.1	0.93	1.07	38.0	62.0

Table 1.

Ratio of bases [1, 9, 10, 11].

2. Analytical approach to RNA stochastic entropy

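In [1, 5, 12, 13], stochastic complementary RNA bases are given for the genetic code. Assuming that C=G=19% and A=T=U=31%, the transition channel matrix P is expressed by

$$P = \begin{bmatrix} C & U \\ A & G \end{bmatrix} = \begin{bmatrix} 0.19 & 0.31 \\ 0.31 & 0.19 \end{bmatrix}. \tag{3}$$

Suppose that the entries of the RNA base matrix [C U; A G], viewed as the two source probabilities of a Markov process, are scaled by a factor p to 0.19p and 0.31p. For the entry 0.19p, the transition channel matrix P is defined by

$$P = \begin{bmatrix} 0.19p & 1-0.19p \\ 1-0.19p & 0.19p \end{bmatrix} = \begin{bmatrix} 0.5 & 1-0.5 \\ 1-0.5 & 0.5 \end{bmatrix} = \begin{bmatrix} 0.5 & 0.5 \\ 0.5 & 0.5 \end{bmatrix}. \tag{4}$$

Comparing with Eq. (12), we have

$$0.19p = 1-0.19p, \tag{5}$$

so that p = 0.5/0.19 ≈ 2.632.

Applying the same procedure to the entry 0.31p,

$$P = \begin{bmatrix} 0.31p & 1-0.31p \\ 1-0.31p & 0.31p \end{bmatrix} = \begin{bmatrix} 0.500 & 1-0.500 \\ 1-0.500 & 0.500 \end{bmatrix} = \begin{bmatrix} 0.5 & 0.5 \\ 0.5 & 0.5 \end{bmatrix}, \tag{6}$$

where 0.31p = 1 − 0.31p, so that p = 0.5/0.31 ≈ 1.613.

Adding Eq. (6) to Eq. (4) gives a doubly stochastic matrix,

$$2P = \begin{bmatrix} 0.5 & 0.5 \\ 0.5 & 0.5 \end{bmatrix} + \begin{bmatrix} 0.5 & 0.5 \\ 0.5 & 0.5 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}. \tag{7}$$

Similarly, from Eq. (3),

$$2P = 2\begin{bmatrix} C & U \\ A & G \end{bmatrix} = 2\begin{bmatrix} 0.19 & 0.31 \\ 0.31 & 0.19 \end{bmatrix} = \begin{bmatrix} 0.38 & 0.62 \\ 0.62 & 0.38 \end{bmatrix}. \tag{8}$$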
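The doubly stochastic property used in Eqs. (3) and (8) can be checked numerically. The following is a minimal sketch; the helper name `is_doubly_stochastic` and its tolerance are illustrative assumptions, not part of the original derivation.

```python
def is_doubly_stochastic(m, total=1.0, tol=1e-9):
    """True if all entries are non-negative and every row and column sums to `total`."""
    n = len(m)
    if any(x < 0 for row in m for x in row):
        return False
    rows_ok = all(abs(sum(row) - total) <= tol for row in m)
    cols_ok = all(abs(sum(m[i][j] for i in range(n)) - total) <= tol for j in range(n))
    return rows_ok and cols_ok

# Eq. (3): the base matrix [C U; A G]; its rows and columns sum to 0.5.
P = [[0.19, 0.31], [0.31, 0.19]]

# Eq. (8): doubling P gives rows and columns that sum to 1.
two_P = [[2 * x for x in row] for row in P]

print(is_doubly_stochastic(P, total=0.5))
print(is_doubly_stochastic(two_P))
```

Under these assumptions, the base matrix is doubly stochastic with constant sum 0.5, and the scaled matrix 2P is doubly stochastic in the usual sense.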

If p is the source probability of the first symbol, the entropy function [8] is

$$H_2(P) = p\log_2\frac{1}{p} + (1-p)\log_2\frac{1}{1-p}. \tag{9}$$

The last column of Table 2 lists the values of Eq. (9), and Figure 1 plots the Shannon and RNA entropy curves. Note that the curve has a vertical tangent at p = 0 and at p = 1, because the derivative in Eq. (10) diverges there.

p	−log₂ p	−p log₂ p	H₂(p)
0.3800	1.3959	0.5305	0.9580
0.3900	1.3585	0.5298	0.9648
0.4000	1.3219	0.5288	0.9710
0.4100	1.2863	0.5274	0.9765
0.4200	1.2515	0.5256	0.9815
0.4300	1.2176	0.5236	0.9858
0.4400	1.1844	0.5211	0.9896
0.4500	1.1520	0.5184	0.9928
0.4600	1.1203	0.5153	0.9954
0.4700	1.0893	0.5120	0.9974
0.4800	1.0589	0.5083	0.9988
0.4900	1.0291	0.5043	0.9997
0.5000	1.0000	0.5000	1.0000
0.5100	0.9714	0.4954	0.9997
0.5200	0.9434	0.4906	0.9988
0.5300	0.9159	0.4854	0.9974
0.5400	0.8890	0.4800	0.9954
0.5500	0.8625	0.4744	0.9928
0.5600	0.8365	0.4684	0.9896
0.5700	0.8110	0.4623	0.9858
0.5800	0.7859	0.4558	0.9815
0.5900	0.7612	0.4491	0.9765
0.6000	0.7370	0.4422	0.9710
0.6100	0.7131	0.4350	0.9648
0.6200	0.6897	0.4276	0.9580

Table 2.

Shannon entropy for probability p.

Figure 1.
Comparison between Shannon and RNA entropy for probability p.

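$$\frac{d}{dp}\left[p\log_2\frac{1}{p} + (1-p)\log_2\frac{1}{1-p}\right] = \left(\ln\frac{1}{p} - 1 - \ln\frac{1}{1-p} + 1\right)\log_2 e = \log_2\frac{1}{p} - \log_2\frac{1}{1-p} = 0. \tag{10}$$

The entropy is maximized when p reaches one half, because its derivative becomes 0 there.

Therefore,

$$\log_2\frac{1}{p} - \log_2\frac{1}{1-p} = 0 \;\Rightarrow\; \frac{1}{p} - \frac{1}{1-p} = 0. \tag{11}$$

Then, we reach

$$p = 1-p \;\Rightarrow\; p = \frac{1}{2}. \tag{12}$$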

For the RNA base matrix [C U; A G], the symmetric entropy is calculated as

$$H_2(P_{RNA}) = p\log_2\frac{1}{p} + (1-p)\log_2\frac{1}{1-p} = 0.9790, \tag{13}$$

when p is either 0.38 or 0.62, whereas the Shannon entropy is calculated as

$$H_2(P_{Shannon}) = p\log_2\frac{1}{p} + (1-p)\log_2\frac{1}{1-p} = 1, \tag{14}$$

when p reaches a half.
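The binary entropy function of Eq. (9) is simple to evaluate directly. The sketch below, with the assumed helper name `binary_entropy`, reproduces the H₂(p) column of Table 2 for a few values of p; it is an illustration, not the authors' code.

```python
import math

def binary_entropy(p):
    """Binary entropy H2(p) in bits, with the convention 0 * log2(0) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# A few rows of Table 2: p and H2(p).
for p in (0.38, 0.45, 0.50, 0.55, 0.62):
    print(f"{p:.4f}  {binary_entropy(p):.4f}")
```

The function is symmetric about p = 0.5, so p = 0.38 and p = 0.62 give the same value, which Table 2 lists as 0.9580.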

Table 2 shows Shannon Entropy for probability p over a binary symmetric channel.

Figure 1 gives a comparison between Shannon and RNA Entropy for probability p under the RNA base matrix CUAG.

3. Derivation of variance for the RNA base matrix CUAG

Let V(X) denote the variance of the RNA random variable X, that is, the expected squared deviation from its mean, where the mean is

$$E[X] = a = 0.5. \tag{15}$$

Therefore, for a random variable X, the variance is obtained as

$$V(X) = E\left[(X-a)^2\right] = E[X^2] - 2aE[X] + E[a^2] = E[X^2] - 2a^2 + a^2 = E[X^2] - a^2 = \sigma^2. \tag{16}$$

Case I. Upper source probability 0.62

$$\sigma_{upper}^2 = 0.62^2 - 0.5^2 = 0.13. \tag{17}$$

Case II. Lower source probability 0.38

$$\sigma_{lower}^2 = 0.5^2 - 0.38^2 = 0.10. \tag{18}$$

If X₁ and X₂ are independent random variables, their expectations and variances are, respectively,

$$E[X_1] = a_1, \qquad V(X_1) = \sigma_1^2, \tag{19}$$

$$E[X_2] = a_2, \qquad V(X_2) = \sigma_2^2. \tag{20}$$

Therefore, we reach

$$E\left[(X_1-a_1)(X_2-a_2)\right] = E[X_1-a_1]\,E[X_2-a_2] = 0. \tag{21}$$

Since X₁ and X₂ are independent random variables, the variance of their sum is calculated as

$$V(X_1+X_2) = E\left[(X_1+X_2-a_1-a_2)^2\right] = E\left[(X_1-a_1)^2\right] + 2E\left[(X_1-a_1)(X_2-a_2)\right] + E\left[(X_2-a_2)^2\right] = V(X_1)+V(X_2) = \sigma_1^2+\sigma_2^2 = 0.13+0.10 = 0.23, \tag{22}$$

which is approximately 23%, corresponding to the difference between A = U and C = G. This means that the RNA entropy cannot reach the Shannon entropy, because the base probabilities lie 23% away from one half, which equals the sum of their variances.
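The arithmetic of Eqs. (17), (18), and (22) can be reproduced in a few lines. This is a minimal sketch with assumed variable names; it only evaluates the expressions exactly as they are written in the text.

```python
a = 0.5                         # mean, Eq. (15)

var_upper = 0.62**2 - a**2      # Eq. (17): 0.3844 - 0.25
var_lower = a**2 - 0.38**2      # Eq. (18): 0.25 - 0.1444
var_sum = var_upper + var_lower # Eq. (22)

print(f"upper: {var_upper:.4f}")
print(f"lower: {var_lower:.4f}")
print(f"sum:   {var_sum:.4f}")
```

Without intermediate rounding the two terms are 0.1344 and 0.1056, so their sum is 0.24, which is also the difference 0.62 − 0.38. The figure 0.23 in Eq. (22) follows from truncating each term to two decimals before adding.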

4. RNA complement base matrix CUAG for symmetric noise immune-free channel

If the bases of the RNA genetic code [C U; A G] are complementary, such that C=U and A=G, over a noise immune-free binary symmetric channel, the conditional probability P(b_j | a_i) = P_{i,j} describes this channel, and the maximum amount of information that can be transmitted is depicted in Figure 2. Assuming that C and G are the complements of the corresponding error probabilities and that A and U are interference signals, the channel [8] is described by

Figure 2.
Complementary bases of RNA genetic code CUAG over noise immune-free binary symmetric channel.

$$[p(X)]_{1\times 2}\,[P]_{2\times 2} = \begin{bmatrix}\alpha & 1-\alpha\end{bmatrix}\begin{bmatrix} C & U \\ A & G\end{bmatrix} = [p(Y)]_{1\times 2} = \begin{bmatrix} p(Y_1) & p(Y_2)\end{bmatrix}. \tag{23}$$

When p and 1−p are the selection probabilities corresponding to α=0 and α=1, respectively, over the uniform channel, the mutual information is defined by

$$I(X;Y) = H(Y) - H(Y \mid X). \tag{24}$$

From Eq. (23), we obtain

$$\begin{bmatrix}\alpha & 1-\alpha\end{bmatrix}\begin{bmatrix} -C\log_2 C & -U\log_2 U \\ -A\log_2 A & -G\log_2 G\end{bmatrix} = \begin{bmatrix}\alpha & 1-\alpha\end{bmatrix}\begin{bmatrix} -U\log_2 U & -C\log_2 C \\ -G\log_2 G & -A\log_2 A\end{bmatrix}, \tag{25}$$

where

$$H(Y \mid X) = -\alpha C\log_2 C - \alpha A\log_2 A - (1-\alpha)U\log_2 U - (1-\alpha)G\log_2 G = -U\log_2 U - G\log_2 G = -C\log_2 C - A\log_2 A = 0.9790, \tag{26}$$

where A = U = 0.31 and C = G = 0.19.
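The value quoted in Eq. (26) can be checked by evaluating the expression term by term. The sketch below uses an assumed helper name `conditional_entropy` and simply codes the left-hand side of Eq. (26) as written.

```python
import math

C = G = 0.19
A = U = 0.31

def conditional_entropy(alpha):
    """Left-hand side of Eq. (26) for a given selection probability alpha."""
    return (-alpha * C * math.log2(C) - alpha * A * math.log2(A)
            - (1 - alpha) * U * math.log2(U) - (1 - alpha) * G * math.log2(G))

for alpha in (0.0, 0.38, 0.62, 1.0):
    print(f"alpha = {alpha:.2f}: {conditional_entropy(alpha):.4f}")
```

Because A = U and C = G, the result does not depend on α and equals −0.19 log₂ 0.19 − 0.31 log₂ 0.31 ≈ 0.9790, the figure used in Eq. (26). For comparison, the binary entropy H₂(0.38) listed in Table 2 is 0.9580.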

Therefore, its capacity is derived as

$$C_{RNA} = \max I(X;Y)\big|_{p=0.38\ \text{or}\ 0.62} = H(Y) - H(Y \mid X) = 1 - 0.9790 = 0.021, \tag{27}$$

with $H(Y) = -p\log_2 p - (1-p)\log_2(1-p) = -0.38\log_2 0.38 - 0.62\log_2 0.62 \approx 1$,

while the Shannon capacity is derived as

$$C_{Shannon} = \max I(X;Y)\big|_{p=0.5} = H(Y) - H(Y \mid X) = 1 - 1 = 0. \tag{28}$$

In Figure 3, we compare the Shannon and RNA capacities for probability p. As mentioned in Section 3, the Shannon capacity can be reached only under ideal circumstances. In other words, there is a difference between the Shannon and RNA capacities, which equals the sum of the variances of the RNA base random variables, because the base probabilities cannot equal one half over a symmetric channel.

Figure 3.
Shannon and RNA capacity vary with probability p.

5. Two user capacity over symmetric interference channel

Figure 4 shows the binary symmetric channel with the RNA base matrix [C U; A G] as well as the two-user symmetric interference channel, in which two independent messages W₁ and W₂ from the common message set W_i are transmitted. Assume that C = G = 19% and A = U = 31%, where C = H₁₁ is the direct signal for Y₁ and U = H₁₂ is its interference signal. Analogously, the direct signal for the second user Y₂ is G = H₂₂ and its interference signal is A = H₂₁.

Figure 4.
Two-user symmetric Interference Channel. (a) Strong Interference Channel. (b) Weak Interference Channel.

$$H_{11} = H_{22} = h_d\sqrt{P_{SNR}}, \qquad H_{12} = H_{21} = h_c\sqrt{P_{SNR}}.$$

The relationship between the input and output of the two-user symmetric channel is described as follows [14],

$$Y_1 = h_d\sqrt{P_{SNR}}\,X_1 + h_c\sqrt{P_{SNR}^{\alpha}}\,X_2 + Z_1, \tag{29}$$

$$Y_2 = h_c\sqrt{P_{SNR}^{\alpha}}\,X_1 + h_d\sqrt{P_{SNR}}\,X_2 + Z_2, \tag{30}$$

where the powers of the input symbols X₁ and X₂ and of the additive white Gaussian noise (AWGN) terms Z₁ and Z₂ are normalized to unity. Analogous to the definition of the degree of freedom (DoF), the total generalized degree of freedom (GDoF) metric d(α) is defined as

$$d(\alpha) = \lim_{P_{SNR}\to\infty}\frac{C(P_{SNR},\alpha)}{\log P_{SNR}}, \tag{31}$$

where C(P_SNR, α) is the sum-capacity parameterized by P_SNR and α. Here α is the ratio (on the decibel scale) of the cross-channel strength to the direct-channel strength, and P_SNR is the signal-to-noise ratio (on the decibel scale). To find the achievable DoF, take the limit in Eq. (31) by letting P_SNR go to infinity. Note that the GDoF metric coincides with the DoF metric at the point α = 1. The GDoF curve thus gives a significant hint for optimal interference management strategies, and it has been used most successfully to estimate the capacity of the two-user interference channel to within a constant gap in [14]. For example, for the RNA genetic code with bases C = G = 19% and A = T = U = 31%, this two-user symmetric interference channel can be analyzed in the strong and weak interference regions as below. The noise immune channel is described as follows, where X₁ and X₂ denote the input symbols and Y₁ and Y₂ denote the output symbols:

$$Y_1 = C X_1 + U X_2, \tag{32}$$

$$Y_2 = G X_1 + A X_2. \tag{33}$$

Case 1. Strong Interference region.

Figure 4 (a) shows the channel in the strong interference regime, where the receivers have to decode the interfering signal in order to recover the desired signal. The general condition for a strong interference signal is

$$C < A, \qquad U > G. \tag{34}$$

Unfortunately, unlike in the weak interference region, it remains challenging to propose a scheme that achieves a symmetric rate together with a matching upper bound.

Case 2. Weak Interference region.

Figure 4 (b) shows the channel in the very weak interference regime, where the receivers do not need to decode any portion of the interference signal and instead treat it as noise. This scheme achieves the following symmetric rate per user [14],

$$R = \min\left\{\frac{1}{2}\log(1+INR+SNR) + \frac{1}{2}\log\left(2+\frac{SNR}{INR}\right) - 1,\ \log\left(1+INR+\frac{SNR}{INR}\right) - 1\right\}. \tag{35}$$

The upper bound on the symmetric capacity is

$$C_{Sym} \le \min\left\{\frac{1}{2}\log(1+SNR) + \frac{1}{2}\log\left(1+\frac{SNR}{1+INR}\right),\ \log\left(1+INR+\frac{SNR}{1+INR}\right)\right\}. \tag{36}$$

Letting A = T = U = 31% and C = G = 19%, i.e. INR = 31 and SNR = 19, we obtain the symmetric achievable rate

$$R = \min\left\{\frac{1}{2}\log_2(1+31+19) + \frac{1}{2}\log_2\left(2+\frac{19}{31}\right) - 1,\ \log_2\left(1+31+\frac{19}{31}\right) - 1\right\} = \min\{2.83+0.69-1,\ 5.02-1\} = \min\{2.53,\ 4.02\} = 2.53. \tag{37}$$

Analogously, the symmetric capacity is bounded by

$$C_{sym} \le \min\left\{\frac{1}{2}\log_2(1+19) + \frac{1}{2}\log_2\left(1+\frac{19}{31}\right),\ \log_2\left(1+31+\frac{19}{31}\right)\right\} \le \min\{2.16+0.34,\ 5.02\} \le \min\{2.50,\ 5.02\} = 2.50. \tag{38}$$

Following the above steps, in the weak interference regime, where interference is treated as noise, the symmetric capacity is close to the achievable rate, such that

$$C_{sym} = R. \tag{39}$$
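The numbers in Eqs. (37) and (38) can be reproduced from Eqs. (35) and (36). The sketch below is an illustration under the text's assumption INR = 31 and SNR = 19; the function names `achievable_rate` and `upper_bound` are hypothetical, and base-2 logarithms are used as in Eqs. (37) and (38).

```python
import math

def achievable_rate(snr, inr):
    """Symmetric achievable rate of Eq. (35), in bits."""
    t1 = (0.5 * math.log2(1 + inr + snr)
          + 0.5 * math.log2(2 + snr / inr) - 1)
    t2 = math.log2(1 + inr + snr / inr) - 1
    return min(t1, t2)

def upper_bound(snr, inr):
    """Upper bound on the symmetric capacity, Eq. (36), in bits."""
    t1 = 0.5 * math.log2(1 + snr) + 0.5 * math.log2(1 + snr / (1 + inr))
    t2 = math.log2(1 + inr + snr / (1 + inr))
    return min(t1, t2)

R = achievable_rate(snr=19, inr=31)
C_sym = upper_bound(snr=19, inr=31)
print(f"R     = {R:.2f}")
print(f"C_sym <= {C_sym:.2f}")
```

With these inputs the first term of each minimum is the active one, giving a rate of about 2.53 and a bound of about 2.50, so the two values agree to within a few hundredths of a bit.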

Figure 5 shows the weak and strong interference regions, where the leftmost part corresponds to the very weak interference region and the rightmost part to the very strong interference region.

Figure 5.
Generalized degree of freedom for Gaussian Channel (W curve).

Analysis:

In 1948, Shannon proposed a code generation method that exploits a random codebook in point-to-point communication with an inverse Gaussian distribution (a Gaussian distribution whose variance tends to infinity is called inverse Gaussian here) to achieve the channel capacity [8],

$$C = \frac{1}{2}\log_2\left(1+\frac{S}{N}\right), \tag{40}$$

where the signal power is S and the noise power is N.

The point-to-point channel capacity is

$$C_{AWGN} = \log_2\left(1+\frac{S}{N}\right), \tag{41}$$

with S and N as above.

From Eq. (31), the degree of freedom is [14]

$$DoF = \lim_{x\to\infty}\frac{1+\frac{S}{N}}{1+\frac{S}{N}} = 1, \tag{42}$$

and, with orthogonalized transmission, the achievable sum rate is

$$\sum_{i=1}^{K} R_i = \log_2\left(1+\frac{\sum_{i=1}^{K}P_i}{N}\right), \tag{43}$$

where K is the number of users.

For two users,

$$2R = \log_2\left(1+\frac{2P}{N}\right) = \log_2(1+2\,SNR). \tag{44}$$

Therefore, the achievable rate is

$$R = \frac{1}{2}\log_2(1+2\,SNR). \tag{45}$$

SNR = 19 and INR = 31 case:

The capacity:

$$C = \frac{1}{2}\log_2\left(1+\frac{19}{31}\right) = \frac{1}{2}\log_2(1+0.61) = 0.34. \tag{46}$$

Achievable rate:

$$2R = \log_2\left(1+2\cdot\frac{19}{31}\right) = \log_2 2.22 = 1.15 \;\Rightarrow\; R = 0.57. \tag{47}$$
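Eqs. (46) and (47) amount to two logarithm evaluations with the ratio 19/31. The following sketch, with assumed variable names, repeats that arithmetic.

```python
import math

ratio = 19 / 31                                # about 0.61, as used in Eq. (46)

capacity = 0.5 * math.log2(1 + ratio)          # Eq. (46)
two_R = math.log2(1 + 2 * ratio)               # Eq. (47), left-hand side 2R
rate = two_R / 2                               # Eq. (47), R

print(f"C  = {capacity:.3f}")
print(f"2R = {two_R:.3f}")
print(f"R  = {rate:.3f}")
```

The unrounded values are approximately 0.345, 1.154, and 0.577, which the text reports to two decimals as 0.34, 1.15, and 0.57.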

The degree of freedom is then

$$DoF = \lim_{SNR\to\infty}\frac{R}{\log_2(2\,SNR)} \approx \frac{\frac{1}{2}\log_2(1+2\,SNR)}{\log_2(2\,SNR)} \approx \frac{1}{2}. \tag{48}$$

Provided that the ratio $\alpha = \frac{\log_2 INR}{\log_2 SNR}$ is fixed and the signal is much stronger than the interference and noise, the interference can be treated as noise. The achievable rate is then

$$R = \log_2\left(1+\frac{SNR}{1+INR}\right). \tag{49}$$

From Eq. (49), the DoF is [14]

$$DoF = \lim_{SNR\to\infty}\frac{R}{\log_2 SNR} = \frac{\log_2\frac{SNR}{1+INR}}{\log_2 SNR} \approx \frac{\log_2\frac{SNR}{INR}}{\log_2 SNR} = \frac{\log_2 SNR - \log_2 INR}{\log_2 SNR} = 1 - \frac{\log_2 INR}{\log_2 SNR} = 1-\alpha. \tag{50}$$
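The limit in Eq. (50) can be illustrated numerically by fixing α, setting INR = SNR^α, and letting SNR grow. This is a minimal sketch; the function names and the chosen SNR values are assumptions made for illustration only.

```python
import math

def tin_rate(snr, alpha):
    """Rate of Eq. (49) when interference is treated as noise, with INR = SNR**alpha."""
    inr = snr ** alpha
    return math.log2(1 + snr / (1 + inr))

def normalized_rate(snr, alpha):
    """Ratio R / log2(SNR), which should approach 1 - alpha as SNR grows."""
    return tin_rate(snr, alpha) / math.log2(snr)

for snr in (1e2, 1e4, 1e8, 1e12):
    print(f"SNR = {snr:.0e}: {normalized_rate(snr, alpha=0.5):.4f}")
```

For α = 0.5 the printed ratio approaches 0.5 as SNR increases, in line with the 1 − α result of Eq. (50).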

In the conventional binary symmetric channel, p is a random variable, and a large amount of resources is spent on estimating p for the given channel. For the RNA base matrix [C U; A G], by contrast, p is deterministic: it is either 0.38 or 0.62. Because the specific value of p is given, the channel estimation itself should be investigated. These specific values are selected because, in the RNA model, the maximum channel capacity is maintained even though p is deterministic, the variance of the signal is not large, and the performance on the W curve is reasonable from the generalized DoF point of view. In an actual implementation, the receiver has to satisfy 1 − α = p, as shown in Figure 2. Under this circumstance, the signal strength and the interference intensity are important for analyzing the given channel, and the strong and weak interference environments are classified according to α. For example, if α = 1 − p = 0.38, we need to analyze the strong interference channel; if α = 1 − p = 0.62, we need to analyze the weak interference channel. This choice of p can minimize performance degradation in the binary symmetric channel while significantly reducing computational complexity. The GDoF curve of the two-user symmetric interference channel in Figure 5 is the well-known “W” curve, which greatly improves the understanding of the interference channel by identifying its regimes. In the above example, over the symmetric channel, the signal is relatively stronger than the interference when α = 0.62 and relatively weaker than the interference when α = 0.38.

6. RNA genetic code constructed by block circulant jacket matrix

A block circulant Jacket matrix (BCJM) is defined by [7, 12, 13, 15].

$$[C]_N = \begin{bmatrix} C_0 & C_1 \\ C_1 & C_0 \end{bmatrix}, \tag{51}$$

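where C₀ and C₁ are Hadamard matrices.

The circulant submatrices are 2 × 2 matrices whose entries are moved by block diagonal cyclic shifts. These submatrices are block circulant Jacket matrices. The BCJM C₄ is defined by

$$C_4 \triangleq I_0\otimes C_0' + I_1\otimes C_1, \tag{52}$$

where $I_0=\begin{bmatrix}1&0\\0&1\end{bmatrix}$, $I_1=\begin{bmatrix}0&1\\1&0\end{bmatrix}$, $C_0'=\begin{bmatrix}1&1\\1&-1\end{bmatrix}$, and $C_1=\begin{bmatrix}1&-1\\-1&-1\end{bmatrix}$, while ⊗ is the Kronecker product.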
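The construction in Eq. (52) can be carried out with a small Kronecker-product routine. The sketch below uses plain Python lists; the helper names `kron`, `add`, `transpose`, and `matmul` are assumptions introduced for illustration.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def add(a, b):
    """Entrywise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

I0 = [[1, 0], [0, 1]]
I1 = [[0, 1], [1, 0]]
C0p = [[1, 1], [1, -1]]    # C0' of Eq. (52)
C1 = [[1, -1], [-1, -1]]

C4 = add(kron(I0, C0p), kron(I1, C1))   # Eq. (52): [[C0', C1], [C1, C0']]
gram = matmul(C4, transpose(C4))        # equals 4 * I for a Hadamard matrix

for row in C4:
    print(row)
```

With these blocks, C₄ has entries ±1 and mutually orthogonal rows, so C₄C₄ᵀ = 4I₄. Since every entry is ±1, the reciprocal transpose coincides with the ordinary transpose, which is the Jacket condition discussed in Section 7.3.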

From Eq. (52), the genetic matrix [C U; A G]³ generates RNA sequences such as [12, 13]

$$P_1 = \begin{bmatrix}C&U\\A&G\end{bmatrix}, \qquad P_2 = \begin{bmatrix}C&U\\A&G\end{bmatrix}\otimes\begin{bmatrix}C&U\\A&G\end{bmatrix}, \qquad P_3 = \begin{bmatrix}C&U\\A&G\end{bmatrix}^{(2)}\otimes\begin{bmatrix}C&U\\A&G\end{bmatrix}, \tag{53}$$

where ⊗ denotes the Kronecker product. RNA consists of a sequence of the 4 bases C, U, A, and G, which denote cytosine, uracil, adenine, and guanine, respectively.

According to the theory of noise-immunity coding, a mosaic gene matrix [C U; A G]³ can be constructed by classifying the 64 triplets according to strong and weak roots. A triplet that belongs to one of the strong roots is replaced by 1, and a triplet that belongs to one of the weak roots is replaced by −1. The strong roots are CC, CU, CG, AC, UC, GC, GU, and GG, and the weak roots are CA, AA, AU, AG, UA, UU, UG, and GA. This yields the singular Rademacher matrix R₈ corresponding to Table 3 [6, 16].

	000 (0)	001 (1)	010 (2)	011 (3)	100 (4)	101 (5)	110 (6)	111 (7)
000 (0)	CCC 000	CCU 001	CUC 010	CUU 011	UCC 100	UCU 101	UUC 110	UUU 111
001 (1)	CCA 001	CCG 000	CUA 011	CUG 010	UCA 101	UCG 100	UUA 111	UUG 110
010 (2)	CAC 010	CAU 011	CGC 000	CGU 001	UAC 110	UAU 111	UGC 100	UGU 101
011 (3)	CAA 011	CAG 010	CGA 001	CGG 000	UAA 111	UAG 110	UGA 101	UGG 100
100 (4)	ACC 100	ACU 101	AUC 110	AUU 111	GCC 000	GCU 001	GUC 010	GUU 011
101 (5)	ACA 101	ACG 100	AUA 111	AUG 110	GCA 001	GCG 000	GUA 011	GUG 010
110 (6)	AAC 110	AAU 111	AGC 100	AGU 101	GAC 010	GAU 011	GGC 000	GGU 001
111 (7)	AAA 111	AAG 110	AGA 101	AGG 100	GAA 011	GAG 010	GGA 001	GGG 000

Table 3.

[C U;A G]³ code [6, 16].
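Table 3 and the ±1 substitution described above can be generated programmatically. The sketch below treats the Kronecker product of Eq. (53) as string concatenation of base symbols and takes the root of a triplet to be its first two letters; the helper name `kron_str` and this reading of "root" are assumptions that are consistent with the rows of Table 3.

```python
def kron_str(a, b):
    """Kronecker-style product of symbol matrices: entries are concatenated."""
    return [[x + y for x in ra for y in rb] for ra in a for rb in b]

P1 = [["C", "U"], ["A", "G"]]
P2 = kron_str(P1, P1)
P3 = kron_str(P2, P1)                 # 8 x 8 table of the 64 triplets

STRONG_ROOTS = {"CC", "CU", "CG", "AC", "UC", "GC", "GU", "GG"}

# Replace each triplet by +1 (strong root) or -1 (weak root).
R8 = [[1 if triplet[:2] in STRONG_ROOTS else -1 for triplet in row] for row in P3]

for triplets, signs in zip(P3, R8):
    print(" ".join(triplets), signs)
```

Under these assumptions, the first row of the table reads CCC, CCU, CUC, CUU, UCC, UCU, UUC, UUU, as in Table 3, and the resulting R₈ has pairwise repeated rows, which is why it is singular.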

A novel encoding scheme is proposed as

E54

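Eq. (54) hints at the DNA double helix.

Note that

$$R_8 \triangleq I_0\otimes C_0\otimes P_2 + I_1\otimes C_1\otimes P_2, \tag{55}$$

where $I_0=\begin{bmatrix}1&0\\0&1\end{bmatrix}$, $I_1=\begin{bmatrix}0&1\\1&0\end{bmatrix}$, $C_0=\begin{bmatrix}1&1\\-1&1\end{bmatrix}$, $C_1=\begin{bmatrix}1&-1\\-1&-1\end{bmatrix}$, and P₂ is the doubly stochastic matrix represented by $P_2=\begin{bmatrix}1&1\\1&1\end{bmatrix}$. Eq. (54) contains redundant rows that simply repeat and can be canceled. From the Rademacher matrix R₈, one version of its mosaic gene matrices is obtained as

$$R_8' = \begin{bmatrix} 1&1&1&1&1&1&-1&-1 \\ -1&-1&1&1&-1&-1&-1&-1 \\ 1&1&-1&-1&1&1&1&1 \\ -1&-1&-1&-1&-1&-1&1&1 \end{bmatrix}. \tag{56}$$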

Furthermore, by canceling the repeated columns of Eq. (56) by means of CRISPR, another version of the mosaic gene matrices is obtained as Eq. (57), which is a singular RNA matrix.

E57

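where $C_0=\begin{bmatrix}1&1\\-1&1\end{bmatrix}$ and $C_1=\begin{bmatrix}1&-1\\-1&-1\end{bmatrix}$. These matrices can be expanded into the DNA double helix or the RNA single strand, which reflects the process by which DNA replicates its genetic information, is transcribed into RNA, and is translated to synthesize protein. Therefore,

$$R_4'' \triangleq I_0\otimes C_0 + I_1\otimes C_1, \tag{58}$$

where C₀ has the eigenvalues $\lambda_{11}=1+i$ and $\lambda_{21}=1-i$ with the eigenvectors $\varsigma_1=\begin{bmatrix}1&i\end{bmatrix}^T$ and $\varsigma_2=\begin{bmatrix}1&-i\end{bmatrix}^T$, respectively. In addition, C₁ has the eigenvalues $\lambda_{12}=\sqrt{2}$ and $\lambda_{22}=-\sqrt{2}$ with the eigenvectors $\varsigma_1=\begin{bmatrix}-1-\sqrt{2}&1\end{bmatrix}^T$ and $\varsigma_2=\begin{bmatrix}-1+\sqrt{2}&1\end{bmatrix}^T$, respectively [3, 17]. Then,

$$R_4''\otimes P_2 \Rightarrow R_8 = R_{4\times 2^k}, \tag{59}$$

where k = 1.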
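Eqs. (56), (58), and (59) can be verified with a few Kronecker products, and the eigenpairs of C₀ and C₁ can be checked by testing Cv = λv directly. The sketch below uses only the standard library; the helper names are assumptions, and each eigenvalue is paired with the eigenvector that satisfies Cv = λv.

```python
import math

def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def matvec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]

def is_eigenpair(m, lam, v, tol=1e-12):
    """True if m @ v equals lam * v up to the tolerance."""
    return all(abs(x - lam * y) <= tol for x, y in zip(matvec(m, v), v))

I0 = [[1, 0], [0, 1]]
I1 = [[0, 1], [1, 0]]
C0 = [[1, 1], [-1, 1]]
C1 = [[1, -1], [-1, -1]]
P2 = [[1, 1], [1, 1]]

R4 = add(kron(I0, C0), kron(I1, C1))   # Eq. (58)
R8 = kron(R4, P2)                      # Eq. (59) with k = 1
R8_prime = kron(R4, [[1, 1]])          # the 4 x 8 matrix of Eq. (56)

s = math.sqrt(2)
eigenpairs_C0 = [(1 + 1j, [1, 1j]), (1 - 1j, [1, -1j])]
eigenpairs_C1 = [(s, [-1 - s, 1]), (-s, [-1 + s, 1])]

print(R8_prime)
print(all(is_eigenpair(C0, lam, v) for lam, v in eigenpairs_C0))
print(all(is_eigenpair(C1, lam, v) for lam, v in eigenpairs_C1))
```

Here R₈′ is obtained by repeating each column of R₄″ once, and R₈ by repeating each row and column once, which makes the redundancy mentioned in the text explicit.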

7. Symmetric genetic jacket block matrix

It has been demonstrated that the genomatrices are constructed from the kernel [C A; U G] and that the mosaic genomatrices [C A; U G]³ are built by a series of Kronecker products; they are extended by permuting the positions of the 4 bases C, A, U, and G in the matrix.

7.1 Permutation scheme from upper to lower

Following this scheme, we obtain 24 variants of genomatrices, which differ from each other in how their subsets replace the kernel [C A; U G]. For instance, applying the upper-lower scheme to [C A; U G] expands the standard genetic code into [U C; A G]ᵀ ⊗ [U C; A G] ⊗ [U C; A G]ᵀ, where ᵀ denotes the transpose. Analogous to Eq. (56), one variant of the genomatrices is constructed as

E60

Eq. (60) is another variant of the genomatrices, obtained by a series of Kronecker products with [1 1 1 1]ᵀ. It is expanded into Eq. (61), which indicates the process of transcription from the R₈ DNA to the R₄″ RNA.

E61

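Example 7.1. If A = U and C = G, we obtain six variants of the genomatrices constructed by a series of Kronecker products of the kernel [C A; U G].

$$\begin{bmatrix}A&C\\U&G\end{bmatrix} = \begin{bmatrix} -1&1&-1&1 \\ -1&-1&1&1 \\ -1&1&-1&1 \\ -1&-1&1&1 \end{bmatrix} = \begin{bmatrix}1&0\end{bmatrix}\otimes\begin{bmatrix}1\\1\end{bmatrix}\otimes\begin{bmatrix}-1&1\\-1&-1\end{bmatrix} + \begin{bmatrix}0&1\end{bmatrix}\otimes\begin{bmatrix}1\\1\end{bmatrix}\otimes\begin{bmatrix}-1&1\\1&1\end{bmatrix}, \tag{62}$$

which is expanded into Eq. (63) and Eq. (64). These are other variants of the genomatrices.

$$\begin{bmatrix}A&G\\U&C\end{bmatrix} = \begin{bmatrix} -1&-1&-1&1 \\ -1&1&1&1 \\ -1&-1&-1&1 \\ -1&1&1&1 \end{bmatrix} = \begin{bmatrix}1&0\end{bmatrix}\otimes\begin{bmatrix}1\\1\end{bmatrix}\otimes\begin{bmatrix}-1&-1\\-1&1\end{bmatrix} + \begin{bmatrix}0&1\end{bmatrix}\otimes\begin{bmatrix}1\\1\end{bmatrix}\otimes\begin{bmatrix}-1&1\\1&1\end{bmatrix}, \tag{63}$$

$$\begin{bmatrix}G&U\\C&A\end{bmatrix} = \begin{bmatrix} 1&1&-1&-1 \\ 1&-1&1&-1 \\ 1&1&-1&-1 \\ 1&-1&1&-1 \end{bmatrix} = \begin{bmatrix}1&0\end{bmatrix}\otimes\begin{bmatrix}1\\1\end{bmatrix}\otimes\begin{bmatrix}1&1\\1&-1\end{bmatrix} + \begin{bmatrix}0&1\end{bmatrix}\otimes\begin{bmatrix}1\\1\end{bmatrix}\otimes\begin{bmatrix}-1&-1\\1&-1\end{bmatrix}, \tag{64}$$

$$\begin{bmatrix}C&U\\G&A\end{bmatrix} = \begin{bmatrix} 1&1&1&-1 \\ 1&-1&-1&-1 \\ 1&1&1&-1 \\ 1&-1&-1&-1 \end{bmatrix} = \begin{bmatrix}1&0\end{bmatrix}\otimes\begin{bmatrix}1\\1\end{bmatrix}\otimes\begin{bmatrix}1&1\\1&-1\end{bmatrix} + \begin{bmatrix}0&1\end{bmatrix}\otimes\begin{bmatrix}1\\1\end{bmatrix}\otimes\begin{bmatrix}1&-1\\-1&-1\end{bmatrix}, \tag{65}$$

$$\begin{bmatrix}C&A\\G&U\end{bmatrix} = \begin{bmatrix} 1&-1&1&-1 \\ 1&1&-1&-1 \\ 1&-1&1&-1 \\ 1&1&-1&-1 \end{bmatrix} = \begin{bmatrix}1&0\end{bmatrix}\otimes\begin{bmatrix}1\\1\end{bmatrix}\otimes\begin{bmatrix}1&-1\\1&1\end{bmatrix} + \begin{bmatrix}0&1\end{bmatrix}\otimes\begin{bmatrix}1\\1\end{bmatrix}\otimes\begin{bmatrix}1&-1\\-1&-1\end{bmatrix}, \tag{66}$$

$$\begin{bmatrix}G&A\\C&U\end{bmatrix} = \begin{bmatrix} 1&-1&-1&-1 \\ 1&1&1&-1 \\ 1&-1&-1&-1 \\ 1&1&1&-1 \end{bmatrix} = \begin{bmatrix}1&0\end{bmatrix}\otimes\begin{bmatrix}1\\1\end{bmatrix}\otimes\begin{bmatrix}1&-1\\1&1\end{bmatrix} + \begin{bmatrix}0&1\end{bmatrix}\otimes\begin{bmatrix}1\\1\end{bmatrix}\otimes\begin{bmatrix}-1&-1\\1&-1\end{bmatrix}. \tag{67}$$

Eqs. (62)–(67) are six variants of the genomatrices; they are the six half pairs expanded from the symmetric RNA genetic matrices by the upper-lower scheme. In other words, they are constructed by rotating the blocks from upper to lower or vice versa.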
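The upper-lower decomposition can be checked mechanically. The sketch below rebuilds the 4 × 4 matrix of Eq. (62) from its two Kronecker terms, assuming that the first factor is a row vector and the second factor is the column vector [1 1]ᵀ; the helper names are illustrative.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def upper_lower(left_block, right_block):
    """[1 0] (x) [1 1]^T (x) left_block + [0 1] (x) [1 1]^T (x) right_block."""
    ones_col = [[1], [1]]
    left = kron(kron([[1, 0]], ones_col), left_block)
    right = kron(kron([[0, 1]], ones_col), right_block)
    return add(left, right)

# Blocks of Eq. (62) for the kernel [A C; U G].
ACUG = upper_lower([[-1, 1], [-1, -1]], [[-1, 1], [1, 1]])

for row in ACUG:
    print(row)
```

The result stacks each 2 × 2 block on top of a copy of itself, so rows 1-2 and rows 3-4 coincide, which is the upper-lower repetition visible in Eqs. (62)–(67).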

7.2 Permutation scheme from left to right

Following this scheme, we obtain 6 variants of genomatrices, which differ from each other with respect to the kernel [C A; U G]. For instance, applying the left-right scheme to [C A; U G] expands the standard genetic code into R₈:

E68

Eq. (68) is another variant of the genomatrices, obtained by a series of Kronecker products with [1 1; 1 1]. It is expanded into Eq. (69), which indicates the process of transcription from the R₈ DNA to the R₄″ RNA.

E69

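Example 7.2. If A = U and C = G, we obtain six variants of the genomatrices constructed by a series of Kronecker products of the kernel [C A; U G].

$$\begin{bmatrix}C&G\\U&A\end{bmatrix} = \begin{bmatrix} 1&1&1&1 \\ 1&-1&1&-1 \\ 1&-1&1&-1 \\ -1&-1&-1&-1 \end{bmatrix} = \begin{bmatrix}1\\0\end{bmatrix}\otimes\begin{bmatrix}1&1\end{bmatrix}\otimes\begin{bmatrix}1&1\\1&-1\end{bmatrix} + \begin{bmatrix}0\\1\end{bmatrix}\otimes\begin{bmatrix}1&1\end{bmatrix}\otimes\begin{bmatrix}1&-1\\-1&-1\end{bmatrix}, \tag{70}$$

which is expanded into Eq. (71) and Eq. (72). These are other variants of the genomatrices.

$$\begin{bmatrix}G&C\\U&A\end{bmatrix} = \begin{bmatrix} 1&1&1&1 \\ 1&-1&1&-1 \\ -1&1&-1&1 \\ -1&-1&-1&-1 \end{bmatrix} = \begin{bmatrix}1\\0\end{bmatrix}\otimes\begin{bmatrix}1&1\end{bmatrix}\otimes\begin{bmatrix}1&1\\1&-1\end{bmatrix} + \begin{bmatrix}0\\1\end{bmatrix}\otimes\begin{bmatrix}1&1\end{bmatrix}\otimes\begin{bmatrix}1&-1\\-1&-1\end{bmatrix}, \tag{71}$$

$$\begin{bmatrix}U&A\\C&G\end{bmatrix} = \begin{bmatrix} -1&-1&-1&-1 \\ 1&-1&1&-1 \\ 1&-1&1&-1 \\ 1&1&1&1 \end{bmatrix} = \begin{bmatrix}1\\0\end{bmatrix}\otimes\begin{bmatrix}1&1\end{bmatrix}\otimes\begin{bmatrix}-1&-1\\1&-1\end{bmatrix} + \begin{bmatrix}0\\1\end{bmatrix}\otimes\begin{bmatrix}1&1\end{bmatrix}\otimes\begin{bmatrix}1&-1\\1&1\end{bmatrix}, \tag{72}$$

$$\begin{bmatrix}A&U\\G&C\end{bmatrix} = \begin{bmatrix} -1&-1&-1&-1 \\ -1&1&-1&1 \\ -1&1&-1&1 \\ 1&1&1&1 \end{bmatrix} = \begin{bmatrix}1\\0\end{bmatrix}\otimes\begin{bmatrix}1&1\end{bmatrix}\otimes\begin{bmatrix}-1&-1\\-1&1\end{bmatrix} + \begin{bmatrix}0\\1\end{bmatrix}\otimes\begin{bmatrix}1&1\end{bmatrix}\otimes\begin{bmatrix}-1&1\\1&1\end{bmatrix}, \tag{73}$$

$$\begin{bmatrix}G&C\\A&U\end{bmatrix} = \begin{bmatrix} 1&1&1&1 \\ -1&1&-1&1 \\ -1&1&-1&1 \\ -1&-1&-1&-1 \end{bmatrix} = \begin{bmatrix}1\\0\end{bmatrix}\otimes\begin{bmatrix}1&1\end{bmatrix}\otimes\begin{bmatrix}1&1\\-1&1\end{bmatrix} + \begin{bmatrix}0\\1\end{bmatrix}\otimes\begin{bmatrix}1&1\end{bmatrix}\otimes\begin{bmatrix}-1&1\\-1&-1\end{bmatrix}, \tag{74}$$

$$\begin{bmatrix}C&G\\A&U\end{bmatrix} = \begin{bmatrix} 1&1&1&1 \\ -1&1&-1&1 \\ 1&-1&1&-1 \\ -1&-1&-1&-1 \end{bmatrix} = \begin{bmatrix}1\\0\end{bmatrix}\otimes\begin{bmatrix}1&1\end{bmatrix}\otimes\begin{bmatrix}1&1\\-1&1\end{bmatrix} + \begin{bmatrix}0\\1\end{bmatrix}\otimes\begin{bmatrix}1&1\end{bmatrix}\otimes\begin{bmatrix}1&-1\\-1&-1\end{bmatrix}. \tag{75}$$

Eqs. (70)–(75) are six variants of the genomatrices; they are the six half pairs expanded from the symmetric RNA genetic matrices by the left-right scheme. In other words, they are constructed by rotating the blocks from left to right or vice versa.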
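The left-right decomposition works the same way with the roles of rows and columns exchanged. The sketch below rebuilds the matrix of Eq. (70), assuming that the first factor is a column vector and the second factor is the row vector [1 1]; the helper names are illustrative.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def left_right(upper_block, lower_block):
    """[1 0]^T (x) [1 1] (x) upper_block + [0 1]^T (x) [1 1] (x) lower_block."""
    ones_row = [[1, 1]]
    upper = kron(kron([[1], [0]], ones_row), upper_block)
    lower = kron(kron([[0], [1]], ones_row), lower_block)
    return add(upper, lower)

# Blocks of Eq. (70) for the kernel [C G; U A].
CGUA = left_right([[1, 1], [1, -1]], [[1, -1], [-1, -1]])

for row in CGUA:
    print(row)
```

Each 2 × 2 block now appears side by side with a copy of itself, so columns 1-2 and columns 3-4 coincide, which is the left-right repetition visible in Eq. (70).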

7.3 Block Circulant jacket matrix

Construct a block matrix $[C]_N$ from the Jacket matrices $[C_0]_p$ and $[C_1]_p$ as $[C]_N=\begin{bmatrix}C_0&C_1\\C_1&C_0\end{bmatrix}$, where its order is N = 2p. This matrix is a block circulant Jacket matrix (BCJM) [12, 13, 15, 18] if and only if $C_0C_1^{RT}+C_1C_0^{RT}=[0]_N$, where ^RT denotes the reciprocal transpose. Since $C_0C_0^{RT}=pI_p$ and $C_1C_1^{RT}=pI_p$, C₀ and C₁ are Jacket matrices. Recall that $[C]_N$ is a Jacket matrix if and only if $CC^{RT}=NI_N$. Therefore, C is a Jacket matrix if and only if

$$CC^{RT} = \begin{bmatrix}C_0&C_1\\C_1&C_0\end{bmatrix}\begin{bmatrix}C_0&C_1\\C_1&C_0\end{bmatrix}^{RT} = \begin{bmatrix}2pI_p & C_0C_1^{RT}+C_1C_0^{RT}\\ C_0C_1^{RT}+C_1C_0^{RT} & 2pI_p\end{bmatrix} = NI_N. \tag{76}$$

Eq. (76) therefore yields many BCJMs.

Example 7.3. Two 2 × 2 matrices are given as

$$C_0=\begin{bmatrix}1&1\\1&-1\end{bmatrix}, \qquad C_1=\begin{bmatrix}a&-a\\-1/a&-1/a\end{bmatrix}.$$

It is easy to verify that $C_0C_0^{RT}=2I_2$ and $C_1C_1^{RT}=2I_2$. Therefore, C₀ and C₁ are Jacket matrices.

Moreover,

$$C_0C_1^{RT}+C_1C_0^{RT} = \begin{bmatrix}1&1\\1&-1\end{bmatrix}\begin{bmatrix}1/a&-a\\-1/a&-a\end{bmatrix} + \begin{bmatrix}a&-a\\-1/a&-1/a\end{bmatrix}\begin{bmatrix}1&1\\1&-1\end{bmatrix} = [0]_2. \tag{77}$$
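Example 7.3 can be verified with exact rational arithmetic. The sketch below takes the reciprocal transpose to be the transpose of the entrywise reciprocal, so that the off-diagonal block of CCᴿᵀ is C₀C₁ᴿᵀ + C₁C₀ᴿᵀ; the value a = 3 and the helper names are assumptions chosen for illustration.

```python
from fractions import Fraction

def recip_transpose(m):
    """Reciprocal transpose: entry (i, j) of the result is 1 / m[j][i]."""
    return [[1 / m[j][i] for j in range(len(m))] for i in range(len(m[0]))]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scaled_identity(n, c):
    return [[c if i == j else 0 for j in range(n)] for i in range(n)]

a = Fraction(3)   # illustrative choice; any non-zero value can be used
C0 = [[Fraction(1), Fraction(1)], [Fraction(1), Fraction(-1)]]
C1 = [[a, -a], [-1 / a, -1 / a]]

# Off-diagonal block of C * C^RT in Eq. (76).
off_diag = add(matmul(C0, recip_transpose(C1)), matmul(C1, recip_transpose(C0)))

# Full block circulant matrix [[C0, C1], [C1, C0]].
C = [r0 + r1 for r0, r1 in zip(C0, C1)] + [r1 + r0 for r0, r1 in zip(C0, C1)]
product = matmul(C, recip_transpose(C))

print(off_diag)
print(product)
```

For this choice of a the off-diagonal block vanishes and CCᴿᵀ = 4I₄, so the 4 × 4 block circulant matrix built from C₀ and C₁ satisfies the Jacket condition of Eq. (76).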

8. General pattern of block circulant symmetric genetic jacket matrix

We present 24 (= 4 × ₄C₂) DNA classes of genomatrices, each with its own characteristics. The main kernel is given by Eq. (78):

$$\underbrace{E}_{\text{Position}}\otimes\underbrace{\left(I_0\otimes A + I_1\otimes B\right)}_{\text{Main Body Kernel}}\otimes\underbrace{F}_{\text{Extending}}. \tag{78}$$

Eq. (58) is an RNA pattern generated by the main kernel. When the upper-lower or left-right scheme is applied to the genetic matrix, the position matrix E creates patterns analogous to Eqs. (61) and (69). Likewise, the extending matrix F creates patterns analogous to Eqs. (60) and (68).

South Korea’s national flag shows trigrams surrounding a Yin-Yang symbol in its center, a layout analogous to that of Figure 6. We present 24 variants of genomatrices, which differ from each other in how their subsets replace the kernel shown in Figure 6: its left-hand side [1 0] ⊗ [1 1]ᵀ, its right-hand side [0 1] ⊗ [1 1]ᵀ, its upper position [1 0]ᵀ ⊗ [1 1], its lower position [0 1]ᵀ ⊗ [1 1], and its center part I₀ ⊗ C₀ + I₁ ⊗ C₁.

Figure 6.
General pattern by block circulant, upper-lower, and left–right scheme: Normal case.

Since [1 0] ⊗ [1 1]ᵀ ↔ [0 1] ⊗ [1 1]ᵀ and [1 0]ᵀ ⊗ [1 1] ↔ [0 1]ᵀ ⊗ [1 1], the upper symmetric genetic matrices are complementary to the lower ones, and the left ones are complementary to the right ones.

In addition, patterns are created by the block circulant, upper-lower, and left-right schemes on the ½ symmetric block, which are analyzed in three cases.

Case 1. Block circulant scheme

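$$\begin{bmatrix}C&U\\A&G\end{bmatrix} = \begin{bmatrix} 1&1&1&-1 \\ -1&1&-1&-1 \\ 1&-1&1&1 \\ -1&-1&1&-1 \end{bmatrix} = \underbrace{\begin{bmatrix}1&0\\0&A\end{bmatrix}}_{\text{diag}}\otimes\begin{bmatrix}1&1\\-1&1\end{bmatrix} + \begin{bmatrix}0&1\\1&0\end{bmatrix}\otimes\begin{bmatrix}1&-1\\-1&-1\end{bmatrix}. \tag{79}$$

$$\begin{bmatrix}U&C\\G&A\end{bmatrix} = \begin{bmatrix} -1&1&1&1 \\ -1&-1&1&-1 \\ 1&1&-1&1 \\ 1&-1&-1&-1 \end{bmatrix} = \begin{bmatrix}1&0\\0&1\end{bmatrix}\otimes\begin{bmatrix}-1&1\\-1&-1\end{bmatrix} + \underbrace{\begin{bmatrix}0&1\\A&0\end{bmatrix}}_{\text{Anti-diag}}\otimes\begin{bmatrix}1&1\\1&-1\end{bmatrix}. \tag{80}$$

Case 2. Upper-lower scheme

$$\begin{bmatrix}U&G\\A&C\end{bmatrix} = \begin{bmatrix} -1&-1&1&1 \\ -1&1&-1&1 \\ -1&-1&1&-1 \\ -1&1&1&1 \end{bmatrix} = \begin{bmatrix}1&0\end{bmatrix}\otimes\begin{bmatrix}1\\1\end{bmatrix}\otimes\begin{bmatrix}-1&-1\\-1&1\end{bmatrix} + \underbrace{\begin{bmatrix}0&A\end{bmatrix}}_{\text{Upper}}\otimes\begin{bmatrix}1\\1\end{bmatrix}\otimes\begin{bmatrix}1&1\\-1&1\end{bmatrix}. \tag{81}$$

$$\begin{bmatrix}U&C\\A&G\end{bmatrix} = \begin{bmatrix} -1&1&1&1 \\ -1&-1&1&-1 \\ -1&1&1&1 \\ -1&-1&-1&1 \end{bmatrix} = \begin{bmatrix}1&0\end{bmatrix}\otimes\begin{bmatrix}1\\1\end{bmatrix}\otimes\begin{bmatrix}-1&1\\-1&-1\end{bmatrix} + \underbrace{\begin{bmatrix}0&A\end{bmatrix}}_{\text{Lower}}\otimes\begin{bmatrix}1\\1\end{bmatrix}\otimes\begin{bmatrix}1&1\\1&-1\end{bmatrix}. \tag{82}$$

Case 3. Left-right scheme

$$\begin{bmatrix}A&U\\C&G\end{bmatrix} = \begin{bmatrix} -1&-1&-1&-1 \\ 1&-1&1&-1 \\ 1&1&-1&1 \\ 1&-1&1&1 \end{bmatrix} = \begin{bmatrix}1\\0\end{bmatrix}\otimes\begin{bmatrix}1&1\end{bmatrix}\otimes\begin{bmatrix}-1&-1\\1&-1\end{bmatrix} + \underbrace{\begin{bmatrix}0\\A\end{bmatrix}}_{\text{Left}}\otimes\begin{bmatrix}1&1\end{bmatrix}\otimes\begin{bmatrix}1&1\\1&-1\end{bmatrix}. \tag{83}$$

$$\begin{bmatrix}U&A\\G&C\end{bmatrix} = \begin{bmatrix} -1&-1&-1&-1 \\ -1&1&-1&1 \\ 1&-1&1&1 \\ 1&1&-1&1 \end{bmatrix} = \begin{bmatrix}1\\0\end{bmatrix}\otimes\begin{bmatrix}1&1\end{bmatrix}\otimes\begin{bmatrix}-1&-1\\-1&1\end{bmatrix} + \underbrace{\begin{bmatrix}0\\A\end{bmatrix}}_{\text{Right}}\otimes\begin{bmatrix}1&1\end{bmatrix}\otimes\begin{bmatrix}1&-1\\1&1\end{bmatrix}. \tag{84}$$

Eq. (79) is block circulant while Eq. (80) is not. Meanwhile, one part of Eqs. (81) and (82) is upper-lower symmetric while the other is not, and one part of Eqs. (83) and (84) is left-right symmetric while the other is not. Figure 7 shows a pattern constructed by a series of products of [C A; U G] that is distorted in comparison with that of Figure 6. Such patterns are therefore called sickness patterns, which can cover COVID-19.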

Figure 7.
Abnormal pattern by block circulant, upper-lower, and left–right scheme.

For instance, let

$$\begin{bmatrix}C&U\\A&G\end{bmatrix} \Rightarrow \begin{bmatrix}A&B\\C&D\end{bmatrix}. \tag{85}$$

Note the following cases.

Case 1. A≠D, B=C and A=D, B≠C.

Case 2. A=C, B≠D and A≠C, B=D.

Case 3. A=B, C≠D and A≠B, C=D.

From the above processes, we obtain six half-symmetric blocks: [C U; A G], [U C; G A], [U G; A C], [U C; A G], [A U; C G], and [U A; G C].

9. Conclusion

We have shown the experimental results C = G = 19% and A = U = T = 31% for COVID-19 with the RNA base matrix [C U; A G], and we have supported them with a mathematical proof based on the information theory of doubly stochastic matrices. The RNA entropy cannot reach the Shannon entropy, because the base probabilities lie 23% away from one half, which equals the sum of their variances. In other words, there is a difference between the Shannon capacity and the RNA capacity, which equals the sum of the variances of the RNA base random variables, because the base probabilities cannot equal one half over a symmetric channel. We have presented a straightforward and explicit mathematical basis for double-helix DNA in the process of reverse transcription from RNA to DNA, obtained by decomposing a DNA matrix into sparse matrices with non-redundant rows and columns. We have also introduced a general pattern obtained by the block circulant, upper-lower, and left-right schemes, which corresponds to correct communication and to the healthy condition because it consists exactly of the four bases. Furthermore, we have introduced an abnormal pattern obtained by the same schemes, which covers distorted signals as well as COVID-19. The RNA matrix of Eq. (57) is the same as the matrix of Definition 3.1 in the U.S. patent on MIMO communication [12].

Conflict of interest

The authors declare no conflict of interest.

References

1. Chargaff E, Zamenhof S, Green C. Human desoxypentose nucleic acid: Composition of human desoxypentose nucleic acid. Nature. 1950;165:756-757. DOI: 10.1038/165756b0
2. Watson J, Crick F. Molecular structure of nucleic acids: A structure for deoxyribose nucleic acid. Nature. 1953;171:737-738. DOI: 10.1038/171737a0
3. Temin HM. Nature of the provirus of Rous sarcoma. National Cancer Institute Monograph. 1964;17:557-570
4. Lee MH, Lee SK, Cho KM. A Life Ecosystem Management With DNA Base Complementarity. Moscow: Proceedings of the International Conference of Artificial Intelligence, Medical Engineering, Education (AIMEE 2018); 6–8 October 2018; Springer Nature; 2020
5. Papoulis A, Pillai SU. Probability, Random Variables and Stochastic Process. 4th ed. Boston: McGraw Hill; 2002
6. He M, Petoukhov S. Mathematics of Bioinformatics: Theory, Practice, and Applications. 1st ed. New Jersey: John Wiley & Sons; 2010. DOI: 10.1002/9780470904640
7. Lee SK, Park DC, Lee MH. RNA genetic 8 by 8 matrix construction from the block circulant Jacket matrix. Springer Nature: Proceedings of Symmetry Festival 2016; 18-22 July 2016, Vienna, Cham; 2017
8. Shannon CE. A Mathematical Theory of Communication. The Bell System Technical Journal. 1948;27:379-423, 623-656. DOI: 10.1002/j.1538-7305.1948.tb01338.x
9. Azgari C, Kilinc Z, Turhan B, Circi D, Adebali O. The mutation profile of SARS-CoV-2 is primarily shaped by the host antiviral defense. Viruses. 2021;13(3):394. DOI: 10.3390/v13030394
10. Berkhout B, Hemert VF. On the biased nucleotide composition of the human coronavirus RNA genome. Virus Research. 2015;202:41-47. DOI: 10.1016/j.virusres.2014.11.031
11. Xia X. Extreme Genomic CpG Deficiency in SARS-CoV-2 and Evasion of Host Antiviral Defense. Molecular Biology and Evolution. 2020;37(9):2699-2705. DOI: 10.1093/molbev/msaa094
12. Lee MH, Hai H, Zhang XD. MIMO Communication Method and System using the Block Circulant Jacket Matrix. United States Patent US 009356671B1 [Internet]. 31 May 2016. Available from: https://patentimages.storage.googleapis.com/cb/46/34/4acf23e5a9b6e1/US9356671.pdf [Accessed: 12 December 2021]
13. Lee MH. Jacket Matrices: Construction and Its Application for Fast Cooperative Wireless Signal Processing. 1st ed. Saarbrücken, Germany: LAP LAMBERT Academic Publishing; 2012
14. Tse D, Viswanath P. Fundamentals of Wireless Communication. 1st ed. New York: Cambridge University Press; 2005. DOI: 10.1017/CBO9780511807213
15. Wikipedia, the free encyclopedia. Jacket Matrix [Internet]. 1999. Available from: https://en.wikipedia.org/wiki/Jacket_matrix [Accessed: 12 December 2021]
16. Rumer YB. Translation of ‘Systematization of Codons in the Genetic Code [II]’ by Yu. B. Rumer (1968). Philosophical Transactions of the Royal Society A. 2016;374(2063):20150447. DOI: 10.1098/rsta.2015.0447
17. Lee MH, Hai H, Lee SK, Petoukhov SV. A Mathematical Proof of Double Helix DNA to Reverse Transcription RNA for Bioinformatics. Proceedings of the 1st International Conference of Artificial Intelligence, Medical Engineering, and Education (AIMEE 2017); 21–23 August 2017; Moscow. Springer Nature; 2018
18. Chen Z, Lee MH, Zeng G. Fast cocyclic Jacket transform. IEEE Transactions on Signal Processing. 2008;56(5):2143-2148. DOI: 10.1109/TSP.2007.912895

[1] 1. Chargaff E, Zamenhof S, Green C. Human desoxypentose nucleic acid: Composition of human desoxypentose nucleic acid. Nature. 1950;165:756-757. DOI: 10.1038/165756b0

[2] 2. Watson J, Crick F. Molecular structure of nucleic acids: A structure for deoxyribose nucleic acid. Nature. 1953;171:737-738. DOI: 10.1038/171737a0

[3] 3. Temin HM. Nature of the provirus of Rous sarcoma. National Cancer Institute Monograph. 1964;17:557-570

[4] 4. Lee MH, Lee SK, Cho KM. A Life Ecosystem Management With DNA Base Complementarity. Moscow: Proceedings of the International Conference of Artificial Intelligence, Medical Engineering, Education (AIMEE 2018); 6–8 October 2018; Springer Nature; 2020

[5] 5. Papoulis A, Pillai SU. Probability, Random Variables and Stochastic Process. 4th ed. Boston: McGraw Hill; 2002

[6] 6. He M, Petoukhov S. Mathematics of Bioinformatics: Theory, Practice, and Applications. 1st ed. New Jersey: John Wiley & Sons; 2010. DOI: 10.1002/9780470904640

[7] 7. Lee SK, Park DC, Lee MH. RNA genetic 8 by 8 matrix construction from the block circulant Jacket matrix. Springer Nature: Proceedings of Symmetry Festival 2016; 18-22 July 2016, Vienna, Cham; 2017

[8] 8. Shannon CE. A Mathematical Theory of Communication. The Bell System Technical Journal. 1948;27:31-423-623-656. DOI: 10.1002/j.1538-7305.1948.tb01338.x\

[9] 9. Azgari C, Kilinc Z, Turhan B, Circi D, Adebali O. The mutation profile of SARS-CoV-2 is primarily shaped by the host antiviral defense. Viruses. 2021;13(3):394. DOI: 10.3390/v13030394

[10] 10. Berkhout B, Hemert VF. On the biased nucleotide composition of the human coronavirus RNA genome. Virus Research. 2015;202:41-47. DOI: 10.1016/j.virusres.2014.11.031

[11] 11. Xia X. Extreme Genomic CpG Deficiency in SARS-CoV-2 and Evasion of Host Antiviral Defense. Molecular Biology and Evolution. 2020;37(9):2699-2705. DOI: 10.1093/molbev/msaa094

[12] 12. Lee MH, Hai H, Zhang XD. MIMO Communication Method and System using the Block Circulant Jacket Matrix. United States Patent US 009356671B1 [Internet]. 31 May 2016. Available from: https://patentimages.storage.googleapis.com/cb/46/34/4acf23e5a9b6e1/US9356671.pdf [Accessed: 12 December 2021]

[13] 13. Lee MH. Jacket Matrices: Construction and Its Application for Fast Cooperative Wireless Signal Processing. 1st ed. Germany, Saarbrucken: LAP LAMBERT Academic Publishing; 2012

[14] 14. Tse D, Viswanath P. Fundamentals of Wireless Communication. 1st ed. New York: Cambridge University Press; 2005. DOI: 10.1017/CBO9780511807213

[15] 15. Wikipedia, the free encyclopedia. Jacket Matrix [Internet]. 1999. Available from: https://en.wikipedia.org/wiki/Jacket_matrix [Accessed: 12 December 2021]

[16] 16. Rumer YB. Translation of ‘Systematization of Codons in the Genetic Code [II]’ by Yu. B. Rumer (1968). Royal Society. 2016;374:2063. DOI: 10.1098/rsta.2015.0447

[17] 17. Lee MH, Hai H, Lee SK, Petoukhov SV. A Mathematical Proof of Double Helix DNA to Reverse Transcription RNA for Bioinformatics. Moscow: Proceedings of the 1st International Conference of Artificial Intelligence, Medical Engineering, and Education (AIMEE 2017); 21–23 August 2017; Springer Nature; 2018

[18] 18. Chen Z, Lee MH, Zeng G. Fast cocyclic Jacket transform. IEEE trans. on Signal Processing. 2008;56(5):2143-2148. DOI: 10.1109/TSP.2007.912895

The COVID-19 DNA-RNA Genetic Code Analysis Using Double Stochastic and Block Circulant Jacket Matrix

Matrix Theory - Classics and Advances
