Open access peer-reviewed chapter - ONLINE FIRST

The COVID-19 DNA-RNA Genetic Code Analysis Using Information Theory of Double Stochastic Matrix

Written By

Sung Kook Lee and Moon Ho Lee

Reviewed: December 22nd, 2021 Published: April 17th, 2022

DOI: 10.5772/intechopen.102342

IntechOpen
Matrix Theory - Classics and Advances Edited by Mykhaylo Andriychuk

From the Edited Volume

Matrix Theory - Classics and Advances [Working Title]

Dr. Mykhaylo I. Andriychuk


Abstract

We present a COVID-19 DNA-RNA genetic code in which A = T = U = 31% and C = G = 19%. It is developed from the base matrix [C U; A G], where C, U, A, and G are the RNA bases and C, T, A, and G are the DNA bases. E. Chargaff found experimentally that these bases are complementary, with A = T = U = 30% and C = G = 20%, which implied the double-helix structure of DNA and its complementary base pairing. However, this complementarity has not yet been solved mathematically. Therefore, in this paper, we present a simple solution based on the information theory of a doubly stochastic matrix over the Shannon symmetric channel, and we prove it mathematically. Furthermore, we show that the DNA-RNA genetic code is one kind of block circulant Jacket matrix. Moreover, we present general patterns obtained by the block circulant, upper-lower, and left-right schemes, which correspond to correct communication and to the healthy condition because they consist of exactly the four bases. Finally, we provide abnormal patterns obtained by the same three schemes, which cover distorted signals as well as COVID-19.

Keywords

  • COVID-19 DNA-RNA
  • E. Chargaff
  • DNA-RNA genetic code
  • double stochastic matrix
  • symmetric channel
  • block circulant jacket matrix
  • general pattern
  • abnormal pattern

1. Introduction

In 1950, Chargaff presented two rules [1]. The first is that the percentage of adenine is identical to that of thymine and the percentage of guanine is identical to that of cytosine, which hints at the base-pair composition of the double-stranded DNA molecule. The second is that base complementarity also holds within each DNA strand, which explains the overall characteristics of the fundamental bases. For example, the four bases of COVID-19 DNA satisfy these two rules, with approximately A = T = 31% and C = G = 19%. In 1953, it was discovered that DNA has a double-helix structure [2, 3], which results in an optimal and economical genetic code [4].

An RNA base matrix [C U; A G] was built on stochastic matrices [5], which leads to the genetic code [6, 7]. A symmetric capacity is calculated by applying the Markov process to these doubly stochastic matrices, which suggests a symmetry between the Shannon channel [8] and the RNA stochastic transition matrix [C U; A G], defined as follows. A square matrix $P=[p_{ij}]$ is stochastic if its entries are positive and its row and column sums are equal to one or to a constant. In other words, if the sum of its elements in every row and in every column is equal to one (or to the same constant), it is doubly stochastic, and it can describe the time-invariant binary symmetric channel. For the input $x_n$ and the output $x_{n+1}$, the two states $e_0$ and $e_1$ of the Markov process correspond to the binary symbols “0” and “1”, respectively. The output signal depends on the input signal, which is fed into the channel with a certain error probability. Assume that the channel error probabilities $\alpha$ and $\beta$ are less than one half and stay constant over a time-invariant channel for all transmitted symbols, such that

$$P\{x_{n+1}=1\mid x_n=0\}=p_{01}=\alpha,\qquad P\{x_{n+1}=0\mid x_n=1\}=p_{10}=\beta. \tag{1}$$

In addition, the Markov chain is homogeneous. $P$ denotes the 2 × 2 homogeneous probability transition matrix defined as

$$P=\begin{bmatrix}p_{00}&p_{01}\\p_{10}&p_{11}\end{bmatrix}=\begin{bmatrix}1-\alpha&\alpha\\\beta&1-\beta\end{bmatrix}=\left.\begin{bmatrix}1-p&p\\p&1-p\end{bmatrix}\right|_{p=0.5}=\frac{1}{2}\begin{bmatrix}1&1\\1&1\end{bmatrix}, \tag{2}$$

where the two error probabilities are identical, $\alpha=\beta=p$, over a binary symmetric channel. This paper is organized as follows. First, we derive the RNA stochastic entropy from the Shannon entropy in Section 2. Next, we estimate the variance of the RNA bases in Section 3. The binary symmetric channel entropy is derived in Section 4. The two-user capacity over the symmetric interference channel is estimated in Section 5. A construction scheme that generates RNA genetic codes is proposed in Section 6. A symmetric genetic Jacket block matrix is examined in Section 7. General patterns of block circulant symmetric genetic Jacket matrices are studied in Section 8. Finally, Section 9 concludes the paper.

Table 1 lists the base composition of several organisms [1, 9, 10, 11], which shows that the A/T and G/C ratios are nearly constant across species.

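| Organism | Taxon | %A | %G | %C | %T | A / T | G / C | %GC | %AT |
|---|---|---|---|---|---|---|---|---|---|
| Maize | Zea | 26.8 | 22.8 | 23.2 | 27.2 | 0.99 | 0.98 | 46.1 | 54.0 |
| Octopus | Octopus | 33.2 | 17.6 | 17.6 | 31.6 | 1.05 | 1.00 | 35.2 | 64.8 |
| Chicken | Gallus | 28.0 | 22.0 | 21.6 | 28.4 | 0.99 | 1.02 | 43.7 | 56.4 |
| Rat | Rattus | 28.6 | 21.4 | 20.5 | 28.4 | 1.01 | 1.00 | 42.9 | 57.0 |
| Human | Homo | 29.3 | 20.7 | 20.0 | 30.0 | 0.98 | 1.04 | 40.7 | 59.3 |
| Grasshopper | Orthoptera | 29.3 | 20.5 | 20.7 | 29.3 | 1.00 | 0.99 | 41.2 | 58.6 |
| Sea urchin | Echinoidea | 32.8 | 17.7 | 17.3 | 32.1 | 1.02 | 1.02 | 35.0 | 64.9 |
| Wheat | Triticum | 27.3 | 22.7 | 22.8 | 27.1 | 1.01 | 1.00 | 45.5 | 54.4 |
| Yeast | Saccharomyces | 31.3 | 18.7 | 17.1 | 32.9 | 0.95 | 1.09 | 35.8 | 64.4 |
| E. coli | Escherichia | 24.7 | 26.0 | 25.7 | 23.6 | 1.05 | 1.01 | 51.7 | 48.3 |
| φX174 | PhiX174 | 24.0 | 23.3 | 21.5 | 31.2 | 0.77 | 1.08 | 44.8 | 55.2 |
| Covid-19 | SARS-CoV-2 | 29.9 | 19.6 | 18.4 | 32.1 | 0.93 | 1.07 | 38.0 | 62.0 |

Table 1.

Ratio of bases [1, 9, 10, 11].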
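The derived columns of Table 1 can be recomputed from the four base percentages. The following is a minimal sketch, assuming the two rows copied below; the helper name `chargaff_ratios` is hypothetical and only illustrates the arithmetic.

```python
# Base composition (%A, %G, %C, %T) copied from two rows of Table 1.
COMPOSITION = {
    "Human": (29.3, 20.7, 20.0, 30.0),
    "Covid-19": (29.9, 19.6, 18.4, 32.1),
}

def chargaff_ratios(a, g, c, t):
    """Return (A/T, G/C, %GC, %AT) for one organism."""
    return a / t, g / c, g + c, a + t

for name, bases in COMPOSITION.items():
    at, gc, pct_gc, pct_at = chargaff_ratios(*bases)
    print(f"{name}: A/T={at:.2f} G/C={gc:.2f} %GC={pct_gc:.1f} %AT={pct_at:.1f}")
```

Both ratios stay close to one, which is the pattern that Chargaff's first rule describes.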


2. Analytical approach to RNA stochastic entropy

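In [1, 5, 12, 13], stochastic complementary RNA bases are given for the genetic code. Assuming that C = G = 19% and A = T = U = 31%, the transition channel matrix $P$ is expressed by

$$P=\begin{bmatrix}C&U\\A&G\end{bmatrix}=\begin{bmatrix}0.19&0.31\\0.31&0.19\end{bmatrix}. \tag{3}$$

When the entries of the RNA base matrix [C U; A G], regarded as a Markov process described by two independent source probabilities, range from $0.19p$ to $0.31p$, the transition channel matrix $P$ is defined by

$$P=\begin{bmatrix}0.19p&1-0.19p\\1-0.19p&0.19p\end{bmatrix}=\begin{bmatrix}0.5&1-0.5\\1-0.5&0.5\end{bmatrix}=\begin{bmatrix}0.5&0.5\\0.5&0.5\end{bmatrix}. \tag{4}$$

By comparison with Eq. (12), we have

$$0.19p=1-0.19p, \tag{5}$$

where $p$ is 2.631.

Applying the same approach to the remaining entries,

$$P=\begin{bmatrix}0.31p&1-0.31p\\1-0.31p&0.31p\end{bmatrix}=\begin{bmatrix}0.500&1-0.500\\1-0.500&0.500\end{bmatrix}=\begin{bmatrix}0.5&0.5\\0.5&0.5\end{bmatrix}, \tag{6}$$

where $0.31p=1-0.31p$, so that $p$ is 1.613.

Adding (6) to (4) gives a doubly stochastic matrix:

$$2P=\begin{bmatrix}0.5&0.5\\0.5&0.5\end{bmatrix}+\begin{bmatrix}0.5&0.5\\0.5&0.5\end{bmatrix}=\begin{bmatrix}1&1\\1&1\end{bmatrix}. \tag{7}$$

Applying the same doubling to (3),

$$2P=2\begin{bmatrix}C&U\\A&G\end{bmatrix}=2\begin{bmatrix}0.19&0.31\\0.31&0.19\end{bmatrix}=\begin{bmatrix}0.38&0.62\\0.62&0.38\end{bmatrix}. \tag{8}$$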
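The doubling step can be checked numerically. This is a minimal sketch, assuming the base frequencies of Eq. (3); the helper `is_doubly_stochastic` is a hypothetical name introduced here for illustration.

```python
def is_doubly_stochastic(matrix, tol=1e-9):
    """True if entries are non-negative and every row and column sums to 1."""
    n = len(matrix)
    rows_ok = all(abs(sum(row) - 1.0) < tol for row in matrix)
    cols_ok = all(abs(sum(matrix[i][j] for i in range(n)) - 1.0) < tol for j in range(n))
    non_negative = all(x >= 0 for row in matrix for x in row)
    return rows_ok and cols_ok and non_negative

C, U, A, G = 0.19, 0.31, 0.31, 0.19
P = [[C, U], [A, G]]
two_P = [[2 * x for x in row] for row in P]

print(is_doubly_stochastic(P))      # rows sum to 0.5, so False
print(is_doubly_stochastic(two_P))  # rows and columns sum to 1, so True
print(0.5 / 0.19, 0.5 / 0.31)       # scale factors p in Eqs. (5) and (6)
```

Only the doubled matrix of Eq. (8) has unit row and column sums, which is why the factor 2 appears before the entropy is evaluated.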

If $P$ is a random variable with source probability $p$ for the first symbol event, we obtain the entropy function [8]

$$H_2(P)=p\log_2\frac{1}{p}+(1-p)\log_2\frac{1}{1-p}. \tag{9}$$

The last column of Table 2 shows the result of Eq. (9). Figure 1 portrays the curves of the Shannon and RNA entropy. Note that the curve has a vertical tangent at $p=0$ and $p=1$, as follows from the derivative in Eq. (10).

| $p$ | $-\log_2 p$ | $-p\log_2 p$ | $H_2(p)$ |
|---|---|---|---|
| 0.3800 | 1.3959 | 0.5305 | 0.9580 |
| 0.3900 | 1.3585 | 0.5298 | 0.9648 |
| 0.4000 | 1.3219 | 0.5288 | 0.9710 |
| 0.4100 | 1.2863 | 0.5274 | 0.9765 |
| 0.4200 | 1.2515 | 0.5256 | 0.9815 |
| 0.4300 | 1.2176 | 0.5236 | 0.9858 |
| 0.4400 | 1.1844 | 0.5211 | 0.9896 |
| 0.4500 | 1.1520 | 0.5184 | 0.9928 |
| 0.4600 | 1.1203 | 0.5153 | 0.9954 |
| 0.4700 | 1.0893 | 0.5120 | 0.9974 |
| 0.4800 | 1.0589 | 0.5083 | 0.9988 |
| 0.4900 | 1.0291 | 0.5043 | 0.9997 |
| 0.5000 | 1.0000 | 0.5000 | 1.0000 |
| 0.5100 | 0.9714 | 0.4954 | 0.9997 |
| 0.5200 | 0.9434 | 0.4906 | 0.9988 |
| 0.5300 | 0.9159 | 0.4854 | 0.9974 |
| 0.5400 | 0.8890 | 0.4800 | 0.9954 |
| 0.5500 | 0.8625 | 0.4744 | 0.9928 |
| 0.5600 | 0.8365 | 0.4684 | 0.9896 |
| 0.5700 | 0.8110 | 0.4623 | 0.9858 |
| 0.5800 | 0.7859 | 0.4558 | 0.9815 |
| 0.5900 | 0.7612 | 0.4491 | 0.9765 |
| 0.6000 | 0.7370 | 0.4422 | 0.9710 |
| 0.6100 | 0.7131 | 0.4350 | 0.9648 |
| 0.6200 | 0.6897 | 0.4276 | 0.9580 |

Table 2.

Shannon entropy for probability $p$.

Figure 1.

Comparison between Shannon and RNA entropy for probability $p$.

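$$\frac{d}{dp}\left[p\log_2\frac{1}{p}+(1-p)\log_2\frac{1}{1-p}\right]=\log_2\frac{1}{p}-\log_2 e-\log_2\frac{1}{1-p}+\log_2 e=\log_2\frac{1}{p}-\log_2\frac{1}{1-p}=0, \tag{10}$$

so the entropy is maximized when $p$ reaches one half, where its derivative becomes 0.

Therefore,

$$\log_2\frac{1}{p}-\log_2\frac{1}{1-p}=0\ \Rightarrow\ \frac{1}{p}-\frac{1}{1-p}=0. \tag{11}$$

Then, we obtain

$$p=1-p\ \Rightarrow\ p=\frac{1}{2}. \tag{12}$$

For the RNA base matrix [C U; A G], its symmetric entropy is calculated as

$$H_2(P)_{\text{RNA}}=p\log_2\frac{1}{p}+(1-p)\log_2\frac{1}{1-p}=0.9790, \tag{13}$$

when $p$ is either 0.38 or 0.62. By contrast, the Shannon entropy is calculated as

$$H_2(P)_{\text{Shannon}}=p\log_2\frac{1}{p}+(1-p)\log_2\frac{1}{1-p}=1, \tag{14}$$

when $p$ reaches one half.

Table 2 shows the Shannon entropy for probability $p$ over a binary symmetric channel.

Figure 1 gives a comparison between the Shannon and RNA entropy for probability $p$ under the RNA base matrix [C U; A G].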
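The entropy function of Eq. (9) is easy to evaluate directly. The sketch below is illustrative only; the function name `binary_entropy` is an assumption, and the three probabilities are the ones discussed in this section.

```python
from math import log2

def binary_entropy(p):
    """Shannon binary entropy H2(p) in bits, with H2(0) = H2(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return p * log2(1 / p) + (1 - p) * log2(1 / (1 - p))

for p in (0.38, 0.50, 0.62):
    print(f"p={p:.2f}  H2={binary_entropy(p):.4f}")
```

Evaluated over the grid p = 0.38, 0.39, ..., 0.62, this function reproduces the last column of Table 2, including the symmetry about p = 0.5.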


3. Derivation of variance for the RNA base matrix [C U; A G]

The variance of the RNA random variable $X$, denoted by $V(X)$, is the mean squared deviation of $X$ from its mean, where the mean is expressed by

$$E(X)=a=0.5. \tag{15}$$

Therefore, for a random variable $X$, the variance is obtained as

$$V(X)=E\left[(X-a)^2\right]=E\left[X^2\right]-2aE[X]+E\left[a^2\right]=E\left[X^2\right]-2a^2+a^2=E\left[X^2\right]-a^2=\sigma^2. \tag{16}$$

Case I. Upper source probability 0.62

$$\sigma_{\text{upper}}^2=0.62^2-0.5^2=0.13. \tag{17}$$

Case II. Lower source probability 0.38

$$\sigma_{\text{lower}}^2=0.5^2-0.38^2=0.10. \tag{18}$$

If $X_1$ and $X_2$ are independent random variables, their expectations and variances are

$$E(X_1)=a_1,\qquad V(X_1)=\sigma_1^2, \tag{19}$$

$$E(X_2)=a_2,\qquad V(X_2)=\sigma_2^2. \tag{20}$$

Therefore, we obtain

$$E\left[(X_1-a_1)(X_2-a_2)\right]=E\left[X_1-a_1\right]E\left[X_2-a_2\right]=0. \tag{21}$$

Since $X_1$ and $X_2$ are independent random variables, the variance of their sum is calculated as

$$V(X_1+X_2)=E\left[(X_1+X_2-a_1-a_2)^2\right]=E\left[(X_1-a_1)^2\right]+2E\left[(X_1-a_1)(X_2-a_2)\right]+E\left[(X_2-a_2)^2\right]=V(X_1)+V(X_2)=\sigma_1^2+\sigma_2^2=0.13+0.10=0.23, \tag{22}$$

which is approximately 23%, corresponding to the difference between A = U and C = G. This means that the RNA entropy cannot reach the Shannon entropy, because the probabilities of its bases are 23% away from one half, which is exactly the sum of its variances.
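The two variance terms can be recomputed in a few lines. This is a minimal sketch, assuming the mean a = 0.5 and the source probabilities 0.62 and 0.38 used above; the helper name `variance_about_half` is hypothetical.

```python
def variance_about_half(p, mean=0.5):
    """|p^2 - mean^2| as used in Eqs. (17) and (18)."""
    return abs(p * p - mean * mean)

upper = variance_about_half(0.62)
lower = variance_about_half(0.38)
print(f"{upper:.4f} {lower:.4f} {upper + lower:.4f}")
```

The unrounded terms are 0.1344 and 0.1056, which truncate to the two-decimal values 0.13 and 0.10 quoted in the text.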


4. RNA complement base matrix [C U; A G] for symmetric noise immune-free channel

If the bases of the RNA genetic code [C U; A G] are complementary over a noise immune-free binary symmetric channel, such that C = U and A = G, the conditional probability $P(b_j\mid a_i)=P_{i,j}$ describes this channel, over which the maximum amount of information can be transmitted as depicted in Figure 2. Assuming that C and G are the one's complements of the corresponding error probability and that A and U are interference signals, the channel matrix [8] is described by

$$p(X)_{1\times 2}\,P_{2\times 2}=\begin{bmatrix}\alpha&1-\alpha\end{bmatrix}\begin{bmatrix}C&U\\A&G\end{bmatrix}=p(Y)_{1\times 2}=\begin{bmatrix}p(Y_1)&p(Y_2)\end{bmatrix}. \tag{23}$$

Figure 2.

Complementary bases of the RNA genetic code [C U; A G] over a noise immune-free binary symmetric channel.

When $p$ and $1-p$ are the selection probabilities corresponding to $\alpha=0$ and $\alpha=1$, respectively, over the uniform channel, the mutual information is defined by

$$I(X;Y)=H(Y)-H(Y\mid X). \tag{24}$$

From Eq. (23), we obtain

$$\begin{bmatrix}\alpha&1-\alpha\end{bmatrix}\begin{bmatrix}-C\log_2 C&-U\log_2 U\\-A\log_2 A&-G\log_2 G\end{bmatrix}=\begin{bmatrix}\alpha&1-\alpha\end{bmatrix}\begin{bmatrix}-U\log_2 U&-C\log_2 C\\-G\log_2 G&-A\log_2 A\end{bmatrix}, \tag{25}$$

where

$$H(Y\mid X)=-\alpha C\log_2 C-\alpha A\log_2 A-(1-\alpha)U\log_2 U-(1-\alpha)G\log_2 G=-U\log_2 U-G\log_2 G=-C\log_2 C-A\log_2 A=0.9790, \tag{26}$$

with A = U = 0.31 and C = G = 0.19.

Therefore, its capacity is derived as

$$C_{\text{RNA}}=\max I(X;Y)\big|_{p=0.38\ \text{or}\ 0.62}=H(Y)-H(Y\mid X)=1-0.9790=0.021, \tag{27}$$

i.e., $H(Y)=-p\log_2 p-(1-p)\log_2(1-p)=-0.38\log_2 0.38-0.62\log_2 0.62\approx 1$,

while the Shannon capacity is derived as

$$C_{\text{Shannon}}=\max I(X;Y)\big|_{p=0.5}=H(Y)-H(Y\mid X)=1-1=0. \tag{28}$$

In Figure 3, we compare the Shannon and RNA capacity for probability $p$. As mentioned in Section 3, the Shannon capacity can be reached if and only if the circumstances are ideal. In other words, there is a difference between the Shannon and RNA capacity, which is identical to the sum of the variances of the RNA base random variables, because they cannot reach one half over a symmetric channel.

Figure 3.

Shannon and RNA capacity vary with probability $p$.
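The numbers in Eqs. (26) and (27) can be reproduced as follows. This is a minimal sketch under the assumptions stated in the text, namely A = U = 0.31, C = G = 0.19, and H(Y) taken as 1 bit; the variable names are illustrative.

```python
from math import log2

A = U = 0.31
C = G = 0.19

# Conditional entropy H(Y|X) as written in Eq. (26).
h_y_given_x = -U * log2(U) - G * log2(G)

# Eq. (27) takes H(Y) = 1 bit; the capacity is the remaining gap.
h_y = 1.0
capacity_rna = h_y - h_y_given_x
print(f"H(Y|X)={h_y_given_x:.4f}  C_RNA={capacity_rna:.4f}")
```

The sum of the two base terms evaluates to about 0.979, so the gap to 1 bit is about 0.021.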


5. Two user capacity over symmetric interference channel

Figure 4 describes the binary symmetric channel with the RNA base matrix [C U; A G] as well as the symmetric interference channel for two users, in which two independent messages $W_1$ and $W_2$ from the common message set $W_i$ are transmitted. Assume that C = G = 19% and A = U = 31%, where C = $H_{11}$ is the direct signal and U = $H_{12}$ is the corresponding interference signal for $Y_1$. Analogously, G = $H_{22}$ is the direct signal for the second user $Y_2$ and A = $H_{21}$ is its corresponding interference signal.

Figure 4.

Two-user symmetric interference channel. (a) Strong interference channel. (b) Weak interference channel.

$$H_{11}=H_{22}=h_d\sqrt{P_{SNR}},\qquad H_{12}=H_{21}=h_c\sqrt{P_{SNR}^{\alpha}}.$$

The relationship between the input and output of the two-user symmetric channel is described as follows [14],

$$Y_1=h_d\sqrt{P_{SNR}}\,X_1+h_c\sqrt{P_{SNR}^{\alpha}}\,X_2+Z_1, \tag{29}$$

$$Y_2=h_c\sqrt{P_{SNR}^{\alpha}}\,X_1+h_d\sqrt{P_{SNR}}\,X_2+Z_2, \tag{30}$$

where the powers of the input symbols $X_1$, $X_2$ and of the additive white Gaussian noise (AWGN) terms $Z_1$ and $Z_2$ are normalized to unity. Analogous to the definition of the degree of freedom (DoF), the total generalized DoF (GDoF) metric $d(\alpha)$ is defined as

$$d(\alpha)=\lim_{P_{SNR}\to\infty}\frac{C(P_{SNR},\alpha)}{\log P_{SNR}}, \tag{31}$$

where $C(P_{SNR},\alpha)$ is the sum-capacity parameterized by $P_{SNR}$ and $\alpha$. Here $\alpha$ is the ratio (on the decibel scale) of the cross-channel strength to the direct-channel strength, and $P_{SNR}$ is the signal-to-noise ratio (on the decibel scale). To find the achievable DoF, take the limit in Eq. (31) by letting $P_{SNR}$ go to infinity. Note that the GDoF metric coincides with the DoF metric at the point $\alpha=1$. Thus, the GDoF curve gives a significant hint for optimal interference management strategies, and it has been used most successfully to estimate the capacity of the two-user interference channel to within a constant gap in [14]. For example, for the RNA genetic code with bases C = G = 19% and A = T = U = 31%, this two-user symmetric interference channel can be analyzed in the strong and weak interference regions as follows. The noise-immune channel is described below, where $X_1$ and $X_2$ denote the input symbols while $Y_1$ and $Y_2$ denote the output symbols:

$$Y_1=CX_1+UX_2, \tag{32}$$

$$Y_2=GX_1+AX_2. \tag{33}$$

Case 1. Strong interference region.

Figure 4 (a) describes the channel in the strong interference regime, where each receiver has to decode the interfering signal in order to recover its desired signal. The general condition for strong interference is

$$C<A,\qquad U>G. \tag{34}$$

Unfortunately, unlike in the weak interference region, it is still challenging to propose a scheme that both achieves a symmetric rate and admits an upper bound.

Case 2. Weak interference region.

Figure 4 (b) describes the channel in the very weak interference regime, where the receivers do not need to decode any portion of the interference signal and instead regard it as noise. This scheme achieves the following symmetric rate per user [7],

$$R=\min\left\{\frac{1}{2}\log(1+INR+SNR)+\frac{1}{2}\log\left(2+\frac{SNR}{INR}\right)-1,\ \log\left(1+INR+\frac{SNR}{INR}\right)-1\right\}. \tag{35}$$

The upper bound on the symmetric capacity is

$$C_{\text{sym}}\le\min\left\{\frac{1}{2}\log(1+SNR)+\frac{1}{2}\log\left(1+\frac{SNR}{1+INR}\right),\ \log\left(1+INR+\frac{SNR}{1+INR}\right)\right\}. \tag{36}$$

Letting A = T = U = 31% and C = G = 19%, i.e., INR = 31 and SNR = 19, we obtain the symmetric achievable rate

$$R=\min\left\{\frac{1}{2}\log_2(1+31+19)+\frac{1}{2}\log_2\left(2+\frac{19}{31}\right)-1,\ \log_2\left(1+31+\frac{19}{31}\right)-1\right\}=\min\{2.53,\ 4.02\}=2.53. \tag{37}$$

Analogously, the symmetric capacity is described by

$$C_{\text{sym}}\le\min\left\{\frac{1}{2}\log_2(1+19)+\frac{1}{2}\log_2\left(1+\frac{19}{31}\right),\ \log_2\left(1+31+\frac{19}{31}\right)\right\}\le\min\{2.16+0.34,\ 5.02\}\le\min\{2.50,\ 5.02\}=2.50. \tag{38}$$

Following the above steps, in the weak interference regime, when interference is treated as noise, the symmetric capacity is close to the achievable rate, that is,

$$C_{\text{sym}}\approx R. \tag{39}$$
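The two expressions above can be evaluated numerically. The sketch below assumes the readings of Eq. (35) and Eq. (36) given in its docstrings, with base-2 logarithms and with SNR = 19 and INR = 31; the function names are hypothetical.

```python
from math import log2

def achievable_rate(snr, inr):
    """Symmetric achievable rate of Eq. (35)."""
    first = 0.5 * log2(1 + inr + snr) + 0.5 * log2(2 + snr / inr) - 1
    second = log2(1 + inr + snr / inr) - 1
    return min(first, second)

def capacity_upper_bound(snr, inr):
    """Upper bound on the symmetric capacity, Eq. (36)."""
    first = 0.5 * log2(1 + snr) + 0.5 * log2(1 + snr / (1 + inr))
    second = log2(1 + inr + snr / (1 + inr))
    return min(first, second)

rate = achievable_rate(19, 31)
bound = capacity_upper_bound(19, 31)
print(f"R={rate:.2f}  C_sym<={bound:.2f}")
```

In both cases the first argument of the minimum is the active one, and the two values differ only in the second decimal place.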

Figure 5 shows the weak and strong interference regions, where the leftmost part corresponds to the very weak interference region and the rightmost part to the very strong interference region.

Figure 5.

Generalized degree of freedom for Gaussian Channel (W curve).

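Analysis:

In 1948, Shannon proposed a code generation method that exploits a random codebook in point-to-point communication with the inverse Gaussian distribution (a Gaussian distribution whose variance tends to infinity is called inverse Gaussian) to achieve the channel capacity, which is described as follows [8],

$$C=\frac{1}{2}\log_2\left(1+\frac{S}{N}\right), \tag{40}$$

where the signal power is $S$ and the noise power is $N$.

The point-to-point channel capacity is

$$C_{\text{AWGN}}=\log_2\left(1+\frac{S}{N}\right), \tag{41}$$

where the signal power is $S$ and the noise power is $N$.

From Eq. (31), the degree of freedom is [14]

$$DoF=\lim_{x\to\infty}\frac{1+\frac{S}{N}}{1+\frac{S}{N}}=1, \tag{42}$$

and the achievable rate under orthogonalization is

$$\sum_{i=1}^{K}R_i=\log_2\left(1+\frac{\sum_{i=1}^{K}P_i}{N}\right), \tag{43}$$

where $K$ is the number of users.

For two users,

$$2R=\log_2\left(1+\frac{2P}{N}\right)=\log_2(1+2\,SNR). \tag{44}$$

Therefore, the achievable rate is

$$R=\frac{1}{2}\log_2(1+2\,SNR). \tag{45}$$

Case SNR = 19 and INR = 31:

The capacity:

$$C=\frac{1}{2}\log_2\left(1+\frac{19}{31}\right)=\frac{1}{2}\log_2(1+0.61)=0.34 \tag{46}$$

Achievable rate:

$$2R=\log_2\left(1+\frac{2\cdot 19}{31}\right)\ \Rightarrow\ 2R=\log_2 2.22\ \Rightarrow\ 2R=1.15\ \Rightarrow\ R=0.57 \tag{47}$$

The degree of freedom is

$$DoF=\lim_{SNR\to\infty}\frac{R}{\log_2(2\,SNR)}=\lim_{SNR\to\infty}\frac{\frac{1}{2}\log_2(1+2\,SNR)}{\log_2(2\,SNR)}=\frac{1}{2}. \tag{48}$$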
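The orthogonalized two-user figures can be checked with a short script. This is a minimal sketch, assuming the ratio 19/31 used in Eqs. (46) and (47); the function name `orthogonal_rate` is hypothetical.

```python
from math import log2

def orthogonal_rate(snr):
    """Per-user rate R = (1/2) log2(1 + 2 SNR) from Eq. (45)."""
    return 0.5 * log2(1 + 2 * snr)

ratio = 19 / 31                       # S/N used in Eqs. (46) and (47)
capacity = 0.5 * log2(1 + ratio)      # Eq. (46)
rate = orthogonal_rate(ratio)         # Eq. (47)
print(f"C={capacity:.3f}  R={rate:.3f}")

# The ratio R / log2(2 SNR) approaches 1/2 as SNR grows, as in Eq. (48).
for snr in (1e3, 1e6, 1e12):
    print(snr, orthogonal_rate(snr) / log2(2 * snr))
```

The loop illustrates the limit in Eq. (48): each user obtains half a degree of freedom when the two users share the channel orthogonally.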

When the ratio $\alpha=\frac{\log_2 INR}{\log_2 SNR}$ is fixed and the signal is much stronger than the interference and the noise, the interference can be treated as noise. Therefore, the achievable rate is represented by

$$R=\log_2\left(1+\frac{SNR}{1+INR}\right). \tag{49}$$

From Eq. (49), the DoF is represented by [14]

$$DoF=\lim_{SNR\to\infty}\frac{R}{\log_2 SNR}=\lim_{SNR\to\infty}\frac{\log_2\left(1+\frac{SNR}{1+INR}\right)}{\log_2 SNR}\approx\frac{\log_2\frac{SNR}{INR}}{\log_2 SNR}=\frac{\log_2 SNR-\log_2 INR}{\log_2 SNR}=1-\frac{\log_2 INR}{\log_2 SNR}=1-\alpha. \tag{50}$$
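The limit in Eq. (50) can be approximated numerically by choosing a very large SNR. The sketch below assumes INR = SNR^α, as implied by the definition of α, and uses the hypothetical helper `tin_rate` for the rate of Eq. (49).

```python
from math import log2

def tin_rate(snr, alpha):
    """Rate of Eq. (49) when INR = SNR**alpha and interference is treated as noise."""
    inr = snr ** alpha
    return log2(1 + snr / (1 + inr))

snr = 2.0 ** 200   # a very large SNR to approximate the limit
for alpha in (0.38, 0.62):
    dof = tin_rate(snr, alpha) / log2(snr)
    print(f"alpha={alpha:.2f}  DoF~{dof:.4f}  1-alpha={1 - alpha:.2f}")
```

For both values of α discussed in this chapter, the normalized rate matches 1 − α to several decimal places.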

In the conventional binary symmetric channel, $p$ is a random variable, and a large amount of resources is spent on estimating $p$ for the given channel. By contrast, for the RNA base matrix [C U; A G], $p$ is determined deterministically as either 0.38 or 0.62. Because a specific value of $p$ is given, the channel estimation should be investigated. These specific values are chosen because, in the RNA model, the maximum channel capacity is maintained even though $p$ is deterministic, the variance of the signal is not large, and the performance on the W curve is reasonable from the generalized DoF point of view. In an actual implementation, the receiver has to satisfy $1-\alpha=p$ as shown in Figure 2. In this situation, the signal strength and the interference intensity are important for analyzing the given channel, and strong and weak interference environments are classified according to $\alpha$. For example, if $\alpha=1-p=0.38$, we need to analyze the strong interference channel; if $\alpha=1-p=0.62$, we need to analyze the weak interference channel. This estimate of $p$ minimizes the performance degradation in the binary symmetric channel while significantly reducing the computational complexity. The GDoF curve of the two-user symmetric interference channel in Figure 5 is the well-known “W” curve, which greatly improves the understanding of the interference channel by identifying the two regimes. In the above example, over the symmetric channel, the signal is relatively stronger than the interference when $\alpha=0.62$, whereas it is relatively weaker than the interference when $\alpha=0.38$.


6. RNA genetic code constructed by block circulant jacket matrix

A block circulant Jacket matrix (BCJM) is defined by [7, 12, 13, 15]

$$[C]_N=\begin{bmatrix}C_0&C_1\\C_1&C_0\end{bmatrix}, \tag{51}$$

where $C_0$ and $C_1$ are Hadamard matrices.

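The circulant submatrices are 2 × 2 matrices whose entries are moved by block-diagonal cyclic shifts. These submatrices are block circulant Jacket matrices. The BCJM $C_4$ is defined by

$$C_4\triangleq I_0\otimes C_0'+I_1\otimes C_1, \tag{52}$$

where $I_0=\begin{bmatrix}1&0\\0&1\end{bmatrix}$, $I_1=\begin{bmatrix}0&1\\1&0\end{bmatrix}$, $C_0'$ and $C_1$ are 2 × 2 matrices with entries ±1, and $\otimes$ is the Kronecker product.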

From Eq. (52), the genetic matrix $[C\ U;\ A\ G]^{(3)}$ generates RNA sequences such as [12, 13]

$$P_1=\begin{bmatrix}C&U\\A&G\end{bmatrix},\quad P_2=\begin{bmatrix}C&U\\A&G\end{bmatrix}\otimes\begin{bmatrix}C&U\\A&G\end{bmatrix},\quad P_3=\begin{bmatrix}C&U\\A&G\end{bmatrix}^{(2)}\otimes\begin{bmatrix}C&U\\A&G\end{bmatrix}, \tag{53}$$

where $\otimes$ denotes the Kronecker product. RNA consists of a sequence of four bases, where C, U, A, and G denote cytosine, uracil, adenine, and guanine, respectively.

According to the theory of noise-immunity coding, the mosaic gene matrix $[C\ U;\ A\ G]^{(3)}$ can be constructed by comparing the 64 triplets with the strong roots and the weak roots. If a triplet belongs to one of the strong roots, it is replaced by 1. Analogously, if a triplet belongs to one of the weak roots, it is replaced by −1. Here, the strong roots are CC, CU, CG, AC, UC, GC, GU, and GG, and the weak roots are CA, AA, AU, AG, UA, UU, UG, and GA, which results in the singular Rademacher matrix $R_8$; the triplets are listed in Table 3 [6, 16].

| | 000 (0) | 001 (1) | 010 (2) | 011 (3) | 100 (4) | 101 (5) | 110 (6) | 111 (7) |
|---|---|---|---|---|---|---|---|---|
| 000 (0) | CCC 000 | CCU 001 | CUC 010 | CUU 011 | UCC 100 | UCU 101 | UUC 110 | UUU 111 |
| 001 (1) | CCA 001 | CCG 000 | CUA 011 | CUG 010 | UCA 101 | UCG 100 | UUA 111 | UUG 110 |
| 010 (2) | CAC 010 | CAU 011 | CGC 000 | CGU 001 | UAC 110 | UAU 111 | UGC 100 | UGU 101 |
| 011 (3) | CAA 011 | CAG 010 | CGA 001 | CGG 000 | UAA 111 | UAG 110 | UGA 101 | UGG 100 |
| 100 (4) | ACC 100 | ACU 101 | AUC 110 | AUU 111 | GCC 000 | GCU 001 | GUC 010 | GUU 011 |
| 101 (5) | ACA 101 | ACG 100 | AUA 111 | AUG 110 | GCA 001 | GCG 000 | GUA 011 | GUG 010 |
| 110 (6) | AAC 110 | AAU 111 | AGC 100 | AGU 101 | GAC 010 | GAU 011 | GGC 000 | GGU 001 |
| 111 (7) | AAA 111 | AAG 110 | AGA 101 | AGG 100 | GAA 011 | GAG 010 | GGA 001 | GGG 000 |

Table 3.

$[C\ U;\ A\ G]^{(3)}$ code [6, 16].
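The layout of Table 3 can be generated programmatically. The sketch below is an illustration under two assumptions read off from the table itself: each letter of a triplet is picked from the kernel [C U; A G] by one row bit and one column bit, and the 3-bit code in each cell is the bitwise XOR of the row and column labels. The function name `triplet_table` is hypothetical.

```python
from itertools import product

KERNEL = [["C", "U"], ["A", "G"]]   # the base matrix [C U; A G]

def triplet_table():
    """Return {(row, col): (triplet, code)} for the third Kronecker power."""
    table = {}
    for row_bits in product((0, 1), repeat=3):
        for col_bits in product((0, 1), repeat=3):
            triplet = "".join(KERNEL[r][c] for r, c in zip(row_bits, col_bits))
            code = "".join(str(r ^ c) for r, c in zip(row_bits, col_bits))
            row = int("".join(map(str, row_bits)), 2)
            col = int("".join(map(str, col_bits)), 2)
            table[(row, col)] = (triplet, code)
    return table

table = triplet_table()
print(table[(0, 0)], table[(1, 0)], table[(4, 4)], table[(7, 7)])
```

Indexing `table` by the decimal row and column labels returns the same triplet and code that appear in the corresponding cell of Table 3.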

A novel encoding scheme is proposed as

E54

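Eq. (54) gives a hint of the DNA double helix.

Note that

$$R_8\triangleq I_0\otimes C_0\otimes P_2+I_1\otimes C_1\otimes P_2, \tag{55}$$

where $I_0=\begin{bmatrix}1&0\\0&1\end{bmatrix}$, $I_1=\begin{bmatrix}0&1\\1&0\end{bmatrix}$, $C_0=\begin{bmatrix}1&1\\-1&1\end{bmatrix}$, $C_1=\begin{bmatrix}1&-1\\-1&-1\end{bmatrix}$, and $P_2$ is the doubly stochastic permutation matrix represented by $P_2=\begin{bmatrix}1&1\\1&1\end{bmatrix}$. Eq. (54) has a series of redundant rows, which simply repeat and can be canceled. From the Rademacher matrix $R_8$, one version of its mosaic gene matrices is obtained as

$$R_8=\begin{bmatrix}1&1&1&1&1&1&-1&-1\\-1&-1&1&1&-1&-1&-1&-1\\1&1&-1&-1&1&1&1&1\\-1&-1&-1&-1&-1&-1&1&1\end{bmatrix}. \tag{56}$$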

Furthermore, by canceling the repeated columns of Eq. (56) by means of CRISPR, another version of the mosaic gene matrices is obtained as Eq. (57), which is a singular RNA matrix:

$$R_4=\begin{bmatrix}1&1&1&-1\\-1&1&-1&-1\\1&-1&1&1\\-1&-1&-1&1\end{bmatrix}=\begin{bmatrix}C_0&C_1\\C_1&C_0\end{bmatrix}, \tag{57}$$

where $C_0=\begin{bmatrix}1&1\\-1&1\end{bmatrix}$ and $C_1=\begin{bmatrix}1&-1\\-1&-1\end{bmatrix}$. These matrices can be expanded into the DNA double helix or the RNA single strand, which reflects the process by which DNA replicates its genetic information, which is then transcribed into RNA and translated to synthesize protein. Therefore,

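$$R_4\triangleq I_0\otimes C_0+I_1\otimes C_1, \tag{58}$$

where $C_0$ has the eigenvalues $\lambda_{11}=1+i$ and $\lambda_{21}=1-i$ with the corresponding eigenvectors $\varsigma_1=[1\ \ i]^T$ and $\varsigma_2=[1\ \ -i]^T$. In addition, $C_1$ has the eigenvalues $\lambda_{12}=\sqrt{2}$ and $\lambda_{22}=-\sqrt{2}$ with the eigenvectors $\varsigma_1=[1+\sqrt{2}\ \ -1]^T$ and $\varsigma_2=[1-\sqrt{2}\ \ -1]^T$, respectively [3, 17]. Then,

$$R_4\otimes P_2\triangleq R_8=R_{4\times 2^k}, \tag{59}$$

where k = 1.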
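The construction of this section can be traced in code, starting from the strong roots and the triplet layout of Table 3. The following is a sketch under one stated assumption: a triplet is classified by its first two letters (its root), so it maps to +1 if that root is strong and to −1 otherwise. The function names `rademacher_r8` and `kron` are hypothetical helpers.

```python
from itertools import product

KERNEL = [["C", "U"], ["A", "G"]]
STRONG_ROOTS = {"CC", "CU", "CG", "AC", "UC", "GC", "GU", "GG"}

def rademacher_r8():
    """8 x 8 matrix with +1 for strong-root triplets and -1 for weak-root ones."""
    bits = list(product((0, 1), repeat=3))
    matrix = []
    for row_bits in bits:
        row = []
        for col_bits in bits:
            triplet = "".join(KERNEL[r][c] for r, c in zip(row_bits, col_bits))
            row.append(1 if triplet[:2] in STRONG_ROOTS else -1)
        matrix.append(row)
    return matrix

def kron(a, b):
    """Kronecker product of two matrices given as lists of lists."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

r8 = rademacher_r8()
# Every second row and column repeats its neighbour, so keep the even ones.
r4 = [row[::2] for row in r8[::2]]
p2 = [[1, 1], [1, 1]]

for row in r4:
    print(row)
print(kron(r4, p2) == r8)
```

Under this assumption the reduced 4 × 4 matrix has the block circulant form [C0 C1; C1 C0], and expanding it with the all-ones 2 × 2 matrix restores the full 8 × 8 matrix, in line with Eq. (59).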


7. Symmetric genetic jacket block matrix

It is demonstrated that the genomatrices are constructed from the kernel [C A; U G] and that the mosaic genomatrices $[C\ A;\ U\ G]^{(3)}$ are built by a series of Kronecker products, which are expanded by permuting the positions of the four bases C, A, U, and G in the matrix.

7.1 Permutation scheme from upper to lower

Following this scheme, we obtain 24 variants of genomatrices, which are distinguished from each other by replacing their subsets with the kernel [C A; U G]. For instance, by applying the upper-lower scheme to [C A; U G], the standard genetic code is expanded into $[U\ C\ A\ G]^T\otimes[U\ C\ A\ G]\otimes[U\ C\ A\ G]^T$, where $T$ is the transpose. Analogous to Eq. (56), one variant of the genomatrices is constructed as

E60

Eq. (60) is another variant of the genomatrices, obtained by a series of Kronecker products with $[1\ 1\ 1\ 1]^T$, which is expanded into Eq. (61), indicating the transcription process from the $R_8$ DNA to the $R_4$ RNA.

E61

Example 7.1. If A = U and C = G, we obtain six variants of the genomatrices constructed by a series of Kronecker products of the kernel [C A; U G].

ACUG=1111111111111111=10111111+01111111,E62

which is expanded into Eq. (63) and Eq. (64). These are other versions of variants of genomatrices.

AGUC=1111111111111111=10111111+01111111,E63
GUCA=1111111111111111=10111111+01111111,E64
CUGA=1111111111111111=10111111+01111111,E65
CAGU=1111111111111111=10111111+01111111,E66
GACU=1111111111111111=10111111+01111111.E67

Eqs. (62)-(67) are six variants of the genomatrices, which indicate six half pairs expanded from the symmetric RNA genetic matrices by the upper-lower scheme. In other words, they are constructed by rotating the block from upper to lower or vice versa.

7.2 Permutation scheme from left to right

Following this scheme, we obtain 6 variants of genomatrices, which are distinguished from each other by the kernel [C A; U G]. For instance, by applying the left-right scheme to [C A; U G], the standard genetic code is expanded into $R_8$:

E68

Eq. (68) is another variant of the genomatrices, obtained by a series of Kronecker products with [1 1; 1 1], which is expanded into Eq. (69), indicating the transcription process from the $R_8$ DNA to the $R_4$ RNA.

E69

Example 7.2. If A = U and C = G, we obtain six variants of the genomatrices constructed by a series of Kronecker products of the kernel [C A; U G].

CGUA=1111111111111111=10111111+01111111,E70

which is expanded into Eq. (71) and Eq. (72). These are other versions of variants of genomatrices.

GCUA=1111111111111111=10111111+01111111,E71
UACG=1111111111111111=10111111+01111111,E72
AUGC=1111111111111111=10111111+01111111,E73
GCAU=1111111111111111=10111111+01111111,E74
CGAU=1111111111111111=10111111+01111111.E75

Eqs. (70)-(75) are six variants of the genomatrices, which indicate six half pairs expanded from the symmetric RNA genetic matrices by the left-right scheme. In other words, they are constructed by rotating the block from left to right or vice versa.

7.3 Block Circulant jacket matrix

Construct a block matrix $[C]_N$ from the Jacket matrices $[C_0]_p$ and $[C_1]_p$ as $[C]_N=\begin{bmatrix}C_0&C_1\\C_1&C_0\end{bmatrix}$, where its order is $N=2p$. This matrix is called block circulant if and only if $C_0C_1^{RT}+C_1^{RT}C_0=[0]_N$, where $RT$ is the reciprocal transpose. In other words, $[C]_N$ is a block circulant Jacket matrix (BCJM) [12, 13, 15, 18]. Since $C_0C_0^{RT}=pI_p$ and $C_1C_1^{RT}=pI_p$, $C_0$ and $C_1$ are Jacket matrices. Recall that $[C]_N$ is a Jacket matrix if and only if $CC^{RT}=NI_N$. Therefore, $C$ is a Jacket matrix if and only if

$$CC^{RT}=\begin{bmatrix}C_0&C_1\\C_1&C_0\end{bmatrix}\begin{bmatrix}C_0&C_1\\C_1&C_0\end{bmatrix}^{RT}=\begin{bmatrix}2pI_p&C_0C_1^{RT}+C_1^{RT}C_0\\C_0C_1^{RT}+C_1^{RT}C_0&2pI_p\end{bmatrix}=NI_N, \tag{76}$$

where $RT$ is the reciprocal transpose. Therefore, Eq. (76) yields plenty of BCJMs.

Example 7.3. Two 2 × 2 matrices are given such as

C0=1111,C1=aa1/a1/a.

It is easy to verify that $C_0C_0^{RT}=2I_2$ and $C_1C_1^{RT}=2I_2$ are satisfied. Therefore, $C_0$ and $C_1$ are Jacket matrices.

Moreover,

C0C1RT+C1RTC0=11111/aa1/aa+aa1/a1/a1111=02.E77
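The Jacket condition of this subsection can be tested with exact rational arithmetic. The sketch below is illustrative only: the blocks `c0` and `c1` and the weight a = 2 are assumptions chosen here so that the condition holds, and they are not necessarily the matrices of Example 7.3. The helper names are hypothetical.

```python
from fractions import Fraction

def reciprocal_transpose(m):
    """Transpose m and replace every entry by its reciprocal."""
    n = len(m)
    return [[1 / Fraction(m[j][i]) for j in range(n)] for i in range(n)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def is_jacket(m):
    """True if m times its reciprocal transpose equals N times the identity."""
    n = len(m)
    product = matmul(m, reciprocal_transpose(m))
    return all(product[i][j] == (n if i == j else 0) for i in range(n) for j in range(n))

a = Fraction(2)
c0 = [[1, 1], [1, -1]]                    # illustrative Hadamard block
c1 = [[a, -a], [-1 / a, -1 / a]]          # illustrative weighted block
block = [r0 + r1 for r0, r1 in zip(c0, c1)] + [r1 + r0 for r0, r1 in zip(c0, c1)]

print(is_jacket(c0), is_jacket(c1), is_jacket(block))
```

Here `block` is the 4 × 4 matrix [C0 C1; C1 C0]; using `Fraction` avoids floating-point error, so the comparison with N times the identity is exact.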

8. General pattern of block circulant symmetric genetic jacket matrix

We present 24 $(=4\times{}_4C_2)$ DNA classes of genomatrices with their own characteristics. The main kernel is given in Eq. (78):

$$\underbrace{E}_{\text{Position}}\otimes\underbrace{\left(I_0\otimes A+I_1\otimes B\right)}_{\text{Main body kernel}}\otimes\underbrace{F}_{\text{Extending}}. \tag{78}$$

Eq. (58) is an RNA pattern generated by the main kernel. By applying the upper-lower or left-right scheme to the genetic matrix, the position matrix $E$ creates patterns analogous to Eqs. (61) and (69). Analogously, by applying the upper-lower and left-right schemes to the genetic matrix, the extending matrix $F$ creates patterns analogous to Eqs. (60) and (68).

South Korea’s national flag shows trigrams arranged around the Yin-Yang symbol in its center, which is analogous to the layout of Figure 6. We present 24 variants of genomatrices, which are distinguished from each other by replacing their subsets with the kernel shown in Figure 6: its left-hand side 1011, its right-hand side 0111, its upper position 1011, its lower position 0111, and its center part $I_0\otimes C_0+I_1\otimes C_1$, respectively.

Figure 6.

General pattern by block circulant, upper-lower, and left–right scheme: Normal case.

From the fact that 10110111and 10110111, upper symmetric genetic matrices are complementary with lower ones while left ones are complementary with right ones.

In addition, patterns are created by the block circulant, upper-lower, and left–right schemes on the ½ symmetric block, which are analyzed in three cases.

Case 1. Block circulant scheme

CUAG=1111111111111111=100Adiag1111+01101111.E79
UCGA=1111111111111111=10011111+01AAntidiag01111.E80

Case 2. Upper-lower scheme

UGAC=1111111111111111=10111111+0AUpper111111.E81
UCAG=1111111111111111=10111111+0ALower111111.E82

Case 3. Left-right scheme

AUCG=1111111111111111=10111111+0ALeft111111.E83
UAGC=1111111111111111=10111111+0ARight111111.E84

Eq. (79) is block circulant while Eq. (80) is not. Meanwhile, one of Eqs. (81) and (82) is upper-lower symmetric while the other is not. Similarly, one of Eqs. (83) and (84) is left–right symmetric while the other is not. Figure 7 shows a pattern constructed by a series of products of [C A; U G] that is distorted in comparison with the pattern in Figure 6. Therefore, these patterns are called sickness patterns, which can cover COVID-19.

Figure 7.

Abnormal pattern by block circulant, upper-lower, and left–right scheme.

For instance, let

$$\begin{bmatrix}C&U\\A&G\end{bmatrix}\triangleq\begin{bmatrix}A&B\\C&D\end{bmatrix}. \tag{85}$$

Note the following cases.

Case 1. $A\neq D,\ B=C$ and $A=D,\ B\neq C$.

Case 2. $A=C,\ B\neq D$ and $A\neq C,\ B=D$.

Case 3. $A=B,\ C\neq D$ and $A\neq B,\ C=D$.

From the above processes, we obtain six half-symmetric blocks: [C U; A G], [U C; G A], [U G; A C], [U C; A G], [A U; C G], and [U A; G C].


9. Conclusion

We show the experimental results of C = G = 19% and A = U = T = 31% for COVID-19 with the RNA base matrix [C U; A G], and we extend them to a mathematical proof based on the information theory of the doubly stochastic matrix. The RNA entropy cannot reach the Shannon entropy because the probabilities of its bases are 23% away from one half, which is exactly the sum of its variances. In other words, there is a difference between the Shannon capacity and the RNA capacity, which is identical to the sum of the variances of the RNA base random variables, because they cannot reach one half over a symmetric channel. We present a straightforward and explicit mathematical basis for double-helix DNA in the process of reverse transcription from RNA to DNA, obtained by decomposing a DNA matrix into sparse matrices that have non-redundant columns and rows. We also introduce a general pattern obtained by the block circulant, upper-lower, and left–right schemes, which corresponds to correct communication and to the healthy condition because it consists of exactly the four bases. Furthermore, we introduce an abnormal pattern obtained by the same schemes, which covers distorted signals as well as COVID-19. The RNA matrix of Eq. (57) is the same as the matrix of Definition 3.1 in the US patent on MIMO communication [12].


Conflict of interest

The authors declare no conflict of interest.

References

  1. Chargaff E, Zamenhof S, Green C. Human desoxypentose nucleic acid: Composition of human desoxypentose nucleic acid. Nature. 1950;165:756-757. DOI: 10.1038/165756b0
  2. Watson J, Crick F. Molecular structure of nucleic acids: A structure for deoxyribose nucleic acid. Nature. 1953;171:737-738. DOI: 10.1038/171737a0
  3. Temin HM. Nature of the provirus of Rous sarcoma. National Cancer Institute Monograph. 1964;17:557-570
  4. Lee MH, Lee SK, Cho KM. A Life Ecosystem Management With DNA Base Complementarity. Moscow: Proceedings of the International Conference of Artificial Intelligence, Medical Engineering, Education (AIMEE 2018); 6–8 October 2018; Springer Nature; 2020
  5. Papoulis A, Pillai SU. Probability, Random Variables and Stochastic Process. 4th ed. Boston: McGraw Hill; 2002
  6. He M, Petoukhov S. Mathematics of Bioinformatics: Theory, Practice, and Applications. 1st ed. New Jersey: John Wiley & Sons; 2010. DOI: 10.1002/9780470904640
  7. Lee SK, Park DC, Lee MH. RNA genetic 8 by 8 matrix construction from the block circulant Jacket matrix. Springer Nature: Proceedings of Symmetry Festival 2016; 18-22 July 2016, Vienna, Cham; 2017
  8. Shannon CE. A Mathematical Theory of Communication. The Bell System Technical Journal. 1948;27:379-423, 623-656. DOI: 10.1002/j.1538-7305.1948.tb01338.x
  9. Azgari C, Kilinc Z, Turhan B, Circi D, Adebali O. The mutation profile of SARS-CoV-2 is primarily shaped by the host antiviral defense. Viruses. 2021;13(3):394. DOI: 10.3390/v13030394
  10. Berkhout B, Hemert VF. On the biased nucleotide composition of the human coronavirus RNA genome. Virus Research. 2015;202:41-47. DOI: 10.1016/j.virusres.2014.11.031
  11. Xia X. Extreme Genomic CpG Deficiency in SARS-CoV-2 and Evasion of Host Antiviral Defense. Molecular Biology and Evolution. 2020;37(9):2699-2705. DOI: 10.1093/molbev/msaa094
  12. Lee MH, Hai H, Zhang XD. MIMO Communication Method and System using the Block Circulant Jacket Matrix. United States Patent US 009356671B1 [Internet]. 31 May 2016. Available from: https://patentimages.storage.googleapis.com/cb/46/34/4acf23e5a9b6e1/US9356671.pdf [Accessed: 12 December 2021]
  13. Lee MH. Jacket Matrices: Construction and Its Application for Fast Cooperative Wireless Signal Processing. 1st ed. Germany, Saarbrucken: LAP LAMBERT Academic Publishing; 2012
  14. Tse D, Viswanath P. Fundamentals of Wireless Communication. 1st ed. New York: Cambridge University Press; 2005. DOI: 10.1017/CBO9780511807213
  15. Wikipedia, the free encyclopedia. Jacket Matrix [Internet]. 1999. Available from: https://en.wikipedia.org/wiki/Jacket_matrix [Accessed: 12 December 2021]
  16. Rumer YB. Translation of ‘Systematization of Codons in the Genetic Code [II]’ by Yu. B. Rumer (1968). Royal Society. 2016;374:2063. DOI: 10.1098/rsta.2015.0447
  17. Lee MH, Hai H, Lee SK, Petoukhov SV. A Mathematical Proof of Double Helix DNA to Reverse Transcription RNA for Bioinformatics. Moscow: Proceedings of the 1st International Conference of Artificial Intelligence, Medical Engineering, and Education (AIMEE 2017); 21–23 August 2017; Springer Nature; 2018
  18. Chen Z, Lee MH, Zeng G. Fast cocyclic Jacket transform. IEEE trans. on Signal Processing. 2008;56(5):2143-2148. DOI: 10.1109/TSP.2007.912895
