Open access peer-reviewed chapter

The COVID-19 DNA-RNA Genetic Code Analysis Using Double Stochastic and Block Circulant Jacket Matrix

Written By

Sung Kook Lee and Moon Ho Lee

Reviewed: 22 December 2021 Published: 17 April 2022

DOI: 10.5772/intechopen.102342

From the Edited Volume

Matrix Theory - Classics and Advances

Edited by Mykhaylo Andriychuk


Abstract

We present a COVID-19 DNA-RNA genetic code in which A=T=U=31% and C=G=19%. It is developed from the base matrix [C U; A G], where C, U, A, and G are the RNA bases and C, T, A, and G are the DNA bases. E. Chargaff found these bases to be complementary, with A=T=U=30% and C=G=20% in his experimental results, which implied the double-helix structure of DNA and its complementary base pairing. However, these results have not yet been explained mathematically. In this chapter, we present a simple solution based on the information theory of a doubly stochastic matrix over the Shannon symmetric channel, and we prove it mathematically. Furthermore, we show that the DNA-RNA genetic code is a kind of block circulant Jacket matrix. We also present general patterns obtained by the block circulant, upper-lower, and left-right schemes; these correspond to correct communication and to the healthy condition, because they consist perfectly of the four bases. Finally, we provide abnormal patterns obtained by the same three schemes, which cover distorted signals as well as COVID-19.

Keywords

  • COVID-19 DNA-RNA
  • E. Chargaff
  • DNA-RNA genetic code
  • double stochastic matrix
  • symmetric channel
  • block circulant jacket matrix
  • general pattern
  • abnormal pattern

1. Introduction

In 1950, Chargaff presented two rules [1]. The first is that the percentage of adenine equals that of thymine and the percentage of guanine equals that of cytosine, which hints at the base-pair composition of the double-stranded DNA molecule. The second is that base complementarity also holds within each DNA strand, which explains the overall characteristics of the fundamental bases. For example, the four bases of the COVID-19 genome satisfy these two rules approximately, with A=T=31% and C=G=19%. In 1953, DNA was found to have a double-helix structure [2, 3], which results in an optimal and economical genetic code [4].

An RNA base matrix [C U; A G] was constructed from stochastic matrices [5], which leads to the genetic code [6, 7]. Applying the Markov process to these doubly stochastic matrices yields a symmetric capacity, which suggests a symmetry between the Shannon channel [8] and the RNA stochastic transition matrix [C U; A G], defined below. A square matrix $P=[p_{ij}]$ is stochastic if its entries are positive and the sums along its rows are equal to one or to a constant. If the sums along both its rows and its columns are equal to one or to a constant, it is doubly stochastic, and it can describe the time-invariant binary symmetric channel. For the input $x_n$ and the output $x_{n+1}$, the two states $e_0$ and $e_1$ of the Markov process are denoted by the binary symbols “0” and “1”, respectively. The output signal depends on the input signal through a given error probability. Assume that the channel error probabilities $\alpha$ and $\beta$ are less than one half and remain constant over the time-invariant channel for all transmitted symbols, such that

$$P(x_{n+1}=1 \mid x_n=0)=p_{01}=\alpha, \qquad P(x_{n+1}=0 \mid x_n=1)=p_{10}=\beta. \tag{1}$$

In addition, its Markov chain is homogeneous. P represents a 2 × 2 homogeneous probability transition matrix defined as

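$$P=\begin{bmatrix} p_{00} & p_{01}\\ p_{10} & p_{11}\end{bmatrix}=\begin{bmatrix} 1-\alpha & \alpha\\ \beta & 1-\beta\end{bmatrix}=\left.\begin{bmatrix} 1-p & p\\ p & 1-p\end{bmatrix}\right|_{p=0.5}=\frac{1}{2}\begin{bmatrix} 1 & 1\\ 1 & 1\end{bmatrix}, \tag{2}$$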

whose two error probabilities are identical, $\alpha=\beta=p$, over a binary symmetric channel. The rest of this chapter is organized as follows. Section 2 derives the RNA stochastic entropy from the Shannon entropy. Section 3 estimates the variance of the RNA base matrix. Section 4 derives the binary symmetric channel entropy. Section 5 estimates the two-user capacity over the symmetric interference channel. Section 6 proposes a construction scheme that generates RNA genetic codes. Section 7 examines the symmetric genetic Jacket block matrix. Section 8 investigates general patterns of block circulant symmetric genetic Jacket matrices. Section 9 concludes the chapter.
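As an illustration of the doubly stochastic property used with Eq. (2), the following minimal Python sketch builds the 2 × 2 transition matrix from the crossover probabilities and checks its row and column sums. The function names `bsc_matrix` and `is_doubly_stochastic` are hypothetical helpers introduced here and do not come from the chapter.

```python
def bsc_matrix(alpha, beta):
    """Transition matrix [[1-alpha, alpha], [beta, 1-beta]] of Eq. (2)."""
    return [[1 - alpha, alpha], [beta, 1 - beta]]


def is_doubly_stochastic(P, tol=1e-12):
    """True if all entries are non-negative and every row and column sums to 1."""
    nonneg = all(x >= 0 for row in P for x in row)
    rows_ok = all(abs(sum(row) - 1.0) < tol for row in P)
    cols_ok = all(abs(sum(col) - 1.0) < tol for col in zip(*P))
    return nonneg and rows_ok and cols_ok


# Symmetric case alpha = beta = p = 0.5, as in Eq. (2)
print(is_doubly_stochastic(bsc_matrix(0.5, 0.5)))
# Asymmetric case: rows still sum to 1, columns do not
print(is_doubly_stochastic(bsc_matrix(0.1, 0.3)))
```

Under these assumptions, the matrix is doubly stochastic exactly when the two error probabilities coincide.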

Table 1 lists the base ratios of several organisms [1, 9, 10, 11], which shows that the ratios A/T and G/C remain close to one across the species.

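| Organism | Taxon | %A | %G | %C | %T | A/T | G/C | %GC | %AT |
|---|---|---|---|---|---|---|---|---|---|
| Maize | Zea | 26.8 | 22.8 | 23.2 | 27.2 | 0.99 | 0.98 | 46.1 | 54.0 |
| Octopus | Octopus | 33.2 | 17.6 | 17.6 | 31.6 | 1.05 | 1.00 | 35.2 | 64.8 |
| Chicken | Gallus | 28.0 | 22.0 | 21.6 | 28.4 | 0.99 | 1.02 | 43.7 | 56.4 |
| Rat | Rattus | 28.6 | 21.4 | 20.5 | 28.4 | 1.01 | 1.00 | 42.9 | 57.0 |
| Human | Homo | 29.3 | 20.7 | 20.0 | 30.0 | 0.98 | 1.04 | 40.7 | 59.3 |
| Grasshopper | Orthoptera | 29.3 | 20.5 | 20.7 | 29.3 | 1.00 | 0.99 | 41.2 | 58.6 |
| Sea urchin | Echinoidea | 32.8 | 17.7 | 17.3 | 32.1 | 1.02 | 1.02 | 35.0 | 64.9 |
| Wheat | Triticum | 27.3 | 22.7 | 22.8 | 27.1 | 1.01 | 1.00 | 45.5 | 54.4 |
| Yeast | Saccharomyces | 31.3 | 18.7 | 17.1 | 32.9 | 0.95 | 1.09 | 35.8 | 64.4 |
| E. coli | Escherichia | 24.7 | 26.0 | 25.7 | 23.6 | 1.05 | 1.01 | 51.7 | 48.3 |
| φX174 | PhiX174 | 24.0 | 23.3 | 21.5 | 31.2 | 0.77 | 1.08 | 44.8 | 55.2 |
| Covid-19 | SARS-CoV-2 | 29.9 | 19.6 | 18.4 | 32.1 | 0.93 | 1.07 | 38.0 | 62.0 |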

Table 1.

Ratio of bases [1, 9, 10, 11].
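The A/T and G/C columns of Table 1 can be recomputed from the four base percentages. The sketch below does this for two rows copied from the table; the dictionary `table1` and the helper `chargaff_ratios` are hypothetical names introduced only for this illustration.

```python
# (%A, %G, %C, %T) for two rows of Table 1
table1 = {
    "Human": (29.3, 20.7, 20.0, 30.0),
    "Covid-19": (29.9, 19.6, 18.4, 32.1),
}


def chargaff_ratios(a, g, c, t):
    """Return the two Chargaff ratios (A/T, G/C)."""
    return a / t, g / c


for name, bases in table1.items():
    at, gc = chargaff_ratios(*bases)
    print(f"{name}: A/T = {at:.2f}, G/C = {gc:.2f}")
```

Both ratios stay close to one, which is the sense in which the bases are treated as complementary in the text.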


2. Analytical approach to RNA stochastic entropy

In [1, 5, 12, 13], stochastic complementary RNA bases are given for the genetic code. Assuming that C=G=19% and A=T=U=31%, the transition channel matrix P is expressed by

$$P=\begin{bmatrix} C & U\\ A & G\end{bmatrix}=\begin{bmatrix} 0.19 & 0.31\\ 0.31 & 0.19\end{bmatrix}. \tag{3}$$

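Suppose that the RNA base matrix [C U; A G] describes a Markov process whose two independent source probabilities are scaled to 0.19p and 0.31p. For the first probability, the transition channel matrix P is defined by

$$P=\begin{bmatrix} 0.19p & 1-0.19p\\ 1-0.19p & 0.19p\end{bmatrix}=\begin{bmatrix} 0.5 & 1-0.5\\ 1-0.5 & 0.5\end{bmatrix}=\begin{bmatrix} 0.5 & 0.5\\ 0.5 & 0.5\end{bmatrix}. \tag{4}$$

By comparison with Eq. (12), we have

$$0.19p=1-0.19p, \tag{5}$$

where p is 2.631.

Applying the same procedure to the second probability,

$$P=\begin{bmatrix} 0.31p & 1-0.31p\\ 1-0.31p & 0.31p\end{bmatrix}=\begin{bmatrix} 0.500 & 1-0.500\\ 1-0.500 & 0.500\end{bmatrix}=\begin{bmatrix} 0.5 & 0.5\\ 0.5 & 0.5\end{bmatrix}, \tag{6}$$

where 0.31p = 1 − 0.31p, so that p is 1.613.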

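Adding (6) to (4) gives the doubly stochastic matrix

$$2P=\begin{bmatrix} 0.5 & 0.5\\ 0.5 & 0.5\end{bmatrix}+\begin{bmatrix} 0.5 & 0.5\\ 0.5 & 0.5\end{bmatrix}=\begin{bmatrix} 1 & 1\\ 1 & 1\end{bmatrix}. \tag{7}$$

Applying the same doubling to (3),

$$2P=2\begin{bmatrix} C & U\\ A & G\end{bmatrix}=2\begin{bmatrix} 0.19 & 0.31\\ 0.31 & 0.19\end{bmatrix}=\begin{bmatrix} 0.38 & 0.62\\ 0.62 & 0.38\end{bmatrix}. \tag{8}$$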
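The scaling factors in Eqs. (5) and (6) and the doubling in Eq. (8) can be reproduced with a few lines of Python. This is only a sketch of the arithmetic; `scale_to_half` is a hypothetical helper name.

```python
def scale_to_half(q):
    """Solve q*p = 1 - q*p for p, i.e. the factor that maps q to 0.5."""
    return 1.0 / (2.0 * q)


for q in (0.19, 0.31):
    p = scale_to_half(q)
    print(f"q = {q}: p = {p:.4f}, q*p = {q * p:.3f}")

# Doubling the base matrix of Eq. (3), as in Eq. (8)
P = [[0.19, 0.31], [0.31, 0.19]]
doubled = [[2 * x for x in row] for row in P]
print(doubled, [sum(row) for row in doubled])
```

After doubling, each row and each column sums to one, so the matrix of Eq. (8) is doubly stochastic.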

If P is a random variable for the source probability p corresponding to the first symbol event, we obtain the entropy function [8] represented by

$$H_2(P)=p\log_2\frac{1}{p}+(1-p)\log_2\frac{1}{1-p}. \tag{9}$$

The last column of Table 2 gives the result of Eq. (9), and Figure 1 plots the Shannon and RNA entropy curves. Note that the curve has a vertical tangent at p = 0 and p = 1, as follows from the derivative

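| p | −log2 p | −p log2 p | H2(p) |
|---|---|---|---|
| 0.3800 | 1.3959 | 0.5305 | 0.9580 |
| 0.3900 | 1.3585 | 0.5298 | 0.9648 |
| 0.4000 | 1.3219 | 0.5288 | 0.9710 |
| 0.4100 | 1.2863 | 0.5274 | 0.9765 |
| 0.4200 | 1.2515 | 0.5256 | 0.9815 |
| 0.4300 | 1.2176 | 0.5236 | 0.9858 |
| 0.4400 | 1.1844 | 0.5211 | 0.9896 |
| 0.4500 | 1.1520 | 0.5184 | 0.9928 |
| 0.4600 | 1.1203 | 0.5153 | 0.9954 |
| 0.4700 | 1.0893 | 0.5120 | 0.9974 |
| 0.4800 | 1.0589 | 0.5083 | 0.9988 |
| 0.4900 | 1.0291 | 0.5043 | 0.9997 |
| 0.5000 | 1.0000 | 0.5000 | 1.0000 |
| 0.5100 | 0.9714 | 0.4954 | 0.9997 |
| 0.5200 | 0.9434 | 0.4906 | 0.9988 |
| 0.5300 | 0.9159 | 0.4854 | 0.9974 |
| 0.5400 | 0.8890 | 0.4800 | 0.9954 |
| 0.5500 | 0.8625 | 0.4744 | 0.9928 |
| 0.5600 | 0.8365 | 0.4684 | 0.9896 |
| 0.5700 | 0.8110 | 0.4623 | 0.9858 |
| 0.5800 | 0.7859 | 0.4558 | 0.9815 |
| 0.5900 | 0.7612 | 0.4491 | 0.9765 |
| 0.6000 | 0.7370 | 0.4422 | 0.9710 |
| 0.6100 | 0.7131 | 0.4350 | 0.9648 |
| 0.6200 | 0.6897 | 0.4276 | 0.9580 |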

Table 2.

Shannon entropy for probability p.

Figure 1.

Comparison between Shannon and RNA entropy for probability p.

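$$\frac{d}{dp}\left[p\log_2\frac{1}{p}+(1-p)\log_2\frac{1}{1-p}\right]=\log_2\frac{1}{p}-\log_2 e-\log_2\frac{1}{1-p}+\log_2 e=\log_2\frac{1}{p}-\log_2\frac{1}{1-p}=0, \tag{10}$$

so the entropy is maximized when p reaches one half, where its derivative becomes 0.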

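Therefore,

$$\log_2\frac{1}{p}-\log_2\frac{1}{1-p}=0 \;\Rightarrow\; \frac{1}{p}-\frac{1}{1-p}=0. \tag{11}$$

Then, we reach

$$p=1-p \;\Rightarrow\; p=\frac{1}{2}. \tag{12}$$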

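For the RNA base matrix [C U; A G], its symmetric entropy is calculated as

$$H_2(P)_{RNA}=p\log_2\frac{1}{p}+(1-p)\log_2\frac{1}{1-p}=0.9580, \tag{13}$$

when p is either 0.38 or 0.62 (see the first and last rows of Table 2). By comparison, the Shannon entropy is calculated as

$$H_2(P)_{Shannon}=p\log_2\frac{1}{p}+(1-p)\log_2\frac{1}{1-p}=1, \tag{14}$$

when p reaches one half.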
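The entropy function of Eq. (9) is straightforward to evaluate numerically. The following sketch, with the hypothetical helper `binary_entropy`, recomputes a few rows of the last column of Table 2.

```python
import math


def binary_entropy(p):
    """Binary entropy H2(p) of Eq. (9) in bits; defined as 0 at p = 0 and p = 1."""
    if p in (0.0, 1.0):
        return 0.0
    return p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))


for p in (0.38, 0.50, 0.62):
    print(f"p = {p:.2f}: H2(p) = {binary_entropy(p):.4f}")
```

The function is symmetric about p = 0.5, so the two RNA source probabilities 0.38 and 0.62 give the same value.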

Table 2 shows Shannon Entropy for probability p over a binary symmetric channel.

Figure 1 gives a comparison between Shannon and RNA Entropy for probability p under the RNA base matrix CUAG.


3. Derivation of variance for the RNA base matrix CUAG

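The variance of the RNA random variable X, denoted by V(X), is the mean of the squared deviation of X from its mean a, where

$$E[X]=a=0.5. \tag{15}$$

Therefore, for a random variable X, the variance is obtained as

$$V(X)=E\left[(X-a)^2\right]=E\left[X^2\right]-2aE[X]+E\left[a^2\right]=E\left[X^2\right]-2a^2+a^2=E\left[X^2\right]-a^2=\sigma^2. \tag{16}$$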

Case I. Upper source probability 0.62

$$\sigma_{upper}^2=0.62^2-0.5^2\approx 0.13. \tag{17}$$

Case II. Lower source probability 0.38

$$\sigma_{lower}^2=0.5^2-0.38^2\approx 0.10. \tag{18}$$

If $X_1$ and $X_2$ are independent random variables, their respective expectations and variances are

$$E[X_1]=a_1, \qquad V(X_1)=\sigma_1^2, \tag{19}$$

$$E[X_2]=a_2, \qquad V(X_2)=\sigma_2^2. \tag{20}$$

Therefore, we reach

$$E\left[(X_1-a_1)(X_2-a_2)\right]=E[X_1-a_1]\,E[X_2-a_2]=0. \tag{21}$$

Since $X_1$ and $X_2$ are independent random variables, the variance of their sum is calculated as

$$V(X_1+X_2)=E\left[(X_1+X_2-a_1-a_2)^2\right]=E\left[(X_1-a_1)^2\right]+2E\left[(X_1-a_1)(X_2-a_2)\right]+E\left[(X_2-a_2)^2\right]=V(X_1)+V(X_2)=\sigma_1^2+\sigma_2^2\approx 0.13+0.10=0.23, \tag{22}$$

which is approximately 23%, corresponding to the difference between A = U and C = G. This means that the RNA entropy cannot reach the Shannon entropy, because the probabilities of its bases are 23% away from one half, which is identical to the sum of their variances.
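The two cases of Eqs. (17) and (18) and their sum in Eq. (22) amount to simple arithmetic, sketched below with unrounded values. The variable names are hypothetical; the text reports the same quantities rounded to two decimals.

```python
a = 0.5  # mean of Eq. (15)

var_upper = 0.62 ** 2 - a ** 2  # Eq. (17), upper source probability 0.62
var_lower = a ** 2 - 0.38 ** 2  # Eq. (18), lower source probability 0.38
var_sum = var_upper + var_lower  # Eq. (22)

print(f"upper = {var_upper:.4f}, lower = {var_lower:.4f}, sum = {var_sum:.4f}")
```

Note that the sum simplifies to 0.62² − 0.38², so it depends only on the two source probabilities and not on the mean.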


4. RNA complement base matrix CUAG for symmetric noise immune-free channel

Suppose that, over a noise immune-free binary symmetric channel, the bases of the RNA genetic code [C U; A G] are complementary such that C=U and A=G. The conditional probability $P(b_j \mid a_i)=P_{i,j}$ then describes this channel, over which the maximum amount of information can be transmitted, as depicted in Figure 2. Assuming that C and G are the one’s complements of the corresponding error probability and that A and U are interference signals, the matrix [8] for this channel is described by

$$[p(X)]_{1\times 2}\,[P]_{2\times 2}=\begin{bmatrix}\alpha & 1-\alpha\end{bmatrix}\begin{bmatrix} C & U\\ A & G\end{bmatrix}=[p(Y)]_{1\times 2}=\begin{bmatrix} p(Y_1) & p(Y_2)\end{bmatrix}. \tag{23}$$

Figure 2.

Complementary bases of RNA genetic code CUAG over noise immune-free binary symmetric channel.

Given that p and 1 − p are the selection probabilities for α=0 and α=1, respectively, over the uniform channel, the mutual information is defined by

$$I(X;Y)=H(Y)-H(Y \mid X). \tag{24}$$

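From Eq. (23), we obtain

$$\begin{bmatrix}\alpha & 1-\alpha\end{bmatrix}\begin{bmatrix} -C\log_2 C & -U\log_2 U\\ -A\log_2 A & -G\log_2 G\end{bmatrix}=\begin{bmatrix}\alpha & 1-\alpha\end{bmatrix}\begin{bmatrix} -U\log_2 U & -C\log_2 C\\ -G\log_2 G & -A\log_2 A\end{bmatrix}, \tag{25}$$

where

$$H(Y \mid X)=-\alpha C\log_2 C-\alpha A\log_2 A-(1-\alpha)U\log_2 U-(1-\alpha)G\log_2 G=-U\log_2 U-G\log_2 G=-C\log_2 C-A\log_2 A=0.9790, \tag{26}$$

where A = U = 0.31 and C = G = 0.19.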

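Therefore, its capacity is derived as

$$C_{RNA}=\max I(X;Y)\big|_{p=0.38\ \text{or}\ 0.62}=H(Y)-H(Y \mid X)\approx 1-0.9790=0.021, \tag{27}$$

i.e. $H(Y)=-p\log_2 p-(1-p)\log_2(1-p)=-0.38\log_2 0.38-0.62\log_2 0.62\approx 1$,

while the Shannon capacity is derived as

$$C_{Shannon}=\max I(X;Y)\big|_{p=0.5}=H(Y)-H(Y \mid X)=1-1=0. \tag{28}$$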

In Figure 3, we compare the Shannon and RNA capacities for probability p. As mentioned in Section 3, the Shannon capacity can be reached if and only if the circumstances are ideal. In other words, there is a difference between the Shannon and RNA capacities, which is identical to the sum of the variances of the RNA base random variables, because their probabilities cannot equal one half over a symmetric channel.
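The numerical values in Eqs. (26) and (27) can be checked with the short sketch below. It follows the text in taking H(Y) as 1; the helper `h_term` and the variable names are hypothetical.

```python
import math

A = U = 0.31
C = G = 0.19


def h_term(x):
    """Single entropy term -x * log2(x)."""
    return -x * math.log2(x)


# Conditional entropy as written in Eq. (26)
H_Y_given_X = h_term(U) + h_term(G)

# Capacity of Eq. (27), assuming H(Y) = 1 as in the text
C_rna = 1.0 - H_Y_given_X

print(f"H(Y|X) = {H_Y_given_X:.4f}, C_RNA = {C_rna:.4f}")
```

Because A = U and C = G, the same value is obtained from the pair (C, A) as from the pair (U, G), which is the equality stated in Eq. (26).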

Figure 3.

Shannon and RNA capacity vary with probability p.


5. Two user capacity over symmetric interference channel

Figure 4 depicts the binary symmetric channel with the RNA base matrix [C U; A G] as a two-user symmetric interference channel, over which two independent messages $W_1$ and $W_2$ from the common message set $W_i$ are transmitted. Assume that C = G = 19% and A = U = 31%, where $C=H_{11}$ is the direct signal for the first user $Y_1$ and $U=H_{12}$ is its corresponding interference signal. Analogously, the direct signal for the second user $Y_2$ is $G=H_{22}$ and its corresponding interference signal is $A=H_{21}$.

Figure 4.

Two-user symmetric Interference Channel. (a) Strong Interference Channel. (b) Weak Interference Channel.

$$H_{11}=H_{22}=h_d P_{SNR}, \qquad H_{12}=H_{21}=h_c P_{SNR}.$$

The relationship between the input and output of the two-user symmetric channel is described as follows [14],

$$Y_1=h_d P_{SNR} X_1+h_c P_{SNR}^{\alpha} X_2+Z_1, \tag{29}$$

$$Y_2=h_c P_{SNR}^{\alpha} X_1+h_d P_{SNR} X_2+Z_2, \tag{30}$$

where the powers of the input symbols $X_1$, $X_2$ and of the additive white Gaussian noise (AWGN) terms $Z_1$ and $Z_2$ are normalized to unity. Analogous to the definition of the degrees of freedom (DoF), the total generalized degrees of freedom (GDoF) metric d(α) is defined as

$$d(\alpha)=\lim_{P_{SNR}\to\infty}\frac{C(P_{SNR},\alpha)}{\log P_{SNR}}, \tag{31}$$

where $C(P_{SNR},\alpha)$ is the sum capacity parameterized by $P_{SNR}$ and α. Here α is the ratio (on the decibel scale) of the cross-channel strength to the direct-channel strength, and $P_{SNR}$ is the signal-to-noise ratio (on the decibel scale). The achievable DoF is found by letting $P_{SNR}$ go to infinity in Eq. (31). Note that the GDoF metric coincides with the DoF metric at the point α = 1. The GDoF curve thus gives a significant hint about optimal interference management strategies, and it has been used most successfully to approximate the capacity of the two-user interference channel to within a constant gap in [14]. For example, for the RNA genetic code with bases C = G = 19% and A = T = U = 31%, this two-user symmetric interference channel can be analyzed in the strong and weak interference regions as follows. The noise immune channel is described below, where $X_1$ and $X_2$ denote the input symbols and $Y_1$ and $Y_2$ denote the output symbols:

$$Y_1=C X_1+U X_2, \tag{32}$$

$$Y_2=A X_1+G X_2. \tag{33}$$

Case 1. Strong Interference region.

Figure 4 (a) depicts the channel in the strong interference regime, where each receiver has to decode the interfering signal in order to recover its desired signal. The general condition for strong interference is

$$C<A, \qquad U>G. \tag{34}$$

Unfortunately, unlike in the weak interference region, it is still challenging to propose a scheme that achieves a symmetric rate and to upper-bound that rate.

Case 2. Weak Interference region.

Figure 4 (b) depicts the channel in the very weak interference regime, where the receivers do not need to decode any portion of the interference signal and simply treat it as noise. This scheme achieves the following symmetric rate per user [14],

$$R=\min\left\{\frac{1}{2}\log(1+INR+SNR)+\frac{1}{2}\log\left(2+\frac{SNR}{INR}\right)-1,\ \log\left(1+INR+\frac{SNR}{INR}\right)-1\right\}. \tag{35}$$

The upper bound on the symmetric capacity is

$$C_{Sym}\le\min\left\{\frac{1}{2}\log(1+SNR)+\frac{1}{2}\log\left(1+\frac{SNR}{1+INR}\right),\ \log\left(1+INR+\frac{SNR}{1+INR}\right)\right\}. \tag{36}$$

Letting A = T = U = 31% and C = G = 19%, i.e. INR = 31 and SNR = 19, we obtain the symmetric achievable rate

$$R=\min\left\{\frac{1}{2}\log_2(1+31+19)+\frac{1}{2}\log_2\left(2+\frac{19}{31}\right)-1,\ \log_2\left(1+31+\frac{19}{31}\right)-1\right\}=\min\{2.83+0.69-1,\ 5.02-1\}=\min\{2.52,\ 4.02\}=2.52. \tag{37}$$

Analogously, the symmetric capacity is bounded by

$$C_{sym}\le\min\left\{\frac{1}{2}\log_2(1+19)+\frac{1}{2}\log_2\left(1+\frac{19}{31}\right),\ \log_2\left(1+31+\frac{19}{31}\right)\right\}\le\min\{2.16+0.34,\ 5.02\}\le\min\{2.50,\ 5.02\}=2.50. \tag{38}$$

Following the above steps, in the weak interference regime, when interference is treated as noise, the symmetric capacity is close to the achievable rate:

$$C_{sym}\approx R. \tag{39}$$
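The numerical evaluation in Eqs. (37) and (38) can be reproduced as follows. This is a sketch of the arithmetic only, with the hypothetical function names `achievable_rate` and `upper_bound`; the second function uses the fraction SNR/INR exactly as it is evaluated in Eq. (38).

```python
import math


def achievable_rate(snr, inr):
    """Symmetric achievable rate of Eq. (35), base-2 logarithms."""
    t1 = 0.5 * math.log2(1 + inr + snr) + 0.5 * math.log2(2 + snr / inr) - 1
    t2 = math.log2(1 + inr + snr / inr) - 1
    return min(t1, t2)


def upper_bound(snr, inr):
    """Bound as evaluated numerically in Eq. (38), i.e. with SNR/INR."""
    t1 = 0.5 * math.log2(1 + snr) + 0.5 * math.log2(1 + snr / inr)
    t2 = math.log2(1 + inr + snr / inr)
    return min(t1, t2)


SNR, INR = 19, 31
print(f"R     = {achievable_rate(SNR, INR):.3f}")
print(f"C_sym = {upper_bound(SNR, INR):.3f}")
```

Both quantities come out near 2.5 bits, which is the closeness expressed by Eq. (39).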

Figure 5 depicts the weak and strong interference regions, where the leftmost part indicates a very weak interference region and the rightmost part a very strong interference region.

Figure 5.

Generalized degree of freedom for Gaussian Channel (W curve).

Analysis:

In 1948, Shannon proposed a code generation method that exploits a random codebook in point-to-point communication with an inverse Gaussian distribution (a Gaussian distribution whose variance tends to infinity is called inverse Gaussian) to achieve the channel capacity, which is described as follows [8],

$$C=\frac{1}{2}\log_2\left(1+\frac{S}{N}\right), \tag{40}$$

where the signal power is S and the noise power is N.

The point-to-point channel capacity is

$$C_{AWGN}=\log_2\left(1+\frac{S}{N}\right), \tag{41}$$

where the signal power is S and the noise power is N.

From Eq. (31), the degree of freedom is [14]

$$DoF=\lim_{x\to\infty}\frac{1+\frac{S}{N}}{1+\frac{S}{N}}=1, \tag{42}$$

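and the achievable rate is orthogonalized as

$$\sum_{i=1}^{K}R_i=\log_2\left(1+\frac{\sum_{i=1}^{K}P_i}{N}\right), \tag{43}$$

where K is the number of users.

For two users,

$$2R=\log_2\left(1+\frac{2P}{N}\right)=\log_2(1+2\,SNR). \tag{44}$$

Therefore, the achievable rate is

$$R=\frac{1}{2}\log_2(1+2\,SNR). \tag{45}$$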

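SNR = 19 and INR = 31 case:

The capacity:

$$C=\frac{1}{2}\log_2\left(1+\frac{19}{31}\right)=\frac{1}{2}\log_2(1+0.61)=0.34. \tag{46}$$

Achievable rate:

$$2R=\log_2\left(1+2\cdot\frac{19}{31}\right) \;\Rightarrow\; 2R=\log_2 2.22 \;\Rightarrow\; 2R=1.15 \;\Rightarrow\; R=0.57. \tag{47}$$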
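A quick numerical check of Eqs. (46) and (47) is sketched below, with the ratio S/N taken as 19/31 as in those equations. The variable names are hypothetical.

```python
import math

S, N = 19, 31

capacity = 0.5 * math.log2(1 + S / N)  # Eq. (46)
two_r = math.log2(1 + 2 * S / N)       # left-hand side 2R of Eq. (47)
rate = two_r / 2                       # per-user rate R

print(f"C = {capacity:.3f}, 2R = {two_r:.3f}, R = {rate:.3f}")
```

The per-user rate exceeds the value of Eq. (46) because the argument of the logarithm uses the doubled ratio 2S/N of Eq. (44).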

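And the degree of freedom is

$$DoF=\lim_{SNR\to\infty}\frac{R}{\log_2(2\,SNR)}=\lim_{SNR\to\infty}\frac{\frac{1}{2}\log_2(1+2\,SNR)}{\log_2(2\,SNR)}=\frac{1}{2}. \tag{48}$$

If the ratio $\alpha=\dfrac{\log_2 INR}{\log_2 SNR}$ is fixed and the signal is much stronger than the interference and the noise, the interference can be treated as noise. Therefore, the achievable rate is represented by

$$R=\log_2\left(1+\frac{SNR}{1+INR}\right). \tag{49}$$

From Eq. (49), the DoF is represented by [14]

$$DoF=\lim_{SNR\to\infty}\frac{R}{\log_2 SNR}=\lim_{SNR\to\infty}\frac{\log_2\frac{SNR}{1+INR}}{\log_2 SNR}\approx\frac{\log_2\frac{SNR}{INR}}{\log_2 SNR}=\frac{\log_2 SNR-\log_2 INR}{\log_2 SNR}=1-\frac{\log_2 INR}{\log_2 SNR}=1-\alpha. \tag{50}$$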

In the conventional binary symmetric channel, p is a random variable, and a large amount of resources is spent on estimating p for the given channel. For the RNA base matrix [C U; A G], by contrast, p is determined deterministically as either 0.38 or 0.62. Because the specific value of p is given, the channel estimation should be investigated. These specific values are selected because, in the RNA model, the maximum channel capacity is maintained even though p is deterministic, the variance of the signal is not large, and the performance on the W curve is reasonable from the generalized DoF point of view. In an actual implementation, the receiver has to satisfy 1 − α = p, as shown in Figure 2. Under this condition, the signal strength and the interference intensity are important for analyzing the given channel, and the strong and weak interference environments are classified according to α. For example, if α = 1 − p = 0.38, the strong interference channel has to be analyzed; if α = 1 − p = 0.62, the weak interference channel has to be analyzed. This estimate of p can minimize the performance degradation in the binary symmetric channel while significantly reducing the computational complexity. The GDoF curve of the two-user symmetric interference channel in Figure 5 is the highly recognizable “W” curve, which greatly improves the understanding of the interference channel by identifying two regimes. In the example above, over the symmetric channel, the signal is relatively stronger than the interference when α = 0.62 and relatively weaker than the interference when α = 0.38.


6. RNA genetic code constructed by block circulant jacket matrix

A block circulant Jacket matrix (BCJM) is defined by [7, 12, 13, 15].

E51

where $C_0$ and $C_1$ are Hadamard matrices.

The circulant submatrices are 2 × 2 matrices, whose entries are moved by block diagonal cyclic shifts. These submatrices are block circulant Jacket matrices. The BCJM C4 is defined by

C4I0C0'+I1C1,E52

where I0=1001,I1=0110,C0'=1111, and C1=1111, while is the Kronecker product.

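From Eq. (52), the genetic matrix $[C\ U;\ A\ G]^{(3)}$ generates RNA sequences such as [12, 13]

$$P_1=\begin{bmatrix} C & U\\ A & G\end{bmatrix}, \qquad P_2=\begin{bmatrix} C & U\\ A & G\end{bmatrix}\otimes\begin{bmatrix} C & U\\ A & G\end{bmatrix}, \qquad P_3=\begin{bmatrix} C & U\\ A & G\end{bmatrix}^{(2)}\otimes\begin{bmatrix} C & U\\ A & G\end{bmatrix}, \tag{53}$$

where ⊗ denotes the Kronecker product. RNA consists of a sequence of 4 bases, where C, U, A, and G indicate cytosine, uracil, adenine, and guanine, respectively.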
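The Kronecker construction of Eq. (53) can be illustrated symbolically by treating the bases as strings and the product of two entries as string concatenation. The helper `kron_str` below is a hypothetical name for this string-valued Kronecker product; it is a sketch of the idea, not the authors' implementation.

```python
def kron_str(A, B):
    """Kronecker product of two matrices of strings; entry products are concatenations."""
    return [[a + b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


P1 = [["C", "U"], ["A", "G"]]
P2 = kron_str(P1, P1)  # 4 x 4 matrix of 16 duplets
P3 = kron_str(P2, P1)  # 8 x 8 matrix of 64 triplets

for row in P3:
    print(" ".join(row))
```

The third Kronecker power contains each of the 64 triplets exactly once, arranged as an 8 × 8 matrix.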

According to the theory of noise-immunity coding, a mosaic gene matrix $[C\ U;\ A\ G]^{(3)}$ can be constructed by comparing each of the 64 triplets with the strong roots and the weak roots. If a triplet belongs to (that is, begins with) one of the strong roots, it is replaced by 1. Analogously, if a triplet belongs to one of the weak roots, it is replaced by −1. Here, the strong roots are CC, CU, CG, AC, UC, GC, GU, and GG, and the weak roots are CA, AA, AU, AG, UA, UU, UG, and GA, which results in the singular Rademacher matrix R8 given in Table 3 [6, 16].

| | 000 (0) | 001 (1) | 010 (2) | 011 (3) | 100 (4) | 101 (5) | 110 (6) | 111 (7) |
|---|---|---|---|---|---|---|---|---|
| 000 (0) | CCC 000 | CCU 001 | CUC 010 | CUU 011 | UCC 100 | UCU 101 | UUC 110 | UUU 111 |
| 001 (1) | CCA 001 | CCG 000 | CUA 011 | CUG 010 | UCA 101 | UCG 100 | UUA 111 | UUG 110 |
| 010 (2) | CAC 010 | CAU 011 | CGC 000 | CGU 001 | UAC 110 | UAU 111 | UGC 100 | UGU 101 |
| 011 (3) | CAA 011 | CAG 010 | CGA 001 | CGG 000 | UAA 111 | UAG 110 | UGA 101 | UGG 100 |
| 100 (4) | ACC 100 | ACU 101 | AUC 110 | AUU 111 | GCC 000 | GCU 001 | GUC 010 | GUU 011 |
| 101 (5) | ACA 101 | ACG 100 | AUA 111 | AUG 110 | GCA 001 | GCG 000 | GUA 011 | GUG 010 |
| 110 (6) | AAC 110 | AAU 111 | AGC 100 | AGU 101 | GAC 010 | GAU 011 | GGC 000 | GGU 001 |
| 111 (7) | AAA 111 | AAG 110 | AGA 101 | AGG 100 | GAA 011 | GAG 010 | GGA 001 | GGG 000 |

Table 3.

[C U;A G]3 code [6, 16].
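The substitution of strong and weak roots by ±1 can be sketched in Python as follows. The sketch assumes that the root of a triplet is its first two letters and that the triplets are arranged as in Table 3; `kron_str` and `STRONG` are hypothetical names introduced for the illustration.

```python
STRONG = {"CC", "CU", "CG", "AC", "UC", "GC", "GU", "GG"}


def kron_str(A, B):
    """Kronecker product of two matrices of strings (concatenating entries)."""
    return [[a + b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


P1 = [["C", "U"], ["A", "G"]]
P3 = kron_str(kron_str(P1, P1), P1)  # 8 x 8 triplet matrix of Table 3

# Assumption: a triplet is "strong" when its first two letters form a strong root
R8 = [[1 if t[:2] in STRONG else -1 for t in row] for row in P3]

distinct_rows = {tuple(row) for row in R8}
for row in R8:
    print(" ".join(f"{x:+d}" for x in row))
print("distinct rows:", len(distinct_rows))
```

Since the third letter does not affect the sign, rows come in identical pairs, so the resulting 8 × 8 matrix has repeated rows and is singular.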

A novel encoding scheme is proposed as

E54

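Eq. (54) hints at the DNA double helix.

Note that

$$R_8\triangleq I_0\otimes C_0\otimes P_2+I_1\otimes C_1\otimes P_2, \tag{55}$$

where $I_0=\begin{bmatrix} 1 & 0\\ 0 & 1\end{bmatrix}$, $I_1=\begin{bmatrix} 0 & 1\\ 1 & 0\end{bmatrix}$, $C_0=\begin{bmatrix} 1 & 1\\ -1 & 1\end{bmatrix}$, $C_1=\begin{bmatrix} 1 & -1\\ -1 & -1\end{bmatrix}$, and $P_2$ is the double stochastic permutation matrix represented by $P_2=\begin{bmatrix} 1 & 1\\ 1 & 1\end{bmatrix}$. Eq. (54) has a series of redundant rows, which merely repeat and can be canceled. From the Rademacher matrix R8, one version of its mosaic gene matrices is obtained as

$$R_8=\begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 & -1 & -1\\ -1 & -1 & 1 & 1 & -1 & -1 & -1 & -1\\ 1 & 1 & -1 & -1 & 1 & 1 & 1 & 1\\ -1 & -1 & -1 & -1 & -1 & -1 & 1 & 1\end{bmatrix}. \tag{56}$$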

Furthermore, by canceling the repeated column from Eq. (56) by means of CRISPR, another version of the mosaic gene matrices can be reached as Eq. (57), which is a singular RNA matrix.

E57

where $C_0=\begin{bmatrix} 1 & 1\\ -1 & 1\end{bmatrix}$ and $C_1=\begin{bmatrix} 1 & -1\\ -1 & -1\end{bmatrix}$. These matrices can be expanded into the DNA double helix or the RNA single strand, which indicates the process by which DNA replicates its genetic information, is transcribed into RNA, and is translated to synthesize protein. Therefore,

$$R_4\triangleq I_0\otimes C_0+I_1\otimes C_1, \tag{58}$$

where $C_0$ has the eigenvalues $\lambda_{11}=1+i$ and $\lambda_{21}=1-i$ with the corresponding eigenvectors $\varsigma_1=[1\ \ i]^T$ and $\varsigma_2=[1\ \ {-i}]^T$. In addition, $C_1$ has the eigenvalues $\lambda_{12}=\sqrt{2}$ and $\lambda_{22}=-\sqrt{2}$ with the respective eigenvectors $\varsigma_1=[1+\sqrt{2}\ \ {-1}]^T$ and $\varsigma_2=[1-\sqrt{2}\ \ {-1}]^T$ [3, 17]. Then,

$$R_4\otimes P_2\triangleq R_8=R_{4\times 2^k}, \tag{59}$$

where k = 1.
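The block structure of Eq. (58) and the eigenvalues quoted above can be checked with the following sketch. It assumes that the signs of R4 are those produced by the strong and weak root substitution of Section 6 applied to the 16 duplets of [C U; A G] ⊗ [C U; A G]; all names in the code are hypothetical.

```python
import cmath

STRONG = {"CC", "CU", "CG", "AC", "UC", "GC", "GU", "GG"}
KERNEL = [["C", "U"], ["A", "G"]]

# 4 x 4 matrix of two-letter roots, i.e. the string Kronecker square of the kernel
roots = [[a + b for a in ra for b in rb] for ra in KERNEL for rb in KERNEL]
R4 = [[1 if r in STRONG else -1 for r in row] for row in roots]

C0 = [row[:2] for row in R4[:2]]  # diagonal 2 x 2 block
C1 = [row[2:] for row in R4[:2]]  # off-diagonal 2 x 2 block


def eig2(M):
    """Eigenvalues of a 2 x 2 matrix from its trace and determinant."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    d = cmath.sqrt(tr * tr - 4 * det)
    return (tr + d) / 2, (tr - d) / 2


# R8 = R4 (x) [[1, 1], [1, 1]]: every entry of R4 becomes a 2 x 2 block
R8 = [[x for x in row for _ in (0, 1)] for row in R4 for _ in (0, 1)]

print("C0 =", C0, "eigenvalues:", eig2(C0))
print("C1 =", C1, "eigenvalues:", eig2(C1))
```

Under this assumption the diagonal block has the complex pair 1 ± i as eigenvalues and the off-diagonal block has ±√2, matching the values stated for Eq. (58).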


7. Symmetric genetic jacket block matrix

It is demonstrated that the genomatrices are constructed from the kernel [C A; U G] and that the mosaic genomatrices $[C\ A;\ U\ G]^{(3)}$ are built by a series of Kronecker products, which are extended by permuting the positions of the 4 bases C, A, U, and G in the matrix.

7.1 Permutation scheme from upper to lower

Following this scheme, we obtain 24 variants of the genomatrices, which are distinguished from one another by replacing their subsets with the kernel [C A; U G]. For instance, by applying the upper-lower scheme to [C A; U G], the standard genetic code is expanded into $[U\ C\ A\ G]^T\otimes[U\ C\ A\ G]\otimes[U\ C\ A\ G]^T$, where T is the transpose. Analogous to Eq. (56), one variant of the genomatrices is constructed as

E60

Eq. (60) is another variant of the genomatrices, obtained by a series of Kronecker products with $[1\ 1\ 1\ 1]^T$; it is expanded into Eq. (61), which indicates the process of transcription from the R8 DNA to the R4 RNA.

E61

Example 7.1. If A = U and C = G, we obtain six variants of the genomatrices, constructed by a series of Kronecker products of the kernel [C A; U G].

ACUG=1111111111111111=10111111+01111111,E62

which is expanded into Eq. (63) and Eq. (64). These are other versions of variants of genomatrices.

AGUC=1111111111111111=10111111+01111111,E63
GUCA=1111111111111111=10111111+01111111,E64
CUGA=1111111111111111=10111111+01111111,E65
CAGU=1111111111111111=10111111+01111111,E66
GACU=1111111111111111=10111111+01111111.E67

Eqs. (62)-(67) are six variants of the genomatrices, which indicate six half pairs expanded from the symmetric RNA genetic matrices by the upper-lower scheme. In other words, they are constructed by rotating the block in the direction from upper to lower or vice versa.

7.2 Permutation scheme from left to right

Following this scheme, we obtain 6 variants of the genomatrices, which are distinguished from one another by the kernel [C A; U G]. For instance, by applying the left-right scheme to [C A; U G], the standard genetic code is expanded into R8

E68

Eq. (68) is another variant of the genomatrices, obtained by a series of Kronecker products with [1 1;1 1]; it is expanded into Eq. (69), which indicates the process of transcription from the R8 DNA to the R4 RNA.

E69

Example 7.2. If A = U and C = G, we obtain six variants of the genomatrices, constructed by a series of Kronecker products of the kernel [C A; U G].

CGUA=1111111111111111=10111111+01111111,E70

which is expanded into Eq. (71) and Eq. (72). These are other versions of variants of genomatrices.

GCUA=1111111111111111=10111111+01111111,E71
UACG=1111111111111111=10111111+01111111,E72
AUGC=1111111111111111=10111111+01111111,E73
GCAU=1111111111111111=10111111+01111111,E74
CGAU=1111111111111111=10111111+01111111.E75

Eqs. (70)-(75) are 6 variants of the genomatrices, which indicate six half pairs expanded from the symmetric RNA genetic matrices by the left-right scheme. In other words, they are constructed by rotating the block in the direction from left to right or vice versa.

7.3 Block Circulant jacket matrix

Construct a block matrix $C_N$ from the Jacket matrices $[C_0]_p$ and $[C_1]_p$ as $C_N=\begin{bmatrix} C_0 & C_1\\ C_1 & C_0\end{bmatrix}$, where its order N is 2p. This matrix is called block circulant if and only if $C_0C_1^{RT}+C_1^{RT}C_0=[0]_N$, where RT is the reciprocal transpose. In other words, $C_N$ is a block circulant Jacket matrix (BCJM) [12, 13, 15, 18]. Since $C_0C_0^{RT}=pI_p$ and $C_1C_1^{RT}=pI_p$, $C_0$ and $C_1$ are Jacket matrices. Recall that $C_N$ is a Jacket matrix if and only if $CC^{RT}=NI_N$. Therefore, C is a Jacket matrix if and only if

$$CC^{RT}=\begin{bmatrix} C_0 & C_1\\ C_1 & C_0\end{bmatrix}\begin{bmatrix} C_0 & C_1\\ C_1 & C_0\end{bmatrix}^{RT}=\begin{bmatrix} 2pI_p & C_0C_1^{RT}+C_1^{RT}C_0\\ C_0C_1^{RT}+C_1^{RT}C_0 & 2pI_p\end{bmatrix}=NI_N, \tag{76}$$

where RT is the reciprocal transpose. Therefore, Eq. (76) results in plenty of BCJMs.

Example 7.3. Two 2 × 2 matrices are given such as

C0=1111,C1=aa1/a1/a.

It is easy to know that C0C0RT=2I2 and C1C1RT=2I2 are satisfied. Therefore, C0 and C1 are Jacket matrices.

Moreover,

C0C1RT+C1RTC0=11111/aa1/aa+aa1/a1/a1111=02.E77
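The Jacket condition of Eq. (76) can be tested numerically. The sketch below uses one hypothetical choice of signs for the two blocks, namely C0 = [1 1; 1 −1] and C1 = [a −a; −1/a −1/a] with a = 2; these signs are an assumption made for illustration and are not necessarily those of Example 7.3. All function names are hypothetical.

```python
def rt(M):
    """Reciprocal transpose: entry (i, j) of the result is 1 / M[j][i]."""
    return [[1.0 / M[j][i] for j in range(len(M))] for i in range(len(M[0]))]


def matmul(A, B):
    """Plain matrix product of two lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def block_circulant(C0, C1):
    """Assemble [[C0, C1], [C1, C0]] as one matrix."""
    top = [r0 + r1 for r0, r1 in zip(C0, C1)]
    bottom = [r1 + r0 for r0, r1 in zip(C0, C1)]
    return top + bottom


a = 2.0
C0 = [[1.0, 1.0], [1.0, -1.0]]            # assumed sign pattern
C1 = [[a, -a], [-1.0 / a, -1.0 / a]]      # assumed sign pattern

C = block_circulant(C0, C1)
product = matmul(C, rt(C))                # should equal N * I with N = 4

for row in product:
    print(row)
```

For this choice, each 2 × 2 block satisfies the Jacket property on its own and the off-diagonal blocks of the product cancel, so the 4 × 4 product is four times the identity matrix.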

8. General pattern of block circulant symmetric genetic jacket matrix

We present 24 (= 4 × ${}_4C_2$) DNA classes of genomatrices with their own characteristics. The main kernel of Eq. (78) is

$$\underbrace{E}_{\text{Position}}\otimes\underbrace{\left(I_0\otimes A+I_1\otimes B\right)}_{\text{Main body kernel}}\otimes\underbrace{F}_{\text{Extending}}. \tag{78}$$

Eq. (58) is an RNA pattern given by the main kernel. When the upper-lower or left-right scheme is applied to the genetic matrix, the position matrix E creates patterns analogous to Eqs. (61) and (69). Analogously, when the upper-lower and left-right schemes are applied to the genetic matrix, the extending matrix F creates patterns analogous to Eqs. (60) and (68).

South Korea’s national flag stands for different symbols of trigrams and Yin-Yang located in its middle, which is analogous to that of Figure 6. We present 24 versions of variants of genomatrices, which distinguish from each other by replacing their subsets with the kernel shown in Figure 6 like its left-hand side 1011, its right-hand side 0111, its upper position 1011, its lower position 0111, and its center part I0C0+I1C1, on an individual basis.

Figure 6.

General pattern by block circulant, upper-lower, and left–right scheme: Normal case.

From the fact that 10110111 and 10110111, upper symmetric genetic matrices are complementary with lower ones while left ones are complementary with right ones.

In addition, the pattern is created by block circulant, upper-lower, and left–right scheme on the ½ symmetric block, which are analyzed in three cases.

Case 1. Block circulant scheme

CUAG=1111111111111111=100Adiag1111+01101111.E79
UCGA=1111111111111111=10011111+01AAntidiag01111.E80

Case 2. Upper-lower scheme

UGAC=1111111111111111=10111111+0AUpper111111.E81
UCAG=1111111111111111=10111111+0ALower111111.E82

Case 3. Left-right scheme

AUCG=1111111111111111=10111111+0ALeft111111.E83
UAGC=1111111111111111=10111111+0ARight111111.E84

Eq. (79) is block circulant while Eq. (80) is not. Meanwhile, one part of Eqs. (81) and (82) is upper-lower symmetric while the other is not, and one part of Eqs. (83) and (84) is left-right symmetric while the other is not. Figure 7 shows a pattern constructed by a series of products of [C A; U G], which is distorted in comparison with that in Figure 6. Such patterns are therefore called sickness patterns, which can cover COVID-19.

Figure 7.

Abnormal pattern by block circulant, upper-lower, and left–right scheme.

For instance, let

$$\begin{bmatrix} C & U\\ A & G\end{bmatrix}\triangleq\begin{bmatrix} A & B\\ C & D\end{bmatrix}. \tag{85}$$

Note the following cases.

Case 1. A ≠ D, B = C and A = D, B ≠ C.

Case 2. A = C, B ≠ D and A ≠ C, B = D.

Case 3. A = B, C ≠ D and A ≠ B, C = D.

From the aforementioned processes, we obtain six half symmetric blocks: [C U; A G], [U C; G A], [U G; A C], [U C; A G], [A U; C G], and [U A; G C].


9. Conclusion

We show the experimental results of C = G = 19% and A = U = T = 31% for COVID-19 with the RNA base matrix [C U; A G], which are extended into our mathematical proof based on the information theory of the doubly stochastic matrix. The RNA entropy cannot reach the Shannon entropy, because the probabilities of its bases are 23% away from one half, which is identical to the sum of their variances. In other words, there is a difference between the Shannon capacity and the RNA capacity, which is identical to the sum of the variances of the RNA base random variables, because their probabilities cannot equal one half over a symmetric channel. We present a straightforward and explicit mathematical basis for double-helix DNA in the process of reverse transcription from RNA to DNA, obtained by decomposing a DNA matrix into sparse matrices with non-redundant columns and rows. We also introduce a general pattern obtained by the block circulant, upper-lower, and left-right schemes, which corresponds to correct communication and to the healthy condition, because it consists perfectly of the 4 bases. Furthermore, we introduce an abnormal pattern obtained by the block circulant, upper-lower, and left-right schemes, which covers the distorted signal as well as COVID-19. The RNA matrix of Eq. (57) is the same as the matrix of Definition 3.1 in the US patent on MIMO communication in Reference [12].


Conflict of interest

The authors declare no conflict of interest.

References

  1. Chargaff E, Zamenhof S, Green C. Human desoxypentose nucleic acid: Composition of human desoxypentose nucleic acid. Nature. 1950;165:756-757. DOI: 10.1038/165756b0
  2. Watson J, Crick F. Molecular structure of nucleic acids: A structure for deoxyribose nucleic acid. Nature. 1953;171:737-738. DOI: 10.1038/171737a0
  3. Temin HM. Nature of the provirus of Rous sarcoma. National Cancer Institute Monograph. 1964;17:557-570
  4. Lee MH, Lee SK, Cho KM. A Life Ecosystem Management With DNA Base Complementarity. Moscow: Proceedings of the International Conference of Artificial Intelligence, Medical Engineering, Education (AIMEE 2018); 6–8 October 2018; Springer Nature; 2020
  5. Papoulis A, Pillai SU. Probability, Random Variables and Stochastic Process. 4th ed. Boston: McGraw Hill; 2002
  6. He M, Petoukhov S. Mathematics of Bioinformatics: Theory, Practice, and Applications. 1st ed. New Jersey: John Wiley & Sons; 2010. DOI: 10.1002/9780470904640
  7. Lee SK, Park DC, Lee MH. RNA genetic 8 by 8 matrix construction from the block circulant Jacket matrix. Springer Nature: Proceedings of Symmetry Festival 2016; 18-22 July 2016, Vienna, Cham; 2017
  8. Shannon CE. A Mathematical Theory of Communication. The Bell System Technical Journal. 1948;27:379-423, 623-656. DOI: 10.1002/j.1538-7305.1948.tb01338.x
  9. Azgari C, Kilinc Z, Turhan B, Circi D, Adebali O. The mutation profile of SARS-CoV-2 is primarily shaped by the host antiviral defense. Viruses. 2021;13(3):394. DOI: 10.3390/v13030394
  10. Berkhout B, Hemert VF. On the biased nucleotide composition of the human coronavirus RNA genome. Virus Research. 2015;202:41-47. DOI: 10.1016/j.virusres.2014.11.031
  11. Xia X. Extreme Genomic CpG Deficiency in SARS-CoV-2 and Evasion of Host Antiviral Defense. Molecular Biology and Evolution. 2020;37(9):2699-2705. DOI: 10.1093/molbev/msaa094
  12. Lee MH, Hai H, Zhang XD. MIMO Communication Method and System using the Block Circulant Jacket Matrix. United States Patent US 009356671B1 [Internet]. 31 May 2016. Available from: https://patentimages.storage.googleapis.com/cb/46/34/4acf23e5a9b6e1/US9356671.pdf [Accessed: 12 December 2021]
  13. Lee MH. Jacket Matrices: Construction and Its Application for Fast Cooperative Wireless Signal Processing. 1st ed. Germany, Saarbrucken: LAP LAMBERT Academic Publishing; 2012
  14. Tse D, Viswanath P. Fundamentals of Wireless Communication. 1st ed. New York: Cambridge University Press; 2005. DOI: 10.1017/CBO9780511807213
  15. Wikipedia, the free encyclopedia. Jacket Matrix [Internet]. 1999. Available from: https://en.wikipedia.org/wiki/Jacket_matrix [Accessed: 12 December 2021]
  16. Rumer YB. Translation of ‘Systematization of Codons in the Genetic Code [II]’ by Yu. B. Rumer (1968). Royal Society. 2016;374:2063. DOI: 10.1098/rsta.2015.0447
  17. Lee MH, Hai H, Lee SK, Petoukhov SV. A Mathematical Proof of Double Helix DNA to Reverse Transcription RNA for Bioinformatics. Moscow: Proceedings of the 1st International Conference of Artificial Intelligence, Medical Engineering, and Education (AIMEE 2017); 21–23 August 2017; Springer Nature; 2018
  18. Chen Z, Lee MH, Zeng G. Fast cocyclic Jacket transform. IEEE Transactions on Signal Processing. 2008;56(5):2143-2148. DOI: 10.1109/TSP.2007.912895
