This work analyzes SARS-CoV-2 virus strains as digitized information sequences using the notion of entropy. Special attention is given to the occurrence of particular patterns in the coronavirus sequence. The incidence of a genetic word is represented as a density: the incidence frequency of a q-gram genetic word is determined along the sequence with a finite impulse response (FIR) filter. This frequency is in turn used to determine the probability distribution of the genetic word incidence, which serves as the input for calculating the entropy of the sequence. The sequence entropy is then used in a principal component analysis (PCA) to determine the similarity or dissimilarity between viral sequences. Seven human coronavirus sequences are considered, and an entropy-based similarity study of SARS-CoV-2 strains is presented.
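The pipeline described above (word incidence, FIR filtering, probability distribution, entropy) can be illustrated with a minimal sketch. It is not the authors' implementation. Several choices here are assumptions made only for illustration: the FIR filter is taken to be a moving-sum filter with unit taps over a window of `window` positions, the probability distribution is taken to be the relative frequency of each windowed count, and the entropy is taken to be Shannon entropy in bits. The function names (`word_indicator`, `fir_filter`, `shannon_entropy`, `word_entropy`) are hypothetical. The PCA step is omitted.

```python
import math
from collections import Counter


def word_indicator(seq, word):
    """Return 1 at each position where `word` starts in `seq`, else 0."""
    q = len(word)
    return [1 if seq[n:n + q] == word else 0 for n in range(len(seq) - q + 1)]


def fir_filter(x, taps):
    """Direct-form FIR filter: y[n] = sum_k taps[k] * x[n - k], with x[n] = 0 for n < 0."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * x[n - k]
        y.append(acc)
    return y


def shannon_entropy(values):
    """Shannon entropy (bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def word_entropy(seq, word, window):
    """Entropy of the windowed incidence counts of `word` along `seq`.

    Assumption: unit taps, so the filter output at n is the number of
    occurrences of `word` starting in the last `window` positions.
    """
    x = word_indicator(seq, word)
    taps = [1.0] * window
    y = fir_filter(x, taps)
    local_counts = [round(v) for v in y]
    return shannon_entropy(local_counts)


if __name__ == "__main__":
    # Toy sequence for illustration only; not real viral data.
    print(word_entropy("ATGCGATGATTTATGC", "ATG", window=4))
```

Under these assumptions, a word that occurs with perfectly regular spacing yields a constant windowed count and therefore zero entropy, while irregular spacing spreads the counts over several values and raises the entropy. One such entropy value per genetic word would give a feature vector per sequence, which could then be passed to a PCA routine.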
Part of the book: Entropy and Exergy in Renewable Energy