The machine learning and rule mining methods related to gene inactivation and RNAi.
RNA interference (RNAi) and gene inactivation are extensively studied topics in biomedical research. Two categories of small ribonucleic acid (RNA) molecules, viz., microRNA (miRNA) and small interfering RNA (siRNA), are central to RNAi. Various kinds of algorithms related to RNAi and gene silencing have been developed. In this book chapter, we provide a comprehensive review of machine learning and association rule mining algorithms developed to handle different biological problems such as the detection of gene signatures, biomarkers, gene modules, potentially disordered proteins, differentially methylated regions and many more. We also provide a comparative study of different well-known classifiers along with other methods in use. In addition, we briefly describe the major biological challenges for gene inactivation as well as the advantages, disadvantages and possible therapeutic strategies of RNAi. Finally, our study helps bioinformaticians understand the overall idea across different research dimensions, including several learning algorithms, for the benefit of disease discovery.
- machine learning
- association rule mining
- gene silencing
- multi-omics data
RNAi is an innate biological process in which RNA molecules inhibit gene expression or translation by suppressing targeted mRNA molecules. Since the discovery of RNAi by Andrew Fire and Craig Mello, it has become evident that RNAi has immense potential for the suppression of desired genes. The first evidence that double-stranded RNA (dsRNA) could achieve efficient gene silencing through RNAi came from studies on the nematode Caenorhabditis elegans and on Drosophila melanogaster, which led toward an understanding of the biochemical nature of the RNAi pathway. Two types of small ribonucleic acid (RNA) molecules, microRNA (miRNA) and small interfering RNA (siRNA), are central to RNAi. The two differ in how they elicit RNAi: an siRNA must be fully complementary to its target mRNA, whereas an miRNA only needs to be partially complementary to its target mRNA. In organisms like C. elegans and D. melanogaster, RNAi can be induced by introducing long dsRNA complementary to the target mRNA to be degraded; in mammalian cells and organisms, however, introducing dsRNA longer than 30 bp activates a potent antiviral response. To overcome this limitation, siRNAs are used to induce RNAi in mammalian cells and organisms [7, 8, 9].
The discovery of both siRNA and miRNA provides a new therapeutic approach [10, 11] for the treatment of diseases by targeting genes that carry undesired mutations or normal genes that are overexpressed. The RNAi process is as follows. siRNA-induced degradation of specific endogenous mRNAs is a very common phenomenon in eukaryotic cells that inhibits protein production at the post-transcriptional level. The RNAi process is initiated by short dsRNAs of 21–25 nucleotides that lead to the sequence-specific inhibition of their homologous mRNAs. These siRNAs are normally produced in cells by cleavage of longer dsRNA precursors by Dicer, a member of the ribonuclease III family. The cleaved fragments are incorporated into a multi-component nuclease complex known as the RNA-induced silencing complex (RISC), which contains the slicer protein Argonaute-2 (Ago-2). The single-stranded RNA (ssRNA) derived from the short dsRNA acts as an antisense strand that directs the complex to the specific target mRNA, where a RISC-associated endoribonuclease cleaves the target mRNA. Therapeutic approaches based on siRNA involve the introduction of a synthetic siRNA into the target cells to elicit RNAi, thereby inhibiting the expression of a specific messenger RNA (mRNA) to produce a gene silencing effect. RNAi can help accelerate the development of cures in medicine, especially when a disease is thought to be due to a defective gene. From a historical perspective, the first application of RNAi therapy was in age-related macular degeneration (AMD), where siRNAs delivered directly to the patient's eye were used to suppress the vascular endothelial growth factor (VEGF) pathway that causes abnormal growth of blood vessels behind the retina.
RNAi techniques have been used to counter tumor growth and spread and to increase tumor sensitization toward drug treatment. RNAi technology can be beneficial for selectively affecting cancer cells without damaging normal cells, because RNAi therapy against cancer directly targets the oncogenes; it has therefore been found to stop the progression and invasion of tumor cells [18, 19] and, as mentioned above, to increase the sensitization of tumors to drugs. As RNAi can silence disease-associated genes in tissue culture and animal models, the development of RNAi-based reagents for therapeutic applications involves technological enhancements that improve siRNA stability and delivery in vivo while minimizing off-target and nonspecific effects.
A number of different approaches have been developed for the in vivo delivery of siRNA, among which rapid infusion by hydrodynamic injection of siRNA achieves the best delivery in rodents. However, delivery by this route is restricted to highly vascularized tissues such as the liver, and it is currently not a viable method for human clinical studies. Lipid-based approaches, which have been used extensively in cell culture experiments, have also been devised for in vivo applications, with some issues: the cationic nature of the lipids used in cell culture leads to aggregation when they are used in animals and results in rapid serum clearance and lung accumulation. Even so, an increasing number of reports cite success with lipid-mediated delivery of siRNAs in vivo. To improve the delivery of siRNA into human liver cells without transfection agents, lipophilic siRNAs were conjugated with derivatives of cholesterol, lithocholic acid, or lauric acid, where the lipid moieties were covalently linked to the 5′-ends of the RNAs using phosphoramidite chemistry. These conjugates could down-regulate the expression of a LacZ expression construct. Soutschek et al. improved the pharmacological properties of siRNA molecules by conjugating cholesterol to the 3′-end of the sense strand of the siRNA by means of a pyrrolidine linker. The advantages of cholesterol attachment are evident: the conjugates are more resistant to nuclease degradation, more stable in the blood owing to increased binding to human serum albumin, and taken up more readily by the liver. Intravascular delivery of siRNA molecules is a very simple technique; Song et al. used it to protect mice from fulminant hepatitis with siRNAs against the Fas receptor, administering Fas siRNA by intravenous injection into mice over a 24-hour period. The authors showed that the effects persisted for 10 days and protected mice against experimentally induced liver fibrosis.
Local delivery of siRNAs into the eye to target the VEGF pathway has also been tried and has been shown to be potentially beneficial in neovascularization-related eye diseases. Topical siRNA gels have also been used to deliver siRNAs to cells in dermatological applications and cervical cancer treatment. The gene gun method has been used for intradermal administration of nucleic acids to enhance cancer vaccine potency. Another technique is electroporation, which has been used to deliver siRNAs into the brain and muscles of rodents. Direct injection of viral vectors for the in vivo delivery of siRNA has also been tried: an adeno-associated virus (AAV) vector carrying a short hairpin RNA (shRNA) was injected directly into the midbrain neurons of adult mice to silence the tyrosine hydroxylase gene near the site of injection for several weeks. An alternative to injection is an ex vivo approach, which has been used to generate human immunodeficiency virus (HIV)-1-resistant lymphocytes and macrophages. This was accomplished by using a lentiviral vector to introduce an anti-rev siRNA construct into CD34(+) hematopoietic progenitor cells. The siRNA-transduced progenitor cells were allowed to mature into macrophages in vitro and into T-cells in vivo.
Many machine learning, bio-statistical and association rule mining methods have been developed to solve different problems related to gene silencing and disease discovery. In this book chapter, we provide a comprehensive survey of machine learning and association rule mining algorithms developed for tackling various biological challenges such as the detection of gene signatures, biomarkers, gene modules, potentially disordered proteins and differentially methylated regions, multi-omics data integration, etc. We also describe a comparative study of different well-known classifiers along with other methods in use. Many gene module discovery approaches have also been developed that employ machine learning, deep learning and soft computing techniques. In addition, many multi-objective algorithms have been developed to find optimal multi-omics genetic signatures for a given disease. Furthermore, we briefly describe the major biological challenges for gene inactivation and the advantages, disadvantages and possible therapeutic strategies of RNAi. Several challenges still remain to be handled, such as off-target effects, cytotoxicity, non-specific gene silencing, activation of the innate immune system, siRNA activity itself, and the lack of efficient in vivo delivery systems and vehicles needed for clinical implementation. Apart from these challenges, the development of efficient tissue-specific and differentiation-dependent expression of siRNA is essential for transgenic and therapeutic approaches. Nevertheless, successful in vitro and in vivo experiments raise hopes of treating human diseases with RNAi.
Moreover, our study is useful for researchers who wish to understand the central ideas of RNAi and gene silencing, along with the current machine/deep learning and association rule mining algorithms related to them (Figure 1).
2. Fundamental concepts
In this section, some basic notions from graph mining, pattern recognition and information theory are described. A graph is an ordered pair $G = (V, E)$ comprising a set of vertices $V$ and a set of edges $E$. To avoid ambiguity, graphs are taken here to be undirected and simple. Let $G = (V, E)$ be an unweighted, undirected graph, and let $S$ be a subgraph (hypograph) of it ($S \subseteq V$). The density of $S$, denoted by $d(S)$, is defined as $d(S) = |E(S)| / |S|$, where $E(S)$ is the induced edge-set of $S$ and $|S|$ is the cardinality of $S$. The highest density of the graph $G$, referred to as $d^{*}(G)$, is defined as $d^{*}(G) = \max_{S \subseteq V} d(S)$. If $G$ is a weighted graph, $d(S)$ becomes $d(S) = \sum_{e \in E(S)} w_e / |S|$, where $E(S)$ is the induced edge-set of $S$ and $w_e$ is the weight of the edge $e$. The entropy of a random variable measures the amount of uncertainty associated with that variable. The entropy of a discrete variable $X$, referred to as $H(X)$, is defined as $H(X) = -\sum_{x} p(x) \log_b p(x)$, where $p(x)$ is the probability mass function of $X$, and the value of $b$ is, in general, taken to be 2. Mutual information between two random variables estimates the amount of information that they share, i.e., the mutual dependency between them. When the mutual information is zero, the two variables are entirely independent of each other; the higher the mutual information, the more strongly the two variables depend on each other.
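The three quantities introduced in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: density is taken as the standard densest-subgraph ratio (total induced edge weight divided by the number of vertices), mutual information is computed through the identity I(X;Y) = H(X) + H(Y) - H(X,Y) on empirical frequencies, and the function names are hypothetical rather than taken from any cited tool.

```python
import math
from collections import Counter


def subgraph_density(edges, nodes, weights=None):
    """Density of the subgraph induced by `nodes`.

    Unweighted case: |E(S)| / |S|.
    Weighted case: sum of induced edge weights / |S|, with `weights`
    mapping each edge tuple (u, v) to its weight.
    """
    nodes = set(nodes)
    total = 0.0
    for u, v in edges:
        if u in nodes and v in nodes:  # edge lies inside the induced subgraph
            total += 1.0 if weights is None else weights[(u, v)]
    return total / len(nodes)


def entropy(values, base=2):
    """Empirical entropy H(X) = -sum_x p(x) * log_b p(x)."""
    n = len(values)
    return -sum((c / n) * math.log(c / n, base)
                for c in Counter(values).values())


def mutual_information(xs, ys, base=2):
    """Empirical mutual information I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    joint = list(zip(xs, ys))
    return entropy(xs, base) + entropy(ys, base) - entropy(joint, base)


# Toy usage: a triangle a-b-c with a pendant vertex d attached to c.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
triangle_density = subgraph_density(toy_edges, {"a", "b", "c"})
shared_info = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])
```

Because the estimates are plain frequency counts, continuous expression or methylation values would first have to be discretized; the choice of discretization is left open here.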
Topological Overlap Measure (TOM) and other related measures: Ravasz et al. proposed the Topological Overlap Measure (TOM), which quantifies the similarity between two nodes of a network based on the nearest-neighbor concept. Various modified versions of TOM, such as weighted TOM (wTOM) and generalized TOM (GTOM), are also present in the literature. To compute the wTOM, Pearson correlation coefficients are first evaluated for all pairs of vertices, and a soft-thresholding power (say, $\beta$) is then chosen from the correlation coefficient matrix using the scale-free topology criterion. After that, the weighted adjacency matrix is calculated from the coefficient matrix using the chosen power $\beta$, and the wTOM is computed from the weighted adjacency matrix. The GTOM is defined just like TOM except that it counts the number of $m$-step neighbors while calculating the TOM measure between two vertices. For GTOM of order 0 (i.e., GTOM0), the adjacency score itself is the GTOM0 score. For GTOM of order higher than zero (i.e., GTOM1, GTOM2, GTOM3, ...), the same procedure as in the TOM calculation is followed, but neighbors up to the $m$-th step are counted for each vertex ($m = 1, 2, 3, \ldots$). Notably, GTOM1, GTOM2 and the other higher-order GTOMs work only on a binary matrix. So, before using those measures, the weighted adjacency matrix is converted into a binary matrix in which adjacency values greater than a specified cutoff (e.g., 70% of the distance between the minimum and maximum adjacency values) are converted into 1, and values lower than the cutoff are converted into 0.
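The wTOM pipeline and the binarization step described above can be sketched as follows. This is a minimal illustration, not the cited authors' implementation. It assumes the commonly used topological overlap formula TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), with l_ij the sum of a_iu * a_uj over shared neighbors u and k_i the connectivity of vertex i. The soft-thresholding power is passed in as an argument instead of being fitted by the scale-free topology criterion, the binarization cutoff is computed over off-diagonal entries, and all function names are hypothetical.

```python
def soft_threshold_adjacency(corr, beta):
    """Weighted adjacency a_ij = |cor_ij| ** beta, with a zero diagonal."""
    n = len(corr)
    return [[0.0 if i == j else abs(corr[i][j]) ** beta for j in range(n)]
            for i in range(n)]


def topological_overlap(adj):
    """Topological overlap matrix of a symmetric adjacency matrix in [0, 1]."""
    n = len(adj)
    k = [sum(row) for row in adj]          # connectivity of each vertex
    tom = [[1.0] * n for _ in range(n)]    # a vertex fully overlaps itself
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # l_ij: strength of the paths i - u - j through shared neighbors u
            shared = sum(adj[i][u] * adj[u][j]
                         for u in range(n) if u != i and u != j)
            tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
    return tom


def binarize(adj, fraction=0.7):
    """Binary matrix: 1 where a_ij exceeds min + fraction * (max - min)."""
    n = len(adj)
    off_diag = [adj[i][j] for i in range(n) for j in range(n) if i != j]
    lo, hi = min(off_diag), max(off_diag)
    cutoff = lo + fraction * (hi - lo)
    return [[1 if i != j and adj[i][j] > cutoff else 0 for j in range(n)]
            for i in range(n)]
```

A typical call chain would be `topological_overlap(soft_threshold_adjacency(corr, beta))` for wTOM, and `topological_overlap(binarize(adj))` as a first-order overlap on the binary network.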
In data mining, hierarchical clustering is one of the most popular cluster analysis methods for building a hierarchy of clusters. There exist two types of strategies: agglomerative and divisive. Agglomerative hierarchical clustering does not need any input parameter except the similarity matrix. Thus, there is no extra burden of cluster initialization: it starts from singleton clusters, simply merges the two closest clusters at each iteration, and continues until a single cluster containing all objects remains. Divisive hierarchical clustering follows the same style but in reverse order. This is the major benefit of hierarchical clustering over the traditional K-means clustering algorithm, which is sensitive to initialization.
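The agglomerative strategy can be sketched directly from a similarity matrix. The version below is a naive illustration, not an optimized or library implementation. The linkage rule is an assumption of this sketch (average linkage by default, with single and complete linkage as options), and the function name is hypothetical.

```python
def agglomerative(sim, linkage="average"):
    """Merge the two most similar clusters until one cluster remains.

    sim: symmetric similarity matrix (list of lists).
    Returns the merge history as (cluster_a, cluster_b, similarity) tuples,
    where clusters are tuples of object indices.
    """
    def cluster_similarity(a, b):
        vals = [sim[i][j] for i in a for j in b]
        if linkage == "single":      # closest pair of members
            return max(vals)
        if linkage == "complete":    # farthest pair of members
            return min(vals)
        return sum(vals) / len(vals)  # average over all member pairs

    clusters = [(i,) for i in range(len(sim))]  # start from singletons
    merges = []
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = cluster_similarity(clusters[x], clusters[y])
                if best is None or s > best[0]:
                    best = (s, x, y)
        s, x, y = best
        merges.append((clusters[x], clusters[y], s))
        merged = tuple(sorted(clusters[x] + clusters[y]))
        clusters = [c for idx, c in enumerate(clusters)
                    if idx not in (x, y)] + [merged]
    return merges
```

Note that no initial centroids or cluster count are supplied; cutting the merge history at a chosen similarity level yields the final clusters.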
Association rule mining (ARM) is a popular method for generating interesting relationships among different items (viz., genes). Suppose $I = \{i_1, i_2, \ldots, i_m\}$ is an item set (gene set) and $T = \{t_1, t_2, \ldots, t_n\}$ is a sample set (viz., transaction set). An association rule can then be stated as $X \Rightarrow Y$, where $X, Y \subseteq I$ and $X \cap Y = \emptyset$. Here, $X$ and $Y$ are called the antecedent and the consequent, respectively. An association rule can be viewed as a cause-effect relationship between the corresponding item sets in the transactions of a transactional data profile, as in a large shopping market, where a set of bought items falls into a transaction. In a similar fashion, many genes may occur together in a sample (transaction) of a gene expression profile or a similar profile. Many of these genes may be up-regulated or down-regulated, whereas the remaining genes will be non-differentially expressed.
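A rule over gene states is usually scored with support and confidence, the standard ARM measures. The text above does not spell them out, so their use here is an assumption of this sketch. Each sample is encoded as the set of discretized gene states it contains. The gene labels and the "+"/"-" suffixes for up- and down-regulation are hypothetical, as are the function names.

```python
def support(itemset, transactions):
    """Fraction of transactions (samples) that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(X union Y) / support(X) for the rule X => Y."""
    s_x = support(antecedent, transactions)
    if s_x == 0:
        return 0.0  # rule never fires; defined as 0 in this sketch
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / s_x


# Hypothetical samples: "geneA+" = up-regulated, "geneB-" = down-regulated.
samples = [
    {"geneA+", "geneB-", "geneC+"},
    {"geneA+", "geneB-"},
    {"geneA+", "geneC+"},
    {"geneB-", "geneC+"},
]
rule_support = support({"geneA+", "geneB-"}, samples)
rule_confidence = confidence({"geneA+"}, {"geneB-"}, samples)
```

Non-differentially expressed genes are simply omitted from a sample's item set in this encoding, which keeps the transactions short.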
3. Machine learning and rule mining approaches for gene inactivation
Currently, omics data analysis is one of the most popular research domains. It can be categorized into two major types: single-omics data analysis and multi-omics data analysis. Earlier, single-omics data processing, such as gene expression data processing, was highly popular, and microarray gene expression data in particular were widely used. Microarray data have since become obsolete, while RNA-seq, next-generation sequencing (NGS) and whole exome sequencing (WES) data have become popular. The major aim of single-omics data analysis was to identify genetic markers and gene modules. In the current era, multi-omics data integration is a big challenge for any researcher, since it involves various kinds of profiles that are either proportional or inversely proportional to each other. Different kinds of regression analysis (logistic regression, sglasso [47, 48], flasso, etc.) are popular for integrating multi-omics data. For multi-omics data, the aim is to determine a single (or combinatorial) gene marker, a gene signature, or a multi-biomolecular closed bio-circuit. Many machine learning and association rule mining methods have been developed to solve different problems related to gene silencing and disease discovery (see Table 1 for tools and Table 2 for their applications). In this regard, Bandyopadhyay et al. provided a comprehensive survey of various statistical tests for determining differentially expressed transcripts from microarray or other related datasets. A rank-based weighted association rule mining method, RANWAR, was then developed to identify weighted interesting genomic rules applicable to any kind of genomic or epigenomic data. A new gene-based association rule mining approach has also been developed.
Next, another statistics-based association rule mining technique, “StatBicRM”, was proposed that utilizes a statistical test and the Binary Inclusion-maximal algorithm (BiMax) to find classification-based genetic rules. Recently, the “StatBicRM” algorithm was further enhanced, and a new method of combinatorial marker discovery was developed whose central concept is based upon the inverse relationship between gene expression and methylation pattern. In addition, a mutual information based feature selection strategy was incorporated into the statistical methodology, and a new method of identifying epigenetic biomarkers from a bi-omics dataset through maximal relevance and minimal redundancy based feature (gene) selection was proposed. A new method for multi-view gene-module identification was also proposed that applies an integrated methodology of statistical testing and dense subgraph mining. Detection of strongly connected genetic modules in multi-omics regulatory networks is important for the integrated analysis of network-based architectures. Many profiles belonging to multi-omics datasets consist of a massive number of genes, many of which are noisy and redundant. Such noisy and redundant genes (or features) are irrelevant for obtaining knowledge from the data. Furthermore, it is computationally impractical to apply a clustering technique to such huge data profiles to obtain dense genetic clusters. Researchers often face problems while calculating and subsequently storing a similarity matrix of such massive dimensions, which holds the mutual dependency information for all possible gene pairs of every such profile. So, managing the high dimensionality of the underlying profile is a critical challenge for researchers.
To overcome the “curse of dimensionality” problem, feature selection is treated as one of the most important preprocessing steps for removing such noisy and redundant genes, which in turn decreases the total elapsed time. The main purpose of feature selection is to find an optimal subset of features, according to some optimization criteria, with which efficient knowledge discovery can be performed. Depending on the availability of class labels, feature selection can be organized into two types: supervised and unsupervised. Unsupervised feature selection does not need class label information while choosing the reduced feature subset, whereas supervised feature selection selects a subset of favorable features by using the knowledge of class labels in the selection procedure. In supervised feature selection, significance tests and mutual information are some broadly used measures for evaluating the quality of candidate features. In biological research, a statistical test is generally treated as one of the important tools for obtaining the significant genes from large datasets, and it therefore aids in decreasing the size of the dataset. There are different types of statistical tests in the literature, such as the t-test, significance analysis of microarrays (SAM), the empirical Bayes test, etc.
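A supervised, test-based filter of the kind described above can be sketched with a two-sample t-statistic. This is a simplified illustration under several assumptions: it uses Welch's unequal-variance form of the t-test, ranks genes by the absolute statistic only, and performs no p-value computation or multiple-testing correction, both of which a real analysis would need. Gene names and function names are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two independent samples (each of size >= 2)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    # Constant values in both groups give se == 0; treated as "no signal" here.
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def rank_genes(expr, labels, top_k):
    """Keep the top_k genes with the largest |t| between the two classes.

    expr:   dict mapping gene name -> list of values, one per sample
    labels: list of 0/1 class labels aligned with the samples
    """
    scores = {}
    for gene, values in expr.items():
        group0 = [v for v, y in zip(values, labels) if y == 0]
        group1 = [v for v, y in zip(values, labels) if y == 1]
        scores[gene] = abs(welch_t(group0, group1))
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical profile: three genes, three samples per class.
profile = {
    "g1": [1.0, 1.1, 0.9, 5.0, 5.1, 4.9],   # clearly differential
    "g2": [2.0, 2.1, 1.9, 2.0, 2.1, 1.9],   # no difference
    "g3": [1.0, 2.0, 3.0, 2.0, 3.0, 4.0],   # weak difference
}
selected = rank_genes(profile, [0, 0, 0, 1, 1, 1], top_k=2)
```

The reduced gene list can then be passed to the similarity and graph-construction steps, which is where the saving in elapsed time comes from.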
|Method name||Reference||Type||Brief description|
|Multi-view gene modules using hypograph mining||Bhadra et al. ||Gene-module detection||Module detection from multi-view data using the statistical test and mutual information based dense subgraph.|
|RANWAR||Mallik et al. ||Rank based genomic rule mining||Rank based weighted association rule mining to identify interesting genomic rules applicable to any genomic/epigenomic data.|
|Combinatorial marker discovery by integrating multiple profiles||Bandyopadhyay et al. ||Combinatorial marker discovery||Integrating gene expression and methylation profiles, and identifying combinatorial gene markers.|
|DTFP-growth||Mallik et al. ||Gene based ARM||Multiple-threshold based ARM integrating gene expression, methylation and protein-protein interaction profiles.|
|StatBicRM||Maulik et al. ||Statistical biclustering-based rule mining||Statistical biclustering-based rule mining and analyzing the gene expression and methylation data profiles using it.|
|sglasso||Augugliaro [47, 48]||Regression method||The sglasso tool implements the structured graphical lasso estimator for the weighted l1-penalized RCON(V, E) model.|
|flasso||Augugliaro [47, 52]||Regression method||Implements the weighted l1-penalized factorial dynamic Gaussian graphical model.|
|MVDA||Serra et al. ||Multi-view genomic profile integration||Combines the different kinds of data at the level of the outcomes of each single-view clustering iteration.|
|Machine learning for epigenetics and future medical applications||Holder et al. ||Machine learning and deep learning approaches||Active learning and imbalanced class learning are utilized to address the shortcomings of machine learning, building better feature selection and solving the imbalanced data problem.|
|A machine learning approach to integrate big data for precision medicine||Lee et al. ||Molecular marker discovery||The robust molecular markers that might be useful for targeted treatment of the acute myeloid leukemia are identified.|
|Deep learning based multi-omics integration robustly predicts survival||Chaudhary et al. ||Deep learning based multi-omics integration method||A deep learning method is used to integrate multi-omics data and to perform survival study on hepatocellular carcinoma.|
|Deep learning for genomics: a concise overview||Yue et al. ||Deep learning applications on genomic data||The strengths of various deep learning methodologies are demonstrated that are applicable on any kind of genomic profile.|
|intNMF||Chalise and Fridley ||Integrative clustering method||Integrative clustering of several high dimensional profiles and subtype classification by non-negative matrix factorization (NMF).|
|Multi-modal data analysis for heterogeneous data||Yang and Michailidis ||Module detection for heterogeneous data||The multi-modal profile analysis is conducted for heterogeneous data depending upon NMF.|
|Comparative study and evaluation of the integrative techniques for the multilevel omics data||Pucher et al. ||Integrative method for multilevel omics profiles||The comparative study of three integrative methods (viz., NMF, sparse canonical correlation analysis (sCCA) and logic data mining MicroArray Logic Analyzer (MALA)) is conducted on simulated data and real omics profile.|
| ||Mallik and Bandyopadhyay ||Weighted connectivity (similarity) measure||A weighted connectivity measure is developed by integrating co-expression, co-methylation and protein-protein interactions; it is useful for determining the similarity between any two molecules.|
|Tumor prediction using integrated analysis of expression and methylation||Mallik et al. ||Rule-based classifier||Integrated analysis of gene expression and DNA methylation and classification rule mining for tumor/cancer prediction.|
|Epigenetic gene marker discovery through feature selection||Mallik et al. ||Gene based ARM||Epigenetic gene marker discovery using maximal relevance and minimal redundancy based feature selection.|
|Method name||Reference||Type||Brief description|
|TF-MiRNA-gene network based modules for cytosine variants||Sen et al. ||Module detection||TF-MiRNA-gene network based module detection for 5hmC and 5mC brain samples between human and rhesus.|
|IDPT||Mallik et al. ||Intrinsically disordered protein finding||Potential intrinsically disordered protein identification through transcriptomic analysis of genes for epigenetic data.|
|Integration of DNA methylation data and gene expression data||Singh et al. ||Finding differentially methylated regions||Differentially methylated regions are determined and further statistical analysis is performed.|
|Application of machine-learning algorithms for gene expression regulation||Cheng and Worzel ||Applications of machine learning methods on gene regulation||The machine learning strategies on gene regulation are reviewed, and their functional links mediated by histone modifications and transcription factors are demonstrated.|
|Application of machine-learning techniques on histone methylation||Xu et al. ||Predictive model of gene expression by epigenetic factors by regression||A new model is developed to predict gene expression as a function of histone modification levels through multi-linear regression and multivariate adaptive regression splines.|
The significant genes thus yield a weighted graph in which the nodes are the significant genes and the weighted edges signify the association between the two nodes they connect. Graph data now arise in many growing fields of study that involve complicated structures, viz., biological networks, chemical compounds, social networks, protein structures, etc. With the increasing demand for the analysis of large structured data, graph mining has become one of the most active research topics for identifying the critical relationships among the various entities in large graphs. Analyzing multi-omics datasets is likewise an emerging research topic, in which different profiles representing several views are used to carry out important tasks, viz., marker identification, classification and clustering, and many research works have been performed in each of these directions. Recently, Bhadra et al. developed a new algorithm that integrates a statistical method with normalized mutual information oriented hypograph mining to find the co-similar genetic modules present in multi-omics datasets. Previously, various statistical (viz., correlation or regression oriented) and/or weight-based techniques had been developed for multi-omics data integration, but not for multi-omics genetic-module detection. Furthermore, some multi-view data integration mechanisms employ soft-computing methods such as clustering, non-negative matrix factorization, etc. Recently, Serra et al. proposed a framework for combining different data profiles of multi-view datasets by integrating the clustering results obtained on each profile through non-negative matrix factorization. Pucher et al. provided a comprehensive review and comparative study of three integrative methods (viz., non-negative matrix factorization (NMF), sparse canonical correlation analysis (sCCA) and the logic data mining tool MicroArray Logic Analyzer (MALA)) on simulated data as well as real omics profiles.
In addition, many deep learning techniques have been developed to handle biological data. Chaudhary et al. proposed a deep learning based methodology to integrate multi-omics data and robustly predict survival in hepatocellular carcinoma. There are also many interesting applications of the above machine learning and deep learning techniques. For example, Xu et al. developed a regression-based model that predicts gene expression as a function of histone modification/variant levels through two consecutive regression methods (viz., multi-linear regression and multivariate adaptive regression splines). Mallik et al. performed a comprehensive analysis to identify potential intrinsically disordered proteins through the transcriptomic analysis of genes using expression and methylation data. Finding differentially methylated regions is also an area of interest. A comparison of the different classifiers used in many tools related to RNAi and gene inactivation is given in Table 3.
|C4.5 classifier||K-nearest neighbors (KNN) classifier||Naive Bayes classifier||Support vector machines (SVM) classifier||Artificial neural networks (ANN) classifier|
4. Biological challenges for gene inactivation
Several challenges still remain to be handled, such as off-target effects, cytotoxicity, non-specific gene silencing, activation of the innate immune system, siRNA activity itself, and the lack of efficient in vivo delivery systems and vehicles needed for clinical implementation. The effective delivery of RNAi therapeutics in vivo is one of the most important challenges, and several parameters have to be considered for efficient silencing: particle size, duration of the RNAi effect, stability and modification of the siRNA, the delivery system, and the elimination of off-target effects. Apart from these challenges, the development of efficient tissue-specific and differentiation-dependent expression of siRNA is essential for transgenic and therapeutic approaches. Bioactive drugs have been shown to perturb the naturally running system, as they can clog or saturate biochemical pathways. Since siRNA/shRNA relies on the endogenous microRNA machinery, high doses of ectopic RNA carry the risk of saturating components of the miRNA pathway. This was observed by Grimm et al., who reported fatality associated with high doses of liver-directed AAV-encoded shRNAs in mice, where high doses killed the recipient mice within 2 months. The length threshold of siRNAs seems to vary among cell types, and it is an important consideration because dsRNA can induce innate immune responses that eventually lead to cell death in mammalian cells. dsRNAs shorter than 30 nucleotides have been shown not to induce cellular toxicity in mammalian cells, whereas longer dsRNA is known to rapidly induce interferon responses. This calls for careful risk assessment when using longer and more potent Dicer-substrate siRNAs. Moreover, choosing correct RNAi targets is a must, though ideal specificity of RNAi targets has not been demonstrated.
If RNAi silences off-targets, it can alter gene function, which is clearly undesirable; care should therefore be taken beforehand not to suppress off-targets. About one third of randomly chosen siRNAs have been reported to result in a toxic phenotype. A comparison of siRNA and miRNA is given in Table 4. Nevertheless, successful in vitro and in vivo experiments raise hopes of treating human disease with RNAi. The epigenetic network is one of the complex regulatory networks, in which epigenetic mechanisms such as DNA methylation and histone protein modifications regulate gene expression and higher-order DNA structure. Epigenetics is the study of heritable changes in phenotype that occur without changes in the DNA sequence. DNA methylation is an epigenetic factor that refers to the addition of a methyl group (–CH3) to the fifth position of a cytosine pyrimidine ring or to the nitrogen at the sixth position of an adenine purine ring in genomic DNA. DNA methylation generally decreases the gene expression level. In this connection, copy number variation (CNV) is another recent research domain in genomics. It is an event in which portions of the genome are repeated and the number of repeats varies from individual to individual in the human population. Copy number variation is a category of structural change, specifically a duplication or deletion event, which generally affects a considerable number of base pairs. Recent research suggests that around two-thirds of the total human genome is made up of repeats. In mammals, copy number alteration contributes significantly to the variation observed in both the population and disease phenotypes.
Cancer arises from various types of somatic genetic changes, including copy-number alterations, which affect the activity of critical genes regulating cell growth. The disadvantages and advantages of RNAi, and possible strategies to overcome the disadvantages, are summarized briefly in Table 5.
|Disadvantages||Advantages and possible therapeutic strategies|
RNAi and gene inactivation are well-known research topics in the biomedical field. miRNA and siRNA are closely associated with RNAi. Various categories of algorithms associated with RNAi and gene silencing have been developed in the last two decades. In this book chapter, we provided a comprehensive review of machine/deep learning and association rule mining algorithms that have been developed for handling different biological problems such as gene signature detection, multi-omics data integration, single/combinatorial biomarker identification, gene module detection, potentially disordered protein detection, differentially methylated region finding, and many more. A comparative study of several well-known classifiers, along with other approaches in use, was also included. In addition, we provided a brief description of the major biological challenges for gene inactivation along with the advantages, disadvantages and possible therapeutic strategies of RNAi. Finally, this chapter helps bioinformaticians understand the central idea of RNAi and gene silencing, along with the associated machine/deep learning and association rule mining algorithms, for the benefit of disease discovery and possible therapeutic value.