Chapter 2 Spectral Analysis of Exons in DNA Signals

DNA is found in nucleated blood cells. It is isolated from blood through a series of procedures, including heat shock, thermal change, and the application of various chemicals. DNA is organized into chromosomes, which in turn carry genes. Genes contain regions that can be translated into protein and regions that make no contribution to protein production. Both kinds of regions are made up of nucleotides: Adenine, Thymine, Cytosine, and Guanine. The order of these nucleotides determines the traits and habits of all species. With the exponential growth of biological data, there is an enormous amount of sequence data that still needs to be translated to protein. A successful translation would reveal important information about the species.


Introduction
Comparative analysis of computational techniques applied to genetic datasets has given very interesting results. We are able to distinguish species from one another on the basis of DNA properties. A correct conversion leads to fruitful results. The literature has shown that direct comparative analysis is not as useful as approximate estimation. So far, no compact solution is available that performs robustly in translating DNA to RNA.
It is well known that nucleotide sequences in DNA exhibit a period-three property [3, 11] due to the codon composition and structure of the strand. This fundamental characteristic can be exploited to predict the coding regions, which helps in determining the RNA sequences in DNA. This finding is of immense importance because cell growth and function are determined by the type of protein the cell produces; it also helps in drug design and in revealing genetic disorders that result from mutations in the structure of nucleotide bases (the order in which they appear along the chain). Many approaches have been proposed in the literature to address this open optimization problem in computational biology.
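The period-three property can be illustrated with a minimal sketch. It assumes the usual formulation in which each base gets a 0/1 indicator sequence and the squared magnitudes of their Fourier transforms at frequency 1/3 are summed; the function name and the toy sequences are hypothetical and are not taken from the chapter.

```python
import cmath

def period3_power(seq):
    """Total spectral power at frequency 1/3 over the four base indicators."""
    total = 0.0
    for base in "ACGT":
        # Fourier sum of the 0/1 indicator of `base`, evaluated at frequency 1/3
        x = sum(cmath.exp(-2j * cmath.pi * i / 3)
                for i, ch in enumerate(seq) if ch == base)
        total += abs(x) ** 2
    return total

periodic = "ATG" * 30   # toy sequence with a perfect period of three
uniform = "A" * 90      # toy sequence with no period-three structure

print(period3_power(periodic), period3_power(uniform))
```

The periodic toy sequence concentrates its power at frequency 1/3, while the uniform one contributes essentially nothing there; real coding regions fall between these two extremes.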
Discrete Fourier Transforms [6, 7, 8] normally suffer from spectral leakage, which prevents an optimal estimate of the power spectral density. The Short Time Fourier Transform [2, 4], on the other hand, reduces the leakage but is mainly useful when the frequency content is needed together with location information. It can plot the time, amplitude, and frequency components of a genetic signal.
Digital filters [5, 7, 13] present the spectral content of the signal around the periodicity property of coding regions but do not specify the time-frequency relationship with amplitude. Dosay-Akbulut [14] emphasized the classification of introns into two groups based on RNA secondary structure and self-splicing ability in various species using PCR.
A. Parent et al. [15] describe the importance of coordination between transcription and RNA processing, in which the carboxy-terminal domain of RNA polymerase II acts as a common link between the two.
Al Wadi et al. [16] used wavelet transforms for forecasting volatility in experimental results. M. Hashemi et al. [17] identified Escherichia coli O157:H7 isolated from cattle carcasses in the Mashhad abattoir by multiplex PCR.
A. Ali et al. [18] presented a histopathological study for the development of a tumor lung cancer model, assessing the anti-neoplastic effect of PMF in rodents. J. Singh et al. [19] proposed a technique for predicting in vitro drug release mechanisms from extended-release matrix tablets.

Proposed approach
The proposed approach consists of a series of components that analyze the DNA signal and enhance the prediction accuracy of genic regions over the DNA sequence. The major steps of the proposed approach are the indicator sequence and the 3rd-order wavelet decomposition. The signal is decomposed employing wavelet transforms of order three at level three. The wavelet decomposition passes the signal through a series of low-pass and high-pass filters that decompose and synthesize the signal to reduce flicker noise (pink noise).
The signal is then convolved with a window function, the Kaiser window.
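For reference, the standard Kaiser window of length N with shape parameter beta is w(n) = I0(beta * sqrt(1 - (2n/(N-1) - 1)^2)) / I0(beta) for n = 0, ..., N-1, where I0 is the zeroth-order modified Bessel function of the first kind. The sketch below evaluates this textbook definition with a power series for I0. The length of 351 comes from the chapter; the value of beta is an assumption made only for illustration, since the chapter's own parameter is not recoverable from the text.

```python
import math

def bessel_i0(x, terms=50):
    """Zeroth-order modified Bessel function of the first kind (power series)."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / (2.0 * k)) ** 2   # ratio of consecutive series terms
        total += term
    return total

def kaiser_window(length, beta):
    """Standard Kaiser window; `beta` controls the main-lobe/side-lobe trade-off."""
    if length == 1:
        return [1.0]
    half = (length - 1) / 2.0
    denom = bessel_i0(beta)
    window = []
    for n in range(length):
        r = (n - half) / half                      # runs from -1 to 1
        arg = beta * math.sqrt(max(0.0, 1.0 - r * r))
        window.append(bessel_i0(arg) / denom)
    return window

BETA = 5.0                        # assumed value, not given in the chapter
w = kaiser_window(351, BETA)      # 351 samples, as in the chapter
```

The window is symmetric, equals 1 at its centre, and tapers to 1/I0(beta) at both ends.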

Kaiser window of length 351 bp
Spectral Analysis of Exons in DNA Signals http://dx.doi.org/10.5772/52763
Each section of the signal is traversed to calculate its absolute and power values. Each segment is plotted on the power spectral graph, with the period-three property maintained at each step. The exon boundaries appear as sharp peaks. The final discrimination measure depicts the degree of relevance between exons and introns.
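The section-by-section traversal described above can be sketched as a sliding-window computation of the power at frequency 1/3. This is a simplified illustration under stated assumptions: the hop size of one sample, the function name, and the toy signal are hypothetical, and a rectangular window stands in for the Kaiser window to keep the example short.

```python
import cmath

def sliding_period3_power(signal, window):
    """Power at frequency 1/3 for every window position (hop of one sample)."""
    m = len(window)
    twiddle = [cmath.exp(-2j * cmath.pi * i / 3) for i in range(m)]
    powers = []
    for start in range(len(signal) - m + 1):
        acc = sum(signal[start + i] * window[i] * twiddle[i] for i in range(m))
        powers.append(abs(acc) ** 2)
    return powers

# Toy signal: 60 samples with period three, then 60 constant samples.
toy_signal = [1.0, 0.0, 0.0] * 20 + [1.0] * 60
rect_window = [1.0] * 30          # stand-in for a tapered window
powers = sliding_period3_power(toy_signal, rect_window)
```

Plotting `powers` against the window position gives a curve that is high over the periodic part and drops to zero over the constant part, which is the behaviour exploited to locate exon boundaries.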

Results and discussions
A specimen gene sequence from C. elegans chromosome III (AF099922) was used for experiments with the proposed approach. The gene is passed through a defined series of steps. At the preprocessing stage, the dataset is passed through two kinds of filters. The first filter refines the data and outputs a file that contains only nucleotide characters. The second filter operates on the output of the first and generates a file that contains numeric data. This data is fed into the central engine for further processing.
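The two preprocessing filters could look roughly like the sketch below. It assumes that any header lines have already been removed from the raw record and, purely for illustration, uses the commonly cited EIIP values as nucleotide weights; the function names are hypothetical, and the weights actually used in the experiments may differ.

```python
# Commonly cited EIIP values (illustrative choice of weights)
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}

def keep_nucleotides(raw):
    """First filter: drop every character that is not A, C, G or T."""
    return "".join(ch for ch in raw.upper() if ch in "ACGT")

def to_numeric(seq, weights=EIIP):
    """Second filter: replace each nucleotide by its numeric weight."""
    return [weights[ch] for ch in seq]

# Toy record with position numbers, spaces, and an ambiguous base 'n'
raw = "  1 atgcc gtanc\n 11 ttgac\n"
clean = keep_nucleotides(raw)
signal = to_numeric(clean)
```

Keeping the two stages separate mirrors the description above: the first stage can be checked by eye on the cleaned text before any numeric mapping is applied.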
Figure 1 shows a dataset that contains nucleotide characters along with some other characters. Cleaning is a necessary first step because such input, when fed into our engine, would badly degrade the performance and produce false results. The EIIP indicator sequence transforms the nucleotides into numeric values as per its definition. Part of the signal, encoded with the EIIP indicator sequence, is shown in Figure 3. The complex indicator sequence is defined by replacing each nucleotide with one of the values 1, -1, i, and -i, where i (iota) is the imaginary unit.
Figure 5 shows a portion of gene AF099922 after application of the complex indicator sequence. The complex indicator sequence transforms the sequence into four digital patterns with associated weights. It is worth mentioning that this indicator sequence has provided close-range estimates for nucleotides in the literature.
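A complex indicator sequence can be sketched as a simple lookup. The assignment of the four values to the four nucleotides is not stated in the text, so the mapping below (A to 1, G to -1, T to i, C to -i) is an assumption chosen for illustration only.

```python
# Assumed assignment of the four complex values; the chapter does not state it.
COMPLEX_MAP = {"A": 1 + 0j, "G": -1 + 0j, "T": 1j, "C": -1j}

def complex_indicator(seq):
    """Map each nucleotide to its complex value."""
    return [COMPLEX_MAP[ch] for ch in seq]

encoded = complex_indicator("ACGT")
```

Because a single complex sequence replaces four separate binary sequences, only one transform per window is needed in the later spectral analysis.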
This signal is then passed through the steps of the windowed STFT for spectral analysis and exon prediction. This step extends the signal to a target length so that a complete analysis can be performed over it.
Figure 6 shows that the signal has been extended to the desired length. The original length of the signal was 8000 samples. The convolution method suggests that, for a better approximation, the signal should be extended to 8192 samples, the next power of two. The signal is then windowed with a Kaiser window of length 351 base pairs. The previous power of two, 4096, would truncate the signal from its original length. Truncation can degrade the results and produce a faulty approximation that deviates from the standard range of exons. The digital signal passes through refinement stages. First, the sequence was obtained as raw data, which was purified to retain only nucleotide bases without degrading factors. This is an important process because any unwanted characters may lead to a different set of nucleotide values that would be far from the actual results.
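The arithmetic behind the choice of 8192 over 4096 can be checked with a short sketch. The text does not say how the signal is extended, so zero padding is assumed here; the helper names are hypothetical.

```python
def next_power_of_two(n):
    """Smallest power of two that is greater than or equal to n."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()

def previous_power_of_two(n):
    """Largest power of two that is less than or equal to n (n >= 1)."""
    return 1 << (n.bit_length() - 1)

def zero_pad(signal, target_length):
    """Extend the signal with zeros (assumed extension method)."""
    return list(signal) + [0.0] * (target_length - len(signal))

length = 8000
target = next_power_of_two(length)        # length the signal is extended to
truncated = previous_power_of_two(length)  # length that would lose samples
padded = zero_pad([0.25] * length, target)
```

Extending to 8192 adds 192 samples, whereas cutting back to 4096 would discard 3904 of the 8000 samples, which is why extension is preferred over truncation.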
The digital signal under discussion contains 8000 base pairs. The same dataset has been used extensively in the literature by other researchers and serves as a benchmark. The spectral estimation graph reveals that it contains five exonic regions at different nucleotide ranges. Identifying these ranges close to the standard ranges requires denoising the signal and selecting an appropriate window function for the convolution. The standard convolution multiplies the signal with a portion of the window function; this is why the signal was extended to a power of two to bring it to the desired length. Each frame of the signal is made equal in size so that the power spectral graph is uniform in all characteristics.
For discrete wavelet transforms of order three, the signal is decomposed and synthesized. The db3 wavelet makes the coefficients of the approximation and detail patterns vanish quickly. It is also important to note that the histogram of frequency components presents the redundancy of bases in the digital pattern. This repetition depends on the order of nucleotides in the DNA sequence, which defines the habits, traits, and other characteristics of species.
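A three-level db3 decomposition and synthesis can be sketched in plain Python as follows. The filter coefficients are computed from the closed-form expression for the Daubechies wavelet with three vanishing moments. Periodic boundary handling is assumed here because it gives exact reconstruction with short code; wavelet toolboxes often default to a different extension mode. All function names are hypothetical, and no denoising threshold is applied.

```python
import math

def db3_lowpass():
    """Closed-form db3 scaling (low-pass) filter coefficients."""
    s10 = math.sqrt(10.0)
    r = math.sqrt(5.0 + 2.0 * s10)
    c = 16.0 * math.sqrt(2.0)
    return [(1 + s10 + r) / c, (5 + s10 + 3 * r) / c,
            (10 - 2 * s10 + 2 * r) / c, (10 - 2 * s10 - 2 * r) / c,
            (5 + s10 - 3 * r) / c, (1 + s10 - r) / c]

H = db3_lowpass()
# High-pass filter obtained by the alternating flip of the low-pass filter
G = [(-1) ** n * H[len(H) - 1 - n] for n in range(len(H))]

def dwt_step(x):
    """One analysis level with periodic extension; len(x) must be even."""
    n = len(x)
    approx = [sum(H[j] * x[(2 * k + j) % n] for j in range(6))
              for k in range(n // 2)]
    detail = [sum(G[j] * x[(2 * k + j) % n] for j in range(6))
              for k in range(n // 2)]
    return approx, detail

def idwt_step(approx, detail):
    """One synthesis level: the transpose of the analysis step."""
    n = 2 * len(approx)
    x = [0.0] * n
    for k in range(len(approx)):
        for j in range(6):
            x[(2 * k + j) % n] += H[j] * approx[k] + G[j] * detail[k]
    return x

def decompose(x, levels):
    """Return the final approximation and the details (level 1 first)."""
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = dwt_step(approx)
        details.append(d)
    return approx, details

def synthesize(approx, details):
    """Rebuild the signal from the coarsest level upwards."""
    for d in reversed(details):
        approx = idwt_step(approx, d)
    return approx

toy = [0.1260, 0.1335, 0.0806, 0.1340] * 6      # 24 toy samples
a3, details = decompose(toy, 3)
rebuilt = synthesize(a3, details)
```

With an 8000-sample signal the same routine would yield 4000, 2000, and 1000 detail coefficients at levels one to three and 1000 approximation coefficients at level three.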
Figure 10 shows the synthesized signal at level three. Like the original signal, the synthesized signal has the same histogram characteristics. There are 8000 base pairs shown in the form of a digital pattern. The cumulative histogram of the signal shows the different ranges of weight values assigned to nucleotide base pairs. It can be seen that nucleotides with values higher than 0.25 have high frequency, while those between 0.1 and 0.25 have lower frequency. The individual histogram also shows three separate characterizations of nucleotide weight values. The standard deviation is 0.09037, the median absolute deviation is 0.11, and the mean absolute deviation is 0.07843. The maximum of the range is 0.375, the minimum is 0.125, and the average is 0.25. The synthesized signal shows the same histogram even after its decomposition. The synthesized signal is perfectly reconstructed by employing discrete wavelet transforms. The approximation and detail coefficients of the signal are obtained by passing it through a series of filters. These digital filters were defined and constructed using Matlab. The decomposed signal is the sum of the approximation and detail coefficients at level three along with the detail coefficients at level two and level one.
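The three spread statistics quoted above can be computed as in the following sketch. The toy values are made up and do not reproduce the reported figures; the sample (n-1) form of the standard deviation is an assumption, since the text does not say which form was used.

```python
import statistics

def spread_summary(values):
    """Standard deviation, median absolute deviation, mean absolute deviation."""
    values = list(values)
    mean = statistics.mean(values)
    med = statistics.median(values)
    return {
        "std": statistics.stdev(values),  # sample form (assumed)
        "median_abs_dev": statistics.median([abs(v - med) for v in values]),
        "mean_abs_dev": statistics.mean([abs(v - mean) for v in values]),
    }

toy_weights = [0.125, 0.25, 0.25, 0.375]   # made-up values for illustration
summary = spread_summary(toy_weights)
```

The median absolute deviation is taken around the median and the mean absolute deviation around the mean, which is why the two can differ noticeably on skewed data.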
As the signal is decomposed further, its components become loosely packed. Figure 13 shows the density estimation of the approximation and detail coefficients. The density estimate of the original signal shows, in general, the numeric values of the nucleotides present in the signal. The approximation coefficients at level three present a sharp peak at about 0.25. The signal remains uniform throughout except for another peak ranging from 0.37 to the end of the signal. The density estimate for the detail coefficients at level one shows a similar sharp peak around 0.27. A similar peak can be observed around 0.40 at level two. At level three the phenomenon is the same, but the signal components are more loosely packed than at level two. At the granular level, the components are more tightly packed at level one than at the other levels. Figure 15 shows the approximation coefficients at level three. A gradual change can be observed in the cumulative histogram. The peaks are more pronounced from point one onwards. In the other histogram, the peaks are not very visible around the first 0.6 points; there is a steady increase in the bars up to a maximum of 0.07, followed by a gradual decrease leading to point one. The peaks are less pronounced after this point. The approximation coefficients at level three show the signal as loosely packed components.
It can be observed that the detail coefficients at level one are tightly packed, showing a higher concentration of nucleotides, while the detail coefficients at level two are loosely packed. The coefficients at level three are more significant than those at the other levels, which indicates that the signal has been filtered for refinement. The signal was passed through a series of db3 wavelet filters, which denoised the signal as a result of its reconstruction.
Table 1 presents the nucleotide ranges for exons. Clear differences can be observed in the comparative analysis of the various approaches. The binary and EIIP methods show a wide difference from the standard range. The complex method gives better results than the first two approaches. Digital filters behave accordingly. The proposed approach produces more significant results than the other prevailing approaches.
The steps of the proposed approach are:
• Conversion of target DNA stretch to a digital pattern employing an indicator sequence
• Decomposition of signal using wavelet transforms
• Calculation of approximation coefficients of signal at level three
• Calculation of detail coefficients of signal
• Density estimation of signal
• Signal analysis for denoising
• Depiction of original and synthesized signal at level three
• Histogram estimations of signal
• Signal extension to a desired length
• Shannon entropy calculation of signal
• Magnitude and power estimation of signal
• Calculation of discrimination measure for PSD analysis
• Exon and intron boundaries' estimation
As an elaboration, the DNA sequence is passed through a filter that transforms it into a digital pattern. This phase is accomplished employing an indicator sequence with the following weights for nucleotides: Adenine (A) = X(A) = 0

Figure 2 represents a glimpse of the data containing only nucleotide characters.

Figure 3. Numeric translation of gene F56F11.5 (AF099922)
The binary indicator sequence is formed by replacing the individual nucleotides with the values 0 or 1, where 1 stands for the presence and 0 for the absence of a particular nucleotide at a specified location in the DNA signal.
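A binary indicator sequence, as described above, can be built with one 0/1 list per nucleotide. The function name and the toy sequence in this sketch are hypothetical.

```python
def binary_indicators(seq):
    """Return one 0/1 sequence per base: 1 where the base occurs, else 0."""
    return {base: [1 if ch == base else 0 for ch in seq] for base in "ACGT"}

indicators = binary_indicators("ATGCA")
```

At every position exactly one of the four sequences holds a 1, so the four sequences together carry the same information as the original string.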

Figure 5. Complex indicator sequence applied to gene

Figure 7 depicts the wavelet sketch for the db3 wavelet. The scaling and wavelet functions are shown. The decomposition low-pass and high-pass filters are identified, and the synthesis low-pass and high-pass filters are shown as well. This sketch demonstrates that the signal should be passed through these filters for denoising and enhancement. The upward and downward curves illustrate the convolution of the signal with the window function at the desired nucleotide locations.

Figure 9 shows a glimpse of the original signal. There are 8000 base pairs shown in the form of a digital pattern. The cumulative histogram of the signal shows the different ranges of weight values assigned to nucleotide base pairs. It can be seen that nucleotides with values higher than 0.25 have high frequency, while those between 0.1 and 0.25 have lower frequency. The individual histogram also shows three separate characterizations of nucleotide weight values. The standard deviation is 0.09037, the median absolute deviation is 0.11, and the mean absolute deviation is 0.07843. The maximum of the range is 0.375, the minimum is 0.125, and the average is 0.25.

Figure 11 depicts the signal decomposition into approximation and detail coefficients. The symbol s represents the original signal. The approximation and detail coefficients at level three show the reduced complexity of the signal.

Figure 13. Density estimation of signal

Figure 16 shows the detail coefficients at level three. A gradual change can be observed in the cumulative histogram. The peaks are more pronounced from point one onwards. In the other histogram, the peaks are not very visible around the first 0.6 points; there is a steady increase in the bars up to a maximum of 0.07, followed by a gradual decrease leading to point one. The peaks are less pronounced after this point. The detail coefficients at level three show the signal as loosely packed components.

Figure 16. Coefficients of detail at level 3