Computational complexity of O(n^{2}) versus O(n log_{2} n).

## Abstract

Signal processing has long been dominated by the Fourier transform. However, there is an alternate transform that has gained popularity recently and that is the wavelet transform. The wavelet transform has a long history starting in 1910 when Alfred Haar created it as an alternative to the Fourier transform. In 1940 Norman Ricker created the first continuous wavelet and proposed the term wavelet. Work in the field has proceeded in fits and starts across many different disciplines, until the 1990’s when the discrete wavelet transform was developed by Ingrid Daubechies. While the Fourier transform creates a representation of the signal in the frequency domain, the wavelet transform creates a representation of the signal in both the time and frequency domain, thereby allowing efficient access of localized information about the signal.

### Keywords

- time-frequency analysis
- Fourier transform
- wavelet transform
- signal processing
- vanishing moment

## 1. Introduction

The Fourier transform has been the basis of digital signal processing since the development of the fast Fourier transform in 1965 by Cooley and Tukey in [1]. Its use for analysis goes back much farther with the development of the Fourier transform by Jean Baptiste Joseph Fourier in 1807 as a solution to thermodynamic equations. By using the Fourier transform, we can take any signal and obtain the amplitude of the sinusoids needed to recreate it. Then we can use this information to obtain the power spectrum of the signal, or we modify the amplitudes and take the inverse Fourier transform of the signal, which then filters the signal.

A fundamental limitation of the Fourier transform is that the all properties of a signal are global in scope. Information about local features of the signal, such as changes in frequency, becomes a global property of the signal in the frequency domain. There have been various methods proposed to address this limitation; the main two are the windowed Fourier transform and wavelets.

Gabor [2] created the windowed Fourier transform in 1946. It applies a window function of a short duration to the signal and the Fourier transform is applied to the resulting data. This method is frequently used; however, there are two limitations with this method. The first is that, since the filtering window is constant, it creates problems if the feature is larger or shorter than the window. The second is that the time resolution is the same for high frequencies as it is for low frequencies. Since as frequency increases, so does the rate of change of the signal, higher frequency signals can have more information in the same period of time as lower frequency signals, and so require a higher time resolution.

Wavelets overcome both these limitations in that the window is scaled in both time and frequency. The term wavelet was introduced by Ricker [3] in 1940 to describe the limited duration functions that he created to model seismic phenomena. The first wavelet was created earlier, in 1910, by Haar [4] as an alternative to the Fourier transform developed in 1807 by Fourier [5]. Work on the wavelet transform preceded slowly through the twentieth century until the 1980’s when work on them increased dramatically with the development of the continuous wavelet transform. In the 1990’s, the discrete wavelet transform and its inverse were developed, allowing filtering and compression of signals.

The wavelet transform has many more modes of operation and other options than the Fourier transform. This is one of the key problems with the use of wavelets; we can feel overwhelmed by all the options we have available with them. This chapter will go through some of these options and demonstrate their use.

## 2. Fourier transform

The Fourier Transform was first published in 1822 by Joseph Fourier [6]. It converts a mathematical function from the time domain to the frequency domain. This enables us to find new properties of the function that would otherwise be hidden. There are several different variations of the Fourier transform equation. In this chapter, we are using the traditional electrical engineering equation

to convert f(t) to the frequency domain.

The Fourier transform itself is for continuous functions. The Discrete Fourier transform was developed for astronomical observations. The goal was to calculate a trigonometric equation for the orbit of an object in the sky based on observations of its ascensions and declinations at various points in time. Most datasets consist of discrete points sampled in time. These can be converted to the frequency domain as well with the discrete Fourier transform. The computational complexity of this is O(n^{2}).

An interesting note about the Fast Fourier Transform is that it actually predates the Fourier Transform. While the Fast Fourier Transform that we now use was published in a paper by Cooley and Tukey [1] in 1967, a functionally equivalent algorithm was found in an unpublished work by Carl Friedrich Gauss [7] that is presumed to date to 1805. A fascinating history of the Fast Fourier Transform is in [8]. Gauss was computing the discrete Fourier transform of 12 points and noted that the problem could be broken down into subproblems that could simplify the number of steps used [5].

The Fast Fourier Transform reduces the computational complexity of the Discrete Fourier Transform from O(n^{2}) to O(n log_{2} n). This enables efficient computation of time series. Table 1 shows how the computational complexity increases for an O(n^{2}) process versus an O(n log_{2} n) process. The difference grows between the two processes until at 1 million data points, the discrete Fourier transform would require over 50,000 times the amount of time that the Fast Fourier transform would require.

n | O(n^{2}) | O(n log_{2} n) | Ratio |
---|---|---|---|

10 | 100 | 34 | 2.94 |

100 | 10,000 | 665 | 15.04 |

1000 | 1,000,000 | 9966 | 100.34 |

10,000 | 100,000,000 | 132,878 | 752.57 |

100,000 | 10,000,000,000 | 1,660,965 | 6020.60 |

1,000,000 | 1E+12 | 1,9931,569 | 50171.66 |

The drawback with the Fourier transform is that all signal information is across the entire range of the transform. As stated in [9], “A local characteristic of the signal becomes a global characteristic of the transform”. As illustrated by other authors [10], the best way that this can be explained is by a score of music as shown in Figure 1.

The score consists of many different notes, each with a finite duration, each happening at a precise time. A Fourier transform of this signal gives you the average amplitude of the individual frequencies over the entire piece, but obscures the duration and location of the notes. The Fourier power spectrum of music often approximates that of pink (1/f) noise [12]. That information is not lost, since the Fourier transform is reversible, but is encoded in the phase of the Fourier transform.

## 3. Windowed Fourier transform

In 1946, Gabor [2] proposed the windowed Fourier transform as a way to deal with this problem. In it, a window function of a short duration is applied to the signal and the Fourier transform is taken. This is repeated at different locations in the signal. An example of the use of Hamming window function is shown in Figure 2.

One limitation of the windowed Fourier transform is that the window length is constant. When a signal feature is much shorter than the window, information about it can be difficult to extract, since the any local property within the time span of the window becomes a global property of the Fourier transform of the window, as noted previously. Conversely, when a signal feature is larger than the windowing function, information about it spans multiple windows, and can also be difficult to extract.

Another limitation is that the time resolution for the windowed Fourier transform is the same for high frequency signals as it is for low frequency signals. The Heisenberg uncertainty principle states that the time resolution of the window is inversely proportional to the frequency resolution. Since a high frequency signal changes much faster than a low frequency signal, it would be ideal to have a transform with better time resolution for high frequency portions of the signal, and better frequency resolution for lower frequency portions of the signal.

Going back to the music score, we can see this by looking at two different notes, Middle C, and one that is two octaves higher, called C_{6}, as shown in Figure 3.

The frequency for middle C is 261.63 Hertz, and the frequency for C_{6} is 1046.50 [13]. The frequency for C_{6} is quadruple the one for Middle C, which means that for every complete cycle of the middle C note, four complete cycles of C_{6} have occurred, as shown in Figure 4. The windowed Fourier transform would have the limitation that both notes would be treated equally, when the time resolution for C_{6} needs to be 4 times that of middle C for analysis purposes.

## 4. Wavelet transform

The wavelet transform overcomes the limitation of the windowed Fourier transform by scaling the bandwidth of the filter inversely to the frequency. According to [14], while each box of the windowed Fourier transform has the same bandwidth, each level of the wavelet transform has the same Q as defined as

This gives the transform the desired time resolution for the higher frequency portions of the signal and the desired frequency resolution for lower frequency portions.

## 5. Continuous wavelet

The continuous wavelet has a long history spanning from the 1940’s to present. In 1940, Norman Ricker first proposed the term wavelet and various mathematical functions to model seismic waves as they traveled through the Earth’s crust in [3]. He further refined this in a series of papers [15, 16, 17]. This was the first continuous wavelet. The functions in the time domain are given by

called the three-loop equation, and

called the two-loop equation [18]. Graphs for both of these are in Figure 5.

The next development for continuous wavelets was in the 1980’s by Grossman and Morlet, and expanded on by Stephen Mallat and others [19]. The term continuous wavelet refers to the fact that it can be scaled to any time scale. Discrete wavelets can only use specific time scales, usually a power of 2.

Wavelet analysis centers around the use of a wavelet function, also called the mother function in literature, traditionally represented by the Greek letter upsilon (ψ). A key requirement is that it has finite energy, i.e.

The energy of the wavelet function is usually one. Functions such as sine and cosine cannot be used as analyzing functions, because they violate this condition by having infinite energy. There is an implicit requirement that, while it has finite energy, it must have some energy, so the integration of the function must be greater than zero.

The second requirement is known as the admissibility condition, which states that the Fourier transform of the wavelet function cannot have a zero-frequency component, i.e.

This can only be satisfied if

A third condition is usually that the wavelet function must have zero mean, which means that it must oscillate, hence be a wavelet. Mathematically this is [20]

Another condition is that the wavelet function has effective support. While the wavelet functions for the continuous wavelets are usually mathematical functions that extend to infinity, effective support means that the wavelet functions are effectively zero outside of a certain range. Since the continuous wavelet functions asymptotically approach 0 as x goes to either ∞ or -∞, the choice of the boundary of this range is a bit arbitrary and can vary from paper to paper.

## 6. Continuous wavelet transform

Morlet and Grossman formalized the continuous wavelet transform in 1984 in [21]. For the continuous wavelet transform, the wavelet function itself is shifted in time and is scaled to do the wavelet transforms [22] as the following equation illustrates:

The continuous wavelet transform is defined as the integration of the function to be analyzed with the complex conjugate of the wavelet function:

In some papers such as [22], you will see the definition of the continuous wavelet transform without the complex conjugate definition. Since most wavelet functions are real valued and not complex, both definitions are equivalent, since the complex conjugate of a real number is equal to that number. The difference only comes up when the wavelet function is complex, such as the Gabor wavelet.

An alternate formula for the continuous wavelet transform is

where W_{n}(s) is the transformed sequence, x_{n′} is the original sequence, and ψ^{*} is the complex conjugate of the analyzing wavelet function, n represents the time shift or dilation, and s represents the scale. Usually the time shift is calculated over the total number of data points of the function, and s goes over the scales that are being analyzed to give a two-dimensional picture of the data [23].

## 7. Discrete wavelets

The first discrete wavelet was created in 1910 by Alfred Haar as an alternative to the Fourier transform. This consists of two functions as shown in Figure 6, one a scaling function and a wavelet function. The scaling function is the unit step function and the wavelet function consists of offsets from that.

One of the drawbacks of the continuous wavelet transform is that it creates a lot of redundant data, since the coefficients between the scales are highly correlated. Ingrid Daubechies developed the theory of discrete wavelets in 1988, which generates compact data by eliminating the redundancy. Daubechies created an entire family of wavelet functions with the Haar wavelet forming the first level of the Daubechies wavelet.

The wavelet function for discrete wavelets is modified to

where s_{0} is the scale of the wavelet, usually 2 [20]. This condition as well as the condition that j and k are integers restricts the wavelet to only certain scales. The wavelet function has the properties of finite energy, oscillation, and the admissibility condition of the continuous wavelets, as well as the properties of compact support, vanishing moments, and orthogonality.

Compact support means that the wavelet function is defined by a series of coefficients over a finite region, and is zero at all other places. This contrasts with the continuous wavelets, which, as mentioned, are mathematical functions and have effective support in which the function continues to infinity, but is effectively zero outside of a finite range.

Vanishing moments are obtained when the following condition defined mathematically as

holds true for all integers 0 ≤ k < N, where N is the number of vanishing moments of the function [24]. This property is useful for analyzing functions that have an additive polynomial trend function given by

Here, g(x) is the function to be analyzed and N(x) is the polynomial trend function (also termed a nuisance function in Economics).

The orthogonality condition removes the redundancy of the continuous wavelet transform. As stated earlier, the discrete wavelet transform can only be used at certain scales, most often a power of 2. Mathematically it is stated as

An orthogonal basis ensures that the signal is represented in the most compact way possible. However, by removing all the redundant information, this also removes information to handle shift variance. The exact same function sampled at two different places can yield very different results. In order to deal with this, some discrete wavelet transforms retain some of this redundant information.

Each wavelet of the discrete wavelet family consists of two functions, a wavelet function (ψ), as in the continuous wavelet families, and also a new function called a scaling function (ϕ). In literature, these are termed the mother and father functions respectively. The scaling function has its own admissibility condition, which ensures that it has the zero-frequency component that the wavelet function does not:

This is necessary so that a discrete wavelet transform terminates in a finite number of steps and can completely regenerate the information in the signal [20]. Otherwise, the zero-frequency component could never be captured, since no amount of scaling value can cause the wavelet filter to have a zero-frequency component.

In addition, as specified in [25] the scaling equation is defined in terms of a finite set of coefficients p_{k} that are defined by the following equation

that adheres to the following conditions as specified in [25] as well:

The wavelet function is defined by

where l is the length of the set of coefficients, so that the wavelet coefficients are basically the scaling coefficients in reverse order with alternating signs. These coefficients are used to implement the discrete wavelet transform as a filter bank of Finite Impulse Response (FIR) filters. Graph of the scaling and wavelet functions for Daubechies level 2 wavelet are shown in Figure 7 and the frequency response is shown in Figure 8. As with the Haar wavelet, the wavelet function is a high pass filter and the scaling function is a low pass filter. Both are symmetric around π/2.

Different papers and software implementations have different coefficients for the Haar and Daubechies wavelet, depending on how they are normalized and whether the scale parameter from Eq. (8) is included is included in the filter. The coefficients for the Haar and the Daubechies level 2 wavelet are in Tables 2 and 3 with b defined by the implementation. Mathematica uses 2 for b, which would normalize the sum of the coefficients to 1. PyWavelets uses

Scaling coefficients | Wavelet coefficients | ||
---|---|---|---|

c_{0} | 1/b | d_{0} | 1/b |

c_{1} | 1/b | d_{1} | −1/b |

Scaling coefficients | Wavelet coefficients | ||
---|---|---|---|

c_{0} | d_{0} | ||

c_{1} | d_{1} | ||

c_{2} | d_{2} | ||

c_{3} | d_{3} |

## 8. Discrete wavelet transform

The class of discrete wavelet functions has many transforms available with the discrete wavelet transform in Figure 9 the most common. Since this was the transform introduced with the Haar wavelet, it is sometimes referred to as the Haar transform [26] as well as the decimated wavelet transform [10]. Essentially, it works as a pyramid algorithm, where the number of coefficients of each lower level is roughly twice that of the preceding level, but each coefficient is influenced by half as much of the data set as the preceding level. Each level has two sets of coefficients, one is called coarse and the other is called details.

In Figure 9, g is the scaling filter defined by the set of scaling filter coefficients and h is the wavelet filter defined by the set of wavelet filter coefficients. At each level, the detail coefficients (W) are outputs, except for the final level, where the coarse coefficients (V) are given as outputs as well. Collectively, this set of coefficients contains enough information to reconstruct the signal perfectly.

One key part of the discrete wavelet transform is the down sampling operator, which is a function that removes every other position from a sequence. An example would be the sequence {a, b, c, d, e, f, g, h} would be {a, c, e, g} or {b, d, f, h}, after the down sampling operator is applied, depending on whether the even or the odd positions are eliminated. Both are valid, however, by convention with the discrete wavelet transform, the even positions are eliminated, leaving only the odd positions. The down sampling operator is what makes the discrete wavelet transform a pyramid function and also reduces the set of coefficients to the minimum amount necessary to reconstruct the signal.

A problem with the decimation operator is aliasing. This is when different sequences map to the same sequence after the application of the operator. An example would be that the sequences {a, b, c, d, e, f, g} and {a, h, c, i, e, j, g} would both map to the sequence {a, c, d, g}. Therefore, just given the sequence {a, c, d, g}, it would be impossible to reconstruct the original. The filters of the discrete wavelets are designed to compensate for this, ensuring that the original sequence can be recovered. The combination of these filters with the down sampling operator is referred to as decimation.

The discrete wavelet transform also has an inverse transform. This process combines as described in Figure 10 to form a perfect reconstruction of the signal, where

Implementing the discrete wavelet transform as a finite impulse response filter and using decimation gives it a computational complexity of O(n). As Table 4 shows, an O(n) process can be much faster than an O(n log_{2} n) process such as the fast Fourier Transform. At 1 million samples, an O(n) process requires almost 20 times less operations than an O(n log_{2} n) process (Table 4).

n | O(n log_{2} n) | O(n) | Ratio |
---|---|---|---|

10 | 34 | 10 | 3.40 |

100 | 665 | 100 | 6.65 |

1000 | 9966 | 1000 | 9.97 |

10,000 | 132,878 | 10,000 | 13.29 |

100,000 | 1,660,965 | 100,000 | 16.61 |

1,000,000 | 19,931,569 | 1,000,000 | 19.93 |

## 9. Stationary wavelet transform

Another wavelet transform for discrete wavelet functions is the stationary wavelet transform, also known as the undecimated discrete wavelet transform. Essentially the stationary wavelet transform is the discrete wavelet transform without the decimation operation for the data. Whereas the number of coefficients for each level is half that of the preceding level in the discrete wavelet transform, the number of coefficients is the same for each level in the stationary wavelet transform.

The procedure is diagrammed in Figure 11, where g_{n} is the set of the scaling filter coefficients and h_{n} is the set of the wavelet filter coefficients. The reason that the scaling filter and wavelet filter coefficients are different for each level is that instead of the decimation operator being applied to the wavelet data coefficients after each level, the upsampling operator is applied to the wavelet and scaling filter coefficients. The wavelet and scaling coefficients for each level are upsampled from the previous level, as shown in Figure 12.

Like the discrete wavelet transform, the stationary wavelet transform has an inverse transform, as shown in Figure 13. The difference between this and the inverse discrete wavelet transform is the absence of the upsampling operator. As with the stationary wavelet transform, the filter coefficients for the inverse stationary wavelet transform are changed instead of the data. In this case, the filters are down sampled. The retention of redundant data in the stationary wavelet transform helps to make it translation invariant, which is useful for filtering applications (Figure 13).

Since the decimation step is not used, the stationary wavelet transform has a computational complexity of O(n log_{2} n), the same as the Fast Fourier Transform. However, there is also memory complexity to consider. While the Fast Fourier Transform and the Discrete Wavelet Transform has an O(n) memory complexity, the stationary wavelet transform has an O(n log_{2} n) memory complexity. Therefore, the output will always be larger than the input.

## 10. Discrete wavelet packet transform

The two previous transforms applied the detail and the coarse filters to the data at each level. The output of the coarse filter is given as the input to the next level and the output of the detail filter at that level is included in the set of the outputs of the transform. In the final level, the output of both the detail and the coarse filters were included in the set of outputs of the transform; however, that is not the only possibility. The packet transform creates a binary tree where the detail and coarse filters are applied to each node, diagrammed in Figure 14. The output of the detail filter becomes one child and the output of the coarse filter becomes the other. This process is repeated until the final level is reached, creating a set of output coefficients where each set is identified by the sequence of filters applied to it.

## 11. Stationary wavelet packet transform

The stationary wavelet packet transform is yet another transform for discrete wavelet functions. Basically, it combines the stationary wavelet transform with the wavelet packet transform, as diagrammed in Figure 15. Instead of the decimation operator, the filters themselves are upsampled for each level. The transform creates a binary tree, as with the discrete wavelet packet transform, where both filters for each level are applied at each node. As with the wavelet packet transform, the output from the detail filter becomes one child and the output from the scaling filter becomes the other, and the process is repeated until the final level is reached. Each set of output coefficients are also identified by the sequence of filters applied to it, with the difference that since there is no decimation applied between levels, the number of each set of output coefficients is the same as the input data. This leads to the total number of output coefficients to be 2 times the number of levels multiplied by the length of the input data. Both the discrete wavelet packet transform and the stationary wavelet packet transform have inverse transforms.

The wavelet packet transform introduces many more possibilities for use, some of which are discussed here. Depending on the application, you can do different combinations of the scaling and wavelet filters. Computational complexity depends on the filter combinations selected. If it is taken to the maximum level with the maximum filter combinations, then the discrete wavelet packet transform has a complexity of O(n log_{2} n) and the stationary wavelet packet transform has a complexity of O(n^{2}).

## 12. Conclusions

The Fast Fourier Transform has been listed as one of the top algorithms of the 20th century [27]. Its development has been instrumental to digital signal processing. However, recently a new algorithm, the wavelet transform, has started to have a significant impact on digital signal processing. The wavelet transform improves on the Fourier Transform in that it can analyze a signal by time and frequency simultaneously, thereby easily recovering localized signal information. This is key to many applications, including fractal and multifractal analysis, compression, and filtering.

The wavelet transform introduces many possibilities for use and this chapter has only touched the surface of it. Different wavelets can be used and the transform itself can be customized to fit the application as shown with the wavelet packet transform. Future research will be to determine the proper combination of features for various applications. In addition, there are other possibilities, such as the lifting wavelet transform, which wasn’t covered in this chapter. Only orthogonal wavelets that use the same set of wavelets for the forward and inverse transform were covered in this chapter. Biorthogonal wavelets that use different wavelets for the forward and inverse transforms are also available.

The key to wavelet compression and filtering is the sparse signal representation generated by the wavelet transform. The wavelet transform can reduce a signal to minimal set of coefficients. Coefficients that are near zero can be rounded to zero, reducing the size of the signal. In addition, fractional parts of the coefficients can be rounded, also reducing signal size. One of the first uses of this was to compress fingerprints for the FBI [28]. As stated in [29], in the 1990’s the FBI had 25 million cards, each containing 10 fingerprints. Digitized, each card contained 10 megabytes of information, for a total of 250 terabytes. Using the two-dimensional discrete wavelet packet transform gives a compression ratio of 20 to 1, enabling the archive to be stored on approximately 12.5 terabytes, while still being able to search and match unknown fingerprints against the ones in the archive. The recently developed JPEG format at the time was based on using the discrete cosine transform on blocks of the image, which left unacceptable artifacts in the image.

JPEG 2000 was developed using the two-dimensional wavelet transform to be the successor to JPEG, although it hasn’t caught on. JPEG 2000 allows both lossy and lossless compression. It also doesn’t have the lossy artifact generation that the JPEG format has as mentioned previously. Both lossy and lossless compression use the discrete wavelet transform, the difference is that the lossless one uses a wavelet transform that is reversible, while the lossy one uses a wavelet transform that introduces quantization noise that making it irreversible.

Compressed sensing deals with the fact that we that we can obtain a vast amount of information and a lot of it can be discarded and still retain what is relevant. As stated in [24], “singularities and irregular structures often carry the most important information in signals.” This is due to the fact that they represent changes to one or more of the properties of the signal. An example of this would be the edges in an image. Compressed sensing removes the redundant, unnecessary information from a signal and analyzes the remaining part of the signal. This is an ideal application for the wavelet transform.

The discrete wavelet transform has been used for Iris recognition for biometric identification in patent US 2002O150281A1 [30]. After taking a picture of the eye, the iris is extracted from the image and then converted to polar coordinates. Using the discrete wavelet transform, the high frequency components are extracted, which are the detail coefficients as referenced in this paper. These form the characteristic vector that is used to identify an iris from the previously recorded data.

The wavelet transform can provide an efficient way to filter white noise from a signal. The procedure consists of applying one of the discrete wavelet transforms to the data and then executing a threshold algorithm that modifies the detail coefficients. After the coefficients are modified, then the inverse transform is applied; the resulting output is a representation of the signal with the noise component significantly reduced.

There are numerous packages available for experimenting with the wavelet transform. The discrete and stationary wavelet transforms are available in Mathematica, Maple, Matlab, R, and PyWavelets to name a few, with the wavelet packet transform available in Mathematica, Matlab, and PyWavelets.

The wavelet transform provides many possibilities for signal analysis depending on the application. A few potential applications were touched on here. The reader is encouraged to develop their own uses and applications for the wavelet transform.