Open Access is an initiative that aims to make scientific research freely available to all. To date our community has made over 100 million downloads. It’s based on principles of collaboration, unobstructed discovery, and, most importantly, scientific progression. As PhD students, we found it difficult to access the research we needed, so we decided to create a new Open Access publisher that levels the playing field for scientists across the world. How? By making research easy to access, and puts the academic needs of the researchers before the business interests of publishers.

We are a community of more than 103,000 authors and editors from 3,291 institutions spanning 160 countries, including Nobel Prize winners and some of the world’s most-cited researchers. Publishing on IntechOpen allows authors to earn citations and find new collaborators, meaning more people see your work not only from your own field of study, but from other related fields too.

A time series is “a sequence X = (x_{1}, x_{2,}…, x_{m}) of observed data over time”, where m is the number of observations. Tracking the behavior of a specific phenomenon/data in time can produce important information. A large variety of real world applications, such as meteorology, geophysics and astrophysics, collect observations that can be represented as time series.

A collection of time series can be defined as a Time Series Database (TSDB). Given a TSDB, most of time series mining efforts are made for the similarity matching problem. Time series data mining can be exploited from research areas dealing with signals, such as image processing. For example, image data can be converted to time series: from image color histograms (Fig. 2), where image matching can be applied, to object perimeters for the characterization of shapes [39].

Time series are essentially high-dimensional data [17]. Mining high-dimensional involves addressing a range of challenges, among them: i) the curse of dimensionality [1], and ii) the meaningfulness of the similarity measure in the high-dimensional space. An important task to enhance performances on time series is the reduction of their dimensionality, that must preserve the main characteristics, and reflects the original (dis)similarity of such data (this effect will be referred to as lowerbounding [11]). When treating time series, the similarity between two sequences of the same length can be calculated by summing the ordered point-to-point distance between them (Fig. 3). In this sense, the most used distance function is the Euclidean Distance [13], corresponding to the second degree of general L_{p}-norm [41]. This distance measure is cataloged as a metric distance function, since it obeys to the three fundamentals metric properties: non-negativity, symmetry and triangle inequality [29]. In most cases, a metric function is desired, because the triangle inequality can then be used to prune the index during search, allowing speed-up execution for exact matching [28]. In every way, Euclidean distance and its variants present several drawbacks, that make inappropriate their use in certain applications. For these reasons, other distance measure techniques were proposed to give more robustness to the similarity computation. In this sense it is required to cite also the well known Dynamic Time Warping (DTW) [22], that makes distance comparisons less sensitive to signal transformations as shifting, uniform amplitude scaling or uniform time scaling. In the literature there exist other distance measures that overcome signal transformation problems, such as the Landmarks similarity, which does not follow traditional similarity models that rely on point-wise Euclidean distance [31] but, in correspondence of human intuition and episodic memory, relies on similarity of those points (times, events) of “greatest importance” (for example local maxima, local minima, inflection points).

In conjunction to this branch of research, a wide range of techniques for dimensionality reduction was proposed. Among them we will treat only representations that have the desirable property of allowing lower bounding. By this property, after establishing a true distance measure for the raw data (in this case the Euclidean distance), the distance between two time series, in the reduced space, results always less or equal than the true distance. Such a property ensures exact indexing of data (i.e. with no false negatives [13]). The following representations describe the state-of-art in this field: spectral decomposition through Discrete Fourier Transform (DFT) [1]; Discrete Wavelet Transform (DWT) [7]; Singular Value Decomposition (SVD) [24]; PiecewiseAggregate Approximation (PAA) [19]; Adaptive Piecewise Constant Approximation (APCA) [6]; Piecewise Linear Approximation (PLA) [20]; and Chebyshev Polynomials (CHEB) [29]. Many researchers have also included symbolic representations of time series, that transform time series measurements into a collection of discretized symbols; among them we cite the Symbolic Aggregate approXimation (SAX) [26], based on PAA, and the evolved multi-resolution representation iSAX 2.0 [35]. Symbolic representation can take advantage of efforts conducted by the text-processing and bioinformatics communities, who made available several data structures and algorithms for efficient pattern discovery on symbolic encodings [2],[25],[36].

The chapter is organized as follows. Section 2 will introduce the similarity matching problem on time series. We will note the importance of the use of efficient data structures to perform search, and the choice of an adequate distance measure. Section 3 will show some of the most used distance measure for time series data mining. Section 4 will review the above mentioned dimensionality reduction techniques.

A TSDB with m objects, each of length n, is denoted by TSDB = {x_{1}, x_{2},..., x_{m}}, where x_{i} = (x_{i}^{(1)}, x_{i}^{(2)},..., x_{i}^{(n)}) is a vector denoting the ith time series and x_{i}^{(j)} denotes the jth values of x_{i}, with respect to time. n indicates the dimensionality of the data set.

The general representation of a TSDB is a m × n matrix:

In many cases, datasets are supported by special data structures, especially when dataset get larger, that are referred as indexing structures. Indexing consists of building a data structure I that enables efficient searching within database [29]. Usually, it is designed to face two principal similarity queries: the (i) k-nearest neighbors (knn), and the (ii) range query problem. Given a time series Q in TSDB, and a similarity/dissimilarity measure d(T,S) defined for each pair T, S in TSDB, the former query deals with the search of the set of first k time series in TSDB more similar to Q. The latter query finds the set R of time series that are within distance r from Q. Given an indexing structure I, there are two ways to post a similarity query in time series databases [29]:

whole matching: given a TSDB of time series, each of length n, whole matching relates to computation of similarity matching among time series along their whole length.

subsequence matching: given a TSDB of m time series S_{1}, S_{2}, …, S_{m}, each of length n_{i}, and a short query time series Q of length n_{q} < n_{i}, with 0 < i < m, subsequence matching relates to finding matches of Q into subsequences of every S_{i}, starting at every position.

Indexing is crucial for reaching efficiency on data mining tasks, such as clustering or classification, specially for huge database such as TSDBs. Clustering is related to the unsupervised division of data into groups (clusters) of similar objects under some similarity or dissimilarity measures. Sometimes, on time series domain, a similar problem to clustering is the motifdiscovery problem [28], consisting of searching main cluster (or motif) into a TSDB. The search for clusters is unsupervised. Classification assigns unlabeled time series to predefined classes after a supervised learning process. Both tasks make massive use of distance computations.

Distance measures play an important role for similarity problem, in data mining tasks. Concerning a distance measure, it is important to understand if it can be considered metric. A metric function on a TSDB is a function f : TSDB × TSDB → R (where R is the set of real numbers). For all x, y, z in TSDB, this function obeys to four fundamental properties:

f(x, y) ≥ 0 (non-negativity)

f(x, y) = 0 if and only if x = y (identity)

f(x, y) = f(y, x) (symmetry)

f(x, z) ≤ f(x, y) + f(y, z) (triangle inequality)

If any of these is not obeyed, then the distance is non-metric. Using a metric function is desired, because the triangle inequality property (Eq. 5) can be used to index the space for speed-up search. A well known framework, for indexing time series, is GEMINI (GEneric Multimedia INdexIng) [13] that designs fast search algorithms for locating time series that match, in an exact or approximate way, a query time series Q.

It was introduced to accommodate any dimensionality reduction method for time series, and then allows indexing on new representation [29]. GEMINI guarantees no false negatives on index search if two conditions are satisfied: (i) for the raw time series, a metric distance measure must be established; (ii) to work with the reduced representation, a specific requirement is that it guarantees the lower bounding property.

A common data mining task is the estimation of similarity among objects. A similarity measure is a relation between a pair of objects and a scalar number. Common intervals used to mapping the similarity are [-1, 1] or [0, 1], where 1 indicates the maximum of similarity.

Considering the similarity between two numbers x and y as :

numSim(x,y)=1−|x−y||x|+|y|E2

Let two time series X = x_{1,}…,x_{n}, Y = y_{1,}…,y_{n}, some similarity measures are:

Another common similarity functions used to perform complete or partial matching between time series are the cross-correlation function (or Pearson’s correlation function) [38] and the cosine angle between the two vectors. The cross correlation between two time series x and y of length n, allowing a shifted comparison of l positions, is defined as:

Where X¯ and Y¯are the means of X and Y. The correlation r_{XY} provides the degree of linear dependence between the two vectors X and Y from perfect linear relationship (r_{XY} = 1), to perfect negative linear relation (r_{XY} = -1).

Another way to evaluate similarity between two vectors is the estimation of cosine of the angle between the two vectors X and Y, defined as:

cos(ϑ)=X⋅Y‖X‖‖Y‖=∑i=1nxiyi∑i=1nxi2∑i=1nyi2E7

This measure provides values in range [-1, 1]. The lower boundary indicates that the X and Y vectors are exactly opposite, the upper boundary indicates that the vectors are exactly the same, finally the 0 value indicates the independence.

3.1. Metric distances

Another way to compare time series data involves concept of distance measures. Let be two time series T and S vectors of length n, and T_{i} and S_{i} the ith values of T and S, respectively. Let us list the following distance measures. This subsection presents a list of distance functions used in Euclidean space.

Euclidean Distance. The most used distance function in many applications. It is defined as (Fig. 3):

Manhattan Distance. Also called “city block distance”. It is defined as:

Maximum Distance. It is defined to be the maximum value of the distances of the attributes:

Minkowski Distance. The Euclidean distance, Manhattan distance, and Maximum distance, are particular instances of the Minkowski distance, called also L_{p}-norm. It is defined as:

Mahalanobis Distance. The Mahalanobis distance is defined as:

Euclidean distance is cataloged as a metric distance function, since it obeys to the metric properties: non-negativity, identity, symmetry and triangle inequality (Section 2, Eq.2~5). It is surprisingly competitive with other more complex approaches, especially when dataset size gets larger [35]. In every way, Euclidean distance and its variants present several drawbacks, that make inappropriate their use in certain applications:

It compares only time series of the same length.

It doesn’t handle outliers or noise.

It is very sensitive respect to six signal transformations: shifting, uniform amplitude scaling, uniform time scaling, uniform bi-scaling, time warping and non-uniform amplitude scaling [31].

Dynamic Time Warping (DTW) [22] gives more robustness to the similarity computation. By this method, also time series of different length can be compared, because it replaces the one-to-one point comparison, used in Euclidean distance, with a many-to-one (and viceversa) comparison. The main feature of this distance measure is that it allows to recognize similar shapes, even if they present signal transformations, such as shifting and/or scaling (Fig. 4).

Given two time series T = {t_{1}, t_{2},..., t_{n}} and S = {s_{1}, s_{2},..., s_{m}} of length n and m, respectively, an alignment by DTW method exploits information contained in a n × m distance matrix:

where distMatrix(i, j) corresponds to the distance of ith point of T and jth point of Sd(T_{i}, S_{j}), with 1 ≤ i ≤ n and 1 ≤ j ≤ m.

The DTW objective is to find the warping pathW = {w_{1}, w_{2},...,w_{k},..., w_{K}} of contiguous elements on distMatrix (with max(n, m) < K < m +n -1, and w_{k}= distMatrix(i, j)), such that it minimizes the following function:

DTW(T,S)=min(∑k=1Kwk)E14

The warping path is subject to several constraints [22]. Given w_{k}= (i, j) and w_{k-1}= (i’, j’) with i, i’ ≤ n and j, j’ ≤ m :

Boundary conditions. w_{1} = (1,1) and w_{K} = (n, m).

Continuity. i – i’ ≤ 1 and j – j’ ≤ 1.

Monotonicity. i – i’ ≥ 0 and j – j’ ≥ 0.

The warping path can be efficiently computed using dynamic programming [9]. By this method, a cumulative distance matrix γ of the same dimension as the distMatrix, is created to store in the cell (i, j) the following value (Fig. 5):

The overall complexity of the method is relative to the computation of all distances in distMatrix, that is O(nm). The last element of the warping path, w_{K} corresponds to the distance calculated with the DTW method.

In many cases, this method can bring to undesired effects. An example is when a large number of point of a time series T is mapped to a single point of another time series S (Fig. 6a, 7a). A common way to overcome this problem is to restrict the warping path in such a way it has to follow a direction along the diagonal (Fig. 6b, 7b). To do this, we can restrict the path enforcing the recursion to stop at a certain depth, represented by a threshold δ. Then, the cumulative distance matrix γ will be calculated as follows:

Figure 7a shows the computation of a restricted warping path, using a threshold δ = 10. This constraint, besides limiting extreme or degenerate mappings, allows to speed-up DTW distance calculation, because we need to store only distances which are at most δ positions away (in horizontal and vertical direction) from the distMatrix diagonal. This reduces the computational complexity to O((n + m)δ). The above proposed constraint is known also as Sakoe-Chiba band (Fig. 8a) [34], and it is classified as global constraint. Another most common global constraint is the Itakura parallelogram (Fig. 8b) [18].

Local constraints are subject of research and are different from global constraints [22], because they provide local restrictions on the set of the alternative depth steps of the recurrence function (Eq. 19). For example we can replace Eq. 19 with:

Another well known method that takes advantage of dynamic programming to allow comparison of one-to-many points is the Longest Common SubSequence (LCSS) similarity measure [37]. An interesting feature of this method is that it is more resilient to noise than DTW, because allows some elements of time series to be unmatched (Fig. 9). This solution builds a matrix LCSS similar to γ, but considering similarity instead of distances. Given the time series T and S of length n and m, respectively, the recurrence function is expressed as follows:

with 1 ≤ i ≤ n and 1 ≤ j ≤ m. Since exact matching between T_{i} and S_{j} can be strict for numerical values (Eq. 22 is best indicated for string distance computation, such as the edit distance), a common way to relax this definition is to apply the following recurrence function:

The cell LCSS(n, m) contains the similarity between T and S, because it corresponds to length l of the longest common subsequence of elements between time series T and S. To define a distance measure, we can compute [32]:

LCSSdist(T,S)=n+m+2lm+nE20

Also for LCSS the time complexity is O(nm), but it can be improved to O((n + m)δ) if a restriction is used (i.e. when |i - j| < δ).

We have already introduced that a key aspect to achieve efficiency, when mining time series data, is to work with a data representation that is lighter than the raw data. This can be done by reducing the dimensionality of data, still maintaining its main properties. An important feature to be considered, when choosing a representation, is the lower bounding property.

Given two raw representations of the time series T and S, by this property, after establishing a true distance measure d_{true} for the raw data (such as the Euclidean distance), the distance d_{feature} between two time series, in the reduced space, R(T) and R(S), have to be always less or equal than d_{true}:

dfeature(R(T),R(S))≤dtrue(T,S)E21

If a dimensionality reduction techniques ensures that the reduced representation of a time series satisfies such a property, we can assume that the similarity matching in the reduced space maintains its meaning. Moreover, we can take advantage of indexing structure such as GEMINI (Section 2) [13] to perform speed-up search even avoiding false negative results. In the following subsections, we will review the main dimensionality reduction techniques that preserve the lower bounding property.

4.1. DFT

The dimensionality reduction, based on the Discrete Fourier Transform (DFT) [1], was the first to be proposed for time series. The DFT decomposes a signal into a sum of sine and cosine waves, called FourierSeries. Each wave is represented by a complex number known as Fourier coefficient (Fig. 10) [29],[32]. The most important feature of this method is the data compression, because the original signal can be reconstructed by means of information carried by the waves with higher Fourier coefficient. The rest can be discarded with no significant loss.

More formally, given a signal x = {x_{1}, x_{2},..., x_{n}}, the n-point Discrete Fourier Transform of x is a sequence X = {X_{1}, X_{2},..., X_{n}} of complex numbers. X is the representation of x in the frequency domain. Each wave/frequency X_{F} is calculated as:

XF=1n∑i=1nxie−2πijFn(j=−1)F=1,…,nE22

The original representation of x, in the time domain, can be recovered by the inverse function:

xi=1n∑F=1nXFe−2πijFn(j=−1)i=1,…,nE23

The energy E(x) of a signal x is given by:

E(x)=‖x‖2=∑i=1n|xi|2E24

A fundamental property of DFT is guaranteed by the Parseval’s Theorem, which asserts that the energy calculated on time series domain for signal x is preserved on frequency domain, and then:

E(x)=∑i=1n|xi|2=∑F=1n|XF|2=E(X)E25

If we use the Euclidean distance (Eq. 12), by this property, the distance d(x,y) between two signals x and y on time domain is the same as calculated in the frequency domain d(X,Y), where X and Y are the respective transforms of x and y. The reduced representation X’ = {X_{1}, X_{2},..., X_{k}} is built by only keeping first k coefficients of X to reconstruct the signal x (Fig. 11).

For the Parseval’s Theorem we can be sure that the distance calculated on the reduced space is always less than the distance calculated on the original space, because k ≤ n and then the distance measured using Eq. 12 will produce:

d(X',Y')≤d(X,Y)=d(x,y)E26

that satisfies the lower bounding property defined in Eq. 25.

The computational complexity of DFT is O(n^{2}), but it can be reduced by means of the FFT algorithm [8], which computes the DFT in O(n log n) time. The main drawback of DFT reduction technique is the choice of the best number of coefficients to keep for a faithfully reconstruction of the original signal.

4.2. DWT

Another technique for decomposing signals is the WaveletTransform (WT). The basic idea of WT is data representation in terms of sum and difference of prototype functions, called wavelets. The discrete version of WT is the Discrete Wavelet Transform (DWT). Similarly to DFT, wavelet coefficients give local contributions to the reconstruction of the signal, while Fourier coefficients always represent global contributions to the signal over all the time [32].

The Haar wavelet is the simplest possible wavelet. Its formal definition is shown in [7]. An example of DWT based on Haar wavelet is shown in Table 1.

The general Haar transform H_{L}(T) of a time series T of length n can be formalized as follows:

AL'+1(i)=AL'(2i)+AL'(2i+1)2E27

DL'+1(i)=DL'(2i)−DL'(2i+1)2E28

HL(T)=(AL,DL,DL−1,…,Do)E29

where 0 < L’ ≤ L, and 1 ≤ i ≤ n.

Level (L)

Averages coefficients (A)

Wavelet coefficients (D)

1

10, 4, 6, 6

2

8, 6

3, 0

3

7

1

Table 1.

The Haar transform of T = {10, 4, 8, 6} depends on the chosen level, and corresponds to merging Averagescoefficients (column 2) at the chosen level and all Wavelet coefficients (column 3) in decreasing order from the chosen level. At level 1 the representation is the same of time series: H_{1}(T) = {10, 4, 6, 6} + {} = {10, 4, 6, 6} = T. At level 2 is H_{2}(T)= {8, 6} + {3, 0} + {} = {8, 6, 3, 0}. At level 3 is H_{3}(T) = {7} + {1} + {3, 0} = {7, 1, 3 0}.

The main drawback of this method is that it is well defined for time series which length n is a power of 2 (n = 2^{m}). The computational complexity of DWT using Haar Wavelet is O(n). Chan and Fu [7] demonstrated that the Euclidean distance between two time series T and S, d(T,S), can be calculated in terms of their Haar transform d(H(T), H(S)), by preserving the lower bounding property in Eq. 25, because:

d(H(T),H(S))=2d(T,S)<d(T,S)E30

4.3. SVD

As we have just seen in Section 2, a TSDB with m time series, each of length n, can be represented by a m × n matrix A (Eq. 1). An important result from linear algebra is that A can always be written in the form [16]:

A=UWVTE31

where U is an m × n matrix, W and V are n × n matrices. This is called the Singular Value Decomposition (SVD) of the matrix A, and the elements of the n × n diagonal matrix W are the singular valuesw_{i}:

W=(w10⋯00w2⋯0⋮⋮⋱⋮00⋯wn)E32

V is orthonormal, because VV^{T} = V^{T}V = I_{n}, where I_{n} is the identity matrix of size n. So, we can multiply both sides of Eq. 35 by V to get:

AV=UWVTV→AV=UWE33

UW represents a set of n-dimensional vectors AV ={X_{1}, X_{2},..., X_{m}} which are rotated from the original vectors A={x_{1}, x_{2},..., x_{m}} [29]:

(X1X2⋮Xm)=(U1U2⋮Um)(w10⋯00w2⋯0⋮⋮⋱⋮00⋯wn)E34

Similarly to sine/cosine waves for DFT (Section 3.1) and to wavelet for DWT (Section 3.2), U vectors represent basis for AV, and their linear combination with W (that represents their coefficients) can reconstruct AV.

We can perform dimensionality reduction by selecting the first ordered k biggest singular values, and their relative entries in A, V and U, to obtain a new k-dimensional dataset that best fits original data.

SVD is an optimal transform if we aim to reconstruct data, because it minimizes the reconstruction error, but have two important drawbacks: (i) it needs a collection of time series to perform dimensionality reduction (it cannot operate on singular time series), because examines the whole dataset prior to transform. Moreover, the computational complexity is O(min(m^{2}n, mn^{2})). (ii) This transformation is not incremental, because a new data insertion requires a new global computation.

4.4. Dimensionality reduction via PAA

Given a time series T of length n, PAA divides it into w equal sized segments t_{i} (1 < i ≤ w) and records values corresponding to the mean of each segment mean(t_{i}) (Fig. 14) into a vector PAA(T) = {mean(t_{1}), mean(t_{2}), …, mean(t_{w})}, using the following formula:

mean(ti)=wn∑j=nw(i−1)+1nwitjE35

When n is a power of 2, each mean(t_{i}) essentially represents an Averages coefficient A_{L}(i), defined in Section 4.2, and w corresponds in this case to:

w=n2LE36

The complexity time to calculate the mean values of Eq. 39 is O(n). The PAA method is very simple and intuitive, moreover it is strongly competitive with other more sophisticated transforms such as DFT and DWT [21],[41]. Most of data mining researches makes use of PAA reduction for its simplicity. It is simple to demonstrate how the distance on raw representation is bounded below by the distances calculated on PAA representation (even using Euclidean distance as reference point), satisfying Eq. 25. A limitation of such a reduction, in some contexts, can be the fixed size of the obtained segments.

4.5. APCA

In Section 4.2 we noticed that not all Haar coefficients in DWT are important for the time series reconstruction. Same thing for PAA in Section 4.4, where not all segment means are equally important for the reconstruction, or better, we sometimes need an approximation with no-fixed size segments. APCA is an adaptive model that, differently from PAA, allows to define segments of variable size. This can be useful when we find in time series areas of low variance and areas of high variance, for which we want to have, respectively, few segments for the former, and many segments for the latter.

Given a time series T = {t_{1}, t_{2},..., t_{n}} of length n, the APCA representation of T is defined as [6]:

C={⟨cv1,cr1⟩,…,⟨cvM,crM⟩},cr0=0E37

where cr_{i} is the last index of the ith segment, and

cvi=mean(tcri−1+1,…,tcri)E38

To find an optimal representation through the APCA technique, dynamic programming can be used [14,30]. This solution requires O(Mn^{2}) time. A better solution was proposed by Chakrabarti et al. [6], which finds the APCA representation in O(n log n) time, and defines a distance measure for this representation satisfying the lower bounding property defined in Eq. 25. The proposed method bases on Haar wavelet transformation. As we have just seen in Section 4.2, the original signal can be reconstructed by only selecting bigger coefficients, and truncating the rest. The segments in the reconstructed signal may have approximate mean values (due to truncation) [6], so these values are replaced by the exact mean values of the original signal. Two aspects to consider before performing APCA:

Since Haar transformation deals only with time series length n = 2^{p}, we need to add zeros to the end of the time series, until it reaches the desired length.

If we held the biggest M Haar coefficients, we are not sure if the reconstruction will return an APCA representation of length M. We can know only that the number of segments will vary between M and 3M [6]. If the number of segments is more than M, we will iteratively merge more similar adjacent pairs of segments, until we reach M segments.

The algorithm for compute APCA representation can be found in [6].

4.6. Time series segmentation using PLA

As with most computer science problems, representation of data is the key to efficient and effective solutions. A suitable representation of a time series may be Piecewise Linear Approximation (PLA), where the original points are reduced to a set of segments.

PLA refers to the approximation of a time series T, of length N, using K consecutive segments with K much smaller than n (Fig. 15). This representation makes the storage, transmission and computation of the data more efficient [23]. In the light of it, PLA may be used to support clustering, classification, indexing and association rule mining of time series data (e.g. [10]).

The process of time series approximation using PLA is known as segmentation and is related to clustering process where each segment can be considered as a cluster [33].

There are several techniques to segment a time series and they can be distinguished into off-line and on-line approaches. In the former approach an error threshold is fixed by the user, while the latter uses a dynamic error threshold that changes, according to specific criteria, during the execution of the algorithm.

Although off-line algorithms are simple to realize, they are less effective than the on-line ones. The classic approaches to time series segmentation are the sliding window, bottom-up and top-down algorithms. Sliding window is an on-line algorithm and works growing a segment until the error for the potential segment is greater than the user-specified threshold, then the subsequence is transformed into a segment; the process repeats until the entire time series has been approximated by its PLA [23]. A way to estimate error is by taking the mean of the sum of the square of vertical differences between the best-fit line and the actual data points. Another commonly used measure of goodness of fit is the distance between the best fit line and the data point furthest away in the vertical direction [23].

In the top-down approach a segment, that represents the entire time-series, is recursively split until the desired number of segment or a error threshold is reached. Dually, the bottom-up algorithm starts from the finest approximation of the time series using n/2 segments and merging the two most similar adjacent segments until the desired number of segment or an error threshold is reached.

However, an open question is the choice of best k number of segments. This problem involves a trade-off between compression and accuracy of time series representation. As suggested by Salvador et al. [33], the appropriate number of segments may be estimated using evaluation graph. It is defined as a two dimensional plot where x-axis is the number of segments, while y-axis is a measure of the segmentation error. The best number of segments is provided by the point of maximum curvature, also called “knee”, of the evaluation graph (Fig. 16).

4.7. Chebyshev Polynomials approximation

By this technique, the reduction problem is resolved by considering the values of the time series T as values of a function f, and approximating it with a polynomial function of degree n which well fits f:

Pn(x)=∑i=0naixiE39

where each a_{i} corresponds to coefficients and x^{i} to the variables of degree i.

There are many possible ways to choose the polynomial: Fourier transforms (Section 4.1), splines, non-linear regressions, etc. Ng and Cai [29] hypothesized that one of the best approaches is the polynomial that minimizes the maximum deviation from the true value, which is called the minimax polynomial. It has been shown that the Chebyshev approximation is almost identical to the optimal minimax polynomial, and is easy to compute [27]. Thus, Ng and Cai [29] explored how to use the Chebyshevpolynomials (of the first kind) as a basis for approximating and indexing n-dimensional (n ≥ 1) time series. The Chebyshev polynomial CP_{m}(x) of the first kind is a polynomial of degree m (m = 0, 1, …), defined as:

CPm(x)=cos(marccos(x))x∈[−1,1]E40

It is possible to compute every CP_{m}(x) using the following recurrence function [29]:

CPm(x)=2xCPm−1(x)−CPm−2(x)E41

for all m ≥ 2 with CP_{0}(x) = 1 and CP_{1}(x) = x. Since Chebyshev polynomials form a family of orthogonal functions, a function f(x) can be approximated by the following Chebyshev series expansion:

S∞CP(f(x))=∑i=0∞ciCPi(x)E42

where c_{i} refer to the Chebyshev coefficients. We refer the reader to the paper [29] for the conversion of a time series, which represents a discrete function, to an interval function required for the computation of Chebyshev coefficients. Given two time series T and S, and their corresponding vectors of Chebyshev coefficients, C_{1} and C_{2}, the key feature of their work is the definition of a distance function d_{Cheb} between the two vectors, that guarantees the lower bounding property defined in Eq. 25. Since it results:

dcheb(C1,C2)≤dtrue(T1,T2)E43

the indexing with Chebyshev coefficients admits no false negatives. The computational complexity of Chebishev approximation is O(n), where n is the length of the approximated time series.

4.8. SAX

Many symbolic representations of time series have been introduced over the past decades. The challenge in this field is to create a real correlation between the distance measure defined on the symbolic representation, and that defined on original time series. SAX is the most known symbolic representation technique on time series data mining, that ensures both a considerable dimensionality reduction, and the lower bounding property, allowing enhancing of time performances on most of data mining algorithm.

Given a time series T of length n, and an alphabet of arbitrary size a, SAX returns a string of arbitrary length w (typically w << n). The alphabet size a is an integer, where a > 2. SAX method is PAA-based, since it transforms PAA means into symbols, according to a defined transformation function.

To give significance to the symbolic transformation, it is necessary to deal with a system producing symbols with equal probability, or with a Gaussian distribution. This can be achieved by normalizing time series, since normalized time series have generally a Gaussian distribution [26]. This is the first assumption to consider about this technique. However, for data not obeying to this property, the efficiency of the reduction is slightly deteriorated. Given the Gaussian distribution, it is simple to determine the “breakpoints” that will produce a equal-sized areas of probability under the Gaussian curve. What follows gives the principal definitions to understand SAX representation.

Definition 1. Breakpoints: A sorted list of numbers B = β_{1},..., β_{a−1} such that the area under a N(0, 1) Gaussian curve from β_{i} to β_{i+1} = 1/a (β_{0} and β_{a} are defined as −∞ and ∞, respectively) (Table 2). For example, if we want to obtain breakpoints for an alphabet of size a = 4, we have to compute the first (q_{1}), the second (q_{2}), and the third (q_{3}) quartiles of the inverse cumulative Gaussian distribution, corresponding to the 25%, 50% and 75% of the cumulative frequency: β_{1} = q_{1}, β_{2} = q_{2}, β_{3} = q_{3}.

Definition 2. Alphabet: A collection of symbols alpha = α_{1}, α_{2},…, α_{a} of size a used to transform mean frames into symbols.

(_{i}\a

3

4

5

6

7

8

(_{1}

-0.43

-0.67

-0.84

-0.97

-1.07

-1.15

(_{2}

0.43

0

-0.25

-0.43

-0.57

-0.67

(_{3}

0.67

0.25

0

-0.18

-0.32

(_{4}

0.84

0.43

0.18

0

(_{5}

0.97

0.57

0.32

(_{6}

1.07

0.67

(_{7}

1.15

Table 2.

A look-up table for breakpoints used for alphabet of size 2 < a < 9.

Definition 3. Word: A PAA approximation PAA(T) = {mean(t_{1}), mean(t_{2}), …, mean(t_{w})} of length w can be represented as a word SAX(T) = {sax(t_{1}), sax(t_{2}), …, sax(t_{w})}, with respect to the following mapping function:

sax(ti)=αj,iffβj−1≤mean(ti)<βj(0<i≤w,1<j<a)E44

Lin et al. [26] defined a distance measure for this representation, such that the real distance calculated on original representation is bounded below from it. An extension of SAX technique, iSAX, was proposed by Shieh and Keogh [35] which allows to get different resolutions for the same word, by using several combination of parameters a and w.

References

1.AgrawalFaloutsos.Swami1993Similarity Measures and Dimensionality Reduction Techniques for Time Series Data MiningProc. of the 4th Conference on Foundations of Data Organization and Algorithms, 6984

2.Bailey, Elkan1995Similarity Measures and Dimensionality Reduction Techniques for Time Series Data Mining

3.BeckmannKriegel.SchneiderSeeger.1990Similarity Measures and Dimensionality Reduction Techniques for Time Series Data MiningProc. ACM SIGMOD Int. Conf. on Management of Data, Atlantic City, NJ, 322331

4.Bentley1975Similarity Measures and Dimensionality Reduction Techniques for Time Series Data MiningCommunications of ACM. 18509517

5.CantoneFerro.PulvirentiReforgiato.2005Antipole tree indexing to support range search and k-nearest neighbour search in metric spaces”. IEEE Transactions on Knowledge and Data Engineering. 17535550

6.ChakrabartiKeogh.MehrotraPazzani.2002Similarity Measures and Dimensionality Reduction Techniques for Time Series Data MiningACM Trans. Database Syst. 27, 2, 188228

7.Chan, Fu1999Similarity Measures and Dimensionality Reduction Techniques for Time Series Data MiningIn proceedings of the 15th IEEE Int’l Conference on Data Engineering. Sydney, Australia, Mar 23-26. 126133

8.Cooley, Tukey1965Similarity Measures and Dimensionality Reduction Techniques for Time Series Data MiningMath. Comput. 19297301

9.CormenLeiserson.Rivest1990Similarity Measures and Dimensionality Reduction Techniques for Time Series Data MiningThe MIT Press, McGraw-Hill Book Company.

10.Di SalvoMontalto.NunnariNeri.Puglisi2012Multivariate time series clustering on geophysical data recorded at Mt. Etna from 1996 to 2003”. Journal of Volcanology and Geothermal Research.

11.DingTrajcevski.ScheuermannWang.Keogh2008Similarity Measures and Dimensionality Reduction Techniques for Time Series Data MiningVLDB.

12.DudaHart.Stork2001Similarity Measures and Dimensionality Reduction Techniques for Time Series Data MiningJohn Wiley & Sons.

13.FaloutsosRanganthan.Manolopoulos1994Similarity Measures and Dimensionality Reduction Techniques for Time Series Data MiningSIGMOD Conference.

14.FaloutsosJagadish.MendelzonMilo.1997A signature technique for similarity-based queries”. In Proceedings of the SEQUENCES 97 (Positano-Salerno, Italy).

15.Fink, Pratt2004Indexing of compressed time series”. In Mark Last, Abraham Kandel, and Horst Bunke, editors, Similarity Measures and Dimensionality Reduction Techniques for Time Series Data Mining4365World Scientific, Singapore.

16.Golub, Van Loan1996Similarity Measures and Dimensionality Reduction Techniques for Time Series Data Mining^{rd} edition. Baltimore, MD: Hopkins University Press.

17.Han and Kamber2005Similarity Measures and Dimensionality Reduction Techniques for Time Series Data MiningMorgan Kaufmann Publishers, CA.

18.Itakura1975Similarity Measures and Dimensionality Reduction Techniques for Time Series Data MiningIEEE Trans Acoustics Speech Signal Process. ASSP 235272

19.KeoghChakrabarti.PazzaniMehrotra.2000Similarity Measures and Dimensionality Reduction Techniques for Time Series Data MiningJournal of

20.KeoghChu.HartPazzani. (2001a). An Online Algorithm for Segmenting Time Series. In Proc. IEEE Intl. Conf. on Data Mining, 289296, 2001.

21.Keogh, Kasetty2002Similarity Measures and Dimensionality Reduction Techniques for Time Series Data MiningProceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 102111

22.Keogh, Ratanamahatana2002Similarity Measures and Dimensionality Reduction Techniques for Time Series Data MiningIn proceedings of the 26th Int’l Conference on Very Large Data Bases. Hong Kong. 406417

23.KeoghChu.HartPazzani.2004Segmenting time series: a survey and novel approach”. In: Last, M., Kandel, A., Bunke, H. (Eds.), Similarity Measures and Dimensionality Reduction Techniques for Time Series Data MiningWorld Scientific Publishing Company, 121

24.KornJagadish.Faloutsos1997Similarity Measures and Dimensionality Reduction Techniques for Time Series Data MiningProceedings of SIGMOD ‘97, Tucson, AZ, 289300

25.LawrenceAltschul.BoguskiLiu.NeuwaldWootton.1993Similarity Measures and Dimensionality Reduction Techniques for Time Series Data Mining

26.LinKeogh.WeiLonardi.2007Similarity Measures and Dimensionality Reduction Techniques for Time Series Data MiningData Mining Knowledge Discovery, 15(2).

27.Mason, Handscomb2003Similarity Measures and Dimensionality Reduction Techniques for Time Series Data MiningChapman & Hall.

28.MueenKeogh.ZhuCash.Westover2009Exact Discovery of Time Series Motifs”. SDM 2009.

29.Ng, Cai2004Indexing Spatio-Temporal Trajectories with Chebyshev Polynomials”. SIGMOD 2004.

30.Pavlidis1976Similarity Measures and Dimensionality Reduction Techniques for Time Series Data MiningIEEE Trans.Comput. C-22, 7 (July).

31.PerngWang.ZhangParker.2000Landmarks: a newmodel for similarity-based pattern querying in time series databases”. Proc.2000 ICDE, 3342

32.RatanamahatanaLin.GunopulosKeogh.VlachosDas.2010Mining Time Series Data”. Data Mining and Knowledge Discovery Handbook, 10491077

33.SalvadorChan.2004Determining the number of clusters/segments in hierarchical clustering/segmentation algorithms”. Proceedings of the 16^{th} IEEE International Conference on Tools with Artificial Intelligence, 2004, 576584

34.Sakoe, Chiba1978Similarity Measures and Dimensionality Reduction Techniques for Time Series Data MiningIEEE Trans Acoustics Speech Signal Process. ASSP 264349

35.Shieh and Keogh2008Similarity Measures and Dimensionality Reduction Techniques for Time Series Data MiningSIGKDD, 623631

36.Tompa, Buhler2001Similarity Measures and Dimensionality Reduction Techniques for Time Series Data MiningIn proceedings of the 5th Int’l Conference on Computational Molecular Biology. Montreal, Canada, Apr 22-25. 6774

37.VlachosKollios.Gunopulos2002Similarity Measures and Dimensionality Reduction Techniques for Time Series Data MiningProc. 2002 ICDE, 673684

38.Von Storch, Zwiers2001Similarity Measures and Dimensionality Reduction Techniques for Time Series Data MiningCambridge Univ Pr. 0-52101-230-9

39.WangYe.KeoghShelton.2008Similarity Measures and Dimensionality Reduction Techniques for Time Series Data MiningJCDL 2008.

40.Yang, Shahabi2004A PCA-based similarity measure for multivariate time series”. In Proceedings of the 2^{nd} ACM international workshop on Multimedia database, 6574

41.YiFaloutsosC. (2000). “Fast Time Sequence Indexing for Arbitrary Lp Norms”. VLDB.

Written By

Carmelo Cassisi, Placido Montalto, Marco Aliotta, Andrea Cannata and Alfredo Pulvirenti

Submitted: 15 April 2012Published: 12 September 2012