Data science is an evolutionary step in interdisciplinary fields incorporating computer science, statistics, engineering, and mathematics. At its core, data science involves using automated and robust approaches to analyze massive amounts of data and to extract informative knowledge from them. Data science transforms traditional ways of analyzing problems and creates powerful new solutions. Diverse computational and analytical techniques contribute to data science. In this chapter, we review and also propose one type of data mining and pattern recognition strategy that has been under development in multiple disciplines (e.g. statistics and machine learning) and has important applications: outlier or novelty detection [1-4].
In biomedical engineering, data science can make healthcare and medical imaging science not only more efficient but also more effective for better outcomes and earlier detection. Outlier and novelty detection in these domains plays an essential role, though it may be underappreciated and not well understood by many biomedical practitioners. From the healthcare point of view, an outlier probably reflects the need for heightened vigilance, if not full-fledged intervention. For example, an abnormally high glucose reading for a diabetic patient is an outlier which may require action. In high-dimensional medical imaging, developing automated and robust outlier detection methods is a critical preprocessing step for any subsequent statistical analysis or medical research.
An exact definition of an outlier or novelty typically depends on hidden assumptions regarding the data structure and the associated detection method, though some definitions are general enough to cope with a variety of data and methods. For example, outliers can be considered as patterns in data that do not conform to a well-defined notion of “normal” behavior, or as observations in a data set which appear to be inconsistent with the remainder of that set. Figure 1 shows outliers in a 2-dimensional dataset. Since most of the observations fall into clusters N1 and N2, these form two “normal” regions, while the points in region O1 as well as points o2 and o3 (in red) are outliers, because they lie sufficiently far from the “normal” regions. Identifying observations inconsistent with the “normal” data, or detecting previously unobserved emergent or novel patterns, is commonly referred to as outlier or novelty detection.
Outlier and novelty detection methods can be divided into several broad families, ranging from classical statistical techniques to machine learning approaches; Sections 2-4 review representatives of each.
Another related topic is robust statistics: estimation techniques that can handle outliers, or at least are less sensitive to their influence. Robust statistics perform well for data drawn from a wide range of probability distributions, especially distributions that are not normal. Robust statistical methods have been developed for many common problems, such as estimating data properties including location and scatter, or estimating model parameters as in regression analysis [10, 11]. One motivation is to produce statistical methods that are not unduly affected by outliers; another is to provide methods with good performance under small departures from a parametric distribution. A typical procedure of the former kind is multivariate estimation of location and covariance, which also supports multivariate outlier detection. In a first step, such approaches search for a minimum number of observations that can, with a certain degree of confidence, be assumed outlier-free. Based on this starting subset, location and covariance are estimated robustly. In a second step, outliers are identified by computing the observations’ distances with respect to these initial estimates.
In this chapter, we review and also propose statistical and machine learning approaches for outlier and novelty detection, as well as robust methods that can handle outliers in data and imaging sciences. In particular, robust statistical techniques based on the Minimum Covariance Determinant (MCD) are introduced in Section 2, which include a classical and fast computation scheme of MCD and a few robust regression strategies. We present our newly developed multivariate Voronoi outlier detection (MVOD) method for time series data and some preliminary results in Section 3. This approach copes with outliers in a multivariate framework via designing and extracting effective attributes or features from the data; Voronoi diagrams allow for automatic configuration of the neighborhood relationship of the data points, facilitating the differentiation of outliers and non-outliers. Section 4 reviews varieties of machine learning methods for novelty detection, with a focus on probabilistic approaches. In Section 5, we present some existing and new technologies related to outliers and novelty in the area of imaging sciences. Section 6 provides concluding remarks of the chapter.
2. Robust statistical methods using Minimum Covariance Determinant (MCD)
The Minimum Covariance Determinant (MCD) estimator is a highly robust estimator of multivariate location and scatter. Since estimating the covariance matrix is the cornerstone of many multivariate statistical methods, the MCD has also been used to develop robust and computationally efficient multivariate techniques.
2.1. MCD and its fast computing algorithm
Given a dataset consisting of p variables and n observations, i.e. an n × p data matrix, we can represent this multivariate data as X = (x_1, ..., x_n)^T, where x_i, for i = 1, ..., n, is the i-th observation, a p-dimensional vector. With sample mean μ and sample covariance matrix Σ, the classical Mahalanobis distance of observation x_i is

MD(x_i) = sqrt( (x_i − μ)^T Σ^{-1} (x_i − μ) ).    (1)
A point with a larger Mahalanobis distance lies further from the center of the data cloud than a point with a smaller one. A robust distance (RD) measure is achieved if we substitute the MCD estimates of mean (μ_MCD) and covariance (Σ_MCD) into Equation (1), which yields Equation (2):

RD(x_i) = sqrt( (x_i − μ_MCD)^T Σ_MCD^{-1} (x_i − μ_MCD) ).    (2)
The classical estimates can be sensitive to outliers, while the MCD estimate is robust [8, 12, 13]. The MCD relies on a subset of the total observations. Choosing this subset makes the algorithm robust because it is less sensitive to the influence of outlying points. Figure 3 illustrates the difference between these two estimates; it is a scatterplot of the distances for an example dataset with 70 observations and 2 variables (i.e. n = 70, p = 2). The two ellipses are two outlier thresholds, determined by the 0.975 chi-square quantile with 2 degrees of freedom when the classical and robust estimates are used, respectively. The dashed blue ellipse marks off the 97.5% outlier threshold for the classical Mahalanobis distance, suggesting that two observations lying beyond the ellipse are outliers. The 97.5% outlier threshold for the robust distance measure is marked off by the solid red ellipse, suggesting ten points are outliers.
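This thresholding scheme can be sketched in a few lines on synthetic data. The example below uses the classical (non-robust) estimates; the 0.975 chi-square quantile for 2 degrees of freedom is hard-coded as 7.3778 to avoid a SciPy dependency, and the squared distance is compared against it:

```python
import numpy as np

def mahalanobis_sq(X, mean, cov):
    """Squared Mahalanobis distance of each row of X from (mean, cov)."""
    diff = X - mean
    return np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov), diff)

rng = np.random.default_rng(0)
X = rng.normal(size=(70, 2))      # n = 70 observations, p = 2 variables
X[:3] += 8.0                      # plant three gross outliers

mean = X.mean(axis=0)
cov = np.cov(X, rowvar=False)
d2 = mahalanobis_sq(X, mean, cov)

CHI2_975_DF2 = 7.3778             # 0.975 chi-square quantile, 2 d.o.f.
flagged = np.where(d2 > CHI2_975_DF2)[0]
```

With gross enough outliers even the classical estimates flag them; the masking problem the MCD addresses appears when contamination is heavier or subtler.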
The MCD has a user-determined parameter, h, which specifies the size of the subset of data on which the estimate is based; it is constrained by (n + p + 1)/2 ≤ h ≤ n. The h observations are chosen such that the determinant of their sample covariance matrix is minimal (found by a sampling algorithm rather than by formally minimizing a loss function). The MCD is optimally designed for elliptically symmetric unimodal distributions, such as the commonly encountered multivariate normal distribution. The MCD is most robust when h = ⌊(n + p + 1)/2⌋, but this choice yields low efficiency (at least for normal probability distributions), which can be increased (while retaining high robustness) by applying reweighted estimators [15, 16]. Robust statistical estimators are commonly evaluated both on their breakdown value and on their influence function. The MCD is a high-breakdown estimator and its influence function is bounded, which is desirable. An alternative strategy, which employs Delaunay triangulation to identify a robust outlier-free subsample in an adaptive way, was presented in [17].
Computing the exact MCD is possible but computationally demanding, as it requires evaluating all subsets of size h. Even though the MCD is a powerful robust estimator, it has only become widely used since the development of the so-called Fast-MCD algorithm [18], which we summarize below. Let H_1 be an h-subset, with h constrained as above, chosen at random from the entire dataset. Compute the mean T_1 and covariance matrix S_1 of H_1, as well as the determinant det(S_1). Then compute the distance of all n observations in the entire dataset (and not just the h comprising the initial subset) using Equation (2) with T_1 and S_1 plugged in. Next, these distances are ordered from smallest to largest. Retain the same number of observations, h, as in the initial subset; but instead of being chosen arbitrarily as in the initial subset, these are chosen to have the smallest distances as defined by the order statistics. Call this subset H_2, and compute T_2, S_2 and det(S_2). Now Equation (3) must be true:

det(S_2) ≤ det(S_1).    (3)
Going from H_1 to H_2 is called a C-step, for “concentration step”, because the algorithm concentrates on the h observations with the smallest distances and S_2 is more concentrated (or equivalently, has a smaller determinant) than S_1. The C-step is repeated many times, with each run starting from a different random initial h-subset. The 10 subsets that yield the smallest determinants overall are retained and further concentrated until convergence is met.
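A toy sketch of the C-step iteration follows. It is simplified relative to the published Fast-MCD (which, for example, builds initial subsets from (p+1)-point seeds and keeps only the 10 best solutions for final refinement); here we simply run random h-subsets to convergence and keep the best:

```python
import numpy as np

def c_step(X, idx, h):
    """One concentration step: refit mean/covariance on the current subset,
    then keep the h observations with the smallest Mahalanobis distances."""
    sub = X[idx]
    T, S = sub.mean(axis=0), np.cov(sub, rowvar=False)
    diff = X - T
    d2 = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(S), diff)
    return np.sort(np.argsort(d2)[:h])

def mcd_sketch(X, h, n_starts=20, seed=0):
    """Toy Fast-MCD: random starts, C-steps to convergence, keep best subset."""
    rng = np.random.default_rng(seed)
    best = (np.inf, None)
    for _ in range(n_starts):
        idx = np.sort(rng.choice(len(X), size=h, replace=False))
        for _ in range(50):
            new_idx = c_step(X, idx, h)
            if np.array_equal(new_idx, idx):   # converged: det stops decreasing
                break
            idx = new_idx
        det = np.linalg.det(np.cov(X[idx], rowvar=False))
        if det < best[0]:
            best = (det, idx)
    sub = X[best[1]]
    return sub.mean(axis=0), np.cov(sub, rowvar=False)

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(size=(60, 2)),           # 60 inliers around the origin
               rng.normal(8.0, 0.5, size=(10, 2))])  # 10 clustered outliers
mu_mcd, S_mcd = mcd_sketch(X, h=40)
```

Each C-step can only decrease the determinant (Equation (3)), so the inner loop terminates; the robust location estimate ends up near the inlier center despite the contaminated sample.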
2.2. Robust multivariate regression and Multivariate Least-Trimmed Squares (MLTS) estimator
Section 2.1 introduced the robust MCD estimator and showed how the MCD can be computed efficiently. In this section, we review different frameworks for applying the MCD estimator to multivariate regression. These methods offer robust alternatives to standard multiple regression analysis.
We first look at the robust multivariate regression approach of [19]. Suppose we have a full dataset of predictors and responses containing no outliers; computing the regression parameter estimates from the full dataset using a least squares procedure will then yield accurate results. When outliers are present in the dataset, the MCD is used instead to search for a subset of size h whose covariance matrix has the smallest determinant, with h constrained as in Section 2.1, and the regression estimates are computed from that subset.
Different from the above robust multivariate regression, the multivariate least trimmed squares (MLTS) estimator of [20] first fits a regression model to a subset of the data and then calculates the covariance matrix of the residuals. The estimator is defined by minimizing a trimmed sum of squared Mahalanobis distances of the residuals, and can be computed by a fast algorithm. Let us consider the classical multivariate regression framework. Assume we have a sample of data (x_i, y_i), i = 1, ..., n, with p-dimensional predictors and q-dimensional responses; let X denote the n × p design (or predictor) matrix and Y the n × q response matrix. The regression model is:

Y = X B + E,    (4)

where B is the p × q matrix of regression coefficients and the rows of E are the error terms. The classical least squares estimator for the regression parameter B is given by:

B_LS = (X^T X)^{-1} X^T Y,    (5)

and the classical estimator of the scatter matrix of the errors is:

Σ_LS = (Y − X B_LS)^T (Y − X B_LS) / (n − p).    (6)

These classical least squares estimators are sensitive to outliers. A robust alternative based on the residuals is obtained as follows. For any B, let r_i(B) = y_i − B^T x_i denote the residual of the i-th observation from the fitted regression model. Furthermore, let H be the collection of all subsets H ⊂ {1, ..., n} of size h. For any H ∈ H, denote by B_H the least-squares fit based solely on the observations in H. In addition, for any B and any H, denote the covariance matrix of the residuals with respect to the fit B, restricted to the subset H, as:

Σ_H(B) = (1/h) ∑_{i ∈ H} r_i(B) r_i(B)^T.    (7)

The MLTS estimator is then defined as:

B_MLTS = B_H*,  where H* = argmin_{H ∈ H} det( Σ_H(B_H) ).    (8)

The covariance of the errors can be estimated by

Σ_MLTS = c_α Σ_H*(B_MLTS),    (9)

where c_α is a consistency factor. The h observations corresponding to the residuals with the smallest determinant of the covariance matrix can then be used to give robust estimates of the regression parameters.
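Since the MLTS definition is a minimum over h-subsets, it can be illustrated by brute-force enumeration, which is only feasible for tiny n (the fast algorithm mentioned above uses concentration steps instead). The data and names below are illustrative:

```python
import numpy as np
from itertools import combinations

def mlts_exhaustive(X, Y, h):
    """MLTS by brute force over all h-subsets: pick the subset whose
    least-squares fit gives residuals with minimal det of their scatter."""
    n = X.shape[0]
    best = (np.inf, None)
    for H in combinations(range(n), h):
        H = list(H)
        B, *_ = np.linalg.lstsq(X[H], Y[H], rcond=None)  # LS fit on subset H
        R = Y[H] - X[H] @ B                              # residuals on H
        det = np.linalg.det(np.atleast_2d(R.T @ R / h))  # det of residual scatter
        if det < best[0]:
            best = (det, B)
    return best[1]

# Straight line y = 2x with two grossly corrupted responses.
x = np.linspace(1.0, 12.0, 12)
X = np.column_stack([np.ones(12), x])       # intercept + slope design
Y = (2.0 * x).reshape(-1, 1)
Y[0, 0] += 30.0                             # outlying responses
Y[1, 0] -= 25.0

B_mlts = mlts_exhaustive(X, Y, h=9)
```

Any subset containing a corrupted point has large residual scatter, so the winning subset excludes both outliers and the recovered slope is essentially exact.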
Using the MLTS to estimate the parameters of the vector autoregressive (VAR) model was presented in [21]. The VAR model is popular for modeling multiple time series. Estimation of its parameters based on a typical least squares method is unreliable when outliers are present in the data. Developing robust procedures for multiple time series analysis is even more crucial than for univariate time series analysis because of the correlation structure across series. Experimental results in [21] show that applying the reweighted MLTS procedure to the VAR model leads to robust multivariate regression estimators with improved performance.
3. Multivariate Voronoi Outlier Detection (MVOD) for time series
In order to better analyze multivariate time series data, we have recently proposed a general outlier detection method based on the mathematical principles of Voronoi diagrams. It is general because different attributes or features can be extracted from the data for Voronoi diagram construction. These attributes or features can be designed based on the nature of the data and the outliers. This has the potential to increase the accuracy and precision of outlier detection for specific application problems.
3.1. Background on Voronoi diagram
Our new method requires a Voronoi diagram, which is composed of Voronoi cells. A Voronoi diagram is a way of dividing the plane into regions. Assume we have a set of points P = {p_1, ..., p_n} in the Euclidean plane. Let V(p_i) denote a Voronoi cell: the subdivision of the plane containing all points that are closer, or as close, to p_i than to any other point in P. This is expressed formally in Equation (10):

V(p_i) = { x : d(x, p_i) ≤ d(x, p_j) for all j ≠ i },    (10)

where d(·, ·) is the Euclidean distance function. The set of all Voronoi cells V(p_i) for all points p_i ∈ P comprises a Voronoi diagram.
Figure 4 shows part of a Voronoi diagram, assuming Euclidean distance between the points; with a different distance metric, the Voronoi diagram would be configured differently. The plane is decomposed into convex polygonal regions, one for each point p_i. The vertices (or nodes) of these polygons are called Voronoi vertices, and the polygon boundaries shared by two cells are called Voronoi edges.
3.2. Our proposed MVOD method
The Voronoi Outlier Index (VOInd) used in our Multivariate Voronoi Outlier Detection (MVOD) method is based upon the Voronoi notion of nearest neighbors. For a point p_i of the set P, the points whose Voronoi cells share an edge with the cell of p_i are the Voronoi nearest neighbors of p_i, denoted VNN(p_i); in Figure 4, these are the points of the cells adjacent to the highlighted cell. For each point in the data set, our method uses these nearest neighbors to compute an index (i.e. VOInd) of how likely that point is an outlier. It is multivariate because it aggregates information across all individual time series, thus retaining features which might be common to the entire interlocking set of variables.
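With SciPy's Voronoi routine, the neighborhood relation can be read directly off the ridge list. The score below (mean distance to the Voronoi nearest neighbors) is only a hypothetical stand-in for the chapter's actual VOInd, included to show how the Voronoi neighborhood drives an outlier index:

```python
import numpy as np
from collections import defaultdict
from scipy.spatial import Voronoi

def voronoi_neighbors(points):
    """Map point index -> indices whose Voronoi cells share an edge (VNN)."""
    nbrs = defaultdict(set)
    for i, j in Voronoi(points).ridge_points:   # each ridge separates two cells
        nbrs[int(i)].add(int(j))
        nbrs[int(j)].add(int(i))
    return nbrs

def outlier_scores(points):
    """Illustrative per-point score: mean distance to Voronoi nearest
    neighbors (a stand-in for VOInd, not the chapter's exact definition)."""
    P = np.asarray(points, dtype=float)
    nbrs = voronoi_neighbors(P)
    return np.array([np.mean([np.linalg.norm(P[i] - P[j]) for j in nbrs[i]])
                     for i in range(len(P))])

rng = np.random.default_rng(3)
pts = np.vstack([rng.normal(size=(20, 2)), [[10.0, 10.0]]])  # one far point
scores = outlier_scores(pts)
```

The isolated point's neighbors all lie far away, so its score dominates, while points inside the cluster have neighbors a short distance away.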
Our method is based upon the geometric principles of Voronoi diagrams for defining the neighborhood relationship of the data points, which facilitates the assignment of group membership (i.e. outliers versus non-outliers). Construction of a two-dimensional Voronoi diagram requires two coordinates for each data point. Based on the nature of the data and the nature of the outliers to be identified, we can embed their attributes into these coordinates by extracting different valid features from the data. Here, we present one such case of the MVOD framework for feature extraction, but many others are also possible, including nonparametric forms. Figure 5 overviews the process, and the rest of this subsection explains the steps in more detail.
Although a regression model is used in Step 2 to extract the feature value, our method does not in fact require this model. With either Step 1 or Step 2 alone, we obtain a corresponding nonparametric or parametric variant, respectively, and either could be suitable for particular applications or datasets.
Note that a Voronoi outlier factor was used as the index in [24]; however, that approach was completely univariate in nature, since the x- and y-coordinates were based on a single univariate time series. One of our primary motivations for this study is to create a novel and general MVOD method that can detect outliers in time series data in a multivariate framework with multiple, interlocking sets of variables.
3.3. Experimental evaluation and results
Definition of TPR and FPR:

TPR = TP / (TP + FN)
FPR = FP / (FP + TN)
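These two rates can be computed directly from 0/1 outlier labels; a minimal helper:

```python
def tpr_fpr(y_true, y_pred):
    """TPR = TP/(TP+FN), FPR = FP/(FP+TN); labels are 1 = outlier, 0 = normal."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp / (tp + fn), fp / (fp + tn)

tpr_fpr([1, 1, 0, 0, 0, 0], [1, 0, 1, 0, 0, 0])  # returns (0.5, 0.25)
```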
The alpha parameter in the MLTS method determines both the size of the subset used and a critical value of a chi-square distribution: if an observation's residual distance exceeds this critical value, the MLTS method flags the observation as an outlier. However, it is critical to note that there is no one-to-one correspondence between the chosen alpha value and the number of outliers flagged. For instance, one could set alpha to 0.10 but have only 2 of 100 observations flagged as outliers. Partly for this reason we considered a range of alpha values and then averaged across this range for a fair comparison with the MVOD method. For all simulated time series, we considered alpha between 0.01 and 0.20.
In the results presented next, we obtained the TPR and FPR for the two methods in the following way. For a given number of outliers with a specific outlier magnitude, we averaged a total of five cases. The five cases averaged always included the threshold (MVOD) or alpha value (MLTS) corresponding with the number of outliers, but also contained the preceding four cases as well. For instance, in the 10 outlier case, we took the results for threshold=10 (MVOD), as well as thresholds of 9, 8, 7 and 6. In the corresponding MLTS case, we would have taken alpha=0.10, 0.09, 0.08, 0.07 and 0.06. The TPR and FPR for each of these five cases for each method were averaged to obtain the values shown in Table 2.
4. Machine learning methods for novelty detection
Novelty detection can be considered as the task of classifying test data that differ in some respect from the data available during training. It may be approached within the framework of “one-class classification”, in which a model is built to describe the “normal” training data. Novelty detection methods can be categorized into several areas: probabilistic, distance-based, reconstruction-based, domain-based, and information-theoretic techniques. In this section, we mainly introduce the first category, probabilistic approaches, and briefly summarize the others.
4.1. Probabilistic approaches
Probabilistic approaches to novelty detection are based on estimating the generative probability density function of the data; the estimate may then be used to set thresholds defining the boundaries of “normality” in the data space, against which a test sample can be checked. Statistical hypothesis tests are the simplest statistical techniques for novelty detection. Among the different statistical tests available, here we concentrate on more advanced statistical modeling methods involving complex, multivariate data distributions. Techniques for estimating the underlying density from multivariate training data broadly fall into parametric and nonparametric methods. The former impose a restrictive model on the data, leading to a large bias when the model does not fit the data; the latter build a very flexible model with fewer assumptions, but require a large sample size to reliably fit all free parameters as the model grows.
In parametric approaches, the most widely used distributional form for continuous variables is the Gaussian. The parameters involved are estimated from the training data via maximum likelihood estimation.
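A minimal 1-D sketch of such a parametric approach follows (synthetic data; using the 1% lowest-density quantile of the training data as the novelty boundary is an illustrative choice, not a prescription from the chapter):

```python
import numpy as np

rng = np.random.default_rng(4)
train = rng.normal(0.0, 1.0, size=500)        # "normal" training data

mu, sigma = train.mean(), train.std()         # Gaussian maximum likelihood fit

def log_density(x):
    """Log of the fitted Gaussian density."""
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))

# Novelty boundary: the 1% lowest-density level observed on the training set.
threshold = np.quantile(log_density(train), 0.01)

def is_novel(x):
    return bool(log_density(x) < threshold)
```

A test point far in the tail falls below the training-density threshold and is flagged as novel, while a typical point is not.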
Non-parametric methods do not assume a fixed model structure; the model grows in size as necessary to fit the data and accommodate its complexity. A common non-parametric approach to probabilistic density estimation is the kernel density estimator, which estimates the probability density function with a large number of kernels over the data space. The kernel density estimator places a kernel (e.g. a Gaussian) on each data point and then sums the contributions from a localized neighborhood of each kernel. This is the so-called Parzen windows estimator [30].
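A minimal 1-D Parzen-window sketch (Gaussian kernel; the bandwidth h = 0.5 is an arbitrary illustrative choice):

```python
import numpy as np

def parzen_density(x, data, h=0.5):
    """1-D Parzen-window estimate: average of Gaussian kernels, one per point."""
    z = (np.atleast_1d(x)[None, :] - np.asarray(data)[:, None]) / h
    k = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)   # Gaussian kernel values
    return k.mean(axis=0) / h                          # average over data points

rng = np.random.default_rng(5)
data = rng.normal(size=300)
p_center, p_far = parzen_density([0.0, 6.0], data)
```

The estimated density is high near the bulk of the data and essentially zero far from it, which is what a density-based novelty score exploits.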
4.2. Other categories
Distance-based approaches, such as clustering or nearest-neighbor methods [35-37], are another type of technique that can be used for classification or for estimating the probability density function of the data. The underlying assumption is that “normal” data are tightly clustered, while novel data lie far from their nearest neighbors. These methods use well-defined distance metrics to compute the distance (or similarity) between two data points.
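Under this assumption, a simple novelty score is the distance to the k-th nearest training point (the choice k = 3 below is illustrative):

```python
import numpy as np

def knn_novelty(train, x, k=3):
    """Novelty score = distance from x to its k-th nearest training point."""
    d = np.linalg.norm(np.asarray(train) - np.asarray(x), axis=1)
    return np.sort(d)[k - 1]

rng = np.random.default_rng(6)
train = rng.normal(size=(100, 2))          # tight "normal" cluster
near = knn_novelty(train, [0.0, 0.0])      # query inside the cluster
far = knn_novelty(train, [8.0, 8.0])       # query far from any training point
```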
Reconstruction-based methods involve training a regression model on the training data [3, 38, 39]. The distance between a test vector and the output of the system (i.e. the reconstruction error) is related to the novelty score, which will be high when “abnormal” data occur. For instance, neural networks can be used in this way and show many of the same advantages for novelty detection as they do for typical classification applications. Another type of reconstruction-based novelty detection is subspace-based techniques, which assume that the data can be projected or embedded into a lower-dimensional subspace in which “normal” and “abnormal” data are easier to discriminate.
Domain-based methods aim to describe a domain containing the “normal” data by placing a boundary around the “normal” class, following the distribution of the data but without explicitly modeling that distribution [40, 41]. These techniques are usually insensitive to the specific sampling and density of the class of interest: location relative to the boundary is the criterion that determines the class membership of unknown data. The novelty-detection support vector machines (SVMs) are the “one-class SVMs”, which set the location of the novelty boundary based only on the data lying closest to it in the transformed feature space. That is, the novelty boundary is determined without considering any data that are not support vectors.
Information-theoretic methods calculate the information content of a dataset with measures such as entropy, relative entropy, and Kolmogorov complexity [42, 43]. The key idea is that novel data significantly alter the information content of a dataset. A common procedure is to compute a metric over the entire dataset and then identify the subset of points whose elimination from the dataset causes the largest change in the metric; the data contained in this subset are then assumed to be novel.
5. Robust estimator and outlier detection in high-dimensional medical imaging
The statistical analysis of medical images is challenging, not only because of the high dimensionality and low signal-to-noise ratio of the data, but also because of the variety of errors in the image acquisition process, such as scanner instabilities, acquisition artifacts, and issues associated with the experimental protocol. Furthermore, the populations under study typically present high variability [45, 46], so the corresponding imaging data may contain uncommon though technically correct observations; such outliers deviating from normality can be numerous. With the emergence of large medical imaging databases, developing automated outlier detection methods has become a critical preprocessing step for any subsequent statistical analysis or group study. In addition, medical imaging data are usually strongly correlated, so outlier detection approaches based on multivariate models are crucial and desirable. Procedures using the classical MCD estimator, however, are not well suited to such high-dimensional data.
Several extensions to the classical outlier detection framework have been proposed to handle high-dimensional imaging data. Specifically, the MCD robust estimator was modified so that it can be used to detect outliers when the number of observations is small compared to the number of available features; this is achieved by introducing regularization into the definition and estimation of the MCD. Three procedures were presented and compared: two explicit regularization schemes, including ridge regularization, and random projections. The idea of the latter is to run the MCD estimator on datasets of reduced dimensionality, with the reduction done by projecting onto a randomly selected subspace. In addition, the parametric approach of the regularized MCD estimators was compared to a non-parametric procedure, the one-class SVM algorithm (see Section 4). Experimental results on both simulated and real data show that regularization performs generally well in simulations, but random projections outperform it in practice on non-Gaussian and, more importantly, on real neuroimaging data. The one-class SVM works well on unimodal datasets and has strong potential if its parameters can be set correctly.
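Why regularization is needed when the feature count exceeds the sample size can be seen directly from the sample covariance matrix; the ridge shift below is a generic illustration, not the published estimators:

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(10, 50))            # n = 10 observations, p = 50 features

S = np.cov(X, rowvar=False)              # rank <= n - 1 = 9: singular for p = 50
lam = 0.1
S_ridge = S + lam * np.eye(50)           # ridge shift restores invertibility

eigs = np.linalg.eigvalsh(S)             # smallest eigenvalue ~ 0 (singular)
eigs_ridge = np.linalg.eigvalsh(S_ridge) # smallest eigenvalue ~ lam (invertible)
```

Without such a shift, the Mahalanobis distances of Equation (2), and therefore the MCD itself, are undefined, since the covariance matrix cannot be inverted.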
Outlier detection methods like those described above can serve as a statistical control on subject inclusion in neuroimaging. However, whether outliers should be discarded at all, and if so with what tolerance, is sometimes controversial. An alternative strategy is to utilize outlier-resistant techniques for statistical inference, which also compensate for inexact assumptions such as data normality and dataset homogeneity. Robust techniques are especially useful when a large number of regressions are tested and the assumptions cannot be evaluated for each individual regression, as with neuroimaging data.
Both individual-subject and group analyses are required in neuroimaging. At the single-subject level, a multiple regression model is typically used for the time series data at each voxel [49, 50], and outliers (or other assumption violations) in the time series impact the model fit; robust regression can minimize the influence of these outliers. At the group level, after spatial normalization, a common strategy is to save the regression parameters for each subject at each voxel and then perform a test on the parameter values; robust regression used at this level can minimize the influence of outlying subjects. Wager et al. used simulations to evaluate several robust techniques against ordinary least squares regression, and applied robust regression to second-level group analyses in three real fMRI datasets. Experimental results demonstrate that robust iteratively reweighted least squares (IRLS) at the second level is computationally efficient; it increases statistical power and decreases false positive rates when outliers are present, and when no outliers are present it still controls false positive rates at an appropriate level. In summary, IRLS shows significant advantages in group data analysis and in hemodynamic response shape estimation for fMRI time series data.
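A generic IRLS sketch with Huber weights is below. This is a textbook variant, not necessarily the exact procedure evaluated by Wager et al.; the tuning constant c = 1.345 and the MAD-based scale estimate are standard choices:

```python
import numpy as np

def irls_huber(X, y, c=1.345, n_iter=50):
    """Iteratively reweighted least squares with Huber weights (a sketch)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]        # start from OLS
    for _ in range(n_iter):
        r = y - X @ beta
        s = np.median(np.abs(r - np.median(r))) / 0.6745 + 1e-12  # MAD scale
        u = np.abs(r) / s
        w = np.where(u <= c, 1.0, c / u)               # downweight large residuals
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return beta

x = np.linspace(0.0, 1.0, 50)
X = np.column_stack([np.ones(50), x])                  # intercept + slope
y = 1.0 + 3.0 * x
y[:5] += 10.0                                          # five outlying observations

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]        # badly biased by outliers
beta_irls = irls_huber(X, y)                           # close to (1, 3)
```

The reweighting drives the outliers' influence toward zero, so IRLS recovers the underlying line while ordinary least squares is pulled far off.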
Outlier and novelty detection is a primary step in many data mining and analysis applications, including healthcare and medical research. In this chapter, we introduced and reviewed statistical and machine learning methods for outlier and novelty detection, as well as robust approaches for handling outliers in data and imaging sciences. In particular, we presented our new method for outlier detection in time series data based on the Voronoi diagram (i.e. MVOD). Our method has several key advantages. First, it copes with outliers in a multivariate framework by accounting for multivariate structure in the data. Second, it is flexible in extracting valid features for differentiating outliers from non-outliers, in the sense that we have the option of using or not using a parametric model. Lastly, Voronoi diagrams capture the geometric relationships embedded in the data points. Initial experimental results show that our MVOD method can lead to accurate, sensitive, and robust identification of outliers in multivariate time series.
It is often difficult to give a precise definition of an outlier or novelty, and recommending an optimal approach for outlier or novelty detection is more challenging still. The variety of practical and theoretical considerations arising in real-world datasets leads to the variety of techniques in use: there is no single universally applicable detection method, since the appropriate choice depends on the application domain, the type and dimensionality of the data, the availability of training data, and so on. Based on the application and the nature of the associated data, developing suitable computational methods that can robustly and efficiently extract useful quantitative information from big data remains an open challenge that is gaining increasing interest in data and imaging sciences.
This work is supported in part by a grant from the National Institutes of Health, K25AG033725.
Aggarwal CC. Outlier Analysis. New York: Springer Science + Business Media; 2013.
Markou M, Singh S. Novelty detection: a review, part 1: statistical approaches. Signal Processing 2003; 83(12): 2481-2497.
Markou M, Singh S. Novelty detection: a review, part 2: neural network based approaches. Signal Processing 2003; 83(12): 2499-2521.
Zwilling CE, Wang MY. Multivariate Voronoi outlier detection for time series. In: Proc. IEEE Healthcare Innovation Point-Of-Care Technologies Conference 2014; in press.
Barnett V, Lewis T. Outliers in Statistical Data. John Wiley and Sons; 1994.
Tarassenko L, Clifton DA, Bannister PR, King S, King D. Novelty Detection. In: Boller C, Chang F-K, Fujino Y (eds.) Encyclopedia of Structural Health Monitoring. 2009. Chapter 35.
Davies L, Gather U. The identification of multiple outliers. Journal of American Statistical Association 1993; 88(423): 782-792.
Rousseeuw P. Multivariate estimation with high breakdown point. In: Grossmann W et al. (eds.) Mathematical Statistics and Applications. Budapest: Akademiai Kiado; 1985. Vol. B, p283-297.
Ben-Gal I. Outlier detection. In: Maimon O, Rockach L (eds.) Data Mining and Knowledge Discovery Handbook: A Complete Guide for Practitioners and Researchers. Kluwer Academic Publishers; 2005. Chapter 1.
Becker C, Fried R, Kuhnt S, editors. Robustness and Complex Data Structures. Berlin Heidelberg: Springer-Verlag; 2013.
Huber PJ, Ronchetti EM. Robust Statistics. John Wiley & Sons, Inc.; 2009.
Rousseeuw PJ. Least median of squares regression. Journal of the American Statistical Association 1984; 79: 871-880.
Hubert M, Debruyne M. Minimum covariance determinant. Wiley interdisciplinary reviews: Computational statistics 2010; 2: 36-43.
Croux C, Haesbroeck G. Influence function and efficiency of the Minimum Covariance Determinant. Journal of Multivariate Analysis 1999; 71: 161-190.
Lopuhaa HP, Rousseeuw PJ. Breakdown points of affine equivariant estimators of multivariate location and covariance matrices. Annals of Statistics 1991; 19: 229-248.
Lopuhaa HP. Asymptotics of reweighted estimators of multivariate location and scatter. Annals of Statistics 1999; 27: 1638-1665.
Liebscher S, Kirschstein T, Becker C. RDELA: a Delaunay-triangulation-based, location and covariance estimator with high breakdown point. Statistics and Computing 2013; 23: 677-688.
Rousseeuw PJ, Driessen KV. A fast algorithm for the minimum covariance determinant estimator. Technometrics 1999; 41(3): 212-223.
Rousseeuw PJ, Aelst SV, Driessen KV, Agullo J. Robust multivariate regression. Technometrics 2004; 46(3): 293-305.
Agullo J, Croux C, Aelst SV. The multivariate least-trimmed squares estimator. Journal of Multivariate Analysis 2008; 99: 311-338.
Croux C, Joossens K. Robust estimation of the vector autoregressive model by a least trimmed squares procedure. In: Proceedings in Computational Statistics 2008; p489-501.
Preparata FP, Shamos MI. Computational Geometry: An Introduction. Springer; 1985.
Pearson RK. Exploring Data in Engineering, the Sciences, and Medicine. Oxford University Press; 2011.
Qu J. Outlier detection based on Voronoi diagram. In: Proceedings of the ADMA International Conference on Advanced Data Mining and Applications 2008; p516-523.
Neumaier A, Schneider T. Algorithm 808: ARfit, a Matlab package for the estimation of parameters and eigenmodes of multivariate autoregressive models. ACM Transactions on Mathematical Software 2001; 27: 58-65.
Bishop CM. Pattern Recognition and Machine Learning. Springer, New York; 2006.
Carvalho A, Tanner M. Modelling nonlinear count time series with local mixtures of Poisson autoregressions. Comput. Stat. Data Anal. 2007; 51(11): 5266-5294.
Hoare S, Asbridge D, Beatty P. On-line novelty detection for artefact identification in automatic anaesthesia record keeping. Med. Eng. Phys. 2002; 24(10): 673–681.
Quinn J, Williams C. Known unknowns: novelty detection in condition monitoring. In: Marti J et al. (eds.) Pattern Recognition and Image Analysis, LNCS 4477. 2007. p1–6.
Parzen E. On estimation of a probability density function and mode. Ann. Math. Stat. 1962; 33(3): 1065–1076.
Tarassenko L, Hayton P, Cerneaz N, Brady M. Novelty detection for the identification of masses in mammograms. In: Proceedings of the 4th International Conference on Artificial Neural Networks, IET. 1995. p442–447.
Kemmler M, Rodner E, Denzler J. One-class classification with Gaussian processes. In: Asian Conference on Computer Vision (ACCV), vol. 6493. 2011. p489–500.
Basseville M, Nikiforov IV. Detection of Abrupt Changes: Theory and Application. Prentice Hall, Englewood Cliffs; 1993.
Reeves J, Chen J, Wang XL, Lund R, Lu QQ. A review and comparison of changepoint detection techniques for climate data. J. Appl. Meteorol. Climatol. 2007; 46(6): 900–915.
Pires A, Santos-Pereira C. Using clustering and robust estimators to detect outliers in multivariate data. In: Proceedings of the International Conference on Robust Statistics. 2005.
Yong S, Deng J, Purvis M. Wildlife video key-frame extraction based on novelty detection in semantic context. Multimed. Tools Appl. 2013; 62(2): 359–376.
Hautamaki V, Karkkainen I, Franti P. Outlier detection using k-nearest neighbor graph. In: Proceedings of the 17th International Conference on Pattern Recognition, vol. 3. 2004. p430–433.
Kit D, Sullivan B, Ballard D. Novelty detection using growing neural gas for visuo-spatial memory. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems. 2011, p1194–1200.
Xiao Y, Wang H, Xu W, Zhou J. L1 norm based KPCA for novelty detection. Pattern Recognit. 2013; 46(1): 389–396.
Schölkopf B, Williamson R, Smola A, Shawe-Taylor J, Platt J. Support vector method for novelty detection. Adv. Neural Inf. Process. Syst. 2000; 12(3): 582–588.
Le T, Tran D, Ma W, Sharma D. Multiple distribution data description learning algorithm for novelty detection. Adv. Knowl. Discov. Data Min. 6635. 2011. p246–257.
He Z, Deng S, Xu X, Huang J. A fast greedy algorithm for outlier mining. Adv. Knowl. Discov. Data Min. 3918. 2006. p567–576.
Filippone M, Sanguinetti G. Information theoretic novelty detection. Pattern Recognition 2010; 43(3): 805–814.
Wang MY, Zhou C, Xia J. Statistical analysis for recovery of structure and function from brain images. In: Komorowska MA, Olsztynska-Janus S (eds.) Biomedical Engineering, Trends, Researches and Technologies. 2011. p169-190.
Chen G, Fedorenko E, Kanwisher NG, Golland P. Deformation-invariant sparse coding for modeling spatial variability of functional patterns in the brain. In: Proc. Neural Information Processing Systems Workshop on Machine Learning and Interpretation in Neuroimaging, LNAI 7263. 2012. p68-75.
Staib LH, Wang YM. Methods for nonrigid image registration. In: Bayro-Corrochano E (ed.) Handbook of Geometric Computing: Applications in Pattern Recognition, Computer Vision, Neuralcomputing, and Robotics. Springer-Verlag; 2005. p571-602.
Wang MY, Xia J. Unified framework for robust estimation of brain networks from fMRI using temporal and spatial correlation analyses. IEEE Trans. on Medical Imaging 2009; 28(8): 1296-1307.
Fritsch V, Varoquaux G, Thyreau B, Poline J-B, Thirion B. Detecting outliers in high-dimensional neuroimaging datasets with robust covariance estimators. Medical Image Analysis 2012; 16(7): 1359-1370.
Worsley KJ, Friston KJ. Analysis of fMRI time-series revisited - again. NeuroImage 1995; 2(3): 173–181.
Worsley KJ, Poline JB, Friston KJ, Evans AC. Characterizing the response of PET and fMRI data using multivariate linear models. NeuroImage 1997; 6(4): 305–319.
Wager TD, Keller MC, Lacey SC, Jonides J. Increased sensitivity in neuroimaging analyses using robust regression. NeuroImage 2005; 26: 99-113.
Singh K, Upadhyaya S. Outlier detection: applications and techniques. International Journal of Computer Science Issues 2012; 9(1): 307-323.