1. Introduction
Epilepsy is a neurological disorder that changes the observable behavior of an individual, up to the point of inducing complete loss of consciousness. Pharmaceutical drugs may reduce or eliminate the problems of epilepsy, but not all people respond favorably to pharmaceuticals, and some find the side effects undesirable. EEG-based epilepsy prediction may offer an acceptable alternative or complement to pharmaceuticals. Invasive, intracranial EEG provides signals directly from the brain, without the muscular activity that contaminates noninvasive, scalp EEG. However, intracranial EEG requires surgery, which increases the risk and cost of health care while reducing the number of people who can receive treatment. Algorithms to predict the seizure event (the ictal state) may lead to new treatments for chronic epilepsy. Solutions that involve noninvasive procedures may result in treatments for the largest segment of the population.
2. Background
Epilepsy prediction means more than 1 minute of forewarning before there is any visible indication that a seizure will occur. The physician does not label the preictal periods that precede the seizure (states that may indicate a seizure is near); event characterization only labels the start time of the seizure. Consequently, labeled data for the preictal state do not exist, but are necessary to train a Support Vector Machine (SVM). Other researchers address this problem by assuming that the preictal phase occurs immediately prior to a seizure [1]; see Figure 1 for an example.
The labeling scheme of Figure 1 results in better than random predictions [1] under the assumption that the preictal region immediately precedes the seizure and may be exploited for epilepsy prediction. This SVM approach provides the most obvious way to label the training and testing data without any extra information being available about the EEG.
Assuming preictal dynamics occur within an hour of the seizure has the added benefit of being more likely to satisfy caregivers' requests for forewarning within an hour of the seizure event. Netoff et al. achieve a sensitivity of 77% in classifying the preictal region, with no false positives, using the above approach [1]. This level of accuracy is not high enough for a marketable prediction algorithm, but it suggests that indicators of a seizure occur within the hour before a typical seizure. Netoff et al. use a “5 minute prediction horizon” when they label the preictal region: they classify cutsets within 5 minutes of the seizure as preictal and calculate classification rates according to that labeling scheme. They assert that the short time frame makes the computational difficulty of the algorithm much more manageable than algorithms with fewer restrictions on the location of the preictal region. They also have a second stage of processing, in which they look for 3 out of 5 preictal indicators in a concentrated bundle in order to declare a prediction [1].
The assumption of preictal indications near the event seems sound because a seizure resembles a dynamical phase transition. More specifically, the brain activity changes from some “normal” phase of activity into hypersynchronous activity. The present work assumes that the brain dynamics within an hour of the seizure are approaching a phase transition, corresponding to a measurable change in the scalp EEG. A simple example of a phase transition is liquid water becoming steam due to changes in pressure and temperature. However, scalp EEG exhibits nonlinear, chaotic features that are extremely difficult to predict over long periods and are extremely sensitive to initial conditions. Consequently, seizure prediction in a system as complex as the brain is very difficult. Indeed, Stacey et al. [2] review the present-day EEG evidence for a preictal state.
One must also choose whether to use monopolar (single channel) or bipolar EEG (the difference between two monopolar channels). Mirowski et al. assert that epilepsy can be predicted more effectively with bipolar features [3] because of changes in the brain's ability to synchronize regions during a seizure. Mirowski et al. consider their preictal period to be 2 hours. They assert, “[most] current seizure prediction approaches can be summarized into (1) extracting features from EEG and (2) classifying them (and hence the patient's state) into preictal or interictal”. They enumerate the bipolar feature set in Figure 2.
Figure 2 enumerates a feature set from all unique channel pairs [3]. After the enumeration, they use a grid search to find appropriate parameters for their SVM with a Gaussian kernel. Mirowski et al. use intracranial EEG data and obtain 100% accuracy for patient-specific machine learning models. However, no single model provides 100% accuracy for all patients [3], so they choose from among a variety of algorithms to achieve high accuracy on a patient-specific basis.
By contrast, the present work uses noninvasive, scalp EEG. Moreover, the present work uses a SVM to extract seizure forewarning from the entire patient population. The goal is high accuracy. The long-term objective (not addressed in the present work) is lower healthcare cost by using one algorithm for all patients to analyze scalp EEG on a smartphone.
Previous work by Hively et al. [4] obtained forewarning of epileptic events from scalp EEG using phase-space dissimilarity measures and an ensemble voting method; the present work replaces that voting method with SVM classification.
3. Phase-space analysis
We use one bipolar channel of scalp EEG.
These data were uniformly sampled in time at 250 Hz.
A patented zero-phase, quadratic filter enables analysis of scalp EEG by removing electrical activity from eye blinks and other muscular artifacts, which otherwise obscure the event forewarning. This filter retains the nonlinear amplitude and phase information [14]. The filter uses a moving window of 2w + 1 points, where w is the half-width of the window in sampled-point units.
A tradeoff is required between coarseness in the data to exclude noise and precision in the data to accurately follow the dynamics. Thus, the artifact-filtered data are symbolized into S discrete bins via Eq. (1).
Here, INT converts a decimal number to the closest lower integer. Takens' theorem [15] gives a smooth, nonintersecting dynamical reconstruction in a sufficiently high dimensional space by a time-delay embedding. The symbolized data from Eq. (1) are converted into unique dynamical states by the Takens time-delay-embedding vector,
Takens' theorem allows the reconstructed states of Eq. (2) to capture the topology of the underlying brain dynamics.
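The symbolization of Eq. (1) and the embedding of Eq. (2) can be sketched as follows. This is a minimal illustration, not the production code: the parameter values (S, d, L) below are hypothetical placeholders, not the fixed values of Table 3.

```python
import numpy as np

def symbolize(x, S):
    """Eq. (1): discretize each sample into one of S bins, 0..S-1.
    INT is the floor; the maximum sample is clipped into the top bin."""
    x = np.asarray(x, dtype=float)
    s = np.floor(S * (x - x.min()) / (x.max() - x.min())).astype(int)
    return np.clip(s, 0, S - 1)

def embed_nodes(s, d, L, S):
    """Eq. (2): Takens time-delay embedding. Each state is the d-digit
    base-S number formed from [s(i), s(i+L), ..., s(i+(d-1)L)]."""
    n = len(s) - (d - 1) * L
    powers = S ** np.arange(d)                       # base-S place values
    idx = np.arange(n)[:, None] + L * np.arange(d)[None, :]
    return (s[idx] * powers).sum(axis=1)             # one integer per state

# Illustrative (hypothetical) parameters and synthetic data:
rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
nodes = embed_nodes(symbolize(x, S=10), d=3, L=2, S=10)
```

Each resulting integer uniquely identifies one of the S^d possible dynamical states.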
The states from Eq. (2) are nodes. The process flow, illustrated in Figure 4, connects states separated by M time steps to form the links of a phase-space graph.
The value B is the number of base cases, which establishes a normal range of activity for the patient. The value N is the number of sampled points in a cutset and graph. The value w, as mentioned previously, is the half-width of the eye-blink filter in sampled-point units. The value S is the number of bins into which the EEG is discretized in order to create the base-S number represented by the vector y(i) in Figure 4. The value d is the number of numerals in the base-S number, i.e., the number of elements in the d-dimensional vector y(i). L is a time-delay-embedding parameter that specifies the interval between points sampled to create a node. M is a second time-delay parameter that specifies the interval between two connected nodes. All of these parameters are used to generate the phase-space graphs illustrated in Figure 4.
The dissimilarity measures involve counting unique nodes and links (those not in common between the two graphs): (1) nodes in graph A but not in B; (2) nodes in B but not in A; (3) links in A but not in B; and (4) links in B but not in A. Nodes and links in common between graphs do not indicate change and are not useful. These measures sum the absolute value of differences, which is better than traditional measures that use a difference of averages. Each measure is normalized to the number of nodes (links) in A (for A not in B) or in B (for B not in A). This feature vector of four normalized dissimilarity measures is the input to the SVM.
The present work uses a SVM approach to obtain forewarning from the normalized dissimilarity measures; namely, we find nonlinear regions in the feature space using a SVM. Figure 5 shows the calculation of the dissimilarity measures [4]. The frequency of nodes and links is not used because Takens' theorem guarantees topology, but not density, meaning it does not guarantee useful information in the repetition of nodes or links.
The dissimilarity measures in Figure 5 capture topology changes between two graphs. While node and link differences are basic graph measures, they quantify the hypothesis in a simple and general way. Less commonality of nodes and links between two graphs produces larger dissimilarity measures, which capture changes in topology. Topology change is a necessary, but not sufficient, condition for a phase transition [17]. Our results show that changes in topology over extended periods indicate a higher likelihood of observing a phase transition as an indicator of an impending seizure. The four graph-dissimilarity measures from nodes and links rely on two concepts from set theory and Venn diagrams: node dissimilarity and link dissimilarity are each broken into two measures. Comparing two graphs (A and B) yields differences in nodes as well as links. The dissimilarity measures are used as SVM features (for a total of 4 features in the Stage 1 SVM described below): the nodes in graph A that are absent from graph B, the links in A that are absent from B, the nodes in B that are absent from A, and the links in B that are absent from A. All four dissimilarity measures are normalized and vary with cutset. Figure 6 shows these dissimilarity measures varying with time and how each cutset results in features and labels (“+” for preictal, and “–” for interictal).
Analysis of graph dissimilarity measures by a SVM allows quantification of the change in topology over time by determining how dissimilar the graphs must be to predict an epileptic event. The details of the forewarning algorithm are in Section 5—with a brief overview of Support Vector Machines in Section 4.
4. SVM with RBF kernels
SVMs are among the most commonly used supervised learning tools. The SVM approach was originally designed as a two-class (binary) classifier, but has been extended to single and multiple classes. A SVM without a kernel function performs linear classification by finding a hyperplane in the feature space that maximizes the margin of separation between the two classes for a given list of features.
SVM kernels define the similarity between two points in the feature space. For example, with a radial-basis-function (RBF) kernel, two points are similar when they are proximate to one another in the feature space. The RBF (Gaussian) kernel evaluates to 1 as the distance between the two points approaches zero, and to 0 as the distance becomes very large. How quickly the kernel falls toward zero is parameterized by the value of gamma (γ): larger γ shrinks the region over which a training point influences the decision boundary.
Each point in Figure 7 is one instance of a class. Positive class values are denoted by plus signs and negative class values by minus signs. The Cartesian dimensions are the feature values, such as a dissimilarity measure. More than two features (two dimensions) can be used with a SVM, but visualization becomes difficult beyond 3 dimensions (features). The main requirement of a RBF kernel is that the training set contain a representative sample of the data that will be observed in the future, with enough features to distinguish between classes. Additionally, the range and scale of each feature strongly affect the value of γ that works well, so features are typically scaled to a common range before training.
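The behavior of the RBF kernel described above can be checked numerically. This is a standalone sketch of the standard Gaussian kernel formula, not code from the forewarning algorithm.

```python
import numpy as np

def rbf_kernel(u, v, gamma):
    """K(u, v) = exp(-gamma * ||u - v||^2): evaluates to 1 at zero
    distance and decays toward 0 as the points separate; gamma sets
    how quickly the decay happens."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(np.exp(-gamma * np.sum((u - v) ** 2)))

# Identical points are maximally similar.
same = rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.5)

# Larger gamma shrinks each point's region of influence:
near_sighted = rbf_kernel([0, 0], [3, 0], gamma=1.0)
far_sighted = rbf_kernel([0, 0], [3, 0], gamma=0.1)
```

Because the exponent contains the squared distance, unscaled features with large ranges dominate the kernel, which is why feature scaling interacts with the choice of γ.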
Here, w is the weight vector normal to the separating hyperplane and b is its offset, so the decision function is f(x) = sign(w·x + b). Once the vector w and offset b are determined from the training data, a new point x is classified by the sign of w·x + b. Without a kernel, the vector w defines a linear boundary in the original feature space; with a RBF kernel, the same formulation produces a nonlinear boundary, because the separation is linear only in the kernel's implicit feature space.
LIBSVM hides the details of the linear algebra of training and testing from the user and is easily used with the intuitions given in this Section [19]. The effect of a RBF kernel is that points proximate to one another will be labeled as belonging to the same class. Multiclass classification is treated as several binary classifications and is beyond the scope of the present work.
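A cost-sensitive C-SVC with a RBF kernel can be exercised through scikit-learn, whose SVC class wraps LIBSVM. This is an illustrative sketch only: the data are synthetic stand-ins for the four dissimilarity features, and the γ, C, and class-weight values are arbitrary, not the searched values of Table 2.

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic stand-in for four normalized dissimilarity features; the
# labeling rule below is hypothetical, not derived from EEG.
rng = np.random.default_rng(1)
X = rng.random((200, 4))
y = np.where(X[:, 0] + X[:, 1] > 1.0, 1, -1)

# Cost-sensitive C-SVC with a RBF kernel (LIBSVM under the hood).
# class_weight (illustrative values) encodes how strongly each class
# label is trusted, mirroring the labeling-uncertainty weights.
clf = SVC(kernel="rbf", gamma=1.0, C=10.0, class_weight={1: 2.0, -1: 1.0})
clf.fit(X, y)
train_acc = clf.score(X, y)
```

Predicting back on the training set, as here, measures only the overfit error; out-of-sample accuracy requires the cross validation described below.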
5. SVM forewarning algorithm using graph dissimilarity features
Labeling training data as preictal or interictal requires sound assumptions, and current epilepsy prediction algorithms offer guidance about acceptable ones. The goal is enough forewarning to stop or mitigate an event. Patients and caregivers [20] suggested 1–6 hours for safety, planning the day, and “driving myself to the hospital.” Nonparent caregivers preferred 25 minutes to 1 hour for travel to the patient's location. Others gave 3–5 minutes, because longer forewarning was seen as more stressful to the patient. These requirements, together with previous research indicating that these constraints are reasonable, led to the labeling scheme used for preictal indications for a SVM. For epileptic event data sets, the preictal region is labeled as beginning anywhere from 3.3 minutes to 70 minutes before the seizure. Each epileptic patient is labeled as preictal for the same length of time prior to the seizure. Each plus and minus sign in Table 1 represents a 3.3-minute window (consistent with a cutset length of 49716 points, sampled at 250 Hz). The number of pluses is determined by a parameter (p) that is varied during cross validation. Figure 6 shows how the signs in Table 1 relate to graphs, features, cutsets, and dissimilarity measures.
Table 1. Effect of variable p (number of + values) on Stage 1 preictal-indication labeling of training data.

                     p = 1                  p = 21
Event data set       – – … – – +            – … – + + + + + + + + + + + + + + + + + + + + +
Nonevent data set    – – … – – –            – … – – –
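The Table 1 labeling scheme can be sketched as a small function. This is an illustrative sketch of the stated rule (last p cutsets of an event data set are preictal, everything else interictal); the function name is hypothetical.

```python
def stage1_labels(n_cutsets, p, is_event):
    """Table 1 scheme: for an event data set, the final p cutsets before
    the seizure are labeled preictal (+1); all other cutsets, and every
    cutset of a nonevent data set, are interictal (-1)."""
    labels = [-1] * n_cutsets
    if is_event:
        for i in range(max(0, n_cutsets - p), n_cutsets):
            labels[i] = +1
    return labels

# p = 21 labels the final 21 cutsets, about 70 minutes at 3.3 min each.
event_labels = stage1_labels(30, 21, is_event=True)
nonevent_labels = stage1_labels(30, 21, is_event=False)
```

Varying p between 1 and 21 sweeps the assumed preictal window from 3.3 to roughly 70 minutes before the seizure.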
The input labeling (e.g., Table 1) is assumed to be only approximately correct, and class weights are used to vary the likelihood of correctness. The SVM methodology is implemented in three Stages with 10-fold cross validation. Stage 1 constructs a classifier that labels the preictal-state indicators. Stage 2 determines how long a patient must exhibit preictal indicators in order to predict a seizure. Stages 1 and 2 establish cross validation accuracy and error. Both the SVM forewarning algorithm and the previous voting-method algorithm [4] imply that patients must be in abnormal states a higher portion of the time before they are likely to have a seizure; datasets without seizures can also have infrequent abnormal states. Stage 3 obtains two models that can be used for seizure prediction in an ambulatory setting. Cross validation results in k different classifiers that leave out disjoint sets of data to establish an off-training-set (OTS) error and so avoid overconfidence in accuracy. However, k slightly handicapped classifiers result in either less accuracy than is possible or more complexity in creating an ensemble. Stage 3 avoids this unnecessary choice by performing cross validation and then creating a final SVM model that includes all of the available data. Accuracy and error rates are statistically stronger when they are reported from cross validation; the statistical claims are less robust when one trains and tests on the same data. A SVM with a RBF kernel is particularly susceptible to overfitting, implying the need for cross validation. Figure 8 shows an outline of this three-Stage algorithm.
Figure 9 shows how Stages 1 and 2 flow together. Stage 3 involves training the RBF model on all of the Stage 1 cutsets (4244 rows, instead of approximately 90% of them) and training the linear model on all of the Stage 2 results (60 rows, instead of 90% of them). Then, one predicts on the training data to verify that the model is working as expected, producing the overfit error rate.
Figure 9 shows that event datasets are labeled in Stage 1 as preictal (+) in a window of p cutsets prior to the seizure and interictal (–) outside of this window. All cutsets in nonseizure datasets are labeled as interictal (–). A cost-sensitive SVM is used to account for the uncertainty in the preictal and interictal labeling. The motive for this labeling scheme is the caregivers' desire to have forewarning within an hour of the event. Indicators are assumed to be near the event, and the time window is varied by the parameter (p) that is tested during cross validation. The design assumes that the preictal state is a rare occurrence. Because similar points in the feature space may be labeled preictal inside the one-hour window and interictal outside it, the class weights are varied via a Monte Carlo search over the parameter space during cross validation to determine how to break such ties. Other parameters are also varied randomly during the Monte Carlo search, as shown in Table 2. To compensate for labeling uncertainty, the cost-sensitive SVM adjusts a weight on each class label to indicate how certain the labeling scheme is for the preictal and interictal classes. The labeling scheme, combined with the features, training set, class weights, and gamma, creates regions in the feature space associated with one class or the other. Additionally, we use stratified cross validation, maintaining a ratio of 4 event patients to 2 nonevent patients in each stratum. Cross validation is performed on 90% of the patients (54 patients in each training set and 6 patients in each test set), with varying numbers of cutsets due to varying-length observations. This process is repeated 10 times with disjoint sets of patients in each test set.
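The stratified split above (60 patients, 10 folds, each test fold holding 4 event and 2 nonevent patients) can be sketched as follows. The function and the sequential assignment of patients to folds are illustrative; only the fold sizes and strata ratios come from the text.

```python
def stratified_folds(event_ids, nonevent_ids, k=10):
    """Sketch of the stratified 10-fold split: each test fold holds
    4 event and 2 nonevent patients; the remaining 54 patients train."""
    folds = []
    for i in range(k):
        test = event_ids[4 * i:4 * i + 4] + nonevent_ids[2 * i:2 * i + 2]
        train = [p for p in event_ids + nonevent_ids if p not in test]
        folds.append((train, test))
    return folds

# 40 event and 20 nonevent patients, as in the present data.
folds = stratified_folds(list(range(40)), list(range(40, 60)))
```

Because patients (not cutsets) are assigned to folds, no patient's cutsets ever appear in both the training and test sets of the same fold.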
Successive, contiguous occurrences of preictal indicators trigger an alert (prediction of an event). Nonevent datasets have mostly interictal indicators (labeled with minus signs), while event datasets have fairly dense preictal indicators (labeled with plus signs). A single preictal indicator is usually not enough to make accurate predictions.
The accuracy of the assumptions is reflected in the success rate of the predictions during cross validation. Parameters that appear uncertain are left as search variables, recognizing that more free parameters create a more computationally complex search; too many variables cause a computational explosion in the CPU time needed to explore the search space. Each point in the parameter space corresponds deterministically to a cross-validation error rate. Assumptions about the certainty of the training or testing labels are represented by class weights in a cost-sensitive SVM. Table 2 shows the SVM variables that were searched for this paper.
Table 2. SVM variables searched during cross validation.

γ (Stage 1): Sets the radius of the contribution of a single point to the decision boundary.
C (Stage 1): Adjusts the pliability of the Stage 1 decision boundary given additional points.
W+ (Stage 1): Weighs how powerfully the + class influences the decision boundary.
W– (Stage 1): Weighs how powerfully the – class influences the decision boundary.
C (Stage 2): Adjusts the pliability of the Stage 2 decision boundary given additional points.
p: Number of cutsets prior to the event that are labeled as being in the positive class.
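The Monte Carlo search over the Table 2 variables can be sketched as a random draw-and-evaluate loop. The search ranges below are hypothetical placeholders (the paper does not state its bounds here), and the dummy objective stands in for the full Stage 1/Stage 2 cross validation.

```python
import random

# Hypothetical search ranges for the Table 2 variables.
RANGES = {"gamma": (0.1, 10.0), "C1": (1.0, 100.0),
          "w_plus": (1.0, 100.0), "w_minus": (1.0, 100.0),
          "C2": (1.0, 10.0)}

def sample_point(rng):
    """Draw one random point in the SVM parameter space."""
    point = {k: rng.uniform(lo, hi) for k, (lo, hi) in RANGES.items()}
    point["p"] = rng.randint(1, 21)   # cutsets labeled preictal
    return point

def monte_carlo_search(evaluate, n_trials, seed=0):
    """Keep the parameter point with the smallest value of the
    objective (the cross validation prediction distance of Eq. (6))."""
    rng = random.Random(seed)
    return min((sample_point(rng) for _ in range(n_trials)), key=evaluate)

# Example with a dummy objective that prefers gamma near 4.
best = monte_carlo_search(lambda pt: abs(pt["gamma"] - 4.0), n_trials=500)
```

Random sampling suits this search because, as noted below, the objective is fractal and gradient methods would be trapped by its many local extrema.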
Figure 4 and Table 3 list other variables that might be searched in addition to those of the SVM. Those parameters were fixed in the present analysis because Takens' theorem is sufficiently powerful to show significant changes in topology for many parameter sets, provided the topological changes are normalized properly. The dissimilarity measures for a patient reflect relative differences in graph topology. The parameter values for phase-space graph generation are shown in Table 3; these parameters are described above and appear in Figure 4. Throughout the entire analysis, the parameters used to generate the phase-space graphs were kept fixed. These values were found in our prior work [4] to give good forewarning using the ensemble voting methods mentioned in the background section.
Table 3. Fixed parameter values for the phase-space graph generation of Figure 4.

12 | 7 | 56 | 77 | 49716 | 3 | 29
Statistical validation of forewarning requires measures of success. One measure is the number of true positives (TP) for known event datasets (Ev), yielding the true positive rate (sensitivity), TP/Ev. A second measure is the number of true negatives (TN) for known nonevent datasets (NEv); the true negative rate (specificity) is TN/NEv. The goal is a sensitivity and specificity of unity. Consequently, the objective is to minimize the distance from this ideal:

D = sqrt[ (1 − TP/Ev)^2 + (1 − TN/NEv)^2 ].   (6)
Eq. (6) is the objective function to be minimized for both the OTS error rate and the overfit error rate. The OTS error rate is found from the 10-fold cross validation of Stages 1 and 2. The overfit error rate verifies that the final models in Stage 3 can correctly predict the training examples. Excessive false positives (the complement of true negatives) cause real alarms to be ignored and needlessly expend caregiver resources; false negatives (the complement of true positives) provide no forewarning of seizure events. A Monte Carlo search is used over the variables in Table 2 because the prediction distance has very irregular, fractal behavior, with sparse parameter regions generating good predictions and gradients that are highly irregular with many local maxima and minima [4].
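The prediction distance of Eq. (6) is simple to compute and check against the limiting cases cited in the results. This sketch assumes only the definition above; the function name is our own.

```python
import math

def prediction_distance(tp, ev, tn, nev):
    """Eq. (6): Euclidean distance from the ideal point where both
    sensitivity (TP/Ev) and specificity (TN/NEv) equal 1."""
    return math.hypot(1 - tp / ev, 1 - tn / nev)

# Perfect prediction gives D = 0; total failure gives D = sqrt(2).
perfect = prediction_distance(10, 10, 10, 10)
worst = prediction_distance(0, 10, 0, 10)

# A sensitivity of 77.8% with perfect specificity gives D of about 0.22,
# the value cited for comparison with Netoff et al.
netoff_like = prediction_distance(778, 1000, 10, 10)
```

Because both error terms enter symmetrically, the measure penalizes missed events and false alarms alike.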
The forewarning algorithm involves two Stages of processing after producing the diffeomorphic graphs and their dissimilarity measures. Stage 1 uses the cost-sensitive SVM type (C-SVC) from LIBSVM [19] with a RBF kernel. For each iteration of the cross validation, the algorithm labels event data sets as having preictal indicators within a one-hour window prior to the seizure; see Table 1. The length of this window (p cutsets) is part of the Monte Carlo search. All other values (nonevent data, and event data far from the seizure, outside the variable window) are labeled as interictal indicators. The analysis trains on k−1 sets, then predicts on the single left-out set; this is repeated k times in a k-fold cross validation (with k = 10). Table 4 shows a small sample of Stage 1 predictions for two patient outputs; in actuality there are 60 sets of predictions, like Table 7 in the results section.
Table 4. Sample Stage 1 predictions (second column; “E” marks the epileptic event, “NE” a nonevent data set) and the maximum number of contiguous preictal (+) indicators (third column).

Event data set     | – + – + + + – … + E 53  | 3
Nonevent data set  | – – – … NE (no prediction) | 0
After making the six predictions on six patients for one of the ten cross validation runs, one creates a new set of cross validation folds (representing sets of patients) for the Stage 2 analysis out of Stage 1's predictions on the omitted sets (similar to the middle column of Table 4). For the single Stage 2 feature, one scans each Stage 1 prediction for the maximum number of contiguous occurrences of preictal indicators; specifically, the third column of Table 4 is obtained by scanning the second column. The Stage 2 training and testing files follow the format of the second and third columns of Table 5.
Table 5. Format of the Stage 2 training and testing data: class label and the single feature (maximum number of contiguous preictal indicators).

Event data set     | +1.0 | 3
Nonevent data set  | −1.0 | 0
The analysis labels the Stage 2 training and testing values as either event or nonevent data sets. In general, there are fewer preictal indicators (+ in Table 4) in the nonevent data sets on successful cross validation runs. One determines the cross validation average prediction distance by training on the values of maximum contiguous preictal indicators from k−1 subsets at Stage 2, making predictions on the omitted subset, and then averaging the k prediction distances.
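The scan that turns a Stage 1 prediction string into the single Stage 2 feature (Table 4, column 2 to column 3) is a longest-run count. This is a minimal sketch of that scan; the indicator strings below are hypothetical.

```python
def max_contiguous_preictal(indicators):
    """Stage 2 feature: length of the longest run of consecutive '+'
    (preictal) predictions in a Stage 1 output sequence."""
    best = run = 0
    for symbol in indicators:
        run = run + 1 if symbol == "+" else 0   # a '-' resets the run
        best = max(best, run)
    return best

# Hypothetical indicator sequences for an event and a nonevent set.
event_feature = max_contiguous_preictal("--+-+++--+")     # longest run: 3
nonevent_feature = max_contiguous_preictal("----------")  # no '+' at all
```

Scattered single + indicators therefore contribute little to this feature, consistent with the observation that one preictal indicator is not enough for an accurate prediction.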
Stage 3 obtains the final models by training on all of the available data instead of 90% of it. Specifically, Stage 3 takes the 4244 cutsets labeled in Stage 1 and the sixty predictions from Stage 2 to create two SVM models for predicting on future patients. Stage 3 involves retraining both the RBF model (from Stage 1) and the linear model (from Stage 2). Figure 10 illustrates the flow in Stage 3 after optimal parameters have been discovered via cross validation. Once the two models are obtained, they are used to predict on the original data sets to verify the models' validity.
6. Representative results
From Eq. (6), a classifier that never gets the answer correct has a prediction distance of D = √2, while a perfect classifier has D = 0.
Table 6 shows representative SVM parameter values (from the Monte Carlo search) and results.
Table 6 shows that the best cross validation run has an average prediction distance of 0.287 and a final-model prediction distance of 0.056. Table 6 also shows additional representative cross validation averages, D(Avg), and final-model prediction distances, D(final). Hundreds of runs resulted in cross validation prediction distances of <0.5. The best cross validation distance achieved thus far, 0.287, also has a fairly low overfit error rate of 0.056 for the objective function of Eq. (6).
Table 6. Representative SVM parameter values from the Monte Carlo search, with the cross validation average prediction distance D(Avg) and the final-model prediction distance D(final).

Ex. | γ     | C (Stage 1) | W+     | W–     | Successive indicators | C (Stage 2) | D(Avg) | D(final) | p
1   | 3.929 | 9.732       | 19.160 | 24.860 | 1                     | 3.723       | 0.287  | 0.056    | 21
2   | 2.640 | 87.825      | 32.867 | 81.832 | 1                     | 2.945       | 0.342  | 0.025    | 22
3   | 2.860 | 10.022      | 85.200 | 50.843 | 2                     | 8.033       | 0.374  | 0.075    | 18
4   | 0.964 | 85.719      | 71.740 | 33.237 | 2                     | 9.259       | 0.396  | 0.125    | 19
5   | 7.709 | 2.587       | 28.705 | 26.903 | 2                     | 2.347       | 0.413  | 0.050    | 20
6   | 2.207 | 75.920      | 52.493 | 36.102 | 2                     | 8.994       | 0.438  | 0.125    | 18
7   | 6.783 | 18.035      | 49.974 | 30.441 | 2                     | 6.985       | 0.456  | 0        | 21
Example 7 in Table 6 has an average cross validation prediction distance of 0.456 with D(final) = 0 (perfect prediction on the training data). The cross validation average accuracy or error rate is the more valid statistical claim. The best cross validation result in Table 6 is in the same realm of accuracy as Netoff et al.'s intracranial methodology: recall that Netoff et al. claim a sensitivity of 77.8% with no false positives, which corresponds to a prediction distance of approximately 0.22 via Eq. (6).
Figure 11 shows a plot of forewarning times (typically less than 1.5 h) for the final model of Example 7. The number of successive contiguous indicators required to trigger forewarning was found to be 2 successive + values. Table 7 shows the Stage 1 predictions (Stage 3g of Figure 8) for all 60 patients, which produce the distribution of forewarning times in Figure 11. See Table 7 for the cutset indications (+ or –) that correspond to the parameters of Example 7 in Table 6.
In Figure 11, the solid black line is the occurrence frequency (arbitrary units) in half-hour bins. The blue line is the cumulative distribution of forewarning versus time. The red H-bar with the star in the middle indicates the mean forewarning time (approximately 1 hour) and the sample standard deviation. The result in Example 7 of Table 6 is better than random guessing or biased heuristics, with D(final) = 0, despite poorer cross validation accuracy than other examples. This example shows most forewarning times of ≤1.5 hours with statistically significant accuracy. One can visually make a prediction by scanning Table 7 from left to right and looking for 2 contiguous plus values; when 2 such values are found, a seizure is highly likely to occur. One may be tempted to reduce the forewarning time by increasing the number of successive + values required to trigger a forewarning, but that would likely worsen the OTS error, which is why that value is not the one found by the second- and third-Stage SVMs.
7. Discussion
Ideally, we would like to achieve an average cross validation OTS prediction distance of zero, a final-model prediction distance of zero, and all forewarning times <1 hour. To achieve this goal, additional features will need to be explored that exploit the topology and the distance metric that Takens' theorem guarantees. Some modifications to Stage 2 improve the results (e.g., the choice of p cutsets prior to the event as a search variable, and use of a RBF kernel instead of a linear one). More search parameters in Stage 1 (e.g., those in Table 3) should lead to better results, given enough CPU time. Additional graph dissimilarity measures may be helpful; we have identified more features for Stage 1 that may be of use. More data are needed for a robust statistical validation of the model.
The choice of optimal features is very difficult. Theorems guide the choice of parameters, features, and algorithm. Some combination of theorem-based feature selection and occasional intuition derived from experimentation is the only way to keep the cost of the research initiative practical. Feature selection is one of many hard problems in epilepsy prediction; when one adds features, one often needs more data to make meaningful statistical assertions. Other important choices involve the type of kernel and the thresholding strategy. Linear kernels and thresholding strategies may perform well while radial basis function (RBF) kernels perform poorly, and vice versa; there is no guarantee that a set of features will behave similarly with different kernels and thresholding strategies. Our previous work [4] used a voting method that performed well. Other measurement functions are possible under Takens' theorem to create the phase-space states. Use of a single-class or multiclass SVM could also prove fruitful.
The results in Tables 6–7 and Figure 11 are encouraging, despite several limitations, which are discussed next. (1) We analyzed 60 datasets, 40 with epileptic events and 20 without; much more data (hundreds of datasets) are needed for strong statistical validation. (2) These data are from controlled clinical settings, rather than an uncontrolled (real-world) environment. (3) The results depend on careful adjustment of training parameters. (4) Only physician-selected portions of the EEG are available, rather than the full monitoring period. (5) The present approach uses retrospective analysis of archival data on a desktop computer; real-world forewarning requires analyst-independent, prospective analysis of real-time data on a portable device. (6) The results give forewarning times of 4 hours or less; a time-to-event estimate is needed. (7) All EEG involved temporal lobe epilepsy; other kinds of epilepsy need to be included. (8) A prospective analysis of long-term continuous data is the acid test for any predictive approach, but prospective data were unavailable for the present analysis. Clearly, much work remains to address these issues.
8. Conclusions
The present work uses Support Vector Machine analysis to extend earlier work by Hively et al. [4], which obtained forewarning from scalp EEG via phase-space dissimilarity measures and ensemble voting.
Our noninvasive (scalp) EEG analysis resulted in cross validation error rates comparable to invasive EEG approaches. Additional accuracy could be obtained by applying this methodology on a per-patient basis for custom EEG models, if the data were available. The algorithm could conceivably be modified to improve the computational feasibility of per-patient machine learning models. A research team could allow patients to start with group-based models that are less accurate while the patients collect and upload ambulatory data from their devices in real-world settings. Furthermore, businesses could be compensated for creating patient-specific models from patients' ambulatory data. Our algorithms also have other applications, such as failure forewarning in machines [21] and bridges [22].
Acknowledgments
This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains, and the publisher, by accepting the article for publication, acknowledges that the United States Government retains, a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes.
References
1. Netoff T, Park Y, Parhi K, “Seizure prediction using cost-sensitive support vector machine,” 31st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC 2009), pp. 3322–5.
2. Stacey W, Le Van Quyen M, Mormann F, Schulze-Bonhage A, “What is the present-day EEG evidence for a preictal state?” Epilepsy Res. (2011);97(3):243–51.
3. Mirowski PW, LeCun Y, Madhavan D, Kuzniecky R, “Comparing SVM and convolutional networks for epileptic seizure prediction from intracranial EEG,” IEEE Workshop on Machine Learning for Signal Processing (MLSP 2008), pp. 244–9.
4. Hively LM, McDonald JT, Munro NB, Cornelius E, “Forewarning of Epileptic Events from Scalp EEG,” peer-reviewed proceedings of the Biomedical Science and Engineering Conference at ORNL (May 2013).
5. Hively LM, Protopopescu VA, Munro NB, “Enhancements in epilepsy forewarning via phase-space dissimilarity,” J Clin Neurophysiol. (2005);22(6):402–9.
6. Percha B, Dzakpasu R, Żochowski M, Parent J, “Transition from local to global phase synchrony in small world neural network and its possible implications for epilepsy,” Phys. Rev. E (2005);72, paper #031909.
7. Van den Broeck C, Parrondo JMR, Toral R, “Noise-induced nonequilibrium phase transition,” Phys. Rev. Lett. (1994);73:3395–3398.
8. Pittau F, Tinuper P, Bisulli F, Naldi I, Cortelli P, Bisulli A, et al., “Videopolygraphic and functional MRI study of musicogenic epilepsy. A case report and literature review,” Epilepsy & Behavior (2008);13(4):685–692.
9. Jenkins JS, “The Mozart effect,” J. Royal Soc. Med. (2001);94:170–172.
10. “10–20 System (EEG),” Wikipedia, Wikimedia Foundation, 22 July 2013. Web. 16 Aug. 2013.
11. Hively LM, Protopopescu VA, “Channel-consistent forewarning of epileptic events from scalp EEG,” IEEE Trans Biomed Eng. (2003);50(5):584–93.
12. Hively LM, “Prognostication of Helicopter Failure,” ORNL/TM-2009/244, Oak Ridge National Laboratory, Oak Ridge, TN (2009).
13. Protopopescu VA, Hively LM, Gailey PC, “Epileptic event forewarning from scalp EEG,” J Clin Neurophysiol. (2001);18(3):223–45.
14. Hively LM, et al., “Nonlinear Analysis of EEG for Epileptic Seizures,” ORNL/TM-12961, Oak Ridge National Laboratory, Oak Ridge, TN (1995).
15. Takens F, “Detecting strange attractors in turbulence,” in: Rand D, Young LS, editors, Dynamical Systems and Turbulence, Warwick 1980, Springer Berlin Heidelberg (1981), pp. 366–81.
16. Bondy JA, Murty USR, Graph Theory, Springer (2008).
17. Franzosi R, Pettini M, “Topology and phase transitions II. Theorem on a necessary relation,” Nuclear Physics B (2007);782(3):219–240.
18. Ng A, “Machine Learning: SVM Kernels,” Coursera, Stanford University. Web. 10 Oct. 2013.
19. Chang CC, Lin CJ, “LIBSVM: A library for support vector machines,” ACM Trans. Intell. Syst. Technol. (2011);2(3):27:1–27:27.
20. Arthurs S, Zaveri HP, Frei MG, Osorio I, “Patient and caregiver perspectives on seizure prediction,” Epilepsy Behav. (2010);19(3):474–7.
21. Protopopescu V, Hively LM, “Phase-space dissimilarity measures of nonlinear dynamics: Industrial and biomedical applications,” Recent Res. Dev. Physics (2005);6:649–688.
22. Bubacz JA, Chmielewski HT, Pape AE, Depersio AJ, Hively LM, Abercrombie RK, “Phase Space Dissimilarity Measures for Structural Health Monitoring,” ORNL/TM-2011/260, Oak Ridge National Laboratory (2011).