A Review on Machine Learning and Deep Learning Techniques Applied to Liquid Biopsy

For more than a decade, machine learning (ML) and deep learning (DL) techniques have been a mainstay in the toolset for the analysis of large amounts of weakly correlated or high-dimensional data. As new technologies for detecting and measuring biochemical markers from bodily fluid samples (e.g., microfluidics and labs-on-a-chip) revolutionise the industry of diagnostics and precision medicine, the heterogeneity and complexity of the acquired data present a growing challenge to their interpretation and usage. In this chapter, we attempt to review the state of ML and DL fields as applied to the analysis of liquid biopsy data and summarise the available corpus of techniques and methodologies.


Introduction
Biological and medical sciences are becoming increasingly data-rich and information-intensive. This tendency, along with the growing availability of such data, provides a better understanding of important questions regarding functions of organisms, causes of diseases, etc. However, both the inherently massive complexity of biological systems and the high dimensionality and noisiness of data thus acquired can make it remarkably difficult to correctly infer such mechanisms. Machine learning (ML) and deep learning (DL) techniques are quickly becoming highly useful tools for solving difficult problems in biology and medicine by providing mathematical apparatus for analysing vast amounts of information that would otherwise be difficult to process and interpret. Additionally, these fields themselves provide new challenges for machine learning that can ultimately advance existing ML techniques and give rise to new ones.
The mutual history of machine learning and biological and medical disciplines is both long and complex. An early ML technique, the perceptron, was created in an attempt to model the behaviour of biological neurons [1]; it was used early on to define the start sites of translation initiation sequences in E. coli [2] and can be considered the starting point of the entire field of machine learning. In the last few decades, the power, flexibility, and accessibility of ML and DL techniques have grown considerably, and it can be expected that they will provide significant assistance in the discovery and understanding of the mounting volume of biological and medical data.
In this chapter, we first provide an overview of the commonly used ML and DL techniques and strategies and outline their broad areas of applicability with regard to processing and analysis of biological and medical data. Next, we attempt to summarise the available corpus of research and development concerning the application of ML and DL techniques to the process of analysis and interpretation of biomedical data, focusing on liquid biopsy analysis, outline several of the main avenues of such research, and predict the potential improvements and changes in this highly dynamic and quickly developing field. Expertise in ML is not a prerequisite for this chapter, although we assume basic overall familiarity with the most well-known ML and DL models, techniques, and methodologies.

Machine learning strategies
This overview is limited to classical software-based tools and techniques for brevity's sake. Several hardware-based approaches are mentioned in the Future Prospects section.
The ML ecosystem is both extensive and complex [3][4][5], with many possible ways to subdivide or classify its members. One frequently used classification scheme outlines two broad groups of ML algorithms: supervised learning, where the model is presented with a set of example inputs together with their desired outputs (the labelled training dataset), with the goal of learning a mapping from inputs to outputs, and unsupervised learning, where no labels are given to the model, leaving it to discover structure in the input data on its own. A notable related case is reinforcement learning (RL), where the training signal consists only of positive ("reward") and negative ("punishment") feedback, given according to the model's performance in the training environment.
Another informative approach to classifying ML algorithms is based on the desired type of output of the given model, such as classification (division of the input data into two (binary classification) or more (multi-class classification) predetermined groups), clustering (similar to classification but with the groups not known beforehand), dimensionality reduction (simplification of high-dimensional input data by mapping them into a lower-dimensional space), search, etc. Of these, clustering is particularly notable due to its broad and general applicability and the wide range of models, methods, and algorithms [6][7][8] that can be employed to carry out cluster analysis.
The notion of "cluster" is often not precisely defined and tends to serve as an umbrella term for various types of data objects, typically groups of data points with small distances (appropriately defined) between group members, higher-density areas of some parameter space, particular statistical distributions, etc. The desired clustering algorithm, therefore, depends on both the given data set and the intended application of the returned results. Due to these complications, clustering, like many other data analysis methods, is typically not fully automated even within the domain of machine learning but instead tends to partially rely on preprocessing and initial parameter selection, based on the specifications of the task at hand.
Deep learning is a subclass of machine learning problems, the distinction being based on the training data representations instead of specific algorithms. Similar to ML in general, deep learning can be both supervised and unsupervised [9]. Deep learning models tend to be vaguely similar to information processing patterns in biological brains (and are therefore often called artificial neural networks), in that they use multiple layers [10] of non-linear processing units (frequently called "neurons", even though their similarity to biological neurons is usually limited) for pattern recognition and transformation, with each successive layer using as inputs the output from a previous layer, forming a hierarchy of representations and levels of abstraction. The number of hidden layers of an artificial neural network broadly determines the "computational power" of the network [11] (Figure 1).
Machine learning models have been applied to a wide variety of fields and problem classes, including computer vision, natural language processing, machine translation, bioinformatics and biochemistry [12], with results often similar or superior [13] to those produced by human domain experts.

Classifying blood cells with deep convolutional neural networks
An important part of the data acquired by blood tests is the number of white blood cells (WBCs) or leukocytes, usually differentiated into total and differential WBC count, where the latter describes the absolute and relative numbers of WBC subtypes (neutrophils, lymphocytes, basophils, eosinophils, and monocytes) in the sample. The amount of WBCs in the sample provides information on the state of the patient's innate and adaptive immune system, e.g., a significant change in the WBC count relative to the patient's baseline is evidence that their body is being affected by an antigen, whereas variations in the specific WBC subtypes can correlate with specific types of antigens or different pathways of immune and inflammatory reaction. Therefore, detailed measurement and understanding of the WBC counts is an important part of the quantitative picture of health and the organism's general condition.
Traditional methods of estimating the WBC count generally fall into one of two categories: manual and automated. The historical manual inspection of the blood sample involved counting the number of cells in a blood sample under a microscope and extrapolating under the assumption of uniform cell distribution across the entire bloodstream. Automated methods involve specialised equipment such as Coulter counters [14] or laser flow cytometers [15] which can provide accurate results and good performance [16] but are generally expensive and require specialised training to operate.
In this light, the ML-based approach provides a potential improvement over the aforementioned techniques due to several reasons. First, it requires far less expensive equipment due to being built around simple imaging solutions. Furthermore, unlike earlier methods, it is able to provide almost instantaneous results after the initial training stage. Finally, its performance can be expected to improve over time, in proportion to growing dataset sizes and, being mostly software-based, it can be expanded and advanced continually and "over the air", without requiring extensive changes in the underlying infrastructure.
We illustrate this approach using an example problem provided by Athelas team [17], namely, binary classification of a stained image of a WBC as either polymorphonuclear or mononuclear. The training dataset consisted of hand-labelled images of stained WBCs of all given types in various proportions. Before the dataset could be used, several preprocessing steps were taken, including removing images with multiple cells and using transformations such as flips and rotations in order to increase the size and variability of the training dataset. By using transformed versions of the images, the training dataset size was increased from approximately 350 to 10^4. For the ML model, Athelas team used the LeNet-5 [18] convolutional neural network (CNN) [3,4,9] due to its simplicity and availability. The model was tested against a test dataset of 71 images (20% of the original training set and 0.7% of the training set after transformations), achieving an accuracy of 98.6%. The presently used model performs less well (accuracy of 86%) when classifying WBCs into multiple individual type categories as opposed to binary classification. Nevertheless, given the high performance and simplicity of this purely software-based approach, Athelas team plans to extend it to more complex problems, including datasets containing other cell types, which could enable faster improvement cycles, increased accessibility, and better patient outcomes, compared to previously used methods of cell count analysis.
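The flip-and-rotation augmentation mentioned above can be sketched as follows. This is a minimal illustration rather than the Athelas pipeline: the image is a plain nested list, and the helper names (rotate90, flip_horizontal, augment) are hypothetical. It enumerates only the eight combinations of 90-degree rotations and a horizontal flip; growing a set of about 350 images to roughly 10^4 would require further transformations that are not shown here.

```python
def rotate90(image):
    """Rotate a 2-D image (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def flip_horizontal(image):
    """Mirror a 2-D image left to right."""
    return [row[::-1] for row in image]


def augment(image):
    """Return the 8 variants from 4 rotations x optional horizontal flip."""
    variants = []
    current = [list(row) for row in image]
    for _ in range(4):
        variants.append(current)
        variants.append(flip_horizontal(current))
        current = rotate90(current)
    return variants


# Toy 2x3 "image"; real inputs would be pixel arrays of stained cells.
toy = [[1, 2, 3],
       [4, 5, 6]]
augmented = augment(toy)
```

Each variant keeps the label of the source image, which is what makes this kind of augmentation cheap for cell images whose class does not depend on orientation.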

Using deep neural networks for detection of ageing-related biomarkers
During the last decade, human ageing research has received an increasing amount of mainstream interdisciplinary attention [19,20], with an emerging tendency to approach various aspects of the natural ageing process as potentially treatable conditions.
Insilico team developed a DL system designed to predict human chronological age from biochemical data obtained from a basic blood test [21], narrowing an extensive set of potential ageing-related biomarkers to a limited subset of the most salient ones. A dataset of more than 6 × 10^4 records was used, with each record consisting of a patient's age, sex, and 46 blood biochemical markers. The dataset was preprocessed, normalising all blood marker values to 0-1 range, and then split into training and test datasets with ratio 90:10.
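The two preprocessing steps described above, scaling each marker to the 0-1 range and a 90:10 split, might look like the sketch below. The function names, the random seed, and the sample values are assumptions made for illustration; the source does not state how the split was randomised.

```python
import random


def min_max_normalise(column):
    """Scale a list of numbers linearly to the 0-1 range."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # constant column: no spread to scale
    return [(x - lo) / (hi - lo) for x in column]


def train_test_split(records, test_fraction=0.1, seed=0):
    """Shuffle the records and split them into training and test lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical values of one blood marker for five patients.
marker = [35.0, 42.0, 50.0, 38.5, 45.0]
scaled = min_max_normalise(marker)

records = list(range(100))  # stand-ins for patient records
train, test = train_test_split(records)
```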
An ensemble of 21 feed-forward deep neural networks (DNNs) was created as the ML model, with a range of values assigned to DNN parameters such as the number of hidden layers, the number of processing units per layer, activation function, and optimization and regularisation methods. The permutation feature importance method [22] was used to evaluate the relative importance of the various biochemical markers with regard to ensemble accuracy. Batch normalisation [23] was used to reduce the effects of overfitting and increase the stability of convergence of the models.
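Permutation feature importance can be illustrated with a short sketch: shuffle one feature column at a time and record how much the error grows. The code below is an assumption-laden toy, not the Insilico implementation; the function names, the use of MSE as the score, and the stand-in model (which uses feature 0 and ignores feature 1) are all hypothetical.

```python
import random


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, targets, n_repeats=5, seed=0):
    """Average increase in MSE when one feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mean_squared_error(targets, [predict(r) for r in rows])
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            score = mean_squared_error(targets, [predict(r) for r in permuted])
            increases.append(score - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy "model": predicted age depends on feature 0 only; feature 1 is ignored.
def toy_model(row):
    return 20.0 + 50.0 * row[0]


data_rng = random.Random(1)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
targets = [toy_model(r) for r in rows]
importances = permutation_importance(toy_model, rows, targets)
```

A feature the model relies on shows a large error increase when shuffled, while an ignored feature shows none, which is the basis for ranking the blood markers.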
The best results were obtained from a DNN with five hidden layers, using regularised mean squared error (MSE) function as the loss function, parametric rectified linear unit (PReLU) [24] activation function in each layer, and AdaGrad [25] optimiser of the loss function. The highest-scoring DNN achieved 82% ε-prediction accuracy at ε = 10 (i.e., considering the sample as correctly recognised if the predicted age is within ±10 years of the true age), out-performing several classes of competing ML models. Multiple models for combining individual DNNs into an ensemble (stacking) were evaluated, with the best being the elastic net model [26]. The most important blood markers were discovered to be albumin, glucose, alkaline phosphatase, urea, and erythrocyte count.
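The ε-prediction accuracy metric is simple enough to state in a few lines. In this sketch the function name and the sample ages are made up, and treating the boundary (an error of exactly ε years) as a hit is an assumption.

```python
def epsilon_accuracy(true_ages, predicted_ages, epsilon=10.0):
    """Fraction of samples whose predicted age is within +/- epsilon years."""
    hits = sum(1 for t, p in zip(true_ages, predicted_ages)
               if abs(t - p) <= epsilon)
    return hits / len(true_ages)


# Made-up ages for illustration; not data from the study.
true_ages = [34, 51, 67, 45, 72]
predicted = [39, 64, 60, 44, 59]
score = epsilon_accuracy(true_ages, predicted)  # errors: 5, 13, 7, 1, 13
```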
Insilico team created an online service (http://www.aging.ai) to make the DNN ensemble available to the general public, allowing patients to use their blood test data to evaluate the age prediction system and serving as a proof of concept for estimating ageing-related variables using readily available biochemical data. Additional data sources, including transcriptomic and metabolomic markers from liquid and individual organ biopsies, as well as imaging data, are being considered. Insilico team suggests that similar systems could also be developed for model organisms in order to perform cross-species analysis of individual biological markers and their importance in predicting both chronological and biological age.

Machine learning-based approach to Alzheimer disease biomarker discovery
In their study, Smalheiser team developed [27] an ML-based model for predicting Alzheimer disease (AD) status of individual samples with high accuracy, using miRNAs and other small RNAs extracted from circulating exosomes obtained from liquid biopsy (blood plasma) samples.
A sample set of N = 70 was used to construct the training dataset consisting of normalised miRNA expression data across 465 loci. Cross-validation was used in feature selection to evaluate the impact of values from specific loci as features. The samples were randomly divided into 7 partitions of 5 positive and 5 negative samples each and cross-validation was performed on these partitions, using 6 partitions for training and 1 for evaluation. The random partitioning was repeated 10 times in order to acquire 70 estimate points of the performance measures of interest, one for each sample in the set. These values were averaged and their relative performance was assessed using the area under the receiver operating characteristic (ROC) curve, Matthews correlation coefficient (MCC) [28], and F1 score.
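The repeated, class-balanced partitioning described above can be sketched as follows. This is an illustration under stated assumptions: the sample identifiers are invented, the helper names are hypothetical, and the classifier itself is left out. Ten repeats of a 7-way split give 70 train/evaluate rounds.

```python
import random


def stratified_partitions(positives, negatives, n_parts, rng):
    """Randomly split both classes evenly across n_parts partitions."""
    pos, neg = list(positives), list(negatives)
    rng.shuffle(pos)
    rng.shuffle(neg)
    return [pos[i::n_parts] + neg[i::n_parts] for i in range(n_parts)]


def repeated_cv_splits(positives, negatives, n_parts=7, n_repeats=10, seed=0):
    """Yield (train, test) sample-ID lists for every repeat and partition."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        parts = stratified_partitions(positives, negatives, n_parts, rng)
        for k, test in enumerate(parts):
            train = [s for i, part in enumerate(parts) if i != k for s in part]
            yield train, test


# Hypothetical sample IDs: 35 AD samples and 35 controls.
ad_ids = [f"AD{i}" for i in range(35)]
control_ids = [f"C{i}" for i in range(35)]
splits = list(repeated_cv_splits(ad_ids, control_ids))
```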
Smalheiser team evaluated three different ML classifier algorithms: C4.5 decision trees [29] (using the J48 implementation), support vector machines (SVMs) [30], and adaptive boosting (AdaBoost) [31]. After selecting the 50 most significant features, as per the Mann-Whitney U test [32], the C4.5 classifier produced the best results, based on which it was selected as the feature selection method. The feature significance was measured by the number of times the given miRNA locus was used as a node in the decision tree over the 70 runs. The 18 highest-scoring features were selected to move on to the next step. The AdaBoost algorithm was used for the final feature selection from the set of 18 features, producing an optimised set of 7 features which were then used with all 70 data samples to produce the final dataset.
The best model used by Smalheiser team was able to correctly classify, on average, 29 out of 35 samples from the AD group and 31 out of 35 samples from the control group, yielding accuracy in the range of 83-89%. Smalheiser team concluded that ML-based classifiers are able to produce highly accurate predictions of AD occurrence, using a dataset of only 7 miRNAs and that integrating exosome miRNA data with other data is likely to further increase performance of these models.
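The reported counts can be turned into the usual confusion-matrix metrics as a quick check. The sketch below treats the averaged counts (29 of 35 AD, 31 of 35 controls) as a single confusion matrix, which is an assumption; the F1 and MCC values it produces are therefore illustrative and are not figures reported in the study.

```python
import math

# Average counts quoted above: 29/35 AD and 31/35 controls classified correctly.
tp, fn = 29, 35 - 29   # AD samples: correctly / incorrectly classified
tn, fp = 31, 35 - 31   # controls:   correctly / incorrectly classified

sensitivity = tp / (tp + fn)                  # about 0.83
specificity = tn / (tn + fp)                  # about 0.89
accuracy = (tp + tn) / (tp + fn + tn + fp)    # about 0.86
precision = tp / (tp + fp)
f1 = 2 * precision * sensitivity / (precision + sensitivity)
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
```

The per-class rates of roughly 83% and 89% bracket the overall accuracy of about 86%, consistent with the range quoted in the text.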

Detection and classification of circulating tumour cells using machine learning methods
Circulating tumour cells (CTCs) in the blood contribute to the mechanism by which derived tumours subsequently grow in distant tissues (metastasis). Evaluating CTCs in blood samples can support diagnosis and help to follow the tumour's response to chemotherapeutic drugs.
Mao team designed a deep (six layers) CNN for image-based circulating tumour cell detection with automatically learned network parameters [33]. They used a dataset of 45 phase contrast microscopy [34,35] images, of which 35 randomly selected images were used for training and the remaining 10 for testing the network. The experiment was repeated 5 times in order to minimise network bias.
The CNN received normalised 40 × 40 pixel images as input. They were passed to a layer of 6 convolutional filters with the size of 5 × 5, followed by a max-pooling layer in order to extract the local signal in every 2 × 2 pixel region, defined by the max-pooling function $z_{p,q} = \max\{y_{2p,2q},\, y_{2p+1,2q},\, y_{2p,2q+1},\, y_{2p+1,2q+1}\}$, where (p, q) are pixel coordinates, y is the input map, and z is the output map. This layer was followed by another convolutional filter layer, consisting of 12 filters, and, subsequently, by another max-pooling layer. The last layer was fully connected to the output layer by way of dot product between the weight and input vectors, passed to the sigmoid function which maps the values to the [−1, 1] range. The filter parameters, network bias terms, and weight matrices were automatically adjusted by backpropagation with learning rate set to 0.1.
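To make the layer sequence concrete, the sketch below implements 2 × 2 max pooling on a nested list and traces the spatial size of the feature maps through the network. Several details are assumptions not stated in the source: unpadded, stride-1 convolutions, a 5 × 5 kernel for the second convolutional layer, and the helper names themselves.

```python
def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 block."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [[max(feature_map[r][c], feature_map[r][c + 1],
                 feature_map[r + 1][c], feature_map[r + 1][c + 1])
             for c in range(0, cols - 1, 2)]
            for r in range(0, rows - 1, 2)]


def conv_output_size(size, kernel):
    """Side length after an unpadded, stride-1 convolution (an assumption)."""
    return size - kernel + 1


pooled = max_pool_2x2([[1, 3, 2, 0],
                       [4, 2, 1, 5],
                       [0, 1, 9, 8],
                       [2, 6, 7, 3]])

# Trace the spatial size through the layers described above.
size = 40
size = conv_output_size(size, 5)   # 6 filters of 5x5        -> 36
size //= 2                         # 2x2 max pooling         -> 18
size = conv_output_size(size, 5)   # 12 filters, 5x5 assumed -> 14
size //= 2                         # 2x2 max pooling         -> 7
flattened = size * size * 12       # inputs to the fully connected layer
```

Under these assumptions the fully connected layer would see 588 values per image, which illustrates how small this network is compared with modern image classifiers.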
Mao team compared their CNN-based classifier to a simpler, SVM-based method that depended on hand-crafted feature sets. Using the F-score (harmonic mean of precision and recall scores) as the comparison metric, they found that, after two rounds of five iterations, the F-score of the CNN-based classifier was 0.97, exceeding the F-score of the SVM-based classifier with its hand-crafted feature set (0.784) by 18.6 percentage points. They concluded that the CNN-based classifier presents a promising development towards automated CTC detection in images taken from blood samples, and that the technique could be adapted for use with microfluidics-based liquid biopsy platforms for early diagnosis and monitoring.

Using artificial neural networks for lung cancer detection and diagnosis
Goryński team describes [36] an artificial neural network (ANN)-based model class used for early detection and diagnosis of lung cancer. In their study, a dataset consisting of a wide range of biochemical parameters obtained from blood samples, as well as results from medical interviews (48 values in total) from 193 patients of mixed age and sex was used to train a family of 10 multilayer perceptron network (MLP) [3,4] architectures, using a range of activation functions (linear, logistic, and tanh) for both hidden and output layers, as well as varying number of processing units ("neurons") in the hidden layer and different training algorithms (gradient descent, Broyden-Fletcher-Goldfarb-Shanno (BFGS) [37], and scaled conjugate gradient (SCG)) [38].
Goryński team found that two of the trained models, named MLP 48-9-2 (trained using BFGS algorithm and using linear and tanh activation functions for hidden and output layers, respectively) and MLP 48-15-2 (SCG algorithm, logistic and tanh activation functions) gave highly accurate results in terms of inferring the presence or absence of lung cancer from the given set of variables, with ROC value reaching 99.83%.
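As a rough sense of model size, the number of trainable weights in these two networks can be counted directly. The sketch assumes that a name such as MLP 48-9-2 denotes 48 inputs, 9 hidden units, and 2 outputs, and that every hidden and output unit has a bias term; neither assumption is confirmed by the source, and the function name is hypothetical.

```python
def mlp_parameter_count(layer_sizes, with_bias=True):
    """Number of trainable weights in a fully connected feed-forward net."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + (n_out if with_bias else 0)
    return total


small = mlp_parameter_count([48, 9, 2])    # MLP 48-9-2
large = mlp_parameter_count([48, 15, 2])   # MLP 48-15-2
```

Under these assumptions the two networks have 461 and 767 parameters, respectively, which is worth keeping in mind next to the 193-patient dataset.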
Goryński team concluded that these relatively simple ANN solutions, while not viable as a full substitute for expert opinion, are nonetheless efficient in early diagnosis and risk prognosis of lung cancer and are therefore promising as potential improvements over and additions to the existing inventory of diagnostic and prognostic methods.

Mutation prediction and early lung cancer detection in liquid biopsy using convolutional neural networks
The proliferation of cancer cells is driven by specific somatic mutations in the cancer genome [39]. To fulfil the high expectations associated with liquid biopsy, such as comprehensive characteristics of the whole tumour in contrast to limited sampling in the traditional tissue biopsy, or dynamic assessment during treatment, the somatic mutations must be detected with high sensitivity and accuracy; limited coverage depth is not sufficient. Kothen-Hill team has demonstrated a CNN-based classifier system named "Kittyhawk" [40] that enables the detection of cancer-related mutations even in extremely low variant allele frequencies (VAFs), more than 2 orders of magnitude lower than is possible with the currently available methods.
For the training dataset, whole genome sequencing (WGS) data from 4 non-small cell lung cancer (NSCLC) patients and 3 melanoma patients were used, with more than 1.2 × 10^7 reads in total. To ensure adequate genetic context regardless of variants appearing at the end of the read, additional bases were added to both ends of the read. Additional bases were also added to ensure equal read length in cases where a read is shorter than 150 bp.
Kothen-Hill team chose an 8-layer CNN with a single fully connected output layer, similar to the VGG architecture [41], with a receptive field of size 3 used to convolve the features, based on results of [42] who showed that the tri-nucleotide context contains distinct mutagenesis-related signatures. After 2 successive convolutional layers, downsampling by max-pooling with a receptive field of 2 and a stride of 2 was applied, forcing the model to retain only the highest-importance features, as per [43]. The output of the last convolutional layer was directly connected to a fully connected sigmoid output layer for final classification. A logistic regression layer was used to retain the features associated with the position of the read.
The model was trained using minibatch stochastic gradient descent (SGD) with batch size of 256, initial learning rate of 0.1, and momentum of 0.9, with batch normalisation [23] and a rectified linear unit (ReLU) [44] applied after each convolutional layer.
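The role of the learning rate and momentum can be seen in a single update step. The sketch below uses the classical momentum formulation, which is an assumption since the source does not say which variant was used, and it applies the update to a toy quadratic with an exact gradient rather than to minibatches of 256 reads.

```python
def sgd_momentum_step(weights, velocity, gradient, lr=0.1, momentum=0.9):
    """One update: v <- momentum * v - lr * grad;  w <- w + v."""
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, gradient)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


# Toy objective f(w) = sum(w_i^2), whose gradient is 2 * w.
weights, velocity = [1.0, -2.0], [0.0, 0.0]
for _ in range(200):
    gradient = [2.0 * w for w in weights]
    weights, velocity = sgd_momentum_step(weights, velocity, gradient)
final_loss = sum(w * w for w in weights)
```

In real minibatch training the gradient would be an average over the current batch, so each step follows a noisy estimate that the velocity term helps to smooth.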
Kothen-Hill team presents the Kittyhawk architecture as the first of its specific kind, being able to avoid the information loss associated with similar earlier architectures. To evaluate the performance of the model, a test dataset consisting of more than 2 × 10^5 reads that were split off the training set of reads from the 4 NSCLC patients was used. Kothen-Hill team found that the model achieves an F1 score of 0.961 when using this test dataset, and 0.92 when using data from an additional independent NSCLC case. When further tested against data from a melanoma case, an F1 score of 0.71 was achieved, indicating that the model had learned specific mutation patterns associated with NSCLC, as well as a more general pattern associated with both NSCLC and melanoma.
Kothen-Hill team presents the Kittyhawk CNN model as the first ML architecture designed specifically for detecting cancer-related mutations in a low allele frequency environment, such as liquid biopsy, and suggests that it might serve as the foundation for novel early stage cancer detection techniques that could be used for both screening and prognosis.

Machine learning and nanofluidics in pancreatic cancer diagnosis
Issadore team has developed an ML-based platform [45] for isolating exosomes from liquid biopsy samples and using the RNA inside these exosomes to diagnose pancreatic cancer in human and murine cohorts.
Using the Exosome Track-Etched Magnetic Nanopore (ExoTENPO) nanofluidics chip developed as part of the study, Issadore team successfully isolated exosomes from cell cultures, as well as human and mouse liquid biopsy (blood plasma) samples. Exosomal mRNA was subsequently extracted and used to develop a predictive panel for pancreatic cancer biomarkers.
Training datasets of 15 mouse and 10 patient profiles, respectively, were created. Linear discriminant analysis (LDA) [46] was used to identify combinations of mRNA profiles that discriminated between healthy and tumour-bearing samples. The prediction algorithm was generated by running LDA on the training set, which produced a vector that was used to calculate a weighted sum such that it maximally separates the control group from the sample group with tumours. Two independent blinded test sets, mouse (N = 18) and patient (N = 34), respectively, were used to evaluate the performance of the LDA classifier. Fisher's exact test was used to quantify the predictive value of the classifier, yielding P < 0.001.
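The idea of a weight vector that turns an expression profile into a single separating score can be sketched with a two-class Fisher discriminant. Everything below is an illustrative assumption: the data are invented two-gene profiles, the midpoint threshold is one common choice among several, and the code is limited to two features so that the matrix inverse can be written out by hand.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def scatter_2x2(rows, mean):
    """Within-class scatter matrix for two features."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mean[0], r[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_lda(healthy, tumour):
    """Return w = Sw^-1 (m_tumour - m_healthy) and a midpoint threshold."""
    m0, m1 = mean_vector(healthy), mean_vector(tumour)
    s0, s1 = scatter_2x2(healthy, m0), scatter_2x2(tumour, m1)
    a, b = s0[0][0] + s1[0][0], s0[0][1] + s1[0][1]
    c, d = s0[1][0] + s1[1][0], s0[1][1] + s1[1][1]
    det = a * d - b * c
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    w = [(d * diff[0] - b * diff[1]) / det, (-c * diff[0] + a * diff[1]) / det]
    threshold = sum(wi * (x + y) / 2 for wi, x, y in zip(w, m0, m1))
    return w, threshold


def classify(profile, w, threshold):
    """Weighted sum of expression values; above the threshold means tumour."""
    return sum(wi * xi for wi, xi in zip(w, profile)) > threshold


# Made-up two-gene expression profiles (illustrative only).
healthy = [[1.0, 2.1], [1.2, 1.9], [0.8, 2.4], [1.1, 2.0]]
tumour = [[2.9, 1.0], [3.2, 1.3], [3.0, 0.8], [2.7, 1.1]]
w, threshold = fisher_lda(healthy, tumour)
predictions = [classify(p, w, threshold) for p in healthy + tumour]
```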
Although in their study Issadore team focused primarily on the development and evaluation of the ExoTENPO nanofluidics platform, they conclude that even very simple ML algorithms such as LDA can produce good quality predictive models for classifying biochemical and genetic markers and note that more advanced ML solutions could be used in future research in order to further improve performance.

Machine learning-based RNA sequencing for multi-class cancer diagnostics
Wurdinger team demonstrated an ML-based approach to sequencing and analysis of mRNAs obtained from tumour-educated platelets (TEPs) [47] as a tool for accurate tumour diagnosis, both within a single class and across six different tumour classes.
The initial dataset consisted of blood platelet samples from healthy donors (N = 55) and both treated and untreated patients with six different tumour types (NSCLC, colorectal cancer, glioblastoma, pancreatic cancer, hepatobiliary cancer, and breast cancer) in various stages of advancement and metastasis (N = 228). After the mRNA extraction, amplification, and sequencing, a set of approximately 5000 different mRNAs was selected for further analysis.
The accuracy of TEP-based multi-class cancer classification in the training dataset (N = 175) was estimated using an SVM algorithm. To cross-validate the SVM for the entire sample set, the leave-one-out cross-validation (LOOCV) method was applied. The percentage of correct predictions was reported as the accuracy score. The algorithm was run 175 times, in order to classify and cross-validate the entire dataset. To determine specific input gene lists for the algorithm, Wurdinger team performed ANOVA testing. They selected a set of 1072 mRNAs to use with the training dataset, yielding final accuracy of 96% and ROC value of 0.986. From the patient cohort, all 39 patients with localised tumours and 33 of the 39 patients with primary tumours in the CNS were classified as cancer patients.
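Leave-one-out cross-validation can be sketched in a few lines: each sample is held out once, the classifier is trained on all the others, and the fraction of correct held-out predictions is the accuracy. To keep the example dependency-free, a nearest-centroid classifier is swapped in for the SVM used in the study; the data and function names are invented for illustration.

```python
def nearest_centroid_predict(train_x, train_y, sample):
    """Stand-in classifier (not an SVM): label of the closest class mean."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        members = [x for x, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(members) for col in zip(*members)]
        dist = sum((a - b) ** 2 for a, b in zip(sample, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out_accuracy(features, labels, predict):
    """Hold out each sample once, train on the rest, and score the hits."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


# Made-up two-feature expression values for six samples (illustrative only).
features = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.3],
            [0.9, 0.8], [1.0, 1.1], [0.8, 0.9]]
labels = ["healthy", "healthy", "healthy", "cancer", "cancer", "cancer"]
accuracy = leave_one_out_accuracy(features, labels, nearest_centroid_predict)
```

With N samples the loop trains N models, which matches the 175 runs mentioned above for the 175-sample training dataset.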
Wurdinger team concluded that using the SVM classifier with TEP-based data produces high-accuracy, high-specificity models for liquid biopsy-based diagnostics for several common cancer types. They expect that using more advanced ML algorithms capable of self-learning could further improve the performance of these diagnostic models. They also suggest evaluating systemic factors such as inflammatory diseases and other non-cancerous diseases as potential factors that can influence the mRNA profile.

Using machine learning to accelerate DNA sequencing and biomarker development
A supervised machine learning-based approach to DNA sequence analysis
DNA sequencing and sequence analysis is an important task in many scientific and medical fields that is well-known for being both data-rich and computationally intensive. Memeti & Pllana describe an ML-based solution for optimised DNA sequence analysis [48,49]. Their algorithm leverages the increased performance and parallelisation capabilities of a heterogeneous multi-core computing platform (a host central processing unit (CPU) in combination with a 61-core Intel Xeon Phi co-processor).
Memeti & Pllana used the widely known Aho-Corasick (AC) algorithm [50] as the basis for their work, since DNA analysis is a specific case of a string matching problem, where the input text is the given DNA sequence and the alphabet consists of characters corresponding to the four nucleotide bases. AC uses finite automata (FA), a simple type of formal machine in the form of a prefix tree with additional links between internal nodes. These links allow for fast failure transitions (also known as ε-transitions) between branches of the tree that share a common prefix, thus avoiding backtracking. A known drawback of the AC algorithm is that it is non-deterministic: the number of operations needed to process a character varies with the failure transitions taken. Memeti & Pllana solved the non-determinism issue by modifying the AC finite automaton so that it computes the correct transition for each state, thus eliminating failure transitions and guaranteeing that every character always has the same number of operations associated with it.
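The construction described above, a prefix tree whose failure links are replaced by explicit transitions, can be sketched as follows. This is a textbook-style, single-threaded illustration and not the implementation of Memeti & Pllana; the function names, the example patterns, and the choice to count overlapping matches are assumptions.

```python
from collections import deque

ALPHABET = "ACGT"


def build_dfa(patterns):
    """Build a deterministic Aho-Corasick automaton over the DNA alphabet.

    Returns (delta, hits): delta[state][char] is the next state for every
    state and character, and hits[state] is the number of patterns that end
    when that state is reached.
    """
    # Step 1: prefix tree (trie) of the patterns.
    delta = [{}]
    hits = [0]
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in delta[state]:
                delta.append({})
                hits.append(0)
                delta[state][ch] = len(delta) - 1
            state = delta[state][ch]
        hits[state] += 1

    # Step 2: breadth-first pass that replaces failure links with explicit
    # transitions, so every (state, character) pair has exactly one successor.
    fail = [0] * len(delta)
    queue = deque()
    for ch in ALPHABET:
        if ch in delta[0]:
            queue.append(delta[0][ch])
        else:
            delta[0][ch] = 0
    while queue:
        state = queue.popleft()
        hits[state] += hits[fail[state]]
        for ch in ALPHABET:
            if ch in delta[state]:
                child = delta[state][ch]
                fail[child] = delta[fail[state]][ch]
                queue.append(child)
            else:
                delta[state][ch] = delta[fail[state]][ch]
    return delta, hits


def count_matches(text, delta, hits):
    """Scan the text with exactly one table lookup per character."""
    state, total = 0, 0
    for ch in text:
        state = delta[state][ch]
        total += hits[state]
    return total


delta, hits = build_dfa(["ACG", "CG", "GT"])
matches = count_matches("ACGTACG", delta, hits)
```

Because the scan does a fixed amount of work per character, equal-length chunks of a sequence take about the same time to process, which is what makes the approach attractive for splitting work across many threads.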
A boosted decision tree regression-based predictor [51] was used to estimate the execution time of DNA sequence analysis for both the host CPU and the Intel Xeon Phi co-processor. The predictor's output was used to partition the DNA sequence based on the S-factor, which is computed from T_host and T_device, the execution times for the host CPU and the co-processor, respectively. The resulting partitioning scheme divides the original DNA sequence I into I_host, the part of I analysed by the host CPU, and I_device, the part of I analysed by the co-processor.
Memeti & Pllana used the "single instruction, multiple data" (SIMD) parallelism [52] of both the host CPU and the Xeon Phi co-processor to achieve teraFLOP (10^12 floating point operations per second) performance. For experimental evaluation of their deterministic finite automata (DFA) algorithm, Memeti & Pllana used reference genomes of human and 11 different animals from the GenBank sequence database of the National Center for Biotechnology Information, with the average dataset size of 2043 MB. In total, data from approximately 4000 experiments were used to train the performance predictor and to evaluate the DFA performance. The DFA performance was evaluated using different thread affinity modes (compact, balanced, and scatter) and numbers of threads for each of the DNA sequences. The balanced thread affinity mode evenly distributes the threads among the computing cores, compact mode completely fills a single core with threads before assigning the remaining threads to the next core, while the scatter mode distributes threads among the cores in a round-robin sequence.
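The three thread affinity modes can be contrasted with a toy placement function for each. The sketch rests on assumptions: four hardware threads per core, a hypothetical 4-core device, and the reading that balanced mode keeps neighbouring thread IDs on the same core (which is what distinguishes it from scatter when there are more threads than cores). It models only which core each thread lands on, not the runtime that performs the pinning.

```python
def compact(n_threads, threads_per_core=4):
    """Fill each core's hardware threads before moving to the next core."""
    return [t // threads_per_core for t in range(n_threads)]


def scatter(n_threads, n_cores):
    """Deal threads out to the cores in round-robin order."""
    return [t % n_cores for t in range(n_threads)]


def balanced(n_threads, n_cores):
    """Spread threads evenly while keeping neighbouring thread IDs together."""
    base, extra = divmod(n_threads, n_cores)
    placement = []
    for core in range(n_cores):
        placement += [core] * (base + (1 if core < extra else 0))
    return placement


# Core index assigned to each of 6 threads on a hypothetical 4-core device.
compact_map = compact(6)
scatter_map = scatter(6, 4)
balanced_map = balanced(6, 4)
```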
Memeti & Pllana discovered that the balanced thread affinity mode is overall fastest for all of the tested DNA sequences, with second best being the scatter mode. The evaluation of DFA with regard to varying thread counts showed that the algorithm scales well up to approximately 120 threads, whereas in the 180-240 thread range the performance improvement becomes modest due to overhead from thread management operations. Performance-wise, Memeti & Pllana found that the parallel version of DFA running on a heterogeneous platform has a speed-up from 35.6× up to 206.6×, compared to a sequential (single-thread) version running on the host CPU, with the exact speed-up degree depending on the given host CPU.
Memeti & Pllana intend to use this work to study and develop highly parallel DNA analysis solutions on more powerful hardware in the future.

Future prospects
While the ML models currently used in liquid biopsy analysis in particular and biological and medical research in general (typically different classes of neural networks and linear classifiers) appear to both produce accurate results and show generally high performance, they represent only a narrow subset of machine learning and artificial intelligence solutions [5]. For instance, a potentially valuable research direction might be in the form of highly advanced probabilistic graphical models [53] augmented with functionality such as one-shot learning [54] and probabilistic program synthesis [55], which could potentially allow researchers to reduce the size of the commonly massive training datasets required for creating ANN- or DL-based models.
Furthermore, with a single exception, all of the studies reviewed here have been focused on the performance and accuracy of software ML models, which is currently the predominant class of machine learning solutions. However, recent advances in general purpose computation using both graphics processing units (GPUs) and specialised application-specific integrated circuits (ASICs) tailor-made for machine learning [56] provide a strong case for the exploration and exploitation of hardware or hybrid ML solutions, as evidenced by, e.g., the results from the AlphaGo experiments and public performance [57].

Conclusions
Liquid biopsy-based approaches open many so far little explored and promising opportunities for studying and measuring biological and biochemical markers with broad applications for the monitoring, diagnosis, and prognosis of a large class of diseases and processes. Machine learning, with its advanced pattern recognition capabilities, will likely play an increasingly important role in these fields, as the amount and complexity of data produced by scientific and medical sources already by far exceeds the capacity of unaided human experts and is rapidly increasing with no foreseeable slowdown.
In addition, machine learning tools form a natural synergy with distributed, highly parallel, or cloud-based computation solutions, thus lending themselves to collaboration among researchers and medical professionals from distant locations and involving amounts of data storage and processing power previously available only on dedicated high performance computing (HPC) platforms and supercomputers. It is likely that in the near future the importance of decentralised collaboration will continue to grow, increasing the demand for powerful and easy-to-use toolsets for analysis and processing of biological data.
Based on these trends, we expect that the next generation of liquid biopsy technologies will include many types of machine learning as an integral part of their operation and that this trend could have a significant positive impact on both diagnosis and treatment of patients.