## 1. Introduction

In the final years of the last century, cell biology grew rapidly thanks to the convergence of several techniques that substantially improved confocal microscopy. It is now possible to observe the structural and functional unit of life in real time. The contributing advances include ultra-refrigerated CCD cameras with electron multipliers; spinning-disk confocality, which does not require high-energy lasers (which could damage living cells within a few seconds); the increasing capacity of computational processors; and the ability of genetic engineering to encode mutant fluorescent proteins [1, 2], which offers a color palette previously unthinkable for molecular cell biologists [3, 4]. Together with the ability to generate cells with fluorescent compartments, these advances opened the door to perhaps the most remarkable scientific and technological development of a new era in cell biology, named Live Cell Imaging. Before the 1990s, this kind of research was known by the less popular name “time-lapse video microscopy”, as detailed in a widely known protocol book of that time written by A.J. Lacey [5].

At the beginning of the new millennium, the need for new and improved mathematical and computational tools became evident. The amount of data produced in a single experiment could overload the capacity of personal computers, and conventional software lacked the algorithms required to process such data. A strategic alliance with researchers in artificial intelligence, applied mathematics, and physics therefore became necessary. These collaborations make perfect sense because, even from the beginning of life science studies, it was clear that the dynamical rules involved were complex, non-linear, and possibly not even deterministic but probabilistic. Virologists, for example, have discovered that the infection rate follows a non-linear pattern, and the cellular physiology of several processes turned out to be more complicated than expected. Consequently, mathematical modeling became the main strategy for understanding cell biology. The amount of data available nowadays cannot be analyzed by conventional human heuristics. Fortunately, computational biology and its tools offer the required resolution and robustness for diverse problems. Moreover, computational algorithms apply the same rules to every case. When dealing with complex biological problems, a working computational model brings us closer to reality and helps us avoid the human bias present in heuristic approaches. It is in complex problems that the convenience of using powerful statistical tools to build models becomes apparent. The main strategy here is to try to “learn” the model directly from the experimental or observational data.

The term “machine learning” (ML) refers to a branch of artificial intelligence concerned with the study and construction of algorithms that are able to learn from existing data. In such algorithms, a set of parameters is fitted to provide the best input-output relationship for the information available. A computational code that implements the techniques, algorithms, or principles of machine learning theory is usually called a machine learning program. The literature on this topic is quite large; some very popular books are those by Duda et al. [6], Webb [7], and Bishop [8].

A commonly accepted definition of the process of “learning” is due to Tom M. Mitchell [9]: *a machine learns to perform a task T if its performance, as measured by P, increases with the experience E*. The experience *E* is the feedback the machine receives to validate its output. ML techniques have a broad range of applications in several fields of knowledge, including the building of autonomous robots [10], astrophysical data mining [11], the study of dynamical systems and complex networks without explicit knowledge of the dynamical equations [12], pattern and shape recognition [6, 8] in images, as used by the face recognition programs of social network web pages, handwriting OCR (optical character recognition), and, of course, medicine (automatic diagnosis based on symptoms) and biology (gene sequencing, classification of cellular morphology, etc.).

Shape recognition capabilities in particular have important applications in live cell image processing. For example, in 2006, Neumann and collaborators used live cell imaging to study RNAi screening [13]. They developed a ML program that recognizes the morphologies present in cell images and classifies them according to the corresponding phenomenology: interphase, mitosis, apoptosis, and binucleated cell morphologies were studied through a multi-class classifier using a support vector machine (SVM). They reported up to 97% accuracy for the SVM in comparison with “manual” classification by very well-trained biologists.

Due to the huge amount of data provided by a live cell imaging (LCI) measurement, it becomes impractical to rely only on the reading and interpretation of a well-trained researcher. Different scientists may also interpret the same image differently. The ability of ML to recognize and characterize particular morphologies present in an image is therefore very useful to avoid the slow and tedious process of visual discrimination. It can also avoid some human bias by following well-defined rules. However, fully training the machine can require a large number of image samples of the phenotype under study, that is, enough sample cells expressing the phenotype and other cells to use as a control group. Sometimes that condition is not fulfilled. To address this kind of problem, Thouis R. Jones and collaborators implemented a ML with interactive feedback to characterize diverse and complex morphological phenotypes [14]. They use the criteria of well-trained researchers as feedback in the learning stage of the machine, and provide the code [15] for anyone to use under a free license.

Several generic implementations of ML techniques have been developed and presented as toolboxes in scientific software. However, it is quite common to find that the particular phenomenon under study is better fitted by a dedicated implementation developed explicitly to deal with it. This can be a consequence of the particularities of the problem, or sometimes simply of the lack of proper documentation for the available tools.

This chapter is organized as follows: first, a brief description of some common methods used to build a ML is provided, followed by a description of the experiment performed and of the computational analysis used to obtain the information from the graphical data. Finally, the results and a proposal to create a ML to characterize viral infection are presented.

## 2. Machine learning concepts overview

From the mathematical point of view, a ML can have one of two primary objectives: regression or classification. When the machine is used to compute the best response to a given situation among a continuous range of possible answers, it is called a regression problem. When the machine has to choose among a discrete set of possibilities, it is called a classification problem.

Shape recognition and feature extraction from images is a classification problem, in which the task of the machine is to find the class that has the highest probability of containing the current input value. In this context, a class is defined by a set of measurable attributes found in an image; these can be geometrical attributes (length, shape, eccentricity, size), pixel intensity, etc. In general, the input for a ML program is a set of measurable variables or attributes, which are arranged in vectors. It is common in ML literature to call these attributes *features* and the vectors *feature vectors*, so these names will be used in the rest of this chapter. Each entry in a feature vector represents an attribute, and each vector represents a state of the system.

Before the machine is ready to be used as a classifier or predictor, it needs to be trained on some data. Here, “training” refers to the process of parameter optimization, in which the machine is optimized to get the best result against the training set. This process is not perfect, and some human criteria need to be applied. If the model does not have enough freedom to fit the training set, it is underfitted and does not reproduce the characteristics of the system under study. On the other hand, too much freedom in the ML leads to a model that fits the training set very well but is unable to predict accurately the outcome for a feature vector outside the training set. This is called overfitting. To detect and limit overfitting, it is customary to split the available data into two sets, the training set and the testing set. A trained machine is challenged with the testing set, and the accepted ML model is the one that obtains the best results against it.
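The split into training and testing sets can be sketched in a few lines of numpy. This is only an illustration: the function name `train_test_split`, the 30% test fraction, and the toy arrays are assumptions made here, not values taken from the chapter.

```python
import numpy as np

def train_test_split(features, labels, test_fraction=0.3, seed=0):
    """Shuffle the samples and split them into a training and a testing set."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(features))          # random order of the samples
    n_test = int(round(test_fraction * len(features)))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return (features[train_idx], labels[train_idx],
            features[test_idx], labels[test_idx])

# Toy data (hypothetical): 10 feature vectors with 2 features each
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.array([0, 1] * 5)
X_train, y_train, X_test, y_test = train_test_split(X, y)
```

The model is fitted only on `X_train` and `y_train`; its error on `X_test` and `y_test` estimates how well it generalizes to vectors it has never seen.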

There are two main paradigms for the training of a classifier: supervised and unsupervised learning. In supervised learning, each sample in the training set consists of a feature vector and a class flag, i.e., the class to which each sample in the training set belongs is known a priori. After the learning process, a computational model is obtained that can predict the right class flag for most of the training set and that, hopefully, will predict the correct class for a new sample with high accuracy. The classifier must also return information about how confident its prediction is, i.e., a measure of doubt must be reported.

When no predefined classification is available, a ML algorithm can be used to search for common patterns or similarities in the training set, which does not yet contain class information. The machine clusters samples with similar feature vectors to define a class, and then uses the classes found to characterize new input data. To do that, it is necessary to define some measure of similarity (the Euclidean distance in feature space, for example) that can be used to group the input vectors into clusters. The objectives of this kind of ML are first to cluster the data of the training set into classes, and then to set up a classifier to characterize new inputs.

ML techniques are suitable for complex problems in which the explicit mathematical form describing the interactions in the process is not known, i.e., the dynamical equations ruling the system are unknown. As a result, the computations involved in a model built with ML are not deterministic but probabilistic, based on the information gathered by direct measurement. The more data are available to train the machine, the more accurate the prediction becomes.

### 2.1. Supervised learning

The objective of the classifier is to draw a frontier, called the border line or border hyperplane, that splits the feature space into *k* disjoint subsets [16]. Ideally, each subset will contain the feature subspace associated with a single class.

#### 2.1.1. Hypothesis function

Let the features space be called

where

Another approach comes from a probabilistic interpretation of the hypothesis function. Suppose that

As any set

The principle of maximum likelihood establishes that the best model representation of the data is given by the set of parameters

#### 2.1.2. Logistic regression

Suppose the problem at hand is to determine whether the measurement of some experiment belongs to one of two possible outputs (for example, to determine whether a tumor is benign or malignant). A class flag, 0 or 1, must be associated with each output. In this case, a common approach is to propose a logistic function (also known as a sigmoid function) as the classifier; this is called a *logistic regression*:

The sigmoid function

Which can be summarized in a single probability density function (PDF)

Once the PDF is set, the process of learning consists in maximizing the likelihood of that PDF with respect to the training data set. For computational simplicity, it is convenient to maximize instead some monotonically increasing function of the likelihood. It is common to work with the logarithm of the likelihood (the log-likelihood function). When logistic regression is used, the hypothesis function does not return the prediction of an output class, but the probability that the sample feature vector belongs to a given class.
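As an illustration of learning by maximizing the log-likelihood, the following minimal sketch fits a logistic regression by batch gradient ascent. The parameter name `theta`, the learning rate, the number of iterations, and the toy data are assumptions chosen for this example; they are not taken from the chapter.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, bounded between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

def add_bias(X):
    """Prepend a column of ones so that theta[0] acts as the intercept."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

def fit_logistic(X, y, learning_rate=0.1, n_iter=5000):
    """Maximize the mean log-likelihood by batch gradient ascent."""
    Xb = add_bias(X)
    theta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = sigmoid(Xb @ theta)                      # current probabilities
        theta += learning_rate * Xb.T @ (y - p) / len(y)  # log-likelihood gradient
    return theta

def predict_proba(theta, X):
    """Probability that each feature vector belongs to class 1."""
    return sigmoid(add_bias(X) @ theta)

# Toy data (hypothetical): one feature, class 1 for the larger values
X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 1, 1, 1])
theta = fit_logistic(X, y)
proba = predict_proba(theta, X)
labels = (proba >= 0.5).astype(int)   # threshold only when a class flag is needed
```

The fitted model returns probabilities; the final thresholding at 0.5 is one simple way to turn them into class flags.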

If it is needed to get a class value as an output, it can be done by setting

then minimizing the cost function Eq. (1). This last strategy is known as the *perceptron learning algorithm*. The classifier in this example can be extended to *k* classes by a simple one-vs-all algorithm, that is, by defining a logistic regression to compute the probability for each of the *k* classes,

#### 2.1.3. Non-linear classifiers

One limitation of the techniques summarized so far is that they provide a linear model for the classifier, i.e., the decision border is represented by a straight line or hyperplane. This can work perfectly well if the data are linearly separable, or if a linear border provides enough accuracy in the final prediction. What happens if the feature space requires a more complex, non-linear decision border, i.e., if the decision border is described considerably better by a hypersurface?

One possible way to create non-linear models is to use *neural networks* (NN). A neuron is a computational unit, i.e., a piece of code that performs a single task or function, usually called the activation function. The method was developed as a rough imitation of a biological neuron: each computational neuron has a set of wires connecting it to its inputs and a set of wires used to communicate its output to the next set of neurons. Each wire between two neurons has an associated parameter, sometimes called a “weight”, which is adjusted in the learning process. The computational model is thus created as an array of neurons, configured in layers that can be fully or partially connected. This kind of computational structure allows the creation of quite complex non-linear functions just through the selection of the network wires. After optimization, the NN computes a continuous function ready to be used for regression. For classification, a helper selection function can be implemented.

A simple and yet powerful arrangement is the feedforward structure (see Figure 1), in which the neurons are arranged in a network where each layer receives its inputs directly from the layer before and provides its outputs only to the layer after it, i.e., the i-th layer receives its information from the (i - 1)-th layer and sends its output as input to the (i + 1)-th layer. A common neural network used for classifiers is a three-layer fully connected network, i.e., each neuron receives information from every neuron in the layer before. The first layer, called the input layer, has a computational unit for each attribute in the feature vector. This unit sends its associated value to every neuron in the second or hidden layer, where each neuron computes a sigmoid function

Every neuron in the second and third layers has an associated vector of parameters that needs to be trained. Training a NN is a difficult task, in which the weights connecting each pair of neurons must be learned for all neurons in every layer. For the output layer, the cost function can be computed by taking into account the expected values in the training set. For the inner layers, however, no expected value is known. As a consequence, the cost function associated with a NN is in general non-convex. This has strong repercussions for the optimization problem: due to the existence of several local minima, convergence to a global minimum is not guaranteed.

The parameters can be changed in several ways that provide approximately the same level of correction from one iteration to the next. One strategy to find the “right direction” in which to move the network is to take minimal changes, i.e., among all the possible variations of the parameters that provide the same level of correction, the network is changed so that the set of parameters differs the least from its previous state. This is done by applying a generalization of the gradient descent method to multilayer networks, called the backpropagation algorithm, a complete description of which can be found elsewhere [17].
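To make the three-layer feedforward structure and the backpropagation update more concrete, here is a minimal numpy sketch. The layer sizes, the mean squared error cost, the learning rate, and the XOR toy data are assumptions made for illustration; the full algorithm is the one described in [17], not this simplified version.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, b1, W2, b2):
    """Input layer -> hidden sigmoid layer -> output sigmoid layer."""
    hidden = sigmoid(X @ W1 + b1)
    output = sigmoid(hidden @ W2 + b2)
    return hidden, output

def backprop_step(X, y, W1, b1, W2, b2, learning_rate=0.5):
    """One gradient descent step on a squared-error cost."""
    hidden, output = forward(X, W1, b1, W2, b2)
    n = len(X)
    # Error signal at the output layer (chain rule through the sigmoid)
    delta_out = (output - y) * output * (1.0 - output)
    # Error signal propagated back to the hidden layer
    delta_hid = (delta_out @ W2.T) * hidden * (1.0 - hidden)
    W2 = W2 - learning_rate * hidden.T @ delta_out / n
    b2 = b2 - learning_rate * delta_out.mean(axis=0)
    W1 = W1 - learning_rate * X.T @ delta_hid / n
    b1 = b1 - learning_rate * delta_hid.mean(axis=0)
    return W1, b1, W2, b2

def mse(X, y, params):
    """Mean squared error of the network output."""
    return float(np.mean((forward(X, *params)[1] - y) ** 2))

# Toy data (hypothetical): XOR, which is not linearly separable
rng = np.random.default_rng(0)
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [1.0], [1.0], [0.0]])
params = (rng.normal(size=(2, 3)), np.zeros(3),
          rng.normal(size=(3, 1)), np.zeros(1))

initial_loss = mse(X, y, params)
for _ in range(500):
    params = backprop_step(X, y, *params)
final_loss = mse(X, y, params)
output = forward(X, *params)[1]
```

Because the cost is non-convex, the loss decreases but the run may settle in a local minimum; different random initial weights can lead to different final networks.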

#### 2.1.4. Support vector machine

Another method to create a non-linear classifier is the *support vector machine* (SVM). To illustrate the idea behind SVM, consider a two-class problem. The main objective is to draw the decision border between the two classes that provides the best separation of the subsets among all the possible border lines (see Figure 2).

SVM finds the border line that has the largest margin or distance from the closest sample of each class. The equation of the decision border is then the equation of the hyperplane:

and the optimization problem is to find the optimal values for

which includes the Lagrange multipliers

under the constraints

In real applications, it is common to find that the training set at hand is not separable, i.e., it is not possible to find a border hypersurface that splits the feature space without misclassifying some training samples. Forcing the model to fit every training vector will produce overfitting, so it is good practice to implement *regularization*. This is done by introducing the regularization term

#### 2.1.5. Confusion matrix

To assess the accuracy of the ML model, a *confusion matrix* can be built for the classifier. The confusion matrix is an evaluation of how many feature vectors in the training and/or testing set are misclassified. The matrix is built by contrasting the predicted class flag with the real one for each feature vector. For example, consider a two-class problem (like the benign or malignant tumor problem) that, when trained against a set of 100 feature vectors, reports the classification presented in Table 1:

In the example, 45 benign tumors have been classified as benign by the ML, and 15 have been misclassified as malignant. The error estimate for a ML code is computed as the number of misclassified samples over the total number of samples, i.e., the sum of all the off-diagonal elements divided by the total number of cases. In the example, the error rate is 30%, and the accuracy, defined as 1 − *err*, is 70%.
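The error and accuracy computation can be checked with a short numpy sketch. Only the benign row (45 and 15) is stated in the text; the malignant row used here (15 and 25) is an assumption inferred from the 100 samples and the 30% error, so it should be read as illustrative.

```python
import numpy as np

# Rows: real class, columns: predicted class (order: benign, malignant).
# 45 and 15 come from the text; 15 and 25 are inferred from the stated
# total of 100 samples and the 30% error (assumption).
confusion = np.array([[45, 15],
                      [15, 25]])

total = confusion.sum()
error = (total - np.trace(confusion)) / total   # off-diagonal sum over total
accuracy = 1.0 - error
```

The same two lines work for any number of classes, since the trace always counts the correctly classified samples.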

### 2.2. Unsupervised learning

In this case, the first objective of the ML is to find some similarities among the data, which can be used to divide it into clusters. Each cluster will then define a class, and new inputs to the machine (outside the training set) will be classified following the clustering of the training space.

#### 2.2.1. Clustering

So far, the feature vector has been represented as a set of numerical values in real space. From the mathematical point of view, each input “box” inside a vector is called a dimension, and the value in the box is a coordinate. The length of the vector is the number of dimensions of the containing space. A space is a collection of vectors that follow a set of rules (an algebra). If a distance measure can be built for any two points in the space (feature vectors), then the data can be clustered [19]. A common choice is the Euclidean distance, defined as the square root of the sum of the squares of the differences between the coordinates of the two vectors in each dimension, i.e., if **A**, **B** are vectors in $\mathbb{R}^n$, their distance is $d(\mathbf{A}, \mathbf{B}) = \sqrt{\sum_{i=1}^{n} (A_i - B_i)^2}$.

#### 2.2.2. K-means

A common clustering algorithm, known as K-means, works as follows: choose a number of clusters *k* to be found, and initialize the cluster centroids
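A minimal sketch of the standard K-means iteration is given below, assuming the usual formulation: centroids are initialized at randomly chosen samples, each point is assigned to its nearest centroid by Euclidean distance, and the centroids are recomputed until they stop moving. The function name, the initialization strategy, and the toy data are assumptions made for illustration.

```python
import numpy as np

def k_means(points, k, n_iter=100, seed=0):
    """Standard K-means with centroids initialized at k random samples."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Euclidean distance from every point to every centroid
        dist = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # New centroid = mean of its members (kept unchanged if a cluster is empty)
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Toy data (hypothetical): two well-separated groups in a 2D feature space
points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centroids, labels = k_means(points, k=2)
```

The result depends on the initial centroids, so in practice the algorithm is often run several times with different seeds.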

#### 2.2.3. Hierarchical clustering

Hierarchical clustering is a rather expensive algorithm that obtains clusters by finding a partition hierarchy among the data. It can be started by making each feature vector a cluster with a single member. Then, the distance between every pair of clusters is computed. If the distance is less than some selection parameter, the two clusters are merged into one. In the new distribution of clusters, each cluster is represented by its centroid, and the process is iterated until some convergence criterion is met, for example, until a predefined number of clusters is reached. Due to the computational cost involved, hierarchical clustering is not recommended for problems with a large training set.
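The following sketch shows one common variant of this idea, assumed here for simplicity: at each step the two clusters with the closest centroids are merged, and the process stops when a predefined number of clusters remains. It replaces the distance threshold mentioned above with a "merge the closest pair" rule, and the function name and toy data are hypothetical.

```python
import numpy as np

def agglomerative(points, n_clusters):
    """Merge the two clusters with the closest centroids until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]   # start: one member per cluster
    while len(clusters) > n_clusters:
        centroids = np.array([points[c].mean(axis=0) for c in clusters])
        dist = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=2)
        np.fill_diagonal(dist, np.inf)             # ignore self-distances
        a, b = np.unravel_index(dist.argmin(), dist.shape)
        a, b = min(a, b), max(a, b)
        clusters[a] = clusters[a] + clusters[b]    # merge cluster b into cluster a
        del clusters[b]
    return clusters

# Toy data (hypothetical): two well-separated groups
points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
clusters = agglomerative(points, n_clusters=2)
```

The full distance matrix is recomputed at every merge, which makes the cost of this naive version grow quickly with the number of samples.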

#### 2.2.4. The CURE algorithm

This is a large-scale clustering algorithm. When centroids are used to define clusters, every cluster is expected to have a regular shape in feature space, and the space is expected to be Euclidean. The clustering using representatives (CURE) algorithm is somewhat more general, because it can handle irregularly shaped clusters. This method defines a cluster in terms of a set of representative members of the cluster. These representatives must be chosen so that they are as far as possible from each other; the representatives are then points on the “surface” of the cluster. This kind of construction allows any shape for the cluster, including rings. To apply the CURE algorithm, an initial clustering must first be done; then the representatives are chosen for each cluster; and finally, two clusters are united if they have a pair of representatives that are close enough according to some user-defined criteria.

## 3. LCI experiments and data extraction

### 3.1. Experiment description

#### 3.1.1. Materials and methods

##### 3.1.1.1. Cell lines expressing fluorescent mitochondria (Vero-Mito)

The Vero epithelial cells (ATCC) were maintained under standard culture conditions as described in other works from this lab [20, 21] and in another chapter of this book [22]. The temperature was set at 37°C in a humidified atmosphere of 95% air and 5% carbon dioxide. The monoclonal cell line over-expressing the plasmid pmKate2-Mito (Evrogen®) was obtained with a cell sorter (Moflo XDP, Beckman Coulter®), with subsequent antibiotic selection (Kanamycin) of transfectants for 21 days, following the experimental procedures described in detail in [23].

##### 3.1.1.2. Virus preparation, titration and infection protocols

The New Guinea strain of Dengue Virus Serotype 2 (DENV-2) was grown and maintained in C6/36 HT insect cells under the standard practices described in [20, 21] and in this book [22]. Briefly, DENV was amplified at a very low MOI (multiplicity of infection) to avoid genetic drift and the appearance of DIs (defective interfering particles), which could alter the data on truly synchronized infections [24]. Viral titers were determined by plaque assay, using a Vero cell monolayer culture under 1% methylcellulose overlay medium, as reported in [20, 21]. The viral infections were performed as in our previously reported works [20, 21], with the difference that, for live cell imaging, the cells were seeded and recorded in 35-mm glass-bottom dishes (MatTek Corporation) with glass coverslips 0.7 mm thick, whose refraction index is adequate for recording living cells with this kind of inverted confocal microscope. The negative controls of infection were named mock infections, as is standard in the virology community [24].

##### 3.1.1.3. Live cell imaging

The Vero-Mito cell line (3 × 10⁵ cells) was seeded in Petri dishes suitable for living cells, with a 0.17 mm coverslip bottom. Before recording, the normal culture medium was replaced by DMEM without phenol red to avoid the autofluorescence of this pH indicator. The videos of living cells over-expressing fluorescent mitochondria (with and without infection) were obtained with a spinning-disk confocal microscope (Olympus® IX-81 DSU), coupled to an incubator and a gas-mixing system (Tokai-Hit Co®), which regulate the temperature and carbon dioxide of the cell culture microenvironment. The overnight micrographs of mock infections and infections were captured with an ultra-refrigerated Orca-R2 CCD camera (Hamamatsu®) with electron multiplier, coupled to an illumination system with 150 W arc burners consisting of mercury-xenon or xenon lamps (Olympus® MT10 Illumination System). The photonic signals emitted by the biological specimen were transduced to electronic signals by the CCD (charge-coupled device) and transmitted through a 2 m single-quartz light fiber to the Xcellence-Pro workstation (Olympus®) for image processing, which also included the application of deconvolution tools to improve the signal-to-noise ratio of the images.

### 3.2. Image segmentation and extraction of information

A total of nine videos generated by LCI were studied. Four of them correspond to mock (uninfected) cells and five to infected cells. Each video has 36 frames, taken every 20 min over a period of 12 h. The videos were recorded in color at a resolution of 1024 × 1344 pixels. As the color carries no relevant information, they were converted to gray scale. The last 34 vertical lines of each frame were discarded to remove the microscope watermark, and the resolution was then reduced to 495 × 672. The cells in this study were selected under the following criteria:

The whole cell is present in every frame of the video, and the nucleus is clearly distinguishable in every frame.

There are no other cells too close, so the mitochondrial distribution does not appear to overlap with those of neighboring cells.

Under this procedure, 11 cells were selected, 4 mock and 7 infected. Each selected cell was modeled as having an elliptical nucleus, by manually choosing four points on the nuclear border and applying the Hough transform [25] (see Figures 3 and 4). The nucleus and the cell have been shown to be approximately aligned [26], so the cell boundary is approximated by another ellipse centered on the nucleus, with the same inclination and axes twice as long. This rudimentary model provides the segmentation needed to perform the cell tracking, by simply creating a mask over the region where the ellipse is located in every frame of the video. The images are stored as intensity matrices in which each entry is a pixel. Masks are stored as matrices of the same dimensions, whose entries take the value zero or one: if a pixel belongs to the segmented region, the associated mask value is one. These matrices are stored in binary format to be processed by a Python script that takes advantage of the numpy and Scipy libraries for further analysis.
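The preprocessing and masking steps can be sketched as follows. This is not the script used in this work: the function names, the channel-average gray conversion, and the down-sampling by keeping every second pixel are assumptions, and the 34 discarded lines are assumed to lie along the 1024-pixel dimension, as the final 495 × 672 size suggests. The ellipse parameters (center, semi-axes, inclination) are taken as given, whereas in the chapter they come from the manually selected points and the Hough transform.

```python
import numpy as np

def preprocess(frame_rgb):
    """Gray-scale conversion, watermark crop, and 2x down-sampling."""
    gray = frame_rgb.mean(axis=2)     # plain channel average (assumption)
    cropped = gray[:-34, :]           # drop the last 34 lines (watermark)
    return cropped[::2, ::2]          # keep every second pixel (assumption)

def ellipse_mask(shape, center, semi_axes, angle, scale=1.0):
    """Binary mask (0/1) of an ellipse; scale=2.0 doubles both axes."""
    rows, cols = np.indices(shape)
    y = rows - center[0]
    x = cols - center[1]
    # Rotate pixel coordinates into the frame of the ellipse
    u = x * np.cos(angle) + y * np.sin(angle)
    v = -x * np.sin(angle) + y * np.cos(angle)
    a, b = scale * semi_axes[0], scale * semi_axes[1]
    return ((u / a) ** 2 + (v / b) ** 2 <= 1.0).astype(np.uint8)

# Hypothetical frame and nucleus parameters, for illustration only
frame = np.zeros((1024, 1344, 3))
small = preprocess(frame)
nucleus = ellipse_mask(small.shape, center=(250, 300), semi_axes=(40, 25), angle=0.3)
cell = ellipse_mask(small.shape, center=(250, 300), semi_axes=(40, 25), angle=0.3,
                    scale=2.0)
```

Multiplying a gray-scale frame by `cell` keeps only the pixels inside the segmented region and sets the rest to zero.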

### 3.3. Information processing

In the gray-scale video, the mitochondrial distribution appears as bright points, which are brighter where the density of mitochondria is higher. The density distribution of mitochondria can then be estimated as proportional to the intensity distribution

The shape of the density distribution of mitochondria is the same as that of the intensity distribution. The two functions differ only by a constant of proportionality that becomes irrelevant when the density function is normalized. In what follows, both functions are therefore referred to interchangeably as *ρ*.

The continuous density function can be approximated from the set of pixel intensity measurements by some interpolation method. In this work, a two-dimensional interpolation in terms of bivariate splines has been used on each segmented frame. This procedure allows us to extract important information about the mitochondrial behavior. Each frame is taken after a fixed period of 20 min, so the frames form a time series of the density distribution function.
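One way to build such an interpolation with Scipy is `scipy.interpolate.RectBivariateSpline`, which fits a bivariate spline to data on a rectangular grid. The sketch below is an assumption about how this step might look, not the script used in this work: the function name, the cubic degree, and the normalization by the discrete sum of intensities are choices made for illustration.

```python
import numpy as np
from scipy.interpolate import RectBivariateSpline

def density_from_frame(frame, mask):
    """Fit a bivariate spline to the normalized intensity of a segmented frame."""
    intensity = frame.astype(float) * mask
    total = intensity.sum()
    if total > 0:
        intensity = intensity / total   # removes the proportionality constant
    rows = np.arange(frame.shape[0])
    cols = np.arange(frame.shape[1])
    # s=0 asks for an interpolating spline that passes through every pixel value
    return RectBivariateSpline(rows, cols, intensity, kx=3, ky=3, s=0)

# Hypothetical frame: random intensities, mask covering the whole frame
rng = np.random.default_rng(0)
frame = rng.integers(1, 256, size=(20, 30))
mask = np.ones_like(frame)
rho = density_from_frame(frame, mask)
value = np.asarray(rho.ev(3.5, 4.5)).item()   # density between pixel centers
```

Applying the same function to each frame of a video gives one spline per time step, that is, a sampled time series of the density.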

## 4. Results and discussion

In cell biology studies, live cell imaging is a newcomer; however, innovation in computational biology tools is being driven by the convergence of distinct research programs. LCI is therefore no longer only a “technique” but a new exploratory science [27] that makes it possible to bring several disciplines together. In this sense, the subcellular patterns of distinct cellular organelles and macromolecular structures within the cell are important for dynamical studies, which will be useful in predictive medicine [28].

Mitochondrial morphology has been a remarkable area of biomedical research for more than a decade [29], because these cellular organelles change under physiological and pathological conditions, such as metabolism, thermogenesis, calcium homeostasis, and several kinds of cell death [30, 31]. However, information is lacking about the subcellular distribution of mitochondria after a cell injury such as a viral infection, and this quantitative information is key for tracking some cellular events of the virus cycle that have remained hidden from computational cell biology exploration.

New developments have focused on high-resolution microscopy images of the fine morphology of mitochondria [32]. With improved time resolution, this information is decisive for understanding the dynamics and functioning of these cellular organelles in high-throughput screenings [33].

Here, however, the work was mainly directed at studying and characterizing the subcellular distribution of mitochondria, with and without Dengue virus infection, in epithelial cells that constitutively express these organelles in red fluorescence.

Recently, it has been demonstrated that not only the shape, number, and size of the organelles are important for cellular function, but also their subcellular distribution, which is a consequence of intracellular transport [34]. Since Dengue virus, like many members of the most diverse viral families, uses the cytoskeleton [22], here we have tried to follow the infection process indirectly through the alterations in the subcellular distribution of mitochondria.

Both mock and DENV-infected cells were prepared by a standardized procedure that provides approximately the same initial state among all cells of each type, even when different experiments are considered. At frame 0, each of the studied cells therefore provides a possible initial state. Each of these states must conform to the density distribution of mitochondria. Assuming that every possible initial state is equally probable, our approximation to the density function at time zero

The distribution as a function of two variables given by the pixel position related to the center of the cell is

A Python script was written to automatically determine the major semi-axis of each cell in each frame by measuring the maximum distance between two points in the segmented region. The center of the ellipse is found as the average of all the pixel coordinates in the segmented image. To obtain *ρ*(*r, θ, t*) for every cell referred to the same polar axis, and so be able to compute the density average, the polar axis is set equal to the major semi-axis in each of the cells analyzed. In polar coordinates
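The geometric part of this procedure can be sketched as follows. This is an illustrative reconstruction, not the script used in this work: the function name `polar_frame` is hypothetical, the farthest pair of pixels is searched only among the convex hull vertices (where it must lie) to keep the computation small, and the semi-axis is taken as half of that maximum distance.

```python
import numpy as np
from scipy.spatial import ConvexHull
from scipy.spatial.distance import pdist, squareform

def polar_frame(mask):
    """Center, major semi-axis, and polar coordinates of the masked pixels."""
    rows, cols = np.nonzero(mask)
    points = np.column_stack([cols, rows]).astype(float)   # (x, y) pairs
    center = points.mean(axis=0)                           # average of coordinates
    # The two farthest points of a region are vertices of its convex hull
    hull = points[ConvexHull(points).vertices]
    dist = squareform(pdist(hull))
    i, j = np.unravel_index(dist.argmax(), dist.shape)
    semi_major = 0.5 * dist[i, j]
    axis_vec = hull[j] - hull[i]
    axis_angle = np.arctan2(axis_vec[1], axis_vec[0])      # direction of polar axis
    dx, dy = points[:, 0] - center[0], points[:, 1] - center[1]
    r = np.hypot(dx, dy)
    theta = np.mod(np.arctan2(dy, dx) - axis_angle, 2 * np.pi)
    return center, semi_major, r, theta

# Hypothetical mask: axis-aligned ellipse with semi-axes 30 and 15 pixels
rows, cols = np.indices((80, 100))
mask = ((cols - 50) / 30.0) ** 2 + ((rows - 40) / 15.0) ** 2 <= 1.0
center, semi_major, r, theta = polar_frame(mask)
```

With `r` and `theta` referred to the major axis of each cell, the densities of different cells can be averaged point by point in the same polar frame.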

The average density of the mitochondrial distribution *ρ*(*r, θ, t*) is shown for three frames (t = 0, 320, 720 minutes) in Figures 5, 6, and 7. These are the initial state, one intermediate state, and the final state of the study. The vertical axis is the level of intensity (proportional to the mitochondrial density) and the horizontal axis is the distance to the center of the nucleus in pixels. Each subplot presents the projection of *ρ*(*r, θ, t*) for a given angle in radians.

It can be seen that mock cells present prominent peaks at some radial positions, i.e., the difference between the peak and the closest local minimum is large compared with the background density. The background density is also low. This suggests that mock cells have their mitochondria distributed in clusters around the nucleus. In contrast, the average mitochondrial density distribution of the infected cells presents a higher background intensity compared with the maximum of the distribution. The peaks are less defined than those of mock cells, which means that the mitochondrial clusters are less defined or nonexistent, and the mitochondria tend to fill all the space available. It is also noticeable that the infected cells show more local maxima, which suggests that the distribution is somewhat disorganized (more random).

These findings imply that viral infection causes a general structural change in the mitochondrial distribution, and that this change can be evidenced directly by examining an image of the cell. The clustered behavior of the mitochondria in mock cells implies that they are grouped while the cell is functioning normally. This is an organized distribution. On the other hand, the lack of clusters in infected cells shows that, upon infection, the mitochondrial distribution becomes erratic, perhaps random, which can be associated with a lack of organization.

A possible way to detect the presence of a viral infection is therefore to measure the level of randomness in the mitochondrial distribution. Since *ρ*(*r, θ, t*) is measured experimentally through the pixel intensity in each video frame, the randomness in the mitochondrial distribution is the same as the randomness in the pixel intensity distribution of the segmented image.

A grayscale image is described digitally in terms of intensity values ranging from 0 (black) to 255 (white), a total of 256 possible shades of gray; each of these possibilities is known as an intensity level. A common descriptor used to classify a picture is the *entropy*, which in its usual histogram-based form is

$$e = -\sum_{i=0}^{255} p(z_i)\,\log_2 p(z_i),$$

where $p(z_i)$ is the fraction of pixels with intensity level $z_i$.

This is a measure of how randomly the intensity levels are distributed in a grayscale picture. Given the behavior of *ρ*(*r, θ, t*) described above, the entropy of a mock cell's image can be expected to be lower than the entropy of an infected cell's image.

Another commonly used descriptor for images is *uniformity*, defined in its usual histogram-based form as

$$U = \sum_{i=0}^{255} p^2(z_i),$$

with $p(z_i)$ the fraction of pixels at intensity level $z_i$. It is a measure of how much the intensity levels change across the image. This descriptor is maximal if the whole image presents one single level, and it decreases as the levels vary. By looking carefully at Figures 5, 6, and 7, the reader will note that infected cells show more oscillations (local peaks) in the mitochondrial density. In terms of pixel intensity, this means that the intensity changes more frequently, so the tone of the pictures is less uniform. The uniformity descriptor of the picture of an infected cell can therefore be expected to be low.
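The two descriptors discussed above can be computed from the gray-level histogram of the segmented cell. The sketch below assumes the usual histogram-based definitions, entropy as −Σ *p* log₂ *p* and uniformity as Σ *p*², with *p* the fraction of pixels at each intensity level; the function names and the optional `mask` argument are illustrative choices made here, not part of the original analysis code.

```python
import numpy as np

def gray_level_probabilities(image, mask=None, levels=256):
    """Fraction of pixels at each intensity level (8-bit grayscale assumed)."""
    pixels = image[mask] if mask is not None else image
    hist = np.bincount(np.asarray(pixels).astype(np.int64).ravel(),
                       minlength=levels)
    return hist / hist.sum()

def entropy(p):
    """Assumed standard definition: -sum(p * log2(p)) over occupied levels."""
    nz = p[p > 0]  # skip empty levels, since 0 * log2(0) is taken as 0
    return float(-(nz * np.log2(nz)).sum())

def uniformity(p):
    """Assumed standard definition: sum(p ** 2); equals 1 for a single gray level."""
    return float((p ** 2).sum())

def descriptor_features(image, mask=None):
    """Return the (uniformity, entropy) pair for one segmented frame."""
    p = gray_level_probabilities(image, mask)
    return uniformity(p), entropy(p)
```

With these definitions, an image containing a single gray level has uniformity 1 and entropy 0, while an image whose pixels are spread evenly over all 256 levels has uniformity 1/256 and entropy 8 bits, which matches the qualitative behavior described in the text.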

In Figure 8, a plot in the uniformity vs. entropy parameter space shows the computed values of these image descriptors for all studied cells in all frames. It can be seen that the mock and infected cells occupy mainly different regions of the parameter space. These descriptors are therefore promising candidates for a feature vector (or a part of one) in machine learning code designed to classify infected cells.

## 5. Conclusion

A detailed analysis of the mitochondrial distribution around the nucleus for seven infected cells and four mock cells in nine videos has been performed. The study shows that mock cells cluster their mitochondria and present an organized distribution in space. The organized character of the mitochondrial density distribution is maintained through time. In contrast, infected cells lose these organized characteristics and the distribution of mitochondria becomes erratic. This suggests that the mitochondria are clustered when the healthy cell is performing its natural processes, but when a DENV infection affects the cell, those natural processes are interrupted, and this is reflected in the way the mitochondria behave. From this analysis, two image attributes are found to be suitable for use as *features* in a machine learning (ML) classifier that distinguishes infected from mock cells. These features are simple, common image processing descriptors, *entropy* and *uniformity*, whose computation is easy and fast. Entropy is related to the randomness present in the gray tones of the image, and uniformity is related to the prevalence of a single gray tone. In a live cell imaging (LCI) photograph taken of a cell culture prepared with colored cells, both image attributes are directly related to the behavior of the mitochondrial density distribution *ρ*(*r, θ, t*). The presence of clusters in mock cells and the smoother behavior of *ρ*(*r, θ, t*) translate into higher values of the uniformity descriptor and lower values of the entropy descriptor than those found in images of infected cells.