Open access peer-reviewed chapter

Object Tracking Using Adapted Optical Flow

Written By

Ronaldo Ferreira, Joaquim José de Castro Ferreira and António José Ribeiro Neves

Submitted: 13 September 2021 Reviewed: 25 January 2022 Published: 28 April 2022

DOI: 10.5772/intechopen.102863

From the Edited Volume

Information Extraction and Object Tracking in Digital Video

Edited by Antonio José Ribeiro Neves and Francisco Javier Gallegos-Funes


Abstract

The objective of this work is to present an object tracking algorithm developed from the combination of random forest techniques with an optical flow adapted in terms of Gaussian curvature. This adaptation makes it possible to define a minimal surface bounded by the contour of a two-dimensional image, which may or may not contain a minimum number of optical flow vectors associated with the movement of an object. The random forest verifies the existence of superfluous optical flow vectors and discards them, defining the minimum number of vectors that characterizes the movement of the object. The results obtained were compared with those of the Lucas-Kanade algorithm with and without Gaussian filter, and with the Horn-Schunck and Farneback algorithms. The items evaluated were precision and processing time, which made it possible to validate the results despite the distinct nature of the algorithms. The results were comparable to those obtained with Lucas-Kanade with or without Gaussian filter and with Horn-Schunck, and better than those of Farneback. This work allows analyzing the optical flow over small regions in a way that is optimal with respect to precision (and computational cost), enabling its application to areas such as cardiology, in the prediction of infarction.

Keywords

  • Object tracking
  • vehicle tracking
  • optical flow
  • Gaussian curvature
  • random forest

1. Introduction

Object tracking is defined as the problem of estimating an object's trajectory by means of a video image. There are several tools for tracking objects, used in various fields of research such as computer vision, digital video processing, and autonomous vehicle navigation [1]. With the emergence of high-performance computers and high-resolution cameras, and with the growing use of so-called autonomous systems that require specialized tracking algorithms, increasingly accurate and robust for automatic video analysis, the development of new object tracking techniques has become the target of numerous research efforts [2, 3].

Object tracking techniques are applicable to motion-based recognition [4], automatic surveillance systems [5], pedestrian flow monitoring at crosswalks [6], traffic control [7], and autonomous vehicular navigation [8]. Problems of this type are highly complex due to the characteristics of the object and the environment, generating many variables, which impairs performance and makes the application of tracking algorithms unfeasible in real-world situations. Some approaches seek to resolve this impasse by simplifying the problem and reducing the number of variables [9]. This process, in most cases, does not generate good results [10, 11], making it even more difficult to identify the main attributes to be selected to perform a task [12, 13].

Most object tracking problems occur in open, so-called uncontrolled environments [14]. The complexity of these problems has attracted the interest of the scientific community and generated numerous applied studies in various fields of research. Current approaches, such as those using convolutional neural networks (CNNs), deal well with the high number of variables in these types of problems, providing spatio-temporal information about the tracked objects through three-dimensional convolutions [15, 16, 17]. This creates an enormous number of learnable parameters, which leads to overfitting [11]. A solution to reduce this number of learnable parameters was to combine spatio-temporal data extracted using an optical flow algorithm, as done in the Two-Stream technique [18, 19, 20]. However, this technique presents good results only for large datasets, proving inefficient for small ones [15, 21].

In recent years, machine learning has been applied to tracking problems, gaining notoriety due to the excellent results obtained in complex environments and in attribute extraction [21, 22, 23]. Deep learning stands out among these techniques for its excellent results in unsupervised learning problems [24], object identification [25], and semantic segmentation [26]. Random forests are another example of machine learning technique, notable for their precision, their great capacity to handle large volumes of data, and their low tendency to overfit [27, 28]. They are widely used in research areas such as medicine, in the prediction of hereditary diseases [29]; agriculture, to increase the productivity of a given crop; and astronomy, in the improvement of images captured by telescopes in parts of the electromagnetic spectrum not visible to the human eye [30]. The possibilities of application and the new trends in research related to machine learning techniques, with particular attention to random forests, allow the development of algorithms that can be combined with existing ones, such as the optical flow algorithms belonging to computer vision, taking advantage of the strengths of each [31, 32, 33].

This justifies developing a tracking algorithm that combines the optical flow technique, adapted in this work in terms of the Gaussian curvature associated with a minimal surface, with a random forest, expecting it to capture on this surface a minimum number of optical flow vectors that characterize the moving object, accurately and with low computational cost. Such an algorithm contributes not only to the field of computer vision but to other branches of science; in medicine, for example, it can help in the early identification of infarctions.


2. Related works

Due to the large number of studies related to object tracking, only a small number of works surrounding this theme will be addressed; the focus of this chapter is not to make a thorough survey of the state of the art. In this section, the main works in the literature associated with object tracking will be presented. Among the various approaches used in this context, we highlight those based on optical flow techniques and those belonging to machine learning, such as pattern recognition approaches, which allow relating, framing, and justifying the development of this proposal and its importance and contribution to the state of the art.

2.1 Object tracking

Object tracking is defined as a process that uniquely estimates and associates the movements of objects across consecutive image frames. The objects considered can range from a single pixel to a set of pixels belonging to a region of the image. Pixel detection is done by a motion or object detector, which locates objects with similar characteristics that move between consecutive frames.

These characteristics of the object to be tracked are compared with the characteristics of a reference object, modeled by a classifier over a limited region of the frame, the so-called region of interest, where the probability of detection of the object is greatest. Thus, according to [33], the tracked-object detector locates several objects in different parts of the region of interest and compares these objects with the reference object. This process is performed for each frame, and each detected object is a candidate to be recognized as the one with the greatest possible similarity to the reference object. A candidate can be represented through a set of fixed-size characteristics, extracted from the region containing its set of pixels, which can be represented by a numerical data array.

Thus, mathematically, the set of candidate objects, whose pixels carry the characteristics that allow testing whether a region of the frame contains the object to be tracked, is given by:

$$OC_i^t = \left\{ OC^t : \left\| L(OC^t) - L(OR^{i-1}) \right\| < \varepsilon \right\} \tag{1}$$

where $L(OC^t)$ is the $(x,y)$ position of the centroid of the candidate object $OC^t$, $L(OR^{i-1})$ is the position of the object tracked in the $(i-1)$-th frame of the video, and $\varepsilon \in \mathbb{R}$, $\varepsilon > 0$, is a real value associated with the size of the region of the object of interest.

According to the works of [34, 35], learning methods are used to adapt to changes in movement and in other characteristics, such as the geometric aspect and appearance of the tracked object. These methods are usually employed in adaptive trackers and tracked-object detectors. Other types of object trackers found in the literature are presented below.

According to [36], a classifier can be defined as a function $f$ belonging to a family of functions $F$ parameterized by a set of classifier parameters. Classifiers form a detector of objects to be tracked, which in turn is an integral part of a tracker. A classifier can also be trained, thereby generating a set of classification parameters and producing the function $f$ that efficiently indicates the classes $y_i$ of the test data $x_i$ from a training set $C_t = \{(x_1, y_1), \ldots, (x_n, y_n)\}$. The data are points in the feature space, whose features can be entropy, gray level, among others.

The classifier aims to determine the best way to discriminate the data classes in the feature space. The test data form a set containing the characteristics of the candidate objects that have not yet been classified. The position of the object to be tracked in the frame is defined as the position corresponding to the highest response of the tracked-object detector over the $i$-th candidate objects. Therefore, the position of the tracked object is determined by the position of the $i$-th candidate object that is most likely to belong to the class of the tracked object, given by the following equations:

$$L(OR^t) = L\left( \arg\max_i P\left(y_i = C_{OR} \mid OC_i^t, PR^t\right) \right) \tag{2}$$

$$P\left(y_i = C_{OR} \mid OC_i^t, PR^t\right) = \begin{cases} 1/N, & \text{if } \left\| L(OC_i^t) - L(OR^{t-1}) \right\| < \varepsilon \\ 0, & \text{otherwise} \end{cases} \tag{3}$$

where the variable $C_{OR}$ in equation (3) denotes the class of the tracked object and $OC$ the candidate objects, all with equal probability of occurrence. Depending on the type of classifier used in the tracked-object detector, together with the initial detector, it is possible to use a learning technique to train it. One approach is offline training [36], adjusting the parameters of the classifier before running the tracker.
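As a concrete illustration, the following Python sketch implements the candidate gating of equation (1) and the position update of equation (2). The names (`Candidate`, `track_position`, `epsilon`) are illustrative assumptions of this text, not the authors' implementation.

import numpy as np
from dataclasses import dataclass

@dataclass
class Candidate:
    centroid: np.ndarray   # (x, y) position L(OC_i^t)
    score: float           # detector response P(y_i = C_OR | OC_i^t, PR^t)

def track_position(candidates, last_position, epsilon):
    # Equation (1): keep only candidates whose centroid lies within
    # epsilon of the object position estimated in the previous frame.
    gated = [c for c in candidates
             if np.linalg.norm(c.centroid - last_position) < epsilon]
    if not gated:
        return last_position  # no valid candidate: keep the previous position
    # Equation (2): the new position is that of the highest-scoring candidate.
    best = max(gated, key=lambda c: c.score)
    return best.centroid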

Offline-trained classifiers are generally employed in object detectors designed to detect all new objects of interest that enter the camera's field of view [37]. The training set $C_t$ must contain characteristics $x_i$ extracted from the objects to be tracked together with diverse characteristics of the environment. This allows new objects, with varied geometric characteristics and appearances, to be detected more efficiently. In online training, by contrast, the adjustment of the classifier parameters is performed during the tracking process. Online-trained classifiers are generally used in tracked-object detectors; in each frame, the newly extracted characteristics are used to adjust the classifiers.

2.1.1 Binary classification

In [38], trackers that use the tracking-by-detection technique deal with object tracking as a binary classification problem whose goal is to find the best function $f$ that separates the objects to be tracked, $R$, from the other objects in the environment. Object tracking seen as a binary classification problem is currently one of the subjects receiving the most attention in computer vision research.

In [39], trackers were developed that used tracked-object detectors formed by committees of so-called weak binary classifiers. For [40], a binary classifier is a classifier used in problems where the class $y_i$ of an object $OC_i$ belongs to the set $Y = \{-1, +1\}$. The negative class $\{-1\}$ refers to the characteristics of the environment and other objects. The positive class $\{+1\}$ refers to the class of the object to be tracked.

A classifier is said to be weak when its probability of correctly classifying a given data class is only slightly higher than that of a random classifier. The tracked-object detector must separate the tracked object from the other objects and the environment; its purpose is to determine the position of the tracked object according to equations (1)-(3). According to [41, 42], each of the $i$-th candidate object classes $OC_i$ is defined according to Bayesian decision theory, through minimal classification error. This means that the decision is given by observing the sign of the difference between $P(y_i = C_{OR} \mid OC_i^t, PR^t)$ and $P(y_i = C_{NOR} \mid OC_i^t, PR^t)$, where the sum of these two probabilities is unitary.

2.1.2 Monitoring systems

For [43], the term monitoring system refers to a process of monitoring and autonomous control, without human intervention. This type of system has the function of detecting, classifying, tracking, analyzing, and interpreting the behavior of objects of interest. In [44, 45], this technique was combined with statistical techniques for controlling people's access to a specific location. Intelligent monitoring systems have also been applied to building, port, and ship security [46, 47].

The functions comprised by a monitoring system are divided into so-called low- and high-level tasks. Among the high-level tasks, we highlight the analysis, interpretation, and description of behavior, the recognition of gestures, and the decision on whether a threat is occurring. Performing high-level tasks requires that, for each frame, the system perform low-level tasks, which involve direct manipulation of the image pixels [48, 49, 50, 51, 52, 53, 54, 55, 56]. Examples include noise elimination, detection of connected components, and obtaining information on the location and geometric aspect of the object of interest.

A monitoring system consists of five main components, presented in Figure 1; some monitoring systems may not contain all of them. The initial detector aims to detect the pixel regions of each frame that have a significant probability of containing an object to be tracked. This detector can be formed by a motion detector that detects all moving objects, based on models of objects previously recorded in a database or on characteristics extracted offline [40, 41]. The information obtained by the initial detector is processed by an image processor, which has the function of eliminating noise, segmenting the image, and detecting the connected components.

The regions containing the most relevant pixels are analyzed and then classified as objects of interest by the classifier [50, 51, 52, 53, 54]. Objects of interest are modeled and are from then on called reference objects, so that the tracker can determine their position frame by frame [55, 56].

A tracker, an integral part of a detector, is defined as a function that estimates the position of objects in each consecutive frame and defines the region of the object of interest for each $i$-th object being tracked within a region of interest. This estimation of the movement is performed through the correct association of the captured and tracked objects across consecutive video frames. Tracking is often interpreted as a data-association problem. Figure 1 shows a schematic of the main components of a monitoring system.

Figure 1.

Main components of a monitoring system.

2.2 State of the art in object tracking with optical flow

Several techniques for calculating the optical flow vector have been developed in recent years [57]. These methods are grouped according to their main characteristics and the approach used for the calculation of the optical flow: the differential methods studied in [56], the methods that calculate the optical flow in the frequency domain [46], the phase correlation methods [58], and the methods based on association between regions [59].

The method proposed in [56] allows the calculation of the optical flow at each point over a neighborhood of pixels. In [60], a neighborhood of pixels is also considered, but the calculation of the optical flow is performed geometrically. The work presented in [61] adds regularization constraints. In [62], in turn, comparative performance analyses were carried out among the various optical flow algorithms present in the literature.

This technique is considered robust for detecting and tracking moving objects in images, whether captured by fixed or mobile cameras. However, its high computational cost makes most practical applications unfeasible. To reduce this complexity, multi-resolution techniques were adopted in [63]. For the same purpose, subsampling techniques were applied over some of the pixels belonging to the object of interest to obtain the optical flow [52].

Other authors also use a point of interest detector to select the best pixels for tracking and calculate the optical flow on these points [52, 64]. The reduction in the number of points to be tracked is associated with a decrease in computational complexity, so in [52] the points of interest were selected using the FAST algorithm [64].

The method developed by Lucas-Kanade [56] is a differential method, widely used in the literature and having many variations and modifications. It estimates the optical flow for each point $(x, y, t)$ by calculating the affine transformation $TA(x,t)$ applied to the pixels of a grid centered on $(x,y)$, through the following function $f(x,t)$:

$$f(x,t) = \min \sum_{x \,\in\, \text{pixel grid}} \left[ Q(x, t-1) - Q\left(TA(x,t)\right) \right] g(x) \tag{4}$$

where $g(x)$ is a Gaussian smoothing filter centered on $x$.

New variations of these techniques have been proposed to make the calculation of the optical flow ever faster. In [65], a tracker was proposed based on the algorithm of [56]. The translation of a point, represented by a rectangular grid of 25 × 25 pixels, was calculated, and its validity was evaluated by calculating the SSD1 between the grid pixels in $Q_t$ and in $Q_{t-1}$. If the SSD is high, the point is dropped and stops being tracked.
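The SSD validity test just described is simple to state in code. The following Python sketch drops a point when the SSD between its 25 × 25 patches in consecutive frames is high; the threshold value is an assumption for illustration, and the point is assumed to lie far enough from the image border.

import numpy as np

def keep_point(Q_prev, Q_curr, x, y, half=12, threshold=1e4):
    # Extract the 25x25 grid (half=12 on each side) around (x, y)
    # in the previous and current frames.
    patch_prev = Q_prev[y - half:y + half + 1, x - half:x + half + 1]
    patch_curr = Q_curr[y - half:y + half + 1, x - half:x + half + 1]
    # Sum of squared differences between the two patches.
    ssd = np.sum((patch_prev.astype(np.float64) - patch_curr) ** 2)
    return ssd < threshold  # False: stop tracking this point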

In [51], objects were detected by subtracting the background image, and camera movement was removed using the optical flow algorithm proposed by [56]. The studies carried out in [66, 67] showed that the reliability of the estimated optical flow is reduced when some points of the object of interest have an optical flow that cannot be represented by the same affine transformation matrix $TA(x,t)$ as the other points. Thus, to improve the robustness of the algorithm of [56], the authors of [67] proposed calculating an independent optical flow vector for each of the $N$ points belonging to the object of interest, selected with the SURF (Speeded Up Robust Features) point detector in the initial frame.

In [67], the Lucas-Kanade algorithm [56] was also modified by inserting the Hessian matrix into the calculation of the variation of the affine transformation $\Delta TA(x,t)$. The algorithm allows for more effective tracking when partial occlusions, deformations, and changes in lighting occur, as the optical flow is not calculated considering all points of the objects of interest.

The proposal presented in [68] was the development of an algorithm to detect people in infrared images, combining pixel-value information with a motion detection method. The algorithm forms a relevant-pixel map by applying thresholding segmentation. While the camera is still, an image $M$ is built from the differentiation between frames. If the camera is in motion, $M$ is filled with the pixels obtained by analyzing the optical flow calculated by the algorithm of [56]. The relevant-pixel map is then replaced by the union of $M$ and the relevant-pixel map in the first case, and by the intersection of $M$ and the relevant-pixel map in the second case, to compensate for the movement of the camera.

The method for tracking swimmers presented in [46] uses the information of the movement pattern given by the optical flow together with the appearance of the water, which is modeled by a MoG.2 This allows an optical flow vector to be calculated for each pixel of the video independently of the others, through a matrix $B$ composed of the gradients in the $x$ and $y$ directions of the pixels in a pixel grid.

In [69], a method was presented that incorporates physical constraints into the calculation of the optical flow. The tracker uses the constraints to extract the moving pixels with a lower failure rate. The calculation can be impaired when occlusions occur or when the environment has low light. The operator defines the physical constraints and selects the points of the reference object that are tracked by optical flow. Constraints can be geometric, kinematic, or dynamic, can concern the material that makes up the reference object, or can be of any other type.

In [70], the points that are tracked with the optical flow are defined by applying the Canny edge detector on the pixels of the reference pixel map. Pixels that produce a high response to the Canny detector are the selected points.

In [43], optical flow is used as a characteristic for tracking the contour of the object. The contour is shifted in small steps until the position in which the optical flow vectors are homogeneous is found.

In [64], the translation and orientation of the reference object were estimated by calculating the optical flow of the pixels belonging to its silhouette. The coordinates of the centroid position are defined by minimizing the Hausdorff distance between the mean of the optical flow vectors of the reference object and that of the candidate object to be chosen as the object of interest.

2.2.1 Optical flow as a function of Gaussian curvature

Optical flow is defined as a dense vector field associated with the movement and apparent velocity of an object, given by the translation of pixels between consecutive frames in an image region. It can be calculated from the brightness constancy constraint, which assumes the brightness of corresponding pixels remains constant across consecutive frames.

Mathematically, let $(x,y)$ be a pixel associated with a luminous intensity $I(x,y)$ on an image surface or plane, and consider a time interval and a sequence of frames associated with an apparent displacement of the pixel over that surface or plane. The rate of variation of the light intensity with respect to time, associated with the apparent movement of the pixel, is considered practically null and is given by:

$$\frac{dI(x,y)}{dt} = \frac{\partial I(x,y)}{\partial x}\frac{dx}{dt} + \frac{\partial I(x,y)}{\partial y}\frac{dy}{dt} + \frac{\partial I}{\partial t} \tag{5}$$

$$\frac{dI(x,y)}{dt} = I_x u + I_y v + I_t \tag{6}$$

$$\frac{dI(x,y)}{dt} = 0 \;\Rightarrow\; I_x u + I_y v + I_t = 0 \tag{7}$$

Equation (7) is called the optical flow constraint, where the terms $I_x$, $I_y$, $I_t$ denote the partial derivatives of the brightness intensity with respect to the coordinates $x$, $y$ and the time $t$, and $u = u(x,y)$ and $v = v(x,y)$ are the horizontal and vertical components of the vector field representing the optical flow for the pixel $(x,y)$ in question.

The number of unknowns in equation (7) is greater than the number of equations, which does not allow the components of the flow vector to be estimated or a single solution of the optical flow constraint to be determined. Lucas and Kanade proposed a solution to this problem: consider the flow constant over a region formed by a set of $N \times N$ pixels, so that the optical flow constraint can be written for each pixel of this region, obtaining a system of $p$ equations in two variables, that is:

$$\begin{cases} I_{x_1} v_x + I_{y_1} v_y + I_{t_1} = 0 \\ I_{x_2} v_x + I_{y_2} v_y + I_{t_2} = 0 \\ \;\;\vdots \\ I_{x_p} v_x + I_{y_p} v_y + I_{t_p} = 0 \end{cases} \tag{8}$$

Writing the set of equations (8) in matrix form, we have:

$$\begin{bmatrix} I_{x_1} & I_{y_1} \\ \vdots & \vdots \\ I_{x_p} & I_{y_p} \end{bmatrix} \begin{bmatrix} v_x \\ v_y \end{bmatrix} + \begin{bmatrix} I_{t_1} \\ \vdots \\ I_{t_p} \end{bmatrix} = \begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix} \tag{9}$$

Using the least squares method, the system of equations (9) can be solved in matrix form. Therefore, the optical flow $v = (v_x, v_y)$ can be estimated for a particular region or window of $N \times N$ pixels, that is:

$$\begin{bmatrix} I_{x_1} & I_{y_1} \\ \vdots & \vdots \\ I_{x_p} & I_{y_p} \end{bmatrix}^t \begin{bmatrix} I_{x_1} & I_{y_1} \\ \vdots & \vdots \\ I_{x_p} & I_{y_p} \end{bmatrix} \begin{bmatrix} v_x \\ v_y \end{bmatrix} = - \begin{bmatrix} I_{x_1} & I_{y_1} \\ \vdots & \vdots \\ I_{x_p} & I_{y_p} \end{bmatrix}^t \begin{bmatrix} I_{t_1} \\ \vdots \\ I_{t_p} \end{bmatrix} \tag{10}$$

Where:

$$A = \begin{bmatrix} I_{x_1} & I_{y_1} \\ \vdots & \vdots \\ I_{x_p} & I_{y_p} \end{bmatrix}_{p \times 2}$$

Therefore, one has that:

$$\left(A^t A\right) \left(A^t A\right)^{-1} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = Id_{2 \times 2} \tag{11}$$

Thus:

$$\begin{bmatrix} v_x \\ v_y \end{bmatrix} = -\left(A^t A\right)^{-1} A^t \begin{bmatrix} I_{t_1} \\ \vdots \\ I_{t_p} \end{bmatrix} \tag{12}$$

This method has a reduced computational cost for estimating the optical flow when compared to other methods because of its simplicity: the region in which the variation of light intensity between pixels is assumed minimal can be as small as 2 × 2, contained in the $N \times N$ region. In this case, the optical flow is determined over a 2 × 2 region using only one matrix inversion operation (equation (12)).
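A minimal Python sketch of this windowed least-squares solve of equations (10)-(12) follows, assuming the derivative images have already been computed; the function name and the window half-size `n` are illustrative assumptions.

import numpy as np

def lucas_kanade_window(Ix, Iy, It, x, y, n=2):
    # Stack the constraints of the window into A v = -I_t (equation (9)).
    win = (slice(y - n, y + n + 1), slice(x - n, x + n + 1))
    A = np.stack([Ix[win].ravel(), Iy[win].ravel()], axis=1)  # p x 2 matrix
    b = -It[win].ravel()
    # Least-squares solution, equivalent to v = -(A^t A)^{-1} A^t I_t
    # of equations (10)-(12).
    v, *_ = np.linalg.lstsq(A, b, rcond=None)
    return v  # (v_x, v_y)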

To calculate the optical flow over the whole $N \times N$ region, the partial derivatives must be calculated at each pixel. However, by considering the variation of light intensity between pixels over the region to be almost null, the small differences in brightness intensity accumulated between pixels compromise the accuracy of the optical flow with respect to the determination of the object's actual motion; that is, the method gains processing speed and loses precision in the determination of the motion. Differentiating equation (5) yields equation (13):

$$\xi_\alpha^2 = a_1 \alpha_x + a_2 \alpha_y + a_3 u^2 + a_4 v^2 + a_5 uv + a_6 u + a_7 v + a_8 \tag{13}$$

where the terms $\alpha_x = \partial v_x / \partial t$ and $\alpha_y = \partial v_y / \partial t$ are the components of the acceleration vector, $(v_x, v_y)$ are the components of the velocity vector, and the terms $a_1 = I_x = \frac{\partial I}{\partial x}$; $a_2 = I_y = \frac{\partial I}{\partial y}$; $a_3 = I_{xx} = \frac{\partial^2 I}{\partial x^2}$; $a_4 = I_{yy} = \frac{\partial^2 I}{\partial y^2}$; $a_5 = I_{xy} = \frac{\partial^2 I}{\partial x \partial y}$; $a_6 = I_{xt} = \frac{\partial^2 I}{\partial x \partial t}$; $a_7 = I_{yt} = \frac{\partial^2 I}{\partial y \partial t}$; $a_8 = I_{tt} = \frac{\partial^2 I}{\partial t^2}$ are the first and second partial derivatives of $I(x,y,t)$.

In view of the small variations present and accumulated along the vector field associated with the optical flow, which cause an additional error in equation (13), a regularization adjustment was made, given by equation (14):

$$\xi_c^2 = \left(\frac{\partial v_x}{\partial x}\right)^2 + \left(\frac{\partial v_x}{\partial y}\right)^2 + \left(\frac{\partial v_y}{\partial x}\right)^2 + \left(\frac{\partial v_y}{\partial y}\right)^2 \tag{14}$$

Thus, combining equations (13) and (14), the error $\xi$ can be minimized via equation (15):

$$\xi^2 = \iint \left( \xi_\alpha^2 + \alpha^2 \xi_c^2 \right) dx\, dy \tag{15}$$

where $\alpha$ is the weight required for smoothing the variation of the associated optical flow. Then, to obtain $v_x = v_x(x,y)$ and $v_y = v_y(x,y)$, the tools of variational calculus give:

$$\begin{cases} 2 a_3 v_x + a_5 v_y = \alpha^2 \nabla^2 v_x - b_1 \\ a_5 v_x + 2 a_4 v_y = \alpha^2 \nabla^2 v_y - b_2 \end{cases} \tag{16}$$

where $\nabla^2 v_x$ is the Laplacian of $v_x$, $\nabla^2 v_y$ is the Laplacian of $v_y$, and the coefficients $b_1$, $b_2$ are given by:

$$b_1 = \frac{\partial a_1}{\partial t} + a_6, \qquad b_2 = \frac{\partial a_2}{\partial t} + a_7 \tag{17}$$

and, replacing the coefficients $I_x$, $I_y$, $I_{xx}$, $I_{yy}$, $I_{xy}$, $I_{xt}$, $I_{yt}$, $I_{tt}$ in equation (16), one has:

$$2 I_{xx} v_x + I_{xy} v_y = \frac{\alpha^2}{3} \nabla^2 v_x - I_{xt} \tag{18}$$

$$I_{xy} v_x + 2 I_{yy} v_y = \frac{\alpha^2}{3} \nabla^2 v_y - I_{yt} \tag{19}$$

where $\nabla^2 v_x = \overline{v_x}(j,k) - v_x(j,k)$ and $\nabla^2 v_y = \overline{v_y}(j,k) - v_y(j,k)$ are the Laplacians of equations (18) and (19) in their discretized digital forms, with $\overline{v}$ denoting the neighborhood average. Together with equation (20),

$$\lambda = \frac{\alpha^2}{3} \tag{20}$$

it is possible to reduce the system given by (16) and (17) to:

$$\left[ \lambda^2 \left( I_{xx} + I_{yy} + \lambda^2 \right) + \kappa \right] v_x = \lambda^2 \left( I_{yy} + \lambda^2 \right) \overline{v_x} - \lambda^2 I_{xy} \overline{v_y} + c_1 \tag{21}$$

$$\left[ \lambda^2 \left( I_{xx} + I_{yy} + \lambda^2 \right) + \kappa \right] v_y = -\lambda^2 I_{xy} \overline{v_x} + \lambda^2 \left( I_{xx} + \lambda^2 \right) \overline{v_y} + c_2 \tag{22}$$

where the term $\kappa = I_{xx} I_{yy} - I_{xy}^2$ is called the Gaussian curvature of the surface. In addition:

$$c_1 = I_{xy} I_{yt} - I_{xx} \left( I_{yy} + \lambda^2 \right), \qquad c_2 = I_{xx} I_{yy} - I_{xx} \left( I_{xx} + \lambda^2 \right) \tag{23}$$

where $c_1, c_2$ are real constants.

Therefore, isolating the terms $v_x$, $v_y$ and substituting $c_1$ and $c_2$ into equations (21) and (22), respectively, results in equations (24) and (25):

$$v_x = \frac{\lambda^2 \left( I_{yy} + \lambda^2 \right) \overline{v_x} - \lambda^2 I_{xy} \overline{v_y} + c_1}{\lambda^2 \left( I_{xx} + I_{yy} + \lambda^2 \right) + \kappa} \tag{24}$$

$$v_y = \frac{-\lambda^2 I_{xy} \overline{v_x} + \lambda^2 \left( I_{xx} + \lambda^2 \right) \overline{v_y} + c_2}{\lambda^2 \left( I_{xx} + I_{yy} + \lambda^2 \right) + \kappa} \tag{25}$$

Algorithm 1 is pseudocode that generates the proposed optical flow vector field through equations (24) and (25), allowing the speed and position of an object to be estimated from a sequence of video images.

Algorithm 1. Adapted optical flow (Gaussian curvature κ).

Begin

Input: image sequence (video)

Output: optical flow vector field (v_x, v_y)

   For i = 1…N do

      Convert images to a gray tone

      Calculate the 1st- and 2nd-order partial derivatives of I(x,y,t)

      Calculate the constants α_x, α_y, a_1, …, a_8, b_1, b_2, λ, c_1, c_2

      Calculate the discretized Laplacians ∇²v_x, ∇²v_y

      Calculate the Gaussian curvature κ

      Calculate the flow components (v_x, v_y) using equations (24) and (25)

   End For

End
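Algorithm 1 can be prototyped directly from equations (24) and (25). The following NumPy sketch is a minimal, illustrative implementation, not the chapter's original code (which was written in MATLAB); the derivative kernels, the iteration count, and the handling of the constants $c_1$, $c_2$ (here absorbed into the $-I_{xt}$, $-I_{yt}$ terms) are assumptions.

import numpy as np
from scipy.ndimage import convolve

def adapted_optical_flow(frame1, frame2, alpha=15.0, n_iter=100):
    # Consecutive grayscale frames as float arrays.
    I1 = frame1.astype(np.float64)
    I2 = frame2.astype(np.float64)

    dx = np.array([[-0.5, 0.0, 0.5]])       # central difference in x
    dy = np.array([[-0.5], [0.0], [0.5]])   # central difference in y

    # First- and second-order partial derivatives of I(x, y, t).
    Ix, Iy, It = convolve(I1, dx), convolve(I1, dy), I2 - I1
    Ixx, Iyy, Ixy = convolve(Ix, dx), convolve(Iy, dy), convolve(Ix, dy)
    Ixt, Iyt = convolve(It, dx), convolve(It, dy)

    kappa = Ixx * Iyy - Ixy ** 2            # Gaussian curvature term
    lam = alpha ** 2 / 3.0                  # equation (20)

    # 3x3 kernel producing the neighborhood average v_bar.
    avg = np.array([[1., 2., 1.], [2., 0., 2.], [1., 2., 1.]]) / 12.0

    vx = np.zeros_like(I1)
    vy = np.zeros_like(I1)
    denom = lam ** 2 * (Ixx + Iyy + lam ** 2) + kappa
    denom = np.where(np.abs(denom) < 1e-9, 1e-9, denom)  # avoid division by zero

    for _ in range(n_iter):
        vx_bar = convolve(vx, avg)
        vy_bar = convolve(vy, avg)
        # Fixed-point updates in the spirit of equations (24) and (25);
        # the constants c1, c2 are absorbed into the -Ixt, -Iyt terms here.
        vx = (lam ** 2 * ((Iyy + lam ** 2) * vx_bar - Ixy * vy_bar) - Ixt) / denom
        vy = (lam ** 2 * ((Ixx + lam ** 2) * vy_bar - Ixy * vx_bar) - Iyt) / denom
    return vx, vy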

2.2.2 Random forests

Developed by Breiman [63] in the early 2000s, and later revised in [71], random forests are considered one of the best supervised learning methods for data prediction and classification. Due to its simplicity, low computational cost, great capacity to handle large volumes of data, and high accuracy, the method has become very popular and is applied in various fields of science, such as data science [72], bioinformatics, and ecology, as well as in real-life systems and in the recognition of 3D objects. In recent years, several studies have been conducted with the objective of making the technique more elaborate and finding new practical applications [73, 74, 75].

Many studies carried out with the aim of narrowing the existing gap between theory and practice can be seen in [58, 76, 77, 78]. Among the main components of random forests, one can highlight the bagging method [63] and the classification and regression criterion called CART-split [79], which play critical roles.

Bagging (a contraction of bootstrap-aggregating) is an aggregation scheme that generates samples through the bootstrap method from the original dataset. Bootstrap methods are nonparametric and belong to the class of Monte Carlo methods [80], treating the sample as a finite population; they are used when the distribution of the target population is not specified and the sample is the only information available. A predictor is constructed from each bootstrap sample, and the decision is made through an average. This is an effective computational procedure for improving unstable estimates, especially for large, high-dimensional datasets, where finding a good model in one step is impossible due to the complexity and scale of the problem. The CART-split criterion, in turn, originates from the CART program [63] and is used in the construction of individual trees to choose the best cuts perpendicular to the axes. However, while bagging and the CART split scheme are key elements of the random forest, both are difficult to analyze mathematically and constitute a very promising field for both theoretical and practical research.

In general, the set of trees is organized in the form $\{T_1(\Theta_1), T_2(\Theta_2), \ldots, T_B(\Theta_B)\}$, where $T_b$ is each tree and the $\Theta_b$ are bootstrap samples with dimensions $q \times mtry$, where $mtry$ is the number of variables used at each node during the construction of each tree and $q$ is approximately $0.67 \times n$. Each tree produces a response $y_{b,i}$ for each sample $W$, $\{T_1(W) = y_{1,i},\, T_2(W) = y_{2,i},\, \ldots,\, T_B(W) = y_{B,i}\}$, and the mean (regression) or majority vote (classification) of the tree responses is the final response of the model for each sample.
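As an illustration of this aggregation scheme, the sketch below uses scikit-learn, a choice of this text rather than of the chapter: each tree is grown on a bootstrap sample, `max_features` plays the role of $mtry$, and the forest's answer is the majority vote of the individual tree responses.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data standing in for the feature vectors of the text.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# bootstrap=True draws the Theta_b samples; max_features acts as mtry.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                bootstrap=True, random_state=0)
forest.fit(X, y)

# Each fitted tree T_b gives its own response y_{b,i}...
votes = [tree.predict(X[:5]) for tree in forest.estimators_]
# ...and the forest aggregates them by majority vote.
print(forest.predict(X[:5]))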


3. Methodology

The methodology employed consists of combining the optical flow algorithm expressed in terms of Gaussian curvature, developed in this work, with the random forest technique. The algorithm was developed in the MATLAB programming language and executed on a 64-bit, 8th-generation Core i7 notebook. The input data is a 5-minute .avi video of a vehicle and two cyclists circulating in the vicinity of Costa Nova beach, in Ilhavo, Aveiro, Portugal. The video was fragmented into a set of frames, analyzed two by two by the algorithm to generate the optical flow vector field. After that, the resulting image associated with the flow was restricted to a minimal surface region, given by the Gaussian curvature. On this surface, the random forest then analyzed which vectors presented characteristics important for describing the movement of the object in an "optimal" way (see Figure 2).

Figure 2.

Representative model of operation of a random forest.

After finishing the analysis of the movement of the objects, the execution times and the accuracy of the results obtained by the proposed algorithm were compared with those of the Lucas-Kanade (with and without Gaussian filter), Horn-Schunck, and Farneback algorithms, allowing the results to be validated. After that, the implementation of the developed algorithm began; a schematic sketch of the resulting pipeline follows.
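The sketch below assumes OpenCV for video handling and reuses the `adapted_optical_flow` function sketched in Section 2.2.1; the file name is hypothetical, and the curvature masking and forest filtering steps are indicated only as comments, since the chapter describes them at a higher level.

import cv2

cap = cv2.VideoCapture("costa_nova.avi")  # hypothetical file name
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # 1. Adapted optical flow on the current pair of frames (Algorithm 1).
    vx, vy = adapted_optical_flow(prev_gray, gray)
    # 2. Restrict the flow to the minimal-surface region selected by the
    #    Gaussian curvature, and
    # 3. let the random forest keep only the vectors that best
    #    characterize the object's movement (steps described in the text).
    prev_gray = gray
cap.release()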


4. Results

Figure 3 shows the vehicle and the two cyclists used to collect the images from which the results proposed in this work were obtained; the choice of the object was random. On the right side, a graphical representation of the vector field of the optical flow generated by a sequence of two consecutive frames, over the 5 minutes of video, is shown.

Figure 3.

(a) Left side: vehicle displacement between the moments t−1 and t. (b) Right side: representation of the corresponding optical flow.

On the right side of Figure 3, the optical flow associated with the movement of the vehicle between the time instants t−1 and t is represented. Note that the vector representation of this field was performed in such a way that the vectors generated by the field were superimposed along the horizontal direction of the central axis of the figure. Although other objects were present at the site, namely two cyclists and a car in the upper left corner, the object of interest considered was the vehicle close to the cyclists. This is shown on the right side of Figure 3 by the layout of this horizontal arrangement of vectors, which indicates whether the current movement and the predicted movement of the considered object are to the left or to the right.

The region with the highest horizontal vector density in Figure 3 is located on the left side, in blue. The number of vectors in this region, although they are spaced out, is greater from the center to the left than on the right side. It is also possible to visually evaluate the movement behavior of the considered objects: the region containing the higher vector density corresponds to the current direction in which the object is heading and to its predicted displacement. This vector density increases toward the left side, passing through the central part coming from the right, clearly indicating the direction of movement of the object; that is, the object moves to the left. In Figure 4, this process can be seen more clearly.

Figure 4.

Prediction and actual displacement of the object obtained through the optical flow.

In a similar way to Figure 3, the right side of Figure 5 represents the optical flow generated by the displacement of the moving vehicle between the instants t and t+1. The vector representation of this field was again performed so that the vectors were superimposed on the horizontal axis of the figure, creating a vector density by this superposition. The object of interest considered remains the vehicle close to the cyclists. As in Figure 3, the arrangement of the horizontal vectors indicates the current movement and whether the predicted movement is to the left or to the right.

Figure 5.

Object remains on the right side, but with a medium offset to the right and displacement estimate still to the left.

A small increase in the vector density to the left can be observed, which has a great influence on the determination of the real and predicted positions of the object in the considered time intervals. The object continues its actual movement to the left, as does its predicted movement; it showed a slight shift to the left (the direction where the cyclists are).

In Figure 6, a small variation of the optical flow is again observed in the movement between the instants t+1 and t+2. In this figure, the vehicle is next to the cyclists, both in the opposite direction to the vehicle. The movement of the vehicle continues without great variation in direction, posing no danger to the cyclists or to other vehicles coming in the opposite direction on the left side.

Figure 6.

Object remains on the right side, but with a slight shift to the right and offset estimate to the left.

In Figure 7, there was no variation of the optical flow in the movement between the instants t+2 and t+3. In this figure, the vehicle can be seen passing the two cyclists. The non-significant variation in the optical flow vector field, with the number of vectors remaining higher on the left side, is associated with the maintenance of the direction of movement of the considered object; that is, it continues to move on the left side.

Figure 7.

Object moving and keeping on the left for consecutive frames.

In Figure 8, the vehicle can be seen completely overtaking the two cyclists and approaching another vehicle coming in the opposite direction at the top of the image (left). The variation of the optical flow vector field remains the same. This indicates that the vehicle continues its trajectory on the left side of the cyclists, without posing a danger of collision to the other vehicle in the opposite direction.

Figure 8.

Object with unchanged offset pattern.


5. Analysis and discussing the results

This section shows how the evaluation of the performance and accuracy of the proposed algorithm was carried out in relation to the Lucas-Kanade algorithm, with and without Gaussian filter, and the Horn-Schunck and Farneback algorithms.

The algorithm displays in real time the displacement of the object on the right side, together with the set of vectors capable of representing the movement, in real time or accumulated, indicating the tendency of the direction that the object should take. This process was carried out in a similar way with the other algorithms, to make their comparison possible. The behavior of the proposed algorithm and of the others will be shown graphically.

The technique developed in this work generates an optical flow that takes important geometric properties into account, making it possible to identify categories of moving objects with similar characteristics. These geometric properties are intrinsically associated with the curvature of the object's surface in three-dimensional space, the Gaussian curvature, here computed on a 2D image.

The modified optical flow, considering these properties, generated a dense optical flow, producing a band that describes a track on the 2D plane. This made it possible to follow the movement of the considered object. In Figure 8, it is possible to observe that, at each time interval in which the object was monitored, the dispositions of the vectors to the left and right sides, as shown in Figures 3-7, were responsible for drawing the track associated with the displacement, which allowed tracking the object as it moves.

Figure 9 shows the vehicle that, when moving, generated the optical flow. In Figures 10 and 11, the variations of the optical flow between two time intervals, $\Delta t_i$ and $\Delta t_n$ ($i < n$), are shown. In this way, the algorithm allowed tracking the progressive movement of the object (movement adopted as progressive in this work) and, as this happens, it is possible to predict in which direction it is moving, that is, to the left, to the right, or keeping a straight line.

Figure 9.

Variation of the optical flow of the moving object.

Figure 10.

Vehicle movement.

Figure 11.

Object moving to the left side.

5.1 Optical flow algorithm in terms of the curvature

In the following items, the implementations of the Lucas and Kanade algorithm, without and with a Gaussian filter, and of the Horn and Schunck and Farneback algorithms will be shown, using as input the same sequence of video images used by the algorithm developed in this work. For each one, the performance and accuracy obtained will be verified.

5.2 Lucas and Kanade algorithm without Gaussian filter

5.3 Lucas and Kanade algorithm with Gaussian filter

5.4 Horn and Schunck algorithm

5.5 Farneback algorithm

For each of the five algorithms, one frame is shown containing four figures, two upper and two lower. In each frame, the figure at the top left shows the variation of the vector field between two frames. The top-right figure corresponds to the variation of the object's movement in real time. The lower figures, except for the proposed algorithm, correspond to the count of points on the right or on the left; the movement occurs toward the side that has the greatest number of points. In the case of the proposed algorithm, this is done through the analysis of vector density: the side with the greater vector density is the side toward which the movement is occurring (see Figures 12-20).

Figure 12.

Object moving to the right side.

Figure 13.

Variation of the optical flow of the moving object.

Figure 14.

Vehicle movement.

Figure 15.

Object moving to the left side.

Figure 16.

Object moving to the right side.

Figure 17.

Variation of the optical flow of the moving object.

Figure 18.

Vehicle movement.

Figure 19.

Object moving to the left side.

Figure 20.

Object moving to the right side.

Comparing the results presented by the algorithms, it is observed that in the developed model it was possible to see a dense vector trail of the object, with a slight tendency of displacement to the left as it continues its movement. In the other models this was not possible, and it is necessary to resort to a count of points in the lower frame. This process is also possible in the proposed model, but not necessary, which means a reduction in computational cost (see Figures 21-28).

Figure 21.

Variation of the optical flow of the moving object.

Figure 22.

Vehicle movement.

Figure 23.

Object moving to the right side.

Figure 24.

Object moving to the left side.

Figure 25.

Variation of the optical flow of the moving object.

Figure 26.

Vehicle movement.

Figure 27.

Object moving to the left side.

Figure 28.

Object moving to the left side.

Comparing the results, it is observed that the Farneback algorithm also presents a high vector density. But the proposed model, as previously said, presents a well-defined vector trail, which makes the point count in the lower frame unnecessary; this is not the case for the Farneback algorithm, indicating a higher computational cost, which can affect its accuracy when compared to the proposed algorithm.

For the Horn and Schunck algorithm, a low vector density is observed in comparison with the proposed algorithm, which indicates lower accuracy.

Although the two Lucas and Kanade techniques are faster, indicating a lower computational cost than the proposed algorithm, their low vector density results in lower precision in relation to the proposed method.


6. Final considerations

The proposed method presented good results, proving to be accurate and reasonably fast. This allows the application to be used in critical, real-world problems. However, it presented limitations, verified when compared to the Lucas and Kanade model with Gaussian filter, which is faster and presents good accuracy.

The proposed method reached only approximately 50% of the execution speed of the Lucas and Kanade method, which motivates further improvements. The technique presented can be applied to other fields of research, such as cardiology, due to its great precision over small regions, which is important because it can be applied with the objective of predicting infarctions. Its current contribution to the state of the art is to characterize the optical flow in terms of Gaussian curvature, which connects fields of research such as computer vision and differential geometry.


Acknowledgments

The authors of this work would like to thank the Institute of Electronics and Informatics Engineering of Aveiro, the Telecommunications Institute of Aveiro, and the University of Aveiro for the financial, technical-administrative, and structural support provided that allowed the accomplishment of this work.

References

  1. Gonzalez RC, Woods RE. Digital Image Processing. 2002
  2. Abbass MY, Kwon KC, Kim N, Abdelwahab SA, El-Samie FEA, Khalaf AA. A survey on online learning for visual tracking. The Visual Computer. 2020:1-22
  3. Khalid M, Penard L, Memin E. Application of optical flow for river velocimetry. International Geoscience and Remote Sensing Symposium. 2017:6265-6246
  4. Kastrinaki V, Zervakis M. A survey of video processing techniques for traffic applications. Image and Vision Computing. 2003;21(4):359-381
  5. Almodfer R, Xiong S, Fang Z, Kong X, Zheng S. Quantitative analysis of lane-based pedestrian-vehicle conflict at a non-signalized marked crosswalk. Transportation Research Part F: Traffic Psychology and Behaviour. 2016;42:468-468
  6. Tian B, Yao Q, Gu Y, Wang K, Li Y. Video processing techniques for traffic flow monitoring: A survey. In: ITSC. IEEE; 2011
  7. Laurense VA, Goh JY, Gerdes JC. Path-tracking for autonomous vehicles at the limit of friction. In: ACC. IEEE; 2017. p. 56665591
  8. Yilmaz A, Javed O, Shah M. Object tracking: A survey. ACM Computing Surveys. 2006;38(2006):13
  9. Veenman C, Reinders M, Ebacker E. Resolving motion matching for densely moving points. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2001;23(1):54-72
  10. Goodfellow I, Bengio Y, Courville A, Bengio Y. Deep Learning. Vol. 1. Massachusetts, USA: MIT Press; 2016
  11. Santos Junior JMD. Analisando a viabilidade de deep learning para reconhecimento de ações em datasets pequenos [Analyzing the viability of deep learning for action recognition in small datasets]. 2018
  12. Kelleher JD. Deep Learning. MIT Press; 2019
  13. Xiong Q, Zhang J, Wang P, Liu D, Gao RX. Transferable two-stream convolutional neural network for human action recognition. Journal of Manufacturing Systems. 2020;56:605-614
  14. Khan MA, Sharif M, Akram T, Raza M, Saba T, Rehman A. Hand-crafted and deep convolutional neural network features fusion and selection strategy: An application to intelligent human action recognition. Applied Soft Computing. 2020;87(73):74986
  15. Abdelbaky A, Aly S. Human action recognition using three orthogonal planes with unsupervised deep convolutional neural network. Multimedia Tools and Applications. 2021;80(13):20019-20065
  16. Rani SS, Naidu GA, Shree VU. Kinematic joint descriptor and depth motion descriptor with convolutional neural networks for human action recognition. Materials Today: Proceedings. 2021;37:3164-3173
  17. Farnebäck G. Two-frame motion estimation based on polynomial expansion. In: Proceedings of the Scandinavian Conference on Image Analysis (SCIA). 2003. pp. 363-370
  18. Wang Z, Xia C, Lee J. Group behavior tracking of Daphnia magna based on motion estimation and appearance models. Ecological Informatics. 2021;61:7278
  19. Lin W, Hasenstab K, Cunha GM, Schwartzman A. Comparison of handcrafted features and convolutional neural networks for liver MR image adequacy assessment. Scientific Reports. 2020;10(1):1-11
  20. Xu Y, Zhou X, Chen S, Li F. Deep learning for multiple object tracking: A survey. IET Computer Vision. 2019;13(4):355-368
  21. Pal SK, Pramanik A, Maiti J, Mitra P. Deep learning in multi-object detection and tracking: State of the art. Applied Intelligence. 2021:1-30
  22. Jiao L, Zhang F, Liu F, Yang S, Li L, Feng Z, et al. A survey of deep learning-based object detection. IEEE Access. 2019;7:51837-51868
  23. Pal SK, Bhoumik D, Chakraborty DB. Granulated deep learning and z-numbers in motion detection and object recognition. Neural Computing and Applications. 2020;32(21):16533-16555
  24. Chung D, Tahboub K, Delp EJ. A two stream siamese convolutional neural network for person re-identification. In: Proceedings of the IEEE International Conference on Computer Vision. 2017. pp. 1983-1671
  25. Choi H, Park S. A survey of machine learning-based system performance optimization techniques. Applied Sciences. 2021;11(7):3235
  26. Abdulkareem NM, Abdulazeez AM. Machine learning classification based on Random Forest algorithm: A review. International Journal of Science and Business. 2021;5(2):51-142
  27. Iwendi C, Jo O. COVID-19 patient health prediction using boosted random Forest algorithm. Frontiers in Public Health. 2020;8:9
  28. Dolejš M. Generating a spatial coverage plan for the emergency medical service on a regional scale: Empirical versus random forest modelling approach. Journal of Transport Geography. 2020:10. Available from: https://link.springer.com/book/10.687/978-981-15-0637-6
  29. Reis I, Baron D, Shahaf S. Probabilistic random forest: A machine learning algorithm for noisy data sets. The Astronomical Journal. 2018;157(1):16. DOI: 10.38/1538-3881/aaf69
  30. Thomas B, Thronson H, Buonomo A, Barbier L. Determining research priorities for astronomy using machine learning. Research Notes of the AAS. 2022;6(1):11
  31. Yoo S, Kim S, Kim S, Kang BB. AI-HydRa: Advanced hybrid approach using random forest and deep learning for malware classification. Information Sciences. 2021;546:420-655
  32. Liu C, Gu Z, Wang J. A hybrid intrusion detection system based on scalable K-means+ random Forest and deep learning. IEEE Access. 2021;9:75729-75740
  33. Paschos G. Perceptually uniform color spaces for color texture analysis: An empirical evaluation. IEEE Transactions on Image Processing. 2001;10:932-937
  34. Estrada FJ, Jepson AD. Benchmarking image segmentation algorithms. International Journal of Computer Vision. 2009;56(2):167-181
  35. Jaiswal JK, Samikannu R. Application of random forest algorithm on feature subset selection and classification and regression. In: 2017 World Congress on Computing and Communication Technologies (WCCCT). IEEE; 2017. pp. 65-68
  36. Menezes R, Evsukoff A, González MC, editors. Complex Networks. Springer; 2013
  37. Jeong C, Yang HS, Moon K. A novel approach for detecting the horizon using a convolutional neural network and multi-scale edge detection. Multidimensional Systems and Signal Processing. 2019;30(3):1187-1654
  38. Liu YJ, Tong SC, Wang W. Adaptive fuzzy output tracking control for a class of uncertain nonlinear systems. Fuzzy Sets and Systems. 2009;160(19):2727-2754
  39. Beckmann M, Ebecken NF, De Lima BSP. A KNN undersampling approach for data balancing. Journal of Intelligent Learning Systems and Applications. 2015;7(04):72
  40. Yoriyaz H. Monte Carlo method: Principles and applications in medical physics. Revista Brasileira de Física Médica. 2009;3(1):141-149
  41. Wang X. Intelligent multi-camera video surveillance: A review. Pattern Recognition Letters. 2013;34(1):3-19
  42. Wu J, Rehg JM. CENTRIST: A visual descriptor for scene characterization. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2011;33(8):1559-1501
  43. Cremers D, Schnörr C. Statistical shape knowledge in variational motion segmentation. Image and Vision Computing. 2003;21:77-86
  44. Siegelman N, Frost R. Statistical learning as an individual ability: Theoretical perspectives and empirical evidence. Journal of Memory and Language. 2015;81(73):74-65
  45. Kim IS, Choi HS, Yi KM, Choi JY, Kong SG. Intelligent visual surveillance—A survey. International Journal of Control, Automation, and Systems. 2010;8(5):926-939
  46. Chan KL. Detection of swimmer using dense optical flow motion map and intensity information. Machine Vision and Applications. 2013;24(1):75-69
  47. Szpak ZL, Tapamo JR. Maritime surveillance: Tracking ships inside a dynamic background using a fast level-set. Expert Systems with Applications. 2011;38(6):6669-6680
  48. Fefilatyev S, Goldgof D, Shreve M, et al. Detection and tracking of ships in open sea with rapidly moving buoy-mounted camera system. Ocean Engineering. 2012;54(1):1-12
  49. Frost D, Tapamo J-R. Detection and tracking of moving objects in a maritime environment with level-set with shape priors. EURASIP Journal on Image and Video Processing. 2013;1(42):1-16
  50. Collins RT, Lipton AJ, Kanade T, et al. A System for Video Surveillance and Monitoring. Technical Report. Pittsburgh: Carnegie Mellon University; 2000
  51. Viola P, Jones MJ. Robust real-time face detection. International Journal of Computer Vision. 2004;57(2):63-154
  52. Rodriguez-Canosa GR, Thomas S, Cerro J, et al. Real-time method to detect and track moving objects (DATMO) from unmanned aerial vehicles (UAVs) using a single camera. Remote Sensing. 2012;4(4):770-341
  53. Frakes D, Zwart C, Singhose W. Extracting moving data from video optical flow with physically-based constraints. International Journal of Control, Automation and Systems. 2013;11(1):55-57
  54. Sun K. Robust detection and tracking of long-range target in a compound framework. Journal of Multimedia. 2013;8(2):98
  55. Kravchenko P, Oleshchenko E. Mechanisms of functional properties formation of traffic safety systems. Transportation Research Procedia. 2017;20:367-372
  56. Lucas BD, Kanade T. An iterative image registration technique with an application to stereo vision. In: International Joint Conference on Artificial Intelligence. 1981
  57. Gong Y, Tang W, Zhou L, Yu L, Qiu G. A discrete scheme for computing image's weighted Gaussian curvature. In: IEEE International Conference on Image Processing (ICIP). 2021. pp. 1919-1923. DOI: 10.1109/ICIP42928.2021.9506611
  58. Hooker G, Mentch L. Bootstrap bias corrections for ensemble methods. arXiv preprint arXiv:1506.00553. 2015
  59. Tran T. Semantic Segmentation Using Deep Neural Networks for MAVs. 2022
  60. Horn BKP, Schunck BG. Determining optical flow. Artificial Intelligence. 1981;17:185-203
  61. Dalal N, Triggs B. Histograms of oriented gradients for human detection. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05). Vol. 1. IEEE; 2005. pp. 886-893
  62. Rosten E, Drummond T. Fusing points and lines for high performance tracking. In: 10th IEEE International Conference on Computer Vision. Vol. 2. Beijing, China; 2005. pp. 1508-1515
  63. Smolka B, Venetsanopoulos AN. Noise reduction and edge detection in color images. In: Color Image Processing. CRC Press; 2018. pp. 95-122
  64. Li L, Leung MK. Integrating Intensity and Texture Differences for Robust Change Detection. 2002
  65. Shi J, Tomasi C. Good features to track. In: 9th IEEE Conference on Computer Vision and Pattern Recognition. Seattle, WA, USA; 1994. pp. 593-600
  66. Cucchiara R, Prati A, Vezzani R. Advanced video surveillance with pan tilt zoom cameras. In: Proceedings of the 6th IEEE International Workshop on Visual Surveillance. Graz, Austria; 2006
  67. Li J, Wang Y, Wang Y. Visual tracking and learning using speeded up robust features. Pattern Recognition Letters. 2012;33(16):2094-2269
  68. Fernandez-Caballero A, Castillo JC, Martinez-Cantos J, et al. Optical flow or image subtraction in human detection from infrared camera on mobile robot. Robotics and Autonomous Systems. 2010;66(12):503-511
  69. Frakes D, Zwart C, Singhose W. Extracting moving data from video optical flow with physically-based constraints. International Journal of Control, Automation and Systems. 2013;11(1):55-57
  70. Revathi R, Hemalatha M. Certain approach of object tracking using optical flow techniques. International Journal of Computer Applications. 2012;53(8):50-57
  71. Breiman L. Consistency for a Simple Model of Random Forests. 2004
  72. Biau G, Devroye L, Lugosi G. Consistency of random forests and other averaging classifiers. Journal of Machine Learning Research. 2008;9(9)
  73. Meinshausen N, Ridgeway G. Quantile regression forests. Journal of Machine Learning Research. 2006;7(6)
  74. Ishwaran H, Kogalur UB. Consistency of random survival forests. Statistics & Probability Letters. 2010;80(13–14):746-744
  75. Biau G. Analysis of a random forests model. The Journal of Machine Learning Research. 2012;13(1):743-775
  76. Genuer R. Variance reduction in purely random forests. Journal of Nonparametric Statistics. 2012;24(3):565-562
  77. Wager S. Asymptotic theory for random forests. arXiv preprint arXiv:1405.0352. 2014
  78. Scornet E, Biau G, Vert JP. Consistency of random forests. The Annals of Statistics. 2015;43(4):1716-1741
  79. Murphy KP. Machine Learning: A Probabilistic Perspective. MIT Press; 2012
  80. Yoriyaz H. Monte Carlo method: Principles and applications in medical physics. Revista Brasileira de Física Médica. 2009;3(1):141-149

Notes

  • SSD: sum of squared differences.
  • MoG: mixture of Gaussian distributions.
