Formation of Inter-Frame Deformation Field of Images Using Reverse Stochastic Gradient Estimation

Alexander Tashlinskii; Pavel Smirnov

doi:10.5772/intechopen.83489

Abstract

The effectiveness of the use of stochastic gradient estimation to detect motion in a sequence of images is investigated. Pixel-by-pixel reverse estimation of the deformation field is used. The representation of the shift vector is considered both by the projections on the basic axes and by the polar parameters. Two approaches to estimate the parameters of the deformation field are proposed and analyzed. In the first approach, the stochastic gradient procedure sequentially processes all rows of an image to find estimates of shifts for all points of the reference image. The joint processing of the results allows compensating the inertia of the stochastic estimation. In the second approach, to improve the accuracy of estimation, the correlation of image rows is taken into account. As a criterion for the formation of the resulting estimate, the minimum of gradient estimation and correlation maximum were investigated. The computational complexity of the proposed algorithms is investigated. The algorithms are compared with the MVFAST algorithm. Examples of experimental results on the formation of the deformation field, the selection of a moving object area, and the finding of the movement and trajectory parameters of a moving object in a video sequence are given.

Keywords

image processing
video sequence
motion
shift vector
detection
stochastic estimation
stochastic gradient
trajectory
reverse estimation
correlation

Author Information

Show +

Alexander Tashlinskii
- Ulyanovsk State Technical University, Russia
Pavel Smirnov*
- Ulyanovsk Instrument Manufacturing Design Bureau, Russia

*Address all correspondence to: rtcis@mail.ru

1. Introduction

One of the challenges in video processing is moving object detection and tracking. Some tasks require only detection of the motion, while others require extraction of the moving object or the motion area boundary. The biggest challenge is to estimate trajectory parameters of a moving object in a video sequence. A solution quality to the problem largely depends on the accuracy of moving object area detection, since all the information needed to determine motion parameters and trajectory of the object is extracted from the image.

There are various approaches to identify the area of moving object based on the inter-frame difference [1, 2], background subtraction [2, 3], the use of statistics [2, 4], block estimation [5], and optical flow analysis [6]. The processing can be presented as estimation of inter-frame geometric deformations of two images, one of which can be considered as the reference image Zs=zi,js and the second as deformed image Zd=zi,jd, where zi,j∗ is the value in node ij of the image. The field H=h¯i,j of inter-frame shift vectors h¯i,j of all points of the reference image corresponding to the nodes of the sample grid will be called deformation field or shift vectors’ field.

An important task of estimating the trajectory of a moving object is to increase the spatial accuracy of detecting the area of motion in an image sequence. The solution of this problem depends on the quality of estimation of the deformation field. When finding estimates of shifts for each node of the deformed image, the approach of using stochastic gradient estimation [7, 8, 9] is promising. It finds coordinates xy of points of the reference image on the deformed image. That is, it evaluates vectors h¯i,j=hijxhijyТ of inter-frame shifts of points ij (Figure 1) of the reference image.

Figure 1.
Representation of shift of reference image point ij.

In view of the insignificant change in brightness on adjacent frames of a video sequence, it is advisable to use mean square inter-frame difference as the objective function of estimation. Then, for the projections, hijx and hijy of shift vector, we get [10]

h¯̂i,j+1=h¯̂i,j−λhsignβ¯hzi,j+1dh¯̂i,j,i=1,Nx¯,j=1,Ny−1¯,E1

where λh is the learning rate, which determines the rate of change of the estimated parameters, β¯h=βhxβhyT is the gradient estimation of an objective function, and Nx×Ny is the image size.

The projections of stochastic gradient can be represented as [11]

βhx=z˜x+Δx,ys−z˜x−Δx,ysz˜x,ys−zi,jd,

βhy=z˜x,y+Δys−z˜x,y−Δysz˜x,ys−zi,jd,

where xy are the coordinates of the reference image point on the deformed image, z˜x,ys is the brightness of the continuous image Z˜s obtained from Zs by means of interpolation, and Δx,Δy are the steps of finding estimates of derivatives dz˜x,ys/dx and dz˜x,ys/dy [12].

Note that inter-frame shift vector h¯i,j can be also represented in polar form: h¯i,j=ρi,jφi,j (Figure 1), where ρi,j is the length of the vector and φi,j is the angle with respect to the x axis (Figure 1). Functionally, these representations are equivalent. However, due to the inertia of estimates of shift vectors’ field H, when using stochastic gradient descent, the use of parameters ρi,j and φi,j does not give equivalent estimates h¯̂i,j compared with the use of hijx and hijy. This is due to the fact that the sets of parameters have different physical meanings [13]. The answer to the question of which set is preferable for solving the problem of moving object area detection is not obvious and requires research.

2. Variants of estimation algorithms

Let us compare the efficiency of moving object area identification using the sets of parameters hxhy and ρφ. The following method for estimating the shift vectors’ field H is used. At the first stage, stochastic gradient descent algorithm sequentially processes all nodes of the reference image row by row with the set of parameters hxhy. It processes each row i bidirectionally [14], first, from the left to the right, getting, according to Eq. (1), the estimates

ĥij+1xl=ĥijxl−λhsignβhxzi,j+1dh¯̂i,j,ĥi1xl=ĥi−11xl,

ĥij+1yl=ĥijyl−λhsignβhyzi,j+1dh¯̂i,j,ĥi1yl=ĥi−11yl,

and then from the right to the left getting the estimates

ĥiNy−jxr=ĥiNy−j+1xr−λhsignβhxzi,Ny−jdh¯̂i,Ny−j+1,ĥiNyxr=ĥi−1Nyxr,

ĥiNy−jyr=ĥiNy−j+1yr−λhsignβhyzi,Ny−jdh¯̂i,Ny−j+1,ĥiNyyr=ĥi−1Nyyr,

where parameter λh is determined by the maximum speed of moving objects.

When using the set of parameters (ρ,φ), considering that ∂x/∂ρ=cosφ, ∂x/∂φ=−ρsinφ, ∂y/∂ρ=sinφ, and ∂y/∂φ=ρcosφ, for gradient estimation β¯=βρβφT, we get [11]

βρ=z˜x+Δx,ys−z˜x−Δx,ysz˜x+Δx,ys+z˜x−Δx,ys−2zi,jdcosφ+z˜x,y+Δys−z˜x,y−Δysz˜x,y+Δys+z˜x,y−Δys−2zi,jdsinφ,

βφ=−z˜x+Δx,ys−z˜x−Δx,ysz˜x+Δx,ys+z˜x−Δx,ys−2zi,jdρsinφ+z˜x,y+Δys−z˜x,y−Δysz˜x,y+Δys+z˜x,y−Δys−2zi,jdρcosφ.

Then according to Eq. (1), we get the estimates

ρ̂i,j+1l=ρ̂i,jl−λρsignβρzi,j+1dρ̂i,jφ̂i,j,

φ̂i,j+1l=φ̂i,jl−λφsignβφzi,j+1dρ̂i,jφ̂i,j,

ρ̂i,Ny−jr=ρ̂i,Ny−j+1r−λρsignβρzi,Ny−jdρ̂i,Ny−j+1φ̂i,Ny−j+1,

φ̂i,Ny−jr=φ̂i,Ny−j+1r−λφsignβφzi,Ny−jdρ̂i,Ny−j+1φ̂i,Ny−j+1i,j,

where the parameters λρ and λφ are also determined by the maximum speed of moving objects.

At the second stage, for each node ij, the estimates ĥijxl and ĥijxr are processed jointly [15]. The optimal value of ĥijx is found between the estimates with some step Δh. Step Δh is determined by the required accuracy. If the absolute difference between the estimates ĥijxl and ĥijxr is less than Δh, then the estimate ĥijx of the deformation parameter is assumed to be equal ĥijxl. Otherwise, the set of possible values of the deformation parameter estimate is given by

ĥijxm=ĥijxl+mΔh,m=0,k+1¯,k=ĥijxr−ĥijxl/Δh.

The optimal value of ĥijx from the resulting set is found using some criterion.

2.1 Criteria for the formation of the resulting estimation

Two criteria for the formation of the resulting estimation are studied [16]:

Gradient estimation minimum
minm=0,k+1¯βhĥijxl+mΔh,E2
Correlation coefficient maximum (CC)
maxm=0,k+1¯CCz˜xm+p,ym+sszi+p,j+sd,p=−a,a¯,s=−b,b¯,E3

where (xm,ym) are the coordinates of the point ij of the image Zs in the image Zd for the estimate ĥi,jm and 2a+1×2b+1 is the window size for calculation of CC. The optimal value ĥijy is determined similarly.

Joint processing of estimates allows compensating the inertia of the recursive estimation. The results given below were obtained with a=b=1.

The joint processing of estimates ρ̂i,jl, ρ̂i,jr and φ̂i,jl, and φ̂i,jr as well as finding the optimal values of ρ̂i,j and φ̂i,j are performed as described above using the same criteria of gradient estimation minimum and CC maximum.

Thus for joint processing of the estimates, depending on the criterion, we obtain two versions of the estimation algorithm:

Algorithm A: reverse processing using the criterion Eq. (2);

Algorithm B: reverse processing using the criterion Eq. (3).

2.2 Comparison of the estimation algorithm efficiency

Let us first consider the efficiency of the criterion Eq. (2). Figure 2 shows an example of images used for the study. These are two adjacent frames of a video sequence where the vehicle located in the center is moving and the vehicle on the right is motionless. The parameters of inter-frame spatial shift of the moving vehicle are hx=3, hy=2,95 or ρ=4,2, and φ=45∘.

Figure 2.
An example of adjacent frames of a video sequence with a moving object.

For example, Figure 3 shows typical results of shift magnitude estimation for a single row of the reference image (Figure 2) while using Algorithm A. For a correct comparison of the estimates ρ̂i,j, φ̂i,j and ĥijx, and ĥijy, the latter is recalculated to polar parameters:

ρh=ĥijx2+ĥijy2,φh=arctgĥijx/ĥijy,

Figure 3.
An example of estimates of shift magnitude for a row using criterion Eq. (2).

Figure 3a shows dependencies ρ̂i,jl and ρ̂i,jr on i point in a row, Figure 3b shows the result of their joint processing, Figure 3c shows dependencies ρhl and ρhr on i, and Figure 3d shows the result of their joint processing. The solid gray line represents the true value of the deformation parameter. The results for parameters ρφ are visually better. Estimates given in Table 1 also confirm that. Table 1 shows mean value mh¯ and variance σ̂h¯2 of estimation error for the area with and without motion for results shown in Figure 3b and d.

Algorithm	Motion area				Area without motion
	mh¯		σ̂h¯2×10−2		mh¯		σ̂h¯2×10−2
	hxhy	ρφ	hxhy	ρφ	hxhy	ρφ	hxhy	ρφ
For a single row
A	0.46	0.31	21.27	10.41	0.77	0.49	21.91	26.11
B	0.18	0.1	11.22	0.66	0.3	0.15	3.82	0.89
C	0.37	0.06	19.8	4.84	0.44	0.22	3.4	1.42
D	0.06	0.01	0.38	0.26	0.03	0.02	0.29	0.03
MVFAST	0.05		25.3		0.01		0.01
For the entire image
A	0.61	0.53	23.6	14.3	0.73	0.45	23.8	18.8
B	0.42	0.34	7.3	1.7	0.29	0.15	2.6	1.5
C	0.13	0.14	15.9	10.1	0.46	0.25	2.8	2.9
D	0.07	0.01	1.4	0.69	0.09	0.02	0.28	0.02
MVFAST	0.08		18.6		0.02		0.05

Table 1.

Estimation error of shift vectors of deformation field.

Note that with the use of parameters ρφ in the motion area, the mean value of the estimation error is lower, and the variance is more than two times less than with the use of the set of parameters hxhy. In the area without motion, the use of parameters ρφ is also preferable since the bias of the estimates is less despite the slightly larger variance. For comparison, Table 1 also shows the results for the mean value and estimation of the variance σ̂h¯2 of estimation errors averaged over the entire image.

Figure 4 shows the results of the joint processing of the sets of estimates ĥijxĥijy and ρ̂i,jlρ̂i,jr using the criterion Eq. (3) of the formation of the resulting estimate (Algorithm B). The results are for the same row as the results in Figure 3. Figure 4a corresponds to the set of parameters hxhy, and Figure 4b corresponds to ρφ. Table 1 summarizes the numerical data of the estimation error for the row and averaging over the entire image.

Figure 4.
The resulting estimates of shift magnitude for a row using criterion Eq. (3).

The results show that the use of CC maximum as the criterion can significantly improve the results of the joint processing of the estimates. For both sets of parameters, the mean value of the error for the motion area does not exceed 0.2. The variance of the error is approximately halved for the set of parameters hxhy and 15 times less for parameters ρφ. For the area without motion, the mean error is less by about 2.6 and 3.3 times. The variance of the estimates also decreases. However, when using CC, the computational complexity of the estimation procedure increases significantly.

Figure 5 shows the visualization of estimates of the shift vectors’ field H for the four estimation algorithms described above. It shows the magnitudes of the estimated vectors as a function of coordinates of nodes of the reference image. Figure 5a and b shows the results for criteria Eqs. (2) and (3) for the set of parameters hxhy and Figure 5c and d for the same criteria when parameters ρφ are used. The figures illustrate and confirm the conclusions made above.

Figure 5.
Visualization of estimating shift vectors’ field by algorithms A and B with different criteria.

2.3 Use of inter-row correlation of adjacent rows of the image

The approach to estimate the field H considered above processes images row by row as one-dimensional signals. Let us consider the possibility to increase processing efficiency by taking into account correlation of the adjacent rows of image [13, 17]. For each node ij of the reference image, we will also form two estimates h¯̂i,j1 and h¯̂i,j2 of the parameters of the vector h¯i,j. To form the estimate h¯̂i,j1, the stochastic gradient procedure processes rows one after the other with the change in direction after each row, for example, the odd rows from the left to the right, forming estimates h¯̂i,j1l, and the even rows, from the right to the left, forming estimates h¯̂i+1,j1r. The current shift estimate h¯̂i+1,j1 in a node is compared with the estimate h¯̂i,j1, obtained from the previous row. If the estimate h¯̂i,j1 is considered better using some criterion, then it is taken as the current estimate.

As the criterion we use CC maximum Eq. (3) with the difference that here the set of possible values is limited by the estimates h¯̂i+1,j1 and h¯̂i,j1. The estimate h¯̂i,j2 is also formed in a similar way with the difference that, when processing, the odd rows are processed from the right to the left and the even rows from the left to the right.

The joint processing of estimates h¯̂i,j1 and h¯̂i,j2 is carried out as described above. On the one hand, when taking into account inter-row correlation, the formation of h¯̂i,j1 and h¯̂i,j2 requires a greater amount of computation. On the other hand, the set of possible estimates for their joint processing turns out to be significantly smaller, which reduces the computational complexity.

Thus we have two more algorithms to estimate the shift vectors’ field:

Algorithm C: processing of adjacent rows using the criterion Eq. (2); and
Algorithm D: processing of adjacent rows using the criterion Eq. (3).

2.4 Comparison of algorithms C and D

Figure 6 shows the typical results of joint processing of the sets of estimates ĥijx, ĥijy (Figure 6a) and ρ̂i,j, and φ̂i,j (Figure 6b) using Algorithm C. Figure 6c and d shows the results for the same sets and Algorithm D. It can be seen from the figures that the use of inter-row correlation can significantly improve the estimation results. For the parameters hxhy and Algorithm C, mean value of the error for motion area decreases by 1.2 times, error variance by 1.1 times, and for Algorithm D by 3 times and 29 times, respectively. For the set of parameters ρφ and Algorithm C, the mean value of the error for motion area decreases by 5 times, variance by 2.1 times, and for Algorithm D by 10 times and 2.5 times, respectively. The numerical values are listed in Table 1.

Figure 6.
Example of estimates of shift magnitude for a row using inter-row correlation.

Figure 7 shows visualization of the estimates of shift vectors’ field H for the estimation algorithms. Figure 7a and b corresponds to the algorithms C and D and set of parameters hxhy, and Figure 7c and d corresponds to the same algorithms and the set of parameters ρφ. It can be seen that criterion Eq. (3) provides a greater accuracy of estimation (but also a larger computational costs). When maximum spatial accuracy is required, it is advisable to use one of algorithms B and D. The error for the Algorithm D is smaller, but the algorithm has a greater computational complexity.

Figure 7.
Visualization of shift vectors’ field while using inter-row correlation.

Figure 8 shows moving object area and its contour obtained from the results of algorithms C (Figure 8a) and D (Figure 8b) by thresholding with the threshold equal to 0.1 of the maximum value of shift magnitude. Note that there are practically no errors of the second kind for Algorithm D.

Figure 8.
Results of moving object area detection.

3. Efficiency of the algorithms for complex motion

Let us consider the possibilities of the developed algorithms when estimating shift vectors’ field for more complex types of motion, in particular, for the case when inter-frame geometric deformations of images of a moving object can be described by a similarity model with a vector of parameters α¯=h¯φκ. To compare the results of estimation of the shift vectors’ field with the results obtained for the images in Figure 2, the right image was modeled with the corresponding parameters of inter-frame deformations: h¯=23T, φ=4∘, and κ=1.

The results presented below are for algorithms B and D which were shown to have the highest efficiency with the use of Eq. (3) as a criterion. Figure 9 shows typical results for a single row of the image; here graphs (a) and (b) correspond to the Algorithm B and the set of parameters hxhy and ρφ, respectively, and graphs (c) and (d) correspond to Algorithm D. In this case the estimates for the set of parameters ρφ are closer to the true values (gray line). Figure 10 shows visualization of estimates of the shift vectors’ field for the entire image. Figure 10a and b corresponds to Algorithm B, and Figure 10c and d corresponds to Algorithm D with the set of parameters hxhy and ρφ, respectively.

Figure 9.
Example of estimates of shift magnitude for a row in case of complex motion of an object.

Figure 10.
Visualization of shift vectors’ field in case of complex motion of an object.

In the presented figures, in order to remove outliers significantly exceeding the true value of the shift, the estimation module was limited to ρmax=10. When using Algorithm B, we note a significant number of false estimates ρ≥ρmax both for and outside the area of motion. In particular, when using the parameters hxhy, the estimates for a significant number of bottom rows of the image exceed the threshold. The use of the inter-row correlation significantly reduces the number of outliers for the set of parameters hxhy and virtually eliminates them for the set ρφ.

4. Comparison of the developed algorithms with the MVFAST algorithm

For block approach [18, 19], one of the most effective algorithms for the formation of shift vectors’ field is the MVFAST (Motion Vector Field Adaptive Search Technique) algorithm [20]. Therefore, we compare the effectiveness of the proposed algorithms and the MVFAST algorithm.

In the MVFAST algorithm, the deformed image is divided into many nonoverlapping square blocks of a predetermined size. In most cases the block sizes are 2 × 2, 4 × 4, etc., but the algorithm allows setting blocks of any size, starting from one pixel. The estimation of the shift vector for the current block is determined by the estimates of the motion vectors of the neighboring blocks. The maximum value of the motion vector length of neighboring blocks is compared with two threshold values L1 and L2, the values of which depend on the type of video sequence. As a result motion activity is classified as low, medium, and high. The criterion is the minimum of the sum of absolute differences (SAD). This characteristic for a block is the sum of the modules of the differences between the nodes of the current block and the corresponding nodes of the image on which the block location is searched. To improve performance and eliminate the influence of noise in the MVFAST algorithm, an early stop of the search is used. The search is terminated if the SAD is less than the specified threshold value T. Early stopping of the search can be disabled by setting the threshold T to zero.

Let us compare the results of estimation of the deformation field by algorithms D and MVFAST. For correctness of the comparison, below are the results obtained for the images in Figure 2 with the parameters of Algorithm D, λh=λρ=0.1, λφ=π/180, Δx=Δy=1, Δh=Δρ=0.2, and Δφ=5π/180, and parameters of the MVFAST algorithm, L1=1, L2=2, T=1, and block size 1x1. Figure 11a shows the estimates of the shift magnitude of the image points corresponding to the nodes of a single row of the reference image and formed by the MVFAST algorithm. The results of Algorithm D with the sets of parameters hxhy and ρφ are shown in Figure 6c and d, respectively.

Figure 11.
Example of estimates for the shift magnitude for a single row and visualization of the deformation field generated by the MVFAST algorithm.

It can be seen from the figure that the MVFAST algorithm forms fairly accurate estimates of node shifts, but, unlike Algorithm D, it has errors at the boundaries of the image of an object. Errors also occur in low-contrast areas inside the object. The latter is due to the fact that SAD in the center of the search for them often does not exceed the threshold T. Setting the threshold T=0 when a block size is 1 pixel also does not solve this problem. Algorithm D, due to the inertia of changes of estimates, is deprived of this disadvantage.

The values of the mathematical expectation and variance of estimation errors when using the MVFAST algorithm for a row and for the entire image are shown in Table 1. The table shows that the mathematical expectation of the MVFAST algorithm for the motion area is almost the same as for Algorithm D with the set of parameters hxhy but several times higher (about five times for a row, eight times for the entire image) than for Algorithm D with the set of parameters ρφ. The variance of the estimation error for the motion area for the MVFAST algorithm significantly exceeds the variance for Algorithm D (13 times with the set of parameters hxhy and 27 times with the set of parameters ρφ. In the absence of noise for area without motion, the MVFAST algorithm shows slightly better results compared to Algorithm D with the set of parameters hxhy, but the results for Algorithm D with the set of parameters ρφ is better. This is well illustrated by Figure 11b, which shows the visualization of the deformation field estimates for the entire image. The results of using Algorithm D with the sets of parameters hxhy and ρφ are shown in Figure 7b and d, respectively.

Let us also compare the efficiency of the algorithms with the complex motion for the same conditions and parameters of inter-frame deformations (h¯=23T, φ=4∘, κ=1) which were used in the analysis of algorithms B and D.

Figure 12a shows the estimates of shift magnitudes for a single row of the reference image (the same row that was selected in the analysis of Algorithm D), when the MVFAST algorithm is used. The results of Algorithm D with the sets of parameters hxhy and ρφ are shown in Figure 9c and d, respectively. The visualization of deformation field for the MVFAST algorithm is shown in Figure 12b. The deformation fields obtained by Algorithm D with the sets of parameters hxhy and ρφ are shown in Figure 10c and d, respectively.

Figure 12.
Example of estimates of shift magnitude for a single row and visualization of the deformation field formed by the MVFAST algorithm for complex motion.

It can be seen from the figure that when estimating a deformation field for complex motion, the MVFAST algorithm gives significantly worse results in terms of spatial accuracy compared to the developed algorithms, and it is practically inapplicable for estimating the parameters of the motion trajectory.

Note that the increase of block size in the MVFAST algorithm allows to get rid of detection gaps inside the object image but still does not solve the problem of errors at its boundaries. At the same time, the increase of block size reduces the accuracy of moving object area detection. The proposed algorithms, in contrast to the MVFAST algorithm, provide subpixel accuracy while estimating the parameters of the shifts that is fundamentally impossible for the MVFAST algorithm.

5. Analysis of computational costs

Along with the accuracy of estimation, computational complexity of the algorithms is also important. In this case, the required amount of computations per pixel of the reference image with coordinates ij consists of two components:

Computational cost of finding estimates h¯̂i,jl and h¯̂i,jr or ρ̂i,jl, ρ̂i,jr and φ̂i,jl, φ̂i,jr;
Computational cost of obtaining the resulting estimate in the joint processing of estimates h¯̂i,jl and h¯̂i,jr or a set of estimates ρ̂i,jl, ρ̂i,jr and φ̂i,jl, φ̂i,jr.

The second component depends on many factors: complexity of the background, noise, number and speed of moving objects, etc. It can be evaluated with the given limitations of the particular problem being solved. Therefore, in the framework of this work, we restrict ourselves from analyzing the first component of computational costs with two sets of parameters for the estimated shift vectors hxhy and ρφ for the three algorithms discussed above:

Algorithm A: single row reverse processing
Algorithm C: joint processing of adjacent rows using gradient estimation minimum criterion
Algorithm D: joint processing of adjacent rows using CC maximum criterion

Detailed analysis of the computational cost requires taking into account not only algorithm itself but a large number of other factors (type of the computing device, time of memory access and for other operations, etc.). Many of these factors depend on the particular imaging device and computing resources. Therefore, we analyze only the computational complexity of the calculated ratios of the algorithms. We will consider the number of operations of addition (+), subtraction (−), multiplication (×), division (÷), tacking the root (), trigonometric functions (sin,cos), and interpolations for a node (z˜).Table 2 shows the number of such operations needed to process one node of an image.

Operation	Number of operations
	Algorithm A		Algorithm C		Algorithm D
	hxhy	ρφ	hxhy	ρφ	hxhy	ρφ
+,−	12	16	24	32	156	160
×	6	20	10	38	114	128
÷	—	—	—	—	4	4
z˜	8	8	16	16	44	44
sin,cos	—	4	—	8	—	4
	—	—	—	—	4	4

Table 2.

The number of operations needed to process a single node of an image.

The total computational cost is defined as

N=c±N±+c×N×+c÷N÷+cz˜Nz˜+cN+csinNsin,E4

where c±,c×,c÷,cz˜,c,csin are the coefficients characterizing the time needed to perform corresponding operations and N±,N×,N÷,Nz˜,N,Nsin are the number of operations.

Equation (4) allows estimating the computational cost of the algorithms for particular computing facilities. For example, for a PC with an AMD Turion II X2 M500 processor: c± = 1.4, c× = 1.4, c÷ = 13.6, cz˜ = 63, c = 135, and csin = 212 ns. Then for the set of parameters hxhy, average time needed to process one node of an image is 0.53 μs for Algorithm A, 1.06 μs for Algorithm C, and 3.74 μs for Algorithm D. The average time for parameters ρφ is 1.4 μs for Algorithm A, 2.8 μs for Algorithm C, and 4.62 μs for Algorithm D.

Let us note the features of the computational complexity of algorithms D and MVFAST. Studies have shown that the amount of computation of the latter is largely determined by the nature of the processed images. So if the adjacent frames of a video sequence do not contain motion, the computational complexity of the algorithm will be determined only by computing one SAD for each block. In the presence of motion, depending on its intensity, the computational complexity may increase in tens or even hundreds of times. In particular, when processing two frames of a video sequence without movement by the MVFAST algorithm with a block size of 1 pixel on the same PC, the average time spent on one node was 0.64 μs. This is the lowest possible computational cost. The average processing time for a single node for the images shown in Figure 2 was 8.91 μs. For similar images with a complex type of motion, for example, with α¯=23T4∘1, processing time is 21.84 μs. Note that the time expenses when using Algorithm D with the set of parameters hxhy were 7.44, 8.8, and 10.75 μs under similar conditions, while with the parameters ρφ, the time expenses were 8.13, 10.27, and 11.8 μs.

From the above results, it can be seen that the computational complexity of Algorithm D weakly depends on the presence or absence of moving objects, their size, and type of motion. The computational complexity of the MVFAST algorithm is largely determined by these factors. So for the case when the inter-frame deformations of images of a moving object correspond only to a parallel shift, the computation time for algorithms D and MVFAST is approximately the same. When motion is complex, it increases in several times for the MVFAST algorithm.

6. An example of estimating the trajectory of a moving object using deformation field

Let us give an example of estimating the trajectory of a moving object using the developed algorithms. Figure 13a and b shows two frames from the video sequence being processed in which the aircraft lands on an aircraft carrier.

Figure 13.
An example of estimation of moving object trajectory.

A complicating factor is the uneven movement of the camera toward the landing plane. Therefore, to estimate the position of the moving object relative to the scene (not to the camera), it was necessary not only to detect and identify the moving object area but also to stabilize the image. First, the distortion caused by the movement of the camera is compensated. To do this the inter-frame deformation of adjacent frames presented as similarity model α¯=hxhyφκ is estimated using deformation fields [21]. Then the area of a moving object is detected. Using the estimates of the deformation field in the area of moving object, the parameters of the inter-frame motion are found, which for 34 frames of the video sequence are isometrically presented in Figure 13c. These parameters are used to estimate a three-dimensional trajectory of a moving object. Figure 13d shows the result in relative coordinates XYZ. The camera position at the initial moment of the shooting is taken as the origin.

7. Conclusion

The effectiveness of the motion area detection in sequence of images using pixel-by-pixel stochastic gradient estimation of the inter-frame shift vectors for all points of the reference image corresponding to its nodes (deformation field) is investigated. It is shown that stochastic gradient estimation of shift vectors does not give equivalent results for their representation as projections on the basic axes of the image and polar parameters. This is due to the fact that these sets of parameters have a different physical meaning. The analysis of variants of estimation algorithms for the deformation field has shown that the use of polar parameters provides higher accuracy of estimation.

Two approaches to estimate the parameters of the deformation field are considered. In the first approach, the stochastic gradient procedure sequentially processes all rows of an image to find estimates of shifts for all points of the reference image. It processes each row bidirectionally, i.e., from the left to the right and from the right to the left. The joint processing of the results allows compensating the inertia of the stochastic estimation. However, this approach does not take into account inter-row correlation, and images are processed as one-dimensional signals. In the second approach, to improve the accuracy of estimation, the correlation of image rows is taken into account. Algorithm processes rows one after the other with change in direction after each row and uses obtained values to form resulting estimate for each node. From the results obtained for each node of the reference image, the resulting estimate of the shift vector is formed. Studies have shown that the second approach, with approximately the same computational complexity, showed significantly higher accuracy in the estimation of the shift vectors’ field.

As a criterion for the formation of the resulting estimate, the minimum of gradient estimation of the objective function and the correlation maximum of local neighborhoods of the deformed and reference images were investigated. Correlation maximum criterion showed the best results. When using this criterion, not only the mean value and the variance of the estimation errors turned out to be less but also the effect of “oscillations” outside the motion area. But the use of this criterion requires a large computational cost.

The computational complexity of the proposed algorithms is investigated. Under equal conditions, the estimation of the projections of the shift vectors requires a slightly smaller amount of computations than the estimation of polar parameters. Using reverse processing of a row is less time-consuming compared with processing of the adjacent rows. The use of the correlation maximum criterion implies a larger amount of calculations than the use of the gradient estimation minimum criterion. The algorithms with large computational costs provide less error; therefore the choice of algorithm depends on the specific task.

The experiments have confirmed the effectiveness of using the developed algorithms to find the motion parameters and the trajectories of objects in a video sequence. In particular, an example of estimating three-dimensional trajectory of an aircraft in relative coordinates using 34 frames of a video sequence is considered. The parameters of the similarity model (parallel shift, rotation angle, and scale factor) were used to describe the inter-frame motion.

Comparison of the developed algorithms with the block algorithm MVFAST (Motion Vector Field Adaptive Search Technique) showed that under identical conditions, the latter shows worse results in accuracy of detecting the boundaries of motion. In addition, in contrast to the MVFAST algorithm, the proposed algorithms allow obtaining subpixel estimation accuracy.

Thus, the conducted study allows us to conclude that the pixel-by-pixel inter-frame estimation of the shift vectors of the reference image points corresponding to its nodes is a promising approach in detecting the area of a moving object in a series of images.

Acknowledgments

The reported study was funded by the RFBR and Government of Ulyanovsk Region according to the research projects 18-41-730006 and 18-41-730011.

References

1. Elhabian SY, El-Sayed KM, Ahmed SH. Moving object detection in spatial domain using background removal techniques. Recent Patents on Computer Science. 2008;1(1):32-54. DOI: 10.2174/1874479610801010032
2. Karasulu B, Korukoglu S. Moving object detection and tracking in videos. In: Performance Evaluation Software. Springer Briefs in Computer Science. USA, New York, NY : Springer, 2013. pp. 7-30. DOI: 10.1007/11612704_16
3. Wang L, Yung NHC. Extraction of moving objects from their background based on multiple adaptive threshold and boundary evaluation. IEEE Transactions on Intelligent Transportation Systems. 2010;1(11):40-51. DOI: 10.1109/TITS.2009.2026674
4. Kutsov RV, Trifonov AP. Detection of a moving object in the image. Journal of Computer and Systems Sciences International. 2006;3(45):459-468. DOI: 10.1134/S1064230706030129
5. Grishin SV, Vatolin DS, Lukin AS. The block methods review of motion estimation in digital video signals. Programmnye Sistemy i Instrumenty. 2008;1(9):50-62. [in Russian]
6. Zolotykh NY, Kustikova VD, Meerov IB. The review of searching and tracking methods in video. Vestnik Nizhegorodskogo universiteta im N.I. Lobachevskogo. 2012;5(2):348-358. [in Russian]
7. Tsypkin YZ. Information Theory of Identification. Moscow: Fizmatlit; 1995. 336 p. [in Russian]
8. Tashlinskii AG. Computational expenditure reduction in pseudo-gradient image parameter estimation. Lecture Notes in Computer Science. 2003;2658:456-462. DOI: 10.1007/3-540-44862-4_48
9. Tashlinskii AG. Pseudogradient estimation of digital images interframe geometrical deformations. In: Obinata G, Dutta A, editors. Vision Systems: Segmentation & Pattern Recognition. Austria, Vienna: I-Tech; 2007. pp. 465-494. DOI: 10.5772/4975
10. Tashlinskii AG. Optimization of goal function pseudogradient in the problem of interframe geometrical deformations estimation. In: Yin PY, editor. Pattern Recognition Techniques, Technology and Applications. Austria, Vienna: I-Tech; 2008. pp. 249-280. DOI: 10.5772/6244
11. Tashlinskii AG, Smirnov PV, Biktimirov LS. Methods of finding gradient estimates of target function for measurement of images parameters. Pattern Recognition and Image Analysis. 2011;2(21):339-342. DOI: 10.1134/S1054661811021057
12. Tashlinskii AG, Khoreva AM, Smirnov PV. Selection of finite differences in finding the stochastic gradient of the objective function in the estimation procedures of interframe deformations of images. Radiotekhnika. 2012;9:56-60. [in Russian]
13. Smirnov PV, Tashlinskii AG. Method for moving object area identification in image sequence. Radiotekhnika. 2015;6:5-11. [in Russian]
14. Tashlinskii AG, Kurbanaliev RM, Zhukov SS. Method for detecting instability and recovery of signal shape under intense noise. Pattern Recognition and Image Analysis. 2013;3(23):425-428. DOI: 10.1134/S1054661813030139
15. Tashlinskii AG, Smirnov PV, Tsaryov MG. Pixel-by-pixel estimation of scene motion in video. International Archives of the Photogrammetry. Remote Sensing and Spatial Information. 2017;XLII-2/W4:61-65. DOI: 10.5194/isprs-archives-XLII-2-W4-61-2017
16. Tashlinskii AG, Smirnov PV. Pixel-by-pixel estimation of interframe geometric deformations of images in case selecting the area of a moving object. Avtomotizatsiia Protsessov Upravleniia. 2015;1(39):41-49. [in Russian]
17. Tashlinskii AG, Smirnov PV. Moving object area identification in image sequence. In: International Siberian Conference on Control and Communications (SIBCON); 21–23 May 2015; Russia. Omsk: IEEE; 2015. pp. 1-5. DOI: 10.1109/SIBCON.2015.7147239
18. Bogush RP, Lysenko VI, Samoschenkov GA. Blocks-based algorithms combination for optical flow calculation for moving objects detection and tracking in video sequences. Vestnik Polotskogo Gosudarstvennogo Universiteta. Fundamentalnye Nauki. 2011;4:2-7. [in Russian]
19. Grishin SV, Vatolin DS, Lukin AS. Review of block methods of motion estimation in digital video signals. Programmnye Sistemy i Instrumenty. 2008;9:50-62. [in Russian]
20. Hosur PI, Ma KK. Motion vector field adaptive fast motion estimation. In: Second International Conference on Information, Communications and Signal Processing (ICICS 1999); December 1999; Singapore. pp. 7-10
21. Kaveev IN, Tashlinskii AG. Estimate parameters of the affine model of image binding by conjugate points. In: Negoda VN, editor. Informatika, Modelirovanie, Avtomatizatsiia: Sbornik Nauchnykh Trudov. Ulyanovsk: UlSTU; 2009. pp. 109-111. [in Russian]

[1] 1. Elhabian SY, El-Sayed KM, Ahmed SH. Moving object detection in spatial domain using background removal techniques. Recent Patents on Computer Science. 2008;1(1):32-54. DOI: 10.2174/1874479610801010032

[2] 2. Karasulu B, Korukoglu S. Moving object detection and tracking in videos. In: Performance Evaluation Software. Springer Briefs in Computer Science. USA, New York, NY : Springer, 2013. pp. 7-30. DOI: 10.1007/11612704_16

[3] 3. Wang L, Yung NHC. Extraction of moving objects from their background based on multiple adaptive threshold and boundary evaluation. IEEE Transactions on Intelligent Transportation Systems. 2010;1(11):40-51. DOI: 10.1109/TITS.2009.2026674

[4] 4. Kutsov RV, Trifonov AP. Detection of a moving object in the image. Journal of Computer and Systems Sciences International. 2006;3(45):459-468. DOI: 10.1134/S1064230706030129

[5] 5. Grishin SV, Vatolin DS, Lukin AS. The block methods review of motion estimation in digital video signals. Programmnye Sistemy i Instrumenty. 2008;1(9):50-62. [in Russian]

[6] 6. Zolotykh NY, Kustikova VD, Meerov IB. The review of searching and tracking methods in video. Vestnik Nizhegorodskogo universiteta im N.I. Lobachevskogo. 2012;5(2):348-358. [in Russian]

[7] 7. Tsypkin YZ. Information Theory of Identification. Moscow: Fizmatlit; 1995. 336 p. [in Russian]

[8] 8. Tashlinskii AG. Computational expenditure reduction in pseudo-gradient image parameter estimation. Lecture Notes in Computer Science. 2003;2658:456-462. DOI: 10.1007/3-540-44862-4_48

[9] 9. Tashlinskii AG. Pseudogradient estimation of digital images interframe geometrical deformations. In: Obinata G, Dutta A, editors. Vision Systems: Segmentation & Pattern Recognition. Austria, Vienna: I-Tech; 2007. pp. 465-494. DOI: 10.5772/4975

[10] 10. Tashlinskii AG. Optimization of goal function pseudogradient in the problem of interframe geometrical deformations estimation. In: Yin PY, editor. Pattern Recognition Techniques, Technology and Applications. Austria, Vienna: I-Tech; 2008. pp. 249-280. DOI: 10.5772/6244

[11] 11. Tashlinskii AG, Smirnov PV, Biktimirov LS. Methods of finding gradient estimates of target function for measurement of images parameters. Pattern Recognition and Image Analysis. 2011;2(21):339-342. DOI: 10.1134/S1054661811021057

[12] 12. Tashlinskii AG, Khoreva AM, Smirnov PV. Selection of finite differences in finding the stochastic gradient of the objective function in the estimation procedures of interframe deformations of images. Radiotekhnika. 2012;9:56-60. [in Russian]

[13] 13. Smirnov PV, Tashlinskii AG. Method for moving object area identification in image sequence. Radiotekhnika. 2015;6:5-11. [in Russian]

[14] 14. Tashlinskii AG, Kurbanaliev RM, Zhukov SS. Method for detecting instability and recovery of signal shape under intense noise. Pattern Recognition and Image Analysis. 2013;3(23):425-428. DOI: 10.1134/S1054661813030139

[15] 15. Tashlinskii AG, Smirnov PV, Tsaryov MG. Pixel-by-pixel estimation of scene motion in video. International Archives of the Photogrammetry. Remote Sensing and Spatial Information. 2017;XLII-2/W4:61-65. DOI: 10.5194/isprs-archives-XLII-2-W4-61-2017

[16] 16. Tashlinskii AG, Smirnov PV. Pixel-by-pixel estimation of interframe geometric deformations of images in case selecting the area of a moving object. Avtomotizatsiia Protsessov Upravleniia. 2015;1(39):41-49. [in Russian]

[17] 17. Tashlinskii AG, Smirnov PV. Moving object area identification in image sequence. In: International Siberian Conference on Control and Communications (SIBCON); 21–23 May 2015; Russia. Omsk: IEEE; 2015. pp. 1-5. DOI: 10.1109/SIBCON.2015.7147239

[18] 18. Bogush RP, Lysenko VI, Samoschenkov GA. Blocks-based algorithms combination for optical flow calculation for moving objects detection and tracking in video sequences. Vestnik Polotskogo Gosudarstvennogo Universiteta. Fundamentalnye Nauki. 2011;4:2-7. [in Russian]

[19] 19. Grishin SV, Vatolin DS, Lukin AS. Review of block methods of motion estimation in digital video signals. Programmnye Sistemy i Instrumenty. 2008;9:50-62. [in Russian]

[20] 20. Hosur PI, Ma KK. Motion vector field adaptive fast motion estimation. In: Second International Conference on Information, Communications and Signal Processing (ICICS 1999); December 1999; Singapore. pp. 7-10

[21] 21. Kaveev IN, Tashlinskii AG. Estimate parameters of the affine model of image binding by conjugate points. In: Negoda VN, editor. Informatika, Modelirovanie, Avtomatizatsiia: Sbornik Nauchnykh Trudov. Ulyanovsk: UlSTU; 2009. pp. 109-111. [in Russian]

Formation of Inter-Frame Deformation Field of Images Using Reverse Stochastic Gradient Estimation

Pattern Recognition - Selected Methods and Applications

Abstract

Keywords

Author Information

Alexander Tashlinskii

Pavel Smirnov*

1. Introduction

Figure 1.

2. Variants of estimation algorithms

2.1 Criteria for the formation of the resulting estimation

2.2 Comparison of the estimation algorithm efficiency

Figure 2.

Figure 3.

Table 1.

Figure 4.

Figure 5.

2.3 Use of inter-row correlation of adjacent rows of the image

2.4 Comparison of algorithms C and D

Figure 6.

Figure 7.

Figure 8.

3. Efficiency of the algorithms for complex motion

Figure 9.

Figure 10.

4. Comparison of the developed algorithms with the MVFAST algorithm

Figure 11.

Figure 12.

5. Analysis of computational costs

Table 2.

6. An example of estimating the trajectory of a moving object using deformation field

Figure 13.

7. Conclusion

Acknowledgments

References

Depth Extraction from a Single Image and Its Application

Formation of Inter-Frame Deformation Field of Images Using Reverse Stochastic Gradient Estimation

Pattern Recognition - Selected Methods and Applications

Abstract

Keywords

Author Information

Alexander Tashlinskii

Pavel Smirnov*

1. Introduction

Figure 1.

2. Variants of estimation algorithms

2.1 Criteria for the formation of the resulting estimation

2.2 Comparison of the estimation algorithm efficiency

Figure 2.

Figure 3.

Table 1.

Figure 4.

Figure 5.

2.3 Use of inter-row correlation of adjacent rows of the image

2.4 Comparison of algorithms C and D

Figure 6.

Figure 7.

Figure 8.

3. Efficiency of the algorithms for complex motion

Figure 9.

Figure 10.

4. Comparison of the developed algorithms with the MVFAST algorithm

Figure 11.

Figure 12.

5. Analysis of computational costs

Table 2.

6. An example of estimating the trajectory of a moving object using deformation field

Figure 13.

7. Conclusion

Acknowledgments

References

Continue reading from the same book

Pattern Recognition