Fixational eye movements are essential for vision with the retina, which responds only to changes in incident light. Moreover, because the rotation of the eyeball causes a translational movement of the crystalline lens, it is possible in principle to recover the depth of an object from the image sequence obtained in this way. We have proposed two types of depth recovery method based on fixational tremor: a differential-type method and an integral-type method. The former is based on the change in image brightness between frames, and the latter on the image blur caused by the movement. In this chapter, we introduce both methods and explain the simulations and experiments performed to verify their operation.
- motion stereoscopic
- fixational eye movements
- differential-type method
- integral-type method
- optical flow
- gradient equation
- image blur
When humans stare at a target, an irregular involuntary movement called fixational eye movement occurs. The human retina maintains its reception sensitivity through fine vibration of the target's image on the retina, so fixational movement is a prerequisite for seeing anything at all. It has been reported that these vibrations serve not only this intrinsic function of preserving photosensitivity but also assist in image analysis, a mechanism that can be interpreted as an instance of stochastic resonance (SR). SR is inspired by biology, more specifically by neuron dynamics, and based on it the Dynamic Retina (DR) and the Resonant Retina (RR), new vision devices that take advantage of random camera vibrations, were proposed for contrast enhancement and edge detection, respectively. It has also been reported that the movement of the retinal image due to fixational eye movements can be an unconscious cue for depth perception, and an actual vision system based on fixational eye movements has been proposed.
On the other hand, binocular stereopsis is robust and plays an essential role in the depth perception of the human visual system. In general, binocular stereopsis detects relatively large disparities, and hence can recover depth with high accuracy. However, large disparities aggravate the occlusion problem, and many solutions to it have been proposed. Wang et al. have proposed a local detector for occlusion based on deep learning. A robust depth restoration method has also been proposed that integrates light-field imaging, which simultaneously observes views from multiple angles, with stereo vision. We therefore expect that primitive depth information detected through fixational eye movements can be used to resolve occlusion in binocular stereopsis. There is a concern that the accuracy of depth restoration from a small camera motion is lower than that of stereo vision. Even so, using the depth information from fixational eye movements for the correspondence problem in stereo vision is expected to reduce the erroneous correspondences caused by occlusion.
In monocular stereoscopic vision, “structure from motion (SFM)” has been the most widely studied approach, and many remarkable results have been reported. SFM rests on various calculation principles. To achieve spatially dense depth recovery with high computational efficiency, a method based on the gradient equation, which expresses the constraint between the spatiotemporal derivatives of image intensity and the motion on the image, is effective [9, 10, 11]. It should be noted that such a gradient method requires a motion of appropriate size to recover the correct depth. The gradient equation holds only in the limit of small motion, so its approximation error cannot be ignored for very large motions. Conversely, for very small motions, the motion information is buried in the observation error of the spatiotemporal intensity derivatives.
Adapting the frame rate is one way to make the motion size suitable for the gradient method. We have proposed a method based on multi-resolution decomposition of images that does not require a variable frame rate, but it has a high computational cost. Here, therefore, we focus on small movements, placing emphasis on avoiding equation errors in the gradient method. To solve the signal-to-noise ratio (S/N) problem that small movements cause, many observations are collected and used all at once [13, 14]. In such a strategy, it is desirable that the direction and size of the motion take various values. From the above discussion, we examined a depth perception model based on fixational eye movements and the gradient method. Fixational eye movements are divided into three types: microsaccades, drifts, and tremors. As the first report of our attempt, we focused on tremor, the smallest of the three; in the next step, we plan to use analogies of drift and microsaccade for further progress. Using many images captured under random small camera motions consisting of three-dimensional (3-D) rotations imitating fixational eyeball motions, many observations, i.e. many gradient equations, become available at each pixel to recover the depth value corresponding to that pixel. Since the difference between the center of the 3-D rotation and the lens center generates a translational motion of the lens center, depth information can be obtained from these images. Simulations with artificial images confirm that the proposed method works effectively when the observed noise is an actual sample from the theoretically defined noise model.
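The benefit of pooling many observations can be illustrated with a toy calculation (not the full algorithm): when the camera rotation is known, each frame pair contributes one equation per pixel that is linear in the inverse depth, and a least-squares solution over many noisy equations recovers a value that no single equation could. The coefficients, noise level, and inverse depth below are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Each frame pair yields one equation linear in the inverse depth d:
#     a_i * d + b_i + e_i = 0,
# where a_i, b_i come from image derivatives and the known motion, and
# e_i is observation noise. Pooling many equations in a least-squares
# sense averages the noise away.
d_true = 0.25                              # assumed inverse depth
n_pairs = 200                              # number of frame pairs
a = rng.normal(0.0, 1.0, n_pairs)          # equation coefficients
e = rng.normal(0.0, 0.5, n_pairs)          # noise larger than the signal
b = -a * d_true + e
d_hat = -np.sum(a * b) / np.sum(a * a)     # pooled least-squares estimate
```

Despite each individual equation being dominated by noise, the pooled estimate lands close to the true inverse depth, which is the S/N argument made above.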
However, if the wavelength of the dominant luminance pattern is small compared to the size of the motion in the image, aliasing occurs and the gradient equation becomes useless; in other words, the methods of [13, 14] cannot be applied. To avoid this problem, we proposed a new scheme based on an integral form that also uses the analogy of fixational eye movements [15, 16]. Summing the many images generated by the above motion yields one blurred image. The degree of blur is a function of pixel position and also depends on the depth value at each pixel; that is, differences in the degree of image blur carry the depth information. Based on the proposed scheme, the spatial distribution of the image blur is estimated using the blurred image and the original, unblurred image. By modeling the small 3-D rotation of the camera as a Gaussian random variable, the depth map can be calculated analytically from this blur distribution.
Several depth recovery methods using motion blur have already been proposed, but they use the blur caused by definite, simple camera motions: one uses the blur produced by a translational camera motion, while another assumes blur from an unconstrained camera motion composed of translation and rotation. The depth recovery performance of these methods depends on the orientation of the texture in the image. That is, if the texture is a stripe pattern whose direction is parallel to the direction of motion in the image, little blur arises and accurate depth recovery is difficult. Unlike such specific camera motions, the random camera rotation used in this study works well for any texture. A deterministic camera motion could also be chosen to avoid this problem, but random camera rotation requires no precise control of the camera and is therefore easy to implement in a real system.
We proposed two algorithms based on the integral scheme. The first algorithm detects a point spread function (PSF) that represents the image blur and then analyzes it to restore depth. The second algorithm calculates the depth directly, without detecting the PSF. Both algorithms use a motion-blurred image and an unblurred reference image. The performance of the proposed scheme is expected to depend on the degree of motion blur: for a given PSF, i.e. a fixed deviation of the random camera rotations, a fine texture is advantageous for observing the blur accurately. This characteristic is the opposite of that of the differential scheme based on the gradient equation.
The features of our methods described above can be summarized as follows.
In a camera motion model that simulates eyeball rotation, the translation of the lens center is generated secondarily by the rotation, which reduces the number of unknown parameters. This camera motion model also allows depth to be recovered as an absolute quantity rather than a relative one.
The movement of the camera is subtle because it simulates the tremor component of the eyeball's motion. Therefore, by using a large number of image pairs, the accuracy of depth recovery can be improved while occlusion is avoided.
In order to use multiple image pairs simultaneously, we adopted a direct framework that explicitly uses the inverse depth, a parameter common to all of them.
In the following, we first explain the camera model and imaging system, and then the differential scheme, the integral scheme, and the algorithms based on each. Owing to page limitations, only the direct method is explained for the integral type. The function and characteristics of each algorithm are shown through computer simulations, with an emphasis on quantitative comparison with the true values. Finally, one of the algorithms of the differential scheme is applied to real images, and the results are explained.
2. Camera motions imitating tremor and projection model
In this research, we use a perspective projection system as the camera imaging model. The camera is fixed in the coordinate system: the lens center corresponding to the viewpoint is at the origin, and the optical axis is along the Z axis. By taking the focal length as the unit of geometric representation, we can assume the image plane Z = 1 without loss of generality. A point (X, Y, Z) on an object in 3-D space is projected to the image point (x, y) = (X/Z, Y/Z).
We briefly describe the camera motion model, proposed in our previous study, that mimics the tremor component of fixational eye movements. Following the analogy of the human eyeball, the center of camera rotation is set behind the lens center, at a distance r along the optical axis, and the camera undergoes no explicit translational motion. The rotation can also be expressed as a rotation vector centered on the coordinate origin, i.e. the lens center, with the same components. The offset between the coordinate origin and the center of rotation then results in a translation vector, which is formulated as follows:
In general, the translational motion of the camera lens is essential to recover depth, and our camera motion model achieves that translation implicitly, simply by rotating the camera. This facilitates camera control. In addition, the system can recover absolute depth by pre-calibration. We show the coordinate system and camera motion model used in this study in Figure 1.
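The induced translation can be checked numerically. The sketch below, which assumes the lens center at the origin, the optical axis along +Z, and the rotation center a distance r behind the lens (the symbol r and the numeric values are illustrative), confirms the two facts used in the text: a rotation about the optical axis produces no translation, and a small rotation about a lateral axis produces a mainly lateral shift of the lens center.

```python
import numpy as np

def rotation_matrix(omega):
    """Rodrigues formula: rotation matrix for a rotation vector omega."""
    theta = np.linalg.norm(omega)
    if theta < 1e-12:
        return np.eye(3)
    k = omega / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def induced_translation(omega, r):
    """Translation of the lens center (at the origin) caused by rotating
    the camera about a center placed r behind the lens on the optical axis."""
    c = np.array([0.0, 0.0, -r])     # rotation center behind the lens
    R = rotation_matrix(omega)
    return R @ (-c) + c              # new lens-center position minus origin

# Rotation about the optical axis alone: no translation.
t_z = induced_translation(np.array([0.0, 0.0, 0.01]), r=1.0)
# Small rotation about the x axis: lateral shift close to the
# first-order value omega x (0, 0, r).
t_x = induced_translation(np.array([0.01, 0.0, 0.0]), r=1.0)
```

This mirrors the statement that the rotation component about the optical axis contributes nothing to the translation, so it can be dropped from the motion model.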
From Eq. (1), it can be seen that the rotation component about the optical axis causes no translation. Therefore, we can set it to zero and redefine the rotation vector, like that of an eyeball, by its remaining components. Using Eq. (1) and the inverse depth, the image motion called “optical flow” is obtained as
In the above equations, the inverse depth is an unknown variable at each pixel, while the rotation parameters are unknowns common to the whole image.
In this study, to simplify the motion model, we treat the rotation as a white stochastic process in time. The rotation vector is defined as the rotation speed with respect to the reference camera orientation, not as the derivative between consecutive frames. The temporal correlation of tremor, which forms the drift component in actual fixational eye movement, is ignored. We assume that each fluctuation of the rotation follows a two-dimensional (2-D) Gaussian distribution with zero mean and a variance–covariance matrix proportional to the identity matrix.
In this study, the rotation is defined as a rotational speed for ease of theoretical analysis. In a real system we can only use finite rotations, but if the rotation angle is small, the formulations remain almost valid.
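The tremor model above amounts to drawing independent 2-D Gaussian rotation vectors at each step. The short sketch below, with an assumed standard deviation, samples such rotations and verifies that their empirical statistics match the model: zero mean and an isotropic variance–covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.005    # assumed standard deviation of the tremor rotation (rad)

# Each tremor step is an independent 2-D Gaussian rotation vector
# (omega_x, omega_y); the optical-axis component is fixed to zero
# because it induces no translation of the lens center.
N = 10000
omegas = rng.normal(0.0, sigma, size=(N, 2))

mean = omegas.mean(axis=0)   # should be close to (0, 0)
cov = np.cov(omegas.T)       # should be close to sigma^2 * I
```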
3. Differential-type method
3.1 Gradient equation for rigid motion
The general gradient equation is a first-order approximation of the assumption that image brightness is invariant before and after the relative 3-D motion between a camera and an object. Assuming that the brightness values before and after the 3-D motion are equal, the image brightness after the motion is expanded in a Taylor series, and the terms of degree two and above are ignored. As a result, at each pixel, the gradient equation is formulated with the spatial and temporal partial derivatives of the image brightness and the optical flow, as follows:
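This brightness-constancy constraint can be checked numerically. In the sketch below, a sinusoidal test pattern (chosen for illustration) is translated by a small sub-pixel flow, the spatiotemporal derivatives are taken by finite differences, and the residual of the gradient equation is verified to be small at interior pixels.

```python
import numpy as np

# Numerical check of the gradient (brightness-constancy) equation:
# for a pattern translated by a small flow (u, v), the residual
# Ex*u + Ey*v + Et should be close to zero at every interior pixel.
y, x = np.mgrid[0:64, 0:64].astype(float)
u, v = 0.3, -0.2                      # small sub-pixel flow (assumed)

def pattern(px, py):
    return np.sin(0.2 * px) * np.cos(0.15 * py)

E0 = pattern(x, y)                    # frame at time t
E1 = pattern(x - u, y - v)            # frame at time t+1 (pattern shifted)

Ey, Ex = np.gradient(E0)              # spatial derivatives (rows, cols)
Et = E1 - E0                          # temporal derivative

residual = Ex * u + Ey * v + Et
max_residual = np.max(np.abs(residual[2:-2, 2:-2]))   # skip borders
```

The residual is small only because the flow is small relative to the pattern's wavelength, which is exactly the operating condition of the gradient method discussed above.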
3.2 Probabilistic model for differential-type method
Consider observations obtained from a number of two-frame pairs over all pixels of the image. In our study, the rotation and the noise are treated as random variables, and the inverse depth at each pixel is treated as a stochastic variable to be recovered individually for each pixel. Multiple frames vibrating under the irregular rotation are used for processing, but no pixel tracking is performed. Therefore, the value recovered at each pixel does not correspond exactly to that pixel but represents the mean over the adjacent region defined by the vibration width on the image. As a result, the recovered inverse depth correlates with the values in adjacent regions and should therefore be treated from the beginning as a variable with such a correlation. This neighborhood-correlated estimate based on tremor is planned to be improved in future extensions.
In this study, we assume that the optical flow is very small, and hence the observation errors of the partial derivatives calculated by finite differences are small. The approximation error of the equation is also small, and therefore we can assume that an error uncorrelated with the derivatives is added to the gradient equation as a whole. From this consideration, we assume that this error is a Gaussian random variable with mean 0 and unknown variance, and that the derivatives themselves are error-free.
where and , and is an unknown variance.
As mentioned above, considering the neighborhood correlation of
where the vector is composed of the inverse depths and the matrix corresponds to the 2-D Laplacian operator with a free-end condition. By assuming this probability density, we make the recovered depth map smooth. In this study, the variance is controlled heuristically in consideration of the smoothness of the recovered depth map. In the future, we will examine a strategy for determining it within a whole system that models all fixational movements, including microsaccade and drift. Hereafter, we use this definition. In the simulation described later, the random rotation is sampled according to Eq. (4) as the rotation relative to the initial image, but since the rotation between successive frames is estimated during depth restoration, the estimated values differ from the set values.
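One plausible discretization of such a Laplacian smoothness operator with a free-end condition can be sketched as follows. This matrix is an assumption for illustration, not the chapter's exact definition: each row subtracts from a pixel the mean of its existing four-neighbors, so a constant depth map lies in the null space and is not penalized.

```python
import numpy as np

def laplacian_matrix(h, w):
    """Assumed matrix form of a 2-D Laplacian on an h-by-w grid with a
    free-end condition: each row takes the difference between a pixel
    and the mean of its existing four-neighbors (fewer at the border)."""
    n = h * w
    L = np.zeros((n, n))
    for i in range(h):
        for j in range(w):
            k = i * w + j
            nbrs = [(i + di, j + dj)
                    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1))
                    if 0 <= i + di < h and 0 <= j + dj < w]
            L[k, k] = 1.0
            for ni, nj in nbrs:
                L[k, ni * w + nj] = -1.0 / len(nbrs)
    return L

L = laplacian_matrix(4, 5)   # small grid for demonstration
```

Because constants are unpenalized while spatial variation is, a Gaussian prior built on this matrix favors smooth depth maps without biasing their overall level.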
3.3 Algorithm for differential-type method
By applying the MAP-EM algorithm, the parameters can be estimated as MAP estimators based on the density obtained by marginalizing the joint probability with respect to the missing data, where the prior of the parameters is formally regarded as a uniform distribution. Additionally, the inverse depth can be estimated as a MAP estimator conditioned on the MAP estimates described above.
In the EM scheme, the full set of variables is considered the complete data, the inverse depth is treated as missing data, and the remaining quantities are treated as unknown parameters. The E step and M step are repeated alternately until convergence. First, in the E step, the conditional expectation of the log-likelihood of the complete data given the observations is calculated using the current MAP estimates; this is generally called the Q function. In the MAP-EM algorithm in particular, the objective function maximized in the M step equals the Q function augmented by the log prior densities of the parameters. In the following, values computed with the current estimates are indicated accordingly.
Based on the densities defined in Section 3.2, the objective function is derived as
using the following definitions, which are obtained by formulating the posterior density.
From this representation, the parameters can be updated as
For the inverse depth, the partial derivative for each pixel in the last term of Eq. (9) contains the values of the adjacent pixels. Therefore, updating it would require solving simultaneous equations. To avoid this, we use the One-Step-Late (OSL) method: the surrounding values are regarded as constants and evaluated with the current estimates. This allows the value at each pixel to be updated individually as follows:
where the remaining quantity is also evaluated with the current estimate, and the matrices are defined as
and the local mean is taken over the four-neighborhood, excluding the center pixel.
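The four-neighbor local mean used in the OSL update can be sketched as below. The replicated-border handling is an assumed realization of the free-end condition, chosen for illustration.

```python
import numpy as np

def four_neighbor_mean(z):
    """Local mean over the four-neighborhood, excluding the center pixel,
    with replicated borders as an assumed free-end boundary condition."""
    zp = np.pad(z, 1, mode='edge')
    return (zp[:-2, 1:-1] + zp[2:, 1:-1] +
            zp[1:-1, :-2] + zp[1:-1, 2:]) / 4.0
```

Because the center pixel is excluded, each inverse-depth value can be updated individually against a frozen neighborhood, which is exactly what makes the OSL step closed-form per pixel.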
3.4 Numerical experiments of differential-type method
In order to confirm the function of the proposed method, we conducted numerical experiments using artificial images. Figure 2(a) shows the original image generated using the depth map shown in Figure 2(b). The image size is pixels, which is equivalent to measured in focal length units. In Figure 2(b), the vertical axis shows the depth in units of focal length, and the horizontal axis shows the pixel position on the image plane.
In our model, successive image pairs are used in turn for the calculation. This study ignores the drift component of fixational eye movements and assumes that the range of motion at each image position does not diverge over time. Therefore, each rotation, sampled as an independent Gaussian random variable, is taken relative to the initial image shown in Figure 2(a), and the gradient equation is considered to hold between each resulting image and the initial image. Additionally, in order to first validate our algorithm under the assumed statistical models, we computed the derivatives using Eq. (6) with the true values and used them for depth recovery. Performance with values actually measured from images is presented as the real-image experiment in Section 5.
We executed the proposed algorithm under these settings. The average size of the optical flow was about one pixel, which is sufficiently small compared to the size of the intensity pattern in Figure 2(a) and thus meets the conditions of the gradient method. As observation noise, a zero-mean Gaussian random value, with a deviation proportional to the mean magnitude of the temporal derivative, was added to the value that exactly satisfies Eq. (6). We evaluated the effectiveness of the smoothness constraint introduced by Eq. (8) by performing depth recovery while varying its weight. The initial parameter values were set arbitrarily, and the initial depth map was assumed to be a plane. Examples of the results are shown in Figure 3. In addition, we varied the number of image pairs to see the effectiveness of jointly using the many observations produced by small camera rotations. The RMSE of the recovered depth for these settings is organized in Table 1. From these results, we can see that the smoothness constraint is important for reducing the degrees of freedom of the estimate.
4. Integral-type method
4.1 Image motion blurring related to depth
This equation can be seen as a function of the inverse depth, so if we know the variance–covariance matrix at each pixel position, we can calculate the depth map.
There are several ways to obtain the variance–covariance matrix of the optical flow defined by Eq. (19) locally at each image position from multiple images observed through random camera rotations imitating tremor. The simplest and most natural way is to first detect the optical flow from the images and then calculate its second-order statistics. Here, however, we consider the case where the intensity pattern is fine with respect to the temporal sampling rate, so that accurately detecting the optical flow is difficult. Therefore, we adopt an integral-form scheme in which the variance–covariance matrix is calculated as the distribution of local image blur.
We define the averaged image brightness as the arithmetic average of the images observed under the fixational movements. If the number of images is asymptotically large, the following convolution holds, using locally defined two-dimensional Gaussian point spread functions and the original image brightness.
where the first symbol indicates the image position, the second a local region around it, and the point spread function has the variance–covariance matrix indicated by Eq. (19). Furthermore, the normalization condition is assumed to be satisfied.
As explained above, we model the averaged image as a blurred image due to fixational eye movements. The difference in its degree of blur indicates the difference in depth: the closer the imaging target is to the camera, the greater the image blur.
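The link between motion deviation and blur can be demonstrated with a small sketch: averaging many randomly shifted copies of a texture yields a motion-blurred signal whose contrast loss grows with the standard deviation of the image motion, just as nearer (larger inverse depth) points blur more. The 1-D texture and the deviation values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Averaging randomly shifted copies of a texture approximates the
# Gaussian-blurred image of the integral scheme; a larger motion
# deviation (nearer object) destroys more contrast.
x = np.arange(256, dtype=float)
texture = np.sin(0.5 * x)

def averaged_signal(sigma_pix, n=2000):
    shifts = np.round(rng.normal(0.0, sigma_pix, n)).astype(int)
    return np.mean([np.roll(texture, s) for s in shifts], axis=0)

contrast_near = np.ptp(averaged_signal(4.0))   # large motion: strong blur
contrast_far = np.ptp(averaged_signal(1.0))    # small motion: weak blur
```

Reading the contrast of the averaged image at each position is thus, in effect, reading the local motion deviation, and through Eq. (19) the inverse depth.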
4.2 Direct algorithms for integral-type method
In the two-step algorithm, after detecting the variance–covariance matrix of the image blur shown in Eq. (19), maximum likelihood estimation of the depth using it as the observable is analytically possible. However, the solution is not optimal because it is based on indirectly observed quantities. With optimality in mind, we need a direct method that estimates the depth map without first determining the blur matrix. This strategy usually requires a numerical search or iterative updates. We constructed two direct algorithms, adopting local optimization and global optimization, respectively.
We assume that the depth is constant within a local region around each pixel. Based on the least-squares criterion, an objective function can be defined with respect to the depth corresponding to each pixel.
We can recover the depth individually for each pixel by minimizing this function with respect to the depth corresponding to that pixel. Therefore, simultaneous multivariate optimization is not required, and a one-dimensional numerical search can be adopted.
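A minimal sketch of such a per-pixel one-dimensional search follows: the local objective is evaluated on a grid of candidate depths and the minimizer is taken. The quadratic toy objective and the bracket are assumptions for illustration only; in the actual algorithm the objective compares observed and modeled blur.

```python
import numpy as np

def grid_search_min(J, lo, hi, n=1001):
    """One-dimensional grid search: scan J over [lo, hi] and return the
    candidate with the smallest objective value."""
    candidates = np.linspace(lo, hi, n)
    values = np.array([J(d) for d in candidates])
    return candidates[np.argmin(values)]

# Toy objective with a known minimum at d = 0.5 (illustration only).
d_hat = grid_search_min(lambda d: (d - 0.5) ** 2, 0.0, 1.0)
```

Because each pixel's search is independent, the local algorithm parallelizes trivially, at the cost of the per-pixel objective evaluations that make it computationally expensive overall.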
By assuming that the depth map is spatially smooth, we can define the following objective function based on the regularization theory of Poggio et al.
where the first parameter adjusts the degree of the smoothness constraint on the depth map, and the integration in Eq. (23) is performed over the entire image. From the variational principle, the Euler–Lagrange equation for the depth is derived as follows:
For discrete computation, we can approximate the smoothness constraint in Eq. (23) using finite differences over discrete image positions.
4.2.1 Numerical experiments of integral-type method
The proposed algorithm assumes that the definition of the motion-blurred image in Eq. (20) holds. Observing such ideal motion blur requires a sufficiently long exposure time. Here, we use artificial images to examine the characteristics of the proposed algorithm with respect to the relationship between the size of the image motion and the fineness of the intensity texture.
We generate the motion-blurred images artificially. First, a true depth map is set up, and a large number of images are generated by a computer-graphics technique that randomly samples rotations according to the Gaussian distribution in Eq. (4). Averaging these images mimics analog motion blur and yields the input motion-blurred image. Figure 4 shows an example of a reference (original) image and the true inverse depth map. The image size, in pixels, is equivalent to its extent measured in focal-length units.
The local optimization algorithm (LOA) is computationally expensive, and the global optimization algorithm (GOA) is slow to converge. Therefore, we considered a hybrid algorithm that applies the LOA sparsely to obtain the initial values for the GOA. For the initial value of the LOA, we use the plane corresponding to the background of Figure 4(b). To make a rough estimate, the LOA uses pixel blocks as the local region in Eq. (21) and is applied once to each block. On the other hand, the size of the region in Eq. (22) was adaptively determined according to the depth value updated during the optimization process, and therefore took a different size at each position in the image. We assumed a square region whose side length was 10 times the larger of the two deviations of the image motion along the two axes; these can be evaluated using Eq. (19).
The depth restoration simulation was executed while changing the size of the camera rotation. The recovered inverse depth maps are shown in Figures 5–7, where the GOA uses various values of the smoothness weight. The relationship between the root mean square error (RMSE) of the recovered depth map and that weight is shown in Figure 8. From Figure 5, we can see that a small camera rotation is not suitable for depth recovery because the motion blur at each image position is difficult to measure accurately. From Figure 8(a), small rotations do not provide sufficient measurement information, so a strong smoothness constraint is required to reduce the RMSE of the recovered depth map. Conversely, when the rotation is large, the point spread function becomes wider than the spatial variation of the target shape. The Gaussian model with the variance–covariance matrix in Eq. (19) then becomes inappropriate: the motion blur recognized by the model is smoother than the actual blur of the image, and depth recovery errors occur. This can be confirmed from the RMSE values in Figure 8(c). Since a smooth depth map tends to be recovered from the overly smooth recognized motion blur, Figure 8(c) also confirms that in this regime the smoothness constraint of Eq. (23) becomes an obstacle to reducing the RMSE.
5. Real image experiments of differential-type method
5.1 Selective use of image pairs to improve accuracy
When applying the differential-type method to real images, performance was improved by selecting the image pairs used for depth restoration. We adopted a scheme that excludes, on a pixel-by-pixel basis, image pairs expected to have large approximation errors in the gradient equation. The inner product of the spatial gradient vectors of consecutive frames can be used to select image pairs that do not cause aliasing problems: for each pixel, image pairs for which the sign of the inner product is negative are discarded.
In the next step, from the image pairs remaining after the above decision, we additionally select the suitable image pairs at each pixel by estimating the magnitude of the higher-order terms included in the observation of the temporal derivative, which is exactly represented as follows:
After discarding bad image pairs, the higher-order terms can be considered small, and the quadratic term in Eq. (28) can then be estimated for each pixel as follows:
We can define a measure for estimating the equation error as the ratio of this higher order term to the first order term.
This measure depends on the direction of the optical flow but is invariant with respect to its amplitude. Calculating its exact value would require the true optical flow. Examining the details shows that even if the difference of the spatial gradients is large, the equation error becomes small when that difference is perpendicular to the optical flow; therefore, a worst-case value of the measure can be used instead. In the following, the image pairs for which this measure is less than a certain threshold are selected at each pixel and used for depth recovery.
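The two-stage selection can be sketched as below. The aliasing test (discarding pairs whose spatial gradients have a negative inner product) follows the text; the worst-case measure used here, the norm of the gradient difference over twice the gradient norm, is an assumed stand-in for the chapter's exact ratio of the second-order term to the first-order term.

```python
import numpy as np

def select_pairs(grads0, grads1, threshold):
    """Per-pixel selection of image pairs for the gradient method.
    grads0, grads1: (n_pairs, 2) spatial gradients (Ex, Ey) of the first
    and second frame at one pixel. Returns indices of pairs kept."""
    kept = []
    for idx, (g0, g1) in enumerate(zip(grads0, grads1)):
        if np.dot(g0, g1) <= 0.0:        # aliasing suspected: discard
            continue
        # assumed worst-case equation-error measure
        worst = np.linalg.norm(g1 - g0) / (2.0 * np.linalg.norm(g0) + 1e-12)
        if worst < threshold:            # small estimated equation error
            kept.append(idx)
    return kept

# Three hypothetical pairs: nearly unchanged gradient (kept), flipped
# gradient (aliasing, discarded), strongly changed gradient (discarded).
g0 = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
g1 = np.array([[1.05, 0.0], [-1.0, 0.0], [2.0, 0.0]])
kept = select_pairs(g0, g1, threshold=0.1)
```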
5.2 Camera system implementation
The camera system can be rotated around the horizontal (X) axis and around the vertical (Y) axis. Rotation around the optical (Z) direction is not available, but it is not needed to obtain depth information. The system parameters are its focal length (mm), image size (pixels), movable widths (deg.) about the X and Y axes, and minimum drivable units (deg. per pulse) for each axis.
5.3 Experimental results
In this section, we explain the results of experiments using real images captured by the developed camera system. Our camera system also has a parallel-stereo function: the camera can be moved laterally by a slide mechanism. Prior to the experiment, we calibrated the camera's internal parameters, including the focal length, using an established calibration method together with stereo calculations. The images used in the experiment are grayscale and 8-bit digitized; an example is shown in Figure 10(a). The true inverse depth of the target object, measured by the parallel stereo above using a two-plane model, is shown in Figure 10(b). In this figure, the horizontal axes show the position in the image plane, and the vertical axis shows the inverse depth in units of focal length. We captured 100 images. The maximum number of iterations of the MAP-EM algorithm was set to 600; within this number, the iterations of almost all experiments converged. The smoothness parameter was determined heuristically. The average over all pixels of the measure explained in the previous section was defined, for each image pair, as the standard magnification of the threshold for selecting suitable image pairs. By decreasing the threshold magnification, more image pairs are discarded; conversely, by increasing it, more image pairs are used for recovery. Owing to page limitations, we show only the results for the rotation setting in which the average amplitude of the optical flow is approximately one pixel.
Figure 11 shows the recovered depth for each threshold, set as a constant multiple of the reference value, together with the result using all image pairs. From these results, it can be confirmed that reducing the magnification discards inappropriate image pairs and improves the accuracy of depth recovery. The percentages in the figure caption show the proportion of image pairs used for recovery, which varies with the threshold.
In this chapter, we introduced depth recovery algorithms that use a large number of images with small movements, captured by a camera motion that simulates fixational eye movements, especially the tremor component. The algorithms are divided into a differential type and an integral type. For the differential type, it is desirable that the movement on the image be relatively small with respect to the texture pattern of the imaged surface; conversely, the integral type is appropriate for textures that are fine compared to the movement on the image. Therefore, the most important future task is the development of a depth recovery system in which both schemes function adaptively and selectively according to the target texture.
A more detailed technical issue is the automatic determination of the parameters that control the smoothness of the depth map. This may be achieved by treating all unknowns as stochastic variables and formulating the problem in a variational Bayesian framework. As for the integral method, since the resolution of the recovered depth is low in principle, a composite scheme is conceivable in which the differential type is applied again to refine the result obtained by the integral type.
So far, we have considered methods that assume only tremor; in the future, we plan to study camera motions that also simulate drift and microsaccade. For the drift component, the tremor-based method must be extended to an online version that updates the depth estimate while tracking the target as a time-series process. When using microsaccades, large movements between frames must be handled; based on the correspondence of feature points, sparse but highly accurate depth restoration can then be expected. Drift itself offers little direct merit, but it plays an important role in generating microsaccades. We believe that an interesting system can be realized by using the three components comprehensively.
On the other hand, stereoscopic vision and motion stereoscopic vision have difficulty handling objects with little texture. In earlier work, we proposed a stereo system that takes shading information into account: the images projected to both cameras are computed by a computer-graphics technique while the depth estimate is varied, and the depth is determined so that the generated images match the images observed by the cameras. In this way, the correspondence between images is realized indirectly, making it possible to handle textureless objects. We aim to develop a comprehensive depth restoration method, including the multi-resolution processing proposed previously. Another scheme for dealing with textureless regions in stereo vision adaptively determines the so-called support region, where the depth value is constant or changes smoothly. We will also consider whether the image changes due to tremor and microsaccade can be used for the adaptive determination of this support region.
In recent years, many realizations of stereoscopic and motion-stereoscopic vision by deep learning have been reported [26, 27, 28], and their relationship to conventional methods based on explicit mathematical models is often questioned. Deep learning requires a large number of images and annotations for them. Although unsupervised learning is often devised, the resulting solutions tend to be limited. Therefore, even if a conventional method is complicated and time-consuming, once it achieves precise depth recovery it can be used to compute annotations for deep learning. This can be understood as copying the conventional method into a deep neural network (DNN): the DNN takes time to train, but has the advantage of fast inference. In this way, it is important that both schemes develop in a complementary relationship.
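The annotation-transfer idea can be sketched as a teacher-student pair: a slow but precise "conventional" routine labels training samples offline, and a cheap learned model (here a small interpolation table standing in for a DNN) reproduces it with fast inference. The teacher function is invented for illustration.

```python
# Teacher-student sketch: slow iterative method labels data offline,
# a cheap approximator then answers queries quickly.

def slow_teacher(x):
    """Precise but iterative: recover d with d**2 = x by bisection
    (a stand-in for a complicated conventional depth method)."""
    lo, hi = 0.0, max(1.0, x)
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if mid * mid < x:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Offline "annotation": label a training grid with the slow teacher.
xs = [i * 0.25 for i in range(0, 17)]          # cue values 0.0 .. 4.0
table = [slow_teacher(x) for x in xs]          # teacher-computed labels

def fast_student(x):
    """Cheap inference: linear interpolation in the precomputed table
    (standing in for a trained DNN)."""
    i = min(int(x / 0.25), len(xs) - 2)
    w = (x - xs[i]) / 0.25
    return table[i] * (1.0 - w) + table[i + 1] * w
```

The expensive computation is paid once at labeling time; afterwards every query costs a table lookup, mirroring the slow-training/fast-inference trade-off of a DNN.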
Here, the method of calibrating the axis of rotation is explained using Figure 12. Let a point in 3-D space be $\boldsymbol{X}$ in the coordinate system before camera rotation and $\boldsymbol{X}'$ in the coordinate system after rotation, and let the coordinates of the corresponding points on the image be $\boldsymbol{x}$ and $\boldsymbol{x}'$, respectively. Similarly, let the optical axes before and after rotation be $\boldsymbol{z}$ and $\boldsymbol{z}'$, respectively. If the rotation is taken around the X-axis by an angle $\theta$, the rotation matrix is given by the following equation.

$$
R = \begin{pmatrix}
1 & 0 & 0 \\
0 & \cos\theta & -\sin\theta \\
0 & \sin\theta & \cos\theta
\end{pmatrix}
$$
The translation of the lens center generated by this rotation is given by the following equation in the coordinate system before rotation, where $r$ denotes the distance from the rotation axis to the lens center.

$$
\boldsymbol{t} = r\,(\tilde{\boldsymbol{z}} - \boldsymbol{z})
$$

where $\tilde{\boldsymbol{z}}$ represents the optical axis after rotation in the coordinate system before rotation. In addition, $\tilde{\boldsymbol{z}}$ and $\boldsymbol{z}$ have the following relationship.

$$
\tilde{\boldsymbol{z}} = R\,\boldsymbol{z}
$$
Furthermore, by substituting these relations into Eq. (33), the following equation is obtained.
By writing this equation in terms of its components and rearranging, the following two equations are derived.
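The geometry above can be checked numerically, assuming for illustration that the lens center lies at a distance r from the rotation axis along the optical axis: a small rotation about the X-axis then induces a small translation of the lens center, which is the effect exploited for depth recovery.

```python
import math

# Numerical sketch of the calibration geometry: a rotation about the X-axis
# and the lens-center translation it induces. The placement of the lens
# center at distance r along the optical axis is an assumption.

def rot_x(theta):
    """Rotation matrix about the X-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0],
            [0.0, c, -s],
            [0.0, s, c]]

def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

theta = math.radians(0.5)      # small angle, as in fixational tremor
r = 0.017                      # e.g. ~17 mm axis-to-lens distance (assumed)
z_axis = [0.0, 0.0, 1.0]       # optical axis before rotation

z_tilde = matvec(rot_x(theta), z_axis)   # optical axis after rotation,
                                         # expressed in the pre-rotation frame
t = [r * (a - b) for a, b in zip(z_tilde, z_axis)]  # induced translation
```

Even for a half-degree rotation the translation is nonzero (on the order of 0.1 mm here), which is what makes depth recovery from such small motions possible in principle.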
Martinez-Conde S, Macknik S L, and Hubel D. The role of fixational eye movements in visual perception. Nature Reviews Neuroscience. 2004;5:229–240
Stemmler M. A single spike suffices: the simplest form of stochastic resonance in model neurons. Network: Computation in Neural Systems. 1996;7:687–716
Prokopowicz P and Cooper P. The dynamic retina. Int. J. Computer Vision. 1995;16:191–204
Hongler M-O, de Meneses Y L, Beyeler A, and Jacot J. The resonant retina: Exploiting vibration noise to optimally detect edges in an image. IEEE Trans. Pattern Anal. Machine Intell. 2003;25:1051–1062
Ando S, Ono N, and Kimachi A. Involuntary eye-movement vision based on three-phase correlation image sensor. In: Proceedings on 19th Sensor Symposium; 2002. p. 83–86
Lazaros N, Sirakoulis G-C, and Gasteratos A. Review of stereo vision algorithms: from software to hardware. Int. J. Optomechatronics. 2008;5:435–462
Wang J and Zickler T. Local detection of stereo occlusion boundaries. In: Proceedings on CVPR; 2019. p. 3818–3827
Liu F, Zhou S, Wang Y, Hou G, Sun Z, and Tan T. Binocular light-field: imaging theory and occlusion-robust depth perception application. IEEE Trans. Image Process. 2019;29: 1628–1640
Horn B K P and Schunck B G. Determining optical flow. Artif. Intell. 1981;17:185–203
Simoncelli E P. Bayesian multi-scale differential optical flow. In: Jähne B, Haußecker H, Geißler P, editors. Handbook of Computer Vision and Applications. Academic Press; 1999. vol. 2. p. 397–422
Bruhn A and Weickert J. Lucas/Kanade meets Horn/Schunck: combining local and global optic flow methods. Int. J. Computer Vision. 2005;61:211–231
Tagawa N, Kawaguchi J, Naganuma S, and Okubo K. Direct 3-D shape recovery from image sequence based on multi-scale Bayesian network. In: Proceedings on ICPR; 2008. p. CD–ROM
Tagawa N. Depth perception model based on fixational eye movements using Bayesian statistical inference. In: Proceedings on ICPR; 2010. p. 1662–1665
Tagawa N and Alexandrova T. Computational model of depth perception based on fixational eye movements. In: Proceedings on VISAPP; 2010, p. 328–333
Tagawa N, Iida Y, and Okubo K. Depth perception model exploiting blurring caused by random small camera motions. In: Proceedings on VISAPP; 2012, p. 329–334
Tagawa N, Koizumi S, and Okubo K. Direct depth recovery from motion blur caused by random camera rotations imitating fixational eye movements. In: Proceedings on VISAPP; 2013, p. 177–186
Sorel M and Flusser J. Space-variant restoration of images degraded by camera motion blur. IEEE Trans. Image Processing. 2008;17:105–116
Paramanand C and Rajagopalan A N. Depth from motion and optical blur with unscented Kalman filter. IEEE Trans. Image Processing. 2012;21:2798–2811
Dempster A P, Laird N M, and Rubin D B. Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. B. 1977;39:1–38
Green P J. On the use of the EM algorithm for penalized likelihood estimation. J. Roy. Statist. Soc. B. 1990;52:443–452
Poggio T, Torre V, and Koch C. Computational vision and regularization theory. Nature. 1985;317:314–319
Tsukada S, Ho Y, Tagawa N, and Okubo K. Accuracy improvement for depth from small irregular camera motions and its performance evaluation. In: Proceedings on ICIAR; 2015, p. 306–315
Zhang Z. A flexible new technique for camera calibration. IEEE Trans. Pattern Anal. Machine Intell. 2000;22:1330–1334
Wakabayashi K, Tagawa N, and Okubo K. Shape from multi-view images based on image generation consistency. In: Proceedings on VISAPP; 2013, p. 334–340
Wu W, Zhu H, Yu S, and Shi J. Stereo matching with fusing adaptive support weights. IEEE Access. 2019;7:61960–61974
Laina I, Rupprecht C, Belagiannis V, Tombari F, and Navab N. Deeper depth prediction with fully convolutional residual networks. In: Proceedings on 3D Vision; 2016, p. 239–248
Ummenhofer B, Zhou H, Uhrig J, Mayer N, Ilg E, Dosovitskiy A, and Brox T. DeMoN: Depth and motion network for learning monocular stereo. In: Proceedings on CVPR; 2017, p. 5038–5047
Chen P-Y, Liu A H, Liu Y-C, Wang Y-C F. Towards scene understanding: unsupervised monocular depth estimation with semantic-aware representation. In: Proceedings on CVPR; 2019, p. 2624–2632