This chapter presents a method for generating a depth map from a single image. The proposed approach applies a sequence of blurring and deblurring operations to a point to determine its depth. The method makes no assumptions about the properties of the scene when resolving depth ambiguity in complex images. Because applications that manipulate a depth map can be realized by first deblurring an image to obtain an all-in-focus image and then blurring the result, we also present methods to derive all-in-focus images from our depth maps. Furthermore, 2D to 3D conversion can be achieved from the estimated depth map. Several demonstrations show the performance and applications of the estimated depth map.
Keywords: depth estimation, blur estimation, depth from defocus, all in focus, defocus magnification, 2D to 3D
1. Introduction
Deriving depth information from 2D images is one of the most important problems in image processing and computer vision. Depth information can be applied in 2D to 3D conversion, image refocusing, scene interpretation, the reconstruction of 3D scenes, and depth-based image editing. Several techniques are used to derive depth information, such as depth from focus, stereo vision, and depth from motion. Nevertheless, these techniques need multiple images, which makes them impractical when only one image is available or when the correspondence between features in the images cannot be resolved well. To address this limitation, a number of approaches have been proposed to acquire depth information from a single image, such as the computational photography approach, which modifies the shape of the aperture of a traditional lens, and the Kinect approach, which uses structured light to derive depth maps.
An image captured by a conventional camera contains a blurred version of the parts of the scene that are out of focus. The blurriness of a pixel is called the “circle of confusion” (COC) and is usually modeled as a 2D Gaussian function. When a single image is taken by a conventional camera with a fixed focal length, aperture size, and distance from the image plane to the lens, a pixel’s COC is related only to the depth of the corresponding scene point. In such cases, depth estimation corresponds to blur estimation. In theory, if the depth map of an image can be accurately estimated, applications that manipulate the depths of objects can be implemented by applying a deblurring operation followed by a blurring operation. This is because the deblurring operation moves objects closer to the camera and the blurring operation moves them farther away.
The blurring operation is more robust to depth map inaccuracy than the deblurring operation, and many applications have been successfully built on it. For example, defocus magnification increases the out-of-focus area in an image by magnifying the existing blurriness: it keeps the shape of sharp regions and modifies the depths of objects that are not in the focal plane so that they move farther away from the plane. The deblurring operation, on the other hand, can be very sensitive to the accuracy of a depth map. A deblurring operation usually accentuates the edge and texture points in an image. If their depths are overestimated, the operation can generate ringing artifacts that severely degrade the perceptual quality of the image.
Depth map estimation from a single image is a fundamentally ill-posed problem. For example, in a single image, we cannot resolve the ambiguity between out-of-focus edges and edges that are smooth in the original scene, we cannot determine whether a blurred point is in front of or behind the focal plane, and we cannot estimate the depths of points in a smooth area. These problems cannot be resolved without assumptions that relate the local image features to the scene. A widely adopted assumption is that a blurred edge is obtained by smoothing a step edge with a Gaussian kernel. Although this assumption has been adopted in camera autofocus systems, the goal of autofocus is to derive the depths of manually selected scene points rather than the depth map of an image. Approaches based on assumptions about scene points have also been used to estimate depth maps: edges in the scene are first modeled, and the depths of the blurred edge points are then derived from the degree of blurriness that was applied to the scene to obtain those points. However, many types of singularities other than step edges can appear in an image, and only a few of them can be modeled, so an approach based on scene modeling can be too restrictive to derive precise depths for all points. As a result, the depth precision achieved with scene modeling is usually limited to images with two depth layers: foreground and background.
In this chapter, we propose a blurring-deblurring method that does not require the modeling of edge points. In the blurring process, a point is blurred by increasing its COC to the limit of the camera. Conversely, in the deblurring process, a point is deblurred by reducing its COC to the limit at the other end. The results of these two processes are combined to derive the depths of edges. The approach therefore estimates the depths of edge points based on the camera’s characteristic curve of COC vs. depth. We demonstrate that the proposed approach can reliably derive depth maps of complex images and synthesize all-in-focus images. Furthermore, the depth maps can be used to synthesize stereo images for 3D visualization on a mobile device. Figure 1 shows a diagram of applications of the depth map of a single image.
The remainder of this chapter is organized as follows. Section 2 reviews the relationship between the depth of a point and its out-of-focus blurriness in images obtained with the thin-lens camera model. Section 3 presents the proposed blurring-deblurring approach. Section 4 presents the depth refinement approach and the image deblurring process. Section 5 demonstrates the depth map results and applications. Section 6 presents the conclusions.
2. Camera model and out-of-focus blurriness
The out-of-focus blurriness is defined by the COC when an object is not in the camera’s focal plane. However, it is impossible to determine whether an object is behind or in front of the focal plane from its blurriness alone. In the following, we consider each of the two cases.
In a thin-lens model,
For a particular lens, the focal length
In the case where , we can derive the following relationship from the similar triangles shown in Figure 2(a):
Using Eq. (1) and we obtain
In the case where , we can derive the following relationship from similar triangles shown in Figure 2(b):
Using Eq. (1) and we obtain
From Eqs. (4) and (6), we can derive of a scene point; however, the equations do not allow us to determine whether the scene point is in front of or behind the focal plane. To remove the ambiguity, the assumption that all the scene points are behind the focal plane is adopted.
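For reference, the standard thin-lens relationships behind this discussion can be written as follows. The notation is an assumption made for illustration and may differ from the chapter’s own symbols: $f$ is the focal length, $u$ the depth of the scene point, $u_f$ the depth of the focal plane, $s$ the distance from the lens to the image plane, $v$ the distance behind the lens at which the point comes into focus, $D$ the aperture diameter, and $c$ the COC diameter.

```latex
\begin{aligned}
\frac{1}{f} &= \frac{1}{u} + \frac{1}{v}, \qquad \frac{1}{f} = \frac{1}{u_f} + \frac{1}{s},\\
c(u) &= D\,\frac{|s - v|}{v} = D\,s\,\left|\frac{1}{u_f} - \frac{1}{u}\right|,\\
c_{\max} &= \lim_{u \to \infty} c(u) = \frac{D\,s}{u_f}.
\end{aligned}
```

Because only the absolute value of $1/u_f - 1/u$ enters $c(u)$, a point in front of the focal plane with $1/u = 1/u_f + \delta$ and a point behind it with $1/u = 1/u_f - \delta$ produce the same COC, which is the ambiguity described above. Restricting attention to $u \ge u_f$ makes $c(u)$ monotonic, so each COC value below $c_{\max}$ maps to a single depth.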
An image is usually modeled as the convolution of the scene with a camera-related point spread function (PSF). The pillbox function is an ideal PSF: a box function with support σ and a constant value inside that support. The Gaussian function is commonly used to approximate the pillbox function, with a standard deviation tied to σ by a constant factor. With this factor, the difference between their frequency-domain magnitudes is small, and the Gaussian is easier to analyze. In this chapter, we use the Gaussian function to characterize the PSF of a camera.
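The convolution model can be illustrated with a minimal one-dimensional sketch. It is not the chapter’s implementation: the function names, the 3-sigma kernel truncation, and the border replication are assumptions chosen for the example.

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel; sigma is the standard deviation in samples."""
    if sigma <= 0:
        return [1.0]
    radius = max(1, int(math.ceil(3 * sigma)))  # assumed truncation at 3 sigma
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur(signal, sigma):
    """Convolve a 1D signal with a Gaussian PSF, replicating the border samples."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out

scene = [0.0] * 20 + [1.0] * 20   # ideal step edge in the scene
image = blur(scene, sigma=2.0)    # what an out-of-focus camera would record
```

The step edge in `scene` becomes a smooth ramp in `image`, and a larger `sigma` spreads the ramp over more samples.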
3. Blurring and deblurring approach
With the proposed approach, scene depths are determined from the estimated blurriness in a single image by combining blurring and deblurring processes. In this section, we explain the rationale for combining the two processes and provide the formulation of the combined approach. The depth of a scene point is defined as the distance between the camera lens and the point. In addition, the proposed method assumes that all scene points of interest are behind the focal plane.
The ( vs.
Consider the blurriness of a point. A blurring operator can be defined to add an increment of blurriness to the point to obtain a new, larger blurriness. This can be regarded as increasing the depth of the point by moving it along the (COC vs. depth) curve.
If the increment in blurriness needed for a point to reach the camera’s maximum COC can be determined, we can convert this increment into an increment in depth by referring to the (COC vs. depth) curve.
Conversely, the deblurring operator is defined to reduce the blurriness of the point. If a deblurring operation is repeatedly applied to a point, the point becomes sharper. The deblurring process gradually reduces the depth of the point by moving it along the (COC vs. depth) curve.
Since depth estimation at large depths and for points close to the focal plane is unreliable, we propose a blurring and deblurring approach that combines the differential blurring and deblurring operations to yield a more robust depth estimate for a scene point than either operation alone.
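The conversion between blurriness and depth, and the reason it becomes unreliable at large depths, can be sketched as follows. The sketch assumes a thin-lens camera and points behind the focal plane, for which the COC follows c(u) = c_max (1 - u_f / u). The names `u_f` (focal-plane depth) and `c_max` (COC limit at infinite depth), and the numeric values, are hypothetical.

```python
def coc(u, u_f, c_max):
    """COC of a point at depth u >= u_f (behind the focal plane)."""
    return c_max * (1.0 - u_f / u)

def depth_from_coc(c, u_f, c_max):
    """Invert the COC vs. depth curve; valid for 0 <= c < c_max."""
    if not 0.0 <= c < c_max:
        raise ValueError("COC must lie in [0, c_max)")
    return u_f / (1.0 - c / c_max)

u_f, c_max = 2.0, 8.0   # hypothetical focal-plane depth (m) and COC limit (pixels)

# The same COC error of 0.1 pixel, near the focal plane and near the COC limit.
near_shift = depth_from_coc(1.1, u_f, c_max) - depth_from_coc(1.0, u_f, c_max)
far_shift = depth_from_coc(7.6, u_f, c_max) - depth_from_coc(7.5, u_f, c_max)
```

Under these assumptions the curve flattens as the COC approaches `c_max`, so `far_shift` is several metres while `near_shift` is a few centimetres: the same blur error costs far more depth accuracy for distant points.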
Let be the true depth of the point at
with the constraint that
where is a camera-dependent constant and is the Lagrangian parameter that balances the blurriness and deblurriness measurements, and
is the increment of blurriness to and
denotes the decrement of the assuming the point at depth
3.3 Blurring and deblurring measurements
To simplify the analysis without loss of generality, the following derivations are based on one-dimensional signals and neglect boundary conditions.
3.3.1 Blurring measurement
The objective of the blurring process is to determine the amount of blurriness required for a point to reach the blurriness limit of the camera. When edge or texture patches are placed progressively farther away, their details become faint, their variances decrease, and only their mean brightness remains at infinity. Thus, the variance of a patch can be used as the blurriness measurement. Specifically, when a patch is blurred to the limit, its variance can be assumed to be 0.
Let the true depth of the scene point
where g is the Gaussian function and is the variance of
where is the mean on a neighborhood of and . Assuming that , we obtain from Eqs. (11) and (12)
The above equation is derived by using the fact that the convolution of two Gaussians of variances and is a Gaussian of variance . If u is equal to , then and it becomes
since can be approximated as the mean of . Thus, reaches a local minimum when u is equal to .
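The idea that patch variance falls toward zero as a patch is blurred toward the limit can be checked numerically on a one-dimensional edge. This is only an illustration of the trend, not the measurement defined in the equations above. The helper names, patch size, and blur scales are assumptions.

```python
import math

def gaussian_blur(signal, sigma):
    """Gaussian blur of a 1D signal with border replication (kernel cut at 3 sigma)."""
    if sigma <= 0:
        return list(signal)
    radius = int(math.ceil(3 * sigma))
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    norm = sum(kernel)
    n = len(signal)
    return [
        sum(w * signal[min(max(i + k - radius, 0), n - 1)]
            for k, w in enumerate(kernel)) / norm
        for i in range(n)
    ]

def patch_variance(signal, center, half_width):
    """Variance of the samples in a window around `center`."""
    patch = signal[center - half_width:center + half_width + 1]
    mean = sum(patch) / len(patch)
    return sum((x - mean) ** 2 for x in patch) / len(patch)

edge = [0.0] * 100 + [1.0] * 100
sigmas = [0.0, 1.0, 2.0, 4.0, 8.0, 16.0]
variances = [
    patch_variance(gaussian_blur(edge, s), center=100, half_width=4)
    for s in sigmas
]
```

In this example `variances` decreases at every step, so the remaining variance of a patch indicates how much further it can be blurred before only its mean brightness is left.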
3.3.2 Deblurring measurement using blurring-deblurring operator
In contrast to blurring, deblurring is extremely unstable, and it usually relies on some prior knowledge of the scene so that the high-frequency (edge and texture) information can be recovered. Because of this prior assumption, ringing artifacts occur in the result of the deblurring process when the depth is overestimated.
We denote as the blurring-deblurring operator, where is the reduction in blurriness of a Gaussian kernel with variance . An image is first deblurred and then blurred by the same Gaussian kernel with variance . A deblurring process will tend to over-enhance the high-frequency information in the image if
As shown in Figure 4(d), when the estimation of the blurring scale is over the true scale (the scale is 4), the result of increases dramatically because of the artifacts in the neighborhood of the edge points.
From Figure 4(d), is asymmetric with respect to over- and underestimation of , where is the true blurring scale or true depth. To capture the transition point from small to large values of , we calculated the curvature at of the smooth curve as follows:
as shown in Figure 4(e). The larger the curvature, the higher the probability that the point is the pivot of the transition. Thus, we define the deblurring measurement as follows:
The measure has a local minimum at the transition of . Thus, when
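Locating the transition point through curvature can be sketched with finite differences. The response curve below is synthetic and hypothetical: it stays nearly flat while the blurring scale is underestimated and rises sharply once it is overestimated, with the true scale placed at 4. The chapter computes the curvature of a smoothed curve, whereas this sketch works on the raw samples.

```python
def curvature(values, step=1.0):
    """Curvature |y''| / (1 + y'^2)^(3/2) of a sampled curve, by central differences."""
    kappa = [0.0] * len(values)
    for i in range(1, len(values) - 1):
        d1 = (values[i + 1] - values[i - 1]) / (2.0 * step)
        d2 = (values[i + 1] - 2.0 * values[i] + values[i - 1]) / (step * step)
        kappa[i] = abs(d2) / (1.0 + d1 * d1) ** 1.5
    return kappa

# Hypothetical response of the blurring-deblurring operator over candidate scales.
scales = [0.5 * k for k in range(17)]   # 0.0, 0.5, ..., 8.0
response = [0.01 * s if s <= 4.0 else 0.04 + 2.0 * (s - 4.0) for s in scales]

kappa = curvature(response, step=0.5)
knee = scales[max(range(len(kappa)), key=kappa.__getitem__)]
```

For this synthetic curve the curvature peaks at the scale where the response starts to rise, so `knee` picks out the transition between underestimation and overestimation.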
3.3.3 Depth estimation
The blurring and deblurring measurements, and , defined in Eqs. (12) and (17), respectively, can be substituted in the objective function in Eq. (7) to obtain
The blurring and deblurring approach can now be used to derive the solution for
based on the constraint in Eq. (8). The complexity of the problem depends on how precisely the depth is measured for each point. Although depth is an important cue, relative depths, such as which object is in the foreground and which is in the background, appear to be more important than accurate absolute depths. The blurring and deblurring approach is a point-wise optimization method. We used a method by deriving the best solution from the
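A point-wise search of this kind can be sketched as an exhaustive evaluation over a small set of candidate depths. The sketch assumes that the objective is a weighted sum of the two measurements, balanced by a parameter `lam`. The two measurement functions below are hypothetical stand-ins, not the chapter’s Eqs. (12) and (17).

```python
def estimate_depth(candidates, blur_measure, deblur_measure, lam):
    """Return the candidate depth that minimizes blur_measure + lam * deblur_measure."""
    return min(candidates, key=lambda u: blur_measure(u) + lam * deblur_measure(u))

# Hypothetical per-point measurements, each with its minimum near a depth of 3.
def blur_m(u):
    return (u - 3.2) ** 2

def deblur_m(u):
    return abs(u - 2.9)

candidates = [0.5 * k for k in range(1, 17)]   # 0.5, 1.0, ..., 8.0
best = estimate_depth(candidates, blur_m, deblur_m, lam=0.5)
```

Since relative depth matters more than exact depth, a coarse candidate grid such as this one keeps the per-point cost low.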
4. Depth refinement and image deblurring
The blurring and deblurring measurements can determine the depths of edge and texture points only. If blurring or deblurring with a Gaussian kernel of any variance is applied to a sufficiently large patch of constant value, the result is still a patch of constant value. Therefore, the proposed approach cannot reliably determine the depths of points in smooth regions, and we resort to another approach to derive the depths of smooth scene points.
All-in-focus processing, on the other hand, generates an image that is focused everywhere, that is, it transfers the depth map of an image to a depth map in the focal plane. An all-in-focus image can therefore be generated by a deblurring process. In practice, because a deblurring process is very sensitive to overestimated depth, overestimated depths in an image can spoil the all-in-focus result and produce a visually unacceptable image.
This problem cannot be trivially solved by subtracting a fixed depth from all the points, because the value to subtract is not easy to determine. It should be large enough to stabilize the deblurring process and at the same time small enough that the depths are not underestimated so much that the all-in-focus image remains blurred. We used two methods, viz., depth quantization and a total variation (TV) deblurring process, to rectify the effects of depth estimation errors.
4.1 Depths of smooth scene points
Following prior work, we first estimate the depths at edge and texture points and then propagate the results to the other points. In our method, we use the Canny edge detector to decide whether a point is an edge or texture point. The depths of these points (called Canny points) are then estimated with the blurring and deblurring approach. For convenience, we call the remaining points smooth points.
The propagation algorithm that derives the depths of smooth points is based on the solution of the Dirichlet problem, which describes the temperature distribution from the boundary to the interior of a medium. The solution of the Dirichlet problem rests on two principles: the maximum principle and the uniqueness principle. The maximum principle states that the interior temperature lies between the minimum and maximum boundary temperatures, and the uniqueness principle states that the solution of the problem is unique. In our approach, we regard the temperature as the depth and define the boundary points as the union of the smooth points at the border and the non-smooth points of an image. Depths are first assigned to the smooth points at the border of the image. Then, we use the solution of the Dirichlet problem to derive the depths of the smooth points inside the image. With this approach, the depths of the smooth points are never larger than those of the enclosing points.
The steps of the depth propagation procedure are as follows. First, we normalize the depths of the Canny points by setting the depth of the point farthest from the camera to 1. Then, we assign depths to the smooth points on the borders of the image. Because the top border of an image is usually the background, the smooth points on that border are assigned the depth 1. Each smooth point on the left, right, and bottom borders is assigned the depth of the closest non-smooth point. Figure 5 shows an example of how the depths are propagated across an image.
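The propagation step can be sketched as a discrete Dirichlet problem solved by repeated neighbor averaging. This is a minimal sketch, not the chapter’s solver: the Gauss-Seidel iteration, the 4-neighbor stencil, the fixed iteration count, and the toy 6 x 6 grid are assumptions, and the side and bottom borders are simply set to a constant instead of the depth of the closest non-smooth point.

```python
def propagate_depth(depth, known, iterations=500):
    """Fill unknown depths with a discrete harmonic function (Gauss-Seidel sweeps).

    depth: 2D list of floats; entries where known[r][c] is True stay fixed.
    known: 2D list of bools; every border pixel is expected to be known.
    """
    rows, cols = len(depth), len(depth[0])
    out = [row[:] for row in depth]
    for _ in range(iterations):
        for r in range(1, rows - 1):
            for c in range(1, cols - 1):
                if not known[r][c]:
                    out[r][c] = 0.25 * (out[r - 1][c] + out[r + 1][c]
                                        + out[r][c - 1] + out[r][c + 1])
    return out

size = 6
depth = [[0.0] * size for _ in range(size)]
known = [[False] * size for _ in range(size)]
for i in range(size):
    # Bottom, left, and right borders: a hypothetical near depth.
    for r, c in ((size - 1, i), (i, 0), (i, size - 1)):
        depth[r][c], known[r][c] = 0.2, True
for c in range(size):
    # Top border: background, normalized depth 1.
    depth[0][c], known[0][c] = 1.0, True
depth[3][3], known[3][3] = 0.5, True   # one interior Canny point

filled = propagate_depth(depth, known)
```

Every propagated value in `filled` lies between the smallest and largest fixed depth, which is the maximum principle at work, and the fixed Canny and border depths are left unchanged.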
4.2 Depth quantization
For the quantization process, a depth can be approximated by a layer. The motivation of the idea was from the scalar quantization in compression. Via the quantization process, a coefficient can be approximated. In the process, the layer
where is subject to for each in layer 1, and , for each in layer 2. By recursively subdividing a depth layer to acquire two more depth layers, the procedure can be applied to acquire
The anchor depth in Eq. (20) is derived by first sampling a few depths as candidate anchor depths. Next, each candidate is used in turn as the anchor depth to calculate the average error from Eq. (20). This process can be carried out efficiently with the help of the histogram of the depths. The anchor depth is the candidate that yields the smallest average error. Figure 6 shows the estimated depth map and the depth quantization results for an image composed of four depth layers. After quantization, some outliers in Figure 6(b) are removed.
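The candidate search can be sketched for a single split into two layers. Since Eq. (20) is not reproduced here, the error below is an assumed stand-in: depths at or below the anchor form one layer, depths above it form the other, each layer is represented by its mean, and the error is the average absolute deviation. The function names and sample values are hypothetical.

```python
def split_layers(depths, anchor):
    """Partition depths into the layers at or below and above the anchor."""
    low = [d for d in depths if d <= anchor]
    high = [d for d in depths if d > anchor]
    return low, high

def split_error(depths, anchor):
    """Average absolute error when each layer is replaced by its own mean."""
    error = 0.0
    for layer in split_layers(depths, anchor):
        if layer:
            mean = sum(layer) / len(layer)
            error += sum(abs(d - mean) for d in layer)
    return error / len(depths)

def best_anchor(depths, candidates):
    """Candidate anchor depth with the smallest average error."""
    return min(candidates, key=lambda a: split_error(depths, a))

def quantize(depths, anchor):
    """Replace every depth by the mean of the layer it falls into."""
    low, high = split_layers(depths, anchor)
    low_mean = sum(low) / len(low) if low else None
    high_mean = sum(high) / len(high) if high else None
    return [low_mean if d <= anchor else high_mean for d in depths]

depths = [0.10, 0.12, 0.11, 0.13, 0.80, 0.82, 0.79, 0.81]
candidates = [0.05, 0.115, 0.45, 0.805]
anchor = best_anchor(depths, candidates)
layers = quantize(depths, anchor)
```

Applying the same search again inside each resulting layer would give four layers, mirroring the recursive subdivision described above.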
4.3 Deblurring process
5. Experimental results
We will demonstrate the depth estimation results and some applications related to the estimated depth map including all in focus, refocusing, defocus magnification, and 2D to 3D conversion. We use synthetic images, real images, and video frames to show the performance of the method.
5.1 Synthetic image
First, we evaluated the depth estimates on synthetic images. The scene images and their ground truth depth maps are taken from prior work, but the images and depth maps are not aligned. The proposed method estimates blurriness from a blurred image, whereas the given scene images are close to all-in-focus images. Therefore, we align each depth map with its original image using nearest-neighbor scaling and then apply Eq. (4) to generate a blur map from the given depth map with fixed camera parameters. The resulting Gaussian blur scales are kept in the range from 0 to 8. A blurred image is synthesized by convolving the original scene image with the Gaussian kernels given by the blur map. Two sets of original scene images, ground truth depth maps, synthetic blurred images, and estimated depth maps are shown in Figure 7(a), (b), (c), and (d), respectively. Perceptually, the outlines of the estimated depth maps correspond to those of the ground truth depth maps.
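The synthesis of a blurred test image from a blur map can be sketched in one dimension. The sketch uses a common approximation in which each output sample is a Gaussian-weighted average whose scale is read from the blur map at that sample. The function name, the 3-sigma truncation, and the toy signal are assumptions, and the chapter’s exact synthesis procedure may differ.

```python
import math

def synthesize_blur(signal, blur_map):
    """Blur each sample with a Gaussian whose standard deviation comes from blur_map."""
    n = len(signal)
    out = []
    for i, sigma in enumerate(blur_map):
        if sigma <= 0:
            out.append(signal[i])            # zero blur: the sample is in focus
            continue
        radius = int(math.ceil(3 * sigma))
        acc = norm = 0.0
        for k in range(-radius, radius + 1):
            w = math.exp(-0.5 * (k / sigma) ** 2)
            j = min(max(i + k, 0), n - 1)    # replicate the border
            acc += w * signal[j]
            norm += w
        out.append(acc / norm)
    return out

sharp = [float(i % 2) for i in range(40)]                         # fine texture
blur_map = [0.0] * 20 + [8.0 * (i + 1) / 20 for i in range(20)]   # ramps up to 8
blurred = synthesize_blur(sharp, blur_map)
```

The first half of `blurred` keeps the texture intact, while the second half fades toward the mean brightness as the blur scale grows toward 8.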
5.2 Real image
Second, we apply the depth estimation method to real images. Figure 8 shows the results on a two-layer image; the input image (a) is taken from prior work. The estimated depth map, synthesized all-in-focus image, quantized depth map, refocused image, and defocus-magnified image are shown in Figure 8(b)–(f), respectively. With the quantized depth map, refocusing and defocus magnification can be performed on the all-in-focus image. The performance of these applications therefore depends on obtaining a high-quality depth map and all-in-focus image.
Figures 9 and 10 present the all-in-focus results. Figure 9(a) is a three-layer image of poker cards, with the camera focused on the third row of cards. The blurriness increases from the bottom of the image to the top, so the first and second rows of cards are out of focus. Figure 9(b) shows the synthesized all-in-focus image, and (c) and (d) show the corresponding magnified regions from (a) and (b), respectively. Figure 10(a) was captured from a ramp of brick wall. As the camera was focused on the leftmost part of the image, the blurriness of the brick wall increases progressively from left to right. Figure 10(b) shows the synthesized all-in-focus image, and (c) and (d) show the magnified regions. In both sets of images, the results show a significant improvement over the original images.
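For defocus magnification, the amount of extra blur to apply follows from the fact that convolving two Gaussians adds their variances. The sketch below assumes a Gaussian PSF and magnification by a constant `gain` applied directly to an already blurred image. The function name and the blur values are hypothetical.

```python
import math

def extra_blur(sigma, gain):
    """Std. dev. of the additional Gaussian blur that raises `sigma` to `gain * sigma`."""
    if gain < 1.0:
        raise ValueError("magnification needs gain >= 1")
    return sigma * math.sqrt(gain * gain - 1.0)

blur_map = [0.0, 1.0, 2.0]                        # existing blur scales
added = [extra_blur(s, 2.0) for s in blur_map]    # blur to add for a gain of 2
```

In-focus points (scale 0) receive no extra blur, so sharp regions keep their shape while out-of-focus regions become blurrier.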
5.3 Video frame
Third, we apply the method to video frames. In this subsection, we show the performance of 2D to 3D conversion. The video frames in Figure 11(a), taken from YouTube, are used as the input left-eye images; (b) shows the right-eye images synthesized from the left-eye images and the corresponding depth maps; and (c) shows the anaglyphs synthesized from (a) and (b). With anaglyph glasses, the 3D effect can be seen. The 3D effect can also be viewed by displaying the stereo image on a mobile device mounted in a VR viewer such as Google Cardboard.
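The synthesis of a right-eye view and an anaglyph can be sketched for a single scanline. The chapter does not spell out its view-synthesis procedure, so this is a generic depth-image-based rendering sketch under assumptions: depths are normalized to [0, 1] with 1 as the farthest point, disparity decreases linearly with depth, and holes are filled from neighboring pixels. All names and values are hypothetical.

```python
def synthesize_right_view(left_row, depth_row, max_disparity):
    """Shift each pixel of one scanline left by a depth-dependent disparity."""
    width = len(left_row)
    right = [None] * width
    nearest = [2.0] * width                  # larger than any normalized depth
    for x in range(width):
        shift = int(round(max_disparity * (1.0 - depth_row[x])))
        tx = x - shift
        if 0 <= tx < width and depth_row[x] < nearest[tx]:
            right[tx] = left_row[x]          # nearer pixels occlude farther ones
            nearest[tx] = depth_row[x]
    # Fill disocclusion holes from the right neighbor first, then from the left.
    for x in range(width - 2, -1, -1):
        if right[x] is None:
            right[x] = right[x + 1]
    for x in range(width):
        if right[x] is None:
            right[x] = right[x - 1] if x > 0 else left_row[0]
    return right

def anaglyph(left_rgb, right_rgb):
    """Red-cyan anaglyph: red from the left view, green and blue from the right view."""
    return [(l[0], r[1], r[2]) for l, r in zip(left_rgb, right_rgb)]

left = list(range(8))                              # gray values of one scanline
depth = [1.0] * 3 + [0.0] * 2 + [1.0] * 3          # a near object at columns 3 and 4
right = synthesize_right_view(left, depth, max_disparity=2)
```

In this example the near object moves two columns to the left in the right-eye view, and the uncovered columns are filled with the background on their right.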
6. Conclusions
In this chapter, a single-image depth estimation method was presented. The depth map is derived from the camera’s characteristic curve of COC vs. depth. Applications that manipulate depth maps can be realized by first deblurring an image to obtain an all-in-focus image and then blurring the all-in-focus image. Successfully generating an all-in-focus image from a depth map is therefore important for the depth estimation method. Furthermore, the quality of 2D to 3D conversion also depends on the performance of depth estimation. The proposed depth estimation method makes it possible to produce high-quality all-in-focus images and 2D to 3D conversion, even from originals with a complex depth layout.
References
Grossmann P. Depth from focus. Pattern Recognition Letters. 1987; 5(1):63-69. DOI: 10.1016/0167-8655(87)90026-2
Bulthoff HH, Mallot HA. Integration of depth modules: Stereo and shading. Journal of the Optical Society of America A. 1988; 5(10):1749-1758. DOI: 10.1364/JOSAA.5.001749
Ullman S. The interpretation of structure from motion. Proceedings of the Royal Society of London B: Biological Sciences. 1979; 203(1153):405-426. DOI: 10.1098/rspb.1979.0006
Levin A, Fergus R, Durand F, Freeman WT. Image and depth from a conventional camera with a coded aperture. ACM Transactions on Graphics. 2007; 26(3). DOI: 10.1145/1276377.1276464
Khoshelham K, Elberink SO. Accuracy and resolution of Kinect depth data for indoor mapping applications. Sensors. 2012; 12(2):1437-1454. DOI: 10.3390/s120201437
Bae S, Durand F. Defocus magnification. Computer Graphics Forum. 2007; 26(3):571-579. DOI: 10.1111/j.1467-8659.2007.01080.x
Zhuo S, Sim T. Defocus map estimation from a single image. Pattern Recognition. 2011; 44(9):1852-1858. DOI: 10.1016/j.patcog.2011.03.009
Tung SS, Shao HC, Hwang WL. Extending depth of field in noisy light field photography. In: Proceedings of the IEEE International Conference on Awareness Science and Technology (iCAST '17); 8-10 November 2017; Taiwan. pp. 127-131. DOI: 10.1109/ICAwST.2017.8256430
Tung SS, Hwang WL. Multiple depth layers and all-in-focus image generations by blurring and deblurring operations. Pattern Recognition. 2017; 69:184-198. DOI: 10.1016/j.patcog.2017.03.035
Canny J. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence. 1986; 8(6):679-698. DOI: 10.1109/TPAMI.1986.4767851
Doyle PG, Snell JL. Random Walks and Electric Networks. USA: The Mathematical Association of America; 1984. DOI: 10.5948/UPO9781614440222
Wang Y, Yang J, Yin W, Zhang Y. A new alternating minimization algorithm for total variation image reconstruction. SIAM Journal on Imaging Sciences. 2008; 1(3):248-272. DOI: 10.1137/080724265
Saxena A, Chung SH, Ng AY. 3-D depth reconstruction from a single still image. International Journal of Computer Vision. 2008; 76(1):53-69. DOI: 10.1007/s11263-007-0071-y
Zhang W, Cham WK. Single-image refocusing and defocusing. IEEE Transactions on Image Processing. 2012; 21(2):873-882. DOI: 10.1109/TIP.2011.2162739