Towards a Biologically Plausible Stereo Approach

A computational model for stereoscopic disparity estimation has recently been introduced which reformulates, in neurophysiologically plausible terms, the traditional image matching approach (Torreão, 2007). The left and right stereo images are assumed to be viewed through the receptive fields of cortical simple cells, modeled as Gabor functions (Marcelja, 1980). The Green's functions of a matching equation (whose homogeneous solutions are also Gabor functions) are used to filter the receptive-field-modulated inputs, introducing different relative shifts between them. A measure of the local degree of matching of such shifted inputs can then be used to estimate stereoscopic disparities, in a way reminiscent of the energy model for the responses of cortical complex cells (Adelson & Bergen, 1985; Ohzawa et al., 1997). Although based on well-established neurophysiological concepts, the Green's function approach is still far from biologically plausible. For instance, it takes the original irradiance images as cortical inputs, disregarding the transformations performed by the earlier stages of the visual pathway, namely the retina and the lateral geniculate nucleus. As a further step towards a fully biological stereo model, we introduce here an improved version of the algorithm in (Torreão, 2007), incorporating a new concept of signal coding by input-dependent receptive fields (Torreão et al., 2009). The classical receptive field description, which assumes a fixed, stimulus-independent spatial organization, has recently been challenged by neurophysiological evidence that receptive-field structure changes with neuronal input (Allman et al., 1985; Bair, 2005; David et al., 2004).
In (Torreão et al., 2009), a signal coding scheme was introduced in which the parameters of the coding functions are obtained from the Fourier transform of the coded signal, and, in (Torreão & Victer, 2010), this scheme was taken up as a model for stimulus-dependent center-surround (CS) receptive fields, such as those found in the retina and in the lateral geniculate nucleus. Assuming that the role of the CS structures is to decorrelate natural images, as suggested in (Atick & Redlich, 1992) and (Dan et al., 1996), center-surround receptive fields have been obtained which code whitened versions of the input signals. We show that, by incorporating a similar center-surround coding module into the Green's function stereo algorithm, we obtain better-quality disparity estimates through an approach which is closer to the neurophysiological situation.

where $U \equiv U(x, y)$ denotes the disparity field, and $I_l$ and $I_r$ stand for the left and the right input images. Typically, an equation such as Eq. (1) is employed to estimate $U(x, y)$ given the stereo pair, but in (Torreão, 2007) it was taken as a constraint over the matching images, such that, given $I_r$, its perfect match is to be found for simple forms of the disparity field.

Uniform disparity field
For instance, assuming uniform disparity, $U(x) = u$, where $u$ is a constant, we can take a second-order Taylor series expansion of Eq. (1) to obtain Eq. (2), where the primes denote differentiation with respect to $x$ (for simplicity, we henceforth omit the dependence on $y$). Eq. (2) can then be solved via the Green's function approach to yield Eq. (3), where $G_u(x)$ is the Green's function of Eq. (2), that is, the solution to that equation when its right-hand side is the impulse function, $\delta(x - x_0)$ (we here assume an unbounded image domain, $x \in (-\infty, \infty)$). From Eq. (3) we then find that, to second order in $u$, convolving the right image, $I_r(x)$, with the Green's function $G_u(x)$ shifts that image to the right by the fixed amount $u$. Similarly, convolution with $G_u(-x)$ (keeping $u$ as a positive parameter) would produce the same shift to the left. More generally, we can consider a complex Green's kernel, by introducing the quadrature pair of $G_u(x - x_0)$, which is a homogeneous solution to Eq. (2), that is to say, a solution to that equation when its right-hand side is identically zero. Thus, with Eq. (6) as the complex Green's kernel, we obtain Eq. (7) for the rightwards-shifted version of $I_r(x)$, up to second order in $u$.
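The shifting property can be checked numerically. This is a minimal sketch under an assumption of ours: that Eq. (2) reads $(u^2/2) I'' + u I' + I = I_r$, which is consistent with the properties stated above but is not a quotation of the chapter's equation. Its causal Green's function is then $G_u(x) = (2/u)e^{-x/u}\sin(x/u)$ for $x \ge 0$ (zero otherwise), whose transfer function $1/(1 + us + u^2s^2/2)$ agrees with the pure shift $e^{-us}$ up to second order in $u$. The helper names `green_uniform` and `shift_right` are hypothetical.

```python
import numpy as np


def green_uniform(x, u):
    """Assumed causal Green's function of (u**2/2) f'' + u f' + f = g.

    G_u(x) = (2/u) exp(-x/u) sin(x/u) for x >= 0, and 0 for x < 0.
    """
    return np.where(x >= 0, (2.0 / u) * np.exp(-x / u) * np.sin(x / u), 0.0)


def shift_right(signal, u, dx):
    """Convolve a sampled signal with G_u; approximates signal(x - u) to O(u**2)."""
    half = int(np.ceil(20 * u / dx))        # exp(-20) makes the truncated tail negligible
    xk = np.arange(-half, half + 1) * dx    # odd-length grid centred on x = 0
    kernel = green_uniform(xk, u) * dx      # dx turns the convolution integral into a sum
    return np.convolve(signal, kernel, mode="same")


dx, u = 0.01, 0.3
x = np.arange(-1000, 1001) * dx             # grid on [-10, 10]
original = np.exp(-x**2 / 2)
shifted = shift_right(original, u, dx)
exact = np.exp(-(x - u)**2 / 2)
print("max |G_u * f - f(x - u)|:", np.max(np.abs(shifted - exact)))
print("max |f - f(x - u)|:      ", np.max(np.abs(original - exact)))
```

The residual error is of third order in $u$, so it is much smaller than the difference between the unshifted and shifted signals; replacing `xk` by `-xk` in the kernel (the $G_u(-x)$ case) shifts to the left instead.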

Linear disparity field
The disparity estimation approach of (Torreão, 2007) is based on a Green's kernel similar to that of Eq. (6), but for a differential equation which approximates a linear matching constraint, that is to say, one for which the disparity field takes the form $U(x) = u + vx$, with $u$ and $v$ constants. More specifically, a rightwards shift is there performed by the complex kernel $K(x, x_0)$ of Eq. (9), where $\sigma$ and $a$ are positive constants, and $k = a/\sigma^2$. The kernel $K(x, x_0)$ is the complex Green's function of the equation whose homogeneous solution is the Gabor function $e^{ikx} e^{-(x+a)^2/2\sigma^2}$. When $|x|$ and $\sigma$ are both much smaller than $a$, Eq. (9) can be approximated by a simpler expression. As in the uniform case, a leftwards image shift would be effected, in this linear disparity model, by the kernel $K^{(-)}(x, x_0) = K(-x, -x_0)$.
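A short calculation of our own (not reproduced from Torreão, 2007) shows why the condition $|x| \ll a$ links the linear model to the uniform one. Expanding the Gabor homogeneous solution and dropping the quadratic term leaves a complex exponential with decay length $u = \sigma^2/a$, the same relation used in the next subsection. If the uniform-disparity Eq. (2) has the form $(u^2/2) I'' + u I' + I = I_r$ (an assumption here), then $e^{-(1-i)x/u}$ is one of its homogeneous solutions.

```latex
e^{ikx}\,e^{-\frac{(x+a)^2}{2\sigma^2}}
  = e^{-\frac{a^2}{2\sigma^2}}\; e^{-\frac{x^2}{2\sigma^2}}\;
    e^{-(1-i)\frac{a x}{\sigma^2}},
  \qquad k = \frac{a}{\sigma^2}

\left|\frac{x^2}{2\sigma^2}\right|
  = \frac{a|x|}{\sigma^2}\cdot\frac{|x|}{2a}
  \ll \frac{a|x|}{\sigma^2}
  \quad \text{for } |x| \ll a

\Rightarrow\quad
e^{ikx}\,e^{-\frac{(x+a)^2}{2\sigma^2}}
  \approx C\,e^{-(1-i)\,x/u}
  = C\,e^{-x/u}\big[\cos(x/u) + i\sin(x/u)\big],
  \qquad u = \frac{\sigma^2}{a},\quad C = e^{-a^2/2\sigma^2}
```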

Green's function disparity estimation
Disparity estimation in the Green's function approach proceeds as follows. The input images, $I_l$ and $I_r$, are each multiplied by a Gabor function, yielding complex signals. These are then filtered by the $K(x, x_0)$ and the $K^{(-)}(x, x_0)$ kernels, respectively, which introduce spatial shifts and phase changes in the Gabor-modulated inputs (see below). The optimal spatial shift at each image location can thus be obtained by evaluating the match of the filtered images for different values of the kernel parameters, which yields an estimate of the disparity map encoded by the stereo pair. The relation of the Green's function approach to the neurophysiological models of stereoscopy stems from the following property of the Green's kernels: when filtering a Gabor-modulated signal, they yield similarly modulated outputs, but for spatially shifted versions of the signal. For instance, consider filtering the complex signal $I_1(x)$ of Eq. (12) by the kernel $K(x, x_0)$. This can be shown to yield Eq. (13) (Torreão, 2007), where $I_r(x - u)$ is given by Eq. (7), for $u = \sigma^2/a$, and where $\psi = \kappa u$. Thus, filtering the signal $I_1(x)$ by the kernel $K(x, x_0)$ essentially preserves its Gabor modulating factor (apart from an added phase), but spatially shifts the modulated image. Assuming that, locally (i.e., under the Gaussian window of the Gabor modulating function), the disparity between the right and left input images is well approximated by $u$, such that $I_l(x) \approx I_r(x - u)$, we can rewrite Eq. (13) as Eq. (14). Together with Eq. (12), this means that $I_1(x)$ and $I_2(x)$ correspond, respectively, to the right and left stereo images as seen through the receptive fields of a quadrature pair of simple cortical cells, according to the so-called phase-shift model of stereo responses (Fleet et al., 1991).
Thus, operating on the equivalent of the right-eye cortical input (viz., the right-eye retinal image as seen through the simple cell's right-eye receptive field), the Green's kernel $K(x, x_0)$ produces the equivalent of the left-eye cortical input (viz., the left-eye retinal image as seen through the simple cell's left-eye receptive field). If we swap right and left in the foregoing development, it also becomes valid for the Green's kernel $K^{(-)}(x, x_0)$. Namely, taking Eq. (15) as the Gabor-modulated signal to be filtered by $K^{(-)}(x, x_0)$, we obtain Eq. (16), where, once again, $u = \sigma^2/a$. Eq. (16) can also be rewritten, similarly to Eq. (14), as Eq. (17). Thus, operating on the equivalent of the left-eye cortical input, the Green's kernel $K^{(-)}(x, x_0)$ produces the equivalent of the right-eye cortical input.
Therefore, if we now compute the local match between the signal $I_2(x)$ and the output of the $K^{(-)}(x, x_0)$ filtering, we effectively assess the match between a left and a right cortical image, under the assumption that the local disparity is $2u$ (recall Eqs. (13) and (16)). Different $u$ values can be obtained by changing the parameter $a$ of the Green's kernels, while keeping $\sigma$ and $\kappa$ fixed, as proposed in (Torreão, 2007). This allows the local disparity to be estimated, for instance, as $d(x) = 2\sigma^2/\hat{a}$ (Eq. (19)), where $\hat{a}$ is the value of $a$ for which the matching measure $R(x)$ of Eq. (18) is minimized. Incidentally, the measure $R(x)$ affords a comparison of the Green's function approach with the energy model for disparity estimation (Qian, 1994; Qian & Mikaelian, 2000). Using Eqs. (13) and (16), we can rewrite it as Eq. (20), where $\Delta\psi = -2\kappa u$. When considered for $x = 0$, Eq. (20) is similar to what the phase-and-position-shift energy model predicts for the complex cell response, apart from the minus sign in front of the phase factor (which can easily be accommodated by assuming a phase difference of $\pi$ between the Gabor modulating functions in Eqs. (12) and (15)). In the present case, the position shift is given by the disparity measure, $d = 2u$, and the phase shift by $\Delta\psi$.
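The search over $u$ can be illustrated in 1D with a deliberately simplified stand-in, not the method of (Torreão, 2007). It uses real signals without Gabor modulation, a single global squared-difference cost in place of the local measure $R(x)$, and the kernel $(2/u)e^{-x/u}\sin(x/u)$ (our assumed form of the uniform-disparity Green's function) in place of $K$ and $K^{(-)}$. The right signal is shifted right by $u$ and the left signal left by $u$, so the best-aligning candidate gives $d = 2u$. The names `green_kernel`, `estimate_disparity` and `texture` are hypothetical.

```python
import numpy as np


def green_kernel(u, dx, leftwards=False):
    """Sampled kernel (2/u) exp(-x/u) sin(x/u) for x >= 0 (assumed form).

    With leftwards=True the kernel is mirrored, i.e. G_u(-x), which shifts
    a signal to the left instead of to the right.
    """
    half = int(np.ceil(20 * u / dx))
    x = np.arange(-half, half + 1) * dx
    if leftwards:
        x = -x
    g = np.where(x >= 0, (2.0 / u) * np.exp(-x / u) * np.sin(x / u), 0.0)
    return g * dx


def estimate_disparity(left, right, u_values, dx, margin):
    """Return (2 * best u, costs) for the u that best aligns the two signals.

    right is shifted right by u and left is shifted left by u, so a perfect
    match means a disparity d = 2u, with left(x) = right(x - d).
    margin must exceed the largest kernel half-length (20 * max(u) / dx).
    """
    costs = []
    for u in u_values:
        r_shift = np.convolve(right, green_kernel(u, dx), mode="same")
        l_shift = np.convolve(left, green_kernel(u, dx, leftwards=True), mode="same")
        diff = (r_shift - l_shift)[margin:-margin]   # drop zero-padding boundary effects
        costs.append(np.sum(diff**2))                # global stand-in for R(x)
    costs = np.array(costs)
    return 2.0 * u_values[int(np.argmin(costs))], costs


def texture(t):
    """Smooth synthetic test signal."""
    return np.sin(t) + 0.5 * np.cos(0.6 * t)


dx = 0.01
x = np.arange(-2000, 2001) * dx                      # grid on [-20, 20]
d, _ = estimate_disparity(texture(x - 0.6), texture(x),
                          np.linspace(0.05, 0.5, 10), dx, margin=1100)
print("estimated disparity:", d)                     # the true disparity is 0.6
```

Sweeping $u$ directly is equivalent to sweeping $a$ at fixed $\sigma$ through $u = \sigma^2/a$; the candidate grid sets the resolution of this arg-min estimate.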

Stimulus-dependent receptive fields
The receptive field (RF) of a visual neuron defines the portion of the visible world where light stimuli evoke the neuron's response, and describes the nature of that response, distinguishing, for instance, excitatory and inhibitory subfields (Hubel & Wiesel, 1962). In the standard description, the spatial organization of the RF remains invariant, and the neuronal response is obtained by filtering the input through a fixed receptive field function. Lately, this classical view has been challenged by neurophysiological experiments which indicate that the receptive field organization changes with the stimuli (Allman et al., 1985; Bair, 2005; David et al., 2004). Motivated by such findings, we have proposed a model for stimulus-dependent receptive field functions, initially for cortical simple cells (Torreão et al., 2009), and later for center-surround (CS) structures, such as those found in the retina and in the lateral geniculate nucleus (Torreão & Victer, 2010). In what follows, we briefly describe the model as originally proposed for simple cells, which is mathematically easier to handle, and then extend it to the CS structures.

Cortical receptive fields
Let us first consider a one-dimensional model. Let $I(x)$ be any square-integrable signal; it can be expressed as in Eq. (21), where the asterisk denotes a spatial convolution, and where $\sigma(\omega)$ and $\varphi(\omega)$ are related, respectively, to the magnitude and the phase of the signal's Fourier transform, $\tilde{I}(\omega)$, as in Eq. (22). Eq. (21) can be verified by rewriting the integral on its right-hand side in terms of the variable $\omega'$ and taking the Fourier transform (FT) of both sides. Using the linearity of the FT and the convolution property of the transform, we obtain an expression for $\tilde{I}(\omega)$ which, by the sampling property of the delta function, reduces exactly to the definition in Eq. (22). Also, if we make the convolution operation explicit in Eq. (21), it can be formally rewritten as Eq. (25), where the angle brackets denote an inner product, with $g^*(x)$ standing for the complex conjugate of $g(x)$. Comparing Eq. (25) to an expansion of the signal on the Fourier basis set confirms that it is equivalent to $I(x)$. Thus, we have found a set of signal-dependent functions, localized in space and in frequency, which yield an exact representation of the signal, in a form which amounts to a Gabor expansion with unit coefficients (this result can again be easily verified by making the spatial convolution explicit in Eq. (21)).
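The verification steps just described can be written out explicitly for one form of Eq. (21) consistent with them. This is our reconstruction, assuming a Gaussian envelope of width $\sigma(\omega)$ and the FT convention $\tilde{f}(\omega) = \int f(x)\,e^{-i\omega x}\,dx$; the normalization in the original Eqs. (21) and (22) may differ.

```latex
% Assumed form of Eq. (21)
I(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} d\omega'\,
       \big[\psi(x;\omega') * e^{i\omega' x}\big],
\qquad
\psi(x;\omega') = e^{i\varphi(\omega')}\, e^{i\omega' x}\, e^{-x^2/2\sigma^2(\omega')}

% Transforms of the two convolution factors
\mathcal{F}\{\psi(\cdot;\omega')\}(\omega)
  = \sqrt{2\pi}\,\sigma(\omega')\, e^{i\varphi(\omega')}\,
    e^{-\sigma^2(\omega')(\omega-\omega')^2/2},
\qquad
\mathcal{F}\{e^{i\omega' x}\}(\omega) = 2\pi\,\delta(\omega-\omega')

% Convolution theorem, then the sampling property of the delta
\tilde{I}(\omega)
  = \frac{1}{2\pi}\int d\omega'\,
    \sqrt{2\pi}\,\sigma(\omega')\, e^{i\varphi(\omega')}\,
    e^{-\sigma^2(\omega')(\omega-\omega')^2/2}\; 2\pi\,\delta(\omega-\omega')
  = \sqrt{2\pi}\,\sigma(\omega)\, e^{i\varphi(\omega)}

\Rightarrow\quad
\sigma(\omega) = \frac{|\tilde{I}(\omega)|}{\sqrt{2\pi}},
\qquad
\varphi(\omega) = \arg \tilde{I}(\omega)
```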
In (Torreão et al., 2009), the above development was extended to two dimensions and proposed as a model for image representation by cortical simple cells, whose receptive fields are well described by Gabor functions (Marcelja, 1980). In the 2D case, Eq. (21) becomes Eq. (30), where $\psi_c(x, y; \omega_x, \omega_y)$ is the model receptive field, with $\varphi(\omega_x, \omega_y)$ being the phase of the image's Fourier transform and $\sigma_c(\omega_x, \omega_y)$ being related to its magnitude. The validity of Eq. (30) can be ascertained as in the one-dimensional case. Moreover, as shown in (Torreão et al., 2009), the same expansion also holds to a good approximation over finite windows, with different $\sigma_c$ and $\varphi$ values computed locally at each window. Under this approximation, it makes sense to take the coding functions $\psi_c(x, y; \omega_x, \omega_y)$ as models for signal-dependent, Gabor-like receptive fields.

Center-surround receptive fields
A similar approach can be followed for neurons with center-surround organization, as presented in (Torreão & Victer, 2010). The role of center-surround receptive fields, as found in the retina and in the lateral geniculate nucleus (LGN), has been described as that of relaying decorrelated versions of the input images to the higher areas of the visual pathway (Atick & Redlich, 1992; Dan et al., 1996). Retinal and LGN cells would thus have developed receptive field structures ideally suited to whitening natural images, whose spectra are known to decay approximately as the inverse of the frequency magnitude, i.e., as $(\omega_x^2 + \omega_y^2)^{-1/2}$ (Ruderman & Bialek, 1994). In accordance with this interpretation, we have introduced circularly symmetric coding functions which yield a representation similar to Eq. (30) for a whitened image, and which have been shown to account for the neurophysiological properties of center-surround cells.
This representation is given by Eq. (33), where $I_{\mathrm{white}}(x, y)$ is a whitened image, and where $\psi(r; \omega_x, \omega_y)$ is the CS receptive field function, with $r = \sqrt{x^2 + y^2}$. Following the usual approach (Atick & Redlich, 1992; Dan et al., 1996), we have modeled the whitened image as the result of convolving the input image with a zero-phase whitening filter, $W(x, y)$, chosen so as to equalize the spectrum of natural images while suppressing high-frequency noise. The whitening filter spectrum has thus been chosen in the form $\tilde{W}(\omega_x, \omega_y) = \rho/(1 + \kappa\rho^2)$ (Eq. (35)), where $\kappa$ is a free parameter and $\rho = \sqrt{\omega_x^2 + \omega_y^2}$. The signal-dependent receptive field, in turn, has been taken in the form of Eq. (36), where $\varphi$ is the phase of the Fourier transform of the input signal, as already defined, while $\sigma(\omega_x, \omega_y)$ is related to the magnitude of that transform, as in Eq. (37). Eq. (37) can be verified by introducing the above $\psi(r; \omega_x, \omega_y)$ into Eq. (33) and taking the Fourier transform of both sides of that equation. We remark that the most commonly used model of center-surround receptive fields, the difference of Gaussians (Enroth-Cugell et al., 1983), has not been considered in the above treatment, since it would have required two parameters for the definition of the coding functions, while our approach provides a single equation for this purpose. Fig. 1 shows plots of the coding functions obtained from a 3 × 3 fragment of a natural image, for different frequencies. The figure displays the magnitude of $\psi(r; \omega_x, \omega_y)$ divided by $\sigma$, such that all functions reach the same maximum of 1 at $r = 0$. Each coding function displays a single dominant surround, whose size depends on the spectral content of the coded image at that particular frequency (when the phase factor in Eq. (36) is considered, we obtain both center-on and center-off organizations). For $\rho = 0$, $\sigma$ vanishes and the coding function becomes identically zero, meaning that the proposed model does not code uniform inputs.
At low frequencies, the surround is well defined (Fig. 1a), becoming less so as the frequency increases (Fig. 1b), and all but disappearing at the higher frequencies (Fig. 1c). All such properties are consistent with the behavior of retinal ganglion cells, or of cells of the lateral geniculate nucleus. Fig. 2 shows examples of image coding by the signal-dependent CS receptive fields. The whitened representation is obtained, for each input, by computing Eq. (33) over finite windows. As shown by the log-log spectra in the figure (the vertical axis plots the rotational average of the log magnitude of the signal's FT, and the horizontal axis is log ρ), the approach tends to equalize the middle portion of the original spectra, yielding representations similar to edge maps which code both edge strength and edge polarity. We have observed that the effect of the κ parameter in Eq. (35) is not pronounced, but, consistent with its role as a noise measure, larger κ values usually tend to enhance the low frequencies.
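The whitening filter of Eq. (35) and the rotationally averaged log-log spectra of Fig. 2 can be sketched with NumPy's FFT. Assumptions of this sketch: frequencies in radians per pixel, the periodic boundaries implied by the FFT, and a single global filter rather than the finite windows used in the chapter. It models only the whitening target $I_{\mathrm{white}} = W * I$, not the signal-dependent CS coding functions of Eq. (36). All function names are hypothetical.

```python
import numpy as np


def radial_frequency(shape):
    """rho = sqrt(wx**2 + wy**2) on the FFT grid, in radians per pixel."""
    wy = 2 * np.pi * np.fft.fftfreq(shape[0])
    wx = 2 * np.pi * np.fft.fftfreq(shape[1])
    return np.hypot(*np.meshgrid(wx, wy))            # shape (rows, cols)


def whiten(image, kappa=1.0):
    """Apply the zero-phase spectrum W(rho) = rho / (1 + kappa * rho**2)."""
    rho = radial_frequency(image.shape)
    W = rho / (1.0 + kappa * rho**2)                 # real and symmetric: zero phase
    return np.real(np.fft.ifft2(np.fft.fft2(image) * W))


def radial_log_spectrum(image, n_bins=24):
    """Rotational average of log|FFT| in bins of log(rho), as in a log-log plot."""
    rho = radial_frequency(image.shape)
    mag = np.abs(np.fft.fft2(image))
    keep = rho > 0                                   # log(rho) is undefined at DC
    log_rho, log_mag = np.log(rho[keep]), np.log(mag[keep] + 1e-12)
    edges = np.linspace(log_rho.min(), log_rho.max(), n_bins + 1)
    idx = np.clip(np.digitize(log_rho, edges) - 1, 0, n_bins - 1)
    counts = np.bincount(idx, minlength=n_bins)
    sums = np.bincount(idx, weights=log_mag, minlength=n_bins)
    centres = 0.5 * (edges[:-1] + edges[1:])
    ok = counts > 0                                  # skip empty bins
    return centres[ok], sums[ok] / counts[ok]


def synthetic_pink_image(n=64):
    """Real n x n test image whose FFT magnitude is exactly 1/rho (DC set to 0)."""
    rho = radial_frequency((n, n))
    spectrum = np.zeros_like(rho)
    spectrum[rho > 0] = 1.0 / rho[rho > 0]
    return np.real(np.fft.ifft2(spectrum))


img = synthetic_pink_image()
slope_before = np.polyfit(*radial_log_spectrum(img), 1)[0]
slope_after = np.polyfit(*radial_log_spectrum(whiten(img, kappa=0.01)), 1)[0]
print("log-log slope before whitening:", slope_before)   # close to -1
print("log-log slope after whitening: ", slope_after)    # much closer to 0
```

Since $\tilde{W}(0) = 0$, a uniform image is mapped to zero, matching the remark that the model does not code uniform inputs; increasing `kappa` lowers the frequency $1/\sqrt{\kappa}$ at which $\tilde{W}$ peaks.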
In the following section, we will use the whitened representation of stereo image pairs as input to the Green's function algorithm of Section 2, showing that this allows improved disparity estimation through an approach which is closer to the neurophysiological situation.

Green's function stereo with whitened inputs
We have incorporated a whitening routine into the stereo matching algorithm of Section 2, such that the Green's function procedure is now performed over signals which emulate the cortical input from the lower visual areas. Thus, what we identified as retinal images in Section 2.3 become the whitened representations of the stereo pair, obtained through Eq. (33) by means of center-surround, signal-dependent receptive field functions. Disparity estimation proceeds as described, with the proviso that, as in (Torreão, 2007), instead of choosing precisely the disparity which minimizes the matching measure $R(x)$ (Eq. (18)), we take as our estimate the sum of all disparities considered, each weighted by the inverse of the corresponding $R(x)$ value. This yields a dense disparity map, avoiding the need for interpolation. The preprocessing alignment of the stereo pair through uniform-disparity matching (Torreão, 2007) has also been retained, as a means of handling large overall disparities. Figs. 3 to 6 depict some results of the whitened Green's function stereo, along with those yielded by the original approach. It is apparent that the former generally affords better spatial resolution, as well as sharper disparity definition. Being based on whitened inputs, the new algorithm proves less sensitive to image features unrelated to depth, as can be seen in the background region of the meter stereo pair (Fig. 6), which is captured more uniformly by the whitened approach than by the original one, the latter being biased by the complex edge structure in that region.
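The inverse-$R$ weighting can be sketched as follows. This sketch assumes the weights are normalized to sum to one (the text does not state the normalization) and that $R$ has already been evaluated for every candidate disparity at every pixel; `weighted_disparity` is a hypothetical helper name.

```python
import numpy as np


def weighted_disparity(costs, disparities, eps=1e-12):
    """Dense disparity estimate: candidates weighted by 1 / R.

    costs:       array of shape (n_candidates, ...) holding the matching
                 measure R >= 0 for each candidate at each pixel.
    disparities: array of shape (n_candidates,) with the candidate values
                 (d = 2u in the chapter's notation).
    Normalising the weights to sum to one is an assumption of this sketch.
    """
    costs = np.asarray(costs, dtype=float)
    d = np.asarray(disparities, dtype=float)
    d = d.reshape((-1,) + (1,) * (costs.ndim - 1))   # broadcast over pixel axes
    w = 1.0 / (costs + eps)                           # eps guards against R == 0
    return np.sum(w * d, axis=0) / np.sum(w, axis=0)


# Two pixels, three candidate disparities: flat R at pixel 0, sharp minimum at pixel 1.
R = np.array([[1.0, 1.0],
              [1.0, 1.0],
              [1.0, 1e-6]])
print(weighted_disparity(R, [0.0, 1.0, 2.0]))
```

Unlike an arg-min, this weighting produces estimates between the candidate values at every pixel, but where $R$ is nearly flat across candidates it pulls the estimate toward the mean of the candidate disparities.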

Conclusion
In this chapter, we have reviewed Green's function stereoscopy (Torreão, 2007), a neurophysiologically inspired stereo matching approach, along with a recently introduced model for signal-dependent receptive fields (Torreão et al., 2009; Torreão & Victer, 2010). By coupling the Green's function algorithm with a whitened representation of the stereo inputs, based on center-surround, signal-dependent receptive field functions, we have been able to obtain better disparity estimates through an approach which is closer to the neurophysiological situation. We are presently working on incorporating the Gabor-like, signal-dependent receptive field model of Section 3.1 into our stereo algorithm. This will allow a closer parallel with the cortical mechanisms of biological stereo vision.

Advances in Stereo Vision www.intechopen.com