## 1. Introduction

The computer vision research aims at a better understanding of the human visual system and building artificial visual systems. Vision researchers in psychology and physiology have explored biological visual systems including the human vision, and obtained much knowledge on nature and architecture of their visual information processing. For example, some of previous results in experimental psychology suggested integration of several visual cues [1], [2], [3], and others of them showed evidence of anisotropy in the stereo depth perception [4], [5]. Mathematical models and computer algorithms developed according to previous experimental results help us to understand the human visual system and to build artificial visual systems.

The human visual system has two eyes aligned on a horizontal line. When the system captures an object located in a three-dimensional space, it perceives depth of the object. The visual system projects the object onto both of left and right retinae and the projected intensity distributions are referred to as retinal images. Since the eyes see the object from slightly different three-dimensional positions, the object is projected at slightly different positions of both retinal images. The positional difference which is referred to as stereo disparity gives the depth of the object according to the concept of triangulation [6]. Detection of the stereo disparity requires a task of finding a reliable one-to-one correspondence between the left and right images. It is difficult to deal with this task, because there are ambiguities such as repeated texture and uniform color. The human visual system seems to have some robust mechanism for finding stereo disparity.

Motivated by the human visual system, Marr and Poggio presented a computer algorithm of stereo disparity detection [7], [8], [9]. Their algorithm named “cooperative algorithm” solves the stereo correspondence problem with a biologically inspired grid system, in which they placed cells at grid points and connected neighboring cells. In order to solve the stereo correspondence problem and to obtain a dense stereo disparity map, they proposed imposing two constraints: uniqueness and continuity on the disparity map. The uniqueness constraint states that a point on the stereo disparity map has a unique disparity level except for transparent surfaces and object boundaries having multiple disparity levels. The continuity constraint states that neighboring grid points share the same or similar disparity levels except object boundaries. Marr and Poggio designed the grid system so as to satisfy the two constraints, by connecting neighboring cells cooperatively for the continuity constraint, and by connecting multi-layered grid systems exclusively for the uniqueness constraint.

Psychological and biophysical research results have affected the computer vision research including the cooperative algorithm. According to the Gestalt psychology [10], [11], when the human visual system captures an image consisting of small figures such as short lines and small crosses, it perceives a group of neighboring elements sharing the same or similar visual properties [12]. The Gestalt psychologists originally found this phenomenon and clearly stated the laws of closure, similarity, proximity, symmetry, continuity and common fate. In addition, previous biophysical research results showed that biological cells respond to external stimuli and exhibit a nonlinear excitation-inhibition process. Therefore, we understand that these previous results have affected the cooperative algorithm by Marr and Poggio. We can find similarities between the continuity constraint and the perceptual grouping exhibiting the laws of similarity, proximity and continuity, and between behavior of artificial cells in the cooperative algorithm and the cell responses in the excitation-inhibition process.

We previously presented several reaction-diffusion algorithms for segmentation and stereo disparity detection under the concept of reaction-diffusion systems [13], [14]. A reaction-diffusion system refers to the system of diffusively coupled elements exhibiting an excitation-inhibition process [15]. The reaction-diffusion system is mathematically described with a set of time-evolving partial differential equations consisting of diffusion terms and reaction ones. By numerically computing the reaction-diffusion system, we can simulate spatio-temporal phenomena such as pulse propagation observed in natural and biological systems, for example, in biological information transmission processes. We can expect that the pulse propagation phenomenon in a reaction-diffusion system serves as the continuity constraint in the stereo correspondence problem, and thus we proposed a stereo algorithm consisting of exclusively connected multi-layered reaction-diffusion systems [14]. In addition, inspired by the strong inhibitory diffusion causing Turing patterns [16], and suggested by a lateral inhibition mechanism in a biological visual system [17], we have imposed the strong inhibitory diffusion on the reaction-diffusion stereo algorithm.

This chapter presents recent advances in the reaction-diffusion stereo algorithm introducing anisotropy in diffusion processes. After quickly reviewing previous related work achieved in several different areas of psychology, stereo algorithms, reaction-diffusion and physiology in Section 2, we describe elementary stereo geometry, the cooperative algorithm and a reaction-diffusion system as preliminaries in Section 3. Then, we proceed to the original reaction-diffusion stereo algorithm with isotropic diffusion processes and its recent advances introducing anisotropic diffusion processes in Section 4. The section also presents the cooperative algorithm revised with a reaction-diffusion system. Then, we demonstrate comparison among the reaction-diffusion stereo algorithms including the original and anisotropic ones for several stereo image pairs provided on the Middlebury stereo vision page [18], [19]. We discuss the experimental results and consider future research topics for the reaction-diffusion stereo algorithms. Finally, we conclude this chapter by summarizing the reaction-diffusion stereo algorithms, the experimental results and the future research topics.

## 2. Background

### 2.1. Human stereo depth perception

When the human visual system captures a three-dimensional scene with two eyes, it perceives depth structure of the scene. Even if the system is exposed to a pair of random-dot stereo images having only randomly dotted pattern (random-dot stereogram), it can perceive depth structure of objects embedded into the stereo images. Julesz generated random-dot stereograms by utilizing computers [20], [21]. Thimbleby et al. later proposed a computer algorithm generating a single image named “autostereogram”, in which they embedded a pair of stereo images; the autostereograms successfully caused stable depth perception for the human visual system [22]. These previous findings suggested that structural image pattern or higher knowledge on objects is unnecessary for the human stereo depth perception. In addition, the findings suggested that the human visual system has a function or a module of detecting disparity in its early vision. With the technique artificially generating random-dot stereograms many researchers have explored the nature of the human visual system. For example, it was shown that there exists the analogy between stereo depth perception and brightness perception [23], [24]. It was strongly suggested that a common mechanism underlies both the stereo depth perception and the brightness one.

In the depth perception, the human visual system integrates several visual cues such as stereo disparity, motion parallax and monocular configuration, each of which brings depth information. Landy et al. presented a weak fusion model that linearly integrates the visual cues with weighted averaging [1]. In contrast to the weak fusion model, Bradshaw and Rogers presented a strong fusion model that integrates the cues of stereo disparity and motion parallax with a nonlinear manner [2]. The strong fusion model states that each of the two modules processing stereo disparity and motion parallax takes an output of the other module as feedback, and thus the two modules are depending each other. Ichikawa et al. examined the depth perception and presented an integration model of the three visual cues: stereo disparity, motion parallax and monocular configuration [3]. Their model states that the cues of disparity and motion parallax are integrated with the strong fusion model at particular spatial frequency-tuned channels, and then outputs obtained at all the channels are furthermore integrated linearly with the cue of monocular configuration with the weak fusion model.

There is anisotropy in the human stereo depth perception; the human visual system perceives differently a horizontally slanted surface and a vertically slanted one. Rogers and Graham measured depth effect for an object having a one-dimensionally curved surface with the Cornsweet type depth profile [4]. When the surface of the object slants horizontally, the human visual system perceives a part of the surface to be nearer than the true depth. However, when the surface slants vertically, the system perceives depth of the surface correctly. From these experimental results, they presented a hypothesis. There exist two different processes for perceiving vertical and horizontal slant surfaces; this brings anisotropy in the human stereo depth perception. Ichikawa more carefully examined the anisotropy with respect to latency and adaptation of the stereo depth perception for three different depth profiles and for a wide range of orientation [5]. His results also showed the similar anisotropy and presented evidence to support the hypothesis presented by Rogers and Graham.

These previous experimental results inspired us to develop the reaction-diffusion stereo algorithm. The former evidence showing the integration of several visual cues does not straightforwardly indicate the integration of image edge information into stereo disparity detection. However, it motivated us to study the integration of the image edge information, which is obtained with another algorithm having a mechanism similar to reaction-diffusion [25]. The latter evidence showing anisotropy encouraged us to divide a diffusion process into two diffusion processes, that is, horizontally and vertically oriented ones, with different diffusion coefficients.

### 2.2. Previous stereo algorithms

Dev proposed a feature segmentation model and its application to the stereo disparity detection [26]. The segmentation model employed a multi-layered network of which each grid point is described with a neural process having excited and resting states. When a point on the network enters an excited state, it becomes a member of the group of the associated feature. For achieving the segmentation, she imposed two interactions on the multi-layered network; in one of the interactions a point on a network layer should have excitatory links to neighboring points, and in the other one of the interactions the point in an excited state should inhibit excitation of neighboring points on other network layers. She applied the segmentation model to stereo disparity detection, in which each layer of the network is associated with each layer of a possible disparity level. The former interaction is similar to the continuity constraint, and the latter one is similar to the uniqueness constraint imposed on the cooperative algorithm [7], [8], [9]. Although the model by Dev [26] and the cooperative algorithm impose the similar interactions or constraints, the algorithm by Dev has an additional inhibitor network, which controls the inhibitory interaction; this is the main difference between the algorithm by Dev and the cooperative algorithm by Marr and Poggio.

Following the earlier work of the cooperative algorithm, many stereo algorithms have been proposed [27]. There are three main categories of the cooperative algorithm, the matching algorithm [28] and the regularization algorithm [29], [30]. The matching algorithm deals with the stereo correspondence problem by utilizing a similarity measure such as a cross-correlation coefficient computed between left and right images; in the case of the block matching algorithm, a rectangular area is utilized for the similarity measure. If images are rectified, an object in the left image should exist at the same vertical position in the right image; thus, a search area for the matching algorithm is usually restricted on a horizontal line (the epipolar constraint). Since a larger value of the cross-correlation coefficient indicates a higher probability of a correspondence between left and right images, the disparity with the maximum coefficient among possible disparities is employed as the detected disparity. The regularization algorithm formulates a functional consisting of a data term and a smoothness term. The data term denotes difference between left and right images at a disparity level, and the smoothness term denotes the continuity constraint. By minimizing the functional, the regularization algorithm provides a disparity map. Thus, the regularization algorithm avoids explicitly dealing with the stereo correspondence problem by replacing the problem with a kind of the optimization problem.

In order to obtain a reliable stereo disparity map, we need to solve several problems such as the aperture problem, the occlusion problem and the problem arising from transparent surfaces. The following describes what the three problems refer to, and how classical stereo algorithms have approached the problems.

The matching algorithm needs to estimate an optimal window size for measuring the similarity. Real images usually contain untextured or featureless areas, which do not provide information to find a stereo correspondence. If we extend a window size so as to cover neighbor textured or feature-rich areas, we may obtain the drawback of a highly blurred disparity map, in which we can not expect detailed depth structure around object boundaries, resulting in lack of information necessary for later stages of the visual system. On the other hands, a smaller window size results in an unreliable disparity map. The aperture problem refers to the trade-off problem in estimating the window size. Kanade and Okutomi proposed a block matching stereo algorithm with an adaptive correlation window [28]; the algorithm controls the size and the shape of the window area for reducing uncertainty and simultaneously for keeping detailed depth structure.

Let us consider a situation in which there are two objects in a three-dimensional scene and one of the two objects partly occludes the other one on captured stereo images. Since two eyes capture the scene from slightly different positions, a part of the occluded object appears in one of the images and remains to be occluded in the other one. There is no corresponding points for the part of the occluded object in the other image, and thus stereo algorithms tend to find false correspondences for the part. This is the occlusion problem. A bi-directional matching technique provides a cue for detecting occluded areas [31], [32], [33], as follows. In the case where an interest point ${p}_{l}$ in the left image is not occluded in the right image, the corresponding point in the right image, ${p}_{r}$, can be detected by searching for a point with the maximum similarity, and then a point ${p}_{l}^{\text{'}}$ in the left image, which corresponds to the detected point ${p}_{r}$, will be searched. Now the coordinate of ${p}_{l}^{\text{'}}$ is expected to be same with that of ${p}_{l}$. On the other hand, in the case where an interest point ${q}_{l}$ in the left image does not have its corresponding point in the right image due to the occlusion, the point ${q}_{r}$ with maximum similarity in the right image have the corresponding point ${q}_{l}^{\text{'}}$ on the left image, whose coordinate is different from that of ${q}_{l}$. It means that by detecting a two-step corresponding point and comparing it with the interest point, it is possible to judge if the interest point is occluded in the other image or not. As another approach, Zitnick and Kanade proposed a modern cooperative algorithm that can detect the occlusion areas [34]. As similar to the classical cooperative algorithm, their algorithm iteratively updates states of network. In contrast to the classical algorithm, the modern algorithm multiplies similarity distributions and states of the network at every iteration. This simple modification for the classical cooperative algorithm effectively detected occlusion areas.

Most stereo algorithms assume that objects are opaque and a point on a disparity map has only one disparity level; that is, they impose the uniqueness constraint on a disparity map. However, when we see an object through a window of glass material or wire fences, we perceive two different surfaces [35]. The human visual system can perceive multiple disparity levels from a single pair of stereo images. According to the result of Tsirlin et al. [35], the human visual system can perceive up to six overlaid surfaces; Akerstrom and Todd examined the performance of the human visual system for random-dot stereograms including transparent surfaces [36]. Later, Shizawa presented a mathematical model describing the transparency upon the principle of superposition [37]. Szeliski and Golland proposed a stereo algorithm for transparent objects [38]. The occlusion problem can be also considered as the stereo transparency problem. Since an object occludes another object at an occluding boundary, there exist two disparity levels at the boundary. Assumption of two disparity levels brought another approach to solve the occlusion problem [39].

Since there have been proposed many stereo algorithms, Scharstein and Szeliski built a website named “Middlebury Stereo Vision Page” [19], [18] for quantitative evaluations of the algorithms. The website provides several stereo images, an evaluation system for stereo disparity maps and tables ranking stereo algorithms submitted to the website. There are two state-of-the-art algorithms in addition to the above mentioned three algorithms. One of the two state-of-the-art algorithms is the belief propagation algorithm and the other one is the graph-cuts algorithm. The belief propagation algorithm, which was originally proposed by Sun et al. [40], has grid points on a disparity map, and propagates the belief into their neighboring points. The belief denotes a kind of probability showing existence or non-existence of its associated disparity level at a grid point. By updating the state of a grid point with messages of belief received from its neighboring points, the algorithm builds a disparity map iteratively. The graph cuts algorithm was originally proposed by Kolmogorov and Zabih [41]. In the algorithm, we consider a graph network so as to express a functional shown in the regularization algorithm. By minimizing the number of points cutting the graph network, the algorithm provides a disparity map.

### 2.3. Reaction-diffusion systems related to image processing and vision research

Reaction-diffusion systems are common in nature, in particular, in chemical and biological systems [15]. Let us focus on a photo-sensitive chemical reaction-diffusion system. Busse and Hess firstly reported that the two-dimensional chemical reaction system senses illumination and generates a circular pattern of a chemical concentration wave at an illuminated point [42]. Kuhnert et al. reported that the photo-sensitive system has the functions of image memory, edge enhancement and segment detection for image pattern projected onto a surface of the two-dimensionally extended chemical solution [43], [44]. They also mentioned that the reaction-diffusion system is applicable to image processing and becomes a candidate of a new computer architecture with parallel processing. After their findings, many researchers examined experiments and proposed mathematical models on the photo-sensitive system. In particular, Sakurai et al. experimentally demonstrated a traveling path of a chemical reaction wave guided by a feedback control system having an illumination light [45]. These preceding research results suggested that reaction-diffusion systems provide new research topics in image processing and vision research.

An example of a biological reaction-diffusion system exists in an active pulse transmission process along a nerve axon. A mathematical model of the system is formulated with a set of two reaction-diffusion equations consisting of diffusion terms and the FitzHugh-Nagumo type reaction terms [46], [47]. If we stimulate a point on the reaction-diffusion system, we can observe pulses traveling from the point on the system. Stereo algorithms presented in this chapter utilize the FitzHugh-Nagumo type reaction-diffusion system; the nature of traveling pulses helps to realize the continuity constraint. Later, Section 3.3. describes how the reaction-diffusion system works for the stereo correspondence problem.

With a reaction-diffusion system Turing proposed a mechanism for explaining a stationary pattern formation process observed in a biological system [16]. He considered a set of two reaction-diffusion equations having two variables: activator and inhibitor. In general, a diffusion process brings a spatial uniform distribution. For example, when a chemical species distributes non-uniformly in a space, it diffuses according to a gradient of the distribution and finally distributes uniformly. However, by considering two diffusion processes on activator and inhibitor and by assuming that the inhibitor diffuses more rapidly than the activator does, Turing found that those diffusion processes bring a non-uniform stationary pattern. According to the mechanism proposed by Turing [16], Gierer and Meinhardt [48] proposed more biologically plausible models explaining how biological systems self-organize spatial patterns. More recently, several researchers reported evidence showing that the Turing mechanism causes pattern formation observed on a fish skin [49], and other researchers identified two proteins as an activator and its inhibitor for a hair follicle spacing pattern of a mouse [50]. As the results of these researches, biologists have accepted the Turing mechanism as a possible mechanism explaining biological pattern formation.

Mach found an edge enhancement phenomenon for a step-wise illumination change in the human brightness perception;
it is now known as the Mach-bands pattern [51].
Hartline and Ratliff found that there are excitatory and inhibitory interactions among outputs of individual eye units in compound eyes of *Limulus*,
and proposed a linear model of equations describing the interactions [52].
Later, Barlow and Quarles proposed a nonlinear model for explaining the Mach-bands pattern observed in the visual system of *Limulus* [17];
they derived the nonlinear model by modifying the original model proposed by Hartline and Ratliff [52].
By comparing laboratory experiments with numerical results of the two models of equations,
they indicated the importance of the long-range inhibition and the nonlinearity in the modified model for explaining the Mach-bands pattern.
Gierer and Meinhardt pointed out that the long-range inhibition is analogous to the rapid inhibitory diffusion of the Turing condition imposed
in modeling biological pattern formation processes [48].
These results suggested that the long range inhibition or the rapid inhibitory diffusion may play an important role in biological pattern formation and vision.

As pointed out by Gierer and Meinhardt [48],
we have been interested in some common mechanism organizing visual functions, or underlying biological visual systems and pattern formation processes.
We believe that the common mechanism is reaction-diffusion with the rapid inhibitory diffusion proposed by Turing [16],
or the long-range inhibition found in eyes of *Limulus*.
In order to show this,
we have built the visual functions of edge detection [53], [54], [55], segmentation (grouping) [13] and stereo disparity detection [14],
explored how the mechanism works, and confirmed how much the mechanism is effective in the visual functions.

## 3. Preliminaries

### 3.1. Stereo geometry

A stereo vision system consists of two cameras, which independently project an object located in a three-dimensional space onto the image planes of the cameras. Let us consider a simple stereo vision system, in which optical axes of the two cameras are parallel and a horizontal line passing through the origin of the left image plane also passes through the origin of the right image plane, as shown in Fig. 1. In this situation, an epipolar line refers to each of horizontal lines shared by the two image planes. If we utilize a pin-hole camera model, we can simply obtain depth of the object from stereo disparity $d$ (pixel) [6]. Let $f$ be a focal length of the cameras and $\ell $ be distance between the two optical axes. The stereo vision system projects the object at the position ${x}_{\text{L}}$ on the left image ${I}_{\text{L}}$ and at the position ${x}_{\text{R}}$ on the right image ${I}_{\text{R}}$. Then, the depth $D$ becomes $D=f(\ell -d)/d$ with $d={x}_{\text{L}}-{x}_{\text{R}}$. Thus, if we can solve the stereo correspondence problem at particular points from the stereo images, we can obtain a full disparity map $M(x,y)$ and reconstruct its depth structure.

### 3.2. Cooperative algorithm for stereo disparity detection

A simple way of solving the stereo correspondence problem is to utilize a similarity measure such as a cross-correlation coefficient computed for local areas between stereo images. Let $\mathcal{B}$ be a local area for computation of the similarity measure; for example, the area is defined as a $3\times 3$ pixels square area consisting of a target discrete position $(i,j)$ and its relative positions ${\mathcal{B}}_{3\times 3}=\left\{({i}^{\text{'}},{j}^{\text{'}})\right|-1\le {i}^{\text{'}}\le 1,\phantom{\rule{0.277778em}{0ex}}-1\le {j}^{\text{'}}\le 1\}$ in a discretized coordinate system, or a cross-like local area consisting of the target position and its nearest neighboring four relative positions ${\mathcal{B}}_{5}=\{(0,0),(-1,0),(1,0),(0,-1),(0,1)\}$. If we utilize a cross-correlation coefficient as the similarity measure and the local area $\mathcal{B}$ as its local correlation area, we can compute a similarity measure ${C}_{d,i,j}$ between ${I}_{\text{L},i,j}$ and ${I}_{\text{R},i-d,j}$ as follows:

$${C}_{d,i,j}=\frac{1}{{s}_{\text{L},i,j}\times {s}_{\text{R},i-d,j}}\sum _{({i}^{\text{'}},{j}^{\text{'}})\in \mathcal{B}}\left[{I}_{\text{L},i+{i}^{\text{'}},j+{j}^{\text{'}}}-\overline{{I}_{\text{L},i,j}}\right]\times \left[{I}_{\text{R},i+{i}^{\text{'}}-d,j+{j}^{\text{'}}}-\overline{{I}_{\text{R},i-d,j}}\right],$$ | () |

in which $d$ (pixel) is a discrete disparity level; ${s}_{\text{L},i,j}$ is the standard deviation of ${I}_{\text{L},i+{i}^{\text{'}},j+{j}^{\text{'}}}$ and ${s}_{\text{R},i-d,j}$ is that of ${I}_{\text{R},i+{i}^{\text{'}}-d,j+{j}^{\text{'}}}$ for $({i}^{\text{'}},{j}^{\text{'}})\in \mathcal{B}$; $\overline{{I}_{\text{L},i,j}}$ is the average of ${I}_{\text{L},i+{i}^{\text{'}},j+{j}^{\text{'}}}$ and $\overline{{I}_{\text{R},i-d,j}}$ is that of ${I}_{\text{R},i+{i}^{\text{'}}-d,j+{j}^{\text{'}}}$ for $({i}^{\text{'}},{j}^{\text{'}})\in \mathcal{B}$. The similarity measure of Eq. (1) provides 1.0 for a pair of completely matched local areas of the left and right images. If the similarity measure provides a high value such as ${C}_{d,i,j}\simeq 1.0$ for the disparity level $d$ and provides low values for other disparity levels, the position $(i-d,j)$ on the right image ${I}_{\text{R}}$ becomes a candidate for the corresponding point of the position $(i,j)$ on the reference left image ${I}_{\text{L}}$. Let $\mathcal{D}$ be a set of possible discrete disparity levels; $\mathcal{D}=\{{d}_{0},{d}_{1},\cdots ,{d}_{(N-1)}\}$. Thus, detection of the maximum of ${C}_{d,i,j}$ for $d\in \mathcal{D}$ at a position $(i,j)$ provides a stereo disparity map ${M}_{i,j}$, as follows:

As stated in Section 2.2., there are several typical problems in the stereo correspondence problem. The aperture problem results from situations in which the local area defined by $\mathcal{B}$ does not have enough intensity information such as distinguishable patterns from other areas and thus the area is matchable to each of the most areas on its epipolar line. In the occlusion problem, since a position on one of the stereo images is occluded on the other image, searching the maximum of the similarity measure may bring a false position as its corresponding point. For stereo images containing transparent objects, there are multiple maximum values at a position; it is difficult to detect multiple disparity levels without information of its multiplicity or under noisy situations.

Marr and Poggio proposed a classical cooperative algorithm by focusing on a random-dot stereogram [7], [8], [9]. They imposed two constraints: continuity and uniqueness on a stereo disparity map ${M}_{i,j}$. They considered a cell network in which cells enter an excited state or a resting state and a state of a cell propagates into its neighboring cells, and utilized a multi-layered cell network, in which each network layer is associated with a disparity level $d\in \mathcal{D}$. The cooperative algorithm iteratively computes states of cells on the network. Let ${S}_{d,i,j}^{k}$ be a state of the cell located at a position $(i,j)$ on a network layer associated with a disparity level $d$; a large value of ${S}_{d,i,j}^{k}\simeq 1.0$ denotes that the cell at $(i,j)$ on the network layer is in an excited state, and a small value of ${S}_{d,i,j}^{k}\simeq 0.0$ denotes that the cell is in a resting state. Then, the state ${S}_{d,i,j}^{k+1}$ at the $(k+1)$-th iteration is updated, as follows:

$${S}_{d,i,j}^{k+1}=\sigma \left(\sum _{({i}^{\text{'}},{j}^{\text{'}},{d}^{\text{'}})\in \Omega (i,j,d)}{S}_{{d}^{\text{'}},{i}^{\text{'}},{j}^{\text{'}}}^{k}-\xi \sum _{({i}^{\text{'}},{j}^{\text{'}},{d}^{\text{'}})\in \Theta (i,j,d)}{S}_{{d}^{\text{'}},{i}^{\text{'}},{j}^{\text{'}}}^{k}+{C}_{d,i,j},T\right),$$ | () |

in which $\sigma (S,T)$ is a step function which returns 1 for $S>T$ and returns 0 for $S\le T$ with the threshold level $T$; $\xi $ is a constant for inhibition; $\Omega $ denotes an excitatory domain for the continuity constraint and $\Theta $ denotes an inhibitory domain with $\mathcal{D}\setminus \left\{d\right\}$ for the uniqueness constraint (for more detail, see Fig. 1 in Ref. [34]). Finally, as similar to Eq. (2), the cooperative algorithm provides a disparity map ${M}_{i,j}^{k}$ with

In Eq. (3) the term ${\sum}_{\Omega}{S}_{{d}^{\text{'}},{i}^{\text{'}},{j}^{\text{'}}}^{k}$ works for the continuity constraint and the term ${\sum}_{\Theta}{S}_{{d}^{\text{'}},{i}^{\text{'}},{j}^{\text{'}}}^{k}$ works for the uniqueness constraint, as follows. If most of neighboring cells in $\Omega $ are in excited states with ${S}_{{d}^{\text{'}},{i}^{\text{'}},{j}^{\text{'}}}^{k}\simeq 1.0$, ${\sum}_{\Omega}{S}_{{d}^{\text{'}},{i}^{\text{'}},{j}^{\text{'}}}^{k}$ also becomes large. Then, according to Eq. (3), if ${\sum}_{\Omega}{S}_{{d}^{\text{'}},{i}^{\text{'}},{j}^{\text{'}}}^{k}$ becomes larger than the threshold level $T$, the next state of ${S}_{d,i,j}^{k+1}$ also enters an excited state. Thus, all cells in the local area denoted by $\mathcal{B}$ are in excited states, and the area becomes to be in the disparity level $d$. This is what the continuity constraint indicates. If there is a situation in which ${\sum}_{\Theta}{S}_{{d}^{\text{'}},{i}^{\text{'}},{j}^{\text{'}}}^{k}$ is large, that is, if cells on other network layers associated with other disparity levels are already in excited states, the threshold level $T$ becomes relatively large. Thus, the cell on the network layer of $d$ tends to remain in a resting state, even if its surrounding cells on the same network layer are in excited states. This is what the uniqueness constraint indicates.

### 3.3. Reaction-diffusion system and coupled nonlinear elements

A reaction-diffusion system is described with a set of reaction-diffusion equations consisting of diffusion terms and reaction ones. Let us consider a two dimensional space ${\mathcal{L}}_{x}\times {\mathcal{L}}_{y}$, in which a process governed by the reaction-diffusion system proceeds with time $t$ and the process has two distributions of activator $u(x,y,t)$ and inhibitor $v(x,y,t)$ defined at a position $(x,y)\in {\mathcal{L}}_{x}\times {\mathcal{L}}_{y}$ and $t\ge 0$. The reaction-diffusion system has a general form of two reaction-diffusion equations with their reaction terms $f(u,v)$ and $g(u,v)$, as follows:

$$\begin{array}{ccc}\hfill {\partial}_{t}u& =& {D}_{u}{\nabla}^{2}u+f(u,v),\hfill \\ \hfill {\partial}_{t}v& =& {D}_{v}{\nabla}^{2}v+g(u,v),\hfill \end{array}$$ | () |

in which ${\partial}_{t}=\partial /\partial t$ and $\nabla $ is a two-dimensional gradient operator; ${D}_{u}$ and ${D}_{v}$ are diffusion coefficients. For example, the FitzHugh-Nagumo type reaction-diffusion equations have the following reaction terms [46], [47]:

f(u,v) = [ u(u-a)(1-u) - v ]/ε,

g(u,v) = u - bv, in which $a$ and $b$ are constants and $\epsilon $ is a small constant ($0<\epsilon \ll 1$).

In order to understand temporal behavior of the FitzHugh-Nagumo type reaction terms, let us consider a single element governed by the following ordinary differential equations:

du/dt = f(u,v) = [u(u-a)(1-u)-v]/ε,

dv/dt = g(u,v) = u-bv.

Equations (9) and (10) describe temporal changes of $u\left(t\right)$ and $v\left(t\right)$. Figure 2 shows the temporal changes as well as phase-portraits of Eqs. (9) and (10). Depending on the parameter settings of $a$ and $b$, the element of Eqs. (9;) and (10) exhibits three different types of behavior, such as a mono-stable element shown in Fig. 2(a), a bi-stable element shown in Fig. 2(b), and an oscillatory element shown in Fig. 2(c). The mono-stable element has one stable steady state, to which all solutions converge; the bi-stable element has two stable steady states and solutions converge to either of the two states. When a solution of an element is in areas having large values of $u>0.5$, we denote that the element is in an excited state; when a solution is in areas having small values of $\left|u\right|\simeq 0$, we denote that the element is in a resting state. In contrast to the mono-stable and bi-stable elements having stable steady states, an oscillatory element does not have any stable steady state, and autonomously alternates between an excited state and a resting one as time proceeds.

In addition to the reaction-diffusion system of diffusively coupled elements such as the FitzHugh-Nagumo type, there is also an interesting system consisting of discretely coupled elements; the former system is a continuous system and the latter one is a discrete system. For example, the following equations describe a discrete system with reaction terms of $f(\xb7,\xb7)$ and $g(\xb7,\xb7)$ in a two-dimensional grid system $(i,j)$ on ${\mathcal{L}}_{x}\times {\mathcal{L}}_{y}$, as follows:

$$\begin{array}{ccc}\hfill \frac{\text{d}{u}_{i,j}\left(t\right)}{\text{d}t}& =& {C}_{u}\left[\sum _{({i}^{\text{'}},{j}^{\text{'}})\in {\mathcal{B}}_{5}\setminus \left\{(0,0)\right\}}{u}_{i+{i}^{\text{'}},j+{j}^{\text{'}}}-4{u}_{i,j}\right]+f({u}_{i,j},{v}_{i,j}),\hfill \\ \hfill \frac{\text{d}{v}_{i,j}\left(t\right)}{\text{d}t}& =& {C}_{v}\left[\sum _{({i}^{\text{'}},{j}^{\text{'}})\in {\mathcal{B}}_{5}\setminus \left\{(0,0)\right\}}{v}_{i+{i}^{\text{'}},j+{j}^{\text{'}}}-4{v}_{i,j}\right]+g({u}_{i,j},{v}_{i,j}),\hfill \end{array}$$ | () |

in which ${C}_{u}$ and ${C}_{v}$ are positive coupling strength; ${\mathcal{B}}_{5}$ defines a local area consisting of a target position and its nearest neighboring four positions, as introduced in Section 3.2.. Equations (11) and (12) are similar to a spatially discretized version of the reaction-diffusion system of Eqs. (5) and (6); thus, both of the reaction-diffusion system and the system of discretely coupled elements have common mechanisms such as the nonlinear reaction denoted by $f(\xb7,\xb7)$ and $g(\xb7,\xb7)$ and spatial coupling of the nonlinear elements.

Let us confirm spatio-temporal patterns generated by the continuous system and the discrete one; both the systems consist of the FitzHugh-Nagumo type reaction terms of Eqs. (7) and (8). Figure 3 shows spatio-temporal representations of $u$ and $v$ and their one-dimensional snapshots obtained in one-dimensional continuous or discrete space. The continuous mono-stable system generates a traveling pulse as shown in Fig. 3(a), and the continuous bi-stable system generates a propagating wave as shown in Fig. 3(c); velocity of the pulse and the wave is almost constant, depending on the parameter settings of the system. The continuous oscillatory system generates multiple pulses that also travel in the space as shown in Fig. 3(e). In contrast to these,the discrete system with strong-inhibitory coupling (${C}_{u}\ll {C}_{v}$) generates stationary pulse(s) and a wave, as shown in Figs. 3(b), 3(d) and 3(f). For example, the discrete mono-stable system generates a pulse fixed at the edge position of a step-wise initial distribution ${u}_{i}(t=0)$. This result indicates that the discrete system is applicable to edge detection for its initial condition.

In the neural network approach, Wang and his coworkers proposed a segmentation algorithm (eg [56]). The algorithm consists of FitzHugh-Nagumo type oscillatory elements with complex spatial coupling rules and a global inhibitor that controls the oscillatory elements. In the algorithm, an oscillation period is automatically divided into the number of segments to be detected. During one oscillation period, one segment emerges in a part of the period and another segment emerges in another part of the period. The idea of utilizing the global inhibitor is similar to the inhibitory array of the algorithm proposed by Dev [26]. However, in contrast to the inhibitory array in the algorithm of Dev, the algorithm by Wang and his coworkers has the global inhibitor represented by one variable. The approach by Wang and his coworkers is also similar to our approach of reaction-diffusion algorithms. Although both approaches utilize FitzHugh-Nagumo type elements, our approach imposes the strong inhibitory coupling on the algorithms without any global inhibitor nor any complex spatial coupling rules. We would like to emphasize this point as the main difference between the two approaches.

## 4. Reaction-diffusion stereo algorithm

### 4.1. Original reaction-diffusion stereo algorithm

This section presents a stereo algorithm solving the stereo correspondence problem, by utilizing bi-stable reaction-diffusion systems. Let us recall that the bi-stable system with the FitzHugh-Nagumo type reaction terms generates a traveling wave as shown in Fig. 3(c). Areas in excited states extend outward, as their interface facing an excited state and a resting state propagates into areas in resting states. We apply the nature of the traveling wave to the continuity constraint imposed on the stereo algorithm. According to the original cooperative algorithm mentioned in Section 3.2., we consider multiple reaction-diffusion systems and associate each of the systems with a possible disparity level. Each system governs areas having the associated disparity level; if a point of the system enters an excited state and all of the other systems are in resting states at that point, the algorithm judges the point to have the associated disparity level. We consider a state of each reaction-diffusion system as a kind of possibility and thus the traveling wave works as the continuity constraint. If the reaction-diffusion systems are independent, the traveling wave fills in everywhere of the space ${\mathcal{L}}_{x}\times {\mathcal{L}}_{y}$. In order to partition a disparity map into segments having correct disparity levels, we need to link the multiple reaction-diffusion systems exclusively for the uniqueness constraint. This can be done via the parameter $a$; however $a$ is a constant in the original FitzHugh-Nagumo type reaction term of Eq. (7), we consider $a$ as a threshold level depending on states of other reaction-diffusion systems; the role of $a$ is similar to that of the threshold level $T$ of the cooperative algorithm described with Eq. (3).

The above consideration brings a set of reaction-diffusion equations associated with a disparity level $d\in \mathcal{D}$, as follows:

$$\begin{array}{ccc}\hfill {\partial}_{t}{u}_{d}& =& {D}_{u}{\nabla}^{2}{u}_{d}+\left[{u}_{d}({u}_{d}-{a}_{d})(1-{u}_{d})-{v}_{d}\right]/\epsilon +\mu {C}_{d}(x,y),\hfill \\ \hfill {\partial}_{t}{v}_{d}& =& {D}_{v}{\nabla}^{2}{v}_{d}+({u}_{d}-b{v}_{d}),\hfill \\ \hfill {a}_{d}& =& \alpha +\left[1+tanh\left({d}_{a}-\beta \right)\right]\times \underset{{d}^{\text{'}}\in \Theta}{max}{u}_{{d}^{\text{'}}}/2,\phantom{\rule{0.277778em}{0ex}}\phantom{\rule{0.277778em}{0ex}}\phantom{\rule{0.277778em}{0ex}}{d}_{a}=\left|d-\underset{{d}^{\text{'}}\in \Theta}{\text{argmax}}{u}_{{d}^{\text{'}}}\right|,\hfill \end{array}$$ | () |

in which ${C}_{d}(x,y)$ denotes an external stimulus and $\mu $ is its coefficient; the reaction-diffusion stereo algorithm utilizes a cross-correlation coefficient of Eq. (1) as ${C}_{d}(x,y)$; $\alpha $ and $\beta $ are constants.

After sufficient duration of time ${L}_{t}$, a disparity map is obtained by

Numerical implementation of the reaction-diffusion stereo algorithm requires discretization of Eqs. (13) and (14). Spatially and temporally discretized coordinate systems of $i=\lfloor x/\delta h\rfloor ,j=\lfloor y/\delta h\rfloor $ and $k=\lfloor t/\delta t\rfloor $, in which $\lfloor \xb7\rfloor $ is the floor function, derive the following set of equations approximately describing Eqs. (13) and (14), as follows:

$$\begin{array}{ccc}\hfill {u}_{d,i,j}^{k+1}-{C}_{u}\left[\sum _{({i}^{\text{'}},{j}^{\text{'}})\in {\mathcal{B}}_{5}\setminus \left\{(0,0)\right\}}{u}_{d,i+{i}^{\text{'}},j+{j}^{\text{'}}}^{k+1}-4{u}_{d,i,j}^{k+1}\right]& =& {u}_{d,i,j}^{k}+\delta tf({u}_{d,i,j}^{k},{v}_{d,i,j}^{k})+\delta t\mu {C}_{d,i,j},\hfill \\ \hfill {v}_{d,i,j}^{k+1}-{C}_{v}\left[\sum _{({i}^{\text{'}},{j}^{\text{'}})\in {\mathcal{B}}_{5}\setminus \left\{(0,0)\right\}}{v}_{d,i+{i}^{\text{'}},j+{j}^{\text{'}}}^{k+1}-4{v}_{d,i,j}^{k+1}\right]& =& {v}_{d,i,j}^{k}+\delta tg({u}_{d,i,j}^{k},{v}_{d,i,j}^{k}),\hfill \end{array}$$ | () |

in which ${C}_{u}=\delta t{D}_{u}/\delta {h}^{2}$ and ${C}_{v}=\delta t{D}_{v}/\delta {h}^{2}$. For example, the Gauss-Seidel method provides solution for a set of linear equations such as each of Eqs. (17) and (18). Algorithm 1 describes a pseudo-code designed for the reaction-diffusion stereo algorithm originally proposed in [14]. Later, Section 5.1. shows a simple demonstration of how reaction-diffusion systems work for the continuity constraint and the uniqueness one.

for all $d\in \mathcal{D}$

for all $(i,j)\in {\mathcal{L}}_{x}\times {\mathcal{L}}_{y}$

Compute ${C}_{d,i,j}$ from ${I}_{\text{L}}$ and ${I}_{\text{R}}$. with Eq. (1).

Set initial conditions of ${u}_{d,i,j}^{k=0}={v}_{d,i,j}^{k=0}=0$.

end for

end for

$k\leftarrow 0$

while $k<{L}_{t}/\delta t$

for all $d\in \mathcal{D}$

for all $(i,j)\in {\mathcal{L}}_{x}\times {\mathcal{L}}_{y}$

Compute ${d}_{a}$ and ${a}_{n}$ with Eq. (15)

Compute ${u}_{d,i,j}^{k+1}$ and ${v}_{d,i,j}^{k+1}$ with Eqs. (17) and (18)

end for

end for

for all $(i,j)\in {\mathcal{L}}_{x}\times {\mathcal{L}}_{y}$

Compute ${M}_{i,j}^{k}$. with Eq. (16)

end for

$k\leftarrow k+1$

end while

### 4.2. Integration of intensity edge information into the reaction-diffusion stereo algorithm

As mentioned above, the continuity constraint states that disparity distribution varies smoothly or neighboring points share the same disparity level on a disparity map. However, object boundaries causing depth discontinuity violate the continuity constraint. Thus, it is important to identify depth discontinuity areas from other areas satisfying the continuity constraint. If the depth discontinuity areas are preliminary known, the stereo correspondence problem becomes much easier. In general, the object boundaries or the depth discontinuity areas are unknown in advance of solving the stereo correspondence problem. Although intensity edge areas do not always correspond to depth discontinuity areas, some of them coincide with object boundaries. A practical idea to overcome the problem caused by the depth discontinuity is to utilize other visual cues such as image intensity. Previous psychological results show that the human visual system integrates several other cues into the depth perception [1], [2], [3]. Sun et al. also showed that the integration of image edge information into their belief propagation algorithm improves its performance [40]. Martin et al. proposed a contour detection algorithm [57], that can be applied to the stereo correspondence problem.

In order to prevent the continuity constraint from working across object boundaries, we consider integrating another cue of image intensity edge information into the reaction-diffusion stereo algorithm. The diffusion terms ${D}_{u}{\nabla}^{2}{u}_{d}$ and ${D}_{v}{\nabla}^{2}{v}_{d}$ in the stereo algorithm mainly control the continuity constraint. There are several choices of integrating the edge information into the algorithm; one of them is to weaken the excitatory diffusion coefficient ${D}_{u}$ in image intensity edge areas, and another one is to strengthen the inhibitory diffusion coefficient ${D}_{v}$ in the areas. This is because a larger value of ${D}_{u}$ drives wave propagation and increases its velocity, and a larger value of ${D}_{v}$ makes strong diffusion of an inhibitor distribution, and thus inhibits the propagation of the wave. Between the two choices, we take the latter one of modulating the inhibitory diffusion coefficient ${D}_{v}$ with intensity edge information. Thus, we obtain another algorithm by replacing Eq. (14) of the original reaction-diffusion algorithm with the following equation

in which the spatial distribution of ${D}_{v}(x,y)$ is obtained from combination of the intensity edge map denoted by ${\mathcal{M}}_{e}$ and an initial guess of a stereo disparity map $M(x,y,0)$. If an intensity edge exists at the point $(x,y)$, and simultaneously if the initial guess is almost uniform in a neighboring area around the point $(x,y)$, ${D}_{{v}_{max}}$ is given for ${D}_{v}(x,y)$, otherwise ${D}_{{v}_{min}}$ is given for ${D}_{v}(x,y)$, as follows:

$${D}_{v}(x,y)=\left\{\begin{array}{cc}{D}_{{v}_{max}}\hfill & \text{if}\phantom{\rule{4pt}{0ex}}(x,y)\in {\mathcal{M}}_{e}\phantom{\rule{0.277778em}{0ex}}\text{and}\phantom{\rule{0.277778em}{0ex}}\left|\nabla M(x,y,0)\right|<2\hfill \\ {D}_{{v}_{min}}\hfill & \text{otherwise}\phantom{\rule{4pt}{0ex}}\hfill \end{array}\right.,$$ | () |

in which ${D}_{{v}_{max}}>{D}_{{v}_{min}}$. In addition, ${D}_{v}(x,y)$ is diffused during a short period of time ${L}_{{d}_{t}}$ for its smoothness. The discrete system of spatially coupled FitzHugh-Nagumo elements provides the intensity edge map ${\mathcal{M}}_{e}$ [25] (recall that the discrete system can detect edges as shown in Section 3.3.). Figure 4 shows a processing diagram of the stereo algorithm.

### 4.3. Anisotropic inhibitory diffusion processes depending on pre-specified orientation

Inspired by the anisotropy observed in the human stereo depth perception, we try to introduce anisotropy into an inhibitory diffusion coefficient of the reaction-diffusion stereo algorithm. We pre-specify horizontal or vertical orientation ($\phi =0,\pi /2$ radian) as the anisotropy. According to the formulation proposed by Shoji et al. for modeling strip patterns self-organized on fish skins [58], the coefficient is spatially modulated with difference between $\phi $ and $\theta =arctan({\partial}_{y}{v}_{d}/{\partial}_{x}{v}_{d})$, in which ${\partial}_{x}=\partial /\partial x$ and ${\partial}_{y}=\partial /\partial y$; $\theta $ denotes a gradient direction of the inhibitor distribution. If $\theta $ coincides with $\phi $, the diffusion is strengthened at the direction; as the difference between $\theta $ and $\phi $ becomes larger, the diffusion is weakened. Thus, we obtain the anisotropic reaction-diffusion stereo algorithm by replacing Eq. (14) of the original algorithm with the following equation:

$$\begin{array}{ccc}\hfill {\partial}_{t}{v}_{d}& =& {D}_{v}\nabla \xb7\left[A\left(\theta \right)\nabla {v}_{d}\right]+({u}_{d}-b{v}_{d}),\hfill \\ \hfill A\left(\theta \right)& =& 1/\sqrt{1-\rho cos(2\theta -2\phi )},\hfill \end{array}$$ | () |

in which $\rho $ indicates strength of the anisotropy ($0\le \rho <1$); if $\rho =0$, Eq. (22) becomes $A\left(\theta \right)=1.0$, and thus Eq. (21) returns to the original isotropic Eq. (14). Expansion of the first term in the right side of Eq. (21) derives

$$\nabla \xb7\left[A\left(\theta \right)\nabla {v}_{d}\right]=\underset{\text{Horizontal}}{\underbrace{{\partial}_{x}A\left(\theta \right)\xb7{\partial}_{x}{v}_{d}}}+\underset{\text{Vertical}}{\underbrace{{\partial}_{y}A\left(\theta \right)\xb7{\partial}_{y}{v}_{d}}}+\underset{\text{Isotropic}}{\underbrace{A\left(\theta \right){\nabla}^{2}{v}_{d}}},$$ | () |

which consists of the horizontal and vertical advection terms, of which the velocity is $-\nabla A\left(\theta \right)$, in addition to the isotropic diffusion term. Thus, in accordance with the psychological hypothesis proposed by Rogers and Graham [4] and its evidence shown by Ichikawa [5], the anisotropic reaction-diffusion stereo algorithm has two processes of horizontal and vertical advection terms that propagate information of existence of its associated disparity level.

### 4.4. Cooperative algorithm revised with a reaction-diffusion equation

As described in Section 4.1., the original reaction-diffusion stereo algorithm satisfies the continuity constraint and the uniqueness one presented in the original cooperative algorithm. Let us focus on the FitzHugh-Nagumo type ordinary differential equations of Eqs. (7) and (8). Under the condition of a fixed $v=0$, the ordinary differential equation $\text{d}u/\text{d}t=f(u,0)=\left[u\right(u-a\left)\right(1-u\left)\right]/\epsilon $ behaves as a switching element having a threshold level $a$. This is because the solution $u\left(t\right)$ increases as time proceeds and finally converges to $u=1$ for the initial condition $u\left(0\right)>a$, and decreases and finally converges to $u=0$ for $u\left(0\right)<a$. Thus, the reaction term $f(u,0)=\left[u\right(u-a\left)\right(1-v\left)\right]/\epsilon $ can qualitatively replace the step function $\sigma (S,T)$ of Eq. (3), in which $S$ corresponds to $u$ and $T$ does to $a$. These correspondences between the step function $\sigma (S,T)$ and the behavior of the FitzHugh-Nagumo equations with the fixed $v=0$ bring a revised version of the cooperative algorithm proposed by Marr and Poggio, as follows:

$${\partial}_{t}{u}_{d}={D}_{u}{\nabla}^{2}{u}_{d}+{u}_{d}({u}_{d}-{a}_{d})(1-{u}_{d})/\epsilon +\mu {C}_{d}(x,y),$$ | () |

in which ${a}_{d}$ is obtained by Eq. (15), other parameters ${D}_{u},\epsilon ,\mu $ are constants, and ${C}_{d}(x,y)$ denotes a similarity measure. The diffusive coupling ${D}_{u}{\nabla}^{2}$ in Eq. (24) works for the continuity constraint described with the term ${\sum}_{\Omega}{s}_{{d}^{\text{'}},{i}^{\text{'}},{j}^{\text{'}}}^{k}$ in Eq. (3); the threshold level ${a}_{d}$ in Eq. (24) works for the uniqueness constraint described with $\xi {\sum}_{\Theta}{s}_{{d}^{\text{'}},{i}^{\text{'}},{j}^{\text{'}}}^{k}$ and the fixed threshold level $T$ in Eq. (3). By comparing the revised cooperative algorithm of Eq. (24) and the reaction-diffusion stereo algorithm of Eqs. (13) and (14), we can also understand that the original reaction-diffusion stereo algorithm is an extended version of the cooperative algorithm, and the main difference between the two algorithms is the existence of the inhibitory distribution ${v}_{d}$ described with Eq. (14). Section 5.2. demonstrates the effect of the inhibitory distribution.

## 5. Experimental results and discussion

### 5.1. Demonstration for a simple stereo image pair

Algorithm | Equations | Parameter settings |

COR5 | Eqs. (1), (2) | – |

M&P | Eqs. (24), (15) | – |

RDSA(org) | Eqs. (13), (14), (15), (16) | ${D}_{v}=3.0$ |

RDSA(edge) | Eqs. (13), (19), (20), (15), (16) | ${D}_{{v}_{max}}=15.0,{D}_{{v}_{min}}=0.5,{L}_{{d}_{t}}=10.0$ |

RDSA(aniso-H) | 245mmEqs. (13), (21), (22), (15), (16) | ${D}_{v}=2.0,\phi =0,\rho =0.9$ |

RDSA(aniso-V) | ${D}_{v}=2.0,\phi =\pi /2,\rho =0.9$ |

This section presents how the original reaction-diffusion stereo algorithm works for a simple situation, in which there are two possible disparity levels $d=0,1$ pixel. Let us suppose that a distribution ${C}_{d,i,j}$ obtained by a similarity measure has high values at three points and zero at other points; the two of the three points have the high values of ${C}_{d}$ at the disparity level $d=0$ pixel, and the reminder has the high value in the disparity level $d=1$ pixel; the three points located at different positions. Figure 5 shows snap shots of ${u}_{0}$ and ${u}_{1}$ obtained by the stereo algorithm at two different time instances $t=1.0,4.0$. During the early period of the algorithm, for example, at the time instance $t=1.0$, waves originated from the three points began to extend outward as shown in Figs. 5(a-1)$\sim $5(a-4). After that, on the distribution of ${u}_{0}$ the two circular waves collided and became one, which occupied a large continuous area as shown in Figs. 5(b-1) and 5(b-2). In contrast to this, on the distribution of ${u}_{1}$ the wave originated from the point in ${u}_{1}$ exclusively collided with the waves originated from the two points on ${u}_{0}$. As the result, the wave on ${u}_{1}$ did not extend beyond the collision boundary. The former result shown in Figs. 5(b-1) and 5(b-2) denotes that the continuity constraint indeed works, and the latter result shown in Figs. 5(b-3) and 5(b-4) denotes that the uniqueness constraint does.

### 5.2. Results for Middlebury stereo image pairs

The Middlebury stereo vision page [18], [19] provides four stereo image pairs named TSUKUBA, VENUS, TEDDY and CONES as well as their ground truth data of stereo disparity maps, and definitions of performance evaluation areas such as nonocclusion areas (nonocc.), all areas (all) and depth discontinuity areas (disc.). The page also provides ranking tables of stereo algorithms with respect to bad-match-percentage error measure, in addition to an evaluation system for submitted stereo disparity maps. The bad-match-percentage error measure BMP${}_{T}$ evaluates an obtained stereo disparity map ${M}_{i,j}$ with the ground truth disparity map ${G}_{i,j}$, as follows:

$${\text{BMP}}_{T}=\frac{1}{\left|\mathcal{E}\right|}\sum _{(i,j)\in \mathcal{E}}\sigma \left(|{M}_{i,j}-{G}_{i,j}|,T\right)\times 100\phantom{\rule{0.277778em}{0ex}}(\%),$$ | () |

in which $\mathcal{E}\in \{\text{nonocc.},\text{all},\text{disc.}\}$ and $\left|\mathcal{E}\right|$ denotes the number of pixels in the area $\mathcal{E}$; $\sigma (\xb7,T)$ denotes the step function and $T$ (pixel) denotes a threshold level for judgment of a bad-match pixel or a correct one; we chose $T=0.5,1.0$ pixel in this section. In addition to the error measure, this section also utilized the root-mean-square error measure denoted by RMS and defined by

$$\text{RMS}=\sqrt{\frac{1}{\left|\mathcal{E}\right|}{\sum}_{(i,j)\in \mathcal{E}}{\left({M}_{i,j}-{G}_{i,j}\right)}^{2}}\phantom{\rule{0.277778em}{0ex}}\phantom{\rule{0.277778em}{0ex}}\phantom{\rule{0.277778em}{0ex}}\text{(pixel)}.$$ | () |

We applied the algorithms presented in this chapter to the four stereo image pairs and evaluated obtained disparity maps with the error measures of BMP${}_{1.0}$, BMP${}_{0.5}$ and RMS. Table 1 summarizes the algorithms utilized here; COR5 provides a stereo disparity map with the only similarity measure, M&P is the cooperative algorithm revised with a reaction-diffusion equation, RDSAs are the reaction-diffusion stereo algorithms, in which RDSA(org) is the original algorithm, RDSA(edge) is the algorithm integrating intensity edge information, and RDSA(aniso-H) and RDSA(aniso-V) are the algorithms with anisotropic diffusion processes depending on the pre-specified orientation. We fixed their parameter settings across the four stereo image pairs except for the image size of ${\mathcal{L}}_{x}\times {\mathcal{L}}_{y}$ and the possible disparity levels $\mathcal{D}$, which were provided as the Middlebury stereo vision page indicates.

Table ▭ shows evaluation results and average ranks of the algorithms, and Figs. ▭ and ▭ show stereo disparity maps and their absolute error distributions. From these results, we can state the following. On each of the algorithms: RDSAs, average ranks evaluated with BMP${}_{0.5}$ were better than those with BMP${}_{1.0}$. The algorithms RDSAs were better than M&P in most cases excluding for TSUKUBA and including average ranks. Among RDSAs, average ranks were almost similar; although there was no distinguishable difference on performance among RDSAs, RDSA(edge) was slightly better than RDSA(org) and RDSA(aniso-V), and RDSA(edge) indicated similar error measures with RDSA(aniso-H) according to the sum of average ranks obtained with BMP${}_{1.0}$ and BMP${}_{0.5}$. When focusing on the results for VENUS, in particular, the results for the depth discontinuity areas (disc.), RDSA(edge) was quite effective in comparison with other RDSAs. This can be also easily confirmed with the disparity maps and their error distributions shown in Fig. ▭(b). In most cases, the results with COR5 denoted that the error measures evaluated in the areas of disc. were less than those evaluated in the areas of nonocc. and all, except for the results of CONES with BMP${}_{1.0}$ and BMP${}_{0.5}$. This implies that the depth discontinuity areas bring rather reliable disparity information than other nonocclusion areas do in COR5; Fig. ▭ supports this implication. We consider that this is because there is rather rich information of image intensity distribution around depth discontinuity areas. In contrast to this, the results for M&P and RDSAs indicated the worst error measures for the depth discontinuity areas.

Figure ▭ shows dependence of error measures on the inhibitory diffusion coefficient ${D}_{v}$ in RDSA(org). Let us focus on the dependence confirmed for the depth discontinuity areas (disc.). For all of the four image pairs TSUKUBA, VENUS, TEDDY and CONES, the error measures achieved the least values at certain positive values ${D}_{v}>0$; for example, for the image pair TSUKUBA, the error measure decreased as ${D}_{v}$ increased and achieved the least value at ${D}_{v}=3.0$; for other image pairs, they indicated similar trends.

Figure ▭ shows a representative example in which RDSA(edge) works in comparison to other algorithms. The example shows a small rectangular area capturing a coffee cup and chopsticks in the left image of CONES, and stereo disparity maps obtained in the area. An edge detection result shown in Fig. ▭(a-4) has almost completely detected object boundaries of the chopsticks. Although the result by RDSA(edge) was not perfect as shown in Fig. ▭(d), it was better than the results by the other algorithms of RDSA(org) and RDSA(aniso-V). Since there exists depth discontinuity along the chopsticks standing almost vertically, the algorithm RDSA(aniso-H) causing strong inhibitory diffusion in horizontal orientation worked better than RDSA(aniso-V) and provided a result similar to that of RDSA(edge).

Figure ▭ shows temporal changes of error measures evaluated during processes of stereo disparity detection in the stereo algorithms RDSA(org), RDSA(aniso-H) and RDSA(aniso-V). In the early period less than $t=10$, the error measures dynamically changed; after that, they changed slowly and achieved almost constant at $t=100$. From these results, we can roughly state that the algorithms converged.

### 5.3. Discussion

Each of the reaction-diffusion systems utilized in the original reaction-diffusion stereo algorithm has diffusion terms ${D}_{u}{\nabla}^{2}{u}_{d}$ and ${D}_{v}{\nabla}^{2}{v}_{d}$ on excitation and inhibition. The dependence of the inhibitory diffusion coefficient ${D}_{v}$ on performance shows that the strong inhibitory diffusion compared with the weak excitatory one improves performance in depth discontinuity areas. In addition, the reaction-diffusion stereo algorithm having the inhibitory distribution works better than the cooperative one revised with a single reaction-diffusion equation having only excitatory distribution. These results indicate the importance of the strong inhibitory diffusion, that is, the long-range inhibition. If turning our attention to an edge detection algorithm proposed by Marr and Hildreth [59], we can find that the algorithm has a long-range inhibition and a short-range excitation. Thus, we hypothesize that the long-range inhibition is an important factor and underlies visual functions including edge detection and stereo disparity detection.

In low curvature areas found along straight lines of depth discontinuity areas (see Fig. ▭), we can not expect the effect of the strong-inhibitory diffusion. In addition, previous psychological results implied that the human depth perception integrates several visual cues [1], [2], [3]. From these two reasons, we designed the reaction-diffusion stereo algorithm integrating intensity edge information so as to work in the depth discontinuity areas, if the intensity edge areas coincide with the depth discontinuity areas. Figure ▭ showed an example of a successful situation, in which the integration of intensity edge information compensated for difficulties in the depth discontinuity areas. For more performance improvement, it is necessary to integrate not an edge map, but a contour map defining object boundaries into the algorithm. The study by Martin et al. [57] should be helpful for obtaining a contour map.

Image processing and computer vision algorithms utilizing diffusion equations require estimating stopping time [60]; this is called the termination problem [56]. Reaction-diffusion algorithms with a nonlinear reaction such as the FitzHugh-Nagumo type do not require solving the termination problem, if they can spend sufficient duration of time for their processing, as shown in Figs. ▭(a)$\sim $(d).

Between the two algorithms RDSA(aniso-H) and RDSA(aniso-V) of which diffusion coefficients depend on pre-specified orientation, we could not confirm distinguishable difference on their convergence trends, as shown in Figs. ▭(e)$\sim $(h). According to previous results of psychological experiments [4], [5], the human stereo depth perception indicates anisotropy on latency. In order to confirm the correctness of the idea introducing anisotropic diffusion processes, we need to apply the algorithms to random-dot stereograms utilized in the psychological experiments and confirm their convergence. At this moment, our experimental results are insufficient to mention the correctness of the idea.

There are two main future research topics. One of them is intended for the occlusion problem. As shown in Fig. ▭, the results obtained for TEDDY and CONES indicate that occluded areas have much error or large deviation from the true disparity level. In order to solve the occlusion problem, we believe that the idea of bi-directional matching is helpful also in the reaction-diffusion stereo algorithm. That is, by feeding the result of the bi-directional matching to the iteration process of the reaction-diffusion systems, we build a dynamical system with feedback and try to solve the occlusion problem. This feedback system may furthermore bring a hint for building a strong fusion model proposed as the human depth perception integrating several visual cues. The other future research topic is accuracy improvement for slanted surfaces. The reaction-diffusion stereo algorithms presented here imposed the continuity constraint on each reaction-diffusion system associated with a disparity level; the continuity constraint does not work across more than two disparity levels. Thus, for slanted surfaces, we need to impose the continuity constraint on the surfaces. We believe that the imposition of the constraint on the slanted surfaces is possible. In addition to the accuracy improvement, the imposition may bring a hint for our understanding the anisotropy of the human stereo depth perception.

## 6. Conclusion

This chapter presented a reaction-diffusion stereo algorithm [14] solving the stereo correspondence problem with multiple reaction-diffusion systems. The algorithm converts the stereo correspondence problem into a segmentation problem with respect to stereo disparity levels. Each of the reaction-diffusion systems is associated with a particular disparity level. A set of FitzHugh-Nagumo type reaction-diffusion equations describes each of the reaction-diffusion systems. Depending on the parameter settings of the set, it behaves as a mono-stable system, a bi-stable system and an oscillatory system. In order to govern existence or non-existence of a stereo disparity level associated with a reaction-diffusion system, the algorithm utilizes bi-stable reaction-diffusion systems. The bi-stable systems have two stable steady states, of which the excited state denotes the existence, and of which the resting state denotes the non-existence. Since the bi-stable systems have the nature of driving wave propagation, they extend their associated areas with the nature and fill-in undefined areas of stereo disparity. This filling-in process works as the continuity constraint presented by Marr and Poggio. For the uniqueness constraint assuming only one disparity level at a particular point, the algorithm exclusively linked the multiple reaction-diffusion systems with mutual inhibition mechanism. In addition to these constraints, the reaction-diffusion algorithm has the self-inhibition mechanism caused by rapid inhibitory diffusion. The mechanism contributes to preserving detailed depth structure or high curvature areas such as areas of corner points in a disparity map.

Besides the original reaction-diffusion stereo algorithm, this chapter presented two additional stereo algorithms with anisotropic diffusion processes. One of the algorithms integrated the intensity edge information, which is provided by another edge detection algorithm designed with discretely coupled nonlinear elements [25]; the stereo algorithm was inspired by the human depth perception integrating several visual cues. The other algorithm introduced anisotropic diffusion processes depending on pre-specified horizontal or vertical orientation. The human visual system differently perceives a horizontally slanted surface and a vertically slanted one; this anisotropy inspired the stereo algorithm with anisotropic diffusion processes depending on pre-specified orientation.

The experimental section demonstrated performance of the reaction-diffusion stereo algorithms as well as a cooperative algorithm revised with a reaction-diffusion equation. Experimental results showed that the reaction-diffusion stereo algorithms perform better than the cooperative algorithm, and the original reaction-diffusion stereo algorithm works better with larger inhibitory diffusion coefficients especially in depth discontinuity areas. Thus, we reached the conclusion that the strong inhibitory diffusion or the strong inhibitory coupling is an important condition for the algorithms, and the strong inhibitory coupling underlies visual functions; a previous edge detection algorithm [59] and biological evidence in vision [52] also support the conclusion. Integrating image intensity edge information is effective, if intensity edge areas coincide with depth discontinuity areas. This also implies that the algorithm achieves better, if object contour information is available; a future research topic exists in how to obtain the reliable contour information. For the algorithms with anisotropic diffusion processes depending on horizontal or vertical orientation, experimental results demonstrated convergence of the algorithms for four stereo image pairs. However, there was no distinguishable difference on the convergence between the two algorithms. We need to confirm the convergence by applying the algorithms to random-dot stereograms utilized in psychological experiments. In addition to these future topics, development of the depth perception model integrating several visual cues is also an interesting topic, for which the previous strong and weak fusion models may provide hints.

## Acknowledgment

The present study was supported in part by the Grant-in-Aid for Scientific Research from the Japan Society for the Promotion of Science (No. 20500206, No. 23500278).