Open access peer-reviewed chapter

Perception and Reality in Stereo Vision: Technological Applications

By Humberto Rosas

Submitted: November 18th 2010. Reviewed: April 26th 2011. Published: July 19th 2011.

DOI: 10.5772/22380

1. Introduction

In stereo vision, the eyes capture two different views of a three-dimensional object. The retinal images are fused in the brain in such a way that their disparities (or parallaxes) are transformed into depth perception, yielding a three-dimensional representation of the object in the observer’s mind. The process of transforming parallax into depth perception is not entirely understood. The main question concerns the quantitative connection between these two variables, which creates a significant difference between the real object and the perceived image, that is, between reality and perception. The theme of depth perception has been approached from several points of view.

First, there is what could be called the geometric approach, because its methodology deals with the relationships used in geometric optics for generating images. On this basis, several formulations were proposed for determining the vertical exaggeration perceived when aerial photographs are viewed stereoscopically (Aschenbrenner, 1952; Collins, 1981; Stone, 1951; Goodale, 1953; La Prade, 1972; 1973; 1978; Miller, 1960; Raasveldt, 1956; Yacoumelos, 1972; Yacoumelos, 1973). In the end, none of these formulations has proved sufficiently reliable.

Another approach to depth perception in stereo vision is the psychological one. Unlike in the geometric approach, observations are performed under conditions of natural vision (Norman et al., 1996; Rucker, 1977; Wagner, 1985). This methodology has the value of permitting the observer to make direct estimations of depth perception, so there is no risk of confusing perceptual lengths with real lengths. However, psychological observations on depth perception remain qualitative rather than quantitative.

A third approach to the study of depth perception is the physiological one. Here, attention centers on how the brain measures disparity from two retinal images. Qian, 1997, emphasizes the need for a better understanding of brain function and argues that any computational model must be based on real physiological data. In turn, Backus, 2000; Backus et al., 2001; and Chandrasekaran et al., 2007, recognize that the neural encoding of depth continues to be a mysterious subject. However, Mars, 1982, holds that, under certain mathematical assumptions, physiological details are not necessary for understanding the visual perception process.

Within the above framework of investigation, Rosas et al., 2010, have proposed a different approach to the stereo vision phenomenon. It could be called the psychophysical approach because it establishes a mathematical connection between physical object and mental image, in response to some conceptual inconsistencies pointed out by Rosas, 1986.

The present chapter will make special reference to mathematical relationships derived from the psychophysical approach, by taking into consideration that they may lead to innovations in the design of stereo viewing instruments. These possibilities for technological developments are described in the last part of this chapter.

2. Monocular vision

According to geometric optics, plane images are captured in the retinas and transmitted to the brain, where they are projected outward to generate a mental representation of the object in space, or perceptual image. In monocular vision, the retinal image provides the brain with an exact representation of the object’s shape in two dimensions. As to object distance, the brain lacks enough geometric information to obtain telemetric data. Despite that, different types of pictorial cues, such as perspective, light and shade, and logical judgments about the size of familiar objects, allow the brain to make inferences concerning distance. In the absence of those cues, it becomes impossible for the brain to choose a specific location for the object in space, as shown in Fig. 1.

Figure 1.

Perception of a plane object in monocular vision. Geometric data do not provide enough information to define the object’s location in space. Occasionally, some spatial cues might permit the observer to make reasonable inferences about distance.

3. Binocular vision

In binocular vision, a three-dimensional image is obtained from two plane retinal images. In this case, depth perception is caused by the disparity (or parallax) between the two retinal images. Experience shows that the perceived image normally does not match the object’s shape but appears deformed in depth, as illustrated in Fig. 2. The belief that we “see” the real world has led to erroneous conclusions, particularly those derived from thinking that our mental perceptions are generated by the intersection of optic rays. Though this methodology is valid for real space, it shows inconsistencies regarding perceptual space.

Regarding the perceptual space, there is a complex debate about whether it is Euclidean or not. Wagner, 1985, and Norman et al., 1996, propose the theory of a non-Euclidean space. They distinguish between the intrinsic structure of the perceptual space, which is Euclidean, and its extrinsic structure, associated with the relationship between physical and perceived space, which is non-Euclidean. Rucker, 1977, suggests that the perceived space may be elliptic, or positively curved.

Figure 2.

Perception of an object in binocular vision. The three-dimensional object (green) appears deformed in depth in the perceived image (blue).

Indeed, it is not easy to reach sound conclusions about perceptual space before establishing its precise relationship with real space. The Cartesian formulation of Rosas et al., 2010, leads to the conclusion that the perceptual space is Euclidean. A possible source of confusion is the scales of the perceptual space: its planar dimensions are linear, while its depth dimension is logarithmic.

4. Psychophysical nature of depth perception

A recent attempt to relate real space to perceptual space was made by Rosas et al., 2010. They developed an equation that connects real variables with perceptual ones. The whole analysis rests on the premise that parallax is the only information on depth available for the brain to build a three-dimensional model from two plane retinal images, so that parallax becomes the measure of depth perception. Other monocular and binocular cues thought to influence depth perception, such as ordinal configural cues, perspective, occlusion, blur, vergence and accommodation (Burge et al., 2005; Landy et al., 1995; Mon-Williams et al., 2000; Allison et al., 2003), were not taken into consideration in the mathematical analysis.

In the formulations, an apostrophe placed after the symbols representing perceptual variables distinguishes them from the real ones. Experience shows that a perceptual depth interval ∆D’ increases with its corresponding parallax ∆P. This suggests a direct proportionality between ∆D’ and ∆P. That is,

$$\Delta D' = K\,\Delta P$$

where K is a constant of proportionality. The above equation directly connects a perceptual magnitude (∆D’) with a real one (∆P). Through infinitesimal analysis, Rosas et al., 2010, arrived at the following equation:

$$D' = K\,b\,\log D$$

where D’ is the perceptual viewing distance, K is a constant characteristic of stereoscopic vision, b is the eye base, or inter-pupillary distance, and D is the real viewing distance.

Finding the numerical value of K may look operationally easy. The problem, however, is that perceptual magnitudes such as D’ cannot be measured physically, as magnitudes in real space can. To overcome this difficulty, stereo drawings of pyramids were constructed and compared geometrically with one another in order to make reasonable estimations of depth perception. By applying this method, the value of K, for lengths given in centimeters, was found to be 4.2. This value allows perceptual magnitudes to be solved in terms of real values. The final equation is then:

$$D' = 4.2\,b\,\log D \tag{1}$$

In psychophysical terms, D’ can be considered a visual sensation caused by the visual stimulus D. It is therefore curious, though not surprising, that the above equation coincides with the psychophysical law proposed by Fechner, 1889, which states that the sensation response increases in proportion to the logarithm of the stimulus intensity. That is:

$$R = k\,\log I \tag{2}$$

where R is the sensation response, I is the stimulus intensity, and k is a constant characteristic of each sensory mode, such as the sensation of light intensity, sound, smell, and weight. The analogy between Eq. (1) and Eq. (2) would confirm the extension of Fechner’s law to the stereo vision sensory mode, and would indicate the psychophysical nature of depth perception. In Eq. (1), the value 4.2 becomes the psychophysical constant for stereo vision (K).
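As an illustration of Eq. (1), the following minimal Python sketch evaluates the perceptual distance for a few real distances. It assumes a decimal logarithm and lengths in centimeters, which is consistent with the numerical examples given later in the chapter; the function name and the default eye base of 6.5 cm are illustrative choices rather than part of the original formulation.

```python
import math

K_STEREO = 4.2  # psychophysical constant reported in the text (lengths in cm)


def perceived_distance(real_distance_cm, eye_base_cm=6.5):
    """Perceptual viewing distance D' = K * b * log10(D), after Eq. (1).

    Assumption: decimal logarithm and centimeters; valid only for D > 0.
    """
    if real_distance_cm <= 0:
        raise ValueError("real distance must be positive")
    return K_STEREO * eye_base_cm * math.log10(real_distance_cm)


# Each tenfold increase in real distance adds the same perceptual increment,
# which is the logarithmic (Fechner-type) behaviour described above.
for d_cm in (10, 100, 1000, 10000):
    print(d_cm, round(perceived_distance(d_cm), 2))
```

Because the logarithm acts on a length, the result depends on the chosen unit; the constant 4.2 is stated only for centimeters, so other units should be converted first.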

5. Depth exaggeration

The expression “depth exaggeration” is equivalent to “vertical exaggeration”, widely used to designate the ratio of vertical scale to horizontal scale of the perceived object when aerial photographs are viewed through a stereoscope. Under the optical specifications of common stereoscopes, and given the geometry used in acquiring aerial photographs, the terrain appears vertically exaggerated, which is why the expression “vertical exaggeration” was adopted to refer to this increase in vertical scale. However, an increase in vertical scale is not a general rule. For this reason, for cases other than aerial photographs, the expression “depth exaggeration” is preferred in the present chapter.

Rosas et al., 2007a, call attention to the need to differentiate the vertical exaggeration of the three-dimensional model generated geometrically by the intersection of optic rays (geometric exaggeration) from the vertical exaggeration of the image perceived in the observer’s mind (perceptual exaggeration). The “vertical exaggeration” usually explained in books normally refers to the geometric exaggeration (Gupta, 2003; Pandey, S., 1987). Its determination is a geometric procedure well known in photogrammetry. By contrast, the perceptual exaggeration, the central theme of this chapter, continues to be a controversial subject.

In fact, depth exaggeration measures the deformation an object shows in its third dimension when it is viewed stereoscopically, and therefore it will be a point of reference concerning stereo viewing instruments.

5.1. Depth exaggeration in natural stereo vision

Practically all of the mathematical formulations proposed for calculating depth exaggeration are derived from observations made under artificial stereo vision, by viewing aerial photographs with the aid of a stereoscope. This route has not led to significant conclusions regarding natural stereo vision.

Observations made under natural stereo vision have been approached from a psychological point of view. Regarding depth exaggeration, the psychological approach has arrived at the conclusion that the perceived space is increasingly compressed in depth at farther viewing distances (Norman et al., 1996; Wagner, 1985). In other words, depth exaggeration decreases with viewing distance. These observations are entirely valid, although they remain at a qualitative level, lacking numerical results.

Figure 3.

Curve showing the variation of vertical exaggeration (or depth exaggeration) relative to viewing distance, in natural stereo vision, for an eye base of 6.5 cm. (Rosas et al, 2010).

A mathematical expression proposed by Rosas et al, 2010, provides elements for quantifying the phenomenon of depth exaggeration in natural stereo vision. The corresponding equation for depth exaggeration is:

$$E' = 4.2\,\frac{b}{D}\,\log D \tag{3}$$

where E’ is the depth exaggeration, 4.2 is the psychophysical constant K for stereo vision, b is the eye base, and D is the viewing distance. The graph of this equation is shown in Fig. 3.

An exercise was carried out to determine the viewing distance at which a three-dimensional object is viewed with no deformation (E’ = 1) by a person with b = 6.5 cm (a reasonable average for humans). Substituting these values into Eq. (3) gives the following expression:

$$1 = 4.2\,\frac{6.5}{D}\,\log D$$

Hence,

$$D = 45\ \text{cm}$$

Thus, 45 cm is the distance at which a three-dimensional object is viewed in its true shape. Rosas et al., 2010, called this distance the “realistic viewing distance”. It is remarkable that 45 cm is the distance at which humans handle most of their manual tools.
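The realistic viewing distance can be checked numerically. The sketch below solves E’ = 1 in Eq. (3) by bisection, assuming a decimal logarithm and lengths in centimeters. The function names and the search bracket of 10 cm to 1000 cm are assumptions made for illustration; the bracket deliberately excludes the very short distances at which E’ increases with D.

```python
import math

K_STEREO = 4.2  # psychophysical constant from the text (lengths in cm)


def natural_exaggeration(distance_cm, eye_base_cm=6.5):
    """Depth exaggeration E' = K * (b / D) * log10(D), after Eq. (3)."""
    return K_STEREO * eye_base_cm / distance_cm * math.log10(distance_cm)


def realistic_viewing_distance(eye_base_cm=6.5, lo=10.0, hi=1000.0, tol=1e-6):
    """Find D with E'(D) = 1 by bisection on [lo, hi].

    The bracket is an assumption: it skips the very short distances where
    E' rises with D, so E' decreases monotonically across the interval.
    """
    f_lo = natural_exaggeration(lo, eye_base_cm) - 1.0
    f_hi = natural_exaggeration(hi, eye_base_cm) - 1.0
    if f_lo * f_hi > 0:
        raise ValueError("E' = 1 is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if natural_exaggeration(mid, eye_base_cm) > 1.0:
            lo = mid  # still exaggerated: the root lies farther away
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(round(realistic_viewing_distance(), 1))
```

With the default eye base of 6.5 cm, the routine converges to a value close to the 45 cm quoted above; other eye bases can be passed to see how the realistic viewing distance shifts.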

The graph in Fig. 3 shows that vertical (or depth) exaggeration decreases with viewing distance, except at extremely short distances, where the convergence of the eye axes is abnormal. In the same figure, the correspondence between E’ = 1 and D = 45 cm is indicated graphically.

It is important to make clear that depth exaggeration is a local feature, defined for a specific distance on the basis of infinitesimal depth intervals. In the graph of Fig. 3, E’ decreases rapidly with viewing distance (D), to the point that, for a distance of 100 m, E’ = 0.00546, a value that may look negligible, and even more so if distances of kilometers are considered. However, this does not mean that depth becomes imperceptible, since at long viewing distances the observer is still able to perceive large differences in depth. Though small depth intervals may be imperceptible, large depth intervals remain quite significant within the whole field of vision.

In qualitative terms, the curve of Fig. 3 agrees with the psychological observations made by Norman et al., 1996, and Wagner, 1985, in the sense that perceptual space is progressively compressed as viewing distance increases.

5.2. Depth exaggeration in artificial stereo vision

Artificial stereo vision takes place when the observer does not see a three-dimensional object directly but through a pair of plane images. An example is the observation of aerial photographs with a stereoscope. In this case, the equation of depth exaggeration (E’) takes the following form:

$$E' = 4.2\,\frac{B}{H}\,\log D \tag{5}$$

where 4.2 is the psychophysical constant K for stereo vision, B is the camera baseline, H is the camera distance (or flight height in the case of aerial photographs), and D is the viewing distance. When lenses are used, the viewing distance corresponds to the focal length of the ocular lenses. Another form of Eq. (5) is then:

$$E' = 4.2\,\frac{B}{H}\,\log f \tag{6}$$

The ratio B/H can also be expressed as a function of the convergence angle of the camera axes (α), according to the following equation:

$$\frac{B}{H} = 2\tan\frac{\alpha}{2}, \quad \text{hence} \quad E' = 4.2 \times 2\tan\frac{\alpha}{2}\,\log f \tag{7}$$

In defining the characteristics of some stereoscopic instruments, such as microscopes, the use of α is preferred.
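The two equivalent forms of the artificial-vision equation can be compared with a short Python sketch. It assumes a decimal logarithm and a focal length in centimeters, in line with the constant 4.2 being stated for centimeters; the function names and the sample values of B, H and f are hypothetical and serve only to show that the B/H form and the α form give the same result.

```python
import math

K_STEREO = 4.2  # psychophysical constant from the text (lengths in cm)


def exaggeration_from_base(camera_base, camera_distance, focal_length_cm):
    """E' = K * (B / H) * log10(f), after Eq. (6).

    B and H only need to share a unit; f is assumed to be in centimeters.
    """
    return K_STEREO * (camera_base / camera_distance) * math.log10(focal_length_cm)


def exaggeration_from_angle(alpha_rad, focal_length_cm):
    """E' = K * 2 * tan(alpha / 2) * log10(f), after Eq. (7)."""
    return K_STEREO * 2.0 * math.tan(alpha_rad / 2.0) * math.log10(focal_length_cm)


def convergence_angle(camera_base, camera_distance):
    """Convergence angle alpha (radians) such that B / H = 2 * tan(alpha / 2)."""
    return 2.0 * math.atan(camera_base / (2.0 * camera_distance))


# Hypothetical values, chosen only to show that both forms agree.
B, H, f = 60.0, 100.0, 25.0
alpha = convergence_angle(B, H)
print(exaggeration_from_base(B, H, f), exaggeration_from_angle(alpha, f))
```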

Before going into details concerning technological applications, it is worthwhile to point out the ubiquitous character of the psychophysical constant (K) that, for lengths given in centimeters, becomes equal to 4.2.

6. Technological applications

Technological applications refer to the design of instruments for obtaining a given degree of depth exaggeration, so that an object can be perceived as elongated or flattened as desired, or even with no deformation. Examples of these instruments are microscopes, telescopes, photo interpretation devices, simulators and, in general, stereoscopic media where perception is critical. Furthermore, taking into account that depth exaggeration varies with viewing distance according to a logarithmic function (Fig. 3), stereoscopic instruments can be designed to recreate artificially the perception of an object or scene as viewed at a given distance in natural vision. In viewing instruments based on depth perception, the degree of precision is limited by the sensitivity of the visual system.

In contrast to the above-mentioned instruments, there are the photogrammetric instruments that work upon the basis of measurements performed directly on real objects, virtually with any desired degree of precision. These instruments do not include depth perception data in their calculations, and therefore they are not a subject of the present chapter.

6.1. Interpretation of aerial photographs

In the observation of aerial photographs through a stereoscope, three stereo models come into play. One of them is the real model, represented by the terrain being photographed. A second one is the geometric model, yielded by the intersection of optic rays according to both the geometry under which the photographs were taken and the geometry under which they are viewed. A third stereo model is the one perceived by the observer, or perceptual model (Fig. 4).

A common experience in the interpretation of aerial photographs is that the real model is perceived as vertically exaggerated. That is why the term “vertical exaggeration” was coined to refer to such a deformation in depth. However, this effect, which is practically a rule in viewing aerial photographs, does not apply to other cases of stereoscopy. For example, in natural vision the rule is just the opposite: the model, rather than being vertically exaggerated, usually appears flattened.

Figure 4.

Interpretation of aerial photographs with the aid of a mirror stereoscope. a) Taking overlapping photographs from two camera positions. b) Viewing aerial photographs under a stereoscope. In the whole process, three stereo models come into play: the real model (R) being photographed, the geometric model (G) yielded by the intersection of optical rays, and the perceptual model (P) represented in the mind of the observer. Variables H, B, and f, influencing E’, are indicated in red.

The vertical exaggeration of the perceptual model relative to the real one is given by Eq. 6, that is:

$$E' = 4.2\,\frac{B}{H}\,\log f \tag{6}$$

where B is the camera base, H is the camera distance (or flight height), and f is the focal length of the stereoscope’s ocular lenses. Note that the eye base of the observer does not influence E’.

The technological implication of this equation is that it connects a perceptual variable (E’) with the instrument variables H, B and f. It therefore permits real values to be converted into perceptual ones and vice versa. For example, in photo interpretation, the interpreter can rapidly calculate real topographic magnitudes, such as dips and slopes, as a function of the values perceived on the relief. The procedure consists of dividing the perceived values by E’.
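The conversion from perceived to real values can be sketched in Python as follows. The text does not say which quantity is divided by E’ in the case of a dip; the sketch assumes it is the tangent of the dip angle (the slope), which is the usual convention for vertical exaggeration. The function names and the survey values (B/H = 0.6, f = 25 cm) are hypothetical.

```python
import math

K_STEREO = 4.2  # psychophysical constant from the text (lengths in cm)


def perceptual_exaggeration(camera_base, flight_height, focal_length_cm):
    """E' = K * (B / H) * log10(f), after Eq. (6); f assumed in centimeters."""
    return K_STEREO * (camera_base / flight_height) * math.log10(focal_length_cm)


def real_dip_deg(perceived_dip_deg, exaggeration):
    """Convert a perceived dip to a real dip.

    Assumption: the slope (tangent of the dip) is the value divided by E'.
    """
    perceived_slope = math.tan(math.radians(perceived_dip_deg))
    return math.degrees(math.atan(perceived_slope / exaggeration))


# Hypothetical survey: B/H = 0.6 and a stereoscope focal length of 25 cm.
e_prime = perceptual_exaggeration(0.6, 1.0, 25.0)
print(round(e_prime, 2), round(real_dip_deg(45.0, e_prime), 1))
```

Dividing the slope rather than the angle keeps the correction consistent with E’ being a ratio of vertical to horizontal scale; for E’ = 1 the perceived and real dips coincide.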

6.2. Microscopes and telescopes

Both microscopes and telescopes function under a similar optical principle, involving two main phases: 1) capture of images through the objective lenses, and 2) observation of them through the oculars. The following equation, resulting from combining Eqs. (6) and (7), permits the degree of depth exaggeration to be calculated as a function of objective and ocular features.

$$E' = 2\tan\frac{\alpha}{2} \times 4.2\,\log f \tag{8}$$

where E’ is the depth exaggeration of the instrument, α is the convergence angle of the objective lenses, and f is the focal length of the ocular lenses. As can be seen, E’ depends on the convergence of the objective lenses (α) and on the focal length of the oculars (f). The focal length of the objectives influences the magnification factor of the instrument but does not affect E’. Fig. 5 shows a microscope where the independent variables α and f are indicated in red.

Figure 5.

Stereo microscope. Variables influencing vertical exaggeration are shown in red.

The technological conclusion is that Eq. (8) allows the variables of microscopes (and telescopes) to be interrelated so that the observer perceives an object with any desired depth exaggeration, from flattened to elongated, including no deformation when E’ = 1.
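As a design aid, Eq. (8) can be solved for the ocular focal length that yields a chosen depth exaggeration at a given convergence angle. The Python sketch below is a minimal illustration, assuming a decimal logarithm and a focal length in centimeters; the function name and the 12° sample angle are hypothetical.

```python
import math

K_STEREO = 4.2  # psychophysical constant from the text (lengths in cm)


def ocular_focal_length_cm(alpha_deg, target_exaggeration=1.0):
    """Solve Eq. (8) for f given the convergence angle and a desired E'.

    Rearranged: log10(f) = E' / (K * 2 * tan(alpha / 2)).
    """
    if not 0.0 < alpha_deg < 180.0:
        raise ValueError("convergence angle must lie between 0 and 180 degrees")
    denom = K_STEREO * 2.0 * math.tan(math.radians(alpha_deg) / 2.0)
    return 10.0 ** (target_exaggeration / denom)


# Hypothetical instrument with a 12 degree convergence angle.
print(round(ocular_focal_length_cm(12.0), 2))
```

Since f enters Eq. (8) through its logarithm, the required focal length grows very quickly as α becomes small, so in practice the convergence angle is likely to be the more convenient variable to adjust.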

6.3. Stereo simulators

As explained before, in natural binocular vision, vertical exaggeration decreases with viewing distance according to a logarithmic function. The mathematical relationships permit a scene perceived in natural vision to be recreated artificially by means of stereo images suitably obtained and viewed.

Stereo viewing instruments for recreating reality might be applicable when objects are viewed through photographic images taken at a distance, for example by a robot, and need to be perceived as if the observer were actually located at a desired viewing distance. Another application is the production of videos that recreate the perception of a large area of land as viewed from an aircraft, for use in flight simulators.

In addition, Eq. (3) for natural vision shows that the distance at which objects are viewed with no deformation in depth is around 45 cm, referred to as the “realistic viewing distance”. It is significant that this is the viewing distance used by humans in handling most of their familiar tools. On this basis, simulators could allow the operator to perceive objects as if located at the optimal viewing distance for handling tools. An example of this application is the type of instrument used in telesurgery. Both the stereo camera and the stereo visor can be adjusted so that the surgeon perceives the area of intervention as if he were located at the realistic viewing distance, while the surgical tools are handled robotically.

6.3.1. Flight simulators

Flight simulators are an example of instruments used for artificially recreating a scene as viewed in natural vision. The same principle could also be applied to simulating the operation of other types of vehicles and craft.

In the case of an aircraft, a landing maneuver can be simulated by a three-dimensional video that recreates the perception of the terrain as it is viewed in natural vision. The procedure comprises two stages: 1) stereoscopic recording of video images, and 2) viewing the video images in a stereoscopic visor. In a landing maneuver there will be a prevailing viewing distance D on which the pilot concentrates his attention (for instance, 500 meters), which has to be established according to experience, type of aircraft, and other particular circumstances. Points located at the established viewing distance will be focused in the center of the visual field, which in this case becomes considerably large.

The depth exaggeration (E’N) perceived in natural stereo vision, derived from Eq. (3), is:

$$E'_N = 4.2\,\frac{b}{D}\,\log D \tag{9}$$

where b is the observer’s eye base and D is the established viewing distance. On the other hand, the depth exaggeration (E’I) perceived instrumentally in the visor is given by Eq. (6):

$$E'_I = 4.2\,\frac{B}{D}\,\log f \tag{10}$$

where B is the camera base and f is the focal length of the ocular lenses. The established viewing distance and the camera distance are the same, represented by D in both Eq. (9) and Eq. (10).

To match natural vision with artificial vision, the key point is to equate the depth exaggeration perceived in natural conditions with the depth exaggeration perceived artificially in the visor. Therefore,

$$E'_N = E'_I, \quad \text{hence} \quad b\,\log D = B\,\log f \tag{11}$$

where b is the eye base, B is the camera base, f is the focal length of the ocular lenses, and D is the camera distance. Variable f applies when lenses are used in the visor. If a three-dimensional screen is used, f corresponds to the observer-screen distance (d). Hence

$$b\,\log D = B\,\log d \tag{12}$$

Figure 6.

Idealized flight simulator during a landing maneuver. a) Registering images in a stereo camera. b) Viewing stereo images in a three-dimensional screen. Variables B, D and d, influencing E’, are indicated in red.

The technological conclusion is that, to produce a video that allows the observer to reproduce artificially the perception of a scene as if he were actually located at a given distance, the variables b, D, B and f (or d) must satisfy Eq. (11) or Eq. (12).
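The matching condition of Eq. (12) can be turned into a small design calculation. The Python sketch below solves it for the camera base B, assuming a decimal logarithm and lengths in centimeters. The 500 m viewing distance is the example mentioned in the text; the function name and the 60 cm observer-screen distance are hypothetical.

```python
import math


def camera_base_cm(eye_base_cm, viewing_distance_cm, screen_distance_cm):
    """Camera base B satisfying b * log10(D) = B * log10(d), after Eq. (12).

    Assumption: all lengths in centimeters and decimal logarithms.
    """
    if viewing_distance_cm <= 0:
        raise ValueError("viewing distance must be positive")
    if screen_distance_cm <= 1.0:
        raise ValueError("log10(d) must be positive, so d must exceed 1 cm")
    return (eye_base_cm * math.log10(viewing_distance_cm)
            / math.log10(screen_distance_cm))


# D = 500 m is the example from the text; d = 60 cm is a hypothetical value.
B = camera_base_cm(6.5, 50_000.0, 60.0)
print(round(B, 2))
```

The same function covers Eq. (11) if the focal length f of the ocular lenses is passed in place of the screen distance d.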

6.3.2. Robotic tools

Robotic tools are instruments designed for manipulating tools by means of robotic hands that are controlled by visual perception. Examples are those used in telesurgery. In this instance, the surgeon sees the surgical tools through a pair of video images (Fig. 7).

According to Eq. (6), when an object is viewed through stereo images, the depth exaggeration (E’) takes the following expression:

$$E' = 4.2\,\frac{B}{H}\,\log f$$

or, as a function of α according to Eq. (7),

$$E' = 4.2 \times 2\tan\frac{\alpha}{2}\,\log f$$

where α is the convergence angle of the camera axes, and f is the focal length of the ocular lenses. For a surgeon to perceive the field of intervention in optimal conditions, as viewed under natural vision, E’ must equal one. Then,

$$1 = 4.2 \times 2\tan\frac{\alpha}{2}\,\log f \tag{13}$$

The technological conclusion regarding telesurgery is that viewing instruments can be arranged so that the surgeon perceives his operating field with no deformation, as viewed in natural vision at a distance of about 45 cm. To that end, the variables α (or its equivalent in terms of B and H) and f must satisfy Eq. (13).
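Eq. (13) can be rearranged to give the camera geometry for a visor of known focal length. The Python sketch below returns both the convergence angle α and the equivalent camera base B for a given camera distance H, using B/H = 2 tan(α/2). It assumes a decimal logarithm and f in centimeters; the function name and the sample values (H = 10 cm, f = 10 cm) are hypothetical.

```python
import math

K_STEREO = 4.2  # psychophysical constant from the text (lengths in cm)


def undistorted_camera_setup(camera_distance_cm, ocular_focal_length_cm):
    """Return (alpha_deg, camera_base_cm) giving E' = 1 under Eq. (13).

    Rearranged: B / H = 2 * tan(alpha / 2) = 1 / (K * log10(f)).
    """
    if ocular_focal_length_cm <= 1.0:
        raise ValueError("f must exceed 1 cm so that log10(f) is positive")
    ratio = 1.0 / (K_STEREO * math.log10(ocular_focal_length_cm))  # B / H
    alpha_deg = math.degrees(2.0 * math.atan(ratio / 2.0))
    return alpha_deg, ratio * camera_distance_cm


# Hypothetical stereo camera 10 cm from the operating field, 10 cm oculars.
alpha_deg, base_cm = undistorted_camera_setup(10.0, 10.0)
print(round(alpha_deg, 2), round(base_cm, 2))
```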

Figure 7.

Two phases of the vision process used in telesurgery. a) Video images recorded by a stereo camera when the surgeon performs a laparoscopic prostatectomy. b) The surgeon observes the stereo images through a visor that allows him to perceive the three-dimensional effect. Variables α and f influencing depth exaggeration are indicated in red. http://i.ytimg.com/vi/ChO9CUwr_2Y/0.jpg

7. Conclusion

In stereo vision, retinal disparities are mentally converted into depth perception, in a way that the real depth magnitudes are not always reproduced accurately in the perceived image. As a result, reality is perceived deformed in depth. Experiments have shown the metric correlation between real and perceptual depth to follow a logarithmic function that happened to coincide with the Psychophysical Law of Fechner, 1889, connecting stimulus with sensation. Indeed, stimulus is associated with reality while sensation is related to perception.

The above considerations have implications concerning the geometric nature of the perceptual space. The Cartesian connection between reality and perception leads to the conclusion that the perceptual space is Euclidean, its third dimension being logarithmic, while its plane dimensions remain linear.

The fact that real lengths can be expressed mathematically in terms of perceptual lengths opens possibilities for technological developments, particularly in the design of stereo viewing instruments such as stereoscopes, microscopes, telescopes and stereoscopic viewers.

It is important to emphasize that the technological applications mentioned above refer to instruments in which perception is critical. This is not the case of photogrammetric instruments dealing with the external reality, where three-dimensional perception becomes only an aid for recognizing objects in space, rather than for measuring them. In general, we can say that photogrammetric instruments (analog and digital) use fundamentally linear functions, whereas perception instruments work upon the basis of logarithmic functions.

Acknowledgments

The author expresses his gratitude to Drs. Henry Villegas and Orlando Navas, from the Colombian Institute of Geology and Mining (INGEOMINAS) Bogotá, Colombia, for their scientific and logistic support, including the final revision of the manuscript. The Colombian Society of Geology assumed the publishing costs. The mathematical formulations were verified instrumentally in the laboratories of INGEOMINAS.

© 2011 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike-3.0 License, which permits use, distribution and reproduction for non-commercial purposes, provided the original is properly cited and derivative works building on this content are distributed under the same license.

How to cite and reference

Humberto Rosas (July 19th 2011). Perception and Reality in Stereo Vision: Technological Applications, Advances in Stereo Vision, Jose R.A. Torreao, IntechOpen, DOI: 10.5772/22380.
