## 1. Introduction

Environment mapping is considered an essential skill for a mobile robot to actually achieve autonomy [1]. Robotic mapping can be defined as the process of acquiring a spatial model of the environment through sensory information. The environment map allows mobile robots to interact coherently with objects and people in the environment. With a map, the robot can navigate safely, identify surrounding objects and deal flexibly with unexpected situations. Without a map, important operations become complex, such as determining the position of objects in the robot's surroundings and defining the path to be followed. These issues show how important it is that the mapping task be performed correctly, since the acquisition of inaccurate maps can lead to errors when inferring the robot's position, resulting in an imperfect execution of these operations. There is therefore a mutual dependence between inferring the exact localization of the robot and building an accurate map.

Much research in robotic mapping has proposed ways to represent a mapped environment, all of it concerned with dealing with high-dimensional environments. The work of Agelopoulu et al. [2] provides a good overview of articles published in the very early period of mapping research and reports the several different ways of representing the environment, focusing on the main features of each approach. Thrun [1] classifies the ways of representing a mapped environment into two main categories: metric and topological representations. Metric representations store geometric properties of the environment, whereas topological representations describe connectivity between different places. Among the metric representations, the occupancy grid stands out by providing a relatively accurate reproduction of the mapped environment.

Robots can be used to map indoor or outdoor environments, depending on the type of task they support. Lee et al. [3], for example, use a mobile robot equipped with sonar to build a feature-based map. The robot finds points, lines and circles in the environment by processing the information provided by the sonar. Dealing with outdoor environments, Guivant et al. [4] implemented a mobile vehicle to map an environment typical of a farm. The map built by the vehicle was composed of typical objects of the environment, such as trees and stakes. However, most works on robotic mapping do not discuss a general alternative able to map both indoor and outdoor environments. The difficulty lies in detecting characteristics inherent to all kinds of environments. It is well known that indoor environments are more structured, so that the vast majority of them share common cues, for example lines, nooks and corners. Outdoor terrain mapping depends on the objects that stand out in the terrain (traffic signs, buildings, trees, and rocks, among others).

Several types of sensors can be used to carry out the mapping. The most common are sonars, lasers and cameras. Sonar is attractive because of its low cost: besides being relatively inexpensive, sonars are easily found on the market. However, sonar presents significant inaccuracies in its measurements. Lasers are highly accurate and allow the acquisition of detailed maps, but they are unattractive because of their high cost. Cameras are becoming cheaper every day and make it possible to acquire a large amount of information about the robot's surroundings. For these reasons, cameras are playing an increasingly important role among the sensors used for robotic mapping.

This chapter proposes the mapping of indoor and/or outdoor environments through a low-cost stereo camera system, with a metric representation of the environment in a probabilistic occupancy grid. With the stereo vision system, the robot can collect information from different places with different types of obstacles, and it does not depend on the type of environment in which it is located or on the type of features found there. The mapping algorithm considers a probabilistic model for the vision system used by the robot, as well as for the movements it performs. With this, the 3D probabilistic occupancy grid for robotic mapping with stereo vision generates maps consistent with the information obtained by the robot. To attest to the feasibility of the proposal, preliminary experiments are presented.

The chapter is structured as follows: Section 2 presents the problem of environment mapping with robots and introduces occupancy grid mapping. Section 3 contextualizes the same problem with the use of visual sensors (cameras). Section 4 presents the proposed mapping with stereo vision and occupancy grid representation. Section 5 exposes the preliminary results achieved with the mapping algorithm. Finally, Section 6 brings the concluding remarks, pointing to the next steps toward the full implementation of the system.

## 2. Robotic mapping

To better formalize the problem of robotic mapping, we need to establish some basic assumptions (or restrictions). The first is that the robot has proprioceptive and exteroceptive sensors that allow it to collect data about itself (position and orientation) and about the environment. The second is that the perception system must always provide exact knowledge of the position and orientation of the robot relative to some fixed global coordinate system or frame of reference.

With these assumptions, we can define robotic mapping as the problem of building a spatial model of the environment in the robot's computational system, based on data provided by the perception system. Figure ▭ illustrates the evolution of the map-building process. In this illustration, $x$ represents the robot pose (position and orientation), $z$ represents the sensory measurements collected over time, and $m$ represents the map being built iteratively at each time interval.

The hypothesis that the perception system has exact knowledge of the robot's position and orientation relative to some global reference is not always true. In most cases, to get the exact position and orientation of the robot inside its environment, it is necessary to have a map. Here a conflict arises: to get a consistent map, exact knowledge of the robot's position and orientation is required, and for this an environment map is necessary. Note that there is a dependency between localization (inferring the position and orientation of the robot in the environment) and mapping. To resolve this conflict, some researchers extend the mapping problem to a simultaneous localization and mapping problem, or simply SLAM [5].

From this point on, we assume as true the hypothesis that the robot's position and orientation are somehow known. We therefore specifically approach the mapping problem with known position and orientation, as proposed by Thrun [6]. It should be noted that, even under this assumption, the following challenges need to be overcome [1]: sensory inaccuracies, dimensionality of the environment, data matching, dynamism of the environment and exploration strategy. Each of these challenges is detailed next.

### 2.1. Sensory uncertainties

The sensors used for robotic mapping are influenced by several sources of noise that cause errors in their measurements. An important issue in mapping environments is how to deal with the uncertainties and inaccuracies found in the data provided by these sensors. Inaccuracies can be caused by many factors intrinsic to the type of device used. In addition, there are also errors inherent to the robot's movements within its environment, which can be caused by systematic factors, such as uncertainties in parameters of the robot's kinematic model (for example, differences in wheel size, erroneously measured axes), and/or by non-systematic factors, such as uncertainties arising from unexpected situations (for example, collision with unexpected objects, wheel slippage) [7].

The challenge depends on the type of noise in the measurements; in other words, modeling the robotic mapping problem would be relatively easy if the noise in different measurements were statistically independent. However, there is a statistical dependency, because the errors inherent to the robot's movements accumulate over time and affect how the sensory measurements are interpreted. To overcome this challenge, the error sources are generally modeled or approximated by stochastic functions, allowing the sensory data to be handled properly during the mapping process. When these factors are disregarded, we are likely to witness the construction of inconsistent and inaccurate maps.

### 2.2. Dimensionality

Another challenge in the mapping problem refers to the dimensionality of the environment to be mapped. Imagine a typically simple environment, such as a house with rooms, corridors and a kitchen. If only a topological description of the compartments is considered, a few variables are enough to describe this environment. However, when the goal is to get maps rich in detail in two (2D) or three (3D) dimensions, the complexity of estimating the map is larger and, in addition, more space is needed to store its representation in computer memory.

Furthermore, one cannot forget that the larger and more complex the environment, the higher the probability of some kind of error in the robot's movements, which can degrade the quality of the built map.

### 2.3. Matching

During mapping, the same object or obstacle in the environment is normally perceived several times by the robot, at different moments. It is therefore desirable that such an object be identified as already mapped and distinguished from those not yet mapped. This problem is known as matching or data association.

The lack of data association in mapping can generate inconsistencies and divergences in the maps. An effective matching scheme must distinguish between spurious measurements, new measurements and lost measurements, together with a basic function to associate the available map with the new measurements. A survey of the techniques most used to solve the matching problem can be found in the work of Wijesoma et al. [8], which presents the general idea of the solutions and their respective original references.

A method widely used in matching is the Nearest Neighbor (NN) [9]. This method associates a map point with the nearest observation inside a validation region, based on some distance, usually the Mahalanobis distance. This method is attractive because it is easy to implement; however, it can give poor results. Other, more robust methods for solving the matching problem can be found in the literature, for example the Joint Probabilistic Data Association (JPDA) [10], Joint Compatibility Branch and Bound (JCBB) [11], Multiple Hypothesis Tracking (MHT) [12] and the "lazy" data association method (a variant of MHT) [13].
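As an illustration of NN gating, the following minimal Python sketch (with hypothetical 2D landmarks and covariances, not taken from the chapter) associates an observation with the closest map point by Mahalanobis distance inside a chi-square validation gate:

```python
import numpy as np

def mahalanobis2(x, mean, cov):
    """Squared Mahalanobis distance between an observation and a map point."""
    d = x - mean
    return float(d @ np.linalg.inv(cov) @ d)

def nearest_neighbor(observation, landmarks, covariances, gate=9.21):
    """Associate one observation with the closest landmark inside the
    validation gate (chi-square threshold, e.g. 9.21 for 2 DOF at 99%).
    Returns the landmark index, or None if no candidate passes the gate."""
    best_idx, best_d2 = None, gate
    for i, (mean, cov) in enumerate(zip(landmarks, covariances)):
        d2 = mahalanobis2(observation, mean, cov)
        if d2 < best_d2:
            best_idx, best_d2 = i, d2
    return best_idx

landmarks = [np.array([2.0, 1.0]), np.array([5.0, 4.0])]
covs = [np.eye(2) * 0.1, np.eye(2) * 0.1]
print(nearest_neighbor(np.array([2.1, 0.9]), landmarks, covs))  # → 0
```

The gate is what keeps spurious measurements from being forced onto the map: an observation far from every landmark yields `None` instead of a wrong association.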

### 2.4. Dynamism of the environment

Another challenge arises when the interest is in mapping dynamic, or non-stationary, environments, such as offices where people move around constantly and factories where objects are transported to different places. In these cases, the robotic system may consider such changes as inconsistent measurements taken at a given time. Consider the case where, at a given moment, a robot maps a table in its environment. Then, for some reason, the table is moved to another location at a later time, and this change, which should also be reflected in the robot's map, is not. When the robot later passes through the place where the table previously was, it may seem that the robot is in a new, not yet mapped location, because the object that should be detected is not there.

The vast majority of mapping algorithms consider that the process runs in static environments, which has left the mapping of dynamic environments largely unexplored. However, in recent years, several proposals have emerged to solve this problem. For example, Biswas et al. [14] propose an occupancy grid mapping algorithm called Robot Object Mapping Algorithm (ROMA), able to model non-stationary environments. The approach assumes that objects move slowly enough that the environment can be considered static over the time it takes to build an occupancy grid map. The proposed algorithm is able to identify such moving objects, learn models of them and determine their location at any time. It also estimates the total number of distinct objects in the environment, making the approach applicable to situations where not all non-stationary objects are visible at all times. Changes in the environment are detected by a simple map-differencing technique.

Wolf and colleagues [15] propose an approach also based on map differencing; however, the differencing is made between an occupancy grid map of the static parts and an occupancy grid map of the dynamic parts. The proposed algorithm is able to detect dynamic objects in the environment and represent them on a map, even when they are outside the robot's perceptual field. In a recent work, Baig and collaborators [16] propose the detection of dynamic objects during the mapping of outdoor environments through the Detection And Tracking of Moving Objects (DATMO) method, from a vehicle equipped with laser and odometry.

### 2.5. Exploration strategy

During mapping, the robotic system must choose certain paths to be followed. Given that the robot has some knowledge about the world, the central question is: where should it move to get new information? This is handled by the exploration strategy, which determines the displacements the robot should perform to cover the whole environment. At the end of the process, we have a map of the visited environment produced from the information provided by the robot's sensors.

The choice and implementation of a good exploration strategy emerges as a fifth challenge for the mapping problem. Exploration assumes that the robot's position and orientation are precisely known and focuses on efficiently guiding the robot throughout the environment to build the full map [17]. For this, the exploration strategy should consider a model of the partially built environment and, from it, plan the movement action to be performed. Stachniss [17] describes the exploration strategy as a combination of mapping and planning.

It is important to consider the efficiency of the exploration strategy, to avoid unnecessary expenditure of time and energy. In addition, the robot must be able to deal with unexpected situations that may arise while exploring and building the map. A classic exploration technique is proposed by Yamauchi [18], based on the concept of frontiers (frontier-based exploration): regions forming boundaries between free and unexplored spaces. When the robot moves toward a new frontier, it can cover unexplored spaces and add new information to its map. As a result, the mapped territory expands, pushing back the boundary between known and unknown regions. Moving to successive frontiers, the robot can constantly increase its knowledge of the world. Silva Júnior [19] presents the idea of exploration based on boundary value problems. Stachniss [17] presents a summary of several exploration techniques, comparing them with his proposal based on coverage maps. In this strategy, the robot's exploration actions are selected to reduce the uncertainties that affect the mapping.
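Frontier detection itself reduces to a simple scan of the occupancy grid. The sketch below is a minimal, assumed implementation of Yamauchi's frontier definition (the cell labels are hypothetical): it marks every free cell that touches at least one unknown 4-neighbor.

```python
import numpy as np

FREE, OCC, UNKNOWN = 0, 1, -1  # assumed cell labels for this sketch

def frontier_cells(grid):
    """Return the frontier cells: free cells with at least one unknown
    4-neighbor, following Yamauchi's frontier definition."""
    rows, cols = grid.shape
    frontiers = []
    for r in range(rows):
        for c in range(cols):
            if grid[r, c] != FREE:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr, nc] == UNKNOWN:
                    frontiers.append((r, c))
                    break
    return frontiers

grid = np.array([[0, 0, -1],
                 [0, 1, -1],
                 [0, 0,  0]])
print(frontier_cells(grid))  # → [(0, 1), (2, 2)]
```

A planner would then cluster these cells into frontier regions and drive the robot toward the nearest (or most informative) one.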

### 2.6. Representation types

There are two main approaches to represent an environment using a mobile robot: representation based on topological maps and representation using metric maps [1]. However, some authors prefer to extend this classification with another paradigm, called feature map representation [20] [21]. Since this category stores metric information about the notable objects (features) of the robot's environment, it is treated here as a subset of metric maps. The main peculiarities of these paradigms are presented next.

#### 2.6.1. Topological maps

Topological maps are computationally represented by graphs, which describe an arrangement of nodes (or vertices) connected by edges (links or arcs). Typically, the graphs describe the free spaces available for performing tasks. The nodes correspond to regions that have homogeneous sensory information, and the arcs reflect the connections between these regions. Intuitively, using a graph to describe an environment is a good idea, because a graph is a compact structure to store in the memory of the robot's computational system and can be used to model structures or enumerate processes, such as the representation of cities connected by roads, the connections of a printed circuit board, and others. The main problem of this representation is the lack of a standard defining which structures are associated with the vertices and which relationships are described as links.

Moreover, robot localization using topological maps is abstract; that is, there is no way to define the robot's position and orientation explicitly. However, it is possible to state in which graph node, or in which region of the environment, the robot is.

The lack of standardization of which elements are considered nodes and edges of the graphs is easily seen. For example, Kuipers and colleagues [22] use map nodes to represent places, characterized by sensory data, while the edges represent paths between places, characterized by control strategies. Thrun [23] uses a topological map obtained from a probabilistic occupancy grid partitioned into regions (nodes) separated by narrow passages (edges). In a recent work [24], the definitions of nodes and edges come from a topology extraction method based on the Generalized Voronoi Diagram, which skeletonizes the images provided by a laser to produce appropriate topological information.

#### 2.6.2. Metric maps

The metric representation, or metric map, reproduces, with a certain degree of accuracy, the geometry of the environment in which the robot is inserted. Objects such as walls, obstacles and passages are easily identified in this approach, since the map maintains a good topographic relationship with the real world. Metric maps are represented by occupancy grids and feature maps.

• *Feature-based maps*

Feature maps store information about important elements found at specific locations of the environment (features). A feature can be understood as "something" easily noticeable in the environment, such as corners and edges. In images, special properties of some parts (such as a circle, a line, or a region with a particular texture) are usually identified. Lee and colleagues [3] use a robot equipped with a sonar belt to build a feature map with lines, points and circular objects. These features are distinguished by processing the sonar readings.

This does not prevent other objects, such as doors, lamps, trees, buildings, towers, etc., from also being used as notable features to be stored in a map. In the work of Guivant [4], for example, a mobile vehicle builds the map of a typical outdoor farm environment, identifying trees as features to be stored in the map.

This type of map usually stores geometric information about the chosen and detected features, such as Cartesian or polar coordinates of the features relative to some fixed reference. Santana and Medeiros [7] use floor lines of indoor environments as interesting features to be detected. The map was built with the polar coordinates of the lines obtained from the images by the Hough transform. A great advantage of this type of map is its compact representation compared to occupancy grid representations. However, it also has disadvantages, the main one being the dependence on a pre-defined procedure to detect and extract the environment features [17].

• *Occupancy Grid*

The occupancy grid representation was initially proposed in 1987 and formalized in 1989 by Alberto Elfes [25][26]. In this representation, the continuous space of the environment is discretized, so that the environment is represented as a grid or multi-dimensional matrix (2D or 3D). Each element of the matrix, also called a cell, represents an environment location that can be occupied, empty or unexplored.

The occupancy grid representation was initially proposed to map environments in a 2D grid; however, after Elfes [25], Moravec [27] expanded this representation to a 3D discrete configuration, also innovating in the type of sensor used for building the map. Unlike Elfes, who employed sonar in his experiments, Moravec used a stereo vision system to build a 3D map. In addition, Moravec proposed a new approach to indicate the possibility of cell occupancy based on the Dempster-Shafer theory of evidence, differing from the Bayesian probabilistic approach proposed by Elfes. The 3D grid presented was called an evidence grid, and each cell stores a value that indicates the evidence of occupation.

Another grid representation approach was presented in the work of Oriolo et al. [28]. In this work, the authors show that it is possible to formulate and solve perception and planning problems under uncertainty using fuzzy set theory. The map built by this technique is defined as a fuzzy set in which each point is associated with a real number that quantifies the possibility that the point belongs to an obstacle. The main advantage is the possibility of using several types of fuzzy operators to model uncertainty and aggregate information from multiple sources. Ribo and Pinz [29] present a comparative study of the three approaches mentioned, showing their models for the range sensors and examining the treatment of the uncertainties present in the measurements. However, the cited work is restricted to experiments with sonar-equipped robots in typical office environments.

Other variations of these approaches can be seen in the literature. For example, Konolige [30] proposes a mathematical refinement of the mapping method presented by Elfes [25], named MUltiple Representation Independent Evidence Log (MURIEL), aiming to control problems intrinsic to sonar, such as multiple reflections and redundant readings. Borenstein and Koren [31] present a method based on histograms, whose main goal is to reduce the computational cost intrinsic to occupancy-grid-based representations. Another variation can be found in the work of Yguel et al. [32], which addresses the issues of representing and storing large maps and, for this, proposes a form of occupancy grid representation based on wavelets: Wavelet Occupancy Grids.

Some changes to the standard algorithm presented by Elfes [25] have been proposed to improve the quality of the built maps, considering the computational cost of processing and storage [13], uncertainty treatment [33] and sensor modeling [34]. There are several recent examples in the literature attesting to the usefulness of the occupancy grid representation in SLAM [35] [36], with applications such as surveillance [30] [37], exploration [38], rescue [39], and other tasks.

## 3. Occupancy grid mapping

The model based on occupancy grids proposed by Elfes [26] is one of the most used metric approaches. The environment is represented as a regular grid of cells, where the value of each cell encodes its state, which can be free, occupied, or not mapped (undefined). The occupancy value of a cell is determined using a probabilistic approach whose input is the estimated distances to objects, calculated from the data given by the sensors. Through a Bayesian approach, it is possible to update the cell values every time a measurement is performed. Note that only a subset of the whole grid is updated each time. The resulting model can be used directly as a map of the environment in navigation tasks such as path planning, obstacle avoidance, and pose estimation. Figure ▭ illustrates the representation of a depth sensor measurement in a 2D occupancy grid: grey cells have unknown occupancy values, white cells are free and black cells are occupied. The main advantages of this method are that it is easy to construct and can be made as accurate as necessary.

The construction of a map based on an occupancy grid involves determining the occupancy probability of each cell. Let us assume that the occupancy probabilities of neighboring cells are decoupled, that is, the probability that a cell is occupied does not affect the occupancy probability estimates of its neighboring cells. Although this is not an ideal assumption, particularly for cells representing the same object, it makes the implementation of the mapping algorithm easier without a substantial increase in the measured error. With this assumption, the occupancy probability of a map $M$ can be factored into the product of the occupancy probabilities of each cell ${m}_{n}$ individually, according to Equation ▭.
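As a numeric illustration of this independence assumption (the cell values below are hypothetical), the probability of a whole map configuration is simply the product of the per-cell terms:

```python
import numpy as np

# Under the cell-independence assumption, the probability of a map
# configuration is the product of the individual cell probabilities.
cell_probs = np.array([0.9, 0.8, 0.1])    # p(m_n occupied) for three cells
occupied = np.array([True, True, False])  # the configuration being evaluated

# occupied cells contribute p, free cells contribute (1 - p)
p_map = np.prod(np.where(occupied, cell_probs, 1.0 - cell_probs))
print(round(p_map, 3))  # → 0.648
```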

The construction of the map $M$ depends on the history of robot localizations ${x}_{1:t}$ and on the sensor readings ${z}_{1:t}$ performed at each localization. Updating the map is a process of repeating these readings, which can be performed from other locations with other orientations. In practice, only the cells currently inside the field of view of the sensors are updated. The occupancy value of a cell ${m}_{n}$ is calculated by Equation ▭,

with

$${l}_{t,n}={l}_{t-1,n}+\log\frac{p({m}_{n}\mid{x}_{t},{z}_{t})}{1-p({m}_{n}\mid{x}_{t},{z}_{t})}-\log\frac{p({m}_{n})}{1-p({m}_{n})}$$

where $p({m}_{n})$ is the prior occupancy value of the cell ${m}_{n}$, before any measurement, which can be attributed according to the density of obstacles in the environment. The probability $p({m}_{n}\mid{x}_{t},{z}_{t})$ specifies the occupancy probability of cell ${m}_{n}$ conditioned on the sensor reading ${z}_{t}$ at time $t$; it depends on the sensor used and is named the inverse sensor model. More details of the standard occupancy grid algorithm can be found in the work of Thrun [34].
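A minimal sketch of this log-odds update for a single cell follows; the inverse-model values used (0.7 for an "occupied" reading, 0.3 for a "free" reading) are illustrative, not taken from the chapter:

```python
import numpy as np

def logit(p):
    """Log-odds of a probability."""
    return np.log(p / (1.0 - p))

def update_cell(l_prev, p_inverse, p_prior=0.5):
    """One log-odds update step for one cell, using the inverse sensor
    model probability p(m_n | x_t, z_t) and the prior p(m_n)."""
    return l_prev + logit(p_inverse) - logit(p_prior)

def probability(l):
    """Recover the occupancy probability from the log-odds value."""
    return 1.0 - 1.0 / (1.0 + np.exp(l))

l = 0.0                        # prior log-odds (p = 0.5)
for p_inv in (0.7, 0.7, 0.3):  # two "occupied" hits, one "free" reading
    l = update_cell(l, p_inv)
print(round(probability(l), 3))  # → 0.7
```

Working in log-odds turns the Bayesian update into a simple addition, and a uniform prior ($p = 0.5$) contributes nothing, as the third reading cancelling one hit shows.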

## 4. Mapping with stereo vision

### 4.1. Stereo vision

The term stereo vision is generally used in the literature to designate problems where it is necessary to recover real measurements of points in a 3D scene from measurements performed on their corresponding pixels in two or more images taken from different viewpoints. The determination of the 3D structure of a scene using stereo vision is a well-known problem [40]. Formally, a stereo algorithm should solve two problems: the matching (pixel correspondence) and the consequent 3D geometry reconstruction.

The matching problem involves determining pairs of pixels, one from each image of the stereo pair, corresponding to the same point of the scene projected onto both. That is, given a pixel $x$ in the first image, the matching problem is solved by determining its corresponding pixel ${x}^{\text{'}}$ in the second image, for all pixels in the first one. In principle, a search strategy has to be adopted to find the correspondences. Several solutions have been proposed, generally using restrictions to facilitate the matching. The epipolar geometry is one of these, narrowing the search space [40].

By using this restriction, given a pixel in one image, the search for the corresponding pixel in the other image can be performed along a line, substantially decreasing the search space (from 2D to 1D). In general, the result of the matching phase is a disparity map that stores, for each pair of corresponding pixels, the displacement in image coordinates of the matched pixel in the second image.

The reconstruction of 3D geometry concerns the determination of the 3D structures of the scene. The 3D position of a point $P$ in space can be calculated by knowing the disparity between the positions (in the image coordinate system) of the corresponding pair of pixels $x$ and ${x}^{\text{'}}$, given by the disparity map, and by knowing the geometry of the stereo vision system. The latter is specified in two matrices, $M$ (external) and ${M}^{\text{'}}$ (internal), which are previously determined. The pixel positions are used in the matching phase to calculate the disparity map, which is then used directly in the reconstruction.

In order to better understand the stereo reconstruction problem, let us assume the system geometry shown in Figure ▭. The system has two cameras with projection centers $O$ and ${O}^{\text{'}}$. The optical axes are perfectly parallel to each other and the two camera images lie in the same plane (they are coplanar). The focal distance $f$ is the distance from each projection center to its image plane, assumed to be the same for both cameras. The baseline $b$ is the distance between the projection centers. The values of $b$ and $f$ are known a priori through some camera calibration procedure [40].

In Figure ▭, $({x}_{o}^{\text{'}},{y}_{o}^{\text{'}})$ and $({x}_{o},{y}_{o})$ are the coordinates of the pixels at each image center, that is, at the intersection of the optical axes with the image planes. A point $P$ in the scene is projected onto the pixels $({x}^{\text{'}},{y}^{\text{'}})$ (left) and $(x,y)$ (right). The distance ${z}_{c}$ from the camera system to the point $P$ can be calculated, in the camera reference frame, by Equation ▭:

where $b$ is the baseline, as said above, and $d$ is the disparity between the corresponding pixels, given by $d={x}^{\text{'}}-x$. Note that there is no disparity in the $y$ coordinate because the axes are parallel. From this equation, we can observe that ${z}_{c}$ is inversely proportional to $d$. In practical situations, $d$ is limited by a maximum and a minimum distance from the system to the scene, defined empirically considering the scene. The other coordinates of the point ${P}_{c}=({x}_{c},{y}_{c},{z}_{c})$ relative to the stereo system can be calculated using Equations ▭ and ▭.
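Under the parallel-axis geometry above, the reconstruction can be sketched as follows. The depth uses $z_c = bf/d$; the back-projections for $x_c$ and $y_c$ are the standard pinhole relations (assumed here, since the chapter's equations are referenced but not reproduced), and the calibration numbers are hypothetical:

```python
def triangulate(x, y, x_prime, b, f, x0, y0):
    """Recover camera-frame coordinates of a point from a matched pixel
    pair, assuming a rectified, parallel-axis stereo pair.
    x, y: pixel in the right image; x_prime: column in the left image."""
    d = x_prime - x            # disparity (no vertical component)
    z_c = b * f / d            # depth from similar triangles
    x_c = z_c * (x - x0) / f   # back-project through the pinhole model
    y_c = z_c * (y - y0) / f
    return x_c, y_c, z_c

# hypothetical calibration: baseline 0.1 m, focal length 500 px,
# principal point at (320, 240)
print(triangulate(330, 260, 355, 0.1, 500.0, 320, 240))  # → (0.04, 0.08, 2.0)
```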

The coordinates of a point ${P}_{c}$ in the camera coordinate system can be transformed to the world frame. To do that, one just has to know the rotation matrix and translation vector that map the camera frame to the world frame, and use Equation ▭.
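This rigid transformation is a single matrix-vector operation. In the sketch below, the rotation and translation values are hypothetical (a camera 1 m above the world origin, with its optical axis aligned to the world $x$ axis):

```python
import numpy as np

def camera_to_world(p_c, R, t):
    """Transform a point from the camera frame to the world frame:
    P_w = R @ P_c + t, where R and t locate the camera in the world."""
    return R @ np.asarray(p_c) + t

# example rotation mapping camera z (optical axis) to world x,
# camera x to world y, camera y to world z
R = np.array([[0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
t = np.array([0.0, 0.0, 1.0])  # camera 1 m above the world origin
print(camera_to_world([0.04, 0.08, 2.0], R, t))  # → [2.0, 0.04, 1.08]
```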

The parameters $b$, $f$, $({x}_{o}^{\text{'}},{y}_{o}^{\text{'}})$ and $({x}_{o},{y}_{o})$ are determined through a previous calibration of the stereo system. We remark that, even if the system does not follow the restrictions depicted in Figure ▭, it is possible to derive a general mathematical model for the problem; in this case, recovering the 3D geometry would be more complex. Besides, the calibration process also makes it possible to diminish or eliminate errors caused by lens distortion and illumination.

*Stereo matching.* The stereo matching step should determine, for all pixels in one image, their homologous pixels in the other image, that is, all pairs of pixels, one from each image, that correspond to projections of the same points of the scene. Once the matching is determined, the disparity $d$ can be calculated as the difference between the coordinates of the corresponding pixels in each image, and the depth of all points in the scene can be calculated using triangle similarity. The determination of the matching for all pixels is therefore a fundamental step. In fact, due to occlusions and image errors, the matchings obtained are not completely dense.

Among the several methods used for matching and disparity map calculation, in this work we performed experiments with the two suggested in the work of Scharstein and Szeliski [41]: block matching and graph cuts. Block matching determines the correspondence for pairs of pixels with well-discernible characteristics. Basically, this method produces a disparity map in four main steps. The first is a pre-filtering of the images to normalize brightness and enhance texture. The second step is the search for correspondences along the epipolar lines, performed by summing the absolute differences between windows of the same size in both images. In the third step, a post-filtering is performed to eliminate false matchings. In the fourth step, the disparity map is calculated for the reliable pixels. This method is considered fast; in fact, its computational complexity depends only on the image size.
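The core of the second step, the sum-of-absolute-differences (SAD) search along epipolar rows, can be sketched as below. This is a didactic version under assumed conventions (it omits the pre- and post-filtering steps and uses a naive exhaustive search):

```python
import numpy as np

def block_match(left, right, max_disp, win=1):
    """Minimal SAD block matching along epipolar (row) lines: for each
    left-image pixel, search the same row of the right image for the
    window with the smallest sum of absolute differences."""
    h, w = left.shape
    disp = np.zeros((h, w), dtype=int)
    for r in range(win, h - win):
        for c in range(win, w - win):
            patch = left[r - win:r + win + 1, c - win:c + win + 1].astype(int)
            best, best_sad = 0, None
            for d in range(0, min(max_disp + 1, c - win + 1)):
                cand = right[r - win:r + win + 1,
                             c - d - win:c - d + win + 1].astype(int)
                sad = np.abs(patch - cand).sum()
                if best_sad is None or sad < best_sad:
                    best, best_sad = d, sad
            disp[r, c] = best
    return disp

pattern = np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]])
left = np.zeros((5, 8), dtype=int)
right = np.zeros((5, 8), dtype=int)
left[1:4, 4:7] = pattern   # feature centered at column 5 in the left image
right[1:4, 2:5] = pattern  # same feature centered at column 3 in the right
print(block_match(left, right, max_disp=4)[2, 5])  # → 2
```

Real implementations vectorize this search and reject ambiguous minima, which is what the post-filtering step above refers to.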

The graph-cuts algorithm treats the best-matching problem as an energy minimization problem with two energy terms. The smoothness energy (ES) measures the smoothness of the disparity between neighboring pixels, which should be as smooth as possible. The second term is the data energy (ED), which measures how divergent corresponding pixels are under an assumed disparity. A weighted graph is constructed with vertices representing the image pixels. The labels (or terminals) are all possible disparities (all discrete values in the interval of disparity variation), and the edge weights correspond to the energy terms defined above. The graph-cuts technique is then used to approximate the optimal solution, assigning a disparity value (label) to each pixel (vertex of the graph) [42].
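To make the two terms concrete, the sketch below evaluates the energy of a candidate disparity labeling; it is an illustration (with an assumed truncation-free linear smoothness penalty and weight `lam`), not the actual graph construction. The graph-cuts algorithm of [42] searches for the labeling minimizing exactly this kind of sum:

```python
import numpy as np

def labeling_energy(disp, left, right, lam=10):
    """Energy E = ED + ES of a candidate disparity labeling `disp`.
    ED (data energy): photo-consistency error under the assumed disparity.
    ES (smoothness energy): penalty on disparity jumps between neighbors."""
    h, w = left.shape
    e_data = 0
    for y in range(h):
        for x in range(w):
            xr = x - int(disp[y, x])  # corresponding column in the right image
            if xr >= 0:
                e_data += abs(int(left[y, x]) - int(right[y, xr]))
    # Linear penalty on disparity differences between 4-neighbors
    e_smooth = lam * (np.abs(np.diff(disp.astype(int), axis=0)).sum()
                      + np.abs(np.diff(disp.astype(int), axis=1)).sum())
    return e_data + e_smooth
```

The correct, piecewise-constant labeling yields a low energy (zero data cost and zero smoothness cost), while an incorrect labeling raises the data term.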

### 4.2. Modeling probabilistic mapping with stereo disparity

The acquisition and manipulation of images produces an uncertainty $\Delta d$ in the coordinates of a pixel. This causes an error $\Delta {z}_{c}$ in the estimation of the coordinate ${z}_{c}$ calculated from the disparity map, named the depth resolution, which is given by Equation ▭.
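For the standard parallel-axis geometry, where ${z}_{c}=bf/d$, a first-order propagation of the disparity uncertainty gives $\Delta {z}_{c}\approx {z}_{c}^{2}\,\Delta d/(bf)$. The snippet below (with illustrative values, not the calibrated parameters of our system) shows how the depth resolution degrades quadratically with distance:

```python
def depth_and_resolution(d, b, f, delta_d=1.0):
    """Depth z_c = b*f/d (baseline b, focal length f in pixels) and the
    depth resolution dz_c ~= z_c**2 * delta_d / (b*f) for a disparity
    uncertainty delta_d (first-order error propagation)."""
    z = b * f / d
    dz = z * z * delta_d / (b * f)
    return z, dz
```

Note that halving the disparity doubles the depth but quadruples $\Delta {z}_{c}$, which is why distant cells receive noticeably coarser occupancy evidence.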

In order to construct the environment model using an occupancy grid, it is necessary to model the sensor used, generally through what is named the inverse measurement model. We adopt a modeling similar to the one presented by Andert [43]; however, we propose to incorporate the errors inherent to robot motion into the calculation of the occupancy probability of a cell. With this modification, we obtain a map that is more coherent with the sensory data provided by the robot, and we can guarantee a bound on the maximum error found in the map. This is extremely important in the treatment of uncertainties encountered in the scene.

The inverse model transforms the data provided by the sensors into the information contained in the map. In our case, a distance measured to a specific point in the environment indicates the probability that part of the grid is occupied or free. The distance is calculated directly from the disparity. In this way, it is possible to map each point of the space seen by the robot vision system with a valid disparity value into the possible values of the elements of the occupancy grid.

In the measurement model, each point of the disparity map acts as a distance sensor measuring along the ray defined by the world coordinates of the pixel (in the image plane) and the projection center of the camera. The point in the image plane is given in camera coordinates by ${P}_{c}=({x}_{c},{y}_{c},{z}_{c})$, and the distance from the camera to ${P}_{c}$ is calculated by Equation ▭, with a sensorial uncertainty given by Equation ▭.
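Since ${x}_{c}$ and ${y}_{c}$ scale linearly with ${z}_{c}$ along the viewing ray, the distance $l=\Vert {P}_{c}\Vert$ and its uncertainty follow directly from the depth and its resolution. A small sketch, assuming a first-order propagation of the depth error along the ray:

```python
import math

def ray_distance(xc, yc, zc, dzc):
    """Distance l from the projection center to P_c = (x_c, y_c, z_c),
    with uncertainty dl = (l / z_c) * dz_c obtained by propagating the
    depth resolution dz_c along the viewing ray (x_c/z_c and y_c/z_c
    are constant along the ray)."""
    l = math.sqrt(xc * xc + yc * yc + zc * zc)
    return l, (l / zc) * dzc
```

For a point on the optical axis ($x_c = y_c = 0$) the uncertainty reduces to the depth resolution itself; oblique rays stretch it by the factor $l/{z}_{c}$.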

Each cell belonging to the ray defined by the projection center and the point ${P}_{w}$ (also in world coordinates) must have its occupancy probability updated according to the probability density function given by Equation ▭:

$$P\left({m}_{n}\,|\,{x}_{1:t},{z}_{1:t}\right)={P}_{occ}\left(l\right)+\left(\frac{k}{\Delta {l}_{p}\sqrt{2\pi}}+0.5-{P}_{occ}\left(l\right)\right){e}^{-\frac{1}{2}{\left(\frac{l-{l}_{p}}{\Delta {l}_{p}}\right)}^{2}}$$

where

$${P}_{occ}\left(l\right)=\left\{\begin{array}{cc}{p}_{min},& \text{if }0<l\le {l}_{p}\\ 0.5,& \text{if }l>{l}_{p}\end{array}\right.$$
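A direct implementation of the density above, with illustrative values for $k$ and ${p}_{min}$ (in practice these come from the sensor characterization):

```python
import math

def p_occ(l, l_p, p_min=0.2):
    # Prior along the ray: cells before the measured distance l_p are
    # probably free; beyond it the measurement carries no information.
    return p_min if 0 < l <= l_p else 0.5

def occupancy_probability(l, l_p, dl_p, k=0.15, p_min=0.2):
    """P(m_n | x_1:t, z_1:t) for a cell at distance l along the ray,
    given a measurement l_p with uncertainty dl_p: the prior plus a
    Gaussian occupancy bump centered at the measured distance."""
    prior = p_occ(l, l_p, p_min)
    gauss = math.exp(-0.5 * ((l - l_p) / dl_p) ** 2)
    return prior + (k / (dl_p * math.sqrt(2.0 * math.pi)) + 0.5 - prior) * gauss
```

Cells well before ${l}_{p}$ stay near ${p}_{min}$ (free), the cell at ${l}_{p}$ peaks above $0.5$ (occupied), and cells far beyond ${l}_{p}$ remain at $0.5$ (unknown), as the piecewise prior dictates.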

Our proposal is precisely to incorporate the uncertainty we have with respect to the robot motion into the calculation of the occupancy probability. With this, the occupancy grid map becomes more coherent with the sensory data acquired by the robot perceptual system. In this way, Equation ▭ can be modified, resulting in Equation ▭:

where $\epsilon$ is a function that describes the errors of the linear movements of the robot (systematic errors), modeled from repetitive experiments performed previously. This function $\epsilon$ represents the degradation that the motion errors of the robot produce in the certainty that a cell is occupied or not, as in a previous work [44]. In this way, we account for the uncertainties actually present in robotic systems.

## 5. Preliminary experiments

We performed experiments in order to evaluate and choose the best stereo matching algorithm for disparity map generation. In these experiments, we use the Minoru 3D stereo vision system mounted on top of a Pioneer 3-AT robot, as seen in Figure ▭. The implementation is done using the OpenCV computer vision library.

The next set of experiments evaluates the use of stereo disparity as input to the probabilistic approach in the occupancy grid construction (our main proposal). The stereo setup is the same Minoru 3D mounted on the Pioneer 3-AT, and the OpenCV library is used for software development. As said, the disparity map is estimated using the *graph-cuts* algorithm. The experiments were conducted in a building of the Federal University of Rio Grande do Norte, which mixes scenes of outdoor and indoor environments. Figure ▭ shows one of the images captured from a scene of a typical corridor, and Figure ▭ illustrates a scene with the appearance of the external environment.

Figure ▭ shows the result of this experiment. Only one slice of the cells of the occupancy grid is shown: the one whose plane passes through the center of projection of the camera system and is parallel to the ground. We remark that the robot could perform stops in order to determine some regions in more detail.

From these preliminary results of the 2D mapping, the first steps were taken toward 3D mapping. In this experiment, the scene was swept along both coordinate axes so that the 3D information could be inferred. The figures below illustrate a basic experiment of this construction.

In this figure, the red polygon represents the camera and the blue cubes represent the obstacle verified by stereo image processing.

## 6. Conclusions and future works

We propose a new approach to 3D mapping of environments by a mobile robot, with a representation based on a probabilistic occupancy grid. Our approach takes into account the errors inherent to the robot, and visual information provided by a stereo vision system is included in the modeling. This results in a more robust technique in which the error can be well defined and bounded, which is very relevant in robotics applications. With our proposal, the mapping of environments is actually coherent with the sensory data provided by the robotic perceptual system. This is one of the main contributions of our work, besides the manner of using stereo vision for 3D mapping with an occupancy grid. A 3D representation is much more reliable than a 2D one obtained using only sonar, as in our previous work [44].

As future work, we will perform experiments in more complex environments in order to test the coherent construction of 3D occupancy grids. A strategy based on stop points and feature detection (mainly using the gradient of disparity) is also being developed in order to determine regions in which a more detailed map is necessary. Further, a more robust modeling of the errors present in the robot movements and in the image processing will be performed in order to upgrade this proposal to a SLAM (Simultaneous Localization and Mapping) application.

## 7. Acknowledgements

We thank the University of the State of Rio Grande do Norte for its help in the development of this work.