Using Object's Contour and Form to Embed Recognition Capability into Industrial Robots


I. Lopez-Juarez, M. Peña-Cabrera* and A.V. Reyes-Acosta

Introduction
A great effort has been made towards the integration of object recognition capability in robotics, especially in humanoids, mobile robots and advanced industrial manipulators. Industrial robots today are not equipped with this capability in their standard version, but as an option. Robot vision systems can differentiate parts by pattern matching irrespective of part orientation and location, and some manufacturers even offer 3D guidance using robust vision and laser systems, so that a 3D programmed point can be repeated even if the part is moved, varying its rotation and orientation within the working space. Despite these developments, current industrial robots are still unable to recognise objects in a robust manner; that is, to distinguish among equally shaped objects unless an alternative method is used, for instance taking into account not only the object's contour but also its form, which is precisely the major contribution of this chapter. How objects are recognised by humans is still an open research field. Some researchers favour the theory of object recognition via object models like Geons (Biederman, 1987), while others agree on two types of image-based models: viewpoint dependent or viewpoint invariant. In general, however, there is agreement that humans recognise objects as established by the similarity principle, among others, of the Gestalt theory of visual perception, which states that things which share visual characteristics such as shape, size, colour, texture, value or orientation will be seen as belonging together. This principle applies to human operators; for instance, when an operator is given the task of picking up a specific object from a set of similar objects, the first approaching action will probably be guided solely by visual clues such as shape similarity. But if further information is given (e.g. type of surface), then a finer clustering is accomplished to identify the target object.
We believe that it is possible to integrate a robust invariant object recognition capability in industrial robots following the above assumptions by using image features from the object's contour (boundary object information) and its form (i.e. type of curvature or topographical surface information). Both features can be concatenated to form an invariant vector descriptor which is the input to an Artificial Neural Network (ANN) for learning and recognition purposes. In previous work, the feasibility of the approach was demonstrated by learning and recognising multiple 3D working pieces through their contours in 2D images, using a vector descriptor called the Boundary Object Function (BOF) (Peña-Cabrera et al., 2005). The BOF proved invariant with different geometrical pieces, but did not consider topographical surface information. In order to overcome this limitation and obtain a more robust descriptor, a methodology that includes a shape index using the Shape From Shading (SFS) method (Horn, 1970) is presented. The main idea of our approach is to concatenate both vectors (BOF+SFS) so that not only the contour but also the object's curvature information (form) is taken into account by the ANN. The organisation of the chapter is as follows. In section 2, related work is reviewed from the perspective of 2D-2.5D object recognition. In section 3, the original contribution is explained. Section 4 describes the inspiring ideas that motivated the use of the FuzzyARTMAP ANN and gives a qualitative description of the network. Section 5 presents the algorithm for determining the object's contour using the BOF algorithm, while section 6 presents the SFS algorithm formally. Section 7 describes the robotic test bed as well as the workpieces that were used during experiments. Experimental results are provided in section 8 and, finally, conclusions and further work are given in section 9.

Related Work
Vision recognition systems must be capable of perceiving and detecting images and objects as closely as human vision does; this fact has encouraged research activity to design artificial vision systems based on the neural morphology of the biological human vision system. Scientists now understand better how computational neural structures and artificial vision systems should be designed following neural paradigms, mathematical models and computational architectures. When a system involves these aspects, it can be referred to as a "Neuro-Vision System" (Gupta & Knopf, 1993), which can be defined as an artificial machine with the ability to see our environment and provide visually formatted information for real-time applications. Psychological and clinical studies have shown that visual object recognition involves a large area of activity on the cerebral cortex when objects are seen for the first time, and that the region's activity is reduced when familiar objects are perceived (Gupta & Knopf, 1993). New objects can also be learned quickly if certain clues are given to the learner. Following this psychological evidence, a novel architecture was designed that includes information from the object's shape as well as its form. Some authors have contributed techniques for invariant pattern classification using classical methods such as invariant moments (Hu, 1962), or artificial intelligence techniques. Cem Yüceer and Kemal Oflazer (1993) describe a hybrid pattern classification system based on a pattern pre-processor and an ANN invariant to rotation, scaling and translation; Stavros J. and Paulo Lisboa (1992) developed a method to reduce and control the number of weights of a third-order network using moment classifiers; and Shingchern D. You and G. Ford (1994) proposed a network for invariant recognition of objects in binary images.
More recently, Montenegro used the Hough transform to invariantly recognise rectangular objects (chocolates), including simple defects (Montenegro, 2006). This was achieved by using the polar properties of the Hough transform, with the Euclidean distance used to classify the descriptive vector. This method proved robust with geometric figures; however, for complex objects it would require more information coming from other techniques, such as histogram information or information from images with different illumination sources and levels. Another example is the use of the Fourier descriptor, which obtains image features through silhouettes of 3D objects (Gonzalez, 2004). Gonzalez's method is based on the extraction of silhouettes from 3D images obtained from laser scans, which increases recognition times. Worthington studied topographical information from image intensity data in grey scale using the Shape From Shading (SFS) algorithm (Worthington, 2001). This information is used for object recognition, the idea being that the shape index can support recognition based on surface curvature. Two attributes were used: a low-level one based on the curvature histogram, and a structural one based on the arrangement of the maximal patches of the shape index and their attributes in the associated region. Lowe defined a descriptor named SIFT (Scale Invariant Feature Transform), an algorithm that detects distinctive image points and calculates their descriptors based on histograms of the orientations around the key points encountered (Lowe, 2004). The extracted points are invariant to scale and rotation, as well as to changes in illumination source and level. These points are located at the maxima and minima of a difference of Gaussians applied in scale space. This algorithm is very efficient, but the processing time is relatively high and, furthermore, the working pieces must have rich texture.

Original work
Moment invariants are the most popular descriptors for image regions and boundary segments, but computing the moments of a 2D image involves a significant number of multiplications and additions when done directly. In many real-time industrial applications the speed of computation is very important; 2D moment computation is intensive and may require parallel processing, which can become the bottleneck of the system when moments are used as major features. In addition to this limitation, observing only the piece's contour is not enough to recognise an object, since objects with the same contour can still be confused.
In order to cope with this limitation, a novel method that also includes the object's form (i.e. type of curvature or topographical surface information) is proposed. Both features (contour and form) are concatenated to form a more robust invariant vector descriptor, which is the input to an Artificial Neural Network (ANN). The methodology includes a shape index using the Shape From Shading (SFS) method (Horn, 1970). The main idea of our approach is to concatenate both vectors (BOF+SFS) so that not only the contour but also the object's curvature information (form) is taken into account by the ANN.

Inspiring ideas and ART models
Knowledge can be built either empirically or by hand, as suggested by Towell and Shavlik (Towell & Shavlik, 1994). Empirical knowledge can be thought of as giving examples of how to react to certain stimuli without any explanation, while in hand-built knowledge, the knowledge is acquired by giving only explanations but no examples. It was determined that in robotic systems a suitable strategy should include a combination of both methods. Furthermore, this idea is supported by psychological evidence suggesting that theory and examples interact closely during human learning (Feldman, 1993).

Learning in natural cognitive systems, including our own, follows a sequential process, as demonstrated in our daily life. Events are learnt incrementally; for instance, during childhood when we start making new friends, we also learn more faces, and this process continues through life. This learning is also stable, because learning new faces does not disrupt our previous knowledge. These premises are the core of the development of connectionist models of the human brain and are supported by Psychology, Biology and Computer Science. Psychological studies suggest the sequential learning of events at different stages or "storage levels", termed sensory memory (SM), short-term memory (STM) and long-term memory (LTM). There are different types of ANN; for this research a Fuzzy ARTMAP network is used. This network was chosen because of its incremental knowledge capabilities and stability, but mostly because of its fast recognition and geometrical classification responses. The Adaptive Resonance Theory (ART) is a well-established associative brain and competitive model introduced as a theory of human cognitive processing, developed by Stephen Grossberg at Boston University. Grossberg summarised the situations mentioned above in what he called the Stability-Plasticity Dilemma, suggesting that connectionist models should be able to adaptively switch between their plastic and stable modes. That is, a system should exhibit plasticity to accommodate new information regarding unfamiliar events, but it should also remain in a stable condition if familiar or irrelevant information is being presented. He identified the problem as being due to basic properties of associative learning and lateral inhibition.
An analysis of this instability, together with data on categorisation, conditioning and attention, led to the introduction of the ART model, which stabilises the memory of self-organising feature maps in response to an arbitrary stream of input patterns (Carpenter & Grossberg, 1987). The core principles of this theory, and how STM and LTM interact during the network processes of activation, associative learning and recall, were published in the scientific literature back in the 1960s. The theory has evolved into a series of real-time architectures for unsupervised learning, starting with the ART-1 algorithm for binary input patterns (Carpenter & Grossberg, 1987). Supervised learning is also possible through ARTMAP (Carpenter & Grossberg, 1991), which uses two ART-1 modules that can be trained to learn the correspondence between input patterns and desired output classes. Different model variations have been developed to date based on the original ART-1 algorithm: ART-2, ART-2a, ART-3, Gaussian ART, EMAP, ViewNET, Fusion ARTMAP and LaminART, to mention but a few.

FuzzyARTMAP
In the Fuzzy ARTMAP (FAM) network there are two modules, ARTa and ARTb, and an inter-ART module, the "Map field", that controls the learning of an associative map from ARTa recognition categories to ARTb categories. This is illustrated in Figure 1. The Map field module also controls the match tracking of the ARTa vigilance parameter. A mismatch between the Map field and the ARTa category activated by input Ia and the ARTb category activated by input Ib increases the ARTa vigilance by the minimum amount needed for the system to search for, and if necessary learn, a new ARTa category whose prediction matches the ARTb category. The search initiated by the inter-ART reset can shift attention to a novel cluster of features that can be incorporated through learning into a new ARTa recognition category, which can then be linked to a new ARTb prediction via associative learning at the Map field.

Fig. 1. FuzzyARTMAP architecture
A vigilance parameter measures the difference allowed between the input data and the stored pattern; therefore this parameter determines the selectivity or granularity of the network's predictions. For learning, the FuzzyARTMAP has four important factors: the vigilance in the input module (ρa), the vigilance in the output module (ρb), the vigilance in the Map field (ρab) and the learning rate (β).
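To make the role of the vigilance parameter concrete, the following minimal sketch (not the authors' implementation) shows the standard fuzzy ART category choice function and vigilance test, with illustrative weight vectors and an assumed choice parameter alpha:

```python
import numpy as np

def fuzzy_and(a, b):
    """Componentwise minimum: the fuzzy AND operator used throughout ART."""
    return np.minimum(a, b)

def choice(input_vec, weights, alpha=0.001):
    """Choice function T_j = |I ^ w_j| / (alpha + |w_j|) for every category j."""
    return np.array([fuzzy_and(input_vec, w).sum() / (alpha + w.sum())
                     for w in weights])

def vigilance_test(input_vec, w, rho):
    """Resonance occurs when |I ^ w_j| / |I| >= rho (the vigilance)."""
    return fuzzy_and(input_vec, w).sum() / input_vec.sum() >= rho

# Illustrative example: two stored categories and one input pattern
weights = [np.array([0.9, 0.1, 0.8, 0.2]), np.array([0.2, 0.8, 0.1, 0.9])]
I = np.array([0.85, 0.15, 0.75, 0.25])
T = choice(I, weights)
winner = int(np.argmax(T))                         # most active category
resonates = vigilance_test(I, weights[winner], rho=0.9)
```

A higher vigilance ρ forces the match ratio to be closer to 1 before resonance, so the network creates more, finer categories; lowering ρ produces coarser clustering.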

Object's contour
As mentioned earlier, the Boundary Object Function (BOF) method considers only the object's contour to recognise different objects. It is very important to obtain metric properties such as area, perimeter, centroid and the distance from the centroid to the points of the object's contour as accurately as possible, in order to obtain better results and therefore a better analysis of the visual data. In this section, a detailed description of the BOF method is presented.

Metric properties
The metric properties for the algorithm are based on the distance d(P1, P2) between two points in the image plane, for which the Euclidean distance is used. As a first step, the object in the image is located by performing a pixel-level scan from left to right and top to bottom, so that if one object is higher than the others in the image, it will be the first object found. Thus the first point found inside an object is the highest pixel (first criterion) and the leftmost one (second criterion).
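The scan order described above can be sketched as follows; this is a minimal illustration, not the authors' code, and the 5x5 test image is invented for the example:

```python
import math
import numpy as np

def euclidean(p1, p2):
    """Euclidean distance d(P1, P2) between two points in the image plane."""
    return math.hypot(p1[0] - p2[0], p1[1] - p2[1])

def first_object_pixel(binary):
    """Scan top-to-bottom, then left-to-right: the first white pixel found is
    the highest one (first criterion) and the leftmost one (second criterion)."""
    for r in range(binary.shape[0]):
        for c in range(binary.shape[1]):
            if binary[r, c]:
                return (r, c)
    return None

img = np.zeros((5, 5), dtype=bool)
img[2, 3] = True   # higher object pixel
img[3, 1] = True   # lower pixel, further left
start = first_object_pixel(img)
print(start)       # (2, 3): height wins over left-ness
```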

Perimeter
The perimeter is defined as the set of points that make up the shape of the object; in discrete form it is the sum of all pixels that lie on the contour, which can be expressed as:

P = Σ(x,y)∈C 1,     (1)

where C is the set of contour pixels. Equation (1) shows how to calculate the perimeter; the problem is to know which pixels in the image belong to the perimeter. For searching purposes, the system calculates the perimeter obtaining:
- the number of points around a piece
- the group of X and Y point coordinates corresponding to the perimeter of the piece, measured clockwise
- the boundaries of the piece's 2D Bounding Box (2D-BB)

The perimeter calculation for every piece in the ROI is performed after binarisation. The search is always accomplished from left to right and from top to bottom. Once a white pixel is found, the whole perimeter is calculated with a search function (figure 2).

Fig. 2. Perimeter calculation of a workpiece
The following definitions are useful for understanding the algorithm:
- A pixel nearer to the boundary is any pixel surrounded mostly by black pixels in eight-connectivity.
- A pixel farther from the boundary is any pixel that is not surrounded by black pixels in eight-connectivity.
- The highest and lowest coordinates are the ones that define a rectangle (the Bounding Box).

The search algorithm executes the following procedure once it has found a white pixel:
1. Search for the pixel nearer to the boundary that has not already been visited.
2. Assign the label of the current pixel to the nearer boundary pixel just found.
3. Mark the previous pixel as visited.
4. If the new coordinates are higher than the last highest coordinates, assign the new values to the highest coordinates.
5. If the new coordinates are lower than the last lowest coordinates, assign the new values to the lowest coordinates.
6. Repeat steps 1 to 5 until the procedure returns to the initial point, or no further pixel nearer to the boundary is found.

This technique will surround any irregular shape without processing useless pixels of the image; it is therefore a fast algorithm that can perform on-line classification, and its complexity can be classified as linear: O(N × 8 × 4),

where N is the size of the perimeter and 8 and 4 are the numbers of comparisons the algorithm needs to find the next boundary pixel. The main difference with the traditional algorithm is that the latter sweeps an uncertain area which is always larger than the figure, which turns the algorithm into O(N × M), where N × M is the size of the Bounding Box in use; moreover, the traditional algorithm does not obtain the coordinates of the perimeter in the desired order.
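A boundary-following pass of this kind can be sketched with classic Moore-neighbour tracing in eight-connectivity; this is an illustrative reimplementation under that assumption, not the chapter's exact search function:

```python
import numpy as np

# 8-connected neighbour offsets, clockwise starting from "up"
NEIGHBOURS = [(-1, 0), (-1, 1), (0, 1), (1, 1),
              (1, 0), (1, -1), (0, -1), (-1, -1)]

def trace_contour(binary):
    """Follow the boundary clockwise from the first white pixel found in a
    top-to-bottom, left-to-right scan, returning the contour coordinates
    in order (the 2D Bounding Box follows from their min/max)."""
    rows, cols = binary.shape
    start = next((r, c) for r in range(rows) for c in range(cols) if binary[r, c])
    contour = [start]
    prev_dir = 6          # the scan guarantees we "entered" from the left
    cur = start
    while True:
        # resume the clockwise search just after the direction we came from
        for i in range(8):
            d = (prev_dir + 1 + i) % 8
            nr, nc = cur[0] + NEIGHBOURS[d][0], cur[1] + NEIGHBOURS[d][1]
            if 0 <= nr < rows and 0 <= nc < cols and binary[nr, nc]:
                prev_dir = (d + 4) % 8   # direction back to the pixel we left
                cur = (nr, nc)
                break
        else:
            break                        # isolated single pixel
        if cur == start:
            break                        # contour closed
        contour.append(cur)
    return contour

img = np.zeros((5, 5), dtype=bool)
img[1:4, 1:4] = True                     # a 3x3 white block
contour = trace_contour(img)
print(len(contour))                      # 8 boundary pixels, in clockwise order
```

Only boundary pixels are visited, matching the O(N) behaviour claimed above.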

Area
The area of an object is defined as the space between certain limits; in other words, the sum of all pixels that make up the object, which can be defined as:

A = Σx Σy I(x, y),

where I(x, y) is 1 for object pixels and 0 otherwise.

Centroid
The centre of mass of an arbitrary shape is the pair of coordinates (Xc, Yc) at which all its mass is considered concentrated, and on which the resultant of all forces acts; in other words, it is the point at which a single support can balance the object. Mathematically, for the discrete domain of any shape, the coordinates are defined as:

Xc = (1/A) Σx Σy x·I(x, y),   Yc = (1/A) Σx Σy y·I(x, y).
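Both metric properties reduce to pixel counting and coordinate averaging on the binary image; a minimal sketch (the 6x6 test image is invented for illustration):

```python
import numpy as np

def area_and_centroid(binary):
    """Area A = number of object pixels; centroid (Xc, Yc) = mean of the
    object pixels' x and y coordinates."""
    ys, xs = np.nonzero(binary)          # rows are y, columns are x
    A = len(xs)
    return A, (xs.mean(), ys.mean())

img = np.zeros((6, 6), dtype=bool)
img[1:4, 2:5] = True                     # a 3x3 block
A, (xc, yc) = area_and_centroid(img)
print(A, xc, yc)                         # 9 3.0 2.0
```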

Distance from centroid to the contour
This phase provides valuable information for the invariant recognition of objects by the BOF, by finding the distance from the centroid to the perimeter or boundary pixels. Assuming the contour is given by points (xi, yi), the distance from the centroid (Xc, Yc) to each of them is the Euclidean distance di = √((xi − Xc)² + (yi − Yc)²).

Generation of descriptive vector
The 2D part of the descriptor vector contains 180 elements, which are obtained every two degrees around the object and normalised by dividing all entries by the maximum value found in the same vector, as shown in Figure 3. The first value of the descriptor vector is at the top of the piece; however, in the case of a circle it can start at any point.

Fig. 3. Obtaining the BOF for a circle

In more complicated figures the starting point is crucial, so the following rules apply: the first step is to find the longest line passing through the centre of the piece, as shown in Figure 4(a), where there are several such lines. The longest line is taken and divided in two, taking the centre of the object as reference. The longer half of the line is then as shown in Figure 4(b), and it is taken as the starting point for the descriptor vector.
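The construction of the 180-element vector can be sketched as below; this is an illustrative version (nearest-contour-point sampling, starting from angle zero rather than the longest-line rule), not the authors' implementation:

```python
import numpy as np

def bof_descriptor(contour, centroid, n=180):
    """Sample the centroid-to-contour distance every 360/n degrees and
    normalise by the maximum distance, giving a scale-invariant vector."""
    cx, cy = centroid
    dx = np.array([p[0] - cx for p in contour])
    dy = np.array([p[1] - cy for p in contour])
    dist = np.hypot(dx, dy)
    ang = np.degrees(np.arctan2(dy, dx)) % 360.0
    vector = np.zeros(n)
    for k in range(n):
        target = k * (360.0 / n)          # sampling angles 2 degrees apart
        # pick the contour point closest in angle to the sampling direction
        diff = np.minimum(np.abs(ang - target), 360.0 - np.abs(ang - target))
        vector[k] = dist[np.argmin(diff)]
    return vector / vector.max()

# A circle of radius 5: every entry of the normalised BOF should be ~1,
# which is why the starting point is irrelevant for circles
theta = np.linspace(0, 2 * np.pi, 720, endpoint=False)
circle = [(5 * np.cos(t), 5 * np.sin(t)) for t in theta]
v = bof_descriptor(circle, (0.0, 0.0))
```

Dividing by the maximum distance is what makes the descriptor invariant to scale, and sampling relative to the centroid makes it invariant to translation.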

Object's form
Shape From Shading (SFS) consists primarily of obtaining the orientation of the surface from the local variations in the brightness reflected by the object; in other words, the intensities of the greyscale image are taken as a topographic surface. SFS is the process of obtaining the three-dimensional shape of a surface from the light reflected in a greyscale image. In the 1970s, Horn formulated the Shape From Shading problem as finding the solution of the brightness (reflectance) equation, seeking a unique solution (Horn, 1970). Today, Shape From Shading is known to be an ill-posed problem, as mentioned by Brooks, causing ambiguity between concave and convex surfaces due to changes in the lighting parameters (Brooks, 1983).
To solve the SFS problem it is important to study how the image is formed, as mentioned by Zhang (Zhang et al., 1999). A simple model of image formation is the Lambertian model, in which the grey value at each pixel of the image depends on the direction of the light and the surface normal. So if we assume Lambertian reflection and know the direction of the light, the brightness can be described as a function of the object's surface and the light direction, and the problem becomes a little simpler. The algorithm consists of finding the gradient of the surface and determining the normals, which are perpendicular to the tangents and lie on the reflectance cone whose axis is given by the direction of the light, over the entire surface of the object to be recognised. Smoothing is then performed so that the normal directions of local regions are not very uneven. After smoothing, some normals lie outside the reflectance cone, so it is necessary to rotate them back into the cone; smoothing and rotation are applied iteratively. Finally, obtaining the type of local curvature of the surface generates a histogram. The greyscale image is taken as a topographic surface and the vector of the incident light is known, so the reflectance equation of the image can be written as:

E(x, y) = n(x, y) · s

where s is a unit vector in the direction of the light, E is the normalised brightness at pixel (x, y) and n(x, y) is the unit surface normal. The reflectance equation of the image defines a cone of possible directions normal to the surface, as shown in Figure 6, where the reflectance cone has a half-angle of α = cos⁻¹(E(x, y)).

Fig. 6. Possible normal directions to the surface over the reflectance cone

If the recovered normals satisfy the reflectance equation of the image, then the normals must fall on their respective reflectance cones.
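The relation between measured brightness and the cone of feasible normals can be checked numerically; a minimal sketch of the Lambertian model, with an assumed light direction along the optical axis:

```python
import numpy as np

def cone_half_angle(E):
    """Under Lambertian reflectance E = n . s (unit vectors), every feasible
    surface normal lies on a cone around the light direction with
    half-angle alpha = arccos(E), returned here in degrees."""
    return np.degrees(np.arccos(np.clip(E, -1.0, 1.0)))

s = np.array([0.0, 0.0, 1.0])                      # light along the optical axis
n = np.array([0.0, np.sin(np.radians(30)), np.cos(np.radians(30))])
E = float(n @ s)                                   # brightness the model predicts
alpha = cone_half_angle(E)
print(round(alpha, 1))                             # 30.0 degrees
```

A normal tilted 30 degrees from the light produces brightness cos(30°), so the recovered cone half-angle is exactly that tilt; the ambiguity is that every normal on the cone yields the same brightness.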

Image's Gradient
The first step is to calculate the surface normals, which are obtained from the gradient of the image I, as shown in equation (6):

(p, q) = (∂I/∂x, ∂I/∂y)     (6)

where p and q are the gradient components, obtained with the Sobel operators.

Normals
As the normals are perpendicular to the tangents, the normal can be found through the cross product of the tangents, which is parallel to (−p, −q, 1)ᵀ. Thus we can write the normal as:

n = (−p, −q, 1)ᵀ / √(p² + q² + 1)

assuming that the z component of the normal to the surface is positive.
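The two steps above (Sobel gradient, then normals) can be sketched as follows; this is an illustrative stand-alone version with a hand-rolled convolution and zero padding, not the chapter's implementation:

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def convolve_same(img, k):
    """Minimal 'same'-size correlation with a 3x3 kernel, zero padding."""
    p = np.pad(img, 1)
    out = np.zeros_like(img, dtype=float)
    for r in range(img.shape[0]):
        for c in range(img.shape[1]):
            out[r, c] = np.sum(p[r:r + 3, c:c + 3] * k)
    return out

def surface_normals(I):
    """Gradient (p, q) via Sobel, then the unit normal proportional to
    (-p, -q, 1) with a positive z component."""
    p = convolve_same(I, SOBEL_X)
    q = convolve_same(I, SOBEL_Y)
    norm = np.sqrt(p ** 2 + q ** 2 + 1.0)
    return np.stack([-p / norm, -q / norm, 1.0 / norm], axis=-1)

I = np.tile(np.arange(5, dtype=float), (5, 1))   # ramp rising to the right
normals = surface_normals(I)
print(normals[2, 2])   # x component negative, y ~ 0, z positive
```

On a ramp that brightens to the right the recovered normals tilt away from +x, as expected for a surface sloping upwards in that direction.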

Smoothness and rotation
Smoothing, in a few words, can be described as avoiding abrupt changes between adjacent normals. The sigmoidal smoothness constraint imposes this smoothness (regularisation) restriction while forcing the brightness error to be satisfied through a rotation matrix, deterring sudden changes in the direction of the normals across the surface. Once the normals have been smoothed, they are rotated so that they lie back on the reflectance cone, as shown in Figure 7.

Shape index
Koenderink (Koenderink & Van Doorn, 1992) separated the shape index into different regions depending on the type of curvature, which is obtained through the eigenvalues of the Hessian matrix, represented by k1 and k2, as shown in equation (7):

S = (2/π) arctan((k1 + k2)/(k1 − k2)),  k1 ≥ k2     (7)
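Equation (7) can be sketched numerically; this illustrative version uses `arctan2` so the convex-cap case k1 = k2 is handled without a division by zero (the 2x2 Hessians are invented example patches):

```python
import numpy as np

def shape_index(k1, k2):
    """Koenderink shape index S = (2/pi) * arctan((k1 + k2) / (k1 - k2)),
    with principal curvatures ordered k1 >= k2. S = 1 is a convex cap,
    S = -1 a concave cup and S = 0 a saddle."""
    return (2.0 / np.pi) * np.arctan2(k1 + k2, k1 - k2)

def principal_curvatures(H):
    """Eigenvalues of the 2x2 Hessian, returned ordered k1 >= k2."""
    k = np.linalg.eigvalsh(H)            # ascending order
    return k[1], k[0]

H_cap = np.array([[1.0, 0.0], [0.0, 1.0]])      # dome-like local patch
H_saddle = np.array([[1.0, 0.0], [0.0, -1.0]])  # saddle-like local patch
s_cap = shape_index(*principal_curvatures(H_cap))
s_saddle = shape_index(*principal_curvatures(H_saddle))
print(s_cap, s_saddle)    # ~1.0 for the cap, 0.0 for the saddle
```

Binning S over all pixels of an object yields the curvature histogram used as the SFS part of the descriptor.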

Robotic Test Bed
The robotic test bed comprises a KUKA KR16 industrial robot, as shown in figure 10, together with a visual servoing system with a ceiling-mounted Basler A602fc CCD camera (not shown).

Experimental results
The object recognition experiments with the FuzzyARTMAP (FAM) neural network were carried out using the above working pieces. The network parameters were set for fast learning (β = 1) and a high Map field vigilance parameter (ρab = 0.9). Three experiments were carried out. The first considered only the BOF, taking data from the contour of the piece; the second considered information from the SFS algorithm, taking into account the reflectance of the light on the surface; and the third was performed using a fusion of both methods (BOF+SFS).

First Experiment (BOF)
For this experiment, all pieces were placed within the workspace under controlled illumination at different orientations, and this data was used to train the FAM neural network. Once the neural network was trained with the patterns, it was tested by placing the different pieces at different orientations and locations within the workspace. Figure 12 shows some examples of the objects' contours. The objects were recognised in all cases, with failures occurring only between rounded and pyramidal objects of the same shape. In these cases there was always confusion, because the network learned only contours, and objects that differ only in the type of surface have very similar contours.

Second Experiment (SFS)
For the second experiment, using the reflectance of the light over the surface of the objects (SFS method), the neural network could recognise and differentiate between rounded and pyramidal objects. It was determined during training that only one vector was needed for the rounded objects to be recognised, because the change across their surface is smooth. For the pyramidal objects, three different patterns were required during training: one for the square and the triangle, one for the cross and another for the star. The reason is that the surfaces differ sufficiently among the pyramidal objects.

Third Experiment (BOF+SFS)
For the last experiment, data from the BOF was concatenated with data from the SFS. The data was processed to meet the network's requirement of inputs within the [0, 1] range. The results showed a 100% recognition rate when placing the objects at different locations and orientations within the viewable workplace area. To verify the robustness of our method to scaling, the distance between the camera and the pieces was modified; 100% was considered the original size, so that a 10% reduction, for instance, meant that the piece's image was reduced by 10% of its original size. Different inclination values in increments of 5 degrees were considered, up to an angle θ = 30 degrees (see figure 13 for reference). The results obtained in 5-degree steps are shown in Table 2, where plain numbers are errors due to the BOF algorithm, numbers marked * are errors due to the SFS algorithm, and numbers marked ** are errors due to both the BOF and the SFS algorithms. The first letter is the initial of the curvature of the object and the second that of its shape; for instance, RS (Rounded Square) or PT (Pyramidal Triangle). Figure 14 shows the behaviour of the ANN recognition rate at different angles; it can be seen that the pyramidal objects present fewer problems in being recognised than the rounded objects.
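The fusion step described above amounts to concatenating the two vectors and rescaling into the [0, 1] input range of the network. A minimal sketch (the vector lengths and the 9-bin shape-index histogram are illustrative assumptions, not the chapter's exact dimensions):

```python
import numpy as np

def fused_descriptor(bof, sfs_hist):
    """Concatenate the BOF vector with the SFS shape-index histogram and
    rescale the result to [0, 1], the input range the FAM network expects."""
    v = np.concatenate([np.asarray(bof, float), np.asarray(sfs_hist, float)])
    lo, hi = v.min(), v.max()
    return (v - lo) / (hi - lo) if hi > lo else np.zeros_like(v)

rng = np.random.default_rng(0)
bof = rng.random(180)        # stand-in for a real 180-element BOF vector
sfs = rng.random(9)          # stand-in for a 9-bin shape-index histogram
d = fused_descriptor(bof, sfs)
print(d.shape, float(d.min()), float(d.max()))   # (189,) 0.0 1.0
```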

Conclusions and future work
The research presented in this chapter offers an alternative methodology for integrating a robust invariant object recognition capability into industrial robots, using image features from the object's contour (boundary object information) and its form (i.e. type of curvature or topographical surface information). Both features can be concatenated to form an invariant vector descriptor which is the input to an Artificial Neural Network (ANN) for learning and recognition purposes. Experimental results were obtained using two sets of four 3D working pieces of different cross-section: square, triangle, cross and star. One set had a rounded surface curvature and the other a flat surface curvature, for which these objects were termed of pyramidal type. Using the BOF information to train the neural network, it was demonstrated that all pieces were recognised irrespective of their location and orientation within the viewable area, since only the contour was taken into consideration. With this option, however, it is not possible to differentiate objects of the same shape with different surfaces, like the rounded and pyramidal objects. When both types of information were concatenated (BOF+SFS), the robustness of the vision system improved, recognising all the pieces at different locations and orientations, and even with a 5-degree inclination; in all cases we obtained a 100% recognition rate. The current results were obtained in a light-controlled environment; future work is envisaged to look at variable lighting, which may impose some considerations for the SFS algorithm. It is also intended to work with on-line retraining so that recognition rates are improved, and to look at the autonomous grasping of the parts by the industrial robot.