Robot Soccer,  ISBN: 978-953-307-036-0

Automated Camera Calibration for Robot Soccer

By Donald G Bailey and Gourab Sen Gupta

DOI: 10.5772/7341

Overview

Figure 1. Example field, clearly showing lens distortion and mild perspective distortion.
Figure 2. Geometry for calculating the change as a result of radial lens distortion. The distortion and scales have been exaggerated to make the effects visible.
Figure 3. Parallax correction geometry. Left: the effect of lateral error; right: the effect of height error. Note, the robot height, h, has been exaggerated for clarity.
Figure 4. The edge of the playing area.
Figure 5. As a result of lighting and specular reflection, the edge of the playing area may be harder to detect.
Figure 6. The detected walls from the image in Figure 1.
Figure 7. The image after correcting for distortion. The blue + corresponds to the centre of distortion, and the red + corresponds to the detected camera position. The camera height is indicated in the scale on the bottom (10 cm per division).
Figure 8. Geometry for estimating the camera position.
Figure 9. The 3-aside field with validation points marked.
Figure 10. Larger Robocup field without walls captured from two separate cameras. The poles placed in each corner of the field allow calibration of camera position.

1. Introduction

Robot soccer has become popular over the last decade not only as a platform for education and entertainment but as a test bed for adaptive control of dynamic systems in a multi-agent collaborative environment (Messom, 1998). It is a powerful vehicle for exploration and dissemination of scientific knowledge in a fun and exciting manner. The robot soccer environment encompasses several technologies—embedded micro-controller based hardware, wireless radio-frequency data transmission, dynamics and kinematics of motion, motion control algorithms, real-time image capture and processing and multi-agent collaboration.

The vision system is an integral component of modern autonomous mobile robots. With robot soccer, the physical size of the robots in the micro-robot and small robot leagues limits the power and space available, precluding the use of local cameras on the robots themselves. This is overcome by using a global vision system, with one camera (or more for larger fields) mounted over the playing field. The camera or cameras are connected to a vision processor that determines the location and orientation of each robot and the location of the ball relative to the playing field. This data is then passed to the strategy controller, which determines how the team should respond to the current game situation, plans the trajectories or paths of the robots under its control, and transmits the appropriate low-level motor control commands to the robots, which enable them to execute the plan.

To manage complexity in collaborative robot systems, a hierarchical state transition based supervisory control (STBS) system can be used (Sen Gupta et al., 2002; Sen Gupta et al., 2004). However, the performance of such a system, or indeed any alternative higher-level control system, deteriorates substantially if the objects are not located accurately because the generic control functions to position and orient the robots are then no longer reliable.

The high speed and manoeuvrability of the robots make the game very dynamic. Accurate control of high-speed micro-robots is essential for success within robot soccer. This makes accurate, real-time detection of the position and orientation of objects of particular importance as these greatly affect path-planning, prediction of moving targets and obstacle avoidance. Each robot is identified in the global image by a “jacket” which consists of a pattern of coloured patches. The location of these coloured patches within the image is used to estimate the position and orientation of the robot within the playing area. The cameras must therefore be calibrated to provide an accurate mapping between image coordinates and world coordinates in terms of positions on the playing field (Bailey & Sen Gupta, 2004).

The portable nature of robot soccer platforms means that every time the system is set up, there are differences in the camera position and angle relative to the field. Each team has its own camera, and the two cameras cannot both be mounted exactly over the centre of the playing area. It is also difficult to arrange the camera so that its line of sight is perfectly perpendicular to the playing surface. Consequently, each camera is looking down on the playing area at a slight angle, which introduces a mild perspective distortion into the image. The size of the playing area, combined with constraints on how high the camera may be mounted, requires that a wide angle lens be used. This can introduce significant barrel distortion within the image obtained. Both of these effects are readily apparent in Figure 1. The limited height of the camera combined with the height of the robots means that each detected robot position is also subject to parallax error. These factors must all be considered, and compensated for, to obtain accurate estimates of the location and orientation of each robot.

media/image1.png

Figure 1.

Example field, clearly showing lens distortion and mild perspective distortion.

2. Effects of distortion

At a minimum, any calibration must determine the location of the playing field within the image. The simplest calibration (for single camera fields) is to assume that the camera is aligned with the field and that there is no distortion. The column positions of the goal mouths, $C_{Left}$ and $C_{Right}$, and the row positions of the field edges at, or near, the centreline, $R_{Top}$ and $R_{Bottom}$, need to be located within the image. Then, given the known length, $L$, and width, $W$, of the field, the estimated height of the camera, $H$, and the known height of the robots, $h$, the position of an object at row $R$ and column $C$ within the image may be determined relative to the centre of the field as

$$
\begin{aligned}
x &= \left(C - \frac{C_{Right} + C_{Left}}{2}\right)\left(\frac{L}{C_{Right} - C_{Left}}\right)\left(\frac{H - h}{H}\right) \\
y &= \left(R - \frac{R_{Bottom} + R_{Top}}{2}\right)\left(\frac{W}{R_{Bottom} - R_{Top}}\right)\left(\frac{H - h}{H}\right)
\end{aligned}
\tag{1}
$$

The first term sets the centre of the field as the origin, the second scales from pixel units to physical units, and the third term accounts for parallax distortion resulting from the height of the robot (assuming that the camera is positioned over the centre of the field).
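The simple calibration above can be sketched in a few lines of Python. This is only an illustration under the stated assumptions (ideal camera, centred over the field); the function and parameter names are hypothetical and not taken from the original system.

```python
def simple_calibration(row, col, c_left, c_right, r_top, r_bottom,
                       length, width, cam_height, robot_height):
    """Map an image position (row, col) to field coordinates (x, y).

    Assumes a distortion-free camera directly above the field centre.
    """
    # Third term of the calibration: parallax scaling for the robot height
    parallax = (cam_height - robot_height) / cam_height
    # First term: shift origin to field centre; second term: pixels to metres
    x = (col - (c_right + c_left) / 2) * (length / (c_right - c_left)) * parallax
    y = (row - (r_bottom + r_top) / 2) * (width / (r_bottom - r_top)) * parallax
    return x, y
```

With a hypothetical 1.5 m long field whose goal mouths lie at columns 20 and 620, a point at column 620 on the playing surface (height 0) maps to x = 0.75 m, half the field length.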

While this calibration is simple to perform, it will only be accurate for an ideal camera positioned precisely in the centre of the field and aligned perpendicularly to the playing field. Each deviation from ideal will introduce distortions in the image and calibration errors.

2.1. Lens distortion

The most prevalent form of lens distortion is barrel distortion. It results from the lens having a slightly higher magnification in the centre of the image than at the periphery. Barrel distortion is particularly noticeable with wide-angle lenses such as those used with robot soccer. While there are several different physical models of the lens distortion based on the known characteristics of the lens (Basu & Licardie, 1995; Pers & Kovacic, 2002), the most commonly used model is a generic radial Taylor series relating the ideal, undistorted coordinates $(x_u, y_u)$ to the distorted coordinates in the image $(x_d, y_d)$:

$$
\begin{aligned}
x_d &= x_u\left(1 + \kappa_1 r_u^2 + \kappa_2 r_u^4 + \dots\right) + \left(\varepsilon_1\left(r_u^2 + 2x_u^2\right) + 2\varepsilon_2 x_u y_u\right)\left(1 + \varepsilon_3 r_u^2 + \dots\right) \\
y_d &= y_u\left(1 + \kappa_1 r_u^2 + \kappa_2 r_u^4 + \dots\right) + \left(2\varepsilon_1 x_u y_u + \varepsilon_2\left(r_u^2 + 2y_u^2\right)\right)\left(1 + \varepsilon_3 r_u^2 + \dots\right)
\end{aligned}
\tag{2}
$$

Both sets of coordinates have the centre of distortion as the origin, and $r_u^2 = x_u^2 + y_u^2$. The set of parameters $\kappa$ and $\varepsilon$ characterise a particular lens. Note that the centre of distortion is not necessarily the centre of the image (Willson & Shafer, 1994). For most lenses, two radial and two tangential terms are sufficient (Li & Lavest, 1996), and most calibration methods limit themselves to estimating these terms. A simple, one parameter, radial distortion model is usually sufficient to account for most of the distortion (Li & Lavest, 1996):

$$r_d = r_u\left(1 + \kappa r_u^2\right) \tag{3}$$

This forward transform is most commonly used because in modelling the imaging process, the image progresses from undistorted coordinates to distorted image coordinates. Sometimes, however, the reverse transform is used. This swaps the roles of the two sets of coordinates. Since the model is an arbitrary Taylor series expansion, either approach is valid (although the coefficients will be different for the forward and reverse transforms). The first order reverse transform is given by

$$r_u = r_d\left(1 + \kappa r_d^2\right) \tag{4}$$
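To make the difference between the forward and reverse transforms concrete, the following sketch evaluates both one-parameter models. The function names and the use of Newton's method to invert the forward model are illustrative assumptions, not part of the original method. A negative $\kappa$ is used in the forward model because, with $M = 1 + \kappa r_u^2$, barrel distortion (lower magnification at the periphery) corresponds to $\kappa < 0$.

```python
def distort_radius(r_u, kappa):
    """Forward one-parameter model: r_d = r_u * (1 + kappa * r_u**2)."""
    return r_u * (1 + kappa * r_u ** 2)


def undistort_radius(r_d, kappa, iterations=20):
    """Numerically invert the forward model with Newton's method (assumed approach)."""
    r_u = r_d  # initial guess: no distortion
    for _ in range(iterations):
        f = r_u * (1 + kappa * r_u ** 2) - r_d   # residual of the forward model
        df = 1 + 3 * kappa * r_u ** 2            # derivative with respect to r_u
        r_u -= f / df
    return r_u


def reverse_model(r_d, kappa_rev):
    """First-order reverse model: r_u = r_d * (1 + kappa_rev * r_d**2)."""
    return r_d * (1 + kappa_rev * r_d ** 2)
```

For small distortions the reverse coefficient is close to the negative of the forward one, but the two models are not exact inverses of each other, which is why the coefficients must be estimated for whichever form is used.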

Effect on position

If the simple calibration was based on distances from the centre of the image, the position error would increase with radius according to the radially dependent magnification. However, since the calibration of eq. (1) sets the positions at the edges and ends of the field, the position error there will be minimal. The absolute position errors should also be zero near the centre of the image, increase with radius to a local maximum between the centre and edges of the playing area, and decrease to zero again on an ellipse through the table edge and goal points. Outside this ellipse, the errors will increase rapidly with distance, having the greatest effect in the corners of the playing field.

Effect on orientation

Determining the effect of radial distortion on the angle is more complicated. Consider a point in the undistorted image using radial coordinates $(r_u, \varphi)$. At this point, the lens distortion results in a magnification

$$M = \frac{r_d}{r_u} \tag{5}$$

Next, consider a test point offset from this by a small distance, $r$, at an angle, $\theta_u$, with a magnification, $M_2$. If the magnification is constant ($M_2 = M$) then everything is scaled equally, and by similar triangles, the angle to the test point will remain the same. This implies that if the test point is in the tangential direction ($\theta_u - \varphi = 90°$), there will be no angle error, because to first order a tangential offset does not change the radius and hence the magnification. Similarly, since the distortion is radial, if the test point is aligned radially ($\theta_u - \varphi = 0$) the test point will be stretched radially, but the angle will not change. These two considerations imply that it is best to consider offsets in the tangential and radial direction. This geometry is shown in Figure 2.

media/image14.jpg

Figure 2.

Geometry for calculating the change as a result of radial lens distortion. The distortion and scales have been exaggerated to make the effects visible.

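After distortion, the angle to the test point becomes

$$
\tan(\theta_d - \varphi) = \frac{M_2 r \sin(\theta_u - \varphi)}{M_2\left(r_u + r\cos(\theta_u - \varphi)\right) - M r_u} = \frac{M_2 r \sin(\theta_u - \varphi)}{M_2 r \cos(\theta_u - \varphi) + (M_2 - M) r_u}
\tag{6}
$$

The test distance, $r$, is small, therefore

$$
M_2 - M \approx \Delta r_u^2 \frac{dM}{dr_u^2} = \left(\left(r_u + r\cos(\theta_u - \varphi)\right)^2 - r_u^2\right)\frac{dM}{dr_u^2} \approx 2 r r_u \cos(\theta_u - \varphi)\frac{dM}{dr_u^2}
\tag{7}
$$

so

$$
\tan(\theta_d - \varphi) \approx \frac{M}{M + 2 r_u^2 \dfrac{dM}{dr_u^2}} \tan(\theta_u - \varphi)
\tag{8}
$$

When the forward map of eq. (3) is used, eq. (8) becomes

$$
\tan(\theta_d - \varphi) \approx \frac{1 + \kappa r_u^2}{1 + 3\kappa r_u^2} \tan(\theta_u - \varphi)
\tag{9}
$$

Since the magnification changes faster with increasing radius, the angle error ($\theta_d - \theta_u$) will also be larger further from the centre of distortion, and increase more rapidly in the periphery.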
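The behaviour of the angle error can be explored numerically. The sketch below evaluates the first-order relation between $\tan(\theta_d - \varphi)$ and $\tan(\theta_u - \varphi)$ for the one-parameter forward model; the function name and the sample values of $\kappa$ and $r_u$ are hypothetical, and the ratio is assumed to stay positive (mild distortion).

```python
import math


def distorted_angle(theta_u, phi, r_u, kappa):
    """Angle of a short line segment after radial distortion (first-order model).

    theta_u: undistorted orientation (radians); phi: polar angle of the point;
    r_u: undistorted radius from the centre of distortion.
    """
    ratio = (1 + kappa * r_u ** 2) / (1 + 3 * kappa * r_u ** 2)
    rel = theta_u - phi
    # atan2 keeps the result in the same quadrant as the undistorted angle
    rel_d = math.atan2(ratio * math.sin(rel), math.cos(rel))
    return phi + rel_d
```

Radial and tangential orientations come back unchanged, while a 45° orientation picks up an error that grows with radius, in line with the discussion above.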

2.2. Perspective distortion

Perspective distortion results when the line of sight of the camera is not perpendicular to the plane of the playing area. This will occur when the camera is not directly over the centre of the playing area, and it must be tilted to fit the complete playing area within the field of view. A perspective transformation is given by

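$$
x_d = \frac{h_1 x_u + h_2 y_u + h_3}{h_7 x_u + h_8 y_u + h_9} \qquad
y_d = \frac{h_4 x_u + h_5 y_u + h_6}{h_7 x_u + h_8 y_u + h_9}
\tag{10}
$$

where $(x_u, y_u)$ and $(x_d, y_d)$ are the coordinates of an undistorted and distorted point respectively. This is often represented in matrix form using a homogeneous coordinate system (Hartley & Zisserman, 2000):

$$
\begin{bmatrix} k x_d \\ k y_d \\ k \end{bmatrix} =
\begin{bmatrix} h_1 & h_2 & h_3 \\ h_4 & h_5 & h_6 \\ h_7 & h_8 & h_9 \end{bmatrix}
\begin{bmatrix} x_u \\ y_u \\ 1 \end{bmatrix}
\quad\text{or}\quad P_d = H P_u
\tag{11}
$$

The 3x3 transformation matrix, $H$, incorporates rotation, translation, scaling, skew, and stretch as well as perspective distortion. Just considering perspective distortion, and for small tilt angles, $H$ simplifies to

$$
H_{perspective} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ p_x & p_y & 1 \end{bmatrix}
\tag{12}
$$

The effect of this is to change the scale factor, $k$, in eq. (11), giving a position dependent magnification. As a consequence, parallel lines converge, meeting somewhere on the vanishing line given by

$$p_x x_u + p_y y_u + 1 = 0 \tag{13}$$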
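As a minimal sketch of how the homogeneous form is applied in practice, the following code maps a point through a 3x3 matrix and divides by the scale factor $k$. The function names are hypothetical, and the matrix is stored as nested lists purely for illustration.

```python
def apply_homography(H, x_u, y_u):
    """Apply a 3x3 homography (row-major nested lists) to the point (x_u, y_u)."""
    kx = H[0][0] * x_u + H[0][1] * y_u + H[0][2]
    ky = H[1][0] * x_u + H[1][1] * y_u + H[1][2]
    k = H[2][0] * x_u + H[2][1] * y_u + H[2][2]  # position-dependent scale factor
    return kx / k, ky / k


def perspective_matrix(p_x, p_y):
    """Perspective-only form of H for small tilt angles."""
    return [[1, 0, 0], [0, 1, 0], [p_x, p_y, 1]]
```

For example, with `perspective_matrix(0.1, 0.0)` the origin is unchanged, while the point (1.0, 0.5) is divided by k = 1.1. Points approaching the vanishing line have k close to zero, so the division becomes ill-conditioned there.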

Effect on position

Since the simple calibration sets four points on the edges of the playing area, these points will have no error. In the direction in which the camera is tilted, $k$ will be greater than 1, shrinking the scene. The radial error will be positive, and the tangential errors will be towards the direction line. In the opposite direction, the magnification is greater than 1. The radial error will be negative, and the tangential errors will be away from the direction line. There will be a line approximately across the middle of the playing area where

$$p_x x_u + p_y y_u + 1 = 1 \tag{14}$$

which will have no error. The angle of the line, and the severity of the errors will depend on the angle and extent of the camera tilt respectively.

Effect on orientation

Again the distortion will depend on the change of magnification with position. As the content becomes more compressed (closer to the vanishing line) angle distortion will increase, because the slope of the magnification becomes steeper and the angle errors will be greater. Without loss of generality, consider the camera tilted in the x direction ($p_y = 0$). Consider a test point offset from $(x_u, y_u)$ by distance $r$ at an angle $\theta_u$. After distortion, the offset becomes:

$$
\begin{aligned}
\Delta y &= \frac{y_u + r\sin\theta_u}{1 + p_x\left(x_u + r\cos\theta_u\right)} - \frac{y_u}{1 + p_x x_u} \\
\Delta x &= \frac{x_u + r\cos\theta_u}{1 + p_x\left(x_u + r\cos\theta_u\right)} - \frac{x_u}{1 + p_x x_u}
\end{aligned}
\tag{15}
$$

Hence, the angle of the test point after distortion is given by

$$
\begin{aligned}
\tan\theta_d = \frac{\Delta y}{\Delta x}
&= \frac{\left(y_u + r\sin\theta_u\right)\left(1 + p_x x_u\right) - y_u\left(1 + p_x\left(x_u + r\cos\theta_u\right)\right)}{\left(x_u + r\cos\theta_u\right)\left(1 + p_x x_u\right) - x_u\left(1 + p_x\left(x_u + r\cos\theta_u\right)\right)} \\
&= \frac{r\sin\theta_u\left(1 + p_x x_u\right) - y_u p_x r\cos\theta_u}{r\cos\theta_u} \\
&= \left(1 + p_x x_u\right)\tan\theta_u - y_u p_x
\end{aligned}
\tag{16}
$$

Along the line of sight ($y_u = 0$) there is no distortion radially or tangentially. However, other orientations become compressed as the magnification causes foreshortening, especially as it approaches the vanishing line. At other positions, angles that satisfy

$$\tan\theta_u = \frac{y_u}{x_u} \tag{17}$$

are not distorted. This is expected, since the perspective transformation will map straight lines onto straight lines. Orientations perpendicular to the line of sight ($\theta_u = 90°$) are not distorted. Other orientations are compressed by the perspective foreshortening.
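The closed-form angle relation can be checked against a direct transformation of two nearby points. The sketch below is an illustrative check only; the function names and the sample tilt parameter are assumptions, and angles are returned modulo 180° because `atan` is used.

```python
import math


def perspective_angle(theta_u, x_u, y_u, p_x):
    """Closed form: tan(theta_d) = (1 + p_x * x_u) * tan(theta_u) - y_u * p_x."""
    return math.atan((1 + p_x * x_u) * math.tan(theta_u) - y_u * p_x)


def perspective_angle_numeric(theta_u, x_u, y_u, p_x, r=0.01):
    """Same angle, found by warping a point and a test point a distance r away."""
    def warp(x, y):
        k = 1 + p_x * x  # scale factor for a tilt in the x direction (p_y = 0)
        return x / k, y / k

    x0, y0 = warp(x_u, y_u)
    x1, y1 = warp(x_u + r * math.cos(theta_u), y_u + r * math.sin(theta_u))
    return math.atan((y1 - y0) / (x1 - x0))
```

The two functions agree to rounding error, and an orientation pointing along the line through the origin ($\tan\theta_u = y_u / x_u$) is returned unchanged.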

The mild perspective distortion encountered with the robot soccer system will introduce only small angle errors. However, these errors will be largest on the side of the field where the image appears compressed.

2.3. Parallax distortion

In the absence of any other information, the camera is assumed to be at a known height, H, directly above the centre of the robot soccer playing area. This allows the change in scale associated with the known heights of the robots to be taken into account. Since the robot jackets are always a fixed height above the playing surface, parallax correction simply involves scaling the detected position relative to the position of the camera. Errors in estimating the camera position will only introduce position errors; they will not affect the angle.

media/image29.jpg

Figure 3.

Parallax correction geometry. Left: the effect of lateral error; right: the effect of height error. Note, the robot height, h, has been exaggerated for clarity.

Effect on position

Consider the geometry shown in Figure 3. When the camera is offset laterally by $P$, an object at location $d$ in the playing area will appear at location $v$ by projecting the height onto the playing surface:

$$v = (d - P)\frac{H}{H - h} + P = \frac{dH - Ph}{H - h} \tag{18}$$

However, if it is assumed that the camera is not offset, the parallax correction will estimate the object position as $d'$. The error is given by

$$d' - d = \frac{v(H - h)}{H} - d = -\frac{Ph}{H} \tag{19}$$

The lateral error is scaled by the relative heights of the robot and camera. The ratio $H/h$ is typically 40 or 50, so a 5 cm camera offset will result in a 1 mm error in position. Note that the error applies everywhere in the playing area, independent of the object location.

An error in estimating the height of the camera by $\Delta H$ will also result in an error in location of objects. In this case, the projection of the object position will be

$$v = \frac{d(H + \Delta H)}{H + \Delta H - h} \tag{20}$$

Again, given the assumptions in camera position, correcting this position for parallax will result in an error in estimating the robot position of

$$d' - d = \frac{v(H - h)}{H} - d = -\frac{d\,h\,\Delta H}{(H + \Delta H - h)H} \tag{21}$$

Since changing the height of the camera changes the parallax correction scale factor, the error will be proportional to the distance from the camera location. There will be no error directly below the camera, and the greatest errors will be seen in the corners of the playing area.
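The two parallax error expressions can be cross-checked by projecting a point from the true camera position and then correcting it as if the camera were at its nominal position. The sketch below does this; the function names and sample dimensions are hypothetical, and combining a lateral offset and a height error in a single projection function is an extension made here for illustration (the text treats them separately).

```python
def lateral_offset_error(P, H, h):
    """Position error d' - d for a camera offset laterally by P."""
    return -P * h / H


def height_error(d, dH, H, h):
    """Position error d' - d at distance d when the camera height is off by dH."""
    return -d * h * dH / ((H + dH - h) * H)


def project_and_correct(d, P, dH, H, h):
    """Project a point at height h from the true camera, then correct it
    assuming a camera at nominal height H with no lateral offset."""
    v = (d - P) * (H + dH) / (H + dH - h) + P  # where the point appears on the floor
    return v * (H - h) / H                     # nominal parallax correction
```

With H = 2 m and h = 4 cm (a ratio of 50), a 5 cm lateral offset gives an error of 1 mm in magnitude at every position, matching the figure quoted above, whereas the height error grows in proportion to d.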

2.4. Effects on game play

When considering the effects of location and orientation errors on game play, two situations need to be considered. The first is local effects, for example when a robot is close to the ball and manoeuvring to shoot the ball. The second is when the robot is far from play, but must be brought quickly into play.

In the first situation, when the objects are relatively close to one another, what is most important is the relative location of the objects. Since both objects will be subject to similar distortions, they will have similar position errors. However, the difference in position errors will result in an error in estimating the angle between the objects (indeed this was how angle errors were estimated earlier in this section). While orientation errors may be considered of greater importance, these will correlate with the angle errors from estimating the relative position, making orientation errors less important for close work.

In contrast with this, at a distance the orientation errors are of greater importance, because shooting a ball or instructing the robot to move rapidly will result in moving in the wrong direction when the angle error is large. For slow play, this is less significant, because errors can be corrected over a series of successive images as the object is moving. However, at high speed (speeds of over two metres per second are frequently encountered in robot soccer), estimating the angles at the start of a manoeuvre is more critical.

Consequently, good calibration is critical for successful game play.

3. Standard calibration techniques

In computer vision, the approach of Tsai (Tsai, 1987) or some derivative of it is commonly used to calibrate the relationship between pixels and real-world coordinates. These approaches estimate the position and orientation of the camera relative to a target, as well as estimating the lens distortion parameters, and the intrinsic imaging parameters. Calibration requires a dense set of calibration data points scattered throughout the image. These are usually provided by a ‘target’ consisting of an array of spots, a grid, or a checkerboard pattern. From the construction of the target, the relative positions of the target points are well known. Within the captured image of the target, the known points are located and their correspondence with the object established. A model of the imaging process is then adjusted to make the target points match their measured image points.

The known location of the model enables target points to be measured in 3D world coordinates. This coordinate system is used as the frame of reference. A rigid body transformation (rotation and translation) is applied to the target points. This uses an estimate of the camera pose (position and orientation in world coordinates) to transform the points into a camera centred coordinate system. Then a projective transformation is performed, based on the estimated lens focal length, giving 2D coordinates on the image plane. Next, these are adjusted using the distortion model to account for distortions introduced by the lens. Finally, the sensing element size and aspect ratio are used to determine where the control points should appear in pixel coordinates. The coordinates obtained from the model are compared with the coordinates measured from the image, giving an error. The imaging parameters are then adjusted to minimise the error, resulting in a full characterisation of the imaging model.

The camera and lens model is sufficiently non-linear to preclude a simple, direct calculation of all of the parameters of the imaging model. Correcting imaging systems for distortion therefore requires an iterative approach, for example using the Levenberg-Marquardt method of minimising the mean squared error (Press et al., 1993). One complication of this approach is that for convergence, the initial estimates of the model parameters must be reasonably close to the final values. This is particularly so with the 3D rotation and perspective transformation parameters.

Planar objects are simpler to construct accurately than full 3D objects. Unfortunately, only knowing the location of points on a single plane is insufficient to determine a full imaging model (Sturm & Maybank, 1999). Therefore, if a planar target is used, several images must be taken of the target in a variety of poses to obtain full 3D information (Heikkila & Silven, 1996). Alternatively, a reduced model with one or two free parameters may be obtained from a single image. For robot soccer, this is generally not too much of a problem since the game is essentially planar.

A number of methods for performing the calibration for robot soccer are described in the literature. Without providing a custom target, there are only a few data points available from the robot soccer platform. The methods range from the minimum calibration described in the previous section through to characterisation of full models of the imaging system.

The basic approach described in section 2 does not account for any distortions. A simple approach was developed in (Weiss & Hildebrand, 2004) to account for the gross characteristics of the distortion. The playing area was divided into four quadrants, based on the centreline, and dividing the field in half longitudinally between the centres of the goals. Each quadrant was corrected using bilinear interpolation. While this corrects the worst of the position errors resulting from both lens and perspective distortion, it will only partially correct orientation errors. The use of a bilinear transformation will also result in a small jump in the orientation at the boundaries between adjacent quadrants.

A direct application of Tsai’s calibration is to have a chequered cloth (as the calibration pattern) that is rolled out over the playing area (Baltes, 2000). The corners of the squares on the cloth provide a 2D grid of target points for calibration. The cloth must cover as much as possible of the field of view of the camera. A limitation of this approach is that the calibration is with respect to the cloth, rather than the field. Unless the cloth is positioned carefully with respect to the field, this can introduce other errors.

This limitation may be overcome by directly using landmarks on the playing field as the target locations. This approach is probably the most commonly used and is exemplified in (Ball et al., 2004) where a sequence of predefined landmarks is manually clicked on within the image of the field. Tsai’s calibration method is then used to determine the imaging model by matching the known locations with their image counterparts. Such approaches based on manually selecting the target points within the image are subject to the accuracy and judgement of the person locating the landmarks within the image. Target selection is usually limited to the nearest pixel. While selecting more points will generally result in a more accurate calibration by averaging the errors from the over-determined system, the error minimisation cannot remove systematic errors. Manual landmark selection is also very time-consuming.

The need to locate target points subjectively may be overcome by automating the calibration procedure. Egorova (Egorova et al., 2005) uses the bounding box to find the largest object in the image, and this is used to initialise the transform. A model of the field is transformed using iterative global optimisation to make the image of the field match the transformed model. While automatic, this procedure takes five to six seconds using a high end desktop computer for the model parameters to converge.

A slightly different approach is taken by Klancar (Klancar et al., 2004). The distortion correction is split into two stages: first the lens distortion is removed, and then the perspective distortion parameters are estimated. This approach to lens distortion correction is based on the observation that straight lines are invariant under a perspective (or projective) transformation. Therefore, any deviation from straightness must be due to lens distortion (Brown, 1971; Fryer et al., 1994; Park & Hong, 2001). This is the so-called ‘plumb-line’ approach, so named because when it was first used by (Brown, 1971), the straight lines were literally plumb-lines hung within the image. (Klancar et al., 2004) uses a Hough transform to find the major edges of the field. Three points are found along each line: one at the centre and one at each end. A hyperbolic sine radial distortion model is used (Pers & Kovacic, 2002), with the focal length optimised to make the three target points for each line as close to collinear as possible. One limitation of Klancar’s approach is the assumption that the centre of the image corresponds with the centre of distortion. However, errors in the location of the distortion centre result in tangential distortion terms (Stein, 1997), which are not considered in the model. The second stage of Klancar’s algorithm is to use the convergence of parallel lines (at the vanishing points) to estimate the perspective transformation component.

None of the approaches explicitly determines the camera location. Since they are all based on 2D targets, they can only gain limited information on the camera height, resulting in a limited ability to correct for parallax distortion. The limitations of the existing techniques led us to develop an automatic method that overcomes these problems by basing the calibration on a 3D model.

4. Automatic calibration procedure

The calibration procedure is based on the principles first described in (Bailey, 2002). A three stage solution is developed based on the ‘plumb-line’ principle. In the first stage, a parabola is fitted to each of the lines on the edge of the field. Without distortion, these should be straight lines, so the quadratic component provides data for estimating the lens distortion. A single parameter radial distortion model is used, with a closed form solution given for determining the lens distortion parameter. In the second stage, homogeneous coordinates are used to model the perspective transformation. This is based on transforming the lines on the edge of the field to their known locations. The final stage uses the 3D information inherent in the field to obtain an estimate of the camera location (Bailey & Sen Gupta, 2008).

4.1. Edge detection

The first step is to find the edge of the playing field. The approach taken will depend on the form of the field. Our initial work was based on micro-robots, where the playing field is bounded by a short wall. The white edges apparent in Figure 1 actually represent the inside edge of the wall around the playing area, as shown in Figure 4. In this case, the edge of the playing area corresponds to the edge between the white of the wall and the black of the playing surface. While detecting the edge between the black and white sounds straightforward, it is not always as simple as that. Specular reflections off the black regions can severely reduce the contrast in some situations, as can be seen in Figure 5, particularly in the bottom right corner of the image.

media/image34.jpg

Figure 4.

The edge of the playing area.

Two 3x3 directional Prewitt edge detection filters are used to detect both the top and bottom edges of the walls on all four sides of the playing area. To obtain an accurate estimate of the calibration parameters, it is necessary to detect the edges to sub-pixel accuracy. Consider first the bottom edge of the wall along the side of the playing area in the top edge of the image. Let the response of the filtered image be $f[x,y]$. Within the top 15% of the image, the maximum filtered response is found in each column. Let the maximum in column $x$ be located on row $y_{max,x}$. A parabola is fitted to the filter responses above and below this maximum (perpendicular to the edge), and the edge pixel determined to sub-pixel location as (Bailey, 2003):

$$
\mathrm{edge}[x] = y_{max,x} + \frac{f[x, y_{max,x}+1] - f[x, y_{max,x}-1]}{4f[x, y_{max,x}] - 2\left(f[x, y_{max,x}+1] + f[x, y_{max,x}-1]\right)}
\tag{22}
$$

media/image36.png

Figure 5.

As a result of lighting and specular reflection, the edge of the playing area may be harder to detect.
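The sub-pixel edge location described above (a parabola through the peak filter response and its two neighbours) can be sketched as follows. The function name, the list-based representation of one image column, and the handling of border and flat cases are assumptions made for illustration.

```python
def subpixel_edge(column, search_rows):
    """Locate the strongest edge response in one image column to sub-pixel accuracy.

    column: filter responses f[x, y] for a fixed x, indexed by row y.
    search_rows: rows to search, e.g. range(0, int(0.15 * len(column))).
    """
    y_max = max(search_rows, key=lambda y: column[y])
    if y_max == 0 or y_max == len(column) - 1:
        return float(y_max)  # no neighbour on one side: keep the integer row
    f_prev, f_peak, f_next = column[y_max - 1], column[y_max], column[y_max + 1]
    denom = 4 * f_peak - 2 * (f_next + f_prev)
    if denom == 0:
        return float(y_max)  # flat response: the parabola has no unique vertex
    return y_max + (f_next - f_prev) / denom
```

If the responses are sampled from an exact parabola peaking at row 5.3, the function recovers 5.3 from the three samples around row 5.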

A parabola is then fitted to all the detected edge points $(x, \mathrm{edge}[x])$ along the length of the edge. Let the parabola be $y(x) = ax^2 + bx + c$. The parabola coefficients are determined by minimising the squared error

$$E = \sum_x \left(ax^2 + bx + c - \mathrm{edge}[x]\right)^2 \tag{23}$$

The error is minimised by taking partial derivatives of eq. (23) with respect to each of the parameters a, b, and c, and solving for when these are equal to zero. This results in the following set of simultaneous equations, which are then solved for the parabola coefficients.

$$
\begin{bmatrix} \sum x^4 & \sum x^3 & \sum x^2 \\ \sum x^3 & \sum x^2 & \sum x \\ \sum x^2 & \sum x & \sum 1 \end{bmatrix}
\begin{bmatrix} a \\ b \\ c \end{bmatrix} =
\begin{bmatrix} \sum x^2\,\mathrm{edge}[x] \\ \sum x\,\mathrm{edge}[x] \\ \sum \mathrm{edge}[x] \end{bmatrix}
\tag{24}
$$

The resulting parabola may be subject to errors from noisy or misdetected points. The accuracy may be improved considerably using robust fitting techniques. After initially estimating the parabola, any outliers are removed from the data set, and the parabola refitted to the remaining points. Two iterations are used, removing points more than 1 pixel from the parabola in the first iteration, and removing those more than 0.5 pixels from the parabola in the second iteration.
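The fitting and outlier rejection steps can be sketched in plain Python as below. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the normal equations are solved by Gauss-Jordan elimination for self-containment, and the distance from the parabola is taken as the vertical residual (an assumption).

```python
def fit_parabola(points):
    """Least-squares fit of y = a*x^2 + b*x + c via the 3x3 normal equations."""
    s = [sum(x ** k for x, _ in points) for k in range(5)]      # s[k] = sum of x^k
    t = [sum(y * x ** k for x, y in points) for k in range(3)]  # t[k] = sum of x^k * y
    m = [[s[4], s[3], s[2], t[2]],
         [s[3], s[2], s[1], t[1]],
         [s[2], s[1], s[0], t[0]]]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(m[r][i]))
        m[i], m[p] = m[p], m[i]
        for r in range(3):
            if r != i:
                factor = m[r][i] / m[i][i]
                m[r] = [mr - factor * mi for mr, mi in zip(m[r], m[i])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def robust_fit_parabola(points, thresholds=(1.0, 0.5)):
    """Fit, drop points further than each threshold (in pixels), and refit."""
    a, b, c = fit_parabola(points)
    for limit in thresholds:
        points = [(x, y) for x, y in points
                  if abs(a * x * x + b * x + c - y) <= limit]
        a, b, c = fit_parabola(points)
    return a, b, c
```

For long edges, centring the x coordinates before fitting would improve the conditioning of the normal equations; that refinement is omitted here for brevity.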

A similar process is used with the local minimum of the Prewitt filter to detect the top edge of the wall. The process is repeated for the other walls in the bottom, left and right edges of the image. The robust fitting procedure automatically removes the pixels in the goal mouth from the fit. The results of detecting the edges for the image in Figure 1 are shown in Figure 6.

media/image40.png

Figure 6.

The detected walls from the image in Figure 1.

4.2. Estimating the distortion centre

Before correcting for the lens distortion, it is necessary to estimate the centre of distortion. With purely radial distortion, lines through the centre will remain straight. Therefore, considering the parabola components, a line through the centre of distortion will have no curvature (a=0). In general, the curvature of a line will increase the further it is from the centre. It has been found that the curvature, a, is approximately proportional to the axis intercept, c, when the origin is at the centre of curvature (Bailey, 2002).

The x centre, x0, may be determined by considering the vertical lines within the image (the left and right ends of the field) and the y centre, y0, from the horizontal lines (the top and bottom sides of the field). Consider the horizontal centre first. With just two lines, one at each end of the field, the centre of distortion is given by

$$ x_0 = \frac{a_2 c_1 - a_1 c_2}{a_2 - a_1} \tag{25} $$

With more than two lines available, this may be generalised by performing a least squares fit between the intercept and the curvature:

$$ x_0 = \frac{\sum c_i \sum a_i c_i - \sum a_i \sum c_i^2}{\sum c_i a_i \sum 1 - \sum a_i \sum c_i} \tag{26} $$

The same equations may be used to estimate the y position of the centre, y0.
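As an illustration, the least squares estimate of eq. (26) can be written directly from the sums. This is a sketch under the assumption that the curvatures and intercepts of the relevant parabolas are already available as arrays; the function name `distortion_centre` is hypothetical. With exactly two lines the expression reduces to eq. (25).

```python
import numpy as np

def distortion_centre(a, c):
    """Intercept at which the fitted curvature falls to zero (eq. 26).

    a: curvatures of the fitted parabolas.
    c: corresponding axis intercepts.
    """
    a = np.asarray(a, dtype=float)
    c = np.asarray(c, dtype=float)
    n = a.size                                   # the "sum of 1" term
    num = c.sum() * (a * c).sum() - a.sum() * (c * c).sum()
    den = n * (a * c).sum() - a.sum() * c.sum()
    return num / den
```

Passing the vertical parabolas gives x0, and passing the horizontal parabolas gives y0.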

Once the centre has been estimated, it is necessary to offset the parabolas to make this the origin. This involves substituting

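$$ \hat{x} = x - x_0, \qquad \hat{y} = y - y_0 \tag{27} $$

into the equations for each parabola, $y = ax^2 + bx + c$, to give

$$ \hat{y} = a(\hat{x}+x_0)^2 + b(\hat{x}+x_0) + c - y_0 = a\hat{x}^2 + (2ax_0 + b)\hat{x} + (ax_0^2 + bx_0 + c - y_0) \tag{28} $$

and similarly for $x = ay^2 + by + c$ with the x and y reversed.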

Shifting the origin changes the parabola coefficients. In particular, the intercept changes, as a result of the curvature and slope of the parabolas. Therefore, this step is usually repeated two or three times to progressively refine the centre of distortion. The centre relative to the original image is then given by the sum of successive offsets.
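The coefficient update of eq. (28) is small enough to show in full. The helper name `shift_parabola` is an assumption for this sketch; for a parabola of the form x = ay2+by+c the same function applies with the roles of x0 and y0 swapped in the call.

```python
def shift_parabola(a, b, c, x0, y0):
    """Coefficients of y = a*x**2 + b*x + c after moving the origin to (x0, y0).

    Implements eq. (28): the curvature is unchanged, while the slope and
    intercept pick up terms that depend on the offset.
    """
    new_b = 2.0 * a * x0 + b
    new_c = a * x0 * x0 + b * x0 + c - y0
    return a, new_b, new_c
```

In an iterative refinement, each pass would re-estimate the centre from the shifted coefficients and accumulate the offsets, as the text describes.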

4.3. Estimating the aspect ratio

For pure radial distortion, the slopes of the a vs c curve should be the same horizontally and vertically. This is because the strength of the distortion depends only on the radius, and not on the particular direction. When using an analogue camera and frame grabber, the pixel clock of the frame grabber is not synchronised with the pixel clock of the sensor. Any difference in these clock frequencies will result in aspect ratio distortion with the image stretched or compressed horizontally by the ratio of the clock frequencies. This distortion is not usually a problem with digital cameras, where the output pixels directly correspond to sensing elements. However, aspect ratio distortion can also occur if the pixel pitch is different horizontally and vertically.

To correct for aspect ratio distortion if necessary, the x axis can be scaled as $\hat{x} = x/R$. The horizontal and vertical parabolas are affected by this transformation in different ways:

$$ y = ax^2 + bx + c = aR^2\hat{x}^2 + bR\hat{x} + c \tag{29} $$
and
$$ \hat{x} = \frac{x}{R} = \frac{a}{R}y^2 + \frac{b}{R}y + \frac{c}{R} \tag{30} $$

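respectively. The scale factor, R, is chosen to make the slopes of a vs c the same horizontally and vertically. Let sx be the slope of a vs c for the horizontal parabolas and sy be the slope for the vertical parabolas. From eq. (29), the scaling multiplies the slope for the horizontal parabolas by $R^2$, while from eq. (30) the slope for the vertical parabolas is unchanged because a and c are both divided by R. Equating $R^2 s_x = s_y$, the scale factor is then given by

$$ R = \sqrt{\frac{s_y}{s_x}} \tag{31} $$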

4.4. Estimating the lens distortion parameter

Since the aim is to transform from distorted image coordinates to undistorted coordinates, the reverse transform of eq. (4) is used in this work. Consider first a distorted horizontal line. It is represented by the parabola $y_d = ax_d^2 + bx_d + c$. The goal is to select the distortion parameter, κ, that converts this to a straight line. Substituting this into eq. (4) gives

$$
\begin{aligned}
y_u &= y_d\left(1 + \kappa(x_d^2 + y_d^2)\right) \\
    &= (ax_d^2 + bx_d + c)\left(1 + \kappa\left(x_d^2 + (ax_d^2 + bx_d + c)^2\right)\right) \\
    &= c(1 + \kappa c^2) + b(1 + 3\kappa c^2)x_d + \left(a + c\kappa(3ac + 3b^2 + 1)\right)x_d^2 + \dots
\end{aligned}
\tag{32}
$$

where the … represents higher order terms. Unfortunately, this is in terms of xd rather than xu. If we consider points near the centre of the image (small x) then the higher order terms are negligible so

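$$ x_u = x_d(1 + \kappa r_d^2) \approx x_d(1 + \kappa y_d^2) \approx x_d(1 + \kappa c^2) \tag{33} $$
or
$$ x_d \approx \frac{x_u}{1 + \kappa c^2} \tag{34} $$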

Substituting this into eq. (32) gives

$$ y_u = c(1 + \kappa c^2) + \frac{b(1 + 3\kappa c^2)}{1 + \kappa c^2}x_u + \frac{a + c\kappa(3ac + 3b^2 + 1)}{(1 + \kappa c^2)^2}x_u^2 + \dots \tag{35} $$

Again, assuming points near the centre of the image, and neglecting the higher order terms, eq. (35) will be a straight line if the coefficient of the quadratic term is set to zero. Solving this for κ gives

$$ \kappa = \frac{-a}{c(3ac + 3b^2 + 1)} \tag{36} $$

Each parabola (in both horizontal and vertical directions) will give a separate estimate of κ. These are simply averaged to get a value of κ that works reasonably well for all lines. (Note that if there are any lines that pass close to the origin, a weighted average should be used because the estimate of κ from such lines is subject to numerical error (Bailey, 2002).)

Setting the quadratic term to zero, and ignoring the higher order terms, each parabola becomes a line

$$ y_u = \frac{b(1 + 3\kappa c^2)}{1 + \kappa c^2}x_u + c(1 + \kappa c^2) = m_y x_u + d_y \tag{37} $$

and similarly for the vertical lines. The change in slope of the line at the intercept reflects the angle distortion and is of a similar form to eq. (9). Although the result of eq. (37) is based on the assumption of points close to the origin, in practice, the results are valid even for quite severe distortions (Bailey, 2002).
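A short sketch of eqs. (36) and (37) follows. The function names are hypothetical, and the plain average used here is the simple case from the text: it makes no attempt at the weighted average recommended for lines passing close to the origin, where c approaches zero and eq. (36) becomes ill-conditioned.

```python
def estimate_kappa(parabolas):
    """Average the per-line estimates of the distortion parameter, eq. (36).

    parabolas: sequence of (a, b, c) with the origin at the centre of distortion.
    Assumption: no line has an intercept c close to zero.
    """
    estimates = [-a / (c * (3.0 * a * c + 3.0 * b * b + 1.0))
                 for a, b, c in parabolas]
    return sum(estimates) / len(estimates)

def undistorted_line(a, b, c, kappa):
    """Slope and intercept of the straightened line, eq. (37).

    The curvature a does not appear because the quadratic term is set to zero.
    """
    slope = b * (1.0 + 3.0 * kappa * c * c) / (1.0 + kappa * c * c)
    intercept = c * (1.0 + kappa * c * c)
    return slope, intercept
```

The same two functions apply to the vertical parabolas, with the returned slope and intercept then describing x as a function of y.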

4.5. Estimating the perspective transformation

After correcting for lens distortion, the edges of the playing area are straight. However, as a result of perspective distortion, opposite edges may not necessarily be parallel. The origin is also at the centre of distortion, rather than in more convenient field-centric coordinates. This change of coordinates may involve translation and rotation in addition to just a perspective map. Therefore the full homogeneous transformation of eq. (11) will be used. The forward transformation matrix, H, will transform from undistorted to distorted coordinates. To correct the distortion, the reverse transformation is required:

$$ P_u = H^{-1}P_d \tag{38} $$

The transformation matrix, H, and its inverse, $H^{-1}$, have only 8 degrees of freedom, since scaling H by a constant will only change the scale factor k, but will leave the transformed point unchanged. Each line has two parameters, so it provides two constraints on H. Therefore, four lines, one from each side of the playing field, are sufficient to determine the perspective transformation.

The transformation of eq. (38) will transform points rather than lines. The line (from eq. (37)) may be represented using homogeneous coordinates as

$$ \begin{bmatrix} m_y & -1 & d_y \end{bmatrix}\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = 0 \quad\text{or}\quad LP = 0 \tag{39} $$

where P is a point on the line. The perspective transform maps lines onto lines, therefore a point on the distorted line ($L_d P_d = 0$) will lie on the transformed line ($L_u P_u = 0$) after correction. Substituting into eq. (11) gives

$$ L_u = L_d H \tag{40} $$

The horizontal lines, $y = m_y x + d_y$, need to be mapped to their known location on the sides of the playing area, at y=Y. Substituting into eq. (40) gives three equations in the coefficients of H:

$$
\begin{aligned}
0 &= m_y h_1 - h_2 + d_y h_3 \\
-1 &= m_y h_4 - h_5 + d_y h_6 \\
Y &= m_y h_7 - h_8 + d_y h_9
\end{aligned}
\tag{41}
$$

Although there are 3 equations, there are only two independent equations. The first equation constrains the transformed line to be horizontal. The last two, taken together, specify the vertical position of the line. The two constraint equations are therefore

$$
\begin{aligned}
0 &= m_y h_1 - h_2 + d_y h_3 \\
0 &= Y m_y h_4 - Y h_5 + Y d_y h_6 + m_y h_7 - h_8 + d_y h_9
\end{aligned}
\tag{42}
$$

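Similarly, the vertical lines, $x = m_x y + d_x$, need to be mapped to their known locations at the ends of the field, at x=X.

$$
\begin{aligned}
0 &= -h_4 + m_x h_5 + d_x h_6 \\
0 &= -X h_1 + X m_x h_2 + X d_x h_3 - h_7 + m_x h_8 + d_x h_9
\end{aligned}
\tag{43}
$$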

For the robot soccer platform, each wall has two edges. The bottom edge of the wall maps to the known position on the field. The bottom edge of each wall will therefore contribute two equations. The top edge of the wall, however, is subject to parallax, so its absolute position in the 2D reference is currently unknown. However, it should still be horizontal or vertical, as represented by the first constraint of eq. (42) or (43) respectively. These 12 constraints on the coefficients of H can be arranged in matrix form (showing only one set of equations for each horizontal and vertical edge):

$$
\begin{bmatrix}
m_y & -1 & d_y & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & m_y Y & -Y & d_y Y & m_y & -1 & d_y \\
0 & 0 & 0 & -1 & m_x & d_x & 0 & 0 & 0 \\
-X & m_x X & d_x X & 0 & 0 & 0 & -1 & m_x & d_x
\end{bmatrix}
\begin{bmatrix} h_1 \\ h_2 \\ \vdots \\ h_9 \end{bmatrix} = 0
\quad\text{or}\quad D\hat{H} = 0
\tag{44}
$$

Finding a nontrivial solution to this requires determining the null-space of the 12x9 matrix, D. This can be found through singular value decomposition, and selecting the vector corresponding to the smallest singular value (Press et al., 1993). The alternative is to solve directly using least squares. First, the square error is defined as

$$ E = (D\hat{H})^T(D\hat{H}) = \hat{H}^T D^T D \hat{H} \tag{45} $$

Then the partial derivative is taken with respect to the coefficients of $\hat{H}$:

$$ \frac{\partial E}{\partial \hat{H}} = D^T D \hat{H} = 0 \tag{46} $$

$D^T D$ is now a square 9x9 matrix, and $\hat{H}$ has eight independent unknowns. The simplest solution is to fix one of the coefficients, and solve for the rest. Since the camera is approximately perpendicular to the playing area, h9 can safely be set to 1. The redundant bottom line of $D^T D$ can be dropped, and the right hand column of $D^T D$ gets transferred to the right hand side. The remaining 8x8 system may be solved for h1 to h8. Once solved, the elements are rearranged back into a 3x3 matrix for H, and each of the lines is transformed to give two sets of parallel lines for the horizontal and vertical edges.
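The following sketch assembles the constraint matrix D from the rows of eqs. (42) and (43) and solves for the coefficients using the singular value decomposition route mentioned above, rather than the 8x8 least squares system. It is an illustration under stated assumptions: the function name `solve_perspective` and its argument layout are hypothetical, the coefficients are stored in the order h1 to h9 exactly as they appear in eq. (41), and the result is scaled so that h9 = 1.

```python
import numpy as np

def solve_perspective(h_bottom, v_bottom, h_top=(), v_top=()):
    """Null-space solution of D h = 0 for the coefficients h1..h9.

    h_bottom: (my, dy, Y) for bottom edges of the side walls (known position Y).
    v_bottom: (mx, dx, X) for bottom edges of the end walls (known position X).
    h_top, v_top: (m, d) for top edges, constrained only in direction.
    """
    rows = []
    for m, d, Y in h_bottom:                     # both constraints of eq. (42)
        rows.append([m, -1, d, 0, 0, 0, 0, 0, 0])
        rows.append([0, 0, 0, m * Y, -Y, d * Y, m, -1, d])
    for m, d in h_top:                           # first constraint of eq. (42)
        rows.append([m, -1, d, 0, 0, 0, 0, 0, 0])
    for m, d, X in v_bottom:                     # both constraints of eq. (43)
        rows.append([0, 0, 0, -1, m, d, 0, 0, 0])
        rows.append([-X, m * X, d * X, 0, 0, 0, -1, m, d])
    for m, d in v_top:                           # first constraint of eq. (43)
        rows.append([0, 0, 0, -1, m, d, 0, 0, 0])
    D = np.array(rows, dtype=float)
    _, _, vt = np.linalg.svd(D)                  # singular values sorted descending
    h = vt[-1]                                   # vector for the smallest one
    return (h / h[8]).reshape(3, 3)              # normalise so that h9 = 1
```

With noisy measurements the last right singular vector gives the least squares solution subject to unit norm, which differs slightly from fixing h9 = 1 before solving; either choice is a reasonable reading of the text.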

The result of applying the distortion correction to the input image is shown in Figure 7.

4.6. Estimating the camera position

The remaining step is to determine the camera position relative to the field. While in principle this can be obtained from the perspective transform matrix if the focal length and sensor size are known, here it will be estimated directly from measurements on the field. The basic principle is to back project the apparent positions of the top edges of the walls on two sides. These will intersect at the camera location, giving both the height and lateral position, as shown in Figure 8.

Figure 7. The image after correcting for distortion. The blue + corresponds to the centre of distortion, and the red + corresponds to the detected camera position. The camera height is indicated in the scale on the bottom (10 cm per division).

Figure 8. Geometry for estimating the camera position.

The image from the camera can be considered as a projection of every object onto the playing field. Having corrected for distortion, the bottom edges of the walls will appear in their true locations, and the top edges of the walls are offset by parallax.

Let the width of the playing area be W and wall height be h. Also let the width of the projected side wall faces be T1y and T2y. The height, H, and lateral offset of the camera from the centre of the field, Cy, may be determined from similar triangles:

$$ \frac{H}{\frac{W}{2} - C_y + T_{1y}} = \frac{h}{T_{1y}} \tag{47} $$

Rearranging gives:

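$$ T_{1y} = \frac{h\left(\frac{W}{2} - C_y\right)}{H - h} \tag{48} $$

and similarly for the other wall

$$ T_{2y} = \frac{h\left(\frac{W}{2} + C_y\right)}{H - h} \tag{49} $$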

Equations (48) and (49) can be solved to give the camera location

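$$ C_y = \left(\frac{T_{2y} - T_{1y}}{T_{2y} + T_{1y}}\right)\frac{W}{2} \tag{50} $$

$$ H = \frac{hW}{T_{1y} + T_{2y}} + h \tag{51} $$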

Similar geometrical considerations may be applied along the length of the field to give

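$$ C_x = \left(\frac{T_{2x} - T_{1x}}{T_{2x} + T_{1x}}\right)\frac{L}{2} \tag{52} $$

$$ H = \frac{hL}{T_{1x} + T_{2x}} + h \tag{53} $$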

where L is the length of the playing field and T1x and T2x are the width of the projected end walls.

Equations (50) to (53) give four independent equations for three unknowns. Measurement limitations and noise usually result in equations (51) and (53) giving different estimates of the camera height. In such situations, it is usual to determine the output values (Cx, Cy, and H) that are most consistent with the input data (T1x, T2x, T1y, and T2y). For a given camera location, the error between the corresponding input and measurement can be obtained from eq. (48) as

$$ E_{1y} = \frac{h\left(\frac{W}{2} - C_y\right)}{H - h} - T_{1y} \tag{54} $$

and similarly for each of the other inputs. The camera location can then be chosen that minimises the total squared error

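$$ E^2 = E_{1y}^2 + E_{2y}^2 + E_{1x}^2 + E_{2x}^2 \tag{55} $$

This can be found by taking partial derivatives of eq. (55) with respect to each of the camera location variables and setting the result to zero:

$$ \frac{\partial E^2}{\partial C_y} = \frac{4h^2 C_y}{(H - h)^2} - \frac{2h(T_{2y} - T_{1y})}{H - h} = 0 \tag{56} $$
or
$$ \frac{2hC_y}{H - h} = T_{2y} - T_{1y} \tag{57} $$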

Similarly

$$ \frac{\partial E^2}{\partial C_x} = 0 \quad\Rightarrow\quad \frac{2hC_x}{H - h} = T_{2x} - T_{1x} \tag{58} $$

The partial derivative with respect to the camera height is a little more complex because H appears in the denominator of each of the terms. The partial derivative of the errors across the width of the field is

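$$ \frac{\partial E_y^2}{\partial H} = \frac{\partial E_{1y}^2}{\partial H} + \frac{\partial E_{2y}^2}{\partial H} = \frac{-h}{(H - h)^2}\left(\frac{4h\left(\frac{W^2}{4} + C_y^2\right)}{H - h} - W(T_{1y} + T_{2y}) - 2C_y(T_{2y} - T_{1y})\right) \tag{59} $$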

This can be simplified by eliminating Cy through substituting eq. (57)

$$ \frac{\partial E_y^2}{\partial H} = \frac{-h}{(H - h)^2}\left(\frac{hW^2}{H - h} - W(T_{1y} + T_{2y})\right) \tag{60} $$

Finally, combining the partial derivatives along the length of the field with those across the width of the field gives:

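$$ \frac{\partial E^2}{\partial H} = \frac{\partial E_y^2}{\partial H} + \frac{\partial E_x^2}{\partial H} = \frac{-h}{(H - h)^2}\left(\frac{hW^2}{H - h} - W(T_{1y} + T_{2y}) + \frac{hL^2}{H - h} - L(T_{1x} + T_{2x})\right) = 0 \tag{61} $$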

Solving for H gives

$$ H = \frac{(W^2 + L^2)h}{W(T_{1y} + T_{2y}) + L(T_{1x} + T_{2x})} + h \tag{62} $$

Finally, the result from eq. (62) can be substituted into equations (57) and (58) to give the lateral position of the camera:

$$ (C_x, C_y) = \frac{1}{2}\,\frac{W^2 + L^2}{W(T_{1y} + T_{2y}) + L(T_{1x} + T_{2x})}\,(T_{2x} - T_{1x},\; T_{2y} - T_{1y}) \tag{63} $$

The detected position of the camera is overlaid on the undistorted image in Figure 7.
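The closed-form least squares solution of eqs. (62) and (63) translates directly into code. The sketch below assumes the four projected wall widths have already been measured in field units; the function name `camera_position` is hypothetical.

```python
def camera_position(T1x, T2x, T1y, T2y, W, L, h):
    """Least-squares camera location from the projected wall widths.

    T1x, T2x: projected widths of the two end walls.
    T1y, T2y: projected widths of the two side walls.
    W, L: width and length of the playing area; h: wall height.
    Returns (Cx, Cy, H) in the same units as the inputs.
    """
    denom = W * (T1y + T2y) + L * (T1x + T2x)
    ratio = (W * W + L * L) / denom        # equals (H - h) / h
    H = h * ratio + h                      # eq. (62)
    Cx = 0.5 * ratio * (T2x - T1x)         # eq. (63)
    Cy = 0.5 * ratio * (T2y - T1y)
    return Cx, Cy, H
```

A quick consistency check is to generate the four widths from eqs. (48) and (49) for a chosen camera location and confirm that the function returns that location.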

5. Applying the corrections

While it is possible to apply the distortion correction to the image prior to detecting the objects, in practice this is computationally inefficient. Only a relatively small number of objects need to be detected, and the distortion is not so severe as to preclude reliable detection directly within the distorted image. Therefore, the robots and ball positions are detected within the distorted image, with the position (and orientation in the case of the robot players) returned in distorted image coordinates. The procedure for correcting the image coordinates follows the calibration procedure described in the previous section.

5.1. Correcting object position

First, the detected feature location (xf,yf) is offset relative to the centre of distortion from eq. (27) and corrected for aspect ratio distortion if necessary:

$$ (x_d, y_d) = \left(\frac{x_f - x_0}{R},\; y_f - y_0\right) \tag{64} $$

The lens distortion is then corrected by applying the radially dependent magnification from eq. (4)

$$ (x_u, y_u) = (1 + \kappa r_d^2)(x_d, y_d) \tag{65} $$

This point is then transformed into field-centric coordinates and corrected for perspective distortion by applying eq. (38)

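$$ \begin{bmatrix} kx_p \\ ky_p \\ k \end{bmatrix} = H^{-1}\begin{bmatrix} x_u \\ y_u \\ 1 \end{bmatrix} \tag{66} $$

where $H^{-1}$ is the inverse of the matrix obtained from solving eq. (46) in fitting the field edges. The resulting point is normalised by dividing through the left hand side of eq. (66) by k.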

Finally, the feature point is corrected for parallax error. From similar triangles

$$ \frac{\hat{x}_f - C_x}{H - h_f} = \frac{x_p - C_x}{H} \tag{67} $$

where $h_f$ is the known height of the object and $(\hat{x}_f, \hat{y}_f)$ is the corrected location of the object feature point. Equation (67), and its equivalent in the y direction, may be rearranged to give the corrected feature location:

$$ (\hat{x}_f, \hat{y}_f) = \frac{H - h_f}{H}(x_p, y_p) + \frac{h_f}{H}(C_x, C_y) \tag{68} $$

The first term in eq. (68) is the scale factor that corrects for the height of the object, and the second term compensates for the lateral position of the camera as in eq. (19).
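The four correction steps of eqs. (64) to (68) can be chained into a single function, sketched below. The name `correct_position` and the way the parameters are grouped are assumptions for the example; `H_inv` stands for the 3x3 matrix applied in eq. (66), however it was obtained during calibration.

```python
import numpy as np

def correct_position(xf, yf, x0, y0, R, kappa, H_inv, cam, hf):
    """Map a detected feature from distorted image to field coordinates.

    (x0, y0): centre of distortion; R: aspect ratio scale factor.
    kappa: lens distortion parameter; H_inv: 3x3 matrix used in eq. (66).
    cam: (Cx, Cy, H) camera location and height; hf: height of the feature.
    """
    xd, yd = (xf - x0) / R, yf - y0                      # eq. (64)
    mag = 1.0 + kappa * (xd * xd + yd * yd)              # eq. (65)
    p = H_inv @ np.array([xd * mag, yd * mag, 1.0])      # eq. (66)
    xp, yp = p[0] / p[2], p[1] / p[2]                    # divide through by k
    Cx, Cy, H = cam
    s = (H - hf) / H                                     # eq. (68); 1 - s = hf / H
    return s * xp + (1.0 - s) * Cx, s * yp + (1.0 - s) * Cy
```

Because only a handful of feature points are corrected per frame, the cost of this per-point transformation is small compared with resampling the whole image.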

5.2. Correcting object orientation

As outlined in section 2, the distortion will affect the detected orientation of objects within the image. The simplest approach to correct the object orientation, θ, is to also transform a test point (xt,yt) that is offset a small distance, r, from the object location in the direction specified by the orientation:

$$ (x_t, y_t) = (x_f, y_f) + r(\cos\theta, \sin\theta) \tag{69} $$

The offset should be of similar order to the offset used to measure the orientation in the distorted image (for example half the width of the robot). The corrected orientation may then be determined from the angle between the corrected test point and the corrected object location:

$$ \hat{\theta} = \tan^{-1}\left(\frac{\hat{y}_t - \hat{y}_f}{\hat{x}_t - \hat{x}_f}\right) \tag{70} $$
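The test point approach can be sketched as follows. The function name `correct_orientation` is hypothetical, and `correct` is assumed to be any callable that maps a distorted image point to corrected field coordinates. The sketch uses the two-argument arctangent so that the quadrant of the angle is resolved, which is an implementation choice rather than something stated in the text.

```python
import math

def correct_orientation(xf, yf, theta, r, correct):
    """Corrected orientation of an object detected at (xf, yf) with angle theta.

    r: offset of the test point, of similar size to that used to measure theta.
    correct: callable (x, y) -> (x_corrected, y_corrected).
    """
    xt = xf + r * math.cos(theta)          # eq. (69): test point along heading
    yt = yf + r * math.sin(theta)
    fx, fy = correct(xf, yf)               # corrected object location
    tx, ty = correct(xt, yt)               # corrected test point
    return math.atan2(ty - fy, tx - fx)    # eq. (70), quadrant-aware
```

The returned angle lies in the range from -π to π.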

6. Results and discussion

As the image in Figure 7 shows, the calibration method is effective at correcting distortion around the edge of the field. However, to have confidence that the model is actually correcting points anywhere in the playing area, it is necessary to check the transformation at a number of points scattered throughout the image. The calibration procedure was tested on three fields. The first was a small (150 cm x 130 cm) 3-aside micro-robot playing field, captured using a 320x240 analogue camera (Bailey & Sen Gupta, 2004). The second two were larger (220 cm x 180 cm) 5-aside fields, captured using a 656x492 digital Firewire camera (Bailey & Sen Gupta, 2008).

On the small field, a set of 76 points was extracted from throughout the playing area using the field lines and free kick markers as input, as indicated in Figure 9. The RMS residual error after correcting the validation points was 1.75 mm, which corresponds to about 30% of the width of a pixel. The lateral position of the camera was in error by 1.2 cm, which results in a negligible parallax error (from eq. (19)). The height of the camera was over-estimated by approximately 9 cm. This will give a maximum parallax error in the corners of the playing field (from eq. (21)) of approximately 1.1 mm, which is again a fraction of a pixel.

For both of the larger fields (shown in Figure 1 and Figure 5), the calibration resulted in a significantly improved image that appeared to be free from major distortions (see Figure 7). In both cases, the lateral position of the camera was also measured with good accuracy, with a total lateral error of 0.9 cm and 1.0 cm respectively for the two fields. Again, the consequent parallax error (from eq. (19)) is negligible. The error in the height, however, was significantly larger, with an under-estimate of 8.6 cm for the image in Figure 1 and an under-estimate of 33 cm for the image in Figure 5. Even this large height error results in an error of less than 3 mm in the corners of the field (from eq. (21)). This is still less than one pixel at the resolution of the image, and the error is significantly smaller over the rest of the field.

Figure 9. The 3-aside field with validation points marked.

The larger height errors are not completely unexpected, because the height is estimated by back projecting the relatively small parallax resulting from a low wall. The wall only occupied 3 pixels in the image of the small field and approximately 5 pixels in the larger field. Measurement of this parallax requires sub-pixel accuracy to gain any meaningful results. Any small errors in measuring the wall parallax are amplified to give a large error in the estimate of the camera height. The lateral position is not affected by measurement errors to the same extent, because it is based on the relative difference in parallax between the two sides.
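The amplification can be made explicit with a first-order sensitivity analysis of eq. (51). The following is a sketch of that reasoning rather than a result stated in the source; T is shorthand, introduced here, for the total measured parallax T1y + T2y across the width of the field.

$$
H - h = \frac{hW}{T}
\quad\Rightarrow\quad
\frac{\partial H}{\partial T} = -\frac{hW}{T^2} = -\frac{H - h}{T}
\quad\Rightarrow\quad
\frac{\delta H}{H - h} \approx -\frac{\delta T}{T}
$$

To first order, a given fractional error in the measured parallax therefore produces the same fractional error, with opposite sign, in the estimated height of the camera above the top of the wall. When the parallax spans only a few pixels, a sub-pixel measurement error is a sizeable fraction of T. By contrast, eq. (50) depends only on the ratio (T2y - T1y)/(T2y + T1y), so an error that scales both widths by a common factor cancels in the lateral position.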

The cause of the large height error was examined in some detail in (Bailey & Sen Gupta, 2008). The under-estimate of the height was caused by the measured parallax of the walls being larger than expected (although the error was still sub-pixel). Since the parallax appears in the denominator of eq. (62), any over-estimate of the parallax will result in an under-estimate of the height of the camera. Two factors contributed to this error. First, a slight rounding of the profile at the top of the wall combined with specular reflection resulted in the boundary between the white and black extending over the top of the wall, increasing the apparent width. A second factor, which exacerbates this in Figure 5, is that the position of the lights gave a stronger specular component in the vicinity of the walls.

The other fields were less affected for the following reasons. Firstly, the lights were positioned over the field rather than outside it. The different light angle means that these fields were less prone to the specular reflection effects on the top corner of the wall. Secondly, the other fields were in a better condition, having been repainted more recently, and with less rounding of the top edge of the walls. Consequently, the wall parallax was able to be measured more accurately and the resultant errors were less significant.

6.1. Future work

The next step is to extend the work presented to the larger 11-aside field. These fields have two cameras, one over each half of the field. The calibration principles will be the same. The biggest difference is that the centreline will form one of the edges of the calibration, and this does not have a height associated with it. This will limit the accuracy of the parallax correction data, although in principle the height of three walls should provide sufficient data for estimating the camera location.

A further extension is to the Robocup league, which has no walls. Again two cameras are required, one to capture each half of the field, as shown in Figure 10. The image processing algorithms will need to be modified for line detection rather than edge detection, and another mechanism found to estimate the camera position. One approach currently being experimented with is to place poles of known height in each corner and at each end of the centre-line as can be seen in Figure 10. The parabola based lens and perspective distortion correction will be based on the edges of the field and centreline, and the markers on the poles detected and back projected to locate the camera.

Figure 10. Larger Robocup field without walls captured from two separate cameras. The poles placed in each corner of the field allow calibration of camera position.

7. Conclusion

The new calibration method requires negligible time to execute. Apart from the command to perform the calibration, it requires no user intervention, and is able to determine the model parameters in a fraction of a second. The model parameters are then used to automatically correct both the positions and orientations of the robots as determined from the distorted images. It is demonstrated that just capturing data from around the field is sufficient for correcting the whole playing area. The residual errors are significantly less than one pixel, and are limited by the resolution of the captured images.

The lateral position of the camera was able to be estimated to within 1 cm accuracy. The scaling effect of the parallax correction makes this error negligible. The height of the camera is harder to measure accurately, because it is back-projecting the short height of the playing field walls. On two fields, it was within 8 cm, but on a third field the height was under-estimated by 33 cm. However, even with this large error, the parallax error introduced in estimating the robot position is less than one pixel anywhere on the playing field. Accurate height estimation requires good lighting, devoid of specular reflections near the walls, and for the walls to be in good condition.

The significant advantage of this calibration procedure over others described in the literature is that it is fully automated, and requires no additional setup or user intervention. While not quite fast enough to process every image, the procedure is sufficiently fast to perform recalibration even during set play (for example while preparing for a free kick) or a short timeout. It is also sufficiently accurate to support sub-pixel localisation and orientation.

Acknowledgements

This research was performed within the Advanced Robotics and Intelligent Control Centre (ARICC). The authors would like to acknowledge the financial support of ARICC and the School of Electrical and Electronic Engineering at Singapore Polytechnic.

References