Augmented reality addresses the problem of dynamically augmenting or enhancing the real world with computer-generated virtual objects [Azuma, 1997; Azuma, 2001]. Registration is one of the most pivotal problems in augmented reality applications. Typical augmented reality applications track 2D patterns on rigid planar objects in order to acquire the pose of the camera in the scene. Although the problem of rigid registration has been widely studied [Yuan et al., 2005; Yuan et al., 2006; Guan et al., 2008a; Guan et al., 2008b; Li et al., 2008], non-rigid registration has recently been receiving more and more attention. Many non-rigid objects exist in the real world, such as animated faces, deformable cloth and hands. Overlaying virtual objects on such non-rigid objects is particularly challenging.
Recently, many related non-rigid registration approaches have been reported. In many cases (e.g. human faces), only a few feature points can be reliably tracked. In [Bartoli et al., 2004], a non-rigid registration method using point and curve correspondences was proposed to solve this problem. Curves were introduced into the non-rigid factorization framework because several curves (e.g. the hairline, the eyebrows) can be used to determine the mapping. The mapping function is computed from both point and curve correspondences. This method can successfully augment a non-rigid object with a virtual object. [Pilet et al., 2005; Pilet et al., 2007] presented a real-time method for detecting deformable surfaces that requires no prior pose knowledge. Deformable 2D meshes are introduced and, with the use of a fast wide-baseline matching algorithm, an appropriately deformed logo can be superimposed on a T-shirt. These methods are robust to large deformations, lighting changes, motion blur and occlusions.
To align computer-generated virtual objects seamlessly with the real world, accurate registration data must be provided. In general, registration can be achieved by solving a point matching problem, that is, finding the correspondence between two sets of tracked feature points. Therefore, the detection of feature points and the tracking of those points are the two main problems. Rigid object detection and tracking have been extensively studied, and effective, robust, real-time solutions have been proposed [Lowe, 2004; Lepetit & Fua, 2005; Lepetit et al., 2005; Rosten & Drummond, 2005]. Non-rigid object detection and tracking is far more complex because the object is deformable: not only the registration data but also a large number of deformation parameters must be estimated.
Active Appearance Models (AAM) [Cootes et al., 1998; Cootes et al., 2001] are commonly used to track non-rigid objects such as faces and hands, and many AAM-based tracking methods have been proposed. A working system for finding and tracking a human face and its features using active appearance models was presented in [Ahlberg, 2001]. A wireframe model is adapted to the face in each image, and the model is then matched to the face in the input image using the active appearance algorithm. In [Sung & Kim, 2004], the 2D AAM is extended to a 3D shape model and a modified model fitting algorithm is proposed. In [Markin & Prakash, 2006], occluded images are included in the AAM training data to address occlusion and self-occlusion, which improves the fitting quality of the algorithm.
Given the coordinates of 3D points in the world coordinate system and the corresponding 2D image points, the camera pose can be estimated. For non-rigid objects, the AAM algorithm is a robust method for acquiring the 2D image points, and it has proven useful for rapidly matching a statistical model to a new image. The 3D points of a non-rigid object can be represented by a linear combination of a set of 3D basis shapes. By varying the configuration weights and the camera pose parameters, the error between the estimated 2D points (the estimated 3D shape projected with the estimated camera pose parameters) and the 2D tracking points can be minimized.
Many methods have been proposed to recover a 3D shape basis from 2D image sequences. In [Tomasi & Kanade, 1992], the factorization method is used to recover shape and motion from a sequence of images under orthographic projection. The image sequence is represented as a measurement matrix. It is proved that, under orthography, the measurement matrix is of rank 3 and can be factored into a 3D pose matrix and a 3D shape matrix. Unfortunately, this technique cannot be applied to non-rigid deforming objects, since it is based on the rigidity assumption. A technique based on a non-rigid model was proposed to recover 3D non-rigid shape models under scaled orthographic projection [Bregler et al., 2000]. The 3D shape in each frame is expressed as a linear combination of a set of K basis shapes. Under this model, the 2D tracking matrix is of rank 3K and can be factored into 3D pose, object configuration and 3D basis shapes with the use of SVD.
In this chapter, a novel non-rigid registration method for augmented reality applications based on AAM and the factorization method is introduced. We focus on the AAM algorithm and on the factorization method, which obtains the 3D shape basis, the object configuration and the 3D pose simultaneously. The following presentation is mainly based on the research reported in [Tomasi & Kanade, 1992; Cootes et al., 1998; Bregler et al., 2000; Cootes et al., 2001; Xiao et al., 2004; Zhu et al., 2006; Tian et al., 2008]. In section 2, we review active appearance models in detail and focus on how to track non-rigid objects. In section 3, we illustrate how to compute the 3D shape basis, the camera rotation matrix and the configuration weights of each training frame simultaneously from the 2D tracking data using the factorization algorithm. In section 4, we introduce how to compute precise configuration weights and the camera rotation matrix by optimization, and give experimental results.
2. Tracking non-rigid objects using AAM
Tracking non-rigid objects using AAM includes five main steps. The first step is to obtain landmarks in the training image set. The shape model and the texture model are then established separately. These two models are unified into one appearance model in the next step. Finally, the difference between the closest synthesized AAM instance and the input image is minimized to get an accurate tracking result. The flowchart is shown in Figure 1.
2.1. Obtain landmarks
Before establishing the models, we should place landmark points (often hundreds) in each 2D training image. The landmarks in each image should be consistent. We usually choose points of high curvature or junctions as landmark points, since they constrain the shape of the target object tightly. The acquisition process is cumbersome, especially when it is done manually, and researchers have looked for different ways to reduce the burden. Ideally, one would only need to place points in one image, and the corresponding points in the remaining training images would be found automatically. Fully automatic placement is not achievable in practice. However, many semi-automatic methods have been proposed that can successfully find the correspondences across an image set. Here we focus on the tracking procedure; more details about semi-automatic placement of landmarks can be found in [Sclaroff & Pentland, 1995; Duta et al., 1999; Walker et al., 2000; Andresen & Nielsen, 2001]. Examples of training images manually labeled with consistent landmarks are shown in Figure 2. We use 7 training images taken from different viewpoints. Each image is labeled with 36 landmarks, which are marked in the figures.
Given a training image set in which key landmark points are marked on each example object, we can model the shape and texture variations. These approaches are described in detail in the following sections.
2.2. Establish shape model
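Suppose $L$ denotes the number of shapes in the training image set and $m$ is the number of key landmark points on each image. The vector representation of the $i$-th shape is then:
$$X_i = (x_{i1}, y_{i1}, x_{i2}, y_{i2}, \dots, x_{im}, y_{im})^T, \quad i = 1, 2, \dots, L$$
The training image set is then represented by the shape set $\{X_1, X_2, \dots, X_L\}$.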
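The shape and pose (including position, scale and rotation) of the object differ from one training image to another, so the shapes should be aligned to filter out the pose effects. We first explain how to align two shapes and then extend the procedure to a set of shapes.
If $X$ and $X'$ are two shape vectors, the goal is to minimize the sum of squared distances between corresponding landmarks after aligning $X$ to $X'$ with a similarity transformation $T$. That is, to minimize $E$:
$$E = |T(X) - X'|^2 \tag{2}$$
The transformation has three kinds of factors: the scale factor $s$, the rotation factor $\theta$ and the translation factor $t = (t_x, t_y)^T$. Let $a = s\cos\theta$ and $b = s\sin\theta$. Hence, for each landmark $(x_j, y_j)$ of $X$:
$$T\begin{pmatrix} x_j \\ y_j \end{pmatrix} = \begin{pmatrix} a & -b \\ b & a \end{pmatrix}\begin{pmatrix} x_j \\ y_j \end{pmatrix} + \begin{pmatrix} t_x \\ t_y \end{pmatrix}$$
Then equation (2) can be rewritten as:
$$E(a, b, t_x, t_y) = \sum_{j=1}^{m}\left[(a x_j - b y_j + t_x - x'_j)^2 + (b x_j + a y_j + t_y - y'_j)^2\right] \tag{4}$$
Minimizing equation (4) is equivalent to setting its partial derivatives with respect to $a$, $b$, $t_x$ and $t_y$ to zero. If $X$ is first translated so that its centroid lies at the origin ($\sum_j x_j = \sum_j y_j = 0$), the transformation parameters are:
$$a = \frac{\sum_j (x_j x'_j + y_j y'_j)}{\sum_j (x_j^2 + y_j^2)}, \quad b = \frac{\sum_j (x_j y'_j - y_j x'_j)}{\sum_j (x_j^2 + y_j^2)}, \quad t_x = \frac{1}{m}\sum_j x'_j, \quad t_y = \frac{1}{m}\sum_j y'_j$$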
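The two-shape alignment can be sketched in plain Python. This is a minimal illustration, not the chapter's implementation: the function names `align_similarity` and `apply_similarity` are hypothetical, shapes are given as lists of (x, y) tuples, and both shapes are centred internally so that the closed-form least-squares solution for $a = s\cos\theta$ and $b = s\sin\theta$ applies to shapes with arbitrary centroids.

```python
import math

def align_similarity(X, X_prime):
    """Least-squares similarity transform mapping shape X onto X_prime.

    X, X_prime: lists of (x, y) landmarks of equal length.
    Returns (a, b, tx, ty) with a = s*cos(theta), b = s*sin(theta).
    """
    m = len(X)
    # Centroids of both shapes.
    cx = sum(x for x, _ in X) / m
    cy = sum(y for _, y in X) / m
    cxp = sum(x for x, _ in X_prime) / m
    cyp = sum(y for _, y in X_prime) / m
    # Centre both shapes so that rotation/scale and translation decouple.
    Xc = [(x - cx, y - cy) for x, y in X]
    Xpc = [(x - cxp, y - cyp) for x, y in X_prime]
    norm2 = sum(x * x + y * y for x, y in Xc)
    a = sum(x * xp + y * yp for (x, y), (xp, yp) in zip(Xc, Xpc)) / norm2
    b = sum(x * yp - y * xp for (x, y), (xp, yp) in zip(Xc, Xpc)) / norm2
    # Translation that maps the centroid of X onto the centroid of X_prime.
    tx = cxp - (a * cx - b * cy)
    ty = cyp - (b * cx + a * cy)
    return a, b, tx, ty

def apply_similarity(params, X):
    """Apply the transform (a, b, tx, ty) to every landmark of X."""
    a, b, tx, ty = params
    return [(a * x - b * y + tx, b * x + a * y + ty) for x, y in X]

# Toy example: a unit square scaled by 2, rotated by 90 degrees and shifted.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
target = [(3.0, -1.0), (3.0, 1.0), (1.0, 1.0), (1.0, -1.0)]
params = align_similarity(square, target)
aligned = apply_similarity(params, square)
scale = math.hypot(params[0], params[1])
theta = math.atan2(params[1], params[0])
```

In the toy example the recovered scale and rotation angle can be read back from `a` and `b` via `math.hypot` and `math.atan2`.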
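For a set of shapes, the same procedure is typically applied iteratively: every shape is aligned to the current estimate of the mean shape, and the mean is re-estimated until it converges. So the new training set is $\{\hat{X}_1, \hat{X}_2, \dots, \hat{X}_L\}$, where $\hat{X}_i$ denotes the $i$-th shape after alignment. The mean shape is calculated by:
$$\bar{X} = \frac{1}{L}\sum_{i=1}^{L}\hat{X}_i$$
The covariance matrix can thus be given as:
$$C_s = \frac{1}{L}\sum_{i=1}^{L}(\hat{X}_i - \bar{X})(\hat{X}_i - \bar{X})^T$$
Calculate the eigenvalues $\lambda_{s,i}$ of $C_s$ and the corresponding eigenvectors $p_{s,i}$:
$$C_s\, p_{s,i} = \lambda_{s,i}\, p_{s,i}$$
Sort all the eigenvalues in descending order:
$$\lambda_{s,1} \ge \lambda_{s,2} \ge \dots \ge \lambda_{s,2m}$$
The corresponding set of eigenvectors is
$$P = (p_{s,1}, p_{s,2}, \dots, p_{s,2m})$$
To reduce the computational complexity, principal component analysis (PCA) is used to reduce the dimension of the space. Choose the $t$ largest eigenvalues which satisfy the following condition:
$$\sum_{i=1}^{t}\lambda_{s,i} \ge f_v \sum_{i=1}^{2m}\lambda_{s,i}$$
where $f_v$ is the proportion of the total variance to be retained. Then any shape instance can be generated by deforming the mean shape by a linear combination of eigenvectors:
$$X = \bar{X} + P_s b_s \tag{11}$$
where $P_s$ is a matrix of dimension $2m \times t$ that contains the $t$ eigenvectors corresponding to the largest eigenvalues, $P_s = (p_{s,1}, p_{s,2}, \dots, p_{s,t})$. $b_s$ is a $t$-dimensional vector and the eigenvectors are mutually orthogonal, so it can be given by:
$$b_s = P_s^T (X - \bar{X})$$
By varying the elements of $b_s$, new shape instances can be generated using equation (11).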
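The construction of the shape model can be sketched with NumPy. This is a minimal sketch under stated assumptions: the aligned shapes are stacked as rows of an array, the covariance is normalized by the number of shapes, and the 95% variance threshold is only an illustrative default. The function names are hypothetical and the data is a synthetic toy set with two modes of variation.

```python
import numpy as np

def build_shape_model(shapes, variance_kept=0.95):
    """shapes: (L, 2m) array of aligned shape vectors, one per row.

    Returns the mean shape, the retained eigenvectors P_s (2m x t)
    and the retained eigenvalues (length t).
    """
    mean_shape = shapes.mean(axis=0)
    centered = shapes - mean_shape
    cov = centered.T @ centered / shapes.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)        # ascending order
    order = np.argsort(eigvals)[::-1]             # re-sort descending
    eigvals = np.clip(eigvals[order], 0.0, None)  # remove tiny negatives
    eigvecs = eigvecs[:, order]
    # Smallest t whose eigenvalues explain the requested variance share.
    ratio = np.cumsum(eigvals) / eigvals.sum()
    t = min(int(np.searchsorted(ratio, variance_kept)) + 1, eigvals.size)
    return mean_shape, eigvecs[:, :t], eigvals[:t]

def shape_to_params(X, mean_shape, P_s):
    """b_s = P_s^T (X - mean)."""
    return P_s.T @ (X - mean_shape)

def params_to_shape(b_s, mean_shape, P_s):
    """X = mean + P_s b_s."""
    return mean_shape + P_s @ b_s

# Synthetic toy data: 20 shapes with 5 landmarks (2m = 10), two modes.
k = np.arange(20)
coeffs = np.column_stack([3 * np.cos(2 * np.pi * k / 20),
                          2 * np.sin(2 * np.pi * k / 20)])
modes = np.vstack([np.ones(10), np.tile([1.0, -1.0], 5)]) / np.sqrt(10)
shapes = 5.0 + coeffs @ modes

mean_shape, P_s, lam = build_shape_model(shapes)
b_s = shape_to_params(shapes[0], mean_shape, P_s)
rebuilt = params_to_shape(b_s, mean_shape, P_s)
```

Because the toy data contains exactly two modes, two eigenvectors are retained and every training shape is reproduced exactly from its parameter vector.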
2.3. Establish texture model
To establish a complete appearance model, not only the shape but also the texture should be considered. We first acquire the pixel information over the region covered by the landmarks on each training image. An image warping method is then used to collect the texture information between the landmarks consistently across the training images. Finally, the texture model is established using PCA.
The texture of a target object can be represented as:
$$g = (g_1, g_2, \dots, g_n)^T$$
where $n$ denotes the number of pixel samples over the object surface.
In an image, the target object may occupy only a small region of the whole image. The pixel intensities and the global illumination differ between training images. The most important information is the texture that reflects the characteristics of the target object. Because the number of pixels in the target region differs between images, and it is difficult to establish accurate pixel correspondences between different images, the texture model cannot be established directly. We need texture vectors with the same dimension and consistent correspondence, so we warp each example image so that its control points match the reference shape. The process of image warping is described as follows:
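Firstly, apply Delaunay triangulation to the reference shape to obtain the reference mesh, which consists of a set of triangles. We choose the mean shape as the reference shape.
Secondly, suppose $v_1$, $v_2$ and $v_3$ are the three vertices of a triangle in the example mesh. Any internal point $v$ of the triangle can be written as:
$$v = \alpha v_1 + \beta v_2 + \gamma v_3$$
where $\alpha + \beta + \gamma = 1$. Constrain $0 \le \alpha, \beta, \gamma \le 1$ because $v$ is inside the triangle. Given $v = (x, y)$ and $v_i = (x_i, y_i)$, $i = 1, 2, 3$, then $\alpha$, $\beta$ and $\gamma$ can be calculated by:
$$\beta = \frac{(x - x_1)(y_3 - y_1) - (x_3 - x_1)(y - y_1)}{(x_2 - x_1)(y_3 - y_1) - (x_3 - x_1)(y_2 - y_1)}, \quad \gamma = \frac{(x_2 - x_1)(y - y_1) - (x - x_1)(y_2 - y_1)}{(x_2 - x_1)(y_3 - y_1) - (x_3 - x_1)(y_2 - y_1)}, \quad \alpha = 1 - \beta - \gamma$$
Finally, the corresponding point $v'$ of the triangle in the reference mesh can be calculated by:
$$v' = \alpha v'_1 + \beta v'_2 + \gamma v'_3$$
where $v'_1$, $v'_2$ and $v'_3$ are the vertices of the corresponding triangle in the reference mesh.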
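The point mapping used in the warping step can be sketched in plain Python. This is an illustrative sketch only: the function names are hypothetical, a single triangle pair is handled, and a full warp would repeat the mapping for every pixel and every triangle of the mesh.

```python
def barycentric(v, v1, v2, v3):
    """Barycentric coordinates (alpha, beta, gamma) of point v in triangle v1 v2 v3."""
    (x, y), (x1, y1), (x2, y2), (x3, y3) = v, v1, v2, v3
    det = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)
    beta = ((x - x1) * (y3 - y1) - (x3 - x1) * (y - y1)) / det
    gamma = ((x2 - x1) * (y - y1) - (x - x1) * (y2 - y1)) / det
    alpha = 1.0 - beta - gamma
    return alpha, beta, gamma

def is_inside(coords, tol=1e-12):
    """A point lies inside the triangle when all three coordinates are non-negative."""
    return min(coords) >= -tol

def warp_point(v, src_tri, dst_tri):
    """Map v from the example-mesh triangle to the matching reference-mesh triangle."""
    alpha, beta, gamma = barycentric(v, *src_tri)
    (x1, y1), (x2, y2), (x3, y3) = dst_tri
    return (alpha * x1 + beta * x2 + gamma * x3,
            alpha * y1 + beta * y2 + gamma * y3)

# Toy example: the reference triangle is the example triangle scaled by 2.
src = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
dst = ((0.0, 0.0), (2.0, 0.0), (0.0, 2.0))
coords = barycentric((0.25, 0.25), *src)
mapped = warp_point((0.25, 0.25), src, dst)
```

The `is_inside` check is what decides which triangle of the mesh a given pixel belongs to before it is mapped.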
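After the image warping process, each training image is warped to the reference shape and sampled, so that all texture vectors have the same dimension and consistent pixel correspondence.
To reduce the effects caused by global lighting variations, that is, global linear changes in pixel intensities, the example samples are normalized by applying a scaling $a$ and an offset $b$:
$$g = \frac{g_{im} - b\mathbf{1}}{a}$$
where $g_{im}$ is the texture vector sampled from the warped image and $\mathbf{1}$ is a vector of ones.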
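The texture model is established in the same way as the shape model, again using PCA. Let $g^{(j)}$ denote the normalized texture vector of the $j$-th training image. The mean texture is calculated by:
$$\bar{g} = \frac{1}{L}\sum_{j=1}^{L} g^{(j)}$$
The covariance matrix can thus be given as:
$$C_g = \frac{1}{L}\sum_{j=1}^{L}(g^{(j)} - \bar{g})(g^{(j)} - \bar{g})^T$$
Calculate the eigenvalues $\lambda_{g,i}$ of $C_g$ and the corresponding eigenvectors $p_{g,i}$:
$$C_g\, p_{g,i} = \lambda_{g,i}\, p_{g,i}$$
Sort all the eigenvalues in descending order:
$$\lambda_{g,1} \ge \lambda_{g,2} \ge \dots \ge \lambda_{g,n}$$
The corresponding set of eigenvectors is
$$P = (p_{g,1}, p_{g,2}, \dots, p_{g,n})$$
To reduce the computational complexity, PCA is used to reduce the dimension of the space. Choose the $t$ largest eigenvalues which satisfy the following condition:
$$\sum_{i=1}^{t}\lambda_{g,i} \ge f_v \sum_{i=1}^{n}\lambda_{g,i}$$
where $f_v$ is the proportion of the total variance to be retained. Then any texture instance can be generated by deforming the mean texture by a linear combination of eigenvectors:
$$g = \bar{g} + P_g b_g \tag{26}$$
where $P_g$ is a matrix of dimension $n \times t$ that contains the $t$ eigenvectors corresponding to the largest eigenvalues, $P_g = (p_{g,1}, p_{g,2}, \dots, p_{g,t})$. $b_g$ is a $t$-dimensional vector and the eigenvectors are mutually orthogonal, so it can be given by:
$$b_g = P_g^T (g - \bar{g})$$
By varying the elements of $b_g$, new texture instances can be generated using equation (26).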
2.4. Establish combined model
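Once the shape model and the texture model have been established, any input image can be represented using the shape parameter vector $b_s$ and the texture parameter vector $b_g$. Since there are correlations between shape and texture variations, a new vector $b$ is generated by combining $b_s$ and $b_g$:
$$b = \begin{pmatrix} W_s b_s \\ b_g \end{pmatrix} = \begin{pmatrix} W_s P_s^T (X - \bar{X}) \\ P_g^T (g - \bar{g}) \end{pmatrix}$$
where $W_s$ is a diagonal matrix which adjusts the weighting between pixel distances and pixel intensities. $W_s$ is calculated by:
$$W_s = rI$$
where $r^2$ is the ratio of the total intensity variation to the total shape variation:
$$r^2 = \frac{\sum_i \lambda_{g,i}}{\sum_i \lambda_{s,i}}$$
Here $\lambda_{s,i}$ and $\lambda_{g,i}$ are the eigenvalues of the shape and texture covariance matrices, respectively.
Apply PCA on $b$, then
$$b = Qc$$
where $Q$ contains the eigenvectors of the covariance matrix of $b$, partitioned into a shape part and a texture part:
$$Q = \begin{pmatrix} Q_s \\ Q_g \end{pmatrix}$$
$c$ is a vector of appearance model parameters controlling both the shape and the texture of the model.
Using the linear nature of the model, the combined model including shape $X$ and texture $g$ can be expressed as:
$$X = \bar{X} + P_s W_s^{-1} Q_s c \tag{33}$$
$$g = \bar{g} + P_g Q_g c \tag{34}$$
Then a new image can be synthesised using equations (33) and (34) for a given $c$.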
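The combination of the two models can be sketched with NumPy. This is a minimal sketch on random toy data, not the chapter's implementation: the function names are hypothetical, all modes of the combined PCA are kept, the per-image shape parameters $b_s$ and texture parameters $b_g$ are assumed to be stacked as rows, and the weighting matrix is taken as $W_s = rI$ with $r^2$ the ratio of total texture variance to total shape variance.

```python
import numpy as np

def build_combined_model(Bs, Bg, lam_s, lam_g):
    """Bs: (L, ts) shape parameters, Bg: (L, tg) texture parameters (zero-mean rows).

    lam_s, lam_g: retained shape / texture eigenvalues.
    Returns W_s and the eigenvector matrix Q of the combined parameters.
    """
    r = np.sqrt(np.sum(lam_g) / np.sum(lam_s))
    Ws = r * np.eye(Bs.shape[1])
    B = np.hstack([Bs @ Ws, Bg])           # each row is b^T = (W_s b_s, b_g)^T
    cov = B.T @ B / B.shape[0]             # b already has zero mean
    vals, vecs = np.linalg.eigh(cov)
    Q = vecs[:, np.argsort(vals)[::-1]]    # eigenvectors, descending eigenvalues
    return Ws, Q

def synthesize(c, Ws, Q, mean_X, Ps, mean_g, Pg):
    """Shape and texture generated from appearance parameters c."""
    ts = Ps.shape[1]
    Qs, Qg = Q[:ts], Q[ts:]
    X = mean_X + Ps @ np.linalg.solve(Ws, Qs @ c)
    g = mean_g + Pg @ (Qg @ c)
    return X, g

# Random toy models: 2 shape modes (8-dim shapes), 3 texture modes (12 pixels).
rng = np.random.default_rng(0)
Ps, _ = np.linalg.qr(rng.normal(size=(8, 2)))
Pg, _ = np.linalg.qr(rng.normal(size=(12, 3)))
mean_X, mean_g = rng.normal(size=8), rng.normal(size=12)
Bs = rng.normal(size=(10, 2)); Bs -= Bs.mean(axis=0)
Bg = rng.normal(size=(10, 3)); Bg -= Bg.mean(axis=0)
lam_s, lam_g = Bs.var(axis=0), Bg.var(axis=0)

Ws, Q = build_combined_model(Bs, Bg, lam_s, lam_g)
b = np.concatenate([Ws @ Bs[0], Bg[0]])
c = Q.T @ b                                # appearance parameters of image 0
X_syn, g_syn = synthesize(c, Ws, Q, mean_X, Ps, mean_g, Pg)
```

With all modes retained, synthesizing from `c` reproduces the shape and texture described by the original parameter vectors; truncating `Q` to its leading columns would give the usual compressed appearance model.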
2.5. Fitting AAM to the input image
Fitting an AAM to an image is treated as the problem of minimizing the error between the input image and the closest model instance [Wang et al., 2007]:
$$\Delta = |\delta g|^2 = |g_i - g_m|^2$$
where $g_i$ is the texture vector of the input image, and $g_m$ is the texture vector of the model instance. The goal is to adjust the appearance model parameters $c$ to minimize $\Delta$. The simplest way is to construct a linear relationship:
$$\delta c = R\,\delta g$$
where $R$ can be computed by the following process:
Suppose $c_0$ is the model parameter of the current image. A new model parameter $c$ is generated by perturbing $c_0$:
$$c = c_0 + \delta c$$
Generate the new shape model $X$ and the normalized texture model $g_m$ according to equations (33) and (34).
Warp the current image to obtain the corresponding texture $g_i$. The difference vector can be written as:
$$\delta g = g_i - g_m$$
Repeating these steps for many known perturbations $\delta c$ yields pairs $(\delta c, \delta g)$, from which $R$ is estimated by multivariate linear regression.
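The estimation of $R$ can be sketched with NumPy on synthetic data. This is a sketch under loud assumptions: `texture_residual` is a hypothetical stand-in for the real "synthesize the model, warp the image, sample the texture" step and simply returns a noisy linear response, and the sizes (15 pixels, 3 parameters, 200 perturbations) are arbitrary toy values. The sign convention in the update step follows from training on known displacements away from the optimum, so the predicted displacement is subtracted.

```python
import numpy as np

rng = np.random.default_rng(1)
n_pixels, n_params, n_samples = 15, 3, 200

# Hypothetical ground-truth sensitivity of the texture residual to the
# parameter displacement; unknown in practice.
J_true = rng.normal(size=(n_pixels, n_params))

def texture_residual(delta_c):
    """Stand-in for measuring delta_g at a parameter displacement delta_c."""
    noise = 1e-3 * rng.normal(size=n_pixels)
    return J_true @ delta_c + noise

# Training: apply known perturbations and record the residuals.
delta_C = rng.uniform(-0.5, 0.5, size=(n_samples, n_params))
delta_G = np.array([texture_residual(dc) for dc in delta_C])

# Linear regression: find R such that delta_c ~ R delta_g.
# lstsq solves delta_G @ R.T ~ delta_C in the least-squares sense.
R = np.linalg.lstsq(delta_G, delta_C, rcond=None)[0].T   # (n_params, n_pixels)

# One fitting iteration: predict the displacement and remove it.
c_opt = np.array([0.2, -0.1, 0.3])            # parameters of the best fit
c = c_opt + np.array([0.3, 0.2, -0.25])       # current, displaced estimate
c_new = c - R @ texture_residual(c - c_opt)
```

For a real AAM the residual is only approximately linear in the displacement, so the update is applied repeatedly until the error stops decreasing.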
The fitting procedure is shown in Table 2 and the object tracking results are shown in Figure 3 and Figure 4. The tracking results remain satisfactory when the camera moves around the scene.
3. Factorization algorithm
The flowchart of the factorization algorithm is shown in Figure 5, and a detailed description is given in the following sections.
3.1. Basic knowledge
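The 3D shape of the non-rigid object can be described by a key frame basis set $S_1, S_2, \dots, S_K$. Each key frame basis $S_i$ is a $3 \times P$ matrix describing $P$ points. The 3D shape of a specific configuration is a linear combination of the basis set:
$$S = \sum_{i=1}^{K} l_i S_i, \quad S, S_i \in \mathbb{R}^{3 \times P}, \; l_i \in \mathbb{R}$$
Under a scaled orthographic projection, the $P$ points of $S$ are projected into 2D image points $(u_i, v_i)$:
$$\begin{pmatrix} u_1 & \dots & u_P \\ v_1 & \dots & v_P \end{pmatrix} = R\left(\sum_{i=1}^{K} l_i S_i\right) + T \tag{40}$$
$R = \begin{pmatrix} r_1 & r_2 & r_3 \\ r_4 & r_5 & r_6 \end{pmatrix}$ contains the first two rows of the 3D camera rotation matrix. $T$ is the camera translation matrix. As mentioned in [Tomasi & Kanade, 1992], we eliminate the camera translation matrix by subtracting the mean of all 2D points, and can henceforth assume that the 3D shape $S$ is centred at the origin.
Therefore, we can rewrite equation (40) as:
$$\begin{pmatrix} u_1 & \dots & u_P \\ v_1 & \dots & v_P \end{pmatrix} = R\left(\sum_{i=1}^{K} l_i S_i\right) \tag{43}$$
Rewrite the linear combination in equation (43) as a matrix-matrix multiplication:
$$\begin{pmatrix} u_1 & \dots & u_P \\ v_1 & \dots & v_P \end{pmatrix} = \begin{pmatrix} l_1 R & \dots & l_K R \end{pmatrix}\begin{pmatrix} S_1 \\ \vdots \\ S_K \end{pmatrix} \tag{44}$$
The 2D image points in each frame can be obtained using AAM. The tracked 2D points in frame $t$ are denoted as $(u_i^{(t)}, v_i^{(t)})$. The 2D tracking matrix of $N$ frames can be written as:
$$W = \begin{pmatrix} u_1^{(1)} & \dots & u_P^{(1)} \\ v_1^{(1)} & \dots & v_P^{(1)} \\ \vdots & & \vdots \\ u_1^{(N)} & \dots & u_P^{(N)} \\ v_1^{(N)} & \dots & v_P^{(N)} \end{pmatrix} \tag{45}$$
Using equation (44) we can rewrite equation (45) as:
$$W = \begin{pmatrix} l_{11} R_1 & \dots & l_{1K} R_1 \\ \vdots & & \vdots \\ l_{N1} R_N & \dots & l_{NK} R_N \end{pmatrix}\begin{pmatrix} S_1 \\ \vdots \\ S_K \end{pmatrix} \tag{46}$$
where $R_t$ denotes the camera rotation of frame $t$ and $l_{ti}$ denotes the shape parameter $l_i$ of frame $t$.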
3.2. Solving configuration weights using factorization
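Equation (46) shows that the 2D tracking matrix $W$ has rank $3K$ and can be factored into the product of two matrices, $W = MB$, where $M$ is a $2N \times 3K$ matrix and $B$ is a $3K \times P$ matrix. $M$ contains the camera rotation matrix and the configuration weights of each frame. $B$ contains the information of the shape basis $S_1, \dots, S_K$. The factorization can be done using the singular value decomposition (SVD) method, retaining only the $3K$ largest singular values:
$$W = U D V^T = \hat{M}\hat{B}$$
Then the camera rotation matrix $\hat{R}_t$ and the shape basis weights $l_{ti}$ of each frame can be extracted from the matrix $\hat{M}$. Let $\hat{m}_t$ denote the two rows of $\hat{M}$ that belong to frame $t$, and let $r_1, \dots, r_6$ be the entries of $\hat{R}_t$:
$$\hat{m}_t = \begin{pmatrix} l_{t1}\hat{R}_t & \dots & l_{tK}\hat{R}_t \end{pmatrix} = \begin{pmatrix} l_{t1} r_1 & l_{t1} r_2 & l_{t1} r_3 & \dots & l_{tK} r_1 & l_{tK} r_2 & l_{tK} r_3 \\ l_{t1} r_4 & l_{t1} r_5 & l_{t1} r_6 & \dots & l_{tK} r_4 & l_{tK} r_5 & l_{tK} r_6 \end{pmatrix}$$
Transform $\hat{m}_t$ into a new matrix $\bar{m}_t$:
$$\bar{m}_t = \begin{pmatrix} l_{t1} r_1 & l_{t1} r_2 & \dots & l_{t1} r_6 \\ \vdots & & & \vdots \\ l_{tK} r_1 & l_{tK} r_2 & \dots & l_{tK} r_6 \end{pmatrix} = \begin{pmatrix} l_{t1} \\ \vdots \\ l_{tK} \end{pmatrix}\begin{pmatrix} r_1 & r_2 & r_3 & r_4 & r_5 & r_6 \end{pmatrix}$$
Here, $\bar{m}_t$ has rank 1 and can be factored using the SVD method.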
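Both factorization steps can be sketched with NumPy on synthetic data. This is an illustrative sketch with assumed toy sizes and hypothetical variable names, not the chapter's implementation. Note one simplification: the reordering step is demonstrated on a frame block built directly in the form $(l_{t1}R_t \; \dots \; l_{tK}R_t)$, where the rank-1 structure is exact; rows taken from the SVD factor are only determined up to an invertible transform, which the sketch does not attempt to resolve.

```python
import numpy as np

rng = np.random.default_rng(0)
K, P, N = 2, 30, 12            # basis shapes, points, frames (toy sizes)

def random_rotation(rng):
    """Random 3x3 rotation matrix from a QR decomposition."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:   # make it a proper rotation
        q[:, 0] *= -1.0
    return q

basis = rng.normal(size=(K, 3, P))                 # S_1 ... S_K
weights = rng.uniform(0.5, 1.5, size=(N, K))       # l_t1 ... l_tK per frame
rotations = [random_rotation(rng)[:2] for _ in range(N)]   # first two rows

# 2N x P tracking matrix: each frame contributes R_t (sum_k l_tk S_k).
W = np.vstack([rotations[t] @ np.tensordot(weights[t], basis, axes=1)
               for t in range(N)])

# Step 1: rank-3K factorization W = M_hat B_hat via SVD.
U, D, Vt = np.linalg.svd(W, full_matrices=False)
rank = 3 * K
M_hat = U[:, :rank] * np.sqrt(D[:rank])
B_hat = np.sqrt(D[:rank])[:, None] * Vt[:rank]

# Step 2: reorder one 2 x 3K frame block into a K x 6 matrix and factor it.
t = 0
m_t = np.hstack([weights[t, k] * rotations[t] for k in range(K)])
m_bar = m_t.reshape(2, K, 3).transpose(1, 0, 2).reshape(K, 6)
u, s, vt = np.linalg.svd(m_bar)
# The six rotation entries have norm sqrt(2) (two orthonormal rows).
l_est = u[:, 0] * s[0] / np.sqrt(2.0)
R_est = (vt[0] * np.sqrt(2.0)).reshape(2, 3)
if l_est.sum() < 0:            # resolve the joint sign ambiguity
    l_est, R_est = -l_est, -R_est
```

The singular values of `W` beyond the first `3 * K` vanish up to rounding, which is the numerical counterpart of the rank statement, and the rank-1 factor of `m_bar` returns the weights and the two rotation rows of the chosen frame.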
3.3. Solving true rotation matrix and shape basis
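As mentioned in [Tomasi & Kanade, 1992], the matrix $\hat{R}_t$ is a linear transformation of the true rotation matrix $R_t$. Likewise, the estimated shape basis $\hat{S}_i$ is a linear transformation of the true shape matrix $S_i$:
$$R_t = \hat{R}_t G, \quad S_i = G^{-1}\hat{S}_i$$
where the $3 \times 3$ matrix $G$ is found by solving a nonlinear data-fitting problem. In each frame we need to constrain the rotation matrix to be orthonormal. With $r_1, \dots, r_6$ denoting the entries of $\hat{R}_t$, the constraints of frame $t$ are:
$$\begin{pmatrix} r_1 & r_2 & r_3 \end{pmatrix} G G^T \begin{pmatrix} r_1 & r_2 & r_3 \end{pmatrix}^T = 1$$
$$\begin{pmatrix} r_4 & r_5 & r_6 \end{pmatrix} G G^T \begin{pmatrix} r_4 & r_5 & r_6 \end{pmatrix}^T = 1$$
$$\begin{pmatrix} r_1 & r_2 & r_3 \end{pmatrix} G G^T \begin{pmatrix} r_4 & r_5 & r_6 \end{pmatrix}^T = 0$$
In summary, given the 2D tracking data $W$, we can get the 3D shape basis $S_1, \dots, S_K$, the camera rotation matrix $R_t$ and the configuration weights $l_{ti}$ of each training frame simultaneously using the factorization method.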
4. Non-rigid registration method
The 2D tracking data can be obtained using the AAM algorithm described in section 2. With the factorization method described in section 3, we can acquire the 3D shape basis. The 3D shape is represented as 3D points in world coordinates. Given the configuration weights, the 3D shape can be recovered as a linear combination of the 3D shape basis. By projecting the 3D points onto the 2D image with the known camera rotation matrix (assuming the intrinsic camera matrix has been calculated), the estimated 2D points can be acquired. If the error between the 2D tracking data and the estimated 2D points is small enough, we accept the configuration weights and the rotation matrix. Finally, the virtual object can be overlaid onto the real scene using the camera rotation matrix.
The initial configuration weights and camera rotation matrix cannot be precise. They should therefore be optimized to minimize the error between the 2D tracking points detected by AAM and the estimated 2D points, which are projected from the 3D shape points. This is a non-linear optimization problem which can be solved by iterative optimization methods. Unlike [Zhu et al., 2006], we use the Levenberg-Marquardt algorithm. Equation (54) shows the cost function.
$$E(R, l_1, \dots, l_K) = \sum_{i=1}^{P}\left\| p_i - \hat{p}_i \right\|^2 \tag{54}$$
where $p_i$ is the 2D tracking data and $\hat{p}_i$ is the projected point.
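The optimization can be sketched with a small hand-written Levenberg-Marquardt loop in NumPy. This is a sketch under several assumptions that are not taken from the chapter: the rotation is parametrized by an axis-angle vector (so that it stays orthonormal by construction), the projection is orthographic with any scale absorbed into the weights, the Jacobian is approximated by forward differences, and all function names and the synthetic data are hypothetical.

```python
import numpy as np

def rodrigues(w):
    """Rotation matrix from an axis-angle vector w (Rodrigues' formula)."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    k = w / theta
    Kx = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * Kx + (1 - np.cos(theta)) * (Kx @ Kx)

def project(params, basis):
    """params = (3 axis-angle values, K weights); returns 2 x P image points."""
    w, weights = params[:3], params[3:]
    shape3d = np.tensordot(weights, basis, axes=1)   # sum_k l_k S_k
    return rodrigues(w)[:2] @ shape3d

def residuals(params, basis, tracked):
    return (project(params, basis) - tracked).ravel()

def numeric_jacobian(f, x, eps=1e-6):
    """Forward-difference Jacobian of f at x."""
    r0 = f(x)
    J = np.empty((r0.size, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        J[:, j] = (f(x + step) - r0) / eps
    return J

def levenberg_marquardt(f, x0, iters=100, lam=1e-3):
    x = np.asarray(x0, dtype=float).copy()
    r = f(x)
    cost = float(r @ r)
    for _ in range(iters):
        if cost < 1e-20:
            break
        J = numeric_jacobian(f, x)
        A, g = J.T @ J, J.T @ r
        damped = A + lam * np.diag(np.diag(A)) + 1e-12 * np.eye(x.size)
        step = np.linalg.solve(damped, -g)
        r_new = f(x + step)
        cost_new = float(r_new @ r_new)
        if cost_new < cost:          # accept the step, trust the model more
            x, r, cost = x + step, r_new, cost_new
            lam = max(lam / 10.0, 1e-9)
        else:                        # reject the step, increase damping
            lam *= 10.0
    return x, cost

# Synthetic test: K = 2 basis shapes with P = 20 points, perturbed start.
rng = np.random.default_rng(0)
basis = rng.normal(size=(2, 3, 20))
true_params = np.array([0.3, -0.2, 0.5, 1.0, 0.4])
tracked = project(true_params, basis)
init = true_params + 0.05 * np.array([1.0, -1.0, 1.0, -1.0, 1.0])
fit, final_cost = levenberg_marquardt(
    lambda p: residuals(p, basis, tracked), init)
```

Parametrizing the rotation by three axis-angle values is one way to respect the orthonormality of the rotation matrix without adding explicit constraints; in an online setting the result of the previous frame would be a natural initial value.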
The procedure of non-rigid registration is shown in Table 3.
Furthermore, the orthonormality of the rotation matrix should be taken into consideration during the optimization. The proposed method has been implemented in C using OpenGL and OpenCV on a DELL workstation (CPU 1.7 GHz × 2, RAM 1 GB).
In the offline stage, we construct the AAM hand models using 7 training images which are manually labelled with 36 landmarks, as shown in Figure 2. We establish the hand shape basis using 300 training frames captured with a CCD camera. In the online stage, the 2D points are tracked using the AAM algorithm, and the Levenberg-Marquardt algorithm is then used to optimize the parameters. Our experimental results are shown in Figure 6. The results show that the virtual teapot is overlaid on the hand accurately while the camera moves around the scene.
In this chapter, a non-rigid registration method for augmented reality applications using AAM and the factorization method is proposed. The process is divided into two stages: an offline stage and an online stage. In the offline stage, the 3D shape basis is constructed. To obtain the shape basis of the object, we first factorize the 2D data matrix tracked by the AAM into the product of two matrices. One matrix contains the camera rotation matrix and the configuration weights, and the other matrix contains the shape basis. The rotation matrix and the configuration weights are then separated using the SVD method. Finally, the orthonormality of the rotation matrix is imposed as a constraint to obtain the true rotation matrix and configuration weights. In the online stage, the 3D pose parameters and the shape coefficients are estimated. The purpose is to minimize the error between the 2D tracking points detected by AAM and the estimated 2D points, which are projected from the 3D shape points. The Levenberg-Marquardt method is used to solve this problem, optimizing the rotation matrix and the configuration weights. Experiments have been conducted to validate that the proposed method is effective and useful for non-rigid registration in augmented reality applications.