AAM and Non-rigid Registration in Augmented Reality

Written By

Yuan Tian, Tao Guan and Cheng Wang

Published: 01 January 2010

DOI: 10.5772/7132

From the Edited Volume

Augmented Reality

Edited by Soha Maad


1. Introduction

Augmented reality copes with the problem of dynamically augmenting or enhancing the real world with computer-generated virtual objects [Azuma, 1997; Azuma, 2001]. Registration is one of the most pivotal problems in augmented reality applications. Typical augmented reality applications track 2D patterns on rigid planar objects in order to acquire the pose of the camera in the scene. Although the problem of rigid registration has been widely studied [Yuan et al., 2005; Yuan et al., 2006; Guan et al., 2008a; Guan et al., 2008b; Li et al., 2008], non-rigid registration has recently been receiving more and more attention. Many non-rigid objects exist in the real world, such as animated faces, deformable cloth and hands. Overlaying virtual objects on non-rigid objects is particularly challenging.

Recently, many non-rigid registration approaches have been reported. In many cases (e.g. human faces), only a few feature points can be reliably tracked. In [Bartoli et al., 2004], a non-rigid registration method using point and curve correspondences was proposed to solve this problem. Curves are introduced into the non-rigid factorization framework because several curves (e.g. the hairline, the eyebrows) can be used to determine the mapping, and the mapping function is computed from both point and curve correspondences. This method can successfully augment a non-rigid object with a virtual object. In [Pilet et al., 2005; Pilet et al., 2007], a real-time method is presented for detecting deformable surfaces with no need for prior pose knowledge, based on deformable 2D meshes. Using a fast wide-baseline matching algorithm, they can superimpose an appropriately deformed logo on a T-shirt. These methods are robust to large deformations, lighting changes, motion blur and occlusions. To align the virtual objects generated by computers with the real world seamlessly, accurate registration data must be provided. In general, registration can be achieved by solving a point matching problem: finding the correspondence between two sets of tracked feature points. Therefore, feature point detection and point tracking are the two main problems. Rigid object detection and tracking have been extensively studied, and effective, robust, real-time solutions have been proposed [Lowe, 2004; Lepetit & Fua, 2005; Lepetit et al., 2005; Rosten & Drummond, 2005]. Non-rigid object detection and tracking is far more complex because the object is deformable, and not only the registration data but also a large number of deformation parameters must be estimated.

Active Appearance Models (AAM), introduced a few years ago [Cootes et al., 1998; Cootes et al., 2001], are commonly used to track non-rigid objects such as faces and hands, and many AAM-based tracking methods have been proposed. A working system for finding and tracking a human face and its features using active appearance models was presented in [Ahlberg, 2001]: a wireframe model is adapted to the face in each image, and the model is matched to the face in the input image using the active appearance algorithm. In [Sung & Kim, 2004], the 2D AAM is extended with a 3D shape model and a modified model-fitting algorithm is proposed. In [Markin & Prakash, 2006], occluded images are included in the AAM training data to address the occlusion and self-occlusion problem, which improves the fitting quality of the algorithm.

With known coordinates of 3D points in the world coordinate frame and the corresponding 2D image points, the camera pose can be estimated. For non-rigid objects, the AAM algorithm is a robust method for acquiring the 2D image points; it has proven useful for rapidly matching a statistical model to a new image. The 3D points of a non-rigid object can be represented by a linear combination of a set of 3D basis shapes. By varying the configuration weights and camera pose parameters, the error between the estimated 2D points (projected from the estimated 3D shape using the estimated camera pose) and the 2D tracking points can be minimized.

Many methods have been proposed to recover the 3D shape basis from 2D image sequences. In [Tomasi & Kanade, 1992], a factorization method is used to recover shape and motion from a sequence of images under orthographic projection. The image sequence is represented as a measurement matrix, and it is proved that under orthography the measurement matrix is of rank 3 and can be factored into a 3D pose matrix and a 3D shape matrix. Unfortunately, this technique cannot be applied to non-rigid deforming objects, since it is based on the rigidity assumption. A technique based on a non-rigid model was proposed to recover 3D non-rigid shape models under scaled orthographic projection [Bregler et al., 2000]: the 3D shape in each frame is expressed as a linear combination of a set of K basis shapes, and under this model the 2D tracking matrix is of rank 3K and can be factored into 3D pose, object configuration and 3D basis shapes using SVD.

In this chapter, a novel non-rigid registration method for augmented reality applications using AAM and the factorization method is introduced. We focus on the AAM algorithm and the factorization method, which together yield the 3D shape basis, the object configuration and the 3D pose simultaneously. The following demonstrations are mainly based on the research presented in [Tomasi & Kanade, 1992; Cootes et al., 1998; Bregler et al., 2000; Cootes et al., 2001; Xiao et al., 2004; Zhu et al., 2006; Tian et al., 2008]. Section 2 gives a detailed review of active appearance models, focusing on how to track non-rigid objects. Section 3 illustrates how to compute the 3D shape basis, the camera rotation matrix and the configuration weights of each training frame simultaneously from the 2D tracking data using the factorization algorithm. Section 4 introduces how to compute precise configuration weights and camera rotation matrix by optimization, and presents the experimental results.


2. Tracking non-rigid objects using AAM

Tracking non-rigid objects using AAM involves five main steps. The first step is to obtain landmarks in the training image set. The shape model and texture model are then established separately, and the two models are unified into one appearance model. Finally, the difference between the input image and the closest synthesized AAM instance is minimized to obtain an accurate tracking result. The flowchart is shown in Figure 1.

Figure 1.

The flowchart of tracking non-rigid objects using AAM.

2.1. Obtain landmarks

Figure 2.

Examples of training images manually labeled with consistent landmarks.

Before establishing the models, we should place hundreds of points in each 2D training image, and the landmarks must be consistent across images. We usually choose points of high curvature or junctions as landmark points, since they constrain the shape of the target object tightly. The acquisition process is cumbersome, especially when done manually. Researchers have looked for ways to reduce this burden: ideally, one would only need to place points in one image and the corresponding points in the remaining training images would be found automatically. This is not possible in general, but many semi-automatic methods have been proposed that can successfully find correspondences across an image set. Here we focus on the tracking procedure; more details about semi-automatic placement of landmarks can be found in [Sclaroff & Pentland, 1995; Duta et al., 1999; Walker et al., 2000; Andresen & Nielsen, 2001]. Examples of training images manually labeled with consistent landmarks are shown in Figure 2. We use 7 training images taken from different viewpoints, each labeled with 36 landmarks shown as "×" in the figures.

Given a training image set with key landmark points marked on each example object, we can model the shape and texture variations. These steps are illustrated in detail in the following sections.

2.2. Establish shape model

Suppose L denotes the number of shapes in a training image set and m is the number of key landmark points on each image. The vector representation for each shape would then be:

$$X_i = [x_1, x_2, \ldots, x_m, y_1, y_2, \ldots, y_m]^T, \qquad i = 1, 2, \ldots, L \tag{1}$$

Then the training set is $\Omega = (X_1, X_2, \ldots, X_L)$.

The shape and pose (including position, scale and rotation) of the object differ across the training images, so the shapes should be aligned to filter out the pose effects. We will first explain how to align two shapes and then extend the procedure to a set of shapes.

If $X$ and $X'$ are two shape vectors, the goal is to minimize the sum of squared distances between corresponding landmarks after aligning them with a similarity transformation, i.e. to minimize $E$:

$$E = |T(X) - X'|^2 \tag{2}$$

where

$$T\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} u & -v \\ v & u \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} t_x \\ t_y \end{pmatrix} \tag{3}$$

The similarity transformation is parameterised by a scale factor $s$, a rotation $\theta$ and a translation $t$, with $u = s\cos\theta$ and $v = s\sin\theta$; hence $s^2 = u^2 + v^2$ and $\theta = \tan^{-1}(v/u)$. Equation (2) can then be rewritten as:

$$E(u, v, t_x, t_y) = |T(X) - X'|^2 = \sum_{i=1}^{m}\left[(u x_i - v y_i + t_x - x'_i)^2 + (v x_i + u y_i + t_y - y'_i)^2\right] \tag{4}$$

Minimizing equation (4) is equivalent to setting its partial derivatives to zero, which gives the transformation parameters:

$$u = \frac{X \cdot X'}{|X|^2}, \quad v = \frac{\sum_{i=1}^{m}(x_i y'_i - y_i x'_i)}{|X|^2}, \quad t_x = \frac{1}{m}\sum_{i=1}^{m} x'_i, \quad t_y = \frac{1}{m}\sum_{i=1}^{m} y'_i \tag{5}$$

The alignment of a set of shapes can be performed by the iterative approach suggested by [Bookstein, 1996]. The detailed process is shown in Table 1.

So the aligned training set is $\hat{\Omega} = (\hat{X}_1, \hat{X}_2, \ldots, \hat{X}_L)$.

Step 1: Choose one shape as the initial estimate $\bar{X}_0$ of the mean shape.
Step 2: Scale the mean shape so that $|\bar{X}_0| = 1$.
Step 3: Align all the remaining shapes to the mean shape using the method described above.
Step 4: Re-estimate the mean shape from the aligned shapes.
Step 5: Scale the new mean shape so that $|\bar{X}_{new}| = 1$.
Step 6: If the new mean shape does not change significantly, convergence is declared; otherwise return to step 3.

Table 1.

The process of aligning a set of shapes
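To make the alignment concrete, the following minimal numpy sketch implements the two-shape similarity alignment of equations (2)-(5) and the iterative procedure of Table 1. It assumes shapes are stored as vectors $[x_1, \ldots, x_m, y_1, \ldots, y_m]$ and that both shapes are centred, so the translation terms of equation (5) vanish; the helper names are ours, not the chapter's.

```python
import numpy as np

def centre(X):
    """Remove the centroid of a shape vector [x_1..x_m, y_1..y_m]."""
    m = X.size // 2
    return np.concatenate([X[:m] - X[:m].mean(), X[m:] - X[m:].mean()])

def align_shape(X, X_ref):
    """Align centred shape X to centred reference X_ref with the
    similarity transform of equations (2)-(5)."""
    m = X.size // 2
    x, y = X[:m], X[m:]
    xr, yr = X_ref[:m], X_ref[m:]
    n2 = (x**2 + y**2).sum()              # |X|^2
    u = (x*xr + y*yr).sum() / n2          # u = (X . X') / |X|^2
    v = (x*yr - y*xr).sum() / n2          # v of equation (5)
    return np.concatenate([u*x - v*y, v*x + u*y])

def align_set(shapes, tol=1e-7, max_iter=100):
    """Iterative alignment of a set of shapes, following Table 1."""
    shapes = [centre(s) for s in shapes]
    mean = shapes[0] / np.linalg.norm(shapes[0])           # steps 1-2
    for _ in range(max_iter):
        aligned = [align_shape(s, mean) for s in shapes]   # step 3
        new_mean = np.mean(aligned, axis=0)                # step 4
        new_mean /= np.linalg.norm(new_mean)               # step 5
        converged = np.linalg.norm(new_mean - mean) < tol  # step 6
        mean = new_mean
        if converged:
            break
    return np.array(aligned), mean
```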

The mean shape is calculated by:

$$\bar{X} = \frac{1}{L}\sum_{i=1}^{L}\hat{X}_i \tag{6}$$

The covariance matrix can thus be given as:

$$\Sigma_s = \frac{1}{L-1}\sum_{i=1}^{L}(\hat{X}_i - \bar{X})(\hat{X}_i - \bar{X})^T \tag{7}$$

Calculate the eigenvalues $\lambda_{s,i}$ of $\Sigma_s$ and the corresponding eigenvectors $\eta_{s,i}$:

$$\Sigma_s \eta_{s,i} = \lambda_{s,i}\eta_{s,i} \tag{8}$$

Sort all the eigenvalues in descending order:

$$\lambda_{s,i} \ge \lambda_{s,i+1}, \qquad i = 1, 2, \ldots, 2m-1 \tag{9}$$

The corresponding set of eigenvectors is $H_s = [\eta_{s,1}, \eta_{s,2}, \ldots, \eta_{s,2m}]$.

To reduce the computational complexity, principal component analysis (PCA) is used to reduce the dimensionality of the shape space. Choose the $t$ largest eigenvalues which satisfy the following condition:

$$\sum_{i=1}^{t}\lambda_{s,i} \ge 0.98\left(\sum_{i=1}^{2m}\lambda_{s,i}\right) \tag{10}$$

Then any shape instance can be generated by deforming the mean shape by a linear combination of eigenvectors:

$$\hat{X} \approx \bar{X} + \Phi_s b_s \tag{11}$$

where $\Phi_s = (\phi_{s,1} | \phi_{s,2} | \cdots | \phi_{s,t})$ is a $2m \times t$ matrix containing the $t$ eigenvectors corresponding to the largest eigenvalues. $b_s$ is a $t$-dimensional vector; since the eigenvectors are mutually orthogonal, it can be computed as:

$$b_s = \Phi_s^{-1}(\hat{X} - \bar{X}) = \Phi_s^T(\hat{X} - \bar{X}) \tag{12}$$

By varying the elements of $b_s$, new shape instances can be generated using equation (11).
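A compact sketch of the shape-model construction of equations (6)-(12) might look as follows; `aligned` is assumed to be an $L \times 2m$ array of aligned shape vectors, and the 0.98 threshold follows equation (10).

```python
import numpy as np

def build_shape_model(aligned, frac=0.98):
    """PCA shape model of equations (6)-(12)."""
    X_bar = aligned.mean(axis=0)                   # mean shape, eq. (6)
    D = aligned - X_bar
    cov = D.T @ D / (len(aligned) - 1)             # covariance, eq. (7)
    vals, vecs = np.linalg.eigh(cov)               # eigen-decomposition, eq. (8)
    order = np.argsort(vals)[::-1]                 # descending order, eq. (9)
    vals, vecs = vals[order], vecs[:, order]
    # smallest t with cumulative variance fraction >= frac, eq. (10)
    t = np.searchsorted(np.cumsum(vals) / vals.sum(), frac) + 1
    Phi_s = vecs[:, :t]                            # 2m x t basis matrix
    return X_bar, Phi_s

# Parameters and synthesis (eqs. (11)-(12)):
#   b_s = Phi_s.T @ (X_hat - X_bar)
#   X_new = X_bar + Phi_s @ b_s
```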

2.3. Establish texture model

To establish a complete appearance model, not only the shape but also the texture should be considered. We first acquire the pixel information over the region covered by the landmarks in each training image. Then an image warping method is used to consistently collect the texture information between the landmarks across the training images. Finally, the texture model is established using the PCA method.

The texture of a target object can be represented as:

$$g = [g_1, g_2, \ldots, g_n]^T \tag{13}$$

where n denotes the number of pixel samples over the object surface.

In an image, the target object may occupy only a small region of the whole image, and the pixel intensities and global illumination differ between training images. The most important information is the texture that reflects the characteristics of the target object. Because the number of pixels in different target regions differs and it is difficult to acquire an accurate correspondence between images, the texture model cannot be established directly: we need texture vectors of the same dimension with a consistent correspondence. We therefore warp each example image so that its control points match the reference shape. The image warping process is as follows:

Firstly, apply Delaunay triangulation to the reference shape to obtain the reference mesh, which consists of a set of triangles. We choose the mean shape as the reference shape.

Secondly, suppose $v_1$, $v_2$ and $v_3$ are the three vertices of a triangle in the example mesh; any internal point $v$ of the triangle can be written as:

$$v = v_1 + \beta(v_2 - v_1) + \gamma(v_3 - v_1) = \alpha v_1 + \beta v_2 + \gamma v_3 \tag{14}$$

where $\alpha + \beta + \gamma = 1$, with $0 \le \alpha, \beta, \gamma \le 1$ because $v$ lies inside the triangle. Given $v = [x, y]^T$, $v_1 = [x_1, y_1]^T$, $v_2 = [x_2, y_2]^T$, $v_3 = [x_3, y_3]^T$, then $\alpha$, $\beta$, $\gamma$ can be calculated by:

$$\alpha = 1 - (\beta + \gamma) \tag{15}$$
$$\beta = \frac{y x_3 - x_1 y - x_3 y_1 - y_3 x + x_1 y_3 + x y_1}{-x_2 y_3 + x_2 y_1 + x_1 y_3 + x_3 y_2 - x_3 y_1 - x_1 y_2} \tag{16}$$
$$\gamma = \frac{x y_2 - x y_1 - x_1 y_2 - x_2 y + x_2 y_1 + x_1 y}{-x_2 y_3 + x_2 y_1 + x_1 y_3 + x_3 y_2 - x_3 y_1 - x_1 y_2} \tag{17}$$

Finally, the corresponding point $v'$ in the reference mesh can be calculated by:

$$v' = v'_1 + \beta(v'_2 - v'_1) + \gamma(v'_3 - v'_1) = \alpha v'_1 + \beta v'_2 + \gamma v'_3 \tag{18}$$

where $v'_1$, $v'_2$ and $v'_3$ are the vertices of the corresponding triangle in the reference mesh.

After the image warping process, each example image is warped to the reference shape and sampled, yielding texture vectors of equal dimension with consistent point-to-point correspondence.
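The warping of equations (14)-(18) reduces to computing barycentric coordinates in a source triangle and reapplying them in the reference triangle. A sketch under the assumption that triangles are given as 3×2 vertex arrays:

```python
import numpy as np

def barycentric(v, tri):
    """Barycentric coordinates of point v in triangle tri (eqs. (14)-(17));
    v is (x, y), tri is a 3x2 array of vertices."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    x, y = v
    den = -x2*y3 + x2*y1 + x1*y3 + x3*y2 - x3*y1 - x1*y2
    beta  = (y*x3 - x1*y - x3*y1 - y3*x + x1*y3 + x*y1) / den   # eq. (16)
    gamma = (x*y2 - x*y1 - x1*y2 - x2*y + x2*y1 + x1*y) / den   # eq. (17)
    return 1.0 - beta - gamma, beta, gamma                      # eq. (15)

def warp_point(v, tri_src, tri_dst):
    """Map v from a source triangle to the corresponding reference
    triangle (eq. (18))."""
    a, b, g = barycentric(v, tri_src)
    return a*tri_dst[0] + b*tri_dst[1] + g*tri_dst[2]
```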

To reduce the effects caused by global lighting variations, the texture samples are normalized by applying a scaling $a$ and an offset $b$:

$$g' = (g'_1, g'_2, g'_3, \ldots, g'_n) = \left(\frac{g_1 - b}{a}, \frac{g_2 - b}{a}, \frac{g_3 - b}{a}, \ldots, \frac{g_n - b}{a}\right) \tag{19}$$

where

$$b = \frac{1}{n}\sum_{i=1}^{n} g_i, \qquad a = \sigma, \qquad \sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(g_i - b)^2 \tag{20}$$
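The normalisation of equations (19)-(20) is a one-liner in practice; a minimal sketch:

```python
import numpy as np

def normalise_texture(g):
    """Photometric normalisation of a sampled texture vector
    (eqs. (19)-(20)): subtract the mean, divide by the std."""
    b = g.mean()          # offset: mean intensity
    a = g.std()           # scale: standard deviation
    return (g - b) / a
```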

The texture model is established in the same way as the shape model, again using the PCA approach. The mean texture is calculated by:

$$\bar{g} = \frac{1}{L}\sum_{i=1}^{L} g_i \tag{21}$$

The covariance matrix can thus be given as:

$$\Sigma_g = \frac{1}{L-1}\sum_{i=1}^{L}(g_i - \bar{g})(g_i - \bar{g})^T \tag{22}$$

Calculate the eigenvalues $\lambda_{g,i}$ of $\Sigma_g$ and the corresponding eigenvectors $\eta_{g,i}$:

$$\Sigma_g \eta_{g,i} = \lambda_{g,i}\eta_{g,i} \tag{23}$$

Sort all the eigenvalues in descending order:

$$\lambda_{g,i} \ge \lambda_{g,i+1}, \qquad i = 1, 2, \ldots, n-1 \tag{24}$$

The corresponding set of eigenvectors is $H_g = [\eta_{g,1}, \eta_{g,2}, \ldots, \eta_{g,n}]$.

As with the shape model, PCA is used to reduce the dimensionality. Choose the $t$ largest eigenvalues which satisfy the following condition:

$$\sum_{i=1}^{t}\lambda_{g,i} \ge 0.98\left(\sum_{i=1}^{n}\lambda_{g,i}\right) \tag{25}$$

Then any texture instance can be generated by deforming the mean texture by a linear combination of eigenvectors:

$$g \approx \bar{g} + \Phi_g b_g \tag{26}$$

where $\Phi_g = (\phi_{g,1} | \phi_{g,2} | \cdots | \phi_{g,t})$ is an $n \times t$ matrix containing the $t$ eigenvectors corresponding to the largest eigenvalues. $b_g$ is a $t$-dimensional vector; since the eigenvectors are mutually orthogonal, it can be computed as:

$$b_g = \Phi_g^{-1}(g - \bar{g}) = \Phi_g^T(g - \bar{g}) \tag{27}$$

By varying the elements of $b_g$, new texture instances can be generated using equation (26).

2.4. Establish combined model

Now that the shape model and texture model have been established, any input image can be represented by the shape parameter vector $b_s$ and the texture parameter vector $b_g$. Since there are correlations between the shape and texture variations, a combined vector $b$ can be generated from $b_s$ and $b_g$:

$$b = \begin{pmatrix} W_s b_s \\ b_g \end{pmatrix} = \begin{pmatrix} W_s \Phi_s^T(\hat{X} - \bar{X}) \\ \Phi_g^T(g - \bar{g}) \end{pmatrix} \tag{28}$$

where $W_s$ is a diagonal matrix which adjusts the weighting between pixel distances and pixel intensities. $W_s$ is calculated by:

$$W_s = rI = \begin{bmatrix} r & & \\ & \ddots & \\ & & r \end{bmatrix} \tag{29}$$

where $r^2$ is the ratio of the total intensity variation to the total shape variation:

$$r^2 = \frac{\lambda_g}{\lambda_s}, \qquad \lambda_g = \sum_i \lambda_{g,i}, \qquad \lambda_s = \sum_i \lambda_{s,i} \tag{30}$$

Apply PCA on b, then

$$b = \Phi_c c \tag{31}$$

where $\Phi_c$ holds the eigenvectors of the covariance matrix of $b$:

$$\Phi_c = \begin{pmatrix} \Phi_{c,s} \\ \Phi_{c,g} \end{pmatrix} \tag{32}$$

c is a vector of appearance model parameters controlling both the shape and texture of the models.

Using the linear nature of the model, the combined model including shape X and texture g can be expressed as:

$$X = \bar{X} + \Phi_s W_s^{-1} \Phi_{c,s} c \tag{33}$$
$$g = \bar{g} + \Phi_g \Phi_{c,g} c \tag{34}$$

Then a new image can be synthesised using equations (33) and (34) for a given $c$.
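A sketch of the combined-model construction and synthesis of equations (28)-(34); the per-example parameter arrays `b_s_all` and `b_g_all`, the eigenvalue arrays, and the split point `t_s` are our assumed inputs. The mean subtraction before the PCA step is our addition for numerical robustness; equation (28) itself stacks the vectors directly.

```python
import numpy as np

def build_combined_model(b_s_all, b_g_all, lam_s, lam_g, frac=0.98):
    """Combined appearance model (eqs. (28)-(32)); b_s_all (L x t_s) and
    b_g_all (L x t_g) stack per-example shape/texture parameters row-wise."""
    r = np.sqrt(lam_g.sum() / lam_s.sum())       # weighting ratio, eq. (30)
    B = np.hstack([r * b_s_all, b_g_all])        # concatenated vectors b, eq. (28)
    B0 = B - B.mean(axis=0)                      # centre before PCA (our choice)
    _, S, Vt = np.linalg.svd(B0, full_matrices=False)  # PCA via SVD, eq. (31)
    vals = S**2
    t = np.searchsorted(np.cumsum(vals) / vals.sum(), frac) + 1
    Phi_c = Vt[:t].T                             # combined basis, eq. (32)
    return r, Phi_c

def synthesise(c, X_bar, g_bar, Phi_s, Phi_g, Phi_c, r, t_s):
    """Generate shape and texture for parameters c (eqs. (33)-(34));
    t_s is the row index where Phi_c splits into its shape/texture parts."""
    Phi_cs, Phi_cg = Phi_c[:t_s], Phi_c[t_s:]
    X = X_bar + Phi_s @ (Phi_cs @ c) / r         # eq. (33), W_s^{-1} = I/r
    g = g_bar + Phi_g @ (Phi_cg @ c)             # eq. (34)
    return X, g
```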

2.5. Fitting the AAM to an input image

Fitting an AAM to an image can be treated as the problem of minimizing the error between the input image and the closest model instance [Wang et al., 2007]:

$$\delta I = I_{image} - I_{model} \tag{35}$$

Figure 3.

Results of tracking a hand using AAM fitting.

where $I_{image}$ is the texture vector of the input image, and $I_{model}$ is the texture vector of the model instance. The goal is to adjust the appearance model parameters $c$ to minimize $|\delta I|^2$. The simplest way is to construct a linear relationship:

$$\delta c = R\,\delta I \tag{36}$$

where R can be computed by the following process:

  • Suppose $c_0$ is the model parameter vector of the current image; a new parameter vector $c$ can be generated by perturbing $c_0$ with a known displacement $\delta c$:

$$c = c_0 + \delta c \tag{37}$$

  • Generate the new shape model X and the normalized texture model $g_m$ according to equations (33) and (34).

  • Warp the current image to obtain the corresponding texture sample $g_i$; the difference vector can then be written as:

$$\delta g = g_i - g_m \tag{38}$$

$\delta g$ varies with $\delta c$ and the shape model X, so R can be estimated by linear regression over a set of such known displacements.

The fitting procedure is shown in Table 2, and the object tracking results are shown in Figure 3 and Figure 4. The tracking results remain satisfactory as the camera moves around the scene.


3. Factorization algorithm

The flowchart of the factorization algorithm is shown in Figure 5, and the detailed demonstration will be given in the following sections.

3.1. Basic knowledge

The 3D shape of the non-rigid object can be described by a key frame basis set $S_1, S_2, \ldots, S_K$. Each key frame basis $S_i$ is a $3 \times P$ matrix describing $P$ points. The 3D shape of a specific configuration is a linear combination of the basis set, as given in equation (39) below.

Figure 4.

Results of tracking a face using AAM fitting.

Step 1: Generate the normalized texture vector $g_m$.
Step 2: Sample the image $g_i$ below the model shape.
Step 3: Evaluate the error vector $\delta g_0 = g_i - g_m$.
Step 4: Evaluate the error $E_0 = |\delta g_0|$.
Step 5: Calculate the pose displacement $\delta t = R_t \delta g_0$.
Step 6: Calculate the displacement in model parameters $\delta c = R_c \delta g_0$.
Step 7: Set $i = 1$.
Step 8: Update the model parameters $c' = c - k_i \delta c$.
Step 9: Transform the shape to invert the $\delta t$ transformation.
Step 10: Repeat steps 1-4 to form a new error $E_i$.
Step 11: If $E_i \ge E_0$, set $i = i + 1$ and go to step 8.
Step 12: Accept the new estimate.
where $k = [1.5, 0.5, 0.125, 0.0125, 0.0625]^T$ is the damping vector.

Table 2.

The procedure of fitting the input image to the model instance.
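A minimal sketch of the damped iterative search of Table 2, folding the pose update (steps 5 and 9) into the parameter update for brevity. The callables `sample_texture` and `model_texture`, and the precomputed predictor `R_c` of equation (36), are placeholders standing in for a full implementation:

```python
import numpy as np

def aam_fit(c0, R_c, sample_texture, model_texture,
            k=(1.5, 0.5, 0.125, 0.0125, 0.0625), max_iter=30):
    """Damped iterative AAM search in the spirit of Table 2.
    sample_texture(c) samples the image under the shape implied by c;
    model_texture(c) synthesises the normalised model texture."""
    c = np.asarray(c0, dtype=float)
    for _ in range(max_iter):
        dg = sample_texture(c) - model_texture(c)   # steps 1-3: error vector
        E0 = np.linalg.norm(dg)                     # step 4
        dc = R_c @ dg                               # step 6: delta c
        for ki in k:                                # steps 7-11: try dampings
            c_new = c - ki * dc                     # step 8
            dg_new = sample_texture(c_new) - model_texture(c_new)
            if np.linalg.norm(dg_new) < E0:         # error reduced: accept
                c = c_new
                break
        else:
            return c                                # no damping helped: stop
    return c
```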

$$S = \sum_{i=1}^{K} l_i S_i, \qquad S, S_i \in \mathbb{R}^{3 \times P},\; l_i \in \mathbb{R} \tag{39}$$

where

$$S = \begin{bmatrix} x_1 & x_2 & \cdots & x_P \\ y_1 & y_2 & \cdots & y_P \\ z_1 & z_2 & \cdots & z_P \end{bmatrix} \tag{40}$$

Under a scaled orthographic projection, the $P$ points of $S$ are projected into 2D image points $(u_i, v_i)$:

$$\begin{bmatrix} u_1 & u_2 & \cdots & u_P \\ v_1 & v_2 & \cdots & v_P \end{bmatrix} = R\left(\sum_{i=1}^{K} l_i S_i\right) + T \tag{41}$$

Figure 5.

The flowchart of the factorization algorithm.

$$R = \begin{bmatrix} r_1 & r_2 & r_3 \\ r_4 & r_5 & r_6 \end{bmatrix}, \qquad T = \begin{bmatrix} t_1 \\ t_2 \end{bmatrix} \tag{42}$$

R contains the first two rows of the full 3D camera rotation matrix, and T is the camera translation. As mentioned in [Tomasi & Kanade, 1992], we eliminate the translation by subtracting the mean of all 2D points, and can henceforth assume that the 3D shape S is centred at the origin:

$$\begin{bmatrix} u_1 & u_2 & \cdots & u_P \\ v_1 & v_2 & \cdots & v_P \end{bmatrix} = \begin{bmatrix} u_1 - \bar{u} & u_2 - \bar{u} & \cdots & u_P - \bar{u} \\ v_1 - \bar{v} & v_2 - \bar{v} & \cdots & v_P - \bar{v} \end{bmatrix} \tag{43}$$

where

$$\bar{u} = \frac{1}{P}\sum_{i=1}^{P} u_i, \qquad \bar{v} = \frac{1}{P}\sum_{i=1}^{P} v_i \tag{44}$$

Therefore, we can rewrite equation (41) as:

$$\begin{bmatrix} u_1 & u_2 & \cdots & u_P \\ v_1 & v_2 & \cdots & v_P \end{bmatrix} = R\left(\sum_{i=1}^{K} l_i S_i\right) \tag{45}$$

Rewriting the linear combination in equation (45) as a matrix-matrix multiplication:

$$\begin{bmatrix} u_1 & u_2 & \cdots & u_P \\ v_1 & v_2 & \cdots & v_P \end{bmatrix} = \begin{bmatrix} l_1 R & l_2 R & \cdots & l_K R \end{bmatrix} \begin{bmatrix} S_1 \\ S_2 \\ \vdots \\ S_K \end{bmatrix} \tag{46}$$

The 2D image points in each frame can be obtained using the AAM. The tracked 2D points in frame $t$ are denoted $(u_{ti}, v_{ti})$. The 2D tracking matrix of $N$ frames can be written as:

$$W = \begin{bmatrix} u_{11} & u_{12} & \cdots & u_{1P} \\ v_{11} & v_{12} & \cdots & v_{1P} \\ u_{21} & u_{22} & \cdots & u_{2P} \\ v_{21} & v_{22} & \cdots & v_{2P} \\ \vdots & \vdots & & \vdots \\ u_{N1} & u_{N2} & \cdots & u_{NP} \\ v_{N1} & v_{N2} & \cdots & v_{NP} \end{bmatrix} \tag{47}$$

Using equation (46), we can rewrite equation (47) as:

$$W = \begin{bmatrix} l_{11} R_1 & l_{12} R_1 & \cdots & l_{1K} R_1 \\ l_{21} R_2 & l_{22} R_2 & \cdots & l_{2K} R_2 \\ \vdots & \vdots & & \vdots \\ l_{N1} R_N & l_{N2} R_N & \cdots & l_{NK} R_N \end{bmatrix} \begin{bmatrix} S_1 \\ S_2 \\ \vdots \\ S_K \end{bmatrix} \tag{48}$$

where $R_t$ denotes the camera rotation of frame $t$ and $l_{ti}$ denotes the shape parameter $l_i$ of frame $t$.
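Assembling $W$ from AAM tracks, with the per-frame centring of equations (43)-(44), might look like this sketch; `tracks` is an assumed $(N, 2, P)$ array of $(u, v)$ coordinates:

```python
import numpy as np

def build_tracking_matrix(tracks):
    """Stack AAM-tracked points into the 2N x P measurement matrix W of
    eq. (47), centring each frame to eliminate translation (eqs. (43)-(44))."""
    rows = []
    for frame in tracks:
        u, v = frame
        rows += [u - u.mean(), v - v.mean()]  # subtract per-frame means
    return np.vstack(rows)
```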

3.2. Solving configuration weights using factorization

Equation (48) shows that the 2D tracking matrix $W$ has rank $3K$ and can be factored into the product of two matrices, $W = QB$, where $Q$ is a $2N \times 3K$ matrix and $B$ is a $3K \times P$ matrix. $Q$ contains the camera rotation matrix $R_t$ and the configuration weights $l_{t1}, l_{t2}, \ldots, l_{tK}$ of each frame; $B$ contains the shape basis $S_1, S_2, \ldots, S_K$. The factorization can be done using the singular value decomposition (SVD):

$$W_{2N \times P} = \tilde{U}\tilde{D}\tilde{V}^T = \tilde{Q}_{2N \times 3K}\,\tilde{B}_{3K \times P} \tag{49}$$

Then the camera rotation matrix $R_t$ and the shape basis weights $l_{ti}$ of each frame can be extracted from the matrix $\tilde{Q}$:

$$q_t = \begin{bmatrix} l_{t1} R_t & l_{t2} R_t & \cdots & l_{tK} R_t \end{bmatrix} = \begin{bmatrix} l_1 r_1 & l_1 r_2 & l_1 r_3 & \cdots & l_K r_1 & l_K r_2 & l_K r_3 \\ l_1 r_4 & l_1 r_5 & l_1 r_6 & \cdots & l_K r_4 & l_K r_5 & l_K r_6 \end{bmatrix} \tag{50}$$

Transform $q_t$ into a new matrix $\tilde{q}_t$:

$$\tilde{q}_t = \begin{bmatrix} l_1 r_1 & l_1 r_2 & l_1 r_3 & l_1 r_4 & l_1 r_5 & l_1 r_6 \\ l_2 r_1 & l_2 r_2 & l_2 r_3 & l_2 r_4 & l_2 r_5 & l_2 r_6 \\ \vdots & & & & & \vdots \\ l_K r_1 & l_K r_2 & l_K r_3 & l_K r_4 & l_K r_5 & l_K r_6 \end{bmatrix} = \begin{bmatrix} l_{t1} \\ l_{t2} \\ \vdots \\ l_{tK} \end{bmatrix} \begin{bmatrix} r_{t1} & r_{t2} & r_{t3} & r_{t4} & r_{t5} & r_{t6} \end{bmatrix} \tag{51}$$

Here $\tilde{q}_t$ is the outer product of the weight vector and the flattened rotation, so it has rank 1 and can be factored using the SVD method.
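Both SVD steps are direct to sketch in numpy; the function names and the sign/scale convention for the rank-1 split are our choices, and in practice the weights and rotation rows are only recovered up to a shared scale and sign:

```python
import numpy as np

def factor_tracking_matrix(W, K):
    """Rank-3K factorisation W = Q~ B~ of equation (49) via SVD."""
    U, d, Vt = np.linalg.svd(W, full_matrices=False)
    r = 3 * K
    Q = U[:, :r] * np.sqrt(d[:r])           # Q~ : 2N x 3K
    B = np.sqrt(d[:r])[:, None] * Vt[:r]    # B~ : 3K x P
    return Q, B

def split_pose_and_weights(q_t, K):
    """Per-frame rank-1 factorisation of equations (50)-(51): rearrange
    the 2 x 3K block q_t into the K x 6 matrix q~_t and split it into
    weights l_t and the two rotation rows via SVD."""
    q_tilde = np.array([q_t[:, 3*k:3*k+3].ravel() for k in range(K)])  # K x 6
    U, d, Vt = np.linalg.svd(q_tilde, full_matrices=False)
    l = U[:, 0] * d[0]          # configuration weights (up to scale/sign)
    R = Vt[0].reshape(2, 3)     # [r1 r2 r3; r4 r5 r6]
    return l, R
```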

3.3. Solving true rotation matrix and shape basis

As mentioned in [Tomasi & Kanade, 1992], the matrix $\tilde{R}_t$ is a linear transformation of the true rotation matrix $R_t$. Likewise, $\tilde{S}_i$ is a linear transformation of the true shape matrix $S_i$:

$$R_t = \tilde{R}_t G, \qquad S_i = G^{-1}\tilde{S}_i \tag{52}$$

where the $3 \times 3$ matrix $G$ is found by solving a nonlinear data-fitting problem. In each frame we need to constrain the rotation matrix to be orthonormal. The constraints of frame $t$ are:

$$[r_{t1}\; r_{t2}\; r_{t3}]\, G G^T\, [r_{t1}\; r_{t2}\; r_{t3}]^T = 1 \tag{53}$$
$$[r_{t4}\; r_{t5}\; r_{t6}]\, G G^T\, [r_{t4}\; r_{t5}\; r_{t6}]^T = 1 \tag{54}$$
$$[r_{t1}\; r_{t2}\; r_{t3}]\, G G^T\, [r_{t4}\; r_{t5}\; r_{t6}]^T = 0 \tag{55}$$

In summary, given the 2D tracking data $W$, we can obtain the 3D shape basis $\tilde{S}_i$, the camera rotation matrix $\tilde{R}_t$ and the configuration weights $l_{ti}$ of each training frame simultaneously using the factorization method.
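The corrective transform $G$ can be fitted from the constraints (53)-(55) by nonlinear least squares; a sketch using scipy (note that $G$ is only determined up to an arbitrary rotation, which this sketch does not resolve):

```python
import numpy as np
from scipy.optimize import least_squares

def solve_corrective_transform(R_tildes):
    """Fit the 3x3 matrix G from the orthonormality constraints of
    eqs. (53)-(55); R_tildes is a list of per-frame 2x3 matrices R~_t."""
    def residuals(g):
        G = g.reshape(3, 3)
        M = G @ G.T
        res = []
        for Rt in R_tildes:
            rx, ry = Rt[0], Rt[1]
            res += [rx @ M @ rx - 1.0,   # eq. (53): unit row norm
                    ry @ M @ ry - 1.0,   # eq. (54): unit row norm
                    rx @ M @ ry]         # eq. (55): orthogonal rows
        return np.array(res)
    return least_squares(residuals, np.eye(3).ravel()).x.reshape(3, 3)
```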


4. Non-rigid registration method

The 2D tracking data can be obtained using the AAM algorithm described in section 2, and the 3D shape basis can be acquired using the factorization method of section 3. The 3D shape is represented as 3D points in world coordinates. Given the configuration weights, the 3D shape can be recovered as a linear combination of the 3D shape basis. By projecting the 3D points onto the 2D image with a known camera rotation matrix (assuming the intrinsic camera matrix has been calibrated), the estimated 2D points can be acquired. If the error between the 2D tracking data and the estimated 2D points is small enough, we accept the configuration weights and the rotation matrix. Finally, the virtual object can be overlaid on the real scene using the camera rotation matrix.

The initial configuration weights and camera rotation matrix are not precise, so the configuration weights should be optimized to minimize the error between the 2D points tracked by the AAM and the estimated 2D points projected from the 3D shape. This is a non-linear optimization problem. Unlike [Zhu et al., 2006], we use the Levenberg-Marquardt algorithm. Equation (56) shows the cost function:

$$\min \sum_{j=1}^{N} \left\| s_j - s'_j \right\|^2 = \min \sum_{j=1}^{N} \left\| s_j - \left( R \sum_{i=1}^{K} l_i S_i \right)_j \right\|^2 \tag{56}$$

where $s_j$ is the 2D tracking point and $s'_j$ is the corresponding projected point.

The procedure of non-rigid registration is shown in Table 3.

Step 1: Track the 2D points $s_j$ using the AAM.
Step 2: Initialize the configuration weights $l_i$.
Step 3: Initialize the camera rotation matrix $R$.
Step 4: Calculate the 3D shape $S = \sum_{i=1}^{K} l_i S_i$.
Step 5: Project the 3D points $S$ to the 2D image: $s'_j = A[R|T]S$.
Step 6: Evaluate the projection error $E = \sum_{j=1}^{N} \|s_j - s'_j\|^2$.
Step 7: If $E$ is not small enough, update $l_i$ and $R$, then repeat steps 4-6.
Step 8: Accept the configuration weights $l_i$ and the camera rotation matrix $R$.
Step 9: Overlay the virtual object on the scene.

Table 3.

The procedure of non-rigid registration
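A sketch of the online refinement of equation (56) and Table 3 using scipy's Levenberg-Marquardt solver; the Euler-angle parametrisation of the rotation is our simplification, not the chapter's choice, and the intrinsic matrix A and the translation are omitted by assuming centred, calibrated image points:

```python
import numpy as np
from scipy.optimize import least_squares

def register(s_obs, S_basis, l0, r0):
    """Refine configuration weights and rotation by minimising the
    reprojection error of eq. (56); s_obs is a 2xP matrix of AAM-tracked
    points (centred), S_basis a K x 3 x P array of shape bases, l0 the
    initial weights and r0 three initial Euler angles."""
    K = len(S_basis)

    def rot(a, b, c):
        # first two rows of R = Rz(c) @ Ry(b) @ Rx(a)
        Rx = np.array([[1, 0, 0], [0, np.cos(a), -np.sin(a)], [0, np.sin(a), np.cos(a)]])
        Ry = np.array([[np.cos(b), 0, np.sin(b)], [0, 1, 0], [-np.sin(b), 0, np.cos(b)]])
        Rz = np.array([[np.cos(c), -np.sin(c), 0], [np.sin(c), np.cos(c), 0], [0, 0, 1]])
        return (Rz @ Ry @ Rx)[:2]

    def residuals(p):
        l, (a, b, c) = p[:K], p[K:]
        S = np.tensordot(l, S_basis, axes=1)       # S = sum_i l_i S_i, 3 x P
        return (rot(a, b, c) @ S - s_obs).ravel()  # projected minus tracked

    sol = least_squares(residuals, np.concatenate([l0, r0]), method='lm')
    return sol.x[:K], rot(*sol.x[K:])
```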

Figure 6.

Examples of augmented images using our method.

Furthermore, the orthonormality of the rotation matrix should be taken into consideration. The proposed method has been implemented in C using OpenGL and OpenCV on a DELL workstation (CPU 1.7 GHz ×2, RAM 1 GB).

In the offline stage, we construct the AAM hand models using 7 training images manually labelled with 36 landmarks, as shown in Figure 2. We establish the hand shape basis using 300 training frames captured with a CCD camera. In the online stage, the 2D points are tracked using the AAM algorithm, and the Levenberg-Marquardt algorithm is then used to optimize the parameters. Our experimental results are shown in Figure 6. The virtual teapot is overlaid on the hand accurately as the camera moves around the scene.


5. Conclusion

In this chapter, a non-rigid registration method for augmented reality applications using AAM and the factorization method is proposed. The process is divided into an offline stage and an online stage. In the offline stage, the 3D shape basis is constructed. To obtain the shape basis of the object, we first factorize the 2D data matrix tracked by the AAM into the product of two matrices: one contains the camera rotation matrix and the configuration weights, and the other contains the shape basis. Then the rotation matrix and the configuration weights are separated using the SVD method. Finally, the orthonormality of the rotation matrix is imposed as a constraint to obtain the true rotation matrix and configuration weights. In the online stage, the 3D pose parameters and the shape coefficients are estimated. The goal is to minimize the error between the 2D tracking points detected by the AAM and the estimated 2D points projected from the 3D shape points. The Levenberg-Marquardt method is used to solve this problem, optimizing the rotation matrix and the configuration weights. Experiments have been conducted to validate that the proposed method is effective and useful for non-rigid registration in augmented reality applications.

Advertisement

Acknowledgments

This work is supported in part by National Natural Science Foundation of China with project No. 60903095; in part by Postdoctoral Science Foundation of China with project No. 20080440941.

References

  1. Ahlberg J. (2001). Using the active appearance algorithm for face and facial feature tracking. Proceedings of the 2nd International Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems, 68-72, 0-76951-074-4, Vancouver, Canada, July 2001, IEEE Computer Society, Los Alamitos.
  2. Andresen P. R. & Nielsen M. (2001). Non-rigid registration by geometry-constrained diffusion. Medical Image Analysis, 5(2), 81-88, 1361-8415.
  3. Azuma R. T. (1997). A survey of augmented reality. Presence: Teleoperators and Virtual Environments, 6(4), 355-385, 1054-7460.
  4. Azuma R. T. (2001). Recent advances in augmented reality. IEEE Computer Graphics and Applications, 21(6), 34-47, 0272-1716.
  5. Bartoli A., von Tunzelmann E. & Zisserman A. (2004). Augmenting images of non-rigid scenes using point and curve correspondences. Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 699-706, 0-76952-158-4, Washington, DC, June-July 2004, IEEE Computer Society, Los Alamitos.
  6. Bookstein F. L. (1996). Landmark methods for forms without landmarks: localizing group differences in outline shape. Proceedings of the IEEE Workshop on Mathematical Methods in Biomedical Image Analysis, 25-244, 0-81867-367-2, San Francisco, CA, June 1996, IEEE Computer Society, Los Alamitos.
  7. Bregler C., Hertzmann A. & Biermann H. (2000). Recovering non-rigid 3D shape from image streams. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 690-696, 1063-6919, Hilton Head Island, SC, June 2000, IEEE Computer Society, Los Alamitos.
  8. Cootes T. F., Edwards G. J. & Taylor C. J. (1998). Active appearance models. Lecture Notes in Computer Science, 1407, 484-498, 0302-9743.
  9. Cootes T. F., Edwards G. J. & Taylor C. J. (2001). Active appearance models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(6), 681-685, 0162-8828.
  10. Cootes T. F. & Taylor C. J. (2001). Statistical models of appearance for medical image analysis and computer vision. Proceedings of the Society of Photo-optical Instrumentation Engineers, 236-248, 0-81944-008-6, San Diego, CA, February 2001, SPIE-INT Society Engineering, Bellingham.
  11. Duta N., Jain A. K. & Dubuisson-Jolly M. P. (1999). Learning 2D shape models. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 8-14, 1063-6919, Fort Collins, USA, June 1999, IEEE Computer Society, Los Alamitos.
  12. Guan T., Li L. J. & Wang C. (2008a). Registration using multiplanar structures for augmented reality systems. Journal of Computing and Information Science in Engineering, 8(4), 041002-1~041002-6, 1530-9827.
  13. Guan T., Li L. J. & Wang C. (2008b). Robust estimation of trifocal tensors using natural features for augmented reality systems. Computing and Informatics, 27(6), 891-911, 1335-9150.
  14. Lepetit V. & Fua P. (2005). Monocular model-based 3D tracking of rigid objects: a survey. Foundations and Trends in Computer Graphics and Vision, 1(1), 1-89, 1572-2740.
  15. Lepetit V., Lagger P. & Fua P. (2005). Randomized trees for real-time keypoint recognition. Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 775-781, 0-76952-372-2, San Diego, CA, June 2005, IEEE Computer Society, Los Alamitos.
  16. Li L. J., Guan T., Ren B. et al. (2008). Registration based on Euclidean reconstruction and natural features tracking for augmented reality systems. Assembly Automation, 28(4), 340-347, 0144-5154.
  17. Lowe D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91-110, 0920-5691.
  18. Markin E. & Prakash E. C. (2006). Tracking facial features with occlusions. Journal of Zhejiang University: Science A, 7(7), 1282-1288, 1009-3095.
  19. Pilet J., Lepetit V. & Fua P. (2005). Real-time non-rigid surface detection. Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 822-828, 0-76952-372-2, San Diego, CA, June 2005, IEEE Computer Society, Los Alamitos.
  20. Pilet J., Lepetit V. & Fua P. (2007). Fast non-rigid surface detection, registration and realistic augmentation. International Journal of Computer Vision, 76(2), 109-122, 0920-5691.
  21. Rosten E. & Drummond T. (2005). Fusing points and lines for high performance tracking. Proceedings of 10th IEEE International Conference on Computer Vision, 1508-1515, Beijing, China, October 2005, IEEE Computer Society, Los Alamitos.
  22. Sclaroff S. & Pentland A. P. (1995). Modal matching for correspondence and recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(7), 545-561, 0162-8828.
  23. Sung J. & Kim D. J. (2004). Extension of AAM with 3D shape model for facial shape tracking. Proceedings of International Conference on Image Processing, 3363-3366, 0-78038-554-3, October 2004, IEEE, New York.
  24. Tian Y., Li Z. Y., Liu W. et al. (2008). Non-rigid registration using AAM and factorization method for augmented reality applications. Proceedings of the 2008 12th International Conference on Computer Supported Cooperative Work in Design, 705-709, 978-1-42441-650-9, Xian, China, April 2008, IEEE, New York.
  25. Tomasi C. & Kanade T. (1992). Shape and motion from image streams under orthography: a factorization method. International Journal of Computer Vision, 9(2), 137-154, 0920-5691.
  26. Walker K. N., Cootes T. F. & Taylor C. J. (2000). Determining correspondences for statistical models of appearance. Proceedings of 6th European Conference on Computer Vision, 829-843, 3-54067-685-6, Dublin, Ireland, June 2000, Springer, Berlin.
  27. Wang S. C., Wang Y. S. & Chen X. L. (2007). Weighted active appearance models. 3rd International Conference on Intelligent Computing, 1295-1304, 978-3-54074-170-1, Qingdao, China, August 2007, Springer, Berlin.
  28. Xiao J., Baker S., Matthews I. & Kanade T. (2004). Real-time combined 2D+3D active appearance models. Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 535-542, 0-76952-158-4, Washington, DC, June-July 2004, IEEE Computer Society, Los Alamitos.
  29. Yuan M. L., Ong S. K. & Nee A. Y. C. (2005). Registration based on projective reconstruction technique for augmented reality systems. IEEE Transactions on Visualization and Computer Graphics, 11(3), 254-264, 1077-2626.
  30. Yuan M. L., Ong S. K. & Nee A. Y. C. (2006). Registration using natural features for augmented reality systems. IEEE Transactions on Visualization and Computer Graphics, 12(4), 569-580, 1077-2626.
  31. Zhu J. K., Hoi S. C. H. & Lyu M. R. (2006). Real-time non-rigid shape recovery via active appearance models for augmented reality. 9th European Conference on Computer Vision, 186-197, 3-54033-832-2, Graz, Austria, May 2006, Springer, Berlin.
