Chapter 1 Efficient Transformation Estimation Using Lie Operators: Theory, Algorithms, and Computational Efficiencies



Introduction
In many pattern recognition problems, such as handwritten character recognition, it is challenging to design a good classification function: one that eliminates irrelevant variability among objects of the same class while still identifying meaningful differences between objects of different classes. For example, for an automatic technique to "recognize" a handwritten digit, the incoming digit pattern needs to be accurately classified into one of ten possible categories (from "0" to "9"). One straightforward yet inefficient implementation is to match the pattern, according to a certain distance measure, against a set of prototypes in which almost all possible instances of the digit in each category (e.g., different sizes, angles, skews, etc.) are stored. The pattern is then classified into the category in which the closest match with one of its prototype instances is found. This approach would require impractically large prototype sets to achieve high recognition accuracy. An alternative method is to use only one prototype for each category. Different "deformed" instances of that prototype are then generated by geometric transformations (e.g., thickening or rotation) during the matching process so as to best fit the incoming digit pattern. To this end, the concept of Lie operators for the transformations is applicable.
More precisely, the pixel values of an incoming pattern (a digital image with N × N pixels) can be viewed as the components of an $N^2$-dimensional ($N^2$-D) vector. One pattern, or one prototype, is a point in this $N^2$-D space. If we assume that the set of allowable transformations is continuous, then the set of all patterns that can be obtained by transforming one prototype using one or a combination of allowable transformations is a surface in the $N^2$-D pixel space. For instance, suppose a pattern I is transformed (e.g., rotated by an angle θ) according to a transformation s(I, θ), where θ is the only parameter. Then the set of all the transformed patterns
$$T_I = \{x \mid \exists\, \theta \text{ for which } x = s(I, \theta)\} \quad (1)$$
is a one-dimensional curve in the $N^2$-D space. Here we assume that s is differentiable with respect to both I and θ, and that s(I, 0) = I. When the set of transformations is parameterized by m parameters $\theta_i$, where i = 1, 2, ..., m, $T_I$ becomes a manifold (topological surface) with intrinsic dimension m. For instance, if the allowable transformations of character images are rotations and scaling, the surface will be a 2-D manifold.
In practice, searching for the best-matching deformation of a prototype for an incoming pattern can be computationally expensive when the set of patterns obtainable from the prototype is large. This set comprises every pattern produced by one or a combination of allowable transformations. Computationally efficient transformation estimation methods for pattern matching are therefore highly desirable. It turns out that the surface of possible transformations of a pattern can be approximated by its tangent plane at the pattern [18]. More specifically, a linear approximation to the transformation s(I, θ) of the pattern I can be obtained by the Taylor expansion of s around θ = 0:
$$s(I, \theta) = I + \theta L + O(\theta^2) \approx I + \theta L, \quad (2)$$
where $L = \left.\frac{\partial s(I, \theta)}{\partial \theta}\right|_{\theta = 0}$ is called the Lie derivative of the transformation s, also known as the tangent vector.
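The tangent-vector approximation (2) can be checked numerically. The following is a minimal sketch with three assumptions. The pattern is a hypothetical smooth 16 × 16 image (an elliptical Gaussian), flattened to a 256-D vector. The one-parameter transformation s(I, θ) is uniform scaling by (1 + θ). The tangent vector L is estimated by a central difference in θ rather than derived analytically. Halving θ should shrink the linearization error by roughly a factor of four, since the error is O(θ²).

```python
import numpy as np

N = 16  # hypothetical image size: an N x N image is a point in N**2-D pixel space


def s(theta):
    """Pattern scaled by (1 + theta), sampled on the N x N grid, flattened to an N**2-D vector."""
    y, x = np.indices((N, N), dtype=float) - (N - 1) / 2.0  # coordinates about the image centre
    u, v = x / (1.0 + theta), y / (1.0 + theta)             # inverse mapping of the scaling
    img = np.exp(-(u**2 / 18.0 + v**2 / 8.0))               # illustrative smooth pattern
    return img.ravel()


I = s(0.0)                      # s(I, 0) = I
h = 1e-4
L = (s(h) - s(-h)) / (2 * h)    # tangent vector ds/dtheta at theta = 0 (central difference)


def linearization_error(theta):
    """Distance in pixel space between s(I, theta) and its tangent approximation I + theta * L."""
    return np.linalg.norm(s(theta) - (I + theta * L))


print(I.shape)                                                # (256,)
print(linearization_error(0.02) / linearization_error(0.01))  # about 4: the error is O(theta^2)
```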
Lie derivatives establish a connection between groups of transformations of the input space and their effect on a functional of that space. Lie operators can be used to approximate the transformed pattern in a computationally efficient way. To make these key concepts easier to follow, we first explain the theory in Section 2 by working through some concrete examples of Lie groups and algebras. In Section 3, we then address the key problem of transformation estimation, where both fast and accurate estimation methods are desired. Section 4 investigates the computational efficiency of transformation estimation algorithms based on Lie operators and presents several fast search algorithms for transformation estimation in video coding. Section 5 compares the Lie operator based approach against transformation estimation based on a full affine transform model, in terms of the tradeoffs between accuracy and computational efficiency.

Lie groups
A group is an algebraic structure consisting of a set together with an operation that combines any two of its elements to form a third element. To qualify as a group, the set and the operation must satisfy four conditions, namely closure, associativity, identity, and invertibility (see the definition below). For instance, the integers under addition form a group.
Definition: A set with elements $g_i, g_j, g_k, \ldots$ together with a combinatorial operation • forms a group G if the following axioms are satisfied [5]:
(i) Closure: For all $g_i, g_j \in G$, we have $g_i \bullet g_j \in G$.
(ii) Associativity: For all $g_i, g_j, g_k \in G$, we have $(g_i \bullet g_j) \bullet g_k = g_i \bullet (g_j \bullet g_k)$.
(iii) Identity: There exists an element e such that for every element $g_i \in G$, we have $e \bullet g_i = g_i \bullet e = g_i$.
(iv) Inverse: Every group element $g_i$ has an inverse (called $g_i^{-1}$), with the property $g_i \bullet g_i^{-1} = g_i^{-1} \bullet g_i = e$.
Some groups carry additional geometric structures. For example, Lie groups are groups that also have a smooth (differentiable) manifold structure. The circle and the sphere are examples of smooth manifolds. Lie groups are named after Sophus Lie, a nineteenth-century Norwegian mathematician who laid the foundations of the theory of continuous transformation groups. They lie at the intersection of two fundamental fields of mathematics: algebra and geometry. In a Lie group, the group operations are compatible with the smooth structure; that is, the group operations are differentiable. More precisely, we have the following.
Definition: A Lie group consists of a manifold $M^n$ that parameterizes the group elements $g(x)$, $x \in M^n$, and a combinatorial operation defined by $g(x) \bullet g(y) = g(z)$. The coordinate $z \in M^n$ depends on the coordinates $x \in M^n$ and $y \in M^n$ through a function $z = \Phi(x, y)$. There are two topological axioms for a Lie group [5].
(i) Smoothness of the group composition map: The group composition map z = Φ(x, y) is differentiable.
(ii) Smoothness of the group inversion map: The group inversion map $y = \psi(x)$, defined by $g(x)^{-1} = g(y)$, is differentiable.
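Almost every Lie group is either a matrix group or equivalent to a matrix group, which greatly simplifies the description of the algebraic, topological, and continuity properties of Lie groups. Consider the following example from pattern recognition. A prototype pattern is represented as a computer image P[i, j], which can be interpreted as the discrete version of the continuous function f(X) = f(x, y). Assume that f is a differentiable function that maps points X = (x, y) in the plane $\mathbb{R}^2$ to $\mathbb{R}$, giving the intensity (or pixel value) of the point X:
$$f: \mathbb{R}^2 \to \mathbb{R}, \quad X \mapsto f(X).$$
Next, the image is deformed (e.g., rotated by an angle θ) via a transformation $T_\theta$ (parameterized by θ), which maps a point of $\mathbb{R}^2$ bijectively back to a point of $\mathbb{R}^2$:
$$T_\theta: \mathbb{R}^2 \to \mathbb{R}^2, \quad X \mapsto T_\theta(X).$$
For example, $T_\theta$ could represent rotating the pattern by an angle θ:
$$T_\theta: (x, y) \mapsto (x\cos\theta - y\sin\theta,\; x\sin\theta + y\cos\theta).$$
These transformations form a group G, which can be represented by a matrix group, with the combinatorial operation • being matrix multiplication. In particular, each element g(θ) of G is parameterized by one parameter θ:
$$g(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}. \quad (6)$$
We show that G is indeed a group:
(i) Closure: By the angle-addition formulas, $g(\theta_1) \bullet g(\theta_2) = g(\theta_1 + \theta_2) \in G$.
(ii) Associativity: $(g(\theta_1) \bullet g(\theta_2)) \bullet g(\theta_3) = g(\theta_1 + \theta_2 + \theta_3) = g(\theta_1) \bullet (g(\theta_2) \bullet g(\theta_3))$.
(iii) Identity: There exists an element $e = g(0) = I_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ such that for every element $g(\theta) \in G$, $e \bullet g(\theta) = g(\theta) \bullet e = g(\theta)$.
(iv) Inverse: Every group element g(θ) has an inverse $g(\theta)^{-1} = g(-\theta)$, such that $g(\theta) \bullet g(-\theta) = g(-\theta) \bullet g(\theta) = I_2$.
We further show that G is also a Lie group with one parameter. To verify the two topological axioms for a Lie group, consider the group elements $g(\theta_1)$, $g(\theta_2)$, and $g(\theta_3)$, which are parameterized by $\theta_i \in M$, where M is a one-dimensional curve (a smooth manifold). Given the combinatorial operation $g(\theta_1) \bullet g(\theta_2) = g(\theta_3)$, we have $\theta_3 = \Phi(\theta_1, \theta_2) = \theta_1 + \theta_2$, so the group composition map is differentiable. Furthermore, given the inverse $g(\theta_1)^{-1} = g(\theta_2)$, we have $\theta_2 = \psi(\theta_1) = -\theta_1$, so the group inversion map is also differentiable.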
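The group axioms for the rotation matrices g(θ) can also be spot-checked numerically. This is a minimal NumPy sketch at a few arbitrarily chosen angles. It is a sanity check at sample points, not a proof. The closure check also confirms the composition map θ₁ + θ₂, and the inverse check confirms the inversion map −θ.

```python
import numpy as np


def g(theta):
    """Group element g(theta): the 2 x 2 matrix rotating the plane by theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


t1, t2, t3 = 0.3, -1.1, 2.0  # arbitrary sample angles
e = np.eye(2)

closure = np.allclose(g(t1) @ g(t2), g(t1 + t2))                              # Phi(t1, t2) = t1 + t2
associativity = np.allclose((g(t1) @ g(t2)) @ g(t3), g(t1) @ (g(t2) @ g(t3)))
identity = np.allclose(g(0.0), e) and np.allclose(g(0.0) @ g(t1), g(t1))      # e = g(0)
inverse = np.allclose(g(t1) @ g(-t1), e) and np.allclose(np.linalg.inv(g(t1)), g(-t1))  # psi(t) = -t

print(closure, associativity, identity, inverse)  # True True True True
```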
The study of Lie groups can be greatly simplified by linearizing the group in the neighborhood of its identity. This results in a linear vector space called a Lie algebra [4]. The Lie algebra retains most of the properties of the original Lie group. We again use the rotation of an image as an example of a transformation to illustrate how to linearize the Lie transformation group.

Lie operators and Lie algebras
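Assume that the intensity of the original 2D image at location (u, v) is given by f(u, v), where f is a differentiable function. In order to determine the intensity of the rotated image at a point (x, y), we need to calculate the location from which the rotation operation originated. This can be accomplished by taking the inverse transformation:
$$(u, v) = T_\theta^{-1}(x, y) = (x\cos\theta + y\sin\theta,\; -x\sin\theta + y\cos\theta).$$
Let s(f, θ)(x, y) denote the intensity of the rotated image at point (x, y); then
$$s(f, \theta)(x, y) = f\big(T_\theta^{-1}(x, y)\big) = f(x\cos\theta + y\sin\theta,\; -x\sin\theta + y\cos\theta).$$
That is, the intensity of the rotated pattern at point (x, y) equals the intensity of the original pattern at the coordinate found by applying $T_\theta^{-1}$ to (x, y). Differentiating s with respect to θ at θ = 0 gives
$$\left.\frac{\partial s(f, \theta)}{\partial \theta}\right|_{\theta = 0}(x, y) = y\,\frac{\partial f}{\partial x}(x, y) - x\,\frac{\partial f}{\partial y}(x, y). \quad (15)$$
Using the Taylor series expansion, we have
$$s(f, \theta) = f + \theta \left.\frac{\partial s(f, \theta)}{\partial \theta}\right|_{\theta = 0} + O(\theta^2). \quad (16)$$
Thus the intensity of the rotated pattern image can be approximated by
$$s(f, \theta) \approx f + \theta\, L_\theta(f), \quad (17)$$
where $L_\theta$ is the so-called Lie operator, given by
$$L_\theta = y\,\frac{\partial}{\partial x} - x\,\frac{\partial}{\partial y}. \quad (18)$$
Each rotated image with a certain angle θ corresponds to a point from a Lie group with one parameter.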
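The claim that the θ-derivative of the rotated image at θ = 0 equals $y\,\partial f/\partial x - x\,\partial f/\partial y$ can be checked numerically. The sketch below uses a hypothetical polynomial test image f(u, v) = u³ + 2uv − v², whose partial derivatives are written out by hand. It compares a central difference in θ of the rotated image with the Lie operator applied to f, at random points. The test function and sample points are illustrative only.

```python
import numpy as np


def f(u, v):
    """Arbitrary smooth test image f(u, v)."""
    return u**3 + 2 * u * v - v**2


def grad_f(u, v):
    """(df/du, df/dv), worked out by hand for the test image above."""
    return 3 * u**2 + 2 * v, 2 * u - 2 * v


def s(theta, x, y):
    """Rotated image: s(f, theta)(x, y) = f(T_theta^{-1}(x, y))."""
    u = x * np.cos(theta) + y * np.sin(theta)
    v = -x * np.sin(theta) + y * np.cos(theta)
    return f(u, v)


def lie_rotation(x, y):
    """Lie operator for rotation applied to f: y * df/dx - x * df/dy."""
    fx, fy = grad_f(x, y)
    return y * fx - x * fy


rng = np.random.default_rng(0)
x, y = rng.uniform(-2, 2, size=(2, 1000))
h = 1e-5
numeric = (s(h, x, y) - s(-h, x, y)) / (2 * h)       # ds/dtheta at theta = 0
print(np.max(np.abs(numeric - lie_rotation(x, y))))  # tiny: only finite-difference error remains
```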
More generally, suppose the transformation group is a Lie group with m parameters Θ = (θ_1, θ_2, ..., θ_m). After transformation, the intensity of the deformed image, s(f, Θ), is related to the original image f by the following approximation:
$$s(f, \Theta) \approx f + \theta_1 L_{\theta_1}(f) + \theta_2 L_{\theta_2}(f) + \cdots + \theta_m L_{\theta_m}(f),$$
where the operators $L_{\theta_1}, L_{\theta_2}, \ldots, L_{\theta_m}$ are said to generate a Lie algebra, which is a linear vector space. A vector space is a mathematical structure formed by a collection of vectors, which may be added together and multiplied by numbers (scalars). More precisely, we have the following.
Definition: A Lie algebra is a vector space V over a field F, with a product operation V × V → V denoted by [X, Y], which is called the Lie bracket of X ∈ V and Y ∈ V, with the following axioms [16]:
(i) The bracket operation is bilinear.
(ii) $[X, X] = 0$ for all $X \in V$.
(iii) The Jacobi identity holds: $[X, [Y, Z]] + [Y, [Z, X]] + [Z, [X, Y]] = 0$ for all $X, Y, Z \in V$.
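In axiom (i), a bilinear operation is a function that combines two elements of the vector space to yield a third element of the vector space and is linear in each of its arguments. As an example, matrix multiplication is bilinear: for matrices A, B, C and scalars α, β,
$$(\alpha A + \beta B)C = \alpha AC + \beta BC, \qquad A(\alpha B + \beta C) = \alpha AB + \beta AC.$$
To illustrate the concept of Lie brackets, let us consider another transformation with three parameters (a, b, c):
$$T^{-1}_{(a,b,c)}: (x, y) \mapsto (ax + c,\; by), \quad (20)$$
which corresponds to the matrix group (in homogeneous coordinates)
$$g(a, b, c) = \begin{pmatrix} a & 0 & c \\ 0 & b & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
Similar to the group g(θ) in (6), it can be shown that g(a, b, c) is also a Lie group. In this case, the intensity of the pattern image after the transformation is given by
$$s(f, (a, b, c))(x, y) = f(ax + c,\; by).$$
By following the procedure outlined in (15) through (18), i.e., differentiating with respect to each parameter at the identity element (a, b, c) = (1, 1, 0), we obtain the three Lie operators
$$L_a = x\,\frac{\partial}{\partial x}, \qquad L_b = y\,\frac{\partial}{\partial y}, \qquad L_c = \frac{\partial}{\partial x}.$$
These three Lie operators generate a Lie algebra, with the Lie bracket between any two operators X and Y defined as
$$[X, Y] = X \bullet Y - Y \bullet X,$$
where X • Y denotes the operation of applying the operator Y, followed by applying the operator X.
It can be easily checked that the Lie bracket [X, Y] is bilinear (axiom (i) of a Lie algebra). Next, for any operator X ∈ {L_a, L_b, L_c}, we have [X, X] = X • X − X • X = 0, thereby satisfying axiom (ii). Verifying the Jacobi identity requires additional effort. First, since $[L_b, L_c]f = y\,\frac{\partial}{\partial y}\frac{\partial f}{\partial x} - \frac{\partial}{\partial x}\left(y\,\frac{\partial f}{\partial y}\right) = 0$, we have
$$[L_a, [L_b, L_c]] = [L_a, 0] = 0.$$
Similarly, since $[L_c, L_a]f = \frac{\partial}{\partial x}\left(x\,\frac{\partial f}{\partial x}\right) - x\,\frac{\partial^2 f}{\partial x^2} = \frac{\partial f}{\partial x}$, i.e., $[L_c, L_a] = L_c$, we have
$$[L_b, [L_c, L_a]] = [L_b, L_c] = 0.$$
Finally, since $[L_a, L_b]f = x\,\frac{\partial}{\partial x}\left(y\,\frac{\partial f}{\partial y}\right) - y\,\frac{\partial}{\partial y}\left(x\,\frac{\partial f}{\partial x}\right) = xy\,\frac{\partial^2 f}{\partial x\,\partial y} - xy\,\frac{\partial^2 f}{\partial y\,\partial x} = 0$, we have
$$[L_c, [L_a, L_b]] = [L_c, 0] = 0.$$
Therefore,
$$[L_a, [L_b, L_c]] + [L_b, [L_c, L_a]] + [L_c, [L_a, L_b]] = 0.$$
It follows that the Jacobi identity holds.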
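The bracket relations [L_a, L_b] = 0, [L_b, L_c] = 0, and [L_c, L_a] = L_c, and hence the Jacobi identity, can be verified exactly on polynomial images. Here L_a = x ∂/∂x, L_b = y ∂/∂y, and L_c = ∂/∂x. In this sketch (our own construction, not taken from the chapter's sources), a polynomial f(x, y) = Σ C[i, j] xⁱ yʲ is stored as an integer coefficient array C. Differentiation and multiplication by x or y then become index shifts, so every check is exact rather than approximate.

```python
import numpy as np

# f(x, y) = sum_{i,j} C[i, j] * x**i * y**j is stored as the coefficient array C.


def d_dx(C):
    """Coefficients of df/dx."""
    out = np.zeros_like(C)
    out[:-1, :] = C[1:, :] * np.arange(1, C.shape[0])[:, None]
    return out


def d_dy(C):
    """Coefficients of df/dy."""
    out = np.zeros_like(C)
    out[:, :-1] = C[:, 1:] * np.arange(1, C.shape[1])[None, :]
    return out


def times_x(C):
    """Coefficients of x * f (only applied after d_dx, so no term is truncated)."""
    out = np.zeros_like(C)
    out[1:, :] = C[:-1, :]
    return out


def times_y(C):
    """Coefficients of y * f (only applied after d_dy, so no term is truncated)."""
    out = np.zeros_like(C)
    out[:, 1:] = C[:, :-1]
    return out


def L_a(C):
    return times_x(d_dx(C))   # x d/dx


def L_b(C):
    return times_y(d_dy(C))   # y d/dy


def L_c(C):
    return d_dx(C)            # d/dx


def bracket(X, Y):
    """Lie bracket [X, Y] = X . Y - Y . X, where X . Y applies Y first, then X."""
    return lambda C: X(Y(C)) - Y(X(C))


rng = np.random.default_rng(0)
C = rng.integers(-5, 6, size=(6, 6))  # random polynomial of degree <= 5 in x and in y
zero = np.zeros_like(C)

print(np.array_equal(bracket(L_a, L_b)(C), zero))    # True: [L_a, L_b] = 0
print(np.array_equal(bracket(L_b, L_c)(C), zero))    # True: [L_b, L_c] = 0
print(np.array_equal(bracket(L_c, L_a)(C), L_c(C)))  # True: [L_c, L_a] = L_c
jacobi = (bracket(L_a, bracket(L_b, L_c))(C)
          + bracket(L_b, bracket(L_c, L_a))(C)
          + bracket(L_c, bracket(L_a, L_b))(C))
print(np.array_equal(jacobi, zero))                  # True: Jacobi identity
```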
The result of applying the three Lie operators to a function f , which is a 2D image in our example, is the set of vectors known as tangent vectors (also called the Lie derivatives of the transformation).These tangent vectors generate the so-called tangent space.Each point in the tangent space corresponds to a transformation, and any transformation of the Lie group g(a, b, c) corresponds to a point in the tangent space.

Lie operators on discrete images
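As shown in (17), given a continuous image f, we can apply the Lie operator $L_\theta$ for rotation to approximate the rotated image as $s(f, \theta) \approx f + \theta L_\theta(f)$. However, in many practical applications, we need to deal with computer images. Given a discrete image I, in order to apply a Lie operator, which involves derivatives, we first convert I into a continuous image f by means of convolution, $f = I * g_\sigma$. Here $g_\sigma(x, y) \propto \exp\!\left(-\frac{x^2 + y^2}{2\sigma^2}\right)$ is a 2D Gaussian function with standard deviation σ, as defined in [18]. In our study, besides rotation (R), we will consider several other types of transformations, such as scaling (S), parallel deformation (P), and diagonal deformation (D), as defined in Table 1. To distinguish the Lie operators for different types of transformations, we use $L_R$ to denote the Lie operator for rotation. After applying $L_R$ to $f = I * g_\sigma$, we obtain
$$L_R(f) = y\,\frac{\partial (I * g_\sigma)}{\partial x} - x\,\frac{\partial (I * g_\sigma)}{\partial y} = y\left(I * \frac{\partial g_\sigma}{\partial x}\right) - x\left(I * \frac{\partial g_\sigma}{\partial y}\right). \quad (30)$$
The convolution operation and the partial derivatives of the Gaussian function in (30) are computationally expensive. To avoid this cost, we can apply the Lie operator on the discrete image directly, replacing the partial derivatives with differences of neighboring pixel values [14]:
$$L_R(I)(x, y) \approx y\, D_x(I)(x, y) - x\, D_y(I)(x, y),$$
where $D_x(I)$ and $D_y(I)$ denote the horizontal and vertical pixel-difference approximations of $\partial I/\partial x$ and $\partial I/\partial y$, respectively. After the Lie operator is applied, the rotated version of the image I can then be easily obtained as
$$I_R = I + \theta\, L_R(I).$$
For small angles θ, the approximation tends to be reasonably good.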
Similarly, we can obtain the transformed images for other types of transformations, based on their associated Lie operators (summarized in the third column of Table 1), which can be derived in a similar fashion to $L_R$.

Transformation | Transformation matrix $T_\theta$ | Lie operator (adapted from [18]) and the transformed image
Parallel Deformation (P) | $\begin{pmatrix} 1 + \theta & 0 \\ 0 & 1 - \theta \end{pmatrix}$ |
Table 1. Six types of transformation and their associated Lie operators (θ is the degree of the transformation).
The discrete approximation of $L_R(I)$ above shows that obtaining $L_R(I)$ involves only simple subtractions and multiplications. Moreover, $L_R(I)$ needs to be calculated just once, because the transformed versions $I_R$ for all degrees of transformation θ can be obtained from the same $L_R(I)$. Therefore, the implementation of Lie operators has fairly low computational complexity.
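The discrete Lie operator for rotation, and the reuse of $L_R(I)$ across angles, can be sketched in a few lines of NumPy. Several assumptions here are ours and are not necessarily the scheme of [14]. $D_x$ and $D_y$ are central differences computed with `np.gradient`. Coordinates are measured from the image center, with y growing downwards (the row index). The test pattern is a hypothetical elliptical Gaussian, chosen because its exactly rotated version is available for comparison.

```python
import numpy as np


def centered_coords(shape):
    """Pixel coordinates (x, y) relative to the image centre; y grows downwards (row index)."""
    rows, cols = np.indices(shape, dtype=float)
    return cols - (shape[1] - 1) / 2.0, rows - (shape[0] - 1) / 2.0


def lie_rotation(I):
    """Discrete L_R(I) = y * D_x(I) - x * D_y(I), with central differences for D_x, D_y."""
    x, y = centered_coords(I.shape)
    d_dy, d_dx = np.gradient(I.astype(float))  # axis 0 = rows (y), axis 1 = columns (x)
    return y * d_dx - x * d_dy


def rotate_first_order(I, theta, LR):
    """I_R = I + theta * L_R(I); LR = L_R(I) is computed once and reused for every theta."""
    return I + theta * LR


def elliptical_blob(shape, theta=0.0, sx=6.0, sy=3.0):
    """Hypothetical test pattern, sampled exactly after rotation: f(T_theta^{-1}(x, y))."""
    x, y = centered_coords(shape)
    u = x * np.cos(theta) + y * np.sin(theta)
    v = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(u**2 / (2 * sx**2) + v**2 / (2 * sy**2)))


I = elliptical_blob((33, 33))
LR = lie_rotation(I)                            # computed once
for theta in (0.02, 0.05):
    exact = elliptical_blob((33, 33), theta)
    approx = rotate_first_order(I, theta, LR)   # one multiply and one add per pixel
    print(theta, np.linalg.norm(approx - exact), np.linalg.norm(I - exact))
```

A rotationally symmetric pattern (sx = sy) should give an $L_R(I)$ that is close to zero everywhere, which is a quick way to sanity-check the sign conventions.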

Lie operators for transformation estimation
Lie operators were proposed in [18] as an effective method for handling transformation invariance in handwritten digit pattern recognition [19]. In order for an automatic method to "recognize" a handwritten digit, the incoming digit pattern needs to be accurately classified into one of ten possible categories. One method is to use only one prototype image (I) for each category. Different "deformed" instances s(I, Θ) of that prototype image are generated by geometric transformations during the matching process so as to best fit the incoming digit pattern. As mentioned in Section 1, when the set of transformations is parameterized by m parameters $\theta_i \in \Theta$ (rotation, scaling, etc.), the transformed images s(I, Θ) form a surface (manifold) with intrinsic dimension of at most m. In general, such a manifold is not linear. Matching a deformable prototype to an incoming pattern then amounts to finding the point on the manifold that is at a minimum distance from the point in the pixel space corresponding to the incoming pattern. Because the manifold has no analytical expression, the matching process can be very difficult. However, if the set of transformations happens to be linear in the pixel space, then the manifold is a linear subspace (a plane). The matching procedure then reduces to finding the shortest distance between a point (vector) and a plane, or between two tangent planes approximating the original manifolds, which is the idea of the tangent distance in [18]. The tangent distance is able to capture transformation invariance, but it requires solving a least-squares problem. This problem is computationally expensive and also prone to the numerical instability associated with solving linear systems. Therefore, the conventional Euclidean distance between patterns, owing to its fast and easy calculation, was also used in conjunction with the tangent distance in actual implementations.
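The least-squares problem behind the tangent distance can be illustrated with a one-sided variant, in which only the prototype E is linearized. In this variant, the distance is the minimum over α of ‖E + Tα − P‖, where the columns of T are the tangent vectors of E. This is a minimal sketch using NumPy's least-squares solver. The two-sided distance of [18] also linearizes the incoming pattern P. The random prototype and tangent vectors in the usage lines are purely illustrative.

```python
import numpy as np


def one_sided_tangent_distance(E, P, tangents):
    """min_alpha ||E + T @ alpha - P||, with T's columns the flattened tangent vectors of E."""
    T = np.stack([t.ravel() for t in tangents], axis=1)  # shape (N*N, m)
    r = (P - E).ravel()
    alpha, *_ = np.linalg.lstsq(T, r, rcond=None)         # least-squares coefficients
    return np.linalg.norm(T @ alpha - r), alpha


rng = np.random.default_rng(0)
E = rng.random((16, 16))                              # illustrative prototype
t1, t2 = rng.random((16, 16)), rng.random((16, 16))   # illustrative tangent vectors
P = E + 0.3 * t1 - 0.2 * t2                           # pattern lying exactly in E's tangent plane

d_tan, alpha = one_sided_tangent_distance(E, P, [t1, t2])
d_euc = np.linalg.norm(P - E)
print(d_tan, d_euc, alpha)  # d_tan is ~0 and alpha is ~[0.3, -0.2], while d_euc is not small
```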
On the other hand, for many pattern recognition tasks, e.g., character recognition, the set of allowable deformations of the prototype may be known a priori. Therefore, one can generate on the fly, in a computationally efficient way, a set of transformed versions of the same prototype I by using the Lie operators associated with the transformations. For example, a set of rotated images $I_R(\theta_i)$, where i = 1, 2, ..., n, can be readily obtained by
$$I_R(\theta_i) = I + \theta_i\, L_R(I), \quad i = 1, 2, \ldots, n,$$
where $L_R(I)$ can be precomputed and shared by the calculations of the different $I_R(\theta_i)$.
Transformation estimation thus refers to matching an incoming pattern image P to the "closest" $I_R(\theta_i)$, i.e., the candidate with the shortest distance to P. For simplicity, the Euclidean distance can be used.
In transformation estimation, we search for the value of θ that best matches the degree of transformation the prototype has undergone relative to the incoming pattern. In the case of rotation, if the best θ value is zero, the resulting rotated version is identical to the original prototype. A larger search range for θ may increase the probability of finding a better match, but it also increases the search complexity, since more candidates must be examined. The step size of θ is also an important parameter, since it controls the "granularity" of the search. By increasing the step size, we may be able to enlarge the search range of θ without increasing the search complexity, at the cost of a coarser search. We can further lower the search complexity by using variable step sizes. For example, we can search more finely by taking smaller step sizes for small θ values, and let the step size grow as the search drifts away from the center of the range of allowable θ values [11].
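The search over candidate θ values could be sketched as follows. The tangent image L is passed in precomputed, candidates are compared by Euclidean distance, and the step size varies with θ. Because L is precomputed, each candidate costs only one multiply-add per pixel. The step schedule is a hypothetical choice for illustration and is not the schedule used in [11]: the first step is 0.01, each step is 1.5 times the previous one, and the range is limited to ±0.14.

```python
import numpy as np


def variable_step_grid(max_theta=0.14, first_step=0.01, growth=1.5):
    """Hypothetical symmetric grid: fine steps near theta = 0, coarser steps farther out."""
    pos, t, step = [], 0.0, first_step
    while t + step <= max_theta + 1e-12:
        t += step
        pos.append(t)
        step *= growth
    pos = np.array(pos)
    return np.concatenate([-pos[::-1], [0.0], pos])


def estimate_theta(I, L, P, thetas):
    """Return the theta whose candidate I + theta * L is closest (Euclidean) to P, and that distance."""
    dists = [np.linalg.norm(I + t * L - P) for t in thetas]
    k = int(np.argmin(dists))
    return thetas[k], dists[k]


thetas = variable_step_grid()
print(np.round(thetas, 4))  # 11 candidates, versus 15 for a uniform step of 0.02 over [-0.14, 0.14]
```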
Transformation estimation can be viewed as a generalization of translational motion estimation. In the following, we present a case study to illustrate the design of computationally efficient transformation estimation algorithms based on Lie operators. The subject of the case study is local motion estimation in video coding, where motion estimation must be both accurate and fast [12]. We then discuss several methods in which multiple Lie operators can be combined to detect small non-translational object motions in video frames, such as scaling, rotations, and deformations, with varying computational complexities. Finally, we provide both analytical and empirical results regarding the tradeoffs between estimation accuracy and computational complexity for these methods [13].

Transformation estimation in video coding
Motion estimation is a critical component of almost every video coding system [10], [17], [23].
Most compression techniques exploit the temporal redundancy that exists between successive frames. In motion estimation, we search the previous frame for the object that best matches an object in the current frame within a sequence of images (frames). Motion compensation refers to representing objects in the current frame by their matching objects in the previous frame. Conventional motion estimation algorithms in video coding consider only translations as an approximation to the combination of potential motions of objects in a video scene, including scaling, rotation, deformations, and so on.

Block-based translation motion estimation
Block-based motion estimation [10], [23] has been adopted in international video coding standards such as the MPEG standards and H.264, where each frame is partitioned evenly into square blocks. Motion estimation is applied on a block-by-block basis so that each block is associated with a motion vector. Motion vectors are used to produce a motion-compensated prediction of a frame to be transmitted from a previously transmitted reference frame [1], [7]. Motion estimation enables us to transmit only the difference between the current frame and its motion-compensated prediction from the previous frame, rather than the entire current frame. This achieves compression because fewer bits are needed to code the current frame.
In block-based motion estimation, each frame is divided evenly into square blocks (e.g., 4 × 4 or 8 × 8). We attempt to predict the current frame ($F_2$) from the previous frame ($F_1$). The prediction is obtained by taking the best match of each block of $F_2$ within the search window of $F_1$. The match criterion is typically based on the mean square error (MSE).
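The block with the minimum MSE is considered to be the best match, and its associated motion vector (dx, dy) is given by
$$(dx, dy) = \arg\min_{-R \le dx,\, dy \le R} \mathrm{MSE}_{m,n}(dx, dy),$$
$$\mathrm{MSE}_{m,n}(dx, dy) = \frac{1}{B^2} \sum_{i=0}^{B-1} \sum_{j=0}^{B-1} \left[F_2(x, y) - F_1(x + dx,\, y + dy)\right]^2, \quad (36)$$
where B is the block size, [−R, R] is the search window, and x = m × B + i and y = n × B + j for the block (m, n). Note that the motion vector for a still block is (0, 0).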
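After the best-matching block in $F_1$ has been found for each block in $F_2$, the prediction frame ($P_1$) of $F_2$ can be constructed. To determine the accuracy of the prediction, the PSNR between $F_2$ and $P_1$ is calculated as
$$\mathrm{PSNR} = 10 \log_{10} \frac{255^2}{\mathrm{MSE}_{avg}} \ \text{(dB)},$$
where $\mathrm{MSE}_{avg}$ is the average mean square error between $F_2$ and $P_1$, given by
$$\mathrm{MSE}_{avg} = \frac{1}{M \times N} \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} \mathrm{MSE}_{m,n}.$$
Note that $\mathrm{MSE}_{m,n}$ is defined in (36) and is evaluated at the motion vector of block (m, n), and M × N is the total number of blocks in a frame.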
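A direct, unoptimized sketch of full-search block matching with the MSE criterion, together with the PSNR of the resulting prediction, is shown below. It assumes 8-bit frames (peak value 255) stored as NumPy arrays indexed F[x, y], with x = m × B + i and y = n × B + j as in the text. Displaced blocks that would fall outside the previous frame are simply skipped, which is only one of several possible boundary conventions.

```python
import numpy as np


def full_search(F1, F2, B=4, R=3):
    """For each B x B block (m, n) of F2, find (dx, dy) in [-R, R]^2 minimising MSE_{m,n}."""
    H, W = F2.shape
    mv = np.zeros((H // B, W // B, 2), dtype=int)
    pred = np.zeros((H, W))
    for m in range(H // B):
        for n in range(W // B):
            x0, y0 = m * B, n * B
            cur = F2[x0:x0 + B, y0:y0 + B].astype(float)
            best = (np.inf, 0, 0)
            for dx in range(-R, R + 1):
                for dy in range(-R, R + 1):
                    if not (0 <= x0 + dx <= H - B and 0 <= y0 + dy <= W - B):
                        continue  # displaced block would leave the previous frame
                    ref = F1[x0 + dx:x0 + dx + B, y0 + dy:y0 + dy + B]
                    mse = np.mean((cur - ref) ** 2)
                    if mse < best[0]:
                        best = (mse, dx, dy)
            _, dx, dy = best
            mv[m, n] = (dx, dy)
            pred[x0:x0 + B, y0:y0 + B] = F1[x0 + dx:x0 + dx + B, y0 + dy:y0 + dy + B]
    return mv, pred


def psnr(F2, P, peak=255.0):
    """PSNR in dB; with equal-sized blocks tiling the frame, the frame MSE equals MSE_avg."""
    mse = np.mean((F2.astype(float) - P.astype(float)) ** 2)
    return np.inf if mse == 0 else 10 * np.log10(peak ** 2 / mse)


rng = np.random.default_rng(1)
F1 = rng.integers(0, 256, size=(32, 32)).astype(float)
F2 = np.roll(F1, shift=(2, -1), axis=(0, 1))  # F2(x, y) = F1(x - 2, y + 1): (dx, dy) = (-2, 1)
mv, P1 = full_search(F1, F2)
print(mv[1, 1], psnr(F2, P1), psnr(F2, F1))
```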
Conventional motion estimation algorithms in video coding consider only translations as an approximation to a variety of object motions. They are therefore limited in capturing motions other than translation in a video scene, such as scaling, rotations, and deformations. The widespread use of the translation model is partly due to its simplicity: a translation model can be readily characterized by displacement motion vectors. It can thus be implemented with much lower complexity than the non-linear models used to describe non-translational motions. Nonetheless, considering the translation model alone sacrifices motion estimation accuracy.

Block-based transformation estimation
Non-translational transformation estimation can be introduced into video coding to further increase the overall motion estimation accuracy. More specifically, conventional (translational) motion estimation is first applied on the previous frame ($F_1$) based on the current frame ($F_2$). We can construct a predicted frame P (of the current frame) from the previous frame by using the resulting motion vectors associated with each block in the current frame (see Fig. 4). The accuracy of the predicted frame P (relative to the current frame $F_2$) is measured by $\mathrm{PSNR}_1$. Next, transformation estimation based on the Lie operators is applied on the matching blocks ($B_P$) in the predicted frame P to further improve the motion estimation accuracy. For each block $B_P$ in P, we search the set of candidate parameters for the θ that minimizes the mean square error between the transformed version $B_T$ and the corresponding block ($B_C$) in the current frame $F_2$. A new predicted frame $P_T$ is then formed from the resulting blocks $B_T$, and its accuracy is measured by $\mathrm{PSNR}_2$. Because θ = 0 (no transformation) is among the candidate values, each $B_T$ is at least as accurate as the corresponding $B_P$. $P_T$ is therefore expected to be a better prediction of the current frame than P, increasing the accuracy of motion estimation and prediction. The accuracy of motion estimation is measured by the PSNR between the current frame and the predicted frame. The improvement due to the transformation models is therefore calculated as $\mathrm{PSNR}_2 - \mathrm{PSNR}_1$. Considering other types of transformations as well can improve the accuracy further.
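The block-wise refinement step from P to $P_T$ can be sketched as follows. The candidate θ values, [−0.14, 0.14] with a step size of 0.02, are the ones reported with Table 2. The discrete operators and their forms are assumptions. They use central differences in block-centered coordinates. L_S, L_P, and L_D follow the usual tangent-vector expressions x∂/∂x + y∂/∂y, x∂/∂x − y∂/∂y, and y∂/∂x + x∂/∂y. Table 1 may use different signs, which on a symmetric search range would only flip the sign of the selected θ. Only one operator is applied per block in this sketch.

```python
import numpy as np

# Candidate degrees of transformation: [-0.14, 0.14] in steps of 0.02 (includes theta = 0).
THETAS = np.round(np.arange(-0.14, 0.14 + 1e-9, 0.02), 2)


def tangent(block, op):
    """Hypothetical discrete Lie operators L_op(block) for op in {'R', 'S', 'P', 'D'}."""
    rows, cols = np.indices(block.shape, dtype=float)
    y = rows - (block.shape[0] - 1) / 2.0
    x = cols - (block.shape[1] - 1) / 2.0
    gy, gx = np.gradient(block.astype(float))  # central differences along rows, columns
    return {"R": y * gx - x * gy, "S": x * gx + y * gy,
            "P": x * gx - y * gy, "D": y * gx + x * gy}[op]


def refine_block(bp, bc, op="R"):
    """Best theta for block B_P against the current block B_C; returns (theta, B_T)."""
    L = tangent(bp, op)  # computed once per block, reused for all thetas
    cands = [bp + t * L for t in THETAS]
    k = int(np.argmin([np.mean((c - bc) ** 2) for c in cands]))
    return THETAS[k], cands[k]


def refine_frame(P, F2, B=4, op="R"):
    """Form P_T by replacing every B x B block of P with its best transformed version."""
    PT = P.astype(float).copy()
    for r in range(0, P.shape[0] - B + 1, B):
        for c in range(0, P.shape[1] - B + 1, B):
            _, PT[r:r + B, c:c + B] = refine_block(P[r:r + B, c:c + B], F2[r:r + B, c:c + B], op)
    return PT


def psnr(F, P, peak=255.0):
    """PSNR in dB between a frame F and its prediction P (8-bit peak assumed)."""
    return 10 * np.log10(peak ** 2 / np.mean((F.astype(float) - P.astype(float)) ** 2))
```

With these helpers, the improvement $\mathrm{PSNR}_2 - \mathrm{PSNR}_1$ is `psnr(F2, refine_frame(P, F2)) - psnr(F2, P)`. Since θ = 0 is a candidate, this difference is never negative.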

Computational efficiencies of transformation estimation using multiple Lie operators
We first examine the full search method, which exhaustively searches for the best combination of four types of Lie operators (R, S, P, and D). To reduce the high computational complexity of the full search method, we then consider three parameter-search methods: dynamic-programming (DP)-like search, iterative search, and serial search. They combine the Lie operators in different ways, with varying accuracy-complexity tradeoffs [13].

Full search
The full search method is the most straightforward and yet the most computationally expensive method. In this method, we search through all $4^4 = 256$ possible paths that start from block $B_P$ (of the predicted frame P) and end on the transformed block $B_T$ (of a more accurately predicted frame than P). We then select the path (i.e., the combination of the four Lie operators) whose output block $B_T$ is the most accurate prediction of block $B_C$ in the current frame.
Assume that x is the computational complexity of motion estimation for a single Lie operator. The complexity associated with any path of four operators from $B_P$ to $B_T$ in Fig. 2 is then 4x. Since we need to search all 256 possible paths, the overall complexity of the full search method is 1024x.
We can reduce the complexity of this brute-force approach by dividing the estimation process into four stages, with each stage corresponding to one column of operators in Fig. 2. In this way, a partial path shared by several complete paths is estimated only once.
In the first stage, there are four estimation operations, for R, S, P, and D respectively, with a complexity of 4x. In the second stage, we apply the same four operators to each of the four candidate transformed blocks generated in stage one. For example, starting with R in the first stage, we examine R → R (R in the first stage, followed by R in the second stage), R → S, R → P, and R → D. Note that applying the R operator again to a block already rotated by the best θ value found in the first stage would not be beneficial in general. However, further gains in estimation accuracy might be achievable by considering other combinations such as R → S, R → P, and R → D. The total complexity of the second stage is therefore 4 × 4x = 16x. Likewise, the complexity of the third stage is 4 × 16x = 64x, and that of the last stage amounts to 4 × 64x = 256x. The overall complexity of the reduced-complexity full search method is therefore 340x (= 4x + 16x + 64x + 256x), only about 1/3 of that of the brute-force full search method. Even so, the complexity of the full search is still unacceptably high for practical applications. To reduce the complexity further, let us consider the following search methods.
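The operation counts for the brute-force and staged full searches can be reproduced by enumerating the operator paths. In this small Python sketch, each distinct partial path (prefix) is counted as one single-operator estimation of cost x.

```python
from itertools import product

OPS = ("R", "S", "P", "D")
paths = list(product(OPS, repeat=4))   # every ordered quadruplet, e.g. ('R', 'S', 'P', 'D')

brute_force = len(paths) * 4           # each path re-estimates all four stages
prefixes = {p[:k] for p in paths for k in range(1, 5)}
staged = len(prefixes)                 # each distinct partial path is estimated only once

print(len(paths), brute_force, staged)  # 256 1024 340
```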

Dynamic-programming-like search
Compared to the full search, the DP search method is a sub-optimal search method, which has a flavor of the dynamic-programming (DP) solution in finding the shortest path through a weighted graph [2].Similar to the Viterbi algorithm used in the decoding of convolutional codes [9], the DP search method keeps only those "survivors" (i.e., the best result obtained by each operator) in each stage (of the four stages in Fig. 2) for further searching operations.
In the first stage, there are four transformed blocks ("survivors"), corresponding to the four Lie operators considered, as a result of the estimation operations (with complexity being 4x).
In the second stage, the four operators are again applied to the survivors of stage 1. Take operator R as an example. There are four possible partial paths entering R in the second stage: R → R, S → R, P → R, and D → R. Of these, we choose the one that gives the transformed version with the smallest MSE, measured against the block in the current frame to be predicted. The transformed block so obtained is stored as the survivor for operator R in the second stage, as is its originating operator in the first stage. Information about all other, inferior partial paths is discarded. To obtain the other survivor blocks for stage two, the same procedure is repeated for the other three operators. The total complexity for stage two is therefore 4 × 4x = 16x. The four surviving blocks obtained in stage two are then used to obtain another four survivors in stage three in the same fashion, with the partial paths leading to the survivors getting longer. The same procedure is repeated in stage four to obtain a final set of four survivors. The survivor with the least MSE is the final winner. Hence the overall complexity of the DP-like method is 52x (= 4x + 3 × 16x), which is less than 1/6 of that of the reduced-complexity full search method.
Although there is no guarantee of optimality in theory for this DP-like method, we expect its search results to be reasonably close to those yielded by the full search method.
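The survivor bookkeeping of the DP-like search can be sketched generically. Here `estimate(block, op)` is a hypothetical callable standing for one single-operator estimation (cost x). An example would be a θ search for operator `op` against the current block, returning the resulting MSE and transformed block. The toy estimator in the usage lines is not a real estimator; it only counts calls, to confirm the 52x figure.

```python
from typing import Callable, Dict, List, Tuple

import numpy as np

Estimate = Callable[[np.ndarray, str], Tuple[float, np.ndarray]]
OPS = ("R", "S", "P", "D")


def dp_like_search(bp: np.ndarray, estimate: Estimate, ops=OPS, stages: int = 4):
    """Keep one survivor (best partial path) per operator per stage, Viterbi-style."""
    # Stage 1: apply every operator to the predicted block B_P.
    survivors: Dict[str, Tuple[float, np.ndarray, List[str]]] = {}
    for op in ops:
        mse, blk = estimate(bp, op)
        survivors[op] = (mse, blk, [op])
    # Later stages: extend every survivor by each operator; keep the best path entering it.
    for _ in range(stages - 1):
        new = {}
        for op in ops:
            best = None
            for prev_mse, prev_blk, path in survivors.values():
                mse, blk = estimate(prev_blk, op)
                if best is None or mse < best[0]:
                    best = (mse, blk, path + [op])
            new[op] = best
        survivors = new
    # The last-stage survivor with the least MSE is the final winner.
    return min(survivors.values(), key=lambda s: s[0])


calls = []


def toy_estimate(block, op):
    """Toy stand-in: each operator subtracts a fixed amount; MSE is measured against zero."""
    calls.append(op)
    new = block - {"R": 0.5, "S": 0.25, "P": 0.125, "D": 0.0625}[op]
    return float(np.mean(new ** 2)), new


mse, blk, path = dp_like_search(np.ones((4, 4)), toy_estimate)
print(len(calls), len(path))  # 52 4  (52 estimations = 4x + 3 * 16x)
```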

Iterative search
To reduce computational complexity further, we introduce the iterative search method, which performs the motion-parameter estimation through multiple iterations. In each iteration, we choose the best Lie operator (Fig. 3). For example, in the first iteration, the S operator may turn out to be the best. The scaled block then goes through the same estimation process in the next iteration, which outputs a transformed block whose MSE is no larger than before. Here we consider only four iterations, to ensure a fair comparison between this method and the two methods discussed previously. The overall complexity of this method is therefore 4 × 4x = 16x, slightly less than 1/3 of the complexity of the DP-like method.
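The iterative search replaces the four survivors with a single greedy choice per iteration. The sketch below makes the same assumption as before: a hypothetical per-operator estimator `estimate(block, op)` returns (MSE, transformed block). A counting toy estimator again stands in for the real one.

```python
from typing import Callable, List, Tuple

import numpy as np

Estimate = Callable[[np.ndarray, str], Tuple[float, np.ndarray]]
OPS = ("R", "S", "P", "D")


def iterative_search(bp: np.ndarray, estimate: Estimate, ops=OPS, iterations: int = 4):
    """Greedy search: in each iteration, keep only the operator giving the least MSE."""
    blk, path, mse = bp, [], float("inf")
    for _ in range(iterations):
        results = [(estimate(blk, op), op) for op in ops]  # 4 estimations per iteration
        (mse, blk), op = min(results, key=lambda r: r[0][0])
        path.append(op)
    return mse, blk, path


calls: List[str] = []


def toy_estimate(block, op):
    """Toy stand-in: each operator subtracts a fixed amount; MSE is measured against zero."""
    calls.append(op)
    new = block - {"R": 0.5, "S": 0.25, "P": 0.125, "D": 0.0625}[op]
    return float(np.mean(new ** 2)), new


mse, blk, path = iterative_search(np.ones((4, 4)), toy_estimate)
print(len(calls), path)  # 16 estimations = 4 iterations x 4 operators
```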

Serial search
In the two foregoing search methods, the best quadruplet of Lie operators is not known until the search is completed. Both are more complex than a simpler search method in which the Lie operators are applied sequentially in a predetermined order (e.g., the order R → S → P → D in Fig. 4). This serial search method has the lowest complexity (4x), but it cannot adapt the order of the operators to individual blocks, which limits its estimation accuracy.

Comparison of computational complexity
As summarized in Table 2, the complexity of the DP-like search is 13 times that of the serial search and the complexity of the iterative search is 4 times that of the serial search.

Search method | Complexity
DP-like | 52x
Iterative | 16x
Serial | 4x
Table 2. Complexities required by the three search methods (x is the complexity associated with estimation using an individual operator). θ was chosen from [−0.14, 0.14], with a step size of 0.02.

Simulation results
We tested the above-mentioned methods on three standard video sequences, "Table Tennis", "Mobile Calendar", and "Tempete", all in the CIF format (288 × 352). Sample frames of these sequences are shown in Figure 6.
In the simulations, a block size of 4 × 4 was used. The search window for the translational motion estimation that precedes the transformation estimation using Lie operators was ±15 pixels. A search range of ±0.14 (with a step size of 0.02) was used for $\theta_R$, $\theta_S$, $\theta_P$, and $\theta_D$ in the DP-like, iterative, and serial search methods.
Simulation results are illustrated in Fig. 6, Fig. 7, and Fig. 8 for the three test sequences. Some statistics (maximum, minimum and average values) of the PSNR improvements achieved by the three search methods are listed in Table 3. Table 3 shows that the DP-like method significantly increases the accuracy of the predicted frames of all three sequences, by up to 2.6 dB and by more than 2.1 dB on average. The largest improvement (2.47 dB on average) is observed on "Mobile Calendar". This may be attributed to the large amount of non-translational motion in "Mobile Calendar" (e.g., the ball keeps rotating, and the camera is zooming out). On the other two sequences, the DP-like method achieves an increase of about 2.1 dB. With less than 1/3 of the complexity required by the DP-like method, the iterative search delivers impressive estimation accuracy, especially on "Mobile Calendar" (up to 2.31 dB, and about 2 dB on average). As with the DP-like method, slightly lower PSNR improvements are observed on the other two sequences: on average, 1.75 dB and 1.66 dB for "Table Tennis" and "Tempete", respectively. As can be observed in Fig. 6, numerous deep plunges in the PSNR improvement (occurring around, e.g., frames 90 and 149) adversely affect the average PSNR improvement for "Table Tennis". These plunges occur whenever there is a scene change. For "Tempete", although there is no major scene change, a continuous influx of a large number of new objects (e.g., small leaves blown by the wind) tends to make transformation estimation less effective.
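The per-frame quantity summarized in Table 3 can be computed as below. This assumes the PSNR improvement is the PSNR of the Lie-refined prediction minus that of the translation-only prediction, and it uses 8-bit samples (peak value 255). Both conventions are assumptions, and the function names are hypothetical.

```python
import numpy as np


def psnr(reference, predicted, peak=255.0):
    """Peak signal-to-noise ratio in dB; peak=255 assumes 8-bit samples."""
    err = np.mean((np.asarray(reference, float) - np.asarray(predicted, float)) ** 2)
    return float("inf") if err == 0 else 10.0 * np.log10(peak ** 2 / err)


def psnr_improvement(frame, pred_translation, pred_lie, peak=255.0):
    """Gain (dB) of the Lie-refined prediction over the translation-only prediction."""
    return psnr(frame, pred_lie, peak) - psnr(frame, pred_translation, peak)


def summarize(gains):
    """Maximum, minimum and average of per-frame gains, the statistics listed in Table 3."""
    g = np.asarray(gains, dtype=float)
    return {"max": float(g.max()), "min": float(g.min()), "mean": float(g.mean())}
```

Halving the prediction error amplitude quarters the MSE, so it raises the PSNR by 10·log10(4) ≈ 6.02 dB regardless of the peak value.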

With only one quarter of the complexity required by the iterative search, the serial search achieves average PSNR improvements of 1.60 dB, 1.35 dB and 1.27 dB on "Mobile Calendar", "Table Tennis", and "Tempete", respectively. The accuracy of this method is the lowest of the three. This indicates that applying the Lie operators in a fixed, predetermined order, rather than selecting the order adaptively, reduces the motion estimation accuracy.
We also measured the actual computation times of the three search methods on a PC running Windows XP (with a 3.40 GHz Pentium 4 CPU and 2 GB RAM). The total running time of the subroutine for each method was first measured over all the frames in a test sequence. The average running time per block for each search method was then calculated and listed in Table 4. On average, Time(DP-like search)/Time(serial search) = 13.12 and Time(iterative search)/Time(serial search) = 4.03, in agreement with the analytical ratios of 13 and 4 obtained from Table 2. As a reference, the average time was also measured for the subroutine that performs the conventional translation-only motion estimation preceding the transformation estimation. As shown in Table 4, the complexities of the DP-like, iterative and serial search methods are 69%, 21% and 5%, respectively, of that of the translation-only motion estimation method. Fig. 9 shows the empirical tradeoffs between the accuracies of the three search methods and their complexities. The best performance is again observed on "Mobile Calendar": increases of 2.47 dB, 2.04 dB and 1.60 dB are achieved with additional computational complexity of approximately 69%, 21% and 5% of that of the translation-only motion estimation.
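The measurement procedure described above can be sketched with Python's `time.perf_counter`: total time over all blocks, divided by the number of blocks, then normalized by the translation-only reference of 3.60 ms/block from Table 4. The estimator callable and the helper names are hypothetical placeholders. Absolute timings will depend on the machine and the implementation.

```python
import time

REFERENCE_MS = 3.60  # translation-only motion estimation, ms/block (Table 4)


def avg_ms_per_block(estimate, blocks):
    """Total wall-clock time of `estimate` over all blocks, divided by the block count."""
    start = time.perf_counter()
    for args in blocks:
        estimate(*args)
    return 1000.0 * (time.perf_counter() - start) / len(blocks)


def normalized_complexity(method_ms, reference_ms=REFERENCE_MS):
    """Ratio of a method's per-block time to the translation-only reference time."""
    return method_ms / reference_ms
```

Timing the whole loop once, rather than each call separately, keeps timer overhead out of the per-block average.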

Comparison with full affine transformation model
We now compare the computational complexity of transformation estimation using Lie operators with that of transformation estimation using a full transformation model. We consider the affine model [8,20,21,24]. It has been widely used in the literature to detect non-translational motion because it offers a good compromise between complexity and performance. In its generic form, the 6-parameter affine model can be expressed as

$$u = a_1 x + a_2 y + a_3, \qquad v = b_1 x + b_2 y + b_3, \qquad (39)$$

where (x, y) is a pixel position in the current block and (u, v) is the corresponding position in the reference frame. Since a_3 and b_3 in (39) are translational displacements, the 6-parameter affine model can be simplified to a 4-parameter model by estimating the translational motion with the conventional block-matching method. In fact, even when the more complex gradient descent method is used for motion parameter estimation, translational motion estimation is often employed as an initial stage to ensure convergence. This stage computes a coarse estimate of the translational component of the motion parameters, so that the starting point of the gradient descent lies within the "basin" of the global minimum [3,8].
Figure 9. Increased accuracy vs. complexity. For each of the three sequences, from right to left, the three operating points correspond to the DP-like search, the iterative search and the serial search methods, respectively. The normalized complexity is calculated as the ratio between the computation time for each search method and that for the translation-only motion estimation method, as shown in Table 4.

A five-parameter affine transform model
Based on the above discussions, we choose the following affine motion model, which was used in [6] to improve the local translation motion compensation by taking into account rotation and scaling of small objects.
The five parameters in (40) are estimated by using a two-step search method [6,22]. First, the parameters (t_x, t_y) corresponding to the translational motion between blocks in the current frame and the reference frame are searched for. This step is also shared by the Lie-operator approach (see Fig. 1), which operates on the matched block produced by the conventional translational block-matching process. In the second step, the remaining three parameters for rotation and scaling (θ, K_x, K_y) are searched for. For ease of coding, θ, K_x and K_y are chosen from small sets of discrete values. For example, θ ∈ {−0.02π, 0, 0.02π} and K_x, K_y ∈ {0.9, 1.0, 1.1} were chosen in [6]. The Lie-operator method is also suitable for estimating such small degrees of transformation. For example, the iterative approach discussed in Section 4.3 can be employed with three operators (R, S_x and S_y in Table 1). Since the coordinates (u, v) calculated by (40) are in general real numbers, the pixel value at (u, v) has to be interpolated from the values of the surrounding pixels. Bilinear interpolation is often employed [6], [24, p. 59]. More specifically, we assume that the four surrounding pixels in the reference frame have values I_{⌊u⌋,⌊v⌋}, I_{⌊u⌋+1,⌊v⌋}, I_{⌊u⌋,⌊v⌋+1} and I_{⌊u⌋+1,⌊v⌋+1}, where ⌊s⌋ is the floor function, which returns the largest integer less than or equal to s. The signal value at (u, v) can then be interpolated as

$$I(u,v) = (1-\alpha)(1-\beta)\, I_{\lfloor u\rfloor,\lfloor v\rfloor} + \alpha(1-\beta)\, I_{\lfloor u\rfloor+1,\lfloor v\rfloor} + (1-\alpha)\beta\, I_{\lfloor u\rfloor,\lfloor v\rfloor+1} + \alpha\beta\, I_{\lfloor u\rfloor+1,\lfloor v\rfloor+1},$$

where $\alpha = u - \lfloor u\rfloor$ and $\beta = v - \lfloor v\rfloor$. Clearly, these interpolation operations incur extra computational cost, which is not required by the Lie-operator approach.
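Bilinear interpolation from the four surrounding pixels, with weights α = u − ⌊u⌋ and β = v − ⌊v⌋, transcribes directly into code. This sketch assumes that u indexes columns and v indexes rows of the reference frame. It also assumes that (u, v) lies at least one pixel inside the right and bottom borders. `bilinear` is a hypothetical helper name.

```python
import math

import numpy as np


def bilinear(ref, u, v):
    """Interpolate ref at real-valued (u, v); u indexes columns, v indexes rows (assumed)."""
    u0, v0 = math.floor(u), math.floor(v)
    a, b = u - u0, v - v0  # fractional offsets alpha and beta
    i00, i10 = ref[v0, u0], ref[v0, u0 + 1]
    i01, i11 = ref[v0 + 1, u0], ref[v0 + 1, u0 + 1]
    return ((1 - a) * (1 - b) * i00 + a * (1 - b) * i10
            + (1 - a) * b * i01 + a * b * i11)
```

Every warped sample needs this weighted sum of four neighbours, and the affine search repeats it for each of its candidate parameter sets. The Lie-operator update avoids this step.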

Comparison of computational complexity
We now analyze the complexity required by motion estimation using the affine model described in Section 5.1, and the iterative Lie-operator approach described in Section 4.3, which can offer variable tradeoffs between the increased estimation accuracy and computational complexity by varying the number of iterations.The computational complexity is estimated by counting the number of additions/subtractions (C add ), and the number of multiplications (C mult ).
As shown in Table 5, the complexity of applying the affine model, given by (45), grows with W^3. This is because one has to search for the best combination of the three types of motion parameters among W^3 possible choices, where W is the number of candidate values for each motion parameter. For ease of analysis, and without much loss of generality, W is assumed to be the same for each type of parameter. In the affine model given in Section 5.1, W = 3 was chosen [6]. On the other hand, the complexity of the iterative Lie-operator approach is given in Table 6 for each iteration involving three operators (R, S_x and S_y). Therefore, if Q iterations are used, the total complexity, given by (46), is Q times the per-iteration complexity. Table 7 shows that with only 3 iterations, the Lie-operator method performs close to the affine-model approach in terms of PSNR improvement. With one additional iteration, the Lie-operator approach comes within 0.1 dB of the affine-model approach. On a PC running Windows XP (with a 3.40 GHz Pentium 4 CPU and 2 GB RAM), the average running times of the two approaches were measured to be 0.46 ms/block (Lie operator, 4 iterations) and 1.46 ms/block (affine model). That is, Time(Lie operator, Q = 4) ≈ 1/3 Time(affine model), which agrees with our analysis in Section 5.2.
On the other hand, a comparison of the data for the iterative Lie-operator approach in Table 3 and Table 7 shows that the accuracy of motion estimation can be increased significantly by using larger sets of candidate parameters (i.e., by increasing W in (46)) and by considering more operators. However, for the affine model, a large W can lead to unacceptably high complexity, which grows in proportion to W^3 in (45). By contrast, the complexity of the Lie-operator approach grows approximately linearly with W in (46). Therefore, Lie operators have a clear advantage in terms of computational complexity, as long as they provide good approximations to small degrees of transformation. Nevertheless, for large degrees of transformation, a search method based on the full affine transformation model would be more accurate than the fast method based on Lie operators.
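The sketch below makes the W^3 versus W scaling concrete by counting candidate parameter settings scored per block. It does not count arithmetic operations, which Tables 5 and 6 tabulate. It assumes that the affine search visits all W^3 combinations of (θ, K_x, K_y), and that each Lie iteration tries W values for each of the three operators R, S_x and S_y. The helper names are hypothetical.

```python
def affine_candidates(W, n_params=3):
    """Joint search over (theta, K_x, K_y): every combination is warped and scored."""
    return W ** n_params


def lie_candidates(W, Q, n_ops=3):
    """Q iterations, each trying W theta values for each of n_ops operators (assumed)."""
    return Q * n_ops * W


for W in (3, 7, 15):
    print(f"W={W:2d}  affine={affine_candidates(W):5d}  Lie(Q=4)={lie_candidates(W, Q=4):4d}")
```

With W = 3, the counts are 27 for the affine search and 36 for the Lie approach with Q = 4, so under these assumptions the measured 1/3 running-time ratio at W = 3 comes from the cheaper per-candidate update, which needs no interpolation. With W = 15, the counts diverge to 3375 and 180.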

Conclusion
Lie operators are useful for efficient handwritten character recognition. Multiple operators can be combined to approximate small degrees of object transformation, such as scaling, rotation and deformation. In this chapter, we first explained the underlying theory of Lie groups and Lie algebras in a tutorial fashion. We then addressed the key problem of transformation estimation based on Lie operators, for which the exhaustive full search method is often impractical because of its prohibitive computational complexity. To illustrate the design of computationally efficient transformation estimation algorithms based on Lie operators, we selected motion and transformation estimation in video coding as an example. We presented several fast search algorithms (the dynamic-programming-like, serial and iterative search methods), which integrate multiple Lie operators to detect small degrees of transformation in video scenes. We provided a detailed analysis of the tradeoffs between estimation accuracy and computational complexity for these algorithms. We demonstrated that non-translational transformation estimation based on Lie operators can improve the overall accuracy of motion estimation in video coding, with only a modest increase in overall computational complexity. In particular, we showed that the iterative search method based on Lie operators has much lower complexity than the transformation estimation method based on the full affine transformation model, with only a negligibly small degradation in estimation accuracy.

Figure 1. The transformation estimation system using a Lie operator: we search for the best θ in the set of candidates [θ_1, θ_2, ..., θ_M] such that the transformed block B_T of the block B_P in the prediction frame P has the smallest MSE compared with the corresponding block B_C in F_2.

Figure 2.
Fig. 2 illustrates all possible combinations of the Lie operators for rotation, scaling, parallel deformation, and diagonal deformation. The highlighted path shows one combination (D → P → S → R): block B_P is first diagonally deformed, and the deformed block then goes through parallel deformation. The resulting block is scaled and then rotated to obtain B_T. The degree of motion (θ) associated with each participating operator is optimized by the search procedure illustrated in Fig. 1. The full search is the

Figure 3. Iterative search. In each iteration, the best operator is selected as the one with the largest MSE reduction on the input block B_P. The transformed block B_T generated by the best operator found is further transformed optimally in the next iteration.
Figure 4. Serial search: we apply the R, S, P and D operators sequentially to obtain the transformed block B_T.

Figure 5. Sample frames of the video sequences. (a) The 1st frame of the "Table Tennis" sequence. (b) The 20th frame of the "Table Tennis" sequence. (c) The 1st frame of the "Mobile Calendar" sequence. (d) The 200th frame of the "Mobile Calendar" sequence. (e) The 1st frame of the "Tempete" sequence. (f) The 50th frame of the "Tempete" sequence.

Table 3. Increased estimation accuracy (in dB) for the three video sequences.

Table 4. Computation times (in ms/block) of the three methods for the three video sequences. The normalized complexity is calculated as the ratio between the average computation time for each search method and the reference time (3.60 ms/block) for the translation-only motion estimation method.

Table 5. Number of arithmetic operations (per block) required by the transformation estimation using the affine model in (40), based on a displaced block with motion vector (t_x, t_y). The values of sin θ and cos θ are assumed to be obtained from a precomputed lookup table, and M is the number of pixels in a block.