0 Computing and Updating Principal Components of Discrete and Continuous Point Sets

Efficient computation and updating of the principal components are crucial in many applications including data compression, exploratory data analysis, visualization, image processing, pattern and image recognition, time series prediction, detecting perfect and reflective symmetry, and dimension detection. The thorough overview over PCA’s applications can be found for example in the textbooks by Duda et al. (2001) and Jolliffe (2002).


Introduction
Efficient computation and updating of the principal components are crucial in many applications including data compression, exploratory data analysis, visualization, image processing, pattern and image recognition, time series prediction, detecting perfect and reflective symmetry, and dimension detection.
The thorough overview over PCA's applications can be found for example in the textbooks by Duda et al. (2001) and Jolliffe (2002).
Dynamic versions of the PCA applications, i.e., when the point set (population) changes, are of big importance and interest.Efficient solutions of those problems depend heavily on an efficient dynamic computation of the principal components (eigenvectors of the covariance matrix).Dynamic updates of variances in different settings have been studied since the sixties by Chan et al. (1979), Knuth (1998), Pébay (2008), Welford (1962) and West (1979).Pébay (2008) also investigated the dynamic maintenance of covariance matrices.
The principal components of discrete point sets can be strongly influenced by point clusters (Dimitrov, Knauer, Kriegel & Rote (2009)).To avoid the influence of the distribution of the point set, often continuous sets, especially the convex hull of a point set is considered, which lead to so-called continuous PCA.Computing PCA bounding boxes (Gottschalk et al. (1996), Dimitrov, Holst, Knauer & Kriegel (2009)), or retrieval of 3D-objects (Vranić et al. (2001)), are typical applications where continuous PCA are of interest.
The organization and the main results presented in this chapter are as follows: In Section 2, we present a standard approach of computing principal components of discrete point set in R d .In Section 3, we present closed-form solutions for efficiently updating the principal components of a set of n points, when m points are added or deleted from the point set.For both operations performed on a discrete point set in R d , we can compute the new principal components in O(m) time for fixed d.This is a significant improvement over the commonly used approach of recomputing the principal components from scratch, which takes O(n + m) time.In Section 4 we consider the computation of the principal components of a dynamic continuous point set.We give closed form-solutions when the point set is a convex polytope R 3 .Solutions for the cases when the point set is the boundary of a convex polytope in R 2 or R 3 , or a convex polygon in R 2 , are presented in the appendix.Conclusion and open problems are presented in Section 5.
For implementation and verification of some the theoretical results presented here, we refer interested readers to Dimitrov, Holst, Knauer & Kriegel (2009) and Dimitrov et al. (2011).

Basics. Computing principal components -discrete case in R d
The central idea and motivation of PCA is to reduce the dimensionality of a point set by identifying the most significant directions (principal components).Let P = { p 1 , p 2 ,..., p n } be a set of vectors (points) in R d , and µ =( µ 1 , µ 2 ,...,µ d ) ∈ R d be the center of gravity of P. For 1 ≤ k ≤ d, we use p i,k to denote the k-th coordinate of the vector p i .Given two vectors u and v, we use u, v to denote their inner product.For any unit vector v ∈ R d , the variance of P in direction v is (1) The most significant direction corresponds to the unit vector v 1 such that var(P, v 1 ) is maximum.In general, after identifying the j most significant directions v 1 ,..., v j , the (j + 1)-th most significant direction corresponds to the unit vector v j+1 such that var(P, v j+1 ) is maximum among all unit vectors perpendicular to v 1 , v 2 ,..., v j .
It can be verified that for any unit vector v ∈ R d , where Σ is the covariance matrix of P. Σ is a symmetric d × d matrix where the (i, j)-th component, σ ij ,1 ≤ i, j ≤ d, is defined as The procedure of finding the most significant directions, in the sense mentioned above, can be formulated as an eigenvalue problem.If λ 1 ≥ λ 2 ≥•••≥λ d are the eigenvalues of Σ, then the unit eigenvector v j for λ j is the j-th most significant direction.Since the matrix Σ is symmetric positive semidefinite, its eigenvectors are orthogonal, all λ j s are non-negative and λ j = var(X, v j ).
Computation of the eigenvalues, when d is not very large, can be done in O(d 3 ) time, for example with the Jacobi or the QR method (Press et al. (1995)).Thus, the time complexity of computing principal components of n points in R d is O(n + d 3 ).The additive factor of O(d 3 ) throughout the paper will be omitted, since we will assume that d is fixed.For very large d, the problem of computing eigenvalues is non-trivial.In practice, the above mentioned methods for computing eigenvalues converge rapidly.In theory, it is unclear how to bound the running time combinatorially and how to compute the eigenvalues in decreasing order.In Cheng & Y. Wang (2008) a modification of the Power method (Parlett (1998)) is presented, which can give a guaranteed approximation of the eigenvalues with high probability.

Updating the principal components efficiently -discrete case in R d
In this section, we consider the problem of updating the covariance matrix Σ of a discrete point set P = { p 1 , p 2 ,..., p n } in R d , when m points are added or deleted from P. We give closed-form solutions for computing the components of the new covariance matrix Σ ′ .Those closed-form solutions are based on the already computed components of Σ.The recent result of Pébay (2008) implies the same solution for additions that will be presented in the sequel.
The main result of this section is given in the following theorem.
Theorem 3.1.Let P be a set of n points in R d with known covariance matrix Σ.Let P ′ be a point set in R d , obtained by adding or deleting m points from P. The principal components of P ′ can be computed in O(m) time for fixed d.

Proof. Adding points
Let P m = { p n+1 , p n+2 ,..., p n+m } be a point set with center of gravity µ m =(µ m 1 , µ m 2 ,...,µ m d ).We add P m to P obtaining new point set and Plugging-in the values of µ ′ i and µ ′ j in (4), we obtain:

265
Computing and Updating Principal Components of Discrete and Continuous Point Sets Plugging-in the values of µ ′ i and µ ′ j in (5), we obtain: where is the i, j-th element of the covariance matrix Σ m of the point set P m .Finally, we have Note that σ m ij , and therefore σ ′ ij , can be computed in O(m) time.Thus, for a fixed dimension d, the covariance matrix Σ also can be computed in O(m) time.
The above derivation of the new principal components is summarized in Algorithm 3.1.

Deleting points
Let P m = { p n−m+1 , p n−m ,..., p n } be a subset of the point set P, and let µ m =(µ m 1 , µ m 2 ,...,µ m d ) be the center of gravity of P m .We subtract P m from P, obtaining a new point set P ′ .The j-th component, µ ′ j ,1≤ j ≤ d, of the center of gravity Algorithm 3.1 :ADDINGPOINTS(P, µ, Σ, P m ) Input: point set P, the center of gravity µ of P, the covariance matrix Σ of P, point set P m added to P Output: principal componenets of P ∪ P m 1: compute the center of gravity µ m of P m 2: compute the covariance matrix Σ m of P m 3: compute the center of gravity µ ′ of P ∪ P m : 4: , where and Plugging-in the values of µ ′ i and µ ′ j in (9), we obtain:

267
Computing and Updating Principal Components of Discrete and Continuous Point Sets Plugging-in the values of µ ′ i and µ ′ j in (10), we obtain: where is the i, j-th element of the covariance matrix Σ m of the point set P m .
Finally, we have Note that σ m ij , and therefore σ ′ ij , can be computed in O(m) time.Thus, for a fixed dimension d, the covariance matrix Σ also can be computed in O(m) time.
As a corollary of ( 13), in the case when only one point, p e , is deleted from a point set P, the elements of the new covariance matrix are given by and also can be computed in O(1) time.
Similar argument holds in the case when only one point is added to a point set P, and then the new covariance matrix also can be computer in constant time.
Algorithm 3.2 :DELETINGPOINTS(P, µ, Σ, P m ) Input: point set P, the center of gravity µ of P, the covariance matrix Σ of P, point set P m deleted from P Output: principal components of P \ P m 1: compute the center of gravity µ m of P m 2: compute the covariance matrix Σ m of P m 3: compute the center of gravity µ ′ of P \ P m : 4: The above derivation of the new principal components is summarized in Algorithm 3.2.
Notice, that once having closed-form solutions, one can obtain similar algorithms, as presented in this chapter, when the continuous point sets are considered.Therefore, we will omit them in the next section and in the appendix.

Computing and updating the principal components efficientlymboxcontinuous case R 3
Here, we consider the computation of the principal components of a dynamic continuous point set.We present a closed form-solutions when the point set is a convex polytope or the boundary of a convex polytope in R 2 or R 3 .When the point set is the boundary of a convex polytope, we can update the new principal components in O(k) time, for both deletion and addition, under the assumption that we know the k facets in which the polytope changes.Under the same assumption, when the point set is a convex polytope in R 2 or R 3 , we can update the principal components in O(k) time after adding points.But, to update the principal components after deleting points from a convex polytope in R 2 or R 3 we need O(n) time.This is due to the fact that, after a deletion the center of gravity of the old convex hull (polyhedron) could lie outside the new convex hull, and therefore, a retetrahedralization is needed (see Subsection 4.1 and Subsection 6.2 for details).Due to better readability and compactness of the chapter, we present in this section only the closed-form solutions for a convex polytope in R 3 , and leave the rest of the results for the appendix.

Continuous PCA over a convex polyhedron in R 3
Let P be a point set in R 3 , and let X be its convex hull.We assume that the boundary of X is triangulated (if it is not, we can triangulate it in a preprocessing step).We choose an arbitrary point o in the interior of X, for example, we can choose o to be the center of gravity of the boundary of X.Each triangle from the boundary together with o forms a tetrahedron.Let the number of such formed tetrahedra be n.The k-th tetrahedron, with vertices ), for 0 ≤ s, t, u ≤ 1, and s + t + u ≤ 1.For 1 ≤ i ≤ 3, we use x i,j,k to denote the i-th coordinate of the vertex x j of the tetrahedron Q k .

269
Computing and Updating Principal Components of Discrete and Continuous Point Sets The center of gravity of the k-th tetrahedron is The contribution of each tetrahedron to the center of gravity of X is proportional to its volume.
If M k is the 3 × 3 matrix whose l-th row is x l,k − x 4,k , for l = 1 . . .3, then the volume of the k-th tetrahedron is We introduce a weight to each tetrahedron that is proportional to its volume, define as where v is the volume of X.Then, the center of gravity of X is The covariance matrix of the k-th tetrahedron is We would like to note that the above expressions hold also for any non-convex polyhedron that can be tetrahedralized.A star-shaped object, where o is the kernel of the object, is such example.

Adding points
We add points to P, obtaining a new point set P ′ .Let X ′ be the convex hull of P ′ .We consider that X ′ is obtained from X by deleting n d , and adding n a tetrahedra.Let The center of gravity of X ′ is Let Then, we can rewrite (15) as The i-th component of µ a and µ d ,1≤ i ≤ 3, is denoted by µ i,a and µ i,d , respectively.The 271 Computing and Updating Principal Components of Discrete and Continuous Point Sets Plugging-in the values of µ ′ i and µ ′ j in (17), we obtain: Plugging-in the values of µ ′ i and µ ′ j in (18), we obtain: From ( 25) and ( 26), we obtain Under certain assumptions, we can recompute the new principal components faster: • If we know that a certain point of the polyhedron will never be deleted, we can choose o to be that point.In that case, we also have the same closed-formed solution as for adding a point.
• Let the facets of the convex polyhedron have similar (uniformly distributed) area.We choose o to be the center of gravity of the polyhedron.Then, we can expect that after deleting a point, o will remain in the new convex hull.However, after several deletions, o could lie outside the convex hull, and then we need to recompute it and the tetrahedra associated with it.
Note that in the case when we consider boundary of a convex polyhedron (Subsection 6.1 and Subsection 6.3), we do not need an interior point o and the same time complexity holds for both adding and deleting points.

Conclusion
In this chapter, we have presented closed-form solutions for computing and updating the principal components of (dynamic) discrete and continuous point sets.The new principal components can be computed in constant time, when a constant number of points are added or deleted from the point set.This is a significant improvement of the commonly used approach, when the new principal components are computed from scratch, which takes linear time.The advantages of some of the theoretical results were verified and presented in the context of computing dynamic PCA bounding boxes in Dimitrov, Holst, Knauer & Kriegel (2009); Dimitrov et al. (2011).
An interesting open problem is to find a closed-form solutions for dynamical point sets different from convex polyhedra, for example, implicit surfaces or B-splines.An implementation of computing principal components in a dynamic, continuous setting could be a useful practical extension of the results presented here regarding continuous point sets.
Applications of the results presented here in other fields, like computer vision or visualization, are of high interest.

Appendix: Computing and updating the principal components efficientlycontinuous case
6.1 Continuous PCA over the boundary of a polyhedron in R 3 .
Let X be a polyhedron in R 3 .We assume that the boundary of X is triangulated (if it is not, we can triangulate it in a preprocessing), containing n triangles.The k-th triangle, with vertices x 1,k , x 2,k , x 3,k , can be represented in a parametric form by T k (s, t ( x 3,k − x 1,k ), for 0 ≤ s, t ≤ 1, and s + t ≤ 1.For 1 ≤ i ≤ 3, we denote by x i,j,k the i-th coordinate of the vertex x j of the triangle T k .The center of gravity of the k-th triangle is The contribution of each triangle to the center of gravity of the triangulated surface is proportional to its area.The area of the k-th triangle is We introduce a weight to each triangle that is proportional to its area, define as where a is the area of X.Then, the center of gravity of the boundary of X is The covariance matrix of the k-th triangle is The (i, j)-th element of Σ k , i, j ∈{1, 2, 3},is with µ =(µ 1 , µ 2 , µ 3 ).Finally, the covariance matrix of the boundary of X is

Adding points
We add points to X. Let X ′ be the new convex hull.We assume that X ′ is obtained from X by deleting n d , and adding n a tetrahedra.Then the sum of the areas of all triangles is

275
Computing and Updating Principal Components of Discrete and Continuous Point Sets www.intechopen.com The center of gravity of X ′ is Let Then, we can rewrite (29) as The i-th component of µ a and µ d ,1≤ i ≤ 3, is denoted by µ i,a and µ i,d , respectively.The Plugging-in the values of µ ′ i and µ ′ j in (31), we obtain: Plugging-in the values of µ ′ i and µ ′ j in (32), we obtain:

277
Computing and Updating Principal Components of Discrete and Continuous Point Sets From ( 39) and ( 40), we obtain can be computed in O(n a + n d ) time.

Deleting points
Let the new convex hull be obtained by deleting n d tetrahedra from and added n a tetrahedra to the old convex hull.Consequently, the same formulas and time complexity, as by adding points, follow.

Continuous PCA over a polygon in R 2
We assume that the polygon X is triangulated (if it is not, we can triangulate it in a preprocessing), and the number of triangles is n.The k-th triangle, with vertices ), for 0 ≤ s, t ≤ 1, and s + t ≤ 1.The center of gravity of the k-th triangle is The contribution of each triangle to the center of gravity of X is proportional to its area.The area of the i-th triangle is where × denotes the vector product.We introduce a weight to each triangle that is proportional to its area, define as where a is the area of X.Then, the center of gravity of X is The covariance matrix of the k-th triangle is

279
Computing and Updating Principal Components of Discrete and Continuous Point Sets www.intechopen.com

Adding points
We add points to X. Let X ′ be the new convex hull.We assume that X ′ is obtained from X by deleting n d , and adding n a triangles.Then the sum of the areas of all triangles is The center of gravity of X ′ is Let Then, we can rewrite (43) as The i-th component of µ a and µ d ,1≤ i ≤ 2, is denoted by µ i,a and µ i,d , respectively.The Plugging-in the values of µ ′ i and µ ′ j in (45), we obtain: From ( 53) and ( 54), we obtain can be computed in O(n a + n d ) time.
The covariance matrix of the boundary of X is

Adding points
We add points to X. Let X ′ be the new convex hull.We assume that X ′ is obtained from X by deleting n d , and adding n a line segments.Then the sum of the lengths of all line segments is The center of gravity of X ′ is Let Then, we can rewrite (57) as The i-th component of µ a and µ d ,1≤ i ≤ 2, is denoted by µ i,a and µ i,d , respectively.The Let Plugging-in the values of µ ′ i and µ ′ j in (59), we obtain: (66) Plugging-in the values of µ ′ i and µ ′ j in (60), we obtain: From ( 67) and ( 68

Deleting points
Let the new convex hull be obtained by deleting n d tetrahedra from and added n a tetrahedra to the old convex hull.Consequently, the same formulas and time complexity, as by adding points, follow.
www.intechopen.comComputingand Updating Principal Components of Discrete and Continuous Point Sets 9 Let the new convex hull be obtained by deleting n d tetrahedra from and added n a tetrahedra to the old convex hull.If the interior point o (needed for a tetrahedronization of a convex polytope), after several deletions, lies inside the new convex hull, then the same formulas and time complexity, as by adding points, follow.If o lie outside the new convex hull, then, we need to choose a new interior point o ′ , and recompute the new tetrahedra associated with it.Thus, we need in total O(n) time to update the principal components.
) can be computed in O(n a + n d ) time.273Computing and Updating Principal Components of Discrete and Continuous Point Sets www.intechopen.comDeleting points ).