Open access peer-reviewed chapter - ONLINE FIRST

# Variable Selection in Nonlinear Principal Component Analysis

Written By

Hiroko Katayama, Yuichi Mori and Masahiro Kuroda

Reviewed: February 16th, 2022 Published: April 10th, 2022

DOI: 10.5772/intechopen.103758

From the Edited Volume

## Principal Component Analysis [Working Title]

Prof. Fausto Pedro García Márquez


## Abstract

Principal component analysis (PCA) is a popular dimension reduction method and is applied to analyze quantitative data. For qualitative data, nonlinear PCA can be applied: the data are quantified by optimal scaling, which nonlinearly transforms qualitative data into quantitative data. Nonlinear PCA thereby reveals nonlinear relationships among variables with different measurement levels. Using this quantification, we can consider variable selection in the context of PCA for qualitative data. In PCA for quantitative data, the modified PCA (M.PCA) of Tanaka and Mori derives principal components that are computed as a linear combination of a subset of variables but reproduce all the variables very well. This means that M.PCA can select a reasonable subset of variables with different measurement levels if it is extended to deal with qualitative data using the idea of nonlinear PCA. A nonlinear M.PCA is therefore proposed for variable selection in nonlinear PCA. The method in this chapter is based on the idea in “Nonlinear Principal Component Analysis and its Applications” by Mori et al. (Springer). The performance of the method is evaluated in a numerical example.

### Keywords

• quantification
• categorical data
• modified PCA
• stepwise selection
• cumulative proportion
• RV-coefficient

## 1. Introduction

Principal component analysis (PCA) is a popular dimension reduction method and is applied to analyze quantitative data. To apply PCA to qualitative data, the data are quantified by optimal scaling, which nonlinearly transforms qualitative data into quantitative data. PCA with optimal scaling is called nonlinear PCA. Nonlinear PCA treats all qualitative variables uniformly as numerical variables by using their optimal scaling quantifications in the analysis; that is, it can deal with nonlinear relationships among variables with different measurement levels.

Using this quantification, we can consider variable selection in the context of PCA for qualitative data. In PCA for quantitative data, Tanaka and Mori discussed a method called modified PCA (M.PCA) that can be used to compute principal components (PCs) using only a selected subset of variables that represents all of the variables, including those not selected [1]. Since M.PCA includes variable selection procedures in the analysis, if we quantify all the qualitative variables by using the optimal scaling and then apply M.PCA to the quantified data, we can select a reasonable subset of variables from the qualitative data.

In this chapter, we follow Mori et al. [2] to revisit a variable selection problem in PCA for qualitative data. The method proposed here (we call it nonlinear M.PCA, or NL.M.PCA) is an extension of M.PCA that deals with a mixture of quantitative and qualitative data. In Section 2 we provide an overview of NL.M.PCA (optimization, the original M.PCA, and NL.M.PCA for qualitative data) based on the studies by Mori et al. [2], and in Section 3 we apply the method to customer engagement data [3] to show how it works on real data and how to use its output for variable selection, and to evaluate its performance.

## 2. Modified PCA for mixed measurement level data

### 2.1 Quantification of qualitative data

Since we wish to consider a variable selection problem in PCA, we must use a quantification method suitable for the PCA context. One of the best methods is the optimal scaling used in nonlinear PCA. Nonlinear PCA is a method for qualitative data that estimates the parameters of PCA and quantifies the qualitative variables simultaneously by alternating between estimation and quantification. PRINCIPALS of Young et al. [4] and PRINCALS of Gifi [5] are algorithms for nonlinear PCA. Here we use PRINCIPALS.

PRINCIPALS is an algorithm using the alternating least squares (ALS) method, as follows. Let $\mathbf{Y} = (\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_p)$ be a data matrix of $n$ objects by $p$ categorical variables, and let $\mathbf{y}_j$ of $\mathbf{Y}$ be a qualitative vector with $K_j$ categories labeled $1, \ldots, K_j$. PRINCIPALS minimizes the loss function

$$
\sigma_L(\mathbf{Z}, \mathbf{A}, \mathbf{Y}^*) = \mathrm{tr}\,(\mathbf{Y}^* - \hat{\mathbf{Y}})^\top(\mathbf{Y}^* - \hat{\mathbf{Y}}) = \mathrm{tr}\,(\mathbf{Y}^* - \mathbf{Z}\mathbf{A}^\top)^\top(\mathbf{Y}^* - \mathbf{Z}\mathbf{A}^\top), \tag{1}
$$

where $\mathbf{Y}^*$ is an optimally scaled matrix from $\mathbf{Y}$, $\mathbf{Z}$ is an $n \times r$ matrix of $n$ component scores on $r$ $(1 \le r \le p)$ components, and $\mathbf{A} = (\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_r)$ is a $p \times r$ weight matrix that gives the coefficients of the linear combinations. PRINCIPALS alternately makes two estimations: the model parameters $\mathbf{Z}$ and $\mathbf{A}$ for ordinary PCA, and the data parameter for the optimally scaled data $\mathbf{Y}^*$.

In the computation of PRINCIPALS, $\mathbf{Y}^*$ is standardized for each variable so as to satisfy the restrictions $\mathbf{Y}^{*\top}\mathbf{1}_n = \mathbf{0}_p$ and $\mathrm{diag}(\mathbf{Y}^{*\top}\mathbf{Y}^*/n) = \mathbf{I}_p$. We denote the value of $\theta$ estimated at the $t$-th iteration by $\theta^{(t)}$. Given initial data $\mathbf{Y}^{*(0)}$ (the observed data $\mathbf{Y}$ may be used as $\mathbf{Y}^{*(0)}$ after the above standardization), PRINCIPALS iterates the following two steps:

• Model estimation step: By solving the eigenvalue problem (EVP) of the covariance matrix $\mathbf{S}$ of $\mathbf{Y}^{*(t)}$,

$$
(\mathbf{S} - \lambda\mathbf{I})\,\mathbf{a} = \mathbf{0}, \tag{2}
$$

where $\lambda$ is an eigenvalue, obtain $\mathbf{A}^{(t+1)}$ and compute $\mathbf{Z}^{(t+1)} = \mathbf{Y}^{*(t)}\mathbf{A}^{(t+1)}$. Update $\hat{\mathbf{Y}}^{(t+1)} = \mathbf{Z}^{(t+1)}\mathbf{A}^{(t+1)\top}$.

• Optimal scaling step: Obtain $\mathbf{Y}^{*(t+1)}$ such that

$$
\mathbf{Y}^{*(t+1)} = \arg\min_{\mathbf{Y}^*}\ \mathrm{tr}\,(\mathbf{Y}^* - \hat{\mathbf{Y}}^{(t+1)})^\top(\mathbf{Y}^* - \hat{\mathbf{Y}}^{(t+1)}) \tag{3}
$$

for fixed $\hat{\mathbf{Y}}^{(t+1)}$ by separately estimating $\mathbf{y}_j^*$ for each variable $j$ under the measurement restrictions on each of the variables. That is, compute $\mathbf{q}_j^{(t+1)}$ for nominal variables as

$$
\mathbf{q}_j^{(t+1)} = (\mathbf{G}_j^\top\mathbf{G}_j)^{-1}\mathbf{G}_j^\top\hat{\mathbf{y}}_j^{(t+1)}, \tag{4}
$$

where $\mathbf{q}_j$ is a $K_j \times 1$ category score vector for $\mathbf{y}_j^*$ and $\mathbf{G}_j$ is an $n \times K_j$ indicator matrix

$$
\mathbf{G}_j = (g_{jik}) = \begin{pmatrix} g_{j11} & \cdots & g_{j1K_j} \\ \vdots & \ddots & \vdots \\ g_{jn1} & \cdots & g_{jnK_j} \end{pmatrix} = (\mathbf{g}_{j1}, \ldots, \mathbf{g}_{jK_j}), \tag{5}
$$

where

$$
g_{jik} = \begin{cases} 1 & \text{if object } i \text{ belongs to category } k, \\ 0 & \text{if object } i \text{ belongs to some other category } k'\ (\ne k), \end{cases} \tag{6}
$$

and then the optimally scaled vector $\mathbf{y}_j^*$ is obtained by $\mathbf{y}_j^* = \mathbf{G}_j\mathbf{q}_j$.

Re-compute $\mathbf{q}_j^{(t+1)}$ for ordinal variables using monotone regression [6]. For nominal and ordinal variables, update $\mathbf{y}_j^{*(t+1)} = \mathbf{G}_j\mathbf{q}_j^{(t+1)}$ and standardize $\mathbf{y}_j^{*(t+1)}$. For numerical variables, standardize the observed vector $\mathbf{y}_j$ and set $\mathbf{y}_j^{*(t+1)} = \mathbf{y}_j$.

These two steps are alternately iterated until convergence; the $\mathbf{y}_j^*$ obtained at convergence are the quantified variables, while $\mathbf{A}$ and $\mathbf{Z}$ are the solutions of PCA for the qualitative data.
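As a concrete illustration, the two steps above can be sketched in Python for the all-nominal case. This is a minimal sketch of the ALS iteration under our own naming (not the authors' implementation); monotone regression for ordinal variables and the pass-through for numerical variables are omitted, and every observed category is assumed non-empty.

```python
import numpy as np

def standardize(Y):
    """Center and scale each column so that Y'1 = 0 and diag(Y'Y / n) = I."""
    Y = Y - Y.mean(axis=0)
    return Y / np.sqrt((Y ** 2).mean(axis=0))

def principals(G_list, r, max_iter=200, tol=1e-9):
    """ALS iteration of PRINCIPALS, all variables treated as nominal.

    G_list : list of (n, K_j) indicator matrices G_j, one per variable
    r      : number of components
    Returns the quantified data Y*, component scores Z, and weights A.
    """
    n = G_list[0].shape[0]
    # Initial quantification Y*(0): category labels used as provisional scores
    Y = standardize(np.column_stack(
        [G.argmax(axis=1).astype(float) for G in G_list]))
    prev = np.inf
    for _ in range(max_iter):
        # Model estimation step: EVP (2) of the covariance matrix of Y*(t)
        S = Y.T @ Y / n
        lam, vec = np.linalg.eigh(S)            # eigenvalues in ascending order
        A = vec[:, np.argsort(lam)[::-1][:r]]   # eigenvectors of the r largest
        Z = Y @ A
        Y_hat = Z @ A.T
        # Optimal scaling step: category scores by Eq. (4), then y*_j = G_j q_j
        for j, G in enumerate(G_list):
            q = np.linalg.solve(G.T @ G, G.T @ Y_hat[:, j])
            Y[:, j] = G @ q
        Y = standardize(Y)
        loss = np.sum((Y - Y_hat) ** 2)         # loss function (1)
        if abs(prev - loss) < tol:              # converged
            break
        prev = loss
    return Y, Z, A
```

At convergence, the columns of the returned `Y` play the role of the quantified variables $\mathbf{y}_j^*$, and `Z`, `A` are the PCA solution for the qualitative data.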

### 2.2 Modified PCA

M.PCA of Tanaka and Mori [1] derives PCs that are computed using only a selected subset of variables but represent all of the variables, including those not selected. This means that M.PCA naturally includes variable selection procedures in its estimation process. Although there are several variable selection methods in PCA, we use M.PCA because a subset of variables selected by M.PCA can represent all the variables very well, and because it is easy to incorporate the quantification method of Section 2.1 into M.PCA, as will be described in Section 2.3.

Suppose we obtain an $n \times p$ data matrix $\mathbf{Y}$ that consists of numerical variables or optimally quantified variables. Let $\mathbf{Y}$ be decomposed into an $n \times q$ submatrix $\mathbf{Y}_1$ and an $n \times (p-q)$ submatrix $\mathbf{Y}_2$ $(1 \le q \le p)$. $\mathbf{Y}$ is represented by $r$ PCs that are linear combinations of the submatrix $\mathbf{Y}_1$, that is, $\mathbf{Z} = \mathbf{Y}_1\mathbf{A}$, where $r$ is the number of PCs $(1 \le r \le q)$. To derive $\mathbf{A} = (\mathbf{a}_1, \ldots, \mathbf{a}_r)$, the following Criterion 1 based on Rao [7] and Criterion 2 based on Robert and Escoufier [8] can be used:

(Criterion 1) The prediction efficiency of $\mathbf{Y}$ is maximized using a linear predictor in terms of $\mathbf{Z}$.

(Criterion 2) The closeness of the configurations of $\mathbf{Y}$ and $\mathbf{Z}$ is maximized using the RV-coefficient.

We denote the covariance matrix of $\mathbf{Y} = (\mathbf{Y}_1, \mathbf{Y}_2)$ as

$$
\mathbf{S} = \begin{pmatrix} \mathbf{S}_{11} & \mathbf{S}_{12} \\ \mathbf{S}_{21} & \mathbf{S}_{22} \end{pmatrix},
$$

where the subscript $i$ of $\mathbf{S}$ corresponds to $\mathbf{Y}_i$. The maximization criteria for the above Criterion 1 and Criterion 2 are given by the proportion $P$,

$$
P = \sum_{j=1}^{r} \lambda_j \Big/ \mathrm{tr}\,\mathbf{S}, \tag{7}
$$

and the RV-coefficient,

$$
RV = \left( \sum_{j=1}^{r} \lambda_j^2 \Big/ \mathrm{tr}\,\mathbf{S}^2 \right)^{1/2}, \tag{8}
$$

respectively, where $\lambda_j$ is the $j$-th largest eigenvalue of the EVP

$$
\left[ (\mathbf{S}_{11}^2 + \mathbf{S}_{12}\mathbf{S}_{21}) - \lambda\mathbf{S}_{11} \right] \mathbf{a} = \mathbf{0}. \tag{9}
$$

The solution is obtained as a matrix $\mathbf{A}$ whose columns consist of the eigenvectors associated with the $r$ largest eigenvalues of EVP (9), and the $\mathbf{Y}_1$ that provides the largest value of $P$ or $RV$ is the best subset of $q$ variables among all possible subsets of size $q$. Thus, to obtain a reasonable subset of size $q$ in PCA, one applies M.PCA to the data and finds the subset $\mathbf{Y}_1$ of size $q$ that has the largest $P$ or $RV$. The selected subset $\mathbf{Y}_1$ is reasonable in the sense of PCA because its PCs retain information not only on the selected variables $\mathbf{Y}_1$ but also on the deleted ones $\mathbf{Y}_2$.
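For a given candidate subset, the criteria above reduce to a generalized symmetric-definite eigenvalue problem. The following sketch (our own illustration with NumPy/SciPy, not the authors' code; the function name is an assumption) solves EVP (9) and evaluates $P$ of Eq. (7) and $RV$ of Eq. (8):

```python
import numpy as np
from scipy.linalg import eigh

def mpca_criteria(S, subset, r):
    """Evaluate proportion P and RV-coefficient for the variables in `subset`.

    S      : (p, p) covariance matrix of the (quantified) data
    subset : indices of the q variables forming Y1
    r      : number of principal components
    """
    idx1 = np.asarray(list(subset))
    idx2 = np.setdiff1d(np.arange(S.shape[0]), idx1)
    S11 = S[np.ix_(idx1, idx1)]
    S12 = S[np.ix_(idx1, idx2)]
    B = S11 @ S11 + S12 @ S12.T            # S11^2 + S12 S21
    # Generalized EVP (9): B a = lambda S11 a; eigh returns ascending order
    lam = eigh(B, S11, eigvals_only=True)[::-1][:r]
    P = lam.sum() / np.trace(S)                        # Eq. (7)
    RV = np.sqrt((lam ** 2).sum() / np.trace(S @ S))   # Eq. (8)
    return P, RV
```

When `subset` contains all $p$ variables, $\mathbf{S}_{12}$ is empty and EVP (9) reduces to the ordinary EVP (2), so $P$ coincides with the usual cumulative proportion of PCA.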

### 2.3 Modified PCA for mixed measurement level data

M.PCA is a good method to find a reasonable subset of numerical variables, as described in the previous section. To select variables from mixed measurement level data using a criterion of M.PCA, the qualitative/categorical variables in the data should be quantified in an appropriate manner. Based on the original idea in ref. [9], and considering PRINCIPALS in Section 2.1 and M.PCA in Section 2.2, it is easy to incorporate the quantification (PRINCIPALS) into M.PCA: we can formulate M.PCA for qualitative data simply by replacing the EVP (2) in the Model estimation step of PRINCIPALS by the EVP (9) to obtain the model parameters $\mathbf{A}$ and $\mathbf{Z}$ for M.PCA. Thus, M.PCA and optimal scaling are alternately executed until $\theta = \mathrm{tr}\,(\mathbf{Y}^* - \hat{\mathbf{Y}})^\top(\mathbf{Y}^* - \hat{\mathbf{Y}}) = \mathrm{tr}\,(\mathbf{Y}^* - \mathbf{Z}\mathbf{A}^\top)^\top(\mathbf{Y}^* - \mathbf{Z}\mathbf{A}^\top)$ is minimized. This is nonlinear M.PCA, or NL.M.PCA.

Here, we rewrite the ALS algorithm of PRINCIPALS as follows: given initial data $\mathbf{Y}^{*(0)} = (\mathbf{Y}_1^{*(0)}, \mathbf{Y}_2^{*(0)})$ from the original data $\mathbf{Y}$, the following two steps are iterated until convergence:

• Model estimation step: From $\mathbf{Y}^{*(t)} = (\mathbf{Y}_1^{*(t)}, \mathbf{Y}_2^{*(t)})$, obtain $\mathbf{A}^{(t)}$ by solving the EVP (9). Compute $\mathbf{Z}^{(t)} = \mathbf{Y}_1^{*(t)}\mathbf{A}^{(t)}$. Update $\hat{\mathbf{Y}}^{(t+1)} = \mathbf{Z}^{(t)}\mathbf{A}^{(t)\top}$.

• Optimal scaling step: Obtain $\mathbf{Y}^{*(t+1)}$ for fixed $\hat{\mathbf{Y}}^{(t+1)}$ by separately estimating $\mathbf{y}_j^*\ (= \mathbf{G}_j\mathbf{q}_j)$ for each variable $j$ under the measurement restrictions. Re-compute $\mathbf{y}_j^{*(t+1)}$ by an additional transformation to keep the monotonicity restriction for ordinal variables, and skip this computation for numerical variables.

The $\mathbf{Y}^* = (\mathbf{Y}_1^*, \mathbf{Y}_2^*)$ obtained after convergence is an optimally scaled (quantified) matrix of $\mathbf{Y}$; $\mathbf{Y}_1^*$, corresponding to $\mathbf{Y}_1$, is the subset to be selected, and $\mathbf{Y}_2^*$, corresponding to $\mathbf{Y}_2$, is the one to be deleted.

The NL.M.PCA procedure for fixed $q$ is as described above, but since the variable selection performs the M.PCA computation for each $q = p, \ldots, r$ and for each of the $\binom{p}{q}$ candidate subsets to find the best $\mathbf{Y}_1$, there are three possible types of selection according to where the quantification is implemented in the computation flow (see Fig. 4.1 in [2]).

The first type (Type 1) performs the quantification only once at the start: nonlinear PCA is applied to the data $\mathbf{Y}$ to obtain the quantified data $\mathbf{Y}^*$, and ordinary M.PCA selection is applied to $\mathbf{Y}^*$. No further quantification is carried out in the selection stage. The second type (Type 2) carries out the quantification every time after the best subset of size $q$ is found in the selection stage. That is, the quantified $(\mathbf{Y}_1^*, \mathbf{Y}_2^*)$ based on the best subset of size $q$ found in the previous selection is used to find the best subset of size $q-1$ or $q+1$ in the next selection. The third type (Type 3) carries out the quantification for every temporary $(\mathbf{Y}_1, \mathbf{Y}_2)$ in the selection stage; that is, NL.M.PCA is performed whenever a temporary $(\mathbf{Y}_1, \mathbf{Y}_2)$ is given, to compute its criterion value.

A reasonable subset of size $q$ is given as the $\mathbf{Y}_1^*$ corresponding to the best subset $\mathbf{Y}_1$ finally found at $q$ when the selection procedure terminates.
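As an outline, the backward stage of such a stepwise search can be skeletonized as follows. This is our own Type 1-style sketch (quantification assumed done once beforehand, proportion $P$ as the criterion, NumPy only, names are ours); the actual procedure in [2] also includes forward steps and the Type 2/3 re-quantification.

```python
import numpy as np

def backward_stepwise(S, r, q_min):
    """Backward elimination for q = p, p-1, ..., q_min using proportion P.

    S : covariance matrix of the quantified data (Type 1: fixed beforehand)
    Returns the selection path as a list of (q, P, subset) triples.
    """
    def prop(subset):
        idx1 = np.asarray(subset)
        idx2 = np.setdiff1d(np.arange(S.shape[0]), idx1)
        S11 = S[np.ix_(idx1, idx1)]
        S12 = S[np.ix_(idx1, idx2)]
        B = S11 @ S11 + S12 @ S12.T                  # matrix of EVP (9)
        lam = np.sort(np.linalg.eigvals(
            np.linalg.solve(S11, B)).real)[::-1]     # generalized eigenvalues
        return lam[:r].sum() / np.trace(S)           # proportion P, Eq. (7)

    subset = list(range(S.shape[0]))
    path = [(len(subset), prop(subset), tuple(subset))]
    while len(subset) > q_min:
        # remove the variable whose deletion keeps P largest
        best_P, drop = max((prop([v for v in subset if v != j]), j)
                           for j in subset)
        subset.remove(drop)
        path.append((len(subset), best_P, tuple(subset)))
    return path
```

The returned path plays the role of Table 3 in the numerical example: one row per $q$, with the criterion value and the retained subset.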

## 3. A numerical example

### 3.1 Data

The data we analyze here were gathered in a survey about the relationships among customer engagement with “fashion,” “brand,” and “shop staff” [3]. The questions (variables) are divided into three groups based on the purposes for consumption: “Involvement” (16 variables), “Expectations” (35 variables), and “Values” (34 variables). The total number of questions is 85, on a five-point scale, and 825 responses were obtained. Ohyabu et al. [3] analyzed these data to find the structure of customer consciousness, but we use them simply as sample data for variable selection in PCA, without considering the original purpose in ref. [3]. Here we apply NL.M.PCA to the second question group, “Expectations” (35 variables), to show the performance of the proposed method. The questions asked in the survey are indicated in the “Question” column of Table 1, and the answers (responses) are shown in Table 2.

| Group | Item | Question | q = 25 |
| --- | --- | --- | --- |
| Fashion | Q1 | I think about fashion by putting on clothes or choosing the clothes. | × |
| Fashion | Q2 | I think about fashion by putting on clothes or choosing the clothes. | × |
| Fashion | Q3 | I want to know about fashion by putting on clothes or choosing clothes. | × |
| Fashion | Q4 | I’m enthusiastic when I think about fashion. | × |
| Fashion | Q5 | I’m happy with thinking about fashion. | × |
| Fashion | Q6 | I feel relaxed when I think about fashion. | × |
| Fashion | Q7 | I’m proud of my fashions when I think about fashion. | × |
| Fashion | Q8 | I spend a lot of one’s time when I think about the fashions. | |
| Fashion | Q9 | I talk about fashion with my friends. | |
| Fashion | Q10 | I’m checking about SNS or writing comment for fashion. | × |
| Fashion | Q11 | I’m posting about a fashion to SNS. | × |
| Brand | Q12 | I think about the brand by putting on clothes or choosing the clothes. | |
| Brand | Q13 | I think about the brand by putting on clothes or choosing the clothes. | × |
| Brand | Q14 | I want to know about the brand by putting on clothes or choosing the clothes. | × |
| Brand | Q15 | I’m enthusiastic when I think about the brand. | × |
| Brand | Q16 | I’m happy when I think about the brand. | |
| Brand | Q17 | I feel relaxed good when I think about the brand. | × |
| Brand | Q18 | I’m proud when I think about the brand. | × |
| Brand | Q19 | I spend a lot of one’s time when I think about the brand. | × |
| Brand | Q20 | I always use a specific brand when I wear or choose clothes. | |
| Brand | Q21 | I always use the brand when I clothes or choice of clothes. | × |
| Brand | Q22 | I’m checking about SNS or writing comment for the brand. | × |
| Brand | Q23 | I’m posting about a brand to an SNS of mine. | × |
| Shop staff | Q24 | I think about the staff member by talking to other staff. | × |
| Shop staff | Q25 | I think about staff members when I speak to other shop staff. | |
| Shop staff | Q26 | I want to know more about shop staff by speaking. | × |
| Shop staff | Q27 | I’m enthusiastic when I’m talking with staff members. | |
| Shop staff | Q28 | I’m happy when I’m talking with staff members. | × |
| Shop staff | Q29 | I feel relaxed when I’m talking with staff members. | × |
| Shop staff | Q30 | I’m proud when I’m talking with shop staff. | |
| Shop staff | Q31 | I spend a lot of time talking with shop staff. | × |
| Shop staff | Q32 | I always talk to the specific staff member when choosing clothes or putting on clothes. | |
| Shop staff | Q33 | I always talk the specific staff member. | × |
| Shop staff | Q34 | I’m checking about the specific staff member of SNS or writing comment for the brand. | |
| Shop staff | Q35 | I’m posting about the specific staff member to my SNS. | × |

### Table 1.

The 35 questions in “Expectations” and the 25 selected ones (marked by × in the right column).

### Table 2.

Expectation data (825 responses on 35 variables).

### 3.2 Output from NL.M.PCA

Table 3 shows the output of NL.M.PCA applied to the Expectations data with $r = 5$ PCs, the proportion $P$ as the criterion, and forward-backward stepwise selection with Type 3 quantification as the selection procedure. The number $q$ is the number of selected variables and the value $P$ is the criterion value. The column $\mathbf{Y}_1 \mid \mathbf{Y}_2$ shows, on the left side of each row, the question numbers to be selected ($\mathbf{Y}_1$) and, on the right side, those to be deleted ($\mathbf{Y}_2$). If you have a specific number $q$ of variables to be used, such as 20 or 10, or a fraction of the 35 variables (two-thirds ≈ 24, half ≈ 18), you can use the variables whose numbers are displayed in $\mathbf{Y}_1$ at that $q$. If the number of variables to be used is not determined, the proportion $P$ can be used. For example, since the proportion $P$ is 66.95% with all 35 variables, if you want to keep $P$ at or above 65%, looking at the row with $P = 0.6512$ (i.e., $q = 20$), you can use the 20 variables in $\mathbf{Y}_1$. Alternatively, if the difference between the proportion with all 35 variables and that with the selected variables should be less than 1%, 25 variables can be used, because $0.6695 - 0.01 = 0.6595$, which is the $P$ value at $q = 25$. Figure 1 shows the change of $P$ for every $q$. This graph can be used as guidance for determining the number of variables: if there is a large drop in $P$, the number of variables just before that point can be used (for these data, no particular drop is observed).

### Table 3.

Selection results (Expectations data; $r = 5$, proportion $P$, forward-backward stepwise selection, Type 3).

When using $RV$, the same considerations apply, and scatter plots can also be examined to see how close the configurations are.

### 3.3 Results of variable selection

Here we select a subset of variables from the 35 variables of the Expectations data, focusing on the loss of the proportion $P$. Suppose we want to keep the loss under 1%; then $q = 25$, which is obtained from Table 3 and Figure 1. The selected variables are marked by × in the right column of Table 1. Looking at the variables deleted from each block, two variables {8, 9} of the 11 variables in the “fashion” block, three variables {12, 16, 20} of the 12 variables in the “brand” block, and five variables {25, 27, 30, 32, 34} of the 12 variables in the “shop staff” block are deleted. That is, nine variables are selected from each of the first two blocks and seven from the third block. It can be stated that the proposed method selects a reasonable subset of variables. Comparing the number of deleted variables in the three blocks, a slightly larger number of variables are removed from the third block, so the questions on “shop staff” appear to carry less information than those in the other two blocks, and some of them contribute less to the prediction efficiency. From this point of view, we can evaluate the usefulness of each question in the questionnaire.

To evaluate the significance of the variables, we observe how many times each variable is selected through the selections for $q = 35, \ldots, 5$. Extracting the variables selected in more than two-thirds of the selections (24 or more times), for example, in the “fashion” block, variables {1, 6, 10, 11} were selected. Given that similar questions are located close to each other (Q1 to Q3: recognition of fashion; Q4 to Q7: consciousness of fashion; Q8 to Q11: activity on fashion), it is clear that NL.M.PCA using the proportion $P$ selects variables in a well-balanced way from these groups of similar questions. If the most frequently selected variables (such as the above four items) are considered the most important questions, they should be included in future surveys; variables selected only a few times need not be. In this way, the selection results can also be used to evaluate the questionnaire itself.

## 4. Concluding remarks

We reconsider a variable selection problem in PCA for qualitative data based on the idea of Mori et al. [2]. To deal with qualitative data, we apply optimal scaling with the ALS algorithm [4] to the qualitative data. For variable selection in PCA, we use the criteria of the M.PCA of Tanaka and Mori [1] on the optimally quantified data. That is, the proposed method is an extension of M.PCA that incorporates optimal scaling into M.PCA so as to select a subset of qualitative variables. Since the quantification is done separately for each variable, the method can select a subset of variables from mixed measurement level data.

We apply this method to real data from a customer engagement study [3] to select a subset of qualitative variables using a criterion that maximizes the prediction efficiency. When there is no preassigned number of variables to be selected, we suggest specifying the number so that the maximum loss of efficiency does not exceed a certain percentage.

As a result, variables are selected in a well-balanced manner from questions asking about similar content, and the selected subset therefore provides as much information as possible. The nonlinear M.PCA is expected to work well for any mixed measurement level data.

## References

1. 1. Tanaka Y, Mori Y. Principal component analysis based on a subset of variables: Variable selection and sensitivity analysis. American Journal of Mathematics and Management Sciences. 1997;17(1 & 2):61-89
2. 2. Mori Y, Kuroda M, Makino N. Nonlinear Principal Component Analysis and its Applications (JSS Research Series in Statistics). Singapore: Springer; 2017
3. 3. Ohyabu R, Kuroda M, Seino S, Zhang Z. Exploring interplay among customer engagements with multiple objects, the 6th Naples forum on service. Service Dominant Logic, Network & Systems Theory and Service Science: Integrating three Perspectives for a New Service Agenda. 2019:103
4. 4. Young FW, Takane Y, de Leeuw J. Principal components of mixed measurement level multivariate data: An alternating least squares method with optimal scaling features. Psychometrika. 1978;43:279-281
5. 5. Gifi A. Nonlinear Multivariate Analysis. Chichester: Wiley; 1990
6. 6. Kruskal JB. Nonmetric multidimensional scaling: A numerical method. Psychometrika. 1964;29:115-129
7. 7. Rao CR. The use and interpretation of principal component analysis in applied research. Sankhya. 1964;A26:329-358
8. 8. Robert P, Escoufier Y. A unifying tool for linear multivariate statistical methods: The RV-coefficient. Applied Statistics. 1976;A25:257-265
9. 9. Mori Y, Tanaka Y, Tarumi T. Principal component analysis based on a subset of qualitative variables. In: Hayashi C, editor. Data Science, Classification and Related Methods. Springer. 1997:547-554
