The Role of Reviews in Decision-Making

With the rise of social media such as blogs and social networks, the interpersonal communication expressed through online reviews has become an increasingly influential source of information for both managers and consumers. In-depth purchasing-related information is now available to marketers. We can use this new source of information to understand how consumers evaluate products and make decisions about them. Because reviews are text data, new ways of analyzing the data are needed, and text mining fills this role together with traditional statistical methods. With these methods, we can examine the contents of reviews and identify the key areas that influence consumers’ decision-making.


Introduction
We have entered a technological era in which people and technology are closely integrated. It took the Internet only 4 years to reach 50 million people, whereas the telephone needed 75 years. The majority of Generation Z, in particular, grew up using the Internet and social media. Overall, more than half of the world's population is online. The rise of the Internet, and especially of social media, has changed the way people communicate and interact with each other. With the rise of social media such as blogs and social networks, the interpersonal communication expressed through online reviews has become an increasingly influential source of information for both managers and consumers. With the rapid growth of consumer comments on the Internet, in-depth purchasing-related information is available to marketers. The wide availability of lengthy and numerous text-based online reviews provides a treasure trove of information that can potentially reveal a much wider set of variables that determine consumers' attitudes toward and evaluations of products. There has been a considerable amount of research on how to use this information. In [1], the authors investigated consumers' usage of online recommendation sources and their influence on online product choices. Later, in [2], Kumar and Benbasat used empirical evidence to demonstrate the influence of recommendations and online reviews on consumers' perceptions of the usefulness and social presence of websites. For firms, online consumer reviews can provide valuable insights and help them improve their products accordingly.
Many research articles (for example, see [3][4][5][6]) have also tried to identify the variables that affect individuals' decisions to recommend a product or not. By their very nature, these studies are able to identify only a limited number of such determinant variables. In particular, customer satisfaction has been linked to recommendation to others: in [7], Ladhari et al. identified three drivers (perceived service quality, emotional satisfaction, and image) that are positively related to each other and positively influence loyalty and recommendation. However, almost all previous studies have used numeric variables, so only a limited number of determinants have been studied.
Building on this previous work, we use a text-mining method, in contrast to the traditional survey method, to identify the important product dimensions that are closely related to quality and thus to consumers' attitudes toward the products.

Methodology
In this section we describe the methods that we use to analyze text content. Text mining has become a standard procedure for dealing with text, and the detailed process is listed here for educational purposes.
Text classification is a supervised learning process that predicts the class of a document based on a set of features describing the document [8]. In contrast to unsupervised learning, the categories are predefined. The prediction model is learned automatically from a training set and can then be used to predict new cases. Text classification uses various machine-learning algorithms to classify sentence-based text documents into one of the predefined categories. Suppose we have a set of documents, which could be reviews posted on websites by consumers, emails from various users, etc. Each document is represented by a vector of attributes $(X_1, X_2, \dots, X_n)$. Each document belongs to one of the predefined categories, $Y \in \{Y_1, Y_2, \dots, Y_m\}$. The attributes are usually term weights obtained from indexing, which is discussed in detail in the following sections. In most cases, we deal with binary situations. Different machine-learning algorithms can be used to predict the class of the document, $Y = f(X)$. Popular machine-learning algorithms such as Naïve Bayes, multinomial Naïve Bayes, decision trees, and SVM have been applied to text classification problems. Witten and Frank give a detailed description of these common methods if further information is needed.

Preprocessing
Before the learning methods are applied, several preprocessing steps are necessary to put the data into a format ready for analysis. The preprocessing of the raw data includes raw text tokenization, case conversion, stop-word removal, and stemming.
First, the raw text is divided into tokens (single words, special symbols, etc.), using whitespace (space, tab, newline character, etc.) as the separator to break the entire review document into tokens. For example, suppose we have the document "I like iPhone. It is the first phone I got and I really like the appearance." The tokenization step breaks this text into tokens such as "I," "like," "iPhone," "got," etc. Second, all words are converted to lower case (case conversion). In our example, the letter "P" is converted to "p" and the word "iPhone" is converted to "iphone." The purpose of case conversion is to reduce the number of redundant words by mapping all variants to lower case. The third step is stop-word removal. Its purpose is to reduce the size of the classification matrix by reducing the number of irrelevant terms. Many overly common words such as "the," "I," "to," etc., are useless for classifying a document into the predefined categories. The efficiency and accuracy of the classification can be improved by removing these words. In our study, a general stop-word list, consisting of standard stop words with manual adaptations, is applied. The last preprocessing step is stemming. Word variations are conflated into a single representative form called the stem. For example, "connect" is the stem of "connected," "connection," "connecting," etc. Stemming significantly reduces the number of features and increases retrieval performance [9]. Here we use a dictionary-based stemmer, which is commonly used in text mining. When a term is not recognized, we use standard decision rules to assign the word a stem.
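The four preprocessing steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the stop-word list `STOP_WORDS`, the stemming dictionary `STEM_DICT`, and the suffix rule in `fallback_stem` are hypothetical stand-ins, since the adapted stop-word list and the dictionary-based stemmer are not specified in the text.

```python
# Illustrative stop-word list (assumption; the study's adapted list is not given).
STOP_WORDS = {"i", "it", "is", "the", "and", "to", "a", "of"}

# Toy stemming dictionary with hypothetical entries.
STEM_DICT = {
    "connected": "connect",
    "connection": "connect",
    "connecting": "connect",
    "got": "get",
}


def fallback_stem(word):
    """Simple suffix-stripping rule for words missing from the dictionary."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    # 1. Tokenize on whitespace, then trim punctuation from token edges.
    tokens = [t.strip(".,!?;:\"'()") for t in text.split()]
    tokens = [t for t in tokens if t]
    # 2. Case conversion.
    tokens = [t.lower() for t in tokens]
    # 3. Stop-word removal.
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # 4. Stemming: dictionary lookup first, rule-based fallback otherwise.
    return [STEM_DICT.get(t, fallback_stem(t)) for t in tokens]


doc = "I like iPhone. It is the first phone I got and I really like the appearance."
print(preprocess(doc))
# ['like', 'iphone', 'first', 'phone', 'get', 'really', 'like', 'appearance']
```

A production setting would more likely rely on an established stemmer and a curated stop-word list; the toy versions here only make the order of the steps concrete.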

Indexing
The result so far is a high-dimensional term-by-document matrix in which each cell represents the raw frequency of appearance of a term in a document. The rows of the matrix correspond to terms (usually words), and the columns represent documents (reviews, for example). In [10], Spärck Jones showed that retrieval performance improves significantly when weighted term vectors are used. The term weight is often determined by the product of the term frequency (TF) and the inverse document frequency (IDF) introduced by Spärck Jones [11].
The TF measures the frequency of occurrence of an indexed term in the document [12]. The higher the frequency, the more important the term is in characterizing the document. The frequency of occurrence of an indexed word is therefore used to indicate term importance for content representation (for example, see [13][14][15]). In our study, the TF is the raw term frequency. However, not every word appears equally often across the whole set of review documents; some words are naturally more frequent than others. The more rarely a term occurs in a document collection, the more discriminating that term is. The weight of a term is therefore inversely related to the number of documents in which it appears, and the IDF is used to take this effect into account. The logarithm is taken to dampen the effect of the raw IDF factor.
Finally, the total weight of a term $i$ in document $j$ is given by
$$w_{ij} = TF_{ij} \times IDF_i$$
Here, $TF_{ij}$ is the term frequency of term $i$ in document $j$, and $IDF_i$ is the inverse document frequency of term $i$.
Mathematically, $TF_{ij} = n_{ij}$, where $n_{ij}$ is the frequency of term $i$ in document $j$, and $IDF_i = \log_2(n / df_i) + 1$, where $n$ is the total number of documents in the entire review collection and $df_i$ is the number of review documents in which term $i$ is present.
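As a small worked illustration of this weighting, the sketch below multiplies the raw term frequency by $\log_2(n / df_i) + 1$ for each term. The function name `tfidf_weights` and the four toy documents are assumptions made for the example; they are not data from the study.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n = len(docs)
    # df[term] = number of documents in which the term is present.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log2(n / d) + 1 for term, d in df.items()}
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term frequency n_ij
        weights.append({term: count * idf[term] for term, count in tf.items()})
    return weights


# Toy corpus of already preprocessed reviews (illustrative only).
docs = [
    ["like", "iphone", "like", "appearance"],
    ["like", "room"],
    ["room", "staff"],
    ["staff", "like"],
]
w = tfidf_weights(docs)
print(w[0]["iphone"])  # 3.0: appears once, in 1 of 4 documents (log2(4) + 1)
print(w[1]["room"])    # 2.0: appears once, in 2 of 4 documents (log2(2) + 1)
```

A term such as "like", which occurs in three of the four toy documents, receives a lower IDF than "iphone", so a frequent but widespread word does not dominate the representation.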

Multi-word phrases
Tokenization so far yields a term-by-document matrix in which each entry is the frequency of a single word. In many cases, multi-word phrases are also important because phrases carry more complete context information than individual words. The most popular class of features used for text classification is therefore n-grams [16]. Word n-grams include single words (unigrams) and higher-order n-grams such as bi-grams and tri-grams. Word n-grams have been used effectively in various studies. Unigrams to tri-grams have typically been used in text mining, and large sets of n-gram phrases require subsequent attribute selection to reduce the dimensionality [17,18]. For instance, take the sentence "I like iPhone." It has three unigrams, "I," "like," and "iPhone"; two bi-grams, "I like" and "like iPhone"; and one tri-gram, "I like iPhone." In most cases, however, multi-word phrases are not widely used because of their low frequency.
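The n-gram example can be reproduced with a few lines of code. The helper name `word_ngrams` is hypothetical and serves only to illustrate how unigrams, bi-grams, and tri-grams are enumerated from a token sequence.

```python
def word_ngrams(tokens, max_n=3):
    """Return all word n-grams of order 1..max_n, grouped by order."""
    return {
        n: [" ".join(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]
        for n in range(1, max_n + 1)
    }


grams = word_ngrams(["I", "like", "iPhone"])
print(grams[1])  # ['I', 'like', 'iPhone']
print(grams[2])  # ['I like', 'like iPhone']
print(grams[3])  # ['I like iPhone']
```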

Dimensionality reduction
The weighted term-by-document matrix is high-dimensional because of the many distinct terms. Moreover, it is very sparse, with many zeros, since not all documents contain all terms. High attribute dimensionality incurs a high computational cost and, more seriously, causes over-fitting in many classification methods. We choose the Gini index as our method for attribute selection because it is based on the distinguishing ability of a word as well as on its importance.
The Gini index was proposed and studied by Aggarwal and Chen [19]. It aims to decide which feature variables are decision variables for a decision support application. In the training data, the key decision variables are identified and used to train a model that predicts the decision classes. The training dataset $D_{train}$ contains $n$ reviews, and each review $q$ belongs to a predefined class with label $s$ drawn from the set $\{1, \dots, k\}$. Overall we have a $d \times n$ feature-review matrix, in which each feature is denoted by $i$, with $i$ ranging from 1 to $d$, and each review is denoted by $q$, with $q$ ranging from 1 to $n$. In our case the labels are binary: recommend or not. The Gini index is then calculated for each feature to measure its level of class discrimination among the data points. We can then use the Gini index to find the key features that are important to the decisions. A larger Gini index indicates a higher discriminating ability of the word, so we set a threshold and keep only attributes with a high Gini index. In previous research, the frequency of occurrence of an indexed word has been used to indicate term importance for content representation [13][14][15], so we set a second threshold and select attributes based on frequency.
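The two-threshold selection can be sketched as below. The exact Gini formula of [19] is not reproduced in this text, so the code assumes one common form for text features: for a term $i$, $G(i) = \sum_{s=1}^{k} p_s(i)^2$, where $p_s(i)$ is the share of reviews containing term $i$ that belong to class $s$. Under this assumed form the value ranges from $1/k$ (no discrimination) to 1 (the term occurs in one class only). The function names and the toy reviews are hypothetical.

```python
from collections import Counter, defaultdict


def gini_index(docs, labels):
    """Assumed form: sum over classes of p_s(term)^2, where p_s(term) is the
    share of reviews containing the term that belong to class s."""
    class_counts = defaultdict(Counter)  # term -> {label: number of reviews}
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            class_counts[term][label] += 1
    gini = {}
    for term, counts in class_counts.items():
        total = sum(counts.values())
        gini[term] = sum((c / total) ** 2 for c in counts.values())
    return gini


def select_features(docs, labels, gini_threshold=0.75):
    """Keep terms whose Gini index exceeds the threshold and whose total
    frequency exceeds the average frequency over all distinct terms."""
    gini = gini_index(docs, labels)
    freq = Counter(term for tokens in docs for term in tokens)
    mean_freq = sum(freq.values()) / len(freq)
    return sorted(
        term
        for term in freq
        if gini[term] > gini_threshold and freq[term] > mean_freq
    )


# Toy reviews (illustrative only); label 1 = recommend, 0 = not recommend.
docs = [
    ["clean", "room", "friendly", "staff"],
    ["clean", "room", "great", "casino"],
    ["dirty", "room", "rude", "staff"],
    ["dirty", "room", "noisy"],
]
labels = [1, 1, 0, 0]
print(select_features(docs, labels))  # ['clean', 'dirty']
```

In the toy data, "room" is frequent but appears equally in both classes (Gini 0.5), while "friendly" is perfectly discriminating but rare; only terms passing both thresholds are retained.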

Classification technique
Various classification techniques are applied in text mining, such as Naïve Bayes, support vector machines (SVM), and decision trees. In applications, SVM often classifies more accurately than most other methods, especially for high-dimensional data. SVM was invented by Vapnik and Chervonenkis [20] and has been widely used in various areas [16,21]. SVMs are supervised learning models that classify data into groups. Given a set of training examples, each data record is marked as belonging to one of two categories. An SVM training algorithm builds a model that can assign new examples to one category or the other. In our example, the two classes are: recommend or not recommend the product to others.
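To make the two-class idea concrete, the sketch below trains a linear SVM by sub-gradient descent on the regularized hinge loss. It is an illustration under assumptions, not the implementation used in the study: the kernel, solver, and parameters are not stated in the text, and the names `train_linear_svm` and `predict`, the hyperparameters, and the toy feature vectors are all hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Linear SVM via full-batch sub-gradient descent on
    lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b))).
    X: list of feature vectors; y: labels in {-1, +1}."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # only margin violators contribute to the hinge term
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 (recommend) or -1 (not recommend)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Toy term weights for two hypothetical features, e.g. ("clean", "dirty").
X = [[2.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 2.0]]
y = [1, 1, -1, -1]
w, b = train_linear_svm(X, y)
print([predict(w, b, x) for x in X])  # [1, 1, -1, -1]
```

In practice an established library implementation would normally be preferred; the hand-written loop only shows how the maximum-margin objective separates the two recommendation classes.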

Evaluation criteria
To evaluate the performance of different classification models, we use the most common measure, accuracy.
Accuracy: the percentage of reviews correctly classified. If TP, FP, TN, and FN are, respectively, the number of positive reviews predicted as positive, the number of negative reviews predicted as positive, the number of negative reviews predicted as negative, and the number of positive reviews predicted as negative, the accuracy is $(TP + TN) / (TP + FP + TN + FN)$. The accuracy should be benchmarked against the proportional chance criterion, $(\text{percentage positive})^2 + (1 - \text{percentage positive})^2$, in order to confirm the predictive capabilities of a classifier [22].
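A short numerical illustration of the two quantities follows. The confusion-matrix counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not results from the study.

```python
def accuracy(tp, fp, tn, fn):
    """Share of reviews classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def proportional_chance(share_positive):
    """Proportional chance criterion for a two-class problem."""
    return share_positive ** 2 + (1 - share_positive) ** 2


# Hypothetical counts, for illustration only.
tp, fp, tn, fn = 70, 5, 20, 5
acc = accuracy(tp, fp, tn, fn)
share_pos = (tp + fn) / (tp + fp + tn + fn)  # actual share of positive reviews
pcc = proportional_chance(share_pos)
print(acc, share_pos, pcc)  # 0.9 0.75 0.625
```

With 75% positive reviews, a classifier would need to exceed an accuracy of 0.625 to be considered better than proportional chance; the hypothetical accuracy of 0.9 clears this benchmark.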

Data and analysis results
To illustrate the proposed method and its generality, we applied it to two examples from two industries: the hotel industry and the clothing industry.

Example 1: hotel industry
For the hotel industry, the data were obtained from orbitz.com, one of the leading websites in the U.S. travel industry. On the website, consumers can leave reviews, ratings, and recommendation choices only after they have stayed at the hotel and registered with it. We collected data for a high-quality hotel in Las Vegas: the five-star hotel "Venetian." We chose Las Vegas from among the various cities across the nation because it is one of the most popular tourist cities in the U.S. and attracts a large number of hotel guests who stay and leave reviews. We picked a five-star hotel because many high-end hotels have been built in Las Vegas to attract visitors and because, given their low prices compared with other locations, five-star hotels there are very popular among consumers. Figure 1 shows an example of the data.
After preprocessing the raw reviews, we obtain the term (attribute)-by-document matrix. For each attribute, we calculate the Gini index and select only the attributes with a Gini value higher than 0.75 [19] and a frequency higher than the average word frequency. In this way we are able to find the major attributes that are both important and distinguishing in the evaluations of the hotel. The list of features is shown in Table 1.
As the table shows, around 40 features that are both important and distinguishing were extracted from the online consumer reviews. For each feature, we calculated the tf-idf value to reflect the frequency of occurrence of the word feature, which indicates the importance of the feature for representing the content of the reviews. In the past, the importance of features was usually determined by consumer surveys.
Next, classification (SVM) is performed using the selected 38 features as the predictive variables. The accuracy is 91.6%. The high accuracy indicates that text reviews can be used to represent consumers' true thinking about the hotel, which can in turn be used to identify the factors that consumers value when they evaluate hotels. Last, factor analysis using principal axis factoring was applied in order to identify the underlying factors of the two hotels. The principal axis factoring analysis with a Varimax rotation showed 14 factors with an eigenvalue of one or greater for the functions of apps. As stated in Table 3, the total variance explained by each factor of apps' functions was also revealed. Specifically, the first factor has an eigenvalue of 3.48, which is 21% of the total variance of seven items. The second factor has an eigenvalue of 1.74, which is 16% of the total variance of seven items. The next five factors each have an eigenvalue greater than 1.3. The remaining factors have eigenvalues that are too small (either below 1 or close to 1), so we did not include them. Normally, eigenvalues greater than 1.0 are recommended as a criterion. The first seven factors are chosen, as in Table 2.
The factors are labeled as: (1) room, (2) value, (3) Las Vegas-specific hotel amenity (casino), (4) other amenities, (5) location, (6) staff, and (7) Las Vegas-specific hotel amenity (entertainment). Among the 38 items, three items were deleted for appropriate data reduction for future statistical analysis. As can be seen in Table 4, AF5 (beautiful), AF13 (experience), AF25 (weekend), AF30 (close), and AF33 (expense) were eliminated because they had no significant loading on any of the factors above (factor loading less than 0.20), as in Table 3.

Example 2: clothing industry
In this example, the data were obtained from a website that contains information on clothing purchases and consumer reviews.
After preprocessing the raw reviews, we obtain the term (attribute)-by-document matrix. For each attribute, we calculate the Gini index and select only the attributes with a Gini value higher than 0.75 [19] and a frequency higher than the average word frequency. In this way we are able to find the major attributes that are both important and distinguishing in the evaluations of the clothes in the different categories, and we then perform the text classification. The high accuracy (84.9%) indicates that text reviews can be used to represent consumers' true thinking, which can further be used to identify the factors that consumers value when they evaluate the clothes.
From the narrowed list of features that are both important and distinguishing, we are able to perform a qualitative diagnostic analysis to identify the determinant attributes for each category and to make comparisons.
Last, we conduct factor analysis using principal axis factoring in order to identify the underlying factors for each category. Table 4 shows the factors for each category and their loading scores.

Discussions
In marketing, means-end chain theory is a widely applied conceptual cognitive model which suggests that the consumer decision-making process is a series of cognitive developments through linkages between product attributes, consequences, and values [23]. In the context of product usage, the product itself is the "means" and the value of the product is the "end." The product attributes are retained in the minds of consumers at an abstract level and can influence consumers' evaluation of the product. Means-end theory has been used extensively in e-service quality research [7,24,25]. Parasuraman et al. [26] applied this theory as the theoretical foundation to develop and conceptualize the e-service quality delivered by websites. In our study, the means are the attributes of the hotels and clothes extracted from online consumer reviews, while the ends (consequences) are the key areas identified by factor analysis based on the importance of the attributes, which are also extracted from online consumer reviews through text mining, as indicated in Figure 2. Through text mining and factor analysis, a combination of new and traditional methods, we are able to identify the key drivers of consumers' decision-making in the purchase of two different products.

Conclusions
A major finding of our study is that the great volume of online reviews can be used to identify the key aspects of different product categories. Online reviews of products and services are present all over the Internet, and potential consumers value them greatly. Marketers can also obtain valuable information from reading these reviews, which predominantly contain text-based information. This can be of great value to marketers: the analysis can be organized into a standardized business analysis procedure that can be applied to many business scenarios and offers insights to business organizations, especially for managing products and advertising.
Using text-mining methodology, we show that consumers' attitudes can be accurately predicted from review text. In addition to predicting recommendations, marketers would benefit tremendously from identifying the key information contained in many thousands of reviews.
We developed a framework through which companies can obtain this important diagnostic information. The framework relies on the importance of words, based on their frequency of occurrence, and on a new way of looking at how certain words have greater power to discriminate between the existence and non-existence of recommendations (the Gini index). Factor analysis is then conducted to extract the key dimensions of product evaluations.
Advertisers and marketers would be among the prime beneficiaries once they can glean the appropriate information from text-based reviews. The identified information can be used either prominently in advertising or to improve the business.