Facet Decomposition and Discourse Analysis: Visualization of Conflict Structure

The current book combines a number of ideas, applications, case studies, and practical systems in the domain of Semantics. The book has been divided into two volumes. The current one is the second volume, which highlights the state-of-the-art application areas in the domain of Semantics. This volume has been divided into four sections and ten chapters. The sections include: 1) Software Engineering, 2) Applications: Semantic Cache, E-Health, Sport Video Browsing, and Power Grids, 3) Visualization, and 4) Natural Language Disambiguation. Authors across the world have contributed to the debate on state-of-the-art systems, theories, models, application areas, and case studies in the domain of Semantics. Furthermore, authors have proposed new approaches to solve real-life problems ranging from e-Health to power grids, video browsing to program semantics, semantic cache systems to natural language disambiguation, and public debate to software engineering.

proposed a sentiment analysis approach for extracting polar opposite sentiments of subjects from a document by identifying the semantic relationship between the subject term and semantic expressions such as "good" and "bad," instead of classifying the entire document as positive or negative. By defining the terms of sentiment expression, it is possible to calculate an individual's attitude toward the subject term and determine whether participants in a discussion have different attitudes toward the subject term. While this method certainly has its merits, it is unable to determine the reason for the differences in attitude and the underlying relationships in the debate. In order to enhance mutual understanding and consensus building in debates, it is necessary to clarify the conflict structure and its change throughout multiple debates, evaluate the significance of debate context on participants' sentiments, and develop facilitation techniques for managing conflict structures.
Opinion mining is a relatively new research field that includes review analysis, opinion extraction, and answering opinion questions. Opinion mining utilizes natural language processing techniques and facilitates deep semantic classification of opinions. Sophisticated techniques for deep semantic classification have not yet been developed for and applied to public debates, as they require substantial linguistic knowledge and the ability to process additional non-literal information such as heuristic rules. The rules-based approaches to opinion mining also have some limitations. Somasundaran and Wiebe (2010) examined opinions in ideological online debates by developing an argument lexicon using the Multi-Perspective Question Answering (MPQA) annotations. Meanwhile, the Statement Map (Murakami et al., 2010) and WISDOM (Akamine et al., 2009) methods focus on analyzing information credibility through opinion mining techniques. These methods classify semantic relationships between subjective statements, such as agreement and conflict relationships, using rules-based approaches that work well with experimental data but are generally more difficult to implement than machine learning approaches. The Recognizing Textual Entailment (RTE) Challenges (Dagan, 2006; PASCAL, 2005) focus on the semantic relationships between texts to determine whether the meaning of one text is derived from another text. Although the RTE datasets included semantic relations such as agreement and conflict, Marnefe (2008) found that opinions in real-world debates were much more difficult to classify and that, as such, more sophisticated classification methods are required for RTE to work in the real world. Cross-document Structure Theory (CST) is another approach for identifying semantic relationships between sentences (Radev, 2000). The CSTBank Corpus built by Radev (2000) is annotated with 18 kinds of semantic relationships, including agreement and conflict. As with RTE, the CST approach does not effectively classify opinions in debates.
The rules-based approaches discussed above clarify linguistic characteristics using detailed categorizations and rules to ensure accuracy in analysis. However, they are limited in that they are difficult to implement and their classifications are too simplified to accurately classify more complicated opinions in real-world debates. This study proposes a simpler approach, based on machine learning techniques, for visualizing conflict structures in a way that supports rapid prototyping. As noted above, machine learning approaches are generally easier to implement than rules-based approaches. We then apply this approach to a series of real public debates where participants with different backgrounds argue their opinions.

Content Analysis
Debate participants' different cognitive frames and subjective interests lead to differing opinions. In order to effectively summarize the debate contents, we need to maintain the relevance between textual and contextual information using qualitative and descriptive analyses (Hashiuchi, 2003). In this study, we used the method of content analysis proposed by Stone et al. (1966). Content analysis refers to ". . . any research technique for making inferences by systematically and objectively identifying specified characteristics within text" (Stone et al., 1966). The purpose of content analysis is to examine the corpus data of dialogue (e.g., debate minutes), taking into consideration the relevance between the words and the context embedded in the dialogue. Krippendorff (1980) identified the three most important features of content analysis: it is an unobtrusive technique; it can assess unstructured material; and it is context-sensitive. These features allow researchers to examine relatively unstructured data for meanings, symbolism, expressed information, and the role this information plays in the lives of the data sources, while at the same time acknowledging the textuality of the data, that is, recognizing that the data are read and understood by others, who interpret them based on their own contexts (Koreniusa et al., 2007).
Discourse analysis is a type of content analysis that clarifies debate content and structure by targeting language use in the real world (Schiffrin et al., 2003). Discourse analysis reveals the structural and functional mechanism of language in relation to the utterances in their context. Discourse has traditionally been defined as "anything beyond the sentence." Schiffrin (1994) referred to "utterances" rather than "sentences" because utterances are contextualized sentences, that is, they are context- and text-bound. In discourse analysis, context refers to the information that surrounds a sentence and determines its meaning, and includes a broad range of information such as the speaker's knowledge, beliefs, facial expressions, gestures, social circumstances, and culture. Hatori et al. (2006) suggested a discourse analysis based on facet theory. In this method, utterances in a debate are decomposed into three conceptual units (hereafter called "facets"): "what" should be solved, "how" it should be solved, and "why" it should be solved, in order to effectively examine interest conflict and cognitive dissonance among debate participants. This facet decomposition task depends on how researchers analyze and interpret the utterances in a debate. However, to be able to analyze multiple debates with huge amounts of language data, improved reproducibility and computational approaches are required (Francis, 1982).
This study develops a computational method of facet decomposition and discourse analysis based on corpus linguistics. A corpus is generally defined as a collection of written and spoken material that highlights how a language is used. Corpus linguistics is a linguistic method that explains the meanings of words and phrases using a corpus. It is a reconstructive method for the analysis of language data using a computer (Sinclair, 1991; Stubbs, 2002; Wang, 2005; McCarthy, 1998). In this study, we use SVM learning to decompose debate utterances into facets. Then, using this facet-tagged corpus, we propose an approach for calculating the differences among the utterances of the debate participants and visualizing their dynamic change.

Management of public debate
A public debate is essentially a third-party committee where stakeholders, including experts, public enterprises, and citizens, share their perspectives on public projects in their early planning stage. The purpose of this third-party committee is to promote mutual understanding among the debate participants and ensure transparency of the debate system by opening the discussion process to the various stakeholders and making available information on related projects. The third-party committee is not involved in the decision making, but rather in the collection of information related to public projects and the building of public trust in the decision makers (Hatori et al., 2008).
The third-party committee is directly entrusted by the government and the citizens and as such, must suggest solutions that are socially desirable and beneficial to the government and its citizens. As representatives of the government and its citizens, committee members must assess solutions based on their social perspective (i.e., check for adequacy) and technical and professional perspectives (i.e., check for rigor). In short, the third-party committee is expected to assess project-related information for adequacy and rigor to facilitate decision making.
Previous studies have used questionnaires and interviews to understand the perspectives of stakeholders on public projects. According to Giddens (1990), these methods are based on segmented, asynchronous communication between the stakeholders and thus suffer from the problem of faceless commitments. These methods are limited in that a questionnaire may embed the subjective perspectives of the researchers, while the researchers and the participants tend to take the questions and the answers, respectively, at face value. In addition, the questionnaire may contain assumptions about the projects that the respondents or citizens do not necessarily recognize. In contrast, a debate within a third-party committee realizes the richness of social communication facilitated by direct meetings among stakeholders. Such debates ensure the effective collection of information and transparency in the decision-making process.
Debate participants are interested only in certain aspects of an issue and, as such, are motivated to steer the discussion toward their desired goals. Debate participants such as experts and engineers usually evaluate the benefits of the projects based on scientific and professional evidence, highlighting rigor, while general citizens evaluate them based on their common sense and their interests, checking for adequacy. If debate participants consider only one aspect of the issue, that is, either rigor or adequacy only, the debate will not reach a consensus. The most important role of a public debate involving the third-party committee is to achieve a common perspective on the project under discussion by recognizing each stakeholder's interest and point of view. To evaluate the progress of the debate toward mutual understanding and consensus, it is necessary to accumulate the debate minutes and summarize the discussions. This study proposes a method of discourse analysis that manages public debate by summarizing the debate contents and tracking the progress of the debate process toward its goal.

Facet decomposition and the Support Vector Machine (SVM)
In this study, we define an utterance as the language used to establish one's position on an issue by addressing the position's pros and cons. During a debate, participants legitimize their positions with technical and scientific evidence or with their common sense and self-interests. They strengthen their positions by expressing positive or negative attitudes toward another participant's utterances, depending on whether those utterances coincide or contrast with their positions. Based on these assumptions, we identified the four facets to be used in this study: facet A refers to the positions of participants, that is, their pros and cons on the issue; facet B refers to their efforts to strengthen their positions using evidence; facet C refers to the characteristics of the evidence, that is, whether it pertains to adequacy or rigor; and facet D refers to the participants' interpretation of each other's utterances, that is, their expression of positive or negative attitudes toward each other's utterances. We coded the utterances using a combination of the four facets (i.e., a stractable). In this study, the utterances in the debate minutes comprise the primary corpus and the utterances encoded with facet stractables comprise the secondary corpus. Utterances that do not explicitly pertain to a position are considered neutral and, as such, are classified under a third category for each facet. Section 3.2 presents the facet stractables in more detail.
In order to create the secondary corpus, we encoded the utterances with the facet stractables. Owing to the enormous amount of data in the primary corpus, decomposing the utterances based on facets is not easy. As such, we created the secondary corpus using a statistical facet learning model that reduces the effort required of the researchers while at the same time maintaining the logical consistency of the primary corpus. This statistical facet learning model is based on a pattern recognition technique that classifies the large primary corpus into the secondary corpus through the SVM, a type of statistical learning machine. Pattern recognition is generally defined as a method for extracting data features, such as those in letters, video, and audio, and determining the data categories based on the standard pattern of the features. This study decomposed the facets using the SVM model originally developed by Vapnik (see Kudo, 2002), which separates the categories by bounding hyperplanes. These bounding hyperplanes are defined by a discrimination function in a high-dimensional space, whose dimension is determined by the number of terms included in the primary corpus, and they maximize the distance between the border and each utterance. Although the discrimination function is usually represented by a nonlinear kernel function, we used a polynomial function of degree $d$ that flexibly approximates the function. The SVM model used in this study is able to process large amounts of data and produces good estimation results.
In this model, the utterances in the primary corpus, $U_i\ (i = 1, \dots, m)$, are each classified into one of the three categories of each facet. A discrimination function determines the corresponding relationship between an utterance and a category, and two types of kernel functions are defined on the $n$-dimensional space in which the utterances are located ($n$ is the total number of terms in the primary corpus). In addition, we define the facet vector $G_i$ for an utterance $U_i$ using 0-1 variables. The set of facet vectors $G_i$ for all the utterances $U_i\ (i = 1, \dots, m)$ is the secondary corpus.
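The exact layout of the 0-1 facet vector is not recoverable from the text, so the following is only a minimal sketch under stated assumptions: each of the four facets (A to D) contributes three 0-1 slots, one per category X(1), X(-1), and X(0), and facets that are not coded default to the neutral category. The names `FACETS`, `CATEGORIES`, and `facet_vector` are hypothetical and introduced here for illustration.

```python
from typing import Dict, List

FACETS = ("A", "B", "C", "D")   # the four facets named in the text
CATEGORIES = (1, -1, 0)         # X(1), X(-1), and the neutral X(0)


def facet_vector(coding: Dict[str, int]) -> List[int]:
    """Return a 0-1 vector with one slot per (facet, category) pair.

    Facets missing from `coding` default to the neutral category 0.
    The slot ordering is an assumption made for this sketch.
    """
    vector: List[int] = []
    for facet in FACETS:
        value = coding.get(facet, 0)
        if value not in CATEGORIES:
            raise ValueError(f"facet {facet} has invalid category {value}")
        # Exactly one of the three slots of this facet is set to 1.
        vector.extend(1 if value == category else 0 for category in CATEGORIES)
    return vector


# Hypothetical secondary corpus holding the facet vectors of two utterances.
secondary_corpus = [
    facet_vector({"A": 1, "B": 1, "C": -1, "D": -1}),
    facet_vector({"A": -1, "B": 1, "C": 1, "D": -1}),
]
```

With this layout every utterance maps to a vector of length 12 containing exactly four ones, which keeps later similarity computations simple; other encodings would serve equally well.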

Case study
This paper's case study examines the debates of the Yodo-River committee, which was established in 2001 to obtain advice for the planning and policy handling of a river improvement project related to the building of a dam and to gather the opinions of the representatives of affected citizens and public organizations. The Yodo-River committee meetings consist of a general meeting, four regional meetings, five theme meetings, five working group meetings, and three meetings of the sub-working group. A total of 400 meetings have been held since 2001. The debate minutes are available from the committee website and are downloadable as PDF files. The minutes record the names of speakers and their utterances chronologically.
From these meetings, we selected 14 debates between citizens and experts or between administrators (i.e., river managers) and experts. Participants were classified into five groups according to their roles: facilitator; expert; citizen with a "con" opinion; citizen with a "pro" opinion; and administrator. The minutes from the 14 debates consisted of 8,831 utterances. A unit of utterance is a single sentence ended by a punctuation mark.

Outline of the facet decomposition
Four researchers from construction consulting companies with experience in mobilizing public involvement created a training set for facet decomposition. To maintain the objectivity of the training set, two researchers conducted facet decomposition on the same utterances. We then completed the training set with the corresponding facet decomposition. Table 1 shows the framework of the four facets with the facet elements and the typical language use.
Each utterance is coded with one element per facet, as in the following examples.

Example 1: The dam construction is the only way to ensure the safety of the town's residents given the limited public finances. → A(1) B(1) C(-1) D(-1)

Example 2: There have been many regional development projects that constructed dams but only a few of them succeeded. → A(-1) B(1) C(1) D(-1)

The first utterance example is decomposed into A(1)B(1)C(-1)D(-1). The expression "is the only way" indicates that the speaker agrees to the dam construction, hence A(1), but does not have a positive attitude toward it, hence D(-1). The expression "given the limited public finances" is evidence and is therefore classified as B(1); however, this evidence is also based on the speaker's anxiety and concern and, as such, is classified as C(-1). Meanwhile, the second utterance example expresses pessimism regarding the dam construction, hence A(-1). The phrase "only a few of them succeeded" expresses the speaker's reason for his negative view and thus is denoted as B(1), C(1), and D(-1).
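For computational work, coded strings such as those in the two examples above need to be turned into a machine-readable form. The following is a minimal sketch, assuming the codes are written as a facet letter followed by a parenthesized category; the helper name `parse_coding` and the choice to treat unmentioned facets as neutral are assumptions of this illustration, not part of the original method.

```python
import re
from typing import Dict

# Matches one facet element such as "A (1)", "C(-1)", or "D(0)".
ELEMENT = re.compile(r"([ABCD])\s*\(\s*(-?[01])\s*\)")


def parse_coding(text: str) -> Dict[str, int]:
    """Parse a coded string into a {facet: category} mapping.

    Facets that do not appear in `text` are treated as neutral (0),
    which is an assumption made for this sketch.
    """
    coding = {facet: 0 for facet in "ABCD"}
    for facet, value in ELEMENT.findall(text):
        coding[facet] = int(value)
    return coding


example_1 = parse_coding("A (1) B (1) C (-1) D (-1)")
example_2 = parse_coding("A(-1)B(1)C(1)D(-1)")
```

The regular expression tolerates the inconsistent spacing seen in the examples, so both "A (1)" and "A(1)" are read the same way.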
As mentioned earlier, researchers created and completed the training utterance set of facet decomposition. Of the total 8,831 utterances, 34% were decomposed as facet A, 65% as facet B, 59% as facet C, and 52% as facet D. Except for facet A, over half of the utterances were selected as training utterances. The training utterance set determines the statistical facet decomposition using the SVM. We then investigate the accuracy of the SVM decomposition by calculating the matching rate between the results obtained by the researchers and those obtained by the SVM.

Table 1. Definitions and elements of the facets

Facet A. Definition: refers to the position of the participants (i.e., whether they have taken a "pro" or "con" position on the project). Elements: pros, A(1); cons, A(-1).

Facet B. Definition: refers to whether participants provided evidence to strengthen their position and help others understand their perspectives.

Neutral utterances, including greetings, questions, and other utterances made during the progress of the debate, are categorized as X(0) and not as X(1) or X(-1).
The accuracy of the SVM decomposition was estimated using K-fold cross-validation, wherein we divide the 8,831 utterances into $K$ subgroups and use $K-1$ subgroups as the training set (i.e., the utterances facet-encoded by the researchers) while the remaining subgroup is decomposed into facets using the SVM. In this study, $K = 10$. By rotating the held-out subgroup, the statistical facet decomposition using the SVM is carried out a total of $K$ times, so that all the utterances are decomposed.

Table 2 presents the results of the facet decomposition. The total number of facet decompositions performed by the researchers exceeded the number of utterances, as each utterance can be classified into multiple facet elements. In the researchers' decomposition, 35% of the utterances expressed the pros of the dam project and 33% mentioned the cons; 31% of the utterances had evidence while 24% did not; 21% had rigorous evidence while 41% had adequate evidence; and 12% expressed a positive attitude toward the project while 55% expressed a negative attitude. Over 50% of the utterances were negative responses to the project that had rigorous evidence. This result does not mean, however, that over half of the debate participants had a negative opinion of the project, because the utterances were only the opinions of those who spoke. Meanwhile, in the facet decomposition resulting from the SVM, 4% expressed the pros of the project and 1% expressed the cons; 26% had evidence and 20% did not; 10% presented rigorous evidence and 24% presented adequate evidence; and 5% expressed positive attitudes toward the project and 34% negative attitudes. The accuracy of the SVM decomposition was calculated using Equation 3b, which resulted in an F-measure between 0.48 and 0.66. In general, the accuracy level should be between 0.6 and 0.7. There are at least two possible reasons for this low level of accuracy. First, many of the utterances were decomposed into the third category, neutral (X(0)).
Second, the accuracy of the decomposition for facet A was particularly low because most participants imply rather than express the pros and cons of the project and, as such, their utterances cannot be easily decomposed into the appropriate facets. The final facet classification rests with the researchers. To ensure the objectivity of the researchers' decomposition, a feedback system is used to review and revise the facet decomposition based on the discourse patterns. Subsection 4.2 presents the analysis of the discourse patterns.
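The evaluation procedure described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: the folds are built by simple interleaving (a real study might shuffle first), and because Equation 3b is not reproduced in this text, the standard F-measure (harmonic mean of precision and recall for one category) is assumed. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def k_fold_indices(n_items: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n_items-1 into k (train, test) pairs.

    Fold f holds indices f, f+k, f+2k, ...; each fold is held out once.
    """
    folds = [list(range(start, n_items, k)) for start in range(k)]
    splits = []
    for held_out in range(k):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        splits.append((train, test))
    return splits


def f_measure(gold: Sequence[int], predicted: Sequence[int], target: int) -> float:
    """Standard F-measure for one category (assumed form of Equation 3b)."""
    true_pos = sum(1 for g, p in zip(gold, predicted) if g == target and p == target)
    pred_pos = sum(1 for p in predicted if p == target)
    gold_pos = sum(1 for g in gold if g == target)
    if true_pos == 0:
        return 0.0  # also covers the cases with no predicted or no gold positives
    precision = true_pos / pred_pos
    recall = true_pos / gold_pos
    return 2 * precision * recall / (precision + recall)


# The setting reported in the text: 8,831 utterances and K = 10.
splits = k_fold_indices(8831, 10)
```

In each of the ten rounds, a classifier would be trained on the `train` indices and its output on the `test` indices compared with the researchers' coding via `f_measure`, once per facet category.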

Results
The less-than-ideal accuracy level of the SVM decomposition is not uncommon, as only a few classification techniques utilizing natural language processing for discourse data have achieved sufficient accuracy. As such, the results of the SVM decomposition in this study are still useful for determining the facet decomposition. The final facet decomposition was improved by reconciling the two classifications, that of the researchers and that of the SVM, and the SVM thereby supported and facilitated the researchers' tasks.
The SVM decomposition provides useful information that helps researchers in their final analysis and significantly reduces the effort needed to confirm the secondary corpus. In addition, the SVM helps monitor and improve the performance of rules-based approaches for opinion mining (e.g., Dagan, 2006; Radev, 2000). The SVM method facilitates the comparison of the results of the two approaches, using the disagreements between them to track changes in the discourse rules. Future studies should test the applicability of the SVM method to rules-based approaches.

Similarity of utterance meaning
Using the secondary corpus (i.e., the facet-encoded utterances developed in the previous section), we conducted the discourse analysis in order to examine the interest conflict among the participants and visualize the discourse patterns during the debate process. Using non-metric multi-dimensional scaling (MDS), we distributed the participants on a two-dimensional space based on the similarity of their utterance meanings (Kruskal, 1964a and 1964b; Qian, 2004). The similarity between the facet vectors of two participants is calculated based on the cosine angle distance. Writing $G_p$ and $G_q$ for the facet vectors of participants $p$ and $q$:

$$s_{pq} = \frac{G_p \cdot G_q}{\lVert G_p \rVert \, \lVert G_q \rVert}$$

We can determine the dissimilarity of the facet vectors through the inverse of the cosine function:

$$d_{pq} = \arccos\left(s_{pq}\right)$$

By reducing the dimension to a lower value, the optimal arrangement of the coordinate values in a two-dimensional space can be defined. Consequently, the semantic similarity of all individuals is illustrated as the distances in a two-dimensional space.

Figure 1 shows the distribution of the participants in this study (49 professionals, 27 administrators [i.e., river managers], and 5 citizens) on a two-dimensional space based on the similarity of the facet vectors derived from the minutes of the 14 debates (i.e., the primary corpus). The symbols ◆, ✳, and □ denote citizens, experts, and administrators, respectively. As shown in Figure 1, the participants within each group usually have similar interests. For instance, citizens are located on the leftmost part of the graph. Meanwhile, the administrators are distributed on the top portion of the space, while the experts are positioned on the bottom of the space, indicating that these two groups have conflicting interests.

Figure 2 illustrates the five types of participants who expressed their utterances: the facilitators (○ ×); experts (✳); citizens with a "pro" opinion (◆); citizens with a "con" opinion (◇); and administrators (□).
As shown in the figure, the distance between facilitator and expert is smaller than that between facilitator and citizen or administrator, which means that the interest structure of facilitators is similar to that of experts, but substantially differs from that of citizens and administrators. In this case, it is important to examine the change in the interest structure during the debate rather than the different interest structures themselves. It is also important to remember that different interest structures are not necessarily negative. On the contrary, they help groups obtain and understand the diverse perspectives of the various stakeholders on a project. In the following subsection, we analyze the change in interest structure during the debate process, using Figure 2 once again.
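The dissimilarity that feeds the MDS arrangement in this subsection can be sketched as follows. This is a minimal illustration, assuming that each participant is represented by a numeric facet vector (for example, counts of each facet element over that participant's utterances); how the per-utterance vectors are aggregated per participant is not stated in the text, and the function names are hypothetical. The MDS step itself is omitted.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two facet vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    # Clamp to guard against rounding slightly outside [-1, 1].
    return max(-1.0, min(1.0, dot / (norm_u * norm_v)))


def angular_dissimilarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Inverse cosine of the similarity, in radians (0 means same direction)."""
    return math.acos(cosine_similarity(u, v))


def dissimilarity_matrix(vectors: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise dissimilarities, the usual input to non-metric MDS."""
    return [[angular_dissimilarity(u, v) for v in vectors] for u in vectors]


# Hypothetical facet-count vectors for three participants.
participants = [[3, 0, 1], [0, 2, 1], [3, 0, 2]]
matrix = dissimilarity_matrix(participants)
```

Because non-metric MDS uses only the rank order of the dissimilarities, using the angle or any monotone transform of it would lead to the same arrangement.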

The change of discourse pattern
We created a time sequence of the primary corpus and calculated the cumulative frequency of each facet in order to analyze the change in the discourse patterns of the five types of participants. Figures 3 to 7 illustrate the discourse change for the different types of participants. The horizontal axis of the graphs represents the position of an utterance in the time sequence, while the vertical axis represents the cumulative frequency of the elements of each facet. The line goes up if the frequency of category X(1) of facet X increases. The line goes down if the frequency of category X(-1) of facet X increases. The line remains constant if the frequency of X(1) matches that of X(-1). Figure 3 illustrates the temporal variation of the cumulative frequency of the facets of the facilitators. The facilitators expressed a total of 849 utterances in 8 out of the 14 debates. The results of the facet decomposition for facilitators show that the facilitators did not take a "pro" or "con" position on the project; mostly expressed utterances without evidence; when they did provide evidence, generally presented rigorous evidence and used adequate evidence on certain issues; and expressed more negative attitudes than positive attitudes throughout the debates.
In contrast, the cumulative frequency of the facets of the other types of participants exhibits a stable pattern (Figures 4 to 7). Experts had a total of 444 utterances in 6 out of the 14 debates; administrators expressed 151 utterances in one debate; citizens (pros) expressed 143 utterances in one debate; and citizens (cons) had 228 utterances in one debate. Except for the facilitators and experts, the participants were involved in only one debate; as such, their opportunities for expressing their perspectives on the project were limited, and they could only gain mutual understanding and knowledge during that debate. The results also show that administrators and experts have a tendency to take a neutral position on many facets, while citizens have a tendency to repeat the same categories of facets, that is, the pros or cons.
The cumulative frequency of the facets is a simple and useful indicator for understanding the change in discourse and consequently the change in the interest structures. By examining the changes in the discourse patterns, we can uncover the possible causes of the interest conflicts.
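The cumulative-frequency line described in this subsection reduces to a running sum: each utterance coded X(1) moves the line up by one, each X(-1) moves it down by one, and a neutral X(0) leaves it unchanged. The following is a minimal sketch for a single facet and a single participant type; the function name `cumulative_facet_line` and the sample sequence are hypothetical.

```python
from itertools import accumulate
from typing import Iterable, List


def cumulative_facet_line(categories: Iterable[int]) -> List[int]:
    """Running count of X(1) minus X(-1) over a time-ordered sequence.

    `categories` holds the category (1, -1, or 0) of one facet for each
    utterance, in the order in which the utterances were made.
    """
    steps = []
    for category in categories:
        if category not in (1, -1, 0):
            raise ValueError(f"invalid facet category: {category}")
        steps.append(category)
    return list(accumulate(steps))


# Hypothetical sequence: two "pro" utterances, one neutral, three "con".
line = cumulative_facet_line([1, 1, 0, -1, -1, -1])
```

Plotting such a line per facet against the utterance index would give graphs of the kind described for Figures 3 to 7, with a flat line indicating either neutral utterances or a balance between the two opposing categories.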
In sum, this study's proposed method of discourse analysis enables us to identify debate content and structure and consequently the conflict structure and its dynamic change throughout the debate process. This new approach overcomes the limitations of existing approaches.

Conclusions
This study proposed a new method for examining the interest conflict of participants and discourse pattern changes in the process of debate using discourse analysis. We applied this method to a series of debates on a real public project during which we uncovered interest conflict among certain types of participants.
Results of our analysis suggest that during public debates, it is important to identify and adopt facilitation techniques that help identify discourse patterns, which in turn uncovers the cause of interest conflicts. This will help the debate participants examine the different perspectives of stakeholders and arrive at a consensus.
While the proposed discourse analysis method in this study helps manage and support the debate progress, it is not without its limitations. First, the validity of the proposed method relies mainly on the facet decomposition framework. Further empirical analysis is needed to confirm this validity. Second, the accuracy of the facet decomposition needs to be improved. Facet decomposition resulting from the use of the SVM has low accuracy levels for highly context-dependent utterances. Further studies may address this problem using a feedback system that reviews the results of discourse pattern changes. Third, utterances are dependent on the debate context and the characteristics of the participants. As such, it is necessary to improve the method by taking into account latent variables such as social context and unobservable individual characteristics. Fourth, it is easy to lose diverse contextual information in the process of facet decomposition. It is therefore necessary to build a database that includes other contextual information such as the right to speak (e.g., utterance turn or length of utterance) or the social relationship between participants. Finally, further normative research on public debate is needed. This involves developing normative rules of public debate and evaluating models of the desirability of public debate. By overcoming these limitations, facilitation techniques can be vastly improved.