An Application Of Support States From Speech Emotions In Consensus Building

In public organizations and business discussions, various ideas and opinions are put forward. Discussion is essential for the participants to reach agreement, and differences in the process of forming an agreement affect the final conclusion. Major statements should therefore be analyzed in order to reach consensus. In the process of reaching consensus, however, a proper understanding of the statements is necessary as a common basis among the discussants. In this study, the relationship between the speakers' consciousness in the conversation and the state of support is examined through speech emotion perception, and three support states, supportive, negative and unknown, are inferred. In addition to listening experiments on emotions and support states, the application of speech emotion recognition to the analysis of the consensus building process is discussed, based on a verification of the dependence between speech emotion and support state. The average objective recognition rate can be raised to 75% for consensus building in spoken conversation.


Discussion is essential for opinion holders to seek agreement with each other, whether in a public organization or within an enterprise. The process of making tacit knowledge explicit through discussion and reaching mutual agreement is called consensus building. Differences in the consensus building process affect the final conclusion, so it is necessary to analyze the important ‘clue’ utterances. In the process of reaching an agreement, the agreement and disagreement relations among the various utterances must be identified correctly, which is necessary for discovering a ‘clue’. Stakeholders basically communicate with each other orally. However, because language is consciously controlled, much information is lost if only the text record transcribed from speech is used. This study therefore aims to estimate the speaker's intention correctly from the speech itself. It focuses on the emotion data contained in the human voice during consensus building discussion and aims to estimate three intents: positive, negative and unknown.

Most existing research on consensus building emphasizes language, using methods such as text analysis, keyword extraction and estimation based on language statistics. In this research, in addition to text analysis of the language of the dialogue, feature quantities of the speech data that express emotions and intentions are used, such as fundamental frequency, duration, sound intensity, tempo and emotional features.

In this research, the hypothesis that there is ‘a strong dependence between emotion and support state’ is put forward, and its validity is verified by applying a speech recognition method to discussion. By combining speech recognition for consensus building, which automatically analyzes the state of support among debaters, with text analysis methods, it is possible to improve the accuracy of finding remarkable utterances in the consensus building process.

Research about Consensus Building

There are now many theories and methods for consensus building. Since 1999, more than a thousand pages of the ‘Consensus Building Handbook’ have been published. However, consensus building is a very complex process. In many open source projects, consensus building that originally took place in conferences can gradually become computer assisted. Many studies of decision making in computer-mediated discussions suggest that groups are more likely to come to consensus when engaging in structured synchronous discussions. Another active area of consensus research is computational statistics, which mainly considers algorithms that perform inference on large or unwieldy datasets by partitioning the data, analyzing each piece separately, and then combining the inferences, under the name ‘consensus Monte Carlo’. Pathmaker's data analysis tools assist the conference moderator and support creative thinking and consensus building. Schuman proposed a method for decision making meetings from the perspective of group decision. The Polish System Research Institute built a computer support system called ‘Mediator’ based on groups of expert advisers and examined nine different convergence methods. In addition, many Japanese and American companies have provided technology and consulting software, such as AHP-based GDSS or MRV, for consensus formation and group decision making.

In conferences and dialogues, repeatedly asking wash-up questions, drill-down questions, suggestion questions and summary questions significantly increases the accuracy with which consensus can be formed within a certain time. Here, a microblog is used as the discussion environment for consensus building. A quantitative analysis of the consensus building process in a tree structure has been proposed; in that study, a method of tracking ‘clues’ is used to measure changes in information entropy. Microblog comments in a tree structure are widespread, and comments carrying speech intent are used in microblog reviews. In such a comment, the fields can be marked as ‘speaker’, ‘commentator’, ‘content’ and ‘intention’.
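As a rough illustration of the comment fields and the entropy measure mentioned above, the following sketch stores each comment with the four fields and computes the Shannon entropy of the intention labels in a set of comments. The class name `Comment`, the function `intention_entropy` and the choice of base-2 Shannon entropy are assumptions made for illustration; the cited study may define its entropy measure differently.

```python
from collections import Counter
from dataclasses import dataclass
from math import log2


@dataclass
class Comment:
    """One node of a comment tree (hypothetical layout, fields taken from the text)."""
    speaker: str       # author of the post being commented on
    commentator: str   # author of this comment
    content: str       # text of the comment
    intention: str     # assumed label set: 'positive', 'negative' or 'unknown'


def intention_entropy(comments):
    """Shannon entropy, in bits, of the intention labels of the given comments."""
    counts = Counter(c.intention for c in comments)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * log2(n / total) for n in counts.values())
```

Under this assumption, a drop in entropy from one stage of the discussion to the next would indicate that the commentators' intentions are converging.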

When constructing a microblog comment tree for agreement, the speaker's intention must be clear. It is therefore necessary to evaluate the commentator's response and the relationship between his or her utterance and intention. Whether the content supports or opposes the speaker should be analyzed. Utterance intention can be estimated using the Negotiation Process Model. The evaluation representing the commentator's position toward the speaker is ‘positive’, ‘negative’ or ‘unknown’.


In human consensus building dialogue, interaction basically advances through spoken language. Language is manifest information and is consciously controlled, so text information often contains falsehoods. This section first outlines the proposed method for analyzing the consensus building process with speech emotion recognition. It then explains speech emotion recognition and the relation between emotion and support state. Finally, an algorithm for estimating the support state is described.

Outline of proposed method

This study presents a method for analyzing the agreement formation process using speech emotion recognition, shown in Figure 1. Six kinds of emotions, ‘happy’, ‘anger’, ‘sadness’, ‘dislike’, ‘fear’ and ‘neutral’, are recognized from the speech data of the discussion. Three support states, ‘positive’, ‘negative’ and ‘unknown’, are obtained. The relation between emotion and support state is then examined.

Speech emotion recognition

In previous work on speech emotion recognition, Reeves and Nass of Stanford University identified what is called ‘emotional intelligence’ as the most important issue to be solved in human-computer interaction. In 1990, an emotion editor was built at the Massachusetts Institute of Technology's Multimedia Research Institute; that is, a machine can recognize emotions extracted from various signals and be trained to respond appropriately. In 1996, researchers at Nihon Seikei University advocated the concept of emotional space and built a special speech emotion model. In recent years, neural networks and hidden Markov models have been applied to modeling speech emotion recognition.

To obtain the overall emotion of a long conversation, the conversation is first divided into short parts. For recognizing emotion, feature quantities called MFCC (mel-frequency cepstral coefficients) are extracted. Continuous speech recognition with a Hidden Markov Model is then applied to the speech data containing these emotions.
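The first step, dividing a long recording into short parts, can be sketched as follows. The function name `split_into_segments` and the parameters `segment_len` and `hop` are hypothetical; the text does not state the segment length or whether segments overlap, so both are left to the caller.

```python
def split_into_segments(samples, segment_len, hop):
    """Cut a sequence of samples into fixed-length segments taken every `hop` samples.

    A trailing remainder shorter than `segment_len` is dropped. With
    hop == segment_len the segments do not overlap; with hop < segment_len
    they overlap (both choices are assumptions, not taken from the study).
    """
    if segment_len <= 0 or hop <= 0:
        raise ValueError("segment_len and hop must be positive")
    last_start = len(samples) - segment_len
    return [samples[start:start + segment_len]
            for start in range(0, last_start + 1, hop)]
```

Each resulting segment would then be passed to MFCC extraction and to the emotion recognizer.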

Hidden Markov Models (HMMs) have a long tradition in speech recognition. The underlying idea is that the statistics of voice are not stationary. Instead, voice is modeled as a concatenation of states, each of which models different sounds or sound combinations and has its own statistical properties. HMMs have several advantages over global statistics for emotion recognition: first, the structure of an HMM may be useful for capturing the temporal behavior of speech; second, HMM technology has long been studied for speech recognition, so well-established procedures are available for optimizing the recognition framework, such as the Baum-Welch algorithm and discriminative training.

In this study, the emotion recognition unit is designed around time-series patterns of the 12-dimensional MFCC feature quantity introduced above. The HMM is a nondeterministic model in which states transit according to preset transition probabilities for a given input. The system consists of eight HMMs corresponding to the six types of emotions, and a Viterbi algorithm recognition unit that collects the outputs from each HMM. The input to the system is a time series of feature quantities, and vector quantization (VQ) is performed before the data is input to the HMMs.
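The recognition pipeline described here (vector quantization, one discrete HMM per emotion, Viterbi scoring, and selection of the best-scoring model) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's implementation: the function names, the representation of a model as a `(start, trans, emit)` tuple of probability lists, and the nearest-codeword rule with squared Euclidean distance are all hypothetical, and training of the codebook and of the HMMs (for example with Baum-Welch) is not shown.

```python
import math


def quantize(frames, codebook):
    """Vector quantization: map each feature vector to the index of its nearest codeword."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(codebook)), key=lambda k: dist2(f, codebook[k]))
            for f in frames]


def viterbi_log_score(obs, start, trans, emit):
    """Log-probability of the most likely state path of a discrete HMM for `obs`."""
    if not obs:
        raise ValueError("observation sequence is empty")

    def lg(p):
        return math.log(p) if p > 0 else float("-inf")

    n = len(start)
    # Initialise with the first symbol, then extend the best path one symbol at a time.
    score = [lg(start[s]) + lg(emit[s][obs[0]]) for s in range(n)]
    for o in obs[1:]:
        score = [max(score[p] + lg(trans[p][s]) for p in range(n)) + lg(emit[s][o])
                 for s in range(n)]
    return max(score)


def recognize(frames, codebook, models):
    """Return the emotion label whose HMM gives the highest Viterbi score."""
    symbols = quantize(frames, codebook)
    return max(models, key=lambda name: viterbi_log_score(symbols, *models[name]))
```

In the setting of the text, `frames` would be the sequence of 12-dimensional MFCC vectors of one segment and `models` would hold one trained HMM per emotion.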

Association of support and affective state

Even a word of praise on the surface, for example ‘it is wonderful’, can convey reproach depending on the emotional way it is said. Conversely, a word of blame on the surface, such as ‘foolish’, may convey affection.

In the consensus building process, not only the linguistic meaning of utterances but also the emotions involved are important in conveying information. Therefore, in order to estimate the state of support among debaters from the six kinds of emotions recognized in Section 3.2, it is necessary to examine the relation between emotions and support state. This relation is examined using the chi-squared test.

Support state

The support state has various definitions. A well-known definition is to agree with certain opinions or assertions and to encourage them. Whether from the aspect of cognition or of emotion, it is regarded as an internal mental state. The state of support for a partner in discussion is assumed to be that ‘the attitude or emotion of the responding partner is equal to one's own attitude and feeling’. This is different from sympathy.

In this research, the support state is defined for an utterance Y that responds to a preceding utterance X during discussion, according to whether or not the speaker of utterance Y holds the same opinion with respect to the topic content of utterance X. It is classified as follows.

Relationship of Chi Square

The state of support can be estimated from emotion, but whether such a relation actually holds should be investigated with the chi-square test.

The binomial test targets the simple occurrence probability of whether an event occurs or not. When the observed events are classified into three or more categories, however, the chi-square test is used to test how well the sample distribution fits a theoretical distribution. Based on the cross tabulation table (contingency table) for two variables A and B, it tests whether there is a relation between the two variables. In practice, it is necessary to examine whether differences in the levels defined for variable A affect the distribution of data over the levels of variable B, that is, whether the data distribution differs between groups. Since the χ² distribution is used for the decision, the test is commonly called the ‘χ² test’.
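The test on a cross tabulation table can be made concrete with a short sketch that computes the Pearson χ² statistic and its degrees of freedom. The function name `chi_square` is hypothetical, and the sketch assumes that every row and column total is nonzero; it does not compute a p-value, so the statistic has to be compared with a tabulated critical value.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    `table` is a list of rows of observed counts (rows: levels of variable A,
    columns: levels of variable B). All row and column totals must be nonzero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if A and B were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof
```

For a table of six emotions by three support states the degrees of freedom are (6 - 1)(3 - 1) = 10, and independence would be rejected at the 5% level when the statistic exceeds the tabulated critical value of about 18.31.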

Estimation of Support State in Emotion

Taken as a whole, the ultimate goal of the proposed method is to estimate the support state of an utterance from the sequence of emotions in the dialogue.

In this study, we estimate the support state with a simple algorithm. Based on the relation established by the test in Section 3.3, we estimate the support state from emotion. The emotions are consolidated into three categories: ‘positive’, ‘negative’ and ‘unknown’. All the segments of one conversation are collected, and the number of occurrences in each category is counted. Each count is then normalized, and the result is passed to the classifier.

As shown in the program code, the support state ‘positive’ is related to the emotions ‘joy’, ‘fear’ and ‘neutral’; ‘negative’ is related to ‘anger’ and ‘dislike’; and ‘unknown’ is related to ‘sadness’.
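The counting and normalization steps, together with the emotion-to-support mapping stated above, can be sketched as follows. The names `EMOTION_TO_SUPPORT`, `support_distribution` and `estimate_support` are hypothetical. The final decision rule, picking the category with the largest normalized count, is an assumption standing in for the classifier, which the text does not specify.

```python
from collections import Counter

# Mapping taken from the text: emotion label -> support state.
EMOTION_TO_SUPPORT = {
    "joy": "positive", "fear": "positive", "neutral": "positive",
    "anger": "negative", "dislike": "negative",
    "sadness": "unknown",
}
SUPPORT_STATES = ("positive", "negative", "unknown")


def support_distribution(emotions):
    """Normalized count of each support state over the segments of one conversation."""
    counts = Counter(EMOTION_TO_SUPPORT[e] for e in emotions)
    total = sum(counts.values())
    return {s: (counts[s] / total if total else 0.0) for s in SUPPORT_STATES}


def estimate_support(emotions):
    """Assumed decision rule: the support state with the largest normalized count.

    Ties are resolved in the order positive, negative, unknown.
    """
    dist = support_distribution(emotions)
    return max(SUPPORT_STATES, key=lambda s: dist[s])
```

For example, a conversation whose segments were recognized as joy, anger, neutral and sadness yields the distribution 0.5 positive, 0.25 negative and 0.25 unknown, and is therefore estimated as ‘positive’ under this rule.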


Using the methods of the previous section, we analyze the emotions contained in speech, estimate the support state experimentally, and evaluate the results.

For the subjective judgement of the speech corpus, data analysis was performed for each speaker in a closed environment, using a total of 228 sentences in 100 dialogues. The subjective discrimination rate for emotion was about 85 to 100%, with an average of 93.7%. The discrimination rate for the support state was also about 85 to 100%, with an average close to 93%. Looking at the results individually, the subjective discrimination rate of 100% for the emotion ‘anger’ and the support state ‘positive’ is particularly high compared with the others.

Next, experiments are conducted with the speech emotion recognition method described above. Of the 1080 samples in the emotional voice conversation corpus, 600 are used for training and 480 for recognition. Emotions were recognized from 80 emotions using Matlab. In this experiment, we test the relevance of emotion and support state in 422 remarks from a controversial meeting. For comparison, each utterance was also judged subjectively.

Based on the emotion recognition model and the relevance test, we carry out the experiment on estimating the support state. The data used is a dialogue speech corpus of 228 sentences in total from 100 dialogues. As corpora that can be used for general purposes, a no-emotion emotional speech corpus and a dialogue speech corpus were constructed. The subjective discrimination rate of each corpus is 93.7%. A speech emotion recognition experiment performed with this corpus reached an average accuracy of 75%.


The experimental results on emotion and support state confirmed that there is a relation between them. Based on this relevance, experiments on estimating the support state from emotion were conducted, and the average accuracy reached 74.76%. In other words, the hypothesis that the support state can be estimated from emotion is supported. Regarding the relevance of emotion and support state, the dependence could not be judged sufficiently in the case of ‘neutral’, so re-estimating the support state linguistically for ‘neutral’ utterances needs to be devised as a future improvement. Better results are expected from using classifiers to estimate the support state from emotions.

15 July 2020