Adapting to Personality Over Time: Examining the Effectiveness of Dialogue Policy Progressions in Task-Oriented Interaction

1. Do you think that extroversion/introversion plays out in a different way in textual communication? I am wondering if they might get different results if they were to use a spoken dialogue tutoring system.
2. It seems like they gave the students a test first to figure out their extroversion or introversion. Would it be possible for the system to learn the user's personality just by interacting with them in the tutoring lesson?
3. Do you think their results would have improved if, instead of breaking it up into a binary introvert/extrovert split, they had used a scale and trained the models for each point on the scale?

This paper did a good job cataloging the different bigram policies that affect learning. It would be interesting to see whether there are any more useful features if they consider trigrams, etc. One problem I had with this paper was that after it had shown the empirical differences between extravert and introvert learning, it tried to 'explain' the cause of each of these individual differences with no further evidence. It seemed like they were just guessing at the causes, but they didn't really justify their guesses. It would have been nice to see the researchers actually build a tutoring system that uses these results.

It would be interesting to see if there is a way to predict a subject's Big Five Factor (personality) scores from the dialogue only (i.e., without subjecting them to an explicit personality test).

It was interesting to see how this paper presented, in numbers and equations, results that seemed rather intuitive. However, the first question that occurred to me when reading the paper was the accuracy of the students' extraversion scores. This study makes the good point that adapting to personality would improve future dialogue systems, although I think it might not be the most urgent factor to consider given the current state of dialogue systems.

1. I found it surprising that introverted students averaged 36 utterances per session while extraverted students averaged 34. Although the introverted students used fewer words, it's interesting that extraverts did not initiate significantly more utterances.
2. How would a tutoring system know the personality of the user? Would the user take a personality test before starting a session, or can we use heuristics to detect introversion/extraversion?
3. They only tried one model (J48) to classify dialogue acts. Why didn't they try other models to see if they could get better classification performance?

Students whose extraversion scores were less than or equal to 7 were treated as introverts, and those above 7 were treated as extraverts in this study. I wonder if they could get better or more clear-cut results if they took out the subjects with extraversion scores around the median of 7 and only considered those who are clearly introverts or extraverts (a sketch of such an exclusion appears after these comments).

● In real automatic tutoring dialogue systems, especially commercial systems, it is unlikely that users would complete a personality questionnaire first. Can the system infer user personalities from their interaction with the system? Has any work been done on this topic?
● Similar research has probably been done by psychologists and educators. This paper reads more like a psychology paper to me than a computer science paper; the only difference might be that the tutoring process happens through a computer or chat interface. Would the results be different from traditional face-to-face tutoring?
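A minimal sketch of the exclusion suggested above, assuming integer extraversion scores with the study's split at 7; the subject data, column names, and margin below are made-up placeholders, not values from the paper:

```python
import pandas as pd

# Hypothetical subjects: extraversion score (study's split: <=7 introvert,
# >7 extravert) and a learning-gain measure. All numbers are placeholders.
subjects = pd.DataFrame({
    "extraversion":  [3, 5, 6, 7, 7, 8, 8, 9, 11, 12],
    "learning_gain": [0.42, 0.38, 0.45, 0.30, 0.33, 0.29, 0.35, 0.22, 0.25, 0.20],
})

SPLIT, MARGIN = 7, 1  # drop scores within +/-1 of the split point

# Keep only subjects who are clearly on one side of the split.
clear = subjects[(subjects["extraversion"] < SPLIT - MARGIN) |
                 (subjects["extraversion"] > SPLIT + MARGIN)]

introverts = clear[clear["extraversion"] <= SPLIT]
extraverts = clear[clear["extraversion"] > SPLIT]
print("introvert mean gain:", introverts["learning_gain"].mean())
print("extravert mean gain:", extraverts["learning_gain"].mean())
```

Re-running the group comparison on this filtered subset, against the full set, would show whether borderline subjects are diluting the effect.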
Regarding the personality adaptation paper: the authors are interested in differentiating users' personalities in order to serve them better by adopting different dialogue policies and tutoring strategies. They use the Five Factor Model questionnaire to determine each user's personality. However, in real-world applications it is infeasible to ask a user to complete a lengthy questionnaire before they start using the dialogue system. How reliable are the patterns discussed in the paper as indicators of a user's introversion or extroversion? How can the reliability of such indicators be measured? How can a dialogue system learn the user's personality over the course of use? Has any research been done on learning a person's personality from dialogue? What kind of approach would be reasonable?

The authors used a supervised dialogue act classifier (a J48 decision tree) to annotate the dialogue. How reliable is the classification? How can the error rate and the unnecessary uncertainty introduced by automatic annotation be mitigated?

The authors hypothesize that "tutors adapt differently to introverted and extroverted students, and that students of different extroverted or introverted tendencies learn more effectively from different dialogue policies". How valid is this hypothesis? Were tutors instructed to tutor students differently based on the contextual observations they gained through a mere text screen? This research focuses mainly on human-human dialogue; how applicable is it to a human-machine dialogue system? How can the lessons learned in this research be leveraged in building future dialogue systems?

How the benefit gained is measured is also questionable. Students will improve on the same set of questions anyway, regardless of the tutoring policy. How can we conclude that it is the adaptation to a user's personality, rather than other factors in the tutoring dialogue context, that contributes to the gain?

1) The main aim of this paper is to explore dialogue strategies, such as adapting dialogue on the basis of the personality of the human the system is interacting with. It models changes in strategies over time, based on detected personality traits, to enable effective learning.
2) The Big Five Factor Model is used to identify/gauge the five user personality traits: Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism. The final dialogue act classifier uses: speaker role, two-step dialogue act history (category and tag), utterance length, existence of the '?' token, existence of 160 unigrams and 150 bigrams, and existence of 31 part-of-speech unigrams and 152 part-of-speech bigrams (a sketch of this feature template follows this summary).
3) A tutorial dialogue corpus was used to collect interaction data between a tutor and a student. This was based on tasks provided by the tutor in an interactive tutorial system, where the aim was to create a text-based adventure game.
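To make the feature template in 2) concrete, here is a minimal sketch of how such a dialogue act feature vector might be assembled. The function name, vocabularies, and tag sets are hypothetical; the paper's actual implementation is not specified here.

```python
from typing import Dict, List, Tuple

def extract_features(utterance: str,
                     speaker_role: str,                  # "tutor" or "student"
                     da_history: List[Tuple[str, str]],  # prior (category, tag) pairs
                     pos_tags: List[str],                # POS tag per token
                     word_vocab: set,         # selected unigrams (paper: 160)
                     bigram_vocab: set,       # selected word bigrams (paper: 150)
                     pos_vocab: set,          # selected POS unigrams (paper: 31)
                     pos_bigram_vocab: set    # selected POS bigrams (paper: 152)
                     ) -> Dict[str, object]:
    tokens = utterance.lower().split()
    feats: Dict[str, object] = {
        "speaker_role": speaker_role,
        "utterance_length": len(tokens),
        "has_question_mark": "?" in utterance,
    }
    # Two-step dialogue act history: category and tag of the two preceding acts.
    for i, (category, tag) in enumerate(da_history[-2:]):
        feats[f"prev{i}_category"] = category
        feats[f"prev{i}_tag"] = tag
    # Binary presence features for the selected word and POS n-grams.
    bigrams = set(zip(tokens, tokens[1:]))
    pos_bigrams = set(zip(pos_tags, pos_tags[1:]))
    for w in word_vocab:
        feats[f"uni={w}"] = w in tokens
    for b in bigram_vocab:
        feats[f"bi={b}"] = b in bigrams
    for p in pos_vocab:
        feats[f"pos={p}"] = p in pos_tags
    for pb in pos_bigram_vocab:
        feats[f"posbi={pb}"] = pb in pos_bigrams
    return feats
```

A dictionary like this could then be vectorized and fed to a decision-tree learner such as Weka's J48 (as in the paper) or, for the model-comparison question raised above, to several scikit-learn classifiers side by side.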
**********************************************************

Adapting to Multiple Affective States in Spoken Dialogue

This article was a lot harder to follow than the other one, so there are a few more things I am confused about.
1. They mention a pre/post motivation survey which they gave their experiment subjects. What types of questions are asked on it? How exactly does it fit into their system?
2. I'm confused about what an ANOVA is. They mention it when describing the global performance evaluation but don't go into any detail about what it does (a minimal illustration appears after these comments).
3. What are the differences between global performance and local performance in their model? They mention what they mean by local performance but don't really specify what they mean by global performance.

It is interesting to see that user motivation increased. I had always assumed that motivation was internal, but this study shows that adapting the dialogue can make users more motivated. Another interesting result is that incorrect+certain+engaged users responded negatively to adaptation. I wonder if this is because the users were so sure of their answers that they attributed the correction to an error of the system, thus decreasing their trust in the system. Were the users allowed to use electronics during the experiment? In their description of the UNC_DISE_ADAPT system, the use of electronics was a feature of disengagement. This could be problematic: if a user receives a message from a friend at a certain time, it does not have anything to do with the conditions of the dialogue at that time.

I like how this paper builds on the previous study. I wasn't too clear on how UNC_ADAPT classifies an answer's correctness and would like to hear more on that. Also, I hope to hear more explanation of the transition analyses, as that part slowed me down. This paper argues that the disengagement adaptation can be generalized across domains, and I thought that if that is actually so, it would be impactful for the future development of ODMs.

Performance of a trialogue-based prototype system for English language assessment for young learners

I thought the trialogue was a great way to guide students and make up for what one dialogue system cannot cover. I wasn't sure, though, how much the teacher would be involved in the dialogue. Also, is PocketSphinx the fastest in ASR decoding time? If so, I was wondering how it differs from other ASR systems. When dividing the responses into three different categories and designing responses accordingly, what is the extent of a "correct" answer?

1. They mention that people have focused on the disengagement behavior of "gaming" – what do they mean by this?
2. Can disengagement be automatically detected using acoustic/prosodic features? Inter-annotator agreement is not high for this task.
3. What are the tradeoffs of this approach? It might be annoying for a user if the system thinks they are disengaged, and adapts accordingly, when the user is simply speaking in a monotone.

How did the hidden human wizard label affect and correctness? Did the wizard follow a set of rules, or assign the labels based on experience and knowledge?

● The authors have done a lot of previous work related to this paper. In such cases, they simply note that the information is described in detail elsewhere. It would be better if they could briefly summarize their previous work in this paper. For example, I am interested to know how they developed different system responses for different user affective states, and I would also like to learn about their disengagement annotation scheme.
● They stated that users are more responsive to the disengagement adaptation when the affect detection and natural language understanding outputs are noisier. Why is that the case? Are there papers or studies supporting that statement?
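On the ANOVA question above: an ANOVA (analysis of variance) is a statistical test of whether the means of two or more groups differ by more than random variation would predict. A minimal sketch using scipy's one-way ANOVA; the condition names and scores are made-up placeholders, not the paper's data:

```python
from scipy.stats import f_oneway

# Hypothetical learning gains under three conditions (placeholder numbers).
adapt    = [0.41, 0.35, 0.48, 0.44, 0.39]
no_adapt = [0.30, 0.28, 0.35, 0.31, 0.27]
wizard   = [0.45, 0.50, 0.42, 0.47, 0.49]

# One-way ANOVA: is at least one group mean different from the others?
f_stat, p_value = f_oneway(adapt, no_adapt, wizard)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")  # small p suggests the means differ
```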
1) The main aim of iteratively adding new affect adaptations to an existing system is to increase task success and user satisfaction by handling different user states differently, through timely detection of states such as confusion, frustration, and annoyance. This paper explores this in the computer tutoring domain.
2) This paper performs multiple adaptations to user responses/states, rather than only one as in related work.
3) There were essentially two systems: UNC_ADAPT, i.e., the adaptive tutoring system, and UNC_DISE_ADAPT ITSPOKE, which added a user disengagement detection component to the system. In the former, depending on the user's response, the system would try to return the user to the right path via a lengthy two-way dialogue, or "bottom out" by giving the user an explanation. In the latter, a bottom-out approach would cease to be effective, since the user will have lost interest by that point, and hence steps are taken to re-engage the user in the conversation by asking simple fill-in-the-blank questions (a rough sketch of this branching logic follows).
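A rough sketch of the branching logic summarized in 3), reconstructed from the description above rather than from the authors' code; the state fields, threshold, and move names are all hypothetical:

```python
from dataclasses import dataclass

@dataclass
class TurnState:
    correct: bool      # was the student's answer right?
    uncertain: bool    # did the student sound uncertain?
    disengaged: bool   # flagged by the disengagement detector (UNC_DISE_ADAPT)
    attempts: int      # tries so far on the current question

def choose_tutor_move(state: TurnState) -> str:
    """Hypothetical tutor policy following the summary above."""
    if state.disengaged:
        # A bottom-out explanation is wasted on a user who has lost interest;
        # re-engage with a simple fill-in-the-blank question instead.
        return "fill_in_the_blank_question"
    if state.correct and not state.uncertain:
        return "advance_to_next_topic"
    if state.attempts >= 2:
        # Stop eliciting and "bottom out": give the answer with an explanation.
        return "bottom_out_explanation"
    # Try to walk the user back to the right path via a two-way subdialogue.
    return "remediation_subdialogue"

print(choose_tutor_move(TurnState(correct=False, uncertain=True,
                                  disengaged=False, attempts=1)))
# -> remediation_subdialogue
```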