
SARA
Conversation Design for a Socially-Aware Chatbot
SARA, the Socially-Aware Robot Assistant, was developed in Carnegie Mellon University's ArticuLab.
Team: Animator, Researchers, Developers, Conversation Designer, Technical Artist at ArticuLab, Carnegie Mellon University.
My role: Conversation Designer, Developer. I designed intuitive and informative conversation experiences based on scenarios, applying the intricacies of human speech patterns within the limitations of computers so that the system communicates more like a human.
Contribution: Designed scenarios, conversation flows, and the user experience. Conducted competitive research and user testing. Developed the Dialogue Manager and a browser-based user flow generator.
Timeline: 2016.4-2016.12
Acknowledgement: This work was supported in part by generous funding from Microsoft, LivePerson, Google, and the IT R&D program of MSIP/IITP [2017-0-00255, Autonomous Digital Companion Development].
SARA was presented at the World Economic Forum (WEF) Annual Meeting in Davos (January 17-20, 2017), and in Tianjin, China. http://articulab.hcii.cs.cmu.edu/projects/sara/
Who is SARA?
Overview
SARA is designed to collaborate with her human users. Rather than ignoring the socio-emotional bonds that form the fabric of society, SARA depends on those bonds to improve her collaboration skills.
The system builds a model of the rapport between the user and SARA, and employs appropriate conversational strategies to raise the rapport level with the human user. SARA uses the conversation to build a relationship with the user, then uses that relationship to obtain better information about the user's interests and goals. In turn, that allows her to do a better job of recommending sessions and people.
Demo and media coverage video:
What do users think about chatbots?
Interviews
We interviewed 10 students about their experience with popular personal assistants (Apple Siri, Amazon Alexa, etc.). Here are the pain points they reported:
- Lack of variation
- Overly task-oriented and transactional, with no small talk
- Strict adherence to a fixed flow
How can they be more intelligent?
Problems
Major tech companies envision intelligent personal assistants such as Apple Siri, Microsoft Cortana, and Amazon Alexa as the front ends to their services. They are good at certain tasks (checking the weather or calendar, playing music), but users complain about their intelligence: they have no sense of context and lack empathy with the user, and thus fulfill very few of the functions that a human assistant might.
Why? Because most chatbots on the market use a form-based approach, following a simple, non-context-aware flowchart to communicate with people and complete tasks.
Approach
Rather than improving natural language capability alone, we try to solve the problem by improving the user experience: giving the social agent the ability to build and maintain rapport with the user.
We believe rapport between humans and personal assistants is the social infrastructure for improved performance and experience. We are therefore building a personal assistant with an embodied modality that converses based on social reasoning to help maintain rapport.
Examples of conversation modes with Siri.
Hypotheses
Mechanism
To begin, we created a prototype that detects visual (head and face movement), vocal (acoustic features), and verbal (conversational strategies) features as input to estimate the rapport between user and agent. The same kinds of visual, vocal, and verbal behaviors are then used in the output performed by an embodied conversational agent (ECA) to build and maintain rapport with the user, in such a way as to maximize task performance and user satisfaction with the experience.
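To make the mechanism concrete, here is a minimal sketch of how multimodal features might be fused into a single rapport estimate. The feature names, weights, and 1-7 scale are illustrative assumptions; SARA's actual estimator was learned from data, not hand-tuned like this.

```javascript
// Minimal sketch only: fuse visual, vocal, and verbal features into a
// rapport score. Feature names and weights are hypothetical; SARA's
// real estimator was learned from data.
function estimateRapport(features) {
  const visual = 0.3 * features.smile + 0.2 * features.mutualGaze;  // head/face movement
  const vocal  = 0.2 * features.pitchVariation;                     // acoustic features
  const verbal = 0.3 * features.strategyAlignment;                  // conversational strategies
  const raw = visual + vocal + verbal;            // each feature normalized to [0, 1]
  return 1 + 6 * Math.min(1, Math.max(0, raw));   // map onto an illustrative 1-7 scale
}
```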
Architecture
Input: User's Utterance, Acoustic Features, Nonverbal Behaviors
Output: Conversational Strategy, Body Language, Facial Expression, System Intents, Unity 3D Animation
Through rapport-building conversational strategies, the agent elicits the user's interests and preferences and uses these to improve its recommendations. By estimating the user's current level of rapport and recognizing the conversational strategies the user has uttered, the agent can choose the right conversational strategy to respond with (a sketch of this selection logic follows the list below).
Conversational Strategies:
- self-disclosure (SD)
- elicit self-disclosure (QE)
- reference to shared experience (RSD)
- praise (PR)
- violation of social norms (VSN)
- back-channel (BC)
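As a rough illustration of how a strategy from this list might be chosen, here is a hypothetical sketch; the thresholds are arbitrary, and the actual Social Reasoner weighs many more signals than this.

```javascript
// Hypothetical sketch of conversational strategy selection; the real
// Social Reasoner is far more sophisticated.
function chooseStrategy(rapport, userStrategy) {
  if (userStrategy === 'SD') return 'SD'; // reciprocate the user's self-disclosure
  if (rapport < 3) return 'QE';           // low rapport: elicit self-disclosure
  if (rapport < 5) return 'PR';           // medium rapport: praise
  return 'VSN';                           // high rapport can tolerate a norm violation
}
```

The point is that the chosen strategy depends jointly on the estimated rapport and on the strategy the user just used, not on a fixed script.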
The implementation process.
Scenario Design
Happy Path User Flow
The first scenario we designed is an international conference (the World Economic Forum), where SARA acts as a virtual personal assistant. She first introduces herself and asks about the attendee's current feelings and mood. Attendees are then asked about their occupation as well as their interests and goals for attending the conference. SARA then cycles through several rounds of people and/or session recommendations, showing information about each recommendation as desired; attendees can leave the booth at any time. Finally, SARA proposes to take a "selfie" with the attendee before saying farewell.
The happy path user flow is as follows.
Interaction Flow
SARA uses microphones and video cameras to track the user's nonverbal communication.
SARA estimates rapport based on nonverbal and verbal communication.
A conversational strategy and SARA's response are generated based on the inputs and the current rapport level.
As a result, SARA recommends people/sessions the user might be interested in and shows them on screen.
SARA also sends a letter to the person the user would like to connect with.
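Reusing the two sketches above, one full interaction turn might look roughly like this. Every module call here is a stubbed placeholder, not an actual SARA API, and the FSM object is the kind sketched in the Finite-State Machine section further below.

```javascript
// Hypothetical end-to-end turn mirroring the interaction flow above.
// The perception stubs stand in for real speech/vision modules.
const recognizeSpeech  = (audio) => 'i work on machine learning';
const detectIntent     = (utterance) => 'interests.stated';        // NLU stub
const trackNonverbal   = (video) => ({ smile: 0.8, mutualGaze: 0.6 });
const extractAcoustics = (audio) => ({ pitchVariation: 0.5 });

function handleTurn(audio, video, fsm) {
  const utterance = recognizeSpeech(audio);
  const rapport = estimateRapport({
    ...trackNonverbal(video),
    ...extractAcoustics(audio),
    strategyAlignment: 0.5, // placeholder verbal feature
  });
  const strategy = chooseStrategy(rapport, null);
  const next = fsm.transition(detectIntent(utterance)); // FSM picks the next system intent
  return { intent: next, strategy, rapport };           // rendered as speech plus on-screen recommendations
}
```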
Conversation Design
Design Conversation Flows
Based on the happy path user flow and scenario, I brainstormed with the team and mapped out what users can do in each state. Finally, I designed the conversation flow (see below).
The full conversation flow is nonlinear and can switch between states given the inputs: user intent, user/system states, acoustic features, and nonverbal behaviors.
Happy Path User Flow
Conversation Flow
Iteration
The full conversation flow above looks terrifying, and it's impossible to design it flawlessly up front. I started from the happy path, brainstormed human utterances and user flows with the team, and worked closely with the game designers.
We conducted user testing and iterated based on the errors we encountered, then added the edge cases to the conversation flow for robustness. Researchers and developers also added NLP support to handle exceptions.
Finite-State Machine
Taking a closer look, the conversation flows are designed as finite-state machines (FSMs). Here's an excerpt.
State transitions are based on user intent and inputs (user/system states, acoustic features, nonverbal behaviors).
The system is designed to handle exceptions.
SARA also answers unrelated questions and makes small talk to improve rapport; these jump out of the FSM. A sketch of such an excerpt follows.
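To give a flavor of what such an excerpt could look like on paper, here is a hypothetical FSM fragment. The state names, intents, and prompts are invented for illustration; the real flow had far more states and guards.

```javascript
// Illustrative FSM excerpt; states, intents, and prompts are hypothetical.
const fsm = {
  state: 'ASK_INTERESTS',
  states: {
    ASK_INTERESTS: {
      prompt: 'What are you hoping to get out of the conference?',
      transitions: {
        'interests.stated': 'RECOMMEND_SESSION', // happy path
        'out_of_scope':     'SMALL_TALK',        // jump out of the task FSM
        'no_input':         'REPROMPT',          // exception handling
      },
    },
    RECOMMEND_SESSION: { prompt: 'You might enjoy this session...', transitions: {} },
  },
  transition(intent) {
    const table = (this.states[this.state] || {}).transitions || {};
    this.state = table[intent] || 'FALLBACK'; // unknown intents fall back gracefully
    return this.state;
  },
};
```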
To design and visualize conversation flows more easily, I developed a front-end tool for FSM visualization and editing.
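The tool itself is not shown here, but the core idea can be sketched as flattening the FSM definition into nodes and edges that a graph library can render. This is an assumption-laden reconstruction, not the tool's actual code.

```javascript
// Hypothetical sketch of the idea behind the visualization tool:
// flatten an FSM definition (like the excerpt above) into a graph.
function fsmToGraph(states) {
  const nodes = Object.keys(states).map((id) => ({ id }));
  const edges = [];
  for (const [from, def] of Object.entries(states)) {
    for (const [intent, to] of Object.entries(def.transitions || {})) {
      edges.push({ from, to, label: intent }); // one labeled edge per intent transition
    }
  }
  return { nodes, edges }; // hand off to a renderer such as D3 or vis.js
}
```

Calling fsmToGraph(fsm.states) on the excerpt above yields its states as nodes and one labeled edge per intent transition, which makes adding or editing states far easier to sanity-check visually.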
Demo at WEF
SARA was presented at the World Economic Forum (WEF) Annual Meeting in Davos (January 17-20, 2017). The SARA booth was located right in the middle of the main corridor of the Davos Congress Center and was, in fact, the only demo in the Congress Center.
Interaction excerpt
SARA booth at WEF
Klaus Schwab, founder and executive chairman of the WEF
Team Collaboration
We worked in a highly cross-functional team, including developers, designers, 3D artists, psychologists, and researchers.
My Contributions
- Designed the conversation scenarios based on usability contexts and device types (TV screen, tablet, mobile phone). I created storyboards and journey maps for personas, for each user type and specific task.
- I designed the conversation flows (intent transitions and sample dialogues) and turned them into finite-state machines (JSON, Java) in development, incorporated into the Dialogue Manager. We tested and sent out surveys to the CMU community to gather feedback and suggestions. The FSM reflected the conversation state transitions and turn-taking, and connected with the NLU, the Social Reasoner (including rapport level and cognitive load), the User Model, etc.
- We sat down with human users and conducted user tests. We invited CMU students, faculty, and passersby to simulate a conference scenario, interacting with a Wizard of Oz (WOZ) system.
- From the tests, I collected errors in dialogues with SARA and improved the conversation flow. I also researched and redesigned the prompts in SARA's utterances based on observation and data analysis, in order to lower the error rate.
- I created a web-based FSM visualization tool in JavaScript to make adding and editing FSM states much easier, and for future researchers to use.
Takeaways
From this project, I gained experience in conversation design and in HCI built on NLP/AI. I designed interaction flows for specific scenarios and learned technical knowledge in NLP and linguistics. I also experienced both success and failure in usability testing and demos (something went wrong in a big demo at the CS school). I learned to work cross-functionally by collaborating with researchers, designers, developers, and the team leader.