"I Didn't Know I Looked Angry": Characterizing Observed Emotion and Reported Affect at WorkWith the growing prevalence of affective computing applications, Automatic Emotion Recognition (AER) technologies have garnered attention in both research and industry settings. Initially limited to speech-based applications, AER technologies now include analysis of facial landmarks to provide predicted probabilities of a common subset of emotions (e.g., anger, happiness) for faces observed in an image or video frame. In this paper, we study the relationship between AER outputs and self-reports of affect employed by prior work, in the context of information work at a technology company. We compare the continuous observed emotion output from an AER tool to discrete reported affect obtained via a one-day combined tool-use and diary study (N=15). We provide empirical evidence showing that these signals do not completely align, and find that using additional workplace context only improves alignment up to 58.6%. These results suggest affect must be studied in the context it is being expressed, and observed emotion signal should not replace internal reported affect for affective computing applications.2022HKHarmanpreet Kaur et al.University of MichiganHuman Pose & Activity RecognitionExplainable AI (XAI)CHI
Design of Digital Workplace Stress-Reduction Intervention Systems: Effects of Intervention Type and Timing
Workplace stress-reduction interventions have produced mixed results due to engagement and adherence barriers. Leveraging technology to integrate such interventions into the workday may address these barriers and help mitigate the mental, physical, and monetary effects of workplace stress. To inform the design of a workplace stress-reduction intervention system, we conducted a four-week longitudinal study with 86 participants, examining the effects of intervention type and timing on usage, stress-reduction impact, and user preferences. We compared three intervention types and two delivery-timing conditions: Pre-scheduled (PS) by users, and Just-in-time (JIT), prompted by system-identified user stress levels. We found that JIT participants completed significantly more interventions than PS participants, but post-intervention and study-long stress reduction did not differ significantly between conditions. Participants rated low-effort interventions highest, but high-effort interventions reduced stress the most. Participants felt JIT delivery provided accountability but desired partial agency over timing. We present implications for intervention type and timing.
Esther Howe et al., Microsoft Research; University of California, Berkeley. Topic: Workplace Wellbeing & Work Stress. CHI 2022.
MeetingCoach: An Intelligent Dashboard for Supporting Effective & Inclusive Meetings
Video-conferencing is essential for many companies, but its limitations in conveying social cues can lead to ineffective meetings. We present MeetingCoach, an intelligent post-meeting feedback dashboard that summarizes contextual and behavioral meeting information. Through an exploratory survey (N=120), we identified important signals (e.g., turn-taking, sentiment) and used these insights to create a wireframe dashboard. The design was evaluated with in situ participants (N=16), who helped identify the components they would prefer in a post-meeting dashboard. After recording video-conferencing meetings of eight teams over four weeks, we developed an AI system to quantify the meeting features and created personalized dashboards for each participant. Through interviews and surveys (N=23), we found that reviewing the dashboard helped improve attendees' awareness of meeting dynamics, with implications for improved effectiveness and inclusivity. Based on our findings, we provide suggestions for the design of future feedback systems for video-conferencing meetings.
Samiha Samrose et al., University of Rochester. Topics: Remote Work Tools & Experience; Notification & Interruption Management. CHI 2021.
AffectiveSpotlight: Facilitating the Communication of Affective Responses from Audience Members during Online Presentations
The ability to monitor audience reactions is critical when delivering presentations. However, current video-conferencing platforms offer limited solutions to support this. This work leverages recent advances in affect sensing to capture and facilitate communication of relevant audience signals. Using an exploratory survey (N=175), we identified the most relevant audience responses, such as confusion, engagement, and head nods. We then implemented AffectiveSpotlight, a Microsoft Teams bot that analyzes the facial responses and head gestures of audience members and dynamically spotlights the most expressive ones. In a within-subjects study with 14 groups (N=117), we observed that the system made presenters significantly more aware of their audience, speak for a longer period of time, and self-assess the quality of their talk more similarly to the audience members, compared to two control conditions (a randomly selected spotlight and the default platform UI). Based on feedback from the study, we provide design recommendations for future affective interfaces for online presentations.
Prasanth Murali et al., Northeastern University. Topics: Social & Collaborative VR; Human-LLM Collaboration. CHI 2021.
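The spotlighting policy AffectiveSpotlight describes — dynamically surfacing the most expressive audience member — can be illustrated with a minimal sketch. The scoring scheme, window layout, and participant names below are hypothetical stand-ins, not the paper's implementation:

```python
def pick_spotlight(expressiveness):
    """Return the audience member with the highest mean expressiveness
    over the current window. Scores in [0, 1] are assumed to come from
    an upstream facial-response/head-gesture analyzer (hypothetical)."""
    return max(expressiveness,
               key=lambda p: sum(expressiveness[p]) / len(expressiveness[p]))

# Hypothetical per-second scores for three audience members:
window = {
    "viewer_a": [0.2, 0.3, 0.1],  # mostly neutral
    "viewer_b": [0.7, 0.9, 0.8],  # nodding, smiling
    "viewer_c": [0.4, 0.5, 0.4],
}
print(pick_spotlight(window))  # prints "viewer_b"
```

A real system would re-run this selection periodically and add hysteresis so the spotlight does not flicker between members.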
Understanding Conversational and Expressive Style in a Multimodal Embodied Conversational Agent
Embodied conversational agents have changed the ways we can interact with machines. However, these systems often do not meet users' expectations. One limitation is that the agents are monotonous in behavior and do not adapt to their interlocutor. We present SIVA (a Socially Intelligent Virtual Agent), an expressive, embodied conversational agent that can recognize human behavior during open-ended conversations and automatically align its responses to the conversational and expressive style of the other party. SIVA leverages multimodal inputs to produce rich and perceptually valid responses (lip syncing and facial expressions) during the conversation. We conducted a user study (N=30) in which participants rated SIVA as more empathetic and believable than the control (the agent without style matching). Based on almost 10 hours of interaction, participants who preferred interpersonal involvement rated SIVA as significantly more animate than participants who valued consideration and independence.
Deepali Aneja et al., Adobe Research; University of Washington. Topics: Full-Body Interaction & Embodied Input; Conversational Chatbots; Agent Personality & Anthropomorphism. CHI 2021.
"Warm Bodies'': A Post-Processing Technique for Animating Dynamic Blood Flow on Photos and AvatarsWhat breathes life into an embodied agent or avatar? While body motions such as facial expressions, speech and gestures have been well studied, relatively little attention has been applied to subtle changes due to underlying physiology. We argue that subtle pulse signals are important for creating more lifelike and less disconcerting avatars. We propose a method for animating blood flow patterns, based on a data-driven physiological model that can be used to directly augment the appearance of synthetic avatars and photo-realistic faces. While the changes are difficult for participants to "see", they significantly more frequently select faces with blood flow as more anthropomorphic and animated than faces without blood flow. Furthermore, by manipulating the frequency of the heart rate in the underlying signal we can change the perceived arousal of the character.2021DMDaniel McDuff et al.MicrosoftIdentity & Avatars in XRCHI
Optimizing for Happiness and Productivity: Modeling Opportune Moments for Transitions and Breaks at Work
Information workers perform jobs that demand constant multitasking, leading to context switches, productivity loss, stress, and unhappiness. Systems that can mediate task transitions and breaks have the potential to keep people both productive and happy. We explore a crucial initial step for this goal: finding opportune moments to recommend transitions and breaks without disrupting people during focused states. Using affect, workstation activity, and task data from a three-week field study (N=25), we build models to predict whether a person should continue their task, transition to a new task, or take a break. The R-squared values of our models are as high as 0.7, with only 15% error cases. We ask users to evaluate the timing of recommendations provided by a recommender that relies on these models. Our study shows that users find our transition and break recommendations to be well-timed, rating them as 86% and 77% accurate, respectively. We conclude with a discussion of the implications for intelligent systems that seek to guide task transitions and manage interruptions at work.
Harmanpreet Kaur et al., University of Michigan. Topics: Notification & Interruption Management; Workplace Wellbeing & Work Stress. CHI 2020.
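The three-way decision described above (continue vs. transition vs. break) can be illustrated with a toy nearest-centroid recommender. The feature set (focus level, task progress, minutes since last break) and the centroid values are invented for illustration and bear no relation to the authors' trained models:

```python
import numpy as np

ACTIONS = ["continue", "transition", "break"]

# Hypothetical per-action feature centroids; columns are
# (focus level 0-1, task progress 0-1, minutes since last break).
CENTROIDS = np.array([
    [0.9, 0.4, 20.0],   # continue: deep focus, mid-task
    [0.3, 0.9, 30.0],   # transition: focus fading, task nearly done
    [0.2, 0.5, 90.0],   # break: low focus, long stretch without a break
])

def recommend(features):
    """Recommend the action whose centroid is nearest to the current
    feature vector (a sketch, not the paper's predictive models)."""
    d = np.linalg.norm(CENTROIDS - np.asarray(features, dtype=float), axis=1)
    return ACTIONS[int(np.argmin(d))]
```

In practice the minutes-since-break feature would need rescaling so it does not dominate the Euclidean distance, and the paper's models additionally draw on affect and workstation-activity signals.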
Accessible Video Calling: Enabling Nonvisual Perception of Visual Conversation Cues
Nonvisually Accessible Video Calling (NAVC) is a prototype that detects visual conversation cues in a video call and uses audio cues to convey them to a user who is blind or low-vision. NAVC uses audio cues inspired by movie soundtracks to convey Attention, Agreement, Disagreement, Happiness, Thinking, and Surprise. When designing NAVC, we partnered with people who are blind or low-vision through a user-centered design process that included need-finding interviews and design reviews. To evaluate NAVC, we conducted a user study with 16 participants. The study provided feedback on the NAVC prototype and showed that the participants could easily discern some cues, like Attention and Agreement, but had trouble distinguishing others. The accuracy of the prototype in detecting conversation cues emerged as a key concern, especially in avoiding false positives and in detecting negative emotions, which tend to be masked in social conversations. This research identified challenges and design opportunities in using AI models to enable accessible video calling.
Lei Shi et al. Topic: Accessibility and assistive technologies. CSCW 2019.
Circadian Rhythms and Physiological Synchrony: Evidence of the Impact of Diversity on Small Group Creativity
Circadian rhythms determine daily sleep cycles, mood, and cognition. Depending on an individual's circadian preference, or chronotype (i.e., "early birds" and "night owls"), the rhythms shift earlier or later in the day. Early birds experience circadian arousal peaks earlier in the morning than night owls. Prior work has shown that individuals are more effective at analytic tasks during their peak arousal times but are more creative during their off-peak times. We investigate whether these findings hold true for small groups. We find that time of day and a group's majority chronotype impact performance on analytic and creative tasks. Physiological synchrony among group members positively predicts group satisfaction. Specifically, homogeneous groups perform worse on all tasks regardless of time of day, but they achieve greater physiological synchrony and feel more satisfied as a group. Based on these findings, we present and advocate for a temporal dimension of group diversity.
Eunice Jun et al. Topic: Groups and creativity. CSCW 2019.
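Physiological synchrony, as used in the entry above, is commonly operationalized as the mean pairwise correlation of group members' physiological time series. The sketch below implements that common operationalization, which may differ from the paper's exact measure:

```python
import numpy as np
from itertools import combinations

def group_synchrony(signals):
    """Mean pairwise Pearson correlation across members' physiological
    traces (e.g., heart rate or electrodermal activity).

    signals: (n_members, n_samples) array, one row per group member.
    Returns a value in [-1, 1]; higher means more synchronized.
    """
    n = signals.shape[0]
    r = [np.corrcoef(signals[i], signals[j])[0, 1]
         for i, j in combinations(range(n), 2)]
    return float(np.mean(r))
```

Real physiological data would first need artifact removal and per-member normalization; windowed correlations are also common when synchrony varies over the course of a task.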
Managing Stress: The Needs of Autistic Adults in Video Calling
Video calling (VC) aims to create multi-modal, collaborative environments that are "just like being there." However, we found that autistic individuals, who exhibit atypical social and cognitive processing, may not share this goal. We interviewed autistic adults about their perceptions of VC compared to other computer-mediated communication (CMC) and face-to-face interaction. We developed a neurodiversity-sensitive model of CMC that describes how stressors such as sensory sensitivities, cognitive load, and anxiety contribute to their preferences for CMC channels. We learned that they apply significant effort to constructing coping strategies that support their sensory, cognitive, and social needs. These strategies include moderating their sensory inputs, creating mental models of conversation partners, and attempting to mask their autism by adopting neurotypical behaviors. Without effective strategies, interviewees experience more stress, have less capacity to interpret verbal and non-verbal cues, and feel less empowered to participate. Our findings reveal critical needs for autistic users. We suggest design opportunities to support their ability to comfortably use VC, and in doing so, point the way toward making VC more comfortable for all.
Annuska Zolyomi et al. Topic: Connecting and Reaching Out. CSCW 2019.
Emotional Dialogue Generation Using Image-Grounded Language Models
Computer-based conversational agents are becoming ubiquitous. However, for these systems to be engaging and valuable to the user, they must be able to express emotion in addition to providing informative responses. Humans rely on much more than language during conversations; visual information is key to providing context. We present the first example of an image-grounded conversational agent that uses visual sentiment, facial expression, and scene features. We show that key qualities of the generated dialogue can be manipulated by the features used to train the agent. We evaluate our model on a large and very challenging real-world dataset of conversations from social media (Twitter). Image grounding leads to significantly more informative, emotional, and specific responses, and the exact qualities can be tuned depending on the image features used. Furthermore, our model improves the objective quality of dialogue responses as evaluated on standard natural-language metrics.
Bernd Huber et al., Harvard University. Topics: Intelligent Voice Assistants (Alexa, Siri, etc.); Agent Personality & Anthropomorphism. CHI 2018.