Proactive Conversational Agents with Inner Thoughts
One of the long-standing aspirations in conversational AI is for agents to autonomously take initiative in conversations, i.e., to be proactive. This is especially challenging in multi-party conversations. Prior NLP research has focused mainly on predicting the next speaker from context such as the preceding conversation. In this paper, we demonstrate the limitations of such methods and rethink what it means for AI to be proactive in multi-party, human-AI conversations. We propose that, like humans, a proactive AI does not merely react to turn-taking cues; it formulates its own inner thoughts during a conversation and seeks the right moment to contribute. Through a formative study with 24 participants and drawing on linguistics and cognitive psychology, we introduce the Inner Thoughts framework. The framework equips AI with a continuous, covert train of thought that runs in parallel to the overt communication process, enabling it to engage proactively by modeling its intrinsic motivation to express these thoughts. We instantiated the framework in two real-time systems: an AI playground web app and a chatbot. Through a technical evaluation and user studies with human participants, we show that our framework significantly surpasses existing baselines on anthropomorphism, coherence, intelligence, and turn-taking appropriateness.
2025 · Xingyu "Bruce" Liu et al. · UCLA, HCI Research · Conversational Chatbots · Agent Personality & Anthropomorphism · Human-LLM Collaboration · CHI
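To make the framework's core loop concrete, here is a minimal, hypothetical Python sketch (the class names and the motivation heuristic are illustrative assumptions, not the paper's implementation): the agent covertly forms thoughts as the conversation unfolds and speaks only when a thought's intrinsic motivation crosses a threshold.

# Hypothetical sketch of an "inner thoughts" loop: the agent keeps a covert
# thought stream alongside the overt conversation and only speaks when its
# intrinsic motivation to express a thought crosses a threshold.
# All names and the scoring heuristic are illustrative, not the paper's code.

from dataclasses import dataclass, field


@dataclass
class Thought:
    text: str
    motivation: float  # intrinsic motivation to express this thought, 0..1


@dataclass
class InnerThoughtsAgent:
    speak_threshold: float = 0.7
    thoughts: list[Thought] = field(default_factory=list)

    def observe(self, utterance: str) -> None:
        """Covertly form a new thought from the latest utterance."""
        # Placeholder for a language-model call that generates a candidate
        # thought and estimates how strongly the agent wants to voice it.
        candidate = Thought(
            text=f"Reaction to: {utterance!r}",
            motivation=min(1.0, len(utterance) / 100),  # toy heuristic
        )
        self.thoughts.append(candidate)

    def maybe_speak(self) -> str | None:
        """Speak only if some thought's motivation exceeds the threshold."""
        if not self.thoughts:
            return None
        best = max(self.thoughts, key=lambda t: t.motivation)
        if best.motivation >= self.speak_threshold:
            self.thoughts.remove(best)
            return best.text
        return None  # stay silent; the thought remains covert


if __name__ == "__main__":
    agent = InnerThoughtsAgent()
    turns = [
        "Hi everyone!",
        "Should we discuss the project timeline and who owns which deliverable next week?",
    ]
    for turn in turns:
        agent.observe(turn)
        print(agent.maybe_speak() or "(agent stays silent)")

In the actual framework, both the thought generation and the motivation estimate would come from a language model rather than the toy heuristics above.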
Human I/O: Towards a Unified Approach to Detecting Situational Impairments
Situationally Induced Impairments and Disabilities (SIIDs) can significantly hinder user experience in contexts such as poor lighting, noise, and multi-tasking. While prior research has introduced algorithms and systems to address these impairments, they predominantly cater to specific tasks or environments and fail to accommodate the diverse and dynamic nature of SIIDs. We introduce Human I/O, a unified approach to detecting a wide range of SIIDs by gauging the availability of human input/output channels. Leveraging egocentric vision, multimodal sensing, and reasoning with large language models, Human I/O achieves a 0.22 mean absolute error and an 82% accuracy in availability prediction across 60 in-the-wild egocentric video recordings covering 32 different scenarios. Furthermore, while the core focus of our work is on detecting SIIDs rather than creating adaptive user interfaces, we showcase the efficacy of our prototype via a user study with 10 participants. Findings suggest that Human I/O significantly reduces effort and improves user experience in the presence of SIIDs, paving the way for more adaptive and accessible interactive systems.
2024 · Xingyu Bruce Liu et al. · UCLA · User Research Methods (Interviews, Surveys, Observation) · Field Studies · CHI
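As an illustration of the channel-availability idea (a sketch under assumptions; the channel names, scores, and heuristics below are not Human I/O's actual model), an application could consume per-channel availability scores and adapt its output modality accordingly:

# Illustrative sketch (not the Human I/O implementation): represent the
# availability of each human input/output channel as a score in [0, 1],
# predicted from situational context, and let an application pick an
# interaction modality accordingly.

from dataclasses import dataclass

CHANNELS = ("vision", "hearing", "vocal", "hands")


@dataclass
class ChannelAvailability:
    scores: dict[str, float]  # channel -> availability in [0, 1]

    def best_output_channel(self) -> str:
        """Pick the most available channel for presenting information."""
        output_channels = ("vision", "hearing")
        return max(output_channels, key=lambda c: self.scores[c])


def predict_availability(situation: str) -> ChannelAvailability:
    """Toy stand-in for the egocentric-video + LLM reasoning pipeline."""
    scores = {c: 1.0 for c in CHANNELS}
    if "dark" in situation:
        scores["vision"] = 0.2
    if "noisy" in situation:
        scores["hearing"] = 0.3
    if "carrying" in situation:
        scores["hands"] = 0.1
    return ChannelAvailability(scores)


if __name__ == "__main__":
    ctx = predict_availability("walking on a dark street carrying groceries")
    print(ctx.scores)
    print("Prefer output via:", ctx.best_output_channel())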
Augmenting Pathologists with NaviPath: Design and Evaluation of a Human-AI Collaborative Navigation System
Artificial Intelligence (AI) promises to support pathologists in navigating high-resolution tumor images to search for pathology patterns of interest. However, existing AI-assisted tools have not realized this potential due to a lack of insight into pathology practice and HCI considerations for pathologists' navigation workflows. We first conducted a formative study with six medical professionals in pathology to capture their navigation strategies. Incorporating our observations along with the pathologists' domain knowledge, we designed NaviPath, a human-AI collaborative navigation system. An evaluation study with 15 medical professionals in pathology indicated that: (i) compared to manual navigation, participants saw more than twice as many pathological patterns per unit time with NaviPath, and (ii) participants achieved, on average, higher precision and recall compared to the AI and manual navigation. Further qualitative analysis revealed that navigation was more consistent with NaviPath, which can improve overall examination quality.
2023 · Hongyan Gu et al. · UCLA · Explainable AI (XAI) · AI-Assisted Decision-Making & Automation · Medical & Scientific Data Visualization · CHI
AVscript: Accessible Video Editing with Audio-Visual Scripts
Sighted and blind and low vision (BLV) creators alike use videos to communicate with broad audiences. Yet, video editing remains inaccessible to BLV creators. Our formative study revealed that current video editing tools make it difficult to access the visual content, assess the visual quality, and efficiently navigate the timeline. We present AVscript, an accessible text-based video editor. AVscript enables users to edit their video using a script that embeds the video's visual content, visual errors (e.g., dark or blurred footage), and speech. Users can also efficiently navigate between scenes and visual errors or locate objects in the frame or spoken words of interest. A comparison study (N=12) showed that AVscript significantly lowered BLV creators' mental demands while increasing confidence and independence in video editing. We further demonstrate the potential of AVscript through an exploratory study (N=3) where BLV creators edited their own footage.
2023 · Mina Huh et al. · University of Texas, Austin · Visual Impairment Technologies (Screen Readers, Tactile Graphics, Braille) · Accessible Gaming · Video Production & Editing · CHI
Visual Captions: Augmenting Verbal Communication with On-the-fly Visuals
Video conferencing solutions like Zoom, Google Meet, and Microsoft Teams are becoming increasingly popular for facilitating conversations, and recent advancements such as live captioning help people better understand each other. We believe that the addition of visuals based on the context of conversations could further improve comprehension of complex or unfamiliar concepts. To explore the potential of such capabilities, we conducted a formative study through remote interviews (N=10) and crowdsourced a dataset of over 1500 sentence-visual pairs across a wide range of contexts. These insights informed Visual Captions, a real-time system that integrates with a video conferencing platform to enrich verbal communication. Visual Captions leverages a fine-tuned large language model to proactively suggest relevant visuals in open-vocabulary conversations. We present the findings from a lab study (N=26) and an in-the-wild case study (N=10), demonstrating how Visual Captions can help improve communication through visual augmentation in various scenarios.
2023 · Xingyu "Bruce" Liu et al. · UCLA · Voice User Interface (VUI) Design · Deaf & Hard-of-Hearing Support (Captions, Sign Language, Vibration) · CHI
GANravel: User-Driven Direction Disentanglement in Generative Adversarial Networks
Generative adversarial networks (GANs) have many application areas, including image editing, domain translation, missing data imputation, and support for creative work. However, GANs are considered "black boxes". Specifically, end-users have little control over how to improve editing directions through disentanglement. Prior work focused on new GAN architectures to disentangle editing directions. Alternatively, we propose GANravel, a user-driven direction disentanglement tool that complements existing GAN architectures and allows users to improve editing directions iteratively. In two user studies with 16 participants each, GANravel users were able to disentangle directions and outperformed state-of-the-art direction discovery baselines in disentanglement performance. In the second user study, GANravel was used in a creative task of creating dog memes and produced high-quality edited images and GIFs.
2023 · Noyan Evirgen et al. · HCI Group · Generative AI (Text, Image, Music, Video) · Creative Collaboration & Feedback Systems · CHI
EmoGlass: an End-to-End AI-Enabled Wearable Platform for Enhancing Self-Awareness of Emotional Health
Emotional disorders are often overlooked due to a lack of awareness, which can lead to more serious mental health issues. Recent advances in sensing and inference technology provide a viable path to wearable, facial-expression-based emotion recognition. However, most prior work has explored only laboratory settings, and few platforms are geared towards end-users in everyday life or provide personalized emotional suggestions to promote self-regulation. We present EmoGlass, an end-to-end wearable platform that consists of emotion detection glasses and an accompanying mobile application. Our single-camera-mounted glasses can detect seven facial expressions based on partial face images. We conducted a three-day out-of-lab study (N=15) to evaluate the performance of EmoGlass. We iterated on the design of the EmoGlass application for effective self-monitoring and awareness of users' daily emotional states. We report quantitative and qualitative findings, based on which we discuss design recommendations for future work on sensing and enhancing awareness of emotional health.
2022 · Zihan Yan et al. · Zhejiang University, UCLA · Sleep & Stress Monitoring · Biosensors & Physiological Monitoring · CHI
Revamp: Enhancing Accessible Information Seeking Experience of Online Shopping for Blind or Low Vision Users
Online shopping has become a valuable modern convenience, but blind or low vision (BLV) users still face significant challenges using it because of 1) inadequate image descriptions and 2) the inability to filter large amounts of information using screen readers. To address these challenges, we propose Revamp, a system that leverages customer reviews for interactive information retrieval. Revamp is a browser integration that supports review-based question-answering interactions on a reconstructed product page. From our interviews, we identified four main aspects (color, logo, shape, and size) that are vital for BLV users to understand the visual appearance of a product. Based on these findings, we formulated syntactic rules to extract review snippets, which were used to generate image descriptions and responses to users' queries. Evaluations with eight BLV users showed that Revamp 1) provided useful descriptive information for understanding product appearance and 2) helped participants locate key information efficiently.
2021 · Ruolin Wang et al. · UCLA · Visual Impairment Technologies (Screen Readers, Tactile Graphics, Braille) · Universal & Inclusive Design · CHI
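A hedged sketch of what rule-based snippet extraction for the four appearance aspects might look like (the keyword patterns below are illustrative assumptions, not Revamp's actual syntactic rules):

# Hypothetical illustration of rule-based review-snippet extraction for the
# four appearance aspects the study identified (color, logo, shape, size).
# The keyword lists are invented for demonstration.

import re

ASPECT_KEYWORDS = {
    "color": ["color", "colour", "red", "blue", "black", "white"],
    "logo": ["logo", "brand mark", "emblem"],
    "shape": ["shape", "round", "square", "slim", "curved"],
    "size": ["size", "small", "large", "fits", "inch", "cm"],
}


def extract_snippets(review: str) -> dict[str, list[str]]:
    """Return review sentences that mention each appearance aspect."""
    sentences = re.split(r"(?<=[.!?])\s+", review.strip())
    snippets: dict[str, list[str]] = {a: [] for a in ASPECT_KEYWORDS}
    for sentence in sentences:
        lowered = sentence.lower()
        for aspect, keywords in ASPECT_KEYWORDS.items():
            if any(k in lowered for k in keywords):
                snippets[aspect].append(sentence)
    return snippets


if __name__ == "__main__":
    review = ("The jacket is a deep navy blue. The logo on the chest is subtle. "
              "It runs a bit small, so order one size up.")
    for aspect, found in extract_snippets(review).items():
        print(aspect, "->", found)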
What Makes Videos Accessible to Blind and Visually Impaired People?
Videos on sites like YouTube have become a primary source of information online. User-generated videos almost universally lack audio descriptions, making most videos inaccessible to blind and visually impaired (BVI) consumers. Our formative studies with BVI people revealed that they used a time-consuming trial-and-error approach when searching for videos: clicking on a video, watching a portion, leaving the video, and repeating the process to find videos that would be accessible, i.e., understandable without additional description of the visual content. BVI people also reported video accessibility heuristics that characterize accessible and inaccessible videos. We instantiate 7 of the identified heuristics (2 audio-related, 2 video-related, and 3 audio-visual) as automated metrics for assessing video accessibility. Our automated video accessibility metrics correlate with BVI people's perception of video accessibility (adjusted R-squared = 0.642). We augment a video search interface with our video accessibility metrics and find that it improves BVI people's efficiency in finding accessible videos: in our user study, participants found videos 40% faster and clicked on 54% fewer videos. By integrating video accessibility metrics, video hosting platforms could help people surface accessible videos and encourage creators to author more accessible content, improving video accessibility for all.
2021 · Xingyu Liu et al. · Carnegie Mellon University · Visual Impairment Technologies (Screen Readers, Tactile Graphics, Braille) · CHI
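As a rough illustration of how heuristic metrics can be combined into a single accessibility estimate (the metric names and data below are invented for demonstration; the paper reports only the adjusted R-squared of its fit), one could fit an ordinary least-squares model over the seven metric scores:

# Illustrative sketch: combine automated accessibility heuristics into one
# predicted accessibility score with linear regression. Metric names and the
# toy data are assumptions, not the paper's metrics or dataset.

import numpy as np

METRICS = ["speech_ratio", "music_ratio", "visual_change", "on_screen_text",
           "face_presence", "narration_overlap", "silence_ratio"]

# Toy data: rows are videos, columns are the 7 heuristic metric values.
X = np.array([
    [0.9, 0.1, 0.2, 0.1, 0.8, 0.9, 0.05],
    [0.2, 0.7, 0.8, 0.6, 0.1, 0.2, 0.40],
    [0.6, 0.3, 0.5, 0.3, 0.5, 0.6, 0.20],
    [0.8, 0.2, 0.3, 0.2, 0.7, 0.8, 0.10],
])
# Toy ratings of perceived accessibility on a 0-1 scale.
y = np.array([0.95, 0.30, 0.60, 0.85])

# Fit y ~ X @ w + b with ordinary least squares (intercept via an extra column).
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
coef, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
weights, intercept = coef[:-1], coef[-1]


def predict_accessibility(metric_values: np.ndarray) -> float:
    """Predicted accessibility score for one video's metric vector."""
    return float(metric_values @ weights + intercept)


if __name__ == "__main__":
    print(dict(zip(METRICS, np.round(weights, 3))))
    print("Predicted:", round(predict_accessibility(X[0]), 3))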
CheXplain: Enabling Physicians to Explore and Understand Data-Driven, AI-Enabled Medical Imaging Analysis
The recent development of data-driven AI promises to automate medical diagnosis; however, most AI systems function as "black boxes" to physicians with limited computational knowledge. Using medical imaging as a point of departure, we conducted three iterations of design activities to formulate CheXplain, a system that enables physicians to explore and understand AI-enabled chest X-ray analysis: (i) a paired survey of referring physicians and radiologists reveals whether, when, and what kinds of explanations are needed; (ii) a low-fidelity prototype co-designed with three physicians formulates eight key features; and (iii) a high-fidelity prototype evaluated by another six physicians provides detailed summative insights into how each feature enables the exploration and understanding of AI. We conclude by discussing recommendations for future work on designing and implementing explainable medical AI systems, centered on four recurring themes: motivation, constraint, explanation, and justification.
2020 · Yao Xie et al. · University of California, Los Angeles · Explainable AI (XAI) · Medical & Scientific Data Visualization · Telemedicine & Remote Patient Monitoring · CHI
OralCam: Enabling Self-Examination and Awareness of Oral Health Using a Smartphone Camera
Due to a lack of medical resources or oral health awareness, oral diseases are often left unexamined and untreated, affecting a large population worldwide. With the advent of low-cost, sensor-equipped smartphones, mobile apps offer a promising avenue for promoting oral health. However, to the best of our knowledge, no mobile health (mHealth) solution directly supports users in self-examining their oral health condition. This paper presents OralCam, the first interactive app that enables end-users to self-examine five common oral conditions (diseases or early disease signals) by taking smartphone photos of their oral cavity. OralCam allows a user to annotate additional information (e.g., living habits, pain, and bleeding) to augment the input image, and presents the output hierarchically, probabilistically, and with visual explanations to help a lay user understand the examination results. Developed on our in-house dataset of 3,182 oral photos annotated by dental experts, our deep-learning-based framework achieved an average detection sensitivity of 0.787 over the five conditions with high localization accuracy. In a week-long in-the-wild user study (N=18), most participants had no trouble using OralCam and interpreting the examination results. Two expert interviews further validated the feasibility of OralCam for promoting users' awareness of oral health.
2020 · Yuan Liang et al. · University of California, Los Angeles · Mental Health Apps & Online Support Communities · Telemedicine & Remote Patient Monitoring · CHI