CoLyricist: Enhancing Lyric Writing with AI through Workflow-Aligned Support
We propose CoLyricist, an AI-assisted lyric writing tool designed to support the typical workflows of experienced lyricists and enhance their creative efficiency. While lyricists have unique processes, many follow common stages. Tools that fail to accommodate these stages are difficult to integrate into creative practices. Existing research and tools lack sufficient understanding of these songwriting stages and their associated challenges, resulting in ineffective designs. Through a formative study involving semi-structured interviews with 10 experienced lyricists, we identified four key stages: Theme Setting, Ideation, Drafting Lyrics, and Melody Fitting. CoLyricist addresses these needs by incorporating tailored AI-driven support for each stage, optimizing the lyric writing process to be more seamless and efficient. To examine whether this workflow-aligned design also benefits those without prior experience, we conducted a user study with 16 participants, including both experienced and novice lyricists. Results showed that CoLyricist enhances the songwriting experience across skill levels. Novice users especially appreciated the Melody-Fitting feature, while experienced users valued the Ideation support.
2026 | Masahiro Yoshida et al. | University of California, Los Angeles | Generative AI (Text, Image, Music, Video) | AI-Assisted Creative Writing | Creative Collaboration & Feedback Systems | IUI
CoSight: Exploring Viewer Contributions to Online Video Accessibility Through Descriptive Commenting
The rapid growth of online video content has outpaced efforts to make visual information accessible to blind and low vision (BLV) audiences. While professional Audio Description (AD) remains the gold standard, it is costly and difficult to scale across the vast volume of online media. In this work, we explore a complementary approach to broaden participation in video accessibility: engaging everyday video viewers while they watch and comment. We introduce CoSight, a Chrome extension that augments YouTube with lightweight, in-situ nudges to support descriptive commenting. Drawing from Fogg’s Behavior Model, CoSight provides visual indicators of accessibility gaps, pop-up hints for what to describe, reminders to clarify vague comments, and related captions and comments as references. In an exploratory study with 48 sighted users, CoSight helped integrate accessibility contribution into natural viewing and commenting practices, resulting in 89% of comments including grounded visual descriptions. Follow-up interviews with four BLV viewers and four professional AD writers suggest that while such comments do not match the rigor of professional AD, they can offer complementary value by conveying visual context and emotional nuance for understanding the videos.
2025 | Ruolin Wang et al. | Visual Impairment Technologies (Screen Readers, Tactile Graphics, Braille) | Augmentative & Alternative Communication (AAC) | Universal & Inclusive Design | UIST
The GenUI Study: Exploring the Design of Generative UI Tools to Support UX Practitioners and Beyond
AI can now generate high-fidelity UI mock-up screens from a high-level textual description, promising to support UX practitioners' work. However, it remains unclear how UX practitioners would adopt such Generative UI (GenUI) models in a way that is integral and beneficial to their work. To answer this question, we conducted a formative study with 37 UX-related professionals spanning four roles: UX designers, UX researchers, software engineers, and product managers. Using a state-of-the-art GenUI tool, each participant went through a week-long, individual mini-project exercise with role-specific tasks, keeping a daily journal of their usage and experiences with GenUI, followed by a semi-structured interview. We report findings on participants' workflows using the GenUI tool, how GenUI can support all roles as well as each specific role, and existing gaps between GenUI and users' needs and expectations, which lead to design implications to inform future work on GenUI development.
2025 | Xiang 'Anthony' Chen et al. | Generative AI (Text, Image, Music, Video) | Human-LLM Collaboration | AI-Assisted Creative Writing | DIS
Empowering Medical Data Labeling for Non-Experts with DANNY: Enhancing Accuracy and Mitigating Over-Reliance on AI
Economic constraints on recruiting experts hinder efforts to build qualified datasets for utilizing AI in professional domains (e.g., medical diagnosis), which could provide societal benefits. To solve this issue, previous studies introduced crowdsourcing and AI to enable non-experts to perform expert-level data labeling. Yet, they encountered three challenges: 1) the limited applicability of crowdsourcing in less specialized domains (e.g., identifying animal species); 2) the chicken-and-egg problem, a paradox where high-performance AI is required to build a dataset to train such AI; and 3) over-reliance on AI, where non-experts, lacking expertise, may incorrectly label data when guided by sub-optimal AI. To address these challenges, we introduce DANNY (Data ANnotation for Non-experts made easY), an AI-based tool designed to help non-experts label an arthritis dataset, aiming to increase labeling accuracy and mitigate over-reliance on AI. By externalizing a cognitive forcing intervention to foster critical thinking, DANNY provides two visualizations: 1) the Criteria phase, where non-experts define criteria across four arthritis features, and 2) the Correction phase, where they refine these criteria by comparing them to AI suggestions. In a study with 28 participants, DANNY users achieved higher accuracy and more appropriate reliance on AI than control groups. A follow-up study with 12 participants demonstrates how DANNY can be used to improve AI with an ensemble method. Our findings contribute new insights into using AI to support non-experts in labeling domain-specific data when expert resources are limited.
2025 | Youngseung Jeon et al. | Explainable AI (XAI) | Medical & Scientific Data Visualization | Mental Health Apps & Online Support Communities | IUI
Proactive Conversational Agents with Inner Thoughts
One of the long-standing aspirations in conversational AI is to allow agents to autonomously take initiative in conversations, i.e., to be proactive. This is especially challenging for multi-party conversations. Prior NLP research focused mainly on predicting the next speaker from contexts like preceding conversations. In this paper, we demonstrate the limitations of such methods and rethink what it means for AI to be proactive in multi-party, human-AI conversations. We propose that just like humans, rather than merely reacting to turn-taking cues, a proactive AI formulates its own inner thoughts during a conversation, and seeks the right moment to contribute. Through a formative study with 24 participants and inspiration from linguistics and cognitive psychology, we introduce the Inner Thoughts framework. Our framework equips AI with a continuous, covert train of thought in parallel to the overt communication process, which enables it to proactively engage by modeling its intrinsic motivation to express these thoughts. We instantiated this framework into two real-time systems: an AI playground web app and a chatbot. In a technical evaluation and user studies with human participants, our framework significantly surpasses existing baselines on aspects like anthropomorphism, coherence, intelligence, and turn-taking appropriateness.
2025 | Xingyu "Bruce" Liu et al. | UCLA, HCI Research | Conversational Chatbots | Agent Personality & Anthropomorphism | Human-LLM Collaboration | CHI
Human I/O: Towards a Unified Approach to Detecting Situational Impairments
Situationally Induced Impairments and Disabilities (SIIDs) can significantly hinder user experience in contexts such as poor lighting, noise, and multi-tasking. While prior research has introduced algorithms and systems to address these impairments, they predominantly cater to specific tasks or environments and fail to accommodate the diverse and dynamic nature of SIIDs. We introduce Human I/O, a unified approach to detecting a wide range of SIIDs by gauging the availability of human input/output channels. Leveraging egocentric vision, multimodal sensing, and reasoning with large language models, Human I/O achieves a mean absolute error of 0.22 and an accuracy of 82% in availability prediction across 60 in-the-wild egocentric video recordings in 32 different scenarios. Furthermore, while the core focus of our work is on the detection of SIIDs rather than the creation of adaptive user interfaces, we showcase the efficacy of our prototype via a user study with 10 participants. Findings suggest that Human I/O significantly reduces effort and improves user experience in the presence of SIIDs, paving the way for more adaptive and accessible interactive systems in the future.
2024 | Xingyu Bruce Liu et al. | UCLA | User Research Methods (Interviews, Surveys, Observation) | Field Studies | CHI
From Text to Pixels: Enhancing User Understanding through Text-to-Image Model Explanations
Recent progress in Text-to-Image (T2I) models promises transformative applications in art, design, education, medicine, and entertainment. These models, exemplified by DALL-E, Imagen, and Stable Diffusion, have the potential to revolutionize various industries. However, a primary concern is their operation as a 'black-box' for many users. Without understanding the underlying mechanics, users are unable to harness the full potential of these models. This study focuses on bridging this gap by developing and evaluating explanation techniques for T2I models, targeting inexperienced end users. While prior works have delved into Explainable AI (XAI) methods for classification or regression tasks, T2I generation poses distinct challenges. Through formative studies with experts, we identified unique explanation goals and subsequently designed tailored explanation strategies. We then empirically evaluated these methods with a cohort of 473 participants from Amazon Mechanical Turk (AMT) across three tasks. Our results highlight users' ability to learn new keywords through explanations, a preference for example-based explanations, and challenges in comprehending explanations that significantly shift the image's theme. Moreover, findings suggest users benefit from a limited set of concurrent explanations. Our main contributions include a curated dataset for evaluating T2I explainability techniques, insights from a comprehensive AMT user study, and observations critical for future T2I model explainability research.
2024 | Noyan Evirgen et al. | Generative AI (Text, Image, Music, Video) | Explainable AI (XAI) | IUI
XCreation: A Graph-Based Crossmodal Generative Creativity Support Tool
Creativity Support Tools (CSTs) aid in the efficient and effective composition of creative content, such as picture books. However, many existing CSTs only allow for mono-modal creation, even though prior work is now theoretically and technically mature enough to support multi-modal creation. To overcome this limitation, we introduce XCreation, a novel CST that leverages generative AI to support cross-modal storybook creation. Nevertheless, directly deploying AI models in CSTs can still be problematic, as they are mostly black-box architectures that are not comprehensible to human users. Therefore, we integrate an interpretable entity-relation graph to intuitively represent picture elements and their relations, improving the usability of the underlying generative structures. Our between-subject user study demonstrates that XCreation supports continuous plot creation with increased creativity, controllability, usability, and interpretability. XCreation is applicable to various scenarios, including interactive storytelling and picture book creation, thanks to its multimodal nature.
2023 | Zihan Yan et al. | Generative AI (Text, Image, Music, Video) | Human-LLM Collaboration | Explainable AI (XAI) | UIST
NaCanva: Exploring and Enabling the Nature-Inspired Creativity for Children
Nature has been a bountiful source of materials, replenishment, inspiration, and creativity. Nature collage, as a crafting technique, offers children a fun and educational way to explore nature and express their creativity. However, the collection of raw material has been limited to static objects like leaves, ignoring inspiration from nature’s sounds and dynamic elements such as babbling creeks. To address this limitation, we have developed a mobile application with the aim of encouraging children’s creativity through renewed material collection and careful observation in nature. To explore the possibility of this approach, we conducted a formative study with children (N=20) and a design workshop with experts (N=6). With the results of these studies, we formulate NaCanva, an AI-assisted multi-modal collage creation system for children. Drawing upon the interactive relationship between children and nature, NaCanva facilitates multi-modal material collection, including images, sound, and videos, which distinguishes our system from traditional collages. We validated this system with a between-subject user study (N=30), and the results indicated that NaCanva enhances children’s multidimensional observation and engagement with nature, thereby unleashing their creativity in the creation of nature collages.
2023 | Zihan Yan et al. | Generative AI (Text, Image, Music, Video) | Digital Art Installations & Interactive Performance | Food Culture & Food Interaction | MobileHCI
Visual Captions: Augmenting Verbal Communication with On-the-fly Visuals
Video conferencing solutions like Zoom, Google Meet, and Microsoft Teams are becoming increasingly popular for facilitating conversations, and recent advancements such as live captioning help people better understand each other. We believe that the addition of visuals based on the context of conversations could further improve comprehension of complex or unfamiliar concepts. To explore the potential of such capabilities, we conducted a formative study through remote interviews (N=10) and crowdsourced a dataset of over 1500 sentence-visual pairs across a wide range of contexts. These insights informed Visual Captions, a real-time system that integrates with a videoconferencing platform to enrich verbal communication. Visual Captions leverages a fine-tuned large language model to proactively suggest relevant visuals in open-vocabulary conversations. We present the findings from a lab study (N=26) and an in-the-wild case study (N=10), demonstrating how Visual Captions can help improve communication through visual augmentation in various scenarios.
2023 | Xingyu "Bruce" Liu et al. | UCLA | Voice User Interface (VUI) Design | Deaf & Hard-of-Hearing Support (Captions, Sign Language, Vibration) | CHI
Mobiot: Augmenting Everyday Objects into Moving IoT Devices Using 3D Printed Attachments Generated by Demonstration
Recent advancements in personal fabrication have brought novices closer to a reality in which they can automate routine tasks with mobilized everyday objects. However, the overall process remains challenging because it demands expertise at every step: capturing design requirements, planning and authoring motions, creating 3D models of mechanical parts, and programming electronics. We introduce Mobiot, an end-user toolkit to help non-experts capture the design and motion requirements of legacy objects by demonstration. It then automatically generates 3D printable attachments, programs to operate assembled modules, a list of off-the-shelf electronics, and assembly tutorials. The authoring feature further assists users in fine-tuning motions and in reusing existing motion libraries and 3D printed mechanisms to adapt to other real-world objects with different motions. We validate Mobiot through application examples with 8 everyday objects with various motions applied, and through a technical evaluation that measures the accuracy of motion reconstruction.
2022 | Jiahao Li et al. | Texas A&M University | Desktop 3D Printing & Personal Fabrication | Circuit Making & Hardware Prototyping | Customizable & Personalized Objects | CHI
Lessons Learned from Designing an AI-Enabled Diagnosis Tool for Pathologists
Despite the promises of data-driven artificial intelligence (AI), little is known about how we can bridge the gulf between traditional physician-driven diagnosis and a plausible future of medicine automated by AI. Specifically, how can we involve AI usefully in physicians’ diagnosis workflow given that most AI is still nascent and error-prone (e.g., in digital pathology)? To explore this question, we first propose a series of collaborative techniques to engage human pathologists with AI given AI’s capabilities and limitations. Based on these techniques, we prototype Impetus, a tool in which an AI takes varying degrees of initiative to provide various forms of assistance to a pathologist in detecting tumors from histological slides. We summarize observations and lessons learned from a study with eight pathologists and discuss recommendations for future work on human-centered medical AI systems.
2021 | Hongyan Gu et al. | Human-AI Collaboration | CSCW
OralViewer: 3D Demonstration of Dental Surgeries for Patient Education with Oral Cavity Reconstruction from a 2D Panoramic X-ray
In this paper, we present OralViewer, the first interactive application that enables dentists to demonstrate dental surgeries in 3D to promote patients' understanding. OralViewer takes a single 2D panoramic dental X-ray to reconstruct patient-specific 3D teeth structures, which are then assembled with registered gum and jaw bone models for complete oral cavity modeling. During the demonstration, OralViewer enables dentists to show surgery steps with virtual dental instruments that can animate effects on a 3D model in real time. A technical evaluation shows our deep-learning-based model achieves a mean Intersection over Union (IoU) of 0.771 for 3D teeth reconstruction. A patient study with 12 participants shows OralViewer can improve patients' understanding of surgeries. An expert study with 3 board-certified dentists further verifies the clinical validity of our system.
2021 | Yuan Liang et al. | VR Medical Training & Rehabilitation | Medical & Scientific Data Visualization | Surgical Assistance & Medical Training | IUI
OralCam: Enabling Self-Examination and Awareness of Oral Health Using a Smartphone Camera
Due to a lack of medical resources or oral health awareness, oral diseases are often left unexamined and untreated, affecting a large population worldwide. With the advent of low-cost, sensor-equipped smartphones, mobile apps offer a promising possibility for promoting oral health. However, to the best of our knowledge, no mobile health (mHealth) solutions can directly support users in self-examining their oral health condition. This paper presents OralCam, the first interactive app that enables end-users' self-examination of five common oral conditions (diseases or early disease signals) by taking smartphone photos of one's oral cavity. OralCam allows a user to annotate additional information (e.g., living habits, pain, and bleeding) to augment the input image, and presents the output hierarchically, probabilistically, and with visual explanations to help a lay user understand examination results. Developed on our in-house dataset that consists of 3,182 oral photos annotated by dental experts, our deep-learning-based framework achieved an average detection sensitivity of 0.787 over five conditions with high localization accuracy. In a week-long in-the-wild user study (N=18), most participants had no trouble using OralCam and interpreting the examination results. Two expert interviews further validate the feasibility of OralCam for promoting users' awareness of oral health.
2020 | Yuan Liang et al. | University of California, Los Angeles | Mental Health Apps & Online Support Communities | Telemedicine & Remote Patient Monitoring | CHI