CoLyricist: Enhancing Lyric Writing with AI through Workflow-Aligned SupportWe propose CoLyricist, an AI-assisted lyric writing tool designed to support the typical workflows of experienced lyricists and enhance their creative efficiency. While lyricists have unique processes, many follow common stages. Tools that fail to accommodate these stages are difficult to integrate into creative practice. Existing research and tools lack sufficient understanding of these songwriting stages and their associated challenges, resulting in ineffective designs. Through a formative study involving semi-structured interviews with 10 experienced lyricists, we identified four key stages: Theme Setting, Ideation, Drafting Lyrics, and Melody Fitting. CoLyricist addresses these needs by incorporating tailored AI-driven support for each stage, optimizing the lyric writing process to be more seamless and efficient. To examine whether this workflow-aligned design also benefits those without prior experience, we conducted a user study with 16 participants, including both experienced and novice lyricists. Results showed that CoLyricist enhances the songwriting experience across skill levels. Novice users especially appreciated the Melody-Fitting feature, while experienced users valued the Ideation support.2026MYMasahiro Yoshida et al.University of California, Los AngelesGenerative AI (Text, Image, Music, Video)AI-Assisted Creative WritingCreative Collaboration & Feedback SystemsIUI
Behavioral Indicators of Overreliance During Interaction with Conversational Language ModelsLLMs are now embedded in a wide range of everyday scenarios. However, their inherent hallucinations risk hiding misinformation in fluent responses, raising concerns about overreliance on AI. Detecting overreliance is challenging, as it often arises in complex, dynamic contexts and cannot be easily captured by post-hoc task outcomes. In this work, we aim to investigate how users' behavioral patterns correlate with overreliance. We collected interaction logs from 77 participants working with an LLM injected with plausible misinformation across three real-world tasks, and we assessed overreliance by whether participants detected and corrected these errors. By semantically encoding and clustering segments of user interactions, we identified five behavioral patterns linked to overreliance: users with low overreliance show careful task comprehension and fine-grained navigation; users with high overreliance show frequent copy-paste, skipping initial comprehension, repeated LLM references, coarse locating, and accepting misinformation despite hesitation. We discuss design implications for mitigation.2026CLChang Liu et al.Tsinghua UniversityHuman-LLM CollaborationExplainable AI (XAI)AI Ethics, Fairness & AccountabilityCHI
CoSight: Exploring Viewer Contributions to Online Video Accessibility Through Descriptive CommentingThe rapid growth of online video content has outpaced efforts to make visual information accessible to blind and low vision (BLV) audiences. While professional Audio Description (AD) remains the gold standard, it is costly and difficult to scale across the vast volume of online media. In this work, we explore a complementary approach to broaden participation in video accessibility: engaging everyday video viewers at their watching and commenting time. We introduce CoSight, a Chrome extension that augments YouTube with lightweight, in-situ nudges to support descriptive commenting. Drawing from Fogg’s Behavior Model, CoSight provides visual indicators of accessibility gaps, pop-up hints for what to describe, reminders to clarify vague comments, and related captions and comments as references. In an exploratory study with 48 sighted users, CoSight helped integrate accessibility contribution into natural viewing and commenting practices, resulting in 89% of comments including grounded visual descriptions. Follow-up interviews with four BLV viewers and four professional AD writers suggest that while such comments do not match the rigor of professional AD, they can offer complementary value by conveying visual context and emotional nuance for understanding the videos.2025RWRuolin Wang et al.Visual Impairment Technologies (Screen Readers, Tactile Graphics, Braille)Augmentative & Alternative Communication (AAC)Universal & Inclusive DesignUIST
The GenUI Study: Exploring the Design of Generative UI Tools to Support UX Practitioners and BeyondAI can now generate high-fidelity UI mock-up screens from a high-level textual description, promising to support UX practitioners' work. However, it remains unclear how UX practitioners would adopt such Generative UI (GenUI) models in a way that is integral and beneficial to their work. To answer this question, we conducted a formative study with 37 UX-related professionals spanning four roles: UX designers, UX researchers, software engineers, and product managers. Using a state-of-the-art GenUI tool, each participant went through a week-long, individual mini-project exercise with role-specific tasks, keeping a daily journal of their usage and experiences with GenUI, followed by a semi-structured interview. We report findings on participants' workflow using the GenUI tool, how GenUI can support all roles as well as each specific role, and existing gaps between GenUI and users' needs and expectations, which lead to design implications to inform future work on GenUI development.2025XCXiang 'Anthony' Chen et al.Generative AI (Text, Image, Music, Video)Human-LLM CollaborationAI-Assisted Creative WritingDIS
Empowering Medical Data Labeling for Non-Experts with DANNY: Enhancing Accuracy and Mitigating Over-Reliance on AIEconomic constraints on recruiting experts hinder efforts to build qualified datasets for utilizing AI in professional domains (e.g., medical diagnosis), which could provide societal benefits. To solve this issue, previous studies introduced crowdsourcing and AI to enable non-experts to perform expert-level data labeling. Yet, they encountered three challenges: 1) the limited applicability of crowdsourcing in less specialized domains (e.g., identifying animal species); 2) the chicken-and-egg problem, a paradox where high-performance AI is required to build a dataset to train such AI; and 3) over-reliance on AI, where non-experts, lacking expertise, may incorrectly label data when guided by sub-optimal AI. To address this, we introduce DANNY (Data ANnotation for Non-experts made easY), an AI-based tool designed to help non-experts label an arthritis dataset, aiming to increase labeling accuracy and mitigate over-reliance on AI. By externalizing a cognitive forcing intervention to foster critical thinking, DANNY provides two visualizations: 1) the Criteria phase, where non-experts define criteria across four arthritis features, and 2) the Correction phase, where they refine these criteria by comparing them to AI suggestions. In a study with 28 participants, DANNY users achieved higher accuracy and more appropriate reliance on AI than control groups. A follow-up study with 12 participants demonstrates how DANNY can be used to improve AI with an ensemble method. Our findings contribute new insights into using AI to support non-experts in labeling domain-specific data when expert resources are limited.2025YJYoungseung Jeon et al.Explainable AI (XAI)Medical & Scientific Data VisualizationMental Health Apps & Online Support CommunitiesIUI
HEPHA: A Mixed-Initiative Image Labeling Tool for Specialized DomainsImage labeling is an important task for training computer vision models. In specialized domains, such as healthcare, it is expensive and challenging to recruit specialists for image labeling. We propose HEPHA, a mixed-initiative image labeling tool that elicits human expertise via inductive logic learning to infer and refine labeling rules. Each rule comprises visual predicates that describe the image. HEPHA enables users to iteratively refine the rules by either direct manipulation through a visual programming interface or by labeling more images. To facilitate rule refinement, HEPHA recommends which rule to edit and which predicate to update. For users unfamiliar with visual programming, HEPHA suggests diverse and informative images to users for further labeling. We conducted a within-subjects user study with 16 participants and compared HEPHA with a variant of HEPHA and a deep learning-based approach. We found that HEPHA outperforms the two baselines in both specialized-domain and general-domain image labeling tasks. Our code is available at https://github.com/Neural-Symbolic-Image-Labeling/NSILWeb.2025SZShiyuan Zhou et al.Explainable AI (XAI)Interactive Data VisualizationMedical & Scientific Data VisualizationIUI
Mentigo: An Intelligent Agent for Mentoring Students in the Creative Problem Solving ProcessCreative Problem-Solving (CPS) promotes creative and critical thinking while enhancing real-world problem-solving skills, making it essential for middle school education. However, providing personalized mentorship in CPS projects at scale is challenging due to resource constraints and diverse student needs. To address this, we developed Mentigo, an AI-driven mentor agent designed to guide middle school students through the CPS process. Using a dataset of real classroom interactions, we encoded CPS task stages, adaptive guidance strategies, and personalized feedback mechanisms to inform Mentigo's dynamic mentoring framework powered by large language models (LLMs). A comparative experiment with 12 students and evaluations from five expert educators demonstrated improved student engagement, creativity, and task performance. Our findings highlight design implications for using LLM-based AI mentors to enhance CPS learning in educational environments.2025SZSiyu Zha et al.Tsinghua UniversityHuman-LLM CollaborationIntelligent Tutoring Systems & Learning AnalyticsCHI
Proactive Conversational Agents with Inner ThoughtsOne of the long-standing aspirations in conversational AI is to allow agents to autonomously take the initiative in conversations, i.e., to be proactive. This is especially challenging for multi-party conversations. Prior NLP research focused mainly on predicting the next speaker from contexts like preceding conversations. In this paper, we demonstrate the limitations of such methods and rethink what it means for AI to be proactive in multi-party, human-AI conversations. We propose that just like humans, rather than merely reacting to turn-taking cues, a proactive AI formulates its own inner thoughts during a conversation, and seeks the right moment to contribute. Through a formative study with 24 participants and inspiration from linguistics and cognitive psychology, we introduce the Inner Thoughts framework. Our framework equips AI with a continuous, covert train of thoughts in parallel to the overt communication process, which enables it to proactively engage by modeling its intrinsic motivation to express these thoughts. We instantiated this framework into two real-time systems: an AI playground web app and a chatbot. Through a technical evaluation and user studies with human participants, our framework significantly surpasses existing baselines on aspects like anthropomorphism, coherence, intelligence, and turn-taking appropriateness.2025XLXingyu "Bruce" Liu et al.UCLA, HCI ResearchConversational ChatbotsAgent Personality & AnthropomorphismHuman-LLM CollaborationCHI
"Pinocchio had a Nose, You have a Network!": On Characterizing Fake News Spreaders on Arabic Social MediaThe detection and analysis of fake news and its origins has become a main task associated with the overall objective of social media regulation in recent years. The majority of work was dedicated towards detecting misinformation with some focus on analyzing the flow of fake news over social networks. However, there is less attention on understanding the characteristics of social media users who consume this fake news. In this work, we investigate the possibility of predicting users’ reactions towards fake news and defining some network characteristics for each users' group. We utilized a set of fact-checking websites in the Arab world that report social media posts spreading fake news and the interactions with them. We defined three sets of users: 1) Spreaders, who spread fake news, 2) Checkers, who constantly share fact-checked news, and 3) Refuters, who respond to fake-news posts declaring their inaccuracy. We build a classifier that uses users’ network graph to predict their reactions with an accuracy exceeding 93\%. We applied further analysis for the most effective features of each users group and noticed that spreaders interact with more accounts that use their mother tongue, a considerable number of famous state-sponsored accounts, and accounts that get suspended while checkers and refuters interact with more foreign accounts and news-reporting entities. Central nodes in the networks of spreaders were found to be linked with state-sponsored media, whereas central nodes in the networks of checkers included organizations with a cross-cultural nature.2024MFMahmoud Fawzi et al.Session 2e: Echo Chambers and Fake News in FocusCSCW
Parent and Educator Concerns on the Pedagogical Use of AI-Equipped Social RobotsPerella-Holfeld et al. study the concerns of parents and educators about the use of AI social robots in the classroom, exploring the potential impact on teaching practice and child development.2024FPFrancisco Perella-Holfeld et al.Mental Health Apps & Online Support CommunitiesSocial Robot InteractionInclusive DesignUbiComp
Human I/O: Towards a Unified Approach to Detecting Situational ImpairmentsSituationally Induced Impairments and Disabilities (SIIDs) can significantly hinder user experience in contexts such as poor lighting, noise, and multi-tasking. While prior research has introduced algorithms and systems to address these impairments, they predominantly cater to specific tasks or environments and fail to accommodate the diverse and dynamic nature of SIIDs. We introduce Human I/O, a unified approach to detecting a wide range of SIIDs by gauging the availability of human input/output channels. Leveraging egocentric vision, multimodal sensing and reasoning with large language models, Human I/O achieves a 0.22 mean absolute error and an 82% accuracy in availability prediction across 60 in-the-wild egocentric video recordings in 32 different scenarios. Furthermore, while the core focus of our work is on the detection of SIIDs rather than the creation of adaptive user interfaces, we showcase the efficacy of our prototype via a user study with 10 participants. Findings suggest that Human I/O significantly reduces effort and improves user experience in the presence of SIIDs, paving the way for more adaptive and accessible interactive systems in the future.2024XLXingyu Bruce Liu et al.UCLAUser Research Methods (Interviews, Surveys, Observation)Field StudiesCHI
From Text to Pixels: Enhancing User Understanding through Text-to-Image Model ExplanationsRecent progress in Text-to-Image (T2I) models promises transformative applications in art, design, education, medicine, and entertainment. These models, exemplified by Dall-e, Imagen, and Stable Diffusion, have the potential to revolutionize various industries. However, a primary concern is their operation as a 'black-box' for many users. Without understanding the underlying mechanics, users are unable to harness the full potential of these models. This study focuses on bridging this gap by developing and evaluating explanation techniques for T2I models, targeting inexperienced end users. While prior works have delved into Explainable AI (XAI) methods for classification or regression tasks, T2I generation poses distinct challenges. Through formative studies with experts, we identified unique explanation goals and subsequently designed tailored explanation strategies. We then empirically evaluated these methods with a cohort of 473 participants from Amazon Mechanical Turk (AMT) across three tasks. Our results highlight users' ability to learn new keywords through explanations, a preference for example-based explanations, and challenges in comprehending explanations that significantly shift the image's theme. Moreover, findings suggest users benefit from a limited set of concurrent explanations. Our main contributions include a curated dataset for evaluating T2I explainability techniques, insights from a comprehensive AMT user study, and observations critical for future T2I model explainability research.2024NENoyan Evirgen et al.Generative AI (Text, Image, Music, Video)Explainable AI (XAI)IUI
Fingerprinting IoT Devices Using Latent Physical Side-ChannelsThe proliferation of low-end low-power internet-of-things (IoT) devices in "smart" environments necessitates secure identification and authentication of these devices via low-overhead fingerprinting methods. Previous work typically utilizes characteristics of the device's wireless modulation (WiFi, BLE, etc.) in the spectrum, or more recently, electromagnetic emanations from the device's DRAM to perform fingerprinting. The problem is that many devices, especially low-end IoT/embedded systems, may not have transmitter modules, DRAM, or other complex components, therefore making fingerprinting infeasible or challenging. To address this concern, we utilize electromagnetic emanations derived from the processor's clock to fingerprint. We present Digitus, an emanations-based fingerprinting system that can authenticate IoT devices at range. The advantage of Digitus is that we can authenticate low-power IoT devices using features intrinsic to their normal operation without the need for additional transmitters and/or other complex components such as DRAM. Our experiments demonstrate that we achieve ≥ 95% accuracy on average, applicability in a wide range of IoT scenarios (range ≥ 5m, non-line-of-sight, etc.), as well as support for IoT applications such as finding hidden devices. Digitus represents a low-overhead solution for the authentication of low-end IoT devices. https://dl.acm.org/doi/10.1145/3596247 2023JFJustin Feng et al.Passwords & AuthenticationIoT Device PrivacyUbiComp
XCreation: A Graph-Based Crossmodal Generative Creativity Support ToolCreativity Support Tools (CSTs) aid in the efficient and effective composition of creative content, such as picture books. However, many existing CSTs only allow for mono-modal creation, whereas previous studies have become theoretically and technically mature to support multi-modal innovative creations. To overcome this limitation, we introduce XCreation, a novel CST that leverages generative AI to support cross-modal storybook creation. Nevertheless, directly deploying AI models to CSTs can still be problematic as they are mostly black-box architectures that are not comprehensible to human users. Therefore, we integrate an interpretable entity-relation graph to intuitively represent picture elements and their relations, improving the usability of the underlying generative structures. Our between-subject user study demonstrates that XCreation supports continuous plot creation with increased creativity, controllability, usability, and interpretability. XCreation is applicable to various scenarios, including interactive storytelling and picture book creation, thanks to its multimodal nature.2023ZYZihan Yan et al.Generative AI (Text, Image, Music, Video)Human-LLM CollaborationExplainable AI (XAI)UIST
NaCanva: Exploring and Enabling the Nature-Inspired Creativity for ChildrenNature has been a bountiful source of materials, replenishment, inspiration, and creativity. Nature collage, as a crafting technique, offers children a fun and educational way to explore nature and express their creativity. However, the collection of raw material has been limited to static objects like leaves, ignoring inspiration from nature’s sounds and dynamic elements such as babbling creeks. To address this limitation, we have developed a mobile application with the aim of encouraging children’s creativity through renewed material collection and careful observation in nature. To explore the possibility of this approach, we conducted a formative study with children (N=20) and a design workshop with experts (N=6). With the results of these studies, we formulate NaCanva, an AI-assisted multi-modal collage creation system for children. Drawing upon the interactive relationship between children and nature, NaCanva facilitates multi-modal material collection, including images, sound, and videos, which differentiates our system from traditional collages. We validated this system with a between-subject user study (N=30), and the results indicated that NaCanva enhances children’s multidimensional observation and engagement with nature, thereby unleashing their creativity in the creation of nature collages.2023ZYZihan Yan et al.Generative AI (Text, Image, Music, Video)Digital Art Installations & Interactive PerformanceFood Culture & Food InteractionMobileHCI
Visual Captions: Augmenting Verbal Communication with On-the-fly VisualsVideo conferencing solutions like Zoom, Google Meet, and Microsoft Teams are becoming increasingly popular for facilitating conversations, and recent advancements such as live captioning help people better understand each other. We believe that the addition of visuals based on the context of conversations could further improve comprehension of complex or unfamiliar concepts. To explore the potential of such capabilities, we conducted a formative study through remote interviews (N=10) and crowdsourced a dataset of over 1500 sentence-visual pairs across a wide range of contexts. These insights informed Visual Captions, a real-time system that integrates with a videoconferencing platform to enrich verbal communication. Visual Captions leverages a fine-tuned large language model to proactively suggest relevant visuals in open-vocabulary conversations. We present the findings from a lab study (N=26) and an in-the-wild case study (N=10), demonstrating how Visual Captions can help improve communication through visual augmentation in various scenarios.2023XLXingyu "Bruce" Liu et al.UCLAVoice User Interface (VUI) DesignDeaf & Hard-of-Hearing Support (Captions, Sign Language, Vibration)CHI
ForceSight: Non-Contact Force Sensing with Laser Speckle ImagingForce sensing has been a key enabling technology for a wide range of interfaces such as digitally enhanced body and world surfaces for touch interactions. Additionally, force often contains rich contextual information about user activities and can be used to enhance machine perception for improved user and environment awareness. To sense force, conventional approaches rely on contact sensors made of pressure-sensitive materials such as piezo films/discs or force-sensitive resistors. We present ForceSight, a non-contact force sensing approach using laser speckle imaging. Our key observation is that object surfaces deform in the presence of force. This deformation, though very minute, manifests as observable and discernible laser speckle shifts, which we leverage to sense the applied force. This non-contact force-sensing capability opens up new opportunities for rich interactions and can be used to power user-/environment-aware interfaces. We first built and verified the model of laser speckle shift with surface deformations. To investigate the feasibility of our approach, we conducted studies on metal, plastic, wood, along with a wide variety of materials. Additionally, we included supplementary tests to fully tease out the performance of our approach. Finally, we demonstrated the applicability of ForceSight with several demonstrative example applications.2022SPSiyou Pei et al.Force Feedback & Pseudo-Haptic WeightUIST
TypeOut: Leveraging Just-in-Time Self-Affirmation for Smartphone Overuse ReductionSmartphone overuse is related to a variety of issues such as lack of sleep and anxiety. We explore the application of Self-Affirmation Theory on smartphone overuse intervention in a just-in-time manner. We present TypeOut, a just-in-time intervention technique that integrates two components: an in-situ typing-based unlock process to improve user engagement, and self-affirmation-based typing content to enhance effectiveness. We hypothesize that the integration of typing and self-affirmation content can better reduce smartphone overuse. We conducted a 10-week within-subject field experiment (N=54) and compared TypeOut against two baselines: one only showing the self-affirmation content (a common notification-based intervention), and one only requiring typing non-semantic content (a state-of-the-art method). TypeOut reduces app usage by over 50%, and both app opening frequency and usage duration by over 25%, all significantly outperforming baselines. TypeOut can potentially be used in other domains where an intervention may benefit from integrating self-affirmation exercises with an engaging just-in-time mechanism.2022XXXuhai Xu et al.University of WashingtonMental Health Apps & Online Support CommunitiesNotification & Interruption ManagementCHI
Mobiot: Augmenting Everyday Objects into Moving IoT Devices Using 3D Printed Attachments Generated by DemonstrationRecent advancements in personal fabrication have brought novices closer to a reality where they can automate routine tasks with mobilized everyday objects. However, the overall process remains challenging because it demands expertise at every step, from capturing design requirements and planning motions, to authoring them, to creating 3D models of mechanical parts, to programming electronics. We introduce Mobiot, an end-user toolkit to help non-experts capture the design and motion requirements of legacy objects by demonstration. It then automatically generates 3D printable attachments, programs to operate assembled modules, a list of off-the-shelf electronics, and assembly tutorials. The authoring feature further assists users to fine-tune as well as to reuse existing motion libraries and 3D printed mechanisms to adapt to other real-world objects with different motions. We validate Mobiot through application examples with 8 everyday objects with various motions applied, and through technical evaluation to measure the accuracy of motion reconstruction.2022AAJiahao Li et al.Texas A&M UniversityDesktop 3D Printing & Personal FabricationCircuit Making & Hardware PrototypingCustomizable & Personalized ObjectsCHI
Lessons Learned from Designing an AI-Enabled Diagnosis Tool for PathologistsDespite the promises of data-driven artificial intelligence (AI), little is known about how we can bridge the gulf between traditional physician-driven diagnosis and a plausible future of medicine automated by AI. Specifically, how can we involve AI usefully in physicians’ diagnosis workflow given that most AI is still nascent and error-prone (e.g., in digital pathology)? To explore this question, we first propose a series of collaborative techniques to engage human pathologists with AI given AI’s capabilities and limitations, based on which we prototype Impetus—a tool where an AI takes various degrees of initiatives to provide various forms of assistance to a pathologist in detecting tumors from histological slides. We summarize observations and lessons learned from a study with eight pathologists and discuss recommendations for future work on human-centered medical AI systems.2021HGHongyan Gu et al.Human-AI CollaborationCSCW