Gensors: Authoring Personalized Visual Sensors with Multimodal Foundation Models and ReasoningMultimodal large language models (MLLMs), with their expansive world knowledge and reasoning capabilities, present a unique opportunity for end-users to create personalized AI sensors capable of reasoning about complex situations. A user could describe a desired sensing task in natural language (e.g., "let me know if my toddler is getting into mischief in the living room"), with the MLLM analyzing the camera feed and responding within just seconds. In a formative study, we found that users saw substantial value in defining their own sensors, yet struggled to articulate their unique personal requirements to the model and debug the sensors through prompting alone. To address these challenges, we developed Gensors, a system that empowers users to define customized sensors supported by the reasoning capabilities of MLLMs. Gensors 1) assists users in eliciting requirements through both automatically-generated and manually created sensor criteria, 2) facilitates debugging by allowing users to isolate and test individual criteria in parallel, 3) suggests additional criteria based on user-provided images, and 4) proposes test cases to help users "stress test" sensors on potentially unforeseen scenarios. In a 12-participant user study, users reported significantly greater sense of control, understanding, and ease of communication when defining sensors using Gensors. Beyond addressing model limitations, Gensors supported users in debugging, eliciting requirements, and expressing unique personal requirements to the sensor through criteria-based reasoning; it also helped uncover users' own "blind spots" by exposing overlooked criteria and revealing unanticipated failure modes. Finally, we describe insights into how unique characteristics of MLLMs–such as hallucinations and inconsistent responses–can impact the sensor-creation process. Together, these findings contribute to the design of future MLLM-powered sensing systems that are intuitive and customizable by everyday users.2025MLMichael Xieyang Liu et al.Eye Tracking & Gaze InteractionContext-Aware ComputingUbiquitous ComputingIUI
Beyond Code Generation: LLM-supported Exploration of the Program Design SpaceIn this work, we explore explicit Large Language Model (LLM)-powered support for the iterative design of computer programs. Program design, like other design activity, is characterized by navigating a space of alternative problem formulations and associated solutions in an iterative fashion. LLMs are potentially powerful tools in helping this exploration; however, by default, code-generation LLMs deliver code that represents a particular point solution. This obscures the larger space of possible alternatives, many of which might be preferable to the LLM’s default interpretation and its generated code. We contribute an IDE that supports program design through generating and showing new ways to frame problems alongside alternative solutions, tracking design decisions, and identifying implicit decisions made by either the programmer or the LLM. In a user study, we find that with our IDE, users combine and parallelize design phases to explore a broader design space---but also struggle to keep up with LLM-originated changes to code and other information overload. These findings suggest a core challenge for future IDEs that support program design through higher-level instructions given to LLM-based agents: carefully managing attention and deciding what information agents should surface to program designers and when.2025JZJ.D. Zamfirescu-Pereira et al.UC Berkeley, Computer ScienceGenerative AI (Text, Image, Music, Video)Human-LLM CollaborationComputational Methods in HCICHI
PromptInfuser: How Tightly Coupling AI and UI Design Impacts Designers’ WorkflowsPrototyping AI applications is notoriously difficult. While large language model (LLM) prompting has dramatically lowered the barriers to AI prototyping, designers are still prototyping AI functionality and UI separately. We investigate how coupling prompt and UI design affects designers' workflows. Grounding this research, we developed PromptInfuser, a Figma plugin that enables users to create semi-functional mockups, by connecting UI elements to the inputs and outputs of prompts. In a study with 14 designers, we compare PromptInfuser to designers’ current AI-prototyping workflow. PromptInfuser was perceived to be significantly more useful for communicating product ideas, more capable of producing prototypes that realistically represent the envisioned artifact, more efficient for prototyping, and more helpful for anticipating UI issues and technical constraints. PromptInfuser encouraged iteration over prompt and UI together, which helped designers identify UI and prompt incompatibilities and reflect upon their total solution. Together, these findings inform future systems for prototyping AI applications.2024SPSavvas Petridis et al.Human-LLM CollaborationPrototyping & User TestingDIS
Farsight: Fostering Responsible AI Awareness During AI Application PrototypingPrompt-based interfaces for Large Language Models (LLMs) have made prototyping and building AI-powered applications easier than ever before. However, identifying potential harms that may arise from AI applications remains a challenge, particularly during prompt-based prototyping. To address this, we present Farsight, a novel in situ interactive tool that helps people identify potential harms from the AI applications they are prototyping. Based on a user's prompt, Farsight highlights news articles about relevant AI incidents and allows users to explore and edit LLM-generated use cases, stakeholders, and harms. We report design insights from a co-design study with 10 AI prototypers and findings from a user study with 42 AI prototypers. After using Farsight, AI prototypers in our user study are better able to independently identify potential harms associated with a prompt and find our tool more useful and usable than existing resources. Their qualitative feedback also highlights that Farsight encourages them to focus on end-users and think beyond immediate harms. We discuss these findings and reflect on their implications for designing AI prototyping experiences that meaningfully engage with AI harms. Farsight is publicly accessible at: https://pair-code.github.io/farsight.2024ZWZijie J. Wang et al.Georgia TechExplainable AI (XAI)AI-Assisted Decision-Making & AutomationAI Ethics, Fairness & AccountabilityCHI
ConstitutionMaker: Interactively Critiquing Large Language Models by Converting Feedback into PrinciplesLarge language model (LLM) prompting is a promising new approach for users to create and customize their own chatbots. However, current methods for steering a chatbot's outputs, such as prompt engineering and fine-tuning, do not support users in converting their natural feedback on the model's outputs to changes in the prompt or model. In this work, we explore how to enable users to interactively refine model outputs through their feedback, by helping them convert their feedback into a set of principles (i.e. a constitution) that dictate the model's behavior. From a formative study, we (1) found that users needed support converting their feedback into principles for the chatbot and (2) classified the different principle types desired by users. Inspired by these findings, we developed ConstitutionMaker, an interactive tool for converting user feedback into principles, to steer LLM-based chatbots. With ConstitutionMaker, users can provide either positive or negative feedback in natural language, select auto-generated feedback, or rewrite the chatbot’s response; each mode of feedback automatically generates a principle that is inserted into the chatbot’s prompt. In a user study with 14 participants, we compare ConstitutionMaker to an ablated version, where users write their own principles. With ConstitutionMaker, participants felt that their principles could better guide the chatbot, that they could more easily convert their feedback into principles, and that they could write principles more efficiently, with less mental demand. ConstitutionMaker helped users identify ways to improve the chatbot, formulate their intuitive responses to the model into feedback, and convert this feedback into specific and clear principles. Together, these findings inform future tools that support the interactive critiquing of LLM outputs.2024SPSavvas Petridis et al.Intelligent Voice Assistants (Alexa, Siri, etc.)Human-LLM CollaborationExplainable AI (XAI)IUI
“The less I type, the better”: How AI Language Models can Enhance or Impede Communication for AAC UsersUsers of augmentative and alternative communication (AAC) devices sometimes find it difficult to communicate in real time with others due to the time it takes to compose messages. AI technologies such as large language models (LLMs) provide an opportunity to support AAC users by improving the quality and variety of text suggestions. However, these technologies may fundamentally change how users interact with AAC devices as users transition from typing their own phrases to prompting and selecting AI-generated phrases. We conducted a study in which 12 AAC users tested live suggestions from a language model across three usage scenarios: extending short replies, answering biographical questions, and requesting assistance. Our study participants believed that AI-generated phrases could save time, physical and cognitive effort when communicating, but felt it was important that these phrases reflect their own communication style and preferences. This work identifies opportunities and challenges for future AI-enhanced AAC devices.2023SVStephanie Valencia et al.Carnegie Mellon University, Google ResearchHuman-LLM CollaborationAugmentative & Alternative Communication (AAC)CHI
Designing Responsible AI: Adaptations of UX Practice to Meet Responsible AI ChallengesTechnology companies continue to invest in efforts to incorporate responsibility in their Artificial Intelligence (AI) advancements, while efforts to audit and regulate AI systems expand. This shift towards Responsible AI (RAI) in the tech industry necessitates new practices and adaptations to roles—undertaken by a variety of practitioners in more or less formal positions, many of whom focus on the user-centered aspects of AI. To better understand practices at the intersection of user experience (UX) and RAI, we conducted an interview study with industrial UX practitioners and RAI subject matter experts, both of whom are actively involved in addressing RAI concerns throughout the early design and development of new AI-based prototypes, demos, and products, at a large technology company. Many of the specifc practices and their associated challenges have yet to be surfaced in the literature, and distilling them offers a critical view into how practitioners’ roles are adapting to meet present-day RAI challenges. We present and discuss three emerging practices in which RAI is being enacted and reifed in UX practitioners’ everyday work. We conclude by arguing that the emerging practices, goals, and types of expertise that surfaced in our study point to an evolution in praxis, with associated challenges that suggest important areas for further research in HCI.2023QWQiaosi Wang et al.Georgia Institute of TechnologyHuman-LLM CollaborationExplainable AI (XAI)AI Ethics, Fairness & AccountabilityCHI
AI Chains: Transparent and Controllable Human-AI Interaction by Chaining Large Language Model PromptsAlthough large language models (LLMs) have demonstrated impressive potential on simple tasks, their breadth of scope, lack of transparency, and insufficient controllability can make them less effective when assisting humans on more complex tasks. In response, we introduce the concept of Chaining LLM steps together, where the output of one step becomes the input for the next, thus aggregating the gains per step. We first define a set of LLM primitive operations useful for Chain construction, then present an interactive system where users can modify these Chains, along with their intermediate results, in a modular way. In a 20-person user study, we found that Chaining not only improved the quality of task outcomes, but also significantly enhanced system transparency, controllability, and sense of collaboration. Additionally, we saw that users developed new ways of interacting with LLMs through Chains: they leveraged sub-tasks to calibrate model expectations, compared and contrasted alternative strategies by observing parallel downstream effects, and debugged unexpected model outputs by “unit-testing” sub-components of a Chain. In two case studies, we further explore how LLM Chains may be used in future applications2022TWTongshuang Wu et al.University of Washington, University of WashingtonHuman-LLM CollaborationAlgorithmic Transparency & AuditabilityCHI
Discovering the Syntax and Strategies of Natural Language Programming with Generative Language ModelsIn this paper, we present a natural language code synthesis tool, GenLine, backed by 1) a large generative language model and 2) a set of task-specific prompts that create or change code. To understand the user experience of natural language code synthesis with these new types of models, we conducted a user study in which participants applied GenLine to two programming tasks. Our results indicate that while natural language code synthesis can sometimes provide a magical experience, participants still faced challenges. In particular, participants felt that they needed to learn the model’s ``syntax,'' despite their input being natural language. Participants also struggled to form an accurate mental model of the types of requests the model can reliably translate and developed a set of strategies to debug model input. From these findings, we discuss design implications for future natural language code synthesis tools built using large generative language models.2022EJEllen Jiang et al.GoogleGenerative AI (Text, Image, Music, Video)Human-LLM CollaborationCHI
Breakdowns and Breakthroughs: Observing Musicians' Responses to the COVID-19 PandemicThis paper describes results from a study examining how musicians have been affected by the restrictive environments imposed by the COVID-19 pandemic, and the corresponding technological implications. Through a survey of 29 musicians and additional interviews with 7 professional improvisation musicians, we observe the challenges musicians face in 1) finding the technological infrastructure and shared spaces necessary for dynamic, remote creative collaborations, 2) producing the mindset, free time, and sources of inspiration necessary for initiating creativity during times of stress, and 3) maintaining (and potentially expanding) the social connections and creative community vital to creative practice. We also report on how some musicians have creatively leveraged the radical changes in their lives to drive new artistic practices and styles of music. Collectively, these data illustrate the resilience of certain existing remote collaboration and creative tools, while suggesting ways other digital tools could more flexibly accommodate new use cases. The data also suggest the value of tools and services that help people creatively cope with radical changes to their lives and livelihoods.2021CCCarrie J Cai et al.GoogleConversational ChatbotsCreative Collaboration & Feedback SystemsCHI
AI as Social Glue: Uncovering the Roles of Deep Generative AI during Social Music CompositionRecent advances in deep generative neural networks have made it possible for artificial intelligence to actively collaborate with human beings in co-creating novel content (e.g. music, art). While substantial research focuses on (individual) human-AI collaborations, comparatively less research examines how AI can play a role in human-human collaborations during co-creation. In a qualitative lab study, we observed 30 participants (15 pairs) compose a musical phrase in pairs, both with and without AI. Our findings reveal that AI may play important roles in influencing human social dynamics during creativity, including: 1) implicitly seeding a common ground at the start of collaboration, 2) acting as a psychological safety net in creative risk-taking, 3) providing a force for group progress, 4) mitigating interpersonal stalling and friction, and 5) altering users' collaborative and creative roles. This work contributes to the future of generative AI in social creativity by providing implications for how AI could enrich, impede, or alter creative social dynamics in the years to come.2021MSMinhyang (Mia) Suh et al.Google , GoogleGenerative AI (Text, Image, Music, Video)Creative Collaboration & Feedback SystemsCHI
Novice-AI Music Co-Creation via AI-Steering Tools for Deep Generative ModelsWhile generative deep neural networks (DNNs) have demonstrated their capacity for creating novel musical compositions, less attention has been paid to the challenges and potential of co-creating with these musical AIs, especially for novices. In a needfinding study with a widely used, interactive musical AI, we found that the AI can overwhelm users with the amount of musical content it generates, and frustrate them with its non-deterministic output. To better match co-creation needs, we developed AI-steering tools, consisting of Voice Lanes that restrict content generation to particular voices; Example-Based Sliders to control the similarity of generated content to an existing example; Semantic Sliders to nudge music generation in high-level directions (happy/sad, conventional/surprising); and Multiple Alternatives of generated content to audition and choose from. In a summative study (N=21), we discovered the tools not only increased users' trust, control, comprehension, and sense of collaboration with the AI, but also contributed to a greater sense of self-efficacy and ownership of the composition relative to the AI.2020RLRyan Louie et al.Northwestern UniversityGenerative AI (Text, Image, Music, Video)AI-Assisted Creative WritingMusic Composition & Sound Design ToolsCHI
Human-Centered Tools for Coping with Imperfect Algorithms During Medical Decision-MakingMachine learning (ML) is increasingly being used in image retrieval systems for medical decision making. One application of ML is to retrieve visually similar medical images from past patients (e.g. tissue from biopsies) to reference when making a medical decision with a new patient. However, no algorithm can perfectly capture an expert's ideal notion of similarity for every case: an image that is algorithmically determined to be similar may not be medically relevant to a doctor's specific diagnostic needs. In this paper, we identified the needs of pathologists when searching for similar images retrieved using a deep learning algorithm, and developed tools that empower users to cope with the search algorithm on-the-fly, communicating what types of similarity are most important at different moments in time. In two evaluations with pathologists, we found that these tools increased the diagnostic utility of images found and increased user trust in the algorithm. The tools were preferred over a traditional interface, without a loss in diagnostic accuracy. We also observed that users adopted new strategies when using refinement tools, re-purposing them to test and understand the underlying algorithm and to disambiguate ML errors from their own errors. Taken together, these findings inform future human-ML collaborative systems for expert decision-making.2019CCCarrie J. Cai et al.Google BrainGenerative AI (Text, Image, Music, Video)Explainable AI (XAI)AI-Assisted Decision-Making & AutomationCHI
``Hello AI": Uncovering the Onboarding Needs of Medical Practitioners for Human-AI Collaborative Decision-MakingAlthough rapid advances in machine learning have made it increasingly applicable to expert decision-making, the delivery of accurate algorithmic predictions alone is insufficient for effective human--AI collaboration. In this work, we investigate the key types of information medical experts desire when they are first introduced to a diagnostic AI assistant. In a qualitative lab study, we interviewed 21 pathologists before, during, and after being presented deep neural network (DNN) predictions for prostate cancer diagnosis, to learn the types of information that they desired about the AI assistant. Our findings reveal that, far beyond understanding the local, case-specific reasoning behind any model decision, clinicians desired upfront information about basic, global properties of the model, such as its known strengths and limitations, its subjective point-of-view, and its overall design objective---what it's designed to be optimized for. Participants compared these information needs to the collaborative mental models they develop of their medical colleagues when seeking a second opinion: the medical perspectives and standards that those colleagues embody, and the compatibility of those perspectives with their own diagnostic patterns. These findings broaden and enrich discussions surrounding AI transparency for collaborative decision-making, providing a richer understanding of what experts find important in their introduction to AI assistants before integrating them into routine practice.2019CCCarrie J Cai et al.AICSCW