Interactive Debugging and Steering of Multi-Agent AI Systems
Fully autonomous teams of LLM-powered AI agents are emerging that collaborate to perform complex tasks for users. What challenges do developers face when trying to build and debug these AI agent teams? In formative interviews with five AI agent developers, we identify core challenges: difficulty reviewing long agent conversations to localize errors, lack of support in current tools for interactive debugging, and the need for tool support to iterate on agent configuration. Based on these needs, we developed an interactive multi-agent debugging tool, AGDebugger, with a UI for browsing and sending messages, the ability to edit and reset prior agent messages, and an overview visualization for navigating complex message histories. In a two-part user study with 14 participants, we identify common user strategies for steering agents and highlight the importance of interactive message resets for debugging. Our studies deepen understanding of interfaces for debugging increasingly important agentic workflows.
Will Epperson et al. Carnegie Mellon University, Human-Computer Interaction Institute. CHI 2025.
Tags: Human-LLM Collaboration; Explainable AI (XAI); AI-Assisted Decision-Making & Automation

Reading Between the Lines: Modeling User Behavior and Costs in AI-Assisted Programming
Code-recommendation systems, such as Copilot and CodeWhisperer, have the potential to improve programmer productivity by suggesting and auto-completing code. However, to fully realize their potential, we must understand how programmers interact with these systems and identify ways to improve that interaction. To seek insights about human-AI collaboration with code-recommendation systems, we studied GitHub Copilot, a code-recommendation system used by millions of programmers daily. We developed CUPS, a taxonomy of common programmer activities when interacting with Copilot. Our study of 21 programmers, who completed coding tasks and retrospectively labeled their sessions with CUPS, showed that CUPS can help us understand how programmers interact with code-recommendation systems, revealing inefficiencies and time costs. Our insights reveal how programmers interact with Copilot and motivate new interface designs and metrics.
Hussein Mozannar et al. MIT. CHI 2024.
Tags: Human-LLM Collaboration; AI-Assisted Decision-Making & Automation; Recommender System UX

Understanding Questions that Arise When Working with Business Documents
While digital assistants are increasingly used to help with various productivity tasks, less attention has been paid to employing them in the domain of business documents. To build an agent that can handle users' information needs in this domain, we must first understand the types of assistance that users desire when working on their documents. In this work, we present results from two user studies that characterize the information needs and queries of authors, reviewers, and readers of business documents. In the first study, we used experience sampling to collect users' questions in-situ as they were working with their documents, and in the second, we built a human-in-the-loop document Q&A system that provided assistance with a variety of users' questions. Our results have implications for the design of document assistants that complement AI with human intelligence, including whether particular skillsets or roles within the document are needed from human respondents, as well as the challenges around such systems.
Farnaz Jahanbakhsh et al. CSCW 2022.
Tags: Business Processes & Worker Advocacy

Exploring Interactive Sound Design for Auditory Websites
Auditory interfaces increasingly support access to website content, through recent advances in voice interaction. Typically, however, these interfaces provide only limited audio styling, collapsing rich visual design into a static audio output style with a single synthesized voice. To explore the potential for more aesthetic and intuitive sound design for websites, we prompted 14 professional sound designers to create auditory website mockups and interviewed them about their designs and rationale. Our findings reveal their prioritized design considerations (aesthetics and emotion, user engagement, audio clarity, information dynamics, and interactivity), specific sound design ideas to support each consideration (e.g., replacing spoken labels with short, memorable audio expressions), and challenges with applying sound design practices to auditory websites. These findings provide promising direction for how to support designers in creating richer auditory website experiences.
Mingrui Ray Zhang et al. University of Washington. CHI 2022.
Tags: Voice User Interface (VUI) Design; Music Composition & Sound Design Tools

Social Media through Voice: Synthesized Voice Qualities and Self-presentation
With advances in expressive speech synthesis and conversational understanding, an ever-increasing amount of digital content (including social and personal content) can be consumed through voice. Voice has long been known to convey personal characteristics and emotional states, both of which are prominent aspects of social media. Yet, no study has investigated voice design requirements for social media platforms. We interviewed 15 active social media users about their preferences on using synthesized voices to represent their profiles. Our findings show that participants want to have control over how a voice delivers their content, such as the personality and emotion with which the voice speaks, because these prosodic variations can impact users' online persona and interfere with impression management. We report motivations behind customizing or not customizing voice characteristics in different scenarios, and uncover key challenges around usability and the potential for stereotyping. We argue that synthesized speech for social media should be evaluated not only on listening experience and voice quality but also on its expressivity, degree of customizability, and ability to adapt to contexts (e.g., social media platforms, groups, individual posts). We discuss how our contribution confirms and extends knowledge of voice technology design and online self-presentation, and offer design considerations for voice personalization related to social interactions.
Mingrui Ray Zhang et al. CSCW 2021.
Tags: Voice and Speech

DIY: Helping People Assess the Correctness of Natural Language to SQL Systems
Designing natural language interfaces for querying databases remains an important goal pursued by researchers in natural language processing, databases, and HCI. These systems receive natural language as input, translate it into a formal database query, and execute the query to compute a result. Because the responses from these systems are not always correct, it is important to provide people with mechanisms to assess the correctness of the generated query and computed result. However, this assessment can be challenging for people who lack expertise in query languages. We present Debug-It-Yourself (DIY), an interactive technique that enables users to assess the responses from a state-of-the-art NL2SQL system for correctness and, if possible, fix errors. DIY provides users with a sandbox where they can interact with (1) the mappings between the question and the generated query, (2) a small-but-relevant subset of the underlying database, and (3) a multi-modal explanation of the generated query, employing a back-of-the-envelope-calculation, end-user debugging strategy on the system's responses. Through an exploratory study with 12 users, we investigate how DIY helps users assess the correctness of the system's answers and detect and fix errors. Our observations reveal the benefits of DIY while providing insights about end-user debugging strategies and underscore opportunities for further improving the user experience.
Arpit Narechania et al. IUI 2021.
Tags: Human-LLM Collaboration; Explainable AI (XAI); AI-Assisted Decision-Making & Automation

Planning for Natural Language Failures with the AI Playbook
Prototyping AI user experiences is challenging, in part because probabilistic AI models make it difficult to anticipate, test, and mitigate AI failures before deployment. In this work, we set out to support practitioners with early AI prototyping, with a focus on natural language (NL)-based technologies. Our interviews with 12 NL practitioners from a large technology company revealed that, in addition to challenges prototyping AI, prototyping was often not happening at all or focused only on idealized scenarios due to a lack of tools and tight timelines. These findings informed our design of the AI Playbook, an interactive and low-cost tool we developed to encourage proactive and systematic consideration of AI errors before deployment. Our evaluation of the AI Playbook demonstrates its potential to 1) encourage product teams to prioritize both ideal and failure scenarios, 2) standardize the articulation of AI failures from a user experience perspective, and 3) act as a boundary object between user experience designers, data scientists, and engineers.
Matthew K. Hong et al. University of Washington. CHI 2021.
Tags: Human-LLM Collaboration; Explainable AI (XAI); AI Ethics, Fairness & Accountability

"Am I doing this all wrong?" Challenges and Opportunities in Family Information Management
Running a household requires a large amount of labor, from ensuring multiple bills are paid to organizing important documents. Failure to manage such information can have critical consequences for the financial and psychological well-being of the family; however, little is known about how families manage the full scale of information in the home. We introduce family information management (FIM) as a set of overarching practices involved in managing and coordinating household-related information. To understand how families engage in FIM, we conducted in-depth interviews with members of ten families, which included guided tours of their information archives. We found that families struggle to organize, store, retrieve, and share information, and that there are significant socioemotional costs to this work. We propose opportunities for designing technologies to support FIM and argue that, given the numerous challenges and unmet needs, the understudied area of FIM deserves further investment of research and design efforts.
Shruti Sannon et al. CSCW 2020.
Tags: Family, Home, and Aging with Technology

The Impact of Web Browser Reader Views on Reading Speed and User Experience
As reading increasingly shifts from paper to online media, many web browsers now provide a "Reader View," which modifies web page layout and design for better readability. However, research has yet to establish whether Reader Views are effective in improving readability and how they might change the user experience. We characterize how Mozilla Firefox's Reader View significantly reduces the visual complexity of websites by excluding menus, images, and content. We then conducted an online study with 391 participants (including 42 who self-reported having been diagnosed with dyslexia), showing that compared to standard websites the Reader View increased reading speed by 5% for readers on average, and significantly improved perceived readability and visual appeal. We suggest guidelines for the design of websites and browsers that better support people with varying reading skills.
Qisheng Li et al. University of Washington. CHI 2019.
Tags: Universal & Inclusive Design

Guidelines for Human-AI Interaction
Advances in artificial intelligence (AI) frame opportunities and challenges for user interface design. Principles for human-AI interaction have been discussed in the human-computer interaction community for over two decades, but more study and innovation are needed in light of advances in AI and the growing uses of AI technologies in human-facing applications. We propose 18 generally applicable design guidelines for human-AI interaction. These guidelines are validated through multiple rounds of evaluation including a user study with 49 design practitioners who tested the guidelines against 20 popular AI-infused products. The results verify the relevance of the guidelines over a spectrum of interaction scenarios and reveal gaps in our knowledge, highlighting opportunities for further research. Based on the evaluations, we believe the set of design guidelines can serve as a resource to practitioners working on the design of applications and features that harness AI technologies, and to researchers interested in the further development of human-AI interaction design principles.
Saleema Amershi et al. Microsoft. CHI 2019.
Tags: Voice User Interface (VUI) Design; AI-Assisted Decision-Making & Automation; Algorithmic Fairness & Bias

Exploring the Role of Conversational Cues in Guided Task Support with Virtual Assistants
Voice-based conversational assistants are growing in popularity on ubiquitous mobile and stationary devices. Cortana, as well as Google Home, Amazon Echo, and others, can provide support for various tasks from managing reminders to booking a hotel. However, with few exceptions, user input is limited to explicit queries or commands. In this work, we explore the role of implicit conversational cues in guided task completion scenarios. In a Wizard of Oz study, we found that, for the task of cooking a recipe, nearly one-quarter of all user-assistant exchanges were initiated from implicit conversational cues rather than from plain questions. Given that these implicit cues occur in such high frequency, we conclude by presenting a set of design implications for guided task experiences in contemporary conversational assistants.
Alexandra Vtyurina et al. University of Waterloo, Microsoft. CHI 2018.
Tags: Voice User Interface (VUI) Design; Agent Personality & Anthropomorphism

Understanding the Needs of Searchers with Dyslexia
As many as 20% of English speakers have dyslexia, a language disability that impacts reading and spelling. Web search is an important modern literacy skill, yet the accessibility of this language-centric endeavor to people with dyslexia is largely unexplored. We interviewed ten adults with dyslexia and conducted an online survey with 81 dyslexic and 80 non-dyslexic adults, in which participants described challenges they face in various stages of web search (query formulation, search result triage, and information extraction). We also report the findings of an online study in which 174 adults with dyslexia and 172 without dyslexia rated the readability and relevance of sets of search query results. Our findings demonstrate differences in behaviors and preferences between dyslexic and non-dyslexic searchers, and indicate that factoring readability into search engine rankings and/or interfaces may benefit both dyslexic and non-dyslexic users.
Meredith Ringel Morris et al. Microsoft Research. CHI 2018.
Tags: Cognitive Impairment & Neurodiversity (Autism, ADHD, Dyslexia); Universal & Inclusive Design; Privacy by Design & User Control