Controlling AI Agent Participation in Group Conversations: A Human-Centered Approach
Conversational AI agents are commonly applied within single-user, turn-taking scenarios. The interaction mechanics of these scenarios are trivial: when the user enters a message, the AI agent produces a response. However, the interaction dynamics are more complex within group settings. How should an agent behave in these settings? We report on two experiments aimed at uncovering users' experiences of an AI agent's participation within a group, in the context of group ideation (brainstorming). In the first study, participants benefited from and preferred having the AI agent in the group, but they disliked when the agent seemed to dominate the conversation and desired various controls over its interactive behaviors. In the second study, we created functional controls over the agent's behavior, operable by group members, to validate their utility and probe for additional requirements. Integrating our findings across both studies, we developed a taxonomy of controls for when, what, and where a conversational AI agent in a group should respond, who can control its behavior, and how those controls are specified and implemented. Our taxonomy is intended to aid AI creators in thinking through important considerations in the design of mixed-initiative conversational agents.
Stephanie Houde et al. IUI 2025. Topics: Conversational Chatbots; Agent Personality & Anthropomorphism.

Design Principles for Generative AI Applications
Generative AI applications present unique design challenges. As generative AI technologies are increasingly being incorporated into mainstream applications, there is an urgent need for guidance on how to design user experiences that foster effective and safe use. We present six principles for the design of generative AI applications that address unique characteristics of generative AI UX and offer new interpretations and extensions of known issues in the design of AI applications. Each principle is coupled with a set of design strategies for implementing that principle via UX capabilities or through the design process. The principles and strategies were developed through an iterative process involving literature review, feedback from design practitioners, validation against real-world generative AI applications, and incorporation into the design process of two generative AI applications. We anticipate that the principles will usefully inform the design of generative AI applications by driving actionable design recommendations.
Justin D. Weisz et al., IBM Research AI. CHI 2024. Topics: Generative AI (Text, Image, Music, Video); Human-LLM Collaboration; Prototyping & User Testing.

Investigating Explainability of Generative Models for Code through Scenario-based Design
What does it mean for a generative AI model to be explainable? The emergent discipline of explainable AI (XAI) has made great strides in helping people understand discriminative models. Less attention has been paid to generative models that produce artifacts, rather than decisions, as output. Meanwhile, generative AI (GenAI) technologies are maturing and being applied to application domains such as software engineering. Using a scenario-based design approach, we explore users' explainability needs for GenAI in three software engineering use cases: natural language to code, code translation, and code auto-completion. We conducted 9 workshops with 43 software engineers in which real examples from state-of-the-art generative AI models were used to elicit explainability needs. Drawing from prior work, we also proposed 4 types of XAI features for GenAI for code and gathered additional design ideas from participants. Our work begins to identify explainability needs for GenAI for code and demonstrates how human-centered approaches can drive the technical development of XAI in novel domains.
Jiao Sun et al. IUI 2022. Topics: Generative AI (Text, Image, Music, Video); Human-LLM Collaboration; Explainable AI (XAI).

Better Together? An Evaluation of AI-Supported Code Translation
Generative machine learning models have recently been applied to source code, for use cases including translating code between programming languages, creating documentation from code, and auto-completing methods. Yet, state-of-the-art models often produce code that is erroneous or incomplete. In a controlled study with 32 software engineers, we examined whether such imperfect outputs are helpful in the context of Java-to-Python code translation. When aided by the outputs of a code translation model, participants produced code with fewer errors than when working alone. We also examined how the quality and quantity of AI translations affected the work process and quality of outcomes, and observed that providing multiple translations had a larger impact on the translation process than varying the quality of provided translations. Our results tell a complex, nuanced story about the benefits of generative code models and the challenges software engineers face when working with their outputs. Our work motivates the need for intelligent user interfaces that help software engineers effectively work with generative code models in order to understand and evaluate their outputs and achieve superior outcomes to working alone.
Justin D. Weisz et al. IUI 2022. Topics: Generative AI (Text, Image, Music, Video); Human-LLM Collaboration.

Model LineUpper: Supporting Interactive Model Comparison at Multiple Levels for AutoML
Automated Machine Learning (AutoML) is a rapidly growing set of technologies that automate the model development pipeline by automatically searching the model space and generating candidate models. A critical final step of AutoML is for users, often data scientists, to select the final model from dozens of candidates. In current AutoML systems, the selection is supported by performance metrics. Prior work has shown that in practice, people choose ML models based on many criteria beyond prediction accuracy, including whether the way a model makes decisions is reasonable or reliable. It is possible that AutoML users are interested in further understanding and comparing how these candidate models work. We also hypothesize that the comparison may happen at various levels of granularity, from prediction distributions and feature importance to how the models judge selected instances. Based on these hypotheses, we developed Model LineUpper, which supports interactive model comparison for AutoML users by integrating multiple explainable AI (XAI) and visualization techniques. We conducted a user study with 14 data scientists, both to evaluate the design of Model LineUpper and to use it as a design probe to understand how users perform model comparison with an AutoML system. We discuss design implications for utilizing explainable AI techniques for model comparison and for supporting users' unique needs in comparing candidate models generated by AutoML.
Shweta Narkar et al. IUI 2021. Topics: Explainable AI (XAI); AutoML Interfaces; Interactive Data Visualization.

Perfection Not Required? Human-AI Partnerships in Code Translation
Generative models have become adept at producing artifacts such as images, videos, and prose at human-like levels of proficiency. New generative techniques, such as unsupervised neural machine translation (NMT), have recently been applied to the task of generating source code, translating it from one programming language to another. The artifacts produced in this way may contain imperfections, such as compilation or logical errors. We examine the extent to which software engineers would tolerate such imperfections and explore ways to aid the detection and correction of those errors. Using a design scenario approach, we interviewed 11 software engineers to understand their reactions to the use of an NMT model in the context of application modernization, focusing on the task of translating source code from one language to another. Our three-stage scenario sparked discussions about the utility and desirability of working with an imperfect AI system, how acceptance of that system's outputs would be established, and future opportunities for generative AI in application modernization. Our study highlights how UI features such as confidence highlighting and alternate translations help software engineers work with and better understand generative NMT models.
Justin D. Weisz et al. IUI 2021. Topics: Generative AI (Text, Image, Music, Video); Human-LLM Collaboration; AI-Assisted Decision-Making & Automation.

Expanding Explainability: Towards Social Transparency in AI systems
As AI-powered systems increasingly mediate consequential decision-making, their explainability is critical for end-users to take informed and accountable actions. Explanations in human-human interactions are socially-situated. AI systems are often socio-organizationally embedded. However, Explainable AI (XAI) approaches have been predominantly algorithm-centered. We take a developmental step towards socially-situated XAI by introducing and exploring Social Transparency (ST), a sociotechnically informed perspective that incorporates the socio-organizational context into explaining AI-mediated decision-making. To explore ST conceptually, we conducted interviews with 29 AI users and practitioners grounded in a speculative design scenario. We suggested constitutive design elements of ST and developed a conceptual framework to unpack ST's effects and implications at the technical, decision-making, and organizational levels. The framework showcases how ST can potentially calibrate trust in AI, improve decision-making, facilitate organizational collective actions, and cultivate holistic explainability. Our work contributes to the discourse of Human-Centered XAI by expanding the design space of XAI.
Upol Ehsan et al., Georgia Institute of Technology. CHI 2021. Topics: Explainable AI (XAI); AI Ethics, Fairness & Accountability; Algorithmic Transparency & Auditability.

AutoDS: Towards Human-Centered Automation of Data Science
Data science (DS) projects often follow a lifecycle that consists of laborious tasks for data scientists and domain experts (e.g., data exploration, model training, etc.). Only recently have machine learning (ML) researchers developed promising automation techniques to aid data workers in these tasks. This paper introduces AutoDS, an automated machine learning (AutoML) system that aims to leverage the latest ML automation techniques to support data science projects. Data workers only need to upload their dataset; the system can then automatically suggest ML configurations, preprocess data, select algorithms, and train the model. These suggestions are presented to the user via a web-based graphical user interface and a notebook-based programming user interface. We studied AutoDS with 30 professional data scientists, where one group used AutoDS and the other did not, to complete a data science project. As expected, AutoDS improves productivity; yet surprisingly, we find that the models produced by the AutoDS group have higher quality and fewer errors, but lower human confidence scores. We reflect on the findings by presenting design implications for incorporating automation techniques into human work in the data science lifecycle.
Dakuo Wang et al., IBM Research. CHI 2021. Topics: Human-LLM Collaboration; AI-Assisted Decision-Making & Automation; AutoML Interfaces.

BigBlueBot: Teaching Strategies for Successful Human-Agent Interactions
Chatbots are becoming quite popular, with many brands developing conversational experiences using platforms such as IBM's Watson Assistant and Facebook Messenger. However, previous research reveals that users' expectations of what conversational agents can understand and do far outpace their actual technical capabilities. Our work seeks to bridge the gap between these expectations and reality by designing a fun learning experience with several goals: explaining how chatbots work by mapping utterances to a set of intents, teaching strategies for avoiding conversational breakdowns, and increasing desire to use chatbots by creating feelings of empathy toward them. Our experience, called BigBlueBot, consists of interactions with two chatbots in which breakdowns occur and the user (or chatbot) must recover using one or more repair strategies. In a Mechanical Turk evaluation (N=88), participants learned strategies for having successful human-agent interactions, reported feelings of empathy toward the chatbots, and expressed a desire to interact with chatbots in the future.
Justin D. Weisz et al. IUI 2019. Topics: Conversational Chatbots; Agent Personality & Anthropomorphism.

Human-AI Collaboration in Data Science: Exploring Data Scientists' Perceptions of Automated AI
The rapid advancement of artificial intelligence (AI) is changing our lives in many ways. One application domain is data science. New techniques in automating the creation of AI, known as AutoAI or AutoML, aim to automate the work practices of data scientists. AutoAI systems are capable of autonomously ingesting and pre-processing data, engineering new features, and creating and scoring models based on target objectives (e.g., accuracy or run-time efficiency). Though not yet widely adopted, we are interested in understanding how AutoAI will impact the practice of data science. We conducted interviews with 20 data scientists who work at a large, multinational technology company and practice data science in various business settings. Our goal is to understand their current work practices and how these practices might change with AutoAI. Reactions were mixed: while informants expressed concerns about the trend of automating their jobs, they also strongly felt it was inevitable. Despite these concerns, they remained optimistic about their future job security due to a view that the future of data science work will be a collaboration between humans and AI systems, in which both automation and human expertise are indispensable.
Dakuo Wang et al. CSCW 2019. Topics: AI.

Thinking Too Classically: Toward a Research Agenda for Human-Quantum Computer Interaction
Quantum computing is a fundamentally different way of performing computation than classical computing. Many problems that are considered hard for classical computers may have efficient solutions using quantum computers. Recently, technology companies including IBM, Microsoft, and Google have invested in developing both quantum computing hardware and software to explore the potential of quantum computing. Because of the radical shift in computing paradigms that quantum represents, we see an opportunity to study the unique needs people have when interacting with quantum systems, what we call Quantum HCI (QHCI). Based on interviews with experts in quantum computing, we identify four areas in which HCI researchers can contribute to the field of quantum computing. These areas include understanding current and future quantum users, tools for programming and debugging quantum algorithms, visualizations of quantum states, and educational materials to train the first generation of "quantum native" programmers.
Zahra Ashktorab et al., IBM Research AI. CHI 2019. Topics: Computational Methods in HCI.

Resilient Chatbots: Repair Strategy Preferences for Conversational Breakdowns
Text-based conversational systems, also referred to as chatbots, have grown widely popular. Current natural language understanding technologies are not yet ready to tackle the complexities of conversational interactions. Breakdowns are common, leading to negative user experiences. Guided by communication theories, we explore user preferences for eight repair strategies, including ones that are common in commercially-deployed chatbots (e.g., confirmation, providing options), as well as novel strategies that explain characteristics of the underlying machine learning algorithms. We conducted a scenario-based study to compare repair strategies with Mechanical Turk workers (N=203). We found that providing options and explanations were generally favored, as they manifest initiative from the chatbot and are actionable for recovering from breakdowns. Through detailed analysis of participants' responses, we provide a nuanced understanding of the strengths and weaknesses of each repair strategy.
Zahra Ashktorab et al., IBM Research AI. CHI 2019. Topics: Conversational Chatbots.