Human-AI Collaboration for UX Evaluation: Effects of Explanation and Synchronization

Analyzing usability test videos is arduous. Although recent research has shown the promise of AI in assisting with such tasks, it remains largely unknown how AI should be designed to facilitate effective collaboration between user experience (UX) evaluators and AI. Inspired by the concepts of agency and work context in the human-AI collaboration literature, we studied two corresponding design factors for AI-assisted UX evaluation: explanations and synchronization. Explanations allow AI to further inform humans how it identifies UX problems from a usability test session; synchronization refers to the two ways humans and AI can collaborate: synchronously and asynchronously. We iteratively designed a tool, AI Assistant, with four versions of UIs corresponding to the two levels of explanation (with/without) and synchronization (sync/async). Adopting a hybrid Wizard-of-Oz approach to simulate an AI with reasonable performance, we conducted a mixed-methods study with 24 UX evaluators identifying UX problems from usability test videos using AI Assistant. Our quantitative and qualitative results show that AI with explanations, whether presented synchronously or asynchronously, provided better support for UX evaluators' analysis and was perceived more positively; without explanations, synchronous AI improved UX evaluators' performance and engagement more than asynchronous AI. Lastly, we present design implications for AI-assisted UX evaluation and for facilitating more effective human-AI collaboration.

2022 · Mingming Fan et al. · Human-AI Collaboration · CSCW

Investigating Explainability of Generative Models for Code through Scenario-based Design

What does it mean for a generative AI model to be explainable? The emergent discipline of explainable AI (XAI) has made great strides in helping people understand discriminative models. Less attention has been paid to generative models, which produce artifacts rather than decisions as output. Meanwhile, generative AI (GenAI) technologies are maturing and being applied to application domains such as software engineering. Using a scenario-based design approach, we explore users' explainability needs for GenAI in three software engineering use cases: natural language to code, code translation, and code auto-completion. We conducted 9 workshops with 43 software engineers in which real examples from state-of-the-art generative AI models were used to elicit explainability needs. Drawing from prior work, we also propose 4 types of XAI features for GenAI for code and gather additional design ideas from participants. Our work begins to identify explainability needs for GenAI for code and demonstrates how human-centered approaches can drive the technical development of XAI in novel domains.

2022 · Jiao Sun et al. · Generative AI (Text, Image, Music, Video); Human-LLM Collaboration; Explainable AI (XAI) · IUI

Human-AI Collaboration via Conditional Delegation: A Case Study of Content Moderation

Despite impressive performance on many benchmark datasets, AI models can still make mistakes, especially on out-of-distribution examples. It remains an open question how such imperfect models can be used effectively in collaboration with humans. Prior work has focused on AI assistance that helps people make individual high-stakes decisions, which does not scale to a large volume of relatively low-stakes decisions, e.g., moderating social media comments. Instead, we propose conditional delegation as an alternative paradigm for human-AI collaboration, in which humans create rules to indicate trustworthy regions of a model. Using content moderation as a testbed, we develop novel interfaces to assist humans in creating conditional delegation rules and conduct a randomized experiment with two datasets to simulate in-distribution and out-of-distribution scenarios. Our study demonstrates the promise of conditional delegation in improving model performance and provides insights into design for this novel paradigm, including the effect of AI explanations.

2022 · Vivian Lai et al. · University of Colorado Boulder · AI-Assisted Decision-Making & Automation; Content Moderation & Platform Governance · CHI

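To make the paradigm concrete, here is a minimal sketch of what a conditional delegation rule might look like in code. The rule predicate, keyword check, threshold, and routing function are all illustrative assumptions, not the interfaces or rules from the paper.

```python
# Illustrative sketch of conditional delegation (not the paper's implementation).
# A human-authored rule names a region of inputs where the model is trusted;
# everything outside that region is routed to a human moderator.
from dataclasses import dataclass
from typing import Callable

@dataclass
class DelegationRule:
    description: str
    applies: Callable[[str], bool]  # does this comment fall in the trusted region?

def route(comment: str, model_score: float, rules: list[DelegationRule]) -> str:
    """Delegate to the model only when some rule marks the comment as in-region."""
    if any(rule.applies(comment) for rule in rules):
        return "toxic" if model_score >= 0.5 else "ok"  # trust the model here
    return "human_review"                               # out of region: escalate

# Hypothetical rule: trust the model on short comments containing a listed
# keyword, where in-distribution performance is assumed to be high.
rules = [DelegationRule("short comments with listed keywords",
                        lambda c: len(c.split()) < 30 and "idiot" in c.lower())]

print(route("you are an idiot", model_score=0.92, rules=rules))  # -> toxic
print(route("subtle sarcasm about a niche topic", 0.4, rules))   # -> human_review
```

The key design point is that the human decides where to trust the model, rather than vetting each decision individually, which is what makes the approach scale to many low-stakes items.
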
Model LineUpper: Supporting Interactive Model Comparison at Multiple Levels for AutoML

Automated Machine Learning (AutoML) is a rapidly growing set of technologies that automate the model development pipeline by automatically searching the model space and generating candidate models. A critical final step of AutoML is for the users, often data scientists, to select the final model from dozens of candidates. In current AutoML systems the selection is supported by performance metrics. Prior work has shown that in practice people choose ML models based on many criteria beyond prediction accuracy, including whether the way a model makes decisions is reasonable or reliable. It is possible that AutoML users are interested in further understanding and comparing how these candidate models work. We also hypothesize that the comparison may happen at various levels of granularity, from prediction distributions and feature importance to how the models judge selected instances. Based on these hypotheses, we developed Model LineUpper, which supports interactive model comparison for AutoML users by integrating multiple explainable AI (XAI) and visualization techniques. We conducted a user study with 14 data scientists, both to evaluate the design of Model LineUpper and to use it as a design probe to understand how users perform model comparison with an AutoML system. We discuss design implications for utilizing explainable AI techniques for model comparison, and for supporting the unique user needs of comparing candidate models generated by AutoML.

2021 · Shweta Narkar et al. · Explainable AI (XAI); AutoML Interfaces; Interactive Data Visualization · IUI

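As a rough illustration of the three granularities the abstract names, the sketch below compares two hypothetical candidate models on prediction distribution, global feature importance, and one selected instance. The dataset, models, and scikit-learn calls are assumptions for illustration; Model LineUpper itself is an interactive UI, not this script.

```python
# Minimal sketch of multi-level model comparison in the spirit of Model LineUpper.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Two stand-in "AutoML candidates" to compare side by side.
candidates = {"forest": RandomForestClassifier(random_state=0).fit(X_tr, y_tr),
              "logreg": LogisticRegression(max_iter=1000).fit(X_tr, y_tr)}

for name, model in candidates.items():
    # Level 1: prediction distribution over the test set.
    probs = model.predict_proba(X_te)[:, 1]
    # Level 2: global feature importance via permutation importance.
    imp = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
    # Level 3: how the model judges one selected instance.
    print(name, probs.mean().round(3), imp.importances_mean.round(3),
          model.predict_proba(X_te[:1]).round(3))
```
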
Facilitating Knowledge Sharing from Domain Experts to Data Scientists for Building NLP Models

Data scientists face a steep learning curve in understanding a new domain for which they want to build machine learning (ML) models. While input from domain experts could offer valuable help, such input is often limited, expensive, and generally not in a form readily consumable by a model development pipeline. In this paper, we propose Ziva, an interface framework that guides domain experts in sharing essential domain knowledge with data scientists for building NLP models. With Ziva, experts are able to distill and share their domain knowledge using domain concept extractors and five types of label justification over a representative data sample. Ziva is especially useful in cold-start situations (no training data available) and lowers the communication barrier between domain experts and data scientists. The design of Ziva is informed by preliminary interviews with data scientists and ML engineers to understand current practices of knowledge sharing from domain experts in ML development projects. To assess our design, we ran a mixed-methods case study to evaluate how Ziva can facilitate interaction between domain experts and data scientists. Our results highlight that (1) domain experts are able to use Ziva to provide rich domain knowledge while maintaining low mental load and stress levels; and (2) data scientists find Ziva's output helpful for learning essential information about the domain, offering scalability of information, and lowering the burden on domain experts to share knowledge directly. We conclude this work by experimenting with building NLP models using the domain-knowledge concepts and justifications output by our case study.

2021 · Soya Park et al. · Human-LLM Collaboration; Computational Methods in HCI · IUI

Expanding Explainability: Towards Social Transparency in AI systems

As AI-powered systems increasingly mediate consequential decision-making, their explainability is critical for end-users to take informed and accountable actions. Explanations in human-human interactions are socially situated. AI systems are often socio-organizationally embedded. However, Explainable AI (XAI) approaches have been predominantly algorithm-centered. We take a developmental step towards socially situated XAI by introducing and exploring Social Transparency (ST), a sociotechnically informed perspective that incorporates the socio-organizational context into explaining AI-mediated decision-making. To explore ST conceptually, we conducted interviews with 29 AI users and practitioners grounded in a speculative design scenario. We suggested constitutive design elements of ST and developed a conceptual framework to unpack ST's effect and implications at the technical, decision-making, and organizational level. The framework showcases how ST can potentially calibrate trust in AI, improve decision-making, facilitate organizational collective actions, and cultivate holistic explainability. Our work contributes to the discourse of Human-Centered XAI by expanding the design space of XAI.

2021 · Upol Ehsan et al. · Georgia Institute of Technology · Explainable AI (XAI); AI Ethics, Fairness & Accountability; Algorithmic Transparency & Auditability · CHI

Human-AI Collaboration in a Cooperative Game Setting: Measuring Social Perception and Outcomes

Human-AI interaction is pervasive across many areas of our day-to-day lives. In this paper, we investigate human-AI collaboration in the context of an AI-driven word association game. In our experiments, we test various dimensions of participants' subjective social perceptions (rapport, intelligence, creativity, and likeability) of their partners when participants believe they are playing with an AI or with a human. We also test these social perceptions when participants are presented with a variety of confidence levels. We ran a large-scale study of this collaborative game on Mechanical Turk (n=164). Our results show that when participants believed their partners were human, they found their partners more likeable, intelligent, and creative, reported more rapport, and used more positive words to describe their partners' attributes. Drawing on both quantitative and qualitative findings, we discuss AI agent transparency, offer design implications for tools incorporating or supporting human-AI collaboration, and lay out directions for future research. Our findings carry implications for other forms of human-AI interaction and communication.

2020 · Zahra Ashktorab et al. · Human-AI Collaboration / Images in AI · CSCW

Explainable Active Learning (XAL): Toward AI Explanations as Interfaces for Machine Teachers

The wide adoption of Machine Learning (ML) technologies has created a growing demand for people who can train ML models. Some have advocated the term "machine teacher" for the role of people who inject domain knowledge into ML models. This "teaching" perspective emphasizes supporting the productivity and mental wellbeing of machine teachers through efficient learning algorithms and thoughtful design of human-AI interfaces. One promising learning paradigm is Active Learning (AL), in which the model intelligently selects instances to query a machine teacher for labels, so that the labeling workload can be greatly reduced. However, in current AL settings the human-AI interface remains minimal and opaque. A dearth of empirical studies further hinders us from developing teacher-friendly interfaces for AL algorithms. In this work, we begin considering AI explanations as a core element of the human-AI interface for teaching machines. When a human student learns, it is a common pattern to present one's own reasoning and solicit feedback from the teacher. When an ML model learns and still makes mistakes, the teacher ought to be able to understand the reasoning underlying those mistakes. When the model matures, the teacher should be able to recognize its progress in order to trust and feel confident about the teaching outcome. Toward this vision, we propose a novel paradigm of explainable active learning (XAL), introducing techniques from the surging field of explainable AI (XAI) into an AL setting. We conducted an empirical study comparing the model learning outcomes, feedback content, and experience with XAL to those of traditional AL and coactive learning (providing the model's prediction without explanation). Our study shows benefits of AI explanations as interfaces for machine teaching, namely supporting trust calibration and enabling rich forms of teaching feedback, as well as potential drawbacks, namely an anchoring effect on the model's judgment and additional cognitive workload. Our study also reveals important individual factors that mediate a machine teacher's reception to AI explanations, including task knowledge, AI experience, and Need for Cognition. Reflecting on the results, we suggest future directions and design implications for XAL and, more broadly, for machine teaching through AI explanations.

2020 · Bhavya Ghai et al. · Interpreting and Explaining AI · CSCW

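For readers unfamiliar with the underlying loop, here is a hedged sketch of one XAL-style teaching round: uncertainty sampling plus a simple local explanation shown alongside each query. The coefficient-based explanation and all names are illustrative assumptions standing in for whatever XAI technique the paper's study used.

```python
# Sketch of an explainable active learning (XAL) round: the model queries its
# most uncertain instance and "shows its reasoning" to the machine teacher.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=5, random_state=1)
labeled = list(range(20))                      # seed labels from the teacher
pool = [i for i in range(len(X)) if i not in labeled]

for _ in range(3):                             # three teaching rounds
    model = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
    probs = model.predict_proba(X[pool])[:, 1]
    query = pool[int(np.argmin(np.abs(probs - 0.5)))]  # most uncertain instance

    # Local explanation for the queried instance: per-feature contribution,
    # approximated here as coefficient * feature value.
    contribution = model.coef_[0] * X[query]
    print(f"query #{query}: p={model.predict_proba(X[query:query+1])[0, 1]:.2f}, "
          f"top feature={int(np.argmax(np.abs(contribution)))}")

    labeled.append(query)                      # teacher supplies label y[query]
    pool.remove(query)
```

The explanation printed with each query is what distinguishes XAL from plain AL: the teacher labels the instance while also seeing, and potentially correcting, the model's reasoning.
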
Questioning the AI: Informing Design Practices for Explainable AI User Experiences

A surge of interest in explainable AI (XAI) has led to a vast collection of algorithmic work on the topic. While many recognize the necessity to incorporate explainability features in AI systems, how to address real-world user needs for understanding AI remains an open question. By interviewing 20 UX and design practitioners working on various AI products, we seek to identify gaps between the current XAI algorithmic work and practices to create explainable AI products. To do so, we develop an algorithm-informed XAI question bank in which user needs for explainability are represented as prototypical questions users might ask about the AI, and use it as a study probe. Our work contributes insights into the design space of XAI, informs efforts to support design practices in this space, and identifies opportunities for future XAI work. We also provide an extended XAI question bank and discuss how it can be used for creating user-centered XAI.

2020 · Q. Vera Liao et al. · IBM Research AI · Explainable AI (XAI); AI-Assisted Decision-Making & Automation · CHI

How Data Science Workers Work with Data: Discovery, Capture, Curation, Design, Creation

With the rise of big data, there has been an increasing need for practitioners in this space and an increasing opportunity for researchers to understand their workflows and design new tools to improve them. Data science is often described as data-driven, comprising unambiguous data and proceeding through regularized steps of analysis. However, this view focuses more on abstract processes, pipelines, and workflows, and less on how data science workers engage with the data. In this paper, we build on the work of other CSCW and HCI researchers in describing the ways that scientists, scholars, engineers, and others work with their data, through analyses of interviews with 21 data science professionals. We set five approaches to data along a dimension of intervention: data as given, as captured, as curated, as designed, and as created. Data science workers develop an intuitive sense of their data and processes, and actively shape their data. We propose new ways to apply these interventions analytically, to make sense of the complex activities around data practices.

2019 · Michael Muller et al. · IBM Research · Interactive Data Visualization; Computational Methods in HCI · CHI

Resilient Chatbots: Repair Strategy Preferences for Conversational Breakdowns

Text-based conversational systems, also referred to as chatbots, have grown widely popular. Current natural language understanding technologies are not yet ready to tackle the complexities of conversational interactions. Breakdowns are common, leading to negative user experiences. Guided by communication theories, we explore user preferences for eight repair strategies, including ones that are common in commercially deployed chatbots (e.g., confirmation, providing options), as well as novel strategies that explain characteristics of the underlying machine learning algorithms. We conducted a scenario-based study comparing repair strategies with Mechanical Turk workers (N=203). We found that providing options and explanations were generally favored, as they manifest initiative from the chatbot and are actionable for recovering from breakdowns. Through detailed analysis of participants' responses, we provide a nuanced understanding of the strengths and weaknesses of each repair strategy.

2019 · Zahra Ashktorab et al. · IBM Research AI · Conversational Chatbots · CHI

Explaining Models: An Empirical Study of How Explanations Impact Fairness Judgment

Ensuring fairness of machine learning systems is a human-in-the-loop process. It relies on developers, users, and the general public to identify fairness problems and make improvements. To facilitate the process we need effective, unbiased, and user-friendly explanations that people can confidently rely on. Towards that end, we conducted an empirical study with four types of programmatically generated explanations to understand how they impact people's fairness judgments of ML systems. In an experiment involving more than 160 Mechanical Turk workers, we show that: (1) certain explanations are considered inherently less fair, while others can enhance people's confidence in the fairness of the algorithm; (2) different fairness problems, such as model-wide fairness issues versus case-specific fairness discrepancies, may be more effectively exposed through different styles of explanation; and (3) individual differences, including prior positions on and judgment criteria for algorithmic fairness, impact how people react to different styles of explanation. We conclude with a discussion on providing personalized and adaptive explanations to support fairness judgments of ML systems.

2019 · Jonathan Dodge et al. · Explainable AI (XAI); AI Ethics, Fairness & Accountability; Algorithmic Fairness & Bias · IUI

Towards an optimal dialog strategy for information retrieval using both open-ended and close-ended questions

The emerging paradigm of dialogue interfaces for information retrieval systems opens new opportunities for interactively narrowing down users' information queries and improving search results. Prior research has largely focused on methods that use a set of close-ended questions, such as a decision tree, to learn about the user's search target. However, when there is a myriad of documents or items to search, relying solely on close-ended questions can lead to long and undesirable dialogues. We propose an adaptive dialogue strategy framework that incorporates open-ended questions at the optimal timing to reduce the length of the dialogues. We propose a method to estimate the information gain of open-ended questions and, in each dialogue turn, compare it with that of close-ended questions to decide which question to ask. We present experiments using several synthetic datasets designed to explore the behavior of such an adaptive dialogue strategy under different environments, and compare the system's performance with that of a decision-tree-only strategy.

2018 · Yunfeng Zhang et al. · Conversational Chatbots; AI-Assisted Decision-Making & Automation · IUI

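The per-turn decision the abstract describes reduces to comparing two expected information gains. The toy sketch below shows that comparison for a uniform belief over candidate items; the estimator for the open-ended question's gain is a stand-in assumption, not the paper's method.

```python
# Toy sketch of the adaptive strategy: at each turn, compare the expected
# information gain of the best available closed-ended (yes/no) question
# against an estimated gain for an open-ended question, and ask whichever
# promises to eliminate more uncertainty.
import math

def entropy(n: int) -> float:
    """Entropy (bits) of a uniform belief over n candidate items."""
    return math.log2(n) if n > 0 else 0.0

def closed_question_gain(n: int, yes_fraction: float) -> float:
    """Expected gain of a yes/no question splitting candidates yes/no."""
    n_yes = max(1, round(n * yes_fraction))
    n_no = max(1, n - n_yes)
    expected = (n_yes / n) * entropy(n_yes) + (n_no / n) * entropy(n_no)
    return entropy(n) - expected

def choose_question(n_candidates: int, best_split: float,
                    open_gain_estimate: float) -> str:
    closed_gain = closed_question_gain(n_candidates, best_split)
    return "open-ended" if open_gain_estimate > closed_gain else "closed-ended"

# With many candidates and only lopsided yes/no splits available, the
# open-ended question (estimated at 5 bits of gain) wins the turn.
print(choose_question(n_candidates=10_000, best_split=0.9, open_gain_estimate=5.0))
```

A balanced yes/no question yields at most one bit per turn, which is why dialogues over large collections grow long when only closed-ended questions are available; a well-timed open-ended question can collapse far more of the candidate space at once.
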
Face Value? Exploring the Effects of Embodiment for a Group Facilitation Agent

We are interested in increasing the ability of groups to collaborate efficiently by leveraging new advances in AI and Conversational Agent (CA) technology. Given the longstanding debate on the necessity of embodiment for CAs, bringing them to groups requires answering the questions of whether and how providing a CA with a face affects its interaction with the humans in a group. We explored these questions by comparing group decision-making sessions facilitated by an embodied agent versus a voice-only agent. Results of an experiment with 20 user groups revealed that while embodiment improved various aspects of the group's social perception of the agent (e.g., rapport, trust, intelligence, and power), its impact on the group decision process and outcome was nuanced. Drawing on both quantitative and qualitative findings, we discuss the pros and cons of embodiment, argue that the value of having a face depends on the types of assistance the agent provides, and lay out directions for future research.

2018 · Ameneh Shamekhi et al. · Northeastern University · Voice User Interface (VUI) Design; Agent Personality & Anthropomorphism · CHI

All Work and No Play? Conversations with a Question-and-Answer Chatbot in the Wild

Many conversational agents (CAs) are developed to answer users' questions in a specialized domain. In everyday use of CAs, user experience may extend beyond satisfying information needs to the enjoyment of conversations with CAs, some of which represent playful interactions. By studying a field deployment of a Human Resources chatbot, we report on users' interest areas in conversational interactions to inform the development of CAs. Through the lens of statistical modeling, we also highlight rich signals in conversational interactions for inferring user satisfaction with the instrumental usage and playful interactions with the agent. These signals can be utilized to develop agents that adapt functionality and interaction styles. By contrasting these signals, we shed light on the varying functions of conversational interactions. We discuss design implications for CAs and directions for developing adaptive agents based on users' conversational behaviors.

2018 · Q. Vera Liao et al. · IBM T.J. Watson Research Center · Conversational Chatbots; Agent Personality & Anthropomorphism · CHI