Current and Future Use of Large Language Models for Knowledge Work
CSCW 2025 · Michelle Brachman et al. · Topics: Working with AI
Large Language Models (LLMs) have introduced a paradigm shift in how people interact with AI technology, enabling knowledge workers to complete tasks by specifying their desired outcome in natural language. LLMs have the potential to increase productivity and reduce tedious tasks in an unprecedented way. A systematic study of LLM adoption for work can provide insight into how LLMs can best support these workers. To explore knowledge workers' current and desired usage of LLMs, we ran a survey (n=216). Workers described tasks they already used LLMs for, like generating code or improving text, but imagined a future with LLMs integrated into their workflows and data. We ran a second survey (n=107) a year later that validated our initial findings and provides insight into up-to-date LLM use by knowledge workers. We discuss implications for the adoption and design of generative AI technologies for knowledge work.
Helping the Helper: Supporting Peer Counselors via AI-Empowered Practice and Feedback
CSCW 2025 · Shang-Ling (Kate) Hsu et al. · Topics: Caring at a Distance
Millions of users come to online peer counseling platforms to seek support. However, studies show that online peer support groups are not always as effective as expected, largely due to users' negative experiences with unhelpful counselors. Peer counselors are key to the success of online peer counseling platforms, but most often do not receive appropriate training. Hence, we introduce CARE: an AI-based tool to empower and train peer counselors through practice and feedback. Concretely, CARE helps diagnose which counseling strategies are needed in a given situation and suggests example responses to counselors during their practice sessions. Building upon the Motivational Interviewing framework, CARE utilizes large-scale counseling conversation data with text generation techniques to enable these functionalities. We demonstrate the efficacy of CARE by performing quantitative evaluations and qualitative user studies through simulated chats and semi-structured interviews, finding that CARE especially helps novice counselors in challenging situations. The code is available at https://app.box.com/s/z3a4dwgmeqfy8vbzi9cgmg0yhn6t4j53.
EvalAssist: Insights on Task-Specific Evaluations and AI-Assisted Judgment Strategy Preferences
UIST 2025 · Zahra Ashktorab et al. · Topics: Human-LLM Collaboration; AI-Assisted Decision-Making & Automation
With the broad availability of large language models and their ability to generate vast outputs using varied prompts and configurations, determining the best output for a given task requires an intensive evaluation process: machine learning practitioners must decide how to assess the outputs and then carefully carry out the evaluation. This process is both time-consuming and costly. As practitioners work with an increasing number of models, they must evaluate outputs to determine which model performs best for a given task. LLMs are increasingly used as evaluators to filter training data, evaluate model performance, or assist human evaluators with detailed assessments. Our application, EvalAssist, supports this process by aiding users in interactively refining evaluation criteria. In our study with machine learning practitioners (n=15), each completing 6 tasks yielding 131 evaluations, we explore how task-related factors and judgment strategies influence criteria refinement and user perceptions. Findings show that users performed more evaluations with direct assessment by making criteria task-specific, modifying judgments, and changing the AI evaluator model. We conclude with recommendations for how systems can better support practitioners with AI-assisted evaluations.
"The Diagram is like Guardrails": Structuring GenAI-assisted Hypotheses Exploration with an Interactive Shared Representation
C&C 2025 · Zijian Ding et al. · Topics: Human-LLM Collaboration; Interactive Data Visualization
Data analysis encompasses a spectrum of tasks, from high-level conceptual reasoning to lower-level execution. While AI-powered tools increasingly support execution tasks, there remains a need for intelligent assistance in conceptual tasks. This paper investigates the design of an ordered node-link tree interface augmented with AI-generated information hints and visualizations, as a potential shared representation for hypothesis exploration. Through a design probe (n=22), participants generated diagrams containing an average of 21.82 hypotheses each. Our findings showed that the node-link diagram acts as "guardrails" for hypothesis exploration, facilitating structured workflows, providing comprehensive overviews, and enabling efficient backtracking. The AI-generated information hints, particularly visualizations, aided users in transforming abstract ideas into data-backed concepts while reducing cognitive load. We further discuss how node-link diagrams can support both parallel exploration and iterative refinement in hypothesis formulation, potentially enhancing the breadth and depth of human-AI collaborative data analysis.
Building Appropriate Mental Models: What Users Know and Want to Know about an Agentic AI Chatbot
IUI 2025 · Michelle Brachman et al. · Topics: Conversational Chatbots; Agent Personality & Anthropomorphism; Explainable AI (XAI)
Agentic systems aim to handle complex problems with increasing system autonomy using generative AI. These new agentic systems are becoming more feasible and easier to build. Yet we know little about what end-users need to know to use these systems appropriately. We study one such agentic system, "Gent," which can break down complex problems into a set of actions, provide a rationale for each action, interact with external information, and cite its sources. Our goals were to understand users' mental models of the agentic system, the information users leveraged to evaluate the accuracy of the system, and users' information needs. In our study (N=24), participants interacted with Gent for four information seeking tasks where they could see Gent's actions, rationale, and sources. Participants' mental models centered around the search-like qualities of the system, with their confidence impacted by the website sources. Participants' mental models often lacked insight into the workings of the generative AI model and agentic framework that impact the actions the system takes. Participants used the descriptions of the system's actions to support their evaluation of the accuracy of the system and wanted to know more about how the system got to its answers. Participants also relied on their own personal knowledge and the style or length of Gent's responses to evaluate the accuracy. Our results highlight the need for further transparency in agentic AI systems to support end-users in evaluating system outputs and help them build effective mental models.
Design Principles for Generative AI Applications
CHI 2024 · Justin D. Weisz et al. (IBM Research AI) · Topics: Generative AI (Text, Image, Music, Video); Human-LLM Collaboration; Prototyping & User Testing
Generative AI applications present unique design challenges. As generative AI technologies are increasingly being incorporated into mainstream applications, there is an urgent need for guidance on how to design user experiences that foster effective and safe use. We present six principles for the design of generative AI applications that address unique characteristics of generative AI UX and offer new interpretations and extensions of known issues in the design of AI applications. Each principle is coupled with a set of design strategies for implementing that principle via UX capabilities or through the design process. The principles and strategies were developed through an iterative process involving literature review, feedback from design practitioners, validation against real-world generative AI applications, and incorporation into the design process of two generative AI applications. We anticipate that the principles will usefully inform the design of generative AI applications by driving actionable design recommendations.
Fairness Evaluation in Text Classification: Machine Learning Practitioner Perspectives of Individual and Group Fairness
CHI 2023 · Zahra Ashktorab et al. (IBM Research) · Topics: Explainable AI (XAI); Algorithmic Fairness & Bias
Mitigating algorithmic bias is a critical task in the development and deployment of machine learning models. While several toolkits exist to aid machine learning practitioners in addressing fairness issues, little is known about the strategies practitioners employ to evaluate model fairness and what factors influence their assessment, particularly in the context of text classification. Two common approaches to evaluating the fairness of a model are group fairness and individual fairness. We run a study with machine learning practitioners (n=24) to understand the strategies used to evaluate models. The metrics presented to practitioners (group vs. individual fairness) impact which models they consider fair. Participants focused on the risks associated with underpredicting/overpredicting and on model sensitivity to identity token manipulations. We discover fairness assessment strategies involving personal experiences and how users form groups of identity tokens to test model fairness. We provide recommendations for interactive tools for evaluating fairness in text classification.
AI-Assisted Human Labeling: Batching for Efficiency without Overreliance
CSCW 2021 · Zahra Ashktorab et al. · Topics: Data Work and AI
Human labeling of training data is often a time-consuming, expensive part of machine learning. In this paper, we study "batch labeling," an AI-assisted UX paradigm that aids data labelers by allowing a single labeling action to apply to multiple records. We ran a large-scale study on Mechanical Turk with 156 participants to investigate labeler-AI-batching system interaction. We investigate the efficacy of the system when compared to a single-item labeling interface (i.e., labeling one record at a time), and evaluate the impact of batch labeling on accuracy and time. We further investigate the impact of AI algorithm quality and its effects on labelers' overreliance, as well as potential mechanisms for mitigating it. Our work offers implications for the design of batch labeling systems and for work practices focusing on labeler-AI-batching system interaction.
The Design and Development of a Game to Study Backdoor Poisoning Attacks: The Backdoor Game
IUI 2021 · Zahra Ashktorab et al. · Topics: Explainable AI (XAI); Online Harassment & Counter-Tools; Crowdsourcing Task Design & Quality Control
Recently, AI security researchers have identified a new way crowdsourced data can be intentionally compromised. Backdoor attacks are a process through which an adversary creates a vulnerability in a machine learning model by "poisoning" the training set, selectively mislabeling images containing a backdoor object. The model continues to perform well on standard testing data but misclassifies inputs that contain the backdoor chosen by the adversary. In this paper, we present the design and development of the Backdoor Game, the first game in which users can interact with different poisoned classifiers and upload their own images containing backdoor objects in an engaging way. We conduct semi-structured interviews with eight participants who interacted with a first version of the Backdoor Game, and deploy the game to Mechanical Turk users (N=68) to demonstrate how users interacted with the backdoor objects. We present results, including novel types of interactions that emerged during game play, and design recommendations for improving the system. The combined design, development, and deployment of our system can help AI security researchers study this emerging concept, from determining the effectiveness of different backdoor objects to collecting diverse and unique backdoor objects from the public, increasing the safety of future AI systems.
Mental Models of AI Agents in a Cooperative Game Setting
CHI 2020 · Katy Ilonka Gero et al. (Columbia University) · Topics: Agent Personality & Anthropomorphism; AI-Assisted Decision-Making & Automation
As more and more forms of AI become prevalent, it becomes increasingly important to understand how people develop mental models of these systems. In this work we study people's mental models of AI in a cooperative word guessing game. We run think-aloud studies in which people play the game with an AI agent; through thematic analysis we identify features of the mental models developed by participants. In a large-scale study we have participants play the game with the AI agent online and use a post-game survey to probe their mental model. We find that those who win more often have better estimates of the AI agent's abilities. We present three components for modeling AI systems, propose that understanding the underlying technology is insufficient for developing appropriate conceptual models (analysis of behavior is also necessary), and suggest future work for studying the revision of mental models over time.
Human-AI Collaboration in Data Science: Exploring Data Scientists' Perceptions of Automated AI
CSCW 2019 · Dakuo Wang et al. · Topics: AI
The rapid advancement of artificial intelligence (AI) is changing our lives in many ways. One application domain is data science. New techniques in automating the creation of AI, known as AutoAI or AutoML, aim to automate the work practices of data scientists. AutoAI systems are capable of autonomously ingesting and pre-processing data, engineering new features, and creating and scoring models based on target objectives (e.g., accuracy or run-time efficiency). Though not yet widely adopted, we are interested in understanding how AutoAI will impact the practice of data science. We conducted interviews with 20 data scientists who work at a large, multinational technology company and practice data science in various business settings. Our goal is to understand their current work practices and how these practices might change with AutoAI. Reactions were mixed: while informants expressed concerns about the trend of automating their jobs, they also strongly felt it was inevitable. Despite these concerns, they remained optimistic about their future job security due to a view that the future of data science work will be a collaboration between humans and AI systems, in which both automation and human expertise are indispensable.
All Work and No Play? Conversations with a Question-and-Answer Chatbot in the Wild
CHI 2018 · Q. Vera Liao et al. (IBM T.J. Watson Research Center) · Topics: Conversational Chatbots; Agent Personality & Anthropomorphism
Many conversational agents (CAs) are developed to answer users' questions in a specialized domain. In everyday use of CAs, user experience may extend beyond satisfying information needs to the enjoyment of conversations with CAs, some of which represent playful interactions. By studying a field deployment of a Human Resource chatbot, we report on users' interest areas in conversational interactions to inform the development of CAs. Through the lens of statistical modeling, we also highlight rich signals in conversational interactions for inferring user satisfaction with the instrumental usage and playful interactions with the agent. These signals can be utilized to develop agents that adapt functionality and interaction styles. By contrasting these signals, we shed light on the varying functions of conversational interactions. We discuss design implications for CAs, and directions for developing adaptive agents based on users' conversational behaviors.