Outcomes, Perceptions, and Interaction Strategies of Novice Programmers Studying with ChatGPTLarge Language Model (LLM) conversational agents are increasingly used in programming education, yet we still lack insight into how novices engage with them for conceptual learning compared with human tutoring. This mixed-methods study compared learning outcomes and interaction strategies of novices using ChatGPT or human tutors. A controlled lab study with 20 students enrolled in introductory programming courses revealed that students employ markedly different interaction strategies with AI versus human tutors: ChatGPT users relied on brief, zero-shot prompts and received lengthy, context-rich responses but showed minimal prompt refinement, while those working with human tutors provided more contextual information and received targeted explanations. Although students distrusted ChatGPT’s accuracy, they paradoxically preferred it for basic conceptual questions due to reduced social anxiety. We offer empirically grounded recommendations for developing AI literacy in computer science education and designing learning-focused conversational agents that balance trust-building with maintaining the social safety that facilitates uninhibited inquiry.2025JPJacob Penney et al.Human-LLM CollaborationProgramming Education & Computational ThinkingCUI
Is Innovation Shaped by Masculine Norms? A Longitudinal Case Study of a Consumer ProductIn theories and metrics of product innovation, gender is invisible or ignored, and innovative products are presumed to be gender-neutral or agnostic. Yet, many ostensibly-innovative consumer products overlook the needs of women and gender non-conforming individuals, suggesting an implicit masculine framing. This research introduces a mixed-methods approach for analyzing gender scripts in product features and marketing, applied to a case study of the Apple Watch (2015–2024). Findings reveal a sustained reinforcement of gender norms: masculine-coded language and industrial design dominate how innovation is presented, even as objective technical improvements decline. In contrast, feminine-coded features, especially relational or user-centered ones, receive less emphasis in innovation framing. This work demonstrates how masculine value systems shape perceptions and theories of innovation and offers opportunities for future research on gender and design.2025CBCaseysimone Ballestas et al.Inclusive DesignGender & Race Issues in HCITechnology Ethics & Critical HCIDIS
Measuring User Experience Inclusivity in Human-AI Interaction via Five User Problem-Solving StylesMotivations: Recent research has emerged on generally how to improve AI products’ human-AI interaction (HAI) user experience (UX), but relatively little is known about HAI-UX inclusivity. For example, what kinds of users are supported, and who are left out? What product changes would make it more inclusive? Objectives: To help fill this gap, we present an approach to measuring what kinds of diverse users an AI product leaves out and how to act upon that knowledge. To bring actionability to the results, the approach focuses on users’ problem-solving diversity. Thus, our specific objectives were (1) to show how the measure can reveal which participants with diverse problem-solving styles were left behind in a set of AI products and (2) to relate participants’ problem-solving diversity to their demographic diversity, specifically gender and age. Methods: We performed 18 experiments, discarding two that failed manipulation checks. Each experiment was a 2x2 factorial experiment with online participants, comparing two AI products: one deliberately violating 1 of 18 HAI guidelines and the other applying the same guideline. For our first objective, we used our measure to analyze how much each AI product gained/lost HAI-UX inclusivity compared to its counterpart, where inclusivity meant supportiveness to participants with particular problem-solving styles. For our second objective, we analyzed how participants’ problem-solving styles aligned with their gender identities and ages. 
Results and Implications: Participants’ diverse problem-solving styles revealed six types of inclusivity results: (1) the AI products that followed an HAI guideline were almost always more inclusive across diversity of problem-solving styles than the products that did not follow that guideline—but “who” got most of the inclusivity varied widely by guideline and by problem-solving style; (2) when an AI product had risk implications, four variables’ values varied in tandem: participants’ feelings of control, their (lack of) suspicion, their trust in the product, and their certainty while using the product; (3) the more control an AI product offered users, the more inclusive it was; (4) whether an AI product was learning from “my” data or other people’s affected how inclusive that product was; (5) participants’ problem-solving styles skewed differently by gender and age group; and (6) almost all of the results suggested actions that HAI practitioners could take to improve their products’ inclusivity further. Together, these results suggest that a key to improving the demographic inclusivity of an AI product (e.g., across a wide range of genders, ages) can often be obtained by improving the product’s support of diverse problem-solving styles.2025AAAndrew Anderson et al.AI Ethics, Fairness & AccountabilityAlgorithmic Fairness & BiasInclusive DesignIUI
Analyzing the Shifts in Users Data Focus in Exploratory Visual AnalysisUsers often begin exploratory visual analysis (EVA) without clear analysis goals but iteratively refine them as they learn more about their data. As an essential step in data science, researchers want to aid EVA by developing responsive and personalized visualization tools. For this, accurate models of users’ exploration behavior are becoming increasingly vital. However, many computational models assume that the human exploration behavior is static, which goes against the dynamic nature of EVA. In this benchmark study, we investigate how users dynamically shift their data focus in EVA and seek to find the best online learning methods for modeling users’ data focus shifts. Through empirical analyses, we find reinforcement learning algorithms are better in this regard than existing approaches from visualization research. Furthermore, we discuss our findings and their impact on the future of user modeling for visualization system design.2025SSSanad Saha et al.Interactive Data VisualizationVisualization Perception & CognitionIUI
Incorporating Sustainability in Electronics Design: Obstacles and OpportunitiesLife cycle assessment (LCA) is a methodology for holistically measuring the environmental impact of a product from initial manufacturing to end-of-life disposal. However, the extent to which LCA informs the design of computing devices remains unclear. To understand how this information is collected and applied, we interviewed 17 industry professionals with experience in LCA or electronics design, systematically coded the interviews, and investigated common themes. These themes highlight the challenge of LCA data collection and reveal distributed decision-making processes where responsibility for sustainable design choices—and their associated costs—is often ambiguous. Our analysis identifies opportunities for HCI technologies to support LCA computation and its integration into the design process to facilitate sustainability-oriented decision-making. While this work provides a nuanced discussion about sustainable design in the information and communication technologies (ICT) hardware industry, we hope our insights will also be valuable to other sectors.2025ZEZachary Englhardt et al.University of Washington, Computer Science and EngineeringSustainable HCIEcological Design & Green ComputingCHI
The Matchmaker Inclusive Design Curriculum: A Faculty-Enabling Curriculum to Teach Inclusive Design Throughout Undergraduate CSDespite efforts to raise awareness of societal and ethical issues in CS education, research shows students often do not act upon their new awareness (Problem 1). One such issue, well-established by HCI research, is that much of technology contains barriers impacting numerous populations—such as minoritized genders, races, ethnicities, and more. HCI has inclusive design methods that help—but these skills are rarely taught, even in HCI classes (Problem 2). To address Problems 1 and 2, we created the Matchmaker Curriculum to pair CS faculty—including non-HCI faculty—with inclusive design elements to allow for inclusive design skill-building throughout their CS program. We present the curriculum and a field study, in which we followed 18 faculty along their journey. The results show how the Matchmaker Curriculum equipped 88% of these faculty with enough inclusive design teaching knowledge to successfully embed actionable inclusive design skill-building into 13 CS courses.2024RGRosalinda Garcia et al.Oregon State UniversityCollaborative Learning & Peer TeachingSpecial Education TechnologyInclusive DesignCHI
Modelling Experts' Sampling Strategy to Balance Multiple Objectives During Scientific ExplorationsDuring scientific explorations, scientists often hold multiple and often conflicting objectives. Understanding how scientists prioritize and balance these objectives is crucial for developing cognitively-compatible robotic teammates and fostering effective human-robot collaboration. In this study, we seek to improve the cognitive compatibility of robotic algorithms by modelling humans' decision-making processes under multiple objectives. Collected human decision data from 141 sampling steps indicate that the majority of scientists adopt one of the following objective balancing strategies: (i) A Focus mode, where experts select sampling locations to primarily optimize their primary objective; (ii) A Hierarchy mode, where experts hierarchically satisfy foremost their primary objective, then, to a lesser extent, their secondary objective; and (iii) A Trade-off mode, where experts select sampling locations to satisfy all objectives, even if the location was not ideal for either objective. To understand how experts choose among the different modes, we quantitatively characterize the three types of strategies by representing the decision data from each sampling step in an objective function space. Analysis of the strategy types reveals that experts' adaptation of multi-objective coordinating strategies is primarily governed by two key decision factors: the current stage of sampling and outstanding reward values. This discovery allows the robot to use an extremely simple decision algorithm to connect experts' high-level objectives to desired sampling locations when balancing multiple objectives. Deployment of this algorithm at a planetary-analogue field exploration mission on Mt. Hood demonstrates the potential for robots to use cognitively-compatible algorithms to participate in decision making and aid with the adaptation of sampling plans that align with scientists' high-level goals.2024SLShipeng Liu et al.Human-LLM CollaborationAI-Assisted Decision-Making & AutomationComputational Methods in HCIHRI
Iterative Robot Waiter Algorithm Design: Service Expectations and Social FactorsMobile robots carrying food in restaurants are here. What service behavior norms do people expect them to follow? This paper evaluates robot waiter algorithms and service parameters for scenarios with two participants at a simulated cocktail event. We vary body-storming-inspired context variables such as “hunger level” and “relationship to each other,” robot delivery algorithms (lead, follow, ambient), and participant pose (standing, seated). Due to increasing deployment of robotic systems, companies may need to rapidly iterate on situated, human-cognizant robotic behaviors that take functional and social considerations into account. We utilized a within-subjects design and improvisational methods, in which pairs of people were given a series of context prompts and told to participate as felt natural. Output variables included whether they took food and post-trial survey ratings of the robot. The results show a positive correlation between food taking (or feelings of obligation to take food) and human or robot initiative, and a negative correlation in the mixed-ambient algorithm with no explicit leader. The robot waiter that comes to the table is the clearest and most noticeable. Bringing food one person ordered to the other person was unforgivable. When in doubt, head to the center-point.2024HKHeather Knight et al.Head-Up Display (HUD) & Advanced Driver Assistance Systems (ADAS)Social Robot InteractionHuman-Robot Collaboration (HRC)HRI
Zeno: An Interactive Framework for Behavioral Evaluation of Machine LearningMachine learning models with high accuracy on test data can still produce systematic failures, such as harmful biases and safety issues, when deployed in the real world. To detect and mitigate such failures, practitioners run behavioral evaluation of their models, checking model outputs for specific types of inputs. Behavioral evaluation is important but challenging, requiring that practitioners discover real-world patterns and validate systematic failures. We conducted 18 semi-structured interviews with ML practitioners to better understand the challenges of behavioral evaluation and found that it is a collaborative, use-case-first process that is not adequately supported by existing task- and domain-specific tools. Using these findings, we designed Zeno, a general-purpose framework for visualizing and testing AI systems across diverse use cases. In four case studies with participants using Zeno on real-world models, we found that practitioners were able to reproduce previous manual analyses and discover new systematic failures.2023ÁCÁngel Alexander Cabrera et al.Carnegie Mellon UniversityExplainable AI (XAI)AI-Assisted Decision-Making & AutomationCHI
Perceptual Pat: A Virtual Human Visual System for Iterative Visualization DesignDesigning a visualization is often a process of iterative refinement where the designer improves a chart over time by adding features, improving encodings, and fixing mistakes. However, effective design requires external critique and evaluation. Unfortunately, such critique is not always available on short notice and evaluation can be costly. To address this need, we present Perceptual Pat, an extensible suite of AI and computer vision techniques that forms a virtual human visual system for supporting iterative visualization design. The system analyzes snapshots of a visualization using an extensible set of filters—including gaze maps, text recognition, color analysis, etc—and generates a report summarizing the findings. The web-based Pat Design Lab provides a version tracking system that enables the designer to track improvements over time. We validate Perceptual Pat using a longitudinal qualitative study involving 4 professional visualization designers that used the tool over a few days to design a new visualization.2023SSSungbok Shin et al.University of MarylandInteractive Data VisualizationUser Research Methods (Interviews, Surveys, Observation)Prototyping & User TestingCHI
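TiiS: After-Action Review for AI (AAR/AI)Dodge et al. propose the AAR/AI after-action review framework, which aims to improve the process of reviewing and reflecting on decisions through AI-assisted techniques.2022JDJonathan Dodge et al.AI-Assisted Decision-Making & AutomationIUI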
How Do People Rank Multiple Mutant Agents?How might a person decide on which of several AI-powered sequential decision-making systems to rely? For example, imagine car buyer Blair shopping for a self-driving car, or developer Dillon trying to choose an appropriate ML model to use in their application. Their first choice might be infeasible (e.g., too expensive in money or execution time), so they may need to select their second or third choice. To address this question, this paper presents: 1) a new XAI empirical task to measure explanations: "the Ranking Task"; 2) a new strategy for inducing controllable agent variations, Mutant Agent Generation; 3) novel explanations for sequential decision-making agents; 4) an adaptation to the AAR/AI assessment process; and 5) a qualitative study around these devices with 10 participants to investigate how they performed the Ranking Task on our mutant agents, using our explanations, and structured by AAR/AI. From an XAI researcher's perspective, just as mutation testing can be applied to any code, mutant agent generation can be applied to essentially any neural network for which one wants to evaluate an assessment process or explanation type. From an XAI user's perspective, the participants ranked the agents well overall, but showed the importance of high explanation resolution for close differences between agents. The participants also revealed the importance of supporting a wide diversity of explanation diets and agent "test selection" strategies.2022JDJonathan Dodge et al.Explainable AI (XAI)AI-Assisted Decision-Making & AutomationIUI
FitVid: Responsive and Flexible Video Content AdaptationMobile video-based learning attracts many learners with its mobility and ease of access. However, most lectures are designed for desktops. Our formative study reveals mobile learners' two major needs: more readable content and customizable video design. To support mobile-optimized learning, we present FitVid, a system that provides responsive and customizable video content. Our system consists of (1) an adaptation pipeline that reverse-engineers pixels to retrieve design elements (e.g., text, images) from videos, leveraging deep learning with a custom dataset, which powers (2) a UI that enables resizing, repositioning, and toggling in-video elements. The content adaptation improves the guideline compliance rate by 24% and 8% for word count and font size. The content evaluation study (n=198) shows that the adaptation significantly increases readability and user satisfaction. The user study (n=31) indicates that FitVid significantly improves learning experience, interactivity, and concentration. We discuss design implications for responsive and customizable video adaptation.2022JKJeongyeon Kim et al.KAISTInteractive Data VisualizationOnline Learning & MOOC PlatformsCHI
The Long Road Ahead: Ongoing Challenges in Contributing to Large OSS Organizations and What to DoOpen source communities hosted in large foundations operate in a complex socio-technical ecosystem, which includes a heterogeneous mix of projects and stakeholders. Previous work has thus far investigated the challenges faced in OSS communities from the point of view of specific stakeholders, primarily at the level of individual projects. None have yet studied the challenges faced within a large, federated open source organization. In this paper, we aim to bridge this gap to identify ongoing challenges contributors face in a mature OSS organization. To do so, we surveyed 624 contributors at the Apache Software Foundation (ASF) and ran 11 semi-structured follow up interviews. We validated our findings through member checking with the interviewees as well as the ASF Diversity and Inclusion (D&I) committee. The contributions of this paper include: (1) an empirically-evidenced conceptual model of the 88 challenges that contributors face in a mature OSS foundation and (2) a set of 48 community-recommended strategies for alleviating these challenges. Our results show that even well-established and mature organizations still face a variety of individual and project-specific challenges and that it is difficult to design a comprehensive set of processes and guidelines to match the needs and expectations of a diverse and large federated community. Our conceptual challenges model and associated strategies to mitigate them can provide guidance to other OSS foundations and projects helping them in building better support processes and tools to create a successful, thriving community of contributors.2021MGMariam Guizani et al.Open CollaborationCSCW
Developers Who Vlog: Dismantling Stereotypes through Community and IdentityDevelopers are more than “nerds behind computers all day”, they lead a normal life, and not all take the traditional path to learn programming. However, the public still sees software development as a profession for “math wizards”. To learn more about this special type of knowledge worker from their first-person perspective, we conducted three studies to learn how developers describe a day in their life through vlogs on YouTube and how these vlogs were received by the broader community. We first interviewed 16 developers who vlogged to identify their motivations for creating this content and their intention behind what they chose to portray. Second, we analyzed 130 vlogs (video blogs) to understand the range of the content conveyed through videos. Third, we analyzed 1176 comments from the 130 vlogs to understand the impact the vlogs have on the audience. We found that developers were motivated to promote and build a diverse community, by sharing different aspects of life that define their identity, and by creating awareness about learning and career opportunities in computing. They used vlogs to share a variety of how software developers work and live, showcasing often unseen experiences, including intimate moments from their personal life. From our comment analysis, we found that the vlogs were valuable to the audience to find information and seek advice. Commenters sought opportunities to connect with others over shared triumphs and trials they faced that were also shown in the vlogs. As a central theme, we found that developers use vlogs to challenge the misconceptions and stereotypes around their identity, work-life, and well-being. These social stigmas are obstacles to an inclusive and accepting community and can deter people from choosing software development as a career. We also discuss the implications of using vlogs to support developers, researchers, and beyond.2021SCSouti Chattopadhyay et al.Online IdentitiesCSCW
Wikipedia ORES Explorer: Visualizing Trade-offs For Designing Applications With Machine Learning APIWith the growing industry applications of Artificial Intelligence (AI) systems, pre-trained models and APIs have emerged and greatly lowered the barrier of building AI-powered products. However, novice AI application designers often struggle to recognize the inherent algorithmic trade-offs and evaluate model fairness before making informed design decisions. In this study, we examined the Objective Revision Evaluation System (ORES), a machine learning (ML) API in Wikipedia used by the community to build anti-vandalism tools. We designed an interactive visualization system to communicate model threshold trade-offs and fairness in ORES. We evaluated our system by conducting 10 in-depth interviews with potential ORES application designers. We found that our system helped application designers who have limited ML backgrounds learn about in-context ML knowledge, recognize inherent value trade-offs, and make design decisions that aligned with their goals. By demonstrating our system in a real-world domain, this paper presents a novel visualization approach to facilitate greater accessibility and human agency in AI application design.2021ZYZining Ye et al.Explainable AI (XAI)Interactive Data VisualizationDIS
Hidden Figures: Roles and Pathways of Successful OSS ContributorsOpen Source Software (OSS) development is a collaborative endeavor where expert developers, spread around the globe, create software solutions. Given this characteristic, OSS communities have been studied as technical communities, where stakeholders join and evolve in their careers based on their (often voluntary) code contributions to the project. However, the OSS landscape is now changing with more people and companies getting involved in OSS. This means that projects now need people in non-technical roles and activities to keep the project sustainable and evolving. In this paper, we focus on understanding the roles and activities that are part of the current OSS landscape and the different career pathways in OSS. By conducting and analyzing 17 interviews with OSS contributors who are well known in the community, we provide empirical evidence of the existence and importance of community-centric roles (e.g., advocate, license manager, community founder) in addition to the well-known project-centric ones (e.g., maintainer, core member). However, the community-centric roles typically remain hidden, since these roles may not leave traces in software repositories typically analyzed by researchers. We found that people can build a career in OSS through different roles and activities, with different backgrounds, including those not related to software development. Further, people’s career pathways are fluid, moving between project-centric and community-centric roles. Our work highlights that communities and researchers need to take action to acknowledge the importance of these varied roles, making these roles visible and well-recognized, which can ultimately help attract and retain more people in the OSS projects.2020BTBianca Trinkenreich et al.Collaboration: Creating and Writing TogetherCSCW
"Would You Please Buy Me a Coffee?'': How Microcultures Impact People's Helpful Actions Toward RobotsRobots sometimes face hardware and algorithmic challenges that exceed their capabilities, e.g., an armless robot pressing an elevator button. Previous work suggests that rather than augmenting the robot capabilities, sometimes robots can simply ask for help. A central contribution of this paper is the discovery of how people’s helping behaviors vary within local microcultures, i.e., shared patterns of behaviors and norms linked to local atmospheric conditions and situations. Our methods combine techniques from both social robotics research and ethnography to investigate how people's helping behaviors toward robots vary across six cafes on a single college campus. We deploy a simple robot to request help ordering items, analyzing the 268 interaction instances to find significant variations in both help and care behaviors toward the robot. Microcultural and situational factors influence this help, motivating the inclusion of cultural criteria into the behavioral predictions of human-robot interaction systems.2020AFAbrar Fallatah et al.Social Robot InteractionEmpowerment of Marginalized GroupsDIS
Closeness is Key over Long Distances: Effects of Interpersonal Closeness on Telepresence ExperienceTelepresence robots act as the remote embodiments of human operators, enabling people to stay connected to friends, family, and coworkers over lengthy physical separations. However, the factors affecting how humans can best make use of such systems are not yet well understood. This paper explores the effects of personalization and relationship closeness on telepresence via two studies. Study 1 was a between-participants experiment that investigated telepresence robot personalization. 32 pairs of friends (N = 64) participated in the study’s team-building-style activities and answered questions about robot operator presence. The results unexpectedly indicated that relationship closeness influenced the interaction experience more than any other considered predictor variable. To study closeness more rigorously as the central manipulation, we conducted Study 2, a between-participants experiment with 24 pairs (N = 48) and a similar procedure. Robot operators who reported a closer relationship with their teammate felt more present in this investigation. These findings can inform the design and application of telepresence robot systems to increase a remote operator’s feelings of presence via robot.2020NFNaomi T. Fitter et al.Teleoperation & TelepresenceHRI
What's Wrong with Computational Notebooks? Pain Points, Needs, and Design OpportunitiesComputational notebooks — such as Azure, Databricks, and Jupyter — are a popular, interactive paradigm for data scientists to author code, analyze data, and interleave visualizations, all within a single document. Nevertheless, as data scientists incorporate more of their activities into notebooks, they encounter unexpected difficulties, or pain points, that impact their productivity and disrupt their workflow. Through a systematic, mixed-methods study using semi-structured interviews (n=20) and survey (n=156) with data scientists, we catalog nine pain points when working with notebooks. Our findings suggest that data scientists face numerous pain points throughout the entire workflow — from setting up notebooks to deploying to production — across many notebook environments. Our data scientists report essential notebook requirements, such as supporting data exploration and visualization. The results of our study inform and inspire the design of computational notebooks.2020SCSouti Chattopadhyay et al.Oregon State UniversityIdentity & Avatars in XRInteractive Data VisualizationComputational Methods in HCICHI