What do people want to know about Artificial Intelligence (AI)? The Importance of Answering End-user Questions to Explain Autonomous Vehicle (AV) Decisions
Improving end-users' understanding of decisions made by autonomous vehicles (AVs) driven by artificial intelligence (AI) can increase utilization and acceptance of AVs. However, current explanation mechanisms primarily help AI researchers and engineers debug and monitor their AI systems, and may not address the specific questions of end-users, such as passengers, about AVs in various scenarios. In this paper, we conducted two user studies to investigate questions that potential AV passengers might pose while riding in an AV and to evaluate how well answers to those questions improve their understanding of AI-driven AV decisions. Our initial formative study identified a range of questions about AI in autonomous driving that existing explanation mechanisms do not readily address. Our second study demonstrated that interactive text-based explanations improved participants' comprehension of AV decisions more effectively than simply observing AV decisions. These findings inform the design of interactions that motivate end-users to engage with and inquire about the reasoning behind AI-driven AV decisions.
2025 · Somayeh Molaei et al. · Explainable AI (XAI) · CSCW

Playing "Google's Game": How Educational YouTubers Manage Tensions Between Education and Monetization
YouTube has become an important part of the educational ecosystem, with millions of viewers seeking informative videos and help with coursework. Educational YouTubers create this content, often balancing pedagogical rigor and entertainment value. However, creators must not only promote their content to find viewers, but also monetize it. In this study, we explore the tensions educational YouTubers face when making monetized educational content. We conduct a qualitative interview study with 12 popular educational YouTubers about their monetization strategies, perceptions of YouTube's algorithmic promotion of their content, and conception of their audience. We find that educational YouTubers are largely driven by a desire to share free, high-quality educational content, and that common monetization strategies like sponsorships and clickbait sometimes interfere with this mission. We describe the careful strategies our participants use to maintain educational integrity while making a living on an algorithmically driven platform. We then use these findings to draw parallels between YouTubers' challenges with monetizing educational content and the history of educational public broadcasting in the United States, which has followed a similar trajectory. In closing, we offer several recommendations for supporting educational YouTubers in creating the high-quality, publicly accessible educational content that is appreciated by a worldwide audience.
2025 · Tess Eschebach et al. · Content Creation & Creators · CSCW

"Here the GPT made a choice, and every choice can be biased": How Students Critically Engage with LLMs through End-User Auditing ActivityDespite recognizing that Large Language Models (LLMs) can generate inaccurate or unacceptable responses, universities are increasingly making such models available to their students. Existing university policies defer the responsibility of checking for correctness and appropriateness of LLM responses to students and assume that they will have the required knowledge and skills to do so on their own. In this work, we conducted a series of user studies with students (N=47) from a large North American public research university to understand if and how they critically engage with LLMs. Our participants evaluated an LLM provided by the university in a quasi-experimental setup; first by themselves, and then with a scaffolded design probe that guided them through an end-user auditing exercise. Qualitative analysis of participant think-aloud and LLM interaction data showed that students without basic AI literacy skills struggle to conceptualize and evaluate LLM biases on their own. However, they transition to focused thinking and purposeful interactions when provided with structured guidance. We highlight areas where current university policies may fall short and offer policy and design recommendations to better support students.2025SPSnehal Prabhudesai et al.University of Michigan, Computer Science and EngineeringHuman-LLM CollaborationAlgorithmic Transparency & AuditabilityCHI
Code-ifying the Law: How Disciplinary Divides Afflict the Development of Legal Software
Proponents of legal automation believe that translating the law into code can improve the legal system. However, research and reporting suggest that legal software systems often contain flawed translations of the law, resulting in serious harms such as terminating children's healthcare and charging innocent people with fraud. Efforts to identify and contest these mistranslations after they arise treat the symptoms of the problem, but fail to prevent the mistranslations from emerging. Meanwhile, recommendations to improve the development of legal software remain speculative, as there is little empirical evidence about the translation process itself. In this paper, we investigate the behavior of fifteen teams (nine composed only of computer scientists, six of computer scientists and legal experts) as they attempt to translate a bankruptcy statute into software. Through an interpretative qualitative analysis, we characterize a significant epistemic divide between computer science and law and demonstrate that this divide contributes to errors, misunderstandings, and policy distortions in the development of legal software. Even when development teams included legal experts, communication breakdowns meant that the resulting tools predominantly presented incorrect legal advice and adopted inappropriately harsh legal standards. Participants did not recognize the errors in the tools they created. We encourage policymakers and researchers to approach legal software with greater skepticism, as the disciplinary divide between computer science and law creates an endemic source of error and mistranslation in the production of legal software.
2024 · Nel Escher et al. · Session 2b: Algorithms in the Workplace · CSCW

VIME: Visual Interactive Model Explorer for Identifying Capabilities and Limitations of Machine Learning Models for Sequential Decision-Making
Ensuring that Machine Learning (ML) models make correct and meaningful inferences is necessary for the broader adoption of such models into high-stakes decision-making scenarios. Thus, ML model engineers increasingly use eXplainable AI (XAI) tools to investigate the capabilities and limitations of their ML models before deployment. However, explaining sequential ML models, which make a series of decisions at each timestep, remains challenging. We present Visual Interactive Model Explorer (VIME), an XAI toolbox that enables ML model engineers to explain decisions of sequential models in different "what-if" scenarios. Our evaluation with 14 ML experts, who investigated two existing sequential ML models using VIME and a baseline XAI toolbox to explore "what-if" scenarios, showed that VIME made it easier to identify and explain instances when the models made wrong decisions compared to the baseline. Our work informs the design of future interactive XAI mechanisms for evaluating sequential ML-based decision support systems.
2024 · Anindya Das Antar et al. · Eye Tracking & Gaze Interaction · Explainable AI (XAI) · AI-Assisted Decision-Making & Automation · UIST

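To make the "what-if" idea concrete, the sketch below perturbs one feature of a sequential model's input at a chosen timestep and reports which per-timestep decisions flip. It is an illustrative probe under assumed interfaces (a `predict` callable over a timestep-by-feature array), not VIME's actual implementation.

```python
import numpy as np

def what_if(predict, observed, t, feature_idx, new_value):
    """Compare a sequential model's per-timestep decisions before and after
    a counterfactual ("what-if") change to one feature at timestep t.

    `predict` is assumed to map an array of shape (num_timesteps,
    num_features) to one decision per timestep.
    """
    baseline = np.asarray(predict(observed))
    altered = observed.copy()
    altered[t, feature_idx] = new_value                    # apply the what-if change
    counterfactual = np.asarray(predict(altered))
    changed = np.flatnonzero(baseline != counterfactual)   # timesteps whose decisions flip
    return baseline, counterfactual, changed

# Toy "model": decide 1 whenever the first feature exceeds a threshold.
toy_predict = lambda x: (x[:, 0] > 0.5).astype(int)
obs = np.array([[0.2, 1.0], [0.7, 0.3], [0.9, 0.1]])
print(what_if(toy_predict, obs, t=1, feature_idx=0, new_value=0.1))
```
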
"I know even if you don't tell me": Understanding Users' Privacy Preferences Regarding AI-based Inferences of Sensitive Information for PersonalizationPersonalization improves user experience by tailoring interactions relevant to each user's background and preferences. However, personalization requires information about users that platforms often collect without their awareness or their enthusiastic consent. Here, we study how the transparency of AI inferences on users' personal data affects their privacy decisions and sentiments when sharing data for personalization. We conducted two experiments where participants (N=877) answered questions about themselves for personalized public arts recommendations. Participants indicated their consent to let the system use their inferred data and explicitly provided data after awareness of inferences. Our results show that participants chose restrictive consent decisions for sensitive and incorrect inferences about them and for their answers that led to such inferences. Our findings expand existing privacy discourse to inferences and inform future directions for shaping existing consent mechanisms in light of increasingly pervasive AI inferences.2024SASumit Asthana et al.University of MichiganAI Ethics, Fairness & AccountabilityAlgorithmic Transparency & AuditabilityPrivacy by Design & User ControlCHI
Behavior Modeling Approach for Forecasting Physical Functioning of People with Multiple Sclerosis
Forecasting physical functioning of people with Multiple Sclerosis (MS) can inform timely clinical interventions and accurate "day planning" to improve their well-being. However, people's physical functioning often remains unchecked in between infrequent clinical visits, leading to numerous negative healthcare outcomes. Existing Machine Learning (ML) models trained on in-situ data collected outside of clinical settings (e.g., in people's homes) predict which people are currently experiencing low functioning. However, they do not forecast if and when people's symptoms and behaviors will negatively impact their functioning in the future. Here, we present a computational behavior model that formalizes clinical knowledge about MS to forecast people's end-of-day physical functioning in advance to support timely interventions. Our model outperformed existing ML baselines in a series of quantitative validation experiments. We showed that our model captured clinical knowledge about MS using qualitative visual model exploration in different "what-if" scenarios. Our work enables future behavior-aware interfaces that deliver just-in-time clinical interventions and aid in "day planning" and "activity pacing". https://dl.acm.org/doi/10.1145/3580887
2023 · Anindya Das Antar et al. · Human Pose & Activity Recognition · Mental Health Apps & Online Support Communities · Telemedicine & Remote Patient Monitoring · UbiComp

StructureSense: Inferring Constructive Assembly Structures from User Behaviors
Recent advancements in object-tracking technologies can turn mundane constructive assemblies into Tangible User Interfaces (TUI) media. Users rely on instructions or their own creativity to build both permanent and temporary structures out of such objects. However, most existing object-tracking technologies focus on tracking structures as monoliths, making it impossible to infer and track the user's assembly process and the resulting structures. Technologies that can track the assembly process often rely on specially fabricated assemblies, limiting the types of objects and structures they can track. Here, we present StructureSense, a tracking system based on passive UHF-RFID sensing that infers constructive assembly structures from object motion. We illustrated StructureSense in two use cases (as guided instructions and as an authoring tool) on two different constructive sets (wooden lamp and Jumbo Blocks), and evaluated system performance and usability. Our results showed the feasibility of using StructureSense to track mundane constructive assembly structures. https://dl.acm.org/doi/10.1145/3570343
2023 · Xincheng Huang et al. · Customizable & Personalized Objects · Makerspace Culture · UbiComp

Being Trustworthy is Not Enough: How Untrustworthy Artificial Intelligence (AI) Can Deceive the End-Users and Gain Their Trust
Trustworthy Artificial Intelligence (AI) is characterized, among other things, by 1) competence, 2) transparency, and 3) fairness. However, end-users may fail to recognize incompetent AI, allowing untrustworthy AI to exaggerate its competence under the guise of transparency to gain unfair advantage over other trustworthy AI. Here, we conducted an experiment with 120 participants to test if untrustworthy AI can deceive end-users to gain their trust. Participants interacted with two AI-based chess engines, trustworthy (competent, fair) and untrustworthy (incompetent, unfair), that coached participants by suggesting chess moves in three games against another engine opponent. We varied the coaches' transparency about their competence (with the untrustworthy one always exaggerating its competence). We objectively quantified participants' trust based on how often they relied on the coaches' move recommendations. Participants were unable to assess AI competence and misplaced their trust in the untrustworthy AI, confirming its ability to deceive. Our work calls for design of interactions to help end-users assess AI trustworthiness.
2023 · Nikola Banovic et al. · AI and Trust · CSCW

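The behavioral trust measure described above (how often participants relied on a coach's move recommendations) can be written as a simple reliance rate. A minimal sketch, assuming each game is logged as hypothetical (recommended_move, played_move) pairs:

```python
def reliance_rate(games):
    """Fraction of moves on which a participant played the coach's
    recommended move, used as a behavioral proxy for trust.

    `games` is a list of games; each game is a list of
    (recommended_move, played_move) tuples.
    """
    followed = sum(rec == played for game in games for rec, played in game)
    total = sum(len(game) for game in games)
    return followed / total if total else 0.0

# Example: a participant follows the coach on 2 of 3 moves.
print(reliance_rate([[("e2e4", "e2e4"), ("g1f3", "g1f3"), ("d2d4", "c2c4")]]))  # ~0.667
```
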
Ludification as a Lens for Algorithmic Management: A Case Study of Gig-Workers' Experiences of Ambiguity in Instacart Work
On-demand work platforms are attractive alternatives to traditional employment arrangements. However, several questions around employment classification, compensation, data privacy, and equitable outcomes remain open. Fraught regulatory debates are compounded by the abilities of algorithmic management to structure different forms of platform-worker relationships. Understanding the conditions of algorithmic management that result in these variations could point us towards better worker futures. In this work, we studied the platform-worker relationships in Instacart work through the accounts of its workers. From a qualitative analysis of 400 Reddit posts by Instacart's workers, we identified sources of ambiguity that gave rise to open-ended experiences for workers. Ambiguities supplemented gamification mechanisms to regulate worker behaviors. Yet, they also generated positive affective experiences for workers and enabled their playful participation in the Reddit community. We propose the frame of ludification to explain these seemingly contradictory findings and conclude with implications for accountability in on-demand work platforms.
2023 · Divya Ramesh et al. · Gamification Design · Gig Economy Platforms · DIS

Less is Not More: Improving Findability and Actionability of Privacy Controls for Online Behavioral Advertising
Tech companies that rely on ads for business argue that users have control over their data via ad privacy settings. However, these ad settings are often hidden. This work aims to inform the design of findable ad controls and study their impact on users' behavior and sentiment. We iteratively designed ad control interfaces that varied in the setting's (1) entry point (within ads, at the feed's top) and (2) level of actionability, with high actionability directly surfacing links to specific advertisement settings, and low actionability pointing to general settings pages (which is reminiscent of companies' current approach to ad controls). We built a Chrome extension that augments Facebook with our experimental ad control interfaces and conducted a between-subjects online experiment with 110 participants. Results showed that entry points within ads or at the feed's top, and high actionability interfaces, both increased Facebook ad settings' findability and discoverability, as well as participants' perceived usability of them. High actionability also reduced users' effort in finding ad settings. Participants perceived high and low actionability as equally usable, which shows it is possible to design more actionable ad controls without overwhelming users. We conclude by emphasizing the importance of regulation to provide specific and research-informed requirements to companies on how to design usable ad controls.
2023 · Jane Im et al. · University of Michigan · Privacy by Design & User Control · Privacy Perception & Decision-Making · CHI

Automatically Labeling Low Quality Content on Wikipedia By Leveraging Patterns in Editing Behaviors
Wikipedia articles aim to be definitive sources of encyclopedic content. Yet, only 0.6% of Wikipedia articles are high quality according to its quality scale, due to an insufficient number of Wikipedia editors and an enormous number of articles. Supervised Machine Learning (ML) quality improvement approaches that can automatically identify and fix content issues rely on manual labels of individual Wikipedia sentence quality. However, current labeling approaches are tedious and produce noisy labels. Here, we propose an automated labeling approach that identifies the semantic category (e.g., adding citations, clarifications) of historic Wikipedia edits and uses the pre-edit versions of the modified sentences as examples that require that semantic improvement. Sentences from the highest-rated articles serve as examples that need no further semantic improvement. We show that training existing sentence quality classification algorithms on our labels improves their performance compared to training them on existing labels. Our work shows that the editing behaviors of Wikipedia editors provide better labels than those generated by crowdworkers, who lack the context to make judgments that the editors would agree with.
2021 · Sumit Asthana et al. · Data Work and AI · CSCW

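A minimal sketch of the labeling idea: pair each pre-edit sentence with the semantic category of the edit that later improved it, and treat sentences from the highest-rated articles as needing no improvement. The keyword rules and category names below are illustrative stand-ins, not the paper's classifier.

```python
def semantic_category(edit_comment):
    """Very rough stand-in for an edit-intent classifier: map an edit
    comment to the semantic improvement it made (illustrative only)."""
    comment = edit_comment.lower()
    if "citation" in comment or "source" in comment:
        return "add-citations"
    if "clarif" in comment or "reword" in comment:
        return "clarification"
    return None  # edit does not map to a semantic improvement we label


def build_labels(edits, featured_sentences):
    """Derive sentence-quality labels from editing behavior: a sentence's
    pre-edit version is an example needing the improvement that the edit
    made; sentences from the highest-rated articles need no improvement."""
    labeled = []
    for pre_edit_sentence, edit_comment in edits:
        category = semantic_category(edit_comment)
        if category is not None:
            labeled.append((pre_edit_sentence, category))
    labeled += [(s, "no-improvement-needed") for s in featured_sentences]
    return labeled
```
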
Method for Exploring Generative Adversarial Networks (GANs) via Automatically Generated Image Galleries
Generative Adversarial Networks (GANs) can automatically generate quality images from learned model parameters. However, it remains challenging to explore and objectively assess the quality of all possible images generated using a GAN. Currently, model creators evaluate their GANs via tedious visual examination of generated images sampled from narrow prior probability distributions on model parameters. Here, we introduce an interactive method to explore and sample quality images from GANs. Our first two user studies showed that participants can use the tool to explore a GAN and select quality images. Our third user study showed that sampling from a posterior probability distribution, constructed using a Markov Chain Monte Carlo (MCMC) method over the parameters of images collected in our first study, produced images that were on average higher quality and more diverse than those from existing baselines. Our work enables principled qualitative GAN exploration and evaluation.
2021 · Enhao Zhang et al. · University of Michigan · Generative AI (Text, Image, Music, Video) · Human-LLM Collaboration · Interactive Data Visualization · CHI

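To illustrate the sampling step, here is a minimal Metropolis-Hastings sketch over GAN latent parameters guided by a quality function; the `quality_score` callable (e.g., a model fit to images users selected) and the random-walk proposal are assumptions for illustration, not the study's actual setup.

```python
import numpy as np

def mcmc_sample_latents(quality_score, z_init, n_steps=1000, step_size=0.1, seed=0):
    """Metropolis-Hastings over GAN latent vectors.

    `quality_score(z)` is assumed to return a nonnegative, unnormalized
    probability that the image generated from latent z is high quality.
    """
    rng = np.random.default_rng(seed)
    z, score = z_init, quality_score(z_init)
    samples = []
    for _ in range(n_steps):
        z_prop = z + step_size * rng.standard_normal(z.shape)   # random-walk proposal
        score_prop = quality_score(z_prop)
        if rng.random() < min(1.0, score_prop / max(score, 1e-12)):  # accept/reject
            z, score = z_prop, score_prop
        samples.append(z.copy())
    return samples  # latents to feed into the generator for diverse, high-quality images
```
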
Exposing Error in Poverty Management Technology: A Method for Auditing Government Benefits Screening Tools
Public benefits programs help people afford necessities like food, housing, and healthcare. In the US, applicants must complete long forms to prove financial distress before receiving aid. Online benefits screening tools provide a gloss of such forms, advising households about their eligibility prior to completing full applications. If incorrectly implemented, they may discourage qualified households from applying for benefits. Unfortunately, errors are difficult to detect because they surface one at a time, and difficult to contest because unofficial determinations do not generate a paper trail. We introduce a method for auditing such tools. We generate test households that automatically populate screening questions. To detect errors, we compare the returned determinations to predictions from our formal eligibility guidelines model. Illustrated on a real screening tool with households modeled from census data, our method exposes major errors with corresponding examples to reproduce them. Our work provides a necessary corrective to an already arduous benefits application process.
2020 · Nel Escher et al. · Misinformation and Trust · CSCW

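A minimal sketch of the audit loop the abstract describes: generate test households, query the screening tool, and flag disagreements with a formal eligibility model. The income rule and data fields below are hypothetical placeholders, not real program guidelines or the paper's actual tool interface.

```python
def audit_screening_tool(screening_tool, eligibility_model, households):
    """Flag households where the screening tool's determination disagrees
    with the formal eligibility model, keeping reproducible examples."""
    errors = []
    for household in households:
        tool_says = screening_tool(household)       # tool's unofficial determination
        model_says = eligibility_model(household)   # prediction from formal guidelines
        if tool_says != model_says:
            errors.append((household, tool_says, model_says))
    return errors

# Toy example with a single (hypothetical) income-based rule.
households = [{"size": 3, "monthly_income": 2100}, {"size": 1, "monthly_income": 900}]
model = lambda h: h["monthly_income"] <= 700 * h["size"]   # formal guideline (illustrative)
tool = lambda h: h["monthly_income"] <= 2000               # buggy tool ignores household size
print(audit_screening_tool(tool, model, households))
```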