"I'm categorizing LLM as a productivity tool": Examining ethics of LLM use in HCI research practices
Large language models are increasingly applied in real-world scenarios, including research and education. These models, however, come with well-known ethical issues, which may manifest in unexpected ways in human-computer interaction research due to the extensive engagement with human subjects. This paper reports on research practices related to LLM use, drawing on 16 semi-structured interviews and a survey conducted with 50 HCI researchers. We discuss the ways in which LLMs are already being utilized throughout the entire HCI research pipeline, from ideation to system development and paper writing. While researchers described nuanced understandings of ethical issues, they were rarely or only partially able to identify and address those ethical concerns in their own projects. This lack of action and reliance on workarounds was explained through the perceived lack of control and distributed responsibility in the LLM supply chain, the conditional nature of engaging with ethics, and competing priorities. Finally, we reflect on the implications of our findings and present opportunities to shape emerging norms of engaging with large language models in HCI research.
2025 | Shivani Kapania et al. | Responsible and Ethical AI | CSCW
Why am I seeing this: Democratizing End User Auditing for Online Content Recommendations
Personalized recommendation systems tailor content based on user attributes, which are either provided or inferred from private data. Research suggests that users often hypothesize about the reasons behind the content they encounter (e.g., "I see this jewelry ad because I am a woman"), but they lack the means to confirm these hypotheses due to the opaqueness of these systems. This hinders informed decision-making about privacy and system use and contributes to the lack of algorithmic accountability. To address these challenges, we introduce a new interactive sandbox approach. This approach creates sets of synthetic user personas and corresponding personal data that embody realistic variations in personal attributes, allowing users to test their hypotheses by observing how a website's algorithms respond to these personas. We tested the sandbox in the context of targeted advertisement. Our user study demonstrates its usability, usefulness, and effectiveness in empowering end-user auditing in a case study of targeted ads.
2025 | Chaoran Chen et al. | Explainable AI (XAI); AI Ethics, Fairness & Accountability; Algorithmic Transparency & Auditability | UIST
GLITTER: An AI-assisted Platform for Material-Grounded Asynchronous Discussion in Flipped Learning
Flipped classrooms promote active learning by having students engage with materials independently before class, allowing in-class time for collaborative problem-solving. During this pre-class phase, asynchronous online discussions help students build knowledge and clarify concepts with peers. However, it remains difficult for students to engage with temporally dispersed peer contributions, connect discussions with static learning materials, and prepare for in-class sessions based on their self-learning outcomes. Our formative study identified cognitive challenges students encounter, including navigation barriers, reflection gaps, and contribution difficulty and anxiety. We present GLITTER, an AI-assisted discussion platform for pre-class learning in flipped classrooms. GLITTER helps students identify posts with shared conceptual dimensions, scaffold knowledge integration through conceptual blending, and enhance metacognition via personalized reflection reports. A within-subjects lab study (n = 12) demonstrates that GLITTER improves discussion engagement, sparks new ideas, supports reflection, and increases preparedness for in-class activities.
2025 | Weirui Peng et al. | K-12 Digital Education Tools; Online Learning & MOOC Platforms; Collaborative Learning & Peer Teaching | UIST
AROMA: Mixed-Initiative AI Assistance for Non-Visual Cooking by Grounding Multimodal Information Between Reality and Videos
Videos offer rich audiovisual information that can support people in performing activities of daily living (ADLs), but they remain largely inaccessible to blind or low-vision (BLV) individuals. In cooking, BLV people often rely on non-visual cues (such as touch, taste, and smell) to navigate their environment, making it difficult to follow the predominantly audiovisual instructions found in video recipes. To address this problem, we introduce AROMA, an AI system that provides timely, context-aware assistance in real time by integrating non-visual cues perceived by the user, a wearable camera feed, and video recipe content. AROMA uses a mixed-initiative approach: it responds to user requests while also proactively monitoring the video stream to offer timely alerts and guidance. This collaborative design leverages the complementary strengths of the user and AI system to align the physical environment with the video recipe, helping the user interpret their current cooking state and make sense of the steps. We evaluated AROMA through a study with eight BLV participants and offered insights for designing interactive AI systems to support BLV individuals in performing ADLs.
2025 | Zheng Ning et al. | Conversational Chatbots; Visual Impairment Technologies (Screen Readers, Tactile Graphics, Braille); Context-Aware Computing | UIST
Express What I Think: The Impact of External Human-Machine Interfaces on the Performance of Lane Change Maneuvers
Lane change is a complex behaviour involving subtle interactions among road users. Providing external human-machine interfaces (eHMIs) may improve the safety of lane-changing events. However, previous studies on eHMI mostly focused on the interaction between autonomous vehicles and pedestrians. As a first attempt, we investigated the yielding behaviour of drivers in lane-changing scenarios when different kinds of eHMIs regarding the intentions of the cutting-in vehicles (i.e., command, polite and explanatory) are provided. In a driving simulation experiment with 32 participants, we found that all three eHMIs increased yielding rates and minimum time to collision (minTTC) compared to the baseline condition without eHMI, with the polite eHMI yielding the best results. Regarding subjective evaluation, polite eHMIs were also perceived as having the highest usability. This study underscores the effectiveness of explicitly expressing lane-changing intentions through eHMIs and demonstrates that the eHMI design can influence driver behaviour, usability perception, and traffic safety.
2025 | Ruolan LI et al. | External HMI (eHMI) - Communication with Pedestrians & Cyclists; V2X (Vehicle-to-Everything) Communication Design | AutoUI
CLEAR: Towards Contextual LLM-Empowered Privacy Policy Analysis and Risk Generation for Large Language Model Applications
The rise of end-user applications powered by large language models (LLMs), including both conversational interfaces and add-ons to existing graphical user interfaces (GUIs), introduces new privacy challenges. However, many users remain unaware of the risks. This paper explores methods to increase user awareness of privacy risks associated with LLMs in end-user applications. We conducted five co-design workshops to uncover user privacy concerns and their demand for contextual privacy information within LLMs. Based on these insights, we developed CLEAR (Contextual LLM-Empowered Privacy Policy Analysis and Risk Generation), a just-in-time contextual assistant designed to help users identify sensitive information, summarize relevant privacy policies, and highlight potential risks when sharing information with LLMs. We evaluated the usability and usefulness of CLEAR across two example domains: ChatGPT and the Gemini plugin in Gmail. Our findings demonstrated that CLEAR is easy to use and improves users' understanding of data practices and privacy risks. We also discussed LLMs' duality in posing and mitigating privacy risks, offering design and policy implications.
2025 | Chaoran Chen et al. | Generative AI (Text, Image, Music, Video); AI Ethics, Fairness & Accountability; Privacy by Design & User Control | IUI
Unequal Opportunities: Examining the Bias in Geographical Recommendations by Large Language Models
Recent advancements in Large Language Models (LLMs) have made them a popular information-seeking tool among end users. However, the statistical training methods for LLMs have raised concerns about their representation of under-represented topics, potentially leading to biases that could influence real-world decisions and opportunities. These biases could have significant economic, social, and cultural impacts as LLMs become more prevalent, whether through direct interactions (such as when users engage with chatbots or automated assistants) or through their integration into third-party applications (as agents), where the models influence decision-making processes and functionalities behind the scenes. Our study examines the biases present in LLMs' recommendations of U.S. cities and towns across three domains: relocation, tourism, and starting a business. We explore two key research questions: (i) how similar LLMs' responses are, and (ii) how this similarity might favor areas with certain characteristics over others, introducing biases. We focus on the consistency of LLMs' responses and their tendency to over-represent or under-represent specific locations. Our findings point to consistent demographic biases in these recommendations, which could perpetuate a "rich-get-richer" effect that widens existing economic disparities.
2025 | Shiran Dudy et al. | Human-LLM Collaboration; AI Ethics, Fairness & Accountability; Algorithmic Transparency & Auditability | IUI
Limitations of the LLM-as-a-Judge Approach for Evaluating LLM Outputs in Expert Knowledge Tasks
The potential of using Large Language Models (LLMs) themselves to evaluate LLM outputs offers a promising method for assessing model performance across various contexts. Previous research indicates that LLM-as-a-judge exhibits a strong correlation with human judges in the context of general instruction following. However, for instructions that require specialized knowledge, the validity of using LLMs as judges remains uncertain. In our study, we applied a mixed-methods approach, conducting pairwise comparisons in which both subject matter experts (SMEs) and LLMs evaluated outputs from domain-specific tasks. We focused on two distinct fields: dietetics, with registered dietitian experts, and mental health, with clinical psychologist experts. Our results showed that SMEs agreed with LLM judges 68% of the time in the dietetics domain and 64% in mental health when evaluating overall preference. Additionally, the results indicated variations in SME-LLM agreement across domain-specific aspect questions. Our findings emphasize the importance of keeping human experts in the evaluation process, as LLMs alone may not provide the depth of understanding required for complex, knowledge-specific tasks. We also explore the implications of LLM evaluations across different domains and discuss how these insights can inform the design of evaluation workflows that ensure better alignment between human experts and LLMs in interactive systems.
2025 | Annalisa Szymanski et al. | Human-LLM Collaboration; AI-Assisted Decision-Making & Automation | IUI
LADICA: A Large Shared Display Interface for Generative AI Cognitive Assistance in Co-located Team Collaboration
Large shared displays, such as digital whiteboards, are useful for supporting co-located team collaborations by helping members perform cognitive tasks such as brainstorming, organizing ideas, and making comparisons. While recent advances in Large Language Models (LLMs) have catalyzed AI support for these displays, most existing systems either offer only limited capabilities or diminish human control, neglecting the potential benefits of natural group dynamics. Our formative study identified cognitive challenges teams encounter, such as diverse ideation, knowledge sharing, mutual awareness, idea organization, and synchronization of live discussions with the external workspace. In response, we introduce LADICA, a large shared display interface that helps collaborative teams brainstorm, organize, and analyze ideas through multiple analytical lenses, while fostering mutual awareness of ideas and concepts. Furthermore, LADICA facilitates the real-time extraction of key information from verbal discussions and identifies relevant entities. A lab study confirmed LADICA's usability and usefulness.
2025 | Zheng Zhang et al. | University of Notre Dame, Department of Computer Science and Engineering | Human-LLM Collaboration; Remote Work Tools & Experience | CHI
From Operation to Cognition: Automatic Modeling Cognitive Dependencies from User Demonstrations for GUI Task Automation
Traditional Programming by Demonstration (PBD) systems primarily automate tasks by recording and replaying operations on Graphical User Interfaces (GUIs), without fully considering the cognitive processes behind operations. This limits their ability to generalize tasks with interdependent operations to new contexts (e.g., collecting and summarizing introductions depending on different search keywords from varied websites). We propose TaskMind, a system that automatically identifies the semantics of operations and the cognitive dependencies between operations from demonstrations, building a user-interpretable task graph. Users modify this graph to define new task goals, and TaskMind executes the graph to dynamically generalize new parameters for operations, with the integration of Large Language Models (LLMs). We compared TaskMind with a baseline end-to-end LLM which automates tasks from demonstrations and natural language commands, without a task graph. In studies with 20 participants on both predefined and customized tasks, TaskMind significantly outperforms the baseline in both success rate and controllability.
2025 | Yiwen Yin et al. | Tsinghua University, Department of Computer Science and Technology | Human-LLM Collaboration; AI-Assisted Decision-Making & Automation | CHI
Hashtag Re-Appropriation for Audience Control on Recommendation-Driven Social Media Xiaohongshu (rednote)
Algorithms have played a central role in personalized recommendations on social media. However, they also present significant obstacles for content creators trying to predict and manage their audience reach. This issue is particularly challenging for marginalized groups seeking to maintain safe spaces. Our study explores how women on Xiaohongshu (rednote), a recommendation-driven social platform, proactively re-appropriate hashtags (e.g., #宝宝辅食, Baby Supplemental Food) by using them in posts unrelated to their literal meaning. The hashtags were strategically chosen from topics that would be uninteresting to the male audience they wanted to block. Through a mixed-methods approach, we analyzed the practice of hashtag re-appropriation based on 5,800 collected posts and interviewed 24 active users from diverse backgrounds to uncover users' motivations and reactions towards the re-appropriation. This practice highlights how users can reclaim agency over content distribution on recommendation-driven platforms, offering insights into self-governance within algorithmic-centered power structures.
2025 | Ruyuan Wan et al. | Pennsylvania State University, College of Information Sciences and Technology | Social Platform Design & User Behavior; Content Moderation & Platform Governance | CHI
Supporting Co-Adaptive Machine Teaching through Human Concept Learning and Cognitive Theories
An important challenge in interactive machine learning, particularly in subjective or ambiguous domains, is fostering bi-directional alignment between humans and models. Users teach models their concept definition through data labeling, while refining their own understandings throughout the process. To facilitate this, we introduce MOCHA, an interactive machine learning tool informed by two theories of human concept learning and cognition. First, it utilizes a neuro-symbolic pipeline to support Variation Theory-based counterfactual data generation. By asking users to annotate counterexamples that are syntactically and semantically similar to already-annotated data but predicted to have different labels, the system can learn more effectively while helping users understand the model and reflect on their own label definitions. Second, MOCHA uses Structural Alignment Theory to present groups of counterexamples, helping users comprehend alignable differences between data items and annotate them in batch. We validated MOCHA's effectiveness and usability through a lab study with 18 participants.
2025 | Simret Araya Gebreegziabher et al. | University of Notre Dame, Department of Computer Science and Engineering | Explainable AI (XAI); AI-Assisted Decision-Making & Automation | CHI
Satori 悟り: Towards Proactive AR Assistant with Belief-Desire-Intention User Modeling
Augmented Reality (AR) assistance is increasingly used for supporting users with physical tasks like assembly and cooking. However, most systems rely on reactive responses triggered by user input, overlooking rich contextual and user-specific information. To address this, we present Satori, a novel AR system that proactively guides users by modeling both their mental states and environmental contexts. Satori integrates the Belief-Desire-Intention (BDI) framework with a state-of-the-art multi-modal large language model (LLM) to deliver contextually appropriate guidance. Our system is designed based on two formative studies involving twelve experts. We evaluated the system in a within-subject study with sixteen participants and found that Satori matches the performance of designer-created Wizard-of-Oz (WoZ) systems, without manual configurations or heuristics, thereby improving generalizability and reusability and expanding the potential of AR assistance. Code is available at https://github.com/VIDA-NYU/satori-assistance.
2025 | Toby Jia-Jun Li et al. | New York University, Tandon School of Engineering | AR Navigation & Context Awareness; Human-LLM Collaboration; Context-Aware Computing | CHI
From Awareness to Action: Exploring End-User Empowerment Interventions for Dark Patterns in UX
The study of UX dark patterns, i.e., UI designs that seek to manipulate user behaviors, often for the benefit of online services, has drawn significant attention in the CHI and CSCW communities in recent years. To complement previous studies that address dark patterns from (1) the designer’s perspective on education and advocacy for ethical designs; and (2) the policymaker’s perspective on new regulations, we propose an end-user-empowerment intervention approach that helps users (1) raise their awareness of dark patterns and understand their underlying design intents; and (2) take actions to counter the effects of dark patterns using a web augmentation approach. Through a two-phase co-design study, including 5 co-design workshops (N=12) and a 2-week technology probe study (N=15), we reported findings on users' needs, preferences, and challenges in handling dark patterns, and investigated their feedback and reactions when empowered to be aware of and act on dark patterns in a realistic in-situ setting.
2024 | Yuwen Lu et al. | Session 2c: Protecting Users: Legislative Insights, Dark Patterns, and Cybersecurity | CSCW
SQLucid: Grounding Natural Language Database Queries with Interactive Explanations
Though recent advances in machine learning have led to significant improvements in natural language interfaces for databases, the accuracy and reliability of these systems remain limited, especially in high-stakes domains. This paper introduces SQLucid, a novel user interface that bridges the gap between non-expert users and complex database querying processes. SQLucid addresses existing limitations by integrating visual correspondence, intermediate query results, and editable step-by-step SQL explanations in natural language to facilitate user understanding and engagement. This unique blend of features empowers users to understand and refine SQL queries easily and precisely. Two user studies and one quantitative experiment were conducted to validate SQLucid’s effectiveness, showing significant improvement in task completion accuracy and user confidence compared to existing interfaces. Our code is available at https://github.com/magic-YuanTian/SQLucid.
2024 | Yuan Tian et al. | Explainable AI (XAI); AI-Assisted Decision-Making & Automation | UIST
MIMOSA: Human-AI Co-Creation of Computational Spatial Audio Effects on Videos
Spatial audio offers more immersive video consumption experiences to viewers; however, creating and editing spatial audio is often expensive and requires specialized equipment and skills, posing a high barrier for amateur video creators. We present MIMOSA, a human-AI co-creation tool that enables amateur users to computationally generate and manipulate spatial audio effects. For a video with only monaural or stereo audio, MIMOSA automatically grounds each sound source to the corresponding sounding object in the visual scene and enables users to further validate and fix the errors in the locations of sounding objects. Users can also augment the spatial audio effect by flexibly manipulating the sounding source positions and creatively customizing the audio effect. The design of MIMOSA exemplifies a human-AI collaboration approach that, instead of utilizing state-of-the-art end-to-end "black-box" ML models, uses a multistep pipeline that aligns its interpretable intermediate results with the user’s workflow. A lab user study with 15 participants demonstrates MIMOSA’s usability, usefulness, expressiveness, and capability in creating immersive spatial audio effects in collaboration with users.
2024 | Zheng Ning et al. | Generative AI (Text, Image, Music, Video); Music Composition & Sound Design Tools; Creative Collaboration & Feedback Systems | C&C
Luminate: Structured Generation and Exploration of Design Space with Large Language Models for Human-AI Co-Creation
Thanks to their generative capabilities, large language models (LLMs) have become an invaluable tool for creative processes. These models have the capacity to produce hundreds and thousands of visual and textual outputs, offering abundant inspiration for creative endeavors. But are we harnessing their full potential? We argue that current interaction paradigms fall short, guiding users towards rapid convergence on a limited set of ideas, rather than empowering them to explore the vast latent design space in generative models. To address this limitation, we propose a framework that facilitates the structured generation of design space in which users can seamlessly explore, evaluate, and synthesize a multitude of responses. We demonstrate the feasibility and usefulness of this framework through the design and development of an interactive system, Luminate, and a user study with 14 professional writers. Our work advances how we interact with LLMs for creative tasks, introducing a way to harness the creative potential of LLMs.
2024 | Sangho Suh et al. | University of California, San Diego | Human-LLM Collaboration; AI-Assisted Decision-Making & Automation | CHI
SPICA: Interactive Video Content Exploration through Augmented Audio Descriptions for Blind or Low-Vision Viewers
Blind or Low-Vision (BLV) users often rely on audio descriptions (AD) to access video content. However, conventional static ADs can leave out detailed information in videos, impose a high mental load, neglect the diverse needs and preferences of BLV users, and lack immersion. To tackle these challenges, we introduce SPICA, an AI-powered system that enables BLV users to interactively explore video content. Informed by prior empirical studies on BLV video consumption, SPICA offers novel interactive mechanisms for supporting temporal navigation of frame captions and spatial exploration of objects within key frames. Leveraging an audio-visual machine learning pipeline, SPICA augments existing ADs by adding interactivity, spatial sound effects, and individual object descriptions without requiring additional human annotation. Through a user study with 14 BLV participants, we evaluated the usability and usefulness of SPICA and explored user behaviors, preferences, and mental models when interacting with augmented ADs.
2024 | Zheng Ning et al. | University of Notre Dame | Visual Impairment Technologies (Screen Readers, Tactile Graphics, Braille) | CHI
CollabCoder: A Lower-barrier, Rigorous Workflow for Inductive Collaborative Qualitative Analysis with Large Language Models
Collaborative Qualitative Analysis (CQA) can enhance qualitative analysis rigor and depth by incorporating varied viewpoints. Nevertheless, ensuring a rigorous CQA procedure itself can be both complex and costly. To lower this bar, we take a theoretical perspective to design a one-stop, end-to-end workflow, CollabCoder, that integrates Large Language Models (LLMs) into key inductive CQA stages. In the independent open coding phase, CollabCoder offers AI-generated code suggestions and records decision-making data. During the iterative discussion phase, it promotes mutual understanding by sharing this data within the coding team and using quantitative metrics to identify coding (dis)agreements, aiding in consensus-building. In the codebook development phase, CollabCoder provides primary code group suggestions, lightening the workload of developing a codebook from scratch. A 16-user evaluation confirmed the effectiveness of CollabCoder, demonstrating its advantages over the existing CQA platform. All related materials of CollabCoder, including code and further extensions, will be included in: https://gaojie058.github.io/CollabCoder/.
2024 | Jie Gao et al. | Singapore University of Technology and Design | Human-LLM Collaboration; User Research Methods (Interviews, Surveys, Observation); Prototyping & User Testing | CHI
An Empathy-Based Sandbox Approach to Bridge the Privacy Gap among Attitudes, Goals, Knowledge, and Behaviors
Managing privacy to reach privacy goals is challenging, as evidenced by the privacy attitude-behavior gap. Mitigating this discrepancy requires solutions that account for both system opaqueness and users' hesitations in testing different privacy settings due to fears of unintended data exposure. We introduce an empathy-based approach that allows users to experience how privacy attributes may alter system outcomes in a risk-free sandbox environment from the perspective of artificially generated personas. To generate realistic personas, we introduce a novel pipeline that augments the outputs of large language models (e.g., GPT-4) using few-shot learning, contextualization, and chain of thought. Our empirical studies demonstrated the adequate quality of generated personas and highlighted the changes in privacy-related applications (e.g., online advertising) caused by different personas. Furthermore, users demonstrated cognitive and emotional empathy towards the personas when interacting with our sandbox. We offered design implications for downstream applications in improving user privacy literacy.
2024 | Chaoran Chen et al. | University of Notre Dame | Explainable AI (XAI); Algorithmic Transparency & Auditability; Privacy by Design & User Control | CHI