A Risk Taxonomy and Reflection Tool for LLM Adoption in Public Health
Recent breakthroughs in large language models (LLMs) have generated both interest and concern about their potential adoption as information sources or communication tools across different domains. In public health, where stakes are high and impacts extend across diverse populations, adopting LLMs poses unique challenges that require thorough evaluation. However, structured approaches for assessing potential risks in public health remain under-explored. To address this gap, we conducted focus groups with public health professionals and individuals with lived experience to unpack their concerns, situated across three distinct and critical public health issues that demand high-quality information: infectious disease prevention (vaccines), chronic and well-being care (opioid use disorder), and community health and safety (intimate partner violence). We synthesize participants' perspectives into a risk taxonomy, distinguishing and contextualizing the potential harms LLMs may introduce when positioned alongside traditional health communication. The taxonomy highlights four dimensions of risk: to individuals, to human-centered care, to the information ecosystem, and to technology accountability. For each dimension, we discuss specific risks and offer example reflection questions to help practitioners adopt a risk-reflexive approach. We discuss the need to revisit pre-existing mental models of help-seeking and to complement evaluations with external validity and domain expertise drawn from lived experience and real-world practice. Together, this work contributes a shared vocabulary and reflection tool for people in both computing and public health to collaboratively anticipate, evaluate, and mitigate risks when deciding whether or not to employ LLM capabilities and how to mitigate harm.
Jiawei Zhou et al., CSCW 2025. Topics: Identifying and Mitigating AI Risks.
Understanding User Needs and Attitudes for Privacy Protection Tools in Online Visual Content Sharing
Visual content shared on social media often includes sensitive elements that can threaten personal privacy. Privacy protection tools, some powered by state-of-the-art generative AI (Gen-AI) technologies, are increasingly developed to address such visual privacy concerns by identifying sensitive elements in visual content and suggesting or applying modifications; their success, however, depends on how well they meet users' nuanced needs and preferences. In this study, we conducted semi-structured interviews with 18 individuals who have either experienced or caused privacy violations in shared visual content to gather first-hand perspectives on stakeholders' privacy concerns, their preferences for how to address these concerns, and their attitudes toward the use of generative AI for privacy protection. Our findings highlight that sensitive elements are often not limited to direct identifiers but include contextual combinations and external information that can lead to unintended inferences. Decisions about whether and what to modify are shaped by concerns about privacy effectiveness, content value, content meaning, and emotional or social relevance, while choices around how to modify are influenced by recognition difficulty, visual content integrity, contextual consistency, atmosphere, and usability of modification methods. Participants saw Gen-AI as a promising tool for lowering editing barriers and enhancing creative control but also raised concerns about data usage, manipulation, and transparency. Importantly, we identify tensions between uploaders and depicted individuals, emphasizing the need for shared consent mechanisms and user-centered design in privacy protection. We conclude by discussing design implications for context-aware, flexible, and ethically responsible privacy tools.
Chun-Wei Chiang et al., CSCW 2025. Topics: Designing for Privacy.
agentAR: Creating Augmented Reality Applications with Tool-Augmented LLM-based Autonomous Agents
Creating Augmented Reality (AR) applications requires expertise in both design and implementation, posing significant barriers to entry for non-expert users. While existing methods reduce some of this burden, they often fall short in flexibility or usability for complex or varied use cases. To address this, we introduce agentAR, an AR authoring system that leverages a tool-augmented large language model (LLM)-based autonomous agent to support end-to-end, in-situ AR application creation from natural language input. Built on an application structure and tool library derived from state-of-the-art AR research, the agent autonomously creates AR applications from natural language dialogue. We demonstrate the effectiveness of agentAR through a case study of six AR applications and a user study with twelve participants, showing that it significantly reduces user effort while supporting the creation of diverse and functional AR experiences.
Chenfei Zhu et al., UIST 2025. Topics: AR Navigation & Context Awareness; Generative AI (Text, Image, Music, Video); Human-LLM Collaboration.
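The abstract names the general pattern, an LLM agent planning over a tool library, without implementation detail. For readers unfamiliar with that pattern, here is a minimal Python sketch of a tool-augmented agent loop, with a scripted stand-in model so it runs; the JSON protocol and tool names are illustrative assumptions, not agentAR's actual design.

```python
import json
from collections import deque

# Scripted stand-in for a real chat model so the loop below actually runs;
# in a real system this would call an LLM API.
_SCRIPT = deque([
    '{"tool": "create_object", "args": {"shape": "arrow", "position": "door"}}',
    '{"answer": "Placed a guidance arrow at the door."}',
])

def call_llm(messages):
    return _SCRIPT.popleft()

# Tool library: each entry maps a tool name to a callable that edits the AR scene.
TOOLS = {
    "create_object": lambda args: f"created {args['shape']} at {args['position']}",
    "attach_behavior": lambda args: f"attached {args['behavior']} to {args['target']}",
}

def run_agent(user_request, max_steps=10):
    messages = [
        {"role": "system", "content": 'Reply with JSON: {"tool": ..., "args": {...}} or {"answer": ...}.'},
        {"role": "user", "content": user_request},
    ]
    for _ in range(max_steps):
        reply = json.loads(call_llm(messages))
        if "answer" in reply:                         # agent decided it is done
            return reply["answer"]
        result = TOOLS[reply["tool"]](reply["args"])  # execute the chosen tool
        messages.append({"role": "user", "content": f"Tool result: {result}"})
    return "step budget exhausted"

print(run_agent("Add an arrow pointing to the exit door."))
```

The loop terminates either when the agent emits a final answer or when the step budget runs out, a standard safeguard against an agent that never stops calling tools.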
Coping with Uncertainty in UX Design Practice: Practitioner Strategies and Judgment
The complexity of UX design practice extends beyond ill-structured design problems to include uncertainties shaped by shifting stakeholder priorities, team dynamics, limited resources, and implementation constraints. While prior research in related fields has addressed uncertainty in design more broadly, the specific character of uncertainty in UX practice remains underexplored. This study examines how UX practitioners experience and respond to uncertainty in real-world projects, drawing on a multi-week diary study and follow-up interviews with ten designers. We identify a range of practitioner strategies, including adaptive framing, negotiation, and judgment, that allow designers to move forward amid ambiguity. Our findings highlight the central role of design judgment in navigating uncertainty, including emergent forms such as temporal and sacrificial judgment, and extend prior understandings by showing how UX practitioners engage uncertainty as a persistent, situated feature of practice.
Prakash Chandra Shukla et al., C&C 2025. Topics: User Research Methods (Interviews, Surveys, Observation); Prototyping & User Testing.
GesPrompt: Leveraging Co-Speech Gestures to Augment LLM-Based Interaction in Virtual Reality
Large Language Model (LLM)-based copilots have shown great potential in Extended Reality (XR) applications. However, users face challenges when describing 3D environments to these copilots, because conveying spatial-temporal information through text or speech alone is difficult. To address this, we introduce GesPrompt, a multimodal XR interface that combines co-speech gestures with speech, allowing end-users to communicate more naturally and accurately with LLM-based copilots in XR environments. By incorporating gestures, GesPrompt extracts spatial-temporal references from co-speech gestures, reducing the need for precise textual prompts and minimizing cognitive load for end-users. Our contributions include (1) a workflow to integrate gesture and speech input in the XR environment, (2) a prototype VR system that implements the workflow, and (3) a user study demonstrating its effectiveness in improving user communication in VR environments.
Xiyun Hu et al., DIS 2025. Topics: Hand Gesture Recognition; Mixed Reality Workspaces; Human-LLM Collaboration.
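As a loose illustration of the gesture-speech fusion idea, one plausible mechanism (my own guess, not GesPrompt's published workflow) is to replace deictic words in the transcript with referents resolved from the user's pointing gesture before the prompt reaches the copilot:

```python
# Sketch: fill deictic slots in the speech transcript ("this object", "over here")
# with referents resolved from co-speech gestures. Names and structure are
# hypothetical; a real pipeline would align gesture and speech in time.
def build_copilot_prompt(transcript: str, pointed_object: str,
                         pointed_position: tuple[float, float, float]) -> str:
    filled = (transcript
              .replace("this object", pointed_object)
              .replace("over here", f"position {pointed_position}"))
    return f"In the VR scene, the user says: {filled}"

# build_copilot_prompt("move this object over here", "red cube", (1.0, 0.0, 2.5))
# -> "In the VR scene, the user says: move red cube position (1.0, 0.0, 2.5)"
```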
Designing with Multi-Agent Generative AI: Insights from Industry Early Adopters
In this paper we present the results of our investigation into how employees at Microsoft, as early adopters of multi-agent generative AI systems, navigate the complexities of designing, testing, and deploying these technologies to extend the organization's product ecosystem. Through interviews with thirteen developers, we uncover the challenges, use cases, and lessons learned when designing with and for multi-agent AI frameworks. Our analysis reveals how participants leveraged this advanced emerging technology to enhance collaboration, productivity, customer support, creative processes, and security. Key design strategies include managing agent complexity, fostering transparency, and balancing agent autonomy with human oversight, all essential considerations for human-agent interaction design. We provide empirical insights into the capabilities and limitations of multi-agent systems in real-world contexts, informing the design of future AI systems that align AI capabilities with human-centered design. By emphasizing first-person experiences and strategies, our research bridges human needs and AI potentials, advancing both the practice and theory of designing with and for AI systems.
Suchismita Naik et al., DIS 2025. Topics: Human-LLM Collaboration; AI-Assisted Decision-Making & Automation; Home Energy Management.
DesignFromX: Empowering Consumer-Driven Design Space Exploration through Feature Composition of Referenced Products
Industrial products are designed to satisfy the needs of consumers. The rise of generative artificial intelligence (GenAI) enables consumers to easily modify a product by prompting a generative model, opening up opportunities to involve consumers in exploring the product design space. However, consumers often struggle to articulate their preferred product features due to their unfamiliarity with terminology and their limited understanding of the structure of product features. We present DesignFromX, a system that empowers consumer-driven design space exploration by helping consumers design a product based on their preferences. Leveraging an effective GenAI-based framework, the system allows users to easily identify design features from product images and compose those features to generate conceptual images and 3D models of a new product. A user study with 24 participants demonstrates that DesignFromX lowers the barriers and frustration of consumer-driven design space exploration by enhancing both engagement and enjoyment for the participants.
Runlin Duan et al., DIS 2025. Topics: Generative AI (Text, Image, Music, Video); Creative Collaboration & Feedback Systems; Customizable & Personalized Objects.
HEPHA: A Mixed-Initiative Image Labeling Tool for Specialized Domains
Image labeling is an important task for training computer vision models. In specialized domains, such as healthcare, it is expensive and challenging to recruit specialists for image labeling. We propose HEPHA, a mixed-initiative image labeling tool that elicits human expertise via inductive logic learning to infer and refine labeling rules. Each rule comprises visual predicates that describe the image. HEPHA enables users to iteratively refine the rules either by direct manipulation through a visual programming interface or by labeling more images. To facilitate rule refinement, HEPHA recommends which rule to edit and which predicate to update. For users unfamiliar with visual programming, HEPHA suggests diverse and informative images for further labeling. We conducted a within-subjects user study with 16 participants and compared HEPHA with a variant of HEPHA and a deep learning-based approach. We found that HEPHA outperforms the two baselines in both specialized-domain and general-domain image labeling tasks. Our code is available at https://github.com/Neural-Symbolic-Image-Labeling/NSILWeb.
Xiangyu Zhou et al., IUI 2025. Topics: Explainable AI (XAI); Interactive Data Visualization; Medical & Scientific Data Visualization.
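A labeling rule "comprising visual predicates" can be pictured as a conjunction over interpretable image attributes. A toy sketch, with predicate names invented for illustration (HEPHA infers and refines such rules via inductive logic learning rather than hand-writing them):

```python
# Toy sketch of a labeling rule built from visual predicates. Feature names
# and thresholds are invented; the point is that the rule is human-readable
# and editable, unlike a deep model's weights.
def rule_positive(feats: dict) -> bool:
    return feats["lesion_asymmetry"] > 0.6 and feats["border_irregularity"] > 0.5

def label(feats: dict) -> str:
    return "positive" if rule_positive(feats) else "negative"

# label({"lesion_asymmetry": 0.8, "border_irregularity": 0.7})  -> "positive"
```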
Less or More: Towards Glanceable Explanations for LLM Recommendations Using Ultra-Small Devices
Large Language Models (LLMs) have shown remarkable potential in recommending everyday actions as personal AI assistants, while Explainable AI (XAI) techniques are increasingly utilized to help users understand why a recommendation is given. Personal AI assistants today are often located on ultra-small devices such as smartwatches, which have limited screen space. The verbosity of LLM-generated explanations, however, makes it challenging to deliver glanceable LLM explanations on such ultra-small devices. To address this, we explored (1) spatially structuring an LLM's explanation text using defined contextual components during prompting and (2) presenting temporally adaptive explanations to users based on confidence levels. We conducted a user study to understand how these approaches impacted user experiences when interacting with LLM recommendations and explanations on ultra-small devices. The results showed that structured explanations reduced users' time to action and cognitive load when reading an explanation. Always-on structured explanations increased users' acceptance of AI recommendations. However, users were less satisfied with structured explanations than with unstructured ones due to their lack of sufficient, readable details. Additionally, adaptively presenting structured explanations was less effective at improving user perceptions of the AI than always-on structured explanations. Together with users' interview feedback, the results led to design implications to be mindful of when personalizing the content and timing of LLM explanations displayed on ultra-small devices.
Xinru Wang et al., IUI 2025. Topics: Human-LLM Collaboration; Explainable AI (XAI).
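To make the two manipulations concrete, here is a small sketch of (1) prompting the LLM to emit its explanation in fixed, labeled components so a smartwatch can lay them out glanceably, and (2) gating the explanation on model confidence. The field names and the direction of the confidence gate are my guesses, not the study's exact design.

```python
# (1) Spatial structuring via the prompt: fixed, labeled fields with a length
# budget, so the watch UI can render each field in a known slot.
STRUCTURED_PROMPT = (
    "Recommend one action, then explain it using exactly these fields, "
    "each under eight words:\n"
    "ACTION: ...\nWHY: ...\nCONTEXT: ...\nCONFIDENCE: a number from 0 to 1"
)

# (2) Temporal adaptivity: one plausible policy is to surface the explanation
# only when confidence is low, i.e., when the user may want to double-check.
def should_show_explanation(confidence: float, threshold: float = 0.7) -> bool:
    return confidence < threshold
```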
Text-to-SQL Domain Adaptation via Human-LLM Collaborative Data Annotation
Text-to-SQL models, which parse natural language (NL) questions into executable SQL queries, are increasingly adopted in real-world applications. However, deploying such models in the real world often requires adapting them to the highly specialized database schemas used in specific applications. We observe that the performance of existing text-to-SQL models drops dramatically when applied to a new schema, primarily due to the lack of domain-specific data for fine-tuning. Furthermore, this lack of data for the new schema also hinders our ability to effectively evaluate the model's performance in the new domain. Nevertheless, it is expensive to continuously obtain text-to-SQL data for an evolving schema in most real-world applications. To bridge this gap, we propose SQLsynth, a human-in-the-loop text-to-SQL data annotation system. SQLsynth streamlines the creation of high-quality text-to-SQL datasets through collaboration between humans and a large language model in a structured workflow. A within-subjects user study comparing SQLsynth to manual annotation and ChatGPT reveals that SQLsynth significantly accelerates text-to-SQL data annotation, reduces cognitive load, and produces datasets that are more accurate, natural, and diverse. Our code is available at https://github.com/adobe/nl_sql_analyzer.
Yuan Tian et al., IUI 2025. Topics: Human-LLM Collaboration; AutoML Interfaces.
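The abstract describes the workflow only at a high level. A generic human-in-the-loop text-to-SQL annotation loop, sketched under my own assumptions (not SQLsynth's actual pipeline), looks like this:

```python
# Sketch of a human-in-the-loop text-to-SQL annotation loop. The three
# callables are injected stand-ins for the system's real components.
def annotate(schema, n_examples, sample_sql, llm_draft_nl, human_review):
    dataset = []
    for _ in range(n_examples):
        sql = sample_sql(schema)             # e.g., sample a valid query over the schema
        draft = llm_draft_nl(sql, schema)    # LLM drafts a natural-language question
        question = human_review(sql, draft)  # annotator verifies or edits the draft
        if question is not None:             # None = annotator rejected the pair
            dataset.append({"question": question, "sql": sql})
    return dataset
```

The division of labor is the key idea: the machine proposes cheap candidate pairs, while the human only verifies and edits, which is typically much faster than writing both the SQL and the question from scratch.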
Limitations of the LLM-as-a-Judge Approach for Evaluating LLM Outputs in Expert Knowledge Tasks
Using Large Language Models (LLMs) themselves to evaluate LLM outputs offers a promising method for assessing model performance across various contexts. Previous research indicates that LLM-as-a-judge exhibits a strong correlation with human judges in the context of general instruction following. However, for instructions that require specialized knowledge, the validity of using LLMs as judges remains uncertain. In our study, we applied a mixed-methods approach, conducting pairwise comparisons in which both subject matter experts (SMEs) and LLMs evaluated outputs from domain-specific tasks. We focused on two distinct fields: dietetics, with registered dietitian experts, and mental health, with clinical psychologist experts. Our results showed that SMEs agreed with LLM judges 68% of the time in the dietetics domain and 64% in mental health when evaluating overall preference. Additionally, the results indicated variations in SME-LLM agreement across domain-specific aspect questions. Our findings emphasize the importance of keeping human experts in the evaluation process, as LLMs alone may not provide the depth of understanding required for complex, knowledge-specific tasks. We also explore the implications of LLM evaluations across different domains and discuss how these insights can inform the design of evaluation workflows that ensure better alignment between human experts and LLMs in interactive systems.
Annalisa Szymanski et al., IUI 2025. Topics: Human-LLM Collaboration; AI-Assisted Decision-Making & Automation.
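The reported 68%/64% figures read as plain percent agreement over pairwise preferences; assuming that statistic, it is computed as follows:

```python
# Percent agreement between expert and LLM pairwise preferences, where each
# element is the preferred output ("A" or "B") for one comparison.
def agreement_rate(sme_prefs: list[str], llm_prefs: list[str]) -> float:
    assert len(sme_prefs) == len(llm_prefs) and sme_prefs
    matches = sum(s == l for s, l in zip(sme_prefs, llm_prefs))
    return matches / len(sme_prefs)

# agreement_rate(["A", "B", "A", "B"], ["A", "A", "A", "B"])  -> 0.75
```

Note that with two options, random judging already yields 50% expected agreement, which is useful context for interpreting figures in the 60s.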
WhatELSE: Shaping Narrative Spaces at Configurable Level of Abstraction for AI-bridged Interactive Storytelling
Generative AI significantly enhances player agency in interactive narratives (IN) by enabling just-in-time content generation that adapts to player actions. While delegating generation to AI makes IN more interactive, it becomes challenging for authors to control the space of possible narratives, within which the final story experienced by the player emerges from their interaction with the AI. In this paper, we present WhatELSE, an AI-bridged IN authoring system that creates narrative possibility spaces from example stories. WhatELSE provides three views (narrative pivot, outline, and variants) to help authors understand the narrative space, along with corresponding tools that leverage linguistic abstraction to control the boundaries of the narrative space. Using LLM-based narrative planning, WhatELSE further unfolds the narrative space into executable game events. Through a user study (N=12) and technical evaluations, we found that WhatELSE enables authors to perceive and edit the narrative space and generates engaging interactive narratives at play-time.
Zhuoran Lu et al. (Autodesk Research; Purdue University, Computer Science), CHI 2025. Topics: Generative AI (Text, Image, Music, Video); AI-Assisted Creative Writing; Interactive Narrative & Immersive Storytelling.
Explanations Help: Leveraging Human Capabilities to Detect Cyberattacks on Automated Vehicles
Existing defense strategies against cyberattacks on automated vehicles (AVs) often overlook the great potential of humans in detecting such attacks. To address this, we identified three types of human-detectable attacks targeting transportation infrastructure, AV perception modules, and AV execution modules. We proposed two types of displays, Alert and Alert plus Explanations (AlertExp), and conducted an online video survey study involving 260 participants to systematically evaluate the effectiveness of these displays across cyberattack types. Results showed that AV execution module attacks were the hardest to detect and understand, but AlertExp displays mitigated this difficulty. In contrast, AV perception module attacks were the easiest to detect, while infrastructure attacks resulted in the highest post-attack trust in the AV system. Although participants were prone to false alarms, AlertExp displays mitigated their negative impacts, whereas Alert displays performed worse than having no display. Overall, AlertExp displays are recommended to enhance human detection of cyberattacks.
Yao Ding et al. (University of Pittsburgh, School of Computing and Information), CHI 2025. Topics: Explainable AI (XAI); Privacy by Design & User Control.
AdaptiveSliders: User-aligned Semantic Slider-based Editing of Text-to-Image Model Output
Precise editing of text-to-image model outputs remains challenging. Slider-based editing is a recent approach wherein the image's semantic attributes are manipulated via sliders. However, it has significant user-centric issues. First, slider variations are often inconsistent across the sliding range. Second, the optimal slider range is unpredictable, with default values often being too large or small depending on the prompt and attribute. Third, manipulating one attribute can unintentionally alter others due to the complex entanglement of latent spaces. We introduce AdaptiveSliders, a tool that addresses these challenges by adapting to the specific attributes and prompts, generating consistent slider variations and optimal bounds while minimizing unintended changes. AdaptiveSliders also suggests initial attributes and generates initial images more aligned with prompt semantics. Through three validation studies and one end-to-end user study, we demonstrate that AdaptiveSliders significantly improves user control and experience, enabling semantic slider-based editing aligned with user needs and expectations.
Rahul Jain et al. (Purdue University, Department of Electrical and Computer Engineering), CHI 2025. Topics: Generative AI (Text, Image, Music, Video); Explainable AI (XAI).
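For readers unfamiliar with slider-based editing, the generic technique the abstract builds on can be sketched as moving a latent along a learned attribute direction, with per-attribute calibrated bounds so slider units feel uniform. This is a sketch of the underlying idea, not AdaptiveSliders' actual method:

```python
import numpy as np

# Generic concept-slider edit: displace a latent along a (learned) attribute
# direction. Calibrating [lo, hi] per attribute addresses the "default range
# too large or small" problem the abstract describes.
def apply_slider(latent: np.ndarray, direction: np.ndarray,
                 slider: float, lo: float, hi: float) -> np.ndarray:
    """Map slider in [-1, 1] into the calibrated strength range [lo, hi]."""
    alpha = lo + (slider + 1.0) / 2.0 * (hi - lo)  # attribute-specific rescaling
    unit = direction / np.linalg.norm(direction)   # edit only along this axis
    return latent + alpha * unit
```

The entanglement problem in the abstract corresponds to `direction` not being orthogonal to other attributes' directions, which is why naive edits to one attribute bleed into others.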
From Text to Trust: Empowering AI-assisted Decision Making with Adaptive LLM-powered Analysis
AI-assisted decision making is becoming increasingly prevalent, yet individuals often fail to utilize AI-based decision aids appropriately, especially when AI explanations are absent, potentially because they do not critically reflect on the AI's decision recommendations. Large language models (LLMs), with their exceptional conversational and analytical capabilities, present great opportunities to enhance AI-assisted decision making in the absence of AI explanations by providing natural-language-based analysis of the AI's decision recommendation, e.g., how each feature of a decision-making task might contribute to the AI recommendation. In this paper, via a randomized experiment, we first show that presenting LLM-powered analysis of each task feature, either sequentially or concurrently, does not significantly improve people's AI-assisted decision performance. To enable decision makers to better leverage LLM-powered analysis, we then propose an algorithmic framework to characterize the effects of LLM-powered analysis on human decisions and dynamically decide which analysis to present. Our evaluation with human subjects shows that this approach effectively improves decision makers' appropriate reliance on AI in AI-assisted decision making.
Zhuoyan Li et al. (Purdue University), CHI 2025. Topics: Human-LLM Collaboration; AI-Assisted Decision-Making & Automation; Algorithmic Fairness & Bias.
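The "dynamically decide which analysis to present" idea can be pictured as a selection policy: score each candidate feature analysis by its estimated effect on the human's decision and show the best one, or none. The effect estimator is the paper's actual contribution and is stubbed out in this sketch:

```python
# Sketch of an analysis-selection policy. `estimate_effect(feature)` stands in
# for the paper's learned model of how presenting that feature's analysis
# shifts the human toward a correct reliance decision.
def choose_analysis(features: list[str], estimate_effect) -> str | None:
    scored = [(estimate_effect(f), f) for f in features]
    best_score, best_feature = max(scored)
    return best_feature if best_score > 0 else None  # withhold if nothing helps

# choose_analysis(["age", "income"], lambda f: {"age": 0.02, "income": -0.01}[f])
# -> "age"
```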
CARING-AI: Towards Authoring Context-aware Augmented Reality INstruction through Generative Artificial Intelligence
Context-aware AR instruction enables adaptive and in-situ learning experiences. However, hardware limitations and expertise requirements constrain the creation of such instructions. With recent developments in generative artificial intelligence (GenAI), current research tries to tackle these constraints by deploying AI-generated content (AIGC) in AR applications. However, our preliminary study with six AR practitioners revealed that current AIGC lacks the contextual information needed to adapt to varying application scenarios and is therefore of limited use in authoring. To harness the generative power of GenAI for authoring AR instructions while capturing context, we developed CARING-AI, an AR system for authoring context-aware, humanoid-avatar-based instructions with GenAI. By navigating the environment, users naturally provide contextual information used to generate humanoid-avatar animations as AR instructions that blend into the context spatially and temporally. We showcase three application scenarios of CARING-AI: Asynchronous Instructions, Remote Instructions, and Ad Hoc Instructions, based on a design space of AIGC in AR instructions. With two user studies (N=12), we assessed the system usability of CARING-AI and demonstrated the ease and effectiveness of authoring with GenAI.
Jingyu Shi et al. (Purdue University, Elmore Family School of Electrical and Computer Engineering), CHI 2025. Topics: AR Navigation & Context Awareness; Generative AI (Text, Image, Music, Video); Human-LLM Collaboration.
Dango: A Mixed-Initiative Data Wrangling System using Large Language Model
Data wrangling is a time-consuming and challenging task in the early stages of a data science pipeline, and existing tools often fail to effectively interpret user intent. We propose Dango, a mixed-initiative multi-agent system that helps users generate data wrangling scripts. Compared to existing tools, Dango enhances users' communication of intent by (1) allowing users to demonstrate on multiple tables and use natural language prompts in a conversational interface, (2) enabling users to clarify their intent by answering LLM-posed multiple-choice clarification questions, and (3) providing multiple forms of feedback, such as step-by-step NL explanations and data provenance, to help users evaluate the data wrangling scripts. In a within-subjects, think-aloud study (n=38), we show that Dango's features significantly improve intent clarification, accuracy, and efficiency in data wrangling tasks.
Wei-Hao Chen et al. (Purdue University, Computer Science), CHI 2025. Topics: Human-LLM Collaboration; AI-Assisted Decision-Making & Automation; Interactive Data Visualization.
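Point (2) above, LLM-posed multiple-choice clarification, can be sketched as follows; this is an illustrative pattern, not Dango's actual interface:

```python
# Sketch: when several wrangling intents are plausible, ask the user to pick
# instead of guessing. `candidates` would come from the LLM; `ask_user` from
# the UI layer.
def clarify_intent(candidates: list[str], ask_user) -> str:
    menu = "\n".join(f"[{i}] {c}" for i, c in enumerate(candidates))
    choice = int(ask_user(f"Which did you mean?\n{menu}"))  # returns an index
    return candidates[choice]

# clarify_intent(["merge on ID", "merge on name"], lambda q: "0") -> "merge on ID"
```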
"Ultimately, it's a matter of safety, and resisting ostracization": Understanding Neurodivergent Masking with Online CommunitiesNeurotypical modes of existence and interaction are enforced through traditional social norms, compelling individuals who diverge from these norms, such as those who are neurodivergent, to conform through ``masking.'' Technology research and design often also ascribe to these conventional norms, creating technology that reinforces neurodivergent people's need to mask. In this research, we turn to neurodivergent communities online to develop an understanding of masking behaviors. We adopt a two-tiered research approach consisting of a qualitative thematic analysis of TikTok videos and a survey questionnaire. Through this work, we initiate discussion on the complexities of neurodivergent masking as a pervasive social adaptation. We urge HCI researchers to critically reframe intervention design and research practices that may either perpetuate or seek to address masking.2025KKKritika Kritika et al.University of California, Santa Cruz, Computational Media DepartmentCognitive Impairment & Neurodiversity (Autism, ADHD, Dyslexia)Empowerment of Marginalized GroupsTechnology Ethics & Critical HCICHI
"Kya family planning after marriage hoti hai?": Integrating Cultural Sensitivity in an LLM Chatbot for Reproductive HealthAccess to sexual and reproductive health information remains a challenge in many communities globally, due to cultural taboos and limited availability of healthcare providers. Public health organizations are increasingly turning to Large Language Models (LLMs) to improve access to timely and personalized information. However, recent HCI scholarship indicates that significant challenges remain in incorporating context awareness and mitigating bias in LLMs. In this paper, we study the development of a culturally-appropriate LLM-based chatbot for reproductive health with underserved women in urban India. Through user interactions, focus groups, and interviews with multiple stakeholders, we examine the chatbot’s response to sensitive and highly contextual queries on reproductive health. Our findings reveal strengths and limitations of the system in capturing local context, and complexities around what constitutes ``culture''. Finally, we discuss how local context might be better integrated, and present a framework to inform the design of culturally-sensitive chatbots for community health.2025RDRoshini Deva et al.Emory University, Biomedical InformaticsHuman-LLM CollaborationReproductive & Women's HealthCHI
Transparent Barriers: Natural Language Access Control Policies for XR-Enhanced Everyday Objects
Extended Reality (XR)-enabled headsets, which overlay digital content onto the physical world, are gradually finding their way into our daily lives. This integration raises significant concerns about privacy and access control, especially in shared spaces where XR applications interact with everyday objects. Because XR applications are not yet widespread, such issues remain subtle, and studies in shared spaces are needed to make smooth progress. This study evaluated a prototype system that facilitates natural language policy creation for flexible, context-aware access control of personal objects. We assessed its usability, focusing on the balance between precision and user effort when creating access control policies. Qualitative interviews and task-based interactions provided insights into users' preferences and behaviors, informing future design directions. Findings revealed diverse user needs for controlling access to personal items in various situations, emphasizing the need for flexible, user-friendly access control in XR-enhanced shared spaces that respects boundaries and considers social contexts.
Kentaro Taninaka et al. (Keio University, Graduate School of Media and Governance), CHI 2025. Topics: AR Navigation & Context Awareness; Privacy by Design & User Control.
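One way to picture "natural language policy creation" is a policy sentence compiled (for instance, by an LLM) into a checkable rule. The schema below is invented for illustration; the prototype's actual representation may differ.

```python
# Sketch: "family can see my desk photo when we're at home" compiled into a
# structured, checkable rule (schema hypothetical).
rule = {"object": "desk_photo", "allow_roles": {"family"}, "context": "home"}

def can_view(viewer_roles: set[str], location: str, rule: dict) -> bool:
    return location == rule["context"] and bool(viewer_roles & rule["allow_roles"])

# can_view({"family"}, "home", rule) -> True
# can_view({"guest"}, "home", rule)  -> False
```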