Improving User Behavior Prediction: Leveraging Annotator Metadata in Supervised Machine Learning Models
Supervised machine-learning models often underperform in predicting user behaviors from conversational text, hindered by poor crowdsourced label quality and low NLP task accuracy. We introduce the Metadata-Sensitive Weighted-Encoding Ensemble Model (MSWEEM), which integrates annotator meta-features like fatigue and speeding. First, our results show MSWEEM outperforms standard ensembles by 14% on held-out data and 12% on an alternative dataset. Second, we find that incorporating signals of annotator behavior, such as speed and fatigue, significantly boosts model performance. Third, we find that annotators with higher qualifications, such as Master's, deliver more consistent and faster annotations. Given the increasing uncertainty over annotation quality, our experiments show that understanding annotator patterns is crucial for enhancing model accuracy in user behavior prediction.
2025 · Lynnette Hui Xian Ng et al. · Communicating properly, interpreting signs · CSCW

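The core idea of the abstract above — downweighting labels from speeding or fatigued annotators before aggregation — can be sketched as follows. This is an illustrative heuristic, not the authors' MSWEEM formulation; the thresholds and feature names are hypothetical:

```python
def annotation_weight(seconds_spent, items_this_session,
                      min_seconds=5.0, fatigue_after=200):
    """Heuristic reliability weight for one crowdsourced label.

    Labels produced too fast ("speeding") or late in a long
    session ("fatigue") receive a weight below 1.0.
    """
    weight = 1.0
    if seconds_spent < min_seconds:          # speeding penalty
        weight *= seconds_spent / min_seconds
    if items_this_session > fatigue_after:   # fatigue penalty
        weight *= fatigue_after / items_this_session
    return weight


def weighted_majority(labels):
    """Aggregate (label, weight) pairs by weighted vote."""
    totals = {}
    for label, w in labels:
        totals[label] = totals.get(label, 0.0) + w
    return max(totals, key=totals.get)


votes = [("spam", annotation_weight(2.0, 50)),    # speeding: weight 0.4
         ("ham",  annotation_weight(12.0, 50)),   # normal:   weight 1.0
         ("spam", annotation_weight(1.0, 400))]   # speeding + fatigued
print(weighted_majority(votes))  # -> "ham"
```

Here two low-quality "spam" votes are outweighed by a single careful "ham" vote, which is the effect a metadata-sensitive weighting scheme is after.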
PosterMate: Audience-driven Collaborative Persona Agents for Poster Design
Poster designing can benefit from synchronous feedback from target audiences. However, gathering audiences with diverse perspectives and reconciling them on design edits can be challenging. Recent generative AI models present opportunities to simulate human-like interactions, but it is unclear how they may be used for feedback processes in design. We introduce PosterMate, a poster design assistant that facilitates collaboration by creating audience-driven persona agents constructed from marketing documents. PosterMate gathers feedback from each persona agent regarding poster components, and stimulates discussion with the help of a moderator to reach a conclusion. These agreed-upon edits can then be directly integrated into the poster design. Through our user study (N=12), we identified the potential of PosterMate to capture overlooked viewpoints, while serving as an effective prototyping tool. Additionally, our controlled online evaluation (N=100) revealed that the feedback from an individual persona agent is appropriate given its persona identity, and the discussion effectively synthesizes the different persona agents' perspectives.
2025 · Donghoon Shin et al. · AI-Assisted Creative Writing · Creative Collaboration & Feedback Systems · UIST

MapStory: Prototyping Editable Map Animations with LLM Agents
We introduce MapStory, an LLM-powered animation prototyping tool that generates editable map animation sequences directly from natural language text by leveraging a dual-agent LLM architecture. Given a user-written script, MapStory automatically produces a scene breakdown, which decomposes the text into key map animation primitives such as camera movements, visual highlights, and animated elements. Our system includes a researcher agent that accurately queries geospatial information by leveraging an LLM with web search, enabling automatic extraction of relevant regions, paths, and coordinates while allowing users to edit and query for changes or additional information to refine the results. Additionally, users can fine-tune parameters of these primitive blocks through an interactive timeline editor. We detail the system's design and architecture, informed by formative interviews with professional animators and by an analysis of 200 existing map animation videos. Our evaluation, which includes expert interviews (N=5) and a usability study (N=12), demonstrates that MapStory enables users to create map animations with ease, facilitates faster iteration, encourages creative exploration, and lowers barriers to creating map-centric stories.
2025 · Aditya Gunturu et al. · Geospatial & Map Visualization · Computational Methods in HCI · UIST

Morae: Proactively Pausing UI Agents for User Choices
User interface (UI) agents promise to make inaccessible or complex UIs easier to access for blind and low-vision (BLV) users. However, current UI agents typically perform tasks end-to-end without involving users in critical choices or making them aware of important contextual information, thus reducing user agency. For example, in our field study, a BLV participant asked to buy the cheapest available sparkling water, and the agent automatically chose one from several equally priced options, without mentioning alternative products with different flavors or better ratings. To address this problem, we introduce Morae, a UI agent that automatically identifies decision points during task execution and pauses so that users can make choices. Morae uses large multimodal models to interpret user queries alongside UI code and screenshots, and prompt users for clarification when there is a choice to be made. In a study over real-world web tasks with BLV participants, Morae helped users complete more tasks and select options that better matched their preferences, as compared to baseline agents, including OpenAI Operator. More broadly, this work exemplifies a mixed-initiative approach in which users benefit from the automation of UI agents while being able to express their preferences.
2025 · Yi-Hao Peng et al. · Intelligent Voice Assistants (Alexa, Siri, etc.) · Voice Accessibility · UIST

Refashion: Reconfigurable Garments via Modular Design
While bodies change over time and trends vary, most store-bought clothing comes in fixed sizes and styles and fails to adapt to these changes. Alterations can enable small changes to otherwise static garments, but these changes often require sewing and are non-reversible. We propose a modular approach to garment design that considers resizing, restyling, and reusability earlier in the clothing design process. Our contributions include a compact set of modules and connectors that form the building blocks of modular garments, a method to decompose a garment into modules via integer linear programming, and a digital design tool that supports modular garment design and simulation. Our user evaluation suggests that our approach to modular clothing design can support the creation of a wide range of garments and can help users transform clothing into different sizes and styles while reusing the same building blocks.
2025 · Rebecca Lin et al. · Customizable & Personalized Objects · Design Fiction · UIST

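The decomposition step described above can be framed as an exact-cover problem: choose a smallest set of modules such that each garment panel is covered by exactly one module. As a minimal sketch, here is a brute-force stand-in for the paper's integer linear program; the panel and module names are hypothetical, and a real solver (e.g. an ILP backend) would replace the exhaustive search:

```python
from itertools import combinations

# Hypothetical garment panels, and the panels each module can supply.
PANELS = {"front", "back", "left_sleeve", "right_sleeve"}
MODULES = {
    "torso":      {"front", "back"},
    "front_only": {"front"},
    "back_only":  {"back"},
    "sleeve_l":   {"left_sleeve"},
    "sleeve_r":   {"right_sleeve"},
    "sleeves":    {"left_sleeve", "right_sleeve"},
}


def decompose(panels, modules):
    """Smallest set of modules that exactly partitions the panels.

    Brute-force stand-in for the ILP: each panel must be covered by
    exactly one chosen module (no gaps, no overlaps).
    """
    names = list(modules)
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            covered = [p for m in combo for p in modules[m]]
            if len(covered) == len(panels) and set(covered) == panels:
                return set(combo)
    return None


print(sorted(decompose(PANELS, MODULES)))  # -> ['sleeves', 'torso']
```

The exact-cover constraint (each panel covered once) is what makes the chosen modules reusable as independent building blocks.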
OnGoal: Tracking and Visualizing Conversational Goals in Multi-Turn Dialogue with Large Language Models
As multi-turn dialogues with large language models (LLMs) grow longer and more complex, how can users better evaluate and review progress on their conversational goals? We present OnGoal, an LLM chat interface that helps users better manage goal progress. OnGoal provides real-time feedback on goal alignment through LLM-assisted evaluation, explanations for evaluation results with examples, and overviews of goal progression over time, enabling users to navigate complex dialogues more effectively. Through a study with 20 participants on a writing task, we evaluate OnGoal against a baseline chat interface without goal tracking. Using OnGoal, participants spent less time and effort achieving their goals while exploring new prompting strategies to overcome miscommunication, suggesting that tracking and visualizing goals can enhance engagement and resilience in LLM dialogues. Our findings motivate design implications for future LLM chat interfaces that improve goal communication, reduce cognitive load, enhance interactivity, and enable feedback to improve LLM performance.
2025 · Adam J Coscia et al. · Human-LLM Collaboration · Data Storytelling · UIST

Streaming, Fast and Slow: Cognitive Load-Aware Streaming for Efficient LLM Serving
Generative conversational interfaces powered by large language models (LLMs) typically stream output token-by-token at a rate determined by computational budget, often neglecting actual human reading speeds and the cognitive load associated with the content. This mismatch frequently leads to inefficient use of computational resources. For example, in cloud-based services, streaming content faster than users can read is unnecessary, resulting in wasted computational resources and potential delays for other users, particularly during peak usage periods. To address this issue, we propose an adaptive streaming method that dynamically adjusts the pacing of LLM streaming output in real-time based on inferred cognitive load. Our approach estimates the cognitive load associated with streaming content and strategically slows down the stream during complex or information-rich segments, thereby freeing computational resources for other users. We conducted an analysis and simulation based on a statistical model fit to data collected in a crowdsourced user study across various types of LLM-generated content. Our results show that this adaptive method can effectively reduce computational consumption while largely maintaining streaming speed above the user's normal reading speed.
2025 · Chang Xiao et al. · Generative AI (Text, Image, Music, Video) · Human-LLM Collaboration · UIST

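The pacing policy described above can be sketched in a few lines: estimate a per-token cognitive load, scale the inter-token delay by it, and cap the delay so the stream never falls below a floor reading speed. The load proxy here (token length plus a bump for digits) is purely illustrative — the paper fits a statistical model to crowdsourced reading data:

```python
import re


def cognitive_load(token):
    """Crude per-token load estimate: longer tokens and tokens
    containing digits are assumed to take more time to read.
    (Illustrative proxy, not the paper's fitted model.)"""
    load = 1.0 + 0.1 * len(token)
    if re.search(r"\d", token):
        load += 0.5
    return load


def stream_delays(tokens, base_delay=0.02, wpm_floor=250):
    """Per-token delays in seconds: slow down on high-load tokens,
    but never stream slower than a `wpm_floor` words-per-minute reader."""
    ceiling = 60.0 / wpm_floor  # max seconds per token at the floor pace
    return [min(base_delay * cognitive_load(t), ceiling) for t in tokens]


delays = stream_delays("the eigenvalues of matrix A42 are".split())
```

Dense tokens like "A42" get longer delays than "the", while the `wpm_floor` cap preserves the paper's constraint that streaming stays above normal reading speed.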
Semantic Commit: Helping Users Update Intent Specifications for AI Memory at Scale
As AI agents increasingly rely on memory systems to align with user intent, updating these memories presents challenges of semantic conflict and ambiguity. Inspired by impact analysis in software engineering, we introduce SemanticCommit, a mixed-initiative interface to help users integrate new intent into intent specifications—natural language documents like AI memory lists, Cursor Rules, and game design documents—while maintaining consistency. SemanticCommit detects potential semantic conflicts using a knowledge graph-based retrieval-augmented generation pipeline, and assists users in resolving them with LLM support. Through a within-subjects study with 12 participants comparing SemanticCommit to a chat-with-document baseline (OpenAI Canvas), we find differences in workflow: half of our participants adopted a workflow of impact analysis when using SemanticCommit, where they would first flag conflicts without AI revisions then resolve conflicts locally, despite having access to a global revision feature. Additionally, users felt SemanticCommit offered a greater sense of control without increasing workload. Our findings indicate that AI agent interfaces should help users validate AI retrieval independently from generation, suggesting that the benefits from improved control can offset the costs of manual review. Our work speaks to the need for AI system designers to think about updating memory as a process that involves human feedback and decision-making.
2025 · Priyan Vaithilingam et al. · Human-LLM Collaboration · AI-Assisted Decision-Making & Automation · Algorithmic Transparency & Auditability · UIST

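The retrieve-then-adjudicate pattern described above — surface the existing memory entries a new intent might conflict with, then let the user (or an LLM) resolve them — can be illustrated with a toy retriever. Word overlap stands in for SemanticCommit's knowledge-graph RAG pipeline; the memory entries are hypothetical:

```python
def related_entries(new_rule, memory, min_overlap=2):
    """Retrieve memory entries sharing vocabulary with a new intent.

    Toy stand-in for a semantic-conflict retriever: entries with
    enough word overlap are surfaced as *potential* conflicts for
    a human or an LLM to adjudicate, rather than auto-revised.
    """
    new_words = set(new_rule.lower().split())
    hits = []
    for i, entry in enumerate(memory):
        overlap = new_words & set(entry.lower().split())
        if len(overlap) >= min_overlap:
            hits.append((i, entry))
    return hits


memory = [
    "always respond in formal English",
    "use metric units in all examples",
    "keep code examples under 20 lines",
]
print(related_entries("respond in casual English", memory))
```

Only the first entry is flagged (it shares "respond", "in", "English" with the new rule), mirroring the study finding that users prefer to validate retrieval separately before any generation happens.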
“It’s more of a vibe I’m going for”: Designing Text-to-Music Generation Interfaces for Video Creators
Background music plays a crucial role in social media videos, yet finding the right music remains a challenge for video creators. These creators, often not music experts, struggle to describe their musical goals and compare options. AI text-to-music generation presents an opportunity to address these challenges by allowing users to generate music through text prompts; however, these models often require musical expertise and are difficult to control. In this paper, we explore how to incorporate music generation into video editing workflows. A formative study with video creators revealed challenges in articulating and iterating on musical preferences, as creators described music as "vibes" rather than with explicit musical vocabulary. Guided by these insights, we developed a creative assistant for music generation using editable vibe-based recommendations and structured refinement of music output. A user study showed that the assistant supports exploration, while direct prompting is more effective for precise goals. Our findings offer design recommendations for AI music tools for video creators.
2025 · Noor Hammad et al. · Generative AI (Text, Image, Music, Video) · Music Composition & Sound Design Tools · Video Production & Editing · DIS

Narrative Motion Blocks: Combining Direct Manipulation and Natural Language Interactions for Animation Creation
Authoring compelling animations often requires artists to come up with creative high-level ideas and translate them into precise low-level spatial and temporal properties like position, orientation, scale, and frame timing. Traditional animation tools offer direct manipulation strategies to control these properties but lack support for implementing higher-level ideas. Alternatively, AI-based tools allow animation production using natural language prompts but lack the fine-grained control over properties required for professional workflows. To bridge this gap, we propose AniMate, a hand-drawn animation system that integrates direct manipulation and natural language interaction. Central to AniMate are narrative motion blocks, clip-like components located on a timeline that let animators specify animated behaviors with a combination of textual and manual input. Through an expert evaluation and the creation of short demonstrative animations, we show how focusing on intermediate-level actions provides a common representation for animators to work across both interaction modalities.
2025 · Sam Bourgault et al. · AI-Assisted Creative Writing · Music Composition & Sound Design Tools · 3D Modeling & Animation · DIS

SimTube: Simulating Audience Feedback on Videos using Generative AI and User Personas
Audience feedback is crucial for refining video content, yet it typically comes after publication, limiting creators' ability to make timely adjustments. To bridge this gap, we introduce SimTube, a generative AI system designed to simulate audience feedback in the form of video comments before a video's release. SimTube features a computational pipeline that integrates multimodal data from the video—such as visuals, audio, and metadata—with user personas derived from a broad and diverse corpus of audience demographics, generating varied and contextually relevant feedback. Furthermore, the system's UI allows creators to explore and customize the simulated comments. Through a comprehensive evaluation—comprising quantitative analysis, crowd-sourced assessments, and qualitative user studies—we show that SimTube's generated comments are not only relevant, believable, and diverse but often more detailed and informative than actual audience comments, highlighting its potential to help creators refine their content before release.
2025 · Yu-Kai Hung et al. · Generative AI (Text, Image, Music, Video) · Live Streaming & Content Creators · AI-Assisted Creative Writing · IUI

Video2MR: Automatically Generating Mixed Reality 3D Instructions by Augmenting Extracted Motion from 2D Videos
This paper introduces Video2MR, a mixed reality system that automatically generates 3D sports and exercise instructions from 2D videos. Mixed reality instructions have great potential for physical training, but existing approaches require substantial time and cost to create these 3D experiences. Video2MR overcomes this limitation by transforming arbitrary instructional videos available online into MR 3D avatars with AI-enabled motion capture (DeepMotion). Then, it automatically enhances the avatar motion through the following augmentation techniques: 1) contrasting and highlighting differences between the user and avatar postures, 2) visualizing key trajectories and movements of specific body parts, 3) manipulating time and speed using body motion, and 4) spatially repositioning avatars for different perspectives. Developed on HoloLens 2 and Azure Kinect, we showcase various use cases, including yoga, dancing, soccer, tennis, and other physical exercises. The study results confirm that Video2MR provides more engaging and playful learning experiences, compared to existing 2D video instructions.
2025 · Keiichi Ihara et al. · Full-Body Interaction & Embodied Input · Mixed Reality Workspaces · Biosensors & Physiological Monitoring · IUI

VideoMix: Aggregating How-To Videos for Task-Oriented Learning
Tutorial videos are a valuable resource for people looking to learn new tasks. People often watch multiple tutorial videos, comparing different approaches to build an overall understanding of a task. However, navigating through multiple videos can be time-consuming and mentally demanding, as these videos are scattered and not easy to skim. We propose VideoMix, a system that helps users gain a holistic understanding of a how-to task by aggregating information from multiple videos on the task. Insights from our formative study (N=12) reveal that learners value understanding potential outcomes, required materials, alternative methods, and important details shared by different videos. Powered by a Vision-Language Model pipeline, VideoMix extracts and organizes this information, presenting concise textual summaries alongside relevant video clips, enabling users to quickly digest and navigate the content. A comparative user study (N=12) demonstrated that VideoMix enabled participants to gain a more comprehensive understanding of tasks with greater efficiency than a baseline video interface, where videos are viewed independently. Our findings highlight the potential of a task-oriented, multi-video approach where videos are organized around a shared goal, offering an enhanced alternative to conventional video-based learning.
2025 · Saelyne Yang et al. · Online Learning & MOOC Platforms · Intelligent Tutoring Systems & Learning Analytics · IUI

Text-to-SQL Domain Adaptation via Human-LLM Collaborative Data Annotation
Text-to-SQL models, which parse natural language (NL) questions to executable SQL queries, are increasingly adopted in real-world applications. However, deploying such models in the real world often requires adapting them to the highly specialized database schemas used in specific applications. We observe that the performance of existing text-to-SQL models drops dramatically when applied to a new schema, primarily due to the lack of domain-specific data for fine-tuning. Furthermore, this lack of data for the new schema also hinders our ability to effectively evaluate the model's performance in the new domain. Nevertheless, it is expensive to continuously obtain text-to-SQL data for an evolving schema in most real-world applications. To bridge this gap, we propose SQLsynth, a human-in-the-loop text-to-SQL data annotation system. SQLsynth streamlines the creation of high-quality text-to-SQL datasets through collaboration between humans and a large language model in a structured workflow. A within-subject user study comparing SQLsynth to manual annotation and ChatGPT reveals that SQLsynth significantly accelerates text-to-SQL data annotation, reduces cognitive load, and produces datasets that are more accurate, natural, and diverse. Our code is available at https://github.com/adobe/nl_sql_analyzer.
2025 · Yuan Tian et al. · Human-LLM Collaboration · AutoML Interfaces · IUI

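A common first step in synthesizing text-to-SQL data for a new schema is sampling candidate SQL queries from the schema itself, which humans then pair with natural-language questions. The sketch below shows that sampling step in miniature; the schema is hypothetical and this is not SQLsynth's actual pipeline:

```python
import random

# Hypothetical application schema: table -> columns.
SCHEMA = {"orders": ["id", "customer", "total"],
          "customers": ["id", "name", "city"]}


def sample_query(schema, rng):
    """Sample a simple single-table SELECT from a schema.

    Toy version of schema-aware SQL sampling: the schema constrains
    generation, so every sampled query is valid for the new domain.
    """
    table = rng.choice(sorted(schema))
    cols = rng.sample(schema[table], k=rng.randint(1, len(schema[table])))
    return f"SELECT {', '.join(cols)} FROM {table};"


rng = random.Random(0)
print(sample_query(SCHEMA, rng))
```

In a human-in-the-loop workflow, annotators would write (or verify LLM-drafted) NL questions for each sampled query, yielding aligned (question, SQL) pairs for fine-tuning and evaluation.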
CoPrompter: User-Centric Evaluation of LM Instruction Alignment for Improved Prompt Engineering
Ensuring large language models' (LLMs) responses align with prompt instructions is crucial for application development. Our formative study with industry professionals shows that checking alignment requires heavy human involvement and tedious trial and error, especially when a prompt contains many instructions. To address these challenges, we introduce CoPrompter, a framework that identifies misalignment by assessing multiple LLM responses against criteria. It proposes a method to generate evaluation criteria questions derived directly from prompt requirements and an interface to turn these questions into a user-editable checklist. Our user study with industry prompt engineers shows that CoPrompter improves the ability to identify and refine instruction alignment with prompt requirements over traditional methods, helps them understand where and how frequently models fail to follow their prompt requirements, and helps in clarifying their own requirements, giving them greater control over the response evaluation process. We also present design lessons that underscore our system's potential to streamline the prompt engineering process.
2025 · Ishika Joshi et al. · Human-LLM Collaboration · Explainable AI (XAI) · AI-Assisted Decision-Making & Automation · IUI

Guidance Source Matters: How Guidance from AI, Expert, or a Group of Analysts Impacts Visual Data Preparation and Analysis
The progress in generative Artificial Intelligence (AI) has fueled AI-powered tools like co-pilots and assistants to provide better guidance, particularly during data analysis. However, research on guidance has not yet examined the perceived efficacy of the source from which guidance is offered and the impact of this source on the user's perception and usage of guidance. We ask whether users perceive all guidance sources as equal, with particular interest in three sources: (i) "AI," (ii) "human expert," and (iii) "a group of human analysts." As a benchmark, we consider a fourth source, (iv) "unattributed guidance," where guidance is provided without attribution to any source, enabling isolation of and comparison with the effects of source-specific guidance. We design a five-condition between-subjects study, with one condition for each of the four guidance sources and an additional (v) "no-guidance" condition, which serves as a baseline to evaluate the influence of any kind of guidance. We situate our study in a custom data preparation and analysis tool wherein we task users to select relevant attributes from an unfamiliar dataset to inform a business report. Depending on the assigned condition, users can request guidance, which the system then provides in the form of attribute suggestions. To ensure internal validity, we control for the quality of guidance across source-conditions. Through several metrics of usage and perception, we statistically test five preregistered hypotheses and report on additional analysis. We find that the source of guidance matters to users, but not in a manner that matches received wisdom. For instance, users utilize guidance differently at various stages of analysis, including expressing varying levels of regret, despite receiving guidance of similar quality. Notably, users in the AI condition reported both higher post-task benefit and regret. These findings strongly indicate the need to further understand how different guidance sources impact user behavior for designing effective guidance systems.
2025 · Arpit Narechania et al. · Generative AI (Text, Image, Music, Video) · Explainable AI (XAI) · AI-Assisted Decision-Making & Automation · IUI

Toward Living Narrative Reviews: An Empirical Study of the Processes and Challenges in Updating Survey Articles in Computing Research
Surveying prior literature to establish a foundation for new knowledge is essential for scholarly progress. However, survey articles are resource-intensive and challenging to create, and can quickly become outdated as new research is published, risking information staleness and inaccuracy. Keeping survey articles current with the latest evidence is therefore desirable, though there is a limited understanding of why, when, and how these surveys should be updated. Toward this end, through a series of in-depth retrospective interviews with 11 researchers, we present an empirical examination of the work practices in authoring and updating survey articles in computing research. We find that while computing researchers acknowledge the value in maintaining an updated survey, continuous updating remains unmanageable and misaligned with academic incentives. Our findings suggest key leverage points within current workflows that present opportunities for enabling technologies to facilitate more efficient and effective updates.
2025 · Raymond Fok et al. · University of Washington, Paul G. Allen School of Computer Science & Engineering · User Research Methods (Interviews, Surveys, Observation) · Computational Methods in HCI · CHI

SpeakEasy: Enhancing Text-to-Speech Interactions for Expressive Content Creation
Novice content creators often invest significant time recording expressive speech for social media videos. While recent advancements in text-to-speech (TTS) technology can generate highly realistic speech in various languages and accents, many creators struggle with unintuitive or overly granular TTS interfaces. We propose simplifying TTS generation by allowing users to specify high-level context alongside their script. Our Wizard-of-Oz system, SpeakEasy, leverages user-provided context to inform and influence TTS output, enabling iterative refinement with high-level feedback. This approach was informed by two 8-subject formative studies: one examining content creators' experiences with TTS, and the other drawing on effective strategies from voice actors. Our evaluation shows that participants using SpeakEasy were more successful in generating performances matching their personal standards, without requiring significantly more effort than leading industry interfaces.
2025 · Stephen Brade et al. · Massachusetts Institute of Technology, Electrical Engineering & Computer Science Department · Voice User Interface (VUI) Design · Intelligent Voice Assistants (Alexa, Siri, etc.) · Generative AI (Text, Image, Music, Video) · CHI

LogoMotion: Visually-Grounded Code Synthesis for Creating and Editing Animation
Creating animation takes time, effort, and technical expertise. To help novices with animation, we present LogoMotion, an AI code generation approach that helps users create semantically meaningful animation for logos. LogoMotion automatically generates animation code with a method called visually-grounded code synthesis and program repair. This method performs visual analysis, instantiates a design concept, and conducts visual checking to generate animation code. LogoMotion provides novices with code-connected AI editing widgets that help them edit the motion, grouping, and timing of their animation. In a comparison study on 276 animations, LogoMotion was found to produce more content-aware animation than an industry-leading tool. In a user evaluation (N=16) comparing against a prompt-only baseline, these code-connected widgets helped users edit animations with control, iteration, and creative expression.
2025 · Vivian Liu et al. · Columbia University · 3D Modeling & Animation · Creative Coding & Computational Art · CHI

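The generate-check-repair loop at the heart of visually-grounded code synthesis can be sketched generically: code is accepted only once a visual check of its rendered output passes; otherwise the issues are fed back to a repair step. The four callables below are hypothetical stand-ins for LLM generation, rendering, and visual checking, not LogoMotion's actual implementation:

```python
def synthesize_with_repair(generate, render, check, repair, max_rounds=3):
    """Generate-check-repair loop for visually-grounded synthesis.

    `generate` proposes code, `render` produces its visual output,
    `check` returns (ok, issues), and `repair` revises the code
    from the reported issues. All four are caller-supplied.
    """
    code = generate()
    for _ in range(max_rounds):
        frame = render(code)
        ok, issues = check(frame)
        if ok:
            return code
        code = repair(code, issues)
    return code  # best effort after max_rounds


# Toy demo: "code" is a number, the visual check wants it >= 3,
# and each repair round nudges it upward.
result = synthesize_with_repair(
    generate=lambda: 1,
    render=lambda code: code,
    check=lambda frame: (frame >= 3, "too small"),
    repair=lambda code, issues: code + 1,
)
print(result)  # -> 3
```

Grounding acceptance in the rendered output (rather than in the code text alone) is what makes the synthesized animation "content-aware" in the sense the abstract describes.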
VideoDiff: Human-AI Video Co-Creation with Alternatives
To make an engaging video, people sequence interesting moments and add visuals such as B-rolls or text. While video editing requires time and effort, AI has recently shown strong potential to make editing easier through suggestions and automation. A key strength of generative models is their ability to quickly generate multiple variations, but when provided with many alternatives, creators struggle to compare them to find the best fit. We propose VideoDiff, an AI video editing tool designed for editing with alternatives. With VideoDiff, creators can generate and review multiple AI recommendations for each editing process: creating a rough cut, inserting B-rolls, and adding text effects. VideoDiff simplifies comparisons by aligning videos and highlighting differences through timelines, transcripts, and video previews. Creators have the flexibility to regenerate and refine AI suggestions as they compare alternatives. Our study participants (N=12) could easily compare and customize alternatives, creating more satisfying results.
2025 · Mina Huh et al. · University of Texas, Austin, Department of Computer Science · Generative AI (Text, Image, Music, Video) · Video Production & Editing · Creative Collaboration & Feedback Systems · CHI