“It’s more of a vibe I’m going for”: Designing Text-to-Music Generation Interfaces for Video Creators
Background music plays a crucial role in social media videos, yet finding the right music remains a challenge for video creators. These creators, often not music experts, struggle to describe their musical goals and compare options. AI text-to-music generation presents an opportunity to address these challenges by allowing users to generate music through text prompts; however, these models often require musical expertise and are difficult to control. In this paper, we explore how to incorporate music generation into video editing workflows. A formative study with video creators revealed challenges in articulating and iterating on musical preferences, as creators described music as "vibes" rather than with explicit musical vocabulary. Guided by these insights, we developed a creative assistant for music generation using editable vibe-based recommendations and structured refinement of music output. A user study showed that the assistant supports exploration, while direct prompting is more effective for precise goals. Our findings offer design recommendations for AI music tools for video creators.
2025 · Noor Hammad et al. · Generative AI (Text, Image, Music, Video) · Music Composition & Sound Design Tools · Video Production & Editing · DIS
VideoDiff: Human-AI Video Co-Creation with Alternatives
To make an engaging video, people sequence interesting moments and add visuals such as B-rolls or text. While video editing requires time and effort, AI has recently shown strong potential to make editing easier through suggestions and automation. A key strength of generative models is their ability to quickly generate multiple variations, but when provided with many alternatives, creators struggle to compare them to find the best fit. We propose VideoDiff, an AI video editing tool designed for editing with alternatives. With VideoDiff, creators can generate and review multiple AI recommendations for each editing process: creating a rough cut, inserting B-rolls, and adding text effects. VideoDiff simplifies comparisons by aligning videos and highlighting differences through timelines, transcripts, and video previews. Creators have the flexibility to regenerate and refine AI suggestions as they compare alternatives. Our study participants (N=12) could easily compare and customize alternatives, creating more satisfying results.
2025 · Mina Huh et al. · University of Texas, Austin, Department of Computer Science · Generative AI (Text, Image, Music, Video) · Video Production & Editing · Creative Collaboration & Feedback Systems · CHI
PodReels: Human-AI Co-Creation of Video Podcast Teasers
Video podcast teasers are short videos that can be shared on social media platforms to capture interest in full episodes of a video podcast. These teasers enable long-form podcasters to reach new audiences and gain more followers. However, creating a compelling teaser from an hour-long episode can be challenging. Selecting interesting clips requires significant mental effort; editing the chosen clips into a cohesive, well-produced teaser is time-consuming. To support the creation of video podcast teasers, we first investigated what makes a good teaser. We combined insights from audience comments and creator interviews to identify key ingredients. We also identified a common workflow used by creators during this process. Based on these findings, we developed a human-AI co-creative tool called PodReels to assist video podcasters in crafting teasers. Our user study demonstrated that PodReels significantly reduces creators' mental demand and improves their efficiency in producing video podcast teasers.
2024 · Sitong Wang et al. · Video Production & Editing · Creative Collaboration & Feedback Systems · DIS
StreamSketch: Exploring Multi-Modal Interactions in Creative Live Streams
Creative live streams, where artists or designers demonstrate their creative process, have emerged as a unique and popular genre of live streams due to the real-time interactivity they afford. However, streamer-viewer interactions on most live streaming platforms only enable users to utilize text and emojis to communicate, which limits what viewers can convey and share in real time. To investigate the design space of potential visual and non-textual modalities within creative live streams, we first analyzed existing Twitch extensions and conducted a formative study with streamers who share creative activities to uncover key challenges that these streamers face. We then designed and implemented a prototype system, StreamSketch, which enables viewers and streamers to interact during live streams using multiple modalities, including freeform sketches and text. The prototype was evaluated by two professional artist streamers and their viewers during six streaming sessions. Overall, streamers and viewers found that StreamSketch provided increased engagement and new affordances compared to the traditional text-only modality, and highlighted how efficiency, moderation, and tool integration were continued challenges.
2021 · Zhicong Lu et al. · User Experiences · CSCW
CrowdFolio: Understanding How Holistic and Decomposed Workflows Influence Feedback on Online Portfolios
Freelancers increasingly earn their livelihood through online marketplaces. To attract new clients, freelancers continuously curate their online portfolios to convey their unique skills and style. However, many lack access to rapid, regular, and inexpensive feedback needed to improve their portfolios. Existing crowd feedback systems, which collect feedback on individual creative projects (i.e., decomposed approach), could fill this need, but it is unclear how they might support feedback on multiple projects (i.e., holistic approach). In a between-subjects study with 30 freelancers, we compared decomposed and holistic feedback collection approaches using CrowdFolio, a crowd feedback system for portfolios. The holistic approach helped freelancers discover new ways to describe their work, while the decomposed approach provided detailed insight about the visual attractiveness of projects. This study contributes evidence that portfolio feedback systems, regardless of collection approach, can positively support professional development by impacting how freelancers portray themselves online and reflect on their identity.
2021 · Eureka Foong et al. · Crowds and Collaboration · CSCW
ReMap: Lowering the Barrier to Help-Seeking with Multimodal Search
People often seek help online while using complex software. Currently, information search takes users’ attention away from the task at hand by creating a separate search task. This paper investigates how multimodal interaction can make in-task help-seeking easier and faster. We introduce ReMap, a multimodal search interface that helps users find video assistance while using desktop and web applications. Users can speak search queries, add application-specific terms deictically (e.g., “how to erase this”), and navigate search results via speech, all without taking their hands (or mouse) off their current task. Thirteen participants who used ReMap in the lab found that it helped them stay focused on their task while simultaneously searching for and using learning videos. Users’ experiences with ReMap also raised a number of important challenges with implementing system-wide context-aware multimodal assistance.
2020 · C. Ailie Fraser et al. · Voice User Interface (VUI) Design · Human-LLM Collaboration · AI-Assisted Decision-Making & Automation · UIST
Temporal Segmentation of Creative Live Streams
Many artists broadcast their creative process through live streaming platforms like Twitch and YouTube, and people often watch archives of these broadcasts later for learning and inspiration. Unfortunately, because live stream videos are often multiple hours long and hard to skim and browse, few can leverage the wealth of knowledge hidden in these archives. We present an approach for automatic temporal segmentation of creative live stream videos. Using an audio transcript and a log of software usage, the system segments the video into sections that the artist can optionally label with meaningful titles. We evaluate this approach by gathering feedback from expert streamers and comparing automatic segmentations to those made by viewers. We find that, while there is no one "correct" way to segment a live stream, our automatic method performs similarly to viewers, and streamers find it useful for navigating their streams after making slight adjustments and adding section titles.
2020 · C. Ailie Fraser et al. · Adobe Research & University of California, San Diego · Live Streaming & Content Creators · Creative Collaboration & Feedback Systems · CHI
Discovering Natural Language Commands in Multimodal Interfaces
Discovering what to say and how to say it remains a challenge for users of multimodal interfaces supporting speech input. Users end up "guessing" commands that a system might support, often leading to interpretation errors and frustration. One solution to this problem is to display contextually relevant command examples as users interact with a system. The challenge, however, is deciding when, how, and which examples to recommend. In this work, we describe an approach for generating and ranking natural language command examples in multimodal interfaces. We demonstrate the approach using a prototype touch- and speech-based image editing tool. We experiment with augmentations of the UI to understand when and how to present command examples. Through an online user study, we evaluate these alternatives and find that in-situ command suggestions promote discovery and encourage the use of speech input.
2019 · Arjun Srinivasan et al. · Voice User Interface (VUI) Design · Intelligent Voice Assistants (Alexa, Siri, etc.) · Prototyping & User Testing · IUI
RePlay: Contextually Presenting Learning Videos Across Software Applications
Complex activities often require people to work across multiple software applications. However, people frequently lack valuable knowledge about at least one application, especially as software changes and new software emerges. Existing help systems either lack contextual knowledge or are tightly-knit into a single application. We introduce an application-independent approach for contextually presenting video learning resources and demonstrate it through the RePlay system. RePlay uses accessibility APIs to gather context about the user's activity. It leverages an existing search engine to present relevant videos and highlights key segments within them using video captions. We report on a week-long field study (n=7) and a lab study (n=24) showing that contextual assistance helps people spend less time away from their task than web video search and replaces current video navigation strategies. Our findings highlight challenges with representing and using context across applications.
2019 · C. Ailie Fraser et al. · University of California, San Diego & Adobe Research · Interactive Data Visualization · Online Learning & MOOC Platforms · Collaborative Learning & Peer Teaching · CHI
Vocal Shortcuts for Creative Experts
Vocal shortcuts, short spoken phrases to control interfaces, have the potential to reduce cognitive and physical costs of interactions. They may benefit expert users of creative applications (e.g., designers, illustrators) by helping them maintain creative focus. To aid the design of vocal shortcuts and gather use cases and design guidelines for speech interaction, we interviewed ten creative experts. Based on our findings, we built VoiceCuts, a prototype implementation of vocal shortcuts in the context of an existing creative application. In contrast to other speech interfaces, VoiceCuts targets experts' unique needs by handling short and partial commands and leverages document model and application context to disambiguate user utterances. We report on the viability and limitations of our approach based on feedback from creative experts.
2019 · Yea-Seul Kim et al. · University of Washington · Voice User Interface (VUI) Design · Music Composition & Sound Design Tools · CHI
TakeToons: Script-driven Performance Animation
Performance animation is an expressive method for animating characters through human performance. However, character motion is only one part of creating animated stories. The typical workflow also involves writing a script, coordinating actors, and editing recorded performances. In most cases, these steps are done in isolation with separate tools, which introduces friction and hinders iteration. We propose TakeToons, a script-driven approach that allows authors to annotate standard scripts with relevant animation events like character actions, camera positions, and scene backgrounds. We compile this script into a story model that persists throughout the production process and provides a consistent structure for organizing and assembling recorded performances and propagating script or timing edits to existing recordings. TakeToons enables writing, performing, and editing to happen in an integrated and interleaved manner that streamlines production and facilitates iteration. Informal feedback from professional animators suggests that our approach can benefit many existing workflows, supporting individual authors as well as production teams with many different contributors.
2018 · Hariharan Subramonyam et al. · 3D Modeling & Animation · Interactive Narrative & Immersive Storytelling · UIST
Charrette: Supporting In-Person Discussions around Iterations in User Interface Design
As a rule, user interface designers work iteratively. Over the course of a project, they repeatedly gather feedback, typically through in-person meetings, and update their designs accordingly. Through formative work, we find that design software tools do not support designers in managing meeting notes and previous design iterations as a cohesive whole. This causes designers to rely on ad-hoc practices for organizing work, which makes it hard for them to keep track of relevant feedback and explain their design decisions. To address this problem, we present Charrette, a system that allows designers to curate design iterations, attach meeting notes to the relevant content, and navigate sequences of design iterations with the associated notes to facilitate in-person discussions. In an exploratory user study, we evaluate how Charrette affects designers' self-reported ease in handling feedback during face-to-face discussions, compared with using their own tools. We find that using Charrette correlates with increased confidence and recall in discussing previous design decisions.
2018 · Jasper O'Leary et al. · University of Washington, Adobe Research · Creative Collaboration & Feedback Systems · Prototyping & User Testing · CHI
Rewire: Interface Design Assistance from Examples
Interface designers often use screenshot images of example designs as building blocks for new designs. Since images are unstructured and hard to edit, designers typically reconstruct screenshots with vector graphics tools in order to reuse or edit parts of the design. Unfortunately, this reconstruction process is tedious and slow. We present Rewire, an interactive system that helps designers leverage example screenshots. Rewire automatically infers a vector representation of screenshots where each UI component is a separate object with editable shape and style properties. Based on this representation, the system provides three design assistance modes that help designers reuse or redraw components of the example design. The results from our quantitative and user evaluations demonstrate that Rewire can generate accurate vector representations of interface screenshots found in the wild and that design assistance enables users to reconstruct and edit example designs more efficiently compared to a baseline design tool.
2018 · Amanda Swearngin et al. · University of Washington · Prototyping & User Testing · CHI
Interactive Guidance Techniques for Improving Creative Feedback
Good feedback is critical to creativity and learning, yet rare. Many people do not know how to actually provide effective feedback. There is increasing demand for quality feedback — and thus feedback givers — in learning and professional settings. This paper contributes empirical evidence that two interactive techniques — reusable suggestions and adaptive guidance — can improve feedback on creative work. We present these techniques embodied in the CritiqueKit system to help reviewers give specific, actionable, and justified feedback. Two real-world deployment studies and two controlled experiments with CritiqueKit found that adaptively-presented suggestions improve the quality of feedback from novice reviewers. Reviewers also reported that suggestions and guidance helped them describe their thoughts and reminded them to provide effective feedback.
2018 · Tricia J Ngoon et al. · UC San Diego · Creative Collaboration & Feedback Systems · Prototyping & User Testing · CHI
Data Illustrator: Augmenting Vector Design Tools with Lazy Data Binding for Expressive Visualization Authoring
Building graphical user interfaces for visualization authoring is challenging as one must reconcile the tension between flexible graphics manipulation and procedural visualization generation based on a graphical grammar or declarative languages. To better support designers' workflows and practices, we propose Data Illustrator, a novel visualization framework. In our approach, all visualizations are initially vector graphics; data binding is applied when necessary and only constrains interactive manipulation to that data bound property. The framework augments graphic design tools with new concepts and operators, and describes the structure and generation of a variety of visualizations. Based on the framework, we design and implement a visualization authoring system. The system extends interaction techniques in modern vector design tools for direct manipulation of visualization configurations and parameters. We demonstrate the expressive power of our approach through a variety of examples. A qualitative study shows that designers can use our framework to compose visualizations.
2018 · Zhicheng Liu et al. · Adobe Research · Interactive Data Visualization · Graphic Design & Typography Tools · CHI