CineVision: An Interactive Pre-visualization Storyboard System for Director–Cinematographer CollaborationEffective communication between directors and cinematographers is fundamental in film production, yet traditional approaches relying on visual references and hand-drawn storyboards often lack the efficiency and precision necessary during pre-production. We present CineVision, an AI-driven platform that integrates scriptwriting with real-time visual pre-visualization to bridge this communication gap. By offering dynamic lighting control, style emulation based on renowned filmmakers, and customizable character design, CineVision enables directors to convey their creative vision with heightened clarity and rapidly iterate on scene composition. In a 24-participant lab study, CineVision yielded shorter task times and higher usability ratings than two baseline methods, suggesting a potential to ease early-stage communication and accelerate storyboard drafts under controlled conditions. These findings underscore CineVision’s potential to streamline pre-production processes and foster deeper creative synergy among filmmaking teams, particularly for new collaborators. Our code and demo are available at https://github.com/TonyHongtaoWu/CineVision.2025ZWZheng Wei et al.AI-Assisted Creative WritingVideo Production & Editing3D Modeling & AnimationUIST
Block and Detail: Scaffolding Sketch-to-Image GenerationWe introduce a novel sketch-to-image tool that aligns with the iterative refinement process of artists. Our tool lets users sketch blocking strokes to coarsely represent the placement and form of objects and detail strokes to refine their shape and silhouettes. We develop a two-pass algorithm for generating high-fidelity images from such sketches at any point in the iterative process. In the first pass we use a ControlNet to generate an image that strictly follows all the strokes (blocking and detail) and in the second pass we add variation by renoising regions surrounding blocking strokes. We also present a dataset generation scheme that, when used to train a ControlNet architecture, allows regions that do not contain strokes to be interpreted as not-yet-specified regions rather than empty space. We show that this partial-sketch-aware ControlNet can generate coherent elements from partial sketches that only contain a small number of strokes. The high-fidelity images produced by our approach serve as scaffolds that can help the user adjust the shape and proportions of objects or add additional elements to the composition. We demonstrate the effectiveness of our approach with a variety of examples and evaluative comparisons. Quantitatively, novice viewers prefer the quality of images from our algorithm over a baseline Scribble ControlNet for 82% of the pairs and found our images had less distortion in 80% of the pairs.2024VSVishnu Sarukkai et al.Generative AI (Text, Image, Music, Video)AI-Assisted Creative Writing3D Modeling & AnimationUIST
ScriptViz: A Visualization Tool to Aid Scriptwriting based on a Large Movie DatabaseScriptwriters usually rely on their mental visualization to create a vivid story by using their imagination to see, feel, and experience the scenes they are writing. Besides mental visualization, they often refer to existing images or scenes in movies and analyze the visual elements to create a certain mood or atmosphere. In this paper, we develop a new tool, ScriptViz, to provide external visualization based on a large movie database for the screenwriting process. It retrieves reference visuals on the fly based on scripts’ text and dialogue from a large movie database. The tool provides two types of control on visual elements that enable writers to 1) see exactly what they want with fixed visual elements and 2) see variances in uncertain elements. User evaluation among 15 scriptwriters shows that ScriptViz is able to present scriptwriters with consistent yet diverse visual possibilities, aligning closely with their scripts and helping their creation.2024ARAnyi Rao et al.Generative AI (Text, Image, Music, Video)Data StorytellingUIST
Design Space of Visual Feedforward And Corrective Feedback in XR-Based Motion Guidance SystemsExtended reality (XR) technologies are highly suited in assisting individuals in learning motor skills and movements---referred to as motion guidance. In motion guidance, the ``feedforward’’ provides instructional cues of the motions that are to be performed, whereas the ``feedback’’ provides cues which help correct mistakes and minimize errors. Designing synergistic feedforward and feedback is vital to providing an effective learning experience, but this interplay between the two has not yet been adequately explored. Based on a survey of the literature, we propose design space for both motion feedforward and corrective feedback in XR, and describe the interaction effects between them. We identify common design approaches of XR-based motion guidance found in our literature corpus, and discuss them through the lens of our design dimensions. We then discuss additional contextual factors and considerations that influence this design, together with future research opportunities for motion guidance in XR.2024XYHariharan Subramonyam et al.University of Stuttgart, University of StuttgartFull-Body Interaction & Embodied InputVR Medical Training & RehabilitationInteractive Narrative & Immersive StorytellingCHI
Bridging the Gulf of Envisioning: Cognitive Challenges in Prompt Based Interactions with LLMsLarge language models (LLMs) exhibit dynamic capabilities and appear to comprehend complex and ambiguous natural language prompts. However, calibrating LLM interactions is challenging for interface designers and end-users alike. A central issue is our limited grasp of how human cognitive processes begin with a goal and form intentions for executing actions, a blindspot even in established interaction models such as Norman's gulfs of execution and evaluation. To address this gap, we theorize how end-users `envision' translating their goals into clear intentions and craft prompts to obtain the desired LLM response. We define a process of \textit{Envisioning} by highlighting three misalignments on not knowing: (1) what the task should be, (2) how to instruct the LLM to do the task, and (3) what to expect for the LLM’s output in meeting the goal. Finally, we make recommendations to narrow the envisioning gulf in human-LLM interactions.2024HSHariharan Subramonyam et al.Stanford UniversityHuman-LLM CollaborationUser Research Methods (Interviews, Surveys, Observation)CHI
TaleStream: Supporting Story Ideation with Trope KnowledgeStory ideation is a critical part of the story-writing process. It is challenging to support computationally due to its exploratory and subjective nature. Tropes, which are recurring narrative elements across stories, are essential in stories as they shape the structure of narratives and our understanding of them. In this paper, we propose to use tropes as an intermediate representation of stories to approach story ideation. We present TaleStream, a canvas system that uses tropes as building blocks of stories while providing steerable suggestions of story ideas in the form of tropes. Our trope suggestion methods leverage data from the tvtropes.org wiki. We find that 97\% of the time, trope suggestions generated by our methods provide better story ideation materials than random tropes. Our system evaluation suggests that TaleStream can support writers’ creative flow and greatly facilitates story development. Tropes, as a rich lexicon of narratives with available examples, play a key role in TaleStream and hold promise for story-creation support systems.2023JCJean-Peïc Chou et al.AI-Assisted Creative WritingInteractive Narrative & Immersive StorytellingUIST
Automated Conversion of Music Videos into Lyric VideosMusicians and fans often produce lyric videos, a form of music videos that showcase the song's lyrics, for their favorite songs. However, making such videos can be challenging and time-consuming as the lyrics need to be added in synchrony and visual harmony with the video. Informed by prior work and close examination of existing lyric videos, we propose a set of design guidelines to help creators make such videos. Our guidelines ensure the readability of the lyric text while maintaining a unified focus of attention. We instantiate these guidelines in a fully automated pipeline that converts an input music video into a lyric video. We demonstrate the robustness of our pipeline by generating lyric videos from a diverse range of input sources. A user study shows that lyric videos generated by our pipeline are effective in maintaining text readability and unifying the focus of attention.2023JMJiaju Ma et al.Music Composition & Sound Design ToolsVideo Production & EditingUIST
iWood: Makeable Vibration Sensor for Interactive Plywood iWood is interactive plywood that can sense vibration based on triboelectric effect. As a material, iWood survives common woodworking operations, such as sawing, screwing, and nailing and can be used to create furniture and artifacts. Things created using iWood inherit its sensing capability and can detect a variety of user input and activities based on their unique vibration patterns. Through a series of experiments and machine simulations, we carefully chose the size of the sensor electrodes, the type of triboelectric materials, and the bonding method of the sensor layers to optimize the sensitivity and fabrication complexity. The sensing performance of iWood was evaluated with 4 gestures and 12 daily activities carried out on a table, nightstand, and cutting board, all created using iWood. Our result suggested over 90% accuracies for activity and gesture recognition.2022TWMackenzie Leake et al.Vibrotactile Feedback & Skin StimulationHaptic WearablesShape-Changing Interfaces & Soft Robotic MaterialsUIST
Sketch-Based Design of Foundation Paper Pieceable QuiltsFoundation paper piecing is a widely used quilt-making technique in which fabric pieces are sewn onto a paper guide to facilitate construction. But, designing paper pieceable quilt patterns is challenging because the sewing process imposes constraints on both the geometry and sewing order of the fabric pieces. Based on a formative study with expert quilt designers, we develop a novel sketch-based tool for designing such quilt patterns. Our tool lets designers sketch a partial design as a set of edges, which may intersect but do not have to form closed polygons, and our tool automatically completes it into a fully paper pieceable pattern. We contribute a new sketch-completion algorithm that extends the input sketched edges into a planar mesh composed of closed polygonal faces representing fabric pieces, determines a paper pieceable sewing order for the faces, and breaks complicated sketches into independently paper pieceable sections when necessary. A partial input design often admits multiple visually different completions. Thus, our tool lets designers specify completion heuristics, which are based on current quilt design practices, to control the appearance of the completed quilt. Initial user evaluations with novice and expert quilt designers suggest that our tool fits within current design workflows and greatly facilitates designing foundation paper pieceable quilts by allowing users to focus on the visual design rather than tedious constraint checks.2022MLMackenzie Leake et al.Customizable & Personalized ObjectsUIST
Automated Accessory Rigs for Layered 2D Character IllustrationsMix-and-match character creation tools enable users to quickly produce 2D character illustrations by combining various predefined accessories, like clothes and hairstyles, which are represented as separate, interchangeable artwork layers. However, these accessory layers are often designed to fit only the default body artwork, so users cannot modify the body without manually updating all the accessory layers as well. To address this issue, we present a method that captures and preserves important relationships between artwork layers so that the predefined accessories adapt with the character's body. We encode these relationships with four types of constraints that handle common interactions between layers: (1) occlusion, (2) attachment at a point, (3) coincident boundaries, and (4) overlapping regions. A rig is a set of constraints that allow a motion or deformation specified on the body to transfer to the accessory layers. We present an automated algorithm for generating such a rig for each accessory layer, but also allow users to select which constraints to apply to specific accessories. We demonstrate how our system supports a variety of modifications to body shape and pose using artwork from mix-and-match data sets.2021JLJingyi Li et al.3D Modeling & AnimationCustomizable & Personalized ObjectsUIST
Coupling Simulation and Hardware for Interactive Circuit DebuggingSimulation offers many advantages when designing analog circuits. Designers can explore alternatives quickly, without added cost or risk of hardware faults. However, it is challenging to use simulation as an aid during interactive debugging of physical circuits, due to difficulties in comparing simulated analyses with hardware measurements. Designers must continually configure simulations to match the state of the physical circuit (e.g. capturing sensor inputs), and must manually rework the hardware to replicate changes or analyses performed in simulation. We propose techniques leveraging instrumentation and programmable test hardware to create a tight coupling between a physical circuit and its simulated model. Bridging these representations helps designers to compare simulated and measured behaviors, and to quickly perform analytical techniques on hardware (e.g. parameter-response analysis) that are typically cumbersome outside of simulation. We implement these techniques in a prototype and show how it aids in efficiently debugging a variety of analog circuits.2021ESEvan Strasnick et al.Stanford UniversityCircuit Making & Hardware PrototypingCHI
Automatic Generation of Two Level Hierarchical Tutorials from Instructional Makeup VideosWe present a multi-modal approach for automatically generating hierarchical tutorials from instructional makeup videos. Our approach is inspired by prior research in cognitive psychology, which suggests that people mentally segment procedural tasks into event hierarchies, where coarse-grained events focus on objects while fine-grained events focus on actions. In the instructional makeup domain, we find that objects correspond to facial parts while fine-grained steps correspond to actions on those facial parts. Given an input instructional makeup video, we apply a set of heuristics that combine computer vision techniques with transcript text analysis to automatically identify the fine-level action steps and group these steps by facial part to form the coarse-level events. We provide a voice-enabled, mixed-media UI to visualize the resulting hierarchy and allow users to efficiently navigate the tutorial (e.g., skip ahead, return to previous steps) at their own pace. Users can navigate the hierarchy at both the facial-part and action-step levels using click-based interactions and voice commands. We demonstrate the effectiveness of segmentation algorithms and the resulting mixed-media UI on a variety of input makeup videos. A user study shows that users prefer following instructional makeup videos in our mixed-media format to the standard video UI and that they find our format much easier to navigate.2021ATAnh Truong et al.Stanford UniversityVoice User Interface (VUI) DesignMusic Composition & Sound Design ToolsCHI
The Role of Working Memory in Program TracingProgram tracing, or mentally simulating a program on concrete inputs, is an important part of general program comprehension. Programs involve many kinds of virtual state that must be held in memory, such as variable/value pairs and a call stack. In this work, we examine the influence of short-term working memory (WM) on a person's ability to remember program state during tracing. We first confirm that previous findings in cognitive psychology transfer to the programming domain: people can keep about 7 variable/value pairs in WM, and people will accidentally swap associations between variables due to WM load. We use a restricted focus viewing interface to further analyze the strategies people use to trace through programs, and the relationship of tracing strategy to WM. Given a straight-line program, we find half of our participants traced a program from the top-down line-by-line (linearly), and the other half start at the bottom and trace upward based on data dependencies (on-demand). Participants with an on-demand strategy made more WM errors while tracing straight-line code than with a linear strategy, but the two strategies contained an equal number of WM errors when tracing code with functions. We conclude with the implications of these findings for the design of programming tools: first, programs should be analyzed to identify and refactor human-memory-intensive sections of code. Second, programming environments should interactively visualize variable metadata to reduce WM load in accordance with a person's tracing strategy. Third, tools for program comprehension should enable externalizing program state while tracing.2021WCWill Crichton et al.Stanford UniversityProgramming Education & Computational ThinkingComputational Methods in HCICHI
Towards Understanding How Readers Integrate Charts and Captions: A Case Study with Line ChartsCharts often contain visually prominent features that draw attention to aspects of the data and include text captions that emphasize aspects of the data. Through a crowdsourced study, we explore how readers gather takeaways when considering charts and captions together. We first ask participants to mark visually prominent regions in a set of line charts. We then generate text captions based on the prominent features and ask participants to report their takeaways after observing chart-caption pairs. We find that when both the chart and caption describe a high-prominence feature, readers treat the doubly emphasized high-prominence feature as the takeaway; when the caption describes a low-prominence chart feature, readers rely on the chart and report a higher-prominence feature as the takeaway. We also find that external information that provides context, helps further convey the caption’s message to the reader. We use these findings to provide guidelines for authoring effective chart-caption pairs.2021DKDae Hyun Kim et al.Stanford University, Tableau ResearchInteractive Data VisualizationVisualization Perception & CognitionCHI
Crosscast: Adding Visuals to Audio Travel PodcastsAudio travel podcasts are a valuable source of information for travelers. Yet, travel is, in many ways, a visual experience and the lack of visuals in travel podcasts can make it difficult for listeners to fully understand the places being discussed. We present Crosscast: a system for automatically adding visuals to audio travel podcasts. Given an audio travel podcast as input, Crosscast uses natural language processing and text mining to identify geographic locations and descriptive keywords within the podcast transcript. Crosscast then uses these locations and keywords to automatically select relevant photos from online repositories and synchronizes their display to align with the audio narration. In a user evaluation, we find that 85.7% of the participants preferred Crosscast generated audio-visual travel podcasts compared to audio-only travel podcasts.2020HXHaijun Xia et al.Museum & Cultural Heritage DigitizationInteractive Narrative & Immersive StorytellingUIST
Supporting Visual Artists in Programming through Direct Inspection and Control of Program ExecutionProgramming offers new opportunities for visual art creation, but understanding and manipulating the abstract representations that make programming powerful can pose challenges for artists who are accustomed to manual tools and concrete visual interaction. We hypothesize that we can reduce these barriers through programming environments that link state to visual artwork output. We created Demystified Dynamic Brushes (DDB), a tool that bidirectionally links code, numerical data, and artwork across the programming interface and the execution environment – i.e., the artist's in-progress artwork. DDB automatically records stylus input as artists draw, and stores a history of brush state and output in relation to the input. This structure enables artists to inspect current and past numerical input, state, and output and control program execution through the direct selection of visual geometric elements in the drawing canvas. An observational study suggests that artists engage in program inspection when they can visually access geometric state information on the drawing canvas in the process of manual drawing.2020JLJingyi Li et al.Stanford UniversityCreative Coding & Computational ArtCreative Collaboration & Feedback SystemsCHI
Answering Questions about Charts and Generating Visual ExplanationsPeople often use charts to analyze data, answer questions and explain their answers to others. In a formative study, we find that such human-generated questions and explanations commonly refer to visual features of charts. Based on this study, we developed an automatic chart question answering pipeline that generates visual explanations describing how the answer was obtained. Our pipeline first extracts the data and visual encodings from an input Vega-Lite chart. Then, given a natural language question about the chart, it transforms references to visual attributes into references to the data. It next applies a state-of-the-art machine learning algorithm to answer the transformed question. Finally, it uses a template-based approach to explain in natural language how the answer is determined from the chart's visual features. A user study finds that our pipeline-generated visual explanations significantly outperform in transparency and are comparable in usefulness and trust to human-generated explanations.2020DKDae Hyun Kim et al.Stanford UniversityInteractive Data VisualizationData StorytellingCHI
Generating Audio-Visual Slideshows from Text Articles Using Word ConcretenessWe present a system that automatically transforms text articles into audio-visual slideshows by leveraging the notion of word concreteness, which measures how strongly a word or phrase is related to some perceptible concept. In a formative study we learn that people not only prefer such audio-visual slideshows but find that the content is easier to understand compared to text articles or text articles augmented with images. We use word concreteness to select search terms and find images relevant to the text. Then, based on the distribution of concrete words and the grammatical structure of an article, we time-align selected images with audio narration obtained through text-to-speech to produce audio-visual slideshows. In a user evaluation we find that our concreteness-based algorithm selects images that are highly relevant to the text. The quality of our slideshows is comparable to slideshows produced manually using standard video editing tools, and people strongly prefer our slideshows to those generated using a simple keyword-search based approach.2020MLMackenzie Leake et al.Stanford UniversityGenerative AI (Text, Image, Music, Video)Data StorytellingCHI
View-Dependent Video Textures for 360° VideoA major concern for filmmakers creating 360° video is ensuring that the viewer does not miss important narrative elements because they are looking in the wrong direction. This paper introduces gated clips which do not play past a gate time until a filmmaker-defined viewer gaze condition is met, such as looking at the region of interest (ROI). Until the condition is met, we seamlessly loop playback using view-dependent video textures, which extend standard video textures by adapting the looping behavior to the portion of the scene in the viewer’s field of view. We use our desktop GUI to edit live action and computer animated 360◦videos, and show them to casual viewers. Study participants prefer our looping videos over the standard versions and are able to successfully see all of the looping videos’ ROIs without fear of missing anything.2019SLSunny Liu et al.360° Video & Panoramic ContentInteractive Narrative & Immersive StorytellingUIST
VisiBlends: A Flexible Workflow for Visual BlendsVisual blends are an advanced graphic design technique to draw attention to a message. They combine two objects in a way that is novel and useful in conveying a message symbolically. This paper presents VisiBlends, a flexible workflow for creating visual blends that follows the iterative design process. We introduce a design pattern for blending symbols based on principles of human visual object recognition. Our workflow decomposes the process into both computational techniques and human microtasks. It allows users to collaboratively generate visual blends with steps involving brainstorming, synthesis, and iteration. An evaluation of the workflow shows that decentralized groups can generate blends in independent microtasks, co-located groups can collaboratively make visual blends for their own messages, and VisiBlends improves novices' ability to make visual blends.2019LCLydia B. Chilton et al.Columbia UniversityGraphic Design & Typography ToolsCreative Collaboration & Feedback SystemsCHI