MapStory: Prototyping Editable Map Animations with LLM Agents
Aditya Gunturu et al. UIST 2025.
We introduce MapStory, an LLM-powered animation prototyping tool that generates editable map animation sequences directly from natural language text by leveraging a dual-agent LLM architecture. Given a user-written script, MapStory automatically produces a scene breakdown, which decomposes the text into key map animation primitives such as camera movements, visual highlights, and animated elements. Our system includes a researcher agent that accurately queries geospatial information by leveraging an LLM with web search, enabling automatic extraction of relevant regions, paths, and coordinates, while allowing users to edit and query for changes or additional information to refine the results. Additionally, users can fine-tune parameters of these primitive blocks through an interactive timeline editor. We detail the system's design and architecture, informed by formative interviews with professional animators and by an analysis of 200 existing map animation videos. Our evaluation, which includes expert interviews (N=5) and a usability study (N=12), demonstrates that MapStory enables users to create map animations with ease, facilitates faster iteration, encourages creative exploration, and lowers barriers to creating map-centric stories.
Topics: Geospatial & Map Visualization; Computational Methods in HCI
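As a rough illustration of the scene-breakdown idea described above, the TypeScript sketch below models map animation primitives (camera movements, visual highlights, animated elements) as editable timeline blocks. The type names, fields, and example values are hypothetical and are not MapStory's actual data model.

```typescript
// Hypothetical sketch of a scene breakdown into editable map animation
// primitives, loosely following the abstract's description. Not MapStory's schema.

type LatLng = { lat: number; lng: number };

type MapPrimitive =
  | { kind: "camera"; from: LatLng; to: LatLng; zoom: number }          // camera movement
  | { kind: "highlight"; region: string; color: string }                // visual highlight
  | { kind: "animatedPath"; waypoints: LatLng[]; durationSec: number }; // animated element

// A timeline block wraps a primitive with timing the user can fine-tune.
interface SceneBlock {
  label: string;     // e.g. the script sentence this block came from
  startSec: number;
  endSec: number;
  primitive: MapPrimitive;
}

// Example breakdown for a two-sentence script.
const breakdown: SceneBlock[] = [
  {
    label: "The expedition departs from Lisbon.",
    startSec: 0,
    endSec: 4,
    primitive: { kind: "camera", from: { lat: 45, lng: 10 }, to: { lat: 38.72, lng: -9.14 }, zoom: 8 },
  },
  {
    label: "It sails south along the African coast.",
    startSec: 4,
    endSec: 10,
    primitive: {
      kind: "animatedPath",
      waypoints: [{ lat: 38.72, lng: -9.14 }, { lat: 14.7, lng: -17.5 }],
      durationSec: 6,
    },
  },
];

console.log(breakdown.map(b => `${b.startSec}-${b.endSec}s ${b.primitive.kind}`).join("\n"));
```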
Narrative Motion Blocks: Combining Direct Manipulation and Natural Language Interactions for Animation Creation
Sam Bourgault et al. DIS 2025.
Authoring compelling animations often requires artists to come up with creative high-level ideas and translate them into precise low-level spatial and temporal properties like position, orientation, scale, and frame timing. Traditional animation tools offer direct manipulation strategies to control these properties but lack support for implementing higher-level ideas. Alternatively, AI-based tools allow animation production using natural language prompts but lack the fine-grained control over properties required for professional workflows. To bridge this gap, we propose AniMate, a hand-drawn animation system that integrates direct manipulation and natural language interaction. Central to AniMate are narrative motion blocks, clip-like components located on a timeline that let animators specify animated behaviors with a combination of textual and manual input. Through an expert evaluation and the creation of short demonstrative animations, we show how focusing on intermediate-level actions provides a common representation for animators to work across both interaction modalities.
Topics: AI-Assisted Creative Writing; Music Composition & Sound Design Tools; 3D Modeling & Animation
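A minimal sketch of how a clip-like block might pair a textual behavior description with directly manipulated properties, assuming a simple keyframe-override model; the names here are illustrative, not AniMate's actual representation.

```typescript
// Hypothetical "narrative motion block": a timeline clip pairing a
// natural-language intent with manual keyframe overrides.

interface Transform {
  x: number; y: number;   // position
  rotation: number;       // degrees
  scale: number;
}

interface NarrativeMotionBlock {
  startFrame: number;
  endFrame: number;
  prompt: string;                              // natural-language intent
  keyframes: Map<number, Partial<Transform>>;  // manual overrides at specific frames
}

// Resolve a block at a frame: manual keyframes win over whatever the
// language-driven generator proposes (passed in here as `generated`).
function resolve(block: NarrativeMotionBlock, frame: number, generated: Transform): Transform {
  const override = block.keyframes.get(frame) ?? {};
  return { ...generated, ...override };
}

const hop: NarrativeMotionBlock = {
  startFrame: 0,
  endFrame: 24,
  prompt: "hop twice, then pause",
  keyframes: new Map<number, Partial<Transform>>([[12, { rotation: 15 }]]), // manual tweak at frame 12
};
console.log(resolve(hop, 12, { x: 0, y: 40, rotation: 0, scale: 1 }));
```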
Video2MR: Automatically Generating Mixed Reality 3D Instructions by Augmenting Extracted Motion from 2D Videos
Keiichi Ihara et al. IUI 2025.
This paper introduces Video2MR, a mixed reality system that automatically generates 3D sports and exercise instructions from 2D videos. Mixed reality instructions have great potential for physical training, but existing works require substantial time and cost to create these 3D experiences. Video2MR overcomes this limitation by transforming arbitrary instructional videos available online into MR 3D avatars with AI-enabled motion capture (DeepMotion). Then, it automatically enhances the avatar motion through the following augmentation techniques: 1) contrasting and highlighting differences between the user and avatar postures, 2) visualizing key trajectories and movements of specific body parts, 3) manipulating time and speed using body motion, and 4) spatially repositioning avatars for different perspectives. Developed on HoloLens 2 and Azure Kinect, we showcase various use cases, including yoga, dancing, soccer, tennis, and other physical exercises. The study results confirm that Video2MR provides more engaging and playful learning experiences compared to existing 2D video instructions.
Topics: Full-Body Interaction & Embodied Input; Mixed Reality Workspaces; Biosensors & Physiological Monitoring
LogoMotion: Visually-Grounded Code Synthesis for Creating and Editing Animation
Vivian Liu et al. (Columbia University). CHI 2025.
Creating animation takes time, effort, and technical expertise. To help novices with animation, we present LogoMotion, an AI code generation approach that helps users create semantically meaningful animation for logos. LogoMotion automatically generates animation code with a method called visually-grounded code synthesis and program repair. This method performs visual analysis, instantiates a design concept, and conducts visual checking to generate animation code. LogoMotion provides novices with code-connected AI editing widgets that help them edit the motion, grouping, and timing of their animation. In a comparison study on 276 animations, LogoMotion was found to produce more content-aware animation than an industry-leading tool. In a user evaluation (n=16) comparing against a prompt-only baseline, these code-connected widgets helped users edit animations with control, iteration, and creative expression.
Topics: 3D Modeling & Animation; Creative Coding & Computational Art
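To make the generate, visually check, and repair idea concrete, here is a hedged sketch of such a loop; llmGenerate, renderAndInspect, and llmRepair are hypothetical stand-ins rather than LogoMotion's actual functions.

```typescript
// Rough sketch of a generate / visually check / repair loop in the spirit of
// visually-grounded code synthesis. All function parameters are stand-ins.

interface VisualCheck { passed: boolean; issues: string[] }

async function synthesizeAnimation(
  logoDescription: string,
  concept: string,
  llmGenerate: (prompt: string) => Promise<string>,
  renderAndInspect: (code: string) => Promise<VisualCheck>,
  llmRepair: (code: string, issues: string[]) => Promise<string>,
  maxRepairs = 3,
): Promise<string> {
  // 1) Instantiate the design concept as animation code.
  let code = await llmGenerate(`Animate this logo (${logoDescription}) with concept: ${concept}`);

  // 2) Render and visually check the result; 3) repair while issues remain.
  for (let i = 0; i < maxRepairs; i++) {
    const check = await renderAndInspect(code);
    if (check.passed) break;
    code = await llmRepair(code, check.issues);
  }
  return code;
}
```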
Augmented Physics: Creating Interactive and Embedded Physics Simulations from Static Textbook Diagrams
Aditya Gunturu et al. UIST 2024.
We introduce Augmented Physics, a machine learning-integrated authoring tool designed for creating embedded interactive physics simulations from static textbook diagrams. Leveraging recent advancements in computer vision, such as Segment Anything and multi-modal LLMs, our web-based system enables users to semi-automatically extract diagrams from physics textbooks and generate interactive simulations based on the extracted content. These interactive diagrams are seamlessly integrated into scanned textbook pages, facilitating interactive and personalized learning experiences across various physics concepts, such as optics, circuits, and kinematics. Drawing from an elicitation study with seven physics instructors, we explore four key augmentation strategies: 1) augmented experiments, 2) animated diagrams, 3) bi-directional binding, and 4) parameter visualization. We evaluate our system through a technical evaluation, a usability study (N=12), and expert interviews (N=12). Study findings suggest that our system can facilitate more engaging and personalized learning experiences in physics education.
Topics: Geospatial & Map Visualization; Programming Education & Computational Thinking; STEM Education & Science Communication
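One of the four strategies, bi-directional binding, can be illustrated with a small observable-parameter sketch; the class and example values below are generic assumptions, not the system's API.

```typescript
// Minimal sketch of "bi-directional binding": a simulation parameter that
// stays in sync with an overlay drawn on the scanned diagram. Illustrative only.

class SimParam {
  private listeners: Array<(v: number) => void> = [];
  constructor(private v: number) {}
  get value() { return this.v; }
  set(value: number) { this.v = value; this.listeners.forEach(fn => fn(value)); }
  onChange(fn: (v: number) => void) { this.listeners.push(fn); }
}

// Dragging the embedded lens in the diagram and editing the simulation both
// call set(); overlay and simulation each subscribe, so either side can drive the other.
const focal = new SimParam(50);
focal.onChange(v => console.log(`re-render lens overlay at focal length ${v} mm`));
focal.set(35);
```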
DrawTalking: Building Interactive Worlds by Sketching and Speaking
Karl Toby Rosenberg et al. UIST 2024.
We introduce DrawTalking, an approach to building and controlling interactive worlds by sketching and speaking while telling stories. It emphasizes user control and flexibility, and gives programming-like capability without requiring code. An early open-ended study with our prototype shows that the mechanics resonate and are applicable to many creative-exploratory use cases, with the potential to inspire and inform research in future natural interfaces for creative exploration and authoring.
Topics: AI-Assisted Creative Writing; Creative Collaboration & Feedback Systems
Elastica: Adaptive Live Augmented Presentations with Elastic Mappings Across Modalities
Yining Cao et al. (University of California, San Diego). CHI 2024.
Augmented presentations offer compelling storytelling by combining speech content, gestural performance, and animated graphics in a congruent manner. The expressiveness of these presentations stems from the harmonious coordination of spoken words and graphic elements, complemented by smooth animations aligned with the presenter's gestures. However, achieving such congruence in a live presentation poses significant challenges due to the unpredictability and imprecision inherent in presenters' real-time actions. Existing methods either leveraged rigid mapping without predefined states or required the presenters to conform to predefined animations. We introduce adaptive presentations that dynamically adjust predefined graphic animations to real-time speech and gestures. Our approach leverages script following and motion warping to establish elastic mappings that generate runtime graphic parameters coordinating speech, gesture, and predefined animation state. Our evaluation demonstrated that the proposed adaptive presentation can effectively mitigate undesired visual artifacts caused by performance deviations and enhance the expressiveness of resulting presentations.
Topics: Mixed Reality Workspaces; Interactive Narrative & Immersive Storytelling; Dance & Body Movement Computing
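A toy sketch of an elastic mapping under simple assumptions: a predefined animation keyed to script positions is piecewise-linearly re-timed to a live script-following estimate. All names and values are illustrative, not the paper's implementation.

```typescript
// Toy elastic mapping: re-time a predefined animation to wherever the
// presenter currently is in the script. The live script index is assumed given.

interface AnimKey { scriptIndex: number; value: number } // e.g. opacity or x-position

// Piecewise-linear warp from the live script position onto the predefined keys.
function warpedValue(keys: AnimKey[], liveIndex: number): number {
  if (liveIndex <= keys[0].scriptIndex) return keys[0].value;
  for (let i = 1; i < keys.length; i++) {
    const a = keys[i - 1], b = keys[i];
    if (liveIndex <= b.scriptIndex) {
      const t = (liveIndex - a.scriptIndex) / (b.scriptIndex - a.scriptIndex);
      return a.value + t * (b.value - a.value);
    }
  }
  return keys[keys.length - 1].value;
}

// A chart bar grows from 0 to 100 as the presenter speaks words 12 through 30.
const grow: AnimKey[] = [{ scriptIndex: 12, value: 0 }, { scriptIndex: 30, value: 100 }];
console.log(warpedValue(grow, 21)); // 50: halfway through the spoken span
```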
Automated Conversion of Music Videos into Lyric Videos
Jiaju Ma et al. UIST 2023.
Musicians and fans often produce lyric videos, a form of music video that showcases the song's lyrics, for their favorite songs. However, making such videos can be challenging and time-consuming, as the lyrics need to be added in synchrony and visual harmony with the video. Informed by prior work and close examination of existing lyric videos, we propose a set of design guidelines to help creators make such videos. Our guidelines ensure the readability of the lyric text while maintaining a unified focus of attention. We instantiate these guidelines in a fully automated pipeline that converts an input music video into a lyric video. We demonstrate the robustness of our pipeline by generating lyric videos from a diverse range of input sources. A user study shows that lyric videos generated by our pipeline are effective in maintaining text readability and unifying the focus of attention.
Topics: Music Composition & Sound Design Tools; Video Production & Editing
PoseVEC: Authoring Adaptive Pose-aware Effects Using Visual Programming and Demonstrations
Yongqi Zhang et al. UIST 2023.
Pose-aware visual effects, where graphics assets and animations are rendered reactively to the human pose, have become increasingly popular, appearing on mobile devices, the web, and even head-mounted displays like AR glasses. Yet creating such effects remains difficult for novices. In a traditional video editing workflow, a creator can use keyframes to produce expressive but non-adaptive results that cannot be reused for other videos. Alternatively, programming-based approaches allow users to develop interactive effects but are cumbersome for quickly expressing creative intent. In this work, we propose a lightweight visual programming workflow for authoring adaptive and expressive pose effects. By combining a programming-by-demonstration paradigm with visual programming, we simplify three key tasks in the authoring process: creating pose triggers, designing animation parameters, and rendering. We evaluated our system with a qualitative user study and a replicated example study, finding that all participants could create effects efficiently.
Topics: Human Pose & Activity Recognition; 3D Modeling & Animation; Crowdsourcing Task Design & Quality Control
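A hedged sketch of a demonstration-created pose trigger, assuming a pose is a map of joint angles and a trigger fires when the live pose stays within an angular tolerance of the demonstrated frame; joint names and thresholds are made up for illustration.

```typescript
// Illustrative pose trigger: capture joint angles from a demonstrated frame,
// then fire when the live pose is close enough. Not PoseVEC's actual API.

type Pose = Record<string, number>;  // joint name -> angle in degrees

interface PoseTrigger {
  template: Pose;         // captured from the demonstration frame
  tolerance: number;      // max mean angular difference, in degrees
  onTrigger: () => void;  // e.g. start a particle effect
}

function matches(trigger: PoseTrigger, live: Pose): boolean {
  const joints = Object.keys(trigger.template);
  const meanDiff =
    joints.reduce((sum, j) => sum + Math.abs((live[j] ?? 0) - trigger.template[j]), 0) / joints.length;
  return meanDiff <= trigger.tolerance;
}

const raiseArms: PoseTrigger = {
  template: { leftElbow: 170, rightElbow: 168 },  // demonstrated "arms up" pose
  tolerance: 15,
  onTrigger: () => console.log("play sparkle effect"),
};

if (matches(raiseArms, { leftElbow: 160, rightElbow: 175 })) raiseArms.onTrigger();
```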
RealityTalk: Real-time Speech-driven Augmented Presentation for AR Live Storytelling
Jian Liao et al. UIST 2022.
We present RealityTalk, a system that augments real-time live presentations with speech-driven interactive virtual elements. Augmented presentations leverage embedded visuals and animation for engaging and expressive storytelling. However, existing tools for live presentations often lack interactivity and improvisation, while creating such effects in video editing tools requires significant time and expertise. RealityTalk enables users to create live augmented presentations with real-time speech-driven interactions. The user can interactively prompt, move, and manipulate graphical elements through real-time speech and supporting modalities. Based on our analysis of 177 existing video-edited augmented presentations, we propose a novel set of interaction techniques and incorporate them into RealityTalk. We evaluate our tool from a presenter's perspective to demonstrate the effectiveness of our system.
Topics: AR Navigation & Context Awareness; Interactive Narrative & Immersive Storytelling
A Layered Authoring Tool for Stylized 3D Animations
Jiaju Ma et al. (Brown University, Adobe Research). CHI 2022.
Guided by the 12 principles of animation, stylization is a core 2D animation feature but has been utilized mainly by experienced animators. Although there are tools for stylizing 2D animations, creating stylized 3D animations remains a challenging problem due to the additional spatial dimension and the need for responsive actions like contact and collision. We propose a system that helps users create stylized casual 3D animations. A layered authoring interface is employed to balance ease of use and expressiveness. Our surface-level UI is a timeline sequencer that lets users add preset stylization effects, such as squash and stretch and follow-through, to plain motions. Users can adjust spatial and temporal parameters to fine-tune these stylizations. These edits are propagated to our node-graph-based second-level UI, in which users can create custom stylizations once they are comfortable with the surface-level UI. Our system also enables the stylization of interactions among multiple objects, such as force, energy, and collision. A pilot user study showed that our fluid layered UI design supports both ease of use and expressiveness better than existing tools.
Topics: Shape-Changing Interfaces & Soft Robotic Materials; 3D Modeling & Animation
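As an illustration of a preset stylization with tunable parameters, the sketch below applies a generic squash-and-stretch rule to a plain keyframed motion; it shows the general idea of a surface-level preset with an intensity knob, not the paper's implementation.

```typescript
// Toy squash-and-stretch preset layered on top of a plain motion, with a
// user-tunable intensity parameter. Generic illustration only.

interface Sample { t: number; y: number }                    // plain motion: height over time
interface Stylized extends Sample { scaleX: number; scaleY: number }

function squashAndStretch(motion: Sample[], intensity: number): Stylized[] {
  return motion.map((s, i) => {
    const prev = motion[Math.max(0, i - 1)];
    const vy = (s.y - prev.y) / Math.max(1e-6, s.t - prev.t);  // vertical velocity
    const stretch = 1 + intensity * Math.min(1, Math.abs(vy)); // stretch along travel
    return { ...s, scaleY: stretch, scaleX: 1 / stretch };     // roughly preserve volume
  });
}

// A bouncing ball sampled at four points; intensity is the fine-tuning knob.
const bounce: Sample[] = [{ t: 0, y: 1 }, { t: 0.2, y: 0.4 }, { t: 0.4, y: 0 }, { t: 0.6, y: 0.35 }];
console.log(squashAndStretch(bounce, 0.5).map(s => s.scaleY.toFixed(2)));
```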
StreamSketch: Exploring Multi-Modal Interactions in Creative Live Streams
Zhicong Lu et al. CSCW 2021.
Creative live streams, where artists or designers demonstrate their creative process, have emerged as a unique and popular genre of live streams due to the real-time interactivity they afford. However, streamer-viewer interactions on most live streaming platforms only let users communicate through text and emojis, which limits what viewers can convey and share in real time. To investigate the design space of potential visual and non-textual modalities within creative live streams, we first analyzed existing Twitch extensions and conducted a formative study with streamers who share creative activities to uncover key challenges that these streamers face. We then designed and implemented a prototype system, StreamSketch, which enables viewers and streamers to interact during live streams using multiple modalities, including freeform sketches and text. The prototype was evaluated by two professional artist streamers and their viewers during six streaming sessions. Overall, streamers and viewers found that StreamSketch provided increased engagement and new affordances compared to the traditional text-only modality, and highlighted that efficiency, moderation, and tool integration remained ongoing challenges.
Topics: User Experiences
Beyond Show of Hands: Engaging Viewers via Expressive and Scalable Visual Communication in Live Streaming
John Joon Young Chung et al. (University of Michigan). CHI 2021.
Live streaming has gained popularity across diverse application domains in recent years. A core part of the experience is streamer-viewer interaction, which has been mainly text-based. Recent systems have explored extending viewer interaction to include visual elements, offering richer expression and increased engagement. However, understanding expressive visual inputs becomes challenging with many viewers, primarily due to the relative lack of structure in visual input. On the other hand, adding rigid structures can limit viewer interactions to narrow use cases or decrease the expressiveness of viewer inputs. To facilitate the sensemaking of many visual inputs while retaining the expressiveness and versatility of viewer interactions, we introduce a visual input management framework (VIMF) and a system, VisPoll, that help streamers specify, aggregate, and visualize many visual inputs. A pilot evaluation indicated that VisPoll can expand the types of viewer interactions. Our framework provides insights for designing scalable and expressive visual communication for live streaming.
Topics: Live Streaming & Spectating Experience; Social Platform Design & User Behavior
Constructing Embodied Algebra by Sketching
Nazmus Saquib et al. (MIT Media Lab). CHI 2021.
Mathematical models and expressions have traditionally evolved as symbolic representations, with cognitively arbitrary rules of symbol manipulation. The embodied mathematics philosophy posits that abstract math concepts are layers of metaphors grounded in our intuitive arithmetic capabilities, such as categorizing objects and part-whole analysis. We introduce a design framework that facilitates the construction and exploration of embodied representations for algebraic expressions, using interactions inspired by innate arithmetic capabilities. We instantiated our design in a sketch interface that enables the construction of visually interpretable compositions that are directly mappable to algebraic expressions and explorable through a ladder of abstraction. The emphasis is on bottom-up construction, with the user sketching pictures while the system generates corresponding algebra. We present diverse examples created with our prototype system. A coverage analysis of the US Common Core curriculum and playtesting studies with children point to the future direction and potential of a sketch-based design paradigm for mathematics.
Topics: Programming Education & Computational Thinking; Computational Methods in HCI
RealitySketch: Embedding Responsive Graphics and Visualizations in AR through Dynamic Sketching
Ryo Suzuki et al. UIST 2020.
We present RealitySketch, an augmented reality interface for sketching interactive graphics and visualizations. In recent years, an increasing number of AR sketching tools have enabled users to draw and embed sketches in the real world. However, with current tools, sketched content is inherently static, floating in mid-air without responding to the real world. This paper introduces a new way to embed dynamic and responsive graphics in the real world. In RealitySketch, the user draws graphical elements on a mobile AR screen and binds them with physical objects in real-time and improvisational ways, so that the sketched elements dynamically move with the corresponding physical motion. The user can also quickly visualize and analyze real-world phenomena through responsive graph plots or interactive visualizations. This paper contributes a set of interaction techniques that enable capturing, parameterizing, and visualizing real-world motion without predefined programs and configurations. Finally, we demonstrate our tool with several application scenarios, including physics education, sports training, and in-situ tangible interfaces.
Topics: AR Navigation & Context Awareness; Interactive Data Visualization
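A rough sketch of what binding a sketched parameter to tracked physical points could look like, assuming tracking data arrives per camera frame from elsewhere; the pendulum-angle binding below is an invented example, not the system's actual interface.

```typescript
// Illustrative binding: a sketched parameter (a pendulum angle) is bound to
// tracked physical points and re-evaluated each frame. Tracking is assumed given.

type Point = { x: number; y: number };

interface Binding {
  name: string;
  evaluate: (tracked: Record<string, Point>) => number;
}

// Angle of the pivot->bob line, measured from straight down (screen y grows downward).
const pendulumAngle: Binding = {
  name: "theta",
  evaluate: ({ pivot, bob }) => {
    const dx = bob.x - pivot.x;
    const dy = bob.y - pivot.y;
    return (Math.atan2(dx, dy) * 180) / Math.PI;
  },
};

// Each camera frame, tracked points update and the bound value is logged
// (or appended to an in-scene graph plot).
const frame = { pivot: { x: 0, y: 0 }, bob: { x: 0.3, y: 0.9 } };
console.log(`${pendulumAngle.name} = ${pendulumAngle.evaluate(frame).toFixed(1)} deg`);
```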
Autocomplete Animated Sculpting
Mengqi Peng et al. UIST 2020.
Keyframe-based sculpting provides unprecedented freedom to author animated organic models, which can be difficult to create with other methods such as simulation, scripting, and rigging. However, sculpting animated objects can require significant artistic skill and manual labor, even more so than sculpting static 3D shapes or drawing 2D animations, which are already quite challenging. We present a keyframe-based animated sculpting system with the capability to autocomplete user editing under a simple and intuitive brushing interface. Similar to current desktop sculpting and VR brushing tools, users can brush surface details and volume structures. Meanwhile, our system analyzes their workflows and predicts what they might do in the future, both spatially and temporally. Users can accept or ignore these suggestions and thus maintain full control. We propose the first interactive suggestive keyframe sculpting system, specifically for spatio-temporal repetitive tasks, including low-level spatial details and high-level brushing structures across multiple frames. Our key ideas include a deformation-based optimization framework to analyze recorded workflows and synthesize predictions, and a semi-causal global similarity measurement to support flexible brushing stroke sequences and complex shape changes. Our system supports a variety of shape and motion styles, including those difficult to achieve via existing animation systems, such as topological changes that cannot be accomplished via simple rig-based deformations and stylized physically-implausible motions that cannot be simulated. We evaluate our system via a pilot user study that demonstrates its effectiveness.
Topics: AI-Assisted Creative Writing; 3D Modeling & Animation
Pronto: Rapid Augmented Reality Video Prototyping Using Sketches and Enaction
Germán Leiva et al. (Aarhus University). CHI 2020.
Designers have limited tools to prototype AR experiences rapidly. Can lightweight, immediate tools let designers prototype dynamic AR interactions while capturing the nuances of a 3D experience? We interviewed three AR experts and identified several recurring issues in AR design: creating and positioning 3D assets, handling the changing user position, and orchestrating multiple animations. We introduce Pronto, a tablet-based video prototyping system that combines 2D video with 3D manipulation. Pronto supports four intertwined activities: capturing 3D spatial information alongside a video scenario, positioning and sketching 2D drawings in a 3D world, and enacting animations with physical interactions. An observational study with professional designers shows that participants can use Pronto to prototype diverse AR experiences. All participants performed two tasks: replicating a sample non-trivial AR experience and prototyping their open-ended designs. All participants completed the replication task and found Pronto easy to use. Most participants found that Pronto encourages more exploration of designs than their current practices.
Topics: Mixed Reality Workspaces; Prototyping & User Testing
MagicalHands: Mid-Air Hand Gestures for Animating in VR
Rahul Arora et al. UIST 2019.
We seek to understand the preferred hand gestures for canonical animation authoring tasks in virtual reality (VR) environments. We first perform a gesture elicitation study for animation authoring to understand user preferences for a spatio-temporal, bare-handed interaction system in VR. Specifically, we focus on the creation and editing of dynamic, physical phenomena (e.g., particle systems, deformation, coupling), where the mapping from gesture to animation is ambiguous and indirect. We present commonly observed mid-air gestures from the study, eliciting a set of rich interaction techniques, from direct manipulation to abstract demonstrations. To this end, we extend existing gesture taxonomies to the rich spatio-temporal interaction space needed by animators for authoring in both space and time. We distill our findings into a set of design guidelines for the construction of natural user interfaces for VR-based animation systems. Finally, based on our guidelines, we develop a proof-of-concept gestural animation system in VR. Our results, as well as feedback from user evaluation, suggest that the expressive qualities of hand gestures effectively allow users to animate in VR.
Topics: Hand Gesture Recognition; 3D Modeling & Animation
Interactive Body-Driven Graphics for Augmented Video Performance
Nazmus Saquib et al. (Massachusetts Institute of Technology). CHI 2019.
We present a system that augments live presentation videos with interactive graphics to create a powerful and expressive storytelling environment. Using our system, the presenter interacts with the graphical elements in real time through gestures and postures, thus leveraging our innate, everyday skills to enhance communication with the audience. However, crafting such an interactive and expressive performance typically requires programming or highly specialized tools tailored for experts. Our core contribution is a flexible, direct manipulation UI that enables amateurs and experts to craft such presentations beforehand by mapping a variety of body movements to a wide range of graphical manipulations. By simplifying the mapping between gestures, postures, and their corresponding output effects, our UI enables users to craft customized, rich interactions with the graphical elements. Our user study demonstrates the potential usage and unique affordance of this mixed-reality medium for storytelling and presentation across a range of application domains.
Topics: Full-Body Interaction & Embodied Input; Interactive Narrative & Immersive Storytelling
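A minimal sketch of mapping a tracked body joint to a graphic property through a user-authored range mapping; the joint names, ranges, and target property are assumptions for illustration, not the paper's system.

```typescript
// Illustrative body-to-graphics mapping: a tracked joint coordinate drives a
// property of a graphical element through an authored input/output range.

interface Mapping {
  joint: "rightWrist" | "leftWrist" | "head";
  axis: "x" | "y";
  inputRange: [number, number];    // normalized joint coordinates
  outputRange: [number, number];   // target property values
  apply: (value: number) => void;  // e.g. set a sketched element's scale
}

function evaluate(m: Mapping, joints: Record<string, { x: number; y: number }>) {
  const v = joints[m.joint][m.axis];
  const t = Math.min(1, Math.max(0, (v - m.inputRange[0]) / (m.inputRange[1] - m.inputRange[0])));
  m.apply(m.outputRange[0] + t * (m.outputRange[1] - m.outputRange[0]));
}

// Raising the right wrist scales a sketched sun from 0.5x to 2x.
const sunScale: Mapping = {
  joint: "rightWrist", axis: "y",
  inputRange: [0.9, 0.2],          // wrist low -> high (image y decreases upward)
  outputRange: [0.5, 2.0],
  apply: s => console.log(`sun scale = ${s.toFixed(2)}`),
};
evaluate(sunScale, { rightWrist: { x: 0.4, y: 0.55 } });
```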