MoSound: An Interactive Tool for Generative Sound Design in Motion Graphics
Motion graphics, which bring logos, text, and other illustrations to life, are greatly enhanced by sound effects. Sound design for motion graphics presents unique challenges due to their short, abstract nature. Sound designers must identify opportunities for adding sound, decide on the sound's character to match the visual graphics, synchronize sounds with events, and align sonic properties with motions. We introduce MoSound, an interactive system that helps with all steps of this creation process. We designed the interface of MoSound based on formative studies with practitioners and implemented the system as a combination of visual event detection, spatial attribute mapping, and generative sound stylization. We demonstrate MoSound on a variety of examples, showing that it is capable of creating high-quality soundtracks while being accessible to novices.
2026 | Jialin Huang et al. | George Mason University | Music Composition & Sound Design Tools | 3D Modeling & Animation | Creative Collaboration & Feedback Systems | CHI

Narrative Motion Blocks: Combining Direct Manipulation and Natural Language Interactions for Animation Creation
Authoring compelling animations often requires artists to come up with creative high-level ideas and translate them into precise low-level spatial and temporal properties like position, orientation, scale, and frame timing. Traditional animation tools offer direct manipulation strategies to control these properties but lack support for implementing higher-level ideas. Alternatively, AI-based tools allow animation production using natural language prompts but lack the fine-grained control over properties required for professional workflows. To bridge this gap, we propose AniMate, a hand-drawn animation system that integrates direct manipulation and natural language interaction. Central to AniMate are narrative motion blocks, clip-like components located on a timeline that let animators specify animated behaviors with a combination of textual and manual input. Through an expert evaluation and the creation of short demonstrative animations, we show how focusing on intermediate-level actions provides a common representation for animators to work across both interaction modalities.
2025 | Sam Bourgault et al. | AI-Assisted Creative Writing | Music Composition & Sound Design Tools | 3D Modeling & Animation | DIS

LogoMotion: Visually-Grounded Code Synthesis for Creating and Editing Animation
Creating animation takes time, effort, and technical expertise. To help novices with animation, we present LogoMotion, an AI code generation approach that helps users create semantically meaningful animation for logos. LogoMotion automatically generates animation code with a method called visually-grounded code synthesis and program repair. This method performs visual analysis, instantiates a design concept, and conducts visual checking to generate animation code. LogoMotion provides novices with code-connected AI editing widgets that help them edit the motion, grouping, and timing of their animation. In a comparison study on 276 animations, LogoMotion was found to produce more content-aware animation than an industry-leading tool. In a user evaluation (n=16) comparing against a prompt-only baseline, these code-connected widgets helped users edit animations with control, iteration, and creative expression.
2025 | Vivian Liu et al. | Columbia University | 3D Modeling & Animation | Creative Coding & Computational Art | CHI

DrawTalking: Building Interactive Worlds by Sketching and Speaking
We introduce DrawTalking, an approach to building and controlling interactive worlds by sketching and speaking while telling stories. It emphasizes user control and flexibility, and gives programming-like capability without requiring code. An early open-ended study with our prototype shows that the mechanics resonate and are applicable to many creative-exploratory use cases, with the potential to inspire and inform research in future natural interfaces for creative exploration and authoring.
2024 | Karl Toby Rosenberg et al. | AI-Assisted Creative Writing | Creative Collaboration & Feedback Systems | UIST

Elastica: Adaptive Live Augmented Presentations with Elastic Mappings Across Modalities
Augmented presentations offer compelling storytelling by combining speech content, gestural performance, and animated graphics in a congruent manner. The expressiveness of these presentations stems from the harmonious coordination of spoken words and graphic elements, complemented by smooth animations aligned with the presenter's gestures. However, achieving such congruence in a live presentation poses significant challenges due to the unpredictability and imprecision inherent in presenters' real-time actions. Existing methods either leverage rigid mappings without predefined states or require presenters to conform to predefined animations. We introduce adaptive presentations that dynamically adjust predefined graphic animations to real-time speech and gestures. Our approach leverages script following and motion warping to establish elastic mappings that generate runtime graphic parameters coordinating speech, gesture, and predefined animation state. Our evaluation demonstrated that the proposed adaptive presentation can effectively mitigate undesired visual artifacts caused by performance deviations and enhance the expressiveness of the resulting presentations.
2024 | Yining Cao et al. | University of California, San Diego | Mixed Reality Workspaces | Interactive Narrative & Immersive Storytelling | Dance & Body Movement Computing | CHI

iPose: Interactive Human Pose Reconstruction from Video
Reconstructing 3D human poses from video has wide applications, such as character animation and sports analysis. Automatic 3D pose reconstruction methods have demonstrated promising results, but failure cases can still appear due to the diversity of human actions, capturing conditions, and depth ambiguities. Manual intervention therefore remains indispensable, but it can be time-consuming and require professional skills. We present iPose, an interactive tool that facilitates intuitive human pose reconstruction from a given video. Our tool incorporates both human perception in specifying pose appearance to achieve controllability, and video frame processing algorithms to achieve precision and automation. A user manipulates the projection of a 3D pose via 2D operations on top of video frames, and the 3D poses are updated correspondingly while satisfying both kinematic and video frame constraints. The pose updates are propagated temporally to reduce user workload. We evaluate the effectiveness of iPose with a user study on the 3DPW dataset and expert interviews.
2024 | Jingyuan Liu et al. | The University of Tokyo | Human Pose & Activity Recognition | 3D Modeling & Animation | CHI

Automated Conversion of Music Videos into Lyric Videos
Musicians and fans often produce lyric videos, a form of music video that showcases a song's lyrics, for their favorite songs. However, making such videos can be challenging and time-consuming as the lyrics need to be added in synchrony and visual harmony with the video. Informed by prior work and close examination of existing lyric videos, we propose a set of design guidelines to help creators make such videos. Our guidelines ensure the readability of the lyric text while maintaining a unified focus of attention. We instantiate these guidelines in a fully automated pipeline that converts an input music video into a lyric video. We demonstrate the robustness of our pipeline by generating lyric videos from a diverse range of input sources. A user study shows that lyric videos generated by our pipeline are effective in maintaining text readability and unifying the focus of attention.
2023 | Jiaju Ma et al. | Music Composition & Sound Design Tools | Video Production & Editing | UIST

A Layered Authoring Tool for Stylized 3D animations
Guided by the 12 principles of animation, stylization is a core 2D animation feature but has been utilized mainly by experienced animators. Although there are tools for stylizing 2D animations, creating stylized 3D animations remains a challenging problem due to the additional spatial dimension and the need for responsive actions like contact and collision. We propose a system that helps users create stylized casual 3D animations. A layered authoring interface is employed to balance ease of use and expressiveness. Our surface-level UI is a timeline sequencer that lets users add preset stylization effects, such as squash and stretch and follow-through, to plain motions. Users can adjust spatial and temporal parameters to fine-tune these stylizations. These edits are propagated to our node-graph-based second-level UI, in which users can create custom stylizations once they are comfortable with the surface-level UI. Our system also enables the stylization of interactions among multiple objects like force, energy, and collision. A pilot user study has shown that our fluid layered UI design allows for both ease of use and expressiveness better than existing tools.
2022 | Jiaju Ma et al. | Brown University, Adobe Research | Shape-Changing Interfaces & Soft Robotic Materials | 3D Modeling & Animation | CHI

StreamSketch: Exploring Multi-Modal Interactions in Creative Live Streams
Creative live streams, where artists or designers demonstrate their creative process, have emerged as a unique and popular genre of live streams due to the real-time interactivity they afford. However, streamer-viewer interactions on most live streaming platforms only enable users to utilize text and emojis to communicate, which limits what viewers can convey and share in real time. To investigate the design space of potential visual and non-textual modalities within creative live streams, we first analyzed existing Twitch extensions and conducted a formative study with streamers who share creative activities to uncover key challenges that these streamers face. We then designed and implemented a prototype system, StreamSketch, which enables viewers and streamers to interact during live streams using multiple modalities, including freeform sketches and text. The prototype was evaluated by two professional artist streamers and their viewers during six streaming sessions. Overall, streamers and viewers found that StreamSketch provided increased engagement and new affordances compared to the traditional text-only modality, and highlighted how efficiency, moderation, and tool integration were continued challenges.
2021 | Zhicong Lu et al. | User Experiences | CSCW

Beyond Show of Hands: Engaging Viewers via Expressive and Scalable Visual Communication in Live Streaming
Live streaming has been gaining popularity across diverse application domains in recent years. A core part of the experience is streamer-viewer interaction, which has been mainly text-based. Recent systems explored extending viewer interaction to include visual elements with richer expression and increased engagement. However, understanding expressive visual inputs becomes challenging with many viewers, primarily due to the relative lack of structure in visual input. On the other hand, adding rigid structures can limit viewer interactions to narrow use cases or decrease the expressiveness of viewer inputs. To facilitate the sensemaking of many visual inputs while retaining the expressiveness and versatility of viewer interactions, we introduce a visual input management framework (VIMF) and a system, VisPoll, that help streamers specify, aggregate, and visualize many visual inputs. A pilot evaluation indicated that VisPoll can expand the types of viewer interactions. Our framework provides insights for designing scalable and expressive visual communication for live streaming.
2021 | John Joon Young Chung et al. | University of Michigan | Live Streaming & Spectating Experience | Social Platform Design & User Behavior | CHI

Constructing Embodied Algebra by Sketching
Mathematical models and expressions traditionally evolved as symbolic representations, with cognitively arbitrary rules of symbol manipulation. The embodied mathematics philosophy posits that abstract math concepts are layers of metaphors grounded in our intuitive arithmetic capabilities, such as categorizing objects and part-whole analysis. We introduce a design framework that facilitates the construction and exploration of embodied representations for algebraic expressions, using interactions inspired by innate arithmetic capabilities. We instantiated our design in a sketch interface that enables construction of visually interpretable compositions that are directly mappable to algebraic expressions and explorable through a ladder of abstraction. The emphasis is on bottom-up construction, with the user sketching pictures while the system generates corresponding algebra. We present diverse examples created with our prototype system. A coverage analysis of the US Common Core curriculum and playtesting studies with children point to the future direction and potential for a sketch-based design paradigm for mathematics.
2021 | Nazmus Saquib et al. | MIT Media Lab | Programming Education & Computational Thinking | Computational Methods in HCI | CHI

RealitySketch: Embedding Responsive Graphics and Visualizations in AR through Dynamic Sketching
We present RealitySketch, an augmented reality interface for sketching interactive graphics and visualizations. In recent years, an increasing number of AR sketching tools have enabled users to draw and embed sketches in the real world. However, with the current tools, sketched contents are inherently static, floating in mid-air without responding to the real world. This paper introduces a new way to embed dynamic and responsive graphics in the real world. In RealitySketch, the user draws graphical elements on a mobile AR screen and binds them with physical objects in real-time and improvisational ways, so that the sketched elements dynamically move with the corresponding physical motion. The user can also quickly visualize and analyze real-world phenomena through responsive graph plots or interactive visualizations. This paper contributes a set of interaction techniques that enable capturing, parameterizing, and visualizing real-world motion without pre-defined programs and configurations. Finally, we demonstrate our tool with several application scenarios, including physics education, sports training, and in-situ tangible interfaces.
2020 | Ryo Suzuki et al. | AR Navigation & Context Awareness | Interactive Data Visualization | UIST

Autocomplete Element Fields
Aggregate elements are ubiquitous in natural and man-made objects. Interactively authoring these elements with varying anisotropy and deformability can require high artistic skills and manual labor. To reduce input workload and enhance output quality, we present an autocomplete system that can help users distribute and align such elements over different domains. Through a brushing interface, users can place and mix a few elements, and let our system automatically populate more elements for the remaining output. Furthermore, aggregate elements often require suitable direction/scalar fields for proper arrangements, but fully specifying such fields across entire domains can be difficult or inconvenient for ordinary users. To address this usability challenge, we formulate element fields that can smoothly orient all the elements based on partial user specifications without requiring full input fields in any step. We validate our prototype system with a pilot user study and show applications in design, collage, and modeling.
2020 | Chen-Yuan Hsu et al. | Bournemouth University | Graphic Design & Typography Tools | Customizable & Personalized Objects | Prototyping & User Testing | CHI

Interactive Body-Driven Graphics for Augmented Video Performance
We present a system that augments live presentation videos with interactive graphics to create a powerful and expressive storytelling environment. Using our system, the presenter interacts with the graphical elements in real-time with gestures and postures, thus leveraging our innate, everyday skills to enhance our communication capabilities with the audience. However, crafting such an interactive and expressive performance typically requires programming, or highly-specialized tools tailored for experts. Our core contribution is a flexible, direct manipulation UI which enables amateurs and experts to craft such presentations beforehand by mapping a variety of body movements to a wide range of graphical manipulations. By simplifying the mapping between gestures, postures, and their corresponding output effects, our UI enables users to craft customized, rich interactions with the graphical elements. Our user study demonstrates the potential usage and unique affordance of this mixed-reality medium for storytelling and presentation across a range of application domains.
2019 | Nazmus Saquib et al. | Massachusetts Institute of Technology | Full-Body Interaction & Embodied Input | Interactive Narrative & Immersive Storytelling | CHI