JustShape: Exploring Co-Speech Gestures for Multimodal LLM-Powered 3D Parametric Modeling
Parametric modeling is a prevailing 3D modeling approach in design, architecture, and engineering. The emergence of multimodal large language models (LLMs) brings a new opportunity to lower the entry barriers to this powerful tool. However, describing 3D geometries through natural language can be fuzzy and challenging. We introduce co-speech gestures as a natural and expressive interaction modality that complements text prompts for LLM-empowered generative parametric modeling. We first conducted an elicitation study to explore and categorize co-speech gesture expressions. Based on the findings, we designed a multimodal fusion pipeline that parametrizes gestures and synthesizes them with speech. This approach reduces language ambiguity by translating implicit user intentions into explicit parametric attributes, thereby improving model generation performance. We conducted a two-session user study comparing the approach with traditional language and sketch inputs. This work streamlines the parametric modeling workflow and explores novel multimodal interaction paradigms for LLM-empowered design and creation.
2026 | Runlin Duan et al. | Purdue University | Hand Gesture Recognition | Generative AI (Text, Image, Music, Video) | Human-LLM Collaboration | CHI

ARify: Leveraging Narrated Instructional Videos to Create Augmented Reality Tutorials for Procedural Tasks
Augmented Reality (AR) tutorials enhance procedural task learning by providing situated, step-by-step guidance. Yet, creating such tutorials requires AR authoring expertise, posing a significant entry barrier. To lower this barrier, we introduce ARify, an authoring system that semi-automatically transforms narrated instructional videos into AR tutorials. To guide system design, we conducted a content analysis of video tutorials and derived a design space of instructional intents, tactics, and AR representations. Building on this, ARify generates AR tutorials by integrating a vision–language model to plan tutorial structures and an AR builder to configure AR representations, and offers interfaces that allow users to refine and customize the results. A numerical study on three machine tasks and a user study with 18 participants showed that ARify achieves promising performance across task types, and allows novices to author effective AR tutorials, validating its effectiveness and usability.
2026 | Xiyun Hu et al. | Purdue University | AR Navigation & Context Awareness | Prototyping & User Testing | Mixed Reality Workspaces | CHI

AgentCoach: LLM-Based Adaptive Coaching Feedback for Motor Skill Learning
We present AgentCoach, an LLM-powered system that provides adaptive feedback for motor skill learning from tutorial videos. The system works by extracting key coaching points (CPs) and compiling CP-specific evaluators that map each cue to measurable kinematic parameters. This process allows AgentCoach to connect high-level semantic meaning with low-level postural estimation for accurate, context-aware evaluation. During practice, learners receive concise visual diagnostics of their mistakes paired with prescriptive verbal feedback that adapts based on their performance history. We technically validate the CP extraction and evaluator compilation across a wide range of common sports and exercise videos. A user study confirms the system's usability and shows the potential effectiveness of its adaptive feedback across multiple skills.
2026 | Dizhi Ma et al. | Purdue University | Human Pose & Activity Recognition | Fitness Tracking & Physical Activity Monitoring | Behavior Change & Reflection Technology | CHI

AmIWrite: Exploring Scalable One-on-One Handwriting-Based Tutoring for Mathematical Problem-Solving with an LLM-Powered AI Tutor
Real-time handwriting interactions between tutors and students, in which tutors observe individual problem-solving processes, provide personalized annotations, and adapt explanations based on students' work, are fundamental to effective STEM tutoring. However, scaling such personalized handwriting-based tutoring remains challenging: human tutors cannot be available to every student on demand, and current online platforms often fail to recreate equivalent learning experiences. As an initial step toward tackling this challenge, we present AmIWrite, an LLM-powered AI tutoring system for mathematical problem-solving that provides real-time co-speech handwriting interactions on tablet devices, instantiated here as a case study in linear algebra. We conducted a within-subjects study (N = 40) comparing AmIWrite to a text-based AI tutor on two linear algebra topics. Our case study demonstrates how a multimodal AI tutor can preserve the pedagogical benefits of handwriting-based math tutoring and offer a potential path toward more scalable one-on-one STEM tutoring.
2026 | Ziyi Liu et al. | Purdue University | Hand Gesture Recognition | Intelligent Tutoring Systems & Learning Analytics | Tangible Interaction in Education | CHI

agentAR: Creating Augmented Reality Applications with Tool-Augmented LLM-based Autonomous Agents
Creating Augmented Reality (AR) applications requires expertise in both design and implementation, posing significant barriers to entry for non-expert users. While existing methods reduce some of this burden, they often fall short in flexibility or usability for complex or varied use cases. To address this, we introduce agentAR, an AR authoring system that leverages a tool-augmented large language model (LLM)–based autonomous agent to support end-to-end, in-situ AR application creation from natural language input. Built on an application structure and tool library derived from state-of-the-art AR research, the agent autonomously creates AR applications from natural language dialogue. We demonstrate the effectiveness of agentAR through a case study of six AR applications and a user study with twelve participants, showing that it significantly reduces user effort while supporting the creation of diverse and functional AR experiences.
2025 | Chenfei Zhu et al. | AR Navigation & Context Awareness | Generative AI (Text, Image, Music, Video) | Human-LLM Collaboration | UIST

GesPrompt: Leveraging Co-Speech Gestures to Augment LLM-Based Interaction in Virtual Reality
Large Language Model (LLM)-based copilots have shown great potential in Extended Reality (XR) applications. However, users face challenges when describing 3D environments to the copilots due to the complexity of conveying spatial-temporal information through text or speech alone. To address this, we introduce GesPrompt, a multimodal XR interface that combines co-speech gestures with speech, allowing end-users to communicate more naturally and accurately with LLM-based copilots in XR environments. GesPrompt extracts spatial-temporal references from co-speech gestures, reducing the need for precise textual prompts and minimizing cognitive load for end-users. Our contributions include (1) a workflow to integrate gesture and speech input in the XR environment, (2) a prototype VR system that implements the workflow, and (3) a user study demonstrating its effectiveness in improving user communication in VR environments.
2025 | Xiyun Hu et al. | Hand Gesture Recognition | Mixed Reality Workspaces | Human-LLM Collaboration | DIS

avaTTAR: Table Tennis Stroke Training with On-body and Detached Visualization in Augmented Reality
Table tennis stroke training is a critical aspect of player development. We designed a new augmented reality (AR) system, avaTTAR, for table tennis stroke training. The system provides both “on-body” (first-person view) and “detached” (third-person view) visual cues, enabling users to visualize target strokes and correct their attempts effectively with this dual-perspective setup. By employing a combination of pose estimation algorithms and IMU sensors, avaTTAR captures and reconstructs the 3D body pose and paddle orientation of users during practice, allowing real-time comparison with expert strokes. Through a user study, we affirm avaTTAR’s capacity to amplify player experience and training results.
2024 | Dizhi Ma et al. | Full-Body Interaction & Embodied Input | AR Navigation & Context Awareness | VR Medical Training & Rehabilitation | UIST

ClassMeta: Designing Interactive Virtual Classmate to Promote VR Classroom Participation
Peer influence plays a crucial role in promoting classroom participation, where behaviors from active students can contribute to a collective classroom learning experience. However, the presence of these active students depends on several conditions and is not consistently available across all circumstances. Recently, Large Language Models (LLMs) such as GPT have demonstrated the ability to simulate diverse human behaviors convincingly due to their capacity to generate contextually coherent responses based on their role settings. Inspired by this advancement in technology, we designed ClassMeta, a GPT-4-powered agent that helps promote classroom participation by playing the role of an active student. These agents, which are embodied as 3D avatars in virtual reality, interact with actual instructors and students through both spoken language and body gestures. We conducted a comparative study to investigate the potential of ClassMeta for improving the overall learning experience of the class.
2024 | Ziyi Liu et al. | Purdue University | Social & Collaborative VR | Human-LLM Collaboration | Intelligent Tutoring Systems & Learning Analytics | CHI

Ubi Edge: Authoring Edge-Based Opportunistic Tangible User Interfaces in Augmented Reality
Edges are one of the most ubiquitous geometric features of physical objects. They provide accurate haptic feedback and easy-to-track features for camera systems, making them an ideal basis for Tangible User Interfaces (TUI) in Augmented Reality (AR). We introduce Ubi Edge, an AR authoring tool that allows end-users to customize edges on daily objects as TUI inputs to control varied digital functions. We develop an integrated AR device and an integrated vision-based detection pipeline that can track 3D edges and detect the touch interaction between fingers and edges. Leveraging the spatial awareness of AR, users can simply select an edge by sliding fingers along it and then make the edge interactive by connecting it to various digital functions. We demonstrate four use cases including multi-function controllers, smart homes, games, and TUI-based tutorials. We also evaluated our system’s usability through a two-session user study, in which both the qualitative and quantitative results were positive.
2023 | Fengming He et al. | Purdue University | Shape-Changing Interfaces & Soft Robotic Materials | AR Navigation & Context Awareness | CHI

ARnnotate: An Augmented Reality Interface for Collecting Custom Dataset of 3D Hand-Object Interaction Pose Estimation
Vision-based 3D pose estimation has substantial potential in hand-object interaction applications and requires user-specified datasets to achieve robust performance. We propose ARnnotate, an Augmented Reality (AR) interface enabling end-users to create custom data using a hand-tracking-capable AR device. Unlike other dataset collection strategies, ARnnotate first guides a user to manipulate a virtual bounding box and records its poses and the user's hand joint positions as the labels. By leveraging the spatial awareness of AR, the user then manipulates the corresponding physical object while following the in-situ AR animation of the bounding box and hand model, and ARnnotate captures the user's first-person view as the images of the dataset. A 12-participant user study was conducted, and the results demonstrated the system's usability in terms of the spatial accuracy of the labels, the satisfactory performance of the deep neural networks trained with the data collected by ARnnotate, and the users' subjective feedback.
2022 | Xun Qian et al. | Hand Gesture Recognition | Eye Tracking & Gaze Interaction | Human Pose & Activity Recognition | UIST

ScalAR: Authoring Semantically Adaptive Augmented Reality Experiences in Virtual Reality
Augmented Reality (AR) experiences tightly associate virtual contents with environmental entities. However, the dissimilarity of different environments limits the adaptive AR content behaviors under large-scale deployment. We propose ScalAR, an integrated workflow enabling designers to author semantically adaptive AR experiences in Virtual Reality (VR). First, potential AR consumers collect local scenes with a semantic understanding technique. ScalAR then synthesizes numerous similar scenes. In VR, a designer authors the AR contents' semantic associations and validates the design while being immersed in the provided scenes. We adopt a decision-tree-based algorithm to fit the designer’s demonstrations as a semantic adaptation model to deploy the authored AR experience in a physical scene. We further showcase two application scenarios authored by ScalAR and conduct a two-session user study where the quantitative results prove the accuracy of the AR content rendering and the qualitative results show the usability of ScalAR.
2022 | Xun Qian et al. | Purdue University | AR Navigation & Context Awareness | Mixed Reality Workspaces | CHI

GesturAR: An Authoring System for Creating Freehand Interactive Augmented Reality Applications
The freehand gesture is an essential input modality for modern Augmented Reality (AR) user experiences. However, developing AR applications with customized hand interactions remains a challenge for end-users. Therefore, we propose GesturAR, an end-to-end authoring tool that helps users create in-situ freehand AR applications through embodied demonstration and visual programming. During authoring, users can intuitively demonstrate the customized gesture inputs while referring to the spatial and temporal context. Based on the taxonomy of gestures in AR, we propose a hand interaction model that maps the gesture inputs to the reactions of the AR contents. Thus, users can author comprehensive freehand applications using trigger-action visual programming and instantly experience the results in AR. Further, we demonstrate multiple application scenarios enabled by GesturAR, such as interactive virtual objects, robots, and avatars, room-level interactive AR spaces, and embodied AR presentations. Finally, we evaluate the performance and usability of GesturAR through a user study.
2021 | Tianyi Wang et al. | Hand Gesture Recognition | AR Navigation & Context Awareness | UIST

CAPturAR: An Augmented Reality Tool for Authoring Human-Involved Context-Aware Applications
Recognition of human behavior plays an important role in context-aware applications. However, it is still a challenge for end-users to build personalized applications that accurately recognize their own activities. Therefore, we present CAPturAR, an in-situ programming tool that helps users rapidly author context-aware applications by referring to their previous activities. We customize an AR head-mounted device with multiple camera systems that allow for non-intrusive capturing of users’ daily activities. During authoring, we reconstruct the captured data in AR with an animated avatar and use virtual icons to represent the surrounding environment. With our visual programming interface, users create human-centered rules for the applications and experience them instantly in AR. We further demonstrate four use cases enabled by CAPturAR. We also verify the effectiveness of the AR-HMD and the authoring workflow with a system evaluation using our prototype. Moreover, we conduct a remote user study in an AR simulator to evaluate the usability.
2020 | Tianyi Wang et al. | Human Pose & Activity Recognition | AR Navigation & Context Awareness | Mixed Reality Workspaces | UIST