Sensing Your Vocals: Exploring the Activity of Vocal Cord Muscles for Pitch Assessment Using Electromyography and Ultrasonography
Vocal training is difficult because the muscles that control pitch, resonance, and phonation are internal and invisible to learners. This paper investigates how electromyography (EMG) and ultrasonic imaging (UI) can make these muscles observable for training purposes. We report three studies. First, we analyze the EMG and UI data from 16 singers (beginners, experienced singers, and professionals), revealing differences in muscle control proficiency among the three groups. Second, we use the collected data to create a system that visualizes an expert's muscle activity as a reference. This system is tested in a user study with 12 novices, showing that EMG highlighted muscle activation nuances, while UI provided insights into vocal cord length and dynamics. Third, to compare our approach to traditional methods (audio analysis and coach instructions), we conducted a focus group study with 15 experienced singers. Our results suggest that EMG is promising for improving vocal skill development and enhancing feedback systems. We conclude the paper with a detailed comparison of the analyzed modalities (EMG, UI, and traditional methods), resulting in recommendations to improve vocal muscle training systems.
2026 | Kanyu Chen et al. | Graduate School of Media Design | Biosensors & Physiological Monitoring | Emotion Recognition & Detection | Affective Feedback & Emotion Regulation Interfaces | CHI

SoleCoach: Sole Pressure and IMU-based MLLMs for Skill Coaching
In sports training, individualized skill assessment and feedback are essential for athletes to master complex movements and enhance performance. Existing approaches for generating coaching comments primarily rely on externally captured pose information, which limits their applicability in outdoor sports such as skiing that involve large-scale movement. To address this challenge, we propose a method for presenting athletes' postures and generating coaching feedback solely based on foot pressure and IMU data collected from insole sensors. In our approach, a large language model directly interprets foot pressure signals to provide actionable coaching, thereby supporting independent practice. Through model evaluation and user studies, we demonstrate that the proposed method generates expert-level feedback and outperforms pose-based approaches. Furthermore, the user study shows that the feedback helps athletes identify body parts requiring correction and enhances their motivation for training.
2026 | Toshihiro Hirano et al. | Institute of Science Tokyo | Human Pose & Activity Recognition | Generative AI (Text, Image, Music, Video) | Fitness Tracking & Physical Activity Monitoring | CHI

Chromotion: Controlling Motion-Induced Color on Object Motion Paths via High-Speed Temporal Additive Projection
We present Chromotion, a high-speed projection method that renders intended colors along the motion trajectories of moving objects. When an object moves across a temporally multiplexed sequence, its occlusion of the projected patterns can, through persistence of vision, produce motion-dependent colors along its path. Chromotion exploits this phenomenon by decomposing each static image into a short sequence in which target color frames are interleaved with a single complementary color frame. This temporal design allows moving objects to sample the sequence so that the perceived color along their motion paths converges to the target color, while stationary regions still integrate to the original static color. We built a prototype and conducted a camera-based technical evaluation and user evaluations. The results show that Chromotion reliably produces the target color on motion trajectories without degrading static color fidelity. Because the approach requires no body or gaze tracking and no decoding of embedded information, it scales to public settings and supports multiuser and multimodal interactions. We also discuss limitations and outline application scenarios such as public, ambient displays that blend into the environment.
2026 | Shio Miyafuji et al. | Institute of Science Tokyo | Digital Signage & Ambient Displays | Interactive Floors & Spatial Interfaces | CHI

PiaMuscle: Improving Piano Skill Acquisition by Cost-effectively Estimating and Visualizing Activities of Miniature Hand Muscles
Understanding neuromusculoskeletal mechanisms significantly impacts skill specialization and proficiency. While existing methods can infer large muscle activities during gross motor movements, the estimation of dexterous motor control involving miniature muscles remains underexplored. Targeting the coordinated hand muscles in advanced piano performance, we learn spatiotemporal discrete representations of electromyography (EMG) data and hand postures utilizing a multimodal dataset. Subsequently, we train a precise and cost-effective neural network model. Based on this model, PiaMuscle is introduced to investigate if visualizing muscle activities during piano training enhances piano performance. Quantitative and qualitative results of a user study with highly skilled professional pianists demonstrate that PiaMuscle provides reliable muscle activation data to support and optimize force control. Our research underscores the potential of a naturalistic workflow to estimate small muscles' activities from readily accessible human-centric information and more accurately when combined with tool-centric data, thereby enhancing skill acquisition.
2025 | Ruofan Liu et al. | Tokyo Institute of Technology, School of Computing; Sony Computer Science Laboratories Inc. | Human Pose & Activity Recognition | Biosensors & Physiological Monitoring | CHI

SolePoser: Real-Time 3D Human Pose Estimation using Insole Pressure Sensors
We propose SolePoser, a real-time 3D pose estimation system that leverages only a single pair of insole sensors. Unlike conventional methods relying on fixed cameras or bulky wearable sensors, our approach offers minimal and natural setup requirements. The proposed system utilizes pressure and IMU sensors embedded in insoles to capture the body weight's pressure distribution at the feet and its 6 DoF acceleration. This information is used to estimate the 3D full-body joint position by a two-stream transformer network. A novel double-cycle consistency loss and a cross-attention module are further introduced to learn the relationship between 3D foot positions and their pressure distributions. We also introduce two different datasets of sports and daily exercises, offering 908k frames across eight different activities. Our experiments show that our method performs on par with top-performing approaches that utilize more IMUs, and that it even outperforms third-person-view camera-based methods in certain scenarios.
2024 | Erwin Wu et al. | Foot & Wrist Interaction | Human Pose & Activity Recognition | UIST

MOSion: Gaze Guidance with Motion-triggered Visual Cues by Mosaic Patterns
We propose a gaze-guiding method called MOSion that adjusts the guiding strength in response to observers’ motion, based on a high-speed projector and the afterimage effect in the human visual system. Our method decomposes the target area into mosaic patterns to embed visual cues in the perceived images. The patterns direct the attention of only moving observers to the target area. A stationary observer sees the original image with little distortion because of light integration in visual perception. Precomputing the patterns provides the adaptive guiding effect without tracking devices or movement-dependent computational costs. The evaluation and the user study show that the mosaic decomposition enhances the perceived saliency with a few visual artifacts, especially in moving conditions. Our method, embedded in white light, works in various situations such as planar posters, advertisements, and curved objects.
2024 | Arisa Kohtani et al. | Tokyo Institute of Technology | Eye Tracking & Gaze Interaction | Visualization Perception & Cognition | CHI

MR Microsurgical Suture Training System with Level-Appropriate Support
The integration of advanced technologies in healthcare necessitates the development of systems accommodating the daily routines in medical practices. Neurosurgeons, in particular, require extensive practice in microsurgical suturing in the long term, even in the busy routine of a medical practice. This study collaboratively developed a Mixed Reality system with neurosurgeons to support self-training in microscopic suturing. Based on the neurosurgeons' opinions, we implemented a level-appropriate microsurgical suture training system. For novices, the system offers shadow-matching training to support the practice of precise movements under the high-sensitivity environment of the microscope. For intermediates, it provides a real-time feedback system, which allows users to practice attention to details. Evaluation involved testing the novice system on students with no medical background and the intermediate system on neurosurgery residents. The effectiveness of the system was demonstrated through the experimental results and subsequent discussion.
2024 | Yuka Tashiro et al. | Tokyo Institute of Technology | Mixed Reality Workspaces | VR Medical Training & Rehabilitation | Robots in Education & Healthcare | CHI

VabricBeads: Variable Stiffness Structured Fabric using Artificial Muscle in Woven Beads
Woven beads, a structured fabric category, comprises interconnected rows of beads joined by fiber strands. While the stiffness of woven beads can be adjusted by relying on fiber tension during fabrication, the resulting shape and stiffness properties remain fixed. This study explores the potential of tunable shape and stiffness in woven beads, offering adaptability in comfort, functionality, and form factor. By leveraging Pneumatic Artificial Muscles (PAMs), we employ a state-of-the-art technique for dynamically modulating fabric stiffness through mechanical constraints in bead form. This approach enables a modular and scalable fabrication process, fostering programmability in mechanical properties. Our investigation encompasses diverse bead iterations and stitching patterns to broaden their applicability in fabric behavior including degree of freedom, stretchability, permeability, and textures. We evaluate the mechanical properties to differentiate design capabilities, and present techniques for locally adjusting stiffness. We showcase the versatility through applications, including variable stiffness wearables and shape-changing everyday objects.
2024 | Jefferson Pardomuan et al. | Tokyo Institute of Technology | Haptic Wearables | Shape-Changing Interfaces & Soft Robotic Materials | Shape-Changing Materials & 4D Printing | CHI

OmniSense: Exploring Novel Input Sensing and Interaction Techniques on Mobile Device with an Omni-Directional Camera
An omni-directional (360°) camera captures the entire viewing sphere surrounding its optical center. Such cameras are growing in use to create highly immersive content and viewing experiences. When such a camera is held by a user, the view includes the user's hand grip, finger, body pose, face, and the surrounding environment, providing a complete understanding of the visual world and context around it. This capability opens up numerous possibilities for rich mobile input sensing. In OmniSense, we explore the broad input design space for mobile devices with a built-in omni-directional camera and broadly categorize them into three sensing pillars: i) near device ii) around device and iii) surrounding device. In addition we explore potential use cases and applications that leverage these sensing capabilities to solve user needs. Following this, we develop a working system to put these concepts into action, by leveraging these sensing capabilities to enable potential use cases and applications. We studied the system in a technical evaluation and a preliminary user study to gain initial feedback and insights. Collectively these techniques illustrate how a single, omni-purpose sensor on a mobile device affords many compelling ways to enable expressive input, while also affording a broad range of novel applications that improve user experience during mobile interaction.
2023 | Hui-Shyong Yeo et al. | Huawei | Eye Tracking & Gaze Interaction | Immersion & Presence Research | 360° Video & Panoramic Content | CHI

MonoEye: Multimodal Human Motion Capture System Using A Single Ultra-Wide Fisheye Camera
We present MonoEye, a multimodal human motion capture system using a single RGB camera with an ultra-wide fisheye lens, mounted on the user’s chest. Existing optical motion capture systems use multiple cameras, which are synchronized and require camera calibration. These systems also have usability constraints that limit the user’s movement and operating space. Since the MonoEye system is based on a wearable single RGB camera, the wearer’s 3D body pose can be captured without space and environment limitations. The body pose, captured with our system, is aware of the camera orientation and therefore it is possible to recognize various motions that existing egocentric motion capture systems cannot recognize. Furthermore, the proposed system captures not only the wearer’s body motion but also their viewport using the head pose estimation and an ultra-wide image. To implement robust multimodal motion capture, we design three deep neural networks: BodyPoseNet, HeadPoseNet, and CameraPoseNet, that estimate 3D body pose, head pose, and camera pose in real-time, respectively. We train these networks with our new extensive synthetic dataset providing 680K frames of renderings of people with a wide range of body shapes, clothing, actions, backgrounds, and lighting conditions. To demonstrate the interactive potential of the MonoEye system, we present several application examples from common body gestural to context-aware interactions.
2020 | Dong-Hyun Hwang et al. | Full-Body Interaction & Embodied Input | Human Pose & Activity Recognition | Context-Aware Computing | UIST

Back-Hand-Pose: 3D Hand Pose Estimation for a Wrist-worn Camera via Dorsum Deformation Network
The automatic recognition of how people use their hands and fingers in natural settings – without instrumenting the fingers – can be useful for many mobile computing applications. To achieve such an interface, we propose a vision-based 3D hand pose estimation framework using a wrist-worn camera. The main challenge is the oblique angle of the wrist-worn camera, which makes the fingers scarcely visible. To address this, a special network that observes deformations on the back of the hand is required. We introduce DorsalNet, a two-stream convolutional neural network to regress finger joint angles from spatio-temporal features of the dorsal hand region (the movement of bones, muscle, and tendons). This work is the first vision-based real-time 3D hand pose estimator using visual features from the dorsal hand region. Our system achieves a mean joint-angle error of 8.81° for user-specific models and 9.77° for a general model. Further evaluation shows that our system outperforms previous work with an average of 20% higher accuracy in recognizing dynamic gestures, and achieves a 75% accuracy of detecting 11 different grasp types. We also demonstrate 3 applications which employ our system as a control device, an input device, and a grasped object recognizer.
2020 | Erwin Wu et al. | Hand Gesture Recognition | Human Pose & Activity Recognition | UIST

OmniGlobeVR: A Collaborative 360-Degree Communication System for VR
In this paper, we present a novel collaboration tool, OmniGlobeVR, which is an asymmetric system that supports communication and collaboration between a VR user (occupant) and multiple non-VR users (designers) across the virtual and physical platform. OmniGlobeVR allows designer(s) to explore the VR space from any point of view using two view modes: a 360° first-person mode and a third-person mode. In addition, a shared gaze awareness cue is provided to further enhance communication between the occupant and the designer(s). Finally, the system has a face window feature that allows designer(s) to share their facial expressions and upper body view with the occupant for exchanging and expressing information using nonverbal cues. We conducted a user study to evaluate the OmniGlobeVR, comparing three conditions: (1) first-person mode with the face window, (2) first-person mode with a solid window, and (3) third-person mode with the face window. We found that the first-person mode with the face window required significantly less mental effort, and provided better spatial presence, usability, and understanding of the partner’s focus. We discuss the design implications of these results and directions for future research.
2020 | Zhengqing Li et al. | Social & Collaborative VR | Immersion & Presence Research | DIS

Opisthenar: Hand Poses and Finger Tapping Recognition by Observing Back of Hand Using Embedded Wrist Camera
We introduce a vision-based technique to recognize hand poses and gestures by simply observing changes on the back of the hand. Our approach employs a camera on the wrist, which we envisage can be included in a wrist-worn device such as a smartwatch, fitness tracker or wristband. However, in this configuration the fingers are occluded from the view of the camera. The oblique angle and placement of the camera make typical vision-based techniques difficult to adopt. Our alternative approach observes small changes and movements in the shape, tendons, skin and bone on the back of the hand. We use a deep neural network to train and recognize both static hand poses and dynamic gestures. While this is a challenging configuration for sensing, we tested the recognition with a real-time user test and can achieve a high recognition rate of 89.4% (static) and 67.5% (dynamic). Our results further demonstrate that our approach can generalize across sessions and to new users. Namely, users can remove and replace the wrist-worn device while new users can employ a previously trained system, to a certain extent. This form of sensing affords a range of new interaction capabilities from one-handed to subtle inputs or eyes-free to orientation invariant interactions.
2019 | Hui-Shyong Yeo et al. | Foot & Wrist Interaction | Eye Tracking & Gaze Interaction | UIST