Synthia: Visually Interpreting and Synthesizing Feedback for Writing Revision
While recent advances in HCI and generative AI have improved authors' access to feedback on their work, the abundance of critiques can overwhelm writers and obscure actionable insights. We introduce Synthia, a system that visually scaffolds feedback-based writing revision with LLM-powered synthesis. Synthia helps authors strategize their revisions by breaking down large feedback collections into interactive visual bubbles that can be clustered, colored, and resized to reveal patterns and highlight valuable suggestions. Bidirectional highlighting links each feedback unit to its original context and relevant parts of the text. Writers can selectively combine feedback units to generate alternative drafts, enabling rapid, parallel exploration of revision possibilities. These interactions support feedback curation, interpretation, and experimentation throughout the revision process. A within-subjects study (N=12) showed that Synthia helped participants identify more helpful feedback, explore more diverse revisions, and revise with greater intentionality and transparency than a GPT-4-based writing interface.
2025 · Ruidong Zhang et al. · Generative AI (Text, Image, Music, Video) · Human-LLM Collaboration · Interactive Data Visualization · UIST

SeamPose: Repurposing Seams as Capacitive Sensors in a Shirt for Upper-Body Pose Tracking
Seams are areas of overlapping fabric formed by stitching two or more pieces of fabric together in the cut-and-sew apparel manufacturing process. In SeamPose, we repurposed seams as capacitive sensors in a shirt for continuous upper-body pose estimation. Compared to previous all-textile motion-capturing garments that place electrodes on the clothing surface, our solution leverages the existing seams inside a shirt by machine-sewing insulated conductive threads over them. Because the seams are unobtrusive and already well placed, the sensing shirt looks and wears like a conventional shirt while providing exciting pose-tracking capabilities. To validate this approach, we implemented a proof-of-concept untethered shirt with 8 capacitive sensing seams. In a 12-participant user study, our customized deep-learning pipeline accurately estimated upper-body 3D joint positions relative to the pelvis with a mean per joint position error (MPJPE) of 6.0 cm. SeamPose represents a step towards the unobtrusive integration of smart clothing for everyday pose estimation.
2024 · Tianhong Catherine Yu et al. · Haptic Wearables · Human Pose & Activity Recognition · Biosensors & Physiological Monitoring · UIST

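For context on the metric reported above: mean per joint position error (MPJPE) is the Euclidean distance between predicted and ground-truth 3D joint positions, averaged over joints and frames, with both poses expressed relative to a root joint (here, the pelvis). The following is a minimal sketch of that computation; the array shapes and synthetic data are illustrative, and this is not the SeamPose evaluation code.

    import numpy as np

    def mpjpe(pred, gt, root=0):
        """Mean per joint position error, in the same units as the inputs.

        pred, gt: arrays of shape (frames, joints, 3) with 3D joint positions.
        root: index of the root joint (e.g., the pelvis) used to make positions relative.
        """
        pred_rel = pred - pred[:, root:root + 1, :]   # positions relative to the root joint
        gt_rel = gt - gt[:, root:root + 1, :]
        per_joint_err = np.linalg.norm(pred_rel - gt_rel, axis=-1)  # (frames, joints)
        return per_joint_err.mean()

    # Toy example with synthetic data in centimeters.
    gt = np.random.rand(100, 10, 3) * 50
    pred = gt + np.random.randn(100, 10, 3) * 2
    print(mpjpe(pred, gt))
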
EchoWrist: Continuous Hand Pose Tracking and Hand-Object Interaction Recognition Using Low-Power Active Acoustic Sensing On a Wristband
Our hands serve as a fundamental means of interaction with the world around us. Therefore, understanding hand poses and interaction contexts is critical for human-computer interaction (HCI). We present EchoWrist, a low-power wristband that continuously estimates 3D hand poses and recognizes hand-object interactions using active acoustic sensing. EchoWrist is equipped with two speakers emitting inaudible sound waves toward the hand. These sound waves interact with the hand and its surroundings through reflections and diffractions, carrying rich information about the hand's shape and the objects it interacts with. The information captured by the two microphones goes through a deep learning inference system that recovers hand poses and identifies various everyday hand activities. Results from two 12-participant user studies show that EchoWrist is effective and efficient at tracking 3D hand poses and recognizing hand-object interactions. Operating at 57.9 mW, EchoWrist can continuously reconstruct 20 3D hand joints with an MJEDE of 4.81 mm and recognize 12 naturalistic hand-object interactions with 97.6% accuracy.
2024 · Chi-Jung Lee et al. · Cornell University · Hand Gesture Recognition · Foot & Wrist Interaction · CHI

EyeEcho: Continuous and Low-power Facial Expression Tracking on Glasses
In this paper, we introduce EyeEcho, a minimally-obtrusive acoustic sensing system designed to enable glasses to continuously monitor facial expressions. It utilizes two pairs of speakers and microphones mounted on glasses to emit encoded inaudible acoustic signals directed towards the face, capturing subtle skin deformations associated with facial expressions. The reflected signals are processed through a customized machine-learning pipeline to estimate full facial movements. EyeEcho samples at 83.3 Hz with a relatively low power consumption of 167 mW. Our user study involving 12 participants demonstrates that, with just four minutes of training data, EyeEcho achieves highly accurate tracking performance across different real-world scenarios, including sitting, walking, and after remounting the device. Additionally, a semi-in-the-wild study involving 10 participants further validates EyeEcho's performance in naturalistic scenarios while participants engage in various daily activities. Finally, we showcase EyeEcho's potential to be deployed on a commercial-off-the-shelf (COTS) smartphone, offering real-time facial expression tracking.
2024 · Ke Li et al. · Cornell University · Hand Gesture Recognition · Eye Tracking & Gaze Interaction · Human Pose & Activity Recognition · CHI

PoseSonic: 3D Upper Body Pose Estimation Through Egocentric Acoustic Sensing on Smartglasses
In this paper, we introduce PoseSonic, an intelligent acoustic sensing solution for smartglasses that estimates upper body poses. Our system only requires two pairs of microphones and speakers on the hinges of the eyeglasses to emit FMCW-encoded inaudible acoustic signals and receive reflected signals for body pose estimation. Using a customized deep learning model, PoseSonic estimates the 3D positions of 9 body joints, including the shoulders, elbows, wrists, hips, and nose. We adopt a cross-modal supervision strategy to train our model using synchronized RGB video frames as ground truth. We conducted in-lab and semi-in-the-wild user studies with 22 participants to evaluate PoseSonic; our user-independent model achieved a mean per joint position error of 6.17 cm in the lab setting and 14.12 cm in the semi-in-the-wild setting when predicting the 9 body joint positions in 3D. Further studies show that performance was not significantly impacted by different surroundings, by remounting the device, or by real-world environmental noise. Finally, we discuss the opportunities, challenges, and limitations of deploying PoseSonic in real-world applications.
2023 · Saif Mahmud et al. · Human Pose & Activity Recognition · Context-Aware Computing · UbiComp · https://doi.org/10.1145/3610895

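The abstract above mentions FMCW-encoded inaudible signals: a frequency-modulated continuous wave sweeps linearly across a frequency band, and the delays of its reflections off nearby surfaces (such as the body) can be recovered by correlating the received audio with the transmitted sweep. The sketch below illustrates that idea only; the sample rate, sweep length, and frequency band are assumptions, not PoseSonic's actual parameters or code.

    import numpy as np
    from scipy.signal import chirp

    fs = 48_000                          # assumed audio sample rate (Hz)
    sweep_dur = 0.01                     # one 10 ms sweep
    t = np.arange(int(fs * sweep_dur)) / fs

    # Linear frequency sweep in a near-ultrasonic (inaudible) band.
    tx = chirp(t, f0=18_000, f1=21_000, t1=sweep_dur, method="linear")

    # Echo profile: cross-correlate a received frame with the transmitted sweep;
    # peaks correspond to reflection paths with different delays.
    rx = np.roll(tx, 40) + 0.05 * np.random.randn(tx.size)   # toy received frame
    echo_profile = np.correlate(rx, tx, mode="full")
    print(int(np.argmax(echo_profile)) - (tx.size - 1))      # roughly a 40-sample delay
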
VRoxy: Wide-Area Collaboration From an Office Using a VR-Driven Robotic Proxy
Recent research in robotic proxies has demonstrated that many non-verbal cues important in co-located collaboration can be reproduced automatically. However, such systems often require a symmetrical hardware setup in each location. We present the VRoxy system, designed to enable access to remote spaces through a robotic embodiment, using a VR headset in a much smaller space, such as a personal office. VRoxy maps small movements in VR space to larger movements in the physical space of the robot, allowing the user to navigate large physical spaces easily. Using VRoxy, the VR user can quickly explore and navigate a low-fidelity rendering of the remote space. Upon the robot's arrival, the system uses the feed of a 360° camera to support real-time interactions. The system also facilitates various interaction modalities by rendering micro-mobility around shared spaces, head and facial animations, and pointing gestures on the proxy. We demonstrate how our system can accommodate mapping multiple physical locations onto a unified virtual space. In a formative study, users could complete a design decision task in which they navigated and collaborated in a complex 7.5 m × 5 m layout using a 3 m × 2 m VR space.
2023 · Mose Sakashita et al. · Teleoperation & Telepresence · UIST

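The abstract does not spell out how VR movements are amplified, so the following is only a hedged illustration of the general idea of mapping a small VR walking area onto a larger physical room by scaling displacements. The function name, the uniform gain, and the use of the study's room dimensions are illustrative assumptions, not VRoxy's actual mapping.

    # Illustrative only: scale a position in a 3 m x 2 m VR area onto a 7.5 m x 5 m room.
    VR_AREA = (3.0, 2.0)       # usable VR space, meters (assumed from the study setup)
    ROBOT_AREA = (7.5, 5.0)    # physical room traversed by the robotic proxy, meters

    def vr_to_robot(x_vr: float, y_vr: float) -> tuple[float, float]:
        """Map a VR-space position (meters) to a robot-space navigation target (meters)."""
        gain_x = ROBOT_AREA[0] / VR_AREA[0]
        gain_y = ROBOT_AREA[1] / VR_AREA[1]
        return x_vr * gain_x, y_vr * gain_y

    print(vr_to_robot(1.5, 1.0))   # center of the VR area maps to the center of the room
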
C-Auth: Exploring the Feasibility of Using Egocentric View of Face Contour for User Authentication on Glasses
In this paper, we present C-Auth, a novel authentication method for smart glasses that explores the feasibility of authenticating users using spatial facial information. Our system uses a down-facing camera in the middle of the glasses to capture facial contour lines from the nose and cheeks. The images captured by the camera are then processed and learned by a customized algorithm for authentication. To evaluate the system, we conducted a user study with 20 participants in three sessions on different days. Our system correctly identified the 20 users with a true positive rate of 98.0% (SD: 2.96%) and a false positive rate of 4.97% (SD: 2.88%) across all three days. We conclude by discussing current limitations and challenges as well as potential future applications for C-Auth.
2023 · Hyunchul Lim et al. · Passwords & Authentication · On-Skin Display & On-Skin Input · UbiComp

HPSpeech: Silent Speech Interface for Commodity Headphones
We present HPSpeech, a silent speech interface for commodity headphones. HPSpeech utilizes the existing speakers of the headphones to emit inaudible acoustic signals. The movements of the temporomandibular joint (TMJ) during speech modify the reflection pattern of these signals, which are captured by a microphone positioned inside the headphones. To evaluate the performance of HPSpeech, we tested it on two headphones with a total of 18 participants. The results demonstrated that HPSpeech successfully recognized 8 popular silent speech commands for controlling the music player with an accuracy of over 90%. While our tests use modified commodity hardware (both with and without active noise cancellation), our results show that sensing the movement of the TMJ could be as simple as a firmware update for ANC headsets, which already include a microphone inside the ear cup. This leads us to believe that this technique has great potential for rapid deployment in the near future. We further discuss the challenges that need to be addressed before deploying HPSpeech at scale.
2023 · Ruidong Zhang et al. · Voice User Interface (VUI) Design · UbiComp

EchoNose: Sensing Mouth, Breathing and Tongue Gestures inside Oral Cavity using a Non-contact Nose Interface
Sensing movements and gestures inside the oral cavity has been a long-standing challenge for the wearable research community. This paper introduces EchoNose, a novel nose interface that explores a unique sensing approach to recognize gestures related to the mouth, breathing, and tongue by analyzing acoustic signal reflections inside the nasal and oral cavities. The interface incorporates a speaker and a microphone placed at the nostrils, emitting inaudible acoustic signals and capturing the corresponding reflections. The received signals are processed using a customized data processing and machine learning pipeline, enabling the distinction of 16 gestures involving speech, tongue, and breathing. A user study with 10 participants demonstrates that EchoNose achieves an average accuracy of 93.7% in recognizing these 16 gestures. Based on these promising results, we discuss the potential opportunities and challenges associated with applying this innovative nose interface in various future applications.
2023 · Rujia Sun et al. · Electrical Muscle Stimulation (EMS) · Hand Gesture Recognition · Brain-Computer Interface (BCI) & Neurofeedback · UbiComp

EchoSpeech: Continuous Silent Speech Recognition on Minimally-obtrusive Eyewear Powered by Acoustic Sensing
We present EchoSpeech, a minimally-obtrusive silent speech interface (SSI) powered by low-power active acoustic sensing. EchoSpeech uses speakers and microphones mounted on a glasses frame and emits inaudible sound waves towards the skin. By analyzing echoes from multiple paths, EchoSpeech captures subtle skin deformations caused by silent utterances and uses them to infer silent speech. In a user study with 12 participants, we demonstrate that EchoSpeech can recognize 31 isolated commands and 3-6 figure connected digits with 4.5% (std 3.5%) and 6.1% (std 4.2%) Word Error Rate (WER), respectively. We further evaluated EchoSpeech under scenarios including walking and noise injection to test its robustness. We then demonstrated EchoSpeech in real-time demo applications operating at 73.3 mW, with the real-time pipeline implemented on a smartphone and only 1-6 minutes of training data. We believe that EchoSpeech takes a solid step towards minimally-obtrusive wearable SSI for real-life deployment.
2023 · Ruidong Zhang et al. · Cornell University · Vibrotactile Feedback & Skin Stimulation · Voice User Interface (VUI) Design · Biosensors & Physiological Monitoring · CHI

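Word Error Rate (WER), reported above, is the word-level edit distance between a recognized transcript and the reference transcript, divided by the number of reference words. A minimal sketch of that computation follows; it is a generic implementation, not the EchoSpeech evaluation code.

    def word_error_rate(reference: str, hypothesis: str) -> float:
        """WER = (substitutions + insertions + deletions) / number of reference words."""
        ref, hyp = reference.split(), hypothesis.split()
        # Standard dynamic-programming edit distance over words.
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i
        for j in range(len(hyp) + 1):
            d[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # substitution or match
        return d[len(ref)][len(hyp)] / len(ref)

    print(word_error_rate("three one four one five", "three one four nine five"))  # 0.2
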
ReMotion: Supporting Remote Collaboration in Open Space with Automatic Robotic Embodiment
Design activities, such as brainstorming or critique, often take place in open spaces combining whiteboards and tables to present artefacts. In co-located settings, peripheral awareness enables participants to understand each other's locus of attention with ease. However, these spatial cues are mostly lost when using videoconferencing tools. Telepresence robots could bring back a sense of presence, but controlling them is distracting. To address this problem, we present ReMotion, a fully automatic robotic proxy designed to explore a new way of supporting non-collocated open-space design activities. ReMotion combines a commodity body tracker (Kinect) to capture a user's location and orientation over a wide area with a minimally invasive wearable system (NeckFace) to capture facial expressions. Due to its omnidirectional platform, the ReMotion embodiment can render a wide range of body movements. A formative evaluation indicated that our system enhances the sharing of attention and the sense of co-presence, enabling seamless movement in space during a design review task.
2023 · Mose Sakashita et al. · Cornell University · Human-Robot Collaboration (HRC) · Teleoperation & Telepresence · CHI

Configuring Audiences: A Case Study of Email Communication
When people communicate with each other, their choice of what to say is tied to their perceptions of the audience. For many communication channels, people have some ability to explicitly specify their audience members and the different roles they can play. While existing accounts of communication behavior have largely focused on how people tailor the content of their messages, we focus on the configuring of the audience as a complementary family of decisions in communication. We formulate a general description of audience configuration choices, highlighting key aspects of the audience that could be configured to reflect a range of communicative goals. We then illustrate these ideas via a case study of email usage---a realistic domain where audience configuration choices are particularly fine-grained and explicit in how email senders fill the To and Cc address fields. In a large collection of enterprise emails, we explore how people configure their audiences, finding salient patterns relating a sender's choice of configuration to the types of participants in the email exchange, the content of the message, and the nature of the subsequent interactions. Our formulation and findings illustrate how audience configurations could be analyzed as meaningful communication choices, and frame research directions on audience configuration decisions in communication and collaboration.
2020 · Ruidong Zhang et al. · Conversation and Communication · CSCW

Characterizing online public discussions through patterns of participant interactions
Public discussions on social media platforms are an intrinsic part of online information consumption. Characterizing the diverse range of discussions that can arise is crucial for these platforms, as they may seek to organize and curate them. This paper introduces a computational framework to characterize public discussions, relying on a representation that captures a broad set of social patterns which emerge from the interactions between interlocutors, comments and audience reactions. We apply our framework to study public discussions on Facebook at two complementary scales. First, we use it to predict the eventual trajectory of individual discussions, anticipating future antisocial actions (such as participants blocking each other) and forecasting a discussion's growth. Second, we systematically analyze the variation of discussions across thousands of Facebook sub-communities, revealing subtle differences (and unexpected similarities) in how people interact when discussing online content. We further show that this variation is driven more by participant tendencies than by the content triggering these discussions.
2018 · Ruidong Zhang et al. · Online Discussion and Engagement · CSCW