IMUCoCo: Enabling Flexible On-Body IMU Placement for Human Pose Estimation and Activity RecognitionIMUs are regularly used to sense human motion, recognize activities, and estimate full-body pose. Users are typically required to place sensors in predefined locations that are often dictated by common wearable form factors and the machine learning model's training process. Consequently, despite the increasing number of everyday devices equipped with IMUs, the limited adaptability has seriously constrained the user experience to only using a few well-explored device placements (e.g., wrist and ears). In this paper, we rethink IMU-based motion sensing by acknowledging that signals can be captured from any point on the human body. We introduce IMU over Continuous Coordinates (IMUCoCo), a novel framework that maps signals from a variable number of IMUs placed on the body surface into a unified feature space based on their spatial coordinates. These features can be plugged into downstream models for pose estimation and activity recognition. Our evaluations demonstrate that IMUCoCo supports accurate pose estimation in a wide range of typical and atypical sensor placements. Overall, IMUCoCo supports significantly more flexible use of IMUs for motion sensing than the state-of-the-art, allowing users to place their sensor-laden devices according to their needs and preferences. The framework also supports the ability to change device locations depending on the context and suggests placement depending on the use case.2025HZHaozhe Zhou et al.Human Pose & Activity RecognitionUIST
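The paper's encoder is not reproduced here, but its core idea, fusing each IMU's reading with an encoding of its body-surface coordinates so a downstream model can accept any number of sensors at any placement, can be sketched as follows (all function names, the sinusoidal encoding, and the mean pooling are illustrative assumptions, not the published architecture):

```python
import math

def positional_encoding(coord, dims=8):
    """Sinusoidal encoding of a 3D body-surface coordinate (x, y, z in [0, 1])."""
    enc = []
    for c in coord:
        for k in range(dims // 2):
            freq = 2.0 ** k * math.pi
            enc.append(math.sin(freq * c))
            enc.append(math.cos(freq * c))
    return enc

def encode_imu(coord, accel, gyro):
    """Fuse one IMU's reading with its placement encoding into a fixed-size feature."""
    return positional_encoding(coord) + list(accel) + list(gyro)

def unified_features(imus):
    """Mean-pool a variable number of per-IMU features into one fixed-size vector,
    so the output shape is independent of sensor count and placement."""
    feats = [encode_imu(c, a, g) for c, a, g in imus]
    n, d = len(feats), len(feats[0])
    return [sum(f[i] for f in feats) / n for i in range(d)]

# Two IMUs: one on the wrist, one on the ankle (coordinates are illustrative).
wrist = ((0.9, 0.4, 0.1), (0.1, -9.8, 0.2), (0.01, 0.02, 0.0))
ankle = ((0.5, 0.05, 0.2), (0.0, -9.7, 0.1), (0.3, 0.0, 0.1))
vec = unified_features([wrist, ankle])
```

Because the pooled vector has a fixed size, the same downstream pose or activity model can serve one device or many.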
Scaling Context-Aware Task Assistants that Learn from Demonstration and Adapt through Mixed-Initiative DialogueDaily tasks such as cooking, machine operation, and medical self-care often require context-aware assistance, yet existing systems are hard to scale due to high training costs and unpredictable and imperfect performance. This work introduces the PrISM framework, which streamlines the process of creating an assistant for users' own tasks using demonstration and dialogue. First, our tracking algorithm effectively learns sensor representation for steps in procedures from a single demonstration. Second, and critically, to tackle the challenges of sensing imperfections and unpredictable user behaviors, we implement a dialogue-based context adaptation mechanism. The dialogue refines the system's understanding in real time, thereby reducing errors such as inappropriate responses to user queries. Evaluated through multiple studies involving several examples of daily tasks in a user's life, our approach demonstrates improved step-tracking accuracy, enhanced user interaction, and an improved sense of collaboration. These results promise a scalable, multimodal, context-aware assistant that effectively bridges the gap between human guidance and adaptive support in diverse real-world applications.2025RARiku Arakawa et al.Voice User Interface (VUI) DesignContext-Aware ComputingUbiquitous ComputingUIST
JoulesEye: Energy Expenditure Estimation and Respiration Sensing from Thermal Imagery While ExercisingAdhikary et al. developed JoulesEye, a thermal-imaging system that analyzes the body's thermal signature to estimate energy expenditure and monitor respiration rate during exercise, offering a contact-free approach to health monitoring.2024RARishiraj Adhikary et al.Fitness Tracking & Physical Activity MonitoringBiosensors & Physiological MonitoringUbiComp
Kirigami: Lightweight Speech Filtering for Privacy-preserving Activity Recognition using AudioBoovaraghavan et al. designed Kirigami, a lightweight speech-filtering framework that filters out sensitive speech features, enabling audio-based activity recognition while protecting user privacy.2024SBSudershan Boovaraghavan et al.Privacy by Design & User ControlUbiComp
PrISM-Observer: Intervention Agent to Help Users Perform Everyday Procedures Sensed using a SmartwatchWe routinely perform procedures (such as cooking) that include a set of atomic steps. Often, inadvertent omission or misordering of a single step can lead to serious consequences, especially for those experiencing cognitive challenges such as dementia. This paper introduces PrISM-Observer, a smartwatch-based, context-aware, real-time intervention system designed to support daily tasks by preventing errors. Unlike traditional systems that require users to seek out information, the agent observes user actions and intervenes proactively. This capability is enabled by the agent's ability to continuously update its belief in the user's behavior in real-time through multimodal sensing and forecast optimal intervention moments and methods. We first validated the steps-tracking performance of our framework through evaluations across three datasets with different complexities. Then, we implemented a real-time agent system using a smartwatch and conducted a user study in a cooking task scenario. The system generated helpful interventions, and we gained positive feedback from the participants. The general applicability of PrISM-Observer to daily tasks promises broad applications, for instance, including support for users requiring more involved interventions, such as people with dementia or post-surgical patients.2024RARiku Arakawa et al.Fitness Tracking & Physical Activity MonitoringElderly Care & Dementia SupportContext-Aware ComputingUIST
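The "continuously update its belief in the user's behavior" described above is, at its core, a Bayes filter over procedure steps. A minimal sketch, with step names, transition probabilities, and sensor likelihoods all invented for illustration:

```python
STEPS = ["fill_kettle", "boil_water", "pour_water"]

# P(next step | current step): mostly stay, sometimes advance (illustrative numbers).
TRANSITION = {
    "fill_kettle": {"fill_kettle": 0.7, "boil_water": 0.3, "pour_water": 0.0},
    "boil_water":  {"fill_kettle": 0.0, "boil_water": 0.8, "pour_water": 0.2},
    "pour_water":  {"fill_kettle": 0.0, "boil_water": 0.0, "pour_water": 1.0},
}

def update_belief(belief, likelihood):
    """One predict-then-correct step: propagate the belief through the
    step-transition model, weight by the sensor likelihood, renormalize."""
    predicted = {s: sum(belief[p] * TRANSITION[p][s] for p in STEPS) for s in STEPS}
    posterior = {s: predicted[s] * likelihood[s] for s in STEPS}
    z = sum(posterior.values())
    return {s: posterior[s] / z for s in STEPS}

belief = {"fill_kettle": 1.0, "boil_water": 0.0, "pour_water": 0.0}
# Smartwatch HAR output suggests the kettle is now boiling.
belief = update_belief(belief, {"fill_kettle": 0.1, "boil_water": 0.8, "pour_water": 0.1})
```

An intervention agent can then act when the belief mass concentrates on a step where an omission would be costly.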
Bring Privacy To The Table: Interactive Negotiation for Privacy Settings of Shared Sensing DevicesTo address privacy concerns with the Internet of Things (IoT) devices, researchers have proposed enhancements in data collection transparency and user control. However, managing privacy preferences for shared devices with multiple stakeholders remains challenging. We introduced ThingPoll, a system that helps users negotiate privacy configurations for IoT devices in shared settings. We designed ThingPoll by observing twelve participants verbally negotiating privacy preferences, from which we identified potentially successful and inefficient negotiation patterns. ThingPoll bootstraps a preference model from a custom crowdsourced privacy preferences dataset. During negotiations, ThingPoll strategically scaffolds the process by eliciting users’ privacy preferences, providing helpful contexts, and suggesting feasible configuration options. We evaluated ThingPoll with 30 participants negotiating the privacy settings of 4 devices. Using ThingPoll, participants reached an agreement in 97.5% of scenarios within an average of 3.27 minutes. Participants reported high overall satisfaction of 83.3% with ThingPoll as compared to baseline approaches.2024HZHaozhe Zhou et al.Carnegie Mellon UniversityPrivacy by Design & User ControlIoT Device PrivacyCHI
EITPose: Wearable and Practical Electrical Impedance Tomography for Continuous Hand Pose EstimationReal-time hand pose estimation has a wide range of applications spanning gaming, robotics, and human-computer interaction. In this paper, we introduce EITPose, a wrist-worn, continuous 3D hand pose estimation approach that uses eight electrodes positioned around the forearm to model its interior impedance distribution during pose articulation. Unlike wrist-worn systems relying on cameras, EITPose has a slim profile (12 mm thick sensing strap) and is power-efficient (consuming only 0.3 W of power), making it an excellent candidate for integration into consumer electronic devices. In a user study involving 22 participants, EITPose achieves a within-session mean per-joint positional error of 11.06 mm. Its camera-free design prioritizes user privacy, yet it maintains cross-session and cross-user accuracy levels comparable to camera-based wrist-worn systems, thus making EITPose a promising technology for practical hand pose estimation.2024AKAlexander Kyu et al.Human Computer Interaction InstituteHaptic WearablesHand Gesture RecognitionFoot & Wrist InteractionCHI
MI-Poser: Human Body Pose Tracking Using Magnetic and Inertial Sensor Fusion with Metal Interference MitigationInside-out tracking of human body poses using wearable sensors holds significant potential for AR/VR applications, such as remote communication through 3D avatars with expressive body language. Current inside-out systems often rely on vision-based methods utilizing handheld controllers or incorporating densely distributed body-worn IMU sensors. The former limits hands-free and occlusion-robust interactions, while the latter is plagued by inadequate accuracy and jittering. We introduce a novel body tracking system, MI-Poser, which employs AR glasses and two wrist-worn electromagnetic field (EMF) sensors to achieve high-fidelity upper-body pose estimation while mitigating metal interference. Our lightweight system demonstrates a minimal error (6.6 cm mean joint position error) with real-world data collected from 10 participants. It remains robust against various upper-body movements and operates efficiently at 60 Hz. Furthermore, by incorporating an IMU sensor co-located with the EMF sensor, MI-Poser presents solutions to counteract the effects of metal interference, which inherently disrupts the EMF signal during tracking. Our evaluation effectively showcases the successful detection and correction of interference using our EMF-IMU fusion approach across environments with diverse metal profiles. Ultimately, MI-Poser offers a practical pose tracking system, particularly suited for body-centric AR applications. https://doi.org/10.1145/3610891 2023RARiku Arakawa et al.Human Pose & Activity RecognitionUbiComp
LemurDx: Using Unconstrained Passive Sensing for an Objective Measurement of Hyperactivity in Children with no Parent InputHyperactivity is the most dominant presentation of Attention-Deficit/Hyperactivity Disorder in young children. Currently, measuring hyperactivity involves parents' or teachers' reports. These reports are vulnerable to subjectivity and can lead to misdiagnosis. LemurDx provides an objective measure of hyperactivity using passive mobile sensing. We collected data from 61 children (25 with hyperactivity) who wore a smartwatch for up to 7 days without changing their daily routine. The participants' parents maintained a log of the child's activities at a half-hour granularity (e.g., sitting, exercising) as contextual information. Our ML models achieved 85.2% accuracy in detecting hyperactivity in children (using parent-provided activity labels). We also built models that estimated children's context from the sensor data and did not rely on activity labels to reduce parent burden. These models achieved 82.0% accuracy in detecting hyperactivity. In addition, we interviewed five clinicians who suggested a need for a tractable risk score that enables analysis of a child's behavior across contexts. Our results show the feasibility of supporting the diagnosis of hyperactivity by providing clinicians with an interpretable and objective score of hyperactivity using off-the-shelf watches and adding no constraints to children or their guardians. https://dl.acm.org/doi/10.1145/3596244 2023RARiku Arakawa et al.Cognitive Impairment & Neurodiversity (Autism, ADHD, Dyslexia)Biosensors & Physiological MonitoringUbiComp
PrISM-Tracker: A Framework for Multimodal Procedure Tracking Using Wearable Sensors and State Transition Information with User-Driven Handling of Errors and UncertaintyA user often needs training and guidance while performing several daily life procedures, e.g., cooking, setting up a new appliance, or doing a COVID test. Watch-based human activity recognition (HAR) can track users' actions during these procedures. However, out of the box, state-of-the-art HAR struggles from noisy data and less-expressive actions that are often part of daily life tasks. This paper proposes PrISM-Tracker, a procedure-tracking framework that augments existing HAR models with (1) graph-based procedure representation and (2) a user-interaction module to handle model uncertainty. Specifically, PrISM-Tracker extends a Viterbi algorithm to update state probabilities based on time-series HAR outputs by leveraging the graph representation that embeds time information as prior. Moreover, the model identifies moments or classes of uncertainty and asks the user for guidance to improve tracking accuracy. We tested PrISM-Tracker in two procedures: latte-making in an engineering lab study and wound care for skin cancer patients at a clinic. The results showed the effectiveness of the proposed algorithm utilizing transition graphs in tracking steps and the efficacy of using simulated human input to enhance performance. This work is the first step toward human-in-the-loop intelligent systems for guiding users while performing new and complicated procedural tasks. https://dl.acm.org/doi/10.1145/3569504 2023RARiku Arakawa et al.Human Pose & Activity RecognitionBiosensors & Physiological MonitoringUbiComp
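PrISM-Tracker's extended decoder is not published in this abstract, but the standard Viterbi recurrence it builds on, picking the most likely step sequence from noisy per-frame HAR probabilities plus a transition graph, can be sketched like this (step names, transition weights, and observation probabilities are all toy values):

```python
def viterbi(obs_probs, trans, init):
    """Most likely state path given per-frame observation probabilities,
    a transition matrix, and an initial distribution (all as dicts)."""
    states = list(init)
    # score[s] = probability of the best path ending in s; back stores argmax pointers.
    score = {s: init[s] * obs_probs[0][s] for s in states}
    back = []
    for frame in obs_probs[1:]:
        prev, score, ptr = score, {}, {}
        for s in states:
            best = max(states, key=lambda p: prev[p] * trans[p][s])
            ptr[s] = best
            score[s] = prev[best] * trans[best][s] * frame[s]
        back.append(ptr)
    last = max(states, key=lambda s: score[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

trans = {"steam_milk": {"steam_milk": 0.8, "pull_shot": 0.2},
         "pull_shot":  {"steam_milk": 0.1, "pull_shot": 0.9}}
init = {"steam_milk": 0.9, "pull_shot": 0.1}
# Noisy HAR outputs: the middle frame momentarily favors the wrong step.
obs = [{"steam_milk": 0.8, "pull_shot": 0.2},
       {"steam_milk": 0.4, "pull_shot": 0.6},
       {"steam_milk": 0.7, "pull_shot": 0.3}]
path = viterbi(obs, trans, init)
```

Here the transition prior smooths over the noisy middle frame, which is exactly the benefit the abstract claims for graph-augmented decoding.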
VAX: Using Existing Video and Audio-based Activity Recognition Models to Bootstrap Privacy-Sensitive SensorsThe use of audio and video modalities for Human Activity Recognition (HAR) is common, given the richness of the data and the availability of pre-trained ML models using a large corpus of labeled training data. However, audio and video sensors also lead to significant consumer privacy concerns. Researchers have thus explored alternate modalities that are less privacy-invasive such as mmWave doppler radars, IMUs, motion sensors. However, the key limitation of these approaches is that most of them do not readily generalize across environments and require significant in-situ training data. Recent work has proposed cross-modality transfer learning approaches to alleviate the lack of trained labeled data with some success. In this paper, we generalize this concept to create a novel system called VAX (Video/Audio to 'X'), where training labels acquired from existing Video/Audio ML models are used to train ML models for a wide range of 'X' privacy-sensitive sensors. Notably, in VAX, once the ML models for the privacy-sensitive sensors are trained, with little to no user involvement, the Audio/Video sensors can be removed altogether to protect the user's privacy better. We built and deployed VAX in ten participants' homes while they performed 17 common activities of daily living. Our evaluation results show that after training, VAX can use its onboard camera and microphone to detect approximately 15 out of 17 activities with an average accuracy of 90%. For these activities that can be detected using a camera and a microphone, VAX trains a per-home model for the privacy-preserving sensors. These models (average accuracy = 84%) require no in-situ user input. In addition, when VAX is augmented with just one labeled instance for the activities not detected by the VAX A/V pipeline (~2 out of 17), it can detect all 17 activities with an average accuracy of 84%. Our results show that VAX is significantly better than a baseline supervised-learning approach of using one labeled instance per activity in each home (average accuracy of 79%) since VAX reduces the user burden of providing activity labels by 8x (~2 labels vs. 17 labels). https://doi.org/10.1145/3610907 2023PPPrasoon Patidar et al.Human Pose & Activity RecognitionBiosensors & Physiological MonitoringContext-Aware ComputingUbiComp
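The cross-modality transfer at the heart of VAX, using a pre-trained audio/video model's predictions as free training labels for a privacy-sensitive sensor, reduces to an ordinary supervised fit on pseudo-labels. A toy sketch with a nearest-centroid classifier (the data, activity names, and classifier choice are invented for illustration; VAX's actual models differ):

```python
def fit_centroids(features, pseudo_labels):
    """Train on sensor-X features, labeled by the A/V model instead of a human."""
    sums, counts = {}, {}
    for x, y in zip(features, pseudo_labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}

def predict(centroids, x):
    """After training, the camera/microphone can be removed; only sensor X is used."""
    dist = lambda c: sum((a - b) ** 2 for a, b in zip(c, x))
    return min(centroids, key=lambda y: dist(centroids[y]))

# Features from a privacy-preserving sensor (e.g., a vibration sensor), labeled
# by the A/V pipeline while it was still installed.
feats = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
labels = ["chopping", "chopping", "blender", "blender"]
model = fit_centroids(feats, labels)
```

Once the centroids are fit, classification needs only the privacy-preserving sensor, which is the point of removing the A/V hardware afterwards.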
uKnit: A Position-aware Reconfigurable Machine-knitted Wearable for Gestural Interaction and Passive Sensing using Electrical Impedance Tomography A scarf is inherently reconfigurable: wearers often use it as a neck wrap, a shawl, a headband, a wristband, and more. We developed uKnit, a scarf-like soft sensor with scarf-like reconfigurability, built with machine knitting and electrical impedance tomography sensing. Soft wearable devices are comfortable and thus attractive for many human-computer interaction scenarios. While prior work has demonstrated various soft wearable capabilities, each capability is device- and location-specific, being incapable of meeting users' various needs with a single device. In contrast, uKnit explores the possibility of one-soft-wearable-for-all. We describe the fabrication and sensing principles behind uKnit, demonstrate several example applications, and evaluate it with 10-participant user studies and a washability test. uKnit achieves 88.0%/78.2% accuracy for 5-class worn-location detection and 80.4%/75.4% accuracy for 7-class gesture recognition with a per-user/universal model. Moreover, it identifies respiratory rate with an error rate of 1.25 bpm and detects binary sitting postures with an average accuracy of 86.2%.2023TYTianhong Catherine Yu et al.Carnegie Mellon UniversityElectrical Muscle Stimulation (EMS)Haptic WearablesHuman Pose & Activity RecognitionCHI
IMUPoser: Full-Body Pose Estimation using IMUs in Phones, Watches, and EarbudsTracking body pose on-the-go could have powerful uses in fitness, mobile gaming, context-aware virtual assistants, and rehabilitation. However, users are unlikely to buy and wear special suits or sensor arrays to achieve this end. Instead, in this work, we explore the feasibility of estimating body pose using IMUs already in devices that many users own --- namely smartphones, smartwatches, and earbuds. This approach has several challenges, including noisy data from low-cost commodity IMUs, and the fact that the number of instrumentation points on a user's body is both sparse and in flux. Our pipeline receives whatever subset of IMU data is available, potentially from just a single device, and produces a best-guess pose. To evaluate our model, we created the IMUPoser Dataset, collected from 10 participants wearing or holding off-the-shelf consumer devices and across a variety of activity contexts. We provide a comprehensive evaluation of our system, benchmarking it on both our own and existing IMU datasets.2023VMVimal Mollyn et al.Carnegie Mellon UniversityHuman Pose & Activity RecognitionBiosensors & Physiological MonitoringCHI
FitNibble: A Field Study to Evaluate the Utility and Usability of Automatic Diet Monitoring in Food Journaling using an Eyeglasses-based WearableResearch on automatic diet monitoring (ADM) systems has aimed to increase the adoption of food journaling by making it as easy as counting steps with a smartwatch. Understanding the utility and usability of ADM systems is essential to inform new designs. This has been a challenging task since many ADM systems perform poorly in real-world settings. Therefore, the main focus of ADM research has been on improving ecological validity. This paper presents a preliminary evaluation of ADM’s utility and usability using an end-to-end system, FitNibble, which provides just-in-time notifications to remind users to journal as soon as they start eating. In this evaluation, we conducted a long-term field study to compare traditional self-report journaling and journaling with ADM. We recruited 13 participants from diverse backgrounds and asked them to try each journaling method for 9 days. Our results showed that FitNibble improved adherence by significantly reducing the number of missed events (19.6% improvement, p=.0132). Results also showed that participants relied heavily on FitNibble to maintain their journals. Participants also reported increased awareness of their dietary patterns, especially around snacking. All these results highlight the potential of ADM in improving the food journaling experience.2022ABAbdelkareem Bedri et al.Motor Impairment Assistive Input TechnologiesDiet Tracking & Nutrition ManagementBiosensors & Physiological MonitoringIUI
Pose-on-the-Go: Approximating User Pose with Smartphone Sensor Fusion and Inverse KinematicsWe present Pose-on-the-Go, a full-body pose estimation system that uses sensors already found in today’s smartphones. This stands in contrast to prior systems, which require worn or external sensors. We achieve this result via extensive sensor fusion, leveraging a phone’s front and rear cameras, the user-facing depth camera, touchscreen, and IMU. Even still, we are missing data about a user’s body (e.g., angle of the elbow joint), and so we use inverse kinematics to estimate and animate probable body poses. We provide a detailed evaluation of our system, benchmarking it against a professional-grade Vicon tracking system. We conclude with a series of demonstration applications that underscore the unique potential of our approach, which could be enabled on many modern smartphones with a simple software update.2021KAKaran Ahuja et al.Carnegie Mellon UniversityFull-Body Interaction & Embodied InputHuman Pose & Activity RecognitionCHI
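Pose-on-the-Go's full solver is not given in the abstract, but the underlying idea, recovering an unobserved joint angle such as the elbow from known end-point positions, can be illustrated with textbook two-link planar inverse kinematics (link lengths and the target point below are made up; the paper's system works in 3D over the whole body):

```python
import math

def two_link_ik(x, y, l1, l2):
    """Shoulder and elbow angles that place a two-link arm's endpoint at (x, y),
    via the law of cosines (one of the two valid elbow configurations)."""
    d2 = x * x + y * y
    cos_elbow = (d2 - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    elbow = math.acos(max(-1.0, min(1.0, cos_elbow)))  # clamp against rounding
    shoulder = math.atan2(y, x) - math.atan2(l2 * math.sin(elbow),
                                             l1 + l2 * math.cos(elbow))
    return shoulder, elbow

def forward(shoulder, elbow, l1, l2):
    """Forward kinematics, used here only to check the IK solution."""
    x = l1 * math.cos(shoulder) + l2 * math.cos(shoulder + elbow)
    y = l1 * math.sin(shoulder) + l2 * math.sin(shoulder + elbow)
    return x, y

# Upper arm 0.3 m, forearm 0.25 m; wrist observed at (0.3, 0.4) relative to the shoulder.
shoulder, elbow = two_link_ik(0.3, 0.4, 0.3, 0.25)
```

Running the recovered angles back through forward kinematics reproduces the target point, which is the basic consistency check any IK-based pose animator relies on.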
PrivacyMic: Utilizing Inaudible Frequencies for Privacy Preserving Daily Activity RecognitionSound presents an invaluable signal source that enables computing systems to perform daily activity recognition. However, microphones are optimized for human speech and hearing ranges: capturing private content, such as speech, while omitting useful, inaudible information that can aid in acoustic recognition tasks. We simulated acoustic recognition tasks using sounds from 127 everyday household/workplace objects, finding that inaudible frequencies can act as a substitute for privacy-sensitive frequencies. To take advantage of these inaudible frequencies, we designed a Raspberry Pi-based device that captures inaudible acoustic frequencies with settings that can remove speech or all audible frequencies entirely. We conducted a perception study, where participants "eavesdropped" on PrivacyMic’s filtered audio and found that none of our participants could transcribe speech. Finally, PrivacyMic’s real-world activity recognition performance is comparable to our simulated results, with over 95% classification accuracy across all environments, suggesting immediate viability in performing privacy-preserving daily activity recognition.2021YIYasha Iravantchi et al.University of MichiganPrivacy Perception & Decision-MakingIoT Device PrivacyContext-Aware ComputingCHI
Vid2Doppler: Synthesizing Doppler Radar Data from Videos for Training Privacy-Preserving Activity RecognitionMillimeter wave (mmWave) Doppler radar is a new and promising sensing approach for human activity recognition, offering signal richness approaching that of microphones and cameras, but without many of the privacy-invading downsides. However, unlike audio and computer vision approaches that can draw from huge libraries of videos for training deep learning models, Doppler radar has no existing large datasets, holding back this otherwise promising sensing modality. In response, we set out to create a software pipeline that converts videos of human activities into realistic, synthetic Doppler radar data. We show how this cross-domain translation can be successful through a series of experimental results. Overall, we believe our approach is an important stepping stone towards significantly reducing the burden of training such as human sensing systems, and could help bootstrap uses in human-computer interaction.2021KAKaran Ahuja et al.Carnegie Mellon UniversityHuman Pose & Activity RecognitionBrain-Computer Interface (BCI) & NeurofeedbackPrivacy Perception & Decision-MakingCHI
Direction-of-Voice (DoV) Estimation for Intuitive Speech Interaction with Smart Device EcosystemsFuture homes and offices will feature increasingly dense ecosystems of IoT devices, such as smart lighting, speakers, and domestic appliances. Voice input is a natural candidate for interacting with out-of-reach and often small devices that lack full-sized physical interfaces. However, at present, voice agents generally require wake-words and device names in order to specify the target of a spoken command (e.g., “Hey Alexa, kitchen lights to full brightness”). In this research, we explore whether speech alone can be used as a directional communication channel, in much the same way visual gaze specifies a focus. Instead of a device’s microphones simply receiving and processing spoken commands, we suggest they also infer the Direction of Voice (DoV). Our approach innately enables voice commands with addressability (i.e., devices know if a command was directed at them) in a natural and rapid manner. We quantify the accuracy of our implementation across users, rooms, spoken phrases, and other key factors that affect performance and usability. Taken together, we believe our DoV approach demonstrates feasibility and the promise of making distributed voice interactions much more intuitive and fluid.2020KAKaran Ahuja et al.Voice User Interface (VUI) DesignContext-Aware ComputingSmart Home Interaction DesignUIST
Digital Ventriloquism: Giving Voice to Everyday ObjectsSmart speakers with voice agents are becoming increasingly common. However, the agent's voice always emanates from the device, even when that information is contextually and spatially relevant elsewhere. Digital Ventriloquism allows smart speakers to render sound onto everyday objects, such that it appears they are speaking and are interactive. This can be achieved without any modification of objects or the environment. For this, we used a highly directional pan-tilt ultrasonic array. By modulating a 40 kHz ultrasonic signal, we can emit sound that is inaudible "in flight" and demodulates to audible frequencies when impacting a surface through acoustic parametric interaction. This makes it appear as though the sound originates from an object and not the speaker. We ran a study in which we projected speech onto five objects in three environments, and found that participants were able to correctly identify the source object 92% of the time and correctly repeat the spoken message 100% of the time, demonstrating our digital ventriloquy is both directional and intelligible.2020YIYasha Iravantchi et al.University of Michigan Ann ArborMid-Air Haptics (Ultrasonic)Voice User Interface (VUI) DesignCHI
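The parametric-audio trick the paper relies on, amplitude-modulating an audible signal onto a 40 kHz ultrasonic carrier that the air's nonlinearity demodulates at the impact surface, can be sketched numerically (the sample rate, modulation depth, and test tone are illustrative; the paper's hardware pipeline is more involved):

```python
import math

FS = 192_000       # sample rate high enough to represent a 40 kHz carrier
CARRIER_HZ = 40_000
DEPTH = 0.8        # modulation depth m in [0, 1]

def modulate(audio):
    """Classic AM: s[n] = (1 + m * a[n]) * sin(2*pi*fc*n/FS), with a[n] in [-1, 1].
    The emitted signal carries no audible-band energy; demodulation to audible
    sound happens only where the beam strikes a surface."""
    return [(1.0 + DEPTH * a) * math.sin(2 * math.pi * CARRIER_HZ * n / FS)
            for n, a in enumerate(audio)]

# A 1 kHz test tone standing in for the projected "voice".
tone = [math.sin(2 * math.pi * 1000 * n / FS) for n in range(FS // 100)]
emitted = modulate(tone)
```

Steering such a beam with a pan-tilt array is what lets the system pick which object appears to speak.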
Eyes on the Road: Detecting Phone Usage by Drivers Using On-Device CamerasUsing a phone while driving is distracting and dangerous. It increases the chance of an accident by 400%. Several techniques have been proposed in the past to detect driver distraction due to phone usage. However, such techniques usually require instrumenting the user or the car with custom hardware. While detecting phone usage in the car can be done by using the phone's GPS, it is harder to identify whether the phone is used by the driver or one of the passengers. In this paper, we present a lightweight, software-only solution that uses the phone's camera to observe the car's interior geometry to distinguish phone position and orientation. We then use this information to distinguish between driver and passenger phone use. We collected data in 16 different cars with 33 different users and achieved an overall accuracy of 94% when the phone is held in hand and 92.2% when the phone is docked (~1 sec. delay). With just a software upgrade, this work can enable smartphones to proactively adapt to the user's context in the car and substantially reduce distracted driving incidents.2020RKRushil Khurana et al.Carnegie Mellon UniversityAutomated Driving Interface & Takeover DesignEye Tracking & Gaze InteractionCHI