LemurDx: Using Unconstrained Passive Sensing for an Objective Measurement of Hyperactivity in Children with no Parent Input

Hyperactivity is the predominant presentation of Attention-Deficit/Hyperactivity Disorder in young children. Currently, measuring hyperactivity relies on parents' or teachers' reports, which are vulnerable to subjectivity and can lead to misdiagnosis. LemurDx provides an objective measure of hyperactivity using passive mobile sensing. We collected data from 61 children (25 with hyperactivity) who wore a smartwatch for up to 7 days without changing their daily routine. The participants' parents maintained a log of the child's activities at a half-hour granularity (e.g., sitting, exercising) as contextual information. Our machine learning models achieved 85.2% accuracy in detecting hyperactivity when using the parent-provided activity labels. To reduce parent burden, we also built models that estimate the child's context directly from the sensor data without relying on activity labels; these achieved 82.0% accuracy. In addition, we interviewed five clinicians, who identified the need for a tractable risk score that enables analysis of a child's behavior across contexts. Our results show the feasibility of supporting the diagnosis of hyperactivity by providing clinicians with an interpretable and objective score of hyperactivity, using off-the-shelf watches and placing no constraints on children or their guardians.

Riku Arakawa et al. UbiComp 2023. https://dl.acm.org/doi/10.1145/3596244
Topics: Cognitive Impairment & Neurodiversity (Autism, ADHD, Dyslexia); Biosensors & Physiological Monitoring

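The paper does not publish its model internals; as an illustration of the general recipe (windowed wrist-accelerometer features feeding a downstream classifier), here is a minimal sketch. The window length, sampling rate, and feature set are assumptions for illustration, not LemurDx's actual parameters:

```python
import math

def window_features(accel, fs=50, window_s=5):
    """Split a 3-axis accelerometer stream into fixed windows and
    compute simple movement-intensity features (mean and variance of
    the acceleration magnitude). Hypothetical feature set; a trained
    classifier would consume these per-window features."""
    n = fs * window_s  # samples per window
    feats = []
    for i in range(0, len(accel) - n + 1, n):
        mags = [math.sqrt(x * x + y * y + z * z) for x, y, z in accel[i:i + n]]
        mu = sum(mags) / len(mags)
        var = sum((m - mu) ** 2 for m in mags) / len(mags)
        feats.append((mu, var))
    return feats
```

Features like these would then be aggregated per activity context (e.g., sitting vs. exercising) before scoring, mirroring the paper's use of parent-provided labels.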
IMUPoser: Full-Body Pose Estimation using IMUs in Phones, Watches, and Earbuds

Tracking body pose on-the-go could have powerful uses in fitness, mobile gaming, context-aware virtual assistants, and rehabilitation. However, users are unlikely to buy and wear special suits or sensor arrays to achieve this end. Instead, in this work, we explore the feasibility of estimating body pose using IMUs already present in devices that many users own: smartphones, smartwatches, and earbuds. This approach has several challenges, including noisy data from low-cost commodity IMUs and the fact that the number of instrumentation points on a user's body is both sparse and in flux. Our pipeline receives whatever subset of IMU data is available, potentially from just a single device, and produces a best-guess pose. To evaluate our model, we created the IMUPoser Dataset, collected from 10 participants wearing or holding off-the-shelf consumer devices across a variety of activity contexts. We provide a comprehensive evaluation of our system, benchmarking it on both our own and existing IMU datasets.

Vimal Mollyn et al., Carnegie Mellon University. CHI 2023.
Topics: Human Pose & Activity Recognition; Biosensors & Physiological Monitoring

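One concrete challenge the abstract names is that the set of worn devices is both sparse and in flux. A common way to feed such variable subsets to a fixed-size model is to zero-fill absent device slots and pass a presence mask alongside; the slot order and per-device dimensionality below are assumptions for illustration, not IMUPoser's actual encoding:

```python
DEVICE_SLOTS = ["phone", "watch", "earbud"]  # assumed slot order
IMU_DIM = 6  # assumed: 3-axis accelerometer + 3-axis gyroscope per device

def assemble_input(readings):
    """Build a fixed-length feature vector from whatever subset of
    device IMUs is currently available, zero-filling absent devices
    and returning a presence mask for the downstream model."""
    vec, mask = [], []
    for dev in DEVICE_SLOTS:
        sample = readings.get(dev)
        if sample is None:
            vec.extend([0.0] * IMU_DIM)
            mask.append(0)
        else:
            vec.extend(sample)
            mask.append(1)
    return vec, mask
```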
TriboTouch: Micro-Patterned Surfaces for Low Latency Touchscreens

Touchscreen tracking latency, often 80 ms or more, creates a rubber-banding effect in everyday direct manipulation tasks such as dragging, scrolling, and drawing. This has been shown to decrease system preference, user performance, and the overall realism of these interfaces. In this research, we demonstrate how the addition of a thin, 2D micro-patterned surface with 5-micron-spaced features can be used to reduce motor-visual touchscreen latency. When a finger, stylus, or tangible is translated across this textured surface, frictional forces induce acoustic vibrations that naturally encode sliding velocity. This acoustic signal is sampled at 192 kHz using a conventional audio interface pipeline with an average latency of 28 ms. When fused with conventional low-speed but high-spatial-accuracy 2D touch position data, our machine learning model can make accurate predictions of real-time touch location.

Craig Shultz et al., Carnegie Mellon University. CHI 2022.
Topics: Mid-Air Haptics (Ultrasonic); Vibrotactile Feedback & Skin Stimulation

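The fusion step described above pairs delayed-but-accurate capacitive positions with low-latency acoustic velocity. The simplest version of that idea is linear extrapolation: dead-reckon from the last reported position using the freshly sensed velocity. The paper uses a learned model; this stand-in only illustrates the latency-hiding principle:

```python
def predict_touch(last_pos, last_pos_age_s, velocity):
    """Extrapolate the current touch location from the most recent
    (delayed) 2D capacitive position using the low-latency sliding
    velocity recovered from the acoustic channel."""
    x, y = last_pos
    vx, vy = velocity
    return (x + vx * last_pos_age_s, y + vy * last_pos_age_s)
```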
ControllerPose: Inside-Out Body Capture with VR Controller Cameras

We present a new and practical method for capturing user body pose in virtual reality experiences: integrating cameras into handheld controllers, where batteries, computation, and wireless communication already exist. Because the hands operate in front of the user during many VR interactions, our controller-borne cameras can capture a superior view of the body for digitization. Our pipeline composites multiple camera views together, performs 3D body pose estimation, uses this data to control a rigged human model with inverse kinematics, and exposes the resulting user avatar to end-user applications. We developed a series of demo applications illustrating the potential of our approach, including more leg-centric interactions such as balancing games and kicking soccer balls. We describe our proof-of-concept hardware and software, as well as results from our user study, which point to imminent feasibility.

Karan Ahuja et al., Carnegie Mellon University. CHI 2022.
Topics: Full-Body Interaction & Embodied Input; Human Pose & Activity Recognition

TouchPose: Hand Pose Prediction, Depth Estimation, and Touch Classification from Capacitive Images

Today's touchscreen devices commonly detect the coordinates of user input through capacitive sensing. Yet these coordinates are the mere 2D manifestations of the more complex 3D configuration of the whole hand, a configuration that touchscreen devices so far remain oblivious to. In this work, we introduce the problem of reconstructing a 3D hand skeleton from capacitive images, which encode the sparse observations captured by touch sensors. These low-resolution images represent intensity mappings that are proportional to the distance to the user's fingers and hands. We present the first dataset of capacitive images with corresponding depth maps and 3D hand pose coordinates, comprising 65,374 aligned records from 10 participants. We introduce our supervised method TouchPose, which learns a 3D hand model and a corresponding depth map from capacitive images in our dataset using a cross-modal trained embedding. We quantitatively evaluate TouchPose's accuracy in touch classification, depth estimation, and 3D joint reconstruction, showing that our model generalizes to hand poses it has never seen during training and can infer joints that lie outside the touch sensor's volume. Enabled by TouchPose, we demonstrate a series of interactive apps and novel interactions on multitouch devices. These applications show TouchPose's versatile capability to serve as a general-purpose model, operating independently of use case and establishing 3D hand pose as an integral part of the input dictionary for application designers and developers. We also release our dataset, code, and model to enable future work in this domain.

Karan Ahuja et al. UIST 2021.
Topics: Hand Gesture Recognition; Human Pose & Activity Recognition

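Of the three tasks in the title, touch classification is the most self-contained: decide which capacitive pixels correspond to actual contact rather than a hovering hand. A toy baseline is plain thresholding (the sign convention here, larger intensity meaning closer skin, is an assumption); TouchPose itself learns touch, depth, and 3D pose jointly, so this is only a point of reference:

```python
def classify_touch(cap_image, contact_thresh=200):
    """Label each capacitive pixel as touching (1) or not (0) by
    thresholding its intensity. contact_thresh is an arbitrary
    illustrative value, not a calibrated sensor constant."""
    return [[1 if px >= contact_thresh else 0 for px in row]
            for row in cap_image]
```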
Pose-on-the-Go: Approximating User Pose with Smartphone Sensor Fusion and Inverse Kinematics

We present Pose-on-the-Go, a full-body pose estimation system that uses sensors already found in today's smartphones. This stands in contrast to prior systems, which require worn or external sensors. We achieve this result via extensive sensor fusion, leveraging a phone's front and rear cameras, the user-facing depth camera, touchscreen, and IMU. Even still, we are missing data about a user's body (e.g., the angle of the elbow joint), and so we use inverse kinematics to estimate and animate probable body poses. We provide a detailed evaluation of our system, benchmarking it against a professional-grade Vicon tracking system. We conclude with a series of demonstration applications that underscore the unique potential of our approach, which could be enabled on many modern smartphones with a simple software update.

Karan Ahuja et al., Carnegie Mellon University. CHI 2021.
Topics: Full-Body Interaction & Embodied Input; Human Pose & Activity Recognition

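The inverse-kinematics step fills in unobserved joints such as the elbow. The classic two-bone IK building block recovers the elbow angle from the shoulder-to-wrist distance via the law of cosines; this is a generic sketch of that step, not the paper's full solver:

```python
import math

def elbow_angle(upper_len, fore_len, shoulder_to_wrist):
    """Interior elbow angle (radians) for a two-bone arm given the
    straight-line shoulder-to-wrist distance; returns pi when the
    arm is fully extended. Unreachable distances are clamped."""
    d = min(shoulder_to_wrist, upper_len + fore_len)
    cos_a = (upper_len ** 2 + fore_len ** 2 - d ** 2) / (2 * upper_len * fore_len)
    return math.acos(max(-1.0, min(1.0, cos_a)))  # clamp for float safety
```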
PrivacyMic: Utilizing Inaudible Frequencies for Privacy-Preserving Daily Activity Recognition

Sound is an invaluable signal source that enables computing systems to perform daily activity recognition. However, microphones are optimized for human speech and hearing ranges: they capture private content, such as speech, while omitting useful, inaudible information that can aid in acoustic recognition tasks. We simulated acoustic recognition tasks using sounds from 127 everyday household and workplace objects, finding that inaudible frequencies can act as a substitute for privacy-sensitive frequencies. To take advantage of these inaudible frequencies, we designed PrivacyMic, a Raspberry Pi-based device that captures inaudible acoustic frequencies, with settings that can remove speech or all audible frequencies entirely. We conducted a perception study in which participants "eavesdropped" on PrivacyMic's filtered audio, and none of them could transcribe speech. Finally, PrivacyMic's real-world activity recognition performance is comparable to our simulated results, with over 95% classification accuracy across all environments, suggesting immediate viability for privacy-preserving daily activity recognition.

Yasha Iravantchi et al., University of Michigan. CHI 2021.
Topics: Privacy Perception & Decision-Making; IoT Device Privacy; Context-Aware Computing

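The core signal operation is suppressing the audible band before audio is stored or processed. As a toy stand-in, here is a one-pole high-pass filter; the sample rate and cutoff are illustrative assumptions, and a single pole is far too shallow for real privacy guarantees, where a steep multi-pole filter would be used:

```python
import math

def highpass(samples, fs=192_000, cutoff=16_000.0):
    """One-pole high-pass filter: attenuates low (audible) frequencies
    while passing near-ultrasonic content. fs and cutoff are assumed
    values for illustration only."""
    rc = 1.0 / (2 * math.pi * cutoff)
    dt = 1.0 / fs
    alpha = rc / (rc + dt)
    out = [samples[0]]
    for i in range(1, len(samples)):
        # y[i] = alpha * (y[i-1] + x[i] - x[i-1])
        out.append(alpha * (out[-1] + samples[i] - samples[i - 1]))
    return out
```

A constant (0 Hz) input decays toward zero at the output, while rapidly alternating content passes through, which is the behavior the "remove all audible frequencies" setting takes to an extreme.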
Classroom Digital Twins with Instrumentation-Free Gaze Tracking

Classroom sensing is an important and active area of research with great potential to improve instruction. Complementing professional observers (the current best practice), automated pedagogical professional development systems can attend every class and capture fine-grained details of all occupants. One particularly valuable facet to capture is class gaze behavior. For students, certain gaze patterns have been shown to correlate with interest in the material, while for instructors, student-centered gaze patterns have been shown to increase approachability and immediacy. Unfortunately, prior classroom gaze-sensing systems have limited accuracy and often require specialized external or worn sensors. In this work, we developed a new computer-vision-driven system that powers a 3D "digital twin" of the classroom and enables whole-class, 6DOF head gaze vector estimation without instrumenting any of the occupants. We describe our open-source implementation and results from both controlled studies and real-world classroom deployments.

Karan Ahuja et al., Carnegie Mellon University. CHI 2021.
Topics: Eye Tracking & Gaze Interaction; Mixed Reality Workspaces

Vid2Doppler: Synthesizing Doppler Radar Data from Videos for Training Privacy-Preserving Activity Recognition

Millimeter-wave (mmWave) Doppler radar is a new and promising sensing approach for human activity recognition, offering signal richness approaching that of microphones and cameras, but without many of the privacy-invading downsides. However, unlike audio and computer vision approaches, which can draw from huge libraries of videos for training deep learning models, Doppler radar has no existing large datasets, holding back this otherwise promising sensing modality. In response, we set out to create a software pipeline that converts videos of human activities into realistic, synthetic Doppler radar data. We show how this cross-domain translation can be successful through a series of experimental results. Overall, we believe our approach is an important stepping stone towards significantly reducing the burden of training such human sensing systems, and could help bootstrap uses in human-computer interaction.

Karan Ahuja et al., Carnegie Mellon University. CHI 2021.
Topics: Human Pose & Activity Recognition; Brain-Computer Interface (BCI) & Neurofeedback; Privacy Perception & Decision-Making

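A key step in any video-to-Doppler signal model is projecting video-derived 3D joint velocities onto the radar's line of sight, since Doppler radar only sees radial motion. A simplified sketch of that projection and binning (an assumed signal model for illustration; the actual pipeline also handles signal strength and other rendering details):

```python
import math

def doppler_bins(points, velocities, radar_pos, bins, vmax):
    """Histogram the radial (line-of-sight) component of each joint's
    velocity into a Doppler-spectrum-like vector. points/velocities
    are 3D tuples; vmax is the assumed maximum radial speed mapped
    to the histogram edges."""
    hist = [0] * bins
    for p, v in zip(points, velocities):
        d = [p[i] - radar_pos[i] for i in range(3)]
        norm = math.sqrt(sum(c * c for c in d)) or 1.0
        radial = sum(v[i] * d[i] for i in range(3)) / norm  # + = receding
        idx = int((radial + vmax) / (2 * vmax) * bins)
        hist[max(0, min(bins - 1, idx))] += 1
    return hist
```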
Direction-of-Voice (DoV) Estimation for Intuitive Speech Interaction with Smart Device Ecosystems

Future homes and offices will feature increasingly dense ecosystems of IoT devices, such as smart lighting, speakers, and domestic appliances. Voice input is a natural candidate for interacting with out-of-reach and often small devices that lack full-sized physical interfaces. However, at present, voice agents generally require wake-words and device names in order to specify the target of a spoken command (e.g., "Hey Alexa, kitchen lights to full brightness"). In this research, we explore whether speech alone can be used as a directional communication channel, in much the same way visual gaze specifies a focus. Instead of a device's microphones simply receiving and processing spoken commands, we suggest they also infer the Direction of Voice (DoV). Our approach innately enables voice commands with addressability (i.e., devices know whether a command was directed at them) in a natural and rapid manner. We quantify the accuracy of our implementation across users, rooms, spoken phrases, and other key factors that affect performance and usability. Taken together, we believe our DoV approach demonstrates the feasibility and promise of making distributed voice interactions much more intuitive and fluid.

Karan Ahuja et al. UIST 2020.
Topics: Voice User Interface (VUI) Design; Context-Aware Computing; Smart Home Interaction Design

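Note that DoV as framed above infers which way a speaker is *facing*, which is explicitly not the same as angle-of-arrival. Still, the simpler, related multi-microphone primitive, time-difference-of-arrival via cross-correlation, is a useful point of contrast; this sketch illustrates that classic primitive, not the paper's method:

```python
def tdoa_lag(sig_a, sig_b, max_lag):
    """Return the integer sample lag that maximizes the cross-
    correlation between two microphone channels: the classic
    time-difference-of-arrival primitive used in angle-of-arrival
    estimation (a contrast to, not an implementation of, DoV)."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, a in enumerate(sig_a):
            j = i + lag
            if 0 <= j < len(sig_b):
                score += a * sig_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```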
MeCap: Whole-Body Digitization for Low-Cost VR/AR Headsets

Low-cost, smartphone-powered VR/AR headsets are becoming more popular. These basic devices, little more than plastic or cardboard shells, lack advanced features such as hand controllers, limiting their interactive capability. Moreover, even high-end consumer headsets lack the ability to track the body and face. For this reason, interactive experiences like social VR are underdeveloped. We introduce MeCap, which enables commodity VR headsets to be augmented with powerful motion capture ("MoCap") and user-sensing capabilities at very low cost (under $5). Using only a pair of hemispherical mirrors and the existing rear-facing camera of a smartphone, MeCap provides real-time estimates of a wearer's 3D body pose, hand pose, facial expression, physical appearance, and surrounding environment, capabilities that are either absent in contemporary VR/AR systems or that require specialized hardware and controllers. We evaluate the accuracy of each of our tracking features, the results of which show imminent feasibility.

Karan Ahuja et al. UIST 2019.
Topics: Full-Body Interaction & Embodied Input; Human Pose & Activity Recognition; AR Navigation & Context Awareness

LightAnchors: Appropriating Point Lights for Spatially-Anchored Augmented Reality Interfaces

Augmented reality requires precise and instant overlay of digital information onto everyday objects. We present LightAnchors, a new method for displaying spatially-anchored data. We take advantage of pervasive point lights, such as LEDs and light bulbs, for both in-view anchoring and data transmission. These lights are blinked at high speed to encode binary data. We built a proof-of-concept application that runs on iOS without any hardware or software modifications. We ran a study to characterize the performance of our system and built ten example demos to highlight the potential of our approach.

Karan Ahuja et al. UIST 2019.
Topics: AR Navigation & Context Awareness; Context-Aware Computing

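The data-transmission half of the abstract amounts to on-off-keyed optical communication: sample the brightness of a detected point light over time, then threshold per bit period. A toy demodulator under assumed framing (no preamble or error handling, which a real deployment would need):

```python
def decode_blinks(brightness, samples_per_bit=4):
    """Recover a bit sequence from per-frame brightness samples of a
    blinking light: average each bit period and threshold at the
    midpoint of the observed brightness range. samples_per_bit is an
    assumed camera-frames-per-bit ratio."""
    lo, hi = min(brightness), max(brightness)
    thresh = (lo + hi) / 2
    bits = []
    for i in range(0, len(brightness) - samples_per_bit + 1, samples_per_bit):
        avg = sum(brightness[i:i + samples_per_bit]) / samples_per_bit
        bits.append(1 if avg > thresh else 0)
    return bits
```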
Ubicoustics: Plug-and-Play Acoustic Activity Recognition

Despite sound being a rich source of information, computing devices with microphones do not leverage audio to glean useful insights about their physical and social context. For example, a smart speaker sitting on a kitchen countertop cannot figure out if it is in a kitchen, let alone know what a user is doing in a kitchen, a missed opportunity. In this work, we describe a novel, real-time, sound-based activity recognition system. We start by taking an existing, state-of-the-art sound labeling model, which we then tune to classes of interest by drawing data from professional sound effect libraries traditionally used in the entertainment industry. These well-labeled and high-quality sounds are the perfect atomic unit for data augmentation, including amplitude, reverb, and mixing, allowing us to exponentially grow our tuning data in realistic ways. We quantify the performance of our approach across a range of environments and device categories and show that microphone-equipped computing devices already have the requisite capability to unlock real-time activity recognition comparable to human accuracy.

Gierad Laput et al. UIST 2018.
Topics: Context-Aware Computing; Ubiquitous Computing

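Two of the named augmentations, amplitude scaling and mixing, are simple to sketch (reverb is omitted for brevity; the parameter values are illustrative, not the paper's):

```python
def augment(clip, background, gain=0.5, mix=0.2):
    """Derive two variants of a clean sound-effect clip: an
    amplitude-scaled copy and a copy with background audio mixed in.
    Samples are floats nominally in [-1, 1]; a real pipeline would
    also re-normalize and clip the mixed result."""
    scaled = [s * gain for s in clip]
    n = min(len(clip), len(background))
    mixed = [clip[i] + mix * background[i] for i in range(n)]
    return scaled, mixed
```

Applying several gains, backgrounds, and reverbs to each labeled clip is what lets the tuning set grow multiplicatively, the "exponential" growth the abstract refers to.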