WatchHand: Enabling Continuous Hand Pose Tracking On Off-the-Shelf Smartwatches

Jiwan Kim; Chi-Jung Lee; Hohurn Jung; Tianhong Catherine Yu; Ruidong Zhang; Ian Oakley; Cheng Zhang

doi:10.1145/3772318.3790932


CHI '26

Authors

Jiwan Kim, Korea Advanced Institute of Science and Technology
Chi-Jung Lee, Cornell University
Hohurn Jung, Korea Advanced Institute of Science and Technology
Tianhong Catherine Yu, Cornell University
Ruidong Zhang, Cornell University
Ian Oakley, Korea Advanced Institute of Science and Technology
Cheng Zhang, Cornell University

Subtopics: Hand Gesture Recognition, Smartwatches & Fitness Bands, Context-Aware Computing
Professions: Software Engineers & Developers, UI/UX Designers, AI/ML Researchers & Engineers


Publication Info

Topic area: Continuous 3D hand pose tracking using commercial smartwatches.
Keywords: Hand pose tracking, smartwatch, active acoustic sensing, COTS devices, deep learning, 3D hand tracking, gesture recognition, wearable computing, human-computer interaction, privacy-preserving sensing.

Background and Problem

Problem / challenge: Commercial smartwatches cannot currently track continuous 3D hand pose, because existing approaches rely on external sensors or custom hardware. Prior solutions either recognize only discrete gestures or require bespoke hardware configurations, which limits their scalability and real-world applicability.
Significance: Enabling continuous hand pose tracking on commercial off-the-shelf (COTS) smartwatches could unlock expressive, context-aware interactions on the millions of devices already in use, improving both user experience and accessibility.
Motivation and related work: Prior work has explored sensing modalities such as cameras, radar, and EMG, but these approaches raise privacy concerns, require additional hardware, or generalize poorly. WatchHand builds on recent advances in active acoustic sensing, repurposing the smartwatch's built-in speaker and microphone to track continuous hand pose without additional hardware.

Solution

Proposed approach: WatchHand, a system leveraging the built-in speaker and microphone of COTS smartwatches to perform continuous 3D hand pose tracking using active acoustic sensing and deep learning.
Novelty:
1. First system to achieve continuous 3D hand pose tracking exclusively using built-in sensors on commercial smartwatches.
2. Development of a deep-learning pipeline for robust hand pose estimation across diverse conditions (e.g., hardware, postures, noise).
3. Extensive empirical evaluations on multiple smartwatch models, yielding sub-centimeter accuracy in cross-session tests.
4. Practical considerations for real-world deployment, including on-device processing and privacy-preserving design.
Procedure and key techniques:
1. Emit inaudible frequency-modulated continuous waves (18–21 kHz) via the smartwatch speaker and capture reflections with the microphone.
2. Process acoustic signals into spatiotemporal echo profiles using cross-correlation and differential techniques.
3. Use a FastViT-based deep-learning model to estimate 3D positions of 20 finger joints.
4. Evaluate performance across multiple conditions (e.g., hardware, postures, noise) and adapt models with fine-tuning for unseen scenarios.
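The acoustic front end in steps 1 and 2 can be sketched as follows. This is a minimal, stdlib-only illustration under stated assumptions, not WatchHand's implementation: only the 18–21 kHz band comes from the summary, while the 48 kHz sample rate, the 600-sample chirp length, and all function names are hypothetical. It also assumes that "differential" means subtracting consecutive echo profiles so that reflections from static surfaces cancel.

```python
import math
from typing import List

# Hypothetical parameters: only the 18-21 kHz band comes from the summary.
SAMPLE_RATE = 48_000          # Hz (assumed)
F_START, F_END = 18_000.0, 21_000.0
CHIRP_LEN = 600               # samples per chirp, 12.5 ms at 48 kHz (assumed)


def fmcw_chirp(n: int = CHIRP_LEN, fs: int = SAMPLE_RATE,
               f0: float = F_START, f1: float = F_END) -> List[float]:
    """One linear up-chirp whose frequency sweeps from f0 to f1 over n samples."""
    k = (f1 - f0) / (n / fs)  # sweep rate in Hz per second
    return [math.cos(2 * math.pi * (f0 * t + 0.5 * k * t * t))
            for t in (i / fs for i in range(n))]


def echo_profile(frame: List[float], chirp: List[float]) -> List[float]:
    """Circular cross-correlation of one received frame with the emitted chirp.
    Entry d measures how strongly the chirp appears delayed by d samples, i.e.
    an echo whose round-trip path is d samples' worth of sound travel longer."""
    n = len(chirp)
    if len(frame) != n:
        raise ValueError("frame and chirp must have the same length")
    return [sum(frame[(i + d) % n] * chirp[i] for i in range(n)) for d in range(n)]


def differential_profiles(profiles: List[List[float]]) -> List[List[float]]:
    """Subtract each echo profile from the next one. Reflections from static
    surfaces cancel out, leaving the changes caused by moving fingers."""
    return [[b - a for a, b in zip(prev, cur)]
            for prev, cur in zip(profiles, profiles[1:])]


# Usage: a frame that is the chirp delayed by 10 samples peaks at delay 10.
chirp = fmcw_chirp()
frame = [chirp[(j - 10) % CHIRP_LEN] for j in range(CHIRP_LEN)]
profile = echo_profile(frame, chirp)
print(max(range(CHIRP_LEN), key=profile.__getitem__))  # 10
```

A real receiver would record a continuous stream, cut it into chirp-aligned frames, and stack the resulting (differential) profiles over time into the 2D spatiotemporal input a vision-style model such as FastViT can consume. The simulated single echo here only demonstrates that the correlation peak recovers the delay.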

Results

Concrete findings:
- Achieved a mean per-joint position error (MPJPE) of 7.87 mm in cross-session tests and 14.88 mm in cross-user tests.
- Remained robust under diverse conditions, including different body postures (MPJPE of 6.58 mm after fine-tuning) and noisy or dynamic scenarios (e.g., loud music, walking).
- Demonstrated sub-centimeter accuracy in within-session tests (MPJPE: 6.02 mm).
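The MPJPE figures above can be read with the help of a short sketch of the metric. This assumes the standard definition (Euclidean distance between each predicted and ground-truth joint, averaged over all joints and frames); the summary does not say whether WatchHand expresses joints relative to the wrist or applies any alignment first, and the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple

# A joint is an (x, y, z) position; units follow the data (e.g., mm).
Joint = Tuple[float, float, float]
NUM_JOINTS = 20  # finger joints estimated per frame, per the summary


def mpjpe(pred: Sequence[Sequence[Joint]], gt: Sequence[Sequence[Joint]]) -> float:
    """Mean per-joint position error: the Euclidean distance between each
    predicted joint and its ground truth, averaged over all joints and frames."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("need the same, non-zero number of frames")
    total, count = 0.0, 0
    for p_frame, g_frame in zip(pred, gt):
        if len(p_frame) != len(g_frame):
            raise ValueError("frames must have the same number of joints")
        for p, g in zip(p_frame, g_frame):
            total += math.dist(p, g)  # L2 distance in 3D
            count += 1
    return total / count


# Usage: every joint off by (3, 4, 0) mm gives an MPJPE of 5 mm.
truth = [[(0.0, 0.0, 0.0)] * NUM_JOINTS]
guess = [[(3.0, 4.0, 0.0)] * NUM_JOINTS]
print(mpjpe(guess, truth))  # 5.0
```

Because the error is averaged per joint, an MPJPE of 7.87 mm means a typical joint lands within about 8 mm of its true 3D position, not that every joint does.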
Advantage over baselines:
- Outperformed prior systems like DiscoBand (17.87 mm MPJPE) and EITPose (17.81 mm MPJPE) in cross-session evaluations.
- Enabled real-time, on-device processing with a latency of 0.115 seconds per prediction.
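As a worked check of the comparison above, the relative reduction in cross-session MPJPE and the prediction rate implied by the reported latency are computed below. The rate assumes predictions run strictly one after another with no overlap or batching, which the summary does not state.

```latex
\begin{aligned}
\text{reduction vs. DiscoBand} &= \frac{17.87 - 7.87}{17.87} = \frac{10.00}{17.87} \approx 0.560 \;(\approx 56\%)\\
\text{reduction vs. EITPose}   &= \frac{17.81 - 7.87}{17.81} = \frac{9.94}{17.81} \approx 0.558 \;(\approx 56\%)\\
\text{prediction rate}         &\approx \frac{1}{0.115\ \text{s}} \approx 8.7\ \text{predictions/s}
\end{aligned}
```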
Experiments / evaluation:
- Conducted four studies with 40 participants, testing across three smartwatch models (Samsung, Xiaomi, Google) and various conditions (e.g., postures, noise, dynamic hand pose variations).
- Evaluated models using cross-session, within-session, and cross-user protocols.
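The three evaluation protocols differ in what data the model has seen before testing. The summary does not define them, so the sketch below uses their usual meanings: within-session splits one recording session in time, cross-session holds out a whole session from the same user, and cross-user holds out every session of one user. The `Sample` record, the function names, and the 80/20 split are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Split = Tuple[List["Sample"], List["Sample"]]


@dataclass(frozen=True)
class Sample:
    user: str      # participant identifier (hypothetical)
    session: int   # recording session (e.g., after re-wearing the watch)
    index: int     # time order of the sample within its session


def within_session(samples: List[Sample], user: str, session: int,
                   train_frac: float = 0.8) -> Split:
    """Train and test on disjoint, time-ordered parts of one session."""
    s = sorted((x for x in samples if x.user == user and x.session == session),
               key=lambda x: x.index)
    cut = int(len(s) * train_frac)
    return s[:cut], s[cut:]


def cross_session(samples: List[Sample], user: str, test_session: int) -> Split:
    """Train on the user's other sessions, test on a held-out session."""
    mine = [x for x in samples if x.user == user]
    return ([x for x in mine if x.session != test_session],
            [x for x in mine if x.session == test_session])


def cross_user(samples: List[Sample], test_user: str) -> Split:
    """Leave-one-user-out: train on everyone else, test on an unseen user."""
    return ([x for x in samples if x.user != test_user],
            [x for x in samples if x.user == test_user])
```

Read this way, the error ordering reported above (6.02 mm within-session, 7.87 mm cross-session, 14.88 mm cross-user) tracks how far the test data drifts from the training data.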
Limitations and future work:
- Performance drops in cross-user scenarios due to inter-user variability.
- Challenges with object interactions and unseen hand poses.
- Future work includes expanding datasets, integrating object-awareness, and exploring self-supervised learning for improved generalization.

Summary

WatchHand introduces a novel approach to continuous 3D hand pose tracking that uses only the built-in speaker and microphone of commercial smartwatches. By combining active acoustic sensing with deep learning, the system achieves sub-centimeter accuracy in within-session and cross-session tests (6.02 mm and 7.87 mm MPJPE) and remains robust across hardware variations, body postures, and noise, although error is higher for unseen users (14.88 mm MPJPE in cross-user tests). Fine-tuning further adapts the models to new conditions, and the authors point to potential applications in gesture-based interaction, accessibility, and cross-device interfaces. WatchHand is a significant step toward scalable, privacy-preserving hand tracking on millions of existing smartwatches, with opportunities for further optimization and dataset expansion.

