WatchHand: Enabling Continuous Hand Pose Tracking On Off-the-Shelf Smartwatches

CHI '26

Authors

JK

Korea Advanced Institute of Science and Technology

CL

Cornell University

HJ

Korea Advanced Institute of Science and Technology

TC

Cornell University

RZ

Cornell University

IO

Korea Advanced Institute of Science and Technology

CZ

Cornell University

Subtopics: Hand Gesture Recognition, Smartwatches & Fitness Bands, Context-Aware Computing
Professions: Software Engineers & Developers, UI/UX Designers, AI/ML Researchers & Engineers

Publication Info

  • Topic area: Continuous 3D hand pose tracking using commercial smartwatches.
  • Keywords: Hand pose tracking, smartwatch, active acoustic sensing, COTS devices, deep learning, 3D hand tracking, gesture recognition, wearable computing, human-computer interaction, privacy-preserving sensing.

Background and Problem

  • Problem / challenge: Commercial smartwatches do not offer continuous 3D hand pose tracking, because existing approaches rely on external sensors or custom hardware. Existing solutions either recognize only discrete gestures or require bespoke hardware configurations, which limits scalability and real-world applicability.
  • Significance: Enabling continuous hand pose tracking on off-the-shelf (COTS) smartwatches could unlock expressive, context-aware interactions for millions of devices already in use, enhancing user experience and accessibility.
  • Motivation and related work: Prior work has explored sensing modalities like cameras, radar, and EMG, but these approaches face challenges such as privacy concerns, hardware requirements, or limited generalizability. WatchHand builds on recent advances in active acoustic sensing, repurposing built-in smartwatch sensors to achieve continuous hand pose tracking without additional hardware.

Solution

  • Proposed approach: WatchHand, a system leveraging the built-in speaker and microphone of COTS smartwatches to perform continuous 3D hand pose tracking using active acoustic sensing and deep learning.
  • Novelty:
    1. First system to achieve continuous 3D hand pose tracking exclusively using built-in sensors on commercial smartwatches.
    2. Development of a deep-learning pipeline for robust hand pose estimation across diverse conditions (e.g., hardware, postures, noise).
    3. Extensive empirical evaluations on multiple smartwatch models, yielding sub-centimeter accuracy in cross-session tests.
    4. Practical considerations for real-world deployment, including on-device processing and privacy-preserving design.
  • Procedure and key techniques:
    1. Emit inaudible frequency-modulated continuous waves (18–21 kHz) via the smartwatch speaker and capture reflections with the microphone.
    2. Process acoustic signals into spatiotemporal echo profiles using cross-correlation and differential techniques.
    3. Use a FastViT-based deep-learning model to estimate 3D positions of 20 finger joints.
    4. Evaluate performance across multiple conditions (e.g., hardware, postures, noise) and adapt models with fine-tuning for unseen scenarios.
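
The signal-processing steps above (emit a frequency-modulated sweep, cross-correlate the recording with it, then take frame-to-frame differences) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the 48 kHz sampling rate, the 600-sample sweep length, the circular FFT-based correlation, and all function names are assumptions made for the example. Only the 18–21 kHz band comes from the summary.

```python
import numpy as np

FS = 48_000                      # ASSUMED sampling rate in Hz (not stated in the summary)
F_LO, F_HI = 18_000.0, 21_000.0  # sweep band given in the summary
CHIRP_LEN = 600                  # ASSUMED samples per sweep (12.5 ms at 48 kHz)


def make_chirp(n=CHIRP_LEN, fs=FS, f0=F_LO, f1=F_HI):
    """Return one linear up-sweep from f0 to f1 lasting n samples."""
    t = np.arange(n) / fs
    sweep_rate = (f1 - f0) / (n / fs)  # Hz per second
    return np.sin(2 * np.pi * (f0 * t + 0.5 * sweep_rate * t**2))


def echo_profile(recording, chirp):
    """Cross-correlate each sweep-length frame of `recording` with `chirp`.

    Returns an array of shape (n_frames, len(chirp)): one row per sweep,
    one column per delay in samples (circular correlation via the FFT).
    """
    n = len(chirp)
    n_frames = len(recording) // n
    frames = np.asarray(recording[: n_frames * n], dtype=float).reshape(n_frames, n)
    spectrum = np.fft.rfft(frames, axis=1) * np.conj(np.fft.rfft(chirp))
    return np.fft.irfft(spectrum, n=n, axis=1)


def differential_profile(profile):
    """Subtract consecutive rows so static reflections cancel out."""
    return np.diff(profile, axis=0)


# Simulated example: a single reflector delayed by 40 samples, static for 10 sweeps.
chirp = make_chirp()
tx = np.tile(chirp, 10)
rx = 0.3 * np.roll(tx, 40)
profile = echo_profile(rx, chirp)
motion = differential_profile(profile)
```

In this simulation each row of `profile` peaks at the reflector's delay, and `motion` is zero because nothing moves between sweeps. In a setup like the one described, a stack of such profiles would presumably be the image-like input to the pose model.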

Results

  • Concrete findings:
    • Achieved a mean per-joint position error (MPJPE) of 7.87 mm in cross-session tests and 14.88 mm in cross-user tests.
    • Maintained robustness under diverse conditions, including body postures (MPJPE reduced to 6.58 mm with fine-tuning) and noise scenarios (e.g., loud music, walking).
    • Demonstrated sub-centimeter accuracy in within-session tests (MPJPE: 6.02 mm).
  • Advantage over baselines:
    • Outperformed prior systems like DiscoBand (17.87 mm MPJPE) and EITPose (17.81 mm MPJPE) in cross-session evaluations.
    • Enabled real-time, on-device processing with a latency of 0.115 seconds per prediction.
  • Experiments / evaluation:
    • Conducted four studies with 40 participants, testing across three smartwatch models (Samsung, Xiaomi, Google) and various conditions (e.g., postures, noise, dynamic hand pose variations).
    • Evaluated models using cross-session, within-session, and cross-user protocols.
  • Limitations and future work:
    • Performance drops in cross-user scenarios due to inter-user variability.
    • Challenges with object interactions and unseen hand poses.
    • Future work includes expanding datasets, integrating object-awareness, and exploring self-supervised learning for improved generalization.
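
For reference, the MPJPE figures quoted above are averages of the Euclidean distance between predicted and ground-truth joint positions. A minimal sketch of that metric is shown below, assuming arrays of shape (frames, joints, 3) in millimeters. The function name and array layout are illustrative assumptions, and the paper's exact evaluation protocol (for example, any alignment step) is not described in this summary.

```python
import numpy as np


def mpjpe(pred, truth):
    """Mean per-joint position error.

    pred, truth: arrays of shape (n_frames, n_joints, 3), in the same unit
    (e.g. millimeters). Returns the mean Euclidean distance over all joints
    and frames. The (frames, joints, xyz) layout is an assumption made here.
    """
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    if pred.shape != truth.shape or pred.shape[-1] != 3:
        raise ValueError("pred and truth must share shape (..., 3)")
    per_joint_error = np.linalg.norm(pred - truth, axis=-1)
    return float(per_joint_error.mean())


# Example: every one of 20 joints is off by a (3, 4, 0) mm offset in 2 frames.
truth = np.zeros((2, 20, 3))
pred = truth + np.array([3.0, 4.0, 0.0])
error_mm = mpjpe(pred, truth)
```

Because every joint in the example is displaced by the same 3-4-5 offset, `error_mm` equals 5 mm.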

Summary

WatchHand introduces a novel approach to continuous 3D hand pose tracking using only the built-in speaker and microphone of commercial smartwatches. By combining active acoustic sensing with deep learning, the system achieves sub-centimeter accuracy in within-session and cross-session tests (6.02 mm and 7.87 mm MPJPE) and remains robust across hardware variations, postures, and noise, although cross-user error is higher (14.88 mm MPJPE). The evaluations demonstrate its robustness and adaptability, with potential applications in gesture-based interaction, accessibility, and cross-device interfaces. WatchHand is a significant step toward scalable, privacy-preserving hand tracking on millions of existing smartwatches, with room for further optimization and dataset expansion.

DOI: https://doi.org/10.1145/3772318.3790932