WristP²
A Wrist-Worn System for Hand Pose and Pressure Estimation

WristP² preserves the subtlety of a fingertip's caress and unleashes the power of a palm's command. We refuse to be defined by the 'click'.
It's time to return the sovereignty of interaction to your hands.
Full video on
YouTube
Timeframe
Jul–Sep 2025
Role
Experimenter
Designer
Publication
PDF
Author rank
2nd Author
Academic recognition
ACM CHI '26
(Accepted)
*This paper has been submitted to the CHI Conference on Human Factors in Computing Systems (CHI ’26). It is currently under review and is shared here temporarily for application purposes. Viewers of this page are kindly requested not to distribute it.
  Motivation
A "click" action in the physical world can carry countless nuances—a tentative tap, a decisive press, a forceful squeeze—yet in the digital world, they are all reduced to the same binary signal. This represents a significant loss of information.

I aim to transform the movements of a single hand into a high-bandwidth input channel with "force and texture," opening up more nuanced expressive possibilities for AR, wearables, and everyday interactions.
*The infinite possibilities of gestures conveyed by hands.
  Introduction

WristP² is a wrist-worn system with a wide-FOV fisheye camera that reconstructs 3D hand pose and per-vertex pressure in real time.
*Across previous works from 2012–2025, the average reported performance is roughly MPJPE ≈ 10.5 mm, MJAE ≈ 7.1°, and contact accuracy ≈ 92%.
2.88 mm
mean per-joint position error
3.15°
mean joint angle error
10.3 g
mean pressure error
97%
contact accuracy
3 h
battery life at full power
72%
contact IoU
  Application
Using the detailed hand pose and pressure information produced by WristP², I have applied it across a wide range of scenarios.
Mid-air gesture input in XR

WristP² enables mid-air gestures for natural interaction.
The video demonstrates how mid-air hand gestures can be used to control video playback in XR environments.
Custom actions for media control

WristP² knows the meaning of all your actions. The video shows how WristP² supports large-screen interaction in mobile contexts, such as controlling slide presentations during a talk.
Virtual touchpad input on a mobile device

WristP² enables planar virtual touchpad input. The video illustrates how a user browses web pages on a virtual display by using planar input on a virtual touchpad.
Replacing traditional interaction tools

WristP² can replace traditional interaction tools. The video demonstrates how it reproduces left and right mouse clicks, which can in principle be performed on any surface.
Embodied intelligence and dexterous teleoperation

WristP² maps naturally to embodied intelligence and robotic manipulation. This is particularly valuable for high-risk or hard-to-reach tasks, such as industrial inspection or handling hazardous materials remotely.
  Hardware Implementation
The system consists of a nylon wristband, an RGB camera, and a Raspberry Pi Zero 2W. To overcome field-of-view limitations of vision-based sensing, we adopt a fisheye RGB camera module (180° FOV) mounted on the palmar side of the wrist via a 3D-printed, magnetized 90° rotatable hinge positioned ∼15 mm above the skin.
  Dataset:
My teammate and I built a synchronized multi-sensor system to create a dataset of 93,000 frames from 15 participants.

· Synced Sensors: We combined professional Motion Capture (for precise hand tracking) with a high-resolution Pressure Pad (to record touch force). This allowed us to capture both 3D hand poses and physical pressure simultaneously.

· Diverse Scenarios: The dataset covers 48 surface interactions (like clicking and dragging) and 28 mid-air gestures, recorded under various lighting conditions and backgrounds to ensure the system works reliably in the real world.
We developed an automated pipeline that aligns a 3D hand model with sensor data to generate high-fidelity meshes and per-point pressure labels without manual annotation.
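As a rough illustration of how such a pipeline can work, the sketch below fits a parametric hand model to mocap joints and then scatters pressure-pad readings onto the nearest mesh vertices. The hand model here is a toy stand-in for whatever parametric model is actually used, and all names, shapes, and dimensions are illustrative assumptions rather than the real implementation.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial import cKDTree

N_JOINTS, N_VERTS = 21, 778

def hand_model(pose_params):
    """Toy stand-in for a parametric hand model:
    maps pose parameters to 3D joints and mesh vertices."""
    joints = pose_params.reshape(N_JOINTS, 3)                      # toy "forward kinematics"
    vertices = np.repeat(joints, N_VERTS // N_JOINTS + 1, axis=0)[:N_VERTS]
    return joints, vertices

def fit_pose_to_mocap(mocap_joints):
    """Fit hand-model parameters to motion-capture joint positions."""
    def loss(params):
        joints, _ = hand_model(params)
        return np.sum((joints - mocap_joints) ** 2)
    x0 = np.zeros(N_JOINTS * 3)
    return minimize(loss, x0, method="L-BFGS-B").x

def transfer_pressure_labels(vertices, taxel_xyz, taxel_force):
    """Assign each active pressure-pad taxel's force to its nearest mesh vertex,
    yielding per-vertex pressure labels without manual annotation."""
    _, nearest = cKDTree(vertices).query(taxel_xyz)
    pressure = np.zeros(len(vertices))
    np.add.at(pressure, nearest, taxel_force)   # accumulate force per vertex
    return pressure
```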
On-plane Gestures:
The on-plane gestures we designed include 48 actions, consisting of common actions used on pressure-sensitive touchpads, as well as additional complex actions to increase the diversity of the touch data.
Mid-air Gestures:
The mid-air gestures comprise 10 static American Sign Language (ASL) alphabet letters and 18 common daily interaction gestures.
  Tech Stack:
Our model combines a Vision Transformer with a VQ-VAE, reconstructing hands by selecting from a learned "library" of realistic poses to ensure anatomical plausibility. It simultaneously predicts 3D shape, per-vertex pressure, and camera pose from a single image.
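The sketch below illustrates the vector-quantization idea in PyTorch: an image feature is snapped to the nearest entry of a learned codebook of plausible poses before being decoded. The module names, dimensions, and decoder are illustrative assumptions, not the actual network.

```python
import torch
import torch.nn as nn

class VQPoseHead(nn.Module):
    """Toy vector-quantized pose head: an image embedding (e.g. from a ViT backbone)
    is snapped to the nearest entry of a learned codebook of plausible hand poses,
    then decoded to hand pose parameters. Dimensions are illustrative."""
    def __init__(self, embed_dim=256, codebook_size=512, pose_dim=48):
        super().__init__()
        self.codebook = nn.Embedding(codebook_size, embed_dim)
        self.decoder = nn.Linear(embed_dim, pose_dim)

    def forward(self, z_e):                              # z_e: (B, embed_dim) ViT feature
        dists = torch.cdist(z_e, self.codebook.weight)   # (B, codebook_size)
        idx = dists.argmin(dim=-1)
        z_q = self.codebook(idx)                         # nearest "library" pose code
        z_q = z_e + (z_q - z_e).detach()                 # straight-through estimator
        return self.decoder(z_q), idx

# usage: features from a ViT backbone (not shown) -> pose parameters
head = VQPoseHead()
pose, code_idx = head(torch.randn(4, 256))
```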
* Example of a wrist-mounted camera image with a randomly replaced background
To ensure robustness, we pre-trained the model on general hand data and then fine-tuned it with randomized backgrounds, teaching it to ignore environmental distractions.
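One plausible way to implement this background randomization is sketched below, assuming a per-frame foreground hand mask is available (for example, from the fitted mesh); the compositing step is deliberately simplified.

```python
import numpy as np

def randomize_background(frame, hand_mask, backgrounds, rng=np.random):
    """Composite the hand foreground onto a randomly chosen background so the
    model learns to ignore the environment. `hand_mask` is an (H, W) boolean
    array marking hand pixels; `backgrounds` is a list of (H, W, 3) images."""
    bg = backgrounds[rng.randint(len(backgrounds))].copy()
    bg[hand_mask] = frame[hand_mask]        # keep hand pixels, replace everything else
    return bg
```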
  Offline Evaluation:
The following figure shows wrist-camera views of various mid-air gestures, comparing the ground-truth 3D hand poses (blue) with the predicted 3D hand poses (red).
I evaluate WristP2 on three core tasks:

(i) 3D hand pose reconstruction
I evaluated 3D hand pose reconstruction in the hand-local frame using standard metrics: MPJPE (Mean Per-Joint Position Error), PA-MPJPE (Procrustes-Aligned Mean Per-Joint Position Error), PVE (Per-Vertex Error), PA-PVE, and MJAE (Mean Joint Angle Error).

WristP² achieves high accuracy, with an overall MPJPE of 2.9 mm and MJAE of 3.2°. Notably, the system demonstrates strong robustness to illumination changes, maintaining consistent performance even in low-light conditions.
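For reference, the sketch below shows how MPJPE and its Procrustes-aligned variant are typically computed; it is my own illustration of the standard formulas, not the evaluation code used in the paper.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error over (N, J, 3) joint arrays, in the input units."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

def pa_mpjpe(pred, gt):
    """Procrustes-aligned MPJPE: per sample, remove global rotation, translation,
    and scale before measuring joint error."""
    errs = []
    for p, g in zip(pred, gt):
        p0, g0 = p - p.mean(0), g - g.mean(0)           # center both joint sets
        u, s, vt = np.linalg.svd(p0.T @ g0)
        d = np.sign(np.linalg.det(u @ vt))              # guard against reflections
        diag = np.array([1.0, 1.0, d])
        rot = u @ np.diag(diag) @ vt
        scale = (s * diag).sum() / (p0 ** 2).sum()
        aligned = scale * p0 @ rot                      # best similarity transform
        errs.append(np.linalg.norm(aligned - g0, axis=-1).mean())
    return float(np.mean(errs))
```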
(ii) per-vertex pressure estimation
I also evaluate per-vertex contact and pressure estimation, including Contact IoU (Contact Intersection over Union), Vol. IoU (Volumetric Intersection over Union), Contact Accuracy (threshold > 10 g), and MAE (Mean Absolute Error) over foreground vertices (MAE_FG) and all vertices (MAE_ALL). The following tables detail the performance across different surfaces and lighting conditions.
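These contact metrics are straightforward to compute from per-vertex force predictions; a minimal sketch follows, assuming gram-scale force arrays and the 10 g contact threshold mentioned above.

```python
import numpy as np

CONTACT_THRESHOLD_G = 10.0   # a vertex counts as "in contact" above 10 g

def contact_iou(pred_force, gt_force, thr=CONTACT_THRESHOLD_G):
    """Intersection over union of the predicted and ground-truth contact masks."""
    p, g = pred_force > thr, gt_force > thr
    union = np.logical_or(p, g).sum()
    return np.logical_and(p, g).sum() / union if union else 1.0

def contact_accuracy(pred_force, gt_force, thr=CONTACT_THRESHOLD_G):
    """Fraction of vertices whose contact state (> 10 g) is predicted correctly."""
    return ((pred_force > thr) == (gt_force > thr)).mean()

def pressure_mae(pred_force, gt_force, thr=CONTACT_THRESHOLD_G):
    """MAE over ground-truth contact vertices (MAE_FG) and over all vertices (MAE_ALL)."""
    err = np.abs(pred_force - gt_force)
    fg = gt_force > thr
    mae_fg = err[fg].mean() if fg.any() else 0.0
    return mae_fg, err.mean()
```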
(iii) wrist-camera extrinsics estimation
Finally, I analyze the camera-extrinsics prediction capability of WristP². Table 5 reports the results. The system achieves precise extrinsics estimation with a rotation error of 2.3° and a translation error of 8.9 mm.
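Rotation error here is the usual geodesic angle between rotation matrices, and translation error is the Euclidean distance between translation vectors; the small sketch below illustrates these standard formulas (my illustration, not the project's code).

```python
import numpy as np

def rotation_error_deg(R_pred, R_gt):
    """Geodesic angle between two 3x3 rotation matrices, in degrees."""
    cos = (np.trace(R_pred.T @ R_gt) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def translation_error(t_pred, t_gt):
    """Euclidean distance between predicted and ground-truth translations
    (in millimetres if the inputs are expressed in mm)."""
    return np.linalg.norm(t_pred - t_gt)
```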
  User Study:
Indoor scenarios (Study A)
In the three studies we conducted, we used smartphone touchscreens as the baseline input method and compared WristP² against them. Smartphone touchscreens were selected because they provide absolute positioning and their screen size is comparable to that of a human hand.
Study 1: Virtual Air Mouse (Mid-Air Pointing)

I validated the system's pointing precision using a standard Fitts’ Law task, where participants controlled a cursor via index finger position and performed clicks using a pinch gesture. The system achieved a throughput of 2.5 bits/s, matching the efficiency of a standard laptop touchpad and proving its viability as a precise, equipment-free input device for mid-air interaction.
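For context, the sketch below shows how Fitts' law throughput is commonly computed from target distance, width, and movement time using the Shannon formulation; it is illustrative, not the study's analysis script.

```python
import numpy as np

def fitts_throughput(distances, widths, movement_times):
    """Mean per-trial throughput (bits/s) using the Shannon formulation:
    ID = log2(D / W + 1), TP = ID / MT."""
    d = np.asarray(distances, dtype=float)        # target distance
    w = np.asarray(widths, dtype=float)           # target width
    mt = np.asarray(movement_times, dtype=float)  # movement time in seconds
    index_of_difficulty = np.log2(d / w + 1.0)    # bits
    return (index_of_difficulty / mt).mean()

# e.g. fitts_throughput([200, 400], [40, 20], [1.1, 1.8]) -> mean bits/s
```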
Study 2: Multi-Finger Pressure Control (Fine-Grained Sensing)

This study evaluated the ability to estimate force across 13 randomized finger combinations, ranging from single touches to full-hand presses. Despite significant self-occlusion inherent in wrist-worn views, participants successfully maintained specific pressure targets with an 86.7% success rate, demonstrating the model's robustness in distinguishing and measuring multi-point contact forces.
Study 3: Virtual Pressure-Sensitive Touchpad (Composite Interaction)

To simulate complex real-world workflows, we transformed a bare desktop into a virtual pressure-sensitive touchpad requiring simultaneous 2D dragging and precise vertical pressure holds. The system achieved an exceptional 98.0% success rate in this composite task, confirming it can reliably decouple pose tracking from pressure estimation to support stable, surface-agnostic control.
  Conclusion:
WristP² is an innovative wrist-worn human-computer interaction system designed to achieve high-precision 3D hand pose reconstruction and per-vertex pressure estimation in real time in mobile scenarios, using a single wide-field-of-view fisheye RGB camera. The project introduces a deep-learning framework that achieves joint-position errors as low as 2.9 mm and pressure-estimation errors of 10.4 grams. It has been validated as a low-cost, efficient, and ergonomic general-purpose mobile interaction solution.
  Acknowledgments:
I sincerely thank the user-study participants for their cooperation and the CHI 2026 reviewers for their valuable and supportive feedback.
  Something I want to say:
This project, along with PalmTrack, serves as the core technology for my entrepreneurial venture. As the company is just starting out, we don't have an independent website yet. Therefore, I'd like to first address some key questions about the startup:

Why start this project?
Because I believe that future interaction should go far beyond ever more foldable and thinner screens. As AI advances, the ideal interaction should be like it was in the Stone Age: starting from basic human intentions and capabilities, and letting computers understand our subtle emotions and movements. I want to define new interaction paradigms in this new era, not follow them.

What is your role in the team?
I am a co-founder, and the technology is my patent. Our team currently consists of seven people, all of whom, except me, are PhDs from Tsinghua University.

What applications does your startup product have?
We are developing a foundational product, and later we will release an SDK for developers to create a wide variety of functions. For example: in engineering, it can be paired with dexterous robotic hands for remote control; in entertainment, it enables more interactive and immersive games; in daily life, it can be applied to smart homes: waving to open or close curtains, snapping fingers to turn lights on or off, and so on. We believe this will disrupt existing interaction paradigms.

How far has the startup progressed, and what are the results?

We have currently completed angel-round financing, raising approximately $1.4 million, with a company valuation of about $6.4 million. After our related articles are published and patents registered, we will open-source the project directly and welcome feedback and corrections at that time.