Ultra Diffusion Poser
Diffusion-Based Human Motion Tracking From Sparse Inertial Sensors and Ranging-Based Between-Sensor Distances
CVPR 2026
Abstract
Methods using inertial measurement units (IMUs) provide a wearable alternative to camera-based motion capture. To mitigate drift from inertial signals, recent sparse inertial pose estimators integrate inter-sensor distances measured by ultra-wideband (UWB) ranging. So far, UWB distances have only been used as an additional input feature, ignoring the physical constraints they impose on sensor positions. However, these distances can also be used to reconstruct the underlying 3D sensor layout, which in turn provides more informative input for pose reconstruction. We propose Ultra Diffusion Poser, a diffusion model that explicitly models these geometric constraints. It includes a Spatial Layout Module that analytically reconstructs the 3D sensor positions from UWB measurements. These sensor positions are used alongside IMU signals and UWB distances as a conditioning signal during diffusion. Still, network predictions can violate inter-sensor distance measurements. To address this, we introduce UWB-Diffusion Guidance, which encourages alignment between predicted poses and measured distances during diffusion sampling. Together, these contributions enable our model to achieve state-of-the-art performance, reducing joint position error by up to 22% over prior work.
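The idea of nudging predictions toward measured distances can be illustrated with a small, simplified sketch. It is not the authors' guidance formulation: it applies plain gradient steps on a squared distance residual directly to a set of 3D sensor positions, whereas the method above applies guidance to body predictions during diffusion sampling. The function names, the step size, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def distance_loss(positions, measured):
    """0.5 * sum over sensor pairs of (predicted distance - measured distance)^2."""
    diff = positions[:, None, :] - positions[None, :, :]
    residual = np.linalg.norm(diff, axis=-1) - measured
    # Each pair appears twice in the symmetric matrix, hence the factor 0.25.
    return 0.25 * np.sum(residual ** 2)

def distance_guidance_step(positions, measured, step_size=0.05):
    """One gradient step that pulls positions toward the measured distances (hypothetical helper)."""
    diff = positions[:, None, :] - positions[None, :, :]      # (n, n, 3)
    pred = np.linalg.norm(diff, axis=-1)                      # (n, n)
    residual = pred - measured
    np.fill_diagonal(residual, 0.0)                           # no self-distance term
    safe = np.where(pred > 1e-9, pred, 1.0)                   # avoid division by zero
    # Gradient of distance_loss with respect to each position.
    grad = ((residual / safe)[:, :, None] * diff).sum(axis=1)
    return positions - step_size * grad

# Synthetic example: six assumed sensor positions and a noisy prediction of them.
rng = np.random.default_rng(0)
true_positions = rng.normal(size=(6, 3))
diff = true_positions[:, None, :] - true_positions[None, :, :]
measured = np.linalg.norm(diff, axis=-1)                      # stand-in for UWB distances

predicted = true_positions + 0.1 * rng.normal(size=(6, 3))
loss_before = distance_loss(predicted, measured)
for _ in range(100):
    predicted = distance_guidance_step(predicted, measured)
loss_after = distance_loss(predicted, measured)
```

In this toy setting the loss after the guidance steps is lower than before, which is the behavior guidance aims for: predictions that agree more closely with the measured distances.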
Reference
Dominik Hollidt, Tommaso Bendinelli, and Christian Holz. Ultra Diffusion Poser: Diffusion-Based Human Motion Tracking From Sparse Inertial Sensors and Ranging-Based Between-Sensor Distances. In Conference on Computer Vision and Pattern Recognition 2026 (CVPR).
Model Architecture

Figure 2. Ultra Diffusion Poser lifts inter-sensor distances beyond their role as an auxiliary input by explicitly modeling the geometric constraints they impose. First, the Spatial Layout Module reconstructs the sensor layout from the inter-sensor distances. Multi-dimensional scaling is the key operation here: it recovers the 3D arrangement of the sensors from the inter-sensor distances alone. The recovered sensor layout serves as a conditioning signal for the diffusion model. To prevent body predictions that violate the measured UWB distances, UWB-Diffusion Guidance aligns the body predictions with the measurements.
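As a rough illustration of how multi-dimensional scaling recovers a layout from distances, the following sketch implements textbook classical MDS in NumPy. It is a minimal example under assumptions, not the Spatial Layout Module itself: the function names and the six synthetic sensor positions are hypothetical, and the distances are noise-free. Classical MDS recovers positions only up to rotation, reflection, and translation, so the check compares pairwise distances rather than coordinates.

```python
import numpy as np

def classical_mds(dist, dim=3):
    """Recover point coordinates from a matrix of pairwise distances (classical MDS)."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    # Double-center the squared distances to obtain a Gram (inner-product) matrix.
    centering = np.eye(n) - np.full((n, n), 1.0 / n)
    gram = -0.5 * centering @ (dist ** 2) @ centering
    # Keep the `dim` largest eigenpairs; clip negative eigenvalues that noise can cause.
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:dim]
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale

def pairwise_distances(points):
    """Euclidean distance between every pair of rows in `points`."""
    diff = points[:, None, :] - points[None, :, :]
    return np.linalg.norm(diff, axis=-1)

# Six assumed sensor positions; only their pairwise distances are passed to MDS.
rng = np.random.default_rng(0)
sensors = rng.normal(size=(6, 3))
measured = pairwise_distances(sensors)

layout = classical_mds(measured, dim=3)
reconstruction_error = np.abs(pairwise_distances(layout) - measured).max()
```

With exact distances the recovered layout reproduces the input distance matrix to numerical precision. With noisy measurements the result is an approximation, and a real system would also need to resolve the remaining rotation, reflection, and translation ambiguity by other means.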
Evaluation

Figure 3. The integration and explicit modeling of inter-sensor distances improves the prediction of arm and leg positions noticeably. Overall, Ultra Diffusion Poser achieves state-of-the-art results.



