SympCam
Remote Optical Measurement of Sympathetic Arousal
IEEE EMBS BHI 2024
Abstract
Recent work has shown that a person’s sympathetic arousal can be estimated from facial videos alone using basic signal processing. This opens up new possibilities in telehealth and stress management, providing a non-invasive method to measure stress using only a regular RGB camera. In this paper, we present SympCam, a new 3D convolutional architecture tailored to the task of remote sympathetic arousal prediction. Our model incorporates a temporal attention module (TAM) to improve the temporal coherence of its sequential data processing. The predictions from our method improve on the accuracy of sympathetic arousal prediction in prior work by 48%, reaching a mean correlation of 0.77. We additionally compare our method with common remote photoplethysmography (rPPG) networks and show that they alone cannot accurately predict sympathetic arousal ‘out-of-the-box’. Furthermore, we show that the sympathetic arousal predicted by our method allows detecting physical stress with a balanced accuracy of 90%, an improvement of 61% compared to the rPPG method commonly used in related work, demonstrating the limitations of using rPPG alone. Finally, we contribute a dataset designed explicitly for the task of remote sympathetic arousal prediction. Our dataset contains face and hand videos of 20 participants, recorded with two cameras and synchronized with electrodermal activity (EDA) and photoplethysmography (PPG) measurements. We will make this dataset available to the community and use it to evaluate the methods in this paper. To the best of our knowledge, this is the first dataset available to other researchers designed for remote sympathetic arousal prediction.
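For readers unfamiliar with the stress-detection metric quoted in the abstract, the following is a minimal sketch of balanced accuracy for a binary stress / no-stress classifier, computed as the mean of sensitivity and specificity. The function name, the 0/1 label encoding, and the example labels are illustrative assumptions and are not taken from the paper.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels.

    Assumed encoding (hypothetical): 1 = stress, 0 = no stress.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four outcomes of a binary confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")

    sensitivity = tp / (tp + fn)  # fraction of stress windows detected
    specificity = tn / (tn + fp)  # fraction of rest windows detected
    return (sensitivity + specificity) / 2


# Made-up labels: two stress windows, four rest windows.
example_true = [1, 1, 0, 0, 0, 0]
example_pred = [1, 0, 0, 0, 0, 0]
example_score = balanced_accuracy(example_true, example_pred)
```

Because each class contributes equally, a classifier that always predicts "no stress" scores 0.5 under this metric, however rare the stress windows are.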
Reference
Bjoern Braun, Daniel McDuff, Tadas Baltrusaitis, Paul Streli, Max Moebus, and Christian Holz. SympCam: Remote Optical Measurement of Sympathetic Arousal. In International Conference on Biomedical and Health Informatics 2024 (IEEE EMBS BHI).
Study Apparatus
Figure 2. The apparatus used for our study. The participants placed their heads on a chin rest and their hands on a table with their palms facing upwards. We recorded the videos using two Basler acA1300-200uc cameras pointed toward the participants’ faces and hands, and the physiological signals using a synchronized BIOPAC MP160 that triggered the cameras by wire.
SympCam
Figure 3. Our proposed neural architecture to predict sympathetic arousal from facial videos. As the backbone of our architecture, we use a 3D CNN (PhysNet) with a temporal input length of T = 768 frames. While such 3D CNN-based architectures have achieved impressive performance on video-based heart rate (HR) prediction, they treat all input frames equally, ignoring that different frames may contribute differently to the target prediction. To address this problem, we propose a temporal attention module (TAM) that allows our model to learn to discriminate between more and less important features along the temporal dimension.
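The general idea of weighting features along the temporal dimension can be sketched as follows. This is not the TAM from the paper, whose internals are not described here. It is a generic softmax-over-time attention written in plain Python, and the function name, the linear scoring weights, and the rescaling by T are all assumptions made for illustration.

```python
import math


def temporal_attention(features, weights, bias=0.0):
    """Reweight per-frame feature vectors with a softmax over time.

    features: list of T frames, each a list of C channel values.
    weights:  list of C scoring weights (hypothetical stand-in for
              parameters that a real model would learn).
    Returns (attention, reweighted_features).
    """
    # One scalar importance score per frame (simple linear scoring).
    scores = [sum(w * x for w, x in zip(weights, frame)) + bias
              for frame in features]

    # Numerically stable softmax across the temporal dimension.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    attention = [e / total for e in exps]

    # Scale by T so that uniform attention leaves the features unchanged.
    num_frames = len(features)
    reweighted = [[a * num_frames * x for x in frame]
                  for a, frame in zip(attention, features)]
    return attention, reweighted


# Toy input: T = 3 frames, C = 2 channels.
toy_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_attention, toy_output = temporal_attention(toy_features, [1.0, 1.0])
```

With zero scoring weights every frame receives attention 1/T and the output equals the input, which corresponds to the "all frames treated equally" behavior of a plain 3D CNN. Non-zero weights let some frames contribute more than others.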
Results
Figure 4. Visual comparison between our predicted sympathetic arousal (blue) and the ground truth tonic EDA signal (black) for four participants. At minutes 2, 4.5, and 7, the participants are instructed to pinch themselves for 30 seconds to cause a sympathetic stress response. Note that for some participants, such as participant 11, the model is currently only able to predict the global trend accurately. We attribute this difference to the nature of our method, which estimates sympathetic arousal by analyzing blood flow changes rather than by measuring absolute EDA values.
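Since the method tracks trends rather than absolute EDA values, a correlation-based comparison is a natural way to score a predicted trace against the ground truth. The sketch below assumes the reported correlation is Pearson's r, which this page does not state explicitly. The function name and the sample values are made up for illustration.

```python
import math


def pearson_r(pred, truth):
    """Pearson correlation between a predicted and a reference signal."""
    n = len(pred)
    if n != len(truth) or n < 2:
        raise ValueError("signals must have equal length of at least 2")

    mean_p = sum(pred) / n
    mean_t = sum(truth) / n

    # Covariance and variances (unnormalized; the factors of n cancel).
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, truth))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in truth)

    if var_p == 0 or var_t == 0:
        raise ValueError("correlation is undefined for a constant signal")
    return cov / math.sqrt(var_p * var_t)


# Made-up traces: the prediction follows the trend of the reference
# but on a different scale and with a different offset.
reference_trace = [2.0, 2.5, 4.0, 3.0, 2.2]
predicted_trace = [0.1 * v - 5.0 for v in reference_trace]
trend_score = pearson_r(predicted_trace, reference_trace)
```

Pearson's r is unchanged by a positive rescaling or an offset of either signal, so a prediction that captures only the shape of the tonic EDA trace can still score highly.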