Publications – CIRICS


Centre Interdisciplinaire
de Recherche et d’Innovation
en Cybersécurité et Société

Joudeh, I. O.; Cretu, A. -M.; Bouchard, S.

Predicting the Arousal and Valence Values of Emotional States Using Learned, Predesigned, and Deep Visual Features † Journal article

In: Sensors, vol. 24, no. 13, 2024, ISSN: 1424-8220 (Publisher: Multidisciplinary Digital Publishing Institute (MDPI)).

Tags: adult, Affective interaction, Arousal, artificial neural network, Cognitive state, Cognitive/emotional state, Collaborative interaction, computer, Convolutional neural networks, correlation coefficient, Deep learning, emotion, Emotional state, Emotions, female, Forecasting, Helmet mounted displays, human, Humans, Learning algorithms, Learning systems, Long short-term memory, Machine learning, Machine-learning, male, Mean square error, Neural networks, physiology, Regression, Root mean squared errors, Video recording, virtual reality, Visual feature, visual features

@article{joudeh_predicting_2024,

title = {Predicting the Arousal and Valence Values of Emotional States Using Learned, Predesigned, and Deep Visual Features †},

author = {I. O. Joudeh and A. -M. Cretu and S. Bouchard},

url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85198382238&doi=10.3390%2fs24134398&partnerID=40&md5=cefa8b2e2c044d02f99662af350007db},

doi = {10.3390/s24134398},

issn = {1424-8220},

year  = {2024},

date = {2024-01-01},

journal = {Sensors},

volume = {24},

number = {13},

abstract = {The cognitive state of a person can be categorized using the circumplex model of emotional states, a continuous model of two dimensions: arousal and valence. The purpose of this research is to select a machine learning model(s) to be integrated into a virtual reality (VR) system that runs cognitive remediation exercises for people with mental health disorders. As such, the prediction of emotional states is essential to customize treatments for those individuals. We exploit the Remote Collaborative and Affective Interactions (RECOLA) database to predict arousal and valence values using machine learning techniques. RECOLA includes audio, video, and physiological recordings of interactions between human participants. To allow learners to focus on the most relevant data, features are extracted from raw data. Such features can be predesigned, learned, or extracted implicitly using deep learners. Our previous work on video recordings focused on predesigned and learned visual features. In this paper, we extend our work onto deep visual features. Our deep visual features are extracted using the MobileNet-v2 convolutional neural network (CNN) that we previously trained on RECOLA’s video frames of full/half faces. As the final purpose of our work is to integrate our solution into a practical VR application using head-mounted displays, we experimented with half faces as a proof of concept. The extracted deep features were then used to predict arousal and valence values via optimizable ensemble regression. We also fused the extracted visual features with the predesigned visual features and predicted arousal and valence values using the combined feature set. In an attempt to enhance our prediction performance, we further fused the predictions of the optimizable ensemble model with the predictions of the MobileNet-v2 model. 
After decision fusion, we achieved a root mean squared error (RMSE) of 0.1140, a Pearson’s correlation coefficient (PCC) of 0.8000, and a concordance correlation coefficient (CCC) of 0.7868 on arousal predictions. We achieved an RMSE of 0.0790, a PCC of 0.7904, and a CCC of 0.7645 on valence predictions. © 2024 by the authors.},

note = {Publisher: Multidisciplinary Digital Publishing Institute (MDPI)},

keywords = {adult, Affective interaction, Arousal, artificial neural network, Cognitive state, Cognitive/emotional state, Collaborative interaction, computer, Convolutional neural networks, correlation coefficient, Deep learning, emotion, Emotional state, Emotions, female, Forecasting, Helmet mounted displays, human, Humans, Learning algorithms, Learning systems, Long short-term memory, Machine learning, Machine-learning, male, Mean square error, Neural networks, physiology, Regression, Root mean squared errors, Video recording, virtual reality, Visual feature, visual features},

pubstate = {published},

tppubtype = {article}

}
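The abstract in the entry above reports three evaluation metrics: RMSE, PCC, and CCC. As a minimal sketch of how these metrics are commonly defined (this is not the authors' code; the function names and the toy values are illustrative assumptions), they can be computed in plain Python as follows:

```python
import math


def _mean(values):
    return sum(values) / len(values)


def _moments(y_true, y_pred):
    """Return means, population variances, and covariance of two sequences."""
    mt, mp = _mean(y_true), _mean(y_pred)
    var_t = _mean([(t - mt) ** 2 for t in y_true])
    var_p = _mean([(p - mp) ** 2 for p in y_pred])
    cov = _mean([(t - mt) * (p - mp) for t, p in zip(y_true, y_pred)])
    return mt, mp, var_t, var_p, cov


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(_mean([(t - p) ** 2 for t, p in zip(y_true, y_pred)]))


def pcc(y_true, y_pred):
    """Pearson's correlation coefficient (undefined if either input is constant)."""
    _, _, var_t, var_p, cov = _moments(y_true, y_pred)
    return cov / math.sqrt(var_t * var_p)


def ccc(y_true, y_pred):
    """Lin's concordance correlation coefficient."""
    mt, mp, var_t, var_p, cov = _moments(y_true, y_pred)
    return 2 * cov / (var_t + var_p + (mt - mp) ** 2)


# Toy annotations and predictions (made-up numbers, not RECOLA data).
labels = [0.1, 0.2, 0.3, 0.4]
shifted = [0.2, 0.3, 0.4, 0.5]  # perfectly correlated, but offset by 0.1
print(rmse(labels, shifted), pcc(labels, shifted), ccc(labels, shifted))
```

In this toy case the predictions are perfectly correlated with the labels, so PCC stays at 1, while CCC drops below 1 because it also penalizes the constant offset. That difference is one reason the two coefficients are often reported together.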


Joudeh, I. O.; Cretu, A. -M.; Bouchard, S.; Guimond, S.

Prediction of Emotional States from Partial Facial Features for Virtual Reality Applications Journal article

In: Annual Review of CyberTherapy and Telemedicine, vol. 21, pp. 17–21, 2023, ISSN: 1554-8716 (Publisher: Interactive Media Institute).

Tags: Arousal, article, clinical article, convolutional neural network, correlation coefficient, data base, emotion, facies, female, human, human experiment, Image processing, long short term memory network, male, random forest, residual neural network, root mean squared error, videorecording, virtual reality

@article{joudeh_prediction_2023-1,

title = {Prediction of Emotional States from Partial Facial Features for Virtual Reality Applications},

author = {I. O. Joudeh and A. -M. Cretu and S. Bouchard and S. Guimond},

url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85182471413&partnerID=40&md5=8190e0dbb5b48ae508515f4029b0a0d1},

issn = {1554-8716},

year  = {2023},

date = {2023-01-01},

journal = {Annual Review of CyberTherapy and Telemedicine},

volume = {21},

pages = {17--21},

abstract = {The availability of virtual reality (VR) in numerous clinical contexts has been made possible by recent technological advancements. One application is using VR for cognitive interventions with individuals who have mental disorders. Predicting the emotional states of users could help to prevent their discouragement during VR interventions. We can monitor the emotional states of individuals using sensors like an external camera, as they engage in various tasks within VR environments. The emotional state of VR users can be measured through arousal and valence, as per the Circumplex model. We used the Remote Collaborative and Affective Interactions (RECOLA) database of emotional behaviours. We processed video frames from 18 RECOLA videos. Due to the headset in VR systems, we detected faces and cropped the images of faces to use the lower half of the face only. We labeled the images with arousal and valence values to reflect various emotions. Convolutional neural networks (CNNs), specifically MobileNet-v2 and ResNet-18, were then used to predict arousal and valence values. MobileNet-v2 outperforms ResNet-18 as well as others from the literature. We achieved a root mean squared error (RMSE), Pearson’s correlation coefficient (PCC), and concordance correlation coefficient (CCC) of 0.1495, 0.6387, and 0.6081 for arousal, and 0.0996, 0.6453, and 0.6232 for valence. Our work acts as a proof-of-concept for predicting emotional states from arousal and valence values via visual data of users immersed in VR experiences. In the future, predicted emotions could be used to automatically adjust the VR environment for individuals engaged in cognitive interventions. © 2023, Interactive Media Institute. All rights reserved.},

note = {Publisher: Interactive Media Institute},

keywords = {Arousal, article, clinical article, convolutional neural network, correlation coefficient, data base, emotion, facies, female, human, human experiment, Image processing, long short term memory network, male, random forest, residual neural network, root mean squared error, videorecording, virtual reality},

pubstate = {published},

tppubtype = {article}

}
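The abstract in the entry above describes detecting faces and keeping only the lower half of each face, since a VR headset covers the upper half. A minimal sketch of that cropping step is given below, assuming a face detector has already returned a bounding box; the function name, the `(x, y, w, h)` box convention, and the nested-list image are hypothetical illustrations and not the authors' pipeline:

```python
def crop_lower_half(image, box):
    """Return the lower half of a face region.

    image: 2-D list of pixel rows (a stand-in for a real image array).
    box:   (x, y, w, h) of the detected face, with (x, y) the top-left
           corner in pixel coordinates (assumed convention).
    """
    x, y, w, h = box
    top = y + h // 2  # first row of the lower half of the face box
    return [row[x:x + w] for row in image[top:y + h]]


# Toy 6x6 "image" whose pixel value encodes its position as row * 10 + column.
image = [[r * 10 + c for c in range(6)] for r in range(6)]
lower = crop_lower_half(image, (1, 1, 4, 4))
print(lower)  # rows 3-4, columns 1-4 of the toy image
```

With a real image library the same slicing would apply to an array instead of nested lists; the detector that produces the box is outside the scope of this sketch.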
