Abstract
Continuous affective state estimation from facial information is a task that requires predicting a time series of emotional state outputs from a facial image sequence. Modeling the spatio-temporal evolution of facial information plays an important role in affective state estimation. One of the most widely used methods is the recurrent neural network (RNN). RNNs provide an attractive framework for propagating information over a sequence using a continuous-valued hidden-layer representation. In this work, we propose to instead learn rich affective state dynamics. We model human affect as a dynamical system and define the affective state in terms of valence, arousal, and their higher-order derivatives. We then pose the affective state estimation problem as a jointly trained state estimator for high-dimensional input images, combining an RNN and a Bayesian filter, i.e., a Kalman filter (KF) or an extended Kalman filter (EKF), so that all weights in the resulting network can be trained using backpropagation. We use a recently proposed general framework for designing and learning discriminative state estimators framed as computational graphs. Such an approach can handle high-dimensional observations and efficiently optimizes the state estimator in an end-to-end fashion. In addition, to deal with the asynchrony between emotion labels and input images, caused by the inherent reaction lag of the annotators, we introduce a convolutional layer that aligns features with emotion labels. Experimental results on the RECOLA and SEMAINE datasets for continuous emotion prediction illustrate the potential of the proposed framework compared to recent state-of-the-art models.
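The abstract describes the affective state as valence, arousal, and their higher-order derivatives, tracked with a Kalman filter. The following is a minimal illustrative sketch of that idea, not the paper's jointly trained estimator: it assumes a constant-velocity transition model over a 4-D state (valence, arousal, and their first derivatives), noisy 2-D observations of valence/arousal, a hypothetical 25 fps frame rate, and hand-picked noise covariances `Q` and `R`.

```python
import numpy as np

def kalman_step(x, P, z, F, H, Q, R):
    """One predict-update cycle of a linear Kalman filter."""
    # Predict: propagate state and covariance through the dynamics.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update: correct the prediction with the observation z.
    S = H @ P_pred @ H.T + R             # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

dt = 1.0 / 25.0  # assumed frame interval (25 fps video)
# State: [valence, arousal, d(valence)/dt, d(arousal)/dt]
F = np.block([[np.eye(2), dt * np.eye(2)],
              [np.zeros((2, 2)), np.eye(2)]])   # constant-velocity dynamics
H = np.hstack([np.eye(2), np.zeros((2, 2))])    # observe valence/arousal only
Q = 1e-3 * np.eye(4)  # process noise (hypothetical value)
R = 1e-1 * np.eye(2)  # observation noise (hypothetical value)

x = np.zeros(4)  # initial state estimate
P = np.eye(4)    # initial state covariance
for z in [np.array([0.10, 0.00]), np.array([0.15, 0.05])]:
    x, P = kalman_step(x, P, z, F, H, Q, R)
```

In the paper's framework, noisy measurements like `z` would instead come from a learned network applied to each face image, and the filter itself is differentiable so `Q`, `R`, and the feature extractor can all be trained with backpropagation rather than hand-tuned.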
| Original language | English |
|---|---|
| Number of pages | 14 |
| Journal | IEEE Transactions on Multimedia |
| Early online date | 1 Apr 2022 |
| DOIs | |
| Status | Published - 2022 |