Organisation profile
Audio-visual scene analysis is becoming increasingly important in many areas. A typical application is teleconferencing, where multimodal communication can be enhanced and the camera can be steered electronically towards the active speaker without a human operator. Audio-visual perception is also of great interest in surveillance systems, for intruder detection, behaviour and event analysis, and related tasks. A further example is driver monitoring: acoustic and visual data can be used to assess a driver's vigilance and help avoid accidents caused by fatigue or inattention. Within this context, the AVSP research at ETRO is pioneering future directions in audio-visual systems. The AVSP research cluster explores and capitalizes on the correlation between speech and video data, combining efficient numerical methods from computational engineering with problems of information processing. It has adopted an integrated approach to speech and vision processing, supported by in-house expertise in Speech Processing (ETRO-DSSP) and Computer Vision (ETRO-IRIS), and by a long-term collaboration with the Dept. of Computer Information & Engineering (CIE), School of Computer Science, North Western Polytechnic University (NWPU), Xi'an, China.

The research work includes:
* Audio-Visual Interactions in Multimodal Communications
* Audio-Visual Emotion Analysis & Synthesis
* Source Localization and Tracking in Audio-Visual Scenes
* Audio-Visual Synchrony
* Audio-Visual Speaker Identification

The applications being addressed by this work include:
* Multimedia applications: understanding, indexing and managing multimedia content, as well as detecting scene transitions and events.
* Ubiquitous computing applications: multimodal audio-visual processing for advanced context awareness in smart spaces (smart rooms, group meetings, intelligent environments).
* Combined visual-acoustic processing for humanoid robots.
* Audio-visual surveillance.
* Audio-visual driver vigilance monitoring.

Description

* Audio-Visual Interactions in Multimodal Communications
Human perception of speech is bimodal: acoustic speech perception is affected by visual cues from lip movement. Because of this bimodality, audio-visual interaction is an important design factor for multimodal communication systems such as video telephony and video conferencing. The key issue in bimodal speech analysis and synthesis is establishing the mapping between acoustic and visual parameters, and we are developing approaches for this; a minimal sketch of such a mapping follows below. Our current work addresses three inter-related problems: (i) the synthesis of articulatory parameters for an MPEG-4 facial animation model, (ii) robust audio-visual speech recognition, and (iii) photorealistic audio-visual speech synthesis. Fusing these areas will impact speech- and text-driven facial animation, speech- and text-driven animation of avatars, and audio-visual speech recognition. The proposed technique is intended not only to reproduce lip movements in accordance with the speech content, but also to reproduce the complex set of facial movements and expressions related to the emotional content of the speech. Such a system can, for example, provide visual information during a telephone conversation that benefits people with impaired hearing; it can also help normal-hearing listeners when the audio signal is degraded.
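As an illustration of the acoustic-to-visual mapping discussed above, the sketch below learns a frame-level regression from audio features to facial animation parameters. The 13-dimensional MFCC-like features, the 10-dimensional parameter vectors, the ridge-regression model and the synthetic training data are all illustrative assumptions, not the group's actual method.

```python
# Illustrative sketch: learn a frame-level mapping from acoustic features
# (MFCC-like vectors) to visual articulatory parameters (MPEG-4 FAP-like
# vectors). Dimensions, model and data are assumptions for illustration.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Stand-in training data: in practice these would be synchronised audio
# features and facial parameters tracked from video or motion capture.
n_frames = 2000
audio_feats = rng.normal(size=(n_frames, 13))            # 13-dim MFCC-like
true_map = rng.normal(size=(13, 10))
visual_params = audio_feats @ true_map + 0.1 * rng.normal(size=(n_frames, 10))

# A simple linear (ridge) regression as the audio-to-visual mapping.
model = Ridge(alpha=1.0).fit(audio_feats, visual_params)

# Drive the animation model: predict one parameter vector per audio frame.
new_audio = rng.normal(size=(5, 13))
predicted_params = model.predict(new_audio)
print(predicted_params.shape)  # (5, 10)
```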
* Audio-Visual Emotion Analysis & Synthesis
Recent research on multimodal interfaces has focused on developing natural, adaptive and intelligent interfaces that enable machines to communicate with humans in ways much closer to how humans communicate among themselves. In order to augment natural interactivity between humans and the physical or virtual environment, research is now carried out towards autonomous interfaces that are capable of learning and adapting (responding) to user emotions, intentions and behaviour. Methods for intention and emotion recognition are still at a very early stage of research and development. We are developing audio-visual methods that combine the information obtained from voice (mainly based on prosodic features) and from video (mainly based on facial expressions) for emotion analysis and synthesis.

* Source Localization and Tracking in Audio-Visual Scenes
Human listeners make use of binaural cues (e.g., interaural time differences) to localize sound sources in space. To exploit such information, we propose the joint use of microphone arrays and video information to localize and track humans. A major innovation in this area will be the use of novel audio-visual localization methods, where audio and video processing are combined to realize reliable source localization. In this way, it becomes possible to continuously identify the focus of, for example, a meeting discussion and to detect changes in that focus. A variety of modes that could be used for tracking (e.g., face localization, sound localization, motion evaluation) will be combined in a stochastic framework to ensure robust tracking; a sketch of microphone-pair time-delay estimation follows after this list.

* Audio-Visual Synchrony
Humans have a special ability to understand what is happening in an audio-visual scene and are particularly efficient at assessing audio-visual synchrony. Visual observation allows objects to be tracked from one location to another and allows objects' appearances and activities to be characterized over time; audio observation complements visual observation in many ways. We are investigating methods to detect discrete audio and visual events, determine anomalous audio and visual events, cluster the audio and video events into meaningful classes, and determine the salient temporal chains of these events that correspond to particular activities in the environment. Another aspect of this work is the combination and selection of a variety of multimodal features for these tasks; a simple synchrony measure is also sketched below.

* Audio-Visual Speaker Identification
Humans identify speakers based on a variety of attributes, including acoustic cues, visual appearance cues and behavioural characteristics (such as characteristic gestures and lip movements). Speaker identification is an important technology for a variety of applications including security, meetings and, more recently, indexing for search and retrieval of digitized multimedia content (for instance in the MPEG-7 standard). The accuracy of audio-based speaker identification under acoustically degraded conditions (such as background noise) still needs improvement, so we have begun to investigate the combination of audio-based processing with visual processing for speaker identification; a score-level fusion sketch is given below.
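As a concrete illustration of the interaural (inter-microphone) time differences mentioned under source localization, the sketch below estimates the time difference of arrival between two microphones using GCC-PHAT cross-correlation. The choice of GCC-PHAT is an assumption for illustration; the text above does not commit to a specific estimator.

```python
# Illustrative sketch: time difference of arrival (TDOA) between two
# microphones via GCC-PHAT. The estimator choice is an assumption.
import numpy as np

def gcc_phat(sig, ref, fs):
    """Return the estimated delay (seconds) of `sig` relative to `ref`."""
    n = len(sig) + len(ref)
    S = np.fft.rfft(sig, n=n)
    R = np.fft.rfft(ref, n=n)
    cross = S * np.conj(R)
    cross /= np.abs(cross) + 1e-12          # PHAT weighting
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    delay_samples = np.argmax(np.abs(cc)) - max_shift
    return delay_samples / fs

# Toy example: a noise burst arriving 25 samples later at microphone 2.
fs = 16000
src = np.random.default_rng(1).normal(size=4096)
mic1 = src
mic2 = np.concatenate((np.zeros(25), src[:-25]))
print(gcc_phat(mic2, mic1, fs))  # ~25 / 16000 s = 1.56 ms
```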
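For the synchrony work, one simple baseline measure (an illustrative assumption, not the group's published method) is the peak normalised cross-correlation between the audio energy envelope and a per-frame visual motion signal, e.g. mean optical-flow magnitude in the mouth region; both signals are assumed to be sampled at the video frame rate.

```python
# Illustrative sketch: audio-visual synchrony as peak correlation between
# an audio envelope and a visual motion signal over small temporal lags.
import numpy as np

def synchrony_score(audio_envelope, visual_motion, max_lag=5):
    """Peak normalised cross-correlation over small lags; values near 1
    suggest the audio and the observed motion share a common source."""
    a = (audio_envelope - audio_envelope.mean()) / (audio_envelope.std() + 1e-12)
    v = (visual_motion - visual_motion.mean()) / (visual_motion.std() + 1e-12)
    n = len(a)
    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:        # motion lags audio by `lag` frames
            c = np.dot(a[:n - lag], v[lag:]) / (n - lag)
        else:               # motion leads audio
            c = np.dot(a[-lag:], v[:n + lag]) / (n + lag)
        best = max(best, c)
    return best

# Toy example: motion that follows the audio envelope with a 2-frame lag
# scores high; unrelated motion scores near zero.
rng = np.random.default_rng(2)
env = rng.random(200)
motion_synced = np.roll(env, 2) + 0.05 * rng.normal(size=200)
motion_random = rng.random(200)
print(synchrony_score(env, motion_synced))   # close to 1
print(synchrony_score(env, motion_random))   # close to 0
```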
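Finally, for audio-visual speaker identification, a minimal sketch of score-level fusion, assuming per-speaker scores from each modality and an illustrative fixed weighting (both assumptions for this sketch):

```python
# Illustrative sketch: score-level fusion of audio and visual speaker
# identification scores. The fixed weighting is an assumption.
import numpy as np

def fuse_scores(audio_scores, visual_scores, audio_weight=0.6):
    """Combine per-speaker scores from both modalities; return the index
    of the identified speaker."""
    fused = audio_weight * audio_scores + (1 - audio_weight) * visual_scores
    return int(np.argmax(fused))

# Toy example: 4 enrolled speakers; the audio scores are ambiguous between
# speakers 1 and 2, and the visual cue disambiguates in favour of 2.
audio = np.array([-5.0, -1.1, -1.0, -4.0])
visual = np.array([-3.0, -2.5, -0.5, -4.0])
print(fuse_scores(audio, visual))  # 2
```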
Profiles
-
Jan Paul Cornelis
- Vrije Universiteit Brussel
- Vriendenkring VUB - Emeritus
- Translational Imaging Research Alliance
- Audio Visual Signal Processing
- Electronics and Informatics - Academic
Person: Sympathiser, Researcher, Professor
-
Hichem Sahli
- Vrije Universiteit Brussel
- Electronics and Informatics - Academic, postdoctoral researcher
- Audio Visual Signal Processing
Person: Researcher, Guest professor
-
Werner Verhelst
- Vrije Universiteit Brussel
- Vriendenkring VUB - Emeritus
- Audio Visual Signal Processing
- Laboratorium for Digital Speech and Audio Processing
Person: Sympathiser, Researcher
Projects
-
IOF3016: GEAR: “Venturing into Future Health Technologies”
Stiens, J., Wambacq, P., da Silva Gomes, B. T., Sahli, H., Vandemeulebroucke, J., Jansen, B., Lemeire, J., Steenhaut, K., Munteanu, A., Deligiannis, N., Schelkens, P., Kuijk, M., Parvais, B., Chan, C. W., Van Schependom, J., Touhafi, A., Braeken, A., Runacres, M., Cornelis, B., Schretter, C., Blinder, D. & Temmermans, F.
1/01/21 → 24/12/25
Project: Applied
-
EUAR46: H2020: icovid: AI-based chest CT analysis enabling rapid COVID diagnosis and prognosis
Vandemeulebroucke, J., De Mey, J. & Sahli, H.
1/09/20 → 31/08/22
Project: Fundamental
-
BRGRD41: Joint R&D 2020: ANTICIPATE “AugmeNTed IntelligenCe In orthopaedics TrEatments”
Sahli, H., Vandemeulebroucke, J. & Jansen, B.
1/08/20 → 31/07/23
Project: Applied
-
BRGRD43: Joint R&D 2020: REFINED – EaRly DEtection oF INfEctious Disease Outbreaks through Big Data Analytics
1/07/20 → 30/06/23
Project: Applied
Research output
-
An efficient model-level fusion approach for continuous affect recognition from audiovisual signals
Pei, E., Jiang, D. & Sahli, H., 2020, In: Neurocomputing. 376, p. 42-53, 12 p.
Research output: Contribution to journal › Article
-
Context-aware human trajectories prediction via latent variational model
Diaz Berenguer, A., Alioscha-Perez, M., Oveneke, M. C. & Sahli, H., 6 Aug 2020, In: IEEE Transactions on Circuits and Systems for Video Technology. Early Access
Research output: Contribution to journal › Article
-
Data Augmentation of Surface Electromyography for Hand Gesture Recognition
Tsinganos, P., Cornelis, B., Cornelis, J. P. H., Jansen, B. & Skodras, A., 29 Aug 2020, In: Sensors. 20, 17, 23 p.
Research output: Contribution to journal › Article
Open Access
-
Feature Augmenting Networks for Improving Depression Severity Estimation from Speech Signals
Yang, L., Jiang, D. & Sahli, H., 2020, In: IEEE Access. 8, p. 24033-24045, 13 p.
Research output: Contribution to journal › Article
-
Hilbert sEMG data scanning for hand gesture recognition based on deep learning
Tsinganos, P., Cornelis, B., Jansen, B., Cornelis, J. P. H. & Skodras, A., 7 Jul 2020, In: Neural Computing & Applications. 2020, 22 p.
Research output: Contribution to journal › Article
Open Access
Activities
-
Social Entrepreneurship Workshop
Maria Angeles Alpizar Terrero (Organiser), Nikolay Dentchev (Chair), Teresa Orbera Raton (Organiser), Pedro Mune Bandera (Organiser), Edgar Izquierdo (Visitor), Hichem Sahli (Chair)
7 Nov 2017 → 8 Nov 2017
Activity: Participating in or organising an event › Participating in or organising an event at an external academic organisation
-
SINS Year 3 Workshop
Henk Emiel Brouckxon (Organiser), Werner Verhelst (Organiser)
17 Feb 2017
Activity: Participating in or organising an event › Participation in workshop, seminar
-
IEEE 6th International Conference on Affective Computing and Intelligent Interaction
Hichem Sahli (Organiser)
21 Sep 2015 → 24 Sep 2015
Activity: Participating in or organising an event › Participation in conference
-
Journal on Multimodal User Interfaces (Journal)
Isabel Gonzalez (Peer reviewer)
10 Dec 2015
Activity: Publication peer-review and editorial work › Publication peer-review
-
Dutch-Belgian Information Retrieval workshop
Werner Verhelst (Organiser)
14 Apr 2008 → 15 Apr 2008
Activity: Participating in or organising an event › Participation in workshop, seminar