Paired supervised learning and unsupervised pretraining of CNN-architecture for violence detection in videos

Research output: Chapter in Book/Report/Conference proceeding › Conference paper

Abstract

Recognizing violence in crowded scenes is a major challenge for automatic video surveillance, and there is a growing need for intelligent surveillance systems that strengthen public safety. In this paper, we propose an effective approach to recognizing violence in crowded videos based on a shallow Convolutional Neural Network (CNN) that is pretrained with an unsupervised, layer-wise learning strategy. The pretrained parameters are then fine-tuned to extract intermediate frame representations, which are aggregated via NetVLAD into video representations used to recognize violence in footage. Our experimental evaluation shows that the proposal achieves results competitive with those reported in the state of the art.
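The aggregation step described above, in which frame-level CNN features are pooled into a single video descriptor with NetVLAD, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the frame features and the K cluster centers are already available (in an actual NetVLAD layer the centers and assignment weights are learned end-to-end), it uses the distance-based soft assignment from which the NetVLAD layer is derived with a hypothetical sharpness parameter `alpha`, and all function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def soft_assign(x: Vector, centers: List[Vector], alpha: float) -> List[float]:
    """Soft-assignment weights a_k(x) = softmax_k(-alpha * ||x - c_k||^2)."""
    logits = [-alpha * sum((xj - cj) ** 2 for xj, cj in zip(x, c)) for c in centers]
    top = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def l2_normalize(v: Vector, eps: float = 1e-12) -> Vector:
    """Scale v to unit L2 norm (eps guards against division by zero)."""
    norm = math.sqrt(sum(t * t for t in v))
    return [t / max(norm, eps) for t in v]


def netvlad(frames: List[Vector], centers: List[Vector], alpha: float = 10.0) -> Vector:
    """Aggregate N frame descriptors of size D into one K*D video descriptor."""
    dim = len(centers[0])
    vlad = [[0.0] * dim for _ in centers]
    for x in frames:
        weights = soft_assign(x, centers, alpha)
        for k, c in enumerate(centers):
            # Accumulate the weighted residual between the frame and center k.
            for j in range(dim):
                vlad[k][j] += weights[k] * (x[j] - c[j])
    # Intra-normalization: L2-normalize each cluster's residual sum.
    vlad = [l2_normalize(row) for row in vlad]
    # Flatten to a K*D vector and apply a final global L2 normalization.
    return l2_normalize([t for row in vlad for t in row])


# Toy usage: three 2-D "frame features" and two hypothetical cluster centers.
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
centers = [[1.0, 0.0], [0.0, 1.0]]
video_vector = netvlad(frames, centers)
print(len(video_vector))  # 4 = K * D
```

Because the residuals are summed over frames, the descriptor does not depend on frame order, and its length depends only on K and D rather than on the number of frames, so clips of different lengths yield fixed-size vectors that a classifier can consume.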

Original language: English
Title of host publication: Proceedings of the 31st Benelux Conference on Artificial Intelligence (BNAIC2019) and the 28th Belgian Dutch Conference on Machine Learning (Benelearn2019)
Publisher: CEUR Workshop Proceedings
Volume: 2491
Publication status: Published - 7 Nov 2019
Event: BNAIC 2019 - Brussels, Belgium
Duration: 7 Nov 2019 to 8 Nov 2019

Publication series

Name: CEUR Workshop Proceedings
ISSN (Print): 1613-0073

Conference

Conference: BNAIC 2019
Country/Territory: Belgium
City: Brussels
Period: 7/11/19 to 8/11/19
