Abstract
This paper presents a novel unsupervised method for identifying the semantic
structure in long semi-structured video streams. We identify chains, i.e., local clusters of
repeated features from both the video stream and audio transcripts. Each chain serves as an
indicator that the temporal interval it demarcates belongs to a single semantic event. When
all the chains are layered over each other, dense regions emerge from their overlap,
revealing the semantic structure of the video. We present two clustering
strategies that accomplish this task, and compare them against a baseline Scene Transition
Graph approach. We then develop a commentator that provides a semantic labeling of the
resultant video segmentation.
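The chain-layering idea described above can be illustrated with a minimal sketch: treat each chain as a time interval, count how many chains cover each time step, and report the maximal intervals whose overlap count meets a threshold. The interval data, horizon, and threshold here are hypothetical illustrations, not the paper's actual features or parameters.

```python
def dense_regions(chains, horizon, threshold):
    """Layer chains (half-open [start, end) intervals) over a timeline and
    return the maximal intervals where the overlap count >= threshold."""
    coverage = [0] * horizon
    for start, end in chains:        # each chain votes for its interval
        for t in range(start, end):
            coverage[t] += 1
    regions, begin = [], None
    for t, count in enumerate(coverage):
        if count >= threshold and begin is None:
            begin = t                # a dense region starts here
        elif count < threshold and begin is not None:
            regions.append((begin, t))
            begin = None
    if begin is not None:            # region runs to the end of the timeline
        regions.append((begin, horizon))
    return regions

# Example: three chains overlap most heavily around t = 4..6
chains = [(2, 7), (4, 9), (5, 6)]
print(dense_regions(chains, horizon=10, threshold=2))  # → [(4, 7)]
```

In the paper's setting the intervals would come from repeated visual and transcript features rather than hand-written tuples, and the two proposed clustering strategies replace this simple thresholding.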
| Original language | English |
|---|---|
| Pages (from-to) | 159-175 |
| Number of pages | 16 |
| Journal | Multimedia Tools and Applications |
| Volume | 70 |
| Issue number | 1 |
| Publication status | Published - 1 May 2014 |
Bibliographical note
Borko Furht

Keywords
- Semantic event detection
- Feature extraction
- Multi-modal scene segmentation
- Video summarization