Bayesian Damage Recognition in Document Images Based on a Joint Global and Local Homogeneity Model

Research output: Article (peer-reviewed)

1 Citation (Scopus)

Abstract

Physical damage (such as torn-off areas and scratches) is common in historical documents. Recognition of such damage is currently absent from digitization-and-information-extraction (DIE) systems, yet it is crucial for automatic document comprehension and exploitation. In this paper we propose a generic damage recognition (DR) method based on joint global and local modeling of the text homogeneity (TH) pattern exhibited in document images. More specifically, a connected component (CC) based formulation is developed as a global homogeneity measure, in which TH is characterized by a probabilistic graph model for a coarse recognition of damaged regions. A multi-resolution analysis (MRA) of TH is further developed for a granular, within-CC recognition of damage pixels, in which the disparity between damage and text pixels is characterized by exploiting neighborhood transitions. This enables the formulation of a local homogeneity measure, in which the neighborhood transition around an individual pixel is modeled by the propagation of the approximation coefficients of a stationary wavelet transform (SWT). The proposed global and local homogeneity measures are integrated as a joint likelihood in a Bayesian model with a Markov random field (MRF) prior. DR is then formulated as maximum a posteriori (MAP) inference, which is addressed using Markov chain Monte Carlo (MCMC) sampling. The resulting algorithm is tested on a set of real-life historical newspaper images containing damage of varying size and shape. Its performance is evaluated using both F-measures and the Intersection-over-Union (IoU) metric, and the test results demonstrate the promising potential of the proposed method.
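The abstract names the inference machinery (an MRF prior, MAP inference via MCMC, evaluation by IoU and F-measure) but not its details. The sketch below is a generic illustration under stated assumptions and is not the authors' model. It assumes binary labels (0 = text/background, 1 = damage) and a hypothetical per-pixel log-likelihood table that stands in for the paper's joint CC/SWT likelihood. It also assumes an Ising-style smoothness prior with weight `beta`, and it uses simulated annealing with a Gibbs sampler as one common MCMC route to an approximate MAP labelling. The function names, the toy data and all parameter values are hypothetical.

```python
import math
import random

NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # 4-connectivity


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def annealed_gibbs_map(loglik, beta=0.6, sweeps=40, t0=0.5, t_min=0.05, seed=0):
    """Approximate MAP labels of a binary MRF (hypothetical helper, not the paper's sampler).

    loglik[i][j] = (log p(obs | label 0), log p(obs | label 1)) at pixel (i, j).
    Assumed prior: +beta for every 4-neighbour sharing the pixel's label.
    Gibbs sweeps run at a decreasing temperature (simulated annealing), so the
    late, near-greedy sweeps settle on a high-posterior labelling.
    """
    rng = random.Random(seed)
    h, w = len(loglik), len(loglik[0])
    # Start from the per-pixel maximum-likelihood labels (ties go to label 0).
    labels = [[int(loglik[i][j][1] > loglik[i][j][0]) for j in range(w)] for i in range(h)]
    for s in range(sweeps):
        temp = max(t0 * 0.9 ** s, t_min)  # geometric cooling schedule
        for i in range(h):
            for j in range(w):
                score = []
                for lab in (0, 1):
                    agree = sum(
                        1
                        for di, dj in NEIGHBOURS
                        if 0 <= i + di < h and 0 <= j + dj < w and labels[i + di][j + dj] == lab
                    )
                    # Unnormalised log posterior of this label given its neighbours.
                    score.append(loglik[i][j][lab] + beta * agree)
                # Sample from the tempered conditional p(label = 1 | neighbours, obs).
                p_damage = _sigmoid((score[1] - score[0]) / temp)
                labels[i][j] = int(rng.random() < p_damage)
    return labels


def iou_and_f_measure(pred, truth):
    """IoU and F-measure of the damage class (label 1) for two binary masks."""
    tp = fp = fn = 0
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
    iou = tp / (tp + fp + fn) if tp + fp + fn else 1.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return iou, f


def toy_problem(size=6):
    """Hypothetical toy data: a 3x3 damaged block with two ambiguous pixels."""
    truth = [[int(1 <= i <= 3 and 1 <= j <= 3) for j in range(size)] for i in range(size)]
    clear = {0: (math.log(0.8), math.log(0.2)), 1: (math.log(0.2), math.log(0.8))}
    loglik = [[clear[truth[i][j]] for j in range(size)] for i in range(size)]
    loglik[2][2] = (math.log(0.6), math.log(0.4))  # damage pixel that looks like text
    loglik[4][5] = (math.log(0.4), math.log(0.6))  # text pixel that looks like damage
    return loglik, truth
```

Usage: `loglik, truth = toy_problem()`, then `labels = annealed_gibbs_map(loglik)` and `iou_and_f_measure(labels, truth)`. On its own, the per-pixel maximum-likelihood labelling gets both ambiguous pixels wrong. The neighbour term is intended to outvote that weak evidence, while the clearer per-pixel likelihood keeps the block boundary in place. The cooling schedule trades exploration in early sweeps for near-greedy updates in late ones.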

Original language: English
Article number: 108034
Journal: Pattern Recognition
Volume: 118
Status: Published - Oct 2021
