Noise characterization for historical documents with physical distortions

Research output: Chapter in Book/Report/Conference proceeding › Conference paper

3 Citations (Scopus)

Abstract

Physical distortions (such as torn-off areas and scratches) are common in historical documents. They disturb downstream processes such as optical character recognition (OCR) and layout analysis, which reduces the productivity of automatic document information retrieval. A proper characterization of such physical noise is an important step in developing denoising methods for historical documents. In this paper, we approach noise characterization through Bayesian labeling, in which noise and text pixels are characterized by their likelihood densities. In particular, we employ two significance measures, formulated using pointwise and cone-of-influence (COI) approximations of local Lipschitz regularity in the wavelet domain. We evaluate the proposed noise characterization with a binary noise-versus-text classification model and show that a naive binary classifier based on the distribution densities of the average point ratio (APR) or the average cone ratio (ACR) classifies noise and text pixels effectively, with encouraging overall success rates. These results motivate future work on Bayesian frameworks for recognizing physical distortions in historical documents.
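The abstract describes the method only at a high level, so the following is a minimal sketch of the general technique under stated assumptions, not the authors' implementation. It assumes a derivative-of-Gaussian wavelet (computed with SciPy), dyadic scales, a square window standing in for the cone of influence, and histogram estimates of the noise and text likelihood densities. The function names `wavelet_modulus`, `ratio_maps`, `fit_likelihoods` and `classify`, and the APR/ACR-style ratios they compute, are hypothetical and may differ from the paper's exact definitions.

```python
import numpy as np
from scipy import ndimage


def wavelet_modulus(image, scale):
    """Derivative-of-Gaussian wavelet modulus at one scale.

    scale * |grad(G_scale * f)| is an L1-normalized 2D wavelet modulus, so near
    a point of Lipschitz regularity alpha it grows roughly like scale**alpha.
    """
    return scale * ndimage.gaussian_gradient_magnitude(image, sigma=scale)


def ratio_maps(image, scales=(1.0, 2.0, 4.0), cone_k=1.0, eps=1e-8):
    """Hypothetical APR/ACR-style maps (assumed definitions, not the paper's).

    For dyadic scales, modulus(2s) / modulus(s) is roughly 2**alpha. The point
    map averages that ratio at each pixel; the cone map first takes the maximum
    modulus inside a window of half-width ceil(cone_k * s), a square stand-in
    for the cone of influence |u - v| <= cone_k * s.
    """
    image = np.asarray(image, dtype=float)  # integer input would truncate the filter output
    point, cone = [], []
    for s in scales:
        m = wavelet_modulus(image, s)
        half = int(np.ceil(cone_k * s))
        point.append(m)
        cone.append(ndimage.maximum_filter(m, size=2 * half + 1))

    def mean_ratio(stack):
        # eps keeps flat regions (zero modulus at every scale) at a ratio of 1
        ratios = [(coarse + eps) / (fine + eps) for fine, coarse in zip(stack[:-1], stack[1:])]
        return np.mean(ratios, axis=0)

    return mean_ratio(point), mean_ratio(cone)


def fit_likelihoods(values, labels, bins=32, floor=1e-6):
    """Histogram estimates of p(value | class) and class priors (1 = noise, 0 = text)."""
    values, labels = np.ravel(values), np.ravel(labels)
    edges = np.histogram_bin_edges(values, bins=bins)
    dens, priors = {}, {}
    for c in (0, 1):
        hist, _ = np.histogram(values[labels == c], bins=edges, density=True)
        dens[c] = hist + floor  # floor keeps empty bins from producing log(0)
        priors[c] = np.mean(labels == c)
    return edges, dens, priors


def classify(values, edges, dens, priors):
    """Naive Bayes labeling: 1 (noise) if p(v|noise)P(noise) > p(v|text)P(text), else 0 (text)."""
    idx = np.clip(np.digitize(values, edges) - 1, 0, len(edges) - 2)
    log_odds = (np.log(dens[1][idx]) + np.log(priors[1])
                - np.log(dens[0][idx]) - np.log(priors[0]))
    return (log_odds > 0).astype(int)
```

On a synthetic vertical step edge the point-ratio map stays close to 1 at the edge (α ≈ 0), while on a linear ramp it approaches 2 (α ≈ 1, the ceiling set by the single vanishing moment of this wavelet). In practice one would fit the likelihoods on ratio values from pixels labeled as noise or text and then call `classify` on the ratio map of a new page.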
Original language: English
Title of host publication: Proceedings Volume 11353, Optics, Photonics and Digital Technologies for Imaging Applications VI
Editors: Peter Schelkens, Tomasz Kozacki
Place of Publication: France
Publisher: SPIE
Pages: 1-11
Number of pages: 11
Volume: 11353
Edition: VI, 113530F
ISBN (Electronic): 9781510634787
Publication status: Published, 1 Apr 2020
Event: SPIE Photonics Europe 2020, online, Strasbourg, France
Duration: 6 Apr 2020 to 10 Apr 2020
https://spie.org/conferences-and-exhibitions/photonics-europe?utm_id=repe20pae&spMailingID=4563957&spUserID=MjA2NDExNDgyMTA3S0&spJobID=920584314&spReportId=OTIwNTg0MzE0S0&SSO=1

Publication series

Name: Optics, Photonics and Digital Technologies for Imaging Applications VI
Publisher: SPIE
Number: 11353-14
Volume: 11353

Conference

Conference: SPIE Photonics Europe 2020
Country/Territory: France
City: Strasbourg
Period: 6/04/20 to 10/04/20

Keywords

  • physical distortions
  • historical documents
  • local Lipschitz regularity
  • wavelets
