An Efficient Context-based Gene Normalization System

Yifei Chen, Feng Liu, Bernard Manderick

Research output: Chapter in Book/Report/Conference proceeding, Chapter, peer-review

Abstract

In this paper, we propose a two-stage integrated gene normalization
system that extracts gene and gene product names from biomedical
literature and then maps them to EntrezGene database identifiers.
Across the two stages, we design a set of functional operators that
work together efficiently. In the first stage, a cascading error
corrector built on semantic and species information prevents errors
from propagating from the first stage into the second. In the second
stage, a preprocessor addresses the two main challenges of biomedical
nomenclature, variety and ambiguity, by rewriting gene mentions into
uniform expressions. These uniform expressions make it possible to
map gene mentions to EntrezGene identifiers with a simple and
efficient exact text matcher. A disambiguation filter then makes use
of the internal context information in the text to resolve the
overlapping problem that arises during mapping. With this set of
functional operators, our system achieves a precision of 79.8%, a
recall of 83.6%, and a balanced F1 score of 81.7.
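The second-stage flow summarized above (uniform expressions, exact text matching, context-based disambiguation) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the normalization rules, the `GREEK` table, the toy lexicon, the placeholder identifiers `ID_A` to `ID_D`, and the per-identifier `gene_context` profiles are all hypothetical. The sketch also assumes that the "overlapping problem" means one normalized mention matching more than one identifier.

```python
import re
from collections import defaultdict

# Minimal sketch: every rule and every data item below is hypothetical.
# Hypothetical Greek-letter table; a real system would need a larger rule set.
GREEK = {"α": "alpha", "β": "beta", "γ": "gamma", "δ": "delta", "κ": "kappa"}


def normalize(mention: str) -> str:
    """Rewrite a gene mention into a uniform key (illustrative rules only)."""
    text = mention.lower()
    for symbol, name in GREEK.items():
        text = text.replace(symbol, name)
    # Drop spaces, hyphens and other punctuation so "IL-2", "IL 2" and "il2" collide.
    return re.sub(r"[^a-z0-9]", "", text)


def build_index(lexicon):
    """Map each normalized synonym to the set of identifiers that use it."""
    index = defaultdict(set)
    for synonym, gene_id in lexicon:
        index[normalize(synonym)].add(gene_id)
    return index


def disambiguate(candidates, context_tokens, gene_context):
    """Choose the candidate whose context profile shares the most tokens with the text.

    Ties are broken by identifier order so the result is deterministic.
    """
    return max(sorted(candidates),
               key=lambda gid: len(gene_context.get(gid, set()) & context_tokens))


def map_mention(mention, index, context_tokens, gene_context):
    """Exact-match a normalized mention; disambiguate only if several ids match."""
    candidates = index.get(normalize(mention), set())
    if not candidates:
        return None
    if len(candidates) == 1:
        return next(iter(candidates))
    return disambiguate(candidates, context_tokens, gene_context)


# Toy lexicon with placeholder identifiers (not real EntrezGene ids).
lexicon = [
    ("IL-2", "ID_A"),
    ("interleukin 2", "ID_A"),
    ("TNF-alpha", "ID_B"),
    ("CD4", "ID_C"),
    ("CD4", "ID_D"),
]
# Hypothetical context profiles, e.g. words typical of each identifier's species.
gene_context = {"ID_C": {"human", "patients"}, "ID_D": {"mouse", "murine"}}

index = build_index(lexicon)
doc_tokens = {"mouse", "spleen", "cells"}
print(map_mention("IL 2", index, doc_tokens, gene_context))   # ID_A
print(map_mention("TNF-α", index, doc_tokens, gene_context))  # ID_B
print(map_mention("CD4", index, doc_tokens, gene_context))    # ID_D
```

Because every variant is collapsed to the same key before lookup, the matcher itself can stay a plain dictionary lookup. Context is consulted only when a key is shared by several identifiers, which keeps the common case cheap.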
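The balanced F1 score is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R). The short check below uses a hypothetical helper, `f1_score`, to confirm that the reported precision (79.8%) and recall (83.6%) are consistent with the reported F1 of 81.7.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F-measure (beta = 1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported figures: precision 79.8%, recall 83.6%.
print(round(100 * f1_score(0.798, 0.836), 1))  # 81.7
```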
Original language: English
Title of host publication: Proceedings of the 2009 International Conference on Bioinformatics, Computational Biology, Genomics and Chemoinformatics (BCBGC-09)
Editors: William Loging, Mukesh Doble, Zhirong Sun, James Malone
Publisher: ISRST
Pages: 7-14
Number of pages: 8
ISBN (Print): 978-1-60651-009-4
Publication status: Published - 13 Jul 2009

Publication series

Name: Proceedings of the 2009 International Conference on Bioinformatics, Computational Biology, Genomics and Chemoinformatics (BCBGC-09)


Keywords

  • Gene Normalization
