Abstract
In this paper, we propose a two-stage integrated gene normalization
system that extracts gene and gene product names from biomedical
literature and then maps them to EntrezGene database identifiers.
Across the two stages, we design a set of functional operators and
make them work together efficiently. In the first stage, a cascading
error corrector based on semantic and species information prevents
errors from propagating between the two system stages. In the second
stage, a preprocessor addresses two main challenges of biomedical
nomenclature, variety and ambiguity, to produce uniform gene mention
expressions. These uniform expressions make it possible to map gene
mentions to EntrezGene identifiers with a simple and efficient exact
text matcher. A disambiguation filter then uses the internal context
information in the text to resolve overlaps that arise during mapping.
With this set of functional operators, our system achieves a precision
of 79.8%, a recall of 83.6% and a balanced F1 score of 81.7%.
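The abstract does not give implementation details, so the following is only a minimal sketch of the second-stage idea under stated assumptions: a hypothetical normalization rule (lowercasing, spelling out Greek letters as single letters, dropping punctuation and spaces) turns spelling variants into one key, an exact dictionary lookup maps that key to candidate identifiers, and a simple context-overlap score picks one candidate when several match. The lexicon, the `CONTEXT` keyword sets, and identifiers such as `ID_MOUSE_IL2` are invented placeholders, not real EntrezGene data, and the overlap scoring is a stand-in for the paper's disambiguation filter rather than its actual method. The last function checks that the reported F1 is the harmonic mean of the reported precision and recall.

```python
import re

# Hypothetical rule: spell Greek letters as one Latin letter ("alpha" -> "a").
GREEK = {"alpha": "a", "beta": "b", "gamma": "g", "kappa": "k"}

# Hypothetical toy lexicon: synonym -> candidate identifiers (placeholders).
RAW_LEXICON = {
    "IL-2": {"ID_HUMAN_IL2", "ID_MOUSE_IL2"},
    "interleukin 2": {"ID_HUMAN_IL2", "ID_MOUSE_IL2"},
    "TNF-alpha": {"ID_HUMAN_TNF"},
}

# Hypothetical context cues (e.g. species words) for each identifier.
CONTEXT = {
    "ID_HUMAN_IL2": {"human", "patients", "homo"},
    "ID_MOUSE_IL2": {"mouse", "mice", "murine"},
    "ID_HUMAN_TNF": {"human"},
}


def normalize(mention: str) -> str:
    """Collapse spelling variants of a mention into one uniform key."""
    text = mention.lower()
    for word, letter in GREEK.items():
        text = text.replace(word, letter)
    # Drop hyphens, spaces and other punctuation: "IL-2" and "IL 2" -> "il2".
    return re.sub(r"[^a-z0-9]", "", text)


def build_index(raw: dict) -> dict:
    """Index the lexicon by normalized key so lookup is an exact match."""
    index: dict = {}
    for synonym, ids in raw.items():
        index.setdefault(normalize(synonym), set()).update(ids)
    return index


def normalize_gene(mention: str, document: str, index: dict):
    """Return one identifier for the mention, or None if unresolved."""
    candidates = index.get(normalize(mention), set())
    if not candidates:
        return None  # no exact match in the lexicon
    if len(candidates) == 1:
        return next(iter(candidates))
    # Several candidates: score each by overlap with words in the document.
    doc_tokens = set(re.findall(r"[a-z]+", document.lower()))
    scores = {gid: len(CONTEXT.get(gid, set()) & doc_tokens) for gid in candidates}
    best = max(scores.values())
    winners = [gid for gid, score in scores.items() if score == best]
    # Accept only a unique winner that has at least one context cue.
    return winners[0] if best > 0 and len(winners) == 1 else None


def f1(precision: float, recall: float) -> float:
    """Balanced F1: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    index = build_index(RAW_LEXICON)
    doc = "IL 2 secretion was reduced in murine T cells from knockout mice."
    print(normalize_gene("IL 2", doc, index))       # ID_MOUSE_IL2
    print(normalize_gene("TNF alpha", doc, index))  # ID_HUMAN_TNF
    print(round(100 * f1(0.798, 0.836), 1))         # 81.7
```

Returning `None` on a tie keeps the sketch conservative: an unresolved mention lowers recall instead of adding a wrong identifier that would lower precision. The F1 check confirms that 2 × 0.798 × 0.836 / (0.798 + 0.836) ≈ 0.8166, which rounds to the 81.7% reported above.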
Original language | English |
---|---|
Title of host publication | Proceedings of the 2009 International Conference on Bioinformatics, Computational Biology, Genomics and Chemoinformatics (BCBGC-09) |
Editors | William Loging, Mukesh Doble, Zhirong Sun, James Malone |
Publisher | ISRST |
Pages | 7-14 |
Number of pages | 8 |
ISBN (Print) | 978-1-60651-009-4 |
Publication status | Published - 13 Jul 2009 |
Publication series
Name | Proceedings of the 2009 International Conference on Bioinformatics, Computational Biology, Genomics and Chemoinformatics (BCBGC-09) |
---|---|
Bibliographical note
William Loging, Mukesh Doble, Zhirong Sun, James Malone
Keywords
- Gene Normalization