Comparative Analysis of Balanced Winnow and SVM in Large Scale Patent Categorization

Katrien Beuls, Bernhard Pflugfelder, Allan Hanbury

    Research output: Chapter in Book/Report/Conference proceeding (Conference paper)

    Abstract

    This study investigates the effect of training different categorization algorithms on a corpus that is significantly larger than those reported in experiments in the literature. By means of machine learning techniques, a collection of 1.2 million patent applications is used to build a classifier that is able to classify documents with varyingly large feature spaces into the International Patent Classification (IPC) system at Subclass level. The two algorithms that are compared are Balanced Winnow and Support Vector Machines (SVMs). Contrary to SVM, Balanced Winnow is frequently applied in today's patent categorization systems. Results show that SVM outperforms Winnow considerably on all four document representations that were tested. While Winnow results on the smallest sub-corpus do not necessarily hold for the full corpus, SVM results are more robust: they show smaller fluctuations in accuracy when smaller or larger feature spaces are used. The parameter tuning that was carried out for both algorithms confirms this result. Although SVM experiments must be tuned to optimize either recall or precision (whereas both can be optimized together when Winnow is used), effective parameter settings obtained on a small corpus can be used for training on a larger corpus.
    Original language: English
    Title of host publication: Proceedings of the 10th Dutch-Belgian Information Retrieval Workshop
    Pages: 8-15
    Number of pages: 8
    Publication status: Published - 26 Jan 2010

    Publication series

    Name: Proceedings of the 10th Dutch-Belgian Information Retrieval Workshop

    Keywords

    • Patent Classification
    • Intellectual Property
    • IPC taxonomy
