Abstract
In this article we describe our reasons for preferring an application programming interface (API) over a relational or XML database for constructing terminological and lexicographical resources. We explain how our research, aimed at developing terminological and lexicographical databases that can be used and supported by a broad range of specialised software tools, has led to this view. This research has spanned several projects in which we developed different multilingual, ontologically underpinned lexical resources, together with specialised software tools to support these resource-development tasks.
Because various applications should be able to use the resulting lexical resources, we wanted to structure the resources by means of an application ontology. Application ontologies can be interpreted by different applications and may thus facilitate the integration of the software tools. At first we tried to use the Protégé ontology editor API to store the required lexical information. This API facilitated the development of an ontological structure, and we could easily develop software tools based on it. However, it proved difficult (if not impossible) to store all the lexical information using this ontological structure alone. We therefore expanded the ontological structure to include the required lexical information. We call the resulting structure a Categorisation Framework (CF) and use it to categorise lexical information. We shall demonstrate how the CF can be used to structure and store all kinds of lexical information.
Owing to the multilingual and specialised nature of the resources, different domain experts had to collaborate while constructing the domain ontologies and gathering the lexical information. By implementing an XML format to represent the CF, we ensured that the resources could be developed and exchanged in a modular way. The XML format also made it possible to include existing structured information, e.g. databases, by converting it into this XML format.
Although the XML format proved extremely useful during development, it became clear that the size and complexity of the total resources required a more efficient database format. We therefore also implemented the CF as a relational database using JavaDB.
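As a rough illustration of what exchanging one CF module as XML might look like, the following Java sketch serialises a single category with its lexical items. The element and attribute names (`category`, `lexicalItem`, `id`, `lang`) and the class `CfXmlSketch` are assumptions made for this example; the abstract does not specify the actual XML format.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical sketch: all names below are illustrative, not the real CF format.
class CfXmlSketch {

    /** One lexical item attached to a category: a term in a given language. */
    static final class LexicalItem {
        final String language;
        final String term;

        LexicalItem(String language, String term) {
            this.language = language;
            this.term = term;
        }
    }

    /** Serialises one category and its lexical items as a standalone XML module. */
    static String toXml(String categoryId, List<LexicalItem> items) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element category = doc.createElement("category");
            category.setAttribute("id", categoryId);
            doc.appendChild(category);
            for (LexicalItem item : items) {
                Element el = doc.createElement("lexicalItem");
                el.setAttribute("lang", item.language);
                el.setTextContent(item.term);
                category.appendChild(el);
            }
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Wrap checked XML exceptions so callers need no throws clause.
            throw new IllegalStateException("Could not serialise category " + categoryId, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toXml("Fish", Arrays.asList(
                new LexicalItem("en", "fish"),
                new LexicalItem("nl", "vis"))));
    }
}
```

Because each category is written as a self-contained document, a domain expert could in principle develop and exchange one module without access to the rest of the resource.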
To handle the CF and use the information in our software tools, we developed a Java API. Our software tools for corpus compilation, ontology development and terminology management all use this API. Using the CF API makes it easy to manage the CF, and to store the information in both XML and relational database format. The main advantage of the CF API is that it facilitates the development of specialised software tools for lexicography, terminography, and ontology engineering. Using the same CF API, different software tools can process the appropriate CF information. New projects may simply reuse information from previous projects, while the flexible and customisable nature of the CF enables the addition of extra lexical information.
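The idea of one API sitting in front of interchangeable storage formats can be sketched in Java as follows. This is a minimal sketch under stated assumptions: the names `CfStore`, `InMemoryStore`, and `CategorisationFramework` are hypothetical and do not come from the actual CF API, and an in-memory map stands in for the XML and JavaDB backends.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from the actual CF API.
class CfApiSketch {

    /** Storage backend; an XML file or a relational database could each implement this. */
    interface CfStore {
        void save(String category, String language, String term);

        List<String> load(String category, String language);
    }

    /** In-memory stand-in used here in place of a real XML or JavaDB backend. */
    static class InMemoryStore implements CfStore {
        private final Map<String, List<String>> data = new LinkedHashMap<>();

        @Override
        public void save(String category, String language, String term) {
            data.computeIfAbsent(category + "|" + language, k -> new ArrayList<>()).add(term);
        }

        @Override
        public List<String> load(String category, String language) {
            // Return a copy so callers cannot modify the stored list.
            return new ArrayList<>(data.getOrDefault(category + "|" + language, new ArrayList<>()));
        }
    }

    /** Facade that tools call; they never touch the storage format directly. */
    static class CategorisationFramework {
        private final CfStore store;

        CategorisationFramework(CfStore store) {
            this.store = store;
        }

        void addTerm(String category, String language, String term) {
            store.save(category, language, term);
        }

        List<String> termsFor(String category, String language) {
            return store.load(category, language);
        }
    }

    public static void main(String[] args) {
        CategorisationFramework cf = new CategorisationFramework(new InMemoryStore());
        cf.addTerm("Fish", "en", "fish");
        cf.addTerm("Fish", "nl", "vis");
        System.out.println(cf.termsFor("Fish", "nl")); // prints [vis]
    }
}
```

In this arrangement a terminology tool and a corpus tool would both depend only on the facade, so swapping the storage backend would not require changes to either tool.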
| Original language | English |
|---|---|
| Title | Computational Lexicography workshop 2007 |
| Status | Published - 2007 |
Publication series
| Name | Computational Lexicography workshop 2007 |
|---|---|
Fingerprint
Dive into the research topics of 'A Categorisation Framework API for constructing ontology-based lexical resources'. Together they form a unique fingerprint.