Abstract
The Stack Overflow Q&A platform boasts an active community of users who often include code snippets in their questions and answers. Several development tools rely on these code snippets as a source of information. Because code snippets are intended as examples for humans, they often do not form complete compilation units. For instance, snippets illustrating how to use an API might lack the import statements for the corresponding API types. It is therefore essential to determine the fully-qualified names of the API types used in incomplete snippets.
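To make the problem concrete, the following minimal sketch shows how a snippet without import statements leaves some simple names open to several fully-qualified names. The `CANDIDATES` index, the `SNIPPET` text, and both helper functions are hypothetical illustrations, not part of the tool described here; only the listed JDK type names are real.

```python
import re

# Hypothetical, hand-picked index from simple names to candidate
# fully-qualified names (FQNs). A real index would be mined from libraries.
CANDIDATES = {
    "List": ["java.util.List", "java.awt.List"],
    "Date": ["java.util.Date", "java.sql.Date"],
    "ArrayList": ["java.util.ArrayList"],
}

# An incomplete Java snippet of the kind posted on Q&A sites: no imports.
SNIPPET = """
List items = new ArrayList();
Date created = new Date();
"""


def imported_names(snippet):
    """Map each simple name to its FQN for every single-type import."""
    fqns = re.findall(r"^\s*import\s+([\w.]+)\s*;", snippet, re.MULTILINE)
    return {fqn.rsplit(".", 1)[-1]: fqn for fqn in fqns}


def ambiguous_types(snippet):
    """Return simple names that have no import and several candidate FQNs."""
    imports = imported_names(snippet)
    used = set(re.findall(r"\b[A-Z]\w*\b", snippet))
    return {
        name: CANDIDATES[name]
        for name in sorted(used)
        if name in CANDIDATES
        and name not in imports
        and len(CANDIDATES[name]) > 1
    }
```

Calling `ambiguous_types(SNIPPET)` reports `List` and `Date`, while prefixing the snippet with `import java.util.List;` leaves only `Date` unresolved.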
We present RESICO, a machine learning-based text classification approach that resolves the simple names of API types to their fully-qualified names. RESICO is trained on a corpus of Java programs for which a compiler can determine the fully-qualified names. For four machine learning classifiers, we evaluate the type resolution accuracy of the resulting models on the original and an extended version of the snippet datasets previously used to evaluate the current state-of-the-art approach, which is based on information retrieval. Results show that our approach outperforms the state-of-the-art one, although its training phase is slightly slower. We observe that most incorrect type resolutions are not due to ambiguities among the simple names of API types but to similarities among the contexts in which these types are used, which represents a future research challenge.
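The general idea of treating type resolution as text classification can be sketched as follows. This is a toy illustration under stated assumptions, not RESICO's implementation: the abstract does not specify the features or name the four classifiers, so the sketch swaps in a bag-of-words nearest-centroid classifier with cosine similarity, and the `TRAINING` pairs, `tokenize`, `train`, `cosine`, and `resolve` are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical training pairs (usage context, fully-qualified name), standing
# in for contexts extracted from compilable programs where the FQN is known.
TRAINING = [
    ("List names = new ArrayList(); names.add(x); names.size();",
     "java.util.List"),
    ("List box = new List(4, true); frame.add(box); box.addItem(label);",
     "java.awt.List"),
    ("Date now = new Date(); now.getTime(); calendar.setTime(now);",
     "java.util.Date"),
    ("Date due = resultSet.getDate(1); statement.setDate(2, due); Date.valueOf(text)",
     "java.sql.Date"),
]


def tokenize(code):
    """Split code into lower-cased identifier tokens."""
    return [t.lower() for t in re.findall(r"[A-Za-z_]\w*", code)]


def train(samples):
    """Build one bag-of-words centroid per fully-qualified name."""
    centroids = defaultdict(Counter)
    for code, fqn in samples:
        centroids[fqn].update(tokenize(code))
    return dict(centroids)


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def resolve(simple_name, context, model):
    """Pick the known FQN with this simple name whose contexts look most similar."""
    query = Counter(tokenize(context))
    candidates = [f for f in model if f.rsplit(".", 1)[-1] == simple_name]
    if not candidates:
        return None  # simple name never seen during training
    return max(candidates, key=lambda f: cosine(query, model[f]))


model = train(TRAINING)
```

With this toy model, `resolve("Date", "Date d = rs.getDate(3); ps.setDate(1, d);", model)` selects `java.sql.Date`, because the surrounding tokens overlap more with the JDBC-style training context. The same mechanism also hints at the failure mode noted above: two types used in similar contexts produce similar vectors and are easy to confuse.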
Original language | English |
---|---|
Article number | 102941 |
Number of pages | 26 |
Journal | Science of Computer Programming |
Volume | 227 |
Publication status | Published - Apr 2023 |
Bibliographical note
Funding Information: We would like to thank the authors of COSTER [5] for sharing their tool and data. This research was partially funded by the Excellence of Science project EOS 30446992 SECO-ASSIST financed by FWO-Vlaanderen and F.R.S.-FNRS.
Publisher Copyright:
© 2023 Elsevier B.V.
Keywords
- Fully Qualified Name Resolution
- Machine Learning
- Text Classification
- Stack Overflow
Projects
- FWOEOS10: Automated Assistance for Developing Software in Ecosystems of the Future
  De Roover, C., Mens, T., Demeyer, S. & Cleve, A.
  1/01/18 → 31/12/21
  Project: Fundamental
  Status: Finished
Datasets
- COSTER Dataset and model
  Velazquez Rodriguez, C. E. (Creator), Di Nucci, D. (Researcher) & De Roover, C. (Researcher), Zenodo, 24 Oct 2022
  Dataset
- A Text Classification Approach to API Type Resolution for Incomplete Code Snippets
  Velazquez Rodriguez, C. E. (Creator), Di Nucci, D. (Related person) & De Roover, C. (Related person), Zenodo, 2023
  DOI: 10.5281/zenodo.7276757, https://zenodo.org/record/7276757, https://github.com/softwarelanguageslab/resico-paper
  Dataset