MUTAMA: An Automated Multi-label Tagging Approach for Software Libraries on Maven

Activity: Talk or presentation at a conference


Recent studies show that the Maven ecosystem alone already contains over 2 million library artefacts, including their source code, byte code, and documentation. To help developers cope with this volume of information, several websites overlay configurable views on the ecosystem. For instance, some views group similar libraries into categories, while others show all libraries that carry tags corresponding to coarse-grained library features. The MVNRepository overlay website offers both category-based and tag-based views. Unfortunately, several libraries have not been categorised or are missing relevant tags. Some initial approaches to the automated categorisation of Maven libraries have already been proposed. However, no such approach exists for the problem of tagging libraries in a multi-label setting.
This paper proposes MUTAMA, a multi-label classification approach to the Maven library tagging problem based on information extracted from the byte code of each library. We analysed 4088 randomly selected libraries from the Maven software ecosystem. MUTAMA trains and deploys five multi-label classifiers using feature vectors obtained from the class and method names of the tagged libraries. Our results indicate that classifiers based on ensemble methods achieve the best performance. Finally, we propose directions for future work in this area.
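To make the multi-label setting concrete, the following is a minimal sketch of how class and method names might be turned into feature vectors and how tags might be encoded as a binary indicator matrix. It is an illustration under stated assumptions, not MUTAMA's actual pipeline: the abstract does not say how names are tokenised or weighted, so the camelCase splitting, the raw token counts, the helper names, and the two example libraries are all hypothetical.

```python
import re
from collections import Counter


def split_identifier(name):
    """Split a camelCase or snake_case identifier into lowercase tokens."""
    # Assumption: acronyms such as "XML" are kept together as one token.
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def count_tokens(identifiers):
    """Count the tokens found in a library's class and method names."""
    counts = Counter()
    for identifier in identifiers:
        counts.update(split_identifier(identifier))
    return counts


def to_vector(counts, vocabulary):
    """Project token counts onto a fixed, ordered vocabulary."""
    return [counts.get(token, 0) for token in vocabulary]


def encode_tags(tags, all_tags):
    """Binary indicator row: 1 where the library carries the tag, else 0."""
    return [1 if tag in tags else 0 for tag in all_tags]


# Hypothetical toy data: library -> (class/method names, set of tags).
libraries = {
    "lib-a": (["JsonParser", "parseXMLDocument"], {"json", "xml"}),
    "lib-b": (["HttpClient", "sendRequest"], {"http"}),
}

token_counts = {lib: count_tokens(names) for lib, (names, _) in libraries.items()}
vocabulary = sorted({tok for counts in token_counts.values() for tok in counts})
all_tags = sorted({tag for _, tags in libraries.values() for tag in tags})

# X holds one feature vector per library, Y one row of tag indicators.
X = [to_vector(token_counts[lib], vocabulary) for lib in libraries]
Y = [encode_tags(tags, all_tags) for _, tags in libraries.values()]
```

Because each row of Y can contain several ones, a library may receive any number of tags at once, which is what distinguishes this setting from single-label categorisation. Matrices of this shape could then be handed to a multi-label learner of one's choice.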
Period: 28 Sep 2020
Event title: 20th IEEE International Working Conference on Source Code Analysis and Manipulation, SCAM 2020, September 27-28, 2020
Event type: Conference
Degree of Recognition: International