MUTAMA: An Automated Multi-label Tagging Approach for Software Libraries on Maven

Research output: Chapter in Book/Report/Conference proceeding (Conference paper)

Abstract

Recent studies show that the Maven ecosystem alone already contains over 2 million library artefacts, including their source code, byte code, and documentation. To help developers cope with this volume of information, several websites overlay configurable views on the ecosystem: for instance, views in which similar libraries are grouped into categories, or views listing all libraries tagged with a tag corresponding to a coarse-grained library feature. The MVNRepository overlay website offers both category-based and tag-based views. Unfortunately, several libraries have not been categorised or are missing relevant tags. Some initial approaches to the automated categorisation of Maven libraries have already been proposed. However, no such approach exists for the problem of tagging libraries in a multi-label setting.

This paper proposes MUTAMA, a multi-label classification approach to the Maven library tagging problem that is based on information extracted from the byte code of each library. We analysed 4088 randomly selected libraries from the Maven software ecosystem. MUTAMA trains and deploys five multi-label classifiers using feature vectors obtained from the class and method names of the tagged libraries. Our results indicate that classifiers based on ensemble methods achieve the best performance. Finally, we propose directions for future work in this area.
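The abstract does not specify how class and method names are turned into feature vectors, nor which five classifiers are used beyond the finding that ensemble methods performed best. Purely as an illustration of the general idea, the sketch below assumes a bag-of-words representation built by splitting camelCase identifiers, and a binary-relevance scheme (one independent yes/no decision per tag) that uses a simple nearest-centroid rule in place of the paper's classifiers. The helper names (`tokenize`, `vectorize`, `BinaryRelevanceTagger`) and the toy libraries and tags are hypothetical; this is not MUTAMA's implementation.

```python
import math
import re

# Assumption: identifiers are split on non-alphanumeric characters (package dots,
# '$' of inner classes) and on camelCase boundaries, then lower-cased.
_WORD = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")


def tokenize(identifier: str) -> list[str]:
    """Split a class or method name such as 'parseJSONString' into word tokens."""
    tokens = []
    for part in re.split(r"[^A-Za-z0-9]+", identifier):
        tokens.extend(word.lower() for word in _WORD.findall(part))
    return tokens


def vectorize(names: list[str]) -> dict[str, float]:
    """Bag-of-words vector over all class/method names of one library, L2-normalised."""
    counts: dict[str, float] = {}
    for name in names:
        for tok in tokenize(name):
            counts[tok] = counts.get(tok, 0.0) + 1.0
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * b.get(k, 0.0) for k, x in a.items()) / (na * nb)


def centroid(vectors: list[dict[str, float]]) -> dict[str, float]:
    """Component-wise mean of sparse vectors; empty dict for an empty list."""
    total: dict[str, float] = {}
    for v in vectors:
        for k, x in v.items():
            total[k] = total.get(k, 0.0) + x
    return {k: x / len(vectors) for k, x in total.items()} if vectors else {}


class BinaryRelevanceTagger:
    """Hypothetical multi-label tagger: one independent binary decision per tag.

    Each per-tag decision is a nearest-centroid rule: predict the tag if the
    library is closer (by cosine) to the centroid of libraries carrying the tag
    than to the centroid of libraries without it. This stands in for the
    classifiers evaluated in the paper and is not MUTAMA itself.
    """

    def fit(self, libraries: list[list[str]], tag_sets: list[set[str]]) -> "BinaryRelevanceTagger":
        vectors = [vectorize(names) for names in libraries]
        self.tags = sorted(set().union(*tag_sets))
        self.models = {}
        for tag in self.tags:
            pos = [v for v, ts in zip(vectors, tag_sets) if tag in ts]
            neg = [v for v, ts in zip(vectors, tag_sets) if tag not in ts]
            self.models[tag] = (centroid(pos), centroid(neg))
        return self

    def predict(self, names: list[str]) -> set[str]:
        v = vectorize(names)
        return {tag for tag, (pos_c, neg_c) in self.models.items()
                if cosine(v, pos_c) > cosine(v, neg_c)}


# Toy training data (hypothetical libraries and tags, not from the paper).
LIBRARIES = [
    ["JsonParser", "parseJson", "writeJson"],
    ["HttpClient", "sendRequest", "readResponse"],
    ["JsonHttpClient", "postJson", "sendRequest"],
    ["Logger", "logMessage", "setLevel"],
]
TAGS = [{"json"}, {"http"}, {"json", "http"}, {"logging"}]

if __name__ == "__main__":
    tagger = BinaryRelevanceTagger().fit(LIBRARIES, TAGS)
    print(tagger.predict(["JsonWriter", "writeJson"]))     # {'json'}
    print(tagger.predict(["HttpServer", "sendResponse"]))  # {'http'}
```

Binary relevance is the simplest way to turn any single-label classifier into a multi-label one, which is why it is used here. Its main limitation is that it treats tags independently and ignores correlations between them (for example, `json` and `http` co-occurring on the same library), which more elaborate multi-label methods such as classifier chains attempt to model.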

Original language: English
Title of host publication: Proceedings of the 20th IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM2020)
Publisher: IEEE
Pages: 243-247
Number of pages: 5
ISBN (Electronic): 9781728192482
ISBN (Print): 978-1-7281-9248-2
Publication status: Published - Sep 2020
Event: 20th IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM2020) - Adelaide, Australia
Duration: 27 Sep 2020 to 28 Sep 2020

Publication series

Name: Proceedings - 20th IEEE International Working Conference on Source Code Analysis and Manipulation, SCAM 2020

Conference

Conference: 20th IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM2020)
Country: Australia
City: Adelaide
Period: 27/09/20 to 28/09/20

Keywords

  • multi-label classification
  • libraries
  • software ecosystems
  • machine learning
  • software engineering
