A Large-Scale Empirical Investigation Into Cross-Project Flaky Test Prediction

  • Valeria Pontillo (Data Curator)
  • Angelo Afeltra (Data Collector)
  • Alfonso Cannavale (Creator)
  • Fabiano Pecorelli (Supervisor)
  • Fabio Palomba (Supervisor)

Dataset

Description

This repository contains the dataset of flaky tests used in the study, together with the code and the results.

Abstract

Test flakiness arises when a test case exhibits inconsistent behavior by alternating between passing and failing states when executed against the same code. Previous research has shown the significance of the problem in practice, proposing empirical studies on the nature of flakiness and automated techniques for its detection. Machine learning models have emerged as a promising approach for flaky test prediction. However, existing research has predominantly focused on within-project scenarios, where models are trained and tested using data from a single project. In contrast, little is known about how flaky test prediction models may be adapted to software projects lacking sufficient historical data for effective prediction. In this paper, we address this gap by proposing a large-scale assessment of flaky test prediction in cross-project scenarios, i.e., in situations where predictive models are trained using data coming from external projects. Leveraging a dataset of 1,385 flaky tests from 29 open-source projects, we examine static test flakiness prediction models and evaluate feature- and instance-based filtering methods for cross-project predictions. Our study highlights the difficulties of utilizing cross-project flaky test data and underscores the significance of filtering methods in enhancing prediction accuracy. Notably, we find that the TrAdaBoost filtering method significantly reduces data heterogeneity, leading to an F-Measure of 70%.
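
As a rough illustration of the cross-project setting described above, the sketch below trains a static flaky test prediction model on all projects except one and evaluates it on the held-out project (leave-one-project-out). The column names `project` and `is_flaky`, the metrics file name, and the RandomForest classifier are illustrative assumptions, not the study's actual pipeline; the paper additionally applies feature- and instance-based filtering (e.g., TrAdaBoost) to the source data before training, which is not shown here.

```python
# Minimal sketch of a leave-one-project-out cross-project evaluation.
# Column names ('project', 'is_flaky'), the CSV file, and the classifier
# are assumptions for illustration; see the repository code for the actual setup.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

def cross_project_evaluation(df: pd.DataFrame) -> dict:
    """Train on all projects except one, then test on the held-out project."""
    feature_cols = [c for c in df.columns if c not in ("project", "is_flaky")]
    scores = {}
    for target in df["project"].unique():
        train = df[df["project"] != target]   # source (external) projects
        test = df[df["project"] == target]    # held-out target project
        model = RandomForestClassifier(n_estimators=100, random_state=42)
        model.fit(train[feature_cols], train["is_flaky"])
        preds = model.predict(test[feature_cols])
        scores[target] = f1_score(test["is_flaky"], preds)
    return scores

if __name__ == "__main__":
    data = pd.read_csv("flaky_test_metrics.csv")  # hypothetical metrics file
    for project, f1 in cross_project_evaluation(data).items():
        print(f"{project}: F-Measure = {f1:.2f}")
```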
Date made available: 2024
Publisher: GitHub
Date of data production: 2024
