Abstract
In property-based testing (PBT), developers specify properties that they expect the system under test to hold. The PBT tool generates random inputs for the system and tests, for each of these inputs, whether the given property holds. An advantage of this approach over testing a set of manually defined example inputs is that it enables higher code coverage.
Machine learning (ML) projects, however, often have to process large amounts of diverse data, both for training a model and afterwards, when the trained model is deployed. Generating a sufficient amount of diverse data for the property-based tests is therefore challenging.
In this paper, we present the results of a preliminary study in which we examined a dataset of 58 open-source ML projects that have dependencies on the popular PBT library Hypothesis, to identify issues faced by developers writing property-based tests. For a subset of 28 open-source ML projects, we study the property-based tests in detail and report on the part of the ML project that is being tested as well as on the adopted data generation strategies. This way, we aim to identify issues in porting current PBT techniques to ML projects so that they can be addressed in the future.
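The abstract describes the PBT workflow: the developer states a property, and the tool generates random inputs and checks the property for each. A minimal sketch of this workflow with the Hypothesis library is shown below; the `normalize` function and the boundedness property are illustrative assumptions, not taken from the paper or its dataset.

```python
from hypothesis import given, strategies as st

def normalize(xs):
    """Scale a list of positive numbers so the largest becomes 1.0.
    (Hypothetical system under test, for illustration only.)"""
    m = max(xs)
    return [x / m for x in xs]

# Property: for any non-empty list of positive floats,
# every normalized value lies in (0, 1].
@given(st.lists(st.floats(min_value=1.0, max_value=1e6), min_size=1))
def test_normalize_bounded(xs):
    assert all(0.0 < y <= 1.0 for y in normalize(xs))

test_normalize_bounded()  # Hypothesis generates and checks many random inputs
```

The `@given` decorator supplies the generated inputs; the bounded `floats` strategy here sidesteps NaN and infinity, which hints at the data-generation challenge the paper examines for ML-scale inputs.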
Original language | English |
---|---|
Title of host publication | 2024 IEEE International Conference on Software Maintenance and Evolution (ICSME) |
Publisher | IEEE |
Pages | 648-653 |
Number of pages | 6 |
Volume | 40th |
Edition | 2024 |
ISBN (Electronic) | 979-8-3503-9568-6 |
Publication status | Published - Oct 2024 |
Event | 40th International Conference on Software Maintenance and Evolution (ICSME 2024) - Flagstaff, United States Duration: 6 Oct 2024 → 11 Oct 2024 Conference number: 40 https://conf.researchr.org/track/icsme-2024/ |
Publication series
Name | |
---|---|
ISSN (Electronic) | 2576-3148 |
Conference
Conference | 40th International Conference on Software Maintenance and Evolution (ICSME 2024) |
---|---|
Abbreviated title | ICSME |
Country/Territory | United States |
City | Flagstaff |
Period | 6/10/24 → 11/10/24 |
Keywords
- property-based testing
- machine learning projects
- testing machine learning
- empirical study
- software testing
Research output
- 1 Poster
Property-based Testing within ML-projects: An empirical study on ML GitHub projects
Wauters, C. & De Roover, C., 9 Sep 2024, (Unpublished). Research output: Unpublished contribution to conference › Poster
Datasets
Dataset for "Property-based Testing within ML Projects: an Empirical Study" (ICSME NIER 2024)
Wauters, C. (Creator) & De Roover, C. (Creator), Zenodo, 31 Aug 2024
DOI: 10.5281/zenodo.13341915, https://zenodo.org/doi/10.5281/zenodo.13341915