Replication Package for Ansible Sensitive Parameter Prediction Study

Dataset

Description

Replication package for the paper entitled "Smelling Secrets: Leveraging Machine Learning and Language Models for Sensitive Parameter Detection in Ansible Security Analysis", accepted for publication at the 25th IEEE International Conference on Source Code Analysis & Manipulation (SCAM 2025).

Contents

00_data_collection — Scripts and results for data collection and ground truth construction.
RQ1_ml — Scripts, results, and models for RQ1 (performance of machine learning classifiers)
RQ2_lm — Scripts, results, and models for RQ2 (performance of language model classifiers)
RQ3_comparison — Scripts and results for RQ3 (comparison of best models from RQ1 and RQ2 against baselines)
RQ4_unseen_params — Scripts and results for RQ4 (prediction of unannotated parameters)

Size

4.89 GB

Version

1.0
Date made available: 4 Aug 2025
Publisher: figshare
Date of data production: 2025 -

Keywords

  • Infrastructure as Code
  • Ansible
  • Machine Learning
  • Language Models
  • Secrets
  • Security

Format

  • py
  • ipynb
  • md
  • csv
  • gz
  • tar.xz
  • txt
  • pdf
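
Since the package ships compressed archives (gz, tar.xz), a minimal sketch of unpacking one with Python's standard library may be useful. The function name `extract_archive` and the archive name in the usage comment are hypothetical; the actual file names inside the package are not stated here.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Unpack a tar archive (e.g. .tar.xz or .tar.gz) and return its member names.

    Hypothetical helper for illustration; it is not part of the package.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # "r:*" lets tarfile detect the compression (xz, gz, ...) on its own.
    with tarfile.open(archive_path, mode="r:*") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions: refuse unsafe members (absolute paths, etc.).
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return names


# Usage (archive name is an assumption, not taken from the package):
# extract_archive("RQ1_ml.tar.xz", "replication_package")
```

The extracted csv files can then be opened with any tabular tool, and the ipynb notebooks with Jupyter.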
