Cost-Sensitive Ensembles of Classifiers: Select Applications in HR Analytics

Natalie Lawrance

Research output: PhD Thesis


Abstract

Data-driven decision-making is becoming increasingly common in our economy. Businesses are keen to apply machine learning to more and more areas of their operations, and Human Resource (HR) management is no exception. HR analytics aims to apply data analysis techniques to processes related to human capital management. Many applications of data mining in HR already exist, covering the recruitment and onboarding process and even the prevention of employee turnover. In this dissertation we present solutions to the organisational problem of predicting employee absenteeism. Employee absenteeism is a well-studied topic in research domains such as economics, organisational psychology, and occupational health and medicine, but it has not yet been addressed in the field of HR analytics.

Many real-world classification problems are cost-sensitive in nature: the misclassification costs vary between data instances. Cost-sensitive learning adapts classification algorithms to account for these differences in misclassification costs. Cost-sensitive classification has been applied in many domains, but it is not yet well established in HR analytics. In this dissertation we demonstrate the merits of cost-sensitive ensembles of classifiers in the domain of employee absenteeism prediction.

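A cost-sensitive classification decision can be illustrated with a minimal Python sketch of the Bayes minimum risk rule, one common way (assumed here, not confirmed by the abstract) of turning a probability estimate into a decision under instance-dependent costs. The sketch assumes that correct predictions cost nothing and that every instance carries its own false-positive and false-negative cost; the function names `min_risk_labels` and `total_cost`, and all numbers in the example, are hypothetical.

```python
from typing import List, Sequence


def min_risk_labels(probs: Sequence[float],
                    fp_costs: Sequence[float],
                    fn_costs: Sequence[float]) -> List[int]:
    """Bayes minimum risk decision with instance-dependent costs.

    Assumes correct predictions cost nothing, so for instance i:
      expected cost of predicting 1 = (1 - p_i) * C_FP_i
      expected cost of predicting 0 = p_i * C_FN_i
    """
    return [1 if p * c_fn > (1.0 - p) * c_fp else 0
            for p, c_fp, c_fn in zip(probs, fp_costs, fn_costs)]


def total_cost(y_true: Sequence[int], y_pred: Sequence[int],
               fp_costs: Sequence[float], fn_costs: Sequence[float]) -> float:
    """Sum the real cost of each error; correct predictions add nothing."""
    cost = 0.0
    for y, y_hat, c_fp, c_fn in zip(y_true, y_pred, fp_costs, fn_costs):
        if y_hat == 1 and y == 0:
            cost += c_fp  # false positive
        elif y_hat == 0 and y == 1:
            cost += c_fn  # false negative
    return cost


# Hypothetical example: three employees, predicted probabilities of absence,
# and made-up per-employee error costs.
probs = [0.2, 0.2, 0.7]
fp_costs = [10.0, 1.0, 50.0]
fn_costs = [10.0, 20.0, 5.0]
y_true = [0, 1, 0]

cs_pred = min_risk_labels(probs, fp_costs, fn_costs)    # [0, 1, 0]
naive_pred = [1 if p > 0.5 else 0 for p in probs]        # [0, 0, 1]
print(total_cost(y_true, cs_pred, fp_costs, fn_costs))     # 0.0
print(total_cost(y_true, naive_pred, fp_costs, fn_costs))  # 70.0
```

The first two employees receive the same probability estimate (0.2), yet the rule labels only the second as a likely absence, because missing that employee's absence is expensive. A fixed 0.5 threshold cannot express this difference.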
The first part of this dissertation applies cost-sensitive ensemble methods to the domain of employee absenteeism. We create a conceptual framework that allows us to apply state-of-the-art homogeneous cost-sensitive ensemble methods to this domain.

The abundance of classification methods that exist today makes algorithm selection a very relevant question for practitioners. Employee absenteeism prediction is a difficult problem characterised by a shifting target distribution, where each individual's probability of absence depends on many factors that are unobservable in the training set. In this situation, selecting any single classification algorithm for deployment carries the risk of low performance on new data instances. Our case study in absenteeism prediction shows that combining classifiers achieves better performance than a single classifier selected as the best on a validation set.

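The difference between selecting one classifier and combining several can be sketched in Python as follows. The sketch assumes that each candidate classifier has already produced probability estimates on a validation set; `select_best` keeps the candidate with the lowest validation cost, whereas `average_probs` combines all candidates by unweighted soft voting. Soft voting is only one simple combiner, chosen here for illustration and not necessarily the one used in the dissertation; all names and numbers are hypothetical.

```python
from typing import List, Sequence


def min_risk_labels(probs, fp_costs, fn_costs) -> List[int]:
    # Predict 1 when p * C_FN > (1 - p) * C_FP (zero cost for correct predictions).
    return [1 if p * fn > (1.0 - p) * fp else 0
            for p, fp, fn in zip(probs, fp_costs, fn_costs)]


def total_cost(y_true, y_pred, fp_costs, fn_costs) -> float:
    # Add C_FP for each false positive and C_FN for each false negative.
    return sum(fp if (yh == 1 and y == 0) else fn if (yh == 0 and y == 1) else 0.0
               for y, yh, fp, fn in zip(y_true, y_pred, fp_costs, fn_costs))


def select_best(val_probs: Sequence[Sequence[float]], y_val, fp_val, fn_val) -> int:
    """Index of the classifier with the lowest cost on the validation set."""
    costs = [total_cost(y_val, min_risk_labels(p, fp_val, fn_val), fp_val, fn_val)
             for p in val_probs]
    return costs.index(min(costs))


def average_probs(prob_lists: Sequence[Sequence[float]]) -> List[float]:
    """Soft voting: unweighted mean of the classifiers' probability estimates."""
    return [sum(column) / len(column) for column in zip(*prob_lists)]


# Hypothetical validation data: two classifiers, four instances,
# missed absences four times as costly as false alarms.
y_val = [1, 0, 1, 0]
fp_val = [1.0, 1.0, 1.0, 1.0]
fn_val = [4.0, 4.0, 4.0, 4.0]
val_probs = [[0.9, 0.1, 0.15, 0.3],   # classifier A
             [0.3, 0.1, 0.6, 0.05]]   # classifier B

best = select_best(val_probs, y_val, fp_val, fn_val)  # 1 (classifier B)
combined = average_probs(val_probs)                   # one averaged score per instance
```

Under a shifting target distribution, the candidate that wins on the validation set may not keep its advantage on new instances; averaging spreads that risk across all candidates, although it is not guaranteed to be cheaper in every case.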
The methods of classifier combination applied in our work appear to be insufficiently studied in the cost-sensitive learning literature. Stacking (sometimes known as model blending) is an ensemble method that uses the predictions of several classifiers as training data for another classifier, which in turn makes the final classification decision. While stacking is a well-known and widely applied method, applications of cost-sensitive stacking have been very limited to date. In the final part of this dissertation we perform a thorough empirical investigation into the performance of different cost-sensitive stacking ensembles on a number of datasets with real misclassification costs. We aim to derive recommendations for data science practitioners on the most appropriate (or ‘best’) combination of classifier types to use in such an ensemble. Our experiments, conducted on twelve datasets from a range of application domains with real, instance-dependent misclassification costs, show that for best results both levels of stacking require a cost-sensitive classification decision.

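A cost-sensitive stacking ensemble can be sketched in plain Python as below. It assumes one plausible reading of "both levels require a cost-sensitive classification decision": each base (level-0) classifier's probabilities are converted into Bayes minimum risk labels, these labels form the training data of a level-1 meta-classifier, and the meta-classifier's probabilities receive a second minimum-risk decision. The base classifiers are represented only by their probability estimates, and the meta-classifier is a hand-written logistic regression; these choices, the function names, and the data are hypothetical and not taken from the dissertation. In practice the level-0 probabilities used for training should be held-out or out-of-fold predictions, so that the meta-classifier does not learn from overfitted outputs.

```python
import math
from typing import List, Sequence, Tuple


def min_risk_labels(probs, fp_costs, fn_costs) -> List[int]:
    # Bayes minimum risk: predict 1 when p * C_FN > (1 - p) * C_FP.
    return [1 if p * fn > (1.0 - p) * fp else 0
            for p, fp, fn in zip(probs, fp_costs, fn_costs)]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logreg(X: Sequence[Sequence[float]], y: Sequence[int],
               lr: float = 0.5, epochs: int = 500) -> Tuple[List[float], float]:
    """Plain batch gradient descent on the (unweighted) log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, X) -> List[float]:
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def meta_features(base_probs, fp_costs, fn_costs) -> List[List[float]]:
    """Level 0: turn each base classifier's probabilities into cost-sensitive
    labels, then transpose so there is one feature vector per instance."""
    per_model = [min_risk_labels(p, fp_costs, fn_costs) for p in base_probs]
    return [[float(v) for v in row] for row in zip(*per_model)]


def cs_stacking_predict(train_probs, y_train, fp_train, fn_train,
                        test_probs, fp_test, fn_test) -> List[int]:
    """Level 1: fit a meta-classifier on the level-0 decisions, then make a
    second cost-sensitive decision on its probability estimates."""
    meta = fit_logreg(meta_features(train_probs, fp_train, fn_train), y_train)
    meta_probs = predict_proba(meta, meta_features(test_probs, fp_test, fn_test))
    return min_risk_labels(meta_probs, fp_test, fn_test)


# Hypothetical held-out probabilities from two already-trained base classifiers.
y_train = [1, 1, 1, 0, 0, 0]
train_probs = [[0.9, 0.8, 0.7, 0.2, 0.1, 0.3],
               [0.6, 0.9, 0.8, 0.4, 0.2, 0.1]]
unit = [1.0] * 6
test_probs = [[0.9, 0.1], [0.8, 0.2]]
print(cs_stacking_predict(train_probs, y_train, unit, unit,
                          test_probs, [1.0, 1.0], [1.0, 1.0]))  # [1, 0]
```

Feeding the meta-classifier probabilities instead of labels, or training it with a cost-weighted loss, are equally reasonable variants; the sketch fixes one of them only to keep the example short.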
In summary, this dissertation contributes to several fields of research. Firstly, we contribute to the field of HR analytics by providing a new way of looking at the problem of employee absenteeism prediction. Secondly, we assist HR analytics and other data science practitioners by proposing a practical solution to the problem of model selection. Finally, we contribute to the fields of classifier combination and cost-sensitive learning by empirically identifying the appropriate composition of cost-sensitive stacking ensembles on multiple real-world classification domains.

Original language: English
Awarding Institution: Vrije Universiteit Brussel
Supervisors/Advisors: Guerry, Marie (Supervisor)
Award date: 17 Nov 2022
Publication status: Published, 2022
