Abstract

Causal classification concerns the estimation of the net effect of a treatment on an outcome of interest at the instance level, i.e., of the individual treatment effect (ITE). For binary treatment and outcome variables, causal classification models produce ITE estimates that essentially allow one to rank instances from a large positive effect to a large negative effect. Often, as in uplift modeling (UM), one is interested only in this ranking rather than in the ITE estimates themselves. We therefore investigate the potential of learning-to-rank (L2R) techniques to learn a ranking of the instances directly. We propose a unified formalization of different binary causal classification performance measures from the UM literature and explore how these can be integrated into the L2R framework. Additionally, we introduce a new metric for UM with L2R called the promoted cumulative gain (PCG). We employ the L2R technique LambdaMART to optimize the ranking according to PCG and show improved results over the use of standard L2R metrics, and equal or improved results compared with state-of-the-art UM. Finally, we show how L2R techniques can be used to optimize specifically for the top-k fraction of the ranking in a UM context; however, these results do not generalize to the test set.

Original language: English
Journal: IEEE Transactions on Knowledge and Data Engineering
Publication status: Accepted/In press - 1 Jan 2020

Keywords

  • Data preprocessing
  • Estimation
  • Predictive models
  • Sociology
  • Standards
  • Statistics
  • Vegetation
