## Abstract

Estimating the probability of default with machine learning on historical data is widely studied in credit risk modeling, where risk is expressed as the probability that a company or person defaults on a loan. In this work, we investigate the use of machine learning for a finer-grained risk estimation task, namely spot factoring. Here, the goal is to estimate the likelihood that a single invoice will be paid within an acceptable timeframe. In this case, risk depends on when the invoice was paid (if ever) relative to the contractual payment date, that is, on the overdueness of the invoice.

To express overdueness in a spot factoring context, we construct three new target variables. The first expresses overdueness directly as the number of days overdue. The second is a graded variable that stems from the way decision makers categorize invoices into risk groups, which reflect increasingly late payment and more thorough collection efforts. The third is a binary variable obtained by merging the two best risk groups into one class and the three worst risk groups into the other. This amounts to splitting the invoices between those that are paid within the first 25 days after their due date and those that are not.
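The three target variables can be sketched as follows. This is only an illustration under stated assumptions: the abstract gives just the 25-day boundary between the two best and the three worst risk groups, so the other group boundaries in `GRADE_EDGES`, the function names, and the treatment of unpaid invoices as the worst group are hypothetical and not taken from the paper.

```python
from datetime import date
from typing import Optional

# Hypothetical upper bounds (in days overdue) for risk groups 0..3.
# Only the 25-day edge is stated in the abstract; 0, 60 and 120 are
# placeholders chosen for illustration. Group 4 holds everything later.
GRADE_EDGES = (0, 25, 60, 120)


def days_overdue(due: date, paid: date) -> int:
    """Regression target: days between due date and payment (negative if early)."""
    return (paid - due).days


def risk_group(overdue: Optional[int]) -> int:
    """Graded target: 0 is the best group, 4 the worst.

    Assumption: an invoice that was never paid (overdue is None)
    is placed in the worst group.
    """
    if overdue is None:
        return len(GRADE_EDGES)
    for grade, upper in enumerate(GRADE_EDGES):
        if overdue <= upper:
            return grade
    return len(GRADE_EDGES)


def binary_target(overdue: Optional[int]) -> int:
    """Binary target: 0 for the two best groups, 1 for the three worst."""
    return int(risk_group(overdue) >= 2)
```

With these placeholder edges, an invoice paid 10 days after its due date falls in group 1 and gets binary label 0, while one paid 26 days late falls in group 2 and gets binary label 1.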

We investigate the suitability of three prediction tasks for the new target variables: classical binary classification, regression of the days overdue, and learning-to-rank, which optimizes the risk-related ranking of all instances rather than predicting a value. We describe different ways of modeling these tasks and evaluate them on real-life spot factoring data, with measures specific to each learning task. As the goal is to evaluate the tasks rather than the learning methods, we compare them across three families of learning methods: linear models, support vector machines and gradient-boosted trees. To better assess the suitability of the different tasks for spot factoring, we also perform a profit-driven evaluation.

In the experiments, regression methods achieve consistently high scores across the evaluation measures and for all method families. The profit-based evaluation identifies the XGBoost regressor as a clear top performer. We recommend evaluating from a profit-based perspective in a spot factoring context, as it is easily interpretable, captures the business objective and accounts for the different costs associated with different invoice outcomes.

Original language | English |
---|---|
Title of host publication | 34th annual conference of the Belgian Operational Research Society |
Subtitle of host publication | ORBEL 34 |
Publisher | Belgian Operational Research (OR) Society |
Pages | 191-206 |
Number of pages | 16 |
Volume | 73 |
Publication status | Published - Jan 2020 |
Event | 34th annual conference of the Belgian Operational Research Society, Centrale Lille, Lille, France; Duration: 30 Jan 2020 → 31 Jan 2020; Conference number: 34; https://www.orbel.be/orbel34/index.php |

### Publication series

Name | Journal of the Operational Research Society |
---|---|
Publisher | Palgrave Macmillan Ltd. |
ISSN (Print) | 0160-5682 |

### Conference

Conference | 34th annual conference of the Belgian Operational Research Society |
---|---|
Abbreviated title | ORBEL |
Country/Territory | France |
City | Lille |
Period | 30/01/20 → 31/01/20 |