Predicting over-indebtedness on batch and streaming data.

Abstract

Detecting over-indebtedness, that is, difficulty in meeting household payment commitments, poses multiple Big Data challenges for banking institutions. We present a novel data-driven framework for predicting over-indebtedness on real-world data. The framework acts as a warning mechanism that generates predictions 6 months ahead, improving the chances of financial recovery. It combines feature selection with supervised learning techniques and uses data balancing to fine-tune the predictive models. We propose two versions of the framework, based on state-of-the-art batch and streaming learning techniques. To the best of our knowledge, the proposed framework is the first to cast over-indebtedness prediction as a stream learning problem. The appeal of stream learning arises from the large amount of data continuously generated and from the fact that batch models become obsolete over time as financial data evolves, whereas stream models are continuously updated as new data becomes available. We use credit data from two banks of the Groupe BPCE (the second-largest banking institution in France) and apply multi-metric criteria to evaluate model performance and fairness. Test results show the framework’s interbank applicability and that the proposed batch and stream frameworks outperform the current solution on both single-metric and multi-metric criteria. Additionally, the generic structure of the framework serves as a template for systematically approaching similar classification problems.
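
To illustrate what "continuously updated" means for a stream model, the following is a minimal sketch and not the framework described in the abstract. It assumes an online logistic regression as a stand-in learner, a `pos_weight` parameter as a simple cost-sensitive substitute for data balancing, and a synthetic two-feature stream in place of real credit data. All names (`OnlineLogisticRegression`, `prequential_accuracy`, `synthetic_stream`) are hypothetical. Each record is first used to test the model and only then to update it (test-then-train evaluation).

```python
import math
import random


class OnlineLogisticRegression:
    """Hypothetical stand-in learner, updated one record at a time."""

    def __init__(self, n_features, lr=0.1, pos_weight=1.0):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr
        # Assumption: up-weighting the rare positive class stands in
        # for the data balancing step mentioned in the abstract.
        self.pos_weight = pos_weight

    def predict_proba(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def learn_one(self, x, y):
        # One stochastic gradient step on the weighted log loss.
        weight = self.pos_weight if y == 1 else 1.0
        err = (self.predict_proba(x) - y) * weight
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


def prequential_accuracy(model, stream):
    """Test-then-train: predict each record before learning from it."""
    correct = total = 0
    for x, y in stream:
        pred = 1 if model.predict_proba(x) >= 0.5 else 0
        correct += int(pred == y)
        total += 1
        model.learn_one(x, y)
    return correct / total if total else 0.0


def synthetic_stream(n, seed=0):
    """Synthetic, imbalanced stream (illustrative data only)."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.random(), rng.random()]
        y = 1 if x[0] + x[1] > 1.5 else 0  # minority positive class
        yield x, y


model = OnlineLogisticRegression(n_features=2, lr=0.1, pos_weight=4.0)
accuracy = prequential_accuracy(model, synthetic_stream(5000))
print(f"prequential accuracy: {accuracy:.3f}")
```

A batch model would instead be fitted once on a fixed historical dataset and retrained from scratch when it drifts out of date. Accuracy is used here only for brevity; it is a weak metric on imbalanced data, which is one reason multi-metric evaluation matters.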

Publication
2017 IEEE International Conference on Big Data