Our paper describing scikit-multiflow has been accepted for publication in the Journal of Machine Learning Research, Machine Learning Open Source Software track (JMLR MLOSS).
Abstract: Scikit-multiflow is a multi-output/multi-label and stream data mining framework for the Python programming language. Conceived as a platform to encourage the democratization of stream learning research, it provides multiple state-of-the-art methods for stream learning, as well as stream generators and evaluators. scikit-multiflow builds upon popular open source frameworks, including scikit-learn, MOA and MEKA.
The preprint of our paper describing scikit-multiflow is available on arXiv.
Abstract: Scikit-multiflow is a multi-output/multi-label and stream data mining framework for the Python programming language. Conceived as a platform to encourage the democratization of stream learning research, it provides multiple state-of-the-art methods for stream learning, as well as stream generators and evaluators. scikit-multiflow builds upon popular open source frameworks, including scikit-learn, MOA and MEKA. Development follows FOSS principles, and quality is enforced through compliance with the PEP8 guidelines, continuous integration and automated testing.
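Stream learners are usually evaluated prequentially (test-then-train): each incoming sample is first used to test the current model and only then to update it. The plain-Python sketch below illustrates that idea without depending on scikit-multiflow. The stream generator, the `OnlinePerceptron` learner and the `prequential_accuracy` helper are hypothetical illustrations; only the `predict`/`partial_fit` naming follows the scikit-learn convention for incremental learners. This is not scikit-multiflow's own API.

```python
import random


def make_stream(n, seed=0):
    """Hypothetical stream: points in the unit square labeled by x0 + x1 > 1."""
    rng = random.Random(seed)
    for _ in range(n):
        x = (rng.random(), rng.random())
        y = 1 if x[0] + x[1] > 1.0 else 0
        yield x, y


class OnlinePerceptron:
    """Minimal incremental learner that updates on one sample at a time."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        score = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1 if score > 0 else 0

    def partial_fit(self, x, y):
        # Classic perceptron rule: weights change only when the prediction is wrong.
        error = y - self.predict(x)
        if error != 0:
            self.w = [wi + self.lr * error * xi for wi, xi in zip(self.w, x)]
            self.b += self.lr * error


def prequential_accuracy(model, stream):
    """Test-then-train: score each sample before the model learns from it."""
    correct = total = 0
    for x, y in stream:
        correct += int(model.predict(x) == y)
        model.partial_fit(x, y)
        total += 1
    return correct / total if total else 0.0


acc = prequential_accuracy(OnlinePerceptron(n_features=2), make_stream(2000))
print(f"prequential accuracy: {acc:.3f}")
```

Because every sample is seen once and tested before training, the reported accuracy reflects how the model performs as the stream unfolds, with no separate hold-out set.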
I am attending the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD18), to present our paper
Scalable Model-based Cascaded Imputation of Missing Data.
Abstract: Missing data is a common trait of real-world data that can negatively impact interpretability. In this paper, we present CASCADE IMPUTATION (CIM), an effective and scalable technique for automatic imputation of missing data. CIM places no restrictions on the characteristics of the data set, supporting Missing At Random and Missing Completely At Random data, numerical and nominal attributes, and large data sets, including high-dimensional ones.
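The general idea of model-based cascaded imputation can be illustrated with a small sketch: columns are filled one at a time, and each newly imputed column becomes a predictor for the columns imputed after it. The details here are assumptions for illustration, not the CIM algorithm from the paper: the ordering (fewest missing values first), the 1-nearest-neighbour regressor, the mean fallback and the `cascade_impute` function name are all hypothetical, and the sketch handles numeric attributes only.

```python
import math


def cascade_impute(rows):
    """Fill None entries column by column, fewest-missing column first.

    Hypothetical sketch: each missing value is predicted by 1-nearest-neighbour
    regression on the columns that are already complete (observed or imputed),
    so earlier imputations feed into later ones. Numeric data only.
    """
    data = [list(r) for r in rows]  # work on a copy; the input is left untouched
    n_cols = len(data[0])
    n_missing = [sum(r[c] is None for r in data) for c in range(n_cols)]
    complete = [c for c in range(n_cols) if n_missing[c] == 0]
    pending = sorted((c for c in range(n_cols) if n_missing[c] > 0),
                     key=lambda c: n_missing[c])

    for c in pending:
        # Training rows: those where column c was actually observed.
        train = [r for r in data if r[c] is not None]
        if not train:
            raise ValueError(f"column {c} has no observed values")
        for r in data:
            if r[c] is not None:
                continue
            if complete:
                # 1-NN on the already-complete columns.
                nearest = min(train, key=lambda t: math.dist(
                    [t[j] for j in complete], [r[j] for j in complete]))
                r[c] = nearest[c]
            else:
                # No complete predictors yet: fall back to the column mean.
                r[c] = sum(t[c] for t in train) / len(train)
        complete.append(c)  # the imputed column now helps predict later columns
    return data


rows = [
    [1.0, 2.0, None],
    [2.9, None, 6.0],
    [3.0, 6.0, 9.0],
    [4.0, 8.0, None],
]
print(cascade_impute(rows))
# [[1.0, 2.0, 6.0], [2.9, 6.0, 6.0], [3.0, 6.0, 9.0], [4.0, 8.0, 9.0]]
```

In this toy run, the value imputed for the second column of the second row (6.0) is already used as a feature when the third column is imputed, which is the "cascade" in the sketch.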
I am attending the 2017 IEEE International Conference on Big Data in Boston, Massachusetts, to present our paper
Predicting over-indebtedness on batch and streaming data.
Abstract: Detecting over-indebtedness, i.e., difficulty meeting household payment commitments, poses multiple Big Data challenges for banking institutions. We present a novel data-driven framework for predicting over-indebtedness on real-world data. The framework serves as a warning mechanism that generates predictions 6 months ahead, improving the chances of financial recovery. It combines feature selection and supervised learning techniques, and uses data balancing to fine-tune the predictive models.
I am attending the 2017 Data Science Summer School in Paris.
This Summer School is organized by the Data Science Initiative at École Polytechnique. DS3 2017 covered a broad range of topics, and the speakers are top researchers in their fields. I recommend it to anyone who wants a better picture of the state of the art and new trends in Data Science.
I am attending the 2017 Lisbon Machine Learning School in Lisbon, Portugal.
This Summer School took place at the Instituto Superior Técnico (IST) and covered topics ranging from core Machine Learning concepts to specific NLP techniques. I highly recommend it, since it includes practical workshops that reinforce understanding of the topics.