machine learning

scikit-multiflow has been accepted at JMLR MLOSS!

Our paper describing scikit-multiflow has been accepted for publication in the Journal of Machine Learning Research - Machine Learning Open Source Software (JMLR MLOSS).

Abstract: Scikit-multiflow is a multi-output/multi-label and stream data mining framework for the Python programming language. Conceived to serve as a platform to encourage democratization of stream learning research, it provides multiple state-of-the-art methods for stream learning, stream generators and evaluators. scikit-multiflow builds upon popular open source frameworks including scikit-learn, MOA and MEKA.

scikit-multiflow preprint is available!

The preprint version of our paper describing scikit-multiflow is available on arXiv.

Abstract: Scikit-multiflow is a multi-output/multi-label and stream data mining framework for the Python programming language. Conceived to serve as a platform to encourage democratization of stream learning research, it provides multiple state-of-the-art methods for stream learning, stream generators and evaluators. scikit-multiflow builds upon popular open source frameworks including scikit-learn, MOA and MEKA. Development follows FOSS principles, and quality is enforced by complying with the PEP8 guidelines and using continuous integration and automatic testing.

Talk at the University of Auckland

I am giving a talk, “Missing Data Imputation and scikit-multiflow”, at the Knowledge Management Group of the University of Auckland, New Zealand.

Visit to the University of Waikato

I am visiting the Machine Learning Group at the University of Waikato in New Zealand.

PAKDD 2018

I am attending the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2018) to present our paper Scalable Model-based Cascaded Imputation of Missing Data.

Abstract: Missing data is a common trait of real-world data that can negatively impact interpretability. In this paper, we present CASCADE IMPUTATION (CIM), an effective and scalable technique for automatic imputation of missing data. CIM places no restrictions on the characteristics of the data set, providing support for Missing At Random (MAR) and Missing Completely At Random (MCAR) data, numerical and nominal attributes, and large data sets, including high-dimensional ones.

IEEE BigData 2017

I am attending the 2017 IEEE International Conference on Big Data in Boston, Massachusetts, to present our paper Predicting over-indebtedness on batch and streaming data.

Abstract: Detecting over-indebtedness, that is, difficulty meeting household payment commitments, poses multiple Big Data challenges for banking institutions. We present a novel data-driven framework for predicting over-indebtedness on real-world data. It serves as a warning mechanism that generates predictions 6 months ahead, improving the chances of financial recovery. This framework is based on the combination of feature selection and supervised learning techniques, and uses data balancing for fine-tuning the predictive models.

DS3 2017

I am attending the 2017 Data Science Summer School (DS3) in Paris. This Summer School is organized by the Data Science Initiative at École Polytechnique. DS3 2017 covers a broad range of topics, and the speakers are top researchers in their fields. I recommend this summer school for getting a better picture of the state of the art in Data Science and its new trends.

LXMLS 2017

I am attending the 2017 Lisbon Machine Learning School (LxMLS) in Lisbon, Portugal. This Summer School takes place at the Instituto Superior Técnico (IST) and covers topics ranging from core concepts of Machine Learning to specific NLP techniques. I highly recommend this summer school, since it includes practical workshops that reinforce the understanding of the topics.