scikit-multiflow: machine learning on infinite data streams


Date
Nov 16, 2019 11:35
Location
Cambridge, UK

PyData Cambridge

In the field of machine learning on data streams, data is assumed infinite and models are trained and updated continuously, thereby adapting to changes in the data. This talk provides an overview of data stream learning and introduces scikit-multiflow, an open-source Python framework to easily implement algorithms and perform experiments.

As traditional “batch” learning struggles to keep pace with today’s data deluge, a parallel field has emerged: data stream mining. In this field, data is assumed infinite and models are trained and updated continuously, thereby adapting to changes in the data. This talk provides an overview of the core concepts of data stream learning and introduces scikit-multiflow, an open-source Python framework to implement algorithms and perform experiments in the field of machine learning on evolving data streams.

This talk is composed of two main sections:

An introduction to learning from data streams

  • How is stream learning different from the “traditional” batch learning?
  • Requirements for learning from data streams
  • An overview of methods for supervised learning
  • Discussion of challenges from changes in the data distribution, known as concept drift
  • Evaluating model performance on infinite data streams
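
To make the outline above concrete, here is a minimal sketch of test-then-train (prequential) evaluation on a stream with one abrupt concept drift. It uses only the Python standard library and is not scikit-multiflow's API; `drifting_stream`, `OnlinePerceptron`, `prequential_accuracy`, and all parameter values are hypothetical names and settings chosen for illustration.

```python
import random
from itertools import islice


def drifting_stream(seed=42, drift_at=2000):
    """Yield (x, y) pairs forever; the labelling rule changes after `drift_at` samples."""
    rng = random.Random(seed)
    n = 0
    while True:
        x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        if n < drift_at:
            y = 1 if x[0] + x[1] > 0 else 0  # concept before the drift
        else:
            y = 1 if x[0] - x[1] > 0 else 0  # concept after the drift
        yield x, y
        n += 1


class OnlinePerceptron:
    """A tiny incremental classifier: it sees one sample at a time and stores none."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_one(self, x):
        score = self.b + sum(w * xi for w, xi in zip(self.w, x))
        return 1 if score > 0 else 0

    def learn_one(self, x, y):
        error = y - self.predict_one(x)  # -1, 0, or +1
        if error:
            self.w = [w + self.lr * error * xi for w, xi in zip(self.w, x)]
            self.b += self.lr * error


def prequential_accuracy(stream, model, n_samples, window=500):
    """Test-then-train: predict each sample first, then use it to update the model."""
    window_accuracies = []
    correct = 0
    for i, (x, y) in enumerate(islice(stream, n_samples), start=1):
        correct += int(model.predict_one(x) == y)  # test on the unseen sample
        model.learn_one(x, y)                      # then train on it
        if i % window == 0:
            window_accuracies.append(correct / window)
            correct = 0
    return window_accuracies


window = 500
accuracies = prequential_accuracy(
    drifting_stream(), OnlinePerceptron(n_features=2), n_samples=4000, window=window
)
for k, acc in enumerate(accuracies):
    print(f"samples {k * window + 1}-{(k + 1) * window}: accuracy {acc:.3f}")
```

Reporting accuracy per window rather than as a single running average is one simple way to see how a model behaves around a drift point. Because every sample is used for testing before training, no separate hold-out set is needed.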

Scikit-multiflow

  • What is scikit-multiflow?
  • Overview of the core components of scikit-multiflow and available methods
  • Demo
Jacob Montiel

Research Fellow at the University of Waikato (NZ) and maintainer of River.