Outlier detection

Note: This functionality is still in beta.

An outlier is a data point that differs significantly from the rest of the data. Outliers in training data can degrade the generalization performance of machine learning models, so this functionality aims to remove them from signals.

Usage

signal.process_outliers(strategy='online', threshold=3)

This method processes a signal and removes outliers.

Parameters
  • signal – The signal to process.

  • strategy – The detection strategy. Two strategies are currently available: 'online', which classifies a data point as an outlier based on sudden changes in the signal, and 'residual', which detects outliers from the residuals of a regression model.

  • threshold – The cutoff applied to the outlier decision scores. A data point whose score is above this value is flagged as an outlier.
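
The following is a minimal sketch of how the two strategies and the threshold could work in principle. It is not the implementation behind process_outliers. The function names (online_scores, residual_scores, flag_outliers), the window parameter, the linear regression, and the use of the median absolute deviation (MAD) to scale scores are all assumptions made for illustration. The sketch only flags outliers; how flagged points are then removed is not covered here.

```python
import numpy as np

EPS = 1e-9  # guards against division by zero when the spread is exactly 0


def online_scores(values, window=5):
    """Hypothetical 'online' score: how far each point jumps away from the
    median of the preceding `window` points, in units of their MAD."""
    x = np.asarray(values, dtype=float)
    scores = np.zeros(len(x))  # the first `window` points are left at 0
    for i in range(window, len(x)):
        past = x[i - window:i]
        center = np.median(past)
        spread = np.median(np.abs(past - center))
        scores[i] = abs(x[i] - center) / (spread + EPS)
    return scores


def residual_scores(values):
    """Hypothetical 'residual' score: fit a straight line to the signal and
    scale each residual by a robust estimate of the residual spread."""
    x = np.asarray(values, dtype=float)
    t = np.arange(len(x))
    slope, intercept = np.polyfit(t, x, deg=1)  # simple linear regression
    residuals = x - (slope * t + intercept)
    center = np.median(residuals)
    mad = np.median(np.abs(residuals - center))
    # 1.4826 makes the MAD comparable to a standard deviation for normal data
    return np.abs(residuals - center) / (1.4826 * mad + EPS)


def flag_outliers(scores, threshold=3):
    """Flag every point whose decision score is above the threshold."""
    return np.asarray(scores) > threshold


values = [10, 11, 10, 12, 11, 10, 50, 11, 12, 10, 11]
print(np.flatnonzero(flag_outliers(online_scores(values))))    # [6]
print(np.flatnonzero(flag_outliers(residual_scores(values))))  # [6]
```

In this sketch, raising the threshold flags fewer points and lowering it flags more, which matches the role of the threshold parameter described above.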

To process outliers in the sales signal with the default settings, call:

actual('sales').process_outliers()

To specify a particular strategy and threshold, we can use:

actual('sales').process_outliers(strategy='online', threshold=3)