Forecasting

Exabel makes several univariate forecasting models available directly in the DSL, drawn from the statsmodels Python library and Facebook’s Prophet library. Forecasts are useful when you want to extend a time series into the future, e.g. for use in models.

signal.forecast(model, values, *, estimation_start, estimation_end, periods, **kwargs)

Fit a model to the time series and obtain forecasts.

Parameters:
  • model (str) – the type of model to build, one of ‘auto’ (default), ‘theta’, ‘holt_winters’, ‘sarima’, ‘unobserved’, ‘prophet’.

  • values (Sequence[str]) – an array of output series to be returned; the options depend on the selected model, but will in all cases include ‘extended’ (the default), ‘forecast’, and ‘observed’, and will for most model types include ‘prediction’ for in-sample one-step ahead predictions and ‘lower’ and ‘upper’ confidence bands. See the description below for each model type regarding exactly which options are available. If only a single output series is desired, it can be specified as ‘forecast’ instead of [‘forecast’].

  • estimation_start (str) – the start date (YYYY-MM-DD) from which to load the time series for fitting; defaults to '2000-01-01'.

  • estimation_end (str) – the date (YYYY-MM-DD) at which to cut the time series for fitting; data up to this date are in-sample, while out-of-sample forecasts are made after this date. By default, there is no end date, meaning all available data is used for fitting. The purpose of specifying an end date would be to see what forecasts would have been produced at a given point in time, in order to compare these forecasts with the actuals.

  • periods (int) – the maximum number of time periods to provide a forecast for beyond the observed time series. By default, there is no limit, which means that as many forecast data points are produced as needed to fill the requested time period.

  • kwargs – parameters to be passed to the underlying model.
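The roles of estimation_end and periods can be illustrated outside the DSL. The Python sketch below is only an illustration under stated assumptions, not Exabel's implementation: the helper names split_at_estimation_end and naive_seasonal_forecast are hypothetical, the sales figures are invented, and a seasonal-naive rule stands in for a real model.

```python
from datetime import date

def split_at_estimation_end(series, estimation_end):
    """Split {date: value} into in-sample (<= estimation_end) and held-out parts."""
    cutoff = date.fromisoformat(estimation_end)
    in_sample = {d: v for d, v in series.items() if d <= cutoff}
    held_out = {d: v for d, v in series.items() if d > cutoff}
    return in_sample, held_out

def naive_seasonal_forecast(values, season_length, periods):
    """Stand-in model (an assumption): repeat the value from one season earlier."""
    observed = list(values)
    history = list(observed)
    for _ in range(periods):
        history.append(history[-season_length])
    return history[len(observed):]

# Invented quarterly series keyed by quarter-end date.
quarter_ends = [date(y, m, d) for y in (2017, 2018, 2019)
                for m, d in ((3, 31), (6, 30), (9, 30), (12, 31))]
sales = dict(zip(quarter_ends,
                 [100, 110, 105, 130, 108, 119, 113, 140, 117, 128, 122, 151]))

# Fit on data up to 2018-12-31, forecast at most 2 periods, compare with actuals.
in_sample, held_out = split_at_estimation_end(sales, "2018-12-31")
forecast = naive_seasonal_forecast(in_sample.values(), season_length=4, periods=2)
actuals = list(held_out.values())[:2]
errors = [f - a for f, a in zip(forecast, actuals)]
print(forecast, actuals, errors)
```

Everything dated after the cutoff is held out of the fit, so the forecast errors measure what the model would have missed at that point in time.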

For each specific model type, the Exabel implementation specifies certain model parameters by default (listed below under each model type), while the rest have default parameters that follow the library implementation (statsmodels or Prophet). All parameters can be overridden by the user, by supplying them as keyword arguments in the DSL expression.

General advice

All of these statistical methods are linear by default, meaning that they assume growth trends are linear in time and seasonality effects are additive. However, for typical financial KPIs such as revenue, it is more natural to assume that growth is exponential: a company’s revenue grows by e.g. 10% per year, rather than by a constant amount such as $100M per year. Similarly, it is more natural to assume that seasonality is multiplicative, meaning that revenue in Q4 is modelled to be e.g. 20% higher than in Q3, rather than $50M higher.

Such time series can be modelled with exponential growth and multiplicative seasonality by taking the logarithm first, forecasting the logarithmic values, and finally exponentiating to get back the original values. As an example:

Sales_Actual.log().forecast('holt_winters').exp()
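To see why the log transform turns a linear model into one with exponential growth, consider the following Python sketch. It is neither the Exabel DSL nor any of the models described on this page: a plain least-squares line (the hypothetical helper fit_line) stands in for the forecasting model, and the revenue figures are invented.

```python
import math

def fit_line(ys):
    """Ordinary least-squares fit of y = intercept + slope * t for t = 0, 1, 2, ..."""
    n = len(ys)
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    slope = (sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
             / sum((t - t_mean) ** 2 for t in range(n)))
    return y_mean - slope * t_mean, slope

# Invented yearly revenue growing by exactly 10% per year.
revenue = [100 * 1.10 ** t for t in range(8)]

# Linear model on the raw values: the forecast grows by a constant amount.
a, b = fit_line(revenue)
linear_forecast = [a + b * t for t in range(8, 11)]

# Linear model on the logs, then exponentiate: the forecast grows by a constant percentage.
log_a, log_b = fit_line([math.log(y) for y in revenue])
log_forecast = [math.exp(log_a + log_b * t) for t in range(8, 11)]

growth = [log_forecast[i + 1] / log_forecast[i] for i in range(2)]
print(linear_forecast, log_forecast, growth)
```

The log-space forecast continues the 10% yearly growth, whereas the raw-space forecast adds the same absolute amount each year.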

Note that some of these methods, namely Theta and Prophet, offer the possibility to model multiplicative seasonality directly. When doing the log transform first, the underlying model should not use multiplicative seasonality. The Prophet model uses additive seasonality by default. The Theta model by default performs a test to determine whether to use additive or multiplicative seasonality, but this can be turned off by setting method='additive':

Sales_Actual.log().forecast('theta', method='additive').exp()

The models do not currently work well on time series with missing data; this is on our roadmap as an area for improvement.

Auto

The “auto” model is intended to give a reasonable forecast in most situations. It is currently the same as a Theta model, as the Theta model was generally the most accurate forecasting method evaluated in the M-3 Makridakis Competition. However, the specific type of modelling technique used is subject to change if Exabel finds an even better model.

These time series can be retrieved for the auto model:

Time series name    Description
extended            The actual, observed values followed by the forecast.
observed            The actual, observed values.
forecast            The forecasted values.
lower               The lower bound of the forecasted confidence interval.
upper               The upper bound of the forecasted confidence interval.

No modelling parameters can be specified for the auto model.

Examples:

Make a forecast:

Sales_Actual.forecast()

Make a forecast with confidence interval:

Sales_Actual.forecast(values=['extended', 'lower', 'upper'])

Theta

The Theta model is a simple forecasting method that combines a linear time trend with a Simple Exponential Smoother. The original publication is The theta model: A decomposition approach to forecasting by Assimakopoulos & Nikolopoulos (2000).
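The combination of a linear trend and a Simple Exponential Smoother can be sketched in a few lines of Python. This is a minimal illustration, assuming the "SES with drift" formulation of the Theta method with theta = 2. Unlike the statsmodels implementation, it uses a fixed smoothing parameter alpha instead of estimating it, and it does no seasonal adjustment. The function names are hypothetical.

```python
def ses_level(ys, alpha):
    """Run simple exponential smoothing and return the final level."""
    level = ys[0]
    for y in ys[1:]:
        level = alpha * y + (1 - alpha) * level
    return level

def ols_slope(ys):
    """Slope of the least-squares line through (0, ys[0]), (1, ys[1]), ..."""
    n = len(ys)
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den

def theta_forecast(ys, periods, alpha=0.5, theta=2.0):
    """Flat SES forecast plus a drift derived from the linear trend."""
    n = len(ys)
    level = ses_level(ys, alpha)
    slope = ols_slope(ys)
    weight = (theta - 1) / theta  # 0.5 for the classic theta = 2
    return [
        level + weight * slope * (h - 1 + 1 / alpha - (1 - alpha) ** n / alpha)
        for h in range(1, periods + 1)
    ]

series = [10, 12, 13, 15, 16, 18, 19, 21]  # invented data
theta_points = theta_forecast(series, periods=3)
print(theta_points)
```

With theta = 2, each further step ahead adds half of the fitted trend slope to the forecast.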

The Makridakis Competitions are a series of open competitions to evaluate and compare the accuracy of different time series forecasting methods. In the M-3 competition, held in 2000, the Theta model was generally the most accurate forecasting method evaluated, and the authors found that «Theta is performing very well for almost all types of data».

Exabel’s implementation relies on statsmodels. Here is a notebook with an explanation of the model and examples. The available parameters are described in detail on the ThetaModel webpage.

The Exabel implementation specifies the following parameters by default:

Parameter    Value
period       Depending on the frequency of the time series:
             • 4 for quarterly,
             • 12 for monthly,
             • 52 for weekly,
             • 365 for daily

These time series can be retrieved:

Time series name    Description
extended            The actual, observed values followed by the forecast.
observed            The actual, observed values.
forecast            The forecasted values.
lower               The lower bound of the forecasted confidence interval.
upper               The upper bound of the forecasted confidence interval.
trend               The trend component of the forecast.
seasonal            The seasonal component of the forecast.
ses                 The simple exponential smoothing.

Examples:

Make a forecast:

Sales_Actual.forecast('theta')

Make a forecast with a logarithmic model:

Sales_Actual.log().forecast('theta', method='additive').exp()

Make a forecast with confidence interval:

Sales_Actual.log().forecast('theta', ['extended', 'lower', 'upper'], method='additive').exp()

Show components of the forecast:

Sales_Actual.log().forecast('theta', ['forecast', 'trend', 'seasonal', 'ses'], method='additive').exp()

Make a forecast with the data up until 2018-12-31, to compare with the actual numbers in the following years:

Sales_Actual.log().forecast('theta', ['extended', 'lower', 'upper'], estimation_end='2018-12-31', method='additive').exp()

Make a forecast with a logarithmic model, providing exactly one forecast data point. This can be useful for dashboards or alpha signals, if you want to make sure only to get the forecast for the coming quarter:

Sales_Actual.log().forecast('theta', 'forecast', method='additive', periods=1).exp()

Holt-Winters

Holt-Winters Seasonal Exponential Smoothing is simple exponential smoothing with the addition of trend and seasonality. The method is described in the book Forecasting: Principles and Practice by Hyndman and Athanasopoulos (2014).
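How trend and seasonality are added to simple exponential smoothing can be sketched with the additive Holt-Winters recursions. The Python code below is a minimal illustration, not the statsmodels implementation. The smoothing parameters alpha, beta and gamma are fixed rather than fitted, the initialisation from the first two seasons is a simplifying assumption, and the function name is hypothetical.

```python
def holt_winters_additive(ys, m, periods, alpha=0.3, beta=0.1, gamma=0.1):
    """Additive Holt-Winters with season length m; returns `periods` forecasts."""
    # Simplified initialisation from the first two seasons (an assumption of this sketch).
    mean1 = sum(ys[:m]) / m
    mean2 = sum(ys[m:2 * m]) / m
    trend = (mean2 - mean1) / m
    level = mean1 - trend * (m + 1) / 2  # level one step before the first observation
    seasonal = [ys[i] - (level + trend * (i + 1)) for i in range(m)]

    for t, y in enumerate(ys):
        prev_level, prev_trend, prev_season = level, trend, seasonal[t % m]
        # Level: deseasonalised observation blended with the previous projection.
        level = alpha * (y - prev_season) + (1 - alpha) * (prev_level + prev_trend)
        # Trend: latest change in level blended with the previous trend.
        trend = beta * (level - prev_level) + (1 - beta) * prev_trend
        # Seasonal: detrended observation blended with the estimate one season back.
        seasonal[t % m] = gamma * (y - prev_level - prev_trend) + (1 - gamma) * prev_season

    n = len(ys)
    return [level + h * trend + seasonal[(n + h - 1) % m] for h in range(1, periods + 1)]

# Invented quarterly data: linear trend plus a repeating seasonal pattern.
pattern = [3, -1, -4, 2]
quarterly = [10 + 2 * t + pattern[t % 4] for t in range(12)]
hw_points = holt_winters_additive(quarterly, m=4, periods=4)
print(hw_points)
```

On this noise-free input the recursions reproduce the trend and seasonal pattern exactly. On real data the three smoothing parameters control how quickly each component adapts.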

Exabel’s implementation relies on statsmodels. Here is a notebook with an explanation of the model and examples. The available parameters are described in detail on the ExponentialSmoothing webpage.

The Exabel implementation specifies the following parameters by default:

Parameter           Value
trend               ‘additive’
seasonal            ‘additive’
seasonal_periods    Depending on the frequency of the time series:
                    • 4 for quarterly,
                    • 12 for monthly,
                    • 52 for weekly,
                    • 7 for daily

These time series can be retrieved:

Time series name    Description
extended            The actual, observed values followed by the forecast.
observed            The actual, observed values.
forecast            The forecasted values.
prediction          The in-sample one-step ahead predicted values.
residual            The residuals of the in-sample one-step ahead predicted values.
level               The level component of the observed values.
trend               The trend component of the observed values.
seasonal            The seasonal component of the observed values.

Examples:

Make a forecast:

Sales_Actual.forecast('holt_winters')

Make a forecast with a logarithmic model:

Sales_Actual.log().forecast('holt_winters').exp()

Show components of the observed values:

Sales_Actual.log().forecast('holt_winters', ['observed', 'level', 'trend', 'seasonal']).exp()

Unobserved Components

Unobserved Components is a classical time series model that breaks the time series into a trend component, a seasonal component, and a cyclical component. The model itself is explained in detail in the paper The Unobservable Components Model by Prof. Tom Fomby.

Exabel’s implementation relies on statsmodels. Here is a notebook with an explanation of the model and an example. The available parameters are described in detail on the UnobservedComponents webpage.

The Exabel implementation specifies the following parameters by default:

Parameter           Value
level               True
trend               True
irregular           True
stochastic_trend    True
seasonal            Depending on the frequency of the time series:
                    • 4 for quarterly,
                    • 12 for monthly,
                    • 52 for weekly,
                    • 7 for daily
cycle               False
alpha               0.05

These time series can be retrieved:

Time series name          Description
extended                  The actual, observed values followed by the forecast.
observed                  The actual, observed values.
forecast                  The forecasted values.
lower                     The lower bound of the forecasted confidence interval.
upper                     The upper bound of the forecasted confidence interval.
prediction                The in-sample one-step ahead predicted values.
level.filtered            The level component of the observed values (filtered).
level.smoothed            The level component of the observed values (smoothed).
trend.filtered            The trend component of the observed values (filtered).
trend.smoothed            The trend component of the observed values (smoothed).
seasonal.filtered         The seasonal component of the observed values (filtered).
seasonal.smoothed         The seasonal component of the observed values (smoothed).
freq_seasonal.filtered    The harmonic seasonal component of the observed values (filtered).
freq_seasonal.smoothed    The harmonic seasonal component of the observed values (smoothed).
cycle.filtered            The cyclical component of the observed values (filtered).
cycle.smoothed            The cyclical component of the observed values (smoothed).

Note that each of the components (level, trend, seasonal, cycle) is only available if that component was turned on in the model. By default, the level, trend and seasonal components are included. To include the cyclical component, set cycle=True.

The width of the confidence band (as shown by the ‘lower’ and ‘upper’ time series) is controlled by the alpha parameter, which defaults to 0.05 (giving a 95% confidence band).
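The relationship between alpha and the band width can be made concrete with a short Python sketch. It assumes normally distributed forecast errors with a known standard error. The helper name confidence_band and the numbers are purely illustrative, and the snippet does not reproduce how the model derives its standard errors.

```python
from statistics import NormalDist

def confidence_band(point_forecast, standard_error, alpha=0.05):
    """Two-sided (1 - alpha) band around a forecast, assuming normal errors."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return point_forecast - z * standard_error, point_forecast + z * standard_error

for alpha in (0.05, 0.01):
    lower, upper = confidence_band(100.0, 5.0, alpha)
    print(f"alpha={alpha}: {100 * (1 - alpha):.0f}% band = [{lower:.2f}, {upper:.2f}]")
```

A smaller alpha gives a higher confidence level and therefore a wider band.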

It is possible to specify the sigma parameters of the model with the parameters sigma_trend, sigma_level, sigma_irregular and sigma_seasonal. These parameters can only be set if the corresponding component is included in the model and is set as stochastic (otherwise, an error message is produced). By default, these parameters are not specified, which means that they are fitted to the data.

The first few data points from the in-sample time series are highly unreliable as the model is “burned in”. Therefore, these data points are removed from the result. By default, the number of data points removed is what statsmodels specifies as loglikelihood_burn. This can be overridden by specifying the burn parameter, e.g. burn=10 would remove the first 10 data points from the in-sample estimated time series.

It is possible to add harmonic seasonality terms to the model with the parameter freq_seasonal (see the statsmodels documentation for how it works). The Exabel function adds a shorthand syntax for this through the new parameter harmonics. For a daily time series, setting e.g. harmonics=5 is the same as specifying freq_seasonal=[{"harmonics": 5, "period": 365}], whereas for a weekly time series it would correspond to freq_seasonal=[{"harmonics": 5, "period": 52}]. Note that for weekly, monthly and quarterly time series, the seasonal parameter is not set by default when harmonics is specified (the harmonic seasonality takes its place), whereas for daily time series the default seasonal=7 is kept to model the weekly seasonality.
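The harmonics shorthand can be pictured as a small translation step. The Python sketch below is a hypothetical helper, not Exabel's code. The periods for daily (365) and weekly (52) series come from the description above, while the periods for monthly (12) and quarterly (4) series are assumed by analogy.

```python
# Seasonal period per frequency. Daily and weekly are stated in the text;
# monthly and quarterly are assumptions made for this sketch.
HARMONIC_PERIOD = {"daily": 365, "weekly": 52, "monthly": 12, "quarterly": 4}

def expand_harmonics(frequency, harmonics):
    """Hypothetical helper: translate the harmonics shorthand into model keyword arguments."""
    kwargs = {
        "freq_seasonal": [{"harmonics": harmonics, "period": HARMONIC_PERIOD[frequency]}]
    }
    if frequency == "daily":
        # For daily series the default weekly seasonality is kept alongside the harmonics.
        kwargs["seasonal"] = 7
    return kwargs

print(expand_harmonics("daily", 5))
print(expand_harmonics("weekly", 5))
```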

Examples:

Make a forecast:

Sales_Actual.forecast('unobserved')

Make a forecast with a logarithmic model:

Sales_Actual.log().forecast('unobserved').exp()

Make a forecast with a 99% confidence interval:

Sales_Actual.forecast('unobserved', ['extended', 'lower', 'upper'], alpha=0.01)

Show components of the observed values using Kalman filtering, meaning that output at time T refers to estimates conditional on observations up through time T:

Sales_Actual.forecast('unobserved', ['observed', 'level.filtered', 'trend.filtered', 'seasonal.filtered'])

Show components of the observed values using Kalman smoothing, meaning that output at time T refers to estimates conditional on the entire set of observations in the dataset (both before and after time T):

Sales_Actual.forecast('unobserved', ['observed', 'level.smoothed', 'trend.smoothed', 'seasonal.smoothed'])
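The difference between filtered and smoothed estimates can be sketched with the simplest possible state space model, a local level (random walk plus noise). The Python code below is a minimal illustration, not the statsmodels implementation. The noise variances are fixed by assumption rather than fitted, the prior is a crude vague prior, and the function names are hypothetical.

```python
def local_level_filter(ys, obs_var=1.0, level_var=0.1):
    """Kalman filter for y_t = level_t + noise, level_t = level_{t-1} + shock."""
    means, variances, predicted_variances = [], [], []
    mean, var = ys[0], 1e6  # vague prior centred on the first observation
    for y in ys:
        pred_var = var + level_var               # predict: uncertainty grows
        gain = pred_var / (pred_var + obs_var)   # update: weight given to the new observation
        mean = mean + gain * (y - mean)
        var = (1 - gain) * pred_var
        means.append(mean)
        variances.append(var)
        predicted_variances.append(pred_var)
    return means, variances, predicted_variances

def local_level_smoother(ys, obs_var=1.0, level_var=0.1):
    """Rauch-Tung-Striebel smoother: revise each filtered estimate using later data."""
    means, variances, predicted_variances = local_level_filter(ys, obs_var, level_var)
    smoothed = list(means)
    for t in range(len(ys) - 2, -1, -1):
        weight = variances[t] / predicted_variances[t + 1]
        # For a random-walk level, the prediction for t + 1 is the filtered mean at t.
        smoothed[t] = means[t] + weight * (smoothed[t + 1] - means[t])
    return smoothed

observations = [1.0, 1.2, 0.9, 1.5, 1.4, 1.8]  # invented data
filtered = local_level_filter(observations)[0]
smoothed = local_level_smoother(observations)
print(filtered)
print(smoothed)
```

The filtered value at time T is unchanged when later observations are added, while the smoothed value is revised. At the last observation the two coincide.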

Specify the sigma_trend parameter:

Sales_Actual.log().forecast('unobserved', sigma_trend=1e-5).exp()

Including the cyclical component:

Sales_Actual.forecast('unobserved', ['extended', 'level.filtered', 'seasonal.filtered', 'cycle.filtered'], cycle=True)

SARIMA

SARIMA is a classical time series model. It’s a form of regression model with autoregressive terms, meaning that the predictions at one time step depend on the values at previous time steps, and optionally seasonal components.

Exabel’s implementation relies on statsmodels. Here is a notebook with an explanation of the model and examples.

The available parameters are described in detail on the SARIMAX webpage.

‘SARIMAX’ is an acronym for Seasonal AutoRegressive Integrated Moving Average with eXogenous regressors. Note that the .forecast() method does not accept additional signals as inputs, so there are no exogenous regressors, and thus the correct name for this model is ‘SARIMA’.

The Exabel implementation specifies the following parameters by default:

Parameter         Value
order             (1, 1, 0)
seasonal_order    Depending on the frequency of the time series:
                  • (0, 1, 0, 4) for quarterly,
                  • (0, 1, 0, 12) for monthly,
                  • (0, 1, 0, 52) for weekly,
                  • (0, 1, 0, 7) for daily
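With these defaults, the model applies one regular difference and one seasonal difference, and then fits a first-order autoregression to the result. The Python sketch below illustrates the forecasting recursion this implies for a given autoregressive coefficient phi. It is not the statsmodels implementation, which estimates phi from the data. The function name and the data are hypothetical.

```python
def sarima_110_010_forecast(ys, s, periods, phi):
    """Forecast with (1 - phi*B)(1 - B)(1 - B^s) y_t = e_t for a given AR coefficient phi."""
    def w(series, t):
        # Value at time t after one regular and one seasonal difference.
        return series[t] - series[t - 1] - series[t - s] + series[t - s - 1]

    history = list(ys)  # requires at least s + 2 observations
    w_prev = w(history, len(history) - 1)
    for _ in range(periods):
        w_next = phi * w_prev  # AR(1) forecast of the differenced series
        t = len(history)
        # Undo both differences to get back to the original scale.
        history.append(w_next + history[t - 1] + history[t - s] - history[t - s - 1])
        w_prev = w_next
    return history[len(ys):]

# Invented quarterly data: linear trend plus a repeating seasonal pattern.
season = [4, 0, -3, -1]
data = [5 + 3 * t + season[t % 4] for t in range(12)]
sarima_points = sarima_110_010_forecast(data, s=4, periods=4, phi=-0.4)
print(sarima_points)
```

Because the double differencing removes both a linear trend and a stable seasonal pattern, the forecast continues both on this noise-free input.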

These time series can be retrieved:

Time series name    Description
extended            The actual, observed values followed by the forecast.
observed            The actual, observed values.
forecast            The forecasted values.
lower               The lower bound of the forecasted confidence interval.
upper               The upper bound of the forecasted confidence interval.
prediction          The in-sample one-step ahead predicted values.
resid               The residuals of the in-sample one-step ahead predicted values.

Examples:

Make a forecast:

Sales_Actual.forecast('sarima')

Make a forecast with a logarithmic model:

Sales_Actual.log().forecast('sarima').exp()

Make a forecast with a logarithmic model and including confidence interval:

Sales_Actual.log().forecast('sarima', ['extended', 'lower', 'upper']).exp()

Prophet

The Prophet library uses an additive model, where a time series is modelled as the sum of a trend component and yearly and weekly seasonality components. The model also takes holiday effects into account, if a country code is specified with the country parameter.

The Prophet library is highly configurable, and its parameters can be passed to the forecast signal function. You can find these parameters in the Prophet documentation.

The Exabel implementation does not override the default modelling parameters for Prophet.

These time series can be retrieved:

Time series name    Description
extended            The actual, observed values followed by the forecast.
observed            The actual, observed values.
forecast            The forecasted values.
lower               The lower bound of the forecasted confidence interval.
upper               The upper bound of the forecasted confidence interval.
prediction          The in-sample one-step ahead predicted values.
yhat                The in-sample one-step ahead predicted values followed by the forecast.
trend               The trend component of the observed values.
yearly              The yearly seasonal component of the observed values.
weekly              The weekly seasonal component of the observed values (only applicable for time series with daily frequency).

Examples:

Make a forecast:

Sales_Actual.forecast('prophet')

Make a forecast with a logarithmic model, including confidence interval:

Sales_Actual.log().forecast('prophet', ['extended', 'lower', 'upper'], interval_width=0.9).exp()

Include US holiday effects in the model:

Sales_Actual.forecast('prophet', country='US')

Show the seasonal components (in a contrived example):

Close_Price.forecast('prophet', ['extended', 'trend', 'yearly', 'weekly'])

Make a forecast with multiplicative seasonality effect:

Sales_Actual.forecast('prophet', seasonality_mode='multiplicative')
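The difference between the two seasonality modes can be sketched in plain Python. This is only an illustration of how a trend and a seasonal term are combined under each mode. The helper name combine and the numbers are hypothetical, and no Prophet code is involved.

```python
def combine(trend, seasonal, seasonality_mode="additive"):
    """Combine a trend value with a seasonal term under the given mode."""
    if seasonality_mode == "additive":
        return trend + seasonal        # seasonal term in the units of the series
    if seasonality_mode == "multiplicative":
        return trend * (1 + seasonal)  # seasonal term as a fraction of the trend
    raise ValueError(f"unknown seasonality_mode: {seasonality_mode}")

trend_levels = [1000.0, 2000.0]  # invented trend levels, early and late in the series
additive = [combine(t, 50.0) for t in trend_levels]
multiplicative = [combine(t, 0.05, "multiplicative") for t in trend_levels]
print(additive, multiplicative)
```

With additive seasonality the seasonal uplift stays at 50 regardless of the trend level, whereas with multiplicative seasonality it scales with the trend (50 at a level of 1000, 100 at a level of 2000).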