Forecasting
Exabel offers multiple univariate forecast models directly available in the DSL, from the statsmodels Python library and Facebook’s Prophet library. Forecasts are useful if you want to extend a time series into the future, e.g. for use in models.
- signal.forecast(model, values, *, estimation_start, estimation_end, periods, **kwargs)
Fit a model to the time series and obtain forecasts.
- Parameters:
model (str) – the type of model to build, one of ‘auto’ (default), ‘theta’, ‘holt_winters’, ‘sarima’, ‘unobserved’, ‘prophet’.
values (Sequence[str]) – an array of output series to be returned; the options depend on the selected model, but will in all cases include ‘extended’ (the default), ‘forecast’, and ‘observed’, and will for most model types include ‘prediction’ for in-sample one-step ahead predictions and ‘lower’ and ‘upper’ confidence bands. See the description below for each model type regarding exactly which options are available. If only a single output series is desired, it can be specified as ‘forecast’ instead of [‘forecast’].
estimation_start (str) – the start date (YYYY-MM-DD) from which to load the time series for fitting; defaults to '2000-01-01'.
estimation_end (str) – the date (YYYY-MM-DD) at which to cut the time series for fitting; data up to this date are in-sample, while out-of-sample forecasts are made after this date. By default, there is no end date, meaning all available data is used for fitting. The purpose of specifying an end date would be to see what forecasts would have been produced at a given point in time, in order to compare these forecasts with the actuals.
periods (int) – the maximum number of time periods to provide a forecast for beyond the observed time series. By default, there is no limit, which means that as many forecast data points are produced as needed to fill the requested time period.
kwargs – parameters to be passed to the underlying model.
For each specific model type, the Exabel implementation specifies certain model parameters by default (listed below under each model type), while the rest have default parameters that follow the library implementation (statsmodels or Prophet). All parameters can be overridden by the user, by supplying them as keyword arguments in the DSL expression.
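The roles of estimation_end and periods can be sketched in plain Python, outside the Exabel DSL. This is only an illustration: the helper names `split_at` and `naive_forecast` are hypothetical, the "repeat the last value" forecast is a placeholder for a real model, and treating the cut-off date itself as in-sample is an assumption.

```python
from datetime import date


def split_at(series, estimation_end):
    """Split (date, value) pairs into in-sample and held-out parts.

    Points dated on or before estimation_end are treated as in-sample
    (an assumption); later points are held out for comparison.
    """
    cutoff = date.fromisoformat(estimation_end)
    in_sample = [(d, v) for d, v in series if d <= cutoff]
    held_out = [(d, v) for d, v in series if d > cutoff]
    return in_sample, held_out


def naive_forecast(in_sample, future_dates, periods=None):
    """Repeat the last in-sample value; a placeholder for a real model."""
    if periods is not None:
        future_dates = future_dates[:periods]  # cap the forecast horizon
    last_value = in_sample[-1][1]
    return [(d, last_value) for d in future_dates]


# Hypothetical quarterly series.
series = [
    (date(2018, 6, 30), 100.0),
    (date(2018, 9, 30), 104.0),
    (date(2018, 12, 31), 110.0),
    (date(2019, 3, 31), 108.0),
    (date(2019, 6, 30), 115.0),
]
in_sample, held_out = split_at(series, "2018-12-31")
forecast = naive_forecast(in_sample, [d for d, _ in held_out], periods=1)
# Forecast minus actual, for the dates where both exist.
errors = [f - a for (_, f), (_, a) in zip(forecast, held_out)]
```

With periods=1 only the first held-out date receives a forecast, which is the behaviour the periods parameter is described as providing.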
General advice
All of these statistical methods are linear by default, meaning that they assume growth trends are linear in time and seasonality effects are additive. However, for typical financial KPIs such as revenue, it is more natural to assume that growth is exponential, meaning that a company’s revenue grows by e.g. 10% per year rather than by a constant amount such as $100M per year. Similarly, it is more natural to assume that seasonality is multiplicative, meaning that revenue in Q4 is modelled as e.g. 20% higher than in Q3, rather than $50M higher.
Such time series can be modelled with exponential growth and multiplicative seasonality by taking the logarithm first, forecasting the logarithmic values, and finally exponentiating to get back the original values. As an example:
Sales_Actual.log().forecast('holt_winters').exp()
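To see why the log transform turns a linear method into an exponential-growth model, here is a minimal Python sketch outside the Exabel DSL. It uses an ordinary least-squares line as a stand-in for the forecasting model, which is an assumption made for illustration and not what the DSL expression above does internally. The helper names `fit_line` and `log_linear_forecast` are hypothetical.

```python
import math


def fit_line(values):
    """Least-squares intercept and slope of values against t = 0, 1, 2, ..."""
    n = len(values)
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    cov = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    return v_mean - slope * t_mean, slope


def log_linear_forecast(values, periods):
    """Fit a straight line to log(values), extrapolate it, then exponentiate."""
    intercept, slope = fit_line([math.log(v) for v in values])
    n = len(values)
    return [math.exp(intercept + slope * (n + h)) for h in range(periods)]


# Hypothetical revenue series growing 10% per period.
revenue = [100 * 1.1 ** t for t in range(8)]
forecast = log_linear_forecast(revenue, periods=2)
```

Because the line is fitted in log space, a constant slope there corresponds to a constant percentage growth rate in the original units.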
Note that some of these methods, Theta and Prophet, offer the possibility to model with multiplicative seasonality. When doing the log transform first, the underlying model should not use multiplicative seasonality. The Prophet model uses additive seasonality by default. The Theta model by default performs a test to determine whether to use additive or multiplicative seasonality, but this can be turned off by setting method='additive':
Sales_Actual.log().forecast('theta', method='additive').exp()
The models do not currently work well on time series with missing data - this is on our roadmap as an area for improvement.
Auto
The “auto” model is intended to give a reasonable forecast in most situations. It is currently the same as a Theta model, as the Theta model was generally the most accurate forecasting method evaluated in the M-3 Makridakis Competition. However, the specific type of modelling technique used is subject to change if Exabel finds an even better model.
These time series can be retrieved for the auto model:
Time series name | Description |
---|---|
extended | The actual, observed values followed by the forecast. |
observed | The actual, observed values. |
forecast | The forecasted values. |
lower | The lower bound of the forecasted confidence interval. |
upper | The upper bound of the forecasted confidence interval. |
No modelling parameters can be specified for the auto model.
Examples:
Make a forecast:
Sales_Actual.forecast()
Make a forecast with confidence interval:
Sales_Actual.forecast(values=['extended', 'lower', 'upper'])
Theta
The Theta model is a simple forecasting method that combines a linear time trend with a Simple Exponential Smoother. The original publication is The theta model: A decomposition approach to forecasting by Assimakopoulos & Nikolopoulos (2000).
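The two building blocks named above, a linear time trend and simple exponential smoothing, can be sketched in a few lines of Python. This is not the full Theta method as implemented in statsmodels: there is no seasonal adjustment, the smoothing parameter alpha is fixed rather than estimated, and the two quantities are not combined into a forecast. The function names are hypothetical.

```python
def simple_exponential_smoothing(values, alpha):
    """Return the smoothed level after each observation.

    level_t = alpha * y_t + (1 - alpha) * level_{t-1}, seeded with the first value.
    """
    level = values[0]
    levels = [level]
    for y in values[1:]:
        level = alpha * y + (1 - alpha) * level
        levels.append(level)
    return levels


def linear_trend_slope(values):
    """Least-squares slope of values against t = 0, 1, 2, ..."""
    n = len(values)
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    cov = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    var = sum((t - t_mean) ** 2 for t in range(n))
    return cov / var


# Hypothetical series.
series = [10.0, 12.0, 11.0, 13.0, 14.0, 13.5, 15.0]
levels = simple_exponential_smoothing(series, alpha=0.5)
slope = linear_trend_slope(series)
```

The last smoothed level captures where the series currently is, while the slope captures its long-run drift; the ThetaModel documentation gives the exact way the two are combined.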
The Makridakis Competitions are a series of open competitions to evaluate and compare the accuracy of different time series forecasting methods. In the M-3 competition, held in 2000, the Theta model was generally the most accurate forecasting method evaluated, and the authors found that «Theta is performing very well for almost all types of data».
Exabel’s implementation relies on Statsmodels. Here is a notebook with an explanation of the model and examples. The available parameters are described in detail on the ThetaModel webpage.
The Exabel implementation specifies the following parameters by default:
Parameter | Value |
---|---|
period | Depending on the frequency of the time series: |
These time series can be retrieved:
Time series name | Description |
---|---|
extended | The actual, observed values followed by the forecast. |
observed | The actual, observed values. |
forecast | The forecasted values. |
lower | The lower bound of the forecasted confidence interval. |
upper | The upper bound of the forecasted confidence interval. |
trend | The trend component of the forecast. |
seasonal | The seasonal component of the forecast. |
ses | The simple exponential smoothing. |
Examples:
Make a forecast:
Sales_Actual.forecast('theta')
Make a forecast with a logarithmic model:
Sales_Actual.log().forecast('theta', method='additive').exp()
Make a forecast with confidence interval:
Sales_Actual.log().forecast('theta', ['extended', 'lower', 'upper'], method='additive').exp()
Show components of the forecast:
Sales_Actual.log().forecast('theta', ['forecast', 'trend', 'seasonal', 'ses'], method='additive').exp()
Make a forecast with the data up until 2018-12-31, to compare with the actual numbers in the following years:
Sales_Actual.log().forecast('theta', ['extended', 'lower', 'upper'], estimation_end='2018-12-31', method='additive').exp()
Make a forecast with a logarithmic model, providing exactly one forecast data point. This can be useful for dashboards or alpha signals, if you want to make sure only to get the forecast for the coming quarter:
Sales_Actual.log().forecast('theta', 'forecast', method='additive', periods=1).exp()
Holt-Winters
Holt-Winters seasonal exponential smoothing is simple exponential smoothing with the addition of trend and seasonality. The method is described in the book Forecasting: Principles and Practice by Hyndman and Athanasopoulos (2014).
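As a rough illustration of how level, trend and seasonality are smoothed together, the following Python sketch implements the textbook additive Holt-Winters recursions with fixed smoothing parameters. It is a simplification under stated assumptions: statsmodels estimates the parameters and initial state from the data, whereas here they are supplied by hand and the initial state is taken crudely from the first season. The function name is hypothetical.

```python
def holt_winters_additive(values, m, alpha, beta, gamma, periods):
    """Additive Holt-Winters forecast with fixed smoothing parameters.

    m is the season length; values must cover at least one full season.
    """
    # Crude initial state taken from the first season (an illustrative choice).
    level = sum(values[:m]) / m
    trend = 0.0
    seasonal = [v - level for v in values[:m]]

    for t, y in enumerate(values):
        prev_level, prev_trend = level, trend
        s = seasonal[t % m]
        level = alpha * (y - s) + (1 - alpha) * (prev_level + prev_trend)
        trend = beta * (level - prev_level) + (1 - beta) * prev_trend
        seasonal[t % m] = gamma * (y - prev_level - prev_trend) + (1 - gamma) * s

    n = len(values)
    # h-step-ahead forecast: level + h * trend + latest seasonal term for that slot.
    return [level + h * trend + seasonal[(n + h - 1) % m] for h in range(1, periods + 1)]


# Hypothetical quarterly series with a repeating seasonal pattern and no trend.
quarterly = [10.0, 20.0, 30.0, 40.0] * 3
forecast = holt_winters_additive(quarterly, m=4, alpha=0.3, beta=0.1, gamma=0.2, periods=4)
```

For a series that repeats the same seasonal pattern without trend, the forecast simply continues that pattern, which makes the role of the seasonal component easy to inspect.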
Exabel’s implementation relies on Statsmodels. Here is a notebook with an explanation of the model and examples. The available parameters are described in detail on the ExponentialSmoothing webpage.
The Exabel implementation specifies the following parameters by default:
Parameter | Value |
---|---|
trend | ‘additive’ |
seasonal | ‘additive’ |
seasonal_periods | Depending on the frequency of the time series: |
These time series can be retrieved:
Time series name | Description |
---|---|
extended | The actual, observed values followed by the forecast. |
observed | The actual, observed values. |
forecast | The forecasted values. |
prediction | The in-sample one-step ahead predicted values. |
residual | The residuals of the in-sample one-step ahead predicted values. |
level | The level component of the observed values. |
trend | The trend component of the observed values. |
seasonal | The seasonal component of the observed values. |
Examples:
Make a forecast:
Sales_Actual.forecast('holt_winters')
Make a forecast with a logarithmic model:
Sales_Actual.log().forecast('holt_winters').exp()
Show components of the observed values:
Sales_Actual.log().forecast('holt_winters', ['observed', 'level', 'trend', 'seasonal']).exp()
Unobserved Components
Unobserved Components is a classical time series model that breaks the time series into a trend component, a seasonal component, and a cyclical component. The model itself is explained in detail in the paper The Unobservable Components Model by Prof. Tom Fomby.
Exabel’s implementation relies on Statsmodels. Here is a notebook with an explanation of the model and an example. The available parameters are described in detail on the UnobservedComponents webpage.
The Exabel implementation specifies the following parameters by default:
Parameter | Value |
---|---|
level | True |
trend | True |
irregular | True |
stochastic_trend | True |
seasonal | Depending on the frequency of the time series: |
cycle | False |
alpha | 0.05 |
These time series can be retrieved:
Time series name | Description |
---|---|
extended | The actual, observed values followed by the forecast. |
observed | The actual, observed values. |
forecast | The forecasted values. |
lower | The lower bound of the forecasted confidence interval. |
upper | The upper bound of the forecasted confidence interval. |
prediction | The in-sample one-step ahead predicted values. |
level.filtered | The level component of the observed values (filtered). |
level.smoothed | The level component of the observed values (smoothed). |
trend.filtered | The trend component of the observed values (filtered). |
trend.smoothed | The trend component of the observed values (smoothed). |
seasonal.filtered | The seasonal component of the observed values (filtered). |
seasonal.smoothed | The seasonal component of the observed values (smoothed). |
freq_seasonal.filtered | The harmonic seasonal component of the observed values (filtered). |
freq_seasonal.smoothed | The harmonic seasonal component of the observed values (smoothed). |
cycle.filtered | The cyclical component of the observed values (filtered). |
cycle.smoothed | The cyclical component of the observed values (smoothed). |
Note that each of the components (level, trend, seasonal, cycle) is only available if that component was turned on in the model. By default, the level, trend and seasonal components are included. To include the cyclical component, set cycle=True.
The width of the confidence band (as shown by the ‘lower’ and ‘upper’ time series) is controlled by the alpha parameter, which defaults to 0.05 (giving a 95% confidence band).
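The relationship between alpha and the band can be sketched with the Python standard library. The sketch assumes normally distributed forecast errors with a known standard error, which is an assumption for illustration; the helper names `confidence_level` and `normal_band` are hypothetical and not part of the Exabel DSL.

```python
from statistics import NormalDist


def confidence_level(alpha):
    """Coverage of the band for a given alpha, e.g. 0.05 -> 0.95."""
    return 1.0 - alpha


def normal_band(point_forecast, std_error, alpha=0.05):
    """Two-sided band around a forecast, assuming normally distributed errors."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return point_forecast - z * std_error, point_forecast + z * std_error


# Hypothetical forecast of 100 with a standard error of 5.
lower_95, upper_95 = normal_band(100.0, 5.0, alpha=0.05)
lower_99, upper_99 = normal_band(100.0, 5.0, alpha=0.01)
```

A smaller alpha gives a wider band, since more of the error distribution has to fall inside it.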
It is possible to specify the sigma parameters of the model with the parameters sigma_trend, sigma_level, sigma_irregular and sigma_seasonal. These parameters can only be set if the corresponding component is included in the model and is set as stochastic (otherwise, an error message is produced). By default, these parameters are not specified, which means that they are fitted to the data.
The first few data points from the in-sample time series are highly unreliable as the model is “burned in”. Therefore, these data points are removed from the result. By default, the number of data points removed is what statsmodels specifies as loglikelihood_burn. This can be overridden by specifying the burn parameter, e.g. burn=10 would remove the first 10 data points from the in-sample estimated time series.
It is possible to add harmonic seasonality terms to the model with the parameter freq_seasonal (see the statsmodels documentation for how it works). This Exabel function adds a shorthand syntax for doing this with the parameter harmonics. For a daily time series, setting e.g. harmonics=5 is the same as specifying freq_seasonal=[{"harmonics": 5, "period": 365}], whereas for a weekly time series it would correspond to freq_seasonal=[{"harmonics": 5, "period": 52}]. Note that for weekly, monthly and quarterly time series, the seasonal parameter is not set by default when harmonics is specified (the harmonic seasonality takes its place), whereas for daily time series the default seasonal=7 is kept to model the weekly seasonality.
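The shorthand described above can be written out as a small Python helper. This is a sketch of the mapping as stated in the text, not Exabel's actual code: the function name `expand_harmonics` and the frequency labels are hypothetical, and only the daily and weekly periods are included because those are the only ones the text spells out.

```python
# Seasonal periods stated in the text for the harmonics shorthand.
HARMONIC_PERIOD = {"daily": 365, "weekly": 52}


def expand_harmonics(frequency, harmonics):
    """Translate harmonics=N into the keyword arguments described in the text.

    Only daily and weekly series are covered, since those are the only
    periods the text spells out.
    """
    if frequency not in HARMONIC_PERIOD:
        raise ValueError(f"no period documented for {frequency!r} series")
    kwargs = {
        "freq_seasonal": [{"harmonics": harmonics, "period": HARMONIC_PERIOD[frequency]}]
    }
    if frequency == "daily":
        kwargs["seasonal"] = 7  # weekly seasonality is kept for daily series
    return kwargs


daily_kwargs = expand_harmonics("daily", 5)
weekly_kwargs = expand_harmonics("weekly", 5)
```

The daily result carries both the harmonic term and seasonal=7, while the weekly result carries the harmonic term only.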
Examples:
Make a forecast:
Sales_Actual.forecast('unobserved')
Make a forecast with a logarithmic model:
Sales_Actual.log().forecast('unobserved').exp()
Make a forecast with a 99% confidence interval:
Sales_Actual.forecast('unobserved', ['extended', 'lower', 'upper'], alpha=0.01)
Show components of the observed values using Kalman filtering, meaning that output at time T refers to estimates conditional on observations up through time T:
Sales_Actual.forecast('unobserved', ['observed', 'level.filtered', 'trend.filtered', 'seasonal.filtered'])
Show components of the observed values using Kalman smoothing, meaning that output at time T refers to estimates conditional on the entire set of observations in the dataset (both before and after time T):
Sales_Actual.forecast('unobserved', ['observed', 'level.smoothed', 'trend.smoothed', 'seasonal.smoothed'])
Specify the sigma_trend parameter:
Sales_Actual.log().forecast('unobserved', sigma_trend=1e-5).exp()
Including the cyclical component:
Sales_Actual.forecast('unobserved', ['extended', 'level.filtered', 'seasonal.filtered', 'cycle.filtered'], cycle=True)
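The difference between the filtered and smoothed outputs used in the examples above can be illustrated with the simplest unobserved components model, a local level (random walk plus noise). The Python sketch below is a simplification under stated assumptions: the two standard deviations are supplied by hand instead of being fitted, there is no trend or seasonal component, and the function and parameter names are hypothetical.

```python
def local_level_filter(values, level_std, irregular_std):
    """Kalman filter for y_t = level_t + noise, level_{t+1} = level_t + shock.

    Returns the filtered level estimates and their variances. The estimate at
    time t uses observations up to and including t only.
    """
    q = level_std ** 2       # variance of the level shocks
    r = irregular_std ** 2   # variance of the observation noise
    estimate, variance = values[0], r  # start from the first observation
    estimates, variances = [estimate], [variance]
    for y in values[1:]:
        predicted_variance = variance + q
        gain = predicted_variance / (predicted_variance + r)
        estimate = estimate + gain * (y - estimate)
        variance = (1.0 - gain) * predicted_variance
        estimates.append(estimate)
        variances.append(variance)
    return estimates, variances


def local_level_smoother(values, level_std, irregular_std):
    """Rauch-Tung-Striebel smoother: every estimate uses the whole series."""
    q = level_std ** 2
    filtered, variances = local_level_filter(values, level_std, irregular_std)
    smoothed = list(filtered)
    # Walk backwards, pulling each estimate towards the next smoothed one.
    for t in range(len(values) - 2, -1, -1):
        weight = variances[t] / (variances[t] + q)
        smoothed[t] = filtered[t] + weight * (smoothed[t + 1] - filtered[t])
    return smoothed


# Hypothetical series with a jump in the last observation.
series = [10.0, 11.0, 9.5, 10.5, 14.0]
filtered, _ = local_level_filter(series, 0.5, 1.0)
smoothed = local_level_smoother(series, 0.5, 1.0)
```

Changing the last observation leaves the earlier filtered values untouched but shifts the earlier smoothed values, which is the practical distinction between the two sets of outputs.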
SARIMA
SARIMA is a classical time series model. It’s a form of regression model with autoregressive terms, meaning that the predictions at one time step depend on the values at previous time steps, and optionally seasonal components.
Exabel’s implementation relies on Statsmodels. Here is a notebook with an explanation of the model and examples. The available parameters are described in detail on the SARIMAX webpage.
‘SARIMAX’ is an acronym for Seasonal AutoRegressive Integrated Moving Average with eXogenous regressors. Note that the .forecast() method does not accept additional signals as inputs, so there are no exogenous regressors, and thus the correct name for this model is ‘SARIMA’.
The Exabel implementation specifies the following parameters by default:
Parameter | Value |
---|---|
order | (1, 1, 0) |
seasonal_order | Depending on the frequency of the time series: |
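To make the default non-seasonal order (1, 1, 0) concrete: it means one autoregressive term on the first difference of the series and no moving-average term. The Python sketch below shows the resulting forecast recursion under simplifying assumptions: the autoregressive coefficient phi is given by hand rather than estimated, and there is no seasonal part or constant. The function name is hypothetical.

```python
def arima_110_forecast(values, phi, periods):
    """Forecast from an ARIMA(1, 1, 0) model with a known AR coefficient phi.

    The first difference follows diff_t = phi * diff_{t-1} + noise, so each
    forecast step shrinks the last observed change by a factor phi and adds it
    to the running level.
    """
    level = values[-1]
    diff = values[-1] - values[-2]
    forecasts = []
    for _ in range(periods):
        diff = phi * diff
        level = level + diff
        forecasts.append(level)
    return forecasts


# Hypothetical series whose last change was +4.
forecast = arima_110_forecast([100.0, 104.0, 108.0], phi=0.5, periods=3)
```

With phi between 0 and 1 the forecast changes shrink geometrically, so the forecast flattens out instead of continuing the last change indefinitely.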
These time series can be retrieved:
Time series name | Description |
---|---|
extended | The actual, observed values followed by the forecast. |
observed | The actual, observed values. |
forecast | The forecasted values. |
lower | The lower bound of the forecasted confidence interval. |
upper | The upper bound of the forecasted confidence interval. |
prediction | The in-sample one-step ahead predicted values. |
resid | The residuals of the in-sample one-step ahead predicted values. |
Examples:
Make a forecast:
Sales_Actual.forecast('sarima')
Make a forecast with a logarithmic model:
Sales_Actual.log().forecast('sarima').exp()
Make a forecast with a logarithmic model and including confidence interval:
Sales_Actual.log().forecast('sarima', ['extended', 'lower', 'upper']).exp()
Prophet
The Prophet library uses an additive model, where a time series is modelled as the sum of a trend component and yearly and weekly seasonality components. The model also takes holiday effects into account, if a country code is specified with the country parameter.
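The additive structure, and how it differs from a multiplicative one, can be sketched in plain Python. This is an illustration only: the component values are made up, holiday effects and noise are ignored, and the multiplicative form (seasonal terms expressed as fractions of the trend) follows how Prophet's documentation describes its multiplicative seasonality mode. The function names are hypothetical.

```python
def combine_additive(trend, yearly, weekly):
    """Additive composition: the seasonal terms are in the units of the series."""
    return [t + y + w for t, y, w in zip(trend, yearly, weekly)]


def combine_multiplicative(trend, yearly, weekly):
    """Multiplicative composition: the seasonal terms are fractions of the trend."""
    return [t * (1.0 + y + w) for t, y, w in zip(trend, yearly, weekly)]


# Hypothetical components at two points in time, where the trend has doubled.
trend = [100.0, 200.0]
additive = combine_additive(trend, yearly=[10.0, 10.0], weekly=[-2.0, -2.0])
multiplicative = combine_multiplicative(trend, yearly=[0.10, 0.10], weekly=[-0.02, -0.02])
```

In the additive case the seasonal effect stays the same size as the trend grows, whereas in the multiplicative case it scales with the trend.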
The Prophet library is highly configurable, and its parameters can be passed to the forecast signal function. You can find these parameters in the Prophet documentation.
The Exabel implementation does not override the default modelling parameters for Prophet.
These time series can be retrieved:
Time series name | Description |
---|---|
extended | The actual, observed values followed by the forecast. |
observed | The actual, observed values. |
forecast | The forecasted values. |
lower | The lower bound of the forecasted confidence interval. |
upper | The upper bound of the forecasted confidence interval. |
prediction | The in-sample one-step ahead predicted values. |
yhat | The in-sample one-step ahead predicted values followed by the forecast. |
trend | The trend component of the observed values. |
yearly | The yearly seasonal component of the observed values. |
weekly | The weekly seasonal component of the observed values. |
Examples:
Make a forecast:
Sales_Actual.forecast('prophet')
Make a forecast with a logarithmic model, including confidence interval:
Sales_Actual.log().forecast('prophet', ['extended', 'lower', 'upper'], interval_width=0.9).exp()
Include US holiday effects in the model:
Sales_Actual.forecast('prophet', country='US')
Show the seasonal components (in a contrived example):
Close_Price.forecast('prophet', ['extended', 'trend', 'yearly', 'weekly'])
Make a forecast with multiplicative seasonality effect:
Sales_Actual.forecast('prophet', seasonality_mode='multiplicative')