Models

You can create your own quantitative models on the fly with Exabel’s modelling suite, which fits advanced statistical time series models to historical data.

The models can be used descriptively to understand the relation between different time series.

Another important use case is nowcasting. For example, given data on a retail company’s web traffic, footfall (from geolocation data) and earnings history, what is the current quarter’s sales number likely to be?

To build a model, you go to the Modelling menu in the Exabel UI.

Unobserved Components

Unobserved Components is a classical time series model. It’s a form of regression model with time-varying coefficients and where the time series can have trends and seasonal and cyclical components.

The model itself is explained in detail in the following paper: The Unobservable Components Model by Prof. Tom Fomby.

Exabel’s implementation relies on StatsModels. Here is a notebook with an explanation of the model and an example.

SARIMAX

SARIMAX is a classical time series model. It’s a form of regression model with autoregressive terms, meaning that the predictions at one time step depend on the values at previous time steps, and optionally seasonal components.

‘SARIMAX’ is an acronym for Seasonal AutoRegressive Integrated Moving Average with eXogenous regressors.

Exabel’s implementation is based on StatsModels. Here is a notebook with an explanation of the model and an example.

Exabel’s implementation extends the StatsModels model by adding Elastic Net regularization. Therefore there are two additional hyperparameters, the alpha and L1 ratio, that control the amount of regularization.

Furthermore, Exabel’s implementation allows specifying that all the coefficients for the exogenous variables (the model inputs) must be positive, which is useful as additional regularization when the inputs are known to correlate positively with the target variable.

Linear Regression

Ordinary least squares linear regression model.

See a detailed specification

Elastic Net

The elastic net is a regularized regression method that linearly combines the L1 and L2 penalties of the lasso and ridge methods.

See a detailed specification
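As a rough illustration of how the two regularization hyperparameters interact, the sketch below computes the elastic net penalty for a given coefficient vector. It assumes the scikit-learn parameterization of alpha and L1 ratio; the function name elastic_net_penalty is hypothetical, and the exact scaling used inside the Exabel platform may differ.

```python
def elastic_net_penalty(coefs, alpha, l1_ratio):
    """Elastic net penalty, assuming the scikit-learn parameterization."""
    l1 = sum(abs(c) for c in coefs)   # lasso term: L1 norm of the coefficients
    l2 = sum(c * c for c in coefs)    # ridge term: squared L2 norm
    return alpha * (l1_ratio * l1 + 0.5 * (1.0 - l1_ratio) * l2)


coefs = [0.5, -2.0, 0.0]
pure_lasso = elastic_net_penalty(coefs, alpha=1.0, l1_ratio=1.0)  # only the L1 term
pure_ridge = elastic_net_penalty(coefs, alpha=1.0, l1_ratio=0.0)  # only the L2 term
mixed = elastic_net_penalty(coefs, alpha=1.0, l1_ratio=0.5)       # equal mix of both
```

With an L1 ratio of 1 the penalty reduces to the lasso, and with 0 it reduces to ridge; alpha scales the overall amount of regularization.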

Huber Regression

The Huber Regression is a linear regression model that is robust to outliers.

See a detailed specification
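The robustness to outliers comes from the loss function. The sketch below shows the standard Huber loss, which is quadratic for small residuals and linear for large ones. The threshold name delta and its default value are assumptions made for illustration; the underlying implementation may scale the residuals differently.

```python
def huber_loss(residual, delta=1.35):
    """Standard Huber loss: quadratic near zero, linear in the tails."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * residual ** 2      # behaves like squared error
    return delta * (a - 0.5 * delta)    # grows only linearly for outliers


small = huber_loss(1.0)                 # inside the quadratic region
outlier = huber_loss(10.0, delta=1.0)   # inside the linear region
squared = 0.5 * 10.0 ** 2               # squared-error loss for the same residual
```

For the outlier, the Huber loss is far smaller than the squared-error loss, so a single extreme observation has much less influence on the fitted coefficients.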

Decision Tree

Decision Trees are a non-parametric supervised learning method, which aims to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features.

See a detailed specification

Gradient Boosting

Gradient Boosting builds an additive model in a forward stage-wise fashion; it allows for the optimization of arbitrary differentiable loss functions.

See a detailed specification
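To make "forward stage-wise" concrete, here is a minimal sketch of gradient boosting with squared loss on a single feature. Each stage fits a one-split decision tree (a stump) to the current residuals, which are the negative gradient of the squared loss. The function names, the learning rate and the number of stages are illustrative assumptions and do not reflect the settings used by the platform.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that minimizes squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_stages=50, learning_rate=0.1):
    """Additive model: a constant plus a sum of shrunken stumps."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_stages):
        # For squared loss, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
model = gradient_boost(xs, ys)
```

Swapping in a different differentiable loss only changes how the residuals (negative gradients) are computed at each stage.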

Neural Network

The Neural Network method is a multi-layer perceptron regressor. This model optimizes the squared loss using L-BFGS or stochastic gradient descent.

See a detailed specification

ARD Regression

The ARD (Automatic Relevance Determination) regression is a Bayesian linear regression method where the weights of the regression model are assumed to follow Gaussian distributions.

See a detailed specification

Extreme Gradient Boosting

Extreme gradient boosting (XGBoost) is a decision-tree based ensemble algorithm that uses a gradient boosting framework.

See a detailed specification

Ratio Prediction

This is a proprietary prediction model developed by Exabel. This type of model is suitable when the input signals are proportional to the target. The prototypical example would be a credit card spend signal used to predict the revenue of a consumer company, where you expect that a 10% change in the credit card spend corresponds to a 10% change in the revenue of the company.

If the input time series is proportional to the target time series, you could use a linear regression, which would express that target = k * input. The problem is that the proportionality constant k typically varies over time, which a normal linear regression cannot capture. The Ratio Prediction model instead treats k as a time-varying ratio and builds a model for k(t), which is then used to predict the target. The model used for k(t) is an Unobserved Components model that takes into account the seasonality and level of this ratio (the ratio tends to drift over time, and the input time series may have a different seasonality pattern than the target time series).

The steps in this calculation are as follows:

  1. calculate the ratio of the target time series to the input time series (the time-varying k in target = k * input)

  2. model the resulting time series (the ratio) with an Unobserved Components model

  3. predict what the ratio will be for the next quarter to be reported

  4. multiply the predicted ratio with the input signal value for the next quarter to arrive at a prediction for the target signal for the next quarter
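The four steps above can be sketched for a single input signal as follows. This is only an illustration: the Unobserved Components model for the ratio is replaced by a much simpler seasonal-naive stand-in (reuse the ratio from the same quarter one year earlier), and the function name, the season_length parameter and the data are all hypothetical.

```python
def predict_next_target(inputs, targets, next_input, season_length=4):
    """Predict the next target value from a proportional input signal."""
    # Step 1: ratio k(t) = target / input for each historical quarter.
    ratios = [t / x for x, t in zip(inputs, targets)]
    # Steps 2-3: forecast the next ratio. Stand-in for the Unobserved
    # Components model: take the ratio from the same quarter last year.
    predicted_ratio = ratios[-season_length]
    # Step 4: scale the next quarter's input by the predicted ratio.
    return predicted_ratio * next_input


# Two years of quarterly data with a seasonal ratio of 2.0, 2.1, 1.9, 2.5.
inputs = [100.0, 120.0, 110.0, 150.0, 105.0, 126.0, 115.0, 158.0]
targets = [200.0, 252.0, 209.0, 375.0, 210.0, 264.6, 218.5, 395.0]

# The next quarter is a first quarter, whose historical ratio is 2.0.
prediction = predict_next_target(inputs, targets, next_input=110.0)
```

A plain linear regression would fit one constant k to all quarters; modelling the ratio per period lets the prediction follow the seasonal pattern in k.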

If there are multiple input signals in the model configuration, the above process is run for each input signal separately. That produces multiple predictions for the target value. Then the model calculates a weighted average of all the predictions as its final output. The weights for this ensemble are determined by calculating the covariance between the historical prediction errors, and then minimizing the expected error. The weights are restricted to being non-negative and must add up to 1.
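For intuition about the weighting, the two-input case has a closed-form solution. The sketch below assumes that "expected error" means the variance of the combined prediction error; it computes the variance-minimizing weight from the error covariances and clips it to the range from 0 to 1. The function names and error series are hypothetical, and with more than two inputs a constrained quadratic optimization would be needed instead.

```python
def covariance(a, b):
    """Sample covariance of two equally long sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def two_model_weights(errors_1, errors_2):
    """Non-negative weights summing to 1 that minimize combined error variance."""
    s11 = covariance(errors_1, errors_1)
    s22 = covariance(errors_2, errors_2)
    s12 = covariance(errors_1, errors_2)
    denom = s11 + s22 - 2.0 * s12       # variance of (errors_1 - errors_2)
    if denom <= 0.0:                    # errors differ by a constant: any split is equivalent
        return 0.5, 0.5
    w1 = (s22 - s12) / denom            # unconstrained minimizer
    w1 = min(1.0, max(0.0, w1))         # enforce the non-negativity constraint
    return w1, 1.0 - w1


# Historical prediction errors from two single-input models (made-up numbers).
errors_1 = [1.0, -1.0, 2.0, -2.0]
errors_2 = [0.5, -0.5, -1.0, 1.0]
w1, w2 = two_model_weights(errors_1, errors_2)
```

The model with the smaller and less correlated errors receives the larger weight; when one model's errors are simply a scaled-up copy of the other's, the clipping sends its weight to zero.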

Furthermore, this model allows for including a univariate forecast of the target variable (using the Theta model). This prediction is included in the ensemble along with the predictions stemming from each input, and weighted accordingly.

Retrieving model data in signals

Model predictions

model_predictions(model, run=None, label=None)

Gets updated predictions for a model.

Parameters:
  • model – The numeric model id.

  • run – The run number. If not specified, the active run will be used.

  • label – The label to fetch in the case of classification models. If not specified, all labels will be returned.

Model backtests

model_backtests(model, run=None)

Gets results from a model backtest.

Parameters:
  • model – The numeric model id.

  • run – The run number. If not specified, the active run will be used.

Rolling OLS regression with extractions of various stats

target_signal.regress_on(signal_1, signal_2, ..., signal_n, window_length_days, include_intercept, cov_type, estimate, partial_extract_idx)

Applies OLS across a fixed window of observations and then rolls (slides) the window across the data set, returning the statistic selected by the estimate parameter. For each window of length window_length_days days the model

target_signal = beta_1 signal_1 + beta_2 signal_2 + … + beta_n signal_n + alpha + epsilon

is fit.

Parameters:
  • signal_j – For j=1, 2, …, n, the signals on which target_signal is regressed

  • window_length_days – The number of days with data to include in each regression.

  • include_intercept – Whether or not to include an intercept (alpha) in the model.

  • cov_type – Type of covariance estimate used in the regression.

  • estimate

    Specifies which values to return:

    • alpha, Constant term.

    • beta, [beta_1, beta_2, …, beta_n], the regression weights/coefficients.

    • alpha+beta, [beta_1, beta_2, …, beta_n, alpha], the regression weights together with the constant term.

    • resid, The residuals, target_signal - sum_j beta_j signal_j - alpha.

    • centered_tss, Centered total sum of squares.

    • ess, Explained sum of squares.

    • mse_model, Mean squared error of the model.

    • mse_resid, Mean squared error of the residuals.

    • mse_total, Total mean squared error

    • rsquared, R-squared of the model

    • rsquared_adj, Adjusted R-squared of the model

    • ssr, Sum of squared (whitened) residuals.

    • tvalues, Return the t-statistic for the parameter estimates.

    • aic, Akaike’s information criterion.

    • bic, Bayesian information criterion.

    • bse, The standard errors of the parameter estimates.

    • beta-times-sigma, The BARRA loading beta-times-sigma:

      sqrt(beta_+ * sigma), where sigma is the standard deviation of the residuals and beta_+ = max(0, beta).

More documentation and also here.

Examples:

Get the beta against the Brent Oil futures front contract using half a year of data:

close_price.relative_change(days=7)\
  .regress_on(Oil_BrentCrudeFuturesCC1_Daily.relative_change(days=7),
              estimate='beta',
              include_intercept=True,
              window_length_days=182)
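To illustrate what the rolling window does, here is a minimal stand-alone sketch of a rolling OLS with a single regressor and an intercept, returning the beta for each window. It is a simplification: the window is measured in number of observations rather than days, and the function names ols_fit and rolling_beta are hypothetical and not part of the Exabel signal library.

```python
def ols_fit(xs, ys):
    """Fit y = alpha + beta * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta


def rolling_beta(xs, ys, window):
    """Slide a fixed-size window over the data and refit in each position."""
    betas = []
    for end in range(window, len(xs) + 1):
        _, beta = ols_fit(xs[end - window:end], ys[end - window:end])
        betas.append(beta)
    return betas


# The slope of y against x changes from 2 to 3 halfway through the sample.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [3.0, 5.0, 7.0, 13.0, 16.0, 19.0]
betas = rolling_beta(xs, ys, window=3)
```

The first window recovers a slope of 2 and the last a slope of 3, while the windows in between mix the two regimes. This is the reason for using a rolling rather than a full-sample regression when the relationship drifts over time.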