Models

You can create your own quantitative models on the fly with Exabel’s modelling suite, which fits advanced statistical time series models to historical data.

The models can be used descriptively to understand the relation between different time series.

Another important use case is nowcasting. For example, given data on a retail company’s web traffic, footfall (through geolocation data) and previous earnings history, what is the current quarter’s sales number likely to be?

To build a model, go to the Modelling menu in the Exabel UI.

Unobserved Components

Unobserved Components is a classical time series model. It is a form of regression model with time-varying coefficients, in which the time series can have trend, seasonal and cyclical components.

The model itself is explained in detail in the following paper: The Unobservable Components Model by Prof. Tom Fomby.

Exabel’s implementation relies on StatsModels. Here is a notebook with an explanation of the model and an example.

SARIMAX

SARIMAX is a classical time series model. It’s a form of regression model with autoregressive terms, meaning that the predictions at one time step depend on the values at previous time steps, and optionally seasonal components.

‘SARIMAX’ is an acronym for Seasonal AutoRegressive Integrated Moving Average with eXogenous regressors.

Exabel’s implementation relies on StatsModels. Here is a notebook with an explanation of the model and an example.
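To make the autoregressive idea concrete, here is a minimal sketch of a first-order autoregressive model with one exogenous regressor, fitted by plain least squares with NumPy. It is a simplified illustration, not SARIMAX itself and not Exabel's implementation: there are no seasonal, integrated or moving-average terms, and the function names and data are hypothetical.

```python
import numpy as np

def simulate_arx(phi, beta, exog, y0=0.0):
    """Generate y[t] = phi * y[t-1] + beta * exog[t] (noise-free, for illustration)."""
    y = np.empty(len(exog))
    prev = y0
    for t, x in enumerate(exog):
        y[t] = phi * prev + beta * x
        prev = y[t]
    return y

def fit_arx(y, exog):
    """Estimate (phi, beta) by regressing y[t] on y[t-1] and exog[t]."""
    design = np.column_stack([y[:-1], exog[1:]])
    coef, *_ = np.linalg.lstsq(design, y[1:], rcond=None)
    return coef

# Hypothetical exogenous driver, e.g. a scaled web-traffic series.
exog = np.sin(np.arange(40) / 3.0) + 1.0
y = simulate_arx(phi=0.6, beta=2.0, exog=exog)
phi_hat, beta_hat = fit_arx(y, exog)
```

Because the simulated series has no noise, the fit recovers the generating coefficients up to numerical precision. With real data the estimates would carry uncertainty.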

Linear Regression

Ordinary least squares linear regression model.

See a detailed specification

Elastic Net

The elastic net is a regularized regression method that linearly combines the L1 and L2 penalties of the lasso and ridge methods.

See a detailed specification
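As a rough illustration of how the two penalties are combined, the sketch below computes an elastic net penalty for a given weight vector. The parameter names alpha and l1_ratio and the factor 0.5 on the L2 term follow one common parameterization and are assumptions here; the linked specification is the authority on the exact form used.

```python
def elastic_net_penalty(weights, alpha=1.0, l1_ratio=0.5):
    """Return alpha * (l1_ratio * ||w||_1 + 0.5 * (1 - l1_ratio) * ||w||_2^2)."""
    l1 = sum(abs(w) for w in weights)   # lasso (L1) term
    l2 = sum(w * w for w in weights)    # ridge (L2) term, squared norm
    return alpha * (l1_ratio * l1 + 0.5 * (1.0 - l1_ratio) * l2)

weights = [1.0, -2.0, 0.0]
lasso_like = elastic_net_penalty(weights, l1_ratio=1.0)  # L1 penalty only
ridge_like = elastic_net_penalty(weights, l1_ratio=0.0)  # L2 penalty only
mixed = elastic_net_penalty(weights, l1_ratio=0.5)       # equal mix of both
```

Setting l1_ratio to 1 or 0 reduces the penalty to the lasso or ridge case respectively; values in between blend the two.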

Huber Regression

The Huber Regression is a linear regression model that is robust to outliers.

See a detailed specification
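The robustness to outliers comes from the loss function. The sketch below shows the standard Huber loss, which is quadratic for small residuals and linear for large ones. The threshold name delta and its default value are illustrative assumptions, and the linked specification may use a scaled variant of this loss.

```python
def huber_loss(residual, delta=1.35):
    """Quadratic loss for |residual| <= delta, linear loss beyond it."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)

def squared_loss(residual):
    """Ordinary least squares loss, for comparison."""
    return 0.5 * residual * residual

small = huber_loss(0.5, delta=1.0)      # inside the quadratic region
large = huber_loss(10.0, delta=1.0)     # outlier: grows only linearly
large_squared = squared_loss(10.0)      # the same outlier under squared loss
```

An outlier with residual 10 contributes far less under the Huber loss than under the squared loss, so it pulls the fitted line less.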

Decision Tree

Decision Trees are a non-parametric supervised learning method, which aims to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features.

See a detailed specification

Gradient Boosting

Gradient Boosting builds an additive model in a forward stage-wise fashion; it allows for the optimization of arbitrary differentiable loss functions.

See a detailed specification
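The forward stage-wise idea can be sketched in a few lines. The example below is a toy version for squared loss with a single feature, where each stage fits a one-split decision stump to the current residuals (the negative gradient). It is an illustrative sketch, not the implementation behind the linked specification; all names and default values are hypothetical.

```python
import numpy as np

def fit_stump(x, residual):
    """Find the single threshold on x that best fits the residual in squared error."""
    best = None
    for threshold in np.unique(x)[:-1]:  # skip the maximum so both sides are non-empty
        left = residual[x <= threshold]
        right = residual[x > threshold]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, threshold, left.mean(), right.mean())
    _, threshold, left_value, right_value = best
    return lambda z: np.where(z <= threshold, left_value, right_value)

def gradient_boost(x, y, n_stages=50, learning_rate=0.1):
    """Additive model: start from the mean, then add one shrunken stump per stage."""
    base = y.mean()
    prediction = np.full(len(y), base)
    stumps = []
    for _ in range(n_stages):
        residual = y - prediction  # negative gradient of 0.5 * (y - f)^2
        stump = fit_stump(x, residual)
        prediction = prediction + learning_rate * stump(x)
        stumps.append(stump)

    def predict(z):
        return base + learning_rate * sum(s(z) for s in stumps)

    return predict

x = np.linspace(0.0, 1.0, 50)
y = np.where(x < 0.5, 1.0, 3.0)  # toy step function
model = gradient_boost(x, y)
mse_start = float(((y - y.mean()) ** 2).mean())
mse_end = float(((y - model(x)) ** 2).mean())
```

For another differentiable loss, only the residual line would change: it would be replaced by the negative gradient of that loss with respect to the current prediction.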

Neural Network

The Neural Network method is a multi-layer perceptron regressor. This model optimizes the squared loss using L-BFGS or stochastic gradient descent.

See a detailed specification

ARD Regression

The ARD (Automatic Relevance Determination) regression is a Bayesian linear regression method in which the weights of the regression model are assumed to follow Gaussian distributions.

See a detailed specification

Extreme Gradient Boosting

Extreme gradient boosting (XGBoost) is a decision-tree based ensemble algorithm that uses a gradient boosting framework.

See a detailed specification

Retrieving model data in signals

Model predictions

model_predictions(model, run=None, label=None)

Gets updated predictions for a model.

Parameters
  • model – The numeric model id.

  • run – The run number. If not specified, the active run will be used.

  • label – The label to fetch in the case of classification models. If not specified, all labels will be returned.

Model backtests

model_backtests(model, run=None)

Gets results from a model backtest.

Parameters
  • model – The numeric model id.

  • run – The run number. If not specified, the active run will be used.
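As an illustration, the expressions below show how the two functions documented above might be written in a signal. The model id 123, the run number 4 and the label 'up' are hypothetical placeholders; substitute the values of your own model.

```python
# Predictions from the active run of model 123 (hypothetical id).
model_predictions(123)

# Predictions from a specific run, restricted to one label of a classification model.
model_predictions(123, run=4, label='up')

# Backtest results from the same run.
model_backtests(123, run=4)
```

Each expression is a separate signal; the comments are only for explanation here.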

Rolling OLS regression with extractions of various stats

target_signal.regress_on(signal_1, signal_2, ..., signal_n, window_length_days, include_intercept, cov_type, estimate, partial_extract_idx)

Applies OLS across a fixed window of observations and then rolls (moves or slides) the window across the data set, returning the values selected by the estimate parameter. For each window of length window_length_days days, the model

target_signal = beta_1 signal_1 + beta_2 signal_2 + … + beta_n signal_n + alpha + epsilon

is fit.
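The rolling procedure can be sketched with NumPy as follows. This is a minimal illustration under simplifying assumptions, not Exabel's implementation: the window is counted in observations rather than calendar days, the function name rolling_ols is hypothetical, and the output corresponds roughly to estimate='alpha+beta'.

```python
import numpy as np

def rolling_ols(target, regressors, window, include_intercept=True):
    """Fit OLS on each trailing window of `window` observations.

    Returns one row per observation holding [beta_1, ..., beta_n, alpha]
    (alpha is omitted when include_intercept is False). Rows before the
    first full window are NaN.
    """
    target = np.asarray(target, dtype=float)
    X = np.column_stack(regressors).astype(float)
    if include_intercept:
        X = np.column_stack([X, np.ones(len(target))])  # intercept column last
    out = np.full((len(target), X.shape[1]), np.nan)
    for end in range(window, len(target) + 1):
        rows = slice(end - window, end)
        coef, *_ = np.linalg.lstsq(X[rows], target[rows], rcond=None)
        out[end - 1] = coef
    return out

# Synthetic data with known coefficients: target = 2*s1 - 1*s2 + 0.5.
t = np.arange(30)
s1 = np.sin(t / 2.0)
s2 = np.cos(t / 3.0)
target = 2.0 * s1 - 1.0 * s2 + 0.5
coefficients = rolling_ols(target, [s1, s2], window=10)
```

Since the synthetic target is an exact linear combination of the regressors, every full window recovers the same coefficients. On real data the estimates would vary from window to window, which is the point of a rolling regression.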

Parameters
  • signal_j – For j=1, 2, …, n, the signals on which target_signal is regressed.

  • window_length_days – The number of days with data to include in each regression.

  • include_intercept – Whether or not to include an intercept (alpha) in the model.

  • cov_type – Type of covariance estimate used in the regression.

  • estimate

    Specifies which values to return:

    • alpha, Constant term.

    • beta, [beta_1, beta_2, …,beta_n], the regression weights/coefficients.

    • alpha+beta, [beta_1, beta_2, …,beta_n, alpha], the regression weights together with the constant term.

    • resid, The residuals, target_signal - sum_j beta_j signal_j - alpha.

    • centered_tss, Centered total sum of squares.

    • ess, Explained sum of squares.

    • mse_model, Mean squared error of the model.

    • mse_resid, Mean squared error of the residuals.

    • mse_total, Total mean squared error.

    • rsquared, R-squared of the model.

    • rsquared_adj, Adjusted R-squared of the model.

    • ssr, Sum of squared (whitened) residuals.

    • tvalues, Return the t-statistic for the parameter estimates.

    • aic, Akaike’s information criterion.

    • bic, Bayes’ information criterion.

    • bse, The standard errors of the parameter estimates.

    • beta-times-sigma, The BARRA loading beta-times-sigma:

      sqrt(beta_+*sigma), where sigma is the standard deviation of the residuals and beta_+ = max(0, beta).
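To show how several of these statistics relate to one another, the sketch below fits a single window with an intercept and computes a subset of them using the usual textbook OLS definitions. Whether Exabel computes each value in exactly this way is an assumption, and the function name ols_stats is hypothetical.

```python
import numpy as np

def ols_stats(y, X):
    """Fit y = X @ beta + alpha + epsilon and return common summary statistics."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    design = np.column_stack([X, np.ones(len(y))])  # intercept column last
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    nobs, n_params = design.shape
    ssr = float(resid @ resid)                         # sum of squared residuals
    centered_tss = float(((y - y.mean()) ** 2).sum())  # total variation around the mean
    rsquared = 1.0 - ssr / centered_tss
    return {
        "beta": coef[:-1],
        "alpha": float(coef[-1]),
        "resid": resid,
        "ssr": ssr,
        "centered_tss": centered_tss,
        "ess": centered_tss - ssr,
        "rsquared": rsquared,
        "rsquared_adj": 1.0 - (1.0 - rsquared) * (nobs - 1) / (nobs - n_params),
        "mse_resid": ssr / (nobs - n_params),
    }

# Small made-up data set with one regressor.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.2, 4.8, 7.1, 8.9])
stats = ols_stats(y, x.reshape(-1, 1))
```

With an intercept in the model, the explained and residual sums of squares add up to the centered total sum of squares, which is why rsquared can be read as the share of variation explained.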

More documentation and also here.

Examples

Get the beta against the Brent Oil futures front contract using half a year of data:

close_price.relative_change(days=7).regress_on(Oil_BrentCrudeFuturesCC1_Daily.relative_change(days=7), estimate='beta', include_intercept=True, window_length_days=182)