Models
You can create your own quantitative models on the fly with Exabel’s modelling suite, which fits advanced statistical time series models to historical data.
The models can be used descriptively to understand the relation between different time series.
Another important use case is nowcasting: for example, given data on a retail company’s web traffic, footfall (through geolocation data) and previous earnings history, what is the current quarter’s likely sales number?
To build a model, you go to the Modelling menu in the Exabel UI.
Unobserved Components
Unobserved Components is a classical time series model. It’s a form of regression model with time-varying coefficients and where the time series can have trends and seasonal and cyclical components.
The model itself is explained in detail in the following paper: The Unobservable Components Model by Prof. Tom Fomby.
Exabel’s implementation relies on StatsModels. Here is a notebook with an explanation of the model and an example.
SARIMAX
SARIMAX is a classical time series model. It’s a form of regression model with autoregressive terms, meaning that the predictions at one time step depend on the values at previous time steps, and optionally seasonal components.
‘SARIMAX’ is an acronym for Seasonal AutoRegressive Integrated Moving Average with eXogenous regressors.
Exabel’s implementation is based on StatsModels. Here is a notebook with an explanation of the model and an example.
Exabel’s implementation extends the StatsModels model by adding Elastic Net regularization. It therefore has two additional hyperparameters, alpha and the L1 ratio, which control the amount of regularization.
Furthermore, Exabel’s implementation allows specifying that all the coefficients for the exogenous variables (the model inputs) must be positive, which is useful as additional regularization when the inputs are known to correlate positively with the target variable.
Linear Regression
Ordinary least squares linear regression model.
Elastic Net
The elastic net is a regularized regression method that linearly combines the L1 and L2 penalties of the lasso and ridge methods.
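A minimal scikit-learn sketch with synthetic data: alpha controls the overall regularization strength, and l1_ratio the mix between the L1 (lasso) and L2 (ridge) penalties.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# Only the first two features actually drive the target
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.1, size=200)

# l1_ratio=0.5 mixes the lasso (L1) and ridge (L2) penalties equally
model = ElasticNet(alpha=0.1, l1_ratio=0.5)
model.fit(X, y)

# The L1 part of the penalty drives the irrelevant coefficients towards zero
coefficients = model.coef_
```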
Huber Regression
The Huber Regression is a linear regression model that is robust to outliers.
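The robustness to outliers can be illustrated with scikit-learn on synthetic data: the Huber loss downweights observations with large residuals, so a few corrupted points barely move the fitted slope, while ordinary least squares is pulled away from the true value.

```python
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 2.0 * X[:, 0] + rng.normal(0, 0.1, size=100)

# Corrupt a few observations with large outliers at extreme input values
X[:5] = 3.0
y[:5] = -20.0

huber = HuberRegressor().fit(X, y)
ols = LinearRegression().fit(X, y)

# The Huber slope stays near the true value of 2; the OLS slope does not
```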
Decision Tree
Decision Trees are a non-parametric supervised learning method, which aims to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features.
Gradient Boosting
Gradient Boosting builds an additive model in a forward stage-wise fashion; it allows for the optimization of arbitrary differentiable loss functions.
Neural Network
The Neural Network method is a multi-layer perceptron regressor. This model optimizes the squared loss using LBFGS or stochastic gradient descent.
ARD Regression
The ARD (Automatic Relevance Determination) regression is a Bayesian linear regression method where the weights of the regression model are assumed to follow Gaussian distributions.
Extreme Gradient Boosting
Extreme gradient boosting (XGBoost) is a decision-tree based ensemble algorithm that uses a gradient boosting framework.
Ratio Prediction
This is a proprietary prediction model developed by Exabel. This type of model is suitable when the input signals are proportional to the target. The prototypical example would be a credit card spend signal used to predict the revenue of a consumer company, where you expect that a 10% change in the credit card spend corresponds to a 10% change in the revenue of the company.
If the input time series is proportional to the target time series, you could use a linear regression, which would express that target = k * input. The problem is that the proportionality constant k will typically vary over time, which cannot be captured by a normal linear regression. The Ratio Prediction model instead treats k as a time-varying ratio and builds a model for k(t), which is then used to predict the target. The model used for k(t) is an Unobserved Components model that takes into account the seasonality and level of this ratio (as the ratio will tend to drift over time, and the input time series may have a different seasonality pattern than the target time series).
The steps in this calculation are as follows:
1. Calculate the ratio between the target time series and the input time series.
2. Model the resulting time series (the ratio) with an Unobserved Components model.
3. Predict what the ratio will be for the next quarter to be reported.
4. Multiply the predicted ratio with the input signal value for the next quarter to arrive at a prediction for the target signal for that quarter.
If there are multiple input signals in the model configuration, the above process is run for each input signal separately. That produces multiple predictions for the target value. Then the model calculates a weighted average of all the predictions as its final output. The weights for this ensemble are determined by calculating the covariance between the historical prediction errors, and then minimizing the expected error. The weights are restricted to being non-negative and must add up to 1.
Furthermore, this model allows for including a univariate forecast of the target variable (using the Theta model). This prediction is included in the ensemble along with the predictions stemming from each input, and weighted accordingly.
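The ensemble weighting can be illustrated as follows: given a matrix of historical prediction errors (one column per input’s model), minimize the variance of the weighted combination subject to non-negative weights that sum to 1. This is a sketch using SciPy with made-up error data, not Exabel’s actual implementation.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Hypothetical historical prediction errors from three per-input models;
# the third model is the most accurate (smallest error scale)
errors = rng.normal(size=(40, 3)) * np.array([1.0, 2.0, 0.5])

# Covariance between the historical prediction errors
cov = np.cov(errors, rowvar=False)

# Minimize the expected ensemble error variance w' C w
# subject to w >= 0 and sum(w) == 1
def objective(w):
    return w @ cov @ w

result = minimize(
    objective,
    x0=np.full(3, 1 / 3),
    bounds=[(0, 1)] * 3,
    constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1}],
)
weights = result.x
# The most accurate model receives the largest weight
```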
Retrieving model data in signals
Model predictions
- model_predictions(model, run=None, label=None)
Gets updated predictions for a model.
- Parameters:
model – The numeric model id.
run – The run number. If not specified, the active run will be used.
label – The label to fetch in the case of classification models. If not specified, all labels will be returned.
Model backtests
- model_backtests(model, run=None)
Gets results from a model backtest.
- Parameters:
model – The numeric model id.
run – The run number. If not specified, the active run will be used.
Rolling OLS regression with extraction of various statistics
- target_signal.regress_on(signal_1, signal_2, ..., signal_n, window_length_days, include_intercept, cov_type, estimate, partial_extract_idx)
Applies OLS across a fixed window of observations and then rolls (moves or slides) the window across the data set, returning data according to the estimate parameter. For each window of length window_length_days days, the model
target_signal = beta_1 signal_1 + beta_2 signal_2 + … + beta_n signal_n + alpha + epsilon
is fit.
- Parameters:
signal_j – For j=1, 2, …, n, the signals on which target_signal is regressed
window_length_days – The number of days with data to include in each regression.
include_intercept – Whether or not to include an intercept (alpha) in the model.
cov_type – Type of covariance estimate used in the regression.
estimate –
Specifies which values to return:
alpha, Constant term.
beta, [beta_1, beta_2, …,beta_n], the regression weights/coefficients.
alpha+beta, [beta_1, beta_2, …,beta_n, alpha], the regression weights together with the constant term.
resid, The residuals, target_signal - sum_j beta_j signal_j - alpha.
centered_tss, Centered total sum of squares.
ess, Explained sum of squares.
mse_model, Mean squared error of the model.
mse_resid, Mean squared error of the residuals.
mse_total, Total mean squared error.
rsquared, R-squared of the model.
rsquared_adj, Adjusted R-squared of the model.
ssr, Sum of squared (whitened) residuals.
tvalues, The t-statistics for the parameter estimates.
aic, Akaike’s information criteria.
bic, Bayes’ information criteria.
bse, The standard errors of the parameter estimates.
beta-times-sigma, The BARRA loading beta-times-sigma: sqrt(beta_+ * sigma), where sigma is the standard deviation of the residuals and beta_+ = max(0, beta).
More documentation and also here.
Examples:
Get the beta against the Brent Oil futures front contract using half a year of data:
close_price.relative_change(days=7)\
.regress_on(Oil_BrentCrudeFuturesCC1_Daily.relative_change(days=7),
estimate='beta',
include_intercept=True,
window_length_days=182)