Models
You can create your own quantitative models on the fly with Exabel’s modelling suite, which fits advanced statistical time series models to historical data.
The models can be used descriptively to understand the relationships between different time series.
Another important use case is nowcasting: for example, given data on a retail company’s web traffic, footfall (from geolocation data) and previous earnings history, what is the current quarter’s likely sales number?
To build a model, you go to the Modelling menu in the Exabel UI.
Unobserved Components
Unobserved Components is a classical time series model. It’s a form of regression model with time-varying coefficients and where the time series can have trends and seasonal and cyclical components.
The model itself is explained in detail in the following paper: The Unobservable Components Model by Prof. Tom Fomby.
Exabel’s implementation relies on StatsModels. Here is a notebook with an explanation of the model and an example.
SARIMAX
SARIMAX is a classical time series model. It’s a form of regression model with autoregressive terms, meaning that the predictions at one time step depend on the values at previous time steps, and optionally seasonal components.
‘SARIMAX’ is an acronym for Seasonal AutoRegressive Integrated Moving Average with eXogenous regressors.
Exabel’s implementation relies on StatsModels. Here is a notebook with an explanation of the model and an example.
Elastic Net
The elastic net is a regularized regression method that linearly combines the L1 and L2 penalties of the lasso and ridge methods.
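A minimal sketch of an elastic net fit, assuming scikit-learn's ElasticNet (not necessarily the library behind Exabel's implementation):

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
# Only the first two of ten features actually drive the target
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

# alpha scales the total penalty; l1_ratio mixes the L1 (lasso) and L2 (ridge) terms
model = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
```

The L1 component drives the weights of the eight irrelevant features toward zero while the two true weights survive, slightly shrunk.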
Huber Regression
The Huber Regression is a linear regression model that is robust to outliers.
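The robustness can be illustrated with scikit-learn's HuberRegressor (an assumed implementation for this sketch) by injecting a few gross outliers:

```python
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=100)
X[:5, 0] = 3.0
y[:5] = -50.0  # five gross outliers

huber = HuberRegressor().fit(X, y)
ols = LinearRegression().fit(X, y)
# huber.coef_[0] stays near the true slope of 2; the OLS slope is dragged away
```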
Decision Tree
Decision Trees are a non-parametric supervised learning method that predicts the value of a target variable by learning simple decision rules inferred from the data features.
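A small sketch with scikit-learn's DecisionTreeRegressor (an assumed implementation), on a step function that decision rules capture naturally but a linear model cannot:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sign(X[:, 0])  # step function: one split at zero recovers it exactly

tree = DecisionTreeRegressor(max_depth=3).fit(X, y)
pred = tree.predict([[2.0], [-2.0]])
```

The tree learns the single decision rule "is the feature above zero?" and predicts +1 and -1 on either side of it.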
Gradient Boosting
Gradient Boosting builds an additive model in a forward stage-wise fashion; it allows for the optimization of arbitrary differentiable loss functions.
Neural Network
The Neural Network method is a multi-layer perceptron regressor. This model optimizes the squared loss using L-BFGS or stochastic gradient descent.
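A minimal sketch with scikit-learn's MLPRegressor (an assumed implementation), fitting a smooth non-linear target:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(400, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1]

# solver="lbfgs" works well on small data sets; "sgd" and "adam" are the
# stochastic gradient descent variants
mlp = MLPRegressor(hidden_layer_sizes=(32, 32), solver="lbfgs",
                   max_iter=2000, random_state=0).fit(X, y)
```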
ARD Regression
The ARD (Automatic Relevance Determination) regression is a Bayesian linear regression method where the weights of the regression model are assumed to follow Gaussian distributions.
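A sketch with scikit-learn's ARDRegression (an assumed implementation): each weight gets its own Gaussian prior, and weights whose relevance is not supported by the data are pruned toward zero:

```python
import numpy as np
from sklearn.linear_model import ARDRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 8))
y = 4.0 * X[:, 0] + rng.normal(scale=0.5, size=150)  # only feature 0 matters

ard = ARDRegression().fit(X, y)
# ard.coef_ keeps the relevant weight and drives the other seven toward zero
```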
Extreme Gradient Boosting
Extreme gradient boosting (XGBoost) is a decision-tree based ensemble algorithm that uses a gradient boosting framework.
Rolling OLS regression with extraction of various statistics
- target_signal.regress_on(signal_1, signal_2, ..., signal_n, window_length_days, include_intercept, cov_type, estimate, partial_extract_idx)
Applies OLS across a fixed window of observations and then rolls (moves or slides) the window across the data set, returning values as specified by the estimate parameter. For each window of window_length_days days, the model
target_signal = beta_1 signal_1 + beta_2 signal_2 + … + beta_n signal_n + alpha + epsilon
is fit.
- Parameters
signal_j – For j=1, 2, …, n, the signals on which target_signal is regressed.
window_length_days – The number of days with data to include in each regression.
include_intercept – Whether or not to include an intercept (alpha) in the model.
cov_type – Type of covariance estimate used in the regression.
estimate –
Specifies which values to return:
alpha – The constant term.
beta – [beta_1, beta_2, …, beta_n], the regression weights/coefficients.
alpha+beta – [beta_1, beta_2, …, beta_n, alpha], the regression weights together with the constant term.
resid – The residuals, target_signal - sum_j beta_j signal_j - alpha.
centered_tss – Centered total sum of squares.
ess – Explained sum of squares.
mse_model – Mean squared error of the model.
mse_resid – Mean squared error of the residuals.
mse_total – Total mean squared error.
rsquared – R-squared of the model.
rsquared_adj – Adjusted R-squared of the model.
ssr – Sum of squared (whitened) residuals.
tvalues – The t-statistics for the parameter estimates.
aic – Akaike’s information criterion.
bic – Bayes’ information criterion.
bse – The standard errors of the parameter estimates.
beta-times-sigma – The BARRA loading sqrt(beta_+ * sigma), where beta_+ = max(0, beta) and sigma is the standard deviation of the residuals.
More documentation is available here and also here.
Examples
Get the beta against the Brent Oil futures front contract using half a year of data:
close_price.relative_change(days=7).regress_on(Oil_BrentCrudeFuturesCC1_Daily.relative_change(days=7), estimate='beta', include_intercept=True, window_length_days=182)