Custom Models
You can write and test your own quantitative models on the fly with Exabel’s DSL.
An important use case is nowcasting: for example, given data on a retail company’s credit card transactions, what is the current quarter’s sales number likely to be?
Ratio modelling
Ratio modelling can be used with predictor time series that correlate highly with the target time series we’re trying to predict. An example would be an alternative data set of credit or debit card transactions, which is expected to correlate directly with the total sales for a retail company.
The core idea is to look at the historical ratio between the predictor and the target time series, which is expected to be fairly smooth given the high correlation, and based on this, predict the ratio for the current quarter. Then, a prediction for the current target value is obtained by dividing the predictor value for the current quarter by the estimated ratio.
If the ‘regression’ parameter is not provided, the ratio is predicted to be the same this quarter as it was in the same quarter a year earlier. That means if the predictor has increased by 10%, it is predicted that the target has also increased by 10%. This is a basic model that is commonly used.
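The arithmetic of the basic model can be sketched in a few lines of Python. This is only an illustration of the ratio idea described above, not Exabel's implementation: the function name and the sample numbers are hypothetical, and holiday adjustments and partial-quarter handling are left out.

```python
def basic_ratio_prediction(predictor_now, predictor_year_ago, target_year_ago):
    """Predict the current target, assuming the predictor/target ratio is unchanged YoY."""
    # Ratio observed in the same quarter a year earlier.
    ratio_year_ago = predictor_year_ago / target_year_ago
    # Without regression terms, the ratio is assumed to be the same this quarter.
    predicted_ratio = ratio_year_ago
    # The target prediction is the current predictor value divided by the ratio.
    return predictor_now / predicted_ratio


# Hypothetical numbers: the card-spending index rose from 100 to 110 (+10%),
# and sales in the same quarter last year were 2000.
prediction = basic_ratio_prediction(
    predictor_now=110.0, predictor_year_ago=100.0, target_year_ago=2000.0
)
print(prediction)
```

With these inputs the predicted sales come out at roughly 2200, which is the 10% increase carried over from the predictor.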
By providing the ‘regression’ parameter, we take into account how the ratio has changed for the previous quarters. If the ratio increased for the previous quarter relative to the same quarter last year, then it is more likely to have increased in the current quarter as well. That is to say, in the basic model there is autocorrelation between the prediction errors, and the regression parameter can be specified to take advantage of that and reduce the prediction errors. The regression parameters are specified as a list with the AR(1) parameter first, then the AR(2) parameter and so on. For example, regression = [0.3, 0.1] means AR(1)=0.3 and AR(2)=0.1, which gives the model:
The model makes adjustments to the data to account for holiday effects. If you want to avoid this, set the country parameter to None.
- ratio_prediction(target, predictor, min_window=50, periods_per_year=4, regression=[], calculate_yoy=False, use_accounting_quarters=True, years=1, country='US', path=None)
Calculate a prediction for the target variable, based on the given predictor variable.
- Parameters:
target (signal) – The target signal to model, such as reported sales numbers.
predictor – A predictor signal, such as card transactions.
min_window – The minimum number of samples required to make a prediction for the current quarter. For daily predictor values, the default of 50 means we need a bit more than half the quarter to contain data in order to make a prediction. In this case, the YoY change for the partial quarter is what’s used to make the prediction.
periods_per_year – Use the default value of 4 for quarterly target values, or 12 for monthly.
regression – If not provided, the prediction is a basic YoY change model. If provided, a regression model is used, as described above.
calculate_yoy – Whether the returned values should be year over year estimates.
use_accounting_quarters – Whether accounting quarters should be used. Only takes effect if periods_per_year is 4.
years – The number of years between the values in the numerator and denominator.
country – The country whose holidays should be used for holiday adjustments, or None for no holiday adjustments. Supported countries: US, NO.
path – A path from the entities which the signal is evaluated for, to the corresponding companies. Only necessary if the signal is evaluated for non-company entities and accounting quarters should be used to determine the next date.
Example without regression coefficients, i.e. a basic YoY change model:
ratio_prediction(Sales_Actual, Sales_Index)
Example with regression coefficients:
ratio_prediction(Sales_Actual, Sales_Index, regression=[0.3, 0.1])
Example with path, assuming that companies are connected to brands through the ns.HAS_BRAND relationship:
ratio_prediction(Brand_Signal, Brand_Predictor, path=['ns.HAS_BRAND'])
Combining multiple predictions
Multiple predictions can be combined into one, by making a linear combination where each prediction is given a weight between 0 and 1, for a total weight of 1. The optimal weights are estimated through a constrained regression on the historical predictions.
The syntax for combining predictions is:
- combine_predictions(target, predictions, start_date='2000-01-01', end_date='2039-12-31', minimum_observations=3, score_threshold=25, minimum_first_signal_weight=0)
Combine the given predictions into one.
The time period used to estimate the optimal weights can be specified. By default the window from year 2000 to 2039 is used, which in practice means all available data. If an out-of-sample evaluation is desired, the end date can be set to some date in the past, and then the data after that can be used to evaluate the performance.
- Parameters:
target – The target signal to predict, such as reported sales numbers.
predictions – A list of two or more predictions.
start_date – The first date of the time period used to estimate the optimal weights.
end_date – The last date of the time period used to estimate the optimal weights.
minimum_observations – The minimum number of data points required within the estimation period.
score_threshold – A measure for how much worse any prediction is allowed to perform relative to the best prediction before it is excluded from the model. The metric used is the sum of the squared prediction errors. The default value of 25 (=5x5) essentially means that if the average prediction error of a prediction is five times higher than the best prediction in the set, then it is automatically excluded, and gets a weight of 0. All predictions with better scores than this are included in the regression, but may still end up with an assigned weight of 0, if that is the weight determined to be optimal.
minimum_first_signal_weight – If the optimal weight assigned to the first prediction in the list is below this threshold, then no prediction is produced at all. The use case for this would be if you are combining a prediction based on alternative data (e.g. card transactions) with consensus estimates, and if there is almost no weight assigned to the alternative data prediction then it essentially means that it doesn’t add any informational value beyond the consensus estimate, in which case we may not want to show the prediction at all (which would just have been the consensus estimate).
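The score_threshold rule can be illustrated with a small Python sketch. The helper names and data are hypothetical, and whether a prediction exactly at the threshold is kept or dropped is an assumption here (this sketch keeps it).

```python
def sum_squared_errors(target, prediction):
    """Sum of squared prediction errors, the score described for score_threshold."""
    return sum((t - p) ** 2 for t, p in zip(target, prediction))


def filter_by_score(target, predictions, score_threshold=25):
    """Return the indices of predictions whose score is at most
    score_threshold times the best (lowest) score.
    Assumption: the boundary case is inclusive."""
    scores = [sum_squared_errors(target, p) for p in predictions]
    best = min(scores)
    return [i for i, s in enumerate(scores) if s <= score_threshold * best]


target = [100, 110, 120]
predictions = [
    [101, 111, 121],  # off by 1 each period: score 3 (the best)
    [104, 114, 124],  # off by 4: score 48 = 16x the best, kept
    [106, 116, 126],  # off by 6: score 108 = 36x the best, excluded
]
kept = filter_by_score(target, predictions)
print(kept)
```

An error that is six times the best one gives a squared-error score 36 times higher, which exceeds the default threshold of 25, so only the first two predictions would go into the regression.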
Example combining a prediction based on card spending data with the analysts’ consensus estimates:
combine_predictions(Sales_Actual, [Sales_Prediction_Card_Spending, Sales_Estimate_fiscal])
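For intuition about the weight estimation, the two-prediction case has a simple closed form. The sketch below is a hypothetical illustration of a least-squares fit with weights constrained to lie between 0 and 1 and sum to 1. The estimator that combine_predictions actually uses is not specified beyond "constrained regression", and the function name and data here are made up.

```python
def fit_two_weights(target, first, second):
    """Least-squares weights (w, 1 - w) for two predictions, with 0 <= w <= 1.

    Minimises sum((t - (w * f + (1 - w) * s)) ** 2) over w. The objective is a
    one-dimensional convex quadratic, so clipping the unconstrained optimum to
    [0, 1] gives the constrained optimum.
    """
    numerator = sum((t - s) * (f - s) for t, f, s in zip(target, first, second))
    denominator = sum((f - s) ** 2 for f, s in zip(first, second))
    if denominator == 0:
        # The two predictions are identical, so any split gives the same result.
        return 0.5, 0.5
    w = min(1.0, max(0.0, numerator / denominator))
    return w, 1.0 - w


# Hypothetical historical data for four quarters.
target = [100.0, 110.0, 120.0, 130.0]
card_prediction = [102.0, 108.0, 123.0, 128.0]
consensus = [96.0, 114.0, 116.0, 136.0]

weights = fit_two_weights(target, card_prediction, consensus)
print(weights)
```

Here the first prediction has the smaller historical errors and receives roughly two thirds of the weight. With more than two predictions a general constrained least-squares solver would be needed instead.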
Performance metrics for predictions
Several metrics are provided for evaluating how well the predictions match the target time series. The metrics are correlation, MAE, WAPE and R^{2}, as described below.
- All the metrics can be evaluated on either of:
the time series as they are
the change in the time series versus the previous target value
the YoY change of the time series versus the target value of the same quarter in the previous year
Note that for the latter two transformations, the change for a prediction is also calculated relative to the previous target (actual) value, not relative to the previous prediction.
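The point about the reference value can be made concrete with a short Python sketch. The helper below is hypothetical and returns the change as a fraction (0.10 for 10%); how the DSL scales the result internally is not stated here.

```python
def change_vs_previous_target(target, series, growth_period):
    """Relative change of `series` versus the target value `growth_period` steps earlier.

    The denominator always comes from `target`, also when `series` is a prediction.
    """
    return [
        series[i] / target[i - growth_period] - 1.0
        for i in range(growth_period, len(series))
    ]


target = [100.0, 110.0, 121.0]
prediction = [98.0, 112.0, 118.0]

target_growth = change_vs_previous_target(target, target, growth_period=1)
prediction_growth = change_vs_previous_target(target, prediction, growth_period=1)
print(target_growth)
print(prediction_growth)
```

The last predicted growth is 118 / 110 - 1, about 7.3%, measured against the previous actual value of 110, not against the previous prediction of 112.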
- correlation(target, prediction, growth_period=None, delta=False, start_date='2000-01-01', min_points=5)
Calculate the correlation between the two time series.
The output is a time series, where the value for a given date is the correlation for all the observations from the start_date up to and including the given date.
- Parameters:
target – The target signal that was predicted, such as reported sales numbers.
prediction – The prediction time series.
growth_period – If specified, transforms both the target and the prediction time series into percentage change versus a previous target value before calculating the correlation. If growth_period=1, the change versus the previous target value is calculated. For quarterly data, if growth_period=4, the change versus the target value for the same quarter in the previous year is calculated.
delta – If set to True, subtracts the previous target value from both the target and the prediction before calculating the correlation. This option cannot be combined with growth_period.
start_date – Predictions on or after this date are used to calculate the correlation.
min_points – The minimum number of data points needed to calculate the correlation.
Example calculating the correlation between the actual YoY growth in sales versus the predicted YoY growth in sales:
correlation(Sales_Actual, Sales_Prediction_Card_Spending, growth_period=4)
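The expanding-window output described above can be sketched as follows. This is a plain-Python illustration under the assumption that the ordinary Pearson correlation is meant. The function names are hypothetical, None stands in for dates with too few observations, and a constant series (zero variance) is not handled.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def expanding_correlation(target, prediction, min_points=5):
    """For each position, the correlation over all observations up to and including it."""
    result = []
    for end in range(1, len(target) + 1):
        if end < min_points:
            result.append(None)  # not enough observations yet
        else:
            result.append(pearson(target[:end], prediction[:end]))
    return result


target = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
prediction = [2.0, 4.0, 6.0, 8.0, 10.0, 13.0]
series = expanding_correlation(target, prediction)
print(series)
```

The first four entries are None because fewer than five observations are available. The fifth is (up to rounding) 1, since the first five points lie on a straight line, and the sixth drops slightly below 1.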
- mae(target, prediction, growth_period=None, start_date='2000-01-01', min_points=1)
Calculate the Mean Absolute Error (MAE) of the predictions.
The output is a time series, where the value for a given date is the MAE for all the observations from the start_date up to and including the given date.
- Parameters:
target – The target signal that was predicted, such as reported sales numbers.
prediction – The prediction time series.
growth_period – If specified, transforms both the target and the prediction time series into percentage change versus a previous target value before calculating the MAE. If growth_period=1, the change versus the previous target value is calculated. For quarterly data, if growth_period=4, the change versus the target value for the same quarter in the previous year is calculated.
start_date – Predictions on or after this date are used to calculate the MAE.
min_points – The minimum number of data points needed to calculate the MAE.
Example calculating the MAE of the sales predictions:
mae(Sales_Actual, Sales_Prediction_Card_Spending)
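As a rough illustration of the cumulative MAE series, here is a minimal Python sketch. The function name and data are hypothetical, and None marks positions with fewer than min_points observations.

```python
def expanding_mae(target, prediction, min_points=1):
    """For each position, the mean absolute error over all observations so far."""
    result = []
    abs_error_sum = 0.0
    for n, (t, p) in enumerate(zip(target, prediction), start=1):
        abs_error_sum += abs(t - p)
        result.append(abs_error_sum / n if n >= min_points else None)
    return result


target = [100.0, 110.0, 120.0]
prediction = [98.0, 113.0, 120.0]
mae_series = expanding_mae(target, prediction)
print(mae_series)
```

The absolute errors are 2, 3 and 0, so the running MAE is 2.0, then 2.5, then 5/3.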
- wape(target, prediction, growth_period=None, start_date='2000-01-01', min_points=1)
Calculate the Weighted Absolute Percentage Error (WAPE) of the predictions. This is the average absolute error divided by the average target value.
The output is a time series, where the value for a given date is the WAPE for all the observations from the start_date up to and including the given date.
- Parameters:
target – The target signal that was predicted, such as reported sales numbers.
prediction – The prediction time series.
growth_period – If specified, transforms both the target and the prediction time series into percentage change versus a previous target value before calculating the WAPE. If growth_period=1, the change versus the previous target value is calculated. For quarterly data, if growth_period=4, the change versus the target value for the same quarter in the previous year is calculated.
start_date – Predictions on or after this date are used to calculate the WAPE.
min_points – The minimum number of data points needed to calculate the WAPE.
Example calculating the WAPE of the sales predictions:
wape(Sales_Actual, Sales_Prediction_Card_Spending)
Example calculating the WAPE of the sales estimates from 2018 onwards:
wape(Sales_Actual, Sales_Estimate_fiscal, start_date='2018-01-01')
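The WAPE definition can be sketched in Python as the sum of absolute errors divided by the sum of absolute target values, which equals the average error over the average target when the targets are positive. The function name and data below are hypothetical, and a target series summing to zero is not handled.

```python
def expanding_wape(target, prediction, min_points=1):
    """For each position, sum(|error|) / sum(|target|) over all observations so far."""
    result = []
    abs_error_sum = 0.0
    abs_target_sum = 0.0
    for n, (t, p) in enumerate(zip(target, prediction), start=1):
        abs_error_sum += abs(t - p)
        abs_target_sum += abs(t)
        result.append(abs_error_sum / abs_target_sum if n >= min_points else None)
    return result


target = [100.0, 110.0, 120.0]
prediction = [98.0, 113.0, 120.0]
wape_series = expanding_wape(target, prediction)
print(wape_series)
```

With total absolute errors of 5 against total sales of 330, the final WAPE is about 1.5%. Unlike the MAE, this figure is scale-free, which makes it easier to compare across companies of different size.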
- r2_score(target, prediction, growth_period=None, delta=False, start_date='2000-01-01', min_points=3)
Calculate the R-squared (R^{2}) metric of the predictions. This represents the proportion of the variance of the target that’s explained by the predictions.
The output is a time series, where the value for a given date is the R^{2} score for all the observations from the start_date up to and including the given date.
- Parameters:
target – The target signal that was predicted, such as reported sales numbers.
prediction – The prediction time series.
growth_period – If specified, transforms both the target and the prediction time series into percentage change versus a previous target value before calculating the R^{2}. If growth_period=1, the change versus the previous target value is calculated. For quarterly data, if growth_period=4, the change versus the target value for the same quarter in the previous year is calculated.
delta – If set to True, subtracts the previous target value from both the target and the prediction before calculating the R^{2}. This is the calculation shown in the Prediction Model UI. This option cannot be combined with growth_period.
start_date – Predictions on or after this date are used to calculate the R^{2}.
min_points – The minimum number of data points needed to calculate the R^{2}.
Example calculating the R^{2} of the sales predictions, in terms of their ability to predict the delta versus the previous quarter:
r2_score(Sales_Actual, Sales_Prediction_Card_Spending, delta=True)
Example calculating the R^{2} of the sales predictions, in terms of their ability to predict the growth rate of the sales numbers versus a year ago:
r2_score(Sales_Actual, Sales_Prediction_Card_Spending, growth_period=4)
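To see why the delta option matters, the following Python sketch compares R^{2} on levels with R^{2} on deltas for a single evaluation window. It assumes the standard coefficient of determination, 1 - SS_res / SS_tot. The function names and data are hypothetical, and the expanding-window output of r2_score is not reproduced.

```python
def r2(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def r2_on_deltas(target, prediction):
    """R^2 after subtracting the previous target value from both series."""
    actual_delta = [target[i] - target[i - 1] for i in range(1, len(target))]
    predicted_delta = [prediction[i] - target[i - 1] for i in range(1, len(target))]
    return r2(actual_delta, predicted_delta)


target = [100.0, 110.0, 125.0, 130.0, 150.0]
prediction = [99.0, 112.0, 121.0, 133.0, 146.0]

# Compare on the same four points (the first point has no previous value).
r2_levels = r2(target[1:], prediction[1:])
r2_deltas = r2_on_deltas(target, prediction)
print(r2_levels, r2_deltas)
```

The prediction errors are identical in both cases, but the deltas vary far less than the levels, so the same errors explain a smaller share of the variance: about 0.95 on levels versus 0.64 on deltas. A trending series can therefore show a high R^{2} on levels even when the predictions add little beyond the previous actual value.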