Transformations

Arithmetic operations

Signals can be added (+), subtracted (-), multiplied (*) or divided (/) using the normal arithmetic operators.
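For example, to compute a daily dollar-volume signal from two signals used elsewhere in this documentation (note that values are typically only produced on dates where both signals have data):

trading_volume * close_price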

In addition, the following arithmetic operations are supported:

signal.log()

Calculate the natural logarithm of the signal.

signal.exp()

Calculate the exponential of the signal.

For each value in the original signal, the result is calculated as the base of the natural logarithm, e ≈ 2.71, raised to the power of the original value.

signal.abs()

Calculate the absolute values of the signal.

signal.sign()

Calculate the sign of the signal values.

For each value in the original signal, the sign is +1 if the original value is positive, -1 if the original value is negative, and 0 if the original value is 0.
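For example, to classify each day’s price move as up (+1), down (-1) or flat (0):

close_price.relative_change(days=1).sign()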

signal.clip(lower: float | None = None, upper: float | None = None)

Trim the values at the given threshold(s).

If a lower threshold is given, then all values below it are set to this value.

If an upper threshold is given, then all values above it are set to this value.
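For example, to floor a signal at zero:

signal.clip(lower=0)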

Truncation

signal.truncate(before: str | None = None, after: str | None = None)

Truncate the time series before and/or after the given date(s).

If a before date is given, then all values before this date are removed.

If an after date is given, then all values after this date are removed.
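For example, to discard all data before 2020:

signal.truncate(before='2020-01-01')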

signal.drop_last(points: int = 1, /)

Drop the last data point(s) in the time series.

For signals producing multiple time series for each evaluation entity, note that it is the last points dates with a valid value in any column that are removed. That is, the same dates are removed from all the time series.
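For example, to drop the two most recent data points:

signal.drop_last(2)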

Absolute change

signal.change(method, limit, days, weeks, months, years)

Express the signal as the absolute change between the signal’s value and the value it had at a prior date. The prior date is determined by an offset specified in years, months, weeks or days.

The method argument specifies how we find the prior data point to subtract. When the method argument is:

  • 'ffill' we forward fill the prior data a maximum of limit days to match the dates.

  • 'bfill' we backfill the prior data a maximum of limit days to match the dates.

  • 'nearest' we move the prior data forward or backward a maximum of limit days to match the dates. We match with the prior data point that is moved the shortest distance.

  • None we do not move the past data at all.

When the method argument is 'auto' we select the method and limit based on the given offset (unless they are explicitly set). The method and limit are:

  • 'nearest' and 31 days if the offset is more than or equal to 3 months

  • 'nearest' and 10 days if the offset is between 1 and 3 months

  • 'ffill' and 10 days if the offset is less than 1 month

This is because forward filling makes more sense when calculating the short-windowed change in, for example, the close price, while the lenient matching of the 'nearest' method makes more sense for less frequent data like quarterly data.

Parameters:
  • method – The method to use when matching with past data points, either 'auto' (default), 'nearest', 'ffill' / 'pad', 'bfill' / 'backfill' or None.

  • limit – The limit on how far the matching method can move the data points.

  • days – The number of days between current and prior period, for example days=1.

  • weeks – The number of weeks, for example weeks=3.

  • months – The number of months, for example months=12.

  • years – The number of years, for example years=1.

To get the day-over-day change in closing share prices:

close_price.change(days=1)

Note that forward filling with a limit of 10 days is the default for such short offsets, so the value on a regular Monday will be the change against the price on the preceding Friday.

Relative (percent) change

signal.relative_change(method, limit, days, weeks, months, years)

Express the signal as the relative (percent) change from a prior date. The prior date is determined by an offset in years, months, weeks or days. Note that an output of 1 corresponds to a 100% change from the prior value.

See the above signal.change(...) documentation for a deeper explanation of the method and limit arguments.

Parameters:
  • method – The method to use when matching with past data points, either 'auto' (default), 'nearest', 'ffill' / 'pad', 'bfill' / 'backfill' or None.

  • limit – The limit on how far the matching method can move the data points.

  • days – The number of days between current and prior period, for example days=1.

  • weeks – The number of weeks, for example weeks=3.

  • months – The number of months, for example months=12.

  • years – The number of years, for example years=1.

To get the year-over-year relative change in monthly trading volume data:

monthly(trading_volume).relative_change(years=1)

Similarly, to get the month-over-month relative change in US housing prices:

US_PurchaseOnlyHousePriceIndex_SA_Monthly.relative_change(months=1)

signal.yoy_holidays(country, window, years=1, *, min_periods=1, min_denominator=1e-6)

Calculate the year-over-year ratio for a signal, adjusted for holidays.

Note that this returns a ratio, so a value of 1.1 represents a 10% increase; subtract 1 to get the YoY change.

The year-over-year values are calculated by taking a moving average whose length is given by the window argument, and dividing it by the corresponding moving average from the previous year.

However, some modifications are applied to the data in order to account for the effects of moveable holidays. That is, if the moving average window this year contains, say, Black Friday, the corresponding moving average window for last year is adjusted so that it, too, contains Black Friday.

Currently, only the holidays for the United States and Norway are supported. The days that are treated as holidays for the United States are all the federal holidays in addition to Black Friday and Cyber Monday.

Parameters:
  • country – The country whose calendar should be used to determine holidays. Currently only US and NO are supported.

  • window – The number of days in the moving window when calculating the moving average. This can be 1 or any integer divisible by 7. The default is 1.

  • years – The number of years between the window used in the numerator and the denominator.

  • min_periods – The minimum number of non-NaN values present (in each of the numerator and denominator) required. If there are fewer non-NaN values than this, NaN is produced.

  • min_denominator – Produce NaN if the sum of the values in the denominator is smaller than this.

To get the year-over-year change in a signal with a seven-day moving average window:

signal.yoy_holidays(country='US', window=7) - 1

Aggregate change

Aggregate a high-frequency signal to a given frequency and calculate absolute or relative change.

signal.agg_change(freq, method, *, upsample_daily=False, min_points=1, last_period_days=None, last_period_fraction=None, allow_partial_start=False, allow_partial_end=False, weeks=None, months=None, years=None)
signal.agg_relative_change(freq, method, *, upsample_daily=False, min_points=1, last_period_days=None, last_period_fraction=None, allow_partial_start=False, allow_partial_end=False, weeks=None, months=None, years=None)

By default, sequential change is calculated. Sequential change is change since the last equivalent period, which for a quarterly frequency corresponds to quarter-over-quarter change and for an annual frequency corresponds to year-over-year change. Change may optionally be calculated relative to a period that is a multiple of the frequency. Use the arguments weeks, months and years to control this. For example, to aggregate data to a quarterly frequency and then calculate year-over-year change, use freq='Q' and years=1.

The last period typically has partial data (only data at the beginning of the period), and there are different ways of handling this period:

  • Do not include the period in change calculation. This is the default behavior.

  • Period-to-date calculation. Specifying last_period_days or last_period_fraction enables period-to-date calculation. When doing period-to-date calculation, the change for the last period is calculated by determining the fraction of days for which there is data in the last period and comparing it to the same fraction of days in the comparable period. The min_points constraint does not apply to the last period and its comparable period when doing period-to-date calculation.

  • Normal change calculation. Setting allow_partial_end=True enables normal change calculation for the last period with partial data. The last period is then subject to the min_points constraint as any other period.

If upsample_daily is set to True, then the underlying signal is upsampled to daily frequency before downsampling to the desired frequency. If the method is sum or mean_times_days, then the upsample operation uses divide=True. If the frequency of the underlying time series is not detected, it is assumed that the time series already has daily resolution, and it is not altered before the downsampling.

Note that if the frequency of the original data doesn’t evenly divide the frequency being converted to, it is recommended to set upsample_daily=True. Daily data can be aggregated up to any other frequency, and monthly data can be aggregated up to standard calendar quarters. But weekly data generally cannot be directly converted to any other frequency (monthly/quarterly/yearly) because there will be weeks crossing the boundaries between those other periods. By setting upsample_daily=True, the data is first upsampled to daily resolution, before being downsampled to the desired frequency. Whenever weekly, monthly or quarterly data is converted to fiscal frequencies ('FQ' or 'FQ/FS' or 'FY') it is also recommended to set upsample_daily=True, as any calendar period may cross two fiscal periods.

Parameters:
  • freq – The frequency to aggregate on. Examples include standard Pandas frequencies 'W' (weekly), 'M' (monthly), 'MS' (monthly, aligned to first date of the month), 'Q' (quarterly) as well as the fiscal calendars like 'FQ', 'FQ/FS' or 'FY' (see company_calendar for explanation of all available options).

  • method – Method to use for aggregation. Standard Pandas methods like 'mean', 'median', 'sum' are supported, as well as the special 'mean_times_days' method. See the table under Time aggregations below for available methods, where all except the up-sampling methods may be used.

  • upsample_daily – Whether to upsample the data to daily resolution before downsampling.

  • min_points – Minimum number of points needed in order to calculate an aggregated value for a period. Note that if upsample_daily=True, the data points are counted after the upsampling, so the requirement applies to the number of daily data points.

  • last_period_days – Day threshold for enabling period-to-date calculation for the last period with partial data. If specified, the number of days from the period start to the last non-NaN data point must be equal to or greater than the specified value. It is only allowed to specify one of last_period_days and last_period_fraction.

  • last_period_fraction – Fraction threshold for enabling period-to-date calculation for the last period with partial data. If specified, the fraction of days, calculated as the number of days from the period start to the last non-NaN data point, divided by the total number of days in the period, must be equal to or greater than the specified value. It is only allowed to specify one of last_period_days and last_period_fraction.

  • allow_partial_start – Whether to include a period with partial data at the start of the period in the change calculation. A period has partial data at the start of the period if there are no non-NaN data points on or before the period start date.

  • allow_partial_end – Whether to include a period with partial data at the end of the period in the change calculation. A period has partial data at the end of the period if there are no non-NaN data points on or after the period end date. It is not allowed to set this to True if last_period_days or last_period_fraction is not None.

  • weeks – The number of weeks to determine the change offset.

  • months – The number of months to determine the change offset.

  • years – The number of years to determine the change offset.

To sum up a signal to fiscal quarters and calculate quarter-over-quarter absolute change, use:

signal.agg_change('FQ', 'sum')

To upsample a signal to daily resolution and sum it up to fiscal quarters or semesters and calculate year-over-year relative change, use:

signal.agg_relative_change('FQ/FS', 'sum', years=1, upsample_daily=True)

To sum up a signal to forced fiscal quarters and calculate quarter-over-quarter relative change, where we require at least 70 days with data in each quarter and data for the first 15 days in the last partial quarter, use:

signal.agg_relative_change('FQ+', 'sum', min_points=70, last_period_days=15)

To take the monthly sum and calculate month-over-month relative change, where we require data for at least half of the last month, use:

signal.agg_relative_change('M', 'sum', last_period_fraction=0.5)

To take the weekly sum and calculate week-over-week relative change, allowing a partial start and partial end, use:

signal.agg_relative_change('W', 'sum', allow_partial_start=True, allow_partial_end=True)

Change relative to other signal

signal.change_relative_to(from_signal, periods, pre_extend_months, shift_offset_days, shift_offset_months, shift_offset_tolerance_days, relative_change)

Calculate the change of one signal relative to another signal’s past values.

There are two ways of calculating the change:

  • Using the periods argument: The from_signal is re-indexed to the index of signal and the values are shifted by the given number of periods before the change is calculated.

  • Using the shift_offset_days/shift_offset_months arguments: The index of the from_signal is shifted by the given offset and then re-indexed to the index of signal using method="nearest" to match the nearest value. shift_offset_tolerance_days specifies the maximum number of days a value can be moved when determining nearest value. If there are two data points in the from_signal with the same distance to a data point in signal, the newest one will be used.

Note that for signals that may produce time series with missing values, it is preferable to use shift_offset_days/shift_offset_months. This method takes into account the absolute number of days a value can be shifted, whereas the periods method will blindly shift values.

All arguments except from_signal are optional. If no arguments other than from_signal are given, periods will be used with a default value of 1.

Parameters:
  • from_signal – The signal to calculate the relative change from.

  • periods – The number of periods to shift the from_signal. The default value is None.

  • pre_extend_months – If periods is used, this specifies the number of months to pre-extend the signal evaluation period. The default value is None.

  • shift_offset_days – The number of days to shift from_signal. The default value is None.

  • shift_offset_months – The number of months to shift from_signal. The default value is None.

  • shift_offset_tolerance_days – If shift_offset_days/shift_offset_months is used, this specifies the maximum number of days a value can be moved when determining the nearest value. The default value is 31.

  • relative_change – Whether to calculate the relative change. Set to True to calculate relative change and False to calculate actual change. The default value is True.

Calculate year-over-year relative change in sales:

sales().change_relative_to(sales(), shift_offset_months=12)
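Alternatively, using the periods argument, assuming sales() produces one value per quarter (so that four periods correspond to one year):

sales().change_relative_to(sales(), periods=4)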

Time aggregations

Signals can be aggregated up to monthly or weekly time resolution:

monthly(signal)

Aggregate the given signal to monthly resolution. All the values within each calendar month are summed up (or averaged).

Parameters:
  • signal – The signal to aggregate.

  • how (str) – How to aggregate the signal with each month. Defaults to 'sum', but alternatively 'mean' or 'median' can be provided to yield respectively the arithmetic mean or the median of all the values within each month.

To get the monthly total trading volume:

monthly(trading_volume)

To get the monthly average close price:

monthly(close_price, 'mean')

weekly(signal)

Aggregate the given signal to weekly resolution. All the values within each calendar week (starting on Monday) are summed up (or averaged).

Parameters:
  • signal – The signal to aggregate.

  • how (str) – How to aggregate the signal with each week. Defaults to 'sum'; alternative values are 'mean' and 'median'.

This function is used in the same way as monthly.
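For instance, to get the weekly total trading volume:

weekly(trading_volume)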

For other frequencies and control over how partial data is handled, use the resample function. The syntax is:

signal.resample(freq, method, *, upsample_daily=False, min_points=1, allow_partial_start=True, allow_partial_end=True, ffill=0)

Aggregate the given signal to a resolution of choice. The interval used for resampling is defined by the freq parameter, and the method defines how to aggregate the signal.

If upsample_daily is set to True, then the underlying signal is upsampled to daily frequency before downsampling to the desired frequency. If the method is sum or mean_times_days, then the upsample operation uses divide=True. If the frequency of the underlying time series is not detected, it is assumed that the time series already has daily resolution, and it is not altered before the downsampling.

Note that if the frequency of the original data doesn’t evenly divide the frequency being converted to, it is recommended to set upsample_daily=True. Daily data can be aggregated up to any other frequency, and monthly data can be aggregated up to standard calendar quarters. But weekly data generally cannot be directly converted to any other frequency (monthly/quarterly/yearly) because there will be weeks crossing the boundaries between those other periods. By setting upsample_daily=True, the data is first upsampled to daily resolution, before being downsampled to the desired frequency. Whenever weekly, monthly or quarterly data is converted to fiscal frequencies ('FQ' or 'FQ/FS' or 'FY') it is also recommended to set upsample_daily=True, as any calendar period may cross two fiscal periods.

Parameters:
  • freq – The frequency to aggregate on. Examples include standard Pandas frequencies 'W' (weekly), 'M' (monthly), 'MS' (monthly, aligned to first date of the month), 'Q' (quarterly) as well as the fiscal calendars like 'FQ' or 'FQ/FS' or 'FY' (see company_calendar for explanation of all available options).

  • method – Method to use for aggregation. Standard Pandas methods like 'mean', 'median', 'sum', 'bfill' (backfill) are supported, as well as the special 'mean_times_days' method. See table below.

  • upsample_daily – Whether to upsample the data to daily resolution before downsampling.

  • min_points – Minimum number of data points required within a time period for it to be included. Note that if upsample_daily=True, the data points are counted after the upsampling, so the requirement applies to the number of daily data points.

  • allow_partial_start – Whether to include the first period if it is only partially covered by the time series.

  • allow_partial_end – Whether to include the last period if it is only partially covered by the time series.

  • ffill – The number of periods to forward fill the result. The forward fill is performed after aggregation. This can be used to fill gaps where there were not any or not enough data points to calculate the aggregate.

Supported methods:

  • mean – The mean of the values.

  • median – The median of the values.

  • sum – The sum of the values.

  • mean_times_days – The mean of the values multiplied by the number of days within the period. This is useful for handling missing data points in daily time series, which is a common situation with alternative data time series. The result can be seen as representing the sum over the period, where the missing values have been imputed with the mean value.

  • count – The number of values.

  • std – The standard deviation of the values.

  • var – The variance of the values.

  • sem – The unbiased standard error of the mean.

  • first – The first value.

  • last – The last value.

  • min – The minimum value.

  • max – The maximum value.

  • bfill – Backfill the values. This is typically done to upsample a time series where the dates are aligned to the end date of each period, like 'W' or 'M' or 'Q'.

  • ffill – Forward fill the values. This is typically done to upsample a time series where the dates are aligned to the start date of each period, like 'MS' or 'QS'.

  • nearest – Take the value from the nearest date.

  • interpolate – Interpolate the values between the two nearest dates.

Note that the statistical measures 'std', 'var' and 'sem' normalize by N-1 by default (delta degrees of freedom is 1).

Note that the parameters min_points, allow_partial_start and allow_partial_end only apply to aggregation / downsampling. It is an error to specify any of these parameters with the upsampling methods 'bfill', 'ffill', 'nearest' or 'interpolate'.

To sum up daily data to monthly frequency, use:

signal.resample('M', 'sum')

To sum up weekly data to fiscal quarters, use:

signal.resample('FQ', 'sum', upsample_daily=True)

To sum up daily data to fiscal quarters, where missing data points are imputed with the mean, and we require at least 70 data points in each quarter, use:

signal.resample('FQ', 'mean_times_days', min_points=70)

To upsample quarterly data ('Q') to monthly, where we are certain there are no missing data points, use:

signal.resample('M', 'bfill')

signal.upsample(from_freq='auto', to_freq='D', divide=False)

Upsample the given signal to a higher resolution (daily by default). This is typically necessary as the first step when resampling between two frequencies which do not evenly divide each other.

For example, when resampling from weekly to monthly resolution, there will be weeks which are overlapping the month boundaries. In such a situation, naively doing a sum over all the values with dates within the month, would mean that the value for week 52 of 2022, which runs from Dec 26, 2022 - Jan 1, 2023, would be included in the sum for January 2023, because the week is represented by the last day (Jan 1). The better approach would be to divide the value for that week over the seven days of the week, and include 6/7 in the sum for December 2022, and only 1/7 in the sum for January 2023.

The same situation arises when resampling from either weekly or monthly time series to fiscal quarters.

The upsample logic needs to know which frequency the original data represents. The default setting of 'auto' for the from_freq argument will attempt to detect the frequency of the original data, but then the signal evaluation fails if the frequency cannot be determined from the data. If the frequency is known, it is better to specify an explicit frequency to upsample from, as that is more robust. For instance, if there’s only a single data point for the date 2022-12-31, it is impossible to determine if that data point represents the 2022’Q4 or December 2022 or the week ending 2022-12-31. The 'auto' setting only works for regular calendars such as weekly or monthly data, not for fiscal calendars.

Parameters:
  • from_freq – The frequency of the data being upsampled. Examples include standard Pandas frequencies like 'W' (weekly), 'M' (monthly), 'MS' (monthly, aligned to month start) as well as the fiscal calendars like 'FQ' or 'FQ/FS' or 'FY' (see company_calendar for explanation of all available options).

  • to_freq – The frequency to upsample to, default daily ('D').

  • divide – Whether to divide the value by the number of days in the period. If set to False, then the original value will be used for all dates within the corresponding period. If set to True, the original value will be divided by the number of days in the period (so that e.g. signal.upsample('M').resample('M', 'sum') brings back the original time series). Generally, if the data represents a sum of some sort, then divide should be set to True, whereas if it represents an average, it should be set to False.

To upsample weekly data to daily, use:

signal.upsample('W')

To resample weekly sums to a monthly sum, use:

signal.upsample('W', divide=True).resample('M', 'sum')

To upsample monthly sums to daily numbers, use:

signal.upsample('M', divide=True)

To upsample a quarterly time series to monthly, use:

signal.upsample('Q', 'M')

To resample monthly sums to fiscal quarters, where we also handle partial data with 'mean_times_days', use:

signal.upsample('M', divide=True).resample('FQ', 'mean_times_days')

To resample monthly averages to fiscal quarter averages, use:

signal.upsample('M').resample('FQ', 'mean')

To calendarize fiscal quarterly numbers to standard calendar quarters, use:

signal.upsample('FQ', divide=True).resample('Q', 'sum', min_points=90)

Moving average

signal.moving_average(window, freq=None, min_periods=1)

Calculate the moving average of a signal. This is typically done with noisy data in order to get a cleaner signal. For instance, looking at daily credit card transaction data doesn’t make much sense because of the noise level, but smoothing over say 90 days gives a more informative signal.

Parameters:
  • signal – The signal to calculate the moving average of.

  • window – The number of calendar days to calculate moving average over (if ‘freq’ is not set), or the number of data points to calculate the moving average if ‘freq’ is set.

  • freq – Leave it with the default setting of “None” to interpret the window as number of days. For e.g. monthly signals, set it to ‘M’ (or ‘MS’) to interpret the ‘window’ argument as number of months instead of number of days. Note that the ‘freq’, if set, must be the same as the frequency of the signal.

  • min_periods – The minimum number of data points to require in order to calculate a value. Defaults to 1, which means that a “moving average” is calculated from the very first data point in the time series, even though it’s just an “average” of one data point (and then the next one is an average of 2 and so forth). To avoid noisy data in the beginning of the time series, increase this setting.

To get the 90-day moving average of close price data, with a minimum of 70 data points required to average over:

close_price.moving_average(90, min_periods=70)

To get the 3 month moving average of US housing price index, with a minimum of 3 data points to average over:

US_PurchaseOnlyHousePriceIndex_SA_Monthly.moving_average(3, 'MS', 3)

signal.smooth(window: int = 7)

Smooths the signal to remove noise and make it easier to discern the underlying trends.

Parameters:
  • window – The number of calendar days to calculate the moving average over. The moving average will be shifted backwards by floor((window-1)/2) days. Defaults to 7, which is suitable for averaging out weekly seasonality in the signal.

The signal is smoothed in three processing steps:

  1. fill missing values with zeros

  2. calculate the moving average with the given window length

  3. shift the time series to center the moving average in the middle of the window it’s calculated over

To smooth a daily card spend signal with a 21-day window:

daily_spend_signal.smooth(21)

Rolling window aggregations

signal.rolling_aggregation(window, operation, freq=None, min_periods=1)

Calculate a rolling window operation in the time direction, this is a generalization of the moving_average operation.

Parameters:
  • window – The number of calendar days to calculate moving average over (if ‘freq’ is not set), or the number of data points to calculate the moving average if ‘freq’ is set. The current point is included in the window.

  • operation – The operation carried out on the window. This can be represented as a function, e.g. np.std, lambda expressions or strings like “mean”, “sum”, “max”, “min”, “std”.

  • freq – Leave it with the default setting of “None” to interpret the window as number of days. For e.g. monthly signals, set it to ‘M’ (or ‘MS’) to interpret the ‘window’ argument as number of months instead of number of days. Note that the ‘freq’, if set, must be the same as the frequency of the signal.

  • min_periods – The minimum number of data points in the window for the transform to return value for each data-point. Typically one will lose some data-points in the beginning of the interval.

Examples

The largest daily absolute percentage price movement over the last four weeks:

close_price.relative_change(days=1).rolling_aggregation(28, lambda w: np.max(np.abs(w)), freq="D")

The largest reported sales this year, assuming standard quarterly releases:

actual('sales').rolling_aggregation(4, "max", freq="Q")

Delay (lag)

signal.delay(align, days, weeks, months, years)

Delay (lag) a signal by a specified number of days, weeks, months or years.

Parameters:
  • align – Whether to align the delayed signal to the original (default False)

  • days – Number of days to delay

  • weeks – Number of weeks to delay

  • months – Number of months to delay

  • years – Number of years to delay
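For example, to lag a signal by one week:

signal.delay(weeks=1)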

Rolling z-score

signal.z_score(num_periods, min_periods=None, delay_periods=1)

Given a stationary time series (signal), calculate a rolling window z-score. The signal is assumed to be stationary and normally distributed.

Parameters:
  • num_periods – The number of time-periods of the signal to include in the estimate. I.e. for a daily signal like “close_price.relative_change(days=1)” num_periods is the number of days.

  • min_periods – The minimum number of actual data-points before estimate is produced. If min_periods is not specified, then min_periods is set equal to num_periods.

  • delay_periods – The number of periods before the estimated model is applied to the current data-point.

Example

Calculate the z-scores of the price movements over the past 90 days:

close_price.relative_change(days=1).z_score(num_periods=90)

Rolling p-value

signal.p_value(num_periods, min_periods=None, delay_periods=1, p_cap=0.0)

Given a stationary time series (signal), calculate rolling p-values. The signal is assumed to be stationary and normally distributed.

Parameters:
  • num_periods – The number of time-periods of the signal to include in the estimate. I.e. for a daily signal like “close_price.relative_change(days=1)” num_periods is the number of days.

  • min_periods – The minimum number of actual data-points before estimate is produced. If min_periods is not specified, then min_periods is set equal to num_periods.

  • delay_periods – The number of periods before the estimated model is applied to the current data-point.

  • p_cap – A lower threshold on the p-values to be returned (lower values are removed)

Example

Calculate the p-values of the price movements over the past 90 days:

close_price.relative_change(days=1).p_value(num_periods=90, min_periods=50)

A simple outlier detector:

close_price.relative_change(days=1).p_value(num_periods=90, min_periods=50, p_cap=0.9999)

Surge

signal.surge(short_period, long_period, how, decay=None)

Calculate the surge of a signal. The surge is calculated as the fraction between a moving average of a short window and a moving average of a longer window. The moving average can be exponentially weighted.

When how is 'ma', the short and long period arguments are the window sizes in number of data points.

When how is 'ewm', the short and long period arguments are the parameters sent to the Pandas ewm function, and which decay method to use is controlled with the decay parameter.

The signal is not resampled before the surge is calculated, so the period parameters specify a number of data points, not a number of days.

Parameters:
  • short_period – The parameter to use for the (exponentially weighted) moving average in the numerator.

  • long_period – The parameter to use for the (exponentially weighted) moving average in the denominator.

  • how – A string specifying what kind of moving average to use, either ‘ewm’ for an exponentially weighted moving average, or ‘ma’ for a regular moving average.

  • decay – The type of decay parameter to use in the ewm function. It can only be set when the how parameter is ewm, and it can be one of 'com', 'halflife', 'span' and 'alpha'. See the documentation of the Pandas ewm function for further details.

Example

Calculate the surge in close price using an exponentially weighted mean with half-lives of 5 and 20 data points:

close_price.surge(5, 20, 'ewm', 'halflife')

Calculate the surge in transactions using a regular moving average with windows 28 and 91:

TransactionVolume.surge(28, 91, 'ma')

Seasonal adjustment

seasonal_adjust(signal, how=None)

Makes a seasonal adjustment to the given signal. The signal must have quarterly or monthly frequency.

The adjustment can be either multiplicative (meaning that the signal is multiplied by a certain factor for each seasonal period) or additive (meaning that a constant is added to the signal for each period). By default, a multiplicative adjustment is applied if all of the signal values are positive, while an additive adjustment is applied if any of the signal values are zero or negative. If an additive adjustment is desired even for strictly positive values, then this can be specified by providing the extra argument ‘additive’. If ‘multiplicative’ is specified as the method, then the adjustment will give an error if any value is zero or negative.

Parameters:
  • signal – The signal to transform.

  • how – Force the adjustment method to either 'additive' or 'multiplicative'.

To do seasonal adjustment of sales numbers (default to multiplicative):

seasonal_adjust(Sales_Actual)

To force an additive adjustment of the sales numbers:

seasonal_adjust(Sales_Actual, 'additive')

To force multiplicative adjustment of the sales numbers (fails if any value is zero or negative):

seasonal_adjust(Sales_Actual, 'multiplicative')

Momentum

signal.momentum(days, limit=10)

Calculate the “momentum” of a signal, defined as its relative change versus a certain number of days ago. This method is closely related to the relative_change function, but specialized for daily time series, and automatically forward fills the underlying signal to get a smooth momentum signal without gaps.

Parameters:
  • signal – The signal to transform.

  • days – The number of days between current and prior period. For example 365 for YoY or 91 for approximately 3 months.

  • limit – The maximum number of days to forward fill the underlying signal.

To get the 3 month price momentum of the share price:

close_price.momentum(91)

To get the year-over-year change in close price (smoothed with moving average):

close_price.moving_average(90, min_periods=70).momentum(365)

Exponentially weighted mean

signal.ewm(halflife=None, *, span=None)

Calculate the exponentially weighted mean of the signal. This is a wrapper around the pandas ewm method.

Parameters:
  • halflife – The number of data points over which the weight should decay to its half.

  • span – Decay specified in terms of span.

NaN values are removed from the time series before the pandas ewm function is called. It is therefore recommended to ensure that the data is on a known frequency, without missing values, before performing this operation.

The exponentially weighted mean of the close price:

close_price.ewm(halflife=14)

Normalization

signal.normalize(normalization_period)

Normalize the signal to zero mean and unit variance. Each time series is normalized separately. You have to specify the normalization period, which is the time period over which the mean and the variance of the signal will be estimated. These values will then be used to normalize the signal across any time period, to ensure that the normalized signal is consistent.

Parameters:

normalization_period – The time period over which the mean and variance of the signal are estimated. The period is specified with the start and end dates.

A use case for normalizing data is to get better properties when creating a model. Some models perform better when the input and output variables are normalized:

predict(Airlines_US_AirRevenuePassengerMiles_Monthly.normalize(('2017-01-01', '2018-12-31')))

signal.sector_neutral(level, transform_type='winsorized_robust')

Cross-sectional normalization of the signal, applied separately for each sector and each date. The sectors refer to the FactSet RBICS classification, where a level from 1 to 6 must be specified with the level argument.

Note that the signal is normalized across the set of companies that it is evaluated for. This means that if the signal is evaluated in Signal Explorer, the result will depend on which companies are selected to be plotted. If only a single company is selected, then the result will be a flat line with the value 0, because that’s what a single value is normalized to. For sensible results, select at least three companies within the same sector when plotting in Signal Explorer.

The intended usage is for alpha signals. When evaluated in an alpha test or a portfolio strategy, the signal is evaluated across all the companies included in such alpha test / strategy, which means the alpha signal will be sector neutral for that run.

There are different methods available for performing the normalization:

  • ‘standard’: sklearn StandardScaler/Z-score

  • ‘robust’: sklearn RobustScaler

  • ‘winsorized_standard’: ‘standard’ followed by a soft capping

  • ‘winsorized_robust’: ‘robust’ followed by a soft capping

  • ‘uniform’: sklearn QuantileTransformer

  • ‘minmax’: sklearn MinMaxScaler (-1, 1)

Parameters:
  • level – The level (1-6) of the FactSet RBICS classification to use.

  • transform_type – Defaults to ‘winsorized_robust’.

A use case for normalizing data by sector is to avoid sector biases in alpha tests and portfolio strategies. By making an alpha signal neutral by sector, the overall portfolio will be better balanced across sectors:

transactions_yoy.sector_neutral(level=2)

signal.country_neutral(transform_type='winsorized_robust')

Cross-sectional normalization of the signal, applied separately for each country and each date. Each company is assigned to the country of the exchange where it has its primary listing.

Parameters:

transform_type – Defaults to ‘winsorized_robust’. See sector_neutral above for available options.

A use case for normalizing data by country is to avoid country biases in alpha tests and portfolio strategies. By making an alpha signal neutral by country, the overall portfolio will be better balanced across countries.
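For example, mirroring the sector_neutral example above:

transactions_yoy.country_neutral()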

signal.group_normalize(group_signal, transform_type='winsorized_robust')

Cross-sectional normalization of the signal, applied separately for each group of companies and each date. A separate signal, group_signal, is used to determine the groups.

The most typical use case would be to group companies by sector (using the sector_revenue() signal as the group_signal). However, for this use case there is the shorthand sector_neutral() method above.

Parameters:
  • group_signal – The signal that determines the groups by which the signal will be normalized.

  • transform_type – Defaults to ‘winsorized_robust’. See sector_neutral above for available options.

A use case for normalizing data by groups is to avoid biases in alpha tests and portfolio strategies. Typical use cases would be to normalize by sectors or by countries.
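For example, to normalize an alpha signal within sector groups (equivalent to the sector_neutral shorthand described above, assuming sector_revenue() provides the sector grouping):

transactions_yoy.group_normalize(sector_revenue())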

signal.factor_neutral(tag, *factors, screen_frequency)

Neutralizes the effect of one or more factors on a signal by estimating a linear regression with the main signal as the target variable and the factors as the regressors. The output is the residual of the regression.

The set of companies to estimate the regression over must be specified with the tag argument. The tag can be either a fixed set of companies or a screen (where the set of companies changes over time). The signal can be evaluated for companies that were not part of the estimation. Typically, you would use the same tag as the one that is used in the alpha test or portfolio strategy, so that the alpha signal is neutralized for the same set of companies.

The regression is run separately per day. For each date and each factor, the factor values are taken from the latest date where that factor is available (for any entities). This means that e.g. a monthly factor signal can be used with a daily alpha signal as the main signal. However, no forward filling is performed, so the user is responsible for forward filling the factors if necessary.

A typical use case is to subtract the effect of style factors from an alpha signal.

Parameters:
  • tag – The resource name of the tag or screen that defines the group of companies.

  • factors – One or several factors to neutralize.

  • screen_frequency – The frequency with which to evaluate the screen, if a screen is used to define the group of companies. Defaults to 'M' for monthly evaluation. Alternatives include 'W' for weekly or 'Q' for quarterly update of the screen.

Example

Remove the growth style factor from an alternative data YoY growth signal:

TransactionDataYoY.factor_neutral('tags/user:2a46627e-4e03-49f2-808e-d6fdadebbc61', factor_loading_growth)

Remove the size and momentum style factors from an alternative data YoY surge signal, using a screen that is updated quarterly:

TransactionDataYoY.factor_neutral('screens/1265', factor_loading_short_term_momentum, factor_loading_size, screen_frequency='Q')

Cross-sectional correlation

cross_sectional_correlation(signal_a, signal_b, tag, screen_frequency)

Calculate the cross-sectional correlation between two signals. The result is a single time series, where the value for a given date is the correlation between the signal values across a set of entities.

Note that no forward filling is applied to the signals, and the correlation will only be calculated on days where both signals have values. The user can apply forward filling to the input signals as desired before applying this function.

Parameters:
  • signal_a – one of the signals

  • signal_b – the other signal

  • tag – the resource name of the screen or tag that defines the group of entities.

  • screen_frequency – the frequency with which to evaluate the screen, if a screen is used to define the group of entities. Defaults to 'M' for monthly evaluation. Alternatives include 'W' for weekly or 'Q' for quarterly update of the screen.

Example

Calculate correlation between an alternative data signal and the growth style factor:

cross_sectional_correlation(TransactionDataYoY, factor_loading_growth.filled_daily(limit=31), 'tags/user:2a46627e-4e03-49f2-808e-d6fdadebbc61')

Elementwise transforms

signal.apply(function)

Apply a function to each element of the signal.

Parameters:

function – The name of the function to apply. A non-exhaustive list of standard function transforms: log (logarithm), exp (exponential), sqrt (square root), abs (absolute value), tanh (hyperbolic tangent).

A use case for transforming data is to get better properties when creating a model. In many cases, a model performs better if the data has been transformed with the logarithm before estimating the model. Here is an example on how to transform the signal:

US_CivilianUnemploymentRate_Monthly.apply('log')

It is also possible to do this in a modelling context, where something like this can work:

predict(US_CivilianUnemploymentRate_Monthly.apply('log')).apply('exp')

We apply “exp” in the end to transform the predictions back to the original scale.

Time-axis operations

signal.combine_first(other, extend_only=False)

Update null elements with value from the same location in other. This is a wrapper around the pandas combine_first() method.

Parameters:
  • other – Signal to use for filling null values.

  • extend_only – If True, null values are only replaced after the last non-null value.

Example

Combine FactSet actual sales with FactSet sales estimate:

actual('sales', alignment='afp').combine_first(sales_estimate(before_release=True, alignment='afp'))

signal.reindex_like(index_signal, fill_method='ffill')

Returns the signal re-indexed so that it has the same time-index as index_signal, using the given fill_method. This is a wrapper around the pandas reindex method.

Parameters:
  • index_signal – A supplied signal whose time-index is used for the resampling of signal.

  • fill_method – The operation used to align series to index_signal before sampling. Valid values are ‘None’, ‘pad’/’ffill’, ‘backfill’/’bfill’, ‘nearest’.

Example

The price return on earnings release dates:

close_price.relative_change(days=1).reindex_like(actual('sales',alignment='rd'))

signal.filled_daily(fetch_prior_data=120, fetch_prior_data_from=None, stop_at_last_valid_value=False, fetch_later_data=7, limit=None, allow_forward_fill_for_current_dates=False, *, fill_value=None)

Transforms a signal by changing the frequency to daily and forward filling missing values.

Parameters:
  • fetch_prior_data – the number of days of prior data to retrieve in order to forward fill; the default amount is sufficient for quarterly data (with quarters up to seventeen weeks), as long as there are no missing data points

  • fetch_prior_data_from – the start date to use for forward filling. If set, it overrides the ‘fetch_prior_data’ argument.

  • stop_at_last_valid_value – if True values will not be forward filled after the last available non-null value

  • fetch_later_data – the number of days after the eval period to retrieve data for to determine the last available non-null value. Only used if ‘stop_at_last_valid_value’ is True.

  • limit – the maximum number of consecutive null values that are filled. If not set, all null values are filled (assuming there is a non-null value before it). If set, must be set to 1 or higher.

  • allow_forward_fill_for_current_dates – if True, and the difference in days between the current date and the date with the last non-NaN value is less than the given limit, values will be forward filled even if stop_at_last_valid_value is True. When evaluating with a version, the current date is assumed to be the version.

  • fill_value – By default, the previous value is forward filled. If fill_value is specified, then this value will be used instead. The dates filled are exactly the same, and all the other parameters such as ‘limit’ and ‘allow_forward_fill_for_current_dates’ apply as usual.

Example

Forward fill data in a monthly signal up to 35 days to make it a daily signal:

my_monthly_data.filled_daily(limit=35)

Fill in missing values with 0, up to 6 days forwards, but not past the last value in the series:

my_daily_signal.filled_daily(limit=6, fill_value=0, stop_at_last_valid_value=True)

signal.align_to_dates(index_signal, max_forward=None, max_backward=None, pre_extend=None, post_extend=None)

Returns the signal with the values aligned to the dates of the index_signal.

For each value in the signal, we find the date in the index_signal which is closest in time and assign that date to it, provided that it satisfies the movement constraints (given by max_forward and max_backward).

If there are two dates that are equally far away, the value is moved forwards.

If there is no date available within the movement constraints for some value, the value is discarded.

If there are multiple values in the signal that have the same date as their closest one, the value that is closest in time is aligned to that date; the other values are discarded. If two values are equally close, the one that would be moved forwards is used, while the other one is discarded.

Parameters:
  • index_signal – A signal whose time-index is used for the resampling of signal.

  • max_forward – Maximum number of days a data point can be moved forwards. Default is None, which means no limit.

  • max_backward – Maximum number of days a data point can be moved backwards. Default is None, which means no limit.

  • pre_extend – Offset to pre-extend the signal evaluation period with. By default, this offset is set equal to the max_forward constraint if max_forward is not None and 1 year otherwise.

  • post_extend – Offset to post-extend the signal evaluation period with. By default, this offset is set equal to the max_backward constraint if max_backward is not None and 1 year otherwise.

Example

Align the signal ‘my_quarterly_signal’ to fundamental sales, allowing data points to move forwards 10 days and backwards 5 days:

my_quarterly_signal.align_to_dates(fundamental('sales'), max_forward=10, max_backward=5)

Align the signal ‘my_quarterly_signal’ to fundamental sales, with no restriction on how far data points can move, pre-extending the evaluation period by six months:

my_quarterly_signal.align_to_dates(fundamental('sales'), pre_extend=pandas.DateOffset(months=6))

signal.aggregate_over(aggregate_signal, aggregation_method='sum', *, max_window=None, min_data_points=None, include_first_period=False)

Returns the signal with the time index of the aggregate signal, where the values of the signal are aggregated according to the given method.

The values of the aggregate_signal are not used.

Parameters:
  • aggregate_signal – a supplied signal whose time index is used in the returned signal

  • aggregation_method – the method used by the aggregation (e.g. 'mean', 'sum', 'median', 'prod', 'std', 'var')

  • max_window – Maximum number of days to look back when aggregating data. This can be useful if the aggregation signal may have missing data.

  • min_data_points – Minimum number of data points per aggregation date. If there are not enough data points, the aggregated value is set to NaN. This can be useful if the signal has missing data.

  • include_first_period – Whether the first aggregation period should be included. The reason it is useful to exclude the first aggregation period is that it does not have a known starting point.

Example

Aggregate a signal data_signal (an arbitrary signal you have access to) by taking the sum over fiscal quarters. The extend argument ensures that the signal also produces values for the present quarter:

data_signal.aggregate_over(fiscal_calendar(extend=1), max_window=92)

Handling missing values in weighted sums

weighted_sum(signal_1, signal_2, ..., signal_n, weights=[w_1, w_2, ..., w_n], nan_when_missing, normalize)

This function provides a method for handling weighted sums with possible missing values. When the signals have numerical values for a given date, the value of weighted_sum is

w_1*signal_1 + w_2*signal_2 + … + w_n*signal_n.

When some of the signals have missing values, the weighted sum is taken over only the signals with numerical values.

Parameters:
  • signal_j – For j=1, 2, …, n, the signals which are combined.

  • weights – When a list of n numerical weights is supplied, these are the weights used in the sum. When no weights are supplied, it is assumed that w_j=1.

  • nan_when_missing – If one of the signals has a missing value (NaN), the value of the sum is set to missing (NaN) if nan_when_missing=True; otherwise the missing values are skipped in the sum.

  • normalize – If normalize=True, the weights w_j are normalized so that the active weights sum to 1.

Example

Using last reported EPS as a proxy for missing EPS estimate:

weighted_sum(estimate('eps'), actual('eps',alignment='rd'), weights=[100,1], normalize=True)

Estimate changes

estimate_change(estimate, next_estimate, crossover_month)

Calculates the change in analysts’ estimates, taking account of the fact that the estimates refer to changing fiscal periods.

The estimate argument should be a signal for estimates in a given period such as this year, next year or in two years. The next_estimate argument should be a signal for estimates in the following year. The crossover_month should be a number between 1 and 12 denoting the month in which the estimate rolls over to the next fiscal year. The next_estimate and crossover_month arguments are both optional.

In all months except the crossover month the result is obtained by calculating the change in the estimate signal. In the crossover month the result is obtained by calculating the change relative to the next_estimate signal instead. If the next_estimate signal is not provided, the value is set to 0 in the crossover month.

If the crossover month is not provided, the last month in the fiscal year is used as the crossover month.

Parameters:
  • estimate – a signal producing estimates for a given period

  • next_estimate – a signal producing estimates for the following period

  • crossover_month – the month in which the estimate rolls over to the next fiscal year
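For example, with hypothetical signals eps_estimate_fy1 and eps_estimate_fy2 producing EPS estimates for the current and the next fiscal year, and a fiscal year ending in December:

estimate_change(eps_estimate_fy1, eps_estimate_fy2, crossover_month=12)  # hypothetical estimate signals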

Company group normalization

signal.group_transform(transform_type, centering_type, centering_weight_signal, tag, screen_frequency)

The group_transform operation does a cross-sectional normalization of a signal across a set of companies.

Parameters:
  • transform_type – the transform to use, either 'robust', 'winsorized_robust', 'standard', 'winsorized_standard', 'uniform', 'minmax' or 'identity'. Defaults to 'identity'.

  • centering_type – the centering to use, either 'weighted_mean', 'mean', 'median' or 'none'. Defaults to 'none', which results in a centering given by the transform_type.

  • centering_weight_signal – the signal used as weights when specifying 'weighted_mean' for the centering_type. Must be specified when centering_type='weighted_mean', otherwise ignored.

  • tag – the resource name of the tag or screen that defines the group of companies.

  • screen_frequency – the frequency with which to evaluate the screen, if a screen is used to define the group of companies. Defaults to 'M' for monthly evaluation. Alternatives include 'W' for weekly or 'Q' for quarterly update of the screen.

Transform types:

  • robust – Applies sklearn’s RobustScaler. This transform subtracts the median, and then scales the data according to the quantile range from the 25th to the 75th percentile.

  • winsorized_robust – First applies sklearn’s RobustScaler, which subtracts the median and then scales the data according to the quantile range from the 25th to the 75th percentile. Then soft-clipping is performed at ±3 standard deviations, by applying the tanh function. The number of standard deviations can be customized by specifying stdev_lim.

  • standard – Applies sklearn’s StandardScaler, which subtracts the mean and then scales to unit variance.

  • winsorized_standard – Optionally, outliers can be removed at the very beginning, by setting the parameter q_remove to the fraction of the data that should be removed at both ends; e.g. q_remove=0.01 will remove the first and the last percentiles. By default, this step is not applied. Then applies sklearn’s StandardScaler, which subtracts the mean and then scales to unit variance. Finally, soft-clipping is performed at ±3 standard deviations, by applying the tanh function. The number of standard deviations can be customized by specifying stdev_lim.

  • uniform – Transforms the data to percentiles using sklearn’s QuantileTransformer. By default a uniform distribution is produced (evenly spread between 0 and 1). Alternatively, a normal distribution can be obtained by specifying output_distribution='normal'.

  • minmax – Scales the data linearly to the range [-1, 1].

  • identity – No transform, which means that only centering is applied. Rarely used in practice.

Example

When doing a group transform, you must specify which group of companies should be used. This is done in the following way, using the ID of a company tag:

Market_Cap_mUSD.group_transform('winsorized_robust', tag='tags/user:2a46627e-4e03-49f2-808e-d6fdadebbc61')

It can also be done using the ID of a company screen. In this case, the set of companies included in the group will be updated periodically based on the criteria of the screen. By default the screen is updated monthly, but this can be changed with the screen_frequency parameter, for instance to a quarterly update:

Market_Cap_mUSD.group_transform('winsorized_robust', tag='screens/1265', screen_frequency='Q')

Apply the uniform transform:

Market_Cap_mUSD.group_transform('uniform', tag='tags/user:2a46627e-4e03-49f2-808e-d6fdadebbc61')