Transformations
Arithmetic operations
Signals can be added (+), subtracted (-), multiplied (*) or divided (/) using the normal arithmetic operators.
In addition, the following arithmetic operations are supported:
- signal.log()
Calculate the natural logarithm of the signal.
- signal.exp()
Calculate the exponential of the signal.
For each value in the original signal, the result is the base of the natural logarithm, e ≈ 2.718, raised to the power of the original value.
- signal.abs()
Calculate the absolute values of the signal.
- signal.sign()
Calculate the sign of the signal values.
For each value in the original signal, the sign is +1 if the original value is positive, -1 if the original value is negative, and 0 if the original value is 0.
- signal.clip(lower: float | None = None, upper: float | None = None)
Trim the values at the given threshold(s).
If a lower threshold is given, then all values below it are set to this value. If an upper threshold is given, then all values above it are set to this value.
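As a rough illustration of the element-wise operations above, the following sketch uses a pandas Series as a stand-in for a signal. The Series and the variable names are assumptions made for the example; they are not part of the signal API.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for a signal: a small pandas Series.
values = pd.Series([-2.0, 0.0, 0.5, 3.0])

signs = np.sign(values)                       # -1 for negative, 0 for zero, +1 for positive
magnitudes = values.abs()                     # absolute values
clipped = values.clip(lower=-1.0, upper=1.0)  # values outside [-1, 1] are set to the bound
round_trip = np.log(np.exp(values))           # exp followed by log recovers the input
```

Leaving out lower or upper in the clip call leaves that side unbounded, which mirrors the None defaults in the signature above.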
Truncation
- signal.truncate(before: str | None = None, after: str | None = None)
Truncate the time series before and/or after the given date(s).
If a before date is given, then all values before this date are removed. If an after date is given, then all values after this date are removed.
Absolute change
- signal.change(method, limit, days, weeks, months, years)
Express the signal as the absolute change between the signal’s current value and the value it had at a prior date. The prior date is determined by an offset specified in years, months, weeks or days.
The method argument specifies how we find the prior data point to subtract. When the method argument is:
- 'ffill': we forward fill the prior data a maximum of limit days to match the dates.
- 'bfill': we backfill the prior data a maximum of limit days to match the dates.
- 'nearest': we move the prior data forward or backward a maximum of limit days to match the dates. We match with the prior data point which is moved the shortest.
- None: we do not move the past data at all.
When the method argument is 'auto', we select the method and limit based on the given offset (unless they are explicitly set). The method and limit are:
- 'nearest' and 31 days if the offset is more than or equal to 3 months
- 'nearest' and 10 days if the offset is between 1 and 3 months
- 'ffill' and 10 days if the offset is less than 1 month
This is because forward filling makes more sense when calculating the short windowed change in, for example, the close price, while the lenient matching of the 'nearest' method makes more sense for less frequent data like quarterly data.
- Parameters
method – The method to use when matching with past data points, either 'auto' (default), 'nearest', 'ffill'/'pad', 'bfill'/'backfill' or None.
limit – The limit on how far the matching method can move the data points.
days – The number of days between current and prior period, for example days=1.
weeks – The number of weeks, for example weeks=3.
months – The number of months, for example months=12.
years – The number of years, for example years=1.
To get the day-over-day change in closing share prices:
close_price.change(days=1)
Note that forward filling with a limit of 10 days is the default for such short offsets, so the value on a regular Monday will be the change against the price on the Friday before.
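The Monday-versus-Friday behavior can be reproduced with plain pandas. The helper below is a hypothetical sketch of the 'ffill' matching described above, not the library's implementation; it assumes the signal is a pandas Series with a sorted, unique DatetimeIndex.

```python
import pandas as pd

def change_ffill(series: pd.Series, days: int, limit: int) -> pd.Series:
    """Hypothetical helper: absolute change versus `days` ago, forward
    filling the prior data by at most `limit` days to match the dates."""
    prior_dates = series.index - pd.Timedelta(days=days)
    # For each prior date, take the last observation at or before it,
    # as long as that observation is no more than `limit` days old.
    prior = series.reindex(prior_dates, method="ffill",
                           tolerance=pd.Timedelta(days=limit))
    prior.index = series.index
    return series - prior

# Monday 2024-01-01 to Friday 2024-01-05, then Monday 2024-01-08.
dates = pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03",
                        "2024-01-04", "2024-01-05", "2024-01-08"])
prices = pd.Series([100.0, 101.0, 103.0, 106.0, 110.0, 115.0], index=dates)

daily_change = change_ffill(prices, days=1, limit=10)
```

The last value compares Monday 2024-01-08 against Friday 2024-01-05, because the Sunday target date is filled forward from the Friday observation. The first value is missing since no earlier data exists.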
Relative (percent) change
- signal.relative_change(method, limit, days, weeks, months, years)
Express the signal as the relative (percent) change from a prior date. The prior date is determined by an offset in years, months, weeks or days. Note that an output of 1 corresponds to a 100% change from the prior value.
See the above signal.change(...) documentation for a deeper explanation of the method and limit arguments.
- Parameters
method – The method to use when matching with past data points, either 'auto' (default), 'nearest', 'ffill'/'pad', 'bfill'/'backfill' or None.
limit – The limit on how far the matching method can move the data points.
days – The number of days between current and prior period, for example days=1.
weeks – The number of weeks, for example weeks=3.
months – The number of months, for example months=12.
years – The number of years, for example years=1.
To get the year-over-year relative change in monthly trading volume data:
monthly(trading_volume).relative_change(years=1)
Similarly, to get the month-over-month relative change in US housing prices:
US_PurchaseOnlyHousePriceIndex_SA_Monthly.relative_change(months=1)
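To make the scale of the output concrete, here is a hedged pandas sketch of a relative change with 'nearest' matching. The helper name and the sample data are invented for illustration, and the real matching logic may differ in details.

```python
import pandas as pd

def relative_change_nearest(series: pd.Series, months: int, limit: int) -> pd.Series:
    """Hypothetical helper: relative change versus `months` ago, matching
    the prior date to the nearest observation within `limit` days."""
    prior_dates = series.index - pd.DateOffset(months=months)
    prior = series.reindex(prior_dates, method="nearest",
                           tolerance=pd.Timedelta(days=limit))
    prior.index = series.index
    return (series - prior) / prior

dates = pd.to_datetime(["2022-01-31", "2022-07-31", "2023-01-31", "2023-07-31"])
volume = pd.Series([200.0, 250.0, 400.0, 250.0], index=dates)

yoy = relative_change_nearest(volume, months=12, limit=31)
```

The value for 2023-01-31 is 1.0, meaning a 100% increase from 200 to 400, while the value for 2023-07-31 is 0.0. The two 2022 dates have no observation within 31 days of their prior date, so their result is missing.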
- signal.yoy_holidays(country, window, years=1, *, min_periods=1, min_denominator=1e-6)
Calculate the year-over-year ratio for a signal, adjusted for holidays.
Note that this returns a ratio, so a value of 1.1 represents a 10% increase. You should subtract 1 to get the yoy change.
The year-over-year values are calculated by taking a moving average whose length is given by the argument window, and dividing it by the corresponding moving average from last year.
However, some modifications are applied to the data in order to account for the effects of moveable holidays. That is, if the moving average window this year contains, say, Black Friday, the corresponding moving average window for last year is adjusted so that it, too, contains Black Friday.
Currently, only the holidays for the United States and Norway are supported. The days that are treated as holidays for the United States are all the federal holidays in addition to Black Friday and Cyber Monday.
- Parameters
country – The country whose calendar should be used to determine holidays. Currently only US and NO are supported.
window – The number of days in the moving window when calculating the moving average. This can be 1 or any integer divisible by 7. The default is 1.
years – The number of years between the window used in the numerator and the denominator.
min_periods – The minimum number of non-NaN values present (in each of the numerator and denominator) required. If there are fewer non-NaN values than this, NaN is produced.
min_denominator – Produce NaN if the sum of the values in the denominator is smaller than this.
To get the year-over-year change in a signal with a seven-day moving average window:
signal.yoy_holidays(country='US', window=7) - 1
Aggregate change
Aggregate a high-frequency signal to a given frequency and calculate absolute or relative change.
- signal.agg_change(freq, method, *, upsample_daily=False, min_points=1, last_period_days=None, last_period_fraction=None, allow_partial_start=False, allow_partial_end=False, weeks=None, months=None, years=None)
- signal.agg_relative_change(freq, method, *, upsample_daily=False, min_points=1, last_period_days=None, last_period_fraction=None, allow_partial_start=False, allow_partial_end=False, weeks=None, months=None, years=None)
By default, sequential change is calculated. Sequential change is change since the last equivalent period, which for a quarterly frequency corresponds to quarter-over-quarter change and for an annual frequency corresponds to year-over-year change. Change may optionally be calculated relative to a period that is a multiple of the frequency. Use the arguments weeks, months and years to control this. For example, to aggregate data to a quarterly frequency and then calculate year-over-year change, use freq='Q' and years=1.
The last period typically has partial data (only data in the beginning of the period) and there are different ways of handling this period:
- Do not include the period in change calculation. This is the default behavior.
- Period-to-date calculation. Specifying last_period_days or last_period_fraction enables period-to-date calculation. When doing period-to-date calculation, the change for the last period is calculated by determining the fraction of days for which there is data in the last period and comparing it to the same fraction of days in the comparable period. The min_points constraint does not apply to the last period and its comparable period when doing period-to-date calculation.
- Normal change calculation. Setting allow_partial_end=True enables normal change calculation for the last period with partial data. The last period is then subject to the min_points constraint as any other period.
If upsample_daily is set to True, then the underlying signal is upsampled to daily frequency before downsampling to the desired frequency. If the method is sum or mean_times_days, then the upsample operation uses divide=True. If the frequency of the underlying time series is not detected, it is assumed that the time series already has daily resolution, and it is not altered before the downsampling.
Note that if the frequency of the original data doesn’t evenly divide the frequency being converted to, it is recommended to set upsample_daily=True. Daily data can be aggregated up to any other frequency, and monthly data can be aggregated up to standard calendar quarters. But weekly data generally cannot be directly converted to any other frequency (monthly/quarterly/yearly) because there will be weeks crossing the boundaries between those other periods. By setting upsample_daily=True, the data is first upsampled to daily resolution, before being downsampled to the desired frequency. Whenever weekly, monthly or quarterly data is converted to fiscal frequencies ('FQ' or 'FQ/FS' or 'FY') it is also recommended to set upsample_daily=True, as any calendar period may cross two fiscal periods.
- Parameters
freq – The frequency to aggregate on. Examples include standard Pandas frequencies 'W' (weekly), 'M' (monthly), 'MS' (monthly, aligned to first date of the month), 'Q' (quarterly) as well as the fiscal calendars like 'FQ', 'FQ/FS' or 'FY' (see company_calendar for explanation of all available options).
method – Method to use for aggregation. Standard Pandas methods like 'mean', 'median', 'sum' are supported, as well as the special 'mean_times_days' method. See the table under Time aggregations below for available methods, where all except the up-sampling methods may be used.
upsample_daily – Whether to upsample the data to daily resolution before downsampling.
min_points – Minimum number of points needed in order to calculate an aggregated value for a period. Note that if upsample_daily=True, the data points are counted after the upsampling, so the requirement applies to the number of daily data points.
last_period_days – Day threshold for enabling period-to-date calculation for the last period with partial data. If specified, the number of days from the period start to the last non-NaN data point must be equal to or greater than the specified value. It is only allowed to specify one of last_period_days and last_period_fraction.
last_period_fraction – Fraction threshold for enabling period-to-date calculation for the last period with partial data. If specified, the fraction of days, calculated as the number of days from the period start to the last non-NaN data point, divided by the total number of days in the period, must be equal to or greater than the specified value. It is only allowed to specify one of last_period_days and last_period_fraction.
allow_partial_start – Whether to include a period with partial data at the start of the period in the change calculation. A period has partial data at the start of the period if there are no non-NaN data points on or before the period start date.
allow_partial_end – Whether to include a period with partial data at the end of the period in the change calculation. A period has partial data at the end of the period if there are no non-NaN data points on or after the period end date. It is not allowed to set this to True if last_period_days or last_period_fraction is not None.
weeks – The number of weeks to determine the change offset.
months – The number of months to determine the change offset.
years – The number of years to determine the change offset.
To sum up a signal to fiscal quarters and calculate quarter-over-quarter absolute change, use:
signal.agg_change('FQ', 'sum')
To upsample a signal to daily resolution and sum it up to fiscal quarters or semesters and calculate year-over-year relative change, use:
signal.agg_relative_change('FQ/FS', 'sum', years=1, upsample_daily=True)
To sum up a signal to forced fiscal quarters and calculate quarter-over-quarter relative change, where we require at least 70 days with data in each quarter and data for the first 15 days in the last partial quarter, use:
signal.agg_relative_change('FQ+', 'sum', min_points=70, last_period_days=15)
To take the monthly sum and calculate month-over-month relative change, where we require data for at least half of the last month, use:
signal.agg_relative_change('M', 'sum', last_period_fraction=0.5)
To take the weekly sum and calculate week-over-week relative change, allowing a partial start and partial end, use:
signal.agg_relative_change('W', 'sum', allow_partial_start=True, allow_partial_end=True)
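The default handling of the last partial period can be sketched in pandas as follows. This is an illustrative approximation for a monthly frequency only; the helper name is made up, and period-to-date calculation and min_points are deliberately left out.

```python
import pandas as pd

def monthly_relative_change(series: pd.Series) -> pd.Series:
    """Hypothetical helper: sum to calendar months, drop a partial last
    month, then compute the month-over-month relative change."""
    totals = series.groupby(series.index.to_period("M")).sum()
    last_period_end = totals.index[-1].end_time.normalize()
    if series.index.max() < last_period_end:
        # The data stops before the month ends, so exclude that month.
        totals = totals.iloc[:-1]
    return totals / totals.shift(1) - 1

days = pd.date_range("2023-01-01", "2023-04-10", freq="D")
daily = pd.Series(1.0, index=days)  # one unit per day

mom = monthly_relative_change(daily)
```

April 2023 only has ten days of data, so it is excluded and the result ends with March, whose change versus February is 31/28 - 1.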
Change relative to other signal
- signal.change_relative_to(from_signal, periods, pre_extend_months, shift_offset_days, shift_offset_months, shift_offset_tolerance_days, relative_change)
Calculate the change of one signal relative to another signal’s past values.
There are two ways of calculating the change:
- Using the periods argument: The from_signal is re-indexed to the index of signal and the values are shifted by the given number of periods before the change is calculated.
- Using the shift_offset_days/shift_offset_months arguments: The index of the from_signal is shifted by the given offset and then re-indexed to the index of signal using method="nearest" to match the nearest value. shift_offset_tolerance_days specifies the maximum number of days a value can be moved when determining the nearest value. If there are two data points in the from_signal with the same distance to a data point in signal, the newest one will be used.
Note that for signals that may produce time series with missing values, it is preferable to use shift_offset_days/shift_offset_months. This method takes into account the absolute number of days a value can be shifted, whereas the periods method will blindly shift values.
All arguments except for signal are optional. If no other arguments than signal are given, periods will be used with a default value of 1.
- Parameters
from_signal – The signal to calculate the relative change from.
periods – The number of periods to shift the from_signal. The default value is None.
pre_extend_months – If periods is used, this specifies the number of months to pre-extend the signal evaluation period. The default value is None.
shift_offset_days – The number of days to shift from_signal. The default value is None.
shift_offset_months – The number of months to shift from_signal. The default value is None.
shift_offset_tolerance_days – If shift_offset_days/shift_offset_months is used, this specifies the maximum number of days a value can be moved when determining the nearest value. The default value is 31.
relative_change – Whether to calculate the relative change. Set to True to calculate relative change and False to calculate actual change. The default value is True.
Calculate year-over-year relative change in sales:
sales().change_relative_to(sales(), shift_offset_months=12)
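The difference between the two approaches shows up as soon as a data point is missing. The sketch below imitates both with pandas; the helper names and the quarterly sample (with 2022 Q3 missing) are assumptions made for illustration, not the library's code.

```python
import pandas as pd

def change_by_periods(signal, from_signal, periods):
    """Hypothetical helper: shift by a number of rows, ignoring dates."""
    prior = from_signal.reindex(signal.index).shift(periods)
    return (signal - prior) / prior

def change_by_offset(signal, from_signal, months, tolerance_days=31):
    """Hypothetical helper: shift the dates, then match the nearest value."""
    shifted = from_signal.copy()
    shifted.index = shifted.index + pd.DateOffset(months=months)
    prior = shifted.reindex(signal.index, method="nearest",
                            tolerance=pd.Timedelta(days=tolerance_days))
    return (signal - prior) / prior

# Quarterly sales where the 2022-09-30 data point is missing.
dates = pd.to_datetime(["2022-03-31", "2022-06-30", "2022-12-31",
                        "2023-03-31", "2023-06-30", "2023-09-30", "2023-12-31"])
sales = pd.Series([100.0, 110.0, 130.0, 120.0, 121.0, 150.0, 143.0], index=dates)

naive = change_by_periods(sales, sales, periods=4)
robust = change_by_offset(sales, sales, months=12)
```

For 2023-06-30 the row-based shift compares against 2022-03-31 and reports 21%, whereas the offset-based version compares against 2022-06-30 and reports 10%. For 2023-09-30 the offset-based version returns a missing value, because no prior data point lies within the 31-day tolerance.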
Time aggregations
Signals can be aggregated up to monthly or weekly time resolution:
- monthly(signal)
Aggregate the given signal to monthly resolution. All the values within each calendar month are summed up (or averaged).
- Parameters
signal – The signal to aggregate.
how (str) – How to aggregate the signal within each month. Defaults to 'sum', but alternatively 'mean' or 'median' can be provided to yield respectively the arithmetic mean or the median of all the values within each month.
To get the monthly total trading volume:
monthly(trading_volume)
To get the monthly average close price:
monthly(close_price, 'mean')
- weekly(signal)
Aggregate the given signal to weekly resolution. All the values within each calendar week (starting on Monday) are summed up (or averaged).
- Parameters
signal – The signal to aggregate.
how (str) – How to aggregate the signal within each week. Defaults to 'sum'; alternative values are 'mean' and 'median'.
This function is used in the same way as monthly.
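In pandas terms, the calendar-month and calendar-week grouping can be approximated as below. The aggregate helper is hypothetical; it relies on the fact that pandas weekly periods ('W') run from Monday to Sunday by default.

```python
import pandas as pd

def aggregate(series: pd.Series, freq: str, how: str = "sum") -> pd.Series:
    """Hypothetical helper: group by calendar period and aggregate."""
    return series.groupby(series.index.to_period(freq)).agg(how)

# 2023-01-01 is a Sunday, so it falls in the week that started on 2022-12-26.
days = pd.date_range("2023-01-01", "2023-01-15", freq="D")
volume = pd.Series(1.0, index=days)

weekly_totals = aggregate(volume, "W")        # sums per Monday-to-Sunday week
monthly_mean = aggregate(volume, "M", "mean")  # mean per calendar month
```

The first weekly total only contains a single day, which is the kind of partial period that resample lets you control explicitly.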
For other frequencies and control over how partial data is handled, use the resample function. The syntax is:
- signal.resample(freq, method, *, upsample_daily=False, min_points=1, allow_partial_start=True, allow_partial_end=True, ffill=0)
Aggregate the given signal to a resolution of choice. The interval used for resampling is defined by the freq parameter, and the method defines how to aggregate the signal.
If upsample_daily is set to True, then the underlying signal is upsampled to daily frequency before downsampling to the desired frequency. If the method is sum or mean_times_days, then the upsample operation uses divide=True. If the frequency of the underlying time series is not detected, it is assumed that the time series already has daily resolution, and it is not altered before the downsampling.
Note that if the frequency of the original data doesn’t evenly divide the frequency being converted to, it is recommended to set upsample_daily=True. Daily data can be aggregated up to any other frequency, and monthly data can be aggregated up to standard calendar quarters. But weekly data generally cannot be directly converted to any other frequency (monthly/quarterly/yearly) because there will be weeks crossing the boundaries between those other periods. By setting upsample_daily=True, the data is first upsampled to daily resolution, before being downsampled to the desired frequency. Whenever weekly, monthly or quarterly data is converted to fiscal frequencies ('FQ' or 'FQ/FS' or 'FY') it is also recommended to set upsample_daily=True, as any calendar period may cross two fiscal periods.
- Parameters
freq – The frequency to aggregate on. Examples include standard Pandas frequencies 'W' (weekly), 'M' (monthly), 'MS' (monthly, aligned to first date of the month), 'Q' (quarterly) as well as the fiscal calendars like 'FQ' or 'FQ/FS' or 'FY' (see company_calendar for explanation of all available options).
method – Method to use for aggregation. Standard Pandas methods like 'mean', 'median', 'sum', 'bfill' (backfill) are supported, as well as the special 'mean_times_days' method. See table below.
upsample_daily – Whether to upsample the data to daily resolution before downsampling.
min_points – Minimum number of data points required within a time period for it to be included. Note that if upsample_daily=True, the data points are counted after the upsampling, so the requirement applies to the number of daily data points.
allow_partial_start – Whether to include the first period if it is only partially covered by the time series.
allow_partial_end – Whether to include the last period if it is only partially covered by the time series.
ffill – The number of periods to forward fill the result. The forward fill is performed after aggregation. This can be used to fill gaps where there were not any or not enough data points to calculate the aggregate.
Supported methods:
method | Description |
---|---|
mean | The mean of the values. |
median | The median of the values. |
sum | The sum of the values. |
mean_times_days | The mean of the values multiplied by the number of days within the period. |
count | The number of values. |
std | The standard deviation of the values. |
var | The variance of the values. |
sem | The unbiased standard error of the mean. |
first | The first value. |
last | The last value. |
min | The minimum value. |
max | The maximum value. |
bfill | Backfill the values. This is typically done to upsample a time series. |
ffill | Forward fill the values. This is typically done to upsample a time series. |
nearest | Take the value from the nearest date. |
interpolate | Interpolate the values between the two nearest dates. |
Note that the statistical measures 'std', 'var' and 'sem' normalize by N-1 by default (delta degrees of freedom is 1).
Note that the parameters min_points, allow_partial_start and allow_partial_end only apply to aggregation / downsampling. It is an error to specify any of these parameters with the upsampling methods 'bfill', 'ffill', 'nearest' or 'interpolate'.
To sum up daily data to monthly frequency, use:
signal.resample('M', 'sum')
To sum up weekly data to fiscal quarters, use:
signal.resample('FQ', 'sum', upsample_daily=True)
To sum up daily data to fiscal quarters, where missing data points are imputed with the mean, and we require at least 70 data points in each quarter, use:
signal.resample('FQ', 'mean_times_days', min_points=70)
To upsample quarterly data ('Q') to monthly, where we are certain there are no missing data points, use:
signal.resample('M', 'bfill')
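The 'mean_times_days' method effectively imputes missing days with the period mean. A minimal pandas sketch for calendar months, with a simplified stand-in for min_points, could look like this; the helper is hypothetical and ignores fiscal calendars and the partial-period flags.

```python
import pandas as pd

def mean_times_days_monthly(series: pd.Series, min_points: int = 1) -> pd.Series:
    """Hypothetical helper: mean of the observed values in each month,
    multiplied by the number of days in that month."""
    observed = series.dropna()
    grouped = observed.groupby(observed.index.to_period("M"))
    means = grouped.mean()
    counts = grouped.count()
    days_in_period = means.index.days_in_month.to_numpy()
    # Periods with too few observations are set to NaN.
    return (means * days_in_period).where(counts >= min_points)

# Only the first 14 days of February 2023 have data, 10 units per day.
days = pd.date_range("2023-02-01", "2023-02-14", freq="D")
half_month = pd.Series(10.0, index=days)

estimate = mean_times_days_monthly(half_month)
strict = mean_times_days_monthly(half_month, min_points=20)
```

A plain sum would give 140 for February, while the mean-times-days estimate is 280 (10 per day over 28 days). Requiring 20 data points turns the estimate into a missing value.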
- signal.upsample(from_freq='auto', to_freq='D', divide=False)
Upsample the given signal to a higher resolution (daily by default). This is typically necessary as the first step when resampling between two frequencies which do not evenly divide each other.
For example, when resampling from weekly to monthly resolution, there will be weeks which are overlapping the month boundaries. In such a situation, naively doing a sum over all the values with dates within the month, would mean that the value for week 52 of 2022, which runs from Dec 26, 2022 - Jan 1, 2023, would be included in the sum for January 2023, because the week is represented by the last day (Jan 1). The better approach would be to divide the value for that week over the seven days of the week, and include 6/7 in the sum for December 2022, and only 1/7 in the sum for January 2023.
The same situation arises when resampling from either weekly or monthly time series to fiscal quarters.
The upsample logic needs to know which frequency the original data represents. The default setting of 'auto' for the from_freq argument will attempt to detect the frequency of the original data, but the signal evaluation fails if the frequency cannot be determined from the data. If the frequency is known, it is better to specify an explicit frequency to upsample from, as that is more robust. For instance, if there’s only a single data point for the date 2022-12-31, it is impossible to determine whether that data point represents 2022 Q4, December 2022 or the week ending 2022-12-31. The 'auto' setting only works for regular calendars such as weekly or monthly data, not for fiscal calendars.
- Parameters
from_freq – The frequency of the data being upsampled. Examples include standard Pandas frequencies like 'W' (weekly), 'M' (monthly), 'MS' (monthly, aligned to month start) as well as the fiscal calendars like 'FQ' or 'FQ/FS' or 'FY' (see company_calendar for explanation of all available options).
to_freq – The frequency to upsample to, default daily ('D').
divide – Whether to divide the value by the number of days in the period. If set to False, then the original value will be used for all dates within the corresponding period. If set to True, the original value will be divided by the number of days in the period (so that e.g. signal.upsample('M', divide=True).resample('M', 'sum') brings back the original time series). Generally, if the data represents a sum of some sort, then divide should be set to True, whereas if it represents an average, it should be set to False.
To upsample weekly data to daily, use:
signal.upsample('W')
To resample weekly sums to a monthly sum, use:
signal.upsample('W', divide=True).resample('M', 'sum')
To upsample monthly sums to daily numbers, use:
signal.upsample('M', divide=True)
To upsample a quarterly time series to monthly, use:
signal.upsample('Q', 'M')
To resample monthly sums to fiscal quarters, where we also handle partial data with 'mean_times_days', use:
signal.upsample('M', divide=True).resample('FQ', 'mean_times_days')
To resample monthly averages to fiscal quarter averages, use:
signal.upsample('M').resample('FQ', 'mean')
To calendarize fiscal quarterly numbers to standard calendar quarters, use:
signal.upsample('FQ', divide=True).resample('Q', 'sum', min_points=90)
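The week 52 example from the description above can be worked through in pandas. The helper below is a hypothetical sketch of upsampling weekly data labeled by the last day of the week; it is not how the library implements upsample.

```python
import pandas as pd

def upsample_weekly(series: pd.Series, divide: bool = False) -> pd.Series:
    """Hypothetical helper: spread each weekly value over the seven days
    ending on the date that labels the week."""
    pieces = []
    for week_end, value in series.items():
        days = pd.date_range(end=week_end, periods=7, freq="D")
        daily_value = value / 7 if divide else value
        pieces.append(pd.Series(daily_value, index=days))
    return pd.concat(pieces)

# Week 52 of 2022 (Dec 26, 2022 to Jan 1, 2023), labeled by its last day.
weekly = pd.Series([70.0], index=pd.to_datetime(["2023-01-01"]))

daily = upsample_weekly(weekly, divide=True)
monthly = daily.groupby(daily.index.to_period("M")).sum()
```

Of the weekly total of 70, six sevenths (60) land in December 2022 and one seventh (10) lands in January 2023, instead of the full amount being attributed to January.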
Moving average
- signal.moving_average(window, freq=None, min_periods=1)
Calculate the moving average of a signal. This is typically done with noisy data in order to get a cleaner signal. For instance, looking at daily credit card transaction data doesn’t make much sense because of the noise level, but smoothing over say 90 days gives a more informative signal.
- Parameters
signal – The signal to calculate.
window – The number of calendar days to calculate moving average over (if ‘freq’ is not set), or the number of data points to calculate the moving average if ‘freq’ is set.
freq – Leave it with the default setting of “None” to interpret the window as number of days. For e.g. monthly signals, set it to ‘M’ (or ‘MS’) to interpret the ‘window’ argument as number of months instead of number of days. Note that the ‘freq’, if set, must be the same as the frequency of the signal.
min_periods – The minimum number of data points to require in order to calculate a value. Defaults to 1, which means that a “moving average” is calculated from the very first data point in the time series, even though it’s just an “average” of one data point (and then the next one is an average of 2 and so forth). To avoid noisy data in the beginning of the time series, increase this setting.
To get the 90-day moving average of close price data, with a minimum of 70 data points required to average over:
close_price.moving_average(90, min_periods=70)
To get the 3 month moving average of US housing price index, with a minimum of 3 data points to average over:
US_PurchaseOnlyHousePriceIndex_SA_Monthly.moving_average(3, 'MS', 3)
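The distinction between a window in calendar days and a window in data points, and the effect of min_periods, can be seen with pandas rolling windows. This is only an analogy with made-up data; the library may compute its averages differently.

```python
import pandas as pd

days = pd.date_range("2024-01-01", periods=5, freq="D")
signal = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=days)

# Window measured in calendar days; an average is produced from the first point.
lenient = signal.rolling("3D", min_periods=1).mean()

# Same window, but at least three data points are required.
strict = signal.rolling("3D", min_periods=3).mean()

# Window measured in data points rather than days.
by_points = signal.rolling(3, min_periods=3).mean()
```

With min_periods=1 the first two values are averages over one and two points (1.0 and 1.5). With min_periods=3 those two values are missing, which avoids noisy values at the start of the series.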
Rolling window aggregations
- signal.rolling_aggregation(window, operation, freq=None, min_periods=1)
Calculate a rolling window operation in the time direction. This is a generalization of the moving_average operation.
- Parameters
window – The number of calendar days to calculate moving average over (if ‘freq’ is not set), or the number of data points to calculate the moving average if ‘freq’ is set. The current point is included in the window.
operation – The operation carried out on the window. This can be represented as a function, e.g. np.std, lambda expressions or strings like “mean”, “sum”, “max”, “min”, “std”.
freq – Leave it with the default setting of “None” to interpret the window as number of days. For e.g. monthly signals, set it to ‘M’ (or ‘MS’) to interpret the ‘window’ argument as number of months instead of number of days. Note that the ‘freq’, if set, must be the same as the frequency of the signal.
min_periods – The minimum number of data points in the window for the transform to return value for each data-point. Typically one will lose some data-points in the beginning of the interval.
Examples
The largest daily absolute percentage-wise price movement in the last four weeks:
close_price.relative_change(days=1).rolling_aggregation(28, lambda w: np.max(np.abs(w)), freq="D")
The largest reported sales this year, assuming standard quarterly releases:
actual('sales').rolling_aggregation(4, "max", freq="Q")
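A custom window function like the lambda in the first example behaves much like a pandas rolling apply. The sketch below uses a short two-day window and invented returns so the result is easy to check by hand.

```python
import numpy as np
import pandas as pd

days = pd.date_range("2024-01-01", periods=4, freq="D")
returns = pd.Series([0.01, -0.03, 0.02, -0.01], index=days)

# Largest absolute move within a two-day window that includes the current day.
largest_move = returns.rolling("2D", min_periods=1).apply(
    lambda w: np.max(np.abs(w)), raw=True
)
```

The result is 0.01, 0.03, 0.03 and 0.02: the move of -0.03 dominates both windows that contain it.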
Delay (lag)
- signal.delay(align, days, weeks, months, years)
Delay (lag) a signal by a specified number of days, weeks, months or years.
- Parameters
align – Whether to align the delayed signal to the original (default False)
days – Number of days to delay
weeks – Number of weeks to delay
months – Number of months to delay
years – Number of years to delay
Rolling z-score
- signal.z_score(num_periods, min_periods=None, delay_periods=1)
Given a stationary time series (signal), calculate a rolling window z-score. The signal is assumed to be stationary and normally distributed.
- Parameters
num_periods – The number of time-periods of the signal to include in the estimate. I.e. for a daily signal like “close_price.relative_change(days=1)” num_periods is the number of days.
min_periods – The minimum number of actual data-points before estimate is produced. If min_periods is not specified, then min_periods is set equal to num_periods.
delay_periods – The number of periods before the estimated model is applied to the current data-point.
Example
Calculate the z-scores of the price movements over the past 90 days:
close_price.relative_change(days=1).z_score(num_periods=90)
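One plausible reading of the parameters is that the mean and standard deviation are estimated on a window of num_periods points that ends delay_periods before the current point. The following pandas sketch implements that reading; it is an assumption about the semantics, and the sample standard deviation (N-1) is also an assumption.

```python
import numpy as np
import pandas as pd

def rolling_z_score(series, num_periods, min_periods=None, delay_periods=1):
    """Hypothetical helper: z-score of each point against the mean and
    standard deviation of a window that ends `delay_periods` earlier."""
    if min_periods is None:
        min_periods = num_periods
    window = series.rolling(num_periods, min_periods=min_periods)
    mean = window.mean().shift(delay_periods)
    std = window.std().shift(delay_periods)  # sample standard deviation (N-1)
    return (series - mean) / std

moves = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 10.0])
z = rolling_z_score(moves, num_periods=5)
```

The last point (10) is scored against the five preceding values, which have mean 3 and sample variance 2.5, giving a z-score of about 4.43. Earlier points have no complete window and are missing.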
Rolling p-value
- signal.p_value(num_periods, min_periods=None, delay_periods=1, p_cap=0.0)
Given a stationary time series (signal), calculate rolling p-values. The signal is assumed to be stationary and normally distributed.
- Parameters
num_periods – The number of time-periods of the signal to include in the estimate. I.e. for a daily signal like “close_price.relative_change(days=1)” num_periods is the number of days.
min_periods – The minimum number of actual data-points before estimate is produced. If min_periods is not specified, then min_periods is set equal to num_periods.
delay_periods – The number of periods before the estimated model is applied to the current data-point.
p_cap – A lower threshold on the p-values to be returned (lower values are removed)
Example
Calculate the p-values of the price movements over the past 90 days:
close_price.relative_change(days=1).p_value(num_periods=90, min_periods=50)
A simple outlier detector:
close_price.relative_change(days=1).p_value(num_periods=90, min_periods=50, p_cap=0.9999)
Surge
- signal.surge(short_period, long_period, how, decay=None)
Calculate the surge of a signal. The surge is calculated as the ratio of a moving average over a short window to a moving average over a longer window. The moving average can be exponentially weighted.
When how is 'ma', the short and long period arguments are the window sizes in number of data points.
When how is 'ewm', the short and long period arguments are the parameters sent to the Pandas ewm function, and which decay method to use is controlled with the decay parameter.
The signal is not resampled before the surge is calculated, so the parameters specifying the window periods specify a number of data points, and not a number of days.
- Parameters
short_period – The parameter to use for the (exponentially weighted) moving average in the numerator.
long_period – The parameter to use for the (exponentially weighted) moving average in the denominator.
how – A string specifying what kind of moving average to use, either ‘ewm’ for an exponentially weighted moving average, or ‘ma’ for a regular moving average.
decay – The type of decay parameter to use in the ewm function. It can only be set when the how parameter is 'ewm', and it can be one of 'com', 'halflife', 'span' and 'alpha'. See the documentation of the Pandas ewm function for further details.
Example
Calculate the surge in close price using an exponentially weighted mean with half-lives of 5 and 20 data points:
close_price.surge(5, 20, 'ewm', 'halflife')
Calculate the surge in transactions using a regular moving average with windows 28 and 91:
TransactionVolume.surge(28, 91, 'ma')
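The ratio described above can be sketched with pandas rolling and ewm. The function below is a hypothetical re-implementation for illustration; in particular, requiring an explicit decay for 'ewm' is an assumption, since the default behavior is not stated here.

```python
import pandas as pd

def surge(series, short_period, long_period, how, decay=None):
    """Hypothetical helper: short moving average divided by long moving average."""
    if how == "ma":
        short = series.rolling(short_period).mean()
        long = series.rolling(long_period).mean()
    elif how == "ewm":
        if decay not in ("com", "halflife", "span", "alpha"):
            raise ValueError("decay must be 'com', 'halflife', 'span' or 'alpha'")
        # The period is passed to pandas ewm under the chosen decay keyword.
        short = series.ewm(**{decay: short_period}).mean()
        long = series.ewm(**{decay: long_period}).mean()
    else:
        raise ValueError("how must be 'ma' or 'ewm'")
    return short / long

transactions = pd.Series([1.0, 1.0, 1.0, 1.0, 4.0])
ma_surge = surge(transactions, 2, 4, "ma")

flat = pd.Series([2.0] * 10)
ewm_surge = surge(flat, 5, 20, "ewm", "halflife")
```

For the jump at the end of transactions, the two-point average is 2.5 and the four-point average is 1.75, so the surge is about 1.43. A flat series has a surge of 1 everywhere.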
Seasonal adjustment
- seasonal_adjust(signal, how=None)
Makes a seasonal adjustment to the given signal. The signal must have quarterly or monthly frequency.
The adjustment can be either multiplicative (meaning that the signal is multiplied by a certain factor for each seasonal period) or additive (meaning that a constant is added to the signal for each period). By default, a multiplicative adjustment is applied if all of the signal values are positive, while an additive adjustment is applied if any of the signal values are zero or negative. If an additive adjustment is desired even for strictly positive values, then this can be specified by providing the extra argument ‘additive’. If ‘multiplicative’ is specified as the method, then the adjustment will give an error if any value is zero or negative.
- Parameters
signal – The signal to transform.
how – Force the adjustment method to either 'additive' or 'multiplicative'.
To do seasonal adjustment of sales numbers (default to multiplicative):
seasonal_adjust(Sales_Actual)
To force an additive adjustment of the sales numbers:
seasonal_adjust(Sales_Actual, 'additive')
To force multiplicative adjustment of the sales numbers (fails if any value is zero or negative):
seasonal_adjust(Sales_Actual, 'multiplicative')
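The rule for choosing between the two methods can be sketched as follows. This covers only the method selection described above, not the seasonal adjustment itself; the function name choose_adjustment is hypothetical.

```python
def choose_adjustment(values, how=None):
    """Pick the adjustment method following the documented default rule."""
    if how not in (None, "additive", "multiplicative"):
        raise ValueError(f"unknown method: {how!r}")
    all_positive = all(v > 0 for v in values)
    if how is None:
        # Default: multiplicative only when every value is strictly positive.
        return "multiplicative" if all_positive else "additive"
    if how == "multiplicative" and not all_positive:
        raise ValueError("multiplicative adjustment needs strictly positive values")
    return how
```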
Momentum
- signal.momentum(days, limit=10)
Calculate the “momentum” of a signal, defined as its relative change versus a certain number of days ago. This method is closely related to the relative_change function, but is specialized for daily time series and automatically forward fills the underlying signal to get a smooth momentum signal without gaps.
- Parameters
signal – The signal to transform.
days – The number of days between current and prior period. For example 365 for YoY or 91 for approximately 3 months.
limit – The maximum number of days to forward fill the underlying signal.
To get the 3 month price momentum of the share price:
close_price.momentum(91)
To get the year-over-year change in close price (smoothed with moving average):
close_price.moving_average(90, min_periods=70).momentum(365)
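A minimal sketch of the idea, assuming that the relative change is computed as value / prior value - 1 and that the signal is forward filled on a daily calendar before the comparison. The function below is a hypothetical stand-in for illustration and works on a plain dict of date to value.

```python
from datetime import date, timedelta


def momentum(points, days, limit=10):
    """Relative change versus `days` days ago on a forward-filled daily series."""
    filled = {}
    last_value, age = None, 0
    day, end = min(points), max(points)
    while day <= end:
        if day in points:
            last_value, age = points[day], 0
        else:
            age += 1
        # Carry the last observation forward for at most `limit` days.
        if last_value is not None and age <= limit:
            filled[day] = last_value
        day += timedelta(days=1)
    out = {}
    for day, value in filled.items():
        prior = filled.get(day - timedelta(days=days))
        if prior:  # skip dates without a usable (non-zero) prior value
            out[day] = value / prior - 1
    return out


prices = {date(2024, 1, 1): 100.0, date(2024, 1, 2): 110.0, date(2024, 1, 5): 121.0}
one_day = momentum(prices, days=1)
```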
Exponentially weighted mean
- signal.ewm(halflife=None, *, span=None)
Calculate the exponentially weighted mean of the signal. This is a wrapper around the pandas ewm method.
- Parameters
halflife – The number of data points over which the weight decays to half of its value.
span – Decay specified in terms of span.
NaN values are removed from the time series before the pandas ewm function is called. It is therefore recommended to ensure that the data is on a known frequency, without missing values, before performing this operation.
The exponentially weighted mean of the close price:
close_price.ewm(halflife=14)
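To make the halflife parameter concrete, the sketch below computes an exponentially weighted mean by hand. It assumes the pandas default weighting (adjust=True), where the observation i steps back gets weight (1 - alpha)**i and alpha = 1 - exp(-ln(2) / halflife). Whether the wrapper uses that default is an assumption.

```python
import math


def ewm_mean(values, halflife):
    """Exponentially weighted mean with normalized weights (1 - alpha)**i."""
    alpha = 1.0 - math.exp(-math.log(2.0) / halflife)
    out = []
    weighted_sum = 0.0
    weight_total = 0.0
    for x in values:
        # Older observations are discounted by (1 - alpha) at every step.
        weighted_sum = weighted_sum * (1.0 - alpha) + x
        weight_total = weight_total * (1.0 - alpha) + 1.0
        out.append(weighted_sum / weight_total)
    return out


smoothed = ewm_mean([1.0, 2.0, 3.0], halflife=1)
```

With halflife=1, each observation carries half the weight of the one after it.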
Normalization
- signal.normalize(normalization_period)
Normalize the signal to zero mean and unit variance. Each time series is normalized separately. You have to specify the normalization period, which is the time period over which the mean and the variance of the signal will be estimated. These values will then be used to normalize the signal across any time period, to ensure that the normalized signal is consistent.
- Parameters
normalization_period – The time period over which the mean and variance of the signal are estimated. The period is specified with the start and end dates.
A use case for normalizing data is to get better properties when creating a model. Some models perform better when the input and output variables are normalized:
predict(Airlines_US_AirRevenuePassengerMiles_Monthly.normalize(('2017-01-01', '2018-12-31')))
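The role of the normalization period can be sketched as follows: the mean and standard deviation are estimated only from values inside the period and then applied to every value. The use of the population standard deviation and the dict-of-ISO-date representation are assumptions made for this illustration.

```python
from statistics import mean, pstdev


def normalize(series, period):
    """Z-score using mean and std estimated only inside `period` (inclusive)."""
    start, end = period
    # ISO date strings compare correctly as plain strings.
    sample = [v for d, v in series.items() if start <= d <= end]
    mu, sigma = mean(sample), pstdev(sample)
    return {d: (v - mu) / sigma for d, v in series.items()}


series = {"2017-01-31": 1.0, "2017-02-28": 3.0, "2019-01-31": 6.0}
normalized = normalize(series, ("2017-01-01", "2018-12-31"))
```

Values outside the period, such as the 2019 observation here, are scaled with the same mean and standard deviation, so the normalized signal stays consistent however far it is extended.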
- signal.sector_neutral(level, transform_type='winsorized_robust')
Cross-sectional normalization of the signal, applied separately for each sector and each date. The sectors refer to the FactSet RBICS classification, where the classification level (1-6) must be specified with the level argument.
Note that the signal is normalized across the set of companies that it is evaluated for. This means that if the signal is evaluated in Signal Explorer, the result will depend on which companies are selected to be plotted. If only a single company is selected, then the result will be a flat line with the value 0, because that’s what a single value is normalized to. For sensible results, select at least three companies within the same sector when plotting in Signal Explorer.
The intended usage is for alpha signals. When evaluated in an alpha test or a portfolio strategy, the signal is evaluated across all the companies included in such alpha test / strategy, which means the alpha signal will be sector neutral for that run.
There are different methods available for performing the normalization:
‘standard’: sklearn StandardScaler/Z-score
‘robust’: sklearn RobustScaler
‘winsorized_standard’: ‘standard’ followed by a soft capping
‘winsorized_robust’: ‘robust’ followed by a soft capping
‘uniform’: sklearn QuantileTransformer
‘minmax’: sklearn MinMaxScaler (-1,1)
- Parameters
level – The level (1-6) of the FactSet RBICS classification to use.
transform_type – Defaults to ‘winsorized_robust’.
A use case for normalizing data by sector is to avoid sector biases in alpha tests and portfolio strategies. By making an alpha signal neutral by sector, the overall portfolio will be better balanced across sectors:
transactions_yoy.sector_neutral(level=2)
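The cross-sectional step for a single date can be sketched as below, using the plain 'standard' (z-score) transform for simplicity rather than the default 'winsorized_robust'. The company and sector labels are made up, and treating a zero spread as 1 mirrors how sklearn's scalers handle constant input.

```python
from collections import defaultdict
from statistics import mean, pstdev


def sector_neutral_one_date(values, sectors):
    """Z-score each company's value within its sector for a single date."""
    by_sector = defaultdict(list)
    for company, value in values.items():
        by_sector[sectors[company]].append(value)
    out = {}
    for company, value in values.items():
        group = by_sector[sectors[company]]
        mu, sigma = mean(group), pstdev(group)
        out[company] = (value - mu) / (sigma or 1.0)  # zero spread: divide by 1
    return out


values = {"A": 1.0, "B": 3.0, "C": 10.0}
sectors = {"A": "Tech", "B": "Tech", "C": "Energy"}
neutral = sector_neutral_one_date(values, sectors)
```

Company C is alone in its sector and is therefore normalized to 0, which is the flat-line behaviour described above for a single selected company.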
- signal.country_neutral(transform_type='winsorized_robust')
Cross-sectional normalization of the signal, applied separately for each country and each date. Each company is assigned to the country of the exchange where it has its primary listing.
- Parameters
transform_type – Defaults to ‘winsorized_robust’. See sector_neutral above for available options.
A use case for normalizing data by country is to avoid country biases in alpha tests and portfolio strategies. By making an alpha signal neutral by country, the overall portfolio will be better balanced across countries.
- signal.group_normalize(group_signal, transform_type='winsorized_robust')
Cross-sectional normalization of the signal, applied separately for each group of companies and each date. A separate signal, group_signal, is used to determine the groups.
The most typical use case would be to group companies by sector (using the sector_revenue() signal as the group_signal). However, for this use case there is the shorthand sector_neutral() method above.
- Parameters
group_signal – The signal that determines the groups by which the signal will be normalized.
transform_type – Defaults to ‘winsorized_robust’. See sector_neutral above for available options.
A use case for normalizing data by groups is to avoid biases in alpha tests and portfolio strategies. Typical use cases would be to normalize by sectors or by countries.
- signal.factor_neutral(tag, *factors, screen_frequency)
Neutralizes the effect of one or more factors on a signal by estimating a linear regression with the main signal as the target variable and the factors as the regressors. The output is the residual of the regression.
The set of companies to estimate the regression over must be specified with the tag argument. The tag can be either a fixed set of companies or a screen (where the set of companies changes over time). The signal can be evaluated for companies that were not part of the estimation. Typically, you would use the same tag as the one that is used in the alpha test or portfolio strategy, so that the alpha signal is neutralized for the same set of companies.
The regression is run separately per day. For each date and each factor, the factor values are taken from the latest date where that factor is available (for any entities). This means that, for example, a monthly factor signal can be used with a daily alpha signal as the main signal. However, no forward filling is performed, so the user is responsible for forward filling the factors if necessary.
A typical use case is to subtract the effect of style factors from an alpha signal.
- Parameters
tag – the resource name of the tag or screen that defines the group of companies.
factors – one or several factors to neutralize.
screen_frequency – the frequency with which to evaluate the screen, if a screen is used to define the group of companies. Defaults to 'M' for monthly evaluation. Alternatives include 'W' for weekly or 'Q' for quarterly update of the screen.
Example
Remove the growth style factor from an alternative data YoY growth signal:
TransactionDataYoY.factor_neutral('tags/user:2a46627e-4e03-49f2-808e-d6fdadebbc61', factor_loading_growth)
Remove the size and momentum style factors from an alternative data YoY surge signal, using a screen that is updated quarterly:
TransactionDataYoY.factor_neutral('screens/1265', factor_loading_short_term_momentum, factor_loading_size, screen_frequency='Q')
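The regression residual for one date and a single factor can be sketched as below. It assumes an ordinary least squares fit with an intercept, estimated over the same companies the signal is evaluated for. Both are simplifying assumptions, and the function name is hypothetical.

```python
from statistics import mean


def neutralize_one_factor(signal, factor):
    """Residuals of the OLS fit signal ~ a + b * factor across companies."""
    companies = [c for c in signal if c in factor]
    y = [signal[c] for c in companies]
    x = [factor[c] for c in companies]
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # What is left after removing the fitted factor effect.
    return {c: signal[c] - (intercept + slope * factor[c]) for c in companies}


signal = {"A": 1.0, "B": 2.0, "C": 4.0}
factor = {"A": 0.0, "B": 1.0, "C": 2.0}
residual = neutralize_one_factor(signal, factor)
```

By construction the residuals sum to zero and are uncorrelated with the factor across the estimation set.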
Cross-sectional correlation
- cross_sectional_correlation(signal_a, signal_b, tag, screen_frequency)
Calculate the cross-sectional correlation between two signals. The result is a single time series, where the value for a given date is the correlation between the signal values across a set of entities.
Note that no forward filling is applied to the signals, and the correlation will only be calculated on days where both signals have values. The user can apply forward filling to the input signals as desired before applying this function.
- Parameters
signal_a – one of the signals
signal_b – the other signal
tag – the resource name of the screen or tag that defines the group of entities.
screen_frequency – the frequency with which to evaluate the screen, if a screen is used to define the group of entities. Defaults to 'M' for monthly evaluation. Alternatives include 'W' for weekly or 'Q' for quarterly update of the screen.
Example
Calculate correlation between an alternative data signal and the growth style factor:
cross_sectional_correlation(TransactionDataYoY, factor_loading_growth.filled_daily(limit=31), 'tags/user:2a46627e-4e03-49f2-808e-d6fdadebbc61')
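For a single date, the calculation can be sketched as a correlation across the entities that have a value in both signals. The sketch assumes the Pearson correlation, which the text above does not state explicitly, and returns None when the correlation is undefined.

```python
import math


def cross_sectional_correlation_one_date(values_a, values_b):
    """Pearson correlation across entities that have a value in both signals."""
    common = [e for e in values_a if e in values_b]
    if len(common) < 2:
        return None
    a = [values_a[e] for e in common]
    b = [values_b[e] for e in common]
    a_bar, b_bar = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    var_a = sum((x - a_bar) ** 2 for x in a)
    var_b = sum((y - b_bar) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None  # a constant cross-section has no defined correlation
    return cov / math.sqrt(var_a * var_b)


signal_a = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 9.0}
signal_b = {"A": 2.0, "B": 4.0, "C": 6.0}
corr = cross_sectional_correlation_one_date(signal_a, signal_b)
```

Entity D has no value in the second signal and is therefore left out of the calculation.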
Elementwise transforms
- signal.apply(function)
Apply a function to each element of the signal.
- Parameters
function – The name of the function to apply. A non-exhaustive list of standard function transforms is: log (logarithm), exp (exponentiation), sqrt (square root), abs (absolute value), tanh (hyperbolic tangent).
A use case for transforming data is to get better properties when creating a model. In many cases, a model performs better if the data has been transformed with the logarithm before estimating the model. Here is an example on how to transform the signal:
US_CivilianUnemploymentRate_Monthly.apply('log')
The same transform can be used in a modelling context, for example:
predict(US_CivilianUnemploymentRate_Monthly.apply('log')).apply('exp')
We apply “exp” at the end to transform the predictions back to the original scale.
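The round trip can be checked on plain numbers. The sketch below uses a hypothetical apply helper over a list of values, with Python's math functions standing in for the named transforms. The modelling step itself is omitted.

```python
import math

# Hypothetical lookup from transform name to an elementwise function.
FUNCTIONS = {
    "log": math.log,
    "exp": math.exp,
    "sqrt": math.sqrt,
    "abs": abs,
    "tanh": math.tanh,
}


def apply(values, function):
    """Apply the named function to every element."""
    return [FUNCTIONS[function](v) for v in values]


rates = [3.5, 4.0, 14.7]
round_trip = apply(apply(rates, "log"), "exp")
```

Up to floating-point rounding, round_trip equals the original rates, which is why applying 'exp' to predictions made on the log scale returns them to the original scale.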
Time-axis operations
- signal.combine_first(other, extend_only=False)
Update null elements with the value from the same location in other. This is a wrapper around the pandas combine_first() method.
- Parameters
other – Signal to use for filling null values.
extend_only – If True, null values are only replaced after the last non-null value.
Example
Combine FactSet actual sales with FactSet sales estimate:
actual('sales', alignment='afp').combine_first(sales_estimate(before_release=True, alignment='afp'))
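The effect of extend_only can be sketched on two lists that are already aligned on the same dates, with None marking a null value. This is a simplification, since pandas combine_first works on the union of the two indexes, and the function below is only an illustration.

```python
def combine_first(primary, other, extend_only=False):
    """Fill None entries of `primary` from `other`, position by position."""
    last_valid = max(
        (i for i, v in enumerate(primary) if v is not None), default=-1
    )
    out = []
    for i, (value, fallback) in enumerate(zip(primary, other)):
        # With extend_only, gaps inside the primary series are left alone.
        may_fill = not extend_only or i > last_valid
        out.append(fallback if value is None and may_fill else value)
    return out


actual_sales = [10.0, None, 12.0, None]
estimated_sales = [9.0, 11.0, 13.0, 14.0]
filled = combine_first(actual_sales, estimated_sales)
extended = combine_first(actual_sales, estimated_sales, extend_only=True)
```

Here filled replaces both nulls, while extended only appends the estimate after the last reported actual value.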
- signal.reindex_like(index_signal, fill_method='ffill')
Returns the signal re-indexed so that it has the same time-index as index_signal given the method fill_method. This is a wrapper around the pandas reindex method.
- Parameters
index_signal – A supplied signal whose time-index is used for the resampling of signal.
fill_method – The operation used to align series to index_signal before sampling. Valid values are None, 'pad'/'ffill', 'backfill'/'bfill' and 'nearest'.
Example
The price return on earnings release dates:
close_price.relative_change(days=1).reindex_like(actual('sales',alignment='rd'))
- signal.filled_daily(fetch_prior_data=120, fetch_prior_data_from=None, stop_at_last_valid_value=False, fetch_later_data=7, limit=None, allow_forward_fill_for_current_dates=False, *, fill_value=None)
Transforms a signal by changing the frequency to daily and forward filling missing values.
- Parameters
fetch_prior_data – the number of days of prior data to retrieve in order to forward fill; the default amount is sufficient for quarterly data (with quarters up to seventeen weeks), as long as there are no missing data
fetch_prior_data_from – the start date to use for forward filling. If set, it overrides the ‘fetch_prior_data’ argument.
stop_at_last_valid_value – if True values will not be forward filled after the last available non-null value
fetch_later_data – the number of days after the eval period to retrieve data for to determine the last available non-null value. Only used if ‘stop_at_last_valid_value’ is True.
limit – the maximum number of consecutive null values that are filled. If not set, all null values are filled (assuming there is a non-null value before it). If set, must be set to 1 or higher.
allow_forward_fill_for_current_dates – if True, and the difference in days between the current date and the date with the last non-NaN value is less than the given limit, values will be forward filled even if stop_at_last_valid_value is True. When evaluating with a version, the current date is assumed to be the version.
fill_value – By default, the previous value is forward filled. If fill_value is specified, then this value will be used instead. The dates filled are exactly the same, and all the other parameters such as ‘limit’ and ‘allow_forward_fill_for_current_dates’ apply as usual.
Example
Forward fill data in a monthly signal up to 35 days to make it a daily signal:
my_monthly_data.filled_daily(limit=35)
Fill in missing values with 0, up to 6 days forwards, but not past the last value in the series:
my_daily_signal.filled_daily(limit=6, fill_value=0, stop_at_last_valid_value=True)
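The interaction between limit, fill_value and stop_at_last_valid_value can be sketched as below. The function is a simplified stand-in operating on a dict of date to value. It ignores the fetch_* arguments and allow_forward_fill_for_current_dates, and the explicit start and end arguments are assumptions introduced for the illustration.

```python
from datetime import date, timedelta


def filled_daily(points, start, end, limit=None, fill_value=None,
                 stop_at_last_valid_value=False):
    """Daily series between `start` and `end`, forward filled from `points`."""
    last_valid = max(points) if points else None
    out = {}
    previous, gap = None, 0
    day = start
    while day <= end:
        if day in points:
            previous, gap = points[day], 0
            out[day] = previous
        elif previous is not None:
            gap += 1
            within_limit = limit is None or gap <= limit
            before_end = not stop_at_last_valid_value or day < last_valid
            if within_limit and before_end:
                # Same dates are filled either way; only the value differs.
                out[day] = previous if fill_value is None else fill_value
        day += timedelta(days=1)
    return out


points = {date(2024, 1, 1): 5.0, date(2024, 1, 5): 7.0}
dense = filled_daily(points, date(2024, 1, 1), date(2024, 1, 8), limit=2)
capped = filled_daily(points, date(2024, 1, 1), date(2024, 1, 8), limit=6,
                      fill_value=0, stop_at_last_valid_value=True)
```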
- signal.align_to_dates(index_signal, max_forward=None, max_backward=None, pre_extend=None, post_extend=None)
Returns the signal with the values aligned to the dates of the index_signal.
For each value in the signal, we find the date in the index_signal that is closest in time and assign that date to the value, provided that it satisfies the movement constraints (given by max_forward and max_backward).
If there are two dates that are equally far away, the value is moved forwards.
If there is no date available within the movement constraints for some value, the value is discarded.
If multiple values in the signal have the same date as their closest one, the value that is closest in time is aligned to the date and the other values are discarded. If two values are equally close, the one that would be moved forwards is used, while the other one is discarded.
- Parameters
index_signal – A signal whose time-index is used for the resampling of signal.
max_forward – Maximum number of days a data point can be moved forwards. Default is None, which means no limit.
max_backward – Maximum number of days a data point can be moved backwards. Default is None, which means no limit.
pre_extend – Offset to pre-extend the signal evaluation period with. By default, this offset is set equal to the max_forward constraint if max_forward is not None and 1 year otherwise.
post_extend – Offset to post-extend the signal evaluation period with. By default, this offset is set equal to the max_backward constraint if max_backward is not None and 1 year otherwise.
Example
Align the signal ‘my_quarterly_signal’ to fundamental sales, allowing data points to move forwards 10 days and backwards 5 days:
my_quarterly_signal.align_to_dates(fundamental('sales'), max_forward=10, max_backward=5)
Align the signal ‘my_quarterly_signal’ to fundamental sales, allowing data points to move without restriction:
my_quarterly_signal.align_to_dates(fundamental('sales'), pre_extend=pandas.DateOffset(months=6))
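The matching rules above can be sketched as follows. The function works on a dict of date to value and a list of index dates, ignores pre_extend and post_extend, and is an illustration of the stated rules rather than the actual implementation.

```python
from datetime import date


def align_to_dates(points, index_dates, max_forward=None, max_backward=None):
    """Move each value to its nearest index date, subject to the limits."""
    best = {}  # index date -> (distance, moved_backward, value)
    for day, value in points.items():
        candidates = []
        for target in index_dates:
            shift = (target - day).days  # positive: the value moves forwards
            if shift >= 0 and max_forward is not None and shift > max_forward:
                continue
            if shift < 0 and max_backward is not None and -shift > max_backward:
                continue
            # Smallest distance first; forward moves (False) win ties.
            candidates.append((abs(shift), shift < 0, target))
        if not candidates:
            continue  # no index date within reach: the value is discarded
        distance, moved_backward, target = min(candidates)
        key = (distance, moved_backward)
        if target not in best or key < best[target][:2]:
            best[target] = (distance, moved_backward, value)
    return {target: entry[2] for target, entry in sorted(best.items())}


index_dates = [date(2024, 3, 31), date(2024, 6, 30)]
points = {date(2024, 3, 28): 1.0, date(2024, 4, 2): 2.0, date(2024, 5, 15): 3.0}
aligned = align_to_dates(points, index_dates, max_forward=10, max_backward=5)
```

In this example the value from 2 April wins the 31 March slot because it is closer than the value from 28 March, and the value from 15 May is discarded because no index date is within reach.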
- signal.aggregate_over(aggregate_signal, aggregation_method='sum', *, max_window=None, min_data_points=None, include_first_period=False)
Returns the signal with the time index of the aggregate signal, where the values of the signal are aggregated according to the given method.
The values of the aggregate_signal are not used.
- Parameters
aggregate_signal – a supplied signal whose time index is used in the returned signal
aggregation_method – the method used by the aggregation (e.g. 'mean', 'sum', 'median', 'prod', 'std', 'var')
max_window – Maximum number of days to look back when aggregating data. This can be useful if the aggregation signal may have missing data.
min_data_points – Minimum number of data points per aggregation date. If there are not enough data points, the aggregated value is set to NaN. This can be useful if the signal has missing data.
include_first_period – Whether the first aggregation period should be included. It is useful to exclude the first aggregation period because it does not have a known starting point.
Example
Aggregate a signal data_signal (any signal you have access to) by taking the sum over the fiscal quarters. The extend argument ensures that the signal also produces values for the present quarter:
data_signal.aggregate_over(fiscal_calendar(extend=1), max_window=92)
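A minimal sketch of the aggregation, assuming that each value is assigned to the window that runs from just after the previous aggregation date up to and including the current one. The window boundaries are an assumption. Only the 'sum' method is shown and min_data_points is left out.

```python
from datetime import date, timedelta


def aggregate_over(points, period_ends, max_window=None,
                   include_first_period=False):
    """Sum the values in each window (previous period end, period end]."""
    ends = sorted(period_ends)
    out = {}
    for i, end in enumerate(ends):
        if i == 0:
            if not include_first_period:
                continue  # the first period has no known starting point
            start = None
        else:
            start = ends[i - 1]
        if max_window is not None:
            earliest = end - timedelta(days=max_window)
            start = earliest if start is None else max(start, earliest)
        window = [
            v for d, v in points.items()
            if (start is None or d > start) and d <= end
        ]
        out[end] = sum(window) if window else None
    return out


quarter_ends = [date(2024, 3, 31), date(2024, 6, 30)]
points = {
    date(2024, 3, 15): 1.0,
    date(2024, 4, 10): 2.0,
    date(2024, 6, 30): 3.0,
    date(2024, 7, 1): 4.0,
}
sums = aggregate_over(points, quarter_ends, max_window=92)
with_first = aggregate_over(points, quarter_ends, include_first_period=True)
```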
Handling missing values in weighted sums
- weighted_sum(signal_1, signal_2, ..., signal_n, weights=[w_1, w_2, ..., w_n], nan_when_missing, normalize)
This function provides a method for handling weighted sums with possible missing values. When the signals have numerical values for a given time the value of weighted_sum is
w_1*signal_1 + w_2*signal_2 + … + w_n*signal_n.
When some signals are not numbers, the weighted sum is taken over only the signals with numerical values.
- Parameters
signal_j – For j=1, 2, …, n, the signals which are combined.
weights – A list of n numerical weights to use in the sum. When no weights are supplied, it is assumed that w_j=1.
nan_when_missing – If one of the signals has a missing value (NaN), the value of the sum is set to missing (NaN) if nan_when_missing=True; otherwise the missing values are skipped in the sum.
normalize – If normalize=True, the weights w_j are normalized so that the active weights sum to 1.
Example
Using last reported EPS as a proxy for missing EPS estimate:
weighted_sum(estimate('eps'), actual('eps',alignment='rd'), weights=[100,1], normalize=True)
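The handling of missing values for a single point in time can be sketched as below, with NaN marking a missing value. The function takes one row of values rather than signals, and returning NaN when every input is missing is an assumption.

```python
import math


def weighted_sum(values, weights=None, nan_when_missing=False, normalize=False):
    """Weighted sum of one row of signal values; NaN marks a missing value."""
    if weights is None:
        weights = [1.0] * len(values)
    active = [(v, w) for v, w in zip(values, weights) if not math.isnan(v)]
    if not active or (nan_when_missing and len(active) < len(values)):
        return math.nan
    # Only the weights of non-missing signals count towards the total.
    total_weight = sum(w for _, w in active)
    scale = 1.0 / total_weight if normalize else 1.0
    return sum(v * w for v, w in active) * scale


both = weighted_sum([2.0, 1.0], weights=[100, 1], normalize=True)
only_actual = weighted_sum([math.nan, 1.0], weights=[100, 1], normalize=True)
```

With weights of 100 and 1, the estimate dominates whenever it is available, and the reported value is used unchanged when the estimate is missing. This is the proxy behaviour described in the example above.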
Estimate changes
- estimate_change(estimate, next_estimate, crossover_month)
Calculates the change in analysts’ estimates, taking account of the fact that the estimates refer to changing fiscal periods.
The estimate argument should be a signal for estimates in a given period such as this year, next year or in two years. The next_estimate argument should be a signal for estimates in the following year. The crossover_month should be a number between 1 and 12 denoting the month in which the estimate rolls over to the next fiscal year. The next_estimate and crossover_month arguments are both optional.
In all months except the crossover month the result is obtained by calculating the change in the estimate signal. In the crossover month the result is obtained by calculating the change relative to the next_estimate signal instead. If the next_estimate signal is not provided, the value is set to 0 in the crossover month.
If the crossover month is not provided, the last month in the fiscal year is used as the crossover month.
- Parameters
estimate – a signal producing estimates for a given period
next_estimate – a signal producing estimates for the following period
crossover_month – the month in which the estimate rolls over to the next fiscal year
Company group normalization
- signal.group_transform(transform_type, centering_type, centering_weight_signal, tag, screen_frequency)
The group_transform operation does a cross-sectional normalization of a signal across a set of companies.
- Parameters
transform_type – the transform to use, either 'robust', 'winsorized_robust', 'standard', 'winsorized_standard', 'uniform', 'minmax' or 'identity'. Defaults to 'identity'.
centering_type – the centering to use, either 'weighted_mean', 'mean', 'median' or 'none'. Defaults to 'none', which results in a centering given by the transform_type.
centering_weight_signal – the signal used as weights when specifying 'weighted_mean' for the centering_type. Must be specified when centering_type='weighted_mean', otherwise ignored.
tag – the resource name of the tag or screen that defines the group of companies.
screen_frequency – the frequency with which to evaluate the screen, if a screen is used to define the group of companies. Defaults to 'M' for monthly evaluation. Alternatives include 'W' for weekly or 'Q' for quarterly update of the screen.
Transform type | Description |
---|---|
robust | Applies sklearn’s RobustScaler. This transform subtracts the median, and then scales the data according to the quantile range from 25th to 75th percentile. |
winsorized_robust | The ‘robust’ transform followed by a soft capping. |
standard | Applies sklearn’s StandardScaler, which subtracts the mean and then scales to unit variance. |
winsorized_standard | The ‘standard’ transform followed by a soft capping. |
uniform | Transforms the data to percentiles using sklearn’s QuantileTransformer. |
minmax | Scales the data linearly to the range [-1, 1]. |
identity | No transform, which means that only centering is applied. Rarely used in practice. |
Example
When doing a group transform, you must specify which group of companies should be used. This is done in the following way, using the ID of a company tag:
Market_Cap_mUSD.group_transform('winsorized_robust', tag='tags/user:2a46627e-4e03-49f2-808e-d6fdadebbc61')
It can also be done using the ID of a company screen. In this case, the set of companies included in the group will be updated periodically based on the criteria of the screen. By default the screen is updated monthly, but this can be changed with the screen_frequency parameter, for instance to a quarterly update:
Market_Cap_mUSD.group_transform('winsorized_robust', tag='screens/1265', screen_frequency='Q')
Apply the uniform transform:
Market_Cap_mUSD.group_transform('uniform', tag='tags/user:2a46627e-4e03-49f2-808e-d6fdadebbc61')
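As an illustration of one of the simpler transform types, the sketch below applies the 'minmax' scaling to the values of a group on a single date. The company labels are made up, and mapping a constant group to 0 is an assumption.

```python
def minmax_transform(values):
    """Scale values linearly so the smallest maps to -1 and the largest to +1."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {k: 0.0 for k in values}  # assumption: constant group maps to 0
    return {k: 2.0 * (v - lo) / (hi - lo) - 1.0 for k, v in values.items()}


market_caps = {"A": 10.0, "B": 20.0, "C": 50.0}
scaled = minmax_transform(market_caps)
```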