Transformations

Arithmetic operators

Signals can be added (+), subtracted (-), multiplied (*) or divided (/) using the normal arithmetic operators.

Absolute change

signal.change(method, limit, days, weeks, months, years)

Express the signal as the absolute change between its current value and the value it had at a prior date. The prior date is determined by an offset specified in years, months, weeks or days.

The method argument specifies how we find the prior data point to subtract. When the method argument is:

  • 'ffill' we forward fill the prior data a maximum of limit days to match the dates.

  • 'bfill' we backfill the prior data a maximum of limit days to match the dates.

  • 'nearest' we move the prior data forward or backward a maximum of limit days to match the dates. We match with the prior data point which is moved the shortest.

  • None we do not move the past data at all.

When the method argument is 'auto' we select the method and limit based on the given offset (unless they are explicitly set). The method and limit are:

  • 'nearest' and 31 days if the offset is 3 months or more

  • 'nearest' and 10 days if the offset is between 1 and 3 months

  • 'ffill' and 10 days if the offset is less than 1 month

This is because forward filling makes more sense for short-window changes in, for example, the close price, while the lenient matching of the 'nearest' method is better suited to less frequent data, such as quarterly figures.

Parameters
  • method – The method to use when matching with past data points, either 'auto' (default), 'nearest', 'ffill' / 'pad', 'bfill' / 'backfill' or None.

  • limit – The limit on how far the matching method can move the data points.

  • days – The number of days between current and prior period, for example days=1.

  • weeks – The number of weeks, for example weeks=3.

  • months – The number of months, for example months=12.

  • years – The number of years, for example years=1.

To get the day-over-day change in closing share prices:

close_price.change(days=1)

Note that forward filling with a limit of 10 days is the default for such short offsets, so the value on a regular Monday will be the change against the price on the preceding Friday.
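To make the matching concrete, here is a minimal pandas sketch of how a one-day change with 'ffill' matching and a 10-day limit could behave. The series and names are illustrative, not the library's internals:

```python
import pandas as pd

# Illustrative daily price series: Friday, Monday, Tuesday.
prices = pd.Series(
    [100.0, 102.0, 101.0],
    index=pd.to_datetime(["2023-01-06", "2023-01-09", "2023-01-10"]),
)

# Shift the prior data forward by the offset (1 day), then forward fill it
# onto the current dates, allowing a move of at most 10 days (the limit).
prior = prices.copy()
prior.index = prior.index + pd.Timedelta(days=1)
prior = prior.reindex(prices.index, method="ffill", tolerance=pd.Timedelta(days=10))

change = prices - prior  # Monday's change is taken against Friday's close
```

The Monday value is matched with Friday's price (moved 2 days, within the limit), while the first date has no prior data and produces NaN.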

Relative (percent) change

signal.relative_change(method, limit, days, weeks, months, years)

Express the signal as the relative (percent) change from a prior date. The prior date is determined by an offset in years, months, weeks or days. Note that an output of 1 corresponds to a 100% change from the prior value.

See the above signal.change(...) documentation for a deeper explanation of the method and limit arguments.

Parameters
  • method – The method to use when matching with past data points, either 'auto' (default), 'nearest', 'ffill' / 'pad', 'bfill' / 'backfill' or None.

  • limit – The limit on how far the matching method can move the data points.

  • days – The number of days between current and prior period, for example days=1.

  • weeks – The number of weeks, for example weeks=3.

  • months – The number of months, for example months=12.

  • years – The number of years, for example years=1.

To get the year-over-year relative change in monthly trading volume data:

monthly(trading_volume).relative_change(years=1)

Similarly, to get the month-over-month relative change in US housing prices:

US_PurchaseOnlyHousePriceIndex_SA_Monthly.relative_change(months=1)
signal.yoy_holidays(country, window, years=1, *, min_periods=1, min_denominator=1e-6)

Calculate the year-over-year ratio for a signal, adjusted for holidays.

Note that this returns a ratio, so a value of 1.1 represents a 10% increase. Subtract 1 to get the YoY change.

The year-over-year values are calculated by taking a moving average whose length is given by the window argument, and dividing it by the corresponding moving average from one year earlier.

However, some modifications are applied to the data in order to account for the effects of moveable holidays. That is, if the moving average window this year contains, say, Black Friday, the corresponding moving average window for last year is adjusted so that it, too, contains Black Friday.

Currently, only the holidays for the United States and Norway are supported. The days that are treated as holidays for the United States are all the federal holidays in addition to Black Friday and Cyber Monday.

Parameters
  • country – The country whose calendar should be used to determine holidays. Currently only US and NO are supported.

  • window – The number of days in the moving window when calculating the moving average. This can be 1 or any integer divisible by 7. The default is 1.

  • years – The number of years between the windows used in the numerator and the denominator.

  • min_periods – The minimum number of non-NaN values present (in each of the numerator and denominator) required. If there are fewer non-NaN values than this, NaN is produced.

  • min_denominator – Produce NaN if the sum of the values in the denominator is smaller than this.

To get the year-over-year change in a signal with a seven-day moving average window:

signal.yoy_holidays(country='US', window=7) - 1
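Stripped of the holiday adjustment, the core of the computation can be sketched in plain pandas (the data here is illustrative, and the actual implementation additionally realigns windows around moveable holidays):

```python
import pandas as pd

# Two years of illustrative daily data.
idx = pd.date_range("2022-01-01", "2023-12-31", freq="D")
s = pd.Series(range(len(idx)), index=idx, dtype=float)

# A 7-day moving average, divided by the same moving average one year earlier.
ma = s.rolling(7, min_periods=1).mean()
prior = ma.copy()
prior.index = prior.index + pd.DateOffset(years=1)
ratio = ma / prior.reindex(ma.index)
```

Dates in the first year have no prior-year counterpart and produce NaN.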

Change relative to other signal

signal.change_relative_to(from_signal, periods, pre_extend_months, shift_offset_days, shift_offset_months, shift_offset_tolerance_days, relative_change)

Calculate the change of one signal relative to another signal’s past values.

There are two ways of calculating the change:

  • Using the periods argument: The from_signal is re-indexed to the index of signal and the values are shifted by the given number of periods before the change is calculated.

  • Using the shift_offset_days/shift_offset_months arguments: The index of the from_signal is shifted by the given offset and then re-indexed to the index of signal using method="nearest" to match the nearest value. shift_offset_tolerance_days specifies the maximum number of days a value can be moved when determining nearest value. If there are two data points in the from_signal with the same distance to a data point in signal, the newest one will be used.

Note that for signals that may produce time series with missing values, it is preferable to use shift_offset_days/shift_offset_months. This approach bounds the absolute number of days a value can be shifted, whereas the periods method shifts values blindly.

All arguments except from_signal are optional. If only from_signal is given, periods will be used with a default value of 1.

Parameters
  • from_signal – The signal to calculate the relative change from.

  • periods – The number of periods to shift the from_signal. The default value is None.

  • pre_extend_months – If periods is used, this specifies the number of months to pre-extend the signal evaluation period. The default value is None.

  • shift_offset_days – The number of days to shift from_signal. The default value is None.

  • shift_offset_months – The number of months to shift from_signal. The default value is None.

  • shift_offset_tolerance_days – If shift_offset_days/shift_offset_months is used, this specifies the maximum number of days a value can be moved when determining the nearest value. The default value is 31.

  • relative_change – Whether to calculate the relative change. Set to True to calculate relative change and False to calculate actual change. The default value is True.

Calculate year-over-year relative change in sales:

sales().change_relative_to(sales(), shift_offset_months=12)

Time aggregations

Signals can be aggregated up to monthly or weekly time resolution:

monthly(signal)

Aggregate the given signal to monthly resolution. All the values within each calendar month are summed up (or averaged).

Parameters
  • signal – The signal to aggregate.

  • how (str) – How to aggregate the signal within each month. Defaults to 'sum', but alternatively 'mean' or 'median' can be provided to yield the arithmetic mean or the median, respectively, of all the values within each month.

To get the monthly total trading volume:

monthly(trading_volume)

To get the monthly average close price:

monthly(close_price, 'mean')
weekly(signal)

Aggregate the given signal to weekly resolution. All the values within each calendar week (starting on Monday) are summed up (or averaged).

Parameters
  • signal – The signal to aggregate.

  • how (str) – How to aggregate the signal within each week. Defaults to 'sum'; alternative values are 'mean' and 'median'.

This function is used in the same way as monthly.

If more control of the resampling is required, it is possible to use the resample function. The syntax is:

signal.resample(resampling_strategy, aggregation_type)

Aggregate the given signal to a resolution of choice. The interval used for resampling is defined by the resampling strategy, and the aggregation type defines how to aggregate the signal.

Parameters
  • resampling_strategy – The frequency to aggregate on. Examples include 'W' (weekly), 'M' (monthly), 'D' (daily), 'W-{weekday}', where {weekday} indicates on which day the week starts.

  • aggregation_type – How to aggregate the data. Examples include 'mean', 'median', 'sum', 'bfill' (backfill).

To get the mean of weekly volume data, where the week starts on Monday, use:

trading_volume.resample('W-MON', 'mean')
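Since this wraps pandas resampling, the weekly 'sum' case can be sketched directly in pandas with illustrative data. Note one detail: in raw pandas the 'W-{weekday}' anchor names the *last* day of each week, so Monday-to-Sunday weeks are 'W-SUN':

```python
import pandas as pd

# Two full Monday-to-Sunday weeks of daily volume.
idx = pd.date_range("2023-01-02", periods=14, freq="D")
volume = pd.Series([1.0] * 14, index=idx)

# In raw pandas, 'W-SUN' bins Monday-to-Sunday weeks labelled by the Sunday.
weekly_sum = volume.resample("W-SUN").sum()
```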

Moving average

signal.moving_average(window, freq=None, min_periods=1)

Calculate the moving average of a signal. This is typically done with noisy data in order to get a cleaner signal. For instance, looking at daily credit card transaction data doesn’t make much sense because of the noise level, but smoothing over say 90 days gives a more informative signal.

Parameters
  • signal – The signal to calculate the moving average of.

  • window – The number of calendar days to calculate moving average over (if ‘freq’ is not set), or the number of data points to calculate the moving average if ‘freq’ is set.

  • freq – Leave it with the default setting of “None” to interpret the window as number of days. For e.g. monthly signals, set it to ‘M’ (or ‘MS’) to interpret the ‘window’ argument as number of months instead of number of days. Note that the ‘freq’, if set, must be the same as the frequency of the signal.

  • min_periods – The minimum number of data points to require in order to calculate a value. Defaults to 1, which means that a “moving average” is calculated from the very first data point in the time series, even though it’s just an “average” of one data point (and then the next one is an average of 2 and so forth). To avoid noisy data in the beginning of the time series, increase this setting.

To get the 90-day moving average of close price data, with a minimum of 70 data points required to average over:

close_price.moving_average(90, min_periods=70)

To get the 3 month moving average of US housing price index, with a minimum of 3 data points to average over:

US_PurchaseOnlyHousePriceIndex_SA_Monthly.moving_average(3, 'MS', 3)
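The distinction between a calendar-day window and a data-point window can be sketched in plain pandas (illustrative data, not the library's internals):

```python
import pandas as pd

# An irregular daily series with a gap between the 2nd and the 5th.
idx = pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-05"])
s = pd.Series([1.0, 2.0, 6.0], index=idx)

# A time-based window counts calendar days, so gaps shrink the sample...
by_days = s.rolling("3D", min_periods=1).mean()
# ...while an integer window always counts data points.
by_points = s.rolling(3, min_periods=1).mean()
```

At the last date, the 3-calendar-day window contains only one data point, while the 3-data-point window spans the whole series.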

Rolling window aggregations

signal.rolling_aggregation(window, operation, freq=None, min_periods=1)

Calculate a rolling window operation in the time direction. This is a generalization of the moving_average operation.

Parameters
  • window – The number of calendar days to calculate the aggregation over (if ‘freq’ is not set), or the number of data points if ‘freq’ is set. The current point is included in the window.

  • operation – The operation carried out on the window. This can be represented as a function, e.g. np.std, lambda expressions or strings like “mean”, “sum”, “max”, “min”, “std”.

  • freq – Leave it with the default setting of “None” to interpret the window as number of days. For e.g. monthly signals, set it to ‘M’ (or ‘MS’) to interpret the ‘window’ argument as number of months instead of number of days. Note that the ‘freq’, if set, must be the same as the frequency of the signal.

  • min_periods – The minimum number of data points that must be in the window for the transform to return a value for that data point. Typically some data points at the beginning of the interval are lost.

Examples

The largest daily absolute percentage price movement over the last four weeks:

close_price.relative_change(days=1).rolling_aggregation(28,  lambda w: np.max(np.abs(w)), freq="D")

The largest reported sales this year, assuming standard quarterly releases:

actual('sales').rolling_aggregation(4,  "max", freq="Q")
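The first example above corresponds to the following pandas sketch (with illustrative return data and a shorter 3-day window for brevity):

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2023-01-01", periods=5, freq="D")
returns = pd.Series([0.01, -0.05, 0.02, 0.03, -0.01], index=idx)

# Largest absolute daily move within a trailing 3-calendar-day window.
largest_move = returns.rolling("3D", min_periods=1).apply(lambda w: np.max(np.abs(w)))
```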

Delay (lag)

signal.delay(align, days, weeks, months, years)

Delay (lag) a signal by a specified number of days, weeks, months or years.

Parameters
  • align – Whether to align the delayed signal to the original (default False)

  • days – Number of days to delay

  • weeks – Number of weeks to delay

  • months – Number of months to delay

  • years – Number of years to delay

Rolling z-score

signal.z_score(num_periods, min_periods=None, delay_periods=1)

Given a stationary time series (signal), calculate a rolling window z-score. The signal is assumed to be stationary and normally distributed.

Parameters
  • num_periods – The number of time-periods of the signal to include in the estimate. I.e. for a daily signal like “close_price.relative_change(days=1)” num_periods is the number of days.

  • min_periods – The minimum number of actual data-points before estimate is produced. If min_periods is not specified, then min_periods is set equal to num_periods.

  • delay_periods – The number of periods before the estimated model is applied to the current data-point.

Example

Calculate the z-scores of the price movements over the past 90 days:

close_price.relative_change(days=1).z_score(num_periods=90)
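The role of delay_periods can be sketched in plain pandas (illustrative data; the library's estimator may differ in detail): the rolling mean and standard deviation are shifted before being applied, so a point is not scored against a model that already includes it:

```python
import pandas as pd

idx = pd.date_range("2023-01-01", periods=6, freq="D")
x = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 10.0], index=idx)

num_periods, delay_periods = 3, 1
# Estimate mean and std on the trailing window, then delay the estimates.
mean = x.rolling(num_periods).mean().shift(delay_periods)
std = x.rolling(num_periods).std().shift(delay_periods)
z = (x - mean) / std
```

The final jump to 10 scores six standard deviations above the delayed window mean.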

Rolling p-value

signal.p_value(num_periods, min_periods=None, delay_periods=1, p_cap=0.0)

Given a stationary time series (signal), calculate rolling p-values. The signal is assumed to be stationary and normally distributed.

Parameters
  • num_periods – The number of time-periods of the signal to include in the estimate. I.e. for a daily signal like “close_price.relative_change(days=1)” num_periods is the number of days.

  • min_periods – The minimum number of actual data-points before estimate is produced. If min_periods is not specified, then min_periods is set equal to num_periods.

  • delay_periods – The number of periods before the estimated model is applied to the current data-point.

  • p_cap – A lower threshold on the p-values to be returned (lower values are removed)

Example

Calculate the p-values of the price movements over the past 90 days:

close_price.relative_change(days=1).p_value(num_periods=90, min_periods=50)

A simple outlier detector:

close_price.relative_change(days=1).p_value(num_periods=90, min_periods=50, p_cap=0.9999)
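Under the stated normality assumption, a p-value of this kind can be sketched by passing the rolling z-score through the normal CDF, so that extreme observations get values close to 1 (this is an illustration, not the library's exact estimator):

```python
import pandas as pd
from statistics import NormalDist

idx = pd.date_range("2023-01-01", periods=6, freq="D")
x = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 10.0], index=idx)

# Rolling z-score with a one-period delay, as in z_score above.
mean = x.rolling(3).mean().shift(1)
std = x.rolling(3).std().shift(1)
z = (x - mean) / std

# Map each z-score through the normal CDF; p_cap would drop small values.
p = z.dropna().apply(NormalDist().cdf)
```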

Surge

signal.surge(short_period, long_period, how, decay=None)

Calculate the surge of a signal. The surge is calculated as the fraction between a moving average of a short window and a moving average of a longer window. The moving average can be exponentially weighted.

When how is 'ma', the short and long period arguments are the window sizes in number of data points.

When how is 'ewm', the short and long period arguments are the parameters passed to the pandas ewm function; which decay method to use is controlled with the decay parameter.

The signal is not resampled before the surge is calculated, so the parameters specifying the window periods specify a number of data points, and not a number of days.

Parameters
  • short_period – The parameter to use for the (exponentially weighted) moving average in the numerator.

  • long_period – The parameter to use for the (exponentially weighted) moving average in the denominator.

  • how – A string specifying what kind of moving average to use, either ‘ewm’ for an exponentially weighted moving average, or ‘ma’ for a regular moving average.

  • decay – The type of decay parameter to use in the ewm function. It can only be set when the how parameter is ewm, and it can be one of 'com', 'halflife', 'span' and 'alpha'. See the documentation of the Pandas ewm function for further details.

Example

Calculate the surge in close price using an exponentially weighted mean with half-lives of 5 and 20 data points:

close_price.surge(5, 20, 'ewm', 'halflife')

Calculate the surge in transactions using a regular moving average with windows 28 and 91:

TransactionVolume.surge(28, 91, 'ma')
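Both variants boil down to a ratio of a short to a long moving average, which can be sketched in plain pandas on illustrative data:

```python
import pandas as pd

idx = pd.date_range("2023-01-01", periods=40, freq="D")
s = pd.Series(range(40), index=idx, dtype=float)

# Exponentially weighted variant with half-lives of 5 and 20 data points.
surge_ewm = s.ewm(halflife=5).mean() / s.ewm(halflife=20).mean()
# Plain moving-average variant with windows of 5 and 20 data points.
surge_ma = s.rolling(5).mean() / s.rolling(20).mean()
```

For a rising series the short average exceeds the long one, so the surge is above 1.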

Seasonal adjustment

seasonal_adjust(signal, how=None)

Makes a seasonal adjustment to the given signal. The signal must have quarterly or monthly frequency.

The adjustment can be either multiplicative (the signal is multiplied by a certain factor for each seasonal period) or additive (a constant is added to the signal for each period). By default, a multiplicative adjustment is applied if all of the signal values are positive, while an additive adjustment is applied if any of the signal values are zero or negative. If an additive adjustment is desired even for strictly positive values, this can be forced by passing 'additive' as the how argument. If 'multiplicative' is specified, the adjustment fails with an error if any value is zero or negative.

Parameters
  • signal – The signal to transform.

  • how – Force the adjustment method to either 'additive' or 'multiplicative'.

To do seasonal adjustment of sales numbers (default to multiplicative):

seasonal_adjust(Sales_Actual)

To force an additive adjustment of the sales numbers:

seasonal_adjust(Sales_Actual, 'additive')

To force multiplicative adjustment of the sales numbers (fails if any value is zero or negative):

seasonal_adjust(Sales_Actual, 'multiplicative')
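A deliberately simple sketch of a multiplicative adjustment on monthly data: estimate one factor per calendar month and divide it out. This illustrates the idea only; the library may use a more sophisticated seasonal decomposition:

```python
import pandas as pd

# Three years of monthly data with a repeating within-year pattern.
idx = pd.date_range("2020-01-01", periods=36, freq="MS")
sales = pd.Series([100.0, 120.0, 110.0] * 12, index=idx)

# One multiplicative factor per calendar month, relative to the overall mean.
factors = sales.groupby(sales.index.month).mean() / sales.mean()
adjusted = sales / factors.loc[sales.index.month].to_numpy()
```

Dividing out the monthly factors leaves a flat series at the overall mean.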

Momentum

signal.momentum(days, limit=10)

Calculate the “momentum” of a signal, defined as its relative change versus a certain number of days ago. This method is closely related to the relative_change function, but specialized for daily time series, and automatically forward fills the underlying signal to get a smooth momentum signal without gaps.

Parameters
  • signal – The signal to transform.

  • days – The number of days between current and prior period. For example 365 for YoY or 91 for approximately 3 months.

  • limit – The maximum number of days to forward fill the underlying signal.

To get the 3 month price momentum of the share price:

close_price.momentum(91)

To get the year-over-year change in close price (smoothed with moving average):

close_price.moving_average(90, min_periods=70).momentum(365)

Exponentially weighted mean

signal.ewm(halflife=None, *, span=None)

Calculate the exponentially weighted mean of the signal. This is a wrapper around the pandas ewm method.

Parameters
  • halflife – The number of data points over which the weight should decay to half its value.

  • span – Decay specified in terms of span.

NaN values are removed from the time series before the pandas ewm function is called. It is therefore recommended to ensure that the data is on a known frequency, without missing values, before performing this operation.

The exponentially weighted mean of the close price:

close_price.ewm(halflife=14)

Normalization

signal.normalize(normalization_period)

Normalize the signal to zero mean and unit variance. Each time series is normalized separately. You have to specify the normalization period, which is the time period over which the mean and the variance of the signal will be estimated. These values will then be used to normalize the signal across any time period, to ensure that the normalized signal is consistent.

Parameters

normalization_period – The time period over which the mean and variance of the signal are estimated. The period is specified with the start and end dates.

A use case for normalizing data is to get better properties when creating a model. Some models perform better when the input and output variables are normalized:

predict(Airlines_US_AirRevenuePassengerMiles_Monthly.normalize(('2017-01-01', '2018-12-31')))
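The key property is that the mean and variance are estimated on the normalization period only and then applied to the whole series, which can be sketched in plain pandas with illustrative data:

```python
import pandas as pd

idx = pd.date_range("2017-01-01", periods=48, freq="MS")
s = pd.Series(range(48), index=idx, dtype=float)

# Estimate mean and std over the normalization period only...
est = s.loc["2017-01-01":"2018-12-31"]
# ...and apply them to the whole series, so the normalized signal stays
# consistent regardless of the evaluation window.
normalized = (s - est.mean()) / est.std()
```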
signal.sector_neutral(level, transform_type='winsorized_robust')

Cross-sectional normalization of the signal, applied separately for each sector and each date. The sectors refer to the FactSet RBICS classification, where a level from 1 to 6 must be specified with the level argument.

Note that the signal is normalized across the set of companies that it is evaluated for. This means that if the signal is evaluated in Signal Explorer, the result will depend on which companies are selected to be plotted. If only a single company is selected, then the result will be a flat line with the value 0, because that’s what a single value is normalized to. For sensible results, select at least three companies within the same sector when plotting in Signal Explorer.

The intended usage is for alpha signals. When evaluated in an alpha test or a portfolio strategy, the signal is evaluated across all the companies included in such alpha test / strategy, which means the alpha signal will be sector neutral for that run.

There are different methods available for performing the normalization:

  • ‘standard’: sklearn StandardScaler/Z-score

  • ‘robust’: sklearn RobustScaler

  • ‘winsorized_standard’: ‘standard’ followed by a soft capping

  • ‘winsorized_robust’: ‘robust’ followed by a soft capping

  • ‘uniform’: sklearn QuantileTransformer

  • ‘minmax’: sklearn MinMaxScaler (-1, 1)

Parameters
  • level – The level (1-6) of the FactSet RBICS classification to use.

  • transform_type – Defaults to ‘winsorized_robust’.

A use case for normalizing data by sector is to avoid sector biases in alpha tests and portfolio strategies. By making an alpha signal neutral by sector, the overall portfolio will be better balanced across sectors:

transactions_yoy.sector_neutral(level=2)
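The per-date, per-sector normalization can be sketched for a single date with illustrative data. For simplicity this uses the plain 'standard' flavour (z-score); the default 'winsorized_robust' additionally uses robust location/scale estimates and soft capping:

```python
import pandas as pd

# One date's cross-section: signal values for six companies in two sectors.
values = pd.Series([1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
                   index=["a", "b", "c", "d", "e", "f"])
sectors = pd.Series(["tech", "tech", "tech", "energy", "energy", "energy"],
                    index=values.index)

# Normalize within each sector separately.
neutral = values.groupby(sectors).transform(lambda v: (v - v.mean()) / v.std())
```

After normalization, companies are comparable across sectors despite the very different raw scales.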
signal.country_neutral(transform_type='winsorized_robust')

Cross-sectional normalization of the signal, applied separately for each country and each date. Each company is assigned to the country of the exchange where it has its primary listing.

Parameters

transform_type – Defaults to ‘winsorized_robust’. See sector_neutral above for available options.

A use case for normalizing data by country is to avoid country biases in alpha tests and portfolio strategies. By making an alpha signal neutral by country, the overall portfolio will be better balanced across countries.

signal.group_normalize(group_signal, transform_type='winsorized_robust')

Cross-sectional normalization of the signal, applied separately for each group of companies and each date. A separate signal, group_signal, is used to determine the groups.

The most typical use case would be to group companies by sector (using the sector_revenue() signal as the group_signal). However, for this use case there is the shorthand sector_neutral() method above.

Parameters
  • group_signal – The signal that determines the groups by which the signal will be normalized.

  • transform_type – Defaults to ‘winsorized_robust’. See sector_neutral above for available options.

A use case for normalizing data by groups is to avoid biases in alpha tests and portfolio strategies. Typical use cases would be to normalize by sectors or by countries.

signal.factor_neutral(tag, *factors, screen_frequency)

Neutralizes the effect of one or more factors on a signal by estimating a linear regression with the main signal as the target variable and the factors as the regressors. The output is the residual of the regression.

The set of companies to estimate the regression over must be specified with the tag argument. The tag can be either a fixed set of companies or a screen (where the set of companies changes over time). The signal can be evaluated for companies that were not part of the estimation. Typically, you would use the same tag as the one that is used in the alpha test or portfolio strategy, so that the alpha signal is neutralized for the same set of companies.

The regression is run separately per day. For each date and each factor, the factor values are taken from the latest date where that factor is available (for any entities). This means that e.g. a monthly factor signal can be used with a daily alpha signal as the main signal. However, no forward filling is performed, so the user is responsible for forward filling the factors if necessary.

A typical use case is to subtract the effect of style factors from an alpha signal.

Parameters
  • tag – The ID of the tag that defines the group of companies. Can be either a static tag or a screen (dynamic tag).

  • factors – One or several factors to neutralize.

  • screen_frequency – The frequency with which to evaluate the screen, if a screen is used to define the group of companies. Defaults to 'M' for monthly evaluation. Alternatives include 'W' for weekly or 'Q' for quarterly update of the screen.

Example

Remove the growth style factor from an alternative data YoY growth signal:

TransactionDataYoY.factor_neutral('graph:tag:user:2a46627e-4e03-49f2-808e-d6fdadebbc61', factor_loading_growth)

Remove the size and momentum style factors from an alternative data YoY surge signal, using a screen that is updated quarterly:

TransactionDataYoY.factor_neutral('signal:dynamicTag:1265', factor_loading_short_term_momentum, factor_loading_size, screen_frequency='Q')
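For a single date, the regression step amounts to regressing the signal on the factor exposures across companies and keeping the residual, which can be sketched with numpy (illustrative data, one factor):

```python
import numpy as np

# One date's cross-section: an alpha signal and one factor exposure per company.
signal = np.array([1.0, 2.0, 3.0, 4.0])
factor = np.array([0.0, 1.0, 2.0, 3.0])

# Regress the signal on the factor (plus an intercept); the residual is the
# factor-neutral signal.
X = np.column_stack([np.ones_like(factor), factor])
beta, *_ = np.linalg.lstsq(X, signal, rcond=None)
residual = signal - X @ beta
```

Here the signal is fully explained by the factor, so the residual is (numerically) zero everywhere.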

Cross-sectional correlation

cross_sectional_correlation(signal_a, signal_b, tag, screen_frequency)

Calculate the cross-sectional correlation between two signals. The result is a single time series, where the value for a given date is the correlation between the signal values across a set of entities.

Note that no forward filling is applied to the signals, and the correlation will only be calculated on days where both signals have values. The user can apply forward filling to the input signals as desired before applying this function.

Parameters
  • signal_a – one of the signals

  • signal_b – the other signal

  • tag – the ID of the tag that defines the group of entities. Can be either a static tag or a screen (dynamic tag).

  • screen_frequency – the frequency with which to evaluate the screen, if a screen is used to define the group of entities. Defaults to 'M' for monthly evaluation. Alternatives include 'W' for weekly or 'Q' for quarterly update of the screen.

Example

Calculate correlation between an alternative data signal and the growth style factor:

cross_sectional_correlation(TransactionDataYoY, factor_loading_growth.filled_daily(limit=31), 'graph:tag:user:2a46627e-4e03-49f2-808e-d6fdadebbc61')
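For one date, the value is simply the correlation of the two signals' values across the entity set, sketched here with illustrative data:

```python
import pandas as pd

# One date's cross-section: two signals evaluated for the same four entities.
a = pd.Series([1.0, 2.0, 3.0, 4.0], index=["x", "y", "z", "w"])
b = pd.Series([2.0, 4.0, 6.0, 8.0], index=["x", "y", "z", "w"])

corr = a.corr(b)  # repeated for each date, this forms the output time series
```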

Elementwise transforms

signal.apply(function)

Apply a function to each element of the signal.

Parameters

function – The name of the function to apply. A non-exhaustive list of standard function transforms: log (logarithm), exp (exponentiation), sqrt (square root), abs (absolute value), tanh (hyperbolic tangent).

A use case for transforming data is to get better properties when creating a model. In many cases, a model performs better if the data has been transformed with the logarithm before estimating the model. Here is an example of how to transform the signal:

US_CivilianUnemploymentRate_Monthly.apply('log')

It is also possible to do this in a modelling context. Then doing something like this can work:

predict(US_CivilianUnemploymentRate_Monthly.apply('log')).apply('exp')

We apply “exp” in the end to transform the predictions back to the original scale.

Time-axis operations

signal.reindex_like(index_signal, fill_method='ffill')

Returns the signal re-indexed to the time index of index_signal, using the method given by fill_method. This is a wrapper around the pandas reindex method.

Parameters
  • index_signal – A supplied signal whose time-index is used for the resampling of signal.

  • fill_method – The operation used to align the series to index_signal before sampling. Valid values are None, ‘pad’/’ffill’, ‘backfill’/’bfill’ and ‘nearest’.

Example

The price return on earnings release dates:

close_price.relative_change(days=1).reindex_like(actual('sales',alignment='rd'))
signal.filled_daily(fetch_prior_data=120, fetch_prior_data_from=None, stop_at_last_valid_value=False, fetch_later_data=7, limit=None, allow_forward_fill_for_current_dates=False)

Transforms a signal by changing the frequency to daily and forward filling missing values.

Parameters
  • fetch_prior_data – the number of days of prior data to retrieve in order to forward fill; the default amount is sufficient for quarterly data (with quarters up to seventeen weeks), as long as there are no missing data

  • fetch_prior_data_from – the start date to use for forward filling. If set, it overrides the ‘fetch_prior_data’ argument.

  • stop_at_last_valid_value – if True values will not be forward filled after the last available non-null value

  • fetch_later_data – the number of days after the eval period to retrieve data for to determine the last available non-null value. Only used if ‘stop_at_last_valid_value’ is True.

  • limit – the maximum number of consecutive null values that are filled. If not set, all null values are filled (assuming there is a non-null value before it).

  • allow_forward_fill_for_current_dates – if True, and the difference in days between the current date and the date with the last non-NaN value is less than the given limit, values will be forward filled even if stop_at_last_valid_value is True. When evaluating with a version, the current date is assumed to be the version.

Example

Forward fill data in a monthly signal up to 35 days to make it a daily signal:

my_monthly_data.filled_daily(limit=35)
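The example above roughly corresponds to this pandas sketch on illustrative monthly data (the library also handles the evaluation-period bookkeeping controlled by the other arguments):

```python
import pandas as pd

# A monthly series upsampled to daily resolution with forward filling,
# limited to 35 consecutive filled days.
s = pd.Series([10.0, 20.0],
              index=pd.to_datetime(["2023-01-31", "2023-02-28"]))

daily = s.resample("D").ffill(limit=35)
```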
signal.align_to_dates(index_signal, max_forward=None, max_backward=None, pre_extend=None, post_extend=None)

Returns the signal with the values aligned to the dates of the index_signal.

For each value in the signal we find the date in the index_signal which is closest in time and assign that date to it, provided that it satisfies the movement constraints (given by max_forward and max_backward).

If there are two dates that are equally far away, the value is moved forwards.

If there is no date available within the movement constraints for some value, the value is discarded.

If multiple values in the signal have the same date as their closest one, only the value that is closest in time is aligned to that date; the other values are discarded. If two values are equally close, the one that would be moved forward is used, while the other is discarded.

Parameters
  • index_signal – A signal whose time-index is used for the resampling of signal.

  • max_forward – Maximum number of days a data point can be moved forwards. Default is None, which means no limit.

  • max_backward – Maximum number of days a data point can be moved backwards. Default is None, which means no limit.

  • pre_extend – Offset to pre-extend the signal evaluation period with. By default, this offset is set equal to the max_forward constraint if max_forward is not None and 1 year otherwise.

  • post_extend – Offset to post-extend the signal evaluation period with. By default, this offset is set equal to the max_backward constraint if max_backward is not None and 1 year otherwise.

Example

Align the signal ‘my_quarterly_signal’ to fundamental sales, allowing data points to move forwards 10 days and backwards 5 days:

my_quarterly_signal.align_to_dates(fundamental('sales'), max_forward=10, max_backward=5)

Align the signal ‘my_quarterly_signal’ to fundamental sales with no movement restrictions, pre-extending the evaluation period by 6 months:

my_quarterly_signal.align_to_dates(fundamental('sales'), pre_extend=pandas.DateOffset(months=6))
signal.aggregate_over(aggregate_signal, aggregation_method='sum', max_window=None, min_data_points=None)

Returns the signal with the time index of the aggregate signal, where the values of the signal are aggregated according to the given method.

The values of the aggregate_signal are not used.

Parameters
  • aggregate_signal – a supplied signal whose time index is used in the returned signal

  • aggregation_method – the method used for the aggregation (e.g. 'mean', 'sum', 'median', 'prod', 'std', 'var')

  • max_window – Maximum number of days to look back when aggregating data. This can be useful if the aggregate_signal may have missing data.

  • min_data_points – Minimum number of data points per aggregation date. If there are not enough data points, the aggregated value is set to NaN. This can be useful if the signal has missing data.

Example

Aggregating a signal data_signal (which is an arbitrary signal you have access to), taking the sum over the fiscal quarters. (The extend argument ensures that the signal also produces values for the present quarter):

data_signal.aggregate_over(fiscal_calendar(extend=1), max_window=92)
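A rough sketch of this aggregation in pandas, assuming each aggregation date closes the window that starts at the previous aggregation date (illustrative only, not the platform's implementation):

```python
import pandas as pd

def aggregate_over(values: pd.Series, agg_dates: pd.DatetimeIndex,
                   aggregation_method="sum", max_window=None,
                   min_data_points=1) -> pd.Series:
    """For each aggregation date, aggregate all values observed since the
    previous aggregation date (looking back at most max_window days),
    requiring at least min_data_points values, else NaN."""
    agg_dates = agg_dates.sort_values()
    out = {}
    prev = None
    for d in agg_dates:
        start = prev if prev is not None else pd.Timestamp.min
        if max_window is not None:
            start = max(start, d - pd.Timedelta(days=max_window))
        window = values[(values.index > start) & (values.index <= d)]
        out[d] = (getattr(window, aggregation_method)()
                  if len(window) >= min_data_points else float("nan"))
        prev = d
    return pd.Series(out)
```

With daily values 1 through 6 and aggregation dates on days 3 and 6, the 'sum' method yields 6 and 15 respectively.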

Handling missing values in weighted sums

weighted_sum(signal_1, signal_2, ..., signal_n, weights=[w_1, w_2, ..., w_n], nan_when_missing, normalize)

This function computes a weighted sum while handling possibly missing values. When all the signals have numerical values at a given time, the value of weighted_sum is

w_1*signal_1 + w_2*signal_2 + … + w_n*signal_n.

When some of the signal values are missing (NaN), the weighted sum is by default taken over only the signals with numerical values.

Parameters
  • signal_j – For j=1, 2, …, n, the signals which are combined.

  • weights – When a list of n numerical weights is supplied, these are the weights in the sum. When no weights are supplied, every w_j is set to 1.

  • nan_when_missing – If nan_when_missing=True, the value of the sum is set to missing (NaN) whenever one of the signals has a missing value (NaN); otherwise the missing values are skipped in the sum.

  • normalize – If normalize=True the weights w_j are normalized so that the active weights sum to 1.

Example

Using last reported EPS as a proxy for missing EPS estimate:

weighted_sum(estimate('eps'), actual('eps', alignment='rd'), weights=[100, 1], normalize=True)
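The NaN handling can be illustrated with a small NumPy sketch (the real weighted_sum operates on signals; the array-based version below only mirrors the arithmetic):

```python
import numpy as np

def weighted_sum(*signals, weights=None, nan_when_missing=False, normalize=False):
    """NaN-aware weighted sum sketch over equal-length, date-aligned arrays."""
    x = np.vstack([np.asarray(s, dtype=float) for s in signals])  # (n, T)
    w = np.ones(len(signals)) if weights is None else np.asarray(weights, float)
    mask = np.isnan(x)
    if nan_when_missing:
        return (w[:, None] * x).sum(axis=0)  # any NaN propagates to the result
    active = w[:, None] * ~mask              # weights of the non-missing signals
    if normalize:
        active = active / active.sum(axis=0)  # active weights sum to 1
    return np.nansum(active * np.nan_to_num(x), axis=0)
```

With weights=[100, 1] and normalize=True, the second signal only contributes materially at dates where the first is missing, which is the proxy pattern used in the EPS example above.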

Estimate changes

estimate_change(estimate, next_estimate, crossover_month)

Calculates the change in analysts’ estimates, taking account of the fact that the estimates refer to changing fiscal periods.

The estimate argument should be a signal for estimates in a given period such as this year, next year or in two years. The next_estimate argument should be a signal for estimates in the following year. The crossover_month should be a number between 1 and 12 denoting the month in which the estimate rolls over to the next fiscal year. The next_estimate and crossover_month arguments are both optional.

In all months except the crossover month the result is obtained by calculating the change in the estimate signal. In the crossover month the result is obtained by calculating the change relative to the next_estimate signal instead. If the next_estimate signal is not provided, the value is set to 0 in the crossover month.

If the crossover month is not provided, the last month in the fiscal year is used as the crossover month.

Parameters
  • estimate – a signal producing estimates for a given period

  • next_estimate – a signal producing estimates for the following period

  • crossover_month – the month in which the estimate rolls over to the next fiscal year
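The rollover logic can be sketched on monthly pandas Series (illustrative only; the real function determines the default crossover month from the fiscal year, which this sketch approximates with a fixed default of 12):

```python
import pandas as pd

def estimate_change(estimate: pd.Series, next_estimate=None, crossover_month=12):
    """Monthly change in estimates with fiscal-year rollover. Normally the
    change is estimate[t] - estimate[t-1]; in the crossover month the current
    estimate refers to a new fiscal year, so it is compared with the previous
    month's next_estimate instead (or set to 0 when none is given)."""
    change = estimate.diff()
    for i, ts in enumerate(estimate.index):
        if ts.month == crossover_month and i > 0:
            if next_estimate is None:
                change.iloc[i] = 0.0
            else:
                change.iloc[i] = estimate.iloc[i] - next_estimate.iloc[i - 1]
    return change
```

For a January crossover, the January value is compared with December's next-year estimate rather than December's current-year estimate, which avoids a spurious jump when the estimate rolls over.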

Company group normalization

signal.group_transform(transform_type, centering_type, centering_weight_signal, tag, screen_frequency)

The group_transform operation does a cross-sectional normalization of a signal across a set of companies.

Parameters
  • transform_type – the transform to use, either 'robust', 'winsorized_robust', 'standard', 'winsorized_standard', 'uniform', 'minmax' or 'identity'. Defaults to 'identity'.

  • centering_type – the centering to use, either 'weighted_mean', 'mean', 'median' or 'none'. Defaults to 'none', which results in a centering given by the transform_type.

  • centering_weight_signal – the signal used as weights when specifying 'weighted_mean' for the centering_type. Must be specified when centering_type='weighted_mean', otherwise ignored.

  • tag – the ID of the tag that defines the group of companies. Can be either a static tag or a screen (dynamic tag).

  • screen_frequency – the frequency with which to evaluate the screen, if a screen is used to define the group of companies. Defaults to 'M' for monthly evaluation. Alternatives include 'W' for weekly or 'Q' for quarterly update of the screen.

Transform types

  • 'robust' – Applies sklearn’s RobustScaler. This transform subtracts the median, and then scales the data according to the quantile range from the 25th to the 75th percentile.

  • 'winsorized_robust' – First applies sklearn’s RobustScaler, which subtracts the median and then scales the data according to the quantile range from the 25th to the 75th percentile. Then soft-clipping is performed at ±3 standard deviations, by applying the tanh function. The number of standard deviations can be customized by specifying stdev_lim.

  • 'standard' – Applies sklearn’s StandardScaler, which subtracts the mean and then scales to unit variance.

  • 'winsorized_standard' – Optionally, outliers can be removed at the very beginning, by setting the parameter q_remove to the fraction of the data that should be removed at both ends (e.g. q_remove=0.01 removes the first and the last percentiles; by default, this step is not applied). Then applies sklearn’s StandardScaler, which subtracts the mean and then scales to unit variance. Finally, soft-clipping is performed at ±3 standard deviations, by applying the tanh function. The number of standard deviations can be customized by specifying stdev_lim.

  • 'uniform' – Transforms the data to percentiles using sklearn’s QuantileTransformer. By default a uniform distribution is produced (evenly spread between 0 and 1). Alternatively, a normal distribution can be obtained by specifying output_distribution='normal'.

  • 'minmax' – Scales the data linearly to the range [-1, 1].

  • 'identity' – No transform, which means that only centering is applied. Rarely used in practice.
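As an illustration, the 'winsorized_robust' transform can be sketched in NumPy; the robust scaling step mirrors sklearn’s RobustScaler defaults (this is not the platform's implementation):

```python
import numpy as np

def winsorized_robust(values, stdev_lim=3.0):
    """Robust scaling (subtract the median, divide by the 25th-75th
    percentile range), followed by a tanh soft-clip at +/- stdev_lim."""
    x = np.asarray(values, dtype=float)
    q25, q75 = np.percentile(x, [25, 75])
    scaled = (x - np.median(x)) / (q75 - q25)
    return stdev_lim * np.tanh(scaled / stdev_lim)
```

Because of the tanh soft-clip, even an extreme outlier is mapped to just under ±stdev_lim, while values near the median are left almost unchanged.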

Example

When doing a group transform, you must specify which group of companies should be used. This is done in the following way, using the ID of a company tag:

Market_Cap_mUSD.group_transform('winsorized_robust', tag='graph:tag:user:2a46627e-4e03-49f2-808e-d6fdadebbc61')

It can also be done using the ID of a company screen. In this case, the set of companies included in the group will be updated periodically based on the criteria of the screen. By default the screen is updated monthly, but this can be changed with the screen_frequency parameter, for instance setting it to quarterly updates:

Market_Cap_mUSD.group_transform('winsorized_robust', tag='signal:dynamicTag:1265', screen_frequency='Q')

Apply the uniform transform:

Market_Cap_mUSD.group_transform('uniform', tag='graph:tag:user:2a46627e-4e03-49f2-808e-d6fdadebbc61')
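The 'uniform' transform can likewise be sketched with a simple percentile rank (sklearn’s QuantileTransformer handles ties and interpolation more carefully; this is only an approximation):

```python
import numpy as np

def uniform_transform(values):
    """Map each value to its percentile rank, spreading the data evenly
    between 0 and 1 (approximates output_distribution='uniform')."""
    x = np.asarray(values, dtype=float)
    order = x.argsort().argsort()  # rank of each value, 0 .. n-1
    return order / (len(x) - 1)    # evenly spread between 0 and 1
```

The smallest value maps to 0, the largest to 1, and the remaining values are evenly spaced in between regardless of their magnitudes.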