Resampling and aggregations

Signals which modify the timestamps and possibly the values of the time series. We use the following terminology:

reindex

To change the index of the time series without changing the values.

resample

To change the index of the time series and apply some rule to calculate new values based on the original time series.

upsample

To change the index so that it has higher resolution, for example, modifying a monthly time series to a daily time series.

downsample

To change the index so that it has lower resolution, for example modifying a daily time series to a monthly time series.

signal.resample(freq, method, *, upsample_daily=False, min_points=1, allow_partial_start=True, allow_partial_end=True, partial, ffill=0)

Resample the signal to the given frequency. The interval used for for resampling is defined by the freq parameter, and the method defines how to aggregate the signal.

Intermediate upsampling

If upsample_daily is set to True, the resampling is performed by first upsampling the underlying signal to daily frequency, and then downsampling it to the desired frequency.

If the method is sum or mean_times_days, the upsample operation divides the value in each interval by the number of days in the interval, so that the value is “spread over” the interval. The consequence of this is that when the signal is downsampled again, the contribution of each interval in the old time series is proportional to the overlap between the new and the old intervals.

If the frequency of the underlying time series cannot be detected, the time series is assumed to already have daily resolution, and it is not altered before downsampling.

Exabel recommends setting upsample_daily=True if the frequency of the original data does not evenly divide the frequency being converted to. For example, if resampling a weekly time series weekly to months, there will be weeks crossing the boundaries of the months. Similarly, if resampling from a weeks/months to a fiscal frequency (like FQ), it is advisable to first upsample to daily.

Parameters:
  • freq – The frequency to aggregate on, for example M or FQ.

  • method – Method to use for aggregation. Standard Pandas methods like 'mean', 'median', 'sum', 'bfill' (backfill) are supported, as well as the special 'mean_times_days' method. See table below.

  • upsample_daily – Whether to upsample the data to daily resolution before downsampling.

  • min_points – Minimum number of data points required within a time period for it to be included. Note that if upsample_daily=True, the data points are counted after the upsampling, so the requirement applies to the number of daily data points.

  • allow_partial_start – Whether to include the first period if it is only partially covered by the time series.

  • allow_partial_end – Whether to include the last period if it is only partially covered by the time series.

  • partial – Whether to include the first and the last period if they are only partially covered by the time series. (Included as a shortcut to set both allow_partial_start and allow_partial_end.)

  • ffill – The number of periods to forward fill the result. The forward fill is performed after aggregation. This can be used to fill gaps where there were not any or not enough data points to calculate the aggregate.

Supported methods:

method

Description

mean

The mean of the values.

median

The median of the values.

sum

The sum of the values.

mean_times_days

The mean of the values multiplied by the number of days within the period.
This is useful for handling missing data points in daily time series,
which is a common situation with alternative data time series.
The result can be seen as representing the sum over the period,
where the missing values have been imputed with the mean value.

count

The number of values.

std

The standard deviation of the values.

var

The variance of the values.

sem

The unbiased standard error of the mean.

first

The first value.

last

The last value.

min

The minimum value.

max

The maximum value.

bfill

Backfill the values. This is typically done to upsample a time series
where the dates are aligned to the end date of each period,
like 'W' or 'M' or 'Q'.

ffill

Forward fill the values. This is typically done to upsample a time series
where the dates are aligned to the start date of each period,
like 'MS' or 'QS'.

nearest

Take the value from the nearest date.

interpolate

Interpolate the values between the two nearest dates.

The statistical measures 'std', 'var' and 'sem' normalize by N-1 by default (delta degrees of freedom is 1).

The parameters min_points, allow_partial_start and allow_partial_end only apply to aggregation / downsampling. It is an error to specify any of these parameters with the upsampling methods 'bfill', 'ffill', 'nearest' or 'interpolate'.

Examples:

To calculate a monthly average from daily data:

signal.resample('M', 'mean')

To calculate a monthly sum from daily data, and not include the latest partial month:

signal.resample('M', 'sum', partial=False)

To sum up weekly data to fiscal quarters, with intermediate upsampling to daily (to handle weeks that fall into 2 quarters), and not include the latest partial quarter:

signal.resample('FQ', 'sum', upsample_daily=True, partial=False)

To sum up daily data to fiscal quarters, where missing data points are imputed with the mean, and we require at least 70 data points in each quarter, use:

signal.resample('FQ', 'mean_times_days', min_points=70)

To upsample quarterly data ('Q') to monthly, where we are certain there are no missing data points, use:

signal.resample('M', 'bfill')
signal.upsample(from_freq='auto', to_freq='D', divide=False)

Upsample the given signal to a higher resolution (daily by default). This is typically necessary as the first step when resampling between two frequencies which do not evenly divide each other.

For example, when resampling from weekly to monthly resolution, there will be weeks which overlap with two months. In such a situation, naïvely doing a sum over all the values with dates within the month, would mean that the value for week 52 of 2022, which runs from Dec 26, 2022 to Jan 1, 2023, would be included in the sum for January 2023, because the week is represented by the last day (Jan 1). The better approach would be to divide the value for that week over the seven days of the week, and include 6/7 of the value in the sum for December 2022, and only 1/7 in the sum for January 2023.

The same situation arises when resampling from either weekly or monthly time series to fiscal quarters.

The upsample logic needs to know which frequency the original data represents. The default setting of from_freq='auto' will attempt to detect the frequency of the original data, but if the frequency cannot be determined from the data, an error will be returned. If the frequency is known, it is therefore safer to specify an explicit frequency to upsample from. For instance, if a time series consists of a single data point for the date 31 Dec 2022, it is impossible to determine if that data point represents the quarter Q4 2022, the month of December 2022 or the week ending 31 Dec 2022.

The 'auto' setting only works for regular calendars such as weekly or monthly data, not for fiscal calendars.

Parameters:
  • from_freq – Either auto or a frequency, such as M or FQ. as well as the fiscal calendars like 'FQ' or 'FQ/FS' or 'FY'.

  • to_freq – The frequency to upsample to.

  • divide – Whether to divide the value by the number of days in the period. If set to False, the original value will be used for all dates within the corresponding period. If set to True, the original value will be divided by the number of days in the period (so that e.g. signal.upsample('M').resample('M', 'sum') brings back the original time series). Generally, if the data represents a sum of some sort, then divide should be set to True, whereas if it represents an average, it should be set to False.

Examples:

To upsample weekly data to daily, where each daily value is the value of the corresponding week:

signal.upsample('W')

To upsample monthly sums to daily numbers, where each daily value is the monthly value divided by the number of days in the month:

signal.upsample('M', divide=True)

To upsample a quarterly time series to monthly, where each monthly value is the value of the corresponding quarter:

signal.upsample('Q', 'M')

To resample monthly sums to fiscal quarters, where we also handle partial data with 'mean_times_days', use:

signal.upsample('M', divide=True).resample('FQ', 'mean_times_days')

To resample monthly averages to fiscal quarter averages, use:

signal.upsample('M').resample('FQ', 'mean')
signal.calendarize(freq: str, method: str = 'blended', from_freq: str = None)

Calendarize financial data to the standard calendar.

This is typically used to compare KPIs across companies that use different fiscal calendars. The dates used in the calendarized output are as follows:

freq

Dates

Q

31 March, 30 June, 30 September and 31 December

S

30 June and 31 December

Y

31 December

Three different algorithms for calendarization are supported:

method

Description

blended

Default method. Recommended for “flow” KPIs such as revenue.
Calculated as a blend based on the number of days of overlap between the original
periods and the calendar periods. For example, if a company ends its financial year
in November, the calendarized value for 2022 is 334/365 of the value for the fiscal
year 2022 plus 31/365 of the value for the fiscal year 2023.

fiscal

Recommended for “stock” KPIs such as balance sheet items, where it is desired
to take the value from the closest fiscal period.
This uses the company’s fiscal calendar to ensure a 1-1 match between fiscal
periods and calendar periods, such that each fiscal period is aligned to at most
one calendar period.
However, this works only on signals that have the same frequency as the company’s
fiscal calendar (quarterly/semi-annual). For signals with other frequencies
(daily/weekly etc), use nearest.

nearest

Uses Pandas’ resampling algorithm to use the nearest value for each calendarized
data point. This is similar to nearest, but has a drawback where companies with
irregular calendars may see two quarters mapped to the same calendar period.
This is equivalent to signal.resample(freq, 'nearest').

Parameters:
  • freq – The frequency to calendarize to: 'Q' (quarterly), 'S' (semi-annual) or 'Y' (annual). The aliases 'H' (for 'S') and 'A' (for 'Y') are also supported.

  • method – How the calendarization is perfomed. The options are 'blended', 'nearest' and 'fiscal'.

  • from_freq – Optionally, a hint about the frequency of the original time series. This is only required if the frequency cannot be detected from the time series itself.

signal.reindex_like(index_signal, fill_method='ffill')

Returns the signal re-indexed so that it has the same time-index as index_signal given the method fill_method. This is a wrapper around the pandas reindex method.

Parameters:
  • index_signal – A supplied signal whose time-index is used for the resampling of signal.

  • fill_method – The operation used to align series to index_signal before sampling. Valid values are ‘None’, ‘pad’/’ffill’, ‘backfill’/’bfill’, ‘nearest’.

Example:

The price return on earnings release dates:

close_price.relative_change(days=1).reindex_like(fs_actual('sales',alignment='rd'))
signal.filled_daily(fetch_prior_data=120, fetch_prior_data_from=None, stop_at_last_valid_value=False, fetch_later_data=7, limit=None, allow_forward_fill_for_current_dates=False, *, fill_value=None)

Transforms a signal by changing the frequency to daily and forward filling missing values.

Parameters:
  • fetch_prior_data – the number of days of prior data to retrieve in order to forward fill; the default amount is sufficient for quarterly data (with quarters up to seventeen weeks), as long as there are no missing data

  • fetch_prior_data_from – the start date to use for forward filling. If set, it overrides the fetch_prior_data argument.

  • stop_at_last_valid_value – if True, values will not be forward filled after the last available non-null value

  • fetch_later_data – the number of days after the eval period to retrieve data for to determine the last available non-null value. Only used if stop_at_last_valid_value is True.

  • limit – the maximum number of consecutive null values that are filled. If not set, all null values are filled (assuming there is a non-null value before it). If set, must be set to 1 or higher.

  • allow_forward_fill_for_current_dates – if True, and the difference in days between the current date and the date with the last non-NaN value is less than the given limit, values will be forward filled even if stop_at_last_valid_value is True. When evaluating with a version, the current date is assumed to be the version.

  • fill_value – By default, the previous value is forward filled. If fill_value is specified, then this value will be used instead. The dates filled are exactly the same, and all the other parameters such as limit and allow_forward_fill_for_current_dates apply as usual.

Examples:

Forward fill data in a monthly signal up to 35 days to make it a daily signal:

my_monthly_data.filled_daily(limit=35)

Fill in missing values with 0, up to 6 days forwards, but not past the last value in the series:

my_daily_signal.filled_daily(limit=6, fill_value=0, stop_at_last_valid_value=True)
signal.align_to_dates(index_signal, max_forward=None, max_backward=None, pre_extend=None, post_extend=None)

Returns the signal with the values aligned to the dates of the index_signal.

For each value in the signal we find the date in the index_signal which is closest in time and assigns that date to it, provided that it satisfies the movement constraints (given by max_forward and max_backward).

If there are two dates that are equally far away, the value is moved forwards.

If there is no date available within the movement constraints for some value, the value is discarded.

If there are multiple values in the signal that have the same the date as its closest one, the value that is closest in time is aligned to the date; the other values are discarded. If there is a tie between two dates, the one that would be move forward is used, while the other one is discarded.

Parameters:
  • index_signal – A signal whose time-index is used for the resampling of signal.

  • max_forward – Maximum number of days a data point can be moved forwards. Default is None, which means no limit.

  • max_backward – Maximum number of days a data point can be moved backwards. Default is None, which means no limit.

  • pre_extend – Offset to pre-extend the signal evaluation period with. By default, this offset is set equal to the max_forward constraint if max_forward is not None and 1 year otherwise.

  • post_extend – Offset to post-extend the signal evaluation period with. By default, this offset is set equal to the max_backward constraint if max_backward is not None and 1 year otherwise.

Examples:

Align the signal my_quarterly_signal to fundamental sales, allowing data points to move forwards 10 days and backwards 5 days:

my_quarterly_signal.align_to_dates(fundamental('sales'), max_forward=10, max_backward=5)

Align the signal my_quarterly_signal to fundamental sales, allowing data points to move unrestricted:

my_quarterly_signal.align_to_dates(fundamental('sales'),\
    pre_extend=pandas.DateOffset(months=6))