Extract data

Functions to extract particular data from a signal.

Select an interval of a signal

property signal.loc

Retrieve a single data point or a range of data points in an interval based on timestamps. The input dates are provided in square brackets.

If the input is a single date, e.g. signal.loc['2023-12-31'], the result is a single scalar value (i.e. without the timestamp). This can, for example, be used to normalize a time series to be 1 at a specific point in time, by writing:

signal / signal.loc['2000-01-01']

If the input is an interval, e.g. signal.loc['2024-01-01':'2024-12-31'], the result is a time series consisting of the values in the given interval. Both endpoints are inclusive.
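The description above reads much like label-based indexing on a pandas Series. The following is a minimal sketch under that assumption: the pandas Series and its values are hypothetical stand-ins for a signal, not the platform's own implementation.

```python
import pandas as pd

# Hypothetical daily series standing in for a signal (assumed data).
dates = pd.date_range("2024-01-01", periods=5, freq="D")
signal = pd.Series([2.0, 4.0, 6.0, 8.0, 10.0], index=dates)

# A single timestamp yields a bare scalar, without the timestamp.
single = signal.loc[pd.Timestamp("2024-01-02")]

# A date interval yields a series; both endpoints are included.
interval = signal.loc["2024-01-02":"2024-01-04"]

# Normalize the series so that it equals 1 on the chosen date.
normalized = signal / signal.loc[pd.Timestamp("2024-01-01")]
```

Here interval contains three data points (2, 3 and 4 January), which illustrates that the end date is part of the selection.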

property signal.iloc

Retrieve a single data point or a range of data points based on integer indexes.

The input indexes are provided in square brackets.

If the input is a single integer, e.g. signal.iloc[5], the result is a single scalar value (i.e. without the timestamp).

If the input is an interval, e.g. signal.iloc[10:20], the result is a time series consisting of the values with the given indexes. In this case, the first endpoint is inclusive and the last one is exclusive.

The index is 0-based, so iloc[0] refers to the first data point in the time series, iloc[1] is the second and so on. If a negative integer is provided, the data points are counted from the end, with iloc[-1] referring to the last data point, iloc[-2] to the penultimate and so on.

It is also possible to specify a step as a third argument in order to return every nth data point, e.g. signal.iloc[10:20:4].

Examples:

Select the first data point as a scalar value:

signal.iloc[0]

Select the last data point as a scalar value:

signal.iloc[-1]

Select all but the first and last data points:

signal.iloc[1:-1]

Select every other data point, including the first one:

signal.iloc[::2]
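The indexing rules above match Python slice semantics. As a rough illustration, the same selections can be tried on a pandas Series, which is assumed here to behave like a signal for this purpose; the data is made up.

```python
import pandas as pd

# Hypothetical series of 30 daily values 0..29 (assumed data).
dates = pd.date_range("2024-01-01", periods=30, freq="D")
signal = pd.Series(range(30), index=dates)

first = signal.iloc[0]         # scalar: first data point
last = signal.iloc[-1]         # scalar: last data point
inner = signal.iloc[1:-1]      # all but the first and last data points
every_other = signal.iloc[::2] # positions 0, 2, 4, ...
stepped = signal.iloc[10:20:4] # positions 10, 14 and 18 (20 is excluded)
```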

signal.at_time(time: str)

Return a time series which consists only of the single data point at the requested time.

Parameters: time – A string specifying a timestamp. This can be either a specific date such as 2024-05-31 or a fiscal period. See Time arguments.

signal.at(time: str)

Return the value at the given time as a scalar value.

Since this transformation returns a scalar value, it cannot be plotted in Plotter, but it can be included in calculations involving other time series. For example:

signal / signal.at('2024-01-01')

scales the time series down so that it has the value 1.0 at 1 Jan 2024.

Parameters: time – A string specifying a timestamp. This can be either a specific date such as 2024-05-31 or a fiscal period. See Time arguments.
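The difference between at_time and at is the return type: a one-point time series versus a scalar. The sketch below mimics this with a pandas Series as an assumed stand-in for a signal. Note that pandas has its own at_time method with a different meaning (selecting by time of day), so plain loc lookups are used here instead.

```python
import pandas as pd

# Hypothetical daily series standing in for a signal (assumed data).
dates = pd.date_range("2024-01-01", periods=3, freq="D")
signal = pd.Series([50.0, 55.0, 60.0], index=dates)
ts = pd.Timestamp("2024-01-01")

# Analogue of signal.at_time('2024-01-01'): a series with one data point.
one_point = signal.loc[[ts]]

# Analogue of signal.at('2024-01-01'): a bare scalar.
scalar = signal.loc[ts]

# The scalar can be used in calculations with the full series.
rescaled = signal / scalar
```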

Select time series from multi-time-series signal

Some signals return multiple time series and you may only be interested in some of them. This is relevant for entity-independent signals that return multiple time series, and for entity-dependent signals that return multiple time series per entity. In these cases the different time series have different names, and these can be used to select the time series you are interested in.

Exact match

You can select time series by specifying the full name of the time series inside square brackets. Multiple columns are selected by providing a tuple of strings. The selection is case-insensitive.

Examples:

Select the beta column of the underlying signal:

my_signal['beta']

Select both the alpha and beta columns:

my_signal[('alpha', 'beta')]
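To make the case-insensitive exact match concrete, here is a small pandas sketch. The helper select_columns and the DataFrame are hypothetical and only imitate the behaviour described above; in particular, whether a single-name selection returns one series or a one-column table is an assumption.

```python
import pandas as pd

# Hypothetical multi-column data standing in for a multi-time-series signal.
frame = pd.DataFrame({"Alpha": [1.0, 2.0], "Beta": [3.0, 4.0], "Gamma": [5.0, 6.0]})

def select_columns(df, names):
    """Hypothetical helper: pick columns by full name, ignoring case."""
    if isinstance(names, str):
        names = (names,)
    wanted = {name.lower() for name in names}
    # Keep only columns whose full name matches one of the requested names.
    return df.loc[:, [col for col in df.columns if col.lower() in wanted]]

beta_only = select_columns(frame, "beta")
alpha_beta = select_columns(frame, ("alpha", "beta"))
```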

Substring and regex filtering

Alternatively, you can select columns with the filter_columns method, which matches columns based on a substring or regex search.

signal.filter_columns(pattern: str, *, case: bool = True, regex: bool = True)

Parameters:

pattern – The pattern to search for.
case – Whether the search is case sensitive.
regex – Whether the pattern is treated as a regular expression.

Examples:

Select all columns that contain the substring beta:

my_signal.filter_columns('beta')

Select all columns that contain the substring beta regardless of case (e.g. BETA or BeTa):

my_signal.filter_columns('beta', case=False)

Select all columns that contain either alpha or beta:

my_signal.filter_columns('alpha|beta')

Select all columns that contain the character | (since it has a special meaning in regular expressions, regex search must be disabled):

my_signal.filter_columns('|', regex=False)
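The examples above can be reproduced with Python's re module, which is assumed here to behave like the regex search used by filter_columns. The function below is a hypothetical stand-in operating on a pandas DataFrame with made-up column names, not the platform's implementation.

```python
import re

import pandas as pd

# Hypothetical columns standing in for a multi-time-series signal.
frame = pd.DataFrame({"alpha": [1.0], "BETA_1y": [2.0], "beta_3y": [3.0], "a|b": [4.0]})

def filter_columns(df, pattern, *, case=True, regex=True):
    """Hypothetical stand-in: keep columns whose name contains the pattern."""
    flags = 0 if case else re.IGNORECASE
    # With regex disabled, escape the pattern so it is matched literally.
    compiled = re.compile(pattern if regex else re.escape(pattern), flags)
    keep = [col for col in df.columns if compiled.search(col)]
    return df.loc[:, keep]

beta = filter_columns(frame, "beta")
beta_any_case = filter_columns(frame, "beta", case=False)
alpha_or_beta = filter_columns(frame, "alpha|beta")
literal_pipe = filter_columns(frame, "|", regex=False)
```

In this sketch, the pattern '|' with regex search enabled matches the empty string and therefore selects every column, which is why the literal search needs regex=False.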

Note

If you are working with data obtained by traversing relationships in the entity graph, you should use the graph filtering functionality if possible, rather than the bracket syntax. This avoids evaluating the underlying signal for entities you are not interested in. If you use square brackets or the filter_columns operation, the filtering happens after the underlying signal is evaluated.

Top and bottom n columns

If there are many columns, it can be useful to select the top or bottom n columns based on either the last value or the aggregation of the values across the evaluation period.

signal.top_n(n: int, func: str = 'last', *, show_other=False)

Select the top n columns for each evaluation entity.

The func argument specifies how the ranking value is calculated from each column and can be any of:

last
max
min
mean
median

When using last, we first find the last date where any column has data, and then pick the value from each column on that date. This means that columns which do not have data on that date will be left out.

Note that which columns are chosen depends on the evaluation time period. Signal transformations may extend or modify the evaluation period, so using this signal as an underlying signal in other transformations may lead to unexpected results. Also, different parts of the app may evaluate signals with different time periods.

Parameters:

n – The number of columns to return.
func – The function to use when selecting the ranking value from each column.
show_other – Whether to include an 'Other' time series containing the sum of all the excluded columns when there are more than n+1 columns. If True and there are exactly n+1 columns, all the columns are included, rather than turning a single column into the 'Other' column.
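The selection rules can be sketched in pandas for a single entity. Everything below is a hypothetical illustration of the description above: the function, the data, the treatment of missing values in 'Other', and the reading of "more than n+1 columns" as a count of all columns are assumptions, not the platform's implementation.

```python
import pandas as pd

def top_n(df, n, func="last", *, show_other=False):
    """Hypothetical pandas analogue of signal.top_n for one entity."""
    if func == "last":
        # Last date on which any column has data; columns without
        # a value on that date are left out of the ranking.
        last_date = df.dropna(how="all").index[-1]
        ranking = df.loc[last_date].dropna()
    else:
        # 'max', 'min', 'mean' or 'median' across the evaluation period.
        ranking = df.agg(func).dropna()
    chosen = list(ranking.sort_values(ascending=False).index[:n])
    excluded = [col for col in df.columns if col not in chosen]
    result = df[chosen]
    if show_other and len(df.columns) > n + 1:
        # Sum the excluded columns into a single 'Other' column.
        result = result.assign(Other=df[excluded].sum(axis=1, min_count=1))
    elif show_other and len(df.columns) == n + 1:
        # Only one column is left over: show it as is.
        result = df[chosen + excluded]
    return result

# Assumed example data: column E has no value on the last date.
dates = pd.date_range("2024-01-01", periods=3, freq="D")
sales = pd.DataFrame(
    {
        "A": [1.0, 2.0, 9.0],
        "B": [5.0, 6.0, 7.0],
        "C": [8.0, 8.0, 3.0],
        "D": [4.0, 4.0, 1.0],
        "E": [10.0, 10.0, float("nan")],
    },
    index=dates,
)

by_last = top_n(sales, 2, show_other=True)
by_max = top_n(sales, 2, "max")
```

With this data, ranking by 'last' picks A and B because E has no value on the last date, whereas ranking by 'max' picks E and A. A bottom_n analogue would sort the ranking in ascending order instead.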

signal.bottom_n(n: int, func: str = 'last', *, show_other=False)

Select the bottom n columns for each evaluation entity.

See the signal.top_n method for more information.

Parameters:

n – The number of columns to return.
func – The function to use when selecting the ranking value from each column.
show_other – Whether to include an 'Other' time series containing the sum of all the excluded columns when there are more than n+1 columns. If True and there are exactly n+1 columns, all the columns are included, rather than turning a single column into the 'Other' column.

Example:

Show the sales of the top 3 brands, along with the rest aggregated into an 'Other' time series:

data('ns.sales').for_type('ns.brand').top_n(3, 'last', show_other=True)