Selecting columns

Some signals return multiple time series, and you may only be interested in some of them. This applies both to entity-independent signals that return multiple time series and to entity-dependent signals that return multiple time series per entity. In these cases each time series has its own name, which you can use to select the ones you are interested in.

However, if you are working with data obtained by traversing relationships in the graph, use the graph filtering functionality where possible. It avoids evaluating the underlying signal for entities that are not wanted in the result. With square brackets or filter_columns, by contrast, the filtering happens only after the underlying signal has been evaluated.
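The difference between filtering before and after evaluation can be illustrated with a small, self-contained sketch. It does not use the signal library: evaluate_signal, the entity names and the call log are hypothetical stand-ins, and the only point is to count how many evaluations each approach triggers.

```python
# Hypothetical stand-ins: none of these names come from the signal library.
evaluated = []  # records which entities the "signal" was evaluated for


def evaluate_signal(entity):
    """Pretend to run an expensive signal evaluation for one entity."""
    evaluated.append(entity)
    return {"entity": entity, "value": len(entity)}


segments = ["ns.americas", "ns.emea", "ns.apac"]
wanted = {"ns.americas"}

# Filter first (the graph filtering approach): only wanted entities are evaluated.
evaluated.clear()
filtered_first = [evaluate_signal(e) for e in segments if e in wanted]
calls_filter_first = len(evaluated)

# Evaluate everything, then filter (the square brackets / filter_columns approach).
evaluated.clear()
everything = [evaluate_signal(e) for e in segments]
filtered_after = [r for r in everything if r["entity"] in wanted]
calls_filter_after = len(evaluated)

print(calls_filter_first, calls_filter_after)
```

Both approaches produce the same result here, but the second one evaluates every segment before discarding most of them.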

Exact match

You can select time series by specifying the full name of the time series inside square brackets. To select multiple columns, provide a tuple of strings. The selection is case-insensitive.

Examples

Select the beta column of the underlying signal:

my_signal['beta']

Select both the alpha and beta columns:

my_signal[('alpha', 'beta')]

Select a specific segment (see the page on graph filtering):

data('signal').for_type('ns.segment').graph_filter('ns.segment', 'ns.americas')
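As a rough illustration of the exact-match rule (full name, case-insensitive, one string or a tuple of strings), the sketch below applies the same matching to a plain list of column names. The helper select_columns and the column names are hypothetical, and returning matches in the original column order is an assumption; the library's ordering is not stated here.

```python
def select_columns(columns, names):
    """Return the columns whose full name equals one of `names`, ignoring case."""
    if isinstance(names, str):
        names = (names,)  # treat a single name like a one-element tuple
    wanted = {name.lower() for name in names}
    return [col for col in columns if col.lower() in wanted]


columns = ["Alpha", "beta", "beta_adj", "gamma"]

print(select_columns(columns, "BETA"))             # full-name match only
print(select_columns(columns, ("alpha", "beta")))  # tuple selects several columns
```

Note that 'beta_adj' is not selected by 'beta': only full names match, which is what distinguishes this from substring filtering.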

Substring and regex filtering

Alternatively you can select columns with the filter_columns method, which matches columns based on a substring or regex search.

signal.filter_columns(pattern: str, *, case: bool = True, regex: bool = True)

Parameters:

pattern – The pattern to search for.
case – Whether the search is case sensitive.
regex – Whether the pattern is treated as a regular expression.

Examples

Select all columns that contain the substring beta:

my_signal.filter_columns('beta')

Select all columns that contain the substring beta regardless of case (e.g. BETA or BeTa):

my_signal.filter_columns('beta', case=False)

Select all columns that contain either alpha or beta:

my_signal.filter_columns('alpha|beta')

Select all columns that contain the character | (since it has a special meaning in regular expressions, regex search must be disabled):

my_signal.filter_columns('|', regex=False)
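The interaction of the case and regex parameters can be sketched in plain Python with the standard re module. This is an approximation of the matching semantics described above, not the library's implementation: the free function filter_columns and the column names are hypothetical, and it assumes the pattern is applied as a search anywhere in the column name.

```python
import re


def filter_columns(columns, pattern, *, case=True, regex=True):
    """Keep the column names in which `pattern` is found anywhere."""
    if not regex:
        pattern = re.escape(pattern)  # treat every character literally
    flags = 0 if case else re.IGNORECASE
    matcher = re.compile(pattern, flags)
    return [col for col in columns if matcher.search(col)]


columns = ["alpha", "beta_1", "BETA_2", "a|b", "gamma"]

print(filter_columns(columns, "beta"))              # case-sensitive substring
print(filter_columns(columns, "beta", case=False))  # ignores case
print(filter_columns(columns, "alpha|beta"))        # regex alternation
print(filter_columns(columns, "|", regex=False))    # literal '|' character
print(filter_columns(columns, "|"))                 # as a regex, '|' matches every name
```

The last call shows why regex search must be disabled for a literal '|': in Python's re module, a bare '|' is an alternation of two empty patterns, which matches any string.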

Top and bottom n columns

If there are many columns, it can be useful to select the top or bottom n columns based on either the last value or an aggregation of the values across the evaluation period.

signal.top_n(n: int, func: str = 'last', *, show_other=False)

Selects the top n columns for each evaluation entity.

The func argument specifies how the ranking value is calculated from each column and can be any of:

last
max
min
mean
median

When using last, we first find the last date where any column has data, and then pick the value from each column on that date. This means that columns which do not have data on that date will be left out.
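The rule for last can be sketched as follows. This is a minimal illustration, assuming each column is represented as a dictionary from ISO date strings to values and that missing data is simply an absent key; the helper last_values and the brand names are hypothetical.

```python
def last_values(series_by_column):
    """Map each column to its value on the last date where any column has data.

    Columns without a value on that date are left out of the result.
    """
    all_dates = [d for series in series_by_column.values() for d in series]
    if not all_dates:
        return {}
    last_date = max(all_dates)  # ISO date strings sort chronologically
    return {
        col: series[last_date]
        for col, series in series_by_column.items()
        if last_date in series
    }


data = {
    "brand_a": {"2024-01-31": 10.0, "2024-02-29": 12.0},
    "brand_b": {"2024-01-31": 20.0, "2024-02-29": 18.0},
    "brand_c": {"2024-01-31": 30.0},  # no data on the last date
}

print(last_values(data))
```

Here brand_c has the largest value in January but is not ranked at all, because it has no data on the last date.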

Note that which columns are chosen depends on the evaluation time period. Signal transformations may extend or modify the evaluation period, so using this signal as an underlying signal in other transformations may lead to unexpected results. Also, different parts of the app may evaluate signals over different time periods.
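To see why the evaluation period matters, consider the following sketch, which ranks two columns by their mean over two different windows. The helper top_column, the dates and the values are all made up for illustration, and the data layout (dictionaries keyed by ISO date strings) is an assumption.

```python
from statistics import mean


def top_column(series_by_column, start, end):
    """Return the column with the highest mean value within [start, end]."""
    ranking = {}
    for col, series in series_by_column.items():
        window = [value for date, value in series.items() if start <= date <= end]
        if window:
            ranking[col] = mean(window)
    return max(ranking, key=ranking.get)


data = {
    "brand_a": {"2023-12-31": 50.0, "2024-01-31": 10.0, "2024-02-29": 10.0},
    "brand_b": {"2023-12-31": 5.0, "2024-01-31": 20.0, "2024-02-29": 20.0},
}

short_period = top_column(data, "2024-01-01", "2024-02-29")
long_period = top_column(data, "2023-12-01", "2024-02-29")
print(short_period, long_period)
```

Extending the window by a single month changes which column ranks first, which is the kind of surprise a transformation that widens the evaluation period can cause.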

Parameters:

n – The number of columns to return.
func – The function to use when selecting the ranking value from each column.
show_other – Whether to include an 'Other' time series containing the sum of all the excluded columns when there are more than n+1 columns. If True and there are exactly n+1 columns, all the columns are included rather than aggregating a single column into the 'Other' column.

signal.bottom_n(n: int, func: str = 'last', *, show_other=False)

Selects the bottom n columns for each evaluation entity.

See the signal.top_n(..) method for more information.

Parameters:

n – The number of columns to return.
func – The function to use when selecting the ranking value from each column.
show_other – Whether to include an 'Other' time series containing the sum of all the excluded columns when there are more than n+1 columns. If True and there are exactly n+1 columns, all the columns are included rather than aggregating a single column into the 'Other' column.

Examples

Show the sales of the top 3 brands, along with the rest aggregated into an 'Other' time series:

data('ns.sales').for_type('ns.brand').top_n(3, 'last', show_other=True)
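The show_other rule, including the n+1 special case, can be sketched on a plain dictionary of ranking values. This is an illustration under assumptions rather than the library's implementation: the free function top_n, its return shape (kept columns plus the columns that would be summed into 'Other') and the brand names are hypothetical, and tie handling is not addressed.

```python
def top_n(ranking, n, *, show_other=False):
    """Split columns into those kept and those summed into 'Other'.

    `ranking` maps column name -> ranking value (for example the last value).
    """
    ordered = sorted(ranking, key=ranking.get, reverse=True)
    if show_other and len(ordered) == n + 1:
        # Exactly n+1 columns: keep them all instead of an 'Other' of one column.
        return ordered, []
    kept, excluded = ordered[:n], ordered[n:]
    if not show_other:
        return kept, []  # excluded columns are simply dropped
    return kept, excluded


five = {"brand_a": 5.0, "brand_b": 4.0, "brand_c": 3.0, "brand_d": 2.0, "brand_e": 1.0}
four = {name: value for name, value in five.items() if name != "brand_e"}

print(top_n(five, 3, show_other=True))  # three brands kept, two go into 'Other'
print(top_n(four, 3, show_other=True))  # exactly n+1 columns: all four kept
print(top_n(five, 3))                   # no 'Other' series
```

A bottom_n equivalent would only differ in the sort direction.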