Selecting columns
Some signals return multiple time series and you may only be interested in some of them. This is relevant for entity-independent signals that return multiple time series, and for entity-dependent signals that return multiple time series per entity. In these cases the different time series have different names, and these can be used to select the time series you are interested in.
However, if you are working with data obtained by traversing relationships in the graph, it is recommended to use the graph filtering functionality where possible. Graph filtering avoids evaluating the underlying signal for entities that are not wanted in the result, whereas with square brackets or filter_columns the filtering happens only after the underlying signal has been evaluated.
Exact match
You can select time series by specifying the full name of the time series inside square brackets. To select multiple columns, provide a tuple of strings. The selection is case-insensitive.
Examples
Select the beta column of the underlying signal:
my_signal['beta']
Select both the alpha and beta columns:
my_signal[('alpha', 'beta')]
Select a specific segment (see the page on graph filtering):
data('signal').for_type('ns.segment').graph_filter('ns.segment', 'ns.americas')
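The exact-match rules above (full name, case-insensitive, a single string or a tuple of strings) can be modelled in plain Python. This is only an illustrative sketch and not the library's implementation: the helper select_exact and the sample column names are hypothetical, and returning the matches in their original column order is an assumption.

```python
def select_exact(columns, names):
    """Return the columns whose full name equals one of `names`, ignoring case.

    Hypothetical helper that mimics my_signal['beta'] and
    my_signal[('alpha', 'beta')] on a plain list of column names.
    """
    if isinstance(names, str):
        names = (names,)  # a single name behaves like a one-element tuple
    wanted = {name.lower() for name in names}
    # Assumption: matches are returned in the original column order.
    return [col for col in columns if col.lower() in wanted]


columns = ["alpha", "Beta", "beta_adj", "gamma"]
print(select_exact(columns, "beta"))             # full-name match only
print(select_exact(columns, ("alpha", "beta")))  # tuple selects several
```

Note that 'beta_adj' is not selected by 'beta' in this model, because exact matching compares the whole name rather than searching for a substring.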
Substring and regex filtering
Alternatively, you can select columns with the filter_columns method, which matches columns based on a substring or regex search.
- signal.filter_columns(pattern: str, *, case: bool = True, regex: bool = True)
- Parameters
pattern – The pattern to search for.
case – Whether the search is case sensitive.
regex – Whether the pattern is treated as a regular expression.
Examples
Select all columns that contain the substring beta:
my_signal.filter_columns('beta')
Select all columns that contain the substring beta regardless of case (e.g. BETA or BeTa):
my_signal.filter_columns('beta', case=False)
Select all columns that contain either alpha or beta:
my_signal.filter_columns('alpha|beta')
Select all columns that contain the character | (since it has a special meaning in regular expressions, regex search must be disabled):
my_signal.filter_columns('|', regex=False)
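How the case and regex flags interact can be sketched with Python's standard re module. This is a model of the documented behaviour and not the library's code: the function filter_columns_sketch and the column names are hypothetical, and it assumes that "contains" corresponds to re.search semantics.

```python
import re


def filter_columns_sketch(columns, pattern, *, case=True, regex=True):
    """Return the column names that contain `pattern` (hypothetical model)."""
    flags = 0 if case else re.IGNORECASE
    if not regex:
        # Treat the pattern as a literal substring by escaping metacharacters.
        pattern = re.escape(pattern)
    matcher = re.compile(pattern, flags)
    # Assumption: a column matches if the pattern occurs anywhere in its name.
    return [col for col in columns if matcher.search(col)]


columns = ["alpha", "beta", "BETA_adj", "a|b", "gamma"]
print(filter_columns_sketch(columns, "beta"))
print(filter_columns_sketch(columns, "beta", case=False))
print(filter_columns_sketch(columns, "alpha|beta"))
print(filter_columns_sketch(columns, "|", regex=False))
```

In this model, leaving regex enabled for the pattern '|' would match every column, because an empty alternation matches the empty string. That illustrates why regex search has to be disabled when searching for a literal | character.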
Top and bottom n columns
If there are many columns, it can be useful to select the top or bottom n columns based on either the last value or an aggregation of the values across the evaluation period.
- signal.top_n(n: int, func: str = 'last', *, show_other=False)
Selects the top n columns for each evaluation entity.
The func argument specifies how the ranking value is calculated from each column and can be any of: 'last', 'max', 'min', 'mean', 'median'.
When using 'last', we first find the last date where any column has data, and then pick the value from each column on that date. This means that columns which do not have data on that date will be left out.
Note that which columns are chosen depends on the evaluation time period. Signal transformations may extend or modify the evaluation period, so using this signal as an underlying signal in other transformations may lead to unexpected results. Also, different parts of the app may evaluate signals with different time periods.
- Parameters
n – The number of columns to return.
func – The function to use when selecting the ranking value from each column.
show_other – Whether to include an 'Other' time series containing the sum of all the excluded columns when there are more than n+1 columns. If True and there are exactly n+1 columns, we include all the columns rather than aggregating a single column as the 'Other' column.
- signal.bottom_n(n: int, func: str = 'last', *, show_other=False)
Selects the bottom n columns for each evaluation entity.
See the signal.top_n(..) method for more information.
- Parameters
n – The number of columns to return.
func – The function to use when selecting the ranking value from each column.
show_other – Whether to include an 'Other' time series containing the sum of all the excluded columns when there are more than n+1 columns. If True and there are exactly n+1 columns, we include all the columns rather than aggregating a single column as the 'Other' column.
Examples
Show the sales of the top 3 brands, along with the rest aggregated into an 'Other' time series:
data('ns.sales').for_type('ns.brand').top_n(3, 'last', show_other=True)
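The ranking and show_other rules described for top_n can be modelled on plain dictionaries to make the edge cases concrete. This is a rough sketch under stated assumptions, not the library's implementation: top_n_sketch, ranking_values and the sample data are hypothetical, each column is represented as a dict from a sortable date string to a value, and columns without a ranking value are assumed to be left out of both the result and 'Other'.

```python
from statistics import mean, median

AGGREGATORS = {"max": max, "min": min, "mean": mean, "median": median}


def ranking_values(columns, func="last"):
    """Map each column name to its ranking value (hypothetical model)."""
    if func == "last":
        # Last date on which ANY column has data.
        last_date = max(date for series in columns.values() for date in series)
        # Columns without data on that date get no ranking value.
        return {name: series[last_date]
                for name, series in columns.items() if last_date in series}
    return {name: AGGREGATORS[func](list(series.values()))
            for name, series in columns.items() if series}


def top_n_sketch(columns, n, func="last", *, show_other=False):
    values = ranking_values(columns, func)
    ranked = sorted(values, key=values.get, reverse=True)
    kept, excluded = ranked[:n], ranked[n:]
    if show_other and len(excluded) == 1:
        # Exactly n+1 columns: keep them all rather than a one-column 'Other'.
        return {name: columns[name] for name in ranked}
    result = {name: columns[name] for name in kept}
    if show_other and excluded:
        other = {}
        for name in excluded:  # sum the excluded columns date by date
            for date, value in columns[name].items():
                other[date] = other.get(date, 0) + value
        result["Other"] = other
    return result


sales = {
    "brand_a": {"2024-01": 10, "2024-02": 12},
    "brand_b": {"2024-01": 30, "2024-02": 5},
    "brand_c": {"2024-01": 7, "2024-02": 8},
    "brand_d": {"2024-01": 1, "2024-02": 2},
    "brand_e": {"2024-01": 50},  # no data on the last date
}
print(list(top_n_sketch(sales, 2)))         # ranked by the value on 2024-02
print(list(top_n_sketch(sales, 2, "max")))  # ranked by each column's maximum
print(top_n_sketch(sales, 2, show_other=True)["Other"])
```

In this model brand_e is ignored when ranking by 'last' because it has no value on the last date, but it ranks first when using 'max'. A corresponding model of bottom_n would sort in ascending order instead.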