Graph traversal and filtering

The page on Exabel Data API signals describes how data can be fetched using the data(...) and graph_signal(...) constructs. This page describes a newer, simpler way of fetching data from the graph.

Graph traversal

A common operation is to fetch data for all entities connected to the evaluation entity. The signal.for_type(entity_type) construct does this: it fetches data for all entities of the given type that are connected to the evaluation entity. To find these entities, we follow the shortest path(s) in the combined data model of the evaluation entity’s namespace, the target entity type’s namespace and the global namespace.
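The idea of following the shortest path(s) through a data model can be sketched in plain Python. This is only a toy illustration, not how the platform is implemented: the DATA_MODEL dictionary, its entity types and the shortest_paths helper are all hypothetical, and the data model is treated as an undirected graph of entity types.

```python
from collections import deque

# Hypothetical data model: entity types are nodes, relationships are undirected edges.
DATA_MODEL = {
    "company": ["ns.brand", "ns.store"],
    "ns.brand": ["company", "ns.product"],
    "ns.store": ["company", "ns.product"],
    "ns.product": ["ns.brand", "ns.store"],
}


def shortest_paths(model, start, target):
    """Return every shortest path of entity types from start to target."""
    paths = []
    best = None  # length of the first (and therefore shortest) path found
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        if best is not None and len(path) > best:
            break  # breadth-first order: all remaining paths are longer
        node = path[-1]
        if node == target:
            best = len(path)
            paths.append(path)
            continue
        for neighbour in model.get(node, []):
            if neighbour not in path:  # never revisit a type on the same path
                queue.append(path + [neighbour])
    return paths


print(shortest_paths(DATA_MODEL, "company", "ns.product"))
```

In this toy model there are two equally short routes from company to ns.product (via ns.brand and via ns.store), which is why the text speaks of shortest path(s) in the plural.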

signal.for_type(entity_type, ...)

Retrieve data for a specific entity type.

Parameters:

entity_type – the entity type to fetch the signal for, in the form namespace.entity_type.

Although a single entity type is usually enough, it is possible to pass multiple entity types to the .for_type(..) method. We then follow a path through all the specified entity types in the data model, and the last entity type listed is the one for which the underlying signal is evaluated.

The signal:

data('ns.metric').for_type('ns.brand', 'ns.product')

is equivalent to:

data('ns.metric').for_type('ns.product').for_type('ns.brand')
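One way to read this equivalence is that the outermost .for_type(..) call is the first step taken from the evaluation entity. The sketch below models that reading in plain Python. Everything in it is hypothetical (the LINKS graph, the METRIC values, and the data and for_type functions), and it uses a single number per entity instead of a time series.

```python
# Hypothetical toy graph: (entity, target type) -> directly connected entities.
LINKS = {
    ("company_a", "ns.brand"): ["brand_x", "brand_y"],
    ("brand_x", "ns.product"): ["product_1"],
    ("brand_y", "ns.product"): ["product_2", "product_3"],
}

# Hypothetical raw values of 'ns.metric', one number per product.
METRIC = {"product_1": 10, "product_2": 20, "product_3": 30}


def data(values):
    """Model a signal as a function: entity -> {entity: value}."""
    return lambda entity: {entity: values[entity]} if entity in values else {}


def for_type(signal, *entity_types):
    """Follow the entity types in order, then evaluate `signal` at the end."""
    def evaluate(entity):
        frontier = [entity]
        for entity_type in entity_types:
            frontier = [
                neighbour
                for current in frontier
                for neighbour in LINKS.get((current, entity_type), [])
            ]
        result = {}
        for reached in frontier:
            result.update(signal(reached))
        return result
    return evaluate


multi = for_type(data(METRIC), "ns.brand", "ns.product")
chained = for_type(for_type(data(METRIC), "ns.product"), "ns.brand")
print(multi("company_a") == chained("company_a"))
```

In both forms the traversal goes from the company to its brands and then to their products, and the metric is evaluated on the products.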

Examples

Retrieve the time series for signals/namespace.brand_metric for all the brands connected to the evaluation entities:

data('namespace.brand_metric').for_type('namespace.brand')

Sum all the time series for the products sold by the evaluation entity:

data('namespace.products_sold').for_type('namespace.product').sum()

Graph filtering

Sometimes one is not interested in all the connected entities of the specific type, but only a subset. We therefore support filtering the entities for which we fetch data down to only those entities connected to some other set of entities.

A use case could be to fetch sales data for only those products which are sold in the “Clothes” category.

signal.graph_filter(entity_type, entities)

Restrict the data to those entities that are connected to the given entities of a specific entity type.

Parameters:

entity_type – the entity type to filter by, in the form namespace.entity_type.
entities – the entities to filter by, either as a single entity namespace.entity or as a list ['namespace.entity1', 'namespace.entity2'].

Examples

Retrieve all the data for products sold in the “Clothes” category:

data('ns.products_sold').for_type('ns.product').graph_filter('ns.category', 'ns.clothes')

Retrieve all the transactions in stores in a set of countries:

data('ns.transactions').for_type('ns.company_country').graph_filter('ns.country', ['ns.no', 'ns.se'])
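The filtering step can be pictured as keeping only the entities that are connected to at least one of the given entities. The sketch below is a toy model in plain Python, not the platform's implementation: CATEGORIES_OF, PRODUCTS_SOLD and this graph_filter function are hypothetical stand-ins, and each product carries a single number instead of a time series.

```python
# Hypothetical toy graph: each product and the categories it is connected to.
CATEGORIES_OF = {
    "ns.jeans": {"ns.clothes"},
    "ns.sneakers": {"ns.shoes"},
    "ns.jacket": {"ns.clothes", "ns.outdoor"},
}

# Hypothetical values of 'ns.products_sold', one number per product.
PRODUCTS_SOLD = {"ns.jeans": 5, "ns.sneakers": 7, "ns.jacket": 11}


def graph_filter(values, connections, entities):
    """Keep only the entities connected to at least one of `entities`.

    `entities` may be a single entity name or a list of names,
    mirroring the two forms accepted by the documented parameter.
    """
    wanted = {entities} if isinstance(entities, str) else set(entities)
    return {
        entity: value
        for entity, value in values.items()
        if connections.get(entity, set()) & wanted
    }


print(graph_filter(PRODUCTS_SOLD, CATEGORIES_OF, "ns.clothes"))
print(graph_filter(PRODUCTS_SOLD, CATEGORIES_OF, ["ns.shoes", "ns.outdoor"]))
```

The first call keeps the jeans and the jacket. The second keeps the sneakers and the jacket, since matching any one of the listed entities is enough in this model.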

Advanced filtering

In some cases, when performing the signal.graph_filter(..) operation, one can leave out the signal.for_type(..) traversal. We will then attempt to identify an associative entity type between the evaluation entity type and the filtered entity type.

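For example, suppose the data model contains an associative ns.company_and_occupation entity type that links companies and occupations. It is then sufficient to evaluate the following signal for a company to get the time series for the “Teacher” occupation: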

data('ns.jobs').graph_filter('ns.occupation', 'ns.teacher')

Here the request is for data about the evaluation company and the “Teacher” occupation, so we infer that the data must be fetched from the associative ns.company_and_occupation entity type.
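The inference can be thought of as a lookup for an entity type that combines both the evaluation entity type and the filtered entity type. The following is a minimal sketch under stated assumptions: the ASSOCIATIVE_TYPES table and the infer_associative_type helper are hypothetical, and returning None when there is no match or more than one match is an assumption of this sketch, not documented behaviour.

```python
# Hypothetical data model: associative entity type -> the entity types it combines.
ASSOCIATIVE_TYPES = {
    "ns.company_and_occupation": {"company", "ns.occupation"},
    "ns.category_country_channel": {"ns.category", "ns.country", "ns.channel"},
}


def infer_associative_type(evaluation_type, filter_type):
    """Return the one associative type that combines both types, else None."""
    matches = [
        name
        for name, parts in ASSOCIATIVE_TYPES.items()
        if {evaluation_type, filter_type} <= parts
    ]
    # Assumption of this sketch: give up when there is no match or several.
    return matches[0] if len(matches) == 1 else None


print(infer_associative_type("company", "ns.occupation"))
```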

If the associative entity is a combination of multiple entity types, one can also apply multiple filters:

data('ns.transactions').for_type('ns.category_country_channel').graph_filter('ns.category', 'ns.shoes').graph_filter('ns.channel', 'ns.online')
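Chaining filters in this way narrows the set of associative entities step by step, so that only the combinations matching every filter remain. The sketch below illustrates that with plain Python dictionaries. The COMBINATIONS table, the TRANSACTIONS values and this graph_filter function are hypothetical, and each combination carries a single number instead of a time series.

```python
# Hypothetical associative entities of type 'ns.category_country_channel',
# each described by the category, country and channel it combines.
COMBINATIONS = {
    "ns.shoes_no_online": {"ns.category": "ns.shoes", "ns.country": "ns.no", "ns.channel": "ns.online"},
    "ns.shoes_se_store": {"ns.category": "ns.shoes", "ns.country": "ns.se", "ns.channel": "ns.store"},
    "ns.hats_no_online": {"ns.category": "ns.hats", "ns.country": "ns.no", "ns.channel": "ns.online"},
}

# Hypothetical values of 'ns.transactions', one number per combination.
TRANSACTIONS = {
    "ns.shoes_no_online": 100,
    "ns.shoes_se_store": 250,
    "ns.hats_no_online": 40,
}


def graph_filter(values, entity_type, entities):
    """Keep the combinations whose `entity_type` component is in `entities`."""
    wanted = {entities} if isinstance(entities, str) else set(entities)
    return {
        name: value
        for name, value in values.items()
        if COMBINATIONS[name][entity_type] in wanted
    }


# Apply the two filters one after the other, as in the chained signal.
shoes = graph_filter(TRANSACTIONS, "ns.category", "ns.shoes")
shoes_online = graph_filter(shoes, "ns.channel", "ns.online")
print(shoes_online)
```

Only the combination that is both in the shoes category and in the online channel survives both filters.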

Grouping

Aggregation through grouping works similarly to the existing .group_by_entity(..) function, except that one must provide an entity type instead of a path. We then identify and follow the shortest path(s) in the data model to find the entities that we want to group by.

signal.graph_group_by(entity_type, operation)

Group and aggregate the time series by a specific entity type.

Parameters:

entity_type – the entity type to group by, in the form namespace.entity_type.
operation – the aggregation operation, either "sum" or "mean".

Examples

Get the sales of products, summed by category:

data('ns.products_sold').for_type('ns.product').graph_group_by('ns.category', 'sum')
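The grouping step can be illustrated with a small plain-Python model. It is only a sketch: CATEGORY_OF, SERIES and this graph_group_by function are hypothetical, each product is assumed to belong to exactly one category, and dates missing from a series are simply skipped, which is an assumption of this sketch rather than documented behaviour.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy graph: the category each product belongs to.
CATEGORY_OF = {
    "ns.jeans": "ns.clothes",
    "ns.jacket": "ns.clothes",
    "ns.sneakers": "ns.shoes",
}

# Hypothetical time series of 'ns.products_sold', keyed by month.
SERIES = {
    "ns.jeans": {"2024-01": 5.0, "2024-02": 6.0},
    "ns.jacket": {"2024-01": 11.0, "2024-02": 10.0},
    "ns.sneakers": {"2024-01": 7.0},
}


def graph_group_by(series_by_entity, group_of, operation):
    """Aggregate the time series per group, date by date."""
    if operation not in ("sum", "mean"):
        raise ValueError("operation must be 'sum' or 'mean'")
    # group -> date -> list of values observed on that date
    buckets = defaultdict(lambda: defaultdict(list))
    for entity, series in series_by_entity.items():
        for date, value in series.items():
            buckets[group_of[entity]][date].append(value)
    aggregate = sum if operation == "sum" else mean
    return {
        group: {date: aggregate(values) for date, values in sorted(dates.items())}
        for group, dates in buckets.items()
    }


print(graph_group_by(SERIES, CATEGORY_OF, "sum"))
print(graph_group_by(SERIES, CATEGORY_OF, "mean"))
```

The result contains one time series per category rather than one per product.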