Entity relationships
Relationships link entities together. For example, a HAS_BRAND relationship can link companies to brands. Some relationships exist in the global namespace in the Exabel platform, but it is also possible to create relationships through the Data API.
Following relationships
As explained in Signals from the Exabel Data API, the graph_signal function can be used to traverse relationships and retrieve time series uploaded using the Data API.
A signal is always evaluated for some root entity, for example a company. The graph signal accepts a path argument, which is a list of relationships to be traversed, starting at the root entity.
If the path argument is empty (or left out altogether), we have no relationships to traverse, and the signal produces time series that are associated with the root entity.
If the path argument is provided, it has to be a list of relationships, which will be traversed in order, starting from the root entity. Each relationship must be provided either as a string with the name of the relationship type or as a dict specifying the name of the relationship and additional restrictions. See Signals from the Exabel Data API for a detailed explanation of the different options.
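To make the two accepted forms of a path element concrete, here is a minimal Python sketch that normalizes a path into a uniform list of dicts. It is an illustration under stated assumptions, not Exabel code: the helpers normalize_step and normalize_path are hypothetical, and the dict keys simply mirror the ones used in the examples in this section.

```python
from typing import Optional, Union

# Hypothetical helpers for illustration only; not part of the Exabel DSL or SDK.
PathElement = Union[str, dict]


def normalize_step(step: PathElement) -> dict:
    """Turn one path element (a string or a dict) into a dict with explicit defaults."""
    if isinstance(step, str):
        # A bare string is just the name of the relationship type.
        step = {"relationship_type": step}
    if "relationship_type" not in step:
        raise ValueError("each path element needs a 'relationship_type'")
    return {
        "relationship_type": step["relationship_type"],
        "direction": step.get("direction"),        # 'IN', 'OUT' or None (either way)
        "target_types": step.get("target_types"),  # None: no entity type restriction
        "tag": step.get("tag"),                    # None: no tag restriction
    }


def normalize_path(path: Optional[list] = None) -> list:
    """An empty or missing path yields no steps: the signal stays on the root entity."""
    return [normalize_step(step) for step in (path or [])]


steps = normalize_path(
    ["namespace.IS_COMPETITOR_OF",
     {"relationship_type": "namespace.IS_SUPPLIER_OF", "direction": "IN"}]
)
# steps[0]["direction"] is None; steps[1]["direction"] == "IN"
```

Normalizing first keeps the traversal logic free of special cases for the string shorthand.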
As an example, suppose there is a relationship of the type relationshipTypes/namespace.IS_SUPPLIER_OF which connects company entities. If you want to evaluate a Data API signal for all of a company’s suppliers, you would use the signal:
graph_signal('namespace.signal', [{'relationship_type': 'namespace.IS_SUPPLIER_OF', 'direction': 'IN'}])
If you instead want to evaluate it for the companies which the root company is a supplier of, you would use 'OUT' instead of 'IN'. If you do not care about the direction of the relationship (i.e. you want to find all companies the root company has any sort of supplier relationship with), you can set the direction to None or leave it out.
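The meaning of 'IN', 'OUT' and None can be illustrated on a toy in-memory graph. In this sketch, which is not how Exabel stores or traverses its graph, an edge (A, 'namespace.IS_SUPPLIER_OF', B) is assumed to mean that A is a supplier of B; the company names and the neighbours helper are hypothetical.

```python
# Toy edge list for illustration (hypothetical companies, not Exabel data).
# An edge (a, rel, b) is read as "a IS_SUPPLIER_OF b".
EDGES = [
    ("acme", "namespace.IS_SUPPLIER_OF", "globex"),
    ("initech", "namespace.IS_SUPPLIER_OF", "globex"),
    ("globex", "namespace.IS_SUPPLIER_OF", "umbrella"),
]


def neighbours(entity, relationship_type, direction=None, edges=EDGES):
    """Entities one hop from `entity` along `relationship_type`.

    'IN'  follows edges that point into `entity` (here: its suppliers),
    'OUT' follows edges that point out of `entity` (here: its customers),
    None  follows edges in either direction.
    """
    if direction not in ("IN", "OUT", None):
        raise ValueError("direction must be 'IN', 'OUT' or None")
    found = set()
    for source, rel, target in edges:
        if rel != relationship_type:
            continue
        if direction in ("IN", None) and target == entity:
            found.add(source)
        if direction in ("OUT", None) and source == entity:
            found.add(target)
    return found


suppliers = neighbours("globex", "namespace.IS_SUPPLIER_OF", "IN")   # {'acme', 'initech'}
customers = neighbours("globex", "namespace.IS_SUPPLIER_OF", "OUT")  # {'umbrella'}
either = neighbours("globex", "namespace.IS_SUPPLIER_OF")            # all three companies
```

With no direction, the root's suppliers and customers are mixed together, which is why the direction matters whenever a relationship is asymmetric.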
Note that instead of strings representing Data API signals, you can also provide DSL expressions. To obtain the close price of the suppliers of the root company, you could use:
graph_signal(close_price, [{'relationship_type': 'namespace.IS_SUPPLIER_OF', 'direction': 'IN'}])
Specifying acceptable entity types in a path relationship is useful when a single relationship type connects entities of different types. As an example, suppose there are entities of type entityTypes/geo_segment and entityTypes/business_segment, and relationships of type relationshipTypes/namespace.HAS_SEGMENT that connect companies to either geographical or business segments. If you want to evaluate a signal for only the geographical segments of a company, you would use the signal:
graph_signal('namespace.signal', [{'relationship_type': 'namespace.HAS_SEGMENT', 'target_types': ['geo_segment']}])
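The effect of target_types can be sketched as a filter applied to the entities reached in one hop. The following toy model is an assumption-laden illustration, not Exabel's implementation: the segment names, the ENTITY_TYPE table and the follow helper are hypothetical, and for brevity only outgoing edges are followed.

```python
# Hypothetical toy data: each segment entity has an entity type, and
# namespace.HAS_SEGMENT edges point from a company to its segments.
ENTITY_TYPE = {
    "acme_europe": "geo_segment",
    "acme_asia": "geo_segment",
    "acme_hardware": "business_segment",
}
EDGES = [
    ("acme", "namespace.HAS_SEGMENT", "acme_europe"),
    ("acme", "namespace.HAS_SEGMENT", "acme_asia"),
    ("acme", "namespace.HAS_SEGMENT", "acme_hardware"),
]


def follow(entity, relationship_type, target_types=None):
    """Outgoing neighbours of `entity`, optionally restricted to some entity types."""
    targets = {t for s, r, t in EDGES if s == entity and r == relationship_type}
    if target_types is None:
        return targets  # no restriction: every segment, geographical or business
    return {t for t in targets if ENTITY_TYPE.get(t) in target_types}


geo_only = follow("acme", "namespace.HAS_SEGMENT", target_types=["geo_segment"])
all_segments = follow("acme", "namespace.HAS_SEGMENT")
```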
You can also restrict entities by tag membership. If you only want to get segments that are in the tag tags/user:123, you would use the signal:
graph_signal('namespace.signal', [{'relationship_type': 'namespace.HAS_SEGMENT', 'tag': 'tags/user:123'}])
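A tag restriction can be modelled the same way, as an intersection with the tag's members. This is only a sketch under the assumption that a tag behaves like a named set of entities; the TAGS table, the segment names and the follow helper are hypothetical.

```python
# Hypothetical toy data: a tag is modelled as a named set of member entities.
TAGS = {"tags/user:123": {"acme_europe", "acme_hardware"}}
EDGES = [
    ("acme", "namespace.HAS_SEGMENT", "acme_europe"),
    ("acme", "namespace.HAS_SEGMENT", "acme_asia"),
    ("acme", "namespace.HAS_SEGMENT", "acme_hardware"),
]


def follow(entity, relationship_type, tag=None):
    """Outgoing neighbours of `entity`, optionally restricted to members of one tag."""
    targets = {t for s, r, t in EDGES if s == entity and r == relationship_type}
    if tag is None:
        return targets
    return targets & TAGS.get(tag, set())  # keep only entities that are in the tag


tagged = follow("acme", "namespace.HAS_SEGMENT", tag="tags/user:123")
```

In this toy model an unknown tag simply yields no entities.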
You can also traverse more than one relationship. If you want the suppliers of a company’s competitors, the expression might be:
graph_signal('namespace.signal', ['namespace.IS_COMPETITOR_OF', {'relationship_type': 'namespace.IS_SUPPLIER_OF', 'direction': 'IN'}])
(Note that these signals and relationships are hypothetical examples. The relationships used do not exist out of the box in the graph, but if you have access to the Exabel Data API you can create such relationships yourself, or you may use relationships created by a data provider you subscribe to.)
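The multi-step path from the competitors example can be sketched as repeatedly replacing a set of current entities with the entities one hop further along the next path element. This toy model is a hypothetical illustration: the company names, step and traverse are not Exabel APIs, and collapsing each hop into a set of distinct entities is an assumption of the sketch.

```python
# Hypothetical toy graph; (a, rel, b) reads "a rel b".
EDGES = [
    ("globex", "namespace.IS_COMPETITOR_OF", "hooli"),
    ("initech", "namespace.IS_COMPETITOR_OF", "globex"),
    ("acme", "namespace.IS_SUPPLIER_OF", "hooli"),
    ("vandelay", "namespace.IS_SUPPLIER_OF", "initech"),
    ("stark", "namespace.IS_SUPPLIER_OF", "globex"),  # supplies the root itself
]


def step(entities, relationship_type, direction=None):
    """One hop from every entity in `entities` along `relationship_type`."""
    reached = set()
    for source, rel, target in EDGES:
        if rel != relationship_type:
            continue
        if direction in ("IN", None) and target in entities:
            reached.add(source)
        if direction in ("OUT", None) and source in entities:
            reached.add(target)
    return reached


def traverse(root, path):
    """Apply the path elements in order, starting from the root entity."""
    frontier = {root}  # an empty path leaves us on the root entity
    for element in path:
        if isinstance(element, str):
            element = {"relationship_type": element}
        frontier = step(frontier, element["relationship_type"], element.get("direction"))
    return frontier


path = ["namespace.IS_COMPETITOR_OF",
        {"relationship_type": "namespace.IS_SUPPLIER_OF", "direction": "IN"}]
suppliers_of_competitors = traverse("globex", path)  # {'acme', 'vandelay'}
```

Note that stark, which supplies the root company directly, is not reached: the second step starts from the competitors, not from the root.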
Aggregating over entities
There are two ways of aggregating time series across entities. The first way is to aggregate up to the root entities. Assume some companies are connected to some entities representing segments, and those segments have time series associated with them. You can then sum all the segment time series for each company using:
graph_signal('ns.segment_sales', ['namespace.HAS_SEGMENT']).sum()
Aggregation functions mean, median, min and max are also supported:
graph_signal('ns.segment_sales', ['namespace.HAS_SEGMENT']).mean()
graph_signal('ns.segment_sales', ['namespace.HAS_SEGMENT']).median()
graph_signal('ns.segment_sales', ['namespace.HAS_SEGMENT']).min()
graph_signal('ns.segment_sales', ['namespace.HAS_SEGMENT']).max()
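Conceptually, aggregating up to the root combines the segment series of each company period by period. The sketch below models this with plain dictionaries keyed by period; the data, HAS_SEGMENT table and aggregate_to_root helper are hypothetical, and how Exabel handles periods where some segments have no value is not stated here, so the sketch simply aggregates whatever values are present.

```python
import statistics

# Hypothetical toy data: companies -> segments, and one series per segment,
# stored as {period: value}. Not Exabel data or Exabel's implementation.
HAS_SEGMENT = {"acme": ["acme_europe", "acme_asia"], "globex": ["globex_us"]}
SEGMENT_SALES = {
    "acme_europe": {"2023-Q1": 10.0, "2023-Q2": 12.0},
    "acme_asia": {"2023-Q1": 5.0, "2023-Q2": 8.0},
    "globex_us": {"2023-Q1": 7.0, "2023-Q2": 9.0},
}
AGGREGATIONS = {
    "sum": sum,
    "mean": statistics.mean,
    "median": statistics.median,
    "min": min,
    "max": max,
}


def aggregate_to_root(root, operation):
    """Combine the series of all segments of `root` into one series, period by period."""
    aggregate = AGGREGATIONS[operation]
    series = [SEGMENT_SALES[s] for s in HAS_SEGMENT.get(root, [])]
    periods = sorted({p for ts in series for p in ts})
    # Assumption: a period uses whichever segment values are present for it.
    return {p: aggregate([ts[p] for ts in series if p in ts]) for p in periods}


acme_total = aggregate_to_root("acme", "sum")  # {'2023-Q1': 15.0, '2023-Q2': 20.0}
```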
The second way of aggregating is to follow additional relationships from the entities that carry the time series, and then aggregate the time series for each entity at the end of that path. As an example, say you have companies with brands, and time series representing the sales numbers of each brand. Let's say the brands are in turn connected to coarser-grained category entities, and you want to know the aggregated sales numbers for each category. You would then use:
graph_signal('ns.sales_numbers', ['namespace.HAS_BRAND'], False).group_by_entity(['namespace.HAS_CATEGORY'], 'sum')
The group_by_entity call then means that the namespace.HAS_CATEGORY relationship should be followed from each brand entity, and that the time series should be summed for each category. A mean operation is also supported.
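The grouping step can be sketched on plain dictionaries: follow the category relationship from each brand, bucket the brand series by category and period, then sum or average each bucket. Everything here is hypothetical (group_by_related_entity, the brand and category names, the data), and the choice that a brand linked to several categories contributes to each of them is an assumption of the sketch, not documented Exabel behaviour.

```python
import statistics
from collections import defaultdict

# Hypothetical toy data: one sales series per brand, and brand -> category edges.
BRAND_SALES = {
    "brand_a": {"2023-Q1": 4.0, "2023-Q2": 6.0},
    "brand_b": {"2023-Q1": 2.0, "2023-Q2": 3.0},
    "brand_c": {"2023-Q1": 9.0, "2023-Q2": 1.0},
}
HAS_CATEGORY = {"brand_a": ["snacks"], "brand_b": ["snacks"], "brand_c": ["drinks"]}


def group_by_related_entity(series_by_entity, relation, operation):
    """Follow `relation` from each entity and aggregate the series per reached entity."""
    if operation not in ("sum", "mean"):
        raise ValueError("operation must be 'sum' or 'mean'")
    aggregate = sum if operation == "sum" else statistics.mean
    # category -> period -> list of values contributed by its brands
    buckets = defaultdict(lambda: defaultdict(list))
    for entity, series in series_by_entity.items():
        for group in relation.get(entity, []):  # assumption: one contribution per edge
            for period, value in series.items():
                buckets[group][period].append(value)
    return {
        group: {period: aggregate(values) for period, values in sorted(by_period.items())}
        for group, by_period in buckets.items()
    }


by_category = group_by_related_entity(BRAND_SALES, HAS_CATEGORY, "sum")
# {'snacks': {'2023-Q1': 6.0, '2023-Q2': 9.0}, 'drinks': {'2023-Q1': 9.0, '2023-Q2': 1.0}}
```

Keeping the brand identity until the grouping step is what makes the per-category bucketing possible, which matches the requirement that the base graph_signal keeps its leaf entities.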
Note that when the graph_signal is used as the base of the .group_by_entity(…) operation, the leaf_entity_as_label argument must be set to False.
Usage
- signal.group_by_entity(path, operation, leaf_entity_as_label=True)
Follow path from each entity that the signal produces time series for, and aggregate those time series for each entity at the end of the path.
- Parameters
signal – a signal which creates time series for entities other than the root entities, typically a graph_signal
path – a list of relationships to traverse
operation (str) – the aggregation operation, either sum or mean
leaf_entity_as_label (bool) – whether to drop leaf entities from the result and only keep the leaf entity display name as the signal label.