Entity relationships

Relationships link entities together. For example, a HAS_BRAND relationship can link companies to brands. Some relationships exist in the global namespace on the Exabel platform, but you can also create relationships through the Data API.

Following relationships

As explained in Signals from the Exabel Data API, the graph_signal can be used to traverse relationships and retrieve time series uploaded using the Data API.

A signal is always evaluated for some root entity, for example a company. The graph signal accepts a path argument, which is a list of relationships to be traversed, starting at the root entity.

If the path argument is empty (or left out altogether), we have no relationships to traverse, and the signal produces time series that are associated with the root entity.

If the path argument is provided, it has to be a list of relationships, which will be traversed in order, starting from the root entity. Each relationship must be provided either as a string with the name of the relationship type or as a dict specifying the name of the relationship and additional restrictions. See Signals from the Exabel Data API for a detailed explanation of the different options.

As an example, suppose there is a relationship of the type relationshipTypes/namespace.IS_SUPPLIER_OF which connects company entities. If you want to evaluate a Data API signal for all of a company’s suppliers, you would use the signal:

graph_signal('namespace.signal', [{'relationship_type': 'namespace.IS_SUPPLIER_OF', 'direction': 'IN'}])

If you instead want to evaluate it for the companies that the root company is a supplier of, you would use 'OUT' instead of 'IN'. If you do not care about the direction of the relationship (i.e. you want to find all companies the root company has any sort of supplier relationship with), you can set the direction to None or leave it out.
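To make the direction semantics concrete, here is a minimal Python sketch of a single traversal step. It is a toy model, not the Exabel implementation: the EDGES list, the company names and the follow helper are all made up for illustration, and the only thing taken from the text above is the meaning of 'IN', 'OUT' and None.

```python
# Toy model (assumption): each edge is (from_entity, relationship_type, to_entity).
# An IS_SUPPLIER_OF edge points from the supplier to the company it supplies.
EDGES = [
    ("acme_steel", "namespace.IS_SUPPLIER_OF", "root_co"),
    ("root_co", "namespace.IS_SUPPLIER_OF", "big_retailer"),
]


def follow(entity, relationship_type, direction=None):
    """Return the entities reached from `entity` over one relationship type."""
    reached = []
    for source, rel, target in EDGES:
        if rel != relationship_type:
            continue
        # 'OUT': the edge starts at `entity`, so we land on its target.
        if direction in ("OUT", None) and source == entity:
            reached.append(target)
        # 'IN': the edge points at `entity`, so we land on its source.
        if direction in ("IN", None) and target == entity:
            reached.append(source)
    return reached


suppliers = follow("root_co", "namespace.IS_SUPPLIER_OF", "IN")
customers = follow("root_co", "namespace.IS_SUPPLIER_OF", "OUT")
either = follow("root_co", "namespace.IS_SUPPLIER_OF")  # direction left out
```

In this toy graph, suppliers contains only acme_steel, customers contains only big_retailer, and either contains both.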

Note that instead of strings representing Data API signals, you can also provide DSL expressions. To obtain the close price of the suppliers of the root company, you could use:

graph_signal(close_price, ['namespace.IS_SUPPLIER_OF'])

Specifying acceptable entity types in a path relationship may be useful if you have a single relationship type between entities of different types. As an example, suppose there are entities of type entityTypes/geo_segment and entityTypes/business_segment and relationships of type relationshipTypes/namespace.HAS_SEGMENT that connect companies to either geographical or business segments. If you want to evaluate a signal for only the geographical segments of a company, you would use the signal:

graph_signal('namespace.signal', [{'relationship_type': 'namespace.HAS_SEGMENT', 'target_types': ['geo_segment']}])

You can also restrict entities by tag membership. If you only want to get segments that are in the tag tags/user:123, you would use the signal:

graph_signal('namespace.signal', [{'relationship_type': 'namespace.HAS_SEGMENT', 'tag': 'tags/user:123'}])
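The two restrictions above can be pictured as filters applied to the entities a relationship leads to. The following Python sketch is only an illustration under assumed data: the segment names, the ENTITY_TYPE and TAGS lookups and the segments_of helper are hypothetical, and applying both filters together when both are given is an assumption of this sketch rather than documented behaviour.

```python
# Hypothetical data: the segments one company has, each segment's entity type,
# and the members of one tag.
HAS_SEGMENT = {"root_co": ["europe", "asia", "consumer_goods"]}
ENTITY_TYPE = {
    "europe": "geo_segment",
    "asia": "geo_segment",
    "consumer_goods": "business_segment",
}
TAGS = {"tags/user:123": {"europe", "consumer_goods"}}


def segments_of(company, target_types=None, tag=None):
    """Follow HAS_SEGMENT from `company`, keeping only entities that pass the filters."""
    reached = []
    for segment in HAS_SEGMENT.get(company, []):
        # Drop entities whose type is not in the accepted list.
        if target_types is not None and ENTITY_TYPE[segment] not in target_types:
            continue
        # Drop entities that are not members of the tag.
        if tag is not None and segment not in TAGS.get(tag, set()):
            continue
        reached.append(segment)
    return reached


geo_only = segments_of("root_co", target_types=["geo_segment"])
tagged_only = segments_of("root_co", tag="tags/user:123")
```

Here geo_only keeps the two geographical segments, while tagged_only keeps the two segments that are members of the tag, regardless of their type.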

You can also traverse more than one relationship. If you want the suppliers of a company’s competitors, the expression might be:

graph_signal('namespace.signal', ['namespace.IS_COMPETITOR_OF', {'relationship_type': 'namespace.IS_SUPPLIER_OF', 'direction': 'IN'}])
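As a rough mental model of how a multi-step path is resolved, the Python sketch below walks a toy graph one relationship at a time. It is not how the platform is implemented: the EDGES data, the company names and the traverse helper are invented for illustration, and the sketch does not remove duplicate entities.

```python
# Toy graph (assumption): (from_entity, relationship_type, to_entity).
EDGES = [
    ("root_co", "namespace.IS_COMPETITOR_OF", "rival_co"),
    ("parts_inc", "namespace.IS_SUPPLIER_OF", "rival_co"),
    ("chips_ltd", "namespace.IS_SUPPLIER_OF", "rival_co"),
    ("acme_steel", "namespace.IS_SUPPLIER_OF", "root_co"),
]


def traverse(root, path):
    """Apply each step of `path` in order, starting from `root`."""
    frontier = [root]
    for step in path:
        # A bare string is treated as a step with no extra restrictions.
        if isinstance(step, str):
            step = {"relationship_type": step}
        rel_type = step["relationship_type"]
        direction = step.get("direction")
        next_frontier = []
        for entity in frontier:
            for source, rel, target in EDGES:
                if rel != rel_type:
                    continue
                if direction in ("OUT", None) and source == entity:
                    next_frontier.append(target)
                if direction in ("IN", None) and target == entity:
                    next_frontier.append(source)
        # The entities reached by this step are the starting points of the next.
        frontier = next_frontier
    return frontier


competitor_suppliers = traverse("root_co", [
    "namespace.IS_COMPETITOR_OF",
    {"relationship_type": "namespace.IS_SUPPLIER_OF", "direction": "IN"},
])
```

The first step moves from root_co to rival_co, and the second step collects rival_co's suppliers, so root_co's own supplier is not part of the result. With an empty path the function returns just the root entity, mirroring the case where there are no relationships to traverse.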

(Note that these signals and relationships are hypothetical examples. The relationships used do not exist out of the box in the graph, but if you have access to the Exabel Data API you can create such relationships yourself, or you may use relationships created by a data provider you subscribe to.)

Aggregating over entities

There are two ways of aggregating time series across entities. The first way is to aggregate up to the root entities. Assume some companies are connected to some entities representing segments, and those segments have time series associated with them. You can then sum all the segment time series for each company using:

graph_signal('ns.segment_sales', ['namespace.HAS_SEGMENT']).sum()

The aggregation functions mean, median, min and max are also supported:

graph_signal('ns.segment_sales', ['namespace.HAS_SEGMENT']).mean()
graph_signal('ns.segment_sales', ['namespace.HAS_SEGMENT']).median()
graph_signal('ns.segment_sales', ['namespace.HAS_SEGMENT']).min()
graph_signal('ns.segment_sales', ['namespace.HAS_SEGMENT']).max()
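Conceptually, aggregating up to the root entity combines the segment time series date by date into a single series per company. The Python sketch below illustrates that idea with made-up numbers. The segment names, the values and the aggregate helper are hypothetical, and skipping a segment on dates where it has no value is an assumption of this sketch, not a statement about how the platform treats missing data.

```python
from statistics import mean, median

# Hypothetical segment sales for one company, keyed by date.
segment_sales = {
    "europe": {"2023-03-31": 10.0, "2023-06-30": 12.0},
    "asia": {"2023-03-31": 4.0, "2023-06-30": 6.0},
    "americas": {"2023-03-31": 7.0, "2023-06-30": 9.0},
}

AGGREGATORS = {"sum": sum, "mean": mean, "median": median, "min": min, "max": max}


def aggregate(series_by_entity, operation):
    """Combine several time series into one, date by date."""
    combine = AGGREGATORS[operation]
    dates = sorted({d for series in series_by_entity.values() for d in series})
    return {
        # Assumption: a series without a value on `date` is simply skipped.
        date: combine([s[date] for s in series_by_entity.values() if date in s])
        for date in dates
    }


company_total = aggregate(segment_sales, "sum")
company_median = aggregate(segment_sales, "median")
```

With these numbers, company_total is 21.0 and 27.0 for the two dates, and company_median is 7.0 and 9.0.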

The second way of aggregating is to follow additional relationships from the entities with the time series, and then aggregate across the entities at the end of the path. As an example, say you have companies with brands, and time series representing the sales numbers of each brand. Let's say the brands are in turn connected to a more coarse-grained category entity, and you want to know the aggregated sales numbers for each category. You would then use:

graph_signal('ns.sales_numbers', ['namespace.HAS_BRAND'], False)
    .group_by_entity(['namespace.HAS_CATEGORY'], 'sum')

Here, group_by_entity means that the namespace.HAS_CATEGORY relationship is followed from each brand entity, and the time series are summed for each category. A mean operation is also supported.

Note that when the graph_signal is used as the base of the .group_by_entity(…) operation, the leaf_entity_as_label argument must be set to False.
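The grouping step can be thought of as re-keying each brand's time series by the category it belongs to and then combining the series that share a category. The Python sketch below shows this with invented data. The brand and category names, the HAS_CATEGORY lookup and the group_by_category helper are hypothetical and only mimic the 'sum' and 'mean' operations described above.

```python
# Hypothetical brand-level sales and brand-to-category links.
brand_sales = {
    "cola": {"2023-Q1": 5.0, "2023-Q2": 6.0},
    "lemonade": {"2023-Q1": 2.0, "2023-Q2": 3.0},
    "crisps": {"2023-Q1": 4.0, "2023-Q2": 4.5},
}
HAS_CATEGORY = {"cola": "beverages", "lemonade": "beverages", "crisps": "snacks"}


def group_by_category(series_by_brand, operation="sum"):
    """Collect brand values per (category, date), then sum or average them."""
    grouped = {}
    for brand, series in series_by_brand.items():
        category = HAS_CATEGORY[brand]  # follow the brand's category link
        for date, value in series.items():
            grouped.setdefault(category, {}).setdefault(date, []).append(value)
    result = {}
    for category, by_date in grouped.items():
        result[category] = {
            date: sum(values) if operation == "sum" else sum(values) / len(values)
            for date, values in by_date.items()
        }
    return result


category_sales = group_by_category(brand_sales, "sum")
category_mean = group_by_category(brand_sales, "mean")
```

In this example the two beverage brands collapse into one beverages series (7.0 and 9.0 when summed), while snacks keeps the values of its single brand.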

Usage

signal.group_by_entity(path, operation, leaf_entity_as_label=True)

Aggregate the time series produced by signal across the entities reached by following path from each entity that has a time series.

Parameters:

signal – a signal which creates time series for entities other than the root entities, typically a graph_signal
path – a list of relationships to traverse
operation (str) – the aggregation operation, either sum or mean
leaf_entity_as_label (bool) – whether to drop leaf entities from the result and only keep the leaf entity display name as the signal label.