Hierarchical Representation

scikit-hts’s core data structure is the HierarchyTree. It is simply an N-ary tree, a recursive data structure in which each node is specified by:

  • A human readable key, such as ‘germany’, ‘total’, ‘berlin’, or ‘881f15ad61fffff’
  • Keys should be unique and delimited by underscores. Therefore, in the example below there should be no duplicate values across levels 1, 2 and 3. For example, a should not also be a value in level 2.
  • An item, represented by a pandas.Series (or pandas.DataFrame for multivariate inputs), which contains the actual data about that node
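
As a rough illustration of the node described above, the following is a minimal sketch of an N-ary tree node in plain Python. It is not the scikit-hts implementation: the class name SketchNode, its fields and its helper methods are hypothetical, and a plain list of floats stands in for the pandas.Series item.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SketchNode:
    """Hypothetical stand-in for a hierarchy node; not the scikit-hts class."""

    key: str            # human readable, unique identifier
    item: List[float]   # stand-in for the pandas.Series holding the node's data
    children: List["SketchNode"] = field(default_factory=list)
    # Excluded from repr/compare so the parent <-> child cycle is never followed.
    parent: Optional["SketchNode"] = field(default=None, repr=False, compare=False)

    def add_child(self, child: "SketchNode") -> "SketchNode":
        # Link both directions so the tree can be walked up or down.
        child.parent = self
        self.children.append(child)
        return child

    def is_leaf(self) -> bool:
        return not self.children

    def num_nodes(self) -> int:
        # This node plus every node below it.
        return 1 + sum(child.num_nodes() for child in self.children)


root = SketchNode("total", [3.0, 5.0])
a = root.add_child(SketchNode("a", [1.0, 2.0]))
b = root.add_child(SketchNode("b", [2.0, 3.0]))
a_x = a.add_child(SketchNode("a_x", [1.0, 2.0]))
```

Each node only knows its own key, its data and its direct neighbours; everything else (counting, searching, traversal) follows from recursion over children.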

Hierarchical Structure

For instance, a tree with nodes and levels as follows:

  • Level 1: a, b, c
  • Level 2: x, y
  • Level 3: 1, 2

nodes = {'total': ['a', 'b', 'c'],
         'a': ['a_x', 'a_y'],
         'b': ['b_x', 'b_y'],
         'c': ['c_x', 'c_y'],
         'a_x': ['a_x_1', 'a_x_2'],
         'a_y': ['a_y_1', 'a_y_2'],
         'b_x': ['b_x_1', 'b_x_2'],
         'b_y': ['b_y_1', 'b_y_2'],
         'c_x': ['c_x_1', 'c_x_2'],
         'c_y': ['c_y_1', 'c_y_2']
         }

Represents the following structure:

Level                                           Node Key                                             # of nodes

  1                                              total                                                    1

  2                    a                           b                             c                        3

  3                a_x   a_y                    b_x   b_y                    c_x   c_y                    6

  4        a_x_1 a_x_2   a_y_1 a_y_2    b_x_1 b_x_2   b_y_1 b_y_2    c_x_1 c_x_2   c_y_1 c_y_2            12
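
The node counts in the table can be reproduced from the nodes dictionary alone. The sketch below is a plain breadth-first walk; the helper name nodes_per_level is hypothetical and is not part of scikit-hts.

```python
from collections import deque
from typing import Dict, List

nodes: Dict[str, List[str]] = {
    'total': ['a', 'b', 'c'],
    'a': ['a_x', 'a_y'],
    'b': ['b_x', 'b_y'],
    'c': ['c_x', 'c_y'],
    'a_x': ['a_x_1', 'a_x_2'],
    'a_y': ['a_y_1', 'a_y_2'],
    'b_x': ['b_x_1', 'b_x_2'],
    'b_y': ['b_y_1', 'b_y_2'],
    'c_x': ['c_x_1', 'c_x_2'],
    'c_y': ['c_y_1', 'c_y_2'],
}


def nodes_per_level(nodes: Dict[str, List[str]], root: str = 'total') -> List[int]:
    """Count the nodes on each level, starting from the root (hypothetical helper)."""
    counts = []
    queue = deque([root])
    while queue:
        counts.append(len(queue))          # everything queued now sits on one level
        for _ in range(len(queue)):
            key = queue.popleft()
            queue.extend(nodes.get(key, []))  # leaves have no entry, so no children
    return counts


level_counts = nodes_per_level(nodes)
print(level_counts)  # [1, 3, 6, 12]
```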

To get a sense of how the hierarchy trees are implemented, some sample data can be loaded:

>>> from datetime import datetime
>>> from hts.hierarchy import HierarchyTree
>>> from hts.utilities.load_data import load_hierarchical_sine_data

>>> s, e = datetime(2019, 1, 15), datetime(2019, 10, 15)
>>> hsd = load_hierarchical_sine_data(start=s, end=e, n=10000)
>>> print(hsd.head())
                                total         a         b         c       a_x       a_y       b_x       b_y       c_x  ...     a_y_2     b_x_1     b_x_2     b_y_1     b_y_2     c_x_1     c_x_2     c_y_1     c_y_2
2019-01-15 01:11:09.255573   2.695133  0.150805  0.031629  2.512698  0.037016  0.113789  0.028399  0.003231  0.268406  ...  0.080803  0.013131  0.015268  0.000952  0.002279  0.175671  0.092734  0.282259  1.962034
2019-01-15 01:18:30.753096  -3.274595 -0.199276 -1.624369 -1.450950 -0.117717 -0.081559 -0.300076 -1.324294 -1.340172  ... -0.077289 -0.177000 -0.123075 -0.178258 -1.146035 -0.266198 -1.073975 -0.083517 -0.027260
2019-01-15 01:57:48.607109  -1.898038 -0.226974 -0.662317 -1.008747 -0.221508 -0.005466 -0.587826 -0.074492 -0.929464  ... -0.003297 -0.218128 -0.369698 -0.021156 -0.053335 -0.225994 -0.703470 -0.077021 -0.002262
2019-01-15 02:06:57.994575  13.904908  6.025506  5.414178  2.465225  5.012228  1.013278  4.189432  1.224746  1.546544  ...  0.467630  1.297829  2.891602  0.671085  0.553661  0.066278  1.480266  0.769954  0.148728
2019-01-15 02:14:22.367818  11.028013  3.537919  6.504104  0.985990  2.935614  0.602305  4.503611  2.000493  0.179114  ...  0.091993  4.350293  0.153318  1.349629  0.650864  0.066946  0.112168  0.473987  0.332889


>>> hier = {'total': ['a', 'b', 'c'],
            'a': ['a_x', 'a_y'],
            'b': ['b_x', 'b_y'],
            'c': ['c_x', 'c_y'],
            'a_x': ['a_x_1', 'a_x_2'],
            'a_y': ['a_y_1', 'a_y_2'],
            'b_x': ['b_x_1', 'b_x_2'],
            'b_y': ['b_y_1', 'b_y_2'],
            'c_x': ['c_x_1', 'c_x_2'],
            'c_y': ['c_y_1', 'c_y_2']
        }
>>> tree = HierarchyTree.from_nodes(hier, hsd, root='total')
>>> print(tree)
- total
   |- a
   |  |- a_x
   |  |  |- a_x_1
   |  |  - a_x_2
   |  - a_y
   |     |- a_y_1
   |     - a_y_2
   |- b
   |  |- b_x
   |  |  |- b_x_1
   |  |  - b_x_2
   |  - b_y
   |    |- b_y_1
   |    - b_y_2
   - c
     |- c_x
     |  |- c_x_1
     |  - c_x_2
     - c_y
        |- c_y_1
        - c_y_2

Grouped Structure

To create a grouped structure instead of a strictly hierarchical one, you must specify all levels within the grouping structure dictionary and dataframe, as seen below.

Levels in example:

  • Level 1: A, B
  • Level 2: X, Y

>>> import hts
>>> import pandas as pd

>>> hierarchy = {
    "total": ["A", "B", "X", "Y"],
    "A": ["A_X", "A_Y"],
    "B": ["B_X", "B_Y"],
}

>>> grouped_df = pd.DataFrame(
    data={
        "total": [],
        "A": [],
        "B": [],
        "X": [],
        "Y": [],
        "A_X": [],
        "A_Y": [],
        "B_X": [],
        "B_Y": [],
    }
)

>>> tree = hts.hierarchy.HierarchyTree.from_nodes(hierarchy, grouped_df)
>>> sum_mat, sum_mat_labels = hts.functions.to_sum_mat(tree)
>>> print(sum_mat)  # Commented labels will not appear in the printout, they are here as an example.
[[1. 1. 1. 1.]  # totals
 [0. 1. 0. 1.]  # Y
 [1. 0. 1. 0.]  # X
 [0. 0. 1. 1.]  # B
 [1. 1. 0. 0.]  # A
 [1. 0. 0. 0.]  # A_X
 [0. 1. 0. 0.]  # A_Y
 [0. 0. 1. 0.]  # B_X
 [0. 0. 0. 1.]] # B_Y

>>> print(sum_mat_labels)  # Use this if you need to match summing matrix rows with labels.
['total', 'Y', 'X', 'B', 'A', 'A_X', 'A_Y', 'B_X', 'B_Y']
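
To see where the rows of the summing matrix come from, the sketch below rebuilds the same matrix by hand. It is an illustration only and makes no claim about how hts.functions.to_sum_mat works internally: the helper contributes is hypothetical, the row order is copied from the sum_mat_labels printed above, and the logic relies on the underscore-delimited key convention (A_X belongs to group A and to group X).

```python
from typing import List

# Bottom-level series, in the column order of the matrix above.
bottom = ["A_X", "A_Y", "B_X", "B_Y"]
# Row order copied from the printed sum_mat_labels.
labels = ["total", "Y", "X", "B", "A", "A_X", "A_Y", "B_X", "B_Y"]


def contributes(bottom_key: str, label: str) -> int:
    """Return 1 if the bottom-level series is summed into `label` (hypothetical helper)."""
    if label == "total" or label == bottom_key:
        return 1
    # A grouping label such as "A" or "X" matches one underscore-delimited part.
    return int(label in bottom_key.split("_"))


sketch_sum_mat: List[List[int]] = [
    [contributes(b, label) for b in bottom] for label in labels
]

for label, row in zip(labels, sketch_sum_mat):
    print(f"{label:>5}", row)
```

Each row is an indicator vector over the bottom-level series, so multiplying the matrix by the bottom-level values yields every aggregate in the grouped structure.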
class hts.hierarchy.HierarchyTree(key: str = None, item: Union[pandas.core.series.Series, pandas.core.frame.DataFrame] = None, exogenous: List[str] = None, children: List[hts._t.NAryTreeT] = None, parent: hts._t.NAryTreeT = None)[source]

A generic N-ary tree implementation that uses a list to store its children.

classmethod from_geo_events(df: pandas.core.frame.DataFrame, lat_col: str, lon_col: str, nodes: Tuple, levels: Tuple[int, int] = (6, 7), resample_freq: str = '1H', min_count: Union[float, int] = 0.2, root_name: str = 'total', fillna: bool = False)[source]
Parameters:
  • df (pandas.DataFrame) –
  • lat_col (str) – Column where the latitude coordinates can be found
  • lon_col (str) – Column where the longitude coordinates can be found
  • nodes (Tuple) –
  • levels
  • resample_freq
  • min_count
  • root_name
  • fillna
Returns:

Return type:

HierarchyTree

classmethod from_nodes(nodes: Dict[str, List[str]], df: pandas.core.frame.DataFrame, exogenous: Dict[str, List[str]] = None, root: Union[str, HierarchyTree] = 'total', top: Optional[hts.hierarchy.HierarchyTree] = None, stack: List[T] = None)[source]

Standard method for creating a hierarchy from a nodes dictionary and a dataframe whose columns are those nodes. The dictionary maps each node key to the list of its child keys (its edges). See the examples for usage. The total column must be named total.

Parameters:
  • nodes (NodesT) – Nodes definition. See Examples.
  • df (pandas.DataFrame) – The actual data containing the nodes
  • exogenous (ExogT) – The nodes representing the exogenous variables
  • root (Union[str, HierarchyTree]) – The name of the root node
  • top (HierarchyTree) – Not to be used for initialisation, only in recursive calls
  • stack (list) – Not to be used for initialisation, only in recursive calls
Returns:

hierarchy – The hierarchy tree representation of your data

Return type:

HierarchyTree

Examples

In this example we will create a tree from some multivariate data:

>>> from hts.utilities.load_data import load_mobility_data
>>> from hts.hierarchy import HierarchyTree
>>> hmv = load_mobility_data()
>>> hmv.head()
            WF-01  CH-07  BT-01  CBD-13  SLU-15  CH-02  CH-08  SLU-01  BT-03  CH-05  SLU-19  SLU-07  SLU-02  CH-01  total   CH  SLU  BT  OTHER  temp  precipitation
starttime
2014-10-13     16     14     20      16      20     42     24      24     12     22      14       2       8      6    240  108   68  32     32  62.0           0.00
2014-10-14     22     28     28      38      36     36     42      40     14     26      18      32      16     18    394  150  142  42     60  59.0           0.11
2014-10-15     10     14      8      20      18     38     16      28     18     10       0      24      10     16    230   94   80  26     30  58.0           0.45
2014-10-16     22     18     24      44      44     40     24      20     22     18       8      26      14     14    338  114  112  46     66  61.0           0.00
2014-10-17      8     12     16      20      18     22     32      12      8     28      10      30       8     10    234  104   78  24     28  60.0           0.14
>>> hier = {
        'total': ['CH', 'SLU', 'BT', 'OTHER'],
        'CH': ['CH-07', 'CH-02', 'CH-08', 'CH-05', 'CH-01'],
        'SLU': ['SLU-15', 'SLU-01', 'SLU-19', 'SLU-07', 'SLU-02'],
        'BT': ['BT-01', 'BT-03'],
        'OTHER': ['WF-01', 'CBD-13']
    }
>>> exogenous = {k: ['precipitation', 'temp'] for k in hmv.columns if k not in ['precipitation', 'temp']}
>>> ht = HierarchyTree.from_nodes(hier, hmv, exogenous=exogenous)
>>> print(ht)
- total
   |- CH
   |  |- CH-07
   |  |- CH-02
   |  |- CH-08
   |  |- CH-05
   |  - CH-01
   |- SLU
   |  |- SLU-15
   |  |- SLU-01
   |  |- SLU-19
   |  |- SLU-07
   |  - SLU-02
   |- BT
   |  |- BT-01
   |  - BT-03
   - OTHER
      |- WF-01
      - CBD-13
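
Before building a tree it can be useful to confirm that every parent column equals the sum of its children. The sketch below does this for the first row of hmv.head() shown above, with the values copied by hand into a plain dictionary (temp and precipitation are left out because they are exogenous). The helper inconsistent_parents is hypothetical; this section does not say whether from_nodes performs such a check itself.

```python
from typing import Dict, List

hier: Dict[str, List[str]] = {
    'total': ['CH', 'SLU', 'BT', 'OTHER'],
    'CH': ['CH-07', 'CH-02', 'CH-08', 'CH-05', 'CH-01'],
    'SLU': ['SLU-15', 'SLU-01', 'SLU-19', 'SLU-07', 'SLU-02'],
    'BT': ['BT-01', 'BT-03'],
    'OTHER': ['WF-01', 'CBD-13'],
}

# Row for 2014-10-13, copied from the printed head of the mobility data.
first_row: Dict[str, int] = {
    'WF-01': 16, 'CH-07': 14, 'BT-01': 20, 'CBD-13': 16, 'SLU-15': 20,
    'CH-02': 42, 'CH-08': 24, 'SLU-01': 24, 'BT-03': 12, 'CH-05': 22,
    'SLU-19': 14, 'SLU-07': 2, 'SLU-02': 8, 'CH-01': 6,
    'total': 240, 'CH': 108, 'SLU': 68, 'BT': 32, 'OTHER': 32,
}


def inconsistent_parents(hier: Dict[str, List[str]], row: Dict[str, int]) -> List[str]:
    """List parents whose value differs from the sum of their children (hypothetical helper)."""
    return [
        parent
        for parent, children in hier.items()
        if row[parent] != sum(row[child] for child in children)
    ]


mismatches = inconsistent_parents(hier, first_row)
print(mismatches)  # [] when every aggregate adds up
```
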
get_level_order_labels() → List[List[str]][source]

Get the associated node labels from the NAryTreeT level_order_traversal().

Parameters: self (NAryTreeT) – Tree being searched.
Returns: Node labels corresponding to level order traversal.
Return type: List[List[str]]
get_node(key: str) → Optional[hts._t.NAryTreeT][source]

Get a node given its key

Parameters: key (str) – The key of the node of interest
Returns: node – The node of interest
Return type: HierarchyTree
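
A key lookup of this kind can be sketched as a depth-first search. The class SketchNode and the function find_node below are hypothetical and only illustrate the idea; returning None for a missing key is an assumption suggested by the Optional return annotation, not something this section confirms.

```python
from typing import List, Optional


class SketchNode:
    """Hypothetical minimal node: a key plus a list of children."""

    def __init__(self, key: str, children: Optional[List["SketchNode"]] = None):
        self.key = key
        self.children = children or []


def find_node(node: SketchNode, key: str) -> Optional[SketchNode]:
    """Depth-first search for the node with the given key (hypothetical helper)."""
    if node.key == key:
        return node
    for child in node.children:
        found = find_node(child, key)
        if found is not None:
            return found
    return None  # assumption: a missing key yields None


sketch_tree = SketchNode("total", [
    SketchNode("a", [SketchNode("a_x"), SketchNode("a_y")]),
    SketchNode("b"),
])

hit = find_node(sketch_tree, "a_y")
miss = find_node(sketch_tree, "not_a_key")
```

Because keys are required to be unique, the first match is the only match and the search can stop there.
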
is_leaf()[source]

Check whether the node is a leaf node

Returns: True or False
Return type: bool
level_order_traversal() → List[List[int]][source]

Iterate through the tree in level order, getting the number of children for each node

Returns:
Return type: list[list[int]]
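
The idea of a level-order traversal that records the number of children per node can be sketched on a plain nodes dictionary. The function level_order below is hypothetical. In particular, it includes the leaf level (all zeros), and whether the library method does the same is not stated in this section.

```python
from collections import deque
from typing import Dict, List, Tuple

small_nodes: Dict[str, List[str]] = {
    'total': ['a', 'b'],
    'a': ['a_x', 'a_y'],
}


def level_order(
    nodes: Dict[str, List[str]], root: str = 'total'
) -> Tuple[List[List[str]], List[List[int]]]:
    """Return per-level node labels and per-level child counts (hypothetical helper)."""
    labels: List[List[str]] = []
    child_counts: List[List[int]] = []
    queue = deque([root])
    while queue:
        level_keys = list(queue)   # snapshot of the current level
        queue.clear()
        labels.append(level_keys)
        child_counts.append([len(nodes.get(key, [])) for key in level_keys])
        for key in level_keys:
            queue.extend(nodes.get(key, []))  # enqueue the next level in order
    return labels, child_counts


sketch_labels, sketch_counts = level_order(small_nodes)
print(sketch_labels)  # [['total'], ['a', 'b'], ['a_x', 'a_y']]
print(sketch_counts)  # [[1 * 2], ...] style nesting: [[2], [2, 0], [0, 0]]
```

The two nested lists line up position by position, which is the relationship between level_order_traversal() and get_level_order_labels() described above.
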
num_nodes() → int[source]

Return the number of nodes in the tree

Returns: num nodes
Return type: int
to_pandas() → pandas.core.frame.DataFrame[source]

Transforms the hierarchy into a pandas.DataFrame

Returns: df – Dataframe representation of the tree
Return type: pandas.DataFrame

traversal_level() → List[hts._t.NAryTreeT][source]

Level order traversal of the tree

Returns:
Return type: list of nodes