API Index

hts

class hts.HTSRegressor(model: str = 'prophet', revision_method: str = 'OLS', transform: Union[hts._t.Transform, bool, None] = False, n_jobs: int = 1, low_memory: bool = False, **kwargs)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.RegressorMixin

Main regressor class for scikit-hts. Likely the only import you’ll need for using this project. It takes a pandas dataframe, the nodes specifying the hierarchies, model kind, revision method, and a few other parameters. See Examples to get an idea of how to use it.

Variables:
  • transform (Union[NamedTuple[str, Callable], bool]) – Function transform to be applied to inputs and outputs. If True, scipy.stats.boxcox and scipy.special._ufuncs.inv_boxcox will be applied to the input and output data
  • sum_mat (array_like) – The summing matrix, explained in depth in Forecasting
  • nodes (Dict[str, List[str]]) – Nodes representing node, edges of the hierarchy. Keys are nodes, values are list of edges.
  • df (pandas.DataFrame) – The dataframe containing the nodes and edges specified above
  • revision_method (str) – One of: "OLS", "WLSS", "WLSV", "FP", "PHA", "AHP", "BU", "NONE"
  • models (dict) – Dictionary that holds the trained models
  • mse (dict) – Dictionary that holds the mse scores for the trained models
  • residuals (dict) – Dictionary that holds the residuals for the trained models
  • forecasts (dict) – Dictionary that holds the forecasts for the trained models
  • model_instance (TimeSeriesModel) – Reference to the class implementing the actual time series model
__init__(model: str = 'prophet', revision_method: str = 'OLS', transform: Union[hts._t.Transform, bool, None] = False, n_jobs: int = 1, low_memory: bool = False, **kwargs)[source]
Parameters:
  • model (str) – One of the models supported by hts: "prophet", "auto_arima", "holt_winters", or "sarimax" (see hts._t.ModelT)
  • revision_method (str) – The revision method to be used. One of: "OLS", "WLSS", "WLSV", "FP", "PHA", "AHP", "BU", "NONE"
  • transform (Boolean or NamedTuple) –

    If True, scipy.stats.boxcox and scipy.special._ufuncs.inv_boxcox will be applied before and after fitting. If False (default), no transform is applied. If you want to use custom functions, pass a NamedTuple like:

    from collections import namedtuple
    import numpy

    Transform = namedtuple('Transform', ['func', 'inv_func'])
    transform = Transform(func=numpy.exp, inv_func=numpy.log)

    ht = HTSRegressor(transform=transform, ...)
    

    The signatures for the func as well as inv_func parameters must both be Callable[[numpy.ndarray], numpy.ndarray], i.e. they must take an array and return an array, both of equal dimensions

  • n_jobs (int) – Number of parallel jobs to run the forecasting on
  • low_memory (Bool) – If True, models will be fit, serialized, and released from memory. Usually a good idea if you are dealing with a large number of nodes
  • kwargs – Keyword arguments to be passed to the underlying model to be instantiated
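
For orientation, a minimal construction sketch; the argument values below are illustrative, only the parameter names come from the signature above:

from hts import HTSRegressor

# Prophet base models, OLS reconciliation, four parallel jobs.
reg = HTSRegressor(model='prophet', revision_method='OLS', n_jobs=4)
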
fit(df: Optional[pandas.core.frame.DataFrame] = None, nodes: Optional[Dict[str, List[str]]] = None, tree: Optional[hts.hierarchy.HierarchyTree] = None, exogenous: Optional[Dict[str, List[str]]] = None, root: str = 'total', distributor: Optional[hts.utilities.distribution.DistributorBaseClass] = None, disable_progressbar=False, show_warnings=False, **fit_kwargs) → hts.core.regressor.HTSRegressor[source]

Fit hierarchical model to dataframe containing hierarchical data as specified in the nodes parameter.

Exogenous variables can also be passed as a dict of (string, list) pairs, where the string is the specific node key and the list contains the names of the columns to be used as exogenous variables for that node.

Alternatively, a pre-built HierarchyTree can be passed without specifying nodes and df. See more at hts.hierarchy.HierarchyTree

Parameters:
  • df (pandas.DataFrame) – A Dataframe of time series with a DateTimeIndex. Each column represents a node in the hierarchy. Ignored if tree argument is passed
  • nodes (Dict[str, List[str]]) –
    The hierarchy defined as a dict of (string, list), as specified in
    HierarchyTree.from_nodes
  • tree (HierarchyTree) – A pre-built HierarchyTree. Ignored if df and nodes are passed, as the tree will be built from these instead
  • distributor (Optional[DistributorBaseClass]) – A distributor, for parallel/distributed processing
  • exogenous (Dict[str, List[str]] or None) – Node key mapping to columns that contain the exogenous variable for that node
  • root (str) – The name of the root node
  • disable_progressbar (Bool) – Disable or enable progressbar
  • show_warnings (Bool) – Enable or disable warnings
  • fit_kwargs (Any) – Any arguments to be passed to the underlying forecasting model’s fit function
Returns:

The fitted HTSRegressor instance

Return type:

HTSRegressor
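
A hedged fitting sketch follows; the column names, values, and nodes mapping are illustrative assumptions, not taken from this reference. The dataframe needs a DateTimeIndex and one column per node in the hierarchy:

import pandas
from hts import HTSRegressor

# Build a small hierarchical dataframe: four bottom-level series plus their aggregates.
index = pandas.date_range('2020-01-01', periods=8, freq='D')
bottom = pandas.DataFrame(
    {'a_x': range(1, 9), 'a_y': range(1, 9), 'b_x': range(1, 9), 'b_y': range(1, 9)},
    index=index, dtype=float,
)
hier_df = bottom.assign(
    a=bottom['a_x'] + bottom['a_y'],
    b=bottom['b_x'] + bottom['b_y'],
    total=bottom.sum(axis=1),
)
nodes = {'total': ['a', 'b'], 'a': ['a_x', 'a_y'], 'b': ['b_x', 'b_y']}

reg = HTSRegressor(model='holt_winters', revision_method='OLS')
reg = reg.fit(df=hier_df, nodes=nodes)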

predict(exogenous_df: pandas.core.frame.DataFrame = None, steps_ahead: int = None, distributor: Optional[hts.utilities.distribution.DistributorBaseClass] = None, disable_progressbar: bool = False, show_warnings: bool = False, **predict_kwargs) → pandas.core.frame.DataFrame[source]
Parameters:
  • distributor (Optional[DistributorBaseClass]) – A distributor, for parallel/distributed processing
  • disable_progressbar (Bool) – Disable or enable progressbar
  • show_warnings (Bool) – Enable or disable warnings
  • predict_kwargs (Any) – Any arguments to be passed to the underlying forecasting model’s predict function
  • exogenous_df (pandas.DataFrame) –

    A dataframe of length == steps_ahead containing the exogenous data for each of the nodes. Only required when using prophet or auto_arima models. See fbprophet’s additional regression docs and AutoARIMA’s exogenous handling docs for more information.

    Other models do not require additional regressors at predict time.

  • steps_ahead (int) – The number of forecasting steps for which to produce a forecast
Returns:

  • Revised forecasts, as a pandas.DataFrame in the same format as the one passed for fitting, extended by steps_ahead time steps
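
Continuing the fitting sketch above, a hedged prediction example (the steps_ahead value is illustrative):

# Forecast 4 steps beyond the end of the fitting dataframe; exogenous_df is not needed
# here because no exogenous variables were supplied at fit time.
revised = reg.predict(steps_ahead=4)
# `revised` has the same columns as the fitting dataframe, extended by 4 time steps,
# with the base forecasts reconciled by the chosen revision method.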

class hts.RevisionMethod(name: str, sum_mat: numpy.ndarray, transformer)[source]

Bases: object

revise(forecasts=None, mse=None, nodes=None) → numpy.ndarray[source]
Parameters:
  • forecasts – Dictionary of base forecasts keyed by node name
  • mse – Dictionary of in-sample MSE values keyed by node name
  • nodes – The hierarchy tree of nodes

hts.convenience

hts.convenience.revise_forecasts(method: str, forecasts: Dict[str, Union[numpy.ndarray, pandas.core.series.Series, pandas.core.frame.DataFrame]], errors: Optional[Dict[str, float]] = None, residuals: Optional[Dict[str, Union[numpy.ndarray, pandas.core.series.Series, pandas.core.frame.DataFrame]]] = None, summing_matrix: numpy.ndarray = None, nodes: hts._t.NAryTreeT = None, transformer: Union[hts._t.Transform, bool] = None)[source]

Convenience function to get revised forecasts for pre-computed base forecasts

Parameters:
  • method (str) – The reconciliation method to use
  • forecasts (Dict[str, ArrayLike]) – A dict mapping key name to its forecasts (including in-sample forecasts). Required, can be of type numpy.ndarray of ndim == 1, pandas.Series, or single columned pandas.DataFrame
  • errors (Dict[str, float]) – A dict mapping key name to the in-sample errors. Required for methods: OLS, WLSS, WLSV if residuals is not passed
  • residuals (Dict[str, ArrayLike]) – A dict mapping key name to the residuals of in-sample forecasts. Required for methods: OLS, WLSS, WLSV, can be of type numpy.ndarray of ndim == 1, pandas.Series, or single columned pandas.DataFrame. If residuals are passed, the errors dict is not required; errors will instead be calculated using the MSE metric: numpy.mean(numpy.array(residual) ** 2)
  • summing_matrix (numpy.ndarray) – Not required if nodes argument is passed, or if using BU approach
  • nodes (NAryTreeT) – The tree of nodes as specified in HierarchyTree. Required if using the AHP, PHA, FP methods, or if using the OLS, WLSS, WLSV methods without passing the summing_matrix parameter
  • transformer (TransformT) – A transform whose inv_func will be applied to the revised forecasts
Returns:

revised forecasts – The revised forecasts

Return type:

pandas.DataFrame
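
A hedged sketch of reconciling pre-computed base forecasts; the node names, values, and summing matrix are illustrative, and the forecast keys are assumed to be ordered consistently with the summing-matrix rows (aggregates first, bottom level last):

import numpy
from hts.convenience import revise_forecasts

forecasts = {
    'total': numpy.array([11.0, 12.0]),
    'a': numpy.array([6.0, 6.5]),
    'b': numpy.array([4.0, 5.0]),
}
sum_mat = numpy.array([[1, 1],   # total = a + b
                       [1, 0],   # a
                       [0, 1]])  # b
revised = revise_forecasts(
    method='OLS',
    forecasts=forecasts,
    errors={'total': 1.0, 'a': 0.25, 'b': 0.25},
    summing_matrix=sum_mat,
)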

hts.defaults

hts.functions

hts.functions._create_bl_str_col(df: pandas.core.frame.DataFrame, level_names: List[str]) → List[str][source]

Concatenate the column values of all the specified level_names by row into a single column.

Parameters:
  • df (pandas.DataFrame) – Tabular data.
  • level_names (List[str]) – Levels in the hierarchy.
Returns:

Concatenated column values by row.

Return type:

List[str]

hts.functions._get_bl(grouped_levels: List[str], bottom_levels: List[str]) → List[List[str]][source]

Get bottom level columns required to sum to create grouped columns.

Parameters:
  • grouped_levels (List[str]) – Grouped level, underscore delimited, column names.
  • bottom_levels (List[str]) – Bottom level, underscore delimited, column names.
Returns:

Bottom level column names that make up each individual aggregated node in the hierarchy.

Return type:

List[List[str]]

hts.functions.add_agg_series_to_df(df: pandas.core.frame.DataFrame, grouped_levels: List[str], bottom_levels: List[str]) → pandas.core.frame.DataFrame[source]

Add aggregate series columns to wide dataframe.

Parameters:
  • df (pandas.DataFrame) – Wide dataframe containing bottom level series.
  • grouped_levels (List[str]) – Grouped level, underscore delimited, column names.
  • bottom_levels (List[str]) – Bottom level, underscore delimited, column names.
Returns:

Wide dataframe with all series in hierarchy.

Return type:

pandas.DataFrame

hts.functions.forecast_proportions(forecasts, nodes)[source]
Cons:
Produces biased revised forecasts even if base forecasts are unbiased
hts.functions.get_agg_series(df: pandas.core.frame.DataFrame, levels: List[List[str]]) → List[str][source]

Get aggregate level series names.

Parameters:
  • df (pandas.DataFrame) – Tabular data.
  • levels (List[List[str]]) – List of lists containing the desired level of aggregation.
Returns:

Aggregate series names.

Return type:

List[str]

hts.functions.get_hierarchichal_df(df: pandas.core.frame.DataFrame, level_names: List[str], hierarchy: List[List[str]], date_colname: str, val_colname: str) → Tuple[pandas.core.frame.DataFrame, numpy.array, List[str]][source]

Transform your tabular dataframe to a wide dataframe with the desired levels of the hierarchy.

Parameters:
  • df (pd.DataFrame) – Tabular dataframe
  • level_names (List[str]) – Levels in the hierarchy.
  • hierarchy (List[List[str]]) – Desired levels in your hierarchy.
  • date_colname (str) – Date column name
  • val_colname (str) – Name of column containing series values.
Returns:

  • pd.DataFrame – Wide dataframe with levels of specified aggregation.
  • np.array – Summing matrix.
  • List[str] – Summing matrix labels.

Examples

>>> import pandas
>>> import hts.functions
>>> hier_df = pandas.DataFrame(
    data={
        'ds': ['2020-01', '2020-02'] * 5,
        "lev1": ['A', 'A',
                 'A', 'A',
                 'A', 'A',
                 'B', 'B',
                 'B', 'B'],
        "lev2": ['X', 'X',
                 'Y', 'Y',
                 'Z', 'Z',
                 'X', 'X',
                 'Y', 'Y'],
        "val": [1, 2,
                3, 4,
                5, 6,
                7, 8,
                9, 10]
    }
)
>>> hier_df
        ds lev1 lev2  val
0  2020-01    A    X    1
1  2020-02    A    X    2
2  2020-01    A    Y    3
3  2020-02    A    Y    4
4  2020-01    A    Z    5
5  2020-02    A    Z    6
6  2020-01    B    X    7
7  2020-02    B    X    8
8  2020-01    B    Y    9
9  2020-02    B    Y   10
>>> level_names = ['lev1', 'lev2']
>>> hierarchy = [['lev1'], ['lev2']]
>>> wide_df, sum_mat, sum_mat_labels = hts.functions.get_hierarchichal_df(hier_df,
                                                                          level_names=level_names,
                                                                          hierarchy=hierarchy,
                                                                          date_colname='ds',
                                                                          val_colname='val')
>>> wide_df
    lev1_lev2  A_X  A_Y  A_Z  B_X  B_Y  total   A   B   X   Y  Z
    ds
    2020-01      1    3    5    7    9     25   9  16   8  12  5
    2020-02      2    4    6    8   10     30  12  18  10  14  6
hts.functions.optimal_combination(forecasts: Dict[str, pandas.core.frame.DataFrame], sum_mat: numpy.ndarray, method: str, mse: Dict[str, float])[source]

Produces the optimal combination of forecasts by trace minimization (as described by Wickramasuriya, Athanasopoulos, Hyndman in “Optimal Forecast Reconciliation for Hierarchical and Grouped Time Series Through Trace Minimization”)

Parameters:
  • forecasts (dict) – Dictionary of pandas.DataFrames containing the future predictions
  • sum_mat (np.ndarray) – The summing matrix
  • method (str) –
    One of:
    • OLS (ordinary least squares)
    • WLSS (structurally weighted least squares)
    • WLSV (variance weighted least squares)
  • mse (dict) – Dictionary of in-sample mean squared errors keyed by node name, used for variance weighting
hts.functions.project(hat_mat: numpy.ndarray, sum_mat: numpy.ndarray, optimal_mat: numpy.ndarray) → numpy.ndarray[source]
hts.functions.proportions(nodes, forecasts, sum_mat, method='PHA')[source]
hts.functions.to_sum_mat(ntree: hts._t.NAryTreeT = None, node_labels: List[str] = None) → Tuple[numpy.ndarray, List[str]][source]

This function creates a summing matrix for the bottom-up and optimal combination approaches. All the inputs are the same as above. The output is a summing matrix; see Section 9.4 of Rob Hyndman’s “Forecasting: Principles and Practice”.

Parameters:
  • ntree (NAryTreeT) – The hierarchy tree from which to build the summing matrix
  • node_labels (List[str]) – Labels corresponding to the node names / summing matrix rows. Obtain them from hts.functions.get_hierarchichal_df(…)
Returns:

  • numpy.ndarray – Summing matrix.
  • List[str] – Row order list of the level in the hierarchy represented by each row in the summing matrix.
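
A hedged sketch of obtaining the summing matrix from a pre-built tree; the nodes mapping is illustrative, and hier_df is assumed to hold one column per node (as in the fitting sketch for HTSRegressor.fit above):

import hts.functions
from hts.hierarchy import HierarchyTree

nodes = {'total': ['a', 'b'], 'a': ['a_x', 'a_y'], 'b': ['b_x', 'b_y']}
tree = HierarchyTree.from_nodes(nodes=nodes, df=hier_df)
sum_mat, sum_mat_labels = hts.functions.to_sum_mat(tree)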

hts.functions.y_hat_matrix(forecasts, keys=None)[source]

hts.revision

class hts.revision.RevisionMethod(name: str, sum_mat: numpy.ndarray, transformer)[source]

Bases: object

revise(forecasts=None, mse=None, nodes=None) → numpy.ndarray[source]
Parameters:
  • forecasts – Dictionary of base forecasts keyed by node name
  • mse – Dictionary of in-sample MSE values keyed by node name
  • nodes – The hierarchy tree of nodes

hts.transforms

class hts.transforms.BoxCoxTransformer[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

fit(x: pandas.core.series.Series, y=None, **fit_params)[source]
fit_transform(x: pandas.core.series.Series, y=None, **fit_params)[source]

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input samples.
  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
  • **fit_params (dict) – Additional fit parameters.
Returns:

X_new – Transformed array.

Return type:

ndarray array of shape (n_samples, n_features_new)

inverse_transform(x: Union[pandas.core.series.Series, numpy.ndarray])[source]
transform(x: pandas.core.series.Series)[source]
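
A hedged usage sketch for BoxCoxTransformer; the series values are illustrative and must be strictly positive for the Box-Cox transform:

import pandas
from hts.transforms import BoxCoxTransformer

series = pandas.Series([1.0, 2.0, 4.0, 8.0, 16.0])
boxcox = BoxCoxTransformer()
transformed = boxcox.fit_transform(series)
restored = boxcox.inverse_transform(transformed)  # approximately the original values
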
class hts.transforms.FunctionTransformer(func: callable = None, inv_func: callable = None)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

fit(x: pandas.core.series.Series, y=None, **fit_params)[source]
fit_transform(x: pandas.core.series.Series, y=None, **fit_params)[source]

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input samples.
  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
  • **fit_params (dict) – Additional fit parameters.
Returns:

X_new – Transformed array.

Return type:

ndarray array of shape (n_samples, n_features_new)

inverse_transform(x: Union[pandas.core.series.Series, numpy.ndarray])[source]
transform(x: pandas.core.series.Series)[source]
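
A hedged usage sketch for FunctionTransformer, mirroring the Transform namedtuple accepted by HTSRegressor(transform=...); the choice of numpy.log1p / numpy.expm1 is illustrative:

import numpy
import pandas
from hts.transforms import FunctionTransformer

log_tf = FunctionTransformer(func=numpy.log1p, inv_func=numpy.expm1)
series = pandas.Series([0.0, 1.0, 3.0, 7.0])
transformed = log_tf.fit_transform(series)
restored = log_tf.inverse_transform(transformed)  # back on the original scale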

hts._t

class hts._t.ExtendedEnum[source]

Bases: enum.Enum

An enumeration.

list = <bound method ExtendedEnum.list of <enum 'ExtendedEnum'>>[source]
names = <bound method ExtendedEnum.names of <enum 'ExtendedEnum'>>[source]
class hts._t.HierarchyVisualizerT[source]

Bases: object

create_map()[source]
class hts._t.MethodT[source]

Bases: hts._t.ExtendedEnum

An enumeration.

AHP = 'AHP'
BU = 'BU'
FP = 'FP'
NONE = 'NONE'
OLS = 'OLS'
PHA = 'PHA'
WLSS = 'WLSS'
WLSV = 'WLSV'
class hts._t.ModelT[source]

Bases: str, hts._t.ExtendedEnum

An enumeration.

auto_arima = 'auto_arima'
holt_winters = 'holt_winters'
prophet = 'prophet'
sarimax = 'sarimax'
class hts._t.NAryTreeT[source]

Bases: object

Type definition of an NAryTree

add_child(key=None, item=None, exogenous=None) → hts._t.NAryTreeT[source]
exogenous = None
get_height() → int[source]
get_level_order_labels() → List[List[str]][source]
get_node_height(key: str) → int[source]
get_series() → pandas.core.series.Series[source]
is_leaf() → bool[source]
leaf_sum() → int[source]
level_order_traversal() → List[List[int]][source]
num_nodes() → int[source]
parent
string_repr(prefix='', _last=True)[source]
sum_at_height(level) → int[source]
to_pandas() → pandas.core.frame.DataFrame[source]
traversal_level() → List[hts._t.NAryTreeT][source]
value_at_height(level: int) → List[T][source]
class hts._t.TimeSeriesModelT[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.RegressorMixin

Type definition of a TimeSeriesModel

create_model(**kwargs)[source]
fit(**fit_args) → hts._t.TimeSeriesModelT[source]
predict(node: hts._t.NAryTreeT, **predict_args)[source]
class hts._t.Transform(func, inv_func)[source]

Bases: tuple

func

Alias for field number 0

inv_func

Alias for field number 1

class hts._t.UnivariateModelT[source]

Bases: str, hts._t.ExtendedEnum

An enumeration.

arima = 'arima'
auto_arima = 'auto_arima'
holt_winters = 'holt_winters'
prophet = 'prophet'
sarimax = 'sarimax'