API Index¶
hts¶
-
class
hts.
HTSRegressor
(model: str = 'prophet', revision_method: str = 'OLS', transform: Union[hts._t.Transform, bool, None] = False, n_jobs: int = 1, low_memory: bool = False, **kwargs)[source]¶ Bases:
sklearn.base.BaseEstimator
,sklearn.base.RegressorMixin
Main regressor class for scikit-hts. Likely the only import you’ll need for using this project. It takes a pandas dataframe, the nodes specifying the hierarchies, model kind, revision method, and a few other parameters. See Examples to get an idea of how to use it.
Variables: - transform (Union[NamedTuple[str, Callable], bool]) – Function transform to be applied to input and outputs. If True, it will use
scipy.stats.boxcox
and scipy.special._ufuncs.inv_boxcox
on input and output data - sum_mat (array_like) – The summing matrix, explained in depth in Forecasting
- nodes (Dict[str, List[str]]) – Nodes representing node, edges of the hierarchy. Keys are nodes, values are list of edges.
- df (pandas.DataFrame) – The dataframe containing the nodes and edges specified above
- revision_method (str) – One of:
"OLS", "WLSS", "WLSV", "FP", "PHA", "AHP", "BU", "NONE"
- models (dict) – Dictionary that holds the trained models
- mse (dict) – Dictionary that holds the mse scores for the trained models
- residuals (dict) – Dictionary that holds the residuals for the trained models
- forecasts (dict) – Dictionary that holds the forecasts for the trained models
- model_instance (TimeSeriesModel) – Reference to the class implementing the actual time series model
-
__init__
(model: str = 'prophet', revision_method: str = 'OLS', transform: Union[hts._t.Transform, bool, None] = False, n_jobs: int = 1, low_memory: bool = False, **kwargs)[source]¶ Parameters: - model (str) – One of the models supported by
hts
. These can be found - revision_method (str) – The revision method to be used. One of:
"OLS", "WLSS", "WLSV", "FP", "PHA", "AHP", "BU", "NONE"
- transform (Boolean or NamedTuple) –
If True,
scipy.stats.boxcox
and scipy.special._ufuncs.inv_boxcox
will be applied before and after fitting. If False (default), no transform is applied. To use custom functions, pass a NamedTuple like:
    import numpy
    from collections import namedtuple
    from hts import HTSRegressor
    Transform = namedtuple('Transform', ['func', 'inv_func'])
    transform = Transform(func=numpy.exp, inv_func=numpy.log)
    ht = HTSRegressor(transform=transform, ...)
The signatures for the
func
as well as inv_func
parameters must both be Callable[[numpy.ndarray], numpy.ndarray]
, i.e. they must take an array and return an array, both of equal dimensions - n_jobs (int) – Number of parallel jobs to run the forecasting on
- low_memory (Bool) – If True, models will be fit, serialized, and released from memory. Usually a good idea if you are dealing with a large amount of nodes
- kwargs – Keyword arguments to be passed to the underlying model to be instantiated
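The requirement that func and inv_func both take and return arrays of equal dimensions can be checked before a transform is handed to the regressor. The following is a minimal sketch, assuming numpy.log1p and numpy.expm1 as the transform pair (an illustrative choice, not one taken from the library) and a made-up input series:

```python
from collections import namedtuple

import numpy as np

# Same two-field layout as the Transform tuple described above.
Transform = namedtuple("Transform", ["func", "inv_func"])

# Assumption: log1p/expm1 chosen for illustration; they are mutual inverses.
transform = Transform(func=np.log1p, inv_func=np.expm1)

# Hypothetical input series.
series = np.array([0.0, 1.5, 10.0, 250.0])
transformed = transform.func(series)
restored = transform.inv_func(transformed)

# Both callables must return arrays with the same dimensions as their input,
# and applying inv_func after func should recover the original values.
assert transformed.shape == series.shape
assert np.allclose(restored, series)
```

A tuple that passes this check can then be supplied through the transform parameter documented above.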
-
fit
(df: Optional[pandas.core.frame.DataFrame] = None, nodes: Optional[Dict[str, List[str]]] = None, tree: Optional[hts.hierarchy.HierarchyTree] = None, exogenous: Optional[Dict[str, List[str]]] = None, root: str = 'total', distributor: Optional[hts.utilities.distribution.DistributorBaseClass] = None, disable_progressbar=False, show_warnings=False, **fit_kwargs) → hts.core.regressor.HTSRegressor[source]¶ Fit hierarchical model to dataframe containing hierarchical data as specified in the
nodes
parameter. Exogenous variables can also be passed as a dict of (string, list), where the string is the specific node key and the list contains the names of the columns to be used as exogenous variables for that node.
Alternatively, a pre-built HierarchyTree can be passed without specifying the node and df. See more at
hts.hierarchy.HierarchyTree
Parameters: - df (pandas.DataFrame) – A Dataframe of time series with a DateTimeIndex. Each column represents a node in the hierarchy. Ignored if tree argument is passed
- nodes (Dict[str, List[str]]) –
- The hierarchy defined as a dict of (string, list), as specified in
HierarchyTree.from_nodes
- tree (HierarchyTree) – A pre-built HierarchyTree. Ignored if df and nodes are passed, as the tree will be built from these
- distributor (Optional[DistributorBaseClass]) – A distributor, for parallel/distributed processing
- exogenous (Dict[str, List[str]] or None) – Node key mapping to columns that contain the exogenous variable for that node
- root (str) – The name of the root node
- disable_progressbar (Bool) – Disable or enable progressbar
- show_warnings (Bool) – Whether to show warnings (suppressed by default)
- fit_kwargs (Any) – Any arguments to be passed to the underlying forecasting model’s fit function
Returns: The fitted HTSRegressor instance
Return type: hts.core.regressor.HTSRegressor
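The df and nodes arguments described above fit together as one column per node plus a dict mapping each parent to its children. The sketch below builds such a pair with pandas only; the node names and values are hypothetical, and it assumes that each parent column is the sum of its children's columns:

```python
import pandas as pd

# Hypothetical hierarchy: keys are parent nodes, values list their children.
nodes = {"total": ["A", "B"], "A": ["A_X", "A_Y"]}

# A DateTimeIndex, as the df parameter requires.
index = pd.date_range("2020-01-01", periods=3, freq="D")
df = pd.DataFrame(
    {"A_X": [1.0, 2.0, 3.0], "A_Y": [4.0, 5.0, 6.0], "B": [7.0, 8.0, 9.0]},
    index=index,
)

# Assumption: every parent series equals the sum of its children.
# Deeper parents are filled in first so that "total" can use "A".
df["A"] = df[nodes["A"]].sum(axis=1)
df["total"] = df[nodes["total"]].sum(axis=1)
```

With these two objects in hand, the call would take the form fit(df=df, nodes=nodes), with root left at its default of 'total'.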
-
predict
(exogenous_df: pandas.core.frame.DataFrame = None, steps_ahead: int = None, distributor: Optional[hts.utilities.distribution.DistributorBaseClass] = None, disable_progressbar: bool = False, show_warnings: bool = False, **predict_kwargs) → pandas.core.frame.DataFrame[source]¶ Parameters: - distributor (Optional[DistributorBaseClass]) – A distributor, for parallel/distributed processing
- disable_progressbar (Bool) – Disable or enable progressbar
- show_warnings (Bool) – Whether to show warnings (suppressed by default)
- predict_kwargs (Any) – Any arguments to be passed to the underlying forecasting model’s predict function
- exogenous_df (pandas.DataFrame) –
A dataframe of length == steps_ahead containing the exogenous data for each of the nodes. Only required when using
prophet
or auto_arima
models. See fbprophet’s additional regression docs and AutoARIMA’s exogenous handling docs for more information. Other models do not require additional regressors at predict time.
- steps_ahead (int) – The number of forecasting steps for which to produce a forecast
Returns: Revised forecasts, as a pandas.DataFrame in the same format as the one passed for fitting, extended by steps_ahead time steps
hts.convenience¶
-
hts.convenience.
revise_forecasts
(method: str, forecasts: Dict[str, Union[numpy.ndarray, pandas.core.series.Series, pandas.core.frame.DataFrame]], errors: Optional[Dict[str, float]] = None, residuals: Optional[Dict[str, Union[numpy.ndarray, pandas.core.series.Series, pandas.core.frame.DataFrame]]] = None, summing_matrix: numpy.ndarray = None, nodes: hts._t.NAryTreeT = None, transformer: Union[hts._t.Transform, bool] = None)[source]¶ Convenience function to get revised forecast for pre-computed base forecasts
Parameters: - method (str) – The reconciliation method to use
- forecasts (Dict[str, ArrayLike]) – A dict mapping key name to its forecasts (including in-sample forecasts). Required, can be
of type
numpy.ndarray
of ndim == 1
, pandas.Series
, or single-column pandas.DataFrame
- errors (Dict[str, float]) – A dict mapping key name to the in-sample errors. Required for methods:
OLS
,WLSS
,WLSV
if residuals
is not passed - residuals (Dict[str, ArrayLike]) – A dict mapping key name to the residuals of in-sample forecasts. Required for methods:
OLS
,WLSS
,WLSV
, can be of type numpy.ndarray
of ndim == 1, pandas.Series
, or single-column pandas.DataFrame
. If passing residuals, the errors
dict is not required and will instead be calculated using the MSE metric: numpy.mean(numpy.array(residual) ** 2)
- summing_matrix (numpy.ndarray) – Not required if
nodes
argument is passed, or if using the BU
approach - nodes (NAryTreeT) – The tree of nodes as specified in
HierarchyTree
. Required if using the AHP, PHA, or FP methods, or if using the OLS, WLSS, or WLSV methods without passing the summing_matrix
parameter - transformer (TransformT) – A transform with the method:
inv_func
that will be applied to the forecasts
Returns: revised forecasts – The revised forecasts
Return type: pandas.DataFrame
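The relationship between the residuals and errors arguments can be illustrated with the MSE expression quoted above. This is a minimal sketch with hypothetical node names and residual values; it only reproduces the quoted formula and does not call the library:

```python
import numpy as np

# Hypothetical in-sample residuals per node.
residuals = {
    "total": np.array([1.0, -1.0, 2.0]),
    "A": np.array([0.5, -0.5, 0.0]),
}

# Per-node error using the expression from the residuals description:
# numpy.mean(numpy.array(residual) ** 2)
errors = {
    key: float(np.mean(np.array(residual) ** 2))
    for key, residual in residuals.items()
}
```

The resulting dict has the Dict[str, float] shape that the errors parameter expects.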
hts.defaults¶
hts.functions¶
-
hts.functions.
_create_bl_str_col
(df: pandas.core.frame.DataFrame, level_names: List[str]) → List[str][source]¶ Concatenate the column values of all the specified level_names by row into a single column.
Parameters: - df (pandas.DataFrame) – Tabular data.
- level_names (List[str]) – Levels in the hierarchy.
Returns: Concatenated column values by row.
Return type: List[str]
-
hts.functions.
_get_bl
(grouped_levels: List[str], bottom_levels: List[str]) → List[List[str]][source]¶ Get bottom level columns required to sum to create grouped columns.
Parameters: - grouped_levels (List[str]) – Grouped level, underscore delimited, column names.
- bottom_levels (List[str]) – Bottom level, underscore delimited, column names.
Returns: Bottom level column names that make up each individual aggregated node in the hierarchy.
Return type: List[List[str]]
-
hts.functions.
add_agg_series_to_df
(df: pandas.core.frame.DataFrame, grouped_levels: List[str], bottom_levels: List[str]) → pandas.core.frame.DataFrame[source]¶ Add aggregate series columns to wide dataframe.
Parameters: - df (pandas.DataFrame) – Wide dataframe containing bottom level series.
- grouped_levels (List[str]) – Grouped level, underscore delimited, column names.
- bottom_levels (List[str]) – Bottom level, underscore delimited, column names.
Returns: Wide dataframe with all series in hierarchy.
Return type: pandas.DataFrame
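How underscore-delimited bottom-level names can be mapped to grouped columns is sketched below. The helper bottom_columns_for and its matching rule are hypothetical (a bottom column contributes when it contains every underscore-separated part of the grouped name); the library's own matching may differ. The values are the 2020-01 row of the example given for get_hierarchichal_df.

```python
from typing import List


def bottom_columns_for(grouped: str, bottom_levels: List[str]) -> List[str]:
    """Hypothetical helper: bottom columns whose parts include all parts of `grouped`."""
    parts = set(grouped.split("_"))
    return [b for b in bottom_levels if parts <= set(b.split("_"))]


bottom_levels = ["A_X", "A_Y", "A_Z", "B_X", "B_Y"]
grouped_levels = ["A", "B", "X", "Y", "Z"]

# Bottom-level values for 2020-01 in the get_hierarchichal_df example.
row = {"A_X": 1, "A_Y": 3, "A_Z": 5, "B_X": 7, "B_Y": 9}

# Which bottom columns make up each grouped column, and their sums.
members = {g: bottom_columns_for(g, bottom_levels) for g in grouped_levels}
aggregates = {g: sum(row[b] for b in cols) for g, cols in members.items()}
```

Under this rule the sums agree with the A, B, X, Y and Z columns shown for 2020-01 in that example.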
-
hts.functions.
forecast_proportions
(forecasts, nodes)[source]¶ - Cons:
- Produces biased revised forecasts even if base forecasts are unbiased
-
hts.functions.
get_agg_series
(df: pandas.core.frame.DataFrame, levels: List[List[str]]) → List[str][source]¶ Get aggregate level series names.
Parameters: - df (pandas.DataFrame) – Tabular data.
- levels (List[List[str]]) – List of lists containing the desired level of aggregation.
Returns: Aggregate series names.
Return type: List[str]
-
hts.functions.
get_hierarchichal_df
(df: pandas.core.frame.DataFrame, level_names: List[str], hierarchy: List[List[str]], date_colname: str, val_colname: str) → Tuple[pandas.core.frame.DataFrame, numpy.array, List[str]][source]¶ Transform your tabular dataframe into a wide dataframe with the desired levels of a hierarchy.
Parameters: - df (pd.DataFrame) – Tabular dataframe
- level_names (List[str]) – Levels in the hierarchy.
- hierarchy (List[List[str]]) – Desired levels in your hierarchy.
- date_colname (str) – Date column name
- val_colname (str) – Name of column containing series values.
Returns: - pd.DataFrame – Wide dataframe with levels of specified aggregation.
- np.array – Summing matrix.
- List[str] – Summing matrix labels.
Examples
>>> import pandas
>>> import hts.functions
>>> hier_df = pandas.DataFrame(
...     data={
...         'ds': ['2020-01', '2020-02'] * 5,
...         "lev1": ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
...         "lev2": ['X', 'X', 'Y', 'Y', 'Z', 'Z', 'X', 'X', 'Y', 'Y'],
...         "val": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
...     }
... )
>>> hier_df
        ds lev1 lev2  val
0  2020-01    A    X    1
1  2020-02    A    X    2
2  2020-01    A    Y    3
3  2020-02    A    Y    4
4  2020-01    A    Z    5
5  2020-02    A    Z    6
6  2020-01    B    X    7
7  2020-02    B    X    8
8  2020-01    B    Y    9
9  2020-02    B    Y   10
>>> level_names = ['lev1', 'lev2']
>>> hierarchy = [['lev1'], ['lev2']]
>>> wide_df, sum_mat, sum_mat_labels = hts.functions.get_hierarchichal_df(hier_df,
...     level_names=level_names,
...     hierarchy=hierarchy,
...     date_colname='ds',
...     val_colname='val')
>>> wide_df
lev1_lev2  A_X  A_Y  A_Z  B_X  B_Y  total   A   B   X   Y  Z
ds
2020-01      1    3    5    7    9     25   9  16   8  12  5
2020-02      2    4    6    8   10     30  12  18  10  14  6
-
hts.functions.
optimal_combination
(forecasts: Dict[str, pandas.core.frame.DataFrame], sum_mat: numpy.ndarray, method: str, mse: Dict[str, float])[source]¶ Produces the optimal combination of forecasts by trace minimization (as described by Wickramasuriya, Athanasopoulos, Hyndman in “Optimal Forecast Reconciliation for Hierarchical and Grouped Time Series Through Trace Minimization”)
Parameters: - forecasts (dict) – Dictionary of pandas.DataFrames containing the future predictions
- sum_mat (np.ndarray) – The summing matrix
- method (str) –
- One of:
- OLS (ordinary least squares)
- WLSS (structurally weighted least squares)
- WLSV (variance weighted least squares)
- mse –
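The three methods can be read as one weighted least squares projection with different weights. The sketch below uses the textbook form from the cited paper, S (S' W^-1 S)^-1 S' W^-1 y_hat, with diagonal W; it is an illustration under that assumption, not the library's implementation. The reconcile helper, the hierarchy and all numbers are hypothetical.

```python
import numpy as np


def reconcile(base: np.ndarray, sum_mat: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Weighted least squares reconciliation with a diagonal weight matrix."""
    w_inv = np.diag(1.0 / weights)
    # Solve (S' W^-1 S) P = S' W^-1 rather than forming an explicit inverse.
    projection = np.linalg.solve(sum_mat.T @ w_inv @ sum_mat, sum_mat.T @ w_inv)
    return sum_mat @ projection @ base


# Rows: total, A, B. Columns: bottom-level series A and B.
sum_mat = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])

# Incoherent base forecasts: total (10) differs from A + B (4 + 5 = 9).
base = np.array([10.0, 4.0, 5.0])
mse = np.array([4.0, 1.0, 2.0])  # hypothetical in-sample MSE per node

ols = reconcile(base, sum_mat, np.ones(3))            # identity weights
wlss = reconcile(base, sum_mat, sum_mat.sum(axis=1))  # bottom series per node
wlsv = reconcile(base, sum_mat, mse)                  # per-node error variance
```

Whatever the weights, the reconciled total equals the sum of the reconciled bottom-level forecasts; the weights only decide how the initial disagreement is spread across nodes.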
-
hts.functions.
project
(hat_mat: numpy.ndarray, sum_mat: numpy.ndarray, optimal_mat: numpy.ndarray) → numpy.ndarray[source]¶
-
hts.functions.
to_sum_mat
(ntree: hts._t.NAryTreeT = None, node_labels: List[str] = None) → Tuple[numpy.ndarray, List[str]][source]¶ This function creates a summing matrix for the bottom-up and optimal combination approaches. All the inputs are the same as above. The output is a summing matrix; see Rob Hyndman’s “Forecasting: principles and practice”, Section 9.4.
Parameters: - ntree (NAryTreeT) –
- node_labels (List[str]) – Labels corresponding to node names/summing matrix. Get from hts.functions.get_hierarchichal_df(…)
Returns: - numpy.ndarray – Summing matrix.
- List[str] – Row order list of the level in the hierarchy represented by each row in the summing matrix.
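What a summing matrix encodes can be shown by building one by hand. This sketch is independent of to_sum_mat: the hierarchy, the row order (aggregates first, then bottom-level series) and the values are assumptions made for illustration.

```python
import numpy as np

bottom_labels = ["A_X", "A_Y", "A_Z", "B_X", "B_Y"]

# Aggregated nodes and the bottom-level series each one sums over.
aggregates = {
    "total": bottom_labels,
    "A": ["A_X", "A_Y", "A_Z"],
    "B": ["B_X", "B_Y"],
}
# Assumed row order: aggregated nodes first, then the bottom level.
row_labels = list(aggregates) + bottom_labels

# One row per node, one column per bottom-level series; 1 marks membership.
agg_rows = [[int(b in members) for b in bottom_labels] for members in aggregates.values()]
sum_mat = np.vstack([np.array(agg_rows), np.eye(len(bottom_labels), dtype=int)])

# Multiplying by the bottom-level values yields every node's value.
bottom_values = np.array([1, 3, 5, 7, 9])
all_values = sum_mat @ bottom_values
```

Each entry of all_values lines up with the label at the same position in row_labels, which is the role the returned label list plays for the matrix rows.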
hts.revision¶
hts.transforms¶
-
class
hts.transforms.
BoxCoxTransformer
[source]¶ Bases:
sklearn.base.BaseEstimator
,sklearn.base.TransformerMixin
-
fit_transform
(x: pandas.core.series.Series, y=None, **fit_params)[source]¶ Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters: - X (array-like of shape (n_samples, n_features)) – Input samples.
- y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
- **fit_params (dict) – Additional fit parameters.
Returns: X_new – Transformed array.
Return type: ndarray array of shape (n_samples, n_features_new)
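The Box-Cox transform and its inverse can be written out directly to show the round trip such a transformer relies on. This is a sketch of the standard formula with a fixed, hand-picked lambda; it does not use scipy (scipy.stats.boxcox can estimate lambda from the data), and the function names and sample values are illustrative.

```python
import numpy as np


def boxcox(x: np.ndarray, lmbda: float) -> np.ndarray:
    """Box-Cox transform for strictly positive x and a fixed lambda."""
    if lmbda == 0:
        return np.log(x)
    return (np.power(x, lmbda) - 1.0) / lmbda


def inv_boxcox(y: np.ndarray, lmbda: float) -> np.ndarray:
    """Inverse of the Box-Cox transform for the same lambda."""
    if lmbda == 0:
        return np.exp(y)
    return np.power(lmbda * y + 1.0, 1.0 / lmbda)


# Hypothetical strictly positive series.
series = np.array([1.0, 2.0, 5.0, 20.0])
restored = inv_boxcox(boxcox(series, 0.5), 0.5)
```

The inverse only recovers the original values when it is given the same lambda that was used for the forward transform.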
-
-
class
hts.transforms.
FunctionTransformer
(func: callable = None, inv_func: callable = None)[source]¶ Bases:
sklearn.base.BaseEstimator
,sklearn.base.TransformerMixin
-
fit_transform
(x: pandas.core.series.Series, y=None, **fit_params)[source]¶ Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters: - X (array-like of shape (n_samples, n_features)) – Input samples.
- y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
- **fit_params (dict) – Additional fit parameters.
Returns: X_new – Transformed array.
Return type: ndarray array of shape (n_samples, n_features_new)
-
hts._t¶
-
class
hts._t.
MethodT
[source]¶ Bases:
hts._t.ExtendedEnum
An enumeration.
-
AHP
= 'AHP'¶
-
BU
= 'BU'¶
-
FP
= 'FP'¶
-
NONE
= 'NONE'¶
-
OLS
= 'OLS'¶
-
PHA
= 'PHA'¶
-
WLSS
= 'WLSS'¶
-
WLSV
= 'WLSV'¶
-
-
class
hts._t.
ModelT
[source]¶ Bases:
str
,hts._t.ExtendedEnum
An enumeration.
-
auto_arima
= 'auto_arima'¶
-
holt_winters
= 'holt_winters'¶
-
prophet
= 'prophet'¶
-
sarimax
= 'sarimax'¶
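Because ModelT lists str among its bases, its members behave like plain strings. The sketch below shows that behaviour with a stand-in class built on the standard library Enum; ModelName and is_supported are hypothetical names, and the library's ExtendedEnum base is not reproduced here.

```python
from enum import Enum


class ModelName(str, Enum):
    """Hypothetical stand-in mirroring the ModelT members listed above."""

    auto_arima = "auto_arima"
    holt_winters = "holt_winters"
    prophet = "prophet"
    sarimax = "sarimax"


def is_supported(name: str) -> bool:
    """Return True when `name` is the value of one of the members."""
    try:
        ModelName(name)
    except ValueError:
        return False
    return True


# Members of a str-mixin enum compare equal to their string values.
assert ModelName.prophet == "prophet"
# Looking up by value returns the member itself.
assert ModelName("sarimax") is ModelName.sarimax

supported = [member.value for member in ModelName]
```

This is why a plain string such as 'prophet' can be passed wherever a model name is expected.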
-
-
class
hts._t.
NAryTreeT
[source]¶ Bases:
object
Type definition of an NAryTree
-
exogenous
= None¶
-
parent
¶
-
-
class
hts._t.
TimeSeriesModelT
[source]¶ Bases:
sklearn.base.BaseEstimator
,sklearn.base.RegressorMixin
Type definition of a TimeSeriesModel