eelib.utils.eval.evaluation_utils

Helper functions for evaluating .hdf5 results in Jupyter notebooks.

Author: elenia@TUBS
Copyright 2024 elenia
This file is part of eELib, which is free software under the terms of the GNU GPL Version 3.

Module Contents

Functions

find_corresponding_component_dict(series_group_name[, ...])

Tries to infer the component that corresponds to a series_group_name using a dictionary of regular expressions and components.

find_corresponding_component(series_group_name)

Tries to infer the component that corresponds to a series_group_name using a pre-defined regular expression.

make_compact(df)

Returns a compacted version of the non-compact output of hdf5_file_as_pandas().

hdf5_file_as_pandas(path[, pseudonyms, compact, ...])

Return a dataframe representation of the hdf5 file.

get_config(path) → dict

Return scenario configuration of the hdf5 file.

get_timeseries(dataframe, unit, component[, number])

Extracts timeseries data from non-compact dataframe using component name, number and unit.

convert_hdf5_to_csv(input_path, output_path[, ...])

Converts an hdf5 file with the proper format into a csv file. Uses non-compact representation unless specified.

timestep_to_datetime(timestep, zero_datetime, step_size)

Converts a timestep, or a NumPy array of timesteps, to the corresponding date(s).

save_figure(fig, ax, filename, path[, figsize, dpi, ...])

Saves a Matplotlib figure with standardized sizing and format.

_read_config(hdf5_data)

Return dictionary corresponding to scenario_config.

Attributes

NAME_COM_DICT

NAME_COM_DICT
find_corresponding_component_dict(series_group_name: str, name_com_dict: dict = NAME_COM_DICT)

Tries to infer the component that corresponds to a series_group_name using a dictionary of regular expressions and components.

Parameters:
  • series_group_name (str) – the name of the series group inside the hdf5 file.

  • name_com_dict (dict) – dictionary linking regular expressions to components. Defaults to NAME_COM_DICT.

Returns:

A string representation of the component, i.e. the value belonging to the first regular expression in name_com_dict that matches series_group_name. Returns “Unidentified” if no regular expression matches.

Return type:

str
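
The first-match lookup described above can be sketched as follows. This is a minimal illustration, not the eELib implementation: the dictionary EXAMPLE_NAME_COM_DICT, its patterns and labels, the function name match_component, and the use of a case-insensitive re.search are all assumptions made for the example, since the contents of NAME_COM_DICT are not listed here.

```python
import re

# Hypothetical mapping for illustration only; the real NAME_COM_DICT is not shown here.
EXAMPLE_NAME_COM_DICT = {
    r"pv": "PV system",
    r"bss|battery": "Battery storage",
    r"household": "Household load",
}


def match_component(series_group_name: str, name_com_dict: dict = EXAMPLE_NAME_COM_DICT) -> str:
    """Return the value of the first pattern that matches, else "Unidentified"."""
    # Dictionaries keep insertion order, so "first" means first inserted.
    for pattern, component in name_com_dict.items():
        # Assumption: a case-insensitive search anywhere in the name.
        if re.search(pattern, series_group_name, flags=re.IGNORECASE):
            return component
    return "Unidentified"


print(match_component("PVLib-0.pv_3"))
print(match_component("Grid-0.trafo_1"))
```

Because the first matching entry wins, the order of the dictionary matters whenever a name could match more than one pattern.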

find_corresponding_component(series_group_name: str)

Tries to infer the component that corresponds to a series_group_name using a pre-defined regular expression.

Parameters:

series_group_name (str) – the name of the series group inside the .hdf5 file.

Returns:

  1. A string representation of the component, or “Unidentified” if no proper value is found.

  2. The number of the component.

Return type:

str, int
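
As a rough sketch of how one regular expression can yield both a component label and a component number: the pattern, the naming scheme "<model>_<number>" at the end of the name, the EXAMPLE_LABELS table, and the value -1 returned when nothing matches are all hypothetical, chosen only to make the example runnable. The regular expression actually used by eELib is not documented here.

```python
import re

# Assumption: series group names end in "<model>_<number>", e.g. "PVLib-0.pv_3".
EXAMPLE_PATTERN = re.compile(r"(?P<model>[A-Za-z]+)_(?P<number>\d+)$")

# Hypothetical labels for illustration only.
EXAMPLE_LABELS = {"pv": "PV system", "bss": "Battery storage"}


def split_component(series_group_name: str):
    """Return (component label, component number) inferred from the name."""
    match = EXAMPLE_PATTERN.search(series_group_name)
    if match is None:
        # Assumption: -1 marks "no number found"; the library may behave differently.
        return "Unidentified", -1
    label = EXAMPLE_LABELS.get(match.group("model").lower(), "Unidentified")
    return label, int(match.group("number"))


print(split_component("PVLib-0.pv_3"))
print(split_component("Grid-0.trafo_1"))
```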

make_compact(df: pandas.DataFrame)

Returns a compacted version of the non-compact output of hdf5_file_as_pandas(), in which rows represent the elements of the timeseries and columns carry a compacted name for each timeseries.

Parameters:

df (DataFrame) – non-compact dataframe.

Returns:

compacted dataframe.

Return type:

DataFrame

hdf5_file_as_pandas(path: str, pseudonyms=True, compact=False, datetime_col=False)

Return a dataframe representation of the hdf5 file.

Parameters:
  • path (str) – path of the hdf5 file.

  • pseudonyms (bool) – If True, the function also tries to infer the component corresponding to each timeseries group in the file and adds it as a column at the beginning of the dataframe. Uses find_corresponding_component(). Defaults to True.

  • compact (bool) – If True, the function returns a compacted version of the dataframe in which rows represent the elements of the timeseries and columns carry a compacted name for each timeseries. Defaults to False.

  • datetime_col (bool) – If True, the function adds a column showing the actual date and time in addition to the timesteps. Only takes effect if compact=True. Defaults to False.

Returns:

A dataframe representation of the data inside the hdf5 file.

Return type:

pandas.DataFrame

get_config(path: str) → dict

Return scenario configuration of the hdf5 file.

Parameters:

path (str) – path of the hdf5 file.

Returns:

scenario configuration dict.

Return type:

dict

get_timeseries(dataframe: pandas.DataFrame, unit: str, component: str, number: int = 0)

Extracts timeseries data from non-compact dataframe using component name, number and unit.

Parameters:
  • dataframe (DataFrame) – the non-compact dataframe returned by hdf5_file_as_pandas().

  • unit (str) – the name of the unit.

  • component (str) – the name of the component.

  • number (int) – the number of the component. Defaults to 0.

Raises:
  • KeyError – if the column “Component” does not exist.

  • ValueError – if there are multiple or zero timeseries matching the inputs.

Returns:

the timeseries corresponding to the received component and unit.

Return type:

list
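
The error contract above (exactly one matching timeseries, otherwise ValueError; a missing “Component” column raises KeyError) can be illustrated without pandas. In this sketch a list of row dictionaries stands in for the non-compact dataframe. Only the “Component” key comes from the documentation; the keys "Number", "Unit" and "Values" and the function name select_single_series are hypothetical.

```python
def select_single_series(rows, unit, component, number=0):
    """Return the values of the one row matching component, number and unit."""
    # Mirrors the documented KeyError when the "Component" column is absent.
    if not all("Component" in row for row in rows):
        raise KeyError("Component")
    matches = [
        row
        for row in rows
        if row["Component"] == component
        and row["Number"] == number  # hypothetical column name
        and row["Unit"] == unit  # hypothetical column name
    ]
    # Zero matches and several matches are both treated as errors.
    if len(matches) != 1:
        raise ValueError(f"expected exactly one matching timeseries, found {len(matches)}")
    return matches[0]["Values"]  # hypothetical column name


example_rows = [
    {"Component": "PV system", "Number": 0, "Unit": "p", "Values": [0.0, 1.5, 2.0]},
    {"Component": "PV system", "Number": 1, "Unit": "p", "Values": [0.0, 0.7, 1.1]},
]
print(select_single_series(example_rows, unit="p", component="PV system", number=1))
```

Treating an ambiguous selection as an error, rather than silently returning the first hit, keeps a notebook evaluation from plotting the wrong series.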

convert_hdf5_to_csv(input_path: str, output_path: str, pseudonyms=True, sep=',', na_rep='', compact=False, datetime_col=False)

Converts an hdf5 file with the proper format into a csv file. Uses non-compact representation unless specified.

Parameters:
  • input_path (str) – path for input hdf5 file.

  • output_path (str) – path for the output csv file.

  • pseudonyms (bool) – If set to True the function also tries to infer the component corresponding to each timeseries group in the file and adds it as a column to the beginning of the dataframe. Defaults to True.

  • sep (str) – String of length 1. Field delimiter for the output file. Defaults to ‘,’.

  • na_rep (str) – Missing data representation. Defaults to ‘’.

  • compact (bool) – Whether to write a compacted version of the data, in which rows represent the elements of the timeseries and columns carry a compacted name for each timeseries. Defaults to False.

  • datetime_col (bool) – Whether to add a column showing the actual date and time in addition to the timesteps. Only takes effect if compact=True. Defaults to False.

timestep_to_datetime(timestep, zero_datetime: datetime.datetime, step_size: int)

Converts a timestep, or a NumPy array of timesteps, to the corresponding date(s).

Parameters:
  • timestep – a timestep or a numpy array of timesteps.

  • zero_datetime (datetime) – the datetime corresponding to timestep 0. The timesteps passed in do not have to include 0.

  • step_size (int) – size of each timestep in seconds.

Returns:

calculated date(s) corresponding to timestep(s).

Return type:

numpy.datetime64
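
The conversion amounts to date = zero_datetime + timestep × step_size seconds. The sketch below shows that arithmetic with the standard library only, so it returns datetime objects in a list rather than numpy.datetime64 values as the documented function does. The function name timesteps_to_datetimes and the 900-second step size are assumptions for the example.

```python
from datetime import datetime, timedelta


def timesteps_to_datetimes(timesteps, zero_datetime: datetime, step_size: int):
    """Map each timestep index to zero_datetime + index * step_size seconds."""
    return [zero_datetime + timedelta(seconds=int(t) * step_size) for t in timesteps]


start = datetime(2024, 1, 1, 0, 0)
# Timestep 0 itself does not have to be part of the input.
stamps = timesteps_to_datetimes([1, 4, 96], start, step_size=900)
for stamp in stamps:
    print(stamp.isoformat())
```

With 900-second (15-minute) steps, timestep 96 lands exactly one day after zero_datetime.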

save_figure(fig, ax, filename, path, figsize=(15, 5), dpi=300, format='svg', rasterized=True)

Saves a Matplotlib figure with standardized sizing and format.

Parameters:
  • fig (Figure) – Matplotlib figure to be saved.

  • ax (Axes) – Matplotlib Axes object of the figure.

  • filename (str) – Name of the output file.

  • path (str) – path of the output file.

  • figsize (tuple) – Size of the figure in inches (width, height). Defaults to (15, 5).

  • dpi (int) – Dots per inch for image resolution. Defaults to 300.

  • format (str) – Output file format (e.g., ‘svg’, ‘png’, ‘jpg’). Defaults to ‘svg’.

  • rasterized (bool) – Whether to rasterize vector elements. Defaults to True.

_read_config(hdf5_data)

Return dictionary corresponding to scenario_config.

Parameters:

hdf5_data – data read from an hdf5 file using h5py.

Returns:

the scenario_config as stored.

Return type:

dict