HyFile

HyFile is the main interface for reading, writing, and processing HDF5 files in Hystorian.

class hystorian.io.hyFile.HyApply(file, func: Callable, args: tuple[Any], kwargs: dict[str, Any] | None = None)
apply()
class hystorian.io.hyFile.HyFile(path: Path | str, mode: str = 'r')

HyFile wraps the h5py.File class and is used to create and manipulate HDF5 data files from proprietary files.

path

Path to the hdf5 file to be manipulated.

Type:

Path

file

Handle of the file: the class calls h5py.File(path). If the file does not exist, it is created (see the __init__ docstring for more details).

Type:

h5py.File

attrs

Attributes is an internal class that allows HDF5 attributes to be manipulated through HyFile.

Type:

Attributes

class Attributes(file: File)

Internal class of HyFile that allows attributes inside an HDF5 file to be manipulated in the same way as in h5py.

Examples

  • This navigates to the Dataset located at ‘path/to/data’ and reads the attribute with the key ‘important_attribute’.

>>> f['path/to/data'].attrs['important_attribute']
  • This navigates to the Dataset located at ‘path/to/data’ and writes (or overwrites, if it already exists) the attribute with the key ‘new_attribute’, setting it to 0.

>>> f['path/to/data'].attrs['new_attribute'] = 0
apply(function: Callable, /, *args: Any, output_names: list[str] | str | None = None, increment_proc: bool = True, **kwargs: Any)

apply calls a function and stores all of its inputs and outputs in the HDF5 file, alongside the raw data.

Parameters:
  • function (Callable) – Function used to transform the data. The result of the function is stored in process/XXX-<function-name>, where XXX is a number that increments with each already existing process.

  • *args (Any) – All the positional arguments for the function above. If a HyPath is present in args, self.file.read() is called for that argument.

  • output_names (Optional[list[str] | str], optional) – Name(s) to give to the result of the function. The number of names must match the number of outputs of the function; otherwise a ValueError is raised. If None is passed, the name of the function is used. By default None.

  • increment_proc (bool, optional) – If a process/XXX-<function-name> already exists and increment_proc is set to True, the result is saved in the existing folder; otherwise a new folder is generated. By default True.

  • **kwargs (Any) – All the keyword arguments for the function above. If a HyPath is present in kwargs, self.file.read() is called for that argument.
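
The argument handling described in the parameter list can be modelled without HDF5 at all. The sketch below is not Hystorian's implementation: `FakePath`, `resolve_args`, and `normalize_output_names` are hypothetical names, a plain dict stands in for the file, and treating a tuple result as several outputs is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Union


@dataclass(frozen=True)
class FakePath:
    """Hypothetical stand-in for HyPath: marks a string as a location in the file."""
    path: str


def resolve_args(store: dict, args: tuple) -> List[Any]:
    """Replace every FakePath with the data stored at that location."""
    return [store[a.path] if isinstance(a, FakePath) else a for a in args]


def normalize_output_names(
    function: Callable,
    result: Any,
    output_names: Optional[Union[List[str], str]] = None,
) -> List[str]:
    """Return the names to store the result under, checking the count."""
    if output_names is None:
        # No name given: fall back to the name of the function.
        return [function.__name__]
    names = [output_names] if isinstance(output_names, str) else list(output_names)
    # Assumption: a tuple result counts as several outputs, anything else as one.
    n_outputs = len(result) if isinstance(result, tuple) else 1
    if len(names) != n_outputs:
        raise ValueError(f"expected {n_outputs} output name(s), got {len(names)}")
    return names


# A dict plays the role of the HDF5 file in this model.
store = {"datasets/data/grid": [1, 2, 3, 4]}
resolved = resolve_args(store, (FakePath("datasets/data/grid"),))
total = sum(resolved[0])

# A mismatch between names and outputs is reported as a ValueError.
try:
    normalize_output_names(sum, total, ["a", "b"])
except ValueError as err:
    mismatch = str(err)
```

In this model, `normalize_output_names(sum, total, "grid_sum")` yields `["grid_sum"]`, and omitting the name yields `["sum"]`, mirroring the fallback to the function name.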

delete(path: str | HyPath | int, renumber=True) → None

Remove a path from the file. If the path is a group, it will remove the group and all its subgroups and datasets. If the path is a dataset, it will remove the dataset.

Parameters:

path (str | HyPath | int) – Path to the group or dataset you want to remove, or a number which will be used to remove the n-th process folder.

extract_data(path: str | Path, overwrite=False, **kwargs) → None

Extract the data, metadata and attributes from a file given by path. Currently supported files are:

  • .gsf (Gwyddion Simple Field): generated by Gwyddion

  • .ibw (Igor Binary Wave): generated by Asylum AFM (may also work for other kinds of .ibw files)

  • .000 : Nanoscope files

  • .ARDF : generated by Asylum AFM (for ForceMaps and SSPFM)

Parameters:

path (str | Path) – Path to the file to be converted. If a string is provided, it is converted to a Path.

Raises:

TypeError – Raised if the file given by path has no conversion function.
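
Choosing a converter from the file extension can be sketched as a lookup table. This is an illustrative model only, not Hystorian's code: `SUPPORTED`, `pick_converter`, and the converter labels are hypothetical, and matching the suffix case-insensitively is an assumption.

```python
from pathlib import Path
from typing import Union

# Suffixes taken from the list of supported files; the labels are hypothetical placeholders.
SUPPORTED = {
    ".gsf": "gwyddion simple field",
    ".ibw": "igor binary wave",
    ".000": "nanoscope",
    ".ardf": "asylum ardf",
}


def pick_converter(path: Union[str, Path]) -> str:
    """Return the converter label for a file, or raise TypeError if unsupported."""
    path = Path(path)  # a string is converted to a Path first
    suffix = path.suffix.lower()  # assumption: '.ARDF' and '.ardf' are treated alike
    if suffix not in SUPPORTED:
        raise TypeError(f"no conversion function for '{suffix}' files")
    return SUPPORTED[suffix]


# An unsupported extension ends in a TypeError.
try:
    pick_converter("notes.txt")
except TypeError as err:
    unsupported = str(err)
```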

is_empty_group(path: str)

Return True if the group (and its subgroups) contain no datasets.
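
The recursive check can be illustrated on a nested dict, where a dict stands in for a group and any other value stands in for a dataset. This is a sketch of the idea under those assumptions, not the method's actual implementation, and `tree` is made-up example data.

```python
def is_empty_group(group: dict) -> bool:
    """Return True if the nested dict holds no dataset at any depth."""
    for child in group.values():
        if isinstance(child, dict):
            # A subgroup: it must itself be free of datasets.
            if not is_empty_group(child):
                return False
        else:
            # Any non-dict value is treated as a dataset.
            return False
    return True


tree = {
    "datasets": {"scan": {"height": [0.1, 0.2]}},
    "metadata": {},
    "process": {"001-mean": {}},
}
```

Here `tree["process"]` counts as empty even though it has a subgroup, because that subgroup contains no dataset, while `tree["datasets"]` does not.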

property last_process

Returns a string containing the path to the last process in the HDF5 file.
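
One way to picture this property, assuming process folders are named `<number>-<function-name>` as in the process/XXX-<function-name> pattern, is to pick the entry with the highest numeric prefix. The helper `last_process_path` and the folder names below are hypothetical; the real property may be implemented differently.

```python
from typing import List


def last_process_path(process_names: List[str]) -> str:
    """Return the path of the process folder with the highest numeric prefix."""
    # Assumption: every name starts with an integer followed by a hyphen.
    latest = max(process_names, key=lambda name: int(name.split("-", 1)[0]))
    return f"process/{latest}"


names = ["001-mean", "003-sum", "002-mean"]
latest_path = last_process_path(names)
```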

merge(tomerge, overwrite: bool = False) → None
multiple_apply(function, list_args, /, output_names=None, smart=False, increment_proc=True, **kwargs)
read(path: str | HyPath | None = None, search: bool = False) → list[str] | Datatype | npt.ArrayLike

Wrapper around the __getitem__ of h5py. If the path leads to an h5py.Group, the keys of its sub-groups are returned; otherwise the dataset is loaded directly. Keys and data are therefore obtained with the same call, and the user does not need to switch between .keys() and [()] while navigating the hierarchical structure.

Parameters:

path (Optional[str], optional) – Path to the Group or Dataset you want to read. If the value is None, the root of the file is read (this should be [datasets, metadata, process] if the file was created with Hystorian). By default None.

Returns:

If the path leads to a Group, a list of its subgroups is returned. If it leads to a Dataset containing data, the data is returned directly, and if the Dataset is empty, its Datatype is returned.

Return type:

list[str] | h5py.Datatype | npt.ArrayLike
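
The group-or-dataset dispatch can be modelled with a nested dict, where a dict plays the role of a Group and any other value plays the role of a Dataset. This is a simplified sketch rather than the actual method: `read_node` and `tree` are hypothetical, and the empty-Dataset case that returns a Datatype is not modelled.

```python
from typing import Any, Optional


def read_node(tree: dict, path: Optional[str] = None) -> Any:
    """Return the keys of a group, or the stored value of a dataset."""
    node: Any = tree
    if path:
        # Walk down the hierarchy one path component at a time.
        for key in path.strip("/").split("/"):
            node = node[key]
    if isinstance(node, dict):
        return list(node.keys())  # group: list its children
    return node  # dataset: return the data itself


tree = {
    "datasets": {"data": {"grid": [[1, 2], [3, 4]]}},
    "metadata": {},
    "process": {},
}
root_keys = read_node(tree)
grid = read_node(tree, "datasets/data/grid")
```

The same call works at every level: `read_node(tree)` lists the root, `read_node(tree, "datasets/data")` lists one group, and `read_node(tree, "datasets/data/grid")` returns the data.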

class hystorian.io.hyFile.HyPath(path: str)
property path
split(separator=None, maxsplit=-1)
property stem

Example: Extract and process data

import numpy as np

from hystorian.io.hyFile import HyFile, HyPath

# Convert a proprietary file into the HDF5 file, then process one of its datasets.
with HyFile('data.hdf5', 'r+') as f:
    f.extract_data('/path/to/file.ibw')
    # Apply a function to a dataset; the HyPath argument is read from the file first.
    f.apply(np.mean, HyPath('datasets/data/grid'), output_names='grid_mean')

with HyFile('random.hdf5', 'r+') as f:
    # Sum over the whole dataset.
    f.apply(np.sum, HyPath('datasets/data/grid'), output_names='grid_sum')
    # Keyword arguments (here axis=0) are passed on to the function.
    f.apply(np.sum, HyPath('datasets/data/grid'), output_names='grid_sum', axis=0)
    # Pass a list of HyPath objects as a single positional argument.
    f.apply(np.sum, [HyPath('datasets/data/grid'), HyPath('datasets/data/grid2')], output_names='grid_sum')
    # multiple_apply takes a list of arguments and a list of output names.
    f.multiple_apply(np.sum, [HyPath('datasets/data/grid'), HyPath('datasets/data/grid2')], output_names=['grid_sum', 'grid_sum2'])