HyFile
HyFile is the main interface for reading, writing, and processing HDF5 files in Hystorian.
- class hystorian.io.hyFile.HyApply(file, func: Callable, args: tuple[Any], kwargs: dict[str, Any] | None = None)
- apply()
- class hystorian.io.hyFile.HyFile(path: Path | str, mode: str = 'r')
HyFile is a class that wraps around the h5py.File class and is used to create and manipulate HDF5 data files from proprietary file formats.
- path
Path to the hdf5 file to be manipulated.
- Type:
Path
- file
Handle of the file: the class calls h5py.File(path). If the file does not exist, it is created (see the __init__ docstring for more details).
- Type:
h5py.File
- attrs
Attributes is an internal class which allows the manipulation of HDF5 attributes through HyFile.
- Type:
Attributes
- class Attributes(file: File)
Internal class of HyFile which allows the manipulation of attributes inside an HDF5 file in the same way as h5py does.
Examples
This will navigate to the Dataset located at ‘path/to/data’ and read the attribute with the key ‘important_attribute’.
>>> f['path/to/data'].attrs['important_attribute']
This will navigate to the Dataset located at ‘path/to/data’ and write (or overwrite, if it already exists) the attribute with the key ‘new_attribute’, setting it to 0.
>>> f['path/to/data'].attrs['new_attribute'] = 0
- apply(function: Callable, /, *args: Any, output_names: list[str] | str | None = None, increment_proc: bool = True, **kwargs: Any)
apply calls a function and stores all its inputs and outputs in the HDF5 file, alongside the raw data.
- Parameters:
function (Callable) – Function used to transform the data. The result of the function is stored in process/XXX-<function-name>, where XXX is a number incremented for each already existing process.
*args (Any) – All the positional arguments for the above function. If a HyPath is present in args, self.file.read() is called on that argument.
output_names (Optional[list[str] | str], optional) – Name(s) given to the result(s) of the function. The number of names must match the number of outputs of the function, otherwise a ValueError is raised. If None, the name of the function is used. By default None.
increment_proc (bool, optional) – If a process/XXX-<function-name> folder already exists and increment_proc is True, the result is saved in the existing folder; otherwise a new folder is generated. By default True.
**kwargs (Any) – All the keyword arguments for the above function. If a HyPath is present in kwargs, self.file.read() is called on that argument.
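The bookkeeping described for apply (naming the process folder and loading HyPath arguments before calling the function) can be sketched in plain Python. This is a hypothetical illustration, not Hystorian's implementation: `HyPathLike`, `next_process_name` and `resolve_args` are invented names, a dict stands in for the HDF5 file, and the three-digit zero-padded counter is an assumption about the XXX format.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class HyPathLike:
    """Hypothetical stand-in for HyPath: marks an argument as a path to read."""

    path: str


def next_process_name(existing: list[str], function: Callable) -> str:
    # Assumption: folders are named 'XXX-<function-name>' with a 3-digit counter
    # that is one more than the number of processes already stored.
    return f"{len(existing) + 1:03d}-{function.__name__}"


def resolve_args(
    args: tuple, kwargs: dict[str, Any], read: Callable[[str], Any]
) -> tuple[tuple, dict[str, Any]]:
    # Replace every HyPathLike argument by the data returned by `read`;
    # all other arguments are passed through unchanged.
    def load(value: Any) -> Any:
        return read(value.path) if isinstance(value, HyPathLike) else value

    return tuple(load(a) for a in args), {k: load(v) for k, v in kwargs.items()}


# Usage: a dict plays the role of the HDF5 file
store = {"datasets/data/grid": [1, 2, 3]}
args, kwargs = resolve_args((HyPathLike("datasets/data/grid"),), {}, store.__getitem__)
result = sum(*args, **kwargs)
name = next_process_name(["001-mean"], sum)
```

Marking paths with a dedicated type, rather than treating every string as a path, keeps ordinary string arguments of the function from being read by accident.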
- delete(path: str | HyPath | int, renumber=True) → None
Remove a path from the file. If the path is a group, it will remove the group and all its subgroups and datasets. If the path is a dataset, it will remove the dataset.
- Parameters:
path (str | HyPath | int) – Path to the group or dataset you want to remove, or a number which will be used to remove the n-th process folder.
- extract_data(path: str | Path, overwrite=False, **kwargs) → None
Extract the data, metadata and attributes from a file given by path. Currently supported files are:
.gsf (Gwyddion Simple Field): generated by Gwyddion
.ibw (Igor Binary Wave): generated by Asylum AFM (may also work for other kinds of .ibw files)
.000: Nanoscope files
.ARDF : generated by Asylum AFM (for ForceMaps and SSPFM)
- Parameters:
path (str | Path) – path to the file to be converted. If a string is provided it is converted to Path.
- Raises:
TypeError – If no conversion function exists for the type of the file passed as path.
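The dispatch on file extension described above can be sketched with a lookup table. This is a hypothetical illustration, not Hystorian's code: `CONVERTERS`, `_placeholder` and the returned strings are invented, and case-insensitive suffix matching is an assumption.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable


def _placeholder(label: str) -> Callable[[Path], str]:
    # Hypothetical converter factory: a real converter would parse the file and
    # write its data, metadata and attributes into the HDF5 file.
    def convert(path: Path) -> str:
        return f"{label}: {path.name}"

    return convert


# Suffixes from the list above; matching is case-insensitive (an assumption)
CONVERTERS: dict[str, Callable[[Path], str]] = {
    ".gsf": _placeholder("Gwyddion Simple Field"),
    ".ibw": _placeholder("Igor Binary Wave"),
    ".000": _placeholder("Nanoscope"),
    ".ardf": _placeholder("Asylum ARDF"),
}


def extract_data(path: str | Path) -> str:
    path = Path(path)  # a string is converted to Path
    converter = CONVERTERS.get(path.suffix.lower())
    if converter is None:
        raise TypeError(f"No conversion function for '{path.suffix}' files")
    return converter(path)
```

Adding support for a new format then amounts to registering one more entry in the table.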
- is_empty_group(path: str)
Return True if the group (including its subgroups) contains no datasets.
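The recursive check can be sketched on a hypothetical in-memory model where groups are dicts and any other value is a dataset; this is an illustration of the rule, not Hystorian's implementation.

```python
from __future__ import annotations

from typing import Any


def is_empty_group(group: dict[str, Any]) -> bool:
    # Hypothetical model: a group is a dict, any other value is a dataset.
    for item in group.values():
        if not isinstance(item, dict):
            return False  # a dataset sits directly in this group
        if not is_empty_group(item):
            return False  # a dataset sits somewhere in a subgroup
    return True
```

A group that holds only empty subgroups still counts as empty. With h5py itself, the same walk could be done with Group.visititems.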
- property last_process
Return the path (as a string) of the last process in the HDF5 file.
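Finding the last process can be sketched by comparing the numeric prefix of the process folder names. This is a hypothetical illustration: it assumes keys of the form 'XXX-<function-name>' and returns None when no process exists, neither of which is confirmed by the docstring.

```python
from __future__ import annotations


def last_process(process_keys: list[str]) -> str | None:
    # Assumption: keys look like 'XXX-<function-name>' with a numeric prefix.
    if not process_keys:
        return None  # no process stored yet (assumed behaviour)
    latest = max(process_keys, key=lambda name: int(name.split("-", 1)[0]))
    return f"process/{latest}"
```

Parsing the prefix as an integer keeps the ordering correct even if the counter were not zero-padded, where plain string comparison would place '9' after '10'.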
- merge(tomerge, overwrite: bool = False) → None
- multiple_apply(function, list_args, /, output_names=None, smart=False, increment_proc=True, **kwargs)
- read(path: str | HyPath | None = None, search: bool = False) → list[str] | Datatype | npt.ArrayLike
Wrapper around h5py's __getitem__. If the path leads to an h5py.Group, it returns the keys of its subgroups; otherwise it loads the dataset directly. Keys are therefore obtained without calling .keys() and data without [()], so groups and datasets are read with the same call and the user does not need to switch between .keys() and [()] while navigating the hierarchical structure.
- Parameters:
path (Optional[str | HyPath], optional) – Path to the Group or Dataset you want to read. If None, the root of the file is read (which should contain [datasets, metadata, process] if the file was created with Hystorian). By default None.
- Returns:
If the path leads to a Group, the list of its subgroups; if it leads to a Dataset containing data, the data itself; if it leads to an empty Dataset, its Datatype.
- Return type:
list[str] | h5py.Datatype | npt.ArrayLike
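The unified behaviour of read can be sketched on a hypothetical in-memory tree where groups are dicts and datasets are any other value. This illustrates the idea only; it is not Hystorian's code and does not model the empty-Dataset case that returns a Datatype.

```python
from __future__ import annotations

from typing import Any


def read(tree: dict[str, Any], path: str | None = None) -> Any:
    # Hypothetical model: groups are dicts, datasets are any other value.
    node: Any = tree
    if path:
        for part in path.strip("/").split("/"):
            node = node[part]
    if isinstance(node, dict):
        return list(node.keys())  # Group: names of its members, like .keys()
    return node  # Dataset: the data itself, like [()]
```

The same call therefore serves both for listing a group and for loading a dataset, which is the convenience the docstring describes.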
- class hystorian.io.hyFile.HyPath(path: str)
- property path
- split(separator=None, maxsplit=-1)
- property stem
Example: Extract and process data
```python
import numpy as np

from hystorian.io.hyFile import HyFile, HyPath

with HyFile('data.hdf5', 'r+') as f:
    # Convert a proprietary file and store it in the HDF5 file
    f.extract_data('/path/to/file.ibw')
    # Apply a function to a dataset and store the result as 'grid_mean'
    f.apply(np.mean, HyPath('datasets/data/grid'), output_names='grid_mean')

with HyFile('random.hdf5', 'r+') as f:
    # Sum over the whole dataset
    f.apply(np.sum, HyPath('datasets/data/grid'), output_names='grid_sum')
    # Keyword arguments are forwarded to the function (here: sum along axis 0)
    f.apply(np.sum, HyPath('datasets/data/grid'), output_names='grid_sum', axis=0)
    # Pass a list of datasets as a single positional argument
    f.apply(np.sum, [HyPath('datasets/data/grid'), HyPath('datasets/data/grid2')], output_names='grid_sum')
    # multiple_apply takes a list of arguments, here with one output name per entry
    f.multiple_apply(np.sum, [HyPath('datasets/data/grid'), HyPath('datasets/data/grid2')], output_names=['grid_sum', 'grid_sum2'])
```