In HDF5, datasets can be located in a group (see H5Group) or at the root of a file (see H5File). They can be created either from a pre-existing R object
(arrays as well as data.frames are supported, but not lists or other complex objects), or by specifying
an explicit datatype (for available datatypes see h5types$overview) as well as the dimension.
In addition, other features are supported, such as transparent compression (for which a chunk size can be selected).
Object of class H5D.
In order to create a dataset, the create_dataset methods of either H5Group or H5File should be used. Please see the documentation there for how to create them.
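As a minimal sketch of the two locations described above, the following creates one dataset at the root of a file and one inside a group. The group name `measurements` and the dataset names are made up for illustration, and the temporary file exists only for this example.

```r
library(hdf5r)

# Temporary file used only for this illustration
fname <- tempfile(fileext = ".h5")
file <- H5File$new(fname, mode = "a")

# Dataset at the root of the file, created from an existing R object
root_dset <- file$create_dataset("at_root", robj = matrix(1:10, ncol = 2))

# Dataset inside a group (the group name is hypothetical)
grp <- file$create_group("measurements")
grp_dset <- grp$create_dataset("in_group", robj = matrix(1:10, ncol = 2))

# The object names reflect where each dataset lives in the file
root_name <- root_dset$get_obj_name()
grp_name <- grp_dset$get_obj_name()

file$close_all()
file.remove(fname)
```

Both calls return an H5D object, so everything described on this page applies to either dataset.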
The most important parts of a dataset are:
Space: The space of the dataset. It describes the dimension of the dataset as well as the maximum dimensions. Can be obtained using the get_space method, which returns an H5S object.
Datatype: The datatype that is used in the dataset. Can be obtained using the get_type method. See H5T for more information about using datatypes.
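The sketch below illustrates how the space and the datatype of a dataset might be inspected. The dataset name `values` is hypothetical; the `dims` and `maxdims` fields of the H5S object and the `get_size` method of the H5T object are used here on the assumption that they behave as described in the H5S and H5T documentation.

```r
library(hdf5r)

fname <- tempfile(fileext = ".h5")
file <- H5File$new(fname, mode = "a")

# A 3 x 2 dataset of doubles (the name is made up for this example)
dset <- file$create_dataset("values", robj = matrix(as.numeric(1:6), ncol = 2))

space <- dset$get_space()   # an H5S object
dtype <- dset$get_type()    # an H5T object

space_dims <- space$dims         # current dimensions of the dataset
space_maxdims <- space$maxdims   # maximum dimensions it may grow to
type_size <- dtype$get_size()    # size of a single element in bytes

file$close_all()
file.remove(fname)
```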
In order to read and write datasets, the read and write methods are available. In addition, the standard way of using [ to access arrays is supported as well (see H5S_H5D_subset_assign for more help).
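A short sketch of the access styles mentioned above, using a small integer matrix. The dataset name is hypothetical.

```r
library(hdf5r)

fname <- tempfile(fileext = ".h5")
file <- H5File$new(fname, mode = "a")
dset <- file$create_dataset("values", robj = matrix(1:6, ncol = 2))

# Read the complete dataset with the read method
all_values <- dset$read()

# Read a subset with the array-style interface
first_col <- dset[, 1]

# Overwrite the first row with the array-style assignment
dset[1, 1:2] <- c(10L, 20L)
updated <- dset[, ]

file$close_all()
file.remove(fname)
```

The `[` form is usually more convenient interactively, whereas read and write take the indices as a list and are easier to call from other functions.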
Other information/actions of possible interest are:
The storage size of the dataset (the space allocated for it in the file) can be extracted using get_storage_size.
The dimensions of the dataset can be changed using the set_extent method.
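As an illustration of resizing, the sketch below creates a chunked dataset whose maximum dimensions exceed its initial dimensions and then grows it. The dataset name and all sizes are chosen arbitrarily for this example; no particular value is assumed for the storage size.

```r
library(hdf5r)

fname <- tempfile(fileext = ".h5")
file <- H5File$new(fname, mode = "a")

# Start with 5 x 2, but allow growth up to 10 x 2
dset <- file$create_dataset("resizable", dtype = h5types$H5T_NATIVE_INT,
  space = H5S$new("simple", dims = c(5, 2), maxdims = c(10, 2)),
  chunk_dims = c(5, 2))

dims_before <- dset$dims

# Grow the dataset to its maximum dimensions and fill the new rows
dset$set_extent(c(10, 2))
dset[6:10, 1:2] <- 1:10
dims_after <- dset$dims

# Storage currently allocated for the dataset in the file
storage <- dset$get_storage_size()

file$close_all()
file.remove(fname)
```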
Please also note the active methods:
dims: Dimension of the dataset
maxdims: Maximum dimensions of the dataset
chunk_dims: Dimension of the chunks
key_info: Returns the space, type, property-list and dimensions
new(id = NULL)
Initializes a new dataset-object. Only for internal use. Use the create_dataset function for H5Group and H5File objects.
Parameters
id: For internal use only
get_space()
This function implements the HDF5-API function H5Dget_space. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_d.html for details.
get_space_status()
This function implements the HDF5-API function H5Dget_space_status. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_d.html for details.
get_type(native = TRUE)
This function implements the HDF5-API function H5Dget_type. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_d.html for details.
get_create_plist()
This function implements the HDF5-API function H5Dget_create_plist. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_d.html for details.
get_access_plist()
This function implements the HDF5-API function H5Dget_access_plist. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_d.html for details.
get_offset()
This function implements the HDF5-API function H5Dget_offset. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_d.html for details.
get_storage_size()
This function implements the HDF5-API function H5Dget_storage_size. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_d.html for details.
vlen_get_buf_size(type, space)
This function implements the HDF5-API function H5Dvlen_get_buf_size. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_d.html for details.
vlen_reclaim(buffer, type, space,
dataset_xfer_pl = h5const$H5P_DEFAULT)
This function implements the HDF5-API function H5Dvlen_reclaim. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_d.html for details.
read_low_level(file_space = h5const$H5S_ALL, mem_space = NULL,
mem_type = NULL, dataset_xfer_pl = h5const$H5P_DEFAULT,
flags = getOption("hdf5r.h5tor_default"), set_dim = FALSE,
dim_to_set = NULL, drop = TRUE)
This function is for advanced users. It is recommended to use read or the [ interface instead.
This function implements the HDF5-API function H5Dread, with minor changes to the API to accommodate R.
Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_d.html for details.
It reads the part of the dataset selected by file_space and returns it as an R object.
Parameters
file_space: An HDF5-space, represented as class H5S, that determines which part of the dataset is being read. Can also be given as an id.
mem_space: The space as it is represented in memory; advanced feature; may be removed in the future. Can also be given as an id.
mem_type: Memory type; extracted from the dataset if NULL (can be passed in for efficiency reasons). Can also be given as an id.
dataset_xfer_pl: Dataset transfer property list. See H5P_DATASET_XFER.
flags: Conversion rules for integer values. See also h5const.
set_dim: If TRUE, the dimension attribute is set in the return value. How it is set is determined by dim_to_set.
dim_to_set: The dimension to set; has to be numeric and needs to be specified if set_dim is TRUE. If the result is a data.frame, i.e. the datatype is a compound, then the dimension is ignored as the correct dimension is already set.
drop: Logical. Should dimensions of length 1 be dropped (R default for arrays)?
read(args = NULL, dataset_xfer_pl = h5const$H5P_DEFAULT,
flags = getOption("hdf5r.h5tor_default"), drop = TRUE,
envir = parent.frame())
Main interface for reading data from the dataset. It is the function that is used by [, where all indices are passed in the parameter args.
Parameters
args: The indices for each dimension to subset, given as a list. This makes it easier to use as a programmatic API. For interactive use we recommend the [ operator. If set to NULL, the entire dataset will be read.
envir: The environment in which to evaluate args.
dataset_xfer_pl: An object of class H5P_DATASET_XFER.
flags: Some flags governing edge cases of conversion from HDF5 to R. This relates to how integers are treated: R cannot natively represent 64bit integers and cannot represent unsigned 64bit integers at all (even using add-on packages). The constants governing this are part of h5const. The relevant ones start with the term H5TOR and are documented there. The default set here returns a regular 32bit integer if it doesn't lead to an overflow and returns a 64bit integer from the bit64 package otherwise. For 64bit unsigned integers that are larger than the largest 64bit signed integer, it returns a double. This loses precision, however.
drop: Logical. When reading data, should dimensions of size 1 be dropped?
Return
The data that was read as an R object
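Because args is an ordinary list, it can be assembled in code for a dataset of any rank. The helper `read_first_rows` below is hypothetical (it is not part of hdf5r) and is only a sketch of this programmatic use: it selects the first n entries along the first dimension and everything along the remaining dimensions.

```r
library(hdf5r)

# Hypothetical helper: read the first n entries along the first dimension
read_first_rows <- function(dset, n) {
  dims <- dset$dims
  # One index vector per dimension: 1..n for the first, all indices for the rest
  args <- c(list(seq_len(n)), lapply(dims[-1], seq_len))
  dset$read(args = args)
}

fname <- tempfile(fileext = ".h5")
file <- H5File$new(fname, mode = "a")
m <- matrix(1:10, ncol = 2)
dset <- file$create_dataset("values", robj = m)

head_rows <- read_first_rows(dset, 3)

file$close_all()
file.remove(fname)
```

For this two-dimensional dataset, the call is equivalent to `dset[1:3, 1:2]`.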
write_low_level(robj, file_space = h5const$H5S_ALL,
mem_space = NULL, mem_type = NULL,
dataset_xfer_pl = h5const$H5P_DEFAULT,
flush = getOption("hdf5r.flush_on_write"))
This function is for advanced users. It is recommended to use write or the [<- interface, as used for arrays, instead.
This function implements the HDF5-API function H5Dwrite, with some changes to accommodate R.
Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_d.html for details.
It writes the data from robj into the dataset.
Parameters
robj: The object to write into the dataset.
mem_space: The space as it is represented in memory; advanced feature; may be removed in the future.
mem_type: Memory type; extracted from the dataset if NULL (can be passed in for efficiency reasons).
file_space: An HDF5-space, represented as class H5S, that determines which part of the dataset is being written.
dataset_xfer_pl: Dataset transfer property list. See H5P_DATASET_XFER.
flush: Should a flush be done after the write?
write(args, value, dataset_xfer_pl = h5const$H5P_DEFAULT,
envir = parent.frame())
Main interface for writing data to the dataset. It is the function that is used by [<-, where all indices are passed in the parameter args.
Parameters
args: The indices for each dimension to subset, given as a list. This makes it easier to use as a programmatic API. For interactive use we recommend the [<- operator. If set to NULL, the entire dataset is selected.
value: The data to write to the dataset.
envir: The environment in which to evaluate args.
dataset_xfer_pl: An object of class H5P_DATASET_XFER.
Return
The HDF5 dataset object, returned invisibly
set_extent(dims)
This function implements the HDF5-API function H5Dset_extent. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_d.html for details.
get_fill_value()
This function implements the HDF5-API function H5Pget_fill_value, automatically supplying the datatype of the dataset for convenience. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_p.html for details.
create_reference(...)
This function implements the HDF5-API function H5Rcreate. The parameters are interpreted as in '['.
The function always creates H5R_DATASET_REGION references.
Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_r.html for details.
print(..., max.attributes = 10)
Prints information for the dataset
Parameters
...: ignored
max.attributes: Maximum number of attribute names to print
obj_info(remove_internal_use_only = TRUE)
This function implements the HDF5-API function H5Oget_info. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_o.html for details.
get_obj_name()
This function implements the HDF5-API function H5Iget_name. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_i.html for details.
create_attr(attr_name, robj = NULL, dtype = NULL, space = NULL)
This function implements the HDF5-API function H5Acreate2. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_a.html for details.
attr_open(attr_name)
This function implements the HDF5-API function H5Aopen. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_a.html for details.
create_attr_by_name(attr_name, obj_name, robj = NULL,
dtype = NULL, space = NULL,
link_access_pl = h5const$H5P_DEFAULT)
This function implements the HDF5-API function H5Acreate_by_name. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_a.html for details.
attr_open_by_name(attr_name, obj_name,
link_access_pl = h5const$H5P_DEFAULT)
This function implements the HDF5-API function H5Aopen_by_name. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_a.html for details.
attr_open_by_idx(n, obj_name, idx_type = h5const$H5_INDEX_NAME,
order = h5const$H5_ITER_NATIVE,
link_access_pl = h5const$H5P_DEFAULT)
This function implements the HDF5-API function H5Aopen_by_idx. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_a.html for details.
attr_exists_by_name(attr_name, obj_name,
link_access_pl = h5const$H5P_DEFAULT)
This function implements the HDF5-API function H5Aexists_by_name. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_a.html for details.
attr_exists(attr_name)
This function implements the HDF5-API function H5Aexists. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_a.html for details.
attr_rename_by_name(old_attr_name, new_attr_name, obj_name,
link_access_pl = h5const$H5P_DEFAULT)
This function implements the HDF5-API function H5Arename_by_name. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_a.html for details.
attr_rename(old_attr_name, new_attr_name)
This function implements the HDF5-API function H5Arename. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_a.html for details.
attr_delete(attr_name)
This function implements the HDF5-API function H5Adelete. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_a.html for details.
attr_delete_by_name(attr_name, obj_name,
link_access_pl = h5const$H5P_DEFAULT)
This function implements the HDF5-API function H5Adelete_by_name. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_a.html for details.
attr_delete_by_idx(n, obj_name, idx_type = h5const$H5_INDEX_NAME,
order = h5const$H5_ITER_NATIVE,
link_access_pl = h5const$H5P_DEFAULT)
This function implements the HDF5-API function H5Adelete_by_idx. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_a.html for details.
attr_info_by_name(attr_name, obj_name,
link_access_pl = h5const$H5P_DEFAULT)
This function implements the HDF5-API function H5Aget_info_by_name. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_a.html for details.
attr_info_by_idx(n, obj_name, idx_type = h5const$H5_INDEX_NAME,
order = h5const$H5_ITER_NATIVE,
link_access_pl = h5const$H5P_DEFAULT)
This function implements the HDF5-API function H5Aget_info_by_idx. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_a.html for details.
attr_name_by_idx(n, obj_name, idx_type = h5const$H5_INDEX_NAME,
order = h5const$H5_ITER_NATIVE,
link_access_pl = h5const$H5P_DEFAULT)
This function implements the HDF5-API function H5Aget_name_by_idx. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_a.html for details.
attr_get_number()
This function implements the HDF5-API function H5Aget_num_attrs. Please see the documentation at https://support.hdfgroup.org/HDF5/doc/RM/RM_H5A.html#Annot-NumAttrs for details.
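The attribute methods listed above can be combined as in the following sketch. The attribute name `scale` and its values are made up; the example assumes that the object returned by attr_open offers `read` and `close` methods, as described in the documentation of the attribute class.

```r
library(hdf5r)

fname <- tempfile(fileext = ".h5")
file <- H5File$new(fname, mode = "a")
dset <- file$create_dataset("values", robj = 1:3)

# Attach a numeric attribute to the dataset (name and values are hypothetical)
dset$create_attr("scale", robj = c(1.5, 2.5))

# Check for the attribute, then open it and read its content
has_scale <- dset$attr_exists("scale")
attr_obj <- dset$attr_open("scale")
scale_value <- attr_obj$read()
attr_obj$close()

# Number of attributes attached to the dataset
n_attrs <- dset$attr_get_number()

file$close_all()
file.remove(fname)
```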
flush(scope = h5const$H5F_SCOPE_GLOBAL)
This function implements the HDF5-API function H5Fflush. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_f.html for details.
get_filename()
This function implements the HDF5-API function H5Fget_name. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_f.html for details.
dims()
Get the dimension of the dataset
maxdims()
Get the maximal dimension of the dataset
chunk_dims()
Return the dimension of the chunks. NA if the dataset is not chunked
key_info()
Returns the key types as a list, consisting of space, type, create_pl,
type_size_raw, type_size_variable, dims and chunk_dims.
type_size_raw and type_size_variable differ for variable-length types, which return Inf
for type_size_variable and the underlying size for type_size_raw.
# First create a file to create datasets in it
fname <- tempfile(fileext = ".h5")
file <- H5File$new(fname, mode = "a")
# Show the 3 different ways how to create a dataset
file[["directly"]] <- matrix(1:10, ncol=2)
file$create_dataset("from_robj", matrix(1:10, ncol=2))
dset <- file$create_dataset("basic", dtype=h5types$H5T_NATIVE_INT,
space=H5S$new("simple", dims=c(5, 2), maxdims=c(10,2)), chunk_dims=c(5,2))
# Different ways of reading the dataset
dset$read(args=list(1:5, 1))
#> [1] 0 0 0 0 0
dset$read(args=list(1:5, quote(expr=)))
#> [,1] [,2]
#> [1,] 0 0
#> [2,] 0 0
#> [3,] 0 0
#> [4,] 0 0
#> [5,] 0 0
dset$read(args=list(1:5, NULL))
#>
#> [1,]
#> [2,]
#> [3,]
#> [4,]
#> [5,]
dset[1:5, 1]
#> [1] 0 0 0 0 0
dset[1:5, ]
#> [,1] [,2]
#> [1,] 0 0
#> [2,] 0 0
#> [3,] 0 0
#> [4,] 0 0
#> [5,] 0 0
dset[1:5, NULL]
#>
#> [1,]
#> [2,]
#> [3,]
#> [4,]
#> [5,]
# Writing to the dataset
dset$write(args=list(1:3, 1:2), value=11:16)
dset[4:5, 1:2] <- -(1:4)
dset[,]
#> [,1] [,2]
#> [1,] 11 14
#> [2,] 12 15
#> [3,] 13 16
#> [4,] -1 -3
#> [5,] -2 -4
# Extract key information
dset$dims
#> [1] 5 2
dset$maxdims
#> [1] 10 2
dset$chunk_dims
#> [1] 5 2
dset$key_info
#> $space
#> Class: H5S
#> Type: Simple
#> Dims: 5 x 2
#> Maxdims: 10 x 2
#>
#> $type
#> Class: H5T_INTEGER
#> Datatype: H5T_STD_I32LE
#>
#> $create_pl
#> Class: H5P_DATASET_CREATE
#>
#> $type_size_raw
#> [1] 4
#>
#> $type_size_variable
#> [1] 4
#>
#> $dims
#> [1] 5 2
#>
#> $chunk_dims
#> [1] 5 2
#>
dset
#> Class: H5D
#> Dataset: /basic
#> Filename: /tmp/Rtmp0t2L4j/file3bd6258e502ef.h5
#> Access type: H5F_ACC_RDWR
#> Datatype: H5T_STD_I32LE
#> Space: Type=Simple Dims=5 x 2 Maxdims=10 x 2
#> Chunk: 5 x 2
file$close_all()
file.remove(fname)
#> [1] TRUE