In HDF5, datasets can be located in a group (see H5Group) or at the root of a file (see H5File). They can be created either from a pre-existing R object (arrays as well as data.frames are supported, but not lists or other complex objects), or by specifying an explicit datatype (for available datatypes see h5types$overview) as well as the dimensions. In addition, other features are supported, such as transparent compression (for which a chunk size can be selected).

Value

Object of class H5D.

Details

In order to create a dataset, the create_dataset methods of either H5Group or H5File should be used. Please see the documentation there for how to create them.
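As a minimal sketch of dataset creation with compression, the snippet below creates a chunked, gzip-compressed dataset from an R matrix. It assumes that create_dataset accepts the chunk_dims and gzip_level arguments as described in the H5File documentation; the dataset name "compressed" and the chosen chunk size are purely illustrative.

```r
library(hdf5r)

# Temporary file used only for this illustration
fname <- tempfile(fileext = ".h5")
file <- H5File$new(fname, mode = "w")

# Create a 25 x 4 dataset from an R matrix, stored in 5 x 4 chunks
# and compressed with gzip (level chosen arbitrarily for the example)
values <- matrix(1:100, ncol = 4)
dset <- file$create_dataset("compressed", robj = values,
                            chunk_dims = c(5, 4), gzip_level = 6)

created_dims <- dset$dims          # dimensions of the new dataset
created_chunks <- dset$chunk_dims  # chunk layout that was requested
round_trip <- dset$read()          # compression is transparent on read

file$close_all()
file.remove(fname)
```

Compression is applied per chunk, so the chunk size should roughly match the access pattern that is expected when reading.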

The most important parts of a dataset are:

Space

The space of the dataset. It describes the current dimensions of the dataset as well as its maximum dimensions. It can be obtained using the get_space method, which returns an H5S object.

Datatype

The datatype that is used in the dataset. It can be obtained using the get_type method. See H5T for more information about using datatypes.
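The following sketch shows how the space and the datatype of a dataset might be inspected. It assumes that the H5S object exposes dims and maxdims as active bindings and that the H5T object offers a get_size method; the dataset itself mirrors the one used in the Examples section.

```r
library(hdf5r)

fname <- tempfile(fileext = ".h5")
file <- H5File$new(fname, mode = "w")

# A 5 x 2 integer dataset that may grow to 10 x 2
dset <- file$create_dataset("basic", dtype = h5types$H5T_NATIVE_INT,
                            space = H5S$new("simple", dims = c(5, 2), maxdims = c(10, 2)),
                            chunk_dims = c(5, 2))

# The space describes the current and the maximum dimensions
space <- dset$get_space()
space_dims <- space$dims
space_maxdims <- space$maxdims

# The datatype describes how each element is stored
dtype <- dset$get_type()
type_size <- dtype$get_size()  # size of one element in bytes

file$close_all()
file.remove(fname)
```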

In order to read and write datasets, the read and write methods are available. In addition, the standard way of using [ to access arrays is supported as well (see H5S_H5D_subset_assign for more help).

Other information and actions of possible interest are:

Storage size

The amount of storage used by the dataset can be obtained using the get_storage_size method.

Size change

The size of the dataset can be changed using the set_extent method, within the limits given by its maximum dimensions.
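A short sketch of both operations: the dataset below is created with room to grow (maxdims larger than dims), its storage size is queried, and it is then extended with set_extent. The dataset name "growable" and the values written are illustrative only.

```r
library(hdf5r)

fname <- tempfile(fileext = ".h5")
file <- H5File$new(fname, mode = "w")

# Chunked dataset of 5 x 2 integers that is allowed to grow to 10 x 2
dset <- file$create_dataset("growable", dtype = h5types$H5T_NATIVE_INT,
                            space = H5S$new("simple", dims = c(5, 2), maxdims = c(10, 2)),
                            chunk_dims = c(5, 2))
dset[1:5, 1:2] <- 1:10

dims_before <- dset$dims
storage <- dset$get_storage_size()  # storage currently allocated for the data

# Grow the dataset to its maximum size and fill the new rows
dset$set_extent(c(10, 2))
dims_after <- dset$dims
dset[6:10, 1:2] <- 11:20
new_rows <- dset[6:10, 1:2]

file$close_all()
file.remove(fname)
```

Extending a dataset beyond the maxdims given at creation is not possible, so the maximum dimensions should be chosen with future growth in mind.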

Please also note the active methods:

dims

Dimension of the dataset

maxdims

Maximum dimensions of the dataset

chunk_dims

Dimension of the chunks

key_info

Returns the space, type, property-list and dimensions

Methods

new(id = NULL)

Initializes a new dataset object. Only for internal use. Use the create_dataset method of H5Group and H5File objects instead.

Parameters

id

For internal use only

get_space()

This function implements the HDF5-API function H5Dget_space. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_d.html for details.

get_space_status()

This function implements the HDF5-API function H5Dget_space_status. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_d.html for details.

get_type(native = TRUE)

This function implements the HDF5-API function H5Dget_type. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_d.html for details.

get_create_plist()

This function implements the HDF5-API function H5Dget_create_plist. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_d.html for details.

get_access_plist()

This function implements the HDF5-API function H5Dget_access_plist. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_d.html for details.

get_offset()

This function implements the HDF5-API function H5Dget_offset. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_d.html for details.

get_storage_size()

This function implements the HDF5-API function H5Dget_storage_size. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_d.html for details.

vlen_get_buf_size(type, space)

This function implements the HDF5-API function H5Dvlen_get_buf_size. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_d.html for details.

vlen_reclaim(buffer, type, space, dataset_xfer_pl = h5const$H5P_DEFAULT)

This function implements the HDF5-API function H5Dvlen_reclaim. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_d.html for details.

read_low_level(file_space = h5const$H5S_ALL, mem_space = NULL, mem_type = NULL, dataset_xfer_pl = h5const$H5P_DEFAULT, flags = getOption("hdf5r.h5tor_default"), set_dim = FALSE, dim_to_set = NULL, drop = TRUE)

This function is for advanced users. It is recommended to use read or the [ interface instead. This function implements the HDF5-API function H5Dread, with minor changes to the API to accommodate R. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_d.html for details. It reads the data in the dataset as specified by mem_space and returns it as an R object.

Parameters

file_space

An HDF5 space, represented as class H5S, that determines which part of the dataset is being read. Can also be given as an id.

mem_space

The space as it is represented in memory; advanced feature; may be removed in the future. Can also be given as an id.

mem_type

Memory type; extracted from the dataset if NULL (can be passed in for efficiency reasons). Can also be given as an id.

dataset_xfer_pl

Dataset transfer property list. See H5P_DATASET_XFER

flags

Conversion rules for integer values. See also h5const

set_dim

If TRUE, the dimension attribute is set in the return value. How it is set is determined by dim_to_set.

dim_to_set

The dimension to set; Has to be numeric and needs to be specified if set_dim is TRUE. If the result is a data.frame, i.e. the data-type is a compound, then the dimension is ignored as the correct dimension is already set.

drop

Logical. Should dimensions of length 1 be dropped (R-default for arrays)

read(args = NULL, dataset_xfer_pl = h5const$H5P_DEFAULT, flags = getOption("hdf5r.h5tor_default"), drop = TRUE, envir = parent.frame())

Main interface for reading data from the dataset. It is the function that is used by [, where all indices are being passed in the parameter args.

Parameters

args

The indices for each dimension to subset, given as a list. This makes it easier to use as a programmatic API. For interactive use we recommend the [ operator. If set to NULL, the entire dataset will be read.

envir

The environment in which to evaluate args

dataset_xfer_pl

An object of class H5P_DATASET_XFER.

flags

Flags governing edge cases of the conversion from HDF5 to R. They relate to how integers are treated: R cannot natively represent 64-bit integers and cannot represent unsigned 64-bit integers at all (even using add-on packages). The constants governing this are part of h5const. The relevant ones start with the term H5TOR and are documented there. The default set here returns a regular 32-bit integer if this does not lead to an overflow and returns a 64-bit integer from the bit64 package otherwise. For 64-bit unsigned integers that are larger than the largest 64-bit signed integer, it returns a double. This loses precision, however.

drop

Logical. When reading data, should dimensions of size 1 be dropped.

Return

The data that was read as an R object
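As an illustration of the programmatic interface, the sketch below reads a dataset through read with different values for args and drop. The dataset name "m" is illustrative; the behavior of drop follows the parameter description above.

```r
library(hdf5r)

fname <- tempfile(fileext = ".h5")
file <- H5File$new(fname, mode = "w")
file[["m"]] <- matrix(1:10, ncol = 2)
dset <- file[["m"]]

# args = NULL (the default) reads the entire dataset
everything <- dset$read()

# Indices are passed as a list with one element per dimension
rows_2_3 <- dset$read(args = list(2:3, 1))

# With drop = FALSE the dimension of length 1 is kept rather than dropped
rows_2_3_kept <- dset$read(args = list(2:3, 1), drop = FALSE)

file$close_all()
file.remove(fname)
```

Because the indices are ordinary list elements, they can be computed at run time, which is harder to do with the [ operator.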

write_low_level(robj, file_space = h5const$H5S_ALL, mem_space = NULL, mem_type = NULL, dataset_xfer_pl = h5const$H5P_DEFAULT, flush = getOption("hdf5r.flush_on_write"))

This function is for advanced users. It is recommended to use write or the [<- interface, as used for arrays, instead. This function implements the HDF5-API function H5Dwrite, with some changes to accommodate R. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_d.html for details. It writes the data from robj into the dataset.

Parameters

robj

The object to write into the dataset

mem_space

The space as it is represented in memory; advanced feature; may be removed in the future

mem_type

Memory type; extracted from the dataset if NULL (can be passed in for efficiency reasons).

file_space

An HDF5-space, represented as class H5S that determines which part of the dataset is being written.

dataset_xfer_pl

Dataset transfer property list. See H5P_DATASET_XFER

flush

Should a flush be done after the write

write(args, value, dataset_xfer_pl = h5const$H5P_DEFAULT, envir = parent.frame())

Main interface for writing data to the dataset. It is the function that is used by [<-, where all indices are being passed in the parameter args.

Parameters

args

The indices for each dimension to subset, given as a list. This makes it easier to use as a programmatic API. For interactive use we recommend the [<- operator. If set to NULL, the entire dataset will be written.

value

The data to write to the dataset

envir

The environment in which to evaluate args

dataset_xfer_pl

An object of class H5P_DATASET_XFER.

Return

The HDF5 dataset object, returned invisibly
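The sketch below writes into a dataset once through write with an index list and once through the [<- operator, then reads the result back. The dataset name "w" and its 4 x 3 shape are illustrative. Values are filled in column-major order, as is usual for R arrays.

```r
library(hdf5r)

fname <- tempfile(fileext = ".h5")
file <- H5File$new(fname, mode = "w")
dset <- file$create_dataset("w", dtype = h5types$H5T_NATIVE_INT,
                            space = H5S$new("simple", dims = c(4, 3), maxdims = c(4, 3)),
                            chunk_dims = c(4, 3))

# Programmatic interface: the indices are given as a list
idx <- list(1:2, 1:3)
dset$write(args = idx, value = 1:6)

# Interactive interface: the same kind of write using [<-
dset[3:4, 1:3] <- 7:12

written <- dset$read()

file$close_all()
file.remove(fname)
```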

set_extent(dims)

This function implements the HDF5-API function H5Dset_extent. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_d.html for details.

get_fill_value()

This function implements the HDF5-API function H5Pget_fill_value, automatically supplying the datatype of the dataset for convenience. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_p.html for details.

create_reference(...)

This function implements the HDF5-API function H5Rcreate. The parameters are interpreted as in '['. The function always creates H5R_DATASET_REGION references. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_r.html for details.

print(..., max.attributes = 10)

Prints information for the dataset

Parameters

...

ignored

max.attributes

Maximum number of attribute names to print

obj_info(remove_internal_use_only = TRUE)

This function implements the HDF5-API function H5Oget_info. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_o.html for details.

get_obj_name()

This function implements the HDF5-API function H5Iget_name. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_i.html for details.

create_attr(attr_name, robj = NULL, dtype = NULL, space = NULL)

This function implements the HDF5-API function H5Acreate2. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_a.html for details.

attr_open(attr_name)

This function implements the HDF5-API function H5Aopen. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_a.html for details.

create_attr_by_name(attr_name, obj_name, robj = NULL, dtype = NULL, space = NULL, link_access_pl = h5const$H5P_DEFAULT)

This function implements the HDF5-API function H5Acreate_by_name. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_a.html for details.

attr_open_by_name(attr_name, obj_name, link_access_pl = h5const$H5P_DEFAULT)

This function implements the HDF5-API function H5Aopen_by_name. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_a.html for details.

attr_open_by_idx(n, obj_name, idx_type = h5const$H5_INDEX_NAME, order = h5const$H5_ITER_NATIVE, link_access_pl = h5const$H5P_DEFAULT)

This function implements the HDF5-API function H5Aopen_by_idx. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_a.html for details.

attr_exists_by_name(attr_name, obj_name, link_access_pl = h5const$H5P_DEFAULT)

This function implements the HDF5-API function H5Aexists_by_name. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_a.html for details.

attr_exists(attr_name)

This function implements the HDF5-API function H5Aexists. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_a.html for details.

attr_rename_by_name(old_attr_name, new_attr_name, obj_name, link_access_pl = h5const$H5P_DEFAULT)

This function implements the HDF5-API function H5Arename_by_name. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_a.html for details.

attr_rename(old_attr_name, new_attr_name)

This function implements the HDF5-API function H5Arename. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_a.html for details.

attr_delete(attr_name)

This function implements the HDF5-API function H5Adelete. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_a.html for details.
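A minimal sketch of working with attributes on a dataset, using only the methods documented above. It assumes that create_attr writes robj into the new attribute and that the object returned by attr_open provides read and close methods (see H5A); the attribute name "unit" and its value are illustrative.

```r
library(hdf5r)

fname <- tempfile(fileext = ".h5")
file <- H5File$new(fname, mode = "w")
file[["distance"]] <- c(1.5, 2.5, 4)
dset <- file[["distance"]]

# Attach a small piece of metadata to the dataset
dset$create_attr("unit", robj = "metres")

has_unit <- dset$attr_exists("unit")
has_other <- dset$attr_exists("does_not_exist")
n_attrs <- dset$attr_get_number()

# Open the attribute, read its value and close the handle again
unit_attr <- dset$attr_open("unit")
unit_value <- unit_attr$read()
unit_attr$close()

file$close_all()
file.remove(fname)
```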

attr_delete_by_name(attr_name, obj_name, link_access_pl = h5const$H5P_DEFAULT)

This function implements the HDF5-API function H5Adelete_by_name. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_a.html for details.

attr_delete_by_idx(n, obj_name, idx_type = h5const$H5_INDEX_NAME, order = h5const$H5_ITER_NATIVE, link_access_pl = h5const$H5P_DEFAULT)

This function implements the HDF5-API function H5Adelete_by_idx. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_a.html for details.

attr_info_by_name(attr_name, obj_name, link_access_pl = h5const$H5P_DEFAULT)

This function implements the HDF5-API function H5Aget_info_by_name. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_a.html for details.

attr_info_by_idx(n, obj_name, idx_type = h5const$H5_INDEX_NAME, order = h5const$H5_ITER_NATIVE, link_access_pl = h5const$H5P_DEFAULT)

This function implements the HDF5-API function H5Aget_info_by_idx. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_a.html for details.

attr_name_by_idx(n, obj_name, idx_type = h5const$H5_INDEX_NAME, order = h5const$H5_ITER_NATIVE, link_access_pl = h5const$H5P_DEFAULT)

This function implements the HDF5-API function H5Aget_name_by_idx. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_a.html for details.

attr_get_number()

This function implements the HDF5-API function H5Aget_num_attrs. Please see the documentation at https://support.hdfgroup.org/HDF5/doc/RM/RM_H5A.html#Annot-NumAttrs for details.

flush(scope = h5const$H5F_SCOPE_GLOBAL)

This function implements the HDF5-API function H5Fflush. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_f.html for details.

get_filename()

This function implements the HDF5-API function H5Fget_name. Please see the documentation at https://docs.hdfgroup.org/hdf5/v1_10/group___h5_f.html for details.
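For illustration, the following sketch flushes pending changes to disk and then asks a dataset where it lives, both inside the file and on disk. The dataset name "grid" is illustrative.

```r
library(hdf5r)

fname <- tempfile(fileext = ".h5")
file <- H5File$new(fname, mode = "w")
file[["grid"]] <- matrix(0, nrow = 2, ncol = 2)
dset <- file[["grid"]]

# Make sure buffered data is written to disk
dset$flush()

# Path of the dataset inside the HDF5 file
obj_name <- dset$get_obj_name()

# Name of the file that contains the dataset
file_name <- dset$get_filename()

file$close_all()
file.remove(fname)
```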

dims()

Get the dimension of the dataset

maxdims()

Get the maximal dimension of the dataset

chunk_dims()

Return the dimension of the chunks. NA if the dataset is not chunked

key_info()

Returns the key information as a list, consisting of space, type, create_pl, type_size_raw, type_size_variable, dims and chunk_dims. type_size_raw and type_size_variable differ only for variable-length types, for which type_size_variable is Inf and type_size_raw is the underlying size.

Author

Holger Hoefling

Examples

# First create a file to create datasets in it
fname <- tempfile(fileext = ".h5")
file <- H5File$new(fname, mode = "a")

# Show three different ways to create a dataset
file[["directly"]] <- matrix(1:10, ncol=2)
file$create_dataset("from_robj", matrix(1:10, ncol=2))
dset <- file$create_dataset("basic", dtype=h5types$H5T_NATIVE_INT,
             space=H5S$new("simple", dims=c(5, 2), maxdims=c(10,2)), chunk_dims=c(5,2))

# Different ways of reading the dataset
dset$read(args=list(1:5, 1))
#> [1] 0 0 0 0 0
dset$read(args=list(1:5, quote(expr=)))
#>      [,1] [,2]
#> [1,]    0    0
#> [2,]    0    0
#> [3,]    0    0
#> [4,]    0    0
#> [5,]    0    0
dset$read(args=list(1:5, NULL))
#>     
#> [1,]
#> [2,]
#> [3,]
#> [4,]
#> [5,]
dset[1:5, 1]
#> [1] 0 0 0 0 0
dset[1:5, ]
#>      [,1] [,2]
#> [1,]    0    0
#> [2,]    0    0
#> [3,]    0    0
#> [4,]    0    0
#> [5,]    0    0
dset[1:5, NULL]
#>     
#> [1,]
#> [2,]
#> [3,]
#> [4,]
#> [5,]

# Writing to the dataset
dset$write(args=list(1:3, 1:2), value=11:16)
dset[4:5, 1:2] <- -(1:4)
dset[,]
#>      [,1] [,2]
#> [1,]   11   14
#> [2,]   12   15
#> [3,]   13   16
#> [4,]   -1   -3
#> [5,]   -2   -4

# Extract key information
dset$dims
#> [1] 5 2
dset$maxdims
#> [1] 10  2
dset$chunk_dims
#> [1] 5 2
dset$key_info
#> $space
#> Class: H5S
#> Type: Simple
#> Dims: 5 x 2
#> Maxdims: 10 x 2
#> 
#> $type
#> Class: H5T_INTEGER
#> Datatype: H5T_STD_I32LE
#> 
#> $create_pl
#> Class: H5P_DATASET_CREATE
#> 
#> $type_size_raw
#> [1] 4
#> 
#> $type_size_variable
#> [1] 4
#> 
#> $dims
#> [1] 5 2
#> 
#> $chunk_dims
#> [1] 5 2
#> 
dset
#> Class: H5D
#> Dataset: /basic
#> Filename: /tmp/Rtmp0t2L4j/file3bd6258e502ef.h5
#> Access type: H5F_ACC_RDWR
#> Datatype: H5T_STD_I32LE
#> Space: Type=Simple     Dims=5 x 2     Maxdims=10 x 2
#> Chunk: 5 x 2

file$close_all()
file.remove(fname)
#> [1] TRUE