write_hdd: Saves or appends a data set into a HDD file


write_hdd    R Documentation

Saves or appends a data set into a HDD file

Description

This function saves in-memory or HDD data sets into HDD repositories. It is also useful for appending several data sets to the same repository.

Usage

write_hdd(
  x,
  dir,
  chunkMB = Inf,
  rowsPerChunk,
  compress = 50,
  add = FALSE,
  replace = FALSE,
  showWarning,
  ...
)

Arguments

x

A data set.

dir

The HDD repository, i.e. the directory where the HDD data is.

chunkMB

Size, in megabytes, of the files into which the data is split. If finite, the data is split into several files of about chunkMB each. Default is Inf.

rowsPerChunk

Integer, default is missing. Alternative to the argument chunkMB. If provided, the data will be split into several files of rowsPerChunk rows each.

compress

Compression rate to be applied by write_fst. Default is 50.

add

Should the file be added to the existing repository? Default is FALSE.

replace

If add = FALSE, should any existing document be replaced? Default is FALSE.

showWarning

If the data x has no observations, a warning is raised when showWarning = TRUE. By default, the warning is raised only if write_hdd is NOT called from within a function.

...

Not currently used.

Details

Creating a HDD data set with this function always creates an additional file named “_hdd.txt” in the HDD folder. This file contains summary information on the data: the number of rows, the number of variables, the first five lines and a log of how the HDD data set has been created. To access the log directly from R, use the function origin.
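
As a minimal sketch of the point above, the following writes the iris data to a temporary folder and then looks at the summary file and the creation log. The folder comes from tempfile() and the variable names (hdd_path, info_file) are illustrative choices, not part of the package. The exact layout of “_hdd.txt” is not documented here, so the file is simply printed as plain text.

```r
library(hdd)

# Hypothetical temporary folder, used only for this sketch
hdd_path <- tempfile()
write_hdd(iris, hdd_path)

# The summary file is created alongside the data files
info_file <- file.path(hdd_path, "_hdd.txt")
file.exists(info_file)

# It is a text file, so its content can be printed as-is
writeLines(readLines(info_file, warn = FALSE))

# The creation log, accessed from R with origin()
origin(hdd(hdd_path))
```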

Value

This function does not return anything in R. Instead, it creates a folder on disk containing .fst files, which hold the data converted to the HDD format.

You can then read the created data with the function hdd().
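
To make this concrete, here is a small sketch that lists what write_hdd leaves on disk and then reads it back. It assumes a throwaway folder from tempfile(). The names hdd_path, fst_files and base_hdd are illustrative only, and the actual names of the .fst files are whatever the package chooses.

```r
library(hdd)

# Hypothetical temporary folder, used only for this sketch
hdd_path <- tempfile()
write_hdd(iris, hdd_path)

# The repository is an ordinary folder; the data sits in .fst files
fst_files <- list.files(hdd_path, pattern = "\\.fst$")
fst_files

# Read the repository back as a HDD object and inspect it
base_hdd <- hdd(hdd_path)
dim(base_hdd)
names(base_hdd)
```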

Author(s)

Laurent Berge

See Also

See hdd, sub-.hdd and cash-.hdd for the extraction and manipulation of out-of-memory data. To import HDD data sets from text files, see txt2hdd.

See hdd_slice to apply functions to chunks of data (and create HDD objects) and hdd_merge to merge large files.

To create/reshape HDD objects from memory or from other HDD objects, see write_hdd.

To display general information from HDD objects: origin, summary.hdd, print.hdd, dim.hdd and names.hdd.

Examples


# Toy example with iris data

# Let's create a HDD data set from iris data
hdd_path = tempfile() # => folder where the data will be saved
write_hdd(iris, hdd_path)
# Let's add data to it
for(i in 1:10) write_hdd(iris, hdd_path, add = TRUE)

base_hdd = hdd(hdd_path)
summary(base_hdd) # => 11 files, 1650 lines, 48.7KB on disk

# Let's save the iris data by chunks of 1KB
# we use replace = TRUE to delete the previous data
write_hdd(iris, hdd_path, chunkMB = 0.001, replace = TRUE)

base_hdd = hdd(hdd_path)
summary(base_hdd) # => 8 files, 150 lines, 10.2KB on disk
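
The examples above split the data by size with chunkMB. A sketch of the alternative, splitting by number of rows with rowsPerChunk, could look like the following. The value 50 and the variable names are arbitrary choices for illustration, and a fresh temporary folder is used so the sketch runs on its own.

```r
library(hdd)

# Hypothetical temporary folder, used only for this sketch
hdd_path <- tempfile()

# Split the 150 rows of iris into files of 50 rows each
write_hdd(iris, hdd_path, rowsPerChunk = 50)

# Count the data files that were created
n_files <- length(list.files(hdd_path, pattern = "\\.fst$"))
n_files

# The repository still holds all the rows of iris
base_hdd <- hdd(hdd_path)
summary(base_hdd)
```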


hdd documentation built on Aug. 25, 2023, 5:19 p.m.