h2o.exportFile: Export an H2O Data Frame (H2OFrame) to a File or to a...

View source: R/export.R

h2o.exportFile R Documentation

Export an H2O Data Frame (H2OFrame) to a File or to a collection of Files.

Description

Exports an H2OFrame (which can be either VA or FV) to a file. The file may be on the H2O instance's local filesystem, on HDFS (prefix the path with hdfs://), or on S3N (prefix the path with s3n://).

Usage

h2o.exportFile(
  data,
  path,
  force = FALSE,
  sep = ",",
  compression = NULL,
  parts = 1,
  header = TRUE,
  quote_header = TRUE,
  format = "csv",
  write_checksum = TRUE
)

Arguments

data

An H2OFrame object.

path

The path to write the file to. Must include the directory, and also the filename if exporting to a single file. May be prefixed with hdfs:// or s3n://. Each row of data appears as a line of the file.

force

logical, indicates how to deal with files that already exist. If TRUE, an existing file is overwritten; if FALSE (the default), the export fails.

sep

The field separator character. Values on each line of the file will be separated by this character (default ",").

compression

string, how to compress the exported dataset. Default is no compression; "gzip", "bzip2" and "snappy" are available.

parts

integer, number of part files to export to. Default is to write to a single file. Large data can be exported to multiple 'part' files, where each part file contains a subset of the data. The user can specify the maximum number of part files, or use the value -1 to let H2O determine the optimal number of files. When exporting to multiple part files, path is treated as a path to a directory. Part files conform to the naming scheme 'part-m-?????'.

header

logical, indicates whether to write the header line. Default is to include the header in the output file.

quote_header

logical, indicates whether column names should be quoted. Default is to use quotes.

format

string, one of "csv" or "parquet". Default is "csv". Export to parquet is multipart and H2O itself determines the optimal number of files (1 file per chunk).

write_checksum

logical, if supported by the format (e.g. 'parquet'), export will include a checksum file for each exported data file.

Details

If the file already exists, force = TRUE will overwrite it. Otherwise, the operation will fail.
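
The overwrite behaviour described above can be sketched as follows. This is a minimal illustration, not part of the package's own examples. It assumes a local H2O instance started with h2o.init(), so that the R session and the H2O server share a filesystem. The temporary path and the variable names (out_path, result) are hypothetical.

```r
library(h2o)
h2o.init()
iris_hf <- as.h2o(iris)

# Illustrative path only; with a local H2O instance the server
# sees the same filesystem as the R session.
out_path <- file.path(tempdir(), "iris_export.csv")

# force = TRUE writes the file whether or not it already exists.
h2o.exportFile(iris_hf, path = out_path, force = TRUE)

# Exporting again to the same path with force = FALSE is expected to fail,
# so the error is caught here rather than stopping the script.
result <- tryCatch(
  h2o.exportFile(iris_hf, path = out_path, force = FALSE),
  error = function(e) conditionMessage(e)
)

# With force = TRUE the existing file is overwritten.
h2o.exportFile(iris_hf, path = out_path, force = TRUE)
```

Wrapping the second call in tryCatch keeps a batch script running when the target already exists; whether to overwrite or to skip is then an explicit decision in the calling code.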

Examples

## Not run: 
library(h2o)
h2o.init()
iris_hf <- as.h2o(iris)

# These aren't real paths
# h2o.exportFile(iris_hf, path = "/path/on/h2o/server/filesystem/iris.csv")
# h2o.exportFile(iris_hf, path = "hdfs://path/in/hdfs/iris.csv")
# h2o.exportFile(iris_hf, path = "s3n://path/in/s3/iris.csv")

## End(Not run)
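
The parts, sep, compression and header arguments can be combined as in the sketch below. It assumes a local H2O instance whose filesystem is shared with the R session. The directory and file names are hypothetical and chosen only for illustration.

```r
library(h2o)
h2o.init()
iris_hf <- as.h2o(iris)

# Illustrative output directory; with parts != 1 the path is treated as a directory.
out_dir <- file.path(tempdir(), "iris_parts")

# parts = -1 lets H2O decide how many part files to write.
h2o.exportFile(iris_hf, path = out_dir, parts = -1, force = TRUE)

# Inspect the part files written into the directory.
list.files(out_dir)

# Single gzip-compressed file, semicolon-separated, without a header line.
gz_path <- file.path(tempdir(), "iris_semicolon.csv.gz")
h2o.exportFile(
  iris_hf,
  path = gz_path,
  sep = ";",
  compression = "gzip",
  header = FALSE,
  force = TRUE
)
```

Multi-part export is mainly useful for large frames. For a small frame such as iris, H2O may write only a single part file.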

h2o documentation built on May 29, 2024, 4:26 a.m.