R/hadoop_dataset.R

Defines functions sequence_file_dataset

Documented in sequence_file_dataset

#' Create a `SequenceFileDataset`.
#'
#' This function reads data from a Hadoop sequence file. A sequence
#' file stores (key, value) pairs sequentially. At the moment,
#' `org.apache.hadoop.io.Text` is the only supported serialization
#' type, and compression is not supported.
#'
#' @param filenames A `tf.string` tensor containing one or more filenames.
#'
#' @return A TensorFlow dataset, as returned by `as_tf_dataset()`.
#'
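#' @examples \dontrun{
#' library(tensorflow)
#' library(tfdatasets)
#'
#' # Read the (key, value) pairs from a sequence file, making one pass.
#' dataset <- sequence_file_dataset("testdata/string.seq") %>%
#'   dataset_repeat(1)
#'
#' # This example uses the TensorFlow 1.x session API.
#' sess <- tf$Session()
#' iterator <- make_iterator_one_shot(dataset)
#' next_batch <- iterator_get_next(iterator)
#'
#' # Print each element until the dataset is exhausted.
#' until_out_of_range({
#'   batch <- sess$run(next_batch)
#'   print(batch)
#' })
#' }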
#'
#' @export
sequence_file_dataset <- function(filenames) {
  dataset <- tfio_lib$hadoop$SequenceFileDataset(filenames = filenames)
  as_tf_dataset(dataset)
}


tfio documentation built on Dec. 25, 2019, 5:06 p.m.