stream_read_parquet: Read Parquet Stream
In sparklyr: R Interface to Apache Spark

Description

Reads a stream of Parquet files as a Spark DataFrame stream.

Usage

stream_read_parquet(
  sc,
  path,
  name = NULL,
  columns = NULL,
  options = list(),
  ...
)

Arguments

`sc`	A `spark_connection`.
`path`	The path to the file. Needs to be accessible from the cluster. Supports the `"hdfs://"`, `"s3a://"` and `"file://"` protocols.
`name`	The name to assign to the newly generated stream.
`columns`	A vector of column names or a named vector of column types. If specified, the elements can be `"binary"` for `BinaryType`, `"boolean"` for `BooleanType`, `"byte"` for `ByteType`, `"integer"` for `IntegerType`, `"integer64"` for `LongType`, `"double"` for `DoubleType`, `"character"` for `StringType`, `"timestamp"` for `TimestampType` and `"date"` for `DateType`.
`options`	A list of strings with additional options.
`...`	Optional arguments; currently unused.
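
As a minimal sketch of how `columns` and `options` might be combined, the snippet below wraps the call in a helper. The helper name `read_typed_parquet_stream`, the stream name `"typed_parquet_in"`, and the assumption that the input files hold a single integer `id` column are illustrative and not part of sparklyr. `maxFilesPerTrigger` is an option of Spark's file stream source, passed here as a string.

```r
# Sketch only: `read_typed_parquet_stream` is a hypothetical helper name.
# Column types use the names listed for `columns` above; this assumes the
# input files contain one integer column called `id`.
stream_columns <- c(id = "integer")

# Options are given as strings. `maxFilesPerTrigger` is a Spark file-source
# option that caps how many new files are read per micro-batch.
stream_options <- list(maxFilesPerTrigger = "1")

read_typed_parquet_stream <- function(sc, in_path, out_path) {
  input <- sparklyr::stream_read_parquet(
    sc,
    path = in_path,
    name = "typed_parquet_in",
    columns = stream_columns,
    options = stream_options
  )
  # Returns the running stream so the caller can stop it with stream_stop().
  sparklyr::stream_write_parquet(input, path = out_path)
}
```

With an open connection `sc`, this could be called as `stream <- read_typed_parquet_stream(sc, "parquet-in", "parquet-out")` and later stopped with `stream_stop(stream)`.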

Examples

## Not run: 

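library(sparklyr)

sc <- spark_connect(master = "local")

# Write a small Parquet dataset to serve as the stream source.
sdf_len(sc, 10) %>% spark_write_parquet("parquet-in")

# Read the source as a stream and write it back out as Parquet.
stream <- stream_read_parquet(sc, "parquet-in") %>%
  stream_write_parquet("parquet-out")

# Stop the running stream.
stream_stop(stream)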

## End(Not run)

sparklyr documentation built on Nov. 2, 2023, 5:09 p.m.