write_stream: Write the streaming SparkDataFrame to a data source.


Description

The data source is specified by the source and a set of options (...). If source is not specified, the default data source configured by spark.sql.sources.default will be used.

Usage

write_stream(
  .data,
  source = NULL,
  outputMode = NULL,
  partitionBy = NULL,
  trigger.processingTime = NULL,
  trigger.once = NULL,
  ...
)

Arguments

.data

a spark_tbl

source

the name of an external data source.

outputMode

one of 'append', 'complete', 'update'.

partitionBy

a name or a list of names of columns to partition the output by on the file system. If specified, the output is laid out on the file system similar to Hive's partitioning scheme.

trigger.processingTime

a processing time interval as a string, e.g. '5 seconds', '1 minute'. This trigger runs the query periodically based on the processing time. If the value is '0 seconds', the query runs as fast as possible; this is the default. Only one trigger can be set.

trigger.once

a logical; must be set to TRUE. This trigger processes only one batch of data in the streaming query and then terminates the query. Only one trigger can be set.

...

additional external data source specific named options.
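For instance, a one-shot batch run can be requested with trigger.once; omitting both trigger arguments falls back to the default '0 seconds' processing-time trigger. The paths below are hypothetical:

# Process whatever data is currently available once, then stop.
q <- write_stream(df, "parquet",
                  path = "/home/user/out",
                  checkpointLocation = "/home/user/cp",
                  trigger.once = TRUE)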

Details

outputMode specifies how the data of a streaming SparkDataFrame is written to the output data source. There are three modes:

- 'append': only new rows in the streaming result table are written out each time there is an update.

- 'complete': the entire updated result table is written out every time there are updates.

- 'update': only the rows that were updated in the result table are written out each time there are updates.
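As a sketch, the three modes correspond to calls like the following. The SparkDataFrames and sink are illustrative: 'complete' and 'update' generally require the query to contain an aggregation.

q <- write_stream(logs, "console", outputMode = "append")     # new rows only
q <- write_stream(counts, "console", outputMode = "complete") # full result table
q <- write_stream(counts, "console", outputMode = "update")   # changed rows only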

Note

write.stream since 2.2.0

experimental

See Also

read.stream

Other SparkDataFrame functions: isStreaming

Examples

## Not run: 
spark_session()
df <- read_stream("socket", host = "localhost", port = 9999)
is_streaming(df)
wordCounts <- df %>%
  group_by(value) %>%
  count()

# console
q <- write_stream(wordCounts, "console", outputMode = "complete")
# text stream
q <- write_stream(df, "text", path = "/home/user/out",
                  checkpointLocation = "/home/user/cp"
                  partitionBy = c("year", "month"),
                  trigger.processingTime = "30 seconds")
# memory stream
q <- write_stream(wordCounts, "memory", queryName = "outs",
                  outputMode = "complete")
head(spark_sql("SELECT * from outs"))
queryName(q)

stopQuery(q)

## End(Not run)

danzafar/tidyspark documentation built on Sept. 30, 2020, 12:19 p.m.