coalesce: Reduce partitions OR Find first non-missing element

View source: R/columns.R

Description

coalesce has two uses in Spark. The first is to reduce the number of partitions of a Spark DataFrame without a shuffle stage (in contrast to repartition, which requires a shuffle). The second, common in ETL, is to apply it to Column objects to find the first non-missing element. See ?dplyr::coalesce for more info.

Usage

coalesce(...)

Arguments

...

For the ETL case, these are the Columns or objects coercible to Column to be coalesced. For the partition-reducing case, the first argument should be a spark_tbl and the second should be an integer specifying the number of partitions to reduce to.
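
The "first non-missing element" behaviour of the ETL case can be illustrated on ordinary R vectors. The sketch below is a base-R illustration of the semantics only, not the tidyspark implementation: coalesce_sketch is a hypothetical helper name, and it works on local vectors rather than on Spark Column objects.

```r
# Hypothetical helper: a base-R sketch of coalesce semantics on plain vectors.
# It is NOT the tidyspark function and does not touch Spark.
coalesce_sketch <- function(...) {
  args <- list(...)
  out <- args[[1]]
  for (x in args[-1]) {
    # Positions that are still missing after the earlier arguments
    missing_idx <- is.na(out)
    # Recycle length-1 fallbacks to the length of the result, then fill the gaps
    out[missing_idx] <- rep_len(x, length(out))[missing_idx]
  }
  out
}

x <- c(1, NA, 3, NA)
y <- c(10, 20, NA, NA)

# For each position, take the first non-missing value among x, y, and 0
result <- coalesce_sketch(x, y, 0)
print(result)
```

Earlier arguments take precedence: a value from y is used only where x is missing, and the constant 0 fills whatever is still missing after both.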


danzafar/tidyspark documentation built on Sept. 30, 2020, 12:19 p.m.