coalesce: Reduce partitions OR Find first non-missing element
In danzafar/tidyspark: A Tidy Interface to Spark

coalesce has two uses in Spark. The first is to reduce the number of partitions of a Spark DataFrame without a shuffle stage (in contrast to repartition, which requires a shuffle). The second is for ETL, where it is applied to Column objects to find the first non-missing element. See ?dplyr::coalesce for more info.

coalesce(...)

...

For the ETL case, these are the Columns, or objects coercible to Column, to be coalesced. For the partition-reducing case, the first argument should be a spark_tbl and the second should be an integer specifying the number of partitions to reduce to.
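The "first non-missing element" behaviour can be illustrated with a minimal sketch using dplyr::coalesce on plain R vectors, which this page points to for details. It assumes only that dplyr is installed; the vector names are made up for illustration, and the Spark Column version is assumed to behave analogously, row by row. The commented call at the end shows the partition-reducing form as described by the arguments above, with my_spark_tbl as a hypothetical spark_tbl.

```r
library(dplyr)

# Two plain R vectors with gaps in different positions (illustrative data).
primary  <- c(1, NA, 3, NA)
fallback <- c(10, 20, NA, NA)

# For each position, take the first non-missing value across the inputs.
# Positions 1 and 3 come from `primary`, position 2 from `fallback`,
# and position 4 stays NA because both inputs are missing there.
filled <- coalesce(primary, fallback)

# A length-one value can serve as a final default for remaining gaps.
filled_default <- coalesce(primary, fallback, 0)

print(filled)
print(filled_default)

# Partition-reducing form, per the argument description above
# (my_spark_tbl is a hypothetical spark_tbl; requires a Spark session):
# fewer_parts <- coalesce(my_spark_tbl, 2L)
```

Which of the two behaviours applies depends on what is passed: Columns (or objects coercible to Column) give the ETL behaviour, while a spark_tbl followed by an integer gives the partition-reducing behaviour.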

danzafar/tidyspark documentation built on Sept. 30, 2020, 12:19 p.m.