coalesce
serves two purposes in Spark. The first is
to reduce the number of partitions of a Spark DataFrame
without triggering
a shuffle stage (in contrast to repartition,
which requires a shuffle).
The other use case is for ETL, where it can be applied to Column objects to
find the first non-missing element. See ?dplyr::coalesce
for more info.
Arguments:

... |
For the ETL case, these are the Column objects (or objects coercible to
Column) to be coalesced. For the partition-reducing case, the first argument
should be a Spark DataFrame. |
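A minimal sketch of both use cases with sparklyr, assuming a local Spark installation is available (the connection call, the toy data frame, and the column names `a` and `b` are illustrative, not from the original):

```r
library(sparklyr)
library(dplyr)

# Assumes Spark is installed locally; adjust `master` for a real cluster.
sc <- spark_connect(master = "local")
df <- copy_to(sc, data.frame(a = c(1, NA, 3), b = c(NA, 2, 3)))

# Use 1: reduce the number of partitions without a shuffle.
df_one_part <- sdf_coalesce(df, 1)
sdf_num_partitions(df_one_part)

# Use 2: ETL-style coalesce on columns -- the first non-missing
# value per row, translated to Spark SQL's COALESCE by dplyr.
df %>%
  mutate(first_present = coalesce(a, b)) %>%
  collect()

spark_disconnect(sc)
```

Note the contrast with repartition: `sdf_coalesce` can only decrease the partition count, which is what lets it avoid a full shuffle.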