sdf_explode: Explode data along a column
In sparklyr.nested: A 'sparklyr' Extension for Nested Data

View source: R/explode.R

Description

Exploding an array column of length N replicates the top-level record N times. The i-th replicated record contains a struct (not an array) corresponding to the i-th element of the exploded array. Exploding does not promote any fields or otherwise change the schema of the data.
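
The replication described above can be illustrated without Spark. The following is a minimal base-R sketch, assuming nested lists as a stand-in for array columns; `records` and `explode_records` are hypothetical names used only for illustration and are not part of sparklyr.nested.

```r
# Base-R illustration of explode semantics; no Spark is involved.
# `records` and `explode_records` are hypothetical names.
records <- list(
  list(id = 1, data = list(list(a = 1, b = "x"), list(a = 2, b = "y"))),
  list(id = 2, data = list(list(a = 3, b = "z")))
)

explode_records <- function(recs, column) {
  out <- list()
  for (rec in recs) {
    for (elem in rec[[column]]) {
      copy <- rec             # replicate the top-level record
      copy[[column]] <- elem  # the array is replaced by a single element
      out[[length(out) + 1]] <- copy
    }
  }
  out
}

exploded <- explode_records(records, "data")
length(exploded)       # 3: two rows from id 1, one row from id 2
names(exploded[[1]])   # "id" "data": field names are unchanged
exploded[[2]]$data$a   # 2: the second element of the first record's array
```

Each output record keeps the same field names as its input; only the contents of the exploded field change from a list of elements to one element.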

Usage

sdf_explode(x, column, is_map = FALSE, keep_all = FALSE)

Arguments

`x`	An object (usually a `spark_tbl`) coercible to a Spark DataFrame.
`column`	The field to explode.
`is_map`	Logical. The (Scala) `explode` method works for both `array` and `map` column types. If the column to explode is an array, then `is_map=FALSE` will ensure that the exploded output retains the name of the array column. If, however, the column to explode is a map, then the map will have key/value names that will be used if `is_map=TRUE`.
`keep_all`	Logical. If `FALSE` then records where the exploded value is empty/null will be dropped.
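
To make the array/map distinction concrete, here is a small base-R sketch of what exploding a map looks like, assuming Spark's default `key` and `value` output names for map columns. A named list stands in for the map, and `rec` and `explode_map` are hypothetical names for illustration only; this is not how `sdf_explode` is implemented.

```r
# Base-R illustration of exploding a map; no Spark is involved.
# `rec` and `explode_map` are hypothetical names.
rec <- list(id = 1, attrs = list(colour = "red", size = "L"))

explode_map <- function(rec, column) {
  m <- rec[[column]]
  rest <- rec[setdiff(names(rec), column)]  # all other top-level fields
  # one output row per map entry, carrying the entry as key/value fields
  lapply(names(m), function(k) c(rest, list(key = k, value = m[[k]])))
}

rows <- explode_map(rec, "attrs")
length(rows)        # 2: one row per map entry
names(rows[[1]])    # "id" "key" "value"
rows[[2]]$value     # "L"
```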

Details

Two types of exploding are possible. The default method calls the Scala `explode` method, which is supported in Spark versions > 1.6. It will, however, drop records where the exploding field is empty/null. Alternatively, `keep_all=TRUE` uses the Scala `explode_outer` method, introduced in Spark 2, so that no records are dropped.
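
The difference between the two methods can be sketched in base R, assuming plain lists stand in for array columns. `explode_sketch` and `recs` are hypothetical names; the function only mimics the drop/keep behaviour described above and does not call Spark.

```r
# Base-R illustration of dropping vs. keeping empty/null records.
# `explode_sketch` and `recs` are hypothetical names.
explode_sketch <- function(recs, column, keep_all = FALSE) {
  out <- list()
  for (rec in recs) {
    elems <- rec[[column]]
    if (length(elems) == 0) {
      if (keep_all) {
        rec[column] <- list(NULL)  # keep the record, with a NULL value
        out[[length(out) + 1]] <- rec
      }
      next                         # default behaviour: record is dropped
    }
    for (elem in elems) {
      copy <- rec
      copy[[column]] <- elem
      out[[length(out) + 1]] <- copy
    }
  }
  out
}

recs <- list(
  list(id = 1, data = list(10, 20)),  # two elements
  list(id = 2, data = list()),        # empty array
  list(id = 3, data = NULL)           # null
)

dropped <- explode_sketch(recs, "data")
kept    <- explode_sketch(recs, "data", keep_all = TRUE)
length(dropped)  # 2: only the rows from id 1 survive
length(kept)     # 4: ids 2 and 3 are retained with NULL data
```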

Examples

## Not run: 
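# load packages and connect to Spark (a local master is assumed here)
library(sparklyr)
library(sparklyr.nested)
library(dplyr)
sc <- spark_connect(master = "local")

# first get some nested data
iris_tbl <- copy_to(sc, iris, name="iris")
iris_nst <- iris_tbl %>%
  sdf_nest(Sepal_Length, Sepal_Width, Petal_Length, Petal_Width, .key="data") %>%
  group_by(Species) %>%
  summarize(data=collect_list(data))

# then explode it: one output row per element of the `data` array
iris_nst %>% sdf_explode(data)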

## End(Not run)

sparklyr.nested documentation built on March 7, 2023, 6:20 p.m.