sdf_nest: Nest data in a Spark Dataframe
In mitre/sparklyr.nested: A 'sparklyr' Extension for Nested Data

sdf_nest

R Documentation

Description

This function is like tidyr::nest, except that it does not aggregate over the other columns: the output has the same number of rows (records) as the input. See the example for how to reduce the number of rows by aggregating the nested elements with collect_list, which is a Spark SQL function.

Usage

sdf_nest(x, ..., .key = "data")

Arguments

`x`	A Spark dataframe.
`...`	Columns to nest.
`.key`	Character. The name of the new column that will contain the nested fields.

Examples

## Not run: 
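library(sparklyr)
library(sparklyr.nested)
library(dplyr)

# assumes a local Spark installation; adjust master for your cluster
sc <- spark_connect(master = "local")

# produces a dataframe with an array of characteristics nested under
# each unique species identifier
iris_tbl <- copy_to(sc, iris, name = "iris")
iris_tbl %>%
  sdf_nest(Sepal_Length, Sepal_Width, Petal_Length, Petal_Width, .key = "data") %>%
  group_by(Species) %>%
  summarize(data = collect_list(data))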

## End(Not run)
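
The sketch below illustrates the point made in the Description: nesting alone should leave the row count unchanged, and only the later collect_list aggregation reduces it. It assumes a local Spark installation reachable via spark_connect(master = "local"). The variable names (nested, collapsed, n_in, n_nested, n_collapsed) are hypothetical and chosen only for illustration.

```r
library(sparklyr)
library(sparklyr.nested)
library(dplyr)

# Assumption: Spark is installed locally; change master for a real cluster
sc <- spark_connect(master = "local")
iris_tbl <- copy_to(sc, iris, name = "iris", overwrite = TRUE)

# Nest the four measurement columns into a single column named "data";
# Species is left untouched as a top-level column
nested <- iris_tbl %>%
  sdf_nest(Sepal_Length, Sepal_Width, Petal_Length, Petal_Width, .key = "data")

# Nesting on its own does not aggregate, so these two counts should match
n_in     <- sdf_nrow(iris_tbl)
n_nested <- sdf_nrow(nested)

# Row reduction comes from the explicit aggregation step:
# one row per species, each holding a list of nested records
collapsed <- nested %>%
  group_by(Species) %>%
  summarize(data = collect_list(data))
n_collapsed <- sdf_nrow(collapsed)

spark_disconnect(sc)
```

Comparing n_in with n_nested checks that sdf_nest preserved the records, and n_collapsed should equal the number of distinct Species values in the input.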

mitre/sparklyr.nested documentation built on Feb. 22, 2023, 10:09 a.m.