partition: Partition data across workers in a cluster
In multidplyr: A Multi-Process 'dplyr' Backend

View source: R/partydf.R

partition

R Documentation

Description

Partitioning ensures that all observations in a group end up on the same worker. To keep the number of observations on each worker balanced, `partition()` uses a greedy algorithm that iteratively assigns each group to the worker that currently has the fewest rows.
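
The greedy assignment described above can be illustrated with a small base R sketch. This is not the multidplyr implementation: `greedy_assign()` is a hypothetical helper written for this illustration, and visiting groups from largest to smallest is an assumption (the description only says that each group goes to the worker that currently has the fewest rows).

```r
# Hypothetical helper, not part of multidplyr: assign each group to the
# worker that currently holds the fewest rows.
greedy_assign <- function(group_sizes, n_workers) {
  worker_rows <- integer(n_workers)        # rows currently on each worker
  worker <- integer(length(group_sizes))   # worker chosen for each group

  # Assumption: visit groups from largest to smallest.
  for (g in order(group_sizes, decreasing = TRUE)) {
    w <- which.min(worker_rows)            # least-loaded worker; ties go to the lowest index
    worker[g] <- w
    worker_rows[w] <- worker_rows[w] + group_sizes[g]
  }

  list(worker = worker, rows = worker_rows)
}

# Group sizes of mtcars by cyl (4, 6 and 8 cylinders)
sizes <- c(11L, 7L, 14L)
out <- greedy_assign(sizes, n_workers = 2)
out$worker  # worker index for each group
out$rows    # total rows held by each worker
```

With two workers, the 14-row group ends up alone on one worker and the 11-row and 7-row groups share the other. No group is ever split, so the workers are only approximately balanced.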

Usage

partition(data, cluster)

Arguments

`data`	Dataset to partition, typically grouped. When grouped, all observations in a group will be assigned to the same worker.
`cluster`	Cluster to use.

Value

A `party_df`.

Examples

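library(dplyr)
library(multidplyr)

# Use the default cluster and load dplyr on every worker
cl <- default_cluster()
cluster_library(cl, "dplyr")

# Spread the rows of mtcars across the workers
mtcars2 <- partition(mtcars, cl)

# dplyr verbs are now executed on each worker
mtcars2 %>% mutate(cyl2 = 2 * cyl)
mtcars2 %>% filter(vs == 1)
mtcars2 %>% group_by(cyl) %>% summarise(n())
mtcars2 %>% select(-cyl)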

multidplyr documentation built on March 31, 2023, 6:42 p.m.