multiplyr: Data Manipulation with Parallelism and Shared Memory Matrices


Description

Provides a new form of data frame backed by shared memory matrices and a way to manipulate them. Upon creation these data frames are shared across multiple local nodes to allow for simple parallel processing. Run the following command for a more thorough explanation: vignette("basics")

Major differences from dplyr

With dplyr, summarise on ungrouped data returns a single value. With multiplyr it returns N values, one from each of the N cluster nodes. You should therefore typically follow summarise with reduce, which runs locally and combines the per-node results.
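The two-step summarise/reduce pattern can be modelled in plain base R. This is a sketch under stated assumptions: the node count, the contiguous split of rows, and all variable names are hypothetical, and no multiplyr functions are called. Each simulated node produces a partial result, and a local step then combines the N partials into one answer.

```r
# Hypothetical base R model of multiplyr's summarise-then-reduce pattern.
# No multiplyr functions are used; n_nodes and the split are assumptions.
x <- 1:100
n_nodes <- 4

# Assign contiguous, equally sized blocks of rows to each simulated node.
node_id <- rep(seq_len(n_nodes), each = ceiling(length(x) / n_nodes),
               length.out = length(x))
chunks <- split(x, node_id)

# "summarise" step: every node computes partial results on its own rows,
# giving one value per node (N values in total).
partial_sums <- vapply(chunks, sum, numeric(1))
partial_counts <- vapply(chunks, length, integer(1))

# "reduce" step: combine the N partial results locally into one answer.
total <- sum(partial_sums)
overall_mean <- sum(partial_sums) / sum(partial_counts)
```

Reducing sums and counts, rather than averaging the per-node means, keeps the overall mean correct even when nodes hold unequal numbers of rows.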

Standard dplyr-like functions

arrange: Sort data
distinct: Select unique rows or unique combinations of variables
filter: Filter data
group_by: Group data
group_sizes: Return size of groups
groupwise: Use grouped data (also known as ungroup)
mutate: Change values of existing variables (and create new ones)
n_groups: Return number of groups
rename: Rename variables
rowwise: Use data as individual rows
select: Retain only specified variables
slice: Select rows by position
summarise: Summarise data
transmute: Change variables and drop all others
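To show what several of these verbs do, here are rough base R analogues on an ordinary in-memory data.frame. They illustrate only the dplyr-style semantics. They are not multiplyr code, do not use shared memory or cluster nodes, and the example data are made up.

```r
# Base R analogues of dplyr-style verbs on a plain data.frame (illustrative only).
df <- data.frame(g = c("a", "b", "a", "c", "b", "a"),
                 x = c(5, 3, 8, 1, 9, 2))

arranged   <- df[order(df$x), ]            # like arrange(x)
filtered   <- df[df$x > 2, ]               # like filter(x > 2)
selected   <- df[, "x", drop = FALSE]      # like select(x)
sliced     <- df[2:3, ]                    # like slice(2:3)
distinct_g <- unique(df["g"])              # like distinct(g)
mutated    <- transform(df, y = x * 2)     # like mutate(y = x * 2)
transmuted <- data.frame(y = df$x * 2)     # like transmute(y = x * 2)

# Grouping: sizes and count of groups defined by g.
sizes <- table(df$g)                       # like group_sizes after group_by(g)
n_grp <- length(unique(df$g))              # like n_groups after group_by(g)

# Per-group summary, like group_by(g) followed by summarise(sum(x)).
summary_by_g <- tapply(df$x, df$g, sum)
```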

Parallel functions

partition_even: Partition data evenly amongst cluster nodes
partition_group: Partition data so that each group is wholly on a node
within_group: Execute code within a group
within_node: Execute code within a node
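The difference between the two partitioning strategies can be sketched as row-to-node assignment functions. This is a hypothetical illustration: the `_sketch` function names are invented, and the greedy largest-group-first balancing in the second function is an assumed strategy, not necessarily the one multiplyr uses. Even partitioning ignores groups and only balances row counts. Group partitioning guarantees that no group is split across nodes, at the cost of possibly uneven loads.

```r
# Hypothetical row-to-node assignment strategies; not multiplyr's actual code.

# Even: contiguous blocks whose sizes differ by at most one row.
partition_even_sketch <- function(n_rows, n_nodes) {
  sizes <- rep(n_rows %/% n_nodes, n_nodes)
  extra <- n_rows %% n_nodes
  if (extra > 0) sizes[seq_len(extra)] <- sizes[seq_len(extra)] + 1L
  rep(seq_len(n_nodes), times = sizes)
}

# Group: each group is placed wholly on one node. Assumed greedy rule:
# largest groups first, each sent to the currently least-loaded node.
partition_group_sketch <- function(groups, n_nodes) {
  sizes <- sort(table(groups), decreasing = TRUE)
  load <- integer(n_nodes)
  node_of_group <- setNames(integer(length(sizes)), names(sizes))
  for (g in names(sizes)) {
    target <- which.min(load)
    node_of_group[g] <- target
    load[target] <- load[target] + sizes[[g]]
  }
  unname(node_of_group[as.character(groups)])
}
```

For example, `partition_even_sketch(10, 3)` gives blocks of 4, 3 and 3 rows, while `partition_group_sketch(c("a", "a", "a", "b", "b", "c"), 2)` keeps all of group "a" on node 1 and groups "b" and "c" together on node 2.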

Additional data frame functions

Multiplyr: Create a new parallel data frame
define: Define new variables
nsa: No strings attached mode
reduce: Summarise locally only
regroup: Return to grouped data
undefine: Delete variables

Data manipulation adjuncts

between: Test whether elements of a vector lie between two values (inclusive)
cumall: Cumulative all
cumany: Cumulative any
cummean: Cumulative mean
first: Return the first value in a vector
last: Return the last value in a vector
lag: Offset x backwards by n
lead: Offset x forwards by n
n: Number of items in the current group
nth: Return the nth item from a vector
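The semantics of these adjuncts can be pinned down with short base R reference versions. The `_sketch` names are hypothetical and are not part of multiplyr. NA handling is ignored, and `nth_sketch` accepts only positive n. Treat these as a guide to the intended behaviour, not as the package's implementation.

```r
# Hypothetical base R reference versions of the adjunct helpers.
# NA handling is deliberately ignored to keep the definitions short.

between_sketch <- function(x, left, right) x >= left & x <= right  # inclusive

cumall_sketch  <- function(x) cumsum(!x) == 0   # TRUE until the first FALSE
cumany_sketch  <- function(x) cumsum(x) > 0     # TRUE from the first TRUE on
cummean_sketch <- function(x) cumsum(x) / seq_along(x)

first_sketch <- function(x) x[1L]
last_sketch  <- function(x) x[length(x)]
nth_sketch   <- function(x, n) x[n]             # positive n only in this sketch

# lag: each position takes the value n places before it; the first n become NA.
lag_sketch <- function(x, n = 1L) {
  n <- min(n, length(x))
  c(rep(NA, n), head(x, length(x) - n))
}

# lead: each position takes the value n places after it; the last n become NA.
lead_sketch <- function(x, n = 1L) {
  n <- min(n, length(x))
  c(tail(x, length(x) - n), rep(NA, n))
}
```

For instance, `lag_sketch(1:5)` yields `NA 1 2 3 4` and `cumall_sketch(c(TRUE, TRUE, FALSE, TRUE))` yields `TRUE TRUE FALSE FALSE`.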

multiplyr documentation built on May 30, 2017, 12:09 a.m.