plyr    R Documentation

plyr: the split-apply-combine paradigm for R.

Description

The plyr package is a set of clean and consistent tools that implement the split-apply-combine pattern in R. This is an extremely common pattern in data analysis: you solve a complex problem by breaking it down into small pieces, doing something to each piece and then combining the results back together again.
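As a rough illustration of the three steps, the pattern can be sketched in base R without plyr. The `scores` data frame and its column names are made up for this example.

```r
# Hypothetical toy data (not from the plyr documentation)
scores <- data.frame(
  team   = c("a", "a", "b", "b", "b"),
  points = c(1, 3, 2, 4, 6)
)

# Split: break the data frame into one piece per team
pieces <- split(scores, scores$team)

# Apply: compute a one-row summary for each piece
results <- lapply(pieces, function(piece) {
  data.frame(team = piece$team[1], mean_points = mean(piece$points))
})

# Combine: stack the per-piece summaries back into one data frame
combined <- do.call(rbind, results)
```

The plyr functions wrap these three steps in a single call, so the intermediate `pieces` and `results` objects do not have to be managed by hand.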

Details

The plyr functions are named according to what sort of data structure they split up and what sort of data structure they return:

  • a: array

  • l: list

  • d: data.frame

  • m: multiple inputs

  • r: repeat multiple times

  • _: nothing

So ddply takes a data frame as input and returns a data frame as output, and l_ply takes a list as input and returns nothing as output.
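A minimal sketch of how the naming scheme reads in practice, assuming plyr is installed. The `flowers` data frame and the anonymous functions are invented for illustration.

```r
library(plyr)

# Hypothetical toy data
flowers <- data.frame(
  species = c("x", "x", "y", "y"),
  len     = c(1, 3, 5, 7)
)

# d -> d: data frame in, data frame out
species_means <- ddply(flowers, "species", summarise, mean_len = mean(len))

# l -> l: list in, list out
squares_list <- llply(list(1, 2, 3), function(x) x^2)

# l -> a: list in, array (here a plain vector) out
squares_vec <- laply(list(1, 2, 3), function(x) x^2)

# l -> _: list in, nothing out; the function is called only for its side effect
l_ply(list(1, 2, 3), function(x) cat("processing", x, "\n"))
```

The first letter names the input structure and the second names the output, so switching from `llply` to `laply` changes only how the results are combined.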

Row names

By design, no plyr function preserves row names: in general it is too hard to know what should be done with them for many of the operations supported by plyr. If you want to preserve row names, use name_rows to convert them into an explicit column in your data frame, perform the plyr operations, and then use name_rows again to convert the column back into row names.
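The round trip described above might look like the following sketch. The data frame `df` is a made-up example; `name_rows` stores the row names in a column called `.rownames`.

```r
library(plyr)

# Hypothetical data frame with meaningful row names
df <- data.frame(value = c(3, 1, 2), row.names = c("x", "y", "z"))

# Without name_rows, the row names are lost after sorting
plain_sorted <- arrange(df, value)

# Convert row names to an explicit .rownames column, sort, then convert back
with_names <- name_rows(df)
sorted     <- arrange(with_names, value)
restored   <- name_rows(sorted)
```

After the final call, `restored` is sorted by `value` and its row names follow the rows they originally belonged to.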

Helpers

Plyr also provides a set of helper functions for common data analysis problems:

  • arrange: re-order the rows of a data frame by specifying the columns to order by

  • mutate: add new columns or modify existing columns, like transform, but new columns can refer to other columns that you just created.

  • summarise: like mutate, but creates a new data frame instead of preserving the columns of the old data frame.

  • join: an adaptation of merge which is more similar to SQL, and has a much faster implementation if you only want to find the first match.

  • match_df: a version of join that instead of returning the two tables combined together, only returns the rows in the first table that match the second.

  • colwise: make any function work column-wise on a data frame

  • rename: easily rename columns in a data frame

  • round_any: round a number to any degree of precision

  • count: quickly count unique combinations and return the result as a data frame.
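The helpers listed above can be tried on a small example. This is a sketch only: the `sales` and `regions` data frames and their column names are hypothetical.

```r
library(plyr)

# Hypothetical toy data
sales <- data.frame(
  region = c("north", "south", "north", "south"),
  units  = c(10, 5, 7, 12),
  price  = c(2.5, 3, 2.5, 3)
)
regions <- data.frame(region = c("north", "south"), manager = c("Ann", "Bob"))

# arrange: order by region, then by units in descending order
ordered <- arrange(sales, region, desc(units))

# mutate: revenue_k refers to revenue, which is created in the same call
with_revenue <- mutate(sales, revenue = units * price, revenue_k = revenue / 1000)

# summarise: returns only the new column
totals <- summarise(sales, total_units = sum(units))

# join: SQL-style left join on the shared region column
joined <- join(sales, regions, by = "region", type = "left")

# match_df: keep only the rows of sales whose region appears in the second table
north_only <- match_df(sales, data.frame(region = "north"), on = "region")

# colwise: turn mean into a function that works on every column
col_means <- colwise(mean)(sales[c("units", "price")])

# rename: old name on the left, new name on the right
renamed <- rename(sales, c("units" = "quantity"))

# round_any: round to the nearest multiple of 10
rounded <- round_any(137, 10)

# count: number of rows per region, returned in a freq column
counts <- count(sales, "region")
```

Each helper takes a data frame (or, for round_any, a numeric vector) as its first argument, so the calls can be combined with the plyr functions above.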


hadley/plyr documentation built on Nov. 6, 2024, 5:54 p.m.