smartAgg: Aggregate multiple columns using different functions...

Description

Aggregates columns of a data.frame on one or more dimensions, using one or more functions for different columns. It is equivalent to the base R aggregate function, except that it lets the user aggregate sets of columns (referred to by name or column number) with different functions, and it is fast.
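For comparison, here is a rough base R sketch of the same kind of result (mean of the length columns and sum of the width columns, grouped by Species). It uses only base R and the built-in iris data, and it is not how smartAgg is implemented; it only illustrates the extra steps that applying different functions to different columns takes with aggregate.

```r
# Base R only: one aggregate() call per function, then merge on the grouping column.
lengthVs <- c('Sepal.Length', 'Petal.Length')
widthVs <- c('Sepal.Width', 'Petal.Width')

# Mean of the length columns per species
means <- aggregate(iris[, lengthVs], by = list(Species = iris$Species), FUN = mean)

# Sum of the width columns per species
sums <- aggregate(iris[, widthVs], by = list(Species = iris$Species), FUN = sum)

# Combine the two partial results into a single data.frame
baseAgg <- merge(means, sums, by = 'Species')
```

Each additional function needs another aggregate call and another merge, which is the bookkeeping that a single smartAgg call is meant to replace.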

Usage

smartAgg(df, by, ..., catN = T, printAgg = F)

Arguments

df

input data.frame to be aggregated

by

name(s) of the column(s) of data.frame df to aggregate on, given as a character vector. Plays the same role as by in the base R aggregate function.

...

method to identify the variables to aggregate and the functions used to do so. Specify the function first (as a string naming the function, or as an anonymous function, as in the examples) and then a vector of the column names or column numbers to aggregate using that function. You can specify as many different functions as necessary, but every function must be followed by a vector of columns.

catN

logical; if TRUE, adds a column named "countPerBin" with the number of observations aggregated in each row of the output data.frame.

printAgg

prints the line of code used to

Value

aggregated data.frame with columns corresponding to the grouping variables in by followed by aggregated columns from df.
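As a rough illustration of that layout, the base R sketch below builds a result with the grouping column first, the aggregated columns next, and a per-group count. The column name countPerBin is taken from the catN description; the position of that column in real smartAgg output is an assumption here, and the code does not call smartAgg itself.

```r
# Aggregated columns, with the grouping variable first
agg <- aggregate(mtcars[, c('mpg', 'disp')], by = list(cyl = mtcars$cyl), FUN = mean)

# Number of rows that fell into each group (assumed to mirror the catN column)
counts <- aggregate(data.frame(countPerBin = rep(1L, nrow(mtcars))),
                    by = list(cyl = mtcars$cyl), FUN = sum)

# One row per value of cyl: cyl, mpg, disp, countPerBin
out <- merge(agg, counts, by = 'cyl')
```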

Examples

library(data.table)
library(Rsenal)  # provides smartAgg (GitHub package brooksandrew/Rsenal)

## establishing variables to aggregate on
lengthVs <- c('Sepal.Length', 'Petal.Length')
widthVs <- c('Sepal.Width', 'Petal.Width')

## aggregating using 2 different functions and identifying columns to aggregate by variable names
irisAgg1 <- smartAgg(df=iris, by='Species', 'mean', lengthVs, 'sum', widthVs)

## aggregating using 2 dimensions ("Species" and "randthing")
iris$randthing <- as.character(sample(1:5, nrow(iris), replace=TRUE))
irisAgg2 <- smartAgg(df=iris, by=c('Species', 'randthing'), 'mean', lengthVs, 'sum', widthVs, catN=TRUE, printAgg=TRUE)

## aggregating variables by column number
irisAgg3 <- smartAgg(df=iris, by=c('Species', 'randthing'), 'mean', 1:2, 'sum', 3:4, catN=TRUE, printAgg=TRUE)

## use anonymous functions
data(mtcars)
smartAgg(mtcars, by='cyl', function(x) sum(x*100), c('drat', 'mpg', 'disp'))

## use anonymous functions with more than 1 argument.  Uses the provided variables for all unassigned arguments in anonymous function
smartAgg(mtcars, by='cyl', function(x,y='carb') sum(x*y), c('drat', 'mpg', 'disp'))
with(mtcars[mtcars$cyl==6,], c(sum(drat*carb), sum(mpg*carb), sum(disp*carb)))

## anonymous functions with more than 1 argument.
## Example of possibly unintended behavior: the user-provided variable is used for both x and y in this example.
smartAgg(mtcars, by='cyl', function(x,y) sum(x*y), c('drat', 'mpg', 'disp'))
with(mtcars[mtcars$cyl==6,], c(sum(drat*drat), sum(mpg*mpg), sum(disp*disp)))

## demonstrating speed gain of smartAgg using data.table over aggregate
n <- 300000
df <- data.frame(x1=rnorm(n), x2=rbinom(n,5,0.5), x3=sample(letters, n, replace=TRUE))
system.time(aggFast <- smartAgg(df, by='x3', 'mean', c('x1', 'x2')))
system.time(aggSlow <- aggregate(df[,c('x2', 'x1')], by=list(df$x3), FUN='mean'))

brooksandrew/Rsenal documentation built on May 13, 2019, 7:50 a.m.