Aggregates columns of a data.frame on one or more dimensions, using one or more functions for different columns.
It is equivalent to the base R aggregate function, except that it lets the user aggregate sets of columns (referred to by name or column number) with different functions, and it's fast!
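To make the comparison with base R concrete, here is a minimal sketch of what "different functions for different sets of columns" takes with aggregate alone: one call per function, then a merge on the grouping column. It is only a base-R illustration of the idea, not how smartAgg is implemented; the names means, sums and baseAgg are made up for this example.

```r
# Column sets to aggregate with different functions
lengthVs <- c('Sepal.Length', 'Petal.Length')
widthVs  <- c('Sepal.Width', 'Petal.Width')

# One aggregate() call per function, both grouped by Species
means <- aggregate(iris[, lengthVs], by = list(Species = iris$Species), FUN = mean)
sums  <- aggregate(iris[, widthVs],  by = list(Species = iris$Species), FUN = sum)

# Combine into one row per Species: grouping column first, then aggregated columns
baseAgg <- merge(means, sums, by = 'Species')
```

The result has one row per species, with the mean of each length column and the sum of each width column.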
df | input data.frame to be aggregated
by | name(s) of the variable(s) in df to group by
... | method to identify the variables to aggregate and the functions used to do so. Specify the function first as a string argument and then a vector of the column names to aggregate using that function. You can specify as many different functions as necessary, but every function must be followed by a vector of column names.
catN | adds a column named "countPerBin" with the number of observations aggregated in each row of the output data.frame.
printAgg | prints the line of code used to
Aggregated data.frame with columns corresponding to the grouping variables in 'by', followed by the aggregated columns from df.
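As an illustration of the kind of per-group count that catN describes, the sketch below builds a "countPerBin" column in base R and attaches it to an aggregated result. This is an assumption-laden stand-in: where smartAgg places the column and how it computes it are not stated here, and the names means, counts and withN are hypothetical.

```r
# Per-species means of the length columns
means <- aggregate(iris[, c('Sepal.Length', 'Petal.Length')],
                   by = list(Species = iris$Species), FUN = mean)

# Number of observations behind each output row, named like the catN column
counts <- as.data.frame(table(Species = iris$Species), responseName = 'countPerBin')

# Attach the counts to the aggregated rows
withN <- merge(means, counts, by = 'Species')
```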
require('data.table')
## establishing variables to aggregate on
lengthVs <- c('Sepal.Length', 'Petal.Length')
widthVs <- c('Sepal.Width', 'Petal.Width')
## aggregating using 2 different functions and identifying columns to aggregate by variable names
irisAgg1 <- smartAgg(df=iris, by='Species', 'mean', lengthVs, 'sum', widthVs)
## aggregating using 2 dimensions ("Species" and "randthing")
iris$randthing <- as.character(sample(1:5, nrow(iris), replace=TRUE))
irisAgg2 <- smartAgg(df=iris, by=c('Species', 'randthing'), 'mean', lengthVs, 'sum', widthVs, catN=TRUE, printAgg=TRUE)
## aggregating variables by column number
irisAgg3 <- smartAgg(df=iris, by=c('Species', 'randthing'), 'mean', 1:2, 'sum', 3:4, catN=TRUE, printAgg=TRUE)
## use anonymous functions
data(mtcars)
smartAgg(mtcars, by='cyl', function(x) sum(x*100), c('drat', 'mpg', 'disp'))
## use anonymous functions with more than 1 argument. Uses the provided variables for all unassigned arguments in anonymous function
smartAgg(mtcars, by='cyl', function(x,y='carb') sum(x*y), c('drat', 'mpg', 'disp'))
with(mtcars[mtcars$cyl==6,], c(sum(drat*carb), sum(mpg*carb), sum(disp*carb)))
## with anonymous functions with more than 1 argument.
## Example of possible unintended behavior - the user-provided variable is used for both x and y in this example.
smartAgg(mtcars, by='cyl', function(x,y) sum(x*y), c('drat', 'mpg', 'disp'))
with(mtcars[mtcars$cyl==6,], c(sum(drat*drat), sum(mpg*mpg), sum(disp*disp)))
## demonstrating speed gain of smartAgg using data.table over aggregate
n <- 300000
df <- data.frame(x1=rnorm(n), x2=rbinom(n,5,0.5), x3=sample(letters, n, replace=TRUE))
system.time(aggFast <- smartAgg(df, by='x3', 'mean', c('x1', 'x2')))
system.time(aggSlow <- aggregate(df[,c('x2', 'x1')], by=list(df$x3), FUN='mean'))
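The with() check in the multi-argument example above covers only cyl == 6. A base-R sketch that produces the same sum(x*carb) quantity for every cyl group at once could look like the following; it does not call smartAgg, and the names vars, byCyl and check are introduced only for this illustration.

```r
# Columns passed as x in the example; carb plays the role of y
vars <- c('drat', 'mpg', 'disp')

# Split mtcars into one data.frame per cyl value
byCyl <- split(mtcars, mtcars$cyl)

# For each group, compute sum(x * carb) for every column in vars;
# transpose so rows are cyl groups and columns are the variables
check <- t(sapply(byCyl, function(g) sapply(g[, vars], function(x) sum(x * g$carb))))
```

The row of check named "6" should match the output of the with() line for cyl == 6, which gives a reference to compare the smartAgg result against.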