agg_dfm: Data Information by Group


Source: R/quest_functions.R

Description

agg_dfm evaluates a function on a set of variables in a data.frame separately for each group and combines the results. The rep and rtn.grp arguments determine how the results are combined. If rep = TRUE, the result of fun is repeated for every row of the group in data[grp.nm]; if rep = FALSE, the result of fun is returned once for each unique combination of data[grp.nm]. If rtn.grp = TRUE, the results are returned in a data.frame whose first columns are the groups from data[grp.nm]; if rtn.grp = FALSE, the results are returned in an atomic vector. Note that agg_dfm evaluates fun on all the variables in data[vrb.nm] as a whole. If instead you want to evaluate fun separately for each variable in data[vrb.nm], use agg.
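The combining rules above can be sketched in a few lines of base R. agg_dfm_sketch below is a hypothetical stand-in written for illustration, not quest's implementation. It assumes fun returns a single number (vapply enforces numeric(1)), and, unlike the row count given in the Value section, it keeps only the group combinations that actually occur in data.

```r
# Hypothetical base-R sketch of the rep / rtn.grp behaviour described above.
# NOT the quest implementation; it only mirrors the documented output shapes.
agg_dfm_sketch <- function(data, vrb.nm, grp.nm, rep = FALSE, rtn.grp = !rep,
                           sep = ".", rtn.result.nm = "result", fun, ...) {
  # One factor with a level per observed combination of the grouping columns
  grp <- interaction(data[grp.nm], sep = sep, drop = TRUE)
  # Evaluate fun once per group on all of data[vrb.nm] together
  pieces <- split(data[vrb.nm], grp)
  res <- vapply(pieces, fun, FUN.VALUE = numeric(1), ...)
  if (rep) {
    # Repeat each group's result for every row belonging to that group
    vals <- unname(res[as.integer(grp)])
    if (!rtn.grp) {
      names(vals) <- row.names(data)
      return(vals)
    }
    out <- data[grp.nm]
    out[[rtn.result.nm]] <- vals
    return(out)
  }
  # rep = FALSE: one value per group, named by the group values pasted with sep
  if (!rtn.grp) return(res)
  # Take the grouping columns from the first row of each group
  out <- data[match(levels(grp), grp), grp.nm, drop = FALSE]
  row.names(out) <- NULL
  out[[rtn.result.nm]] <- unname(res)
  out
}

# Group sizes for each vs x am combination in mtcars
agg_dfm_sketch(mtcars, vrb.nm = c("mpg", "cyl", "disp"), grp.nm = c("vs", "am"),
               rep = FALSE, rtn.grp = FALSE, sep = "_", fun = nrow)
```

With rep = FALSE and rtn.grp = FALSE the call returns one named count per vs and am combination; switching to rep = TRUE returns 32 values named by row.names(mtcars).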

Usage

agg_dfm(
  data,
  vrb.nm,
  grp.nm,
  rep = FALSE,
  rtn.grp = !rep,
  sep = ".",
  rtn.result.nm = "result",
  fun,
  ...
)

Arguments

data

data.frame of data.

vrb.nm

character vector of colnames from data specifying the set of variables to evaluate fun on.

grp.nm

character vector of colnames from data specifying the groups.

rep

logical vector of length 1 specifying whether the result of fun should be repeated for every row of the group in data[grp.nm] (TRUE) or returned only once for each group (FALSE).

rtn.grp

logical vector of length 1 specifying whether the group columns (i.e., data[grp.nm]) should be included as columns in the return object. The default is the opposite of rep because the group columns are usually most important to return when rep = FALSE.

sep

character vector of length 1 specifying the separator used to paste the group values together when there are multiple grouping variables (i.e., length(grp.nm) > 1). Only used if rep = FALSE and rtn.grp = FALSE.

rtn.result.nm

character vector of length 1 specifying the name for the column of results in the return object. Only used if rtn.grp = TRUE.

fun

function to evaluate on each group's subset of data[vrb.nm]. It must return an atomic vector of length 1; if it does not, consider using by2 or plyr::dlply instead.

...

additional named arguments to fun.
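Because fun must return an atomic vector of length 1, it can help to check a candidate function before passing it in. scalar_fun below is a hypothetical helper, not part of quest; it simply wraps fun so that any other kind of return value stops with an error.

```r
# Hypothetical helper: wrap fun so that a non-scalar return fails loudly
scalar_fun <- function(fun) {
  function(dat, ...) {
    out <- fun(dat, ...)
    if (!is.atomic(out) || length(out) != 1L)
      stop("fun must return an atomic vector of length 1", call. = FALSE)
    out
  }
}

safe_nrow <- scalar_fun(nrow)
safe_nrow(mtcars)                    # a single row count, so it passes

bad <- scalar_fun(colMeans)
try(bad(mtcars[c("mpg", "disp")]))   # colMeans returns 2 values, so this errors
```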

Details

If rep = TRUE, then agg_dfm calls ave_dfm; if rep = FALSE, then agg_dfm calls by. When rep = FALSE and rtn.grp = TRUE, agg_dfm is very similar to plyr::ddply; when rep = FALSE and rtn.grp = FALSE, then agg_dfm is very similar to plyr::daply.
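The correspondence in the Details can be seen with base R alone. by() returns one result per combination of the grouping columns, the shape agg_dfm produces with rep = FALSE. ave() recycles each group's result over that group's rows, like rep = TRUE, but it only works on a single vector; ave_dfm, which agg_dfm calls when rep = TRUE, is not reproduced here.

```r
# rep = FALSE analogue: by() calls FUN once per vs x am combination
b <- by(mtcars[c("mpg", "cyl", "disp")], INDICES = mtcars[c("vs", "am")], FUN = nrow)
counts <- unclass(b)              # 2 x 2 integer array with vs and am dimnames
attr(counts, "call") <- NULL      # drop the stored call for cleaner printing
counts

# rep = TRUE analogue for one column: ave() recycles each group's result over its rows
n_per_row <- ave(mtcars$mpg, mtcars$vs, mtcars$am, FUN = length)
head(n_per_row)
```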

Value

The result of fun applied to each group's subset of data[vrb.nm]. The structure of the return object depends on the arguments rep and rtn.grp.

If rep = TRUE and rtn.grp = TRUE:

then the return object is a data.frame with nrow = nrow(data) where the first columns are data[grp.nm] and the last column is the result of fun with colname = rtn.result.nm.

If rep = TRUE and rtn.grp = FALSE:

then the return object is an atomic vector with length = nrow(data) where the values are the result of fun and the names = row.names(data).

If rep = FALSE and rtn.grp = TRUE:

then the return object is a data.frame with nrow = length(levels(interaction(data[grp.nm]))) where the first columns are the unique group combinations in data[grp.nm] and the last column is the result of fun with colname = rtn.result.nm.

If rep = FALSE and rtn.grp = FALSE:

then the return object is an atomic vector with length = length(levels(interaction(data[grp.nm]))) where the values are the result of fun and the names are the group values, pasted together by sep if there are multiple grouping variables (i.e., length(grp.nm) > 1).
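The size given above for rep = FALSE is length(levels(interaction(data[grp.nm]))). A small base-R check on made-up data shows what that expression counts: every combination of the grouping values, including combinations that never occur in the data. interaction() also pastes the values with sep; that agg_dfm builds its names exactly this way is an assumption, since the text above only says the values are pasted together by sep.

```r
# Made-up grouping data in which the combination g1 = "b", g2 = "y" never occurs
grps <- data.frame(g1 = c("a", "a", "b"), g2 = c("x", "y", "x"))

# interaction() crosses every level of g1 with every level of g2 (drop = FALSE by default)
combos <- interaction(grps, sep = "_")
levels(combos)            # all 2 x 2 combinations, labels pasted with sep
length(levels(combos))    # counts the unobserved "b_y" as well
nrow(unique(grps))        # only the combinations that actually occur
```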

See Also

agg aggs by2 ddply daply

Examples

### one grouping variable

## by in base R
by(data = airquality[c("Ozone","Solar.R")], INDICES = airquality["Month"],
   simplify = FALSE, FUN = function(dat) cor(dat, use = "complete")[1,2])

## rep = TRUE

# rtn.grp = TRUE
agg_dfm(data = airquality, vrb.nm = c("Ozone","Solar.R"), grp.nm = "Month",
   rep = TRUE, rtn.grp = TRUE, fun = function(dat) cor(dat, use = "complete")[1,2])

# rtn.grp = FALSE
agg_dfm(data = airquality, vrb.nm = c("Ozone","Solar.R"), grp.nm = "Month",
   rep = TRUE, rtn.grp = FALSE, fun = function(dat) cor(dat, use = "complete")[1,2])

## rep = FALSE

# rtn.grp = TRUE
agg_dfm(data = airquality, vrb.nm = c("Ozone","Solar.R"), grp.nm = "Month",
   rep = FALSE, rtn.grp = TRUE, fun = function(dat) cor(dat, use = "complete")[1,2])
suppressWarnings(plyr::ddply(.data = airquality[c("Ozone","Solar.R","Month")],
   .variables = "Month", .fun = function(dat) cor(dat, use = "complete")[1,2]))

# rtn.grp = FALSE
agg_dfm(data = airquality, vrb.nm = c("Ozone","Solar.R"), grp.nm = "Month",
   rep = FALSE, rtn.grp = FALSE, fun = function(dat) cor(dat, use = "complete")[1,2])
suppressWarnings(plyr::daply(.data = airquality[c("Ozone","Solar.R","Month")],
   .variables = "Month", .fun = function(dat) cor(dat, use = "complete")[1,2]))

### two grouping variables

## by in base R
by(data = mtcars[c("mpg","cyl","disp")], INDICES = mtcars[c("vs","am")],
   FUN = nrow, simplify = FALSE) # with multiple group columns

## rep = TRUE

# rtn.grp = TRUE
agg_dfm(data = mtcars, vrb.nm = c("mpg","cyl","disp"), grp.nm = c("vs","am"),
   rep = TRUE, rtn.grp = TRUE, fun = nrow)

# rtn.grp = FALSE
agg_dfm(data = mtcars, vrb.nm = c("mpg","cyl","disp"), grp.nm = c("vs","am"),
   rep = TRUE, rtn.grp = FALSE, fun = nrow)

## rep = FALSE

# rtn.grp = TRUE
agg_dfm(data = mtcars, vrb.nm = c("mpg","cyl","disp"), grp.nm = c("vs","am"),
   rep = FALSE, rtn.grp = TRUE, fun = nrow)
agg_dfm(data = mtcars, vrb.nm = c("mpg","cyl","disp"), grp.nm = c("vs","am"),
   rep = FALSE, rtn.grp = TRUE, rtn.result.nm = "value", fun = nrow)

# rtn.grp = FALSE
agg_dfm(data = mtcars, vrb.nm = c("mpg","cyl","disp"), grp.nm = c("vs","am"),
   rep = FALSE, rtn.grp = FALSE, fun = nrow)
agg_dfm(data = mtcars, vrb.nm = c("mpg","cyl","disp"), grp.nm = c("vs","am"),
   rep = FALSE, rtn.grp = FALSE, sep = "_", fun = nrow)

quest documentation built on Sept. 10, 2021, 5:07 p.m.
