dplyr_wrapper: Wrapper for dplyr's summarize


View source: R/dplyr_wrapper.R

Description

This function is a convenience wrapper around dplyr's summarize(). The user only needs to define a function that takes the dataset as input and returns a named vector or list (with atomic entries of length 1).
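For orientation, the kind of grouped summary this wrapper saves you from writing by hand can be sketched directly in dplyr. This is only an illustration under assumptions, not the package's actual implementation: the helper name `summary_fun` is hypothetical, and `group_modify()` is used here simply because it accepts a function that returns several named values at once.

```r
library(dplyr)

# Hypothetical user function: takes a dataframe, returns a named vector
summary_fun <- function(data) {
  c(mean_sepal_length = mean(data$Sepal.Length),
    max_sepal_length = max(data$Sepal.Length))
}

# Apply the function to each group and turn its named result
# into a one-row data frame (one column per named entry)
res <- iris %>%
  group_by(Species) %>%
  group_modify(function(.x, .y) as.data.frame(as.list(summary_fun(.x)))) %>%
  ungroup()

res
```

The result has one row per group and one column per name returned by the function, alongside the grouping column.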

Usage

dplyr_wrapper(data, group_by, fun, check_fun = TRUE)

Arguments

data

('dataframe'). A dataframe with a grouping variable.

group_by

('character()'). Name(s) of the column(s) containing the identifiers by which the dataset should be grouped, e.g. user IDs.

fun

('function'). A function that takes a dataframe as input and returns a named vector or list (with atomic entries of length 1) of the desired length.

check_fun

('logical(1)'). If TRUE, fun(data) is evaluated on the whole dataset and the result is checked for the correct form. Set to FALSE if this evaluation takes too long.
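To make "correct form" concrete, here is a minimal sketch of such a check, written in base R. The function name `check_output` is hypothetical and the exact checks performed by the package may differ; the sketch only encodes the requirement stated in the description (a named vector or list whose entries are atomic and of length 1).

```r
# Hypothetical helper: TRUE if `res` is a named vector or list
# whose entries are all atomic and of length 1
check_output <- function(res) {
  nms <- names(res)
  is_named <- !is.null(nms) && all(!is.na(nms) & nms != "")
  entries_ok <- all(vapply(
    res,
    function(x) is.atomic(x) && length(x) == 1,
    logical(1)
  ))
  is_named && entries_ok
}

check_output(c(mean_x = 1.5, max_x = 3))  # named vector
check_output(list(mean_x = 1.5))          # named list with scalar entries
check_output(list(x = 1:2))               # entry of length 2
check_output(c(1, 2))                     # unnamed vector
```

Under these assumptions the first two calls pass the check and the last two fail it.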

Value

('dataframe')

Examples

library(fxtract)

# Number of rows per user in which Chrome was the running app
fun1 = function(data) {
  c(uses_chrome = nrow(
    dplyr::filter(data, RUNNING_TASKS_baseActivity_mPackage == "com.android.chrome"))
  )
}
dplyr_wrapper(data = studentlife_small, group_by = "userId", fun = fun1)

# mean, max, sd of a column
fun2 = function(data) {
  c(mean_sepal_length = mean(data$Sepal.Length),
    max_sepal_length = max(data$Sepal.Length),
    sd_sepal_length = sd(data$Sepal.Length)
  )
}
dplyr_wrapper(data = iris, group_by = "Species", fun = fun2)

# return list
fun3 = function(data) {
  list(mean_sepal_length = mean(data$Sepal.Length),
    max_sepal_length = max(data$Sepal.Length),
    sd_sepal_length = sd(data$Sepal.Length)
  )
}
dplyr_wrapper(data = iris, group_by = "Species", fun = fun3)

# group by two columns
df = data.frame(id = c(rep(1, 10), rep(2, 10)))
df$task = rep(c(rep("task1", 5), rep("task2", 5)), 2)
df$hour = rep(c(rep("hour1", 3), rep("hour2", 2), rep("hour1", 2), rep("hour2", 3)), 2)
df$x = 1:20
fun4 = function(data) c(mean_x = mean(data$x))
dplyr_wrapper(data = df, group_by = c("id", "task"), fun = fun4)

fxtract documentation built on July 8, 2020, 5:43 p.m.