gapply: grouped ordered apply


View source: R/groupedApply.R

Description

Partitions the data by the values in the grouping column, applies a generic transform to each group, and then binds the groups back together. Only advised for a moderate number of groups, and works best if the grouping column is indexed. This is powerful enough to implement "The Split-Apply-Combine Strategy for Data Analysis" https://www.jstatsoft.org/article/view/v040i01
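For a local data.frame, the partition/transform/bind pattern described above can be sketched in base R as follows. This is a minimal illustration only, assuming a local data.frame and a transform that returns a data.frame; `split_apply_combine` and `cumulative_sum_base` are hypothetical helpers, not part of replyr, and this is not how gapply itself is implemented.

```r
# Hypothetical base-R sketch of split-apply-combine (not replyr's code).
split_apply_combine <- function(df, gcolumn, f, ocolumn = NULL, decreasing = FALSE) {
  # Partition: one data.frame per distinct value of the grouping column.
  groups <- split(df, df[[gcolumn]])
  # Apply: optionally order each group by ocolumn, then run the transform.
  results <- lapply(groups, function(g) {
    if (!is.null(ocolumn)) {
      g <- g[order(g[[ocolumn]], decreasing = decreasing), , drop = FALSE]
    }
    f(g)
  })
  # Combine: bind the per-group results back into one data.frame.
  res <- do.call(rbind, unname(results))
  rownames(res) <- NULL
  res
}

d <- data.frame(
  group = c(1, 1, 2, 2, 2),
  order = c(.1, .2, .3, .4, .5),
  values = c(10, 20, 2, 4, 8)
)

# Hypothetical per-group transform: running total of `values`.
cumulative_sum_base <- function(g) {
  g$cv <- cumsum(g$values)
  g
}

res <- split_apply_combine(d, "group", cumulative_sum_base, ocolumn = "order")
print(res)
res_dec <- split_apply_combine(d, "group", cumulative_sum_base,
                               ocolumn = "order", decreasing = TRUE)
print(res_dec)
```

Ordering happens inside each group before the transform runs, which is why order-dependent window functions such as cumsum give per-group running totals.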

Usage

gapply(
  df,
  gcolumn,
  f,
  ...,
  ocolumn = NULL,
  decreasing = FALSE,
  partitionMethod = "split",
  bindrows = TRUE,
  maxgroups = 100,
  eagerCompute = FALSE,
  restoreGroup = FALSE,
  tempNameGenerator = mk_tmp_name_source("replyr_gapply")
)

Arguments

df

remote dplyr data item

gcolumn

grouping column

f

transform function or pipeline

...

force later values to be bound by name

ocolumn

ordering column (optional)

decreasing

logical, if TRUE sort in decreasing order by ocolumn

partitionMethod

method to partition the data, one of 'group_by' (depends on f being dplyr compatible), 'split' (only works over local data frames), or 'extract'

bindrows

logical, if TRUE bind the rows back into a data item, else return split list

maxgroups

maximum number of groups to work over (intentionally not enforced if partitionMethod=='group_by')

eagerCompute

logical, if TRUE call compute on split results

restoreGroup

logical, if TRUE restore group column after apply when partitionMethod %in% c('extract', 'split')

tempNameGenerator

temp name generator produced by wrapr::mk_tmp_name_source, used to record dplyr::compute() effects.
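To make the interaction of maxgroups, restoreGroup, and bindrows concrete, here is a hypothetical base-R sketch of an "extract"-style partition: one row filter per distinct group value. It is an assumption about what such a strategy looks like, written only to mirror the argument descriptions above. `extract_apply` and `totals` are illustrative names, not replyr functions, and replyr's actual 'extract' method may differ.

```r
# Hypothetical sketch of an "extract"-style partition (not replyr's code).
extract_apply <- function(df, gcolumn, f, maxgroups = 100,
                          restoreGroup = FALSE, bindrows = TRUE) {
  gvals <- sort(unique(df[[gcolumn]]))
  # Refuse to work over more than maxgroups groups.
  if (length(gvals) > maxgroups) {
    stop("too many groups: ", length(gvals), " > maxgroups = ", maxgroups)
  }
  pieces <- lapply(gvals, function(v) {
    # Extract the rows belonging to one group value.
    g <- df[df[[gcolumn]] == v, , drop = FALSE]
    out <- f(g)
    if (restoreGroup) {
      # Re-attach the group value in case f dropped the grouping column.
      out[[gcolumn]] <- rep(v, nrow(out))
    }
    out
  })
  names(pieces) <- as.character(gvals)
  if (!bindrows) {
    return(pieces)  # list of per-group results, keyed by group value
  }
  res <- do.call(rbind, unname(pieces))
  rownames(res) <- NULL
  res
}

d <- data.frame(group = c(1, 1, 2, 2, 2), values = c(10, 20, 2, 4, 8))
# Hypothetical transform that does not keep the grouping column.
totals <- function(g) data.frame(total = sum(g$values))

print(extract_apply(d, "group", totals, restoreGroup = TRUE))
str(extract_apply(d, "group", totals, bindrows = FALSE))
```

Without restoreGroup = TRUE, the bound result above would contain only the total column, so there would be no way to tell which row came from which group.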

Details

Note that this is a fairly expensive operator, so it only makes sense to use it when f itself is fairly complicated and/or expensive.

Value

transformed frame

Examples

library("wrapr")   # provides the %.>% dot-pipe used below
library("replyr")  # provides gapply()

d <- data.frame(
  group = c(1, 1, 2, 2, 2),
  order = c(.1, .2, .3, .4, .5),
  values = c(10, 20, 2, 4, 8)
)

# User-supplied window functions. They depend on known column names and
# on the data back-end supporting matching function names (such as cumsum).
cumulative_sum <- function(d) {
  dplyr::mutate(d, cv = cumsum(values))
}
rank_in_group <- function(d) {
  d %.>%
    dplyr::mutate(., constcol = 1) %.>%
    dplyr::mutate(., rank = cumsum(constcol)) %.>%
    dplyr::select(., -constcol)
}

for (partitionMethod in c('group_by', 'split', 'extract')) {
  print(partitionMethod)
  print('cumulative sum example')
  print(
    gapply(
      d,
      'group',
      cumulative_sum,
      ocolumn = 'order',
      partitionMethod = partitionMethod
    )
  )
  print('ranking example')
  print(
    gapply(
      d,
      'group',
      rank_in_group,
      ocolumn = 'order',
      partitionMethod = partitionMethod
    )
  )
  print('ranking example (decreasing)')
  print(
    gapply(
      d,
      'group',
      rank_in_group,
      ocolumn = 'order',
      decreasing = TRUE,
      partitionMethod = partitionMethod
    )
  )
}

WinVector/replyr documentation built on Oct. 22, 2020, 8:07 p.m.