group_dt: Data manipulation within groups

View source: R/group_dt.R

group_dtR Documentation

Data manipulation within groups

Description

Carry out data manipulation within specified groups.

Usage

group_dt(.data, by = NULL, ...)

rowwise_dt(.data, ...)

Arguments

.data

A data.frame

by

Variables to group by,unquoted name of grouping variable of list of unquoted names of grouping variables.

...

Any data manipulation arguments that could be implemented on a data.frame.

Details

If you want to use summarise_dt and mutate_dt in group_dt, it is better to use the "by" parameter in those functions, that would be much faster because you don't have to use .SD (which takes extra time to copy).

Value

data.table

References

https://stackoverflow.com/questions/36802385/use-by-each-row-for-data-table

Examples

iris %>% group_dt(by = Species,slice_dt(1:2))
iris %>% group_dt(Species,filter_dt(Sepal.Length == max(Sepal.Length)))
iris %>% group_dt(Species,summarise_dt(new = max(Sepal.Length)))

# you can pipe in the `group_dt`
iris %>% group_dt(Species,
                  mutate_dt(max= max(Sepal.Length)) %>%
                    summarise_dt(sum=sum(Sepal.Length)))

# for users familiar with data.table, you can work on .SD directly
# following codes get the first and last row from each group
iris %>%
  group_dt(
    by = Species,
    rbind(.SD[1],.SD[.N])
  )

#' # for summarise_dt, you can use "by" to calculate within the group
mtcars %>%
  summarise_dt(
   disp = mean(disp),
   hp = mean(hp),
   by = cyl
)

  # but you could also, of course, use group_dt
 mtcars %>%
   group_dt(by =.(vs,am),
     summarise_dt(avg = mean(mpg)))

  # and list of variables could also be used
 mtcars %>%
   group_dt(by =list(vs,am),
            summarise_dt(avg = mean(mpg)))

# examples for `rowwise_dt`
df <- data.table(x = 1:2, y = 3:4, z = 4:5)

df %>% mutate_dt(m = mean(c(x, y, z)))

df %>% rowwise_dt(
  mutate_dt(m = mean(c(x, y, z)))
)

hope-data-science/tidyfst documentation built on Sept. 23, 2024, 8:05 p.m.