mutate_subset: Propagate a calculation performed on a subset of data to the...

View source: R/major_mutate_variations.R

Description

This function runs dplyr::summarize() on the subset of the data selected by .filter. It then applies the result to all observations (or to all observations in the same group, if the data is grouped), filling in columns with the summarized values as though dplyr::mutate() had been run.
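The two-step logic described above can be sketched with plain dplyr. This is only an illustration of the idea, not the package's actual implementation; the data frame df and the columns g, keep, x, and x_mean_kept are hypothetical names made up for the example.

```r
library(dplyr)

# Hypothetical data: two groups, with a logical flag marking the subset of interest
df <- data.frame(
  g = c("a", "a", "a", "b", "b"),
  keep = c(TRUE, TRUE, FALSE, TRUE, FALSE),
  x = c(1, 3, 100, 10, 200)
)

# Step 1: summarize only the rows that pass the filter, within each group
subset_summary <- df %>%
  filter(keep) %>%
  group_by(g) %>%
  summarize(x_mean_kept = mean(x), .groups = "drop")

# Step 2: attach the group-level result to every row of the original data,
# including the rows that did not pass the filter
result <- df %>%
  left_join(subset_summary, by = "g")
```

Every row in group "a" receives the mean of x over the kept rows of group "a" only, and likewise for group "b", which is the behavior the description attributes to mutate_subset().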

Usage

mutate_subset(
  .df,
  ...,
  .filter,
  .group_i = TRUE,
  .i = NULL,
  .t = NULL,
  .d = NA,
  .uniqcheck = FALSE,
  .setpanel = TRUE
)

Arguments

.df

Data frame or tibble.

...

Specification to be passed to dplyr::summarize().

.filter

Unquoted logical condition identifying the observations on which the dplyr::summarize() operations are run.

.group_i

By default, if .i is specified or found in the data, mutate_subset will group the data by .i, overwriting any grouping already implemented. Set .group_i = FALSE to avoid this.

.i

Quoted or unquoted variables that identify the individual cases. Note that setting any one of .i, .t, or .d will override all three as already applied to the data, and the returned data will be set as a pibble (via as_pibble()) with all three, unless .setpanel = FALSE.

.t

Quoted or unquoted variable indicating the time. pmdplyr accepts two kinds of time variables: numeric variables where a fixed distance .d will take you from one observation to the next, or, if .d=0, any standard variable type with an order. Consider using the time_variable() function to create the necessary variable if your data uses a Date variable for time.

.d

Number indicating the gap in .t between one period and the next. For example, if .t indicates a single day but data is collected once a week, you might set .d = 7. To ignore gap length and assume that "one period ago" is always the most recent prior observation in the data, set .d = 0. The default .d = NA here will become .d = 1 if either .i or .t is declared.

.uniqcheck

Logical parameter. Set to TRUE to always check whether .i and .t uniquely identify observations in the data. By default this is set to FALSE and the check is only performed once per session, and only if at least one of .i, .t, or .d is set.

.setpanel

Logical parameter. TRUE by default, so if .i, .t, and/or .d are declared, the function returns a pibble with those panel settings.

Details

One application of this is to partially widen data. For example, if your analysis uses childhood height as a control variable in all years, mutate_subset() could be used to easily generate a height_age10 variable from a height variable.
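A minimal sketch of the height example, assuming a hypothetical data frame kids with columns id, age, and height (none of these names come from the package). Since each child has exactly one observation at age 10, mean() simply returns that value.

```r
library(pmdplyr)

# Hypothetical panel: two children, each observed at ages 9, 10, and 11
kids <- data.frame(
  id = c(1, 1, 1, 2, 2, 2),
  age = c(9, 10, 11, 9, 10, 11),
  height = c(130, 136, 141, 128, 133, 139)
)

# Summarize height using only the age-10 rows, grouped by child,
# and spread the result to every row belonging to that child
kids <- mutate_subset(
  kids,
  height_age10 = mean(height),
  .filter = age == 10,
  .i = id
)
```

Each row for a given child should then carry that child's age-10 height in height_age10, so it can serve as a control variable in every year.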

Examples

library(pmdplyr)
data(SPrail)
# In preparation for fitting a choice model of how people choose a ticket type,
# I'd like to know the price of a "Promo" ticket for a given route,
# so that I can compare the price of every other ticket type against it
SPrail <- SPrail %>%
  mutate_subset(
    promo_price = mean(price, na.rm = TRUE),
    .filter = fare == "Promo",
    .i = c(origin, destination)
  )

pmdplyr documentation built on July 2, 2020, 4:08 a.m.