R/across.R

Defines functions quo_eval_fns apply_unpack_spec df_unpack c_across_missing_cols_deprecate_warn across_missing_cols_deprecate_warn is_inlinable_lambda as_across_fn_call new_expanded_quosures expand_across expand_if_across dplyr_quosures dplyr_quosure_name new_dplyr_quosure c_across_setup quo_set_env_to_data_mask_top data_mask_top uninline across_setup across_glue_mask c_across if_across if_all if_any across

Documented in across c_across if_all if_any

#' Apply a function (or functions) across multiple columns
#'
#' @description
#' `across()` makes it easy to apply the same transformation to multiple
#' columns, allowing you to use [select()] semantics inside in "data-masking"
#' functions like [summarise()] and [mutate()]. See `vignette("colwise")` for
#'  more details.
#'
#' `if_any()` and `if_all()` apply the same
#' predicate function to a selection of columns and combine the
#' results into a single logical vector: `if_any()` is `TRUE` when
#' the predicate is `TRUE` for *any* of the selected columns, `if_all()`
#' is `TRUE` when the predicate is `TRUE` for *all* selected columns.
#'
#' If you just need to select columns without applying a transformation to each
#' of them, then you probably want to use [pick()] instead.
#'
#' `across()` supersedes the family of "scoped variants" like
#' `summarise_at()`, `summarise_if()`, and `summarise_all()`.
#'
#' @param .cols <[`tidy-select`][dplyr_tidy_select]> Columns to transform.
#'   You can't select grouping columns because they are already automatically
#'   handled by the verb (i.e. [summarise()] or [mutate()]).
#' @param .fns Functions to apply to each of the selected columns.
#'   Possible values are:
#'
#'   - A function, e.g. `mean`.
#'   - A purrr-style lambda, e.g. `~ mean(.x, na.rm = TRUE)`
#'   - A named list of functions or lambdas, e.g.
#'     `list(mean = mean, n_miss = ~ sum(is.na(.x))`. Each function is applied
#'     to each column, and the output is named by combining the function name
#'     and the column name using the glue specification in `.names`.
#'
#'   Within these functions you can use [cur_column()] and [cur_group()]
#'   to access the current column and grouping keys respectively.
#' @param ... `r lifecycle::badge("deprecated")`
#'
#'   Additional arguments for the function calls in `.fns` are no longer
#'   accepted in `...` because it's not clear when they should be evaluated:
#'   once per `across()` or once per group? Instead supply additional arguments
#'   directly in `.fns` by using a lambda. For example, instead of
#'   `across(a:b, mean, na.rm = TRUE)` write
#'   `across(a:b, ~ mean(.x, na.rm = TRUE))`.
#' @param .names A glue specification that describes how to name the output
#'   columns. This can use `{.col}` to stand for the selected column name, and
#'   `{.fn}` to stand for the name of the function being applied. The default
#'   (`NULL`) is equivalent to `"{.col}"` for the single function case and
#'   `"{.col}_{.fn}"` for the case where a list is used for `.fns`.
#' @param .unpack `r lifecycle::badge("experimental")`
#'
#'   Optionally [unpack][tidyr::unpack()] data frames returned by functions in
#'   `.fns`, which expands the df-columns out into individual columns, retaining
#'   the number of rows in the data frame.
#'
#'   - If `FALSE`, the default, no unpacking is done.
#'   - If `TRUE`, unpacking is done with a default glue specification of
#'     `"{outer}_{inner}"`.
#'   - Otherwise, a single glue specification can be supplied to describe how to
#'     name the unpacked columns. This can use `{outer}` to refer to the name
#'     originally generated by `.names`, and `{inner}` to refer to the names of
#'     the data frame you are unpacking.
#'
#' @returns
#' `across()` typically returns a tibble with one column for each column in
#' `.cols` and each function in `.fns`. If `.unpack` is used, more columns may
#' be returned depending on how the results of `.fns` are unpacked.
#'
#' `if_any()` and `if_all()` return a logical vector.
#'
#' @section Timing of evaluation:
#' R code in dplyr verbs is generally evaluated once per group.
#' Inside `across()` however, code is evaluated once for each
#' combination of columns and groups. If the evaluation timing is
#' important, for example if you're generating random variables, think
#' about when it should happen and place your code in consequence.
#'
#' ```{r}
#' gdf <-
#'   tibble(g = c(1, 1, 2, 3), v1 = 10:13, v2 = 20:23) %>%
#'   group_by(g)
#'
#' set.seed(1)
#'
#' # Outside: 1 normal variate
#' n <- rnorm(1)
#' gdf %>% mutate(across(v1:v2, ~ .x + n))
#'
#' # Inside a verb: 3 normal variates (ngroup)
#' gdf %>% mutate(n = rnorm(1), across(v1:v2, ~ .x + n))
#'
#' # Inside `across()`: 6 normal variates (ncol * ngroup)
#' gdf %>% mutate(across(v1:v2, ~ .x + rnorm(1)))
#' ```
#'
#' @examples
#' # For better printing
#' iris <- as_tibble(iris)
#'
#' # across() -----------------------------------------------------------------
#' # Different ways to select the same set of columns
#' # See <https://tidyselect.r-lib.org/articles/syntax.html> for details
#' iris %>%
#'   mutate(across(c(Sepal.Length, Sepal.Width), round))
#' iris %>%
#'   mutate(across(c(1, 2), round))
#' iris %>%
#'   mutate(across(1:Sepal.Width, round))
#' iris %>%
#'   mutate(across(where(is.double) & !c(Petal.Length, Petal.Width), round))
#'
#' # Using an external vector of names
#' cols <- c("Sepal.Length", "Petal.Width")
#' iris %>%
#'   mutate(across(all_of(cols), round))
#'
#' # If the external vector is named, the output columns will be named according
#' # to those names
#' names(cols) <- tolower(cols)
#' iris %>%
#'   mutate(across(all_of(cols), round))
#'
#' # A purrr-style formula
#' iris %>%
#'   group_by(Species) %>%
#'   summarise(across(starts_with("Sepal"), ~ mean(.x, na.rm = TRUE)))
#'
#' # A named list of functions
#' iris %>%
#'   group_by(Species) %>%
#'   summarise(across(starts_with("Sepal"), list(mean = mean, sd = sd)))
#'
#' # Use the .names argument to control the output names
#' iris %>%
#'   group_by(Species) %>%
#'   summarise(across(starts_with("Sepal"), mean, .names = "mean_{.col}"))
#' iris %>%
#'   group_by(Species) %>%
#'   summarise(across(starts_with("Sepal"), list(mean = mean, sd = sd), .names = "{.col}.{.fn}"))
#'
#' # If a named external vector is used for column selection, .names will use
#' # those names when constructing the output names
#' iris %>%
#'   group_by(Species) %>%
#'   summarise(across(all_of(cols), mean, .names = "mean_{.col}"))
#'
#' # When the list is not named, .fn is replaced by the function's position
#' iris %>%
#'   group_by(Species) %>%
#'   summarise(across(starts_with("Sepal"), list(mean, sd), .names = "{.col}.fn{.fn}"))
#'
#' # When the functions in .fns return a data frame, you typically get a
#' # "packed" data frame back
#' quantile_df <- function(x, probs = c(0.25, 0.5, 0.75)) {
#'   tibble(quantile = probs, value = quantile(x, probs))
#' }
#'
#' iris %>%
#'   reframe(across(starts_with("Sepal"), quantile_df))
#'
#' # Use .unpack to automatically expand these packed data frames into their
#' # individual columns
#' iris %>%
#'   reframe(across(starts_with("Sepal"), quantile_df, .unpack = TRUE))
#'
#' # .unpack can utilize a glue specification if you don't like the defaults
#' iris %>%
#'   reframe(across(starts_with("Sepal"), quantile_df, .unpack = "{outer}.{inner}"))
#'
#' # This is also useful inside mutate(), for example, with a multi-lag helper
#' multilag <- function(x, lags = 1:3) {
#'   names(lags) <- as.character(lags)
#'   purrr::map_dfr(lags, lag, x = x)
#' }
#'
#' iris %>%
#'   group_by(Species) %>%
#'   mutate(across(starts_with("Sepal"), multilag, .unpack = TRUE)) %>%
#'   select(Species, starts_with("Sepal"))
#'
#' # if_any() and if_all() ----------------------------------------------------
#' iris %>%
#'   filter(if_any(ends_with("Width"), ~ . > 4))
#' iris %>%
#'   filter(if_all(ends_with("Width"), ~ . > 2))
#'
#' @export
#' @seealso [c_across()] for a function that returns a vector
across <- function(.cols,
                   .fns,
                   ...,
                   .names = NULL,
                   .unpack = FALSE) {
  mask <- peek_mask()
  caller_env <- caller_env()

  across_if_fn <- context_peek_bare("across_if_fn") %||% "across"
  error_call <- context_peek_bare("across_frame") %||% current_env()

  .cols <- enquo(.cols)
  fns_quo <- enquo(.fns)
  fns_quo_env <- quo_get_env(fns_quo)

  if (quo_is_missing(.cols)) {
    across_missing_cols_deprecate_warn()
    .cols <- quo_set_expr(.cols, expr(everything()))
  }

  if (is_missing(.fns)) {
    # Silent restoration to old defaults of `.fns` for now.
    # TODO: Escalate this to formal deprecation.
    .fns <- NULL

    # Catch if dots are non-empty with no `.fns` supplied.
    # Mainly catches typos, e.g. `.funs` (#6638).
    check_dots_empty0(...)
  } else {
    .fns <- quo_eval_fns(fns_quo, mask = fns_quo_env, error_call = error_call)
  }

  if (!is_bool(.unpack) && !is_string(.unpack)) {
    stop_input_type(.unpack, "`TRUE`, `FALSE`, or a single string")
  }

  if (is_string(.unpack)) {
    unpack_spec <- .unpack
    .unpack <- TRUE
  } else {
    unpack_spec <- "{outer}_{inner}"
  }

  setup <- across_setup(
    cols = !!.cols,
    fns = .fns,
    names = .names,
    .caller_env = caller_env,
    mask = mask,
    error_call = error_call,
    across_if_fn = across_if_fn
  )

  if (!missing(...)) {
    details <- c(
      "Supply arguments directly to `.fns` through an anonymous function instead.",
      "",
      " " = "# Previously",
      " " = "across(a:b, mean, na.rm = TRUE)",
      "",
      " " = "# Now",
      " " = "across(a:b, \\(x) mean(x, na.rm = TRUE))"
    )
    lifecycle::deprecate_soft(
      when = "1.1.0",
      what = "across(...)",
      details = details
    )
  }

  vars <- setup$vars
  if (length(vars) == 0L) {
    return(dplyr_new_tibble(list(), size = 1L))
  }
  fns <- setup$fns
  names <- setup$names

  fns <- map(fns, function(fn) uninline(fn, fns_quo_env))

  if (!length(fns)) {
    # TODO: Deprecate and remove the `.fns = NULL` path in favor of `pick()`
    data <- mask$pick_current(vars)

    if (is.null(names)) {
      return(data)
    } else {
      return(set_names(data, names))
    }
  }

  data <- mask$current_cols(vars)

  n_cols <- length(data)
  n_fns <- length(fns)

  seq_n_cols <- seq_len(n_cols)
  seq_fns <- seq_len(n_fns)

  k <- 1L
  out <- vector("list", n_cols * n_fns)

  # Reset `cur_column()` info on exit
  old_var <- context_peek_bare("column")
  on.exit(context_poke("column", old_var), add = TRUE)

  # Loop in such an order that all functions are applied
  # to a single column before moving on to the next column
  withCallingHandlers(
    for (i in seq_n_cols) {
      var <- vars[[i]]
      col <- data[[i]]

      context_poke("column", var)

      for (j in seq_fns) {
        fn <- fns[[j]]
        out[[k]] <- fn(col, ...)
        k <- k + 1L
      }
    }, error = function(cnd) {
      bullets <- c(
        glue("Can't compute column `{names[k]}`.")
      )
      abort(bullets, call = error_call, parent = cnd)
    }
  )

  size <- vec_size_common(!!!out)
  out <- vec_recycle_common(!!!out, .size = size)
  names(out) <- names
  out <- dplyr_new_tibble(out, size = size)

  if (.unpack) {
    out <- df_unpack(out, unpack_spec, caller_env)
  }

  out
}

#' @rdname across
#' @export
if_any <- function(.cols, .fns, ..., .names = NULL) {
  context_local("across_if_fn", "if_any")
  context_local("across_frame", current_env())
  if_across(`|`, across({{ .cols }}, .fns, ..., .names = .names))
}
#' @rdname across
#' @export
if_all <- function(.cols, .fns, ..., .names = NULL) {
  context_local("across_if_fn", "if_all")
  context_local("across_frame", current_env())
  if_across(`&`, across({{ .cols }}, .fns, ..., .names = .names))
}

if_across <- function(op, df) {
  n <- nrow(df)

  if (!length(df)) {
    return(TRUE)
  }

  combine <- function(x, y) {
    if (is_null(x)) {
      y
    } else {
      op(x, y)
    }
  }
  reduce(df, combine, .init = NULL)
}

#' Combine values from multiple columns
#'
#' @description
#' `c_across()` is designed to work with [rowwise()] to make it easy to
#' perform row-wise aggregations. It has two differences from `c()`:
#'
#' * It uses tidy select semantics so you can easily select multiple variables.
#'   See `vignette("rowwise")` for more details.
#'
#' * It uses [vctrs::vec_c()] in order to give safer outputs.
#'
#' @inheritParams across
#' @seealso [across()] for a function that returns a tibble.
#' @export
#' @examples
#' df <- tibble(id = 1:4, w = runif(4), x = runif(4), y = runif(4), z = runif(4))
#' df %>%
#'   rowwise() %>%
#'   mutate(
#'     sum = sum(c_across(w:z)),
#'     sd = sd(c_across(w:z))
#'   )
c_across <- function(cols) {
  mask <- peek_mask()
  cols <- enquo(cols)

  if (quo_is_missing(cols)) {
    c_across_missing_cols_deprecate_warn()
    cols <- quo_set_expr(cols, expr(everything()))
  }

  vars <- c_across_setup(!!cols, mask = mask)

  cols <- mask$current_cols(vars)
  vec_c(!!!cols, .name_spec = zap())
}

across_glue_mask <- function(.col, .fn, .caller_env) {
  glue_mask <- env(.caller_env, .col = .col, .fn = .fn)
  # TODO: we can make these bindings louder later
  env_bind_active(
    glue_mask, col = function() glue_mask$.col, fn = function() glue_mask$.fn
  )
  glue_mask
}

across_setup <- function(cols,
                         fns,
                         names,
                         .caller_env,
                         mask,
                         error_call = caller_env(),
                         across_if_fn = "across") {
  cols <- enquo(cols)

  # `across()` is evaluated in a data mask so we need to remove the
  # mask layer from the quosure environment (#5460)
  cols <- quo_set_env_to_data_mask_top(cols)

  # TODO: call eval_select with a calling handler to intercept
  #       classed error, after https://github.com/r-lib/tidyselect/issues/233
  if (is.null(fns) && quo_is_call(cols, "~")) {
    bullets <- c(
      "Must supply a column selection.",
      i = glue("You most likely meant: `{across_if_fn}(everything(), {as_label(cols)})`."),
      i = "The first argument `.cols` selects a set of columns.",
      i = "The second argument `.fns` operates on each selected columns."
    )
    abort(bullets, call = error_call)
  }
  data <- mask$get_current_data(groups = FALSE)

  vars <- tidyselect::eval_select(
    cols,
    data = data,
    error_call = error_call
  )
  names_vars <- names(vars)
  vars <- names(data)[vars]

  if (is.null(fns)) {
    # TODO: Eventually deprecate and remove the `.fns = NULL` path in favor of `pick()`
    if (!is.null(names)) {
      glue_mask <- across_glue_mask(.caller_env, .col = names_vars, .fn = "1")
      names <- vec_as_names(
        glue(names, .envir = glue_mask),
        repair = "check_unique",
        call = error_call
      )
    } else {
      names <- names_vars
    }

    value <- list(vars = vars, fns = fns, names = names)
    return(value)
  }

  # apply `.names` smart default
  if (is.function(fns)) {
    names <- names %||% "{.col}"
    fns <- list("1" = fns)
  } else {
    names <- names %||% "{.col}_{.fn}"
  }

  if (!is.list(fns)) {
    abort("Expected a list.", .internal = TRUE)
  }

  # make sure fns has names, use number to replace unnamed
  if (is.null(names(fns))) {
    names_fns <- seq_along(fns)
  } else {
    names_fns <- names(fns)
    empties <- which(names_fns == "")
    if (length(empties)) {
      names_fns[empties] <- empties
    }
  }

  glue_mask <- across_glue_mask(.caller_env,
    .col = rep(names_vars, each = length(fns)),
    .fn  = rep(names_fns , length(vars))
  )
  names <- vec_as_names(
    glue(names, .envir = glue_mask),
    repair = "check_unique",
    call = error_call
  )

  list(
    vars = vars,
    fns = fns,
    names = names
  )
}

uninline <- function(fn, env) {
  # Reset environment of inlinable lambdas which are set to the empty
  # env sentinel
  if (identical(get_env(fn), empty_env())) {
    set_env(fn, env)
  } else {
    fn
  }
}

# FIXME: This pattern should be encapsulated by rlang
data_mask_top <- function(env, recursive = FALSE, inherit = FALSE) {
  while (env_has(env, ".__tidyeval_data_mask__.", inherit = inherit)) {
    env <- env_parent(env_get(env, ".top_env", inherit = inherit))
    if (!recursive) {
      return(env)
    }
  }

  env
}
quo_set_env_to_data_mask_top <- function(quo) {
  env <- quo_get_env(quo)
  env <- data_mask_top(env, recursive = FALSE, inherit = FALSE)
  quo_set_env(quo, env)
}

c_across_setup <- function(cols, mask, error_call = caller_env()) {
  cols <- enquo(cols)

  # `c_across()` is evaluated in a data mask so we need to remove the
  # mask layer from the quosure environments (same as `across()`) (#5460, #6522)
  cols <- quo_set_env_to_data_mask_top(cols)

  data <- mask$get_current_data(groups = FALSE)

  vars <- tidyselect::eval_select(
    expr = cols,
    data = data,
    allow_rename = FALSE,
    error_call = error_call
  )

  value <- names(vars)
  value
}

new_dplyr_quosure <- function(quo, ...) {
  attr(quo, "dplyr:::data") <- list2(...)
  quo
}

dplyr_quosure_name <- function(quo_data) {
  if (quo_data$is_named) {
    # `name` is a user-supplied or known character string
    quo_data$name
  } else {
    # `name` is a quosure that must be auto-named
    with_no_rlang_infix_labeling(as_label(quo_data$name))
  }
}

dplyr_quosures <- function(...) {
  # We're using quos() instead of enquos() here for speed, because we're not defusing named arguments --
  # only the ellipsis is converted to quosures, there are no further arguments.
  quosures <- quos(..., .ignore_empty = "all")
  names <- names2(quosures)

  for (i in seq_along(quosures)) {
    quosure <- quosures[[i]]
    name <- names[[i]]
    is_named <- (name != "")

    if (!is_named) {
      # Will be auto-named by `dplyr_quosure_name()` only as needed
      name <- quosure
    }

    quosures[[i]] <- new_dplyr_quosure(
      quo = quosure,
      name = name,
      is_named = is_named,
      index = i
    )
  }
  quosures
}

# When mutate() or summarise() have an unnamed call to across() at the top level, e.g.
# summarise(across(<...>)) or mutate(across(<...>))
#
# a call to top_across(<...>) is evaluated instead.
# top_across() returns a flattened list of expressions along with some
# information about the "current column" for each expression
# in the "columns" attribute:
#
# For example with: summarise(across(c(x, y), mean, .names = "mean_{.col}")) top_across() will return
# something like:
#
# structure(
#   list(mean_x = expr(mean(x)), mean_y = expr(mean(y)))
#   columns = c("x", "y")
# )

# Technically this always returns a single quosure but we wrap it in a
# list to follow the pattern in `expand_across()`
expand_if_across <- function(quo) {
  quo_data <- attr(quo, "dplyr:::data")
  if (!quo_is_call(quo, c("if_any", "if_all"), ns = c("", "dplyr"))) {
    return(list(quo))
  }

  call <- match.call(
    definition = if_any,
    call = quo_get_expr(quo),
    expand.dots = FALSE,
    envir = quo_get_env(quo)
  )
  if (!is_null(call$...)) {
    return(list(quo))
  }

  if (is_call(call, "if_any")) {
    op <- "|"
    if_fn <- "if_any"
  } else {
    op <- "&"
    if_fn <- "if_all"
  }

  context_local("across_if_fn", if_fn)

  # Set frame here for backtrace truncation. But override error call
  # via `local_error_call()` so it refers to the function we're
  # expanding, e.g. `if_any()` and not `expand_if_across()`.
  context_local("across_frame", current_env())
  local_error_call(call(if_fn))

  call[[1]] <- quote(across)
  quos <- expand_across(quo_set_expr(quo, call))

  # Select all rows if there are no inputs
  if (!length(quos)) {
    return(list(quo(TRUE)))
  }

  combine <- function(x, y) {
    if (is_null(x)) {
      y
    } else {
      call(op, x, y)
    }
  }
  expr <- reduce(quos, combine, .init = NULL)

  # Use `as_quosure()` instead of `new_quosure()` to avoid rewrapping
  # quosure in case of single input
  list(as_quosure(expr, env = baseenv()))
}

expand_across <- function(quo) {
  quo_data <- attr(quo, "dplyr:::data")
  if (!quo_is_call(quo, "across", ns = c("", "dplyr")) || quo_data$is_named) {
    return(list(quo))
  }

  across_if_fn <- context_peek_bare("across_if_fn") %||% "across"

  # Set error call to frame for backtrace truncation, but override
  # call with the relevant function we're doing the expansion for
  error_call <- context_peek_bare("across_frame") %||% current_env()
  local_error_call(call(across_if_fn))

  # Expand dots in lexical env
  env <- quo_get_env(quo)
  expr <- match.call(
    definition = across,
    call = quo_get_expr(quo),
    expand.dots = FALSE,
    envir = env
  )

  # Abort expansion if there are any expression supplied because dots
  # must be evaluated once per group in the data mask. Expanding the
  # `across()` call would lead to either `n_group * n_col` evaluations
  # if dots are delayed or only 1 evaluation if they are eagerly
  # evaluated.
  if (!is_null(expr$...)) {
    return(list(quo))
  }

  dplyr_mask <- peek_mask()
  mask <- dplyr_mask$get_rlang_mask()

  if (".unpack" %in% names(expr)) {
    # We're expanding expressions but we do need some actual values ahead of
    # time. We evaluate those in the mask to simulate masked evaluation of an
    # `across()` call within a verb like `mutate()`. `.names` and `.fns` are
    # also evaluated this way below.
    unpack <- eval_tidy(expr$.unpack, mask, env = env)
  } else {
    unpack <- FALSE
  }

  # Abort expansion if unpacking as expansion makes named expressions and we
  # need the expressions to remain unnamed
  if (!is_false(unpack)) {
    return(list(quo))
  }

  # Differentiate between missing and null (`match.call()` doesn't
  # expand default argument)
  if (".cols" %in% names(expr)) {
    cols <- expr$.cols
  } else {
    across_missing_cols_deprecate_warn()
    cols <- expr(everything())
  }
  cols <- as_quosure(cols, env)

  if (".fns" %in% names(expr)) {
    fns <- as_quosure(expr$.fns, env)
    fns <- quo_eval_fns(fns, mask = mask, error_call = error_call)
  } else {
    # In the missing case, silently restore the old default of `NULL`.
    # TODO: Escalate this to formal deprecation.
    fns <- NULL
  }

  setup <- across_setup(
    !!cols,
    fns = fns,
    names = eval_tidy(expr$.names, mask, env = env),
    .caller_env = env,
    mask = dplyr_mask,
    error_call = error_call,
    across_if_fn = across_if_fn
  )

  vars <- setup$vars

  # Empty expansion
  if (length(vars) == 0L) {
    return(new_expanded_quosures(list()))
  }

  fns <- setup$fns
  names <- setup$names %||% vars

  # No functions, so just return a list of symbols
  if (is.null(fns)) {
    # TODO: Deprecate and remove the `.fns = NULL` path in favor of `pick()`
    expressions <- pmap(list(vars, names, seq_along(vars)), function(var, name, k) {
      quo <- new_quosure(sym(var), empty_env())
      quo <- new_dplyr_quosure(
        quo,
        name = name,
        is_named = TRUE,
        index = c(quo_data$index, k),
        column = var
      )
    })
    names(expressions) <- names
    expressions <- new_expanded_quosures(expressions)
    return(expressions)
  }

  n_vars <- length(vars)
  n_fns <- length(fns)

  seq_vars <- seq_len(n_vars)
  seq_fns  <- seq_len(n_fns)

  expressions <- vector(mode = "list", n_vars * n_fns)
  columns <- character(n_vars * n_fns)

  k <- 1L
  for (i in seq_vars) {
    var <- vars[[i]]

    for (j in seq_fns) {
      fn_call <- as_across_fn_call(fns[[j]], var, env, mask)

      name <- names[[k]]
      expressions[[k]] <- new_dplyr_quosure(
        fn_call,
        name = name,
        is_named = TRUE,
        index = c(quo_data$index, k),
        column = var
      )

      k <- k + 1L
    }
  }

  names(expressions) <- names
  new_expanded_quosures(expressions)
}

new_expanded_quosures <- function(x) {
  structure(x, class = "dplyr_expanded_quosures")
}

as_across_fn_call <- function(fn, var, env, mask) {
  if (is_inlinable_lambda(fn)) {
    # Transform inlinable lambdas to simple quosured calls
    arg <- names(formals(fn))[[1]]
    expr <- body(fn)
    expr <- expr_substitute(expr, sym(arg), sym(var))
    new_quosure(expr, env)
  } else {
    # Non-inlinable elements are wrapped in a quosured call. It's
    # important that these are set to their original quosure
    # environment (passed as `env`) because we change non-inlinable
    # lambdas to inherit from the data mask in order to make them
    # maskable. By wrapping them in a quosured call that inherits from
    # the original quosure environment that wrapped the expanded
    # `across()` call, we cause `eval_tidy()` to chains this
    # environment to the top of the data mask, thereby preserving the
    # lexical environment of the lambda when it is evaluated.
    new_quosure(call2(fn, sym(var)), env)
  }
}

# The environment of functions that are safe to inline has been set to
# the empty env sentinel
is_inlinable_lambda <- function(x) {
  is_function(x) && identical(fn_env(x), empty_env())
}

across_missing_cols_deprecate_warn <- function() {
  across_if_fn <- context_peek_bare("across_if_fn") %||% "across"

  # Passing the correct `user_env` through `expand_across()` to here is
  # complicated, so instead we force the global environment. This means users
  # won't ever see the "deprecated feature was likely used in the {pkg}"
  # message, but the warning will still fire and that is more important.
  user_env <- global_env()

  lifecycle::deprecate_warn(
    when = "1.1.0",
    what = I(glue("Using `{across_if_fn}()` without supplying `.cols`")),
    details = "Please supply `.cols` instead.",
    user_env = user_env
  )
}

c_across_missing_cols_deprecate_warn <- function(user_env = caller_env(2)) {
  lifecycle::deprecate_warn(
    when = "1.1.0",
    what = I("Using `c_across()` without supplying `cols`"),
    details = "Please supply `cols` instead.",
    user_env = user_env
  )
}

df_unpack <- function(x, spec, caller_env, error_call = caller_env()) {
  size <- vec_size(x)

  out <- dplyr_new_list(x)
  names <- names(out)

  loc <- which(map_lgl(out, is.data.frame))

  cols <- out[loc]
  col_names <- names[loc]

  out[loc] <- map2(
    .x = cols,
    .y = col_names,
    .f = apply_unpack_spec,
    spec = spec,
    caller_env = caller_env
  )

  # Signal to `df_list()` that these columns should be unpacked
  names[loc] <- ""
  names(out) <- names

  out <- df_list(!!!out, .size = size, .name_repair = "minimal")
  out <- dplyr_new_tibble(out, size = size)

  vec_as_names(names(out), repair = "check_unique", call = error_call)

  out
}

apply_unpack_spec <- function(col, outer, spec, caller_env) {
  inner <- names(col)
  outer <- vec_rep(outer, times = length(inner))

  mask <- env(caller_env, outer = outer, inner = inner)

  inner <- glue(spec, .envir = mask)
  inner <- as.character(inner)

  names(col) <- inner
  col
}

# Evaluate the quosure of the `.fns` argument
#
# We detect and mark inlinable lambdas here. By lambda we mean either
# a `~` or `function` call that is directly supplied to
# `across()`. Lambdas haven't been evaluated yet and don't carry an
# environment.
#
# Inlinable lambdas are eventually expanded in the surrounding call.
# To distinguish inlinable lambdas from non-inlinable ones, we set
# their environments to the empty env.
#
# There are cases where we can't inline, for instance lambdas that are
# passed additional arguments through `...`. We still want these
# non-inlinable lambdas to be maskable so that they can refer to
# data-mask columns. So we set them (a) in the evaluation case, to
# their original quosure environment which is the data mask, or (b) in
# the expansion case, to the uninitialised data mask.
#
# @value  <fn> | <list<fn>>. Inlinable lambdas are set to the
#   empty env.
quo_eval_fns <- function(quo, mask, error_call = caller_env()) {
  # In the evaluation path (as opposed to expansion), the quosure
  # inherits from the data mask. We set the environment to the data
  # mask top (the original quosure environment) so that we don't
  # evaluate the function expressions in the mask. This prevents
  # masking a function symbol (e.g. `mean`) by a column of the same
  # name.
  quo <- quo_set_env_to_data_mask_top(quo)

  # The following strange scheme is a work around to reconciliate two
  # contradictory goals. We want to evaluate outside the mask so that
  # data mask columns are not confused with functions (#6545).
  # However at the same time we want non-inlinable lambdas (inlinable
  # ones are dealt with above) to be maskable so they can refer to
  # data mask columns. So we evaluate outside the mask, in a data-less
  # quosure mask that handles quosures. Then, in `validate()`, we
  # detect lambdas that inherit from this quosure mask and set their
  # environment to the data mask.
  sentinel_env <- empty_env()

  out <- eval_tidy(quo({
    sentinel_env <<- current_env()
    !!quo
  }))

  validate <- function(x) {
    if (is_formula(x) || is_function(x)) {
      # If the function or formula inherits from the data-less quosure
      # mask, we have a lambda that was directly supplied and
      # evaluated here. We inline it if possible.
      if (identical(get_env(x), sentinel_env)) {
        if (is_inlinable_function(x)) {
          return(set_env(x, empty_env()))
        }

        if (is_inlinable_formula(x)) {
          x <- expr_substitute(x, quote(.), quote(.x))
          fn <- new_function(pairlist2(.x = ), f_rhs(x), empty_env())
          return(fn)
        }

        # Can't inline the lambda. We set its environment to the data
        # mask so it can still refer to columns.
        x <- set_env(x, mask)
      }

      as_function(x, arg = ".fns", call = error_call)
    } else {
      abort(
        "`.fns` must be a function, a formula, or a list of functions/formulas.",
        call = error_call
      )
    }
  }

  if (obj_is_list(out)) {
    map(out, function(elt) validate(elt))
  } else {
    validate(out)
  }
}

is_inlinable_function <- function(x) {
  if (!is_function(x)) {
    return(FALSE)
  }

  fmls <- formals(x)

  # Don't inline if there are additional arguments even if they have
  # defaults or are passed through `...`
  if (length(fmls) != 1) {
    return(FALSE)
  }

  # Don't inline lambdas that call `return()` at the moment a few
  # packages do things like `across(1, function(x)
  # return(x))`. Whereas `eval()` sets a return point, `eval_tidy()`
  # doesn't which causes `return()` to throw an error.
  if ("return" %in% all.names(body(x))) {
    return(FALSE)
  }

  TRUE
}

is_inlinable_formula <- function(x) {
  if (!is_formula(x, lhs = FALSE)) {
    return(FALSE)
  }

  # Don't inline if there are additional arguments passed through `...`
  nms <- all.names(x)
  unsupported_arg_rx <- "\\.\\.[0-9]|\\.y"

  if (any(grepl(unsupported_arg_rx, nms))) {
    return(FALSE)
  }

  # Don't inline lambdas that call `return()` at the moment, see above
  if ("return" %in% nms) {
    return(FALSE)
  }

  TRUE
}

Try the dplyr package in your browser

Any scripts or data that you put into this service are public.

dplyr documentation built on Nov. 17, 2023, 5:08 p.m.