summarize_qc: Report number of NAs created when performing dplyr summarize

View source: R/summarize_qc.R

summarize_qc    R Documentation

Description

summarize_qc is used exactly like dplyr::summarize: it takes the same arguments and returns an identical object. The only difference is that summarize_qc prints a message reporting the number of NA or infinite values created in the new summary variable(s). This is most useful on a grouped data frame.
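As a rough illustration of the kind of check described above, the sketch below counts NA and infinite values in the columns of an ordinary dplyr::summarize result. It does not call summarize_qc and is not the package's implementation; the helper count_bad_values and the toy data are hypothetical names made up for this example.

```r
library(dplyr)

# Hypothetical helper (not part of the package): count NA and infinite
# values in each column of a summary result.
count_bad_values <- function(summary_df) {
  vapply(
    summary_df,
    function(col) sum(is.na(col) | is.infinite(col)),
    integer(1)
  )
}

# Toy data: group "a" has a missing x and a zero denominator in y.
toy <- data.frame(
  g = c("a", "a", "b", "b"),
  x = c(1, NA, 3, 4),
  y = c(0, 0, 1, 1)
)

result <- toy %>%
  group_by(g) %>%
  summarize(
    mean_x = mean(x),                     # NA for group "a"
    ratio  = sum(x, na.rm = TRUE) / sum(y) # Inf for group "a"
  )

# Named integer vector: one count per column of the summary.
bad <- count_bad_values(result)
print(bad)
```

On grouped data a count like this is informative because a single NA in one group is easy to overlook in a long summary table.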

Usage

summarize_qc(.data = NULL, ..., .group_check = F)

summarise_qc(.data = NULL, ..., .group_check = F)

Arguments

.data

A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See the Methods section of dplyr::summarise for more details.

...

<data-masking> Name-value pairs of summary functions. The name will be the name of the variable in the result.

The value can be:

  • A vector of length 1, e.g. min(x), n(), or sum(is.na(y)).

  • A data frame, to add multiple columns from a single expression.

[Deprecated] Returning values with size 0 or >1 was deprecated as of 1.1.0. Please use reframe() for this instead.
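The two accepted value forms can be illustrated with plain dplyr::summarise, whose arguments summarize_qc is documented to share. This is a minimal sketch on made-up toy data; it assumes a dplyr version (1.0 or later) in which an unnamed data frame expression is unpacked into separate columns.

```r
library(dplyr)

toy <- data.frame(g = c("a", "a", "b"), x = c(1, 2, 4))

result <- toy %>%
  group_by(g) %>%
  summarise(
    n = n(),                                  # a vector of length 1
    data.frame(lo = min(x), hi = max(x))      # a data frame adding two columns
  )

print(result)
```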

.group_check

A logical value. When TRUE, prints a table with each group variable and a column called "missing_vars" that lists which variables are missing from the summarized data for each group. Only groups with at least one missing variable are listed. This only prints information and has no effect on the returned object. Default is FALSE, to avoid excess printing. If .data is not grouped and .group_check = TRUE, an error is thrown.
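To make the shape of that table concrete, here is a sketch that builds a similar per-group report by hand from an ordinary dplyr::summarize result. It is an assumption about what such a report could look like, not the package's code; the toy data and the group_report object are hypothetical, and only the column name missing_vars is taken from the description above.

```r
library(dplyr)

# Toy data: group "Z" is complete, "X" and "Y" each have one NA.
toy <- data.frame(
  g = c("X", "X", "Y", "Y", "Z"),
  a = c(1, 2, 3, NA, 5),
  b = c(NA, 5, 6, 7, 8)
)

summary_df <- toy %>%
  group_by(g) %>%
  summarize(mean_a = mean(a), mean_b = mean(b), n_rows = n())

# Every column except the grouping variable is a new summary variable.
new_vars <- setdiff(names(summary_df), "g")

# For each group (row), list the new variables that are NA or infinite.
missing_vars <- vapply(
  seq_len(nrow(summary_df)),
  function(i) {
    row_vals <- unlist(summary_df[i, new_vars])
    paste(new_vars[is.na(row_vals) | is.infinite(row_vals)], collapse = ", ")
  },
  character(1)
)

group_report <- data.frame(g = summary_df$g, missing_vars = missing_vars)

# Keep only groups with at least one missing variable.
group_report <- group_report[group_report$missing_vars != "", ]
print(group_report)
```

Group "Z" is dropped from the report because none of its summary variables is missing, which mirrors the "only groups with at least one missing variable" behavior described for .group_check.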

Value

An object of the same class as .data. This object is identical to the one returned by dplyr::summarise.

Scoped variants

There are _qc versions of the scoped summarize functions. See summarize_at_qc, summarize_all_qc, or summarize_if_qc.

Grouping

All functions work with grouped data.

summarize vs. summarise

There are _qc versions of summarize and summarise. But this is America, use a z!

See Also

summarise

Examples

library(dplyr)  # provides %>% and group_by() used below

practice_data <-
  data.frame(
    A = c(1:4, NA),
    B = c(NA, 7:10),
    C = 21:25,
    G = c("X", "X", "X", "Y", "Y"),
    stringsAsFactors = FALSE
  )

summarize_qc(practice_data, new_var_1 = mean(C), sum(A))
summarize_qc(practice_data, new_var_1 = mean(C), sum(A, na.rm = TRUE))

# Pipes work
practice_data %>%
  summarize_qc(new_var_1 = mean(C), sum(A, na.rm = TRUE))

# Functions work on grouped data, too
grouped_data <- dplyr::group_by(practice_data, G)
summarize_qc(grouped_data, new_var_1 = mean(A), mean_b = mean(B), sum(C))

# Setting .group_check = TRUE will print, for each group with a missing value,
# which new variables are missing.
summarize_qc(
  grouped_data,
  .group_check = TRUE,
  new_var_1 = mean(A),
  mean_b = mean(B),
  sum(C)
)


adamMaier/reviewr documentation built on Nov. 5, 2023, 7:21 a.m.