colSums_if: Column Sums Conditional on Frequency of Observed Values
In quest: Prepare Questionnaire Data for Analysis

colSums_if

R Documentation

Column Sums Conditional on Frequency of Observed Values

Description

colSums_if calculates the sum of every column in a numeric or logical matrix conditional on the frequency of observed data. If the frequency of observed values in a column is less than that specified by ov.min (or equal to it when inclusive = FALSE), then NA is returned for that column. It can also return a value other than 0 (e.g., NA) for columns in which every value is NA, which differs from colSums(x, na.rm = TRUE).
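To make the "frequency of observed values" concrete, the base R snippet below computes the count and proportion of non-missing values per column and shows the `colSums(x, na.rm = TRUE)` behavior that the description contrasts against. It is an illustration only: the matrix `m` and the variable names are made up for this example, and no function from the package is called.

```r
# Small illustrative matrix: column "z" has no observed values at all.
m <- cbind(x = c(1, 2, NA), y = c(1, NA, NA), z = c(NA, NA, NA))

# Base R: dropping NAs makes the all-NA column sum to 0 rather than NA.
base_sums <- colSums(m, na.rm = TRUE)

# Frequency of observed (non-NA) values per column,
# as a count (compare with prop = FALSE) and as a proportion (prop = TRUE).
obs_count <- colSums(!is.na(m))
obs_prop <- colMeans(!is.na(m))

print(base_sums)
print(obs_count)
print(obs_prop)
```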

Usage

colSums_if(
  x,
  ov.min = 1,
  prop = TRUE,
  inclusive = TRUE,
  impute = TRUE,
  allNA = NA_real_
)

Arguments

`x`	numeric or logical matrix. If not a matrix, it will be coerced to one.
`ov.min`	minimum frequency of observed values required per column. If `prop` = TRUE, then this is a decimal between 0 and 1. If `prop` = FALSE, then this is an integer between 0 and `nrow(x)`.
`prop`	logical vector of length 1 specifying whether `ov.min` should refer to the proportion of observed values (TRUE) or the count of observed values (FALSE).
`inclusive`	logical vector of length 1 specifying whether the sum should be calculated if the frequency of observed values in a column is exactly equal to `ov.min`.
`impute`	logical vector of length 1 specifying if missing values should be imputed with the mean of observed values of `x[, i]`. If TRUE (default), this makes sums over the same rows comparable across columns with different amounts of observed data.
`allNA`	numeric vector of length 1 specifying what value should be returned for columns that are all NA. This is most applicable when `ov.min = 0` and `inclusive = TRUE`. The default is NA, which differs from `colSums` with `na.rm = TRUE`, where 0 is returned. Note that the value is overwritten by NA if the frequency of observed values in that column is less than `ov.min` (or equal to it when `inclusive` = FALSE).
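The arithmetic behind `impute` can be worked through by hand in base R. The sketch below assumes that imputation simply replaces each missing value with the mean of the observed values in the same column, as the argument description states; the variable names are illustrative and not part of the package.

```r
# One column with a single missing value.
col <- c(1, 2, NA)

# Mean of the observed values only.
obs_mean <- mean(col, na.rm = TRUE)

# Replace each NA with the observed mean, then sum.
imputed <- ifelse(is.na(col), obs_mean, col)
sum_imputed <- sum(imputed)

# For comparison: the sum when missing values are simply dropped.
sum_dropped <- sum(col, na.rm = TRUE)

# The imputed sum equals the observed mean times the number of rows.
print(c(imputed = sum_imputed, dropped = sum_dropped,
        mean_times_n = obs_mean * length(col)))
```

Because the imputed sum is the observed mean scaled to the full number of rows, a column with missing values is not penalized relative to a fully observed column.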

Details

Conceptually, this function is equivalent to apply(X = x, MARGIN = 2, FUN = sum_if, ov.min = ov.min, prop = prop, inclusive = inclusive). However, it is not implemented that way for computational efficiency, because the conditioning on observed values would then not be vectorized. Instead, it uses colSums and then inserts NAs for columns that have too few observed values.
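The vectorized strategy described above might look roughly like the following base R sketch. This is not the package's source code: `col_sums_if_sketch` is a hypothetical name, and the handling of `impute` and `allNA` is an assumption inferred from the argument descriptions.

```r
# Hypothetical re-implementation for illustration; not the quest source.
col_sums_if_sketch <- function(x, ov.min = 1, prop = TRUE, inclusive = TRUE,
                               impute = TRUE, allNA = NA_real_) {
  x <- as.matrix(x)
  n_obs <- colSums(!is.na(x))                 # observed values per column
  freq <- if (prop) n_obs / nrow(x) else n_obs
  enough <- if (inclusive) freq >= ov.min else freq > ov.min

  sums <- colSums(x, na.rm = TRUE)            # one vectorized pass
  if (impute) {
    # Mean-imputed sum = observed mean * number of rows
    # (all-NA columns give 0 / 0 = NaN here and are overwritten below).
    sums <- sums / n_obs * nrow(x)
  }
  sums[n_obs == 0] <- allNA                   # columns that are all NA
  sums[!enough] <- NA_real_                   # too few observed values
  sums
}

m <- cbind(x = c(1, 2, NA), y = c(1, NA, NA), z = c(NA, NA, NA))
res_default <- col_sums_if_sketch(m)
res_min0 <- col_sums_if_sketch(m, ov.min = 0)
res_base <- col_sums_if_sketch(m, impute = FALSE, ov.min = 0, allNA = 0)
print(res_min0)
```

The only per-column work is done by `colSums` and logical indexing, so no R-level loop over columns is needed.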

Value

numeric vector of length = ncol(x) with names = colnames(x) providing the sum of each column or NA depending on the frequency of observed values.

Examples

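# defaults: ov.min = 1 with prop = TRUE, so every value in a column must be observed
colSums_if(airquality)
# require at least 150 observed values (a count, not a proportion) per column
colSums_if(x = airquality, ov.min = 150, prop = FALSE)
x <- data.frame("x" = c(1, 2, NA), "y" = c(1, NA, NA), "z" = c(NA, NA, NA))
colSums_if(x)
# ov.min = 0: no minimum is imposed on the frequency of observed values
colSums_if(x, ov.min = 0)
# return 0 rather than NA for the all-NA column
colSums_if(x, ov.min = 0, allNA = 0)
# identical to colSums(x, na.rm = TRUE)
identical(x = colSums(x, na.rm = TRUE),
   y = colSums_if(x, impute = FALSE, ov.min = 0, allNA = 0))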

quest documentation built on May 29, 2024, 4:59 a.m.