rowSums_if: Row Sums Conditional on Frequency of Observed Values
In quest: Prepare Questionnaire Data for Analysis

Description

rowSums_if calculates the sum of every row in a numeric or logical matrix, conditional on the frequency of observed (non-missing) values in that row. If the frequency of observed values in a row is less than ov.min (or equal to it when inclusive = FALSE), then NA is returned for that row. It can also return a value other than 0 (e.g., NA) for rows in which every value is NA, which differs from rowSums(x, na.rm = TRUE).
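The difference from rowSums(x, na.rm = TRUE) can be seen with base R alone. The following is a minimal sketch, assuming a small made-up matrix `m`; the manual replacement step only imitates the idea behind allNA and is not taken from the quest source.

```r
# Illustrative data (not from the package): the second row is entirely NA.
m <- rbind(c(1, 2, NA), c(NA, NA, NA))

# Base R returns 0 for a row with no observed values when na.rm = TRUE.
base_sums <- rowSums(m, na.rm = TRUE)

# Imitate the allNA idea: flag rows with zero observed values and
# substitute a chosen value (here NA) for their sums.
all_na_rows <- rowSums(!is.na(m)) == 0
adjusted_sums <- base_sums
adjusted_sums[all_na_rows] <- NA_real_

print(base_sums)
print(adjusted_sums)
```

Here the all-NA row sums to 0 under base R, whereas the adjusted vector reports NA for it.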

Usage

rowSums_if(
  x,
  ov.min = 1,
  prop = TRUE,
  inclusive = TRUE,
  impute = TRUE,
  allNA = NA_real_
)

Arguments

`x`	numeric or logical matrix. If not a matrix, it will be coerced to one.
`ov.min`	minimum frequency of observed values required per row. If `prop = TRUE`, then this is a decimal between 0 and 1. If `prop = FALSE`, then this is an integer between 0 and `ncol(x)`.
`prop`	logical vector of length 1 specifying whether `ov.min` should refer to the proportion of observed values (TRUE) or the count of observed values (FALSE).
`inclusive`	logical vector of length 1 specifying whether the sum should be calculated if the frequency of observed values in a row is exactly equal to `ov.min`.
`impute`	logical vector of length 1 specifying if missing values should be imputed with the mean of observed values of `x[i, ]`. If TRUE (default), this will make sums over the same columns with different amounts of observed data comparable.
`allNA`	numeric vector of length 1 specifying what value should be returned for rows that are all NA. This is most applicable when `ov.min = 0` and `inclusive = TRUE`. The default is NA, which differs from `rowSums` with `na.rm = TRUE`, where 0 is returned. Note that this value is overwritten by NA if the frequency of observed values in that row is less than `ov.min` (or equal to it when `inclusive = FALSE`).
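To illustrate what imputing with the row mean does to a sum, here is a small base R sketch. The matrix `m` and the variable names are made up for illustration, and the shortcut via rowMeans is an assumption about an arithmetically equivalent route, not a statement about how quest computes it.

```r
# Illustrative data: both rows cover the same three columns,
# but the second row has one missing value.
m <- rbind(c(1, 2, 3), c(1, NA, 3))

# Sum of observed values only: the second row is smaller simply
# because it has fewer observed values.
raw_sums <- rowSums(m, na.rm = TRUE)

# Explicit imputation: replace each NA with the mean of the
# observed values in its own row, then sum.
row_means <- rowMeans(m, na.rm = TRUE)
filled <- m
na_idx <- which(is.na(filled), arr.ind = TRUE)
filled[na_idx] <- row_means[na_idx[, "row"]]
imputed_sums <- rowSums(filled)

# Equivalent shortcut: row mean of observed values times the number of columns.
shortcut_sums <- row_means * ncol(m)

print(raw_sums)
print(imputed_sums)
```

With imputation both rows sum to 6, whereas the raw sums are 6 and 4, which is the sense in which imputed sums are comparable across rows with different amounts of observed data.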

Details

Conceptually, this function does the following: apply(X = x, MARGIN = 1, FUN = sum_if, ov.min = ov.min, prop = prop, inclusive = inclusive). For computational efficiency, however, it does not actually call apply, because the check on the frequency of observed values would then not be vectorized. Instead, it uses rowSums and then inserts NA for rows that have too few observed values.
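The vectorized strategy described here can be sketched in base R. The function `row_sums_if_sketch` below is a hypothetical helper written for illustration; it is not the quest implementation and deliberately leaves out the `impute` and `allNA` arguments.

```r
# Hypothetical helper (not the quest source): sum all rows in one pass,
# then set rows with too few observed values to NA.
row_sums_if_sketch <- function(x, ov.min = 1, prop = TRUE, inclusive = TRUE) {
  x <- as.matrix(x)
  n_obs <- rowSums(!is.na(x))                    # observed values per row
  freq <- if (prop) n_obs / ncol(x) else n_obs   # proportion or count
  keep <- if (inclusive) freq >= ov.min else freq > ov.min
  out <- rowSums(x, na.rm = TRUE)                # vectorized over all rows
  out[!keep] <- NA_real_                         # too few observed values
  out
}

# Illustrative data: rows have 3, 2, and 0 observed values.
m <- cbind(a = c(1, 1, NA), b = c(2, NA, NA), c = c(3, 3, NA))

res_inclusive <- row_sums_if_sketch(m, ov.min = 2, prop = FALSE)
res_exclusive <- row_sums_if_sketch(m, ov.min = 2, prop = FALSE, inclusive = FALSE)

print(res_inclusive)
print(res_exclusive)
```

With `ov.min = 2` as a count, the inclusive call keeps the row with exactly two observed values (sums 6, 4, NA), while the exclusive call drops it (sums 6, NA, NA). The threshold comparison runs once on a whole vector rather than once per row, which is the efficiency point made above.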

Value

numeric vector of length = nrow(x) with names = rownames(x) providing the sum of each row or NA (or allNA) depending on the frequency of observed values.

Examples

rowSums_if(airquality)
rowSums_if(x = airquality, ov.min = 5, prop = FALSE)
x <- data.frame("x" = c(1, 1, NA), "y" = c(2, NA, NA), "z" = c(NA, NA, NA))
rowSums_if(x)
rowSums_if(x, ov.min = 0)
rowSums_if(x, ov.min = 0, allNA = 0)
# with impute = FALSE, ov.min = 0, and allNA = 0, the result is identical to
# rowSums(x, na.rm = TRUE)
identical(x = rowSums(x, na.rm = TRUE),
   y = unname(rowSums_if(x, impute = FALSE, ov.min = 0, allNA = 0)))

quest documentation built on May 29, 2024, 4:59 a.m.