# colSums_if: Column Sums Conditional on Frequency of Observed Values

In quest: Prepare Questionnaire Data for Analysis

## Description

`colSums_if` calculates the sum of every column in a numeric or logical matrix, conditional on the frequency of observed (non-missing) values. If the frequency of observed values in a column is less than (or, when `inclusive = FALSE`, equal to) the threshold given by `ov.min`, NA is returned for that column. It can also return a value other than 0 (e.g., NA) for columns whose values are all NA, which differs from `colSums(x, na.rm = TRUE)`.
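The contrast with base R can be seen without the package. The following minimal sketch uses only base functions and a small made-up matrix to show that `colSums(x, na.rm = TRUE)` reports 0 for an all-NA column, and how such a column could be flagged as NA instead. It illustrates the idea only and is not the package's code.

```r
# Made-up data: column "b" has no observed values at all.
x <- cbind(a = c(1, 2, NA), b = c(NA, NA, NA))

# Base R silently returns 0 for the all-NA column.
base_sums <- colSums(x, na.rm = TRUE)

# Count observed (non-missing) values per column.
n_obs <- colSums(!is.na(x))

# Replace the sum with NA wherever nothing was observed.
adjusted <- base_sums
adjusted[n_obs == 0] <- NA_real_

print(base_sums)
print(adjusted)
```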

## Usage

```
colSums_if(
  x,
  ov.min = 1,
  prop = TRUE,
  inclusive = TRUE,
  impute = TRUE,
  allNA = NA_real_
)
```

## Arguments

| Argument | Description |
| --- | --- |
| `x` | numeric or logical matrix. If not a matrix, it will be coerced to one. |
| `ov.min` | minimum frequency of observed values required per column. If `prop = TRUE`, this is a decimal between 0 and 1. If `prop = FALSE`, this is an integer between 0 and `nrow(x)`. |
| `prop` | logical vector of length 1 specifying whether `ov.min` should refer to the proportion of observed values (TRUE) or the count of observed values (FALSE). |
| `inclusive` | logical vector of length 1 specifying whether the sum should be calculated if the frequency of observed values in a column is exactly equal to `ov.min`. |
| `impute` | logical vector of length 1 specifying whether missing values should be imputed with the mean of the observed values of `x[, i]`. If TRUE (default), this makes sums comparable across columns with different amounts of observed data. |
| `allNA` | numeric vector of length 1 specifying what value should be returned for columns that are all NA. This is most applicable when `ov.min = 0` and `inclusive = TRUE`. The default is NA, which differs from `colSums` with `na.rm = TRUE`, where 0 is returned. Note that the value is overwritten by NA if the frequency of observed values in that column is less than (or equal to) that specified by `ov.min`. |
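To make the interplay of `ov.min`, `prop`, and `inclusive` concrete, here is a minimal base-R sketch of the per-column check those arguments describe. The helper name `enough_observed` and the example matrix are hypothetical and are not part of quest; the sketch assumes that "inclusive" means `>=` and "exclusive" means `>`, as the argument descriptions suggest.

```r
# Hypothetical helper (not part of quest): TRUE for columns that have
# enough observed values to be summed.
enough_observed <- function(x, ov.min = 1, prop = TRUE, inclusive = TRUE) {
  x <- as.matrix(x)
  freq <- colSums(!is.na(x))          # count of observed values per column
  if (prop) freq <- freq / nrow(x)    # convert the count to a proportion
  if (inclusive) freq >= ov.min else freq > ov.min
}

# Made-up data: 3, 1, and 0 observed values out of 4 rows.
x <- cbind(x = c(1, 2, NA, 4), y = c(1, NA, NA, NA), z = c(NA, NA, NA, NA))

by_prop  <- enough_observed(x, ov.min = 0.75)                 # proportion, inclusive
by_count <- enough_observed(x, ov.min = 1, prop = FALSE)      # count, inclusive
strict   <- enough_observed(x, ov.min = 1, prop = FALSE,
                            inclusive = FALSE)                # count, exclusive

print(rbind(by_prop, by_count, strict))
```

With `inclusive = FALSE`, column `y` (exactly one observed value) no longer meets a count threshold of 1, which is the only case where the two settings differ.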

## Details

Conceptually, this function is equivalent to `apply(X = x, MARGIN = 2, FUN = sum_if, ov.min = ov.min, prop = prop, inclusive = inclusive)`. It is not implemented that way, however, because the check on the frequency of observed values would then not be vectorized. For computational efficiency, it instead uses `colSums` and then inserts NAs for columns that have too few observed values.
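The vectorized strategy described here can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's source: the function name `col_sums_if_sketch` is hypothetical, mean imputation is assumed to be equivalent to multiplying each column's observed mean by `nrow(x)`, and `allNA` is assumed to be applied before the too-few-observations NAs are inserted.

```r
# Hypothetical sketch (not the quest implementation) of the
# "colSums first, insert NAs afterwards" approach.
col_sums_if_sketch <- function(x, ov.min = 1, prop = TRUE, inclusive = TRUE,
                               impute = TRUE, allNA = NA_real_) {
  x <- as.matrix(x)
  n_obs <- colSums(!is.na(x))                    # observed values per column
  freq <- if (prop) n_obs / nrow(x) else n_obs   # proportion or count
  keep <- if (inclusive) freq >= ov.min else freq > ov.min

  if (impute) {
    # Assumption: filling NAs with the column mean gives mean * nrow(x).
    out <- colMeans(x, na.rm = TRUE) * nrow(x)
  } else {
    out <- colSums(x, na.rm = TRUE)
  }
  out[n_obs == 0] <- allNA   # value for columns with no observed data
  out[!keep] <- NA_real_     # overwrite columns with too few observed values
  out
}

# Made-up data with 2, 1, and 0 observed values per column.
x <- cbind(x = c(1, 2, NA), y = c(1, NA, NA), z = c(NA, NA, NA))

default_out <- col_sums_if_sketch(x)              # every column has missing data
imputed_out <- col_sums_if_sketch(x, ov.min = 0)  # mean-imputed sums
plain_out   <- col_sums_if_sketch(x, ov.min = 0, impute = FALSE, allNA = 0)

print(default_out)
print(imputed_out)
print(identical(plain_out, colSums(x, na.rm = TRUE)))
```

All of the work is done by whole-matrix operations (`colSums`, `colMeans`, logical indexing), so no per-column function call is needed, which is the efficiency argument made above.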

## Value

numeric vector of length = `ncol(x)` with names = `colnames(x)` providing the sum of each column or NA depending on the frequency of observed values.

## See Also

`colMeans_if`, `rowSums_if`, `rowMeans_if`, `colSums`
## Examples

```
colSums_if(airquality)
colSums_if(x = airquality, ov.min = 150, prop = FALSE)
x <- data.frame("x" = c(1, 2, NA), "y" = c(1, NA, NA), "z" = c(NA, NA, NA))
colSums_if(x)
colSums_if(x, ov.min = 0)
colSums_if(x, ov.min = 0, allNA = 0)
identical(x = colSums(x, na.rm = TRUE),
   y = colSums_if(x, impute = FALSE, ov.min = 0, allNA = 0)) # identical to
   # colSums(x, na.rm = TRUE)
```