analyze_cat_multi: Estimation of weighted proportions of multiple subsets of...

View source: R/analyzing.R

analyze_cat_multi    R Documentation

Estimation of weighted proportions of multiple subsets of categorical data

Description

Given categorical data, subsetting information, and weights for the individual observations, calculate estimated proportions by category and Goodman's multinomial confidence intervals for each subset. To analyze the data without subsetting, leave split_vars as NULL. For example, if the data are ratings of indicators, each indicator needs to be estimated separately, and the indicator is stored in data$indicator, use split_vars = "indicator". If indicators appear more than once with different ratings because different objectives applied different criteria, and the objective is stored in data$objective, use split_vars = c("indicator", "objective").
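To make the calculation concrete, the following is a minimal base R sketch of weighted proportions with Goodman's (1965) simultaneous intervals for a single subset. It is not the package's implementation: the helper name goodman_sketch is hypothetical, and rescaling the weighted proportions to the number of observations to obtain "counts" is an assumption about how weights might enter the interval formula.

```r
# Hypothetical helper, NOT part of aim.analysis: weighted proportions with
# Goodman (1965) simultaneous confidence intervals for one subset.
goodman_sketch <- function(categories, weights, conf = 80) {
  n <- length(categories)
  # Weighted proportion per category: that category's share of total weight
  props <- tapply(weights, categories, sum) / sum(weights)
  k <- length(props)
  alpha <- 1 - conf / 100
  # Chi-squared quantile (1 df), Bonferroni-adjusted across the k categories
  a <- qchisq(1 - alpha / k, df = 1)
  # ASSUMPTION: weighted "counts" are the proportions rescaled to sample size
  counts <- props * n
  root <- sqrt(a * (a + 4 * counts * (n - counts) / n))
  data.frame(
    category = names(props),
    proportion = as.vector(props),
    lower = as.vector((a + 2 * counts - root) / (2 * (n + a))),
    upper = as.vector((a + 2 * counts + root) / (2 * (n + a)))
  )
}

# Made-up ratings and weights for illustration only
ratings <- c("Meeting", "Meeting", "Not meeting", "Meeting", "Not meeting")
wgts <- c(2, 1, 1, 3, 3)
result <- goodman_sketch(ratings, wgts, conf = 80)
```

With split_vars, the same calculation would be repeated once per unique combination of values in those variables.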

Usage

analyze_cat_multi(
  data,
  weights,
  id_var,
  cat_var,
  wgt_var,
  split_vars = NULL,
  definitions = NULL,
  conf = 80,
  verbose = FALSE
)

Arguments

data

Data frame. Categorical data with the unique identifiers for each observation/row in the variable id_var and the assigned category for each observation/row in cat_var. If the data are being subset by unique combinations of values in one or more additional variables, those variables must be specified in split_vars. Note that the unique identifiers do not have to be unique for the whole of data so long as they are unique within each subset of data.

weights

Data frame. The weighting information. Must contain the variable id_var with a unique identifier for each observation/row and the variable wgt_var with the relative numeric weight of each observation/row.

id_var

Character string. The name of the variable in data and weights that contains the unique identifiers for the observations. The values in this variable must be unique within subsets by split_vars or simply unique if split_vars = NULL.

cat_var

Character string. The name of the variable in data and (if being used) definitions that contains the category values.

wgt_var

Character string. The name of the variable in weights that contains the numeric weight values.

split_vars

Optional character vector. One or more character strings corresponding to variable names in data and (if being used) definitions. The data will be subset for the calculations by unique combinations of values in these variables. Each subset must have only unique values in the variable id_var. If NULL then no subsetting will take place. Defaults to NULL.
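The uniqueness requirement can be checked before calling the function. The sketch below uses a hypothetical helper, ids_unique_within_subsets, and made-up column names (plot_id, indicator, rating); none of these are part of the package.

```r
# Hypothetical helper: TRUE when id_var values are unique within every
# combination of split_vars (or across all rows when split_vars is NULL).
ids_unique_within_subsets <- function(data, id_var, split_vars = NULL) {
  key_columns <- data[, c(split_vars, id_var), drop = FALSE]
  !any(duplicated(key_columns))
}

# Made-up data: each plot is rated once per indicator
example_data <- data.frame(
  plot_id = c("A", "B", "A", "B"),
  indicator = c("bare soil", "bare soil", "invasives", "invasives"),
  rating = c("Meeting", "Not meeting", "Meeting", "Meeting")
)

# Identifiers repeat across the whole data frame...
unique_overall <- ids_unique_within_subsets(example_data, "plot_id")
# ...but are unique within each indicator subset
unique_by_indicator <- ids_unique_within_subsets(example_data, "plot_id", "indicator")
```

Here unique_overall is FALSE and unique_by_indicator is TRUE, so these data would need split_vars = "indicator".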

definitions

Optional data frame. The possible categories for the observations to be classed into, which may include categories that do not appear in data because no observations met their criteria. Must contain at least the variable cat_var with ALL possible categories. If split_vars is not NULL, it must also contain all variables in split_vars and will be subset in the same way as data, in which case each subset must contain ALL possible categories for that subset. Defaults to NULL.
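One way to build such a data frame is sketched below in base R. It assumes every subset shares the same set of possible categories; the column names indicator and rating and the category labels are made up for illustration.

```r
# Made-up indicator names and category labels for illustration
indicators <- c("bare soil", "invasives")
categories <- c("Meeting", "Not meeting", "Unknown")

# One row per indicator/category combination, so every subset lists ALL
# possible categories even if some never occur in the data
definitions <- expand.grid(
  indicator = indicators,
  rating = categories,
  stringsAsFactors = FALSE
)
```

If the subsets have different possible categories, the rows would instead need to be assembled per subset rather than by crossing the two vectors.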

conf

Numeric. The confidence level in percent. Defaults to 80.

verbose

Logical. If TRUE then the function will generate additional messages as it executes. Defaults to FALSE.

Value

A data frame containing the categories, counts of observations, weighted estimated proportions, and confidence intervals. If the data were subset using split_vars, those variables are included in the output and the estimates are per unique combination of their values.
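A call might look like the following sketch. The data values and the column names plot_id, indicator, rating, and wgt are made up; only the argument names come from the Usage section above. The call is guarded so that the script still runs when aim.analysis is not installed, and the shape of the returned data frame is not shown because it is not specified beyond the description above.

```r
# Made-up ratings: three plots, each rated for two indicators
example_data <- data.frame(
  plot_id = rep(c("A", "B", "C"), times = 2),
  indicator = rep(c("bare soil", "invasives"), each = 3),
  rating = c("Meeting", "Not meeting", "Meeting",
             "Meeting", "Meeting", "Not meeting")
)

# Made-up relative weights, one row per plot
example_weights <- data.frame(
  plot_id = c("A", "B", "C"),
  wgt = c(2, 1, 3)
)

# Only attempt the call if the package is available
if (requireNamespace("aim.analysis", quietly = TRUE)) {
  estimates <- aim.analysis::analyze_cat_multi(
    data = example_data,
    weights = example_weights,
    id_var = "plot_id",
    cat_var = "rating",
    wgt_var = "wgt",
    split_vars = "indicator",
    conf = 80
  )
}
```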


nstauffer/aim.analysis documentation built on Nov. 2, 2023, 12:52 a.m.