mChoice: Methods for Storing and Analyzing Multiple Choice Variables
In harrelfe/Hmisc: Harrell Miscellaneous

mChoice

R Documentation

Description

mChoice groups variables that represent individual choices on a multiple choice question. These choices are typically factor or character values but may be of any type. Levels of component factor variables need not be the same; all unique levels (or unique character values) are collected over all of the multiple variables. Then a new character vector is formed with integer choice numbers separated by semicolons. Ideally, a database system would have exported the semicolon-separated character strings with a levels attribute containing strings defining value labels corresponding to the integer choice numbers. mChoice is a function for creating a multiple-choice variable after the fact. mChoice variables are explicitly handled by the describe and summary.formula functions. NAs or blanks in input variables are ignored.
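
As a rough illustration of the encoding described above, the following sketch builds an mChoice variable from two made-up "check all that apply" columns. The variable names (first, second, colors) and the data are hypothetical, and the sketch assumes Hmisc 4.7-2 or later so that NAs are ignored by default.

```r
library(Hmisc)

# Hypothetical toy data: two columns holding up to two choices per subject
first  <- c("Red",  "Blue", NA)
second <- c("Blue", NA,     NA)

colors <- mChoice(first, second, label = "Favorite colors")

levels(colors)      # unique values collected across both input vectors
unclass(colors)     # semicolon-separated integer codes, plus attributes
is.mChoice(colors)  # TRUE for objects created by mChoice
```

The third subject has no choices recorded, so its entry should be an empty string rather than a code.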

format.mChoice converts the multiple choice representation to text form by substituting levels for integer codes. as.double.mChoice converts the mChoice object to a binary numeric matrix, one column per used level (or all levels if drop=FALSE). The user calls it by invoking as.numeric. There is a print method and a summary method, and a print method for the summary.mChoice object. The summary method computes frequencies of all two-way choice combinations, the frequencies of the top 5 combinations, information about which other choices are present when each given choice is present, and the frequency distribution of the number of choices per observation. This summary output is used in the describe function. When options(prType='html') is in effect, the print method for the summary returns an html character string if render=FALSE and renders the html otherwise. This is used by print.describe and is most effective when short=TRUE is specified to summary.
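
The conversions described above can be sketched as follows. The data and the names colors, txt, and ind are hypothetical and chosen only for illustration.

```r
library(Hmisc)

# Hypothetical toy data: two choice columns for three subjects
first  <- c("Red",  "Blue", "Green")
second <- c("Blue", NA,     "Red")
colors <- mChoice(first, second, label = "Favorite colors")

# Text form: value labels substituted for the integer codes
txt <- format(colors)
txt

# Binary indicator matrix, one column per level (dispatches to as.double.mChoice)
ind <- as.numeric(colors)
ind

# Frequencies of combinations and of the number of choices per observation
summary(colors)
```

The indicator matrix is a convenient starting point when the choices need to enter a model as separate 0/1 predictors.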

inmChoice creates a logical vector the same length as x whose elements are TRUE when the observation in x contains at least one of the codes or value labels in the second argument (or all of them when condition='all').

match.mChoice creates an integer vector of the indexes of all elements in table which contain any of the specified levels.

nmChoice returns an integer vector of the number of choices that were made.

is.mChoice returns TRUE if the argument is a multiple choice variable.
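
A minimal sketch of these query functions on made-up data follows. The names colors, has_red, either, both, and n_chosen are hypothetical.

```r
library(Hmisc)

# Hypothetical toy data: subject 1 chose Red and Blue, subject 2 only Blue,
# subject 3 Green and Red
first  <- c("Red",  "Blue", "Green")
second <- c("Blue", NA,     "Red")
colors <- mChoice(first, second, label = "Favorite colors")

has_red  <- inmChoice(colors, "Red")                                # Red present
either   <- inmChoice(colors, c("Red", "Green"))                    # any of the two
both     <- inmChoice(colors, c("Red", "Green"), condition = "all") # both present
n_chosen <- nmChoice(colors)                                        # choices per subject

data.frame(choices = format(colors), has_red, either, both, n_chosen)
```

Passing value labels rather than integer codes to inmChoice keeps the call readable and independent of the order in which levels were collected.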

Usage

mChoice(..., label='',
        sort.levels=c('original','alphabetic'), 
        add.none=FALSE, drop=TRUE, ignoreNA=TRUE)

## S3 method for class 'mChoice'
format(x, minlength=NULL, sep=";", ...)

## S3 method for class 'mChoice'
as.double(x, drop=FALSE, ...)

## S3 method for class 'mChoice'
print(x, quote=FALSE, max.levels=NULL,
       width=getOption("width"), ...)

## S3 method for class 'mChoice'
as.character(x, ...)

## S3 method for class 'mChoice'
summary(object, ncombos=5, minlength=NULL,
  drop=TRUE, short=FALSE, ...)

## S3 method for class 'summary.mChoice'
print(x, prlabel=TRUE, render=TRUE, ...)

## S3 method for class 'mChoice'
x[..., drop=FALSE]

match.mChoice(x, table, nomatch=NA, incomparables=FALSE)

inmChoice(x, values, condition=c('any', 'all'))

inmChoicelike(x, values, condition=c('any', 'all'),
              ignore.case=FALSE, fixed=FALSE)

nmChoice(object)

is.mChoice(x)

## S3 method for class 'mChoice'
Summary(..., na.rm)

Arguments

`na.rm`	Logical: remove `NA`'s from data
`table`	a vector (mChoice) of values to be matched against.
`nomatch`	value to return if a value for `x` does not exist in `table`.
`incomparables`	logical; whether incomparable values should be compared.
`...`	a series of vectors
`label`	a character string `label` attribute to attach to the object created by `mChoice`
`sort.levels`	set `sort.levels="alphabetic"` to sort the columns of the matrix created by `mChoice` alphabetically by category rather than by the original order of levels in component factor variables (if there were any input variables that were factors)
`add.none`	Set `add.none` to `TRUE` to make a new category `'none'` if it doesn't already exist and if there is an observation with no choices selected.
`drop`	set `drop=FALSE` to keep unused factor levels as columns of the matrix produced by `mChoice`
`ignoreNA`	set to `FALSE` to keep any `NA`s present in data as a real level. Prior to Hmisc 4.7-2 `FALSE` was the default.
`x`	an object of class `"mChoice"` such as that created by `mChoice`. For `is.mChoice`, any object.
`object`	an object of class `"mChoice"` such as that created by `mChoice`
`ncombos`	maximum number of combos.
`width`	Width of a line of text to be formatted
`quote`	quote the output
`max.levels`	max levels to be displayed
`minlength`	By default no abbreviation of levels is done in `format` and `summary`. Specify a positive integer to use abbreviation in those functions. See `abbreviate`.
`short`	set to `TRUE` to have `summary.mChoice` use integer choice numbers in its tables, and to print the choice level definitions at the top
`sep`	character to use to separate levels when formatting
`prlabel`	set to `FALSE` to keep `print.summary.mChoice` from printing the variable label and number of unique values. Ignored for html output.
`render`	applies if `options(prType='html')` is in effect. Set to `FALSE` to return the html text instead of rendering the html.
`values`	a scalar or vector. If `values` is integer, it is the choice codes, and if it is a character vector, it is assumed to be value labels. For `inmChoicelike` `values` must be character strings which are pieces of choice labels.
`condition`	set to `'all'` for `inmChoice` to require that all choices in `values` be present instead of the default of any of them present.
`ignore.case`	set to `TRUE` to have `inmChoicelike` ignore case in the data when matching on `values`
`fixed`	see `grep`

Value

mChoice returns a character vector of class "mChoice" plus attributes "levels" and "label". summary.mChoice returns an object of class "summary.mChoice". inmChoice and inmChoicelike return a logical vector. format.mChoice returns a character vector, and as.double.mChoice returns a binary numeric matrix. nmChoice returns an integer vector. print.summary.mChoice returns an html character string if options(prType='html') is in effect.

Author(s)

Frank Harrell
Department of Biostatistics
Vanderbilt University
fh@fharrell.com

Examples

options(digits=3)
set.seed(3)
n <- 20
sex <- factor(sample(c("m","f"), n, replace=TRUE))
age <- rnorm(n, 50, 5)
treatment <- factor(sample(c("Drug","Placebo"), n, replace=TRUE))


# Generate a 3-choice variable; each of 3 variables has 5 possible levels
symp <- c('Headache','Stomach Ache','Hangnail',
          'Muscle Ache','Depressed')
symptom1 <- sample(symp, n, TRUE)
symptom2 <- sample(symp, n, TRUE)
symptom3 <- sample(symp, n, TRUE)
cbind(symptom1, symptom2, symptom3)[1:5,]
Symptoms <- mChoice(symptom1, symptom2, symptom3, label='Primary Symptoms')
Symptoms
print(Symptoms, long=TRUE)
format(Symptoms[1:5])
inmChoice(Symptoms,'Headache')
inmChoicelike(Symptoms, 'head', ignore.case=TRUE)
levels(Symptoms)
inmChoice(Symptoms, 3)
# Find all subjects with either of two symptoms
inmChoice(Symptoms, c('Headache','Hangnail'))
# Note: In this example, some subjects have the same symptom checked
# multiple times; in practice these redundant selections would be NAs.
# mChoice ignores these redundant selections.
# Find all subjects with both symptoms
inmChoice(Symptoms, c('Headache', 'Hangnail'), condition='all')

meanage <- N <- numeric(5)
for(j in 1:5) {
 meanage[j] <- mean(age[inmChoice(Symptoms,j)])
 N[j] <- sum(inmChoice(Symptoms,j))
}
names(meanage) <- names(N) <- levels(Symptoms)
meanage
N

# Manually compute mean age for 2 symptoms
mean(age[symptom1=='Headache' | symptom2=='Headache' | symptom3=='Headache'])
mean(age[symptom1=='Hangnail' | symptom2=='Hangnail' | symptom3=='Hangnail'])

summary(Symptoms)

#Frequency table sex*treatment, sex*Symptoms
summary(sex ~ treatment + Symptoms, fun=table)
# Check:
ma <- inmChoice(Symptoms, 'Muscle Ache')
table(sex[ma])

# could also do:
# summary(sex ~ treatment + mChoice(symptom1,symptom2,symptom3), fun=table)

#Compute mean age, separately by 3 variables
summary(age ~ sex + treatment + Symptoms)


summary(age ~ sex + treatment + Symptoms, method="cross")

f <- summary(treatment ~ age + sex + Symptoms, method="reverse", test=TRUE)
f
# each trio of numbers represents the 25th, 50th, and 75th percentiles
print(f, long=TRUE)

harrelfe/Hmisc documentation built on June 13, 2025, 7:22 a.m.