nominalCoding: Coding Schemes for Nominal Variables


View source: R/utility_functions.R

Description

Creates a numeric coding scheme for a nominal variable with two or more categories.

Usage

nominalCoding(x, type = "Effects", levels = NULL, label = NULL, weights = NULL)

Arguments

x

a vector giving the level (category) of the nominal variable for each observation.

type

the type of coding to use, one of 'Dummy', 'Simple', 'Effects', 'Intercept', or 'Coefficient'.

levels

an optional value used to specify the reference group with dummy, simple, and effects coding schemes, or more generally, a vector with the unique levels used to map a desired order when creating the design matrix.

label

an optional character string giving the label for the nominal variable.

weights

an optional vector of weights to be assigned to each level of the nominal variable when applying the coefficient coding scheme.

Details

With dummy coding, each additional category in a nominal variable is compared against a reference category. For K categories, K - 1 dummy variables are created, coded as 1 for the presence of a category and 0 otherwise. The intercept term is interpreted as the cell mean for the reference category.
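As a minimal base R sketch of the dummy coding described above (this is not the nominalCoding implementation; the outcome y and the choice of 'low' as the reference category are assumptions made for illustration), the K - 1 indicator columns can be built by hand and the intercept checked against the reference cell mean:

```r
# Illustrative data: 3 levels, 2 observations each (y is made up)
x <- rep(c("low", "med", "high"), each = 2)
y <- c(1, 2, 4, 5, 9, 10)
lev <- c("low", "med", "high")  # assumption: first entry is the reference

# One 0/1 indicator column per non-reference category (K - 1 columns)
dummy <- sapply(lev[-1], function(l) as.numeric(x == l))

# Intercept = cell mean of the reference category;
# slopes = differences from the reference cell mean
fit <- lm(y ~ dummy)
coef(fit)
# (Intercept) = 1.5, dummymed = 3, dummyhigh = 8
```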

With simple coding, each additional category in a nominal variable is compared against a reference category, as with dummy coding. For K categories, K - 1 variables are created, coded as (K-1)/K for the presence of a category and -1/K otherwise. The intercept term is interpreted as the grand mean (the mean of the cell means).
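The following base R sketch illustrates simple coding under the same kind of assumptions (made-up outcome y, 'low' taken as the reference category). It is not the nominalCoding implementation; it only shows that shifting each 0/1 indicator by 1/K gives the (K-1)/K and -1/K values, and that the intercept becomes the mean of the cell means:

```r
# Illustrative data: 3 levels, 2 observations each (y is made up)
x <- rep(c("low", "med", "high"), each = 2)
y <- c(1, 2, 4, 5, 9, 10)
lev <- c("low", "med", "high")  # assumption: first entry is the reference
K <- length(lev)

# Start from 0/1 indicators, then subtract 1/K:
# 1 becomes (K - 1)/K and 0 becomes -1/K
simple <- sapply(lev[-1], function(l) as.numeric(x == l)) - 1 / K

# Slopes are still differences from the reference cell mean,
# but the intercept is now the mean of the cell means
fit <- lm(y ~ simple)
cell_means <- tapply(y, x, mean)
c(intercept = unname(coef(fit)[1]), mean_of_cell_means = mean(cell_means))
```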

With effects coding (also known as deviation coding), each additional category in a nominal variable is compared against the grand mean. For K categories, K - 1 dummy variables are created, coded as 1 for the presence of a category, -1 for the presence of the reference category, and 0 otherwise. The intercept term is interpreted as the grand mean.
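A hand-built version of effects coding might look like the sketch below. It uses base R only, with a made-up outcome y and 'low' assumed to be the reference category, and is not taken from the nominalCoding source:

```r
# Illustrative data: 3 levels, 2 observations each (y is made up)
x <- rep(c("low", "med", "high"), each = 2)
y <- c(1, 2, 4, 5, 9, 10)
lev <- c("low", "med", "high")  # assumption: first entry is the reference

# 1 for the category, 0 otherwise ...
eff <- sapply(lev[-1], function(l) as.numeric(x == l))
# ... and -1 in every column for rows in the reference category
eff[x == lev[1], ] <- -1

# Intercept = mean of the cell means;
# each slope = that category's cell mean minus the intercept
fit <- lm(y ~ eff)
cell_means <- tapply(y, x, mean)
coef(fit)
```

With the balanced data used here, the slope for 'med' equals the 'med' cell mean minus the intercept, and likewise for 'high'.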

With intercept coding, a separate dummy variable is specified for each category in the nominal variable, coded as 1 when the category is present and 0 otherwise. This coding scheme requires that the model have no intercept term. Instead, predictions for the dependent variable for each category are estimated separately. Therefore, for K categories, K dummy variables are created.
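Intercept coding can be sketched in base R as follows, again with a made-up outcome y and not as a reproduction of nominalCoding. The `0 +` term in the formula removes the intercept, so each coefficient is simply the mean for its category:

```r
# Illustrative data: 3 levels, 2 observations each (y is made up)
x <- rep(c("low", "med", "high"), each = 2)
y <- c(1, 2, 4, 5, 9, 10)
lev <- c("low", "med", "high")

# One 0/1 indicator column per category (K columns, no reference)
cells <- sapply(lev, function(l) as.numeric(x == l))

# Fit without an intercept; coefficients are the cell means
fit <- lm(y ~ 0 + cells)
coef(fit)
# cellslow = 1.5, cellsmed = 4.5, cellshigh = 9.5
```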

With coefficient coding, a single dummy variable is specified, and a desired weight is assigned for each level of the nominal variable. This is useful for testing hypothesized ordered relationships.
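As an illustration of coefficient coding, the base R sketch below assigns hypothetical weights of -1, 0, and 1 to 'low', 'med', and 'high' to test a linear trend. The weights, the outcome y, and the column name 'trend' are assumptions chosen for the example, not values used by nominalCoding:

```r
# Illustrative data: 3 levels, 2 observations each (y is made up)
x <- rep(c("low", "med", "high"), each = 2)
y <- c(1, 2, 4, 5, 9, 10)

# Hypothetical weights encoding an ordered (linear) hypothesis
w <- c(low = -1, med = 0, high = 1)

# A single column: each observation receives the weight of its level
trend <- matrix(w[x], ncol = 1, dimnames = list(NULL, "trend"))

# The slope estimates the change in y per one-unit step in the weights
fit <- lm(y ~ trend)
coef(fit)
```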

Value

Given K levels for the input variable, returns a matrix containing the new numerical coding scheme, with K - 1 columns (dummy, simple, and effects coding), K columns (intercept coding), or 1 column (coefficient coding).

Examples

# A nominal variable with 3 levels, 2 observations per level
x <- rep( c('low','med','high'), each = 2 )
data.frame( x, nominalCoding( x, type = 'Dummy' ) )
data.frame( x, nominalCoding( x, type = 'Simple' ) )
data.frame( x, nominalCoding( x, type = 'Effects' ) )
data.frame( x, nominalCoding( x, type = 'Intercept' ) )
# Random weights, used here for illustration only
data.frame( x, nominalCoding( x, type = 'Coefficient', weights = rnorm(3) ) )

rettopnivek/utilityf documentation built on March 1, 2021, 7:05 p.m.