coding: Generate Common Coding Schemes for Categorical Predictors
In rettopnivek/designmatrix: Interfaces for Specifying Design Matrices


View source: R/coding.R

A convenience function for implementing common coding schemes used with categorical variables (e.g., dummy/treatment coding and effect/sum coding).

coding(
  dm,
  type = "Intercept",
  variables = NULL,
  columns = NULL,
  index = NULL,
  start = NULL
)

`dm`	An object of class `designmatrix`.
`type`	A keyword (can be capitalized) indicating the type of coding scheme to apply. Currently the function implements 5 types:
  Intercept: Generates a column of ones. Keywords include 'intercept', 'grand mean', and 'int'.
  Identity: For K levels in the subset of grouping variables, generates K columns with a value of one for the kth level and zero otherwise. Keywords include 'identity' and 'I'.
  Dummy: For K levels in the subset of grouping variables, generates K - 1 columns. Assigns a value of one to the kth group and zero otherwise. One level is excluded as a reference group (by default the first level) with values set to 0 across the K - 1 columns. Keywords include 'dummy', 'treatment', and 'DC'.
  Effect: For K levels in the subset of grouping variables, generates K - 1 columns. Assigns a value of one to the kth group and zero otherwise. One level is excluded as a reference group (by default the first level) with values set to -1 across the K - 1 columns. Keywords include 'effect', 'effects', 'sum', 'anova', 'aov', and 'EC'.
  Linear: For K levels in the subset of grouping variables, generates a single column with one to K even steps, adjusted to be orthogonal (i.e., sum to zero). Keywords include 'linear', 'trend', and 'L'.
`variables`	The subset of grouping variables to consider when implementing the coding scheme.
`columns`	An optional vector indicating which columns of the design matrix to update.
`index`	An optional value/vector for changing the reference group/order when implementing dummy/effect coding or linear trends. Cannot exceed the number of levels for the subset of grouping variables.
`start`	An optional value indicating the starting column in the design matrix to begin updating.
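
The five schemes listed under `type` can be sketched in base R for a single variable with K levels. This is only an illustration of the resulting columns, not the package's implementation: `sketch_coding` and its `ref` argument are hypothetical names introduced here, and the first level is taken as the reference to match the default described above.

```r
# Hypothetical helper (not part of the designmatrix package): returns the
# coding columns for K levels, one row per level, using base R only.
sketch_coding <- function(K, type, ref = 1) {
  lev <- seq_len(K)
  switch(
    type,
    # Single column of ones
    intercept = matrix(1, nrow = K, ncol = 1),
    # K indicator columns, one per level
    identity = diag(K),
    # K - 1 indicator columns; the reference row is all zeros
    dummy = diag(K)[, -ref, drop = FALSE],
    # K - 1 indicator columns; the reference row is all -1
    effect = {
      m <- diag(K)[, -ref, drop = FALSE]
      m[ref, ] <- -1
      m
    },
    # Steps 1 to K, centered so the column sums to zero
    linear = matrix(lev - mean(lev), ncol = 1),
    stop("unknown type")
  )
}

dummy_cols  <- sketch_coding(3, "dummy")
effect_cols <- sketch_coding(3, "effect")
linear_col  <- sketch_coding(3, "linear")
```

With K = 3, the dummy and effect matrices differ only in the row for the reference level (zeros versus -1), and the linear column is the centered sequence 1, 2, 3.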

Dummy/Treatment coding: For K levels, K - 1 dichotomous variables are created, each contrasting one level of the subset of grouping variables against a reference level. The intercept is then the mean of the reference level, and the coefficient for each of the K - 1 dichotomous variables is the difference between the mean of the given level and the mean of the reference level. As an example, consider the data set PlantGrowth, which has a single grouping variable 'group' with three levels: 'ctrl', 'trt1', and 'trt2'. Dummy coding is useful here: setting the 'ctrl' level as the reference creates 2 dichotomous variables that estimate the differences between 'ctrl' and 'trt1' and between 'ctrl' and 'trt2', respectively.
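
The interpretation of the dummy-coded coefficients can be checked on PlantGrowth with base R alone. The sketch below uses `lm` with `contr.treatment` rather than the `coding` function, so it illustrates the coding scheme itself and not this package's interface; the variable names are chosen for illustration.

```r
# Base R treatment contrasts; 'ctrl' is the first factor level, so it
# serves as the reference level.
fit_dummy <- lm(weight ~ group, data = PlantGrowth,
                contrasts = list(group = "contr.treatment"))

# Observed mean of each level
group_means <- sapply(split(PlantGrowth$weight, PlantGrowth$group), mean)

# The intercept should match the mean of the reference level
intercept_ok <- isTRUE(all.equal(
  unname(coef(fit_dummy)[1]),
  unname(group_means["ctrl"])
))

# The other coefficients should match differences from the reference level
differences_ok <- isTRUE(all.equal(
  unname(coef(fit_dummy)[-1]),
  unname(group_means[c("trt1", "trt2")] - group_means["ctrl"])
))
```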
Effect/Sum coding: For K levels, K - 1 variables are created, each contrasting one level of the subset of grouping variables against the grand mean. One level must be specified as a reference, with a value fixed to -1 across the K - 1 variables. The intercept is then the grand mean of the sample (when group sizes are equal; otherwise it is the unweighted mean of the level means). The coefficient for each of the K - 1 variables is the difference between the mean of the given level and the grand mean. The difference between the reference level and the grand mean is the negative of the sum of the coefficients. This is the typical coding scheme used in the linear model underlying analysis of variance (i.e., ANOVA).
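
A similar base R check works for effect coding. Note that R's built-in `contr.sum` assigns -1 to the last level, so the sketch below builds the contrast matrix by hand to use the first level ('ctrl') as the reference, as described above. It does not call the `coding` function, and the object names are illustrative only.

```r
# Effect-coding contrast matrix with the first level as the reference
K <- nlevels(PlantGrowth$group)
effect_contrast <- diag(K)[, -1, drop = FALSE]
effect_contrast[1, ] <- -1
rownames(effect_contrast) <- levels(PlantGrowth$group)
colnames(effect_contrast) <- levels(PlantGrowth$group)[-1]

fit_effect <- lm(weight ~ group, data = PlantGrowth,
                 contrasts = list(group = effect_contrast))

# Observed level means and their unweighted mean (PlantGrowth is balanced,
# so this equals the overall sample mean)
group_means <- sapply(split(PlantGrowth$weight, PlantGrowth$group), mean)
grand_mean <- mean(group_means)
b <- unname(coef(fit_effect))

# Intercept should match the grand mean
intercept_ok <- isTRUE(all.equal(b[1], grand_mean))
# Coefficients should match deviations of 'trt1' and 'trt2' from the grand mean
deviations_ok <- isTRUE(all.equal(
  b[-1], unname(group_means[c("trt1", "trt2")] - grand_mean)
))
# The reference level's deviation is the negative sum of the coefficients
reference_ok <- isTRUE(all.equal(
  -sum(b[-1]), unname(group_means["ctrl"] - grand_mean)
))
```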
Linear trends: If one can assume the levels of the predictor are evenly spaced (i.e., an interval variable), a linear trend can be specified. There are several ways to specify a linear trend; here, the trend is specified to be orthogonal to the intercept by setting the values to 1 through K and then centering them (i.e., subtracting the mean). As a result, the intercept can be interpreted as the grand mean.
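
The centering step can be sketched in base R using the 'dose' variable from ToothGrowth, treating its three levels as evenly spaced steps. This is an illustration under that assumption and not the package's code; `tg`, `trend_values`, and `fit_trend` are names introduced here.

```r
# Levels of 'dose' in increasing order
dose_levels <- sort(unique(ToothGrowth$dose))
K <- length(dose_levels)

# Steps 1 to K, centered by subtracting their mean
trend_values <- seq_len(K) - mean(seq_len(K))

# Assign each observation the trend value for its dose level
tg <- ToothGrowth
tg$trend <- trend_values[match(tg$dose, dose_levels)]

fit_trend <- lm(len ~ trend, data = tg)

# With equal group sizes the trend column has mean zero, so the intercept
# should match the overall mean of the outcome
intercept_ok <- isTRUE(all.equal(
  unname(coef(fit_trend)[1]), mean(tg$len)
))
```

The intercept matches the overall mean here because ToothGrowth has the same number of observations at each dose; with unequal group sizes the centered column would no longer average to zero over the sample.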

A matrix containing the subset of columns of the summary matrix in the designmatrix object. Note that the subset<- method can infer which columns should be updated based on the output of the coding function.

# Identity coding
dm = designmatrix( PlantGrowth, list( 'weight', 'group' ) )
# Update summary matrix
subset( dm ) = coding( dm, type = 'I' )
# Update full design matrix
dm = designmatrix( dm ); print( dm )

# Dummy coding
dm = designmatrix( PlantGrowth, list( 'weight', 'group' ) )
subset( dm ) = coding( dm, type = 'DC' )
# Update full design matrix
dm = designmatrix( dm ); print( dm )

# Effect coding
dm = designmatrix( ToothGrowth, list( 'len', c( 'supp', 'dose' ) ) )
# Specify coding separately for each variable
subset( dm ) = coding( dm, type = 'EC', variables = 'supp' )
# Second column already has coding for variable 'supp'
subset( dm ) = coding( dm, type = 'EC', variables = 'dose', start = 3 )
# Update full design matrix
dm = designmatrix( dm ); print( dm )

# Linear trend
dm = designmatrix( ToothGrowth, list( 'len', c( 'supp', 'dose' ) ) )
# Implement linear trend only for 'dose' variable
subset( dm ) = coding( dm, type = 'L', variables = 'dose' )
# Different coding schemes can be mixed and matched
subset( dm ) = coding( dm, type = 'EC', variables = 'supp', start = 3 )
# Update full design matrix
dm = designmatrix( dm ); print( dm )