binarizeCategoricalVariable | R Documentation |

Given a categorical variable, this function creates a set of indicator variables for the various possible sets of levels.

binarizeCategoricalVariable(
   x,
   levelOrder = NULL,
   ignore = NULL,
   minCount = 3,
   val1 = 0, val2 = 1,
   includePairwise = TRUE,
   includeLevelVsAll = FALSE,
   dropFirstLevelVsAll = FALSE,
   dropUninformative = TRUE,
   namePrefix = "",
   levelSep = NULL,
   nameForAll = "all",
   levelSep.pairwise = if (length(levelSep)==0) ".vs." else levelSep,
   levelSep.vsAll = if (length(levelSep)==0)
                       (if (nameForAll=="") "" else ".vs.") else levelSep,
   checkNames = FALSE,
   includeLevelInformation = TRUE)

`x` |
A vector with categorical values. |

`levelOrder` |
Optional specification of the levels (unique values) of |

`ignore` |
Optional specification of levels of |

`minCount` |
Levels of |

`val1` |
Value for the lower level in binary comparisons. |

`val2` |
Value for the higher level in binary comparisons. |

`includePairwise` |
Logical: should pairwise binary indicators be included? For each pair of levels, the indicator is |

`includeLevelVsAll` |
Logical: should binary indicators for each level be included? The indicator is |

`dropFirstLevelVsAll` |
Logical: should the column representing the first level vs. all be dropped? Dropping it removes the redundancy among the level vs. all indicators, which makes the resulting matrix of indicators usable for regression models. |

`dropUninformative` |
Logical: should uninformative (constant) columns be dropped? |

`namePrefix` |
Prefix to be used in column names of the output. |

`nameForAll` |
When naming columns that represent a level vs. all others, |

`levelSep` |
Separator for levels to be used in column names of the output. If |

`levelSep.pairwise` |
Separator for levels to be used in column names for pairwise indicators in the output. |

`levelSep.vsAll` |
Separator for levels to be used in column names for level vs. all indicators in the output. |

`checkNames` |
Logical: should the names of the output be made into syntactically correct R language names? |

`includeLevelInformation` |
Logical: should information about which levels are represented by which columns be included in the attributes of the output? |

The function creates two types of indicators. The first is one level (unique value) of `x` vs. all others, i.e., for a given level, the indicator is `val2` (usually 1) for all elements of `x` that equal the level, and `val1` (usually 0) otherwise. Column names for these indicators are the concatenation of `namePrefix`, the level, `levelSep.vsAll` and `nameForAll`. The level vs. all indicators are created for all levels that have at least `minCount` samples, are present in `levelOrder` (if it is non-NULL) and are not included in `ignore`.
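As a rough illustration of the level vs. all rule described above, the following base R sketch builds such indicators by hand. It is not the function's actual implementation: `levelVsAllSketch` is a hypothetical helper, it assumes numeric `val1` and `val2`, and it assumes that at least one level passes the filters.

```r
# Hypothetical helper (not the documented function): a base R sketch of
# level vs. all indicators under the rules described in the text.
levelVsAllSketch <- function(x, levelOrder = NULL, ignore = NULL, minCount = 3,
                             val1 = 0, val2 = 1, namePrefix = "",
                             levelSep = ".vs.", nameForAll = "all") {
  counts <- table(x)
  lev <- if (is.null(levelOrder)) sort(unique(x)) else levelOrder
  # Keep levels that have at least minCount samples and are not ignored.
  enough <- names(counts)[counts >= minCount]
  keep <- lev[lev %in% enough & !(lev %in% ignore)]
  # One column per retained level: val2 where x equals the level, val1 otherwise.
  out <- vapply(keep, function(l) ifelse(x == l, val2, val1),
                numeric(length(x)))
  colnames(out) <- paste0(namePrefix, keep, levelSep, nameForAll)
  out
}

x <- c("A", "B", "A", "C", "B", "A", "B", "C", "A", "B")
outVsAll <- levelVsAllSketch(x, minCount = 3)
outVsAll
```

In this toy input, level `C` occurs only twice, so with `minCount = 3` it gets no column of its own; its elements still receive `val1` in the columns for `A` and `B`.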

The second type of indicator encodes binary comparisons. For each pair of levels (both with at least `minCount` samples), the indicator is `val2` (usually 1) for the higher level and `val1` (usually 0) for the lower level. The level order is given by `levelOrder` (which defaults to the sorted levels of `x`), assumed to be sorted in increasing order. All levels with at least `minCount` samples that are included in `levelOrder` and not included in `ignore` are included.
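The pairwise rule can be sketched in base R as follows. This is an illustration only, not the function's actual code: `pairwiseSketch` is a hypothetical helper, the `ignore` argument is left out for brevity, and at least two levels must pass the `minCount` filter. Two details are assumptions that the text above does not state: elements belonging to neither level of a pair are set to `NA`, and columns are named higher level, separator, lower level.

```r
# Hypothetical helper (not the documented function): a base R sketch of
# pairwise indicators. Assumptions: NA for elements outside the pair, and
# column names of the form <higher><levelSep><lower>.
pairwiseSketch <- function(x, levelOrder = sort(unique(x)), minCount = 3,
                           val1 = 0, val2 = 1, levelSep = ".vs.") {
  counts <- table(x)
  keep <- levelOrder[levelOrder %in% names(counts)[counts >= minCount]]
  # Each column of 'pairs' holds one pair: row 1 = lower level, row 2 = higher.
  pairs <- combn(keep, 2)
  out <- apply(pairs, 2, function(p) {
    ind <- rep(NA_real_, length(x))
    ind[x == p[1]] <- val1   # lower level
    ind[x == p[2]] <- val2   # higher level
    ind
  })
  colnames(out) <- paste0(pairs[2, ], levelSep, pairs[1, ])
  out
}

x <- c("A", "B", "A", "C", "B", "A", "B", "C", "A", "B", "C")
outPairs <- pairwiseSketch(x)
outPairs
```

With three retained levels there are three pairs, so the sketch returns three columns; passing a different `levelOrder` changes which level of each pair counts as the higher one.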

A matrix containing the indicator variables, one in each column. When `includeLevelInformation` is `TRUE`, the attribute `includedLevels` is a table with one column per output column and two rows, giving the two levels (unique values of `x`) represented by the column.

Peter Langfelder

Variations and wrappers for this function: `binarizeCategoricalColumns` for binarizing several columns of a matrix or data frame.

set.seed(2);
x = sample(c("A", "B", "C"), 15, replace = TRUE);
out = binarizeCategoricalVariable(x, includePairwise = TRUE, includeLevelVsAll = TRUE);
data.frame(x, out);
attr(out, "includedLevels")
# A different naming for level vs. all columns
binarizeCategoricalVariable(x, includeLevelVsAll = TRUE, nameForAll = "");
