Home

/

GitHub

/

In prlitics/dumdum: An Easy Way to Make Dummy (Binary) Variables

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Currently, the go-to way of making dummy variables in base R is with a series of ifelse() statements. This can be an arduous process if you have a variable with several values--which is pretty common in social science analyses (e.g., race, education level, employment status, political party identification, etc.). This cuts down on the work, gives you the ability to easily rename the new, dummied variables, and also makes selecting reference categories simple. It also allows you to make dummies across many variables at once.

This document describes how to use the dumdum package and it's two core functions dummify() and dummify_across() to make constructing dummy variables (i.e., binary, 0/1 variables) much less cumbersome.

Table of Contents{#link0}

dummify()
- data
- var
- reference
- dumNames
- keepNA
dummify_across()
- data
- vars
- `reference

knitr::opts_chunk$set(include = FALSE)
library(dumdum)

`dummify()`{#link1}

dummify() has 4 arguments: data, var, reference, dumNames, and keepNA. It accepts a data frame and variable name, in " ", and returns a data frame that is equal to the one passed except it has additional columns with the new dummy variables. With the default options, these columns follow the naming convention OriginalVarName_DUM_Value. It only accepts one variable at a time. To analyze multiple variables, use dummify_across().

Return to top.

`data` {#link2}

Currenty, dummify() accepts objects that return TRUE when passed to is.data.frame(). Future updates may seek to expand this usage if there is a demand for it.

Return to top.

`var` {#link3}

var accepts a single variable name, enclosed in strings (e.g., "var" not var). var the values in var can be a character, interval, or factor. Currently, dummify() will temporarily recode factor variables as characters and then reconvert them to factor using the original levels. This results in a warning so that you, the user, know that this happened and advises you to check and make sure the levels are in the order you want them to be.

dummify(data = iris, var = "Species")

var also accepts the column index of the variable that you want to make into dummy variables.

dummify(data = iris, var = 5)

Return to top.

`reference`{#link4}

reference is the first of two optional variables in dummify(). Its default is NULL. In regression analysis using sets of dummy variables, at least one of the variables must be excluded to avoid perfect multicollinearity. This is the "reference category"--or the category that all the other dummy variables are compared to with regards to their coefficients and statistical significance testing. This option allows users to make a reference category for their variables and to specify which value (or values) they want to serve as the reference.

reference can be either TRUE or a character vector of values within the variable being dummified. If the user tries to pass a value that is not within the actual values of the variable, dummify() will pass a warning alerting them of this and proceed with the default renaming procedure.

If reference is set to TRUE it will exclude the first value it encounters as the reference. If it is set to a character vector, it will exclude all the values specified in the vector.

dummify(data = iris, var = "Species", reference = T)

dummify(data = iris, var = "Species", reference = "setosa")

dummify(data = iris, var = "Species", reference = c("setosa", "virginica"))

Return to top.

`dumNames`{#link5}

dumNames allows the user to overwrite the default naming convention and specify their own names for the variables to be made. It defaults to NULL. It accepts either a character vector or a named character vector. Currently, if the vector of names does not equal the number of values passed to the function (e.g., the number of values excluding reference categories), it will pass back a warning and instead proceed with the default renaming procedure.

dummify(data = iris, var = "Species", dumNames = c("BristlePointed","BlueFlag","Virginia"))
dummify(data = iris, var = "Species", dumNames = c("BristlePointed" = "setosa",
  "BlueFlag" = "versicolor","Virginia" = "virginica"))

Return to top.

`keepNA`

Researchers may consider NA values as informative, and thus may want to have a column where 1 indicates if the values were NA and 0 if they were not. The default behavior (keepNA = TRUE) constructs such a column. If users do not want to consider NAs in dummying, they can set keepNA = FALSE.

`dummify_across()`{#link6}

dummify_across() allows users to create dummies from multiple variables at the same time. This can be handy if you are working with a dataset that has multiple categorical variables, such as respondent sex and martial status. It accepts three arguments: data, vars, and reference.

dummify_across() is essentially a wrapper that calls dummify() multiple times over each variable specified.

Return to top.

`data` {#link7}

Currenty, dummify_across() accepts objects that return TRUE when passed to is.data.frame(). Future updates may seek to expand this usage if there is a demand for it.

Return to top.

`vars` {#link8}

vars are the variables that you, the user, want to make dummies out of. It accepts either a character vector or a vector of column indices. It will not accept vectors with only one entry. If you would like to only make dummies out of one variable, use dummify() instead.

#install.packages("palmerpenguins")

library(palmerpenguins)

dummify_across(penguins, c("species","island","sex"))
dummify_across(penguins, c(1,2,7))

Return to top.

`reference`{#link9}

reference is an option that allows you to indicate if you want all of the variables to have a reference category. It defaults to NULL. It accepts TRUE, which will create reference categories out of the first-encountered value for all of the variables.

Future updates to dumdum will allow users to specify which variables they would like to have reference values--and the values they want to exclude from each variable.

dummify_across(penguins, c("species","island","sex"), reference = TRUE)

Return to top.

prlitics/dumdum documentation built on Aug. 12, 2020, 12:54 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

prlitics/dumdum
An Easy Way to Make Dummy (Binary) Variables

In prlitics/dumdum: An Easy Way to Make Dummy (Binary) Variables

Table of Contents{#link0}

`dummify()`{#link1}

`data` {#link2}

`var` {#link3}

`reference`{#link4}

`dumNames`{#link5}

`keepNA`

`dummify_across()`{#link6}

`data` {#link7}

`vars` {#link8}

`reference`{#link9}

R Package Documentation

Browse R Packages

We want your feedback!

prlitics/dumdum An Easy Way to Make Dummy (Binary) Variables

In prlitics/dumdum: An Easy Way to Make Dummy (Binary) Variables

Table of Contents{#link0}

dummify(){#link1}

data {#link2}

var {#link3}

reference{#link4}

dumNames{#link5}

keepNA

dummify_across(){#link6}

data {#link7}

vars {#link8}

reference{#link9}

R Package Documentation

Browse R Packages

We want your feedback!

prlitics/dumdum
An Easy Way to Make Dummy (Binary) Variables

`dummify()`{#link1}

`data` {#link2}

`var` {#link3}

`reference`{#link4}

`dumNames`{#link5}

`keepNA`

`dummify_across()`{#link6}

`data` {#link7}

`vars` {#link8}

`reference`{#link9}