dumdum

Make dummy variables easily in R

Dummy variables (binary, 0/1 variables) are a frequent part of analyses in the social sciences. But making them can be an arduous process in base R, which tends to rely on numerous ifelse() statements. This can be especially trying and headache-inducing if you have a variable with several values (e.g., race, education level, employment status, or political party identification).
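
For comparison, here is a minimal sketch of the base R approach described above. The data frame, the `party` column, and its values are made up for illustration and have nothing to do with dumdum itself.

```r
# Toy data; the column name and values are illustrative assumptions.
df <- data.frame(
  party = c("dem", "rep", "ind", "rep", NA),
  stringsAsFactors = FALSE
)

# One ifelse() call per category; rows with a missing party stay NA.
df$party_dem <- ifelse(df$party == "dem", 1, 0)
df$party_rep <- ifelse(df$party == "rep", 1, 0)
df$party_ind <- ifelse(df$party == "ind", 1, 0)

df
```

Every additional category means another hand-written line, which is the repetition dumdum is meant to remove.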

dumdum cuts down on the work, gives you the ability to easily rename the new, dummied variables, and also makes selecting reference categories simple. It also allows you to make dummies across many variables at once.

A few other packages already help users make dummy variables, such as ml, caret, tidymodels, and fastDummies. What sets dumdum apart is that it was designed with social scientists and non-machine-learning folk in mind. Its functions are flexible and intuitive. It also has no package dependencies; all you need to have installed are the base R packages!

Installation

You can install the latest version from GitHub with:

# install.packages("devtools")
devtools::install_github("prlitics/dumdum")

or

# install.packages("remotes")
remotes::install_github("prlitics/dumdum")

Functions

There are two functions in dumdum: dummify() and dummify_across().

Background: an in-depth look at dummify()

dummify() has four arguments: two mandatory (data and var) and two optional (reference and dumNames). The optional arguments default to NULL.

dummify(data, var, reference = NULL, dumNames = NULL)

data & var requirements

reference and dumNames options.

Examples

These examples are going to use the Palmer Penguins dataset because there are a number of "dummy-able" variables in it. (And, also, like, penguins!!)

library(dumdum)
penguins <- palmerpenguins::penguins
knitr::kable(head(penguins))

dummify

Let's say that you want to make a dummy variable for the penguins' sex because you plan to run a regression to check whether sex is predictive of body mass. To make dummy variables out of sex, you could use dummify():

penguins <- palmerpenguins::penguins
penguins_dummied <- dummify(data = penguins, var = "sex")
knitr::kable(head(penguins_dummied))

If you wanted to set "male" as the reference category, you could do:

penguins_dummied <- dummify(data = penguins, var = "sex", reference = "male")
knitr::kable(head(penguins_dummied))
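
If reference categories are unfamiliar, base R's own factor handling illustrates the idea. The sketch below uses only base R (model.matrix() and relevel()), not dumdum, and the toy vector is an assumption for illustration: the reference level gets no column of its own, and every remaining dummy is read relative to it.

```r
# Toy factor; values are illustrative.
sex <- factor(c("female", "male", "male", "female"))

# Default treatment contrasts: the first level ("female") is the reference,
# so only a "male" dummy column is produced alongside the intercept.
mm_default <- model.matrix(~ sex)
colnames(mm_default)
# "(Intercept)" "sexmale"

# Make "male" the reference instead; now only a "female" dummy remains.
sex_male_ref <- relevel(sex, ref = "male")
mm_male_ref <- model.matrix(~ sex_male_ref)
colnames(mm_male_ref)
# "(Intercept)" "sex_male_reffemale"
```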

By default, each new column is named so that it is clear which value the 1 in that column refers to. You can also rename the columns with dumNames:

penguins_dummied <- dummify(data = penguins, var = "sex", reference = "male", dumNames = c("f", "unknown"))
knitr::kable(head(penguins_dummied))

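If you don't want to worry about putting the column names in the right order, you can pass a named vector in which each name is the new column name and each value is the category it stands for: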

penguins_dummied <- dummify(data = penguins, var = "sex", reference = "male", dumNames = c("f" = "female", "unknown" = NA))
knitr::kable(head(penguins_dummied))
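
A quick look at the named vector itself may help. This is plain base R that only inspects the vector passed as dumNames above; it makes no claim about how dumdum processes it internally.

```r
# The same named vector used in the call above.
dum_names <- c("f" = "female", "unknown" = NA)

# The names are the new column names ...
names(dum_names)
# "f" "unknown"

# ... and the values are the categories they stand for (NA = missing sex).
unname(dum_names)
# "female" NA
```

Because each name travels with its value, the order of the entries no longer matters.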

dummify_across

Let's say that there are multiple variables that you want to dummy across. In the case of the penguins, you might want to dummy species, as well as the island and sex. You can do so with dummify_across().

dummify_across() is a wrapper for dummify() that allows you to pass multiple variables at once. Like dummify(), you specify a data frame object and you specify a set of variables (vars) that you want to be dummified. These can either be names or column indices.

penguins_dummied <- dummify_across(data = penguins, vars = c("sex", "species", "island"))
knitr::kable(head(penguins_dummied))

You can also pass along whether or not you want dummify_across() to leave out a reference column for the variables you selected:

penguins_dummied <- dummify_across(data = penguins, vars = c("sex", "species", "island"), reference = TRUE)
knitr::kable(head(penguins_dummied))

Currently, dummify_across() will only leave out the first encountered variable as a reference. Future updates to the package will allow you to specify which variables you want to have reference categories for--as well as the values for those references.
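
To make the wrapper idea concrete, here is a rough base R sketch of applying a single-variable dummy function once per variable. The helpers make_dummies() and make_dummies_across() are hypothetical names invented for this illustration; they are not dumdum's functions or its implementation, and the column naming and the handling of missing values (no separate NA column here) are assumptions.

```r
# Hypothetical helper: add one 0/1 column per observed, non-missing value of `var`.
make_dummies <- function(data, var) {
  values <- unique(data[[var]])
  values <- values[!is.na(values)]
  for (v in values) {
    # Missing entries are coded 0 in this sketch (an assumption).
    is_match <- !is.na(data[[var]]) & data[[var]] == v
    data[[paste(var, v, sep = "_")]] <- as.integer(is_match)
  }
  data
}

# Hypothetical wrapper: call the helper once for each variable in `vars`.
make_dummies_across <- function(data, vars) {
  for (var in vars) {
    data <- make_dummies(data, var)
  }
  data
}

# Toy data loosely modeled on the penguins columns.
toy <- data.frame(
  sex = c("female", "male", NA),
  island = c("Dream", "Biscoe", "Dream"),
  stringsAsFactors = FALSE
)
toy_dummied <- make_dummies_across(toy, c("sex", "island"))
names(toy_dummied)
```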

Playing nice with pipes

I personally am a huge fan of the tidyverse; it's what allowed me to get my feet wet with R before I could truly dive into it. I know a lot of potential dumdum users would also use the tidyverse, so it was important to me that dumdum functions were pipe-able.

library(magrittr)
pen_df <- penguins %>%
  dummify("sex")
knitr::kable(head(pen_df))
pen_df <- penguins %>%
  dummify_across(c("sex","island","species"))
knitr::kable(head(pen_df))

Bugs or suggestions

If you have any bugs or suggestions, let me know! Always happy for constructive feedback.

Acknowledgements

Huge thanks to Sabrina Marasa, who tested the package on the Mac version of R.

License

This package is distributed under an MIT license.

Citation

citation("dumdum")

References

Data & packages used in this readme.



prlitics/dumdum documentation built on Aug. 12, 2020, 12:54 a.m.