make.dummies: Create a Dummy Matrix

View source: R/tool_misc.R

make.dummiesR Documentation

Create a Dummy Matrix


Contrast-coded dummy matrix (treatment coding) created from a factor


make.dummies(x, ...)

## Default S3 method:
make.dummies(x, base = 1L, base.add = TRUE, ...)

## S3 method for class 'data.frame'
make.dummies(x, col, base = 1L, base.add = TRUE, ...)

## S3 method for class 'pdata.frame'
make.dummies(x, col, base = 1L, base.add = TRUE, ...)



a factor from which the dummies are created (x is coerced to factor if not yet a factor) for the default method or a data data frame/pdata.frame for the respective method.


further arguments.


integer or character, specifies the reference level (base), if integer it refers to position in levels(x), if character the name of a level,


logical, if TRUE the reference level (base) is added to the return value as first column, if FALSE the reference level is not included.


character (only for the data frame and pdata.frame methods), to specify the column which is used to derive the dummies from,


This function creates a matrix of dummies from the levels of a factor in treatment coding. In model estimations, it is usually preferable to not create the dummy matrix prior to estimation but to simply specify a factor in the formula and let the estimation function handle the creation of the dummies.

This function is merely a convenience wrapper around stats::contr.treatment to ease the dummy matrix creation process shall the dummy matrix be explicitly required. See Examples for a use case in LSDV (least squares dummy variable) model estimation.

The default method uses a factor as main input (or something coercible to a factor) to derive the dummy matrix from. Methods for data frame and pdata.frame are available as well and have the additional argument col to specify the the column from which the dummies are created; both methods merge the dummy matrix to the data frame/pdata.frame yielding a ready-to-use data set. See also Examples for use cases.


For the default method, a matrix containing the contrast-coded dummies (treatment coding), dimensions are n x n where n = length(levels(x)) if argument base.add = TRUE or n = length(levels(x)-1) if base.add = FALSE; for the data frame and pdata.frame method, a data frame or pdata.frame, respectively, with the dummies appropriately merged to the input as last columns (column names are derived from the name of the column used to create the dummies and its levels).


Kevin Tappe

See Also

stats::contr.treatment(), stats::contrasts()


data("Grunfeld", package = "plm")
Grunfeld <- Grunfeld[1:100, ] # reduce data set (down to 5 firms)

## default method
make.dummies(Grunfeld$firm) # gives 5 x 5 matrix (5 firms, base level incl.)
make.dummies(Grunfeld$firm, base = 2L, base.add = FALSE) # gives 5 x 4 matrix

## data frame method
Grun.dummies <- make.dummies(Grunfeld, col = "firm")

## pdata.frame method
pGrun <- pdata.frame(Grunfeld)
pGrun.dummies <- make.dummies(pGrun, col = "firm")

## Model estimation:
## estimate within model (individual/firm effects) and LSDV models (firm dummies)
# within model:
plm(inv ~ value + capital, data = pGrun, model = "within")

## LSDV with user-created dummies by make.dummies:
form_dummies <- paste0("firm", c(1:5), collapse = "+")
form_dummies <- formula(paste0("inv ~ value + capital + ", form_dummies))
plm(form_dummies, data = pGrun.dummies, model = "pooling") # last dummy is dropped

# LSDV via factor(year) -> let estimation function generate dummies:
plm(inv ~ value + capital + factor(firm), data = pGrun, model = "pooling")

plm documentation built on April 9, 2023, 5:06 p.m.