expandCategorical: Expand Data Frame by Re-expressing Categorical Data as Counts


Description

Expands the rows of a data frame by re-expressing observations of a categorical variable specified by catvar, such that the column(s) corresponding to catvar are replaced by a factor specifying the possible categories for each observation and a vector of 0/1 counts over these categories.


Usage

expandCategorical(data, catvar, sep = ".", countvar = "count",
                  idvar = "id", as.ordered = FALSE, group = TRUE)



Arguments

data: a data frame.

catvar: a character vector specifying factors in data whose interaction will form the basis of the expansion.

sep: a character string used to separate the concatenated values of catvar in the name of the new interaction factor.

countvar: (optional) a character string to be used for the name of the new count variable.

idvar: (optional) a character string to be used for the name of the new factor identifying the original rows (cases).

as.ordered: logical; whether the new interaction factor should be of class "ordered".

group: logical; whether or not to group individuals with common values over all covariates.


Details

Each row of the data frame is replicated c times, where c is the number of levels of the interaction of the factors specified by catvar. In the expanded data frame, the columns specified by catvar are replaced by a factor specifying the c possible categories for each case; its levels are named by the concatenated values of catvar, separated by sep. The ordering of factor levels is preserved in the new factor, but the factor is of class "ordered" only if as.ordered = TRUE. A variable named countvar is added to the data frame; it equals 1 for the observed category in each case and 0 elsewhere. Finally, a factor named idvar is added to index the cases.
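The expansion can be illustrated with a minimal base R sketch. This is not the gnm implementation: expand_sketch is a hypothetical helper written only for illustration, toy is made-up data, and the sketch handles a single factor in catvar with no grouping and no sep handling.

```r
# Toy data (made up for illustration): three cases, one categorical
# variable with three levels.
toy <- data.frame(
  x = c(1.2, 0.7, 3.1),
  pain = factor(c("same", "worse", "same"),
                levels = c("worse", "same", "better"))
)

# Hypothetical helper, NOT part of gnm: replicate each row once per level
# of a single factor `catvar` and add a 0/1 count and a case identifier.
expand_sketch <- function(data, catvar, countvar = "count", idvar = "id") {
  observed <- data[[catvar]]
  lev <- levels(observed)
  n <- nrow(data)
  k <- length(lev)
  # Repeat every row k times, keeping the replicates of a case together.
  long <- data[rep(seq_len(n), each = k), , drop = FALSE]
  # Replace the observed value by the full set of candidate categories,
  # preserving the original level order.
  long[[catvar]] <- factor(rep(lev, times = n), levels = lev)
  # The count is 1 where the candidate category is the observed one.
  long[[countvar]] <- as.integer(
    as.character(long[[catvar]]) == rep(as.character(observed), each = k)
  )
  # Factor indexing the original rows (cases).
  long[[idvar]] <- factor(rep(seq_len(n), each = k))
  rownames(long) <- NULL
  long
}

toy_long <- expand_sketch(toy, "pain")
```

Each of the three cases now occupies three rows, one per level of pain, and the counts within a case sum to 1. For several factors in catvar, the candidate categories would instead be the levels of their interaction, which base R's interaction() can build with a chosen sep.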


Value

The expanded data frame as described in Details.


Note

Re-expressing categorical data in this way allows a multinomial response to be modelled as a Poisson response; see the examples.
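The standard argument behind this equivalence can be sketched as follows. The notation is an assumption introduced here, not taken from the package: y_{ic} is the 0/1 count for case i and category c, \alpha_i is a per-case nuisance parameter (the role played by the id factor in the examples), and \eta_{ic} is the part of the linear predictor that involves the category.

```latex
% Independent Poisson counts with a separate intercept for each case
Y_{ic} \sim \mathrm{Poisson}(\mu_{ic}), \qquad
\log \mu_{ic} = \alpha_i + \eta_{ic}, \qquad c = 1, \dots, C.

% Conditioning on the case total removes \alpha_i and gives a multinomial
P\Bigl(Y_{i1} = y_{i1}, \dots, Y_{iC} = y_{iC} \Bigm| \sum_{c} Y_{ic} = 1\Bigr)
  = \prod_{c=1}^{C} \pi_{ic}^{\,y_{ic}}, \qquad
\pi_{ic} = \frac{\mu_{ic}}{\sum_{k} \mu_{ik}}
         = \frac{\exp(\eta_{ic})}{\sum_{k} \exp(\eta_{ik})}.
```

Because \alpha_i cancels in \pi_{ic}, the category probabilities depend only on \eta_{ic}, which is why the per-case parameters can be treated as nuisance terms in the Poisson fit.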


Author(s)

Heather Turner


References

Anderson, J. A. (1984) Regression and Ordered Categorical Variables. J. R. Statist. Soc. B, 46(1), 1-30.

See Also

gnm, multinom, reshape


Examples

library(gnm)

### Example from help(multinom, package = "nnet")
library(MASS)
example(birthwt)  # creates the data frame 'bwt'
library(nnet)
bwt.mu <- multinom(low ~ ., data = bwt)

## Equivalent using gnm - include unestimable main effects in model so 
## that interactions with low0 automatically set to zero, else could use
## 'constrain' argument. 
bwtLong <- expandCategorical(bwt, "low", group = FALSE)
bwt.po <- gnm(count ~ low*(. - id), eliminate = id, data = bwtLong,
              family = "poisson")
summary(bwt.po) # same deviance; df reflect extra id parameters

### Example from ?backPain
backPainLong <- expandCategorical(backPain, "pain")

## Fit models described in Table 5 of Anderson (1984)

noRelationship <- gnm(count ~ pain, eliminate = id,
                      family = "poisson", data = backPainLong)

oneDimensional <- update(noRelationship,
                         ~ . + Mult(pain, x1 + x2 + x3))

gnm documentation built on April 29, 2022, 5:06 p.m.