View source: R/groupMeanCenter.R
gmc | R Documentation |
Multilevel modelers often need to include predictors like the within-group mean and the deviations of individuals around the mean. This function makes it easy (almost foolproof) to calculate those variables.
gmc(dframe, x, by, FUN = mean, suffix = c("_mn", "_dev"), fulldataframe = TRUE)
dframe |
a data frame. |
x |
Variable names or a vector of variable names. Do NOT supply a variable like dat$x1, do supply a quoted variable name "x1" or a vector c("x1", "x2") |
by |
A grouping variable name or a vector of grouping names. Do NOT supply a variable like dat$xfactor, do supply a name "xfactor", or a vector c("xfac1", "xfac2"). |
FUN |
Defaults to the mean, have not tested alternatives |
suffix |
The suffixes to be added to column 1 and column 2 |
fulldataframe |
Default TRUE. original data frame is returned with new columna added (which I would call "Stata style"). If FALSE, this will return only newly created columns, the variables with suffix[1] and suffix[2] appended to names. TRUE is easier (maybe safer), but also wastes memory. |
This was originally just for "group mean-centered" data, but now is more general, can accept functions like median to calculate center and then deviations about that center value within the group.
Similar to Stata egen, except more versatile and fun! Will create 2 new columns for each variable, with suffixes for the summary and deviations (default suffixes are "_mn" and "_dev". Rows will match the rows of the original data frame, so it will be easy to merge or cbind them back together.
Depending on fulldataframe
, either a new data frame
with center and deviation columns, or or original data frame
with "x_mn" and "x_dev" variables appended (Stata style).
Paul Johnson
## Make a data frame out of the state data collection (see ?state) data(state) statenew <- as.data.frame(state.x77) statenew$region <- state.region statenew$state <- rownames(statenew) head(statenew.gmc1 <- gmc(statenew, c("Income", "Population"), by = "region")) head(statenew.gmc2 <- gmc(statenew, c("Income", "Population"), by = "region", fulldataframe = FALSE)) ## Note dangerous step: assumes row alignment is correct. ## return has rownames from original set to identify danger head(statenew2 <- cbind(statenew, statenew.gmc2)) if(!all.equal(rownames(statenew), rownames(statenew.gmc2))){ warning("Data row-alignment probable error") } ## The following box plots should be identical boxplot(Income ~ region, statenew.gmc1) boxplot((Income_mn + Income_dev) ~ region, statenew.gmc1) ## Multiple by variables fakedat <- data.frame(i = 1:200, j = gl(4, 50), k = gl(20, 10), y1 = rnorm(200), y2 = rnorm(200)) head(gmc(fakedat, "y1", by = "k"), 20) head(gmc(fakedat, "y1", by = c("j", "k"), fulldataframe = FALSE), 40) head(gmc(fakedat, c("y1", "y2"), by = c("j", "k"), fulldataframe = FALSE)) ## Check missing value management fakedat[2, "k"] <- NA fakedat[4, "j"] <- NA##' head(gmc(fakedat, "y1", by = "k"), 20) head(gmc(fakedat, "y1", by = c("j", "k"), fulldataframe = FALSE), 40)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.