Creating and Estimating Generalized Multilevel Permutation Models (GMPMs)
gmpmCreate creates a GMPM object without performing permutation runs, gmpmEstimate performs the permutation runs, and gmpm is a convenience function that calls gmpmCreate followed by gmpmEstimate.
A symbolic formula describing the regression model to be fitted, including a multilevel blocking factor following the '|' symbol. Details of model specification can be found in the section ‘Details’.
One of the following options: a valid glm class (e.g., binomial, gaussian, poisson), “multinomial”, or “user” (for a user-defined model). The options “multinomial” and “user” must be entered within quotation marks.
A data frame containing the data that the model will be fit to. If left unspecified, gmpm will search for the data within the parent environment.
Names of all independent variables (IVs) included as predictors in the model formula, entered as a vector of character strings. All other predictors in the model will be considered ‘covariates’ and will not be subject to randomization.
A gmpmControl object used to control the fitting function. This is a list of one or more of the following options. Any option not specified in the list will be given a default value.
A GMPM object
Other arguments to be passed on to the underlying function that performs the actual regression (see below).
Generalized Multilevel Permutation Models use regression to fit a model to the data under the original labeling of experimental conditions, and then compare the parameter estimates from this “original fit” against null-hypothesis distributions obtained from fits to random relabelings of the data set. These functions are especially useful for time-series data with difficult or unknown sources of dependency. They can also be used for multivariate analyses. Currently, p-values are provided for categorical predictors and for interactions between categorical predictors and continuous covariates, but not for continuous covariates themselves.
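The general idea of comparing an original fit against fits to relabeled data can be sketched in base R. This is only an illustration under stated assumptions, not gmpm's internal algorithm: the helper names fit_stat and relabel are hypothetical, the statistic is a plain lm coefficient, and labels are shuffled within each subject.

```r
# Sketch of a within-block permutation test in base R (not gmpm's internals).
set.seed(1)
d <- data.frame(SubjID = rep(1:20, each = 2),
                A = factor(rep(c("a1", "a2"), times = 20)),
                Y = rnorm(40))

# Hypothetical helper: the test statistic is the regression coefficient for A.
fit_stat <- function(dat) {
  unname(coef(lm(Y ~ A, data = dat))[2])
}

# Hypothetical helper: shuffle the labels of A within each subject,
# leaving Y and SubjID untouched.
relabel <- function(dat) {
  codes <- ave(as.integer(dat$A), dat$SubjID,
               FUN = function(v) v[sample.int(length(v))])
  dat$A <- factor(levels(dat$A)[codes], levels = levels(dat$A))
  dat
}

orig <- fit_stat(d)                                 # the "original fit"
null_dist <- replicate(199, fit_stat(relabel(d)))   # fits to random relabelings

# Two-sided permutation p-value; the original fit counts as one relabeling.
p_value <- (sum(abs(null_dist) >= abs(orig)) + 1) / (length(null_dist) + 1)
```

Shuffling within SubjID rather than across the whole data set keeps each subject's observations together, which is what allows a permutation scheme to respect the dependency structure.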
The ‘formula’ option to the model specifies the form of the
model to be fit. The left side of the formula identifies the
dependent variable (DV), which is usually a vector of: real numbers
(for continuous data), whole numbers (for counts), or 0s and 1s (for
binomial data). It can also be a factor with two levels (binomial
data) or more (multinomial data). Lastly, one can also use the
cbind syntax for binomial or multinomial DVs (see example
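As a hedged illustration of the cbind form for a binomial DV, the sketch below calls base R's glm directly rather than gmpm. The data are simulated and the column names hits and misses are made up for the example.

```r
# Binomial DV given as a two-column matrix of successes and failures.
set.seed(1)
d <- data.frame(A = factor(rep(c("a1", "a2"), each = 10)),
                hits = rbinom(20, size = 12, prob = 0.5))
d$misses <- 12 - d$hits   # failures out of 12 trials per row

# cbind(successes, failures) on the left-hand side of the formula
fit <- glm(cbind(hits, misses) ~ A, family = binomial, data = d)
coef(fit)
```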
The right hand side of the formula specifies the predictors in the
regression model, using standard R format (see
further information). The predictors can include both design
variables (IVs) and covariates. Any IVs to be randomized must be
identified in the ‘ivars’ option; all other variables are
assumed to be covariates, and will not be randomized. All IVs will be
treated as categorical factors; continuous IVs are not allowed. The
IVs are internally coded using ‘deviation’ coding, such that
the codes sum to zero. The function
be used to retrieve this internal coding.
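To see what a sum-to-zero coding looks like, base R's contr.sum can serve as a stand-in. Whether gmpm's internal deviation codes use exactly this scaling and level ordering is an assumption, so treat the matrix below as an illustration of the property rather than of the package's output.

```r
# A three-level factor with sum-to-zero (deviation-style) contrasts.
A <- factor(c("a1", "a2", "a3"))
contrasts(A) <- contr.sum(nlevels(A))

codes <- contrasts(A)   # one row per level, one column per contrast
codes
colSums(codes)          # each column of codes sums to zero
```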
The ‘family’ option specifies the type of dependent variable.
This option determines which of various regression subroutines are
used to fit the model. All of the options for generalized linear
models (glms) are available (see
family for details); if
you choose one of these options then gmpm will call the underlying
glm to perform the actual
regression. If the option ‘multinomial’ is chosen, then gmpm
will fit a multinomial model using
multinom in package
nnet. In case tweaks to these underlying functions are
necessary, it is possible to specify options to
gmpmCreate that will be passed along during the fitting process.
For example, with multinomial models it is often necessary to increase
the maximum number of iterations, which can be done by the argument
‘maxit’. User-defined models are not yet implemented, but left
for future development.
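The dispatch described above, including the forwarding of extra arguments such as maxit, can be sketched as follows. The function fit_model is hypothetical and is not part of gmpm; it only shows how a family value could select between glm and multinom (package nnet) while passing further options through '...'.

```r
library(nnet)  # provides multinom()

# Hypothetical dispatcher, not part of gmpm: picks a fitting routine from
# 'family' and forwards any extra arguments through '...'.
fit_model <- function(formula, family, data, ...) {
  if (identical(family, "multinomial")) {
    multinom(formula, data = data, trace = FALSE, ...)
  } else {
    glm(formula, family = family, data = data, ...)
  }
}

set.seed(1)
d <- data.frame(x  = rnorm(150),
                y3 = factor(sample(c("a", "b", "c"), 150, replace = TRUE)),
                yc = rnorm(150))

m_gauss <- fit_model(yc ~ x, gaussian, d)                    # fitted by glm
m_multi <- fit_model(y3 ~ x, "multinomial", d, maxit = 500)  # maxit reaches multinom
```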
Each function returns a
Dale J. Barr <email@example.com>
Barr, D. J. (in preparation). Generalized Multilevel Permutation Models.
kb07 for an example of usage with
# create a df for a within-subject design
# with no main effect
df1 <- data.frame(SubjID=rep(1:20,each=2),
                  A=factor(rep(c("a1","a2"),times=10)),
                  Y=rnorm(40)*2)

# parametric and non-parametric analyses should look similar
df1.gmpm <- gmpm(Y ~ A | SubjID, gaussian, df1, c("A"))

# parametric and non-parametric analyses should look similar
summary(df1.gmpm)
summary(aov(Y ~ A + Error(SubjID), df1))

# now create horrible dependencies by reduplicating observations
df2 <- rbind(df1,df1,df1,df1,df1,df1,df1,df1,df1)

# parametric analysis more likely to give Type I error
summary(aov(Y ~ A + Error(SubjID), df2))
df2.gmpm <- gmpm(Y ~ A | SubjID, gaussian, df2, c("A"))