generateAnovaDatasets: Generate ANOVA type datasets
In kollerma/robustlmm: Robust Linear Mixed Effects Models

generateAnovaDatasets

R Documentation

Generate ANOVA type datasets

Description

Generate balanced datasets with multiple factors. All combinations of all factor variables are generated, i.e., a fully crossed dataset will be generated. numberOfReplicates specifies the number of replications per unique combination.
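The balanced, fully crossed layout described above can be pictured with base R's `expand.grid`. The following is only an illustrative sketch, not the package's implementation; the column names `replicate`, `fixedFactor` and `subject` are hypothetical and chosen for readability.

```r
# Illustrative sketch only: a balanced, fully crossed design in base R.
# Column names are hypothetical, not those used by generateAnovaDatasets.
numberOfLevelsInFixedFactor <- 3
numberOfSubjects <- 5
numberOfReplicates <- 4

design <- expand.grid(
  replicate = seq_len(numberOfReplicates),
  fixedFactor = factor(seq_len(numberOfLevelsInFixedFactor)),
  subject = factor(seq_len(numberOfSubjects))
)

# Every combination of fixedFactor and subject appears numberOfReplicates times.
cellCounts <- table(design$fixedFactor, design$subject)
nrow(design)
```

The number of rows is the product of all level counts and the number of replicates, which is why such designs grow quickly when several factors are crossed.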

Usage

generateAnovaDatasets(
  numberOfDatasetsToGenerate,
  numberOfLevelsInFixedFactor,
  numberOfSubjects,
  numberOfReplicates,
  errorGenerator = rnorm,
  randomEffectGenerator = rnorm,
  trueBeta = 1,
  trueSigma = 4,
  trueTheta = 1,
  ...,
  arrange = FALSE
)

Arguments

`numberOfDatasetsToGenerate`	number of datasets to generate.
`numberOfLevelsInFixedFactor`	scalar or vector with the number of levels per fixed factor or grouping variable.
`numberOfSubjects`	scalar or vector with the number of levels per variance component.
`numberOfReplicates`	number of replicates per unique combination of fixed factor and variance component.
`errorGenerator`	random number generator used for the errors.
`randomEffectGenerator`	random number generator used for the spherical random effects.
`trueBeta`	scalar or vector with the true values of the fixed effects coefficients. Can be of length one in which case it will be replicated to the required length if needed.
`trueSigma`	scalar with the true value of the error scale.
`trueTheta`	scalar or vector with the true values for the variance component coefficients, not including sigma. Can be of length one, in which case it will be replicated to the required length if needed.
`...`	all additional arguments are added to the returned list.
`arrange`	If `TRUE`, the observations in the dataset are arranged such that the call to `arrange` in `varComprob` does not break the observation-group relationship. This requires package dplyr to be installed.

Details

numberOfLevelsInFixedFactor can either be a scalar or a vector with the number of levels for each fixed effects group. If numberOfLevelsInFixedFactor is a scalar, the value of 1 is allowed. This can be used to generate a dataset with an intercept only. If numberOfLevelsInFixedFactor is a vector with more than one entry, then all the values need to be larger than one.

numberOfSubjects can also be a scalar or a vector with the number of levels for each variance component. Each group needs to have more than one level. The vector is sorted in descending order before the names are assigned. This ensures that, when running lmer, the order of the random effects does not change, since lmer also sorts the random effects by descending number of levels.
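The rules for numberOfLevelsInFixedFactor and numberOfSubjects stated above can be summarised in a small check. The helper `checkDesignArguments` below is hypothetical and not part of robustlmm; it is a sketch of the documented constraints, not the package's own validation code.

```r
# Hypothetical helper, not part of robustlmm: mirrors the documented rules.
checkDesignArguments <- function(numberOfLevelsInFixedFactor, numberOfSubjects) {
  if (length(numberOfLevelsInFixedFactor) > 1) {
    # With several fixed factors, each needs more than one level.
    stopifnot(all(numberOfLevelsInFixedFactor > 1))
  } else {
    # A single value of 1 is allowed and gives an intercept-only model.
    stopifnot(numberOfLevelsInFixedFactor >= 1)
  }
  # Every variance component needs more than one level.
  stopifnot(all(numberOfSubjects > 1))
  # Sort in descending order, the order lmer uses for random effects.
  sort(numberOfSubjects, decreasing = TRUE)
}

sortedSubjects <- checkDesignArguments(1, c(3, 5))
sortedSubjects
```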

In order to save memory, only the generated random effects and the errors are stored. The dataset is only created on demand when the method generateData in the returned list is evaluated.
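The idea of storing only the random draws and assembling a dataset on request can be sketched with a closure in base R. This is an assumption-laden illustration for a single random intercept, not the package's implementation: `makeLazyDatasets`, the column names, and the way the response is composed (random effects scaled by sigma times theta, errors scaled by sigma) are all hypothetical choices made for this sketch.

```r
# Illustrative sketch of on-demand dataset construction; not the package's code.
# makeLazyDatasets and its column names are hypothetical.
makeLazyDatasets <- function(numberOfDatasets, numberOfSubjects, numberOfReplicates,
                             trueBeta = 1, trueSigma = 4, trueTheta = 1) {
  numberOfRows <- numberOfSubjects * numberOfReplicates
  # Only the random draws are stored: one column per dataset.
  sphericalRandomEffects <- matrix(rnorm(numberOfSubjects * numberOfDatasets),
                                   nrow = numberOfSubjects)
  errors <- matrix(rnorm(numberOfRows * numberOfDatasets), nrow = numberOfRows)
  subject <- factor(rep(seq_len(numberOfSubjects), each = numberOfReplicates))

  list(
    generateData = function(datasetIndex) {
      # Assemble the response only when it is requested.
      b <- trueSigma * trueTheta * sphericalRandomEffects[, datasetIndex]
      data.frame(
        y = trueBeta + b[as.integer(subject)] + trueSigma * errors[, datasetIndex],
        subject = subject
      )
    }
  )
}

set.seed(42)
lazy <- makeLazyDatasets(2, numberOfSubjects = 5, numberOfReplicates = 4)
firstDataset <- lazy$generateData(1)
head(firstDataset)
```

Because the stored draws are fixed, calling `generateData` repeatedly with the same index returns the same dataset.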

The random variables are generated in a way that makes it easy to simulate more datasets later. When starting from the same seed, the first generated datasets will be the same as for a previous call of generateAnovaDatasets with a smaller number of datasets to generate, see examples.
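One way such a property can arise is to draw all random variables dataset by dataset, so that asking for more datasets only appends further draws to the same stream. The sketch below illustrates this with a single generator in base R; `drawRandomVariables` is hypothetical, and whether the package uses this exact scheme is an assumption.

```r
# Sketch only: one column per dataset, stacking random effects and errors.
# drawRandomVariables is a hypothetical helper, not part of robustlmm.
drawRandomVariables <- function(numberOfDatasets, numberOfRandomEffects, numberOfRows) {
  perDataset <- numberOfRandomEffects + numberOfRows
  # matrix() fills column by column, so dataset j uses the j-th block of draws.
  matrix(rnorm(perDataset * numberOfDatasets), nrow = perDataset)
}

set.seed(1)
two <- drawRandomVariables(2, numberOfRandomEffects = 5, numberOfRows = 20)
set.seed(1)
three <- drawRandomVariables(3, numberOfRandomEffects = 5, numberOfRows = 20)

# The first two columns agree; the third column is new.
samePrefix <- isTRUE(all.equal(two, three[, 1:2]))
```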

Value

list with generators and the original arguments

`generateData`:	function to generate data taking one argument, the dataset index.
`createXMatrix`:	function to generate X matrix taking one argument, the result of `generateData`.
`createZMatrix`:	function to generate Z matrix taking one argument, the result of `generateData`.
`createLambdaMatrix`:	function to generate Lambda matrix taking one argument, the result of `generateData`.
`randomEffects`:	function to return the generated random effects taking one argument, the dataset index.
`sphericalRandomEffects`:	function to return the generated spherical random effects taking one argument, the dataset index.
`errors`:	function to return the generated errors taking one argument, the dataset index.
`allRandomEffects`:	function without arguments that returns the matrix of all generated random effects.
`allErrors`:	function without arguments that returns the matrix of all generated errors.
`numberOfDatasets`:	`numberOfDatasetsToGenerate` as supplied
`numberOfLevelsInFixedFactor`:	`numberOfLevelsInFixedFactor` as supplied
`numberOfSubjects`:	`numberOfSubjects` sorted.
`numberOfReplicates`:	`numberOfReplicates` as supplied
`numberOfRows`:	number of rows in the generated dataset
`trueBeta`:	true values used for beta
`trueSigma`:	true value used for sigma
`trueTheta`:	true values used for theta
`formula`:	formula to fit the model using `lmer`
`...`:	additional arguments passed via `...`
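The returned elements are meant to be used together. The sketch below shows one possible workflow: building each dataset on demand, fitting it with the stored formula, and creating the design matrices. It assumes that robustlmm and lme4 are installed and that the stored formula can be passed to `lme4::lmer` directly; the code is skipped if either package is missing.

```r
# Sketch assuming robustlmm and lme4 are installed; skipped otherwise.
if (requireNamespace("robustlmm", quietly = TRUE) &&
    requireNamespace("lme4", quietly = TRUE)) {
  set.seed(1)
  datasets <- robustlmm::generateAnovaDatasets(2, 3, 5, 4)

  # Build each dataset on demand and fit it with the stored formula.
  fits <- lapply(seq_len(datasets$numberOfDatasets), function(datasetIndex) {
    data <- datasets$generateData(datasetIndex)
    lme4::lmer(datasets$formula, data = data)
  })

  # Design matrices for the first dataset.
  firstData <- datasets$generateData(1)
  X <- datasets$createXMatrix(firstData)
  Z <- datasets$createZMatrix(firstData)
  dim(X)
  dim(Z)
}
```

With small designs like this one, lmer may report a singular fit for some datasets; that is a property of the simulated data and not an error in the workflow.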

Author(s)

Manuel Koller

Examples

  oneWay <- generateAnovaDatasets(2, 1, 5, 4)
  head(oneWay$generateData(1))
  head(oneWay$generateData(2))
  oneWay$formula
  head(oneWay$randomEffects(1))
  head(oneWay$sphericalRandomEffects(1))
  head(oneWay$errors(1))

  twoWayFixedRandom <- generateAnovaDatasets(2, 3, 5, 4)
  head(twoWayFixedRandom$generateData(1))
  twoWayFixedRandom$formula

  twoWayRandom <- generateAnovaDatasets(2, 1, c(3, 5), 4)
  head(twoWayRandom$generateData(1))
  twoWayRandom$formula

  large <- generateAnovaDatasets(2, c(10, 15), c(20, 30), 5)
  head(large$generateData(1))
  large$formula

  ## illustration how to generate more datasets
  set.seed(1)
  datasets1 <- generateAnovaDatasets(2, 1, 5, 4)
  set.seed(1)
  datasets2 <- generateAnovaDatasets(3, 1, 5, 4)
  stopifnot(all.equal(datasets1$generateData(1), datasets2$generateData(1)),
            all.equal(datasets1$generateData(2), datasets2$generateData(2)))

kollerma/robustlmm documentation built on June 14, 2025, 11:05 a.m.