generateAnovaDatasets: Generate ANOVA type datasets

generateAnovaDatasets    R Documentation

Generate ANOVA type datasets

Description

Generate balanced datasets with multiple factors. All combinations of all factor variables are generated, i.e., a fully crossed dataset will be generated. numberOfReplicates specifies the number of replications per unique combination.
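To make "fully crossed" and "balanced" concrete, here is a minimal base R sketch. It is not the package's implementation; the sizes and the column names (replicate, fixedFactor, subject) are hypothetical and chosen only for illustration.

```r
# Hypothetical sizes, chosen only for illustration.
numberOfLevelsInFixedFactor <- 3
numberOfSubjects <- 5
numberOfReplicates <- 4

# expand.grid() enumerates every combination of its arguments,
# which is what "fully crossed" means.
design <- expand.grid(
  replicate = seq_len(numberOfReplicates),
  fixedFactor = factor(seq_len(numberOfLevelsInFixedFactor)),
  subject = factor(seq_len(numberOfSubjects))
)

# One row per replicate of each (fixedFactor, subject) cell.
nrow(design)

# Balanced: every cell holds the same number of observations.
cellCounts <- table(design$fixedFactor, design$subject)
all(cellCounts == numberOfReplicates)
```

In this sketch the number of rows is the product of the number of levels, the number of subjects and the number of replicates.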

Usage

generateAnovaDatasets(
  numberOfDatasetsToGenerate,
  numberOfLevelsInFixedFactor,
  numberOfSubjects,
  numberOfReplicates,
  errorGenerator = rnorm,
  randomEffectGenerator = rnorm,
  trueBeta = 1,
  trueSigma = 4,
  trueTheta = 1,
  ...,
  arrange = FALSE
)

Arguments

numberOfDatasetsToGenerate

number of datasets to generate.

numberOfLevelsInFixedFactor

scalar or vector with the number of levels per fixed factor or grouping variable.

numberOfSubjects

scalar or vector with the number of levels per variance component.

numberOfReplicates

number of replicates per unique combination of fixed factor and variance component.

errorGenerator

random number generator used for the errors.

randomEffectGenerator

random number generator used for the spherical random effects.

trueBeta

scalar or vector with the true values of the fixed effects coefficients. Can be of length one, in which case it is replicated to the required length.

trueSigma

scalar with the true value of the error scale.

trueTheta

scalar or vector with the true values for the variance component coefficients, not including sigma. Can be of length one, in which case it is replicated to the required length.

...

all additional arguments are added to the returned list.

arrange

If TRUE, the observations in the dataset are arranged such that the call to arrange in varComprob does not break the observation-group relationship. This requires package dplyr to be installed.
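The following sketch illustrates how trueBeta, trueSigma and trueTheta might enter the simulation. The scaling is an assumption based on the lme4-style parametrization, where theta is expressed relative to sigma; it is not taken from the package source. The required length of the coefficient vector (3) is also hypothetical.

```r
# Hypothetical values; the scaling below is an assumption, not taken
# from the package source.
trueBeta <- 1
trueSigma <- 4
trueTheta <- 1
requiredLength <- 3      # hypothetical number of fixed effects coefficients
numberOfSubjects <- 5

# A length-one trueBeta is recycled to the required length.
beta <- rep_len(trueBeta, requiredLength)

set.seed(1)
# Spherical random effects live on the unit scale.
sphericalRandomEffects <- rnorm(numberOfSubjects)
# Assumed lme4-style scaling: theta is relative to sigma.
randomEffects <- trueSigma * trueTheta * sphericalRandomEffects
```

Under this assumption, the random effects have standard deviation trueSigma * trueTheta, while the spherical random effects have standard deviation one.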

Details

numberOfLevelsInFixedFactor can either be a scalar or a vector with the number of levels for each fixed effects group. If numberOfLevelsInFixedFactor is a scalar, the value of 1 is allowed. This can be used to generate a dataset with an intercept only. If numberOfLevelsInFixedFactor is a vector with more than one entry, then all the values need to be larger than one.

numberOfSubjects can also be a scalar or a vector with the number of levels for each variance component. Each group needs to have more than one level. The vector is sorted in descending order before the names are assigned. This ensures that, when running lmer, the order of the random effects does not change, since lmer also sorts the random effects by descending number of levels.
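The rules for the two size arguments can be sketched as follows. The helper names checkFixedLevels and prepareSubjects are hypothetical; the package's own checks may be implemented differently.

```r
# Hypothetical helpers; the package's own checks may differ.
checkFixedLevels <- function(numberOfLevelsInFixedFactor) {
  if (length(numberOfLevelsInFixedFactor) > 1) {
    # Several fixed factors: each needs more than one level.
    stopifnot(all(numberOfLevelsInFixedFactor > 1))
  } else {
    # A single value of 1 is allowed and gives an intercept-only model.
    stopifnot(numberOfLevelsInFixedFactor >= 1)
  }
  invisible(numberOfLevelsInFixedFactor)
}

prepareSubjects <- function(numberOfSubjects) {
  # Every variance component needs more than one level.
  stopifnot(all(numberOfSubjects > 1))
  # Largest group first.
  sort(numberOfSubjects, decreasing = TRUE)
}

checkFixedLevels(1)
checkFixedLevels(c(10, 15))
prepareSubjects(c(20, 30))
```

The last call returns the group sizes with the larger group first.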

In order to save memory, only the generated random effects and the errors are stored. The dataset is only created on demand when the method generateData in the returned list is evaluated.
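The on-demand pattern can be sketched with a closure. This is a simplified illustration for a single variance component and no fixed factor; makeLazyDatasets and its column names are hypothetical and not part of the package.

```r
# Hypothetical generator, not the package's code.
makeLazyDatasets <- function(numberOfDatasets, numberOfSubjects,
                             numberOfReplicates) {
  numberOfRows <- numberOfSubjects * numberOfReplicates
  # Only these two matrices are kept in memory, one column per dataset.
  randomEffects <- matrix(rnorm(numberOfSubjects * numberOfDatasets),
                          nrow = numberOfSubjects)
  errors <- matrix(rnorm(numberOfRows * numberOfDatasets),
                   nrow = numberOfRows)
  subject <- rep(seq_len(numberOfSubjects), each = numberOfReplicates)

  list(
    # The data frame is assembled only when this function is called.
    generateData = function(datasetIndex) {
      data.frame(
        subject = factor(subject),
        y = randomEffects[subject, datasetIndex] + errors[, datasetIndex]
      )
    }
  )
}

set.seed(1)
lazy <- makeLazyDatasets(2, 5, 4)
head(lazy$generateData(1))
```

Calling generateData twice with the same index gives the same data frame, because the stored random numbers are reused rather than redrawn.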

The random variables are generated in a way that makes it easy to simulate more datasets. When starting from the same seed, the first generated datasets will be the same as for a previous call of generateAnovaDatasets with a smaller number of datasets to generate; see the examples.
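One way to obtain this property is to draw the random numbers dataset by dataset. The sketch below assumes this ordering; drawAll is a hypothetical helper and the package may achieve the same result differently.

```r
# Hypothetical sketch: draw the numbers dataset by dataset.
drawAll <- function(numberOfDatasets, numberOfSubjects, numberOfRows) {
  randomEffects <- matrix(NA_real_, numberOfSubjects, numberOfDatasets)
  errors <- matrix(NA_real_, numberOfRows, numberOfDatasets)
  for (i in seq_len(numberOfDatasets)) {
    # All draws for dataset i happen before any draw for dataset i + 1,
    # so the numbers for dataset i do not depend on how many datasets
    # follow.
    randomEffects[, i] <- rnorm(numberOfSubjects)
    errors[, i] <- rnorm(numberOfRows)
  }
  list(randomEffects = randomEffects, errors = errors)
}

set.seed(1)
small <- drawAll(2, 5, 20)
set.seed(1)
big <- drawAll(3, 5, 20)
stopifnot(identical(small$errors, big$errors[, 1:2]),
          identical(small$randomEffects, big$randomEffects[, 1:2]))
```

Drawing all random effects first and all errors afterwards would not have this property, because the errors would then start at a position in the random number stream that depends on the number of datasets.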

Value

list with generators and the original arguments

generateData:

function to generate data taking one argument, the dataset index.

createXMatrix:

function to generate X matrix taking one argument, the result of generateData.

createZMatrix:

function to generate Z matrix taking one argument, the result of generateData.

createLambdaMatrix:

function to generate Lambda matrix taking one argument, the result of generateData.

randomEffects:

function to return the generated random effects taking one argument, the dataset index.

sphericalRandomEffects:

function to return the generated spherical random effects taking one argument, the dataset index.

errors:

function to return the generated errors taking one argument, the dataset index.

allRandomEffects:

function without arguments that returns the matrix of all generated random effects.

allErrors:

function without arguments that returns the matrix of all generated errors.

numberOfDatasets:

numberOfDatasetsToGenerate as supplied

numberOfLevelsInFixedFactor:

numberOfLevelsInFixedFactor as supplied

numberOfSubjects:

numberOfSubjects, sorted in descending order

numberOfReplicates:

numberOfReplicates as supplied

numberOfRows:

number of rows in the generated dataset

trueBeta:

true values used for beta

trueSigma:

true value used for sigma

trueTheta:

true values used for theta

formula:

formula to fit the model using lmer

...:

additional arguments passed via ...

Author(s)

Manuel Koller

See Also

generateMixedEffectDatasets and createDatasetsFromList

Examples

  oneWay <- generateAnovaDatasets(2, 1, 5, 4)
  head(oneWay$generateData(1))
  head(oneWay$generateData(2))
  oneWay$formula
  head(oneWay$randomEffects(1))
  head(oneWay$sphericalRandomEffects(1))
  head(oneWay$errors(1))

  twoWayFixedRandom <- generateAnovaDatasets(2, 3, 5, 4)
  head(twoWayFixedRandom$generateData(1))
  twoWayFixedRandom$formula

  twoWayRandom <- generateAnovaDatasets(2, 1, c(3, 5), 4)
  head(twoWayRandom$generateData(1))
  twoWayRandom$formula

  large <- generateAnovaDatasets(2, c(10, 15), c(20, 30), 5)
  head(large$generateData(1))
  large$formula

  ## illustration of how to generate more datasets
  set.seed(1)
  datasets1 <- generateAnovaDatasets(2, 1, 5, 4)
  set.seed(1)
  datasets2 <- generateAnovaDatasets(3, 1, 5, 4)
  stopifnot(all.equal(datasets1$generateData(1), datasets2$generateData(1)),
            all.equal(datasets1$generateData(2), datasets2$generateData(2)))

kollerma/robustlmm documentation built on Jan. 14, 2024, 2:18 a.m.