createDataExt: Simulate test data


View source: R/createData.R

Description

This function creates a synthetic dataset with various problems such as overdispersion, zero-inflation, etc. It was modified to also accept externally generated environmental data as predictors.

Usage

createDataExt(replicates = 1, sampleSize = 10, extPredictors = NULL,
  intercept = 0, fixedEffects = 1, quadraticFixedEffects = NULL,
  numGroups = 10, randomEffectVariance = 1, overdispersion = 0,
  family = gaussian(), scale = 1, cor = 0, roundPoissonVariance = NULL,
  pZeroInflation = 0, binomialTrials = 1, temporalAutocorrelation = 0,
  spatialAutocorrelation = 0, factorResponse = F)

Arguments

replicates

number of datasets to create

sampleSize

sample size of the dataset

extPredictors

dataframe of environmental data generated externally

intercept

intercept (linear scale)

fixedEffects

vector of fixed effects (linear scale)

quadraticFixedEffects

vector of quadratic fixed effects (linear scale)

numGroups

number of groups for the random effect

randomEffectVariance

variance of the random effect (intercept)

overdispersion

if this is a numeric value, it will be used as the sd of a random normal variate that is added to the linear predictor. Alternatively, a random function can be provided that takes as input the linear predictor.

family

error distribution family, given as a family object (e.g. gaussian(), poisson(), binomial())

scale

scale if the distribution has a scale (e.g. sd for the Gaussian)

cor

correlation between predictors

roundPoissonVariance

if set, this adds uniform noise to the Poisson response. The aim is to create heteroscedasticity

pZeroInflation

probability to set any data point to zero

binomialTrials

number of trials for the binomial distribution. Only used if family is binomial()

temporalAutocorrelation

strength of temporal autocorrelation

spatialAutocorrelation

strength of spatial autocorrelation

factorResponse

should the response be transformed to a factor (intended to be used for 0/1 data)
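
To make the overdispersion and pZeroInflation arguments more concrete, the following base R sketch mimics the two mechanisms as they are described above: normal noise added to the linear predictor, and data points set to zero with a fixed probability. It is an illustration under stated assumptions, not the source code of createDataExt. The variable names, the uniform predictor on [-1, 1], and the Poisson family with log link are all choices made for this sketch.

```r
# Illustrative sketch only: this is NOT the implementation of createDataExt().
# All names and settings below are assumptions chosen for the example.
set.seed(42)

n <- 1000
intercept <- 1
fixedEffect <- 0.5
overdispersion <- 0.8   # sd of the normal noise added to the linear predictor
pZeroInflation <- 0.3   # probability of setting a data point to zero

# Assumed predictor: uniform on [-1, 1]
environment1 <- runif(n, min = -1, max = 1)

# Linear predictor on the link (log) scale
linearPredictor <- intercept + fixedEffect * environment1

# Overdispersion: add N(0, sd) noise before applying the inverse link
noisyPredictor <- linearPredictor + rnorm(n, mean = 0, sd = overdispersion)

# Poisson response via the inverse of the log link
response <- rpois(n, lambda = exp(noisyPredictor))

# Zero-inflation: each data point is set to zero with probability pZeroInflation
isForcedZero <- rbinom(n, size = 1, prob = pZeroInflation) == 1
response[isForcedZero] <- 0L

# Reference sample without either distortion
plainResponse <- rpois(n, lambda = exp(linearPredictor))

# Variance-to-mean ratio as a rough dispersion measure
dispersionIndex <- function(y) var(y) / mean(y)
print(c(plain = dispersionIndex(plainResponse),
        distorted = dispersionIndex(response)))
print(mean(response == 0))
```

The variance-to-mean ratio of the distorted sample should be clearly larger than that of the plain Poisson sample, and the share of zeros should exceed pZeroInflation because ordinary Poisson zeros are counted as well.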

Note

The basic structure of this function was taken from createData in the package DHARMa by Florian Hartig.

Author(s)

Lisa Huelsmann

Examples

library(lme4)  # provides lmer() for the mixed models fitted below


# complex environment

testData = createDataExt(sampleSize = 2000, intercept = 5, fixedEffects = c(2, 2, 0), cor = 0.5)

pairs(testData[, grepl("Environment", names(testData))])

fittedModel <- lmer(observedResponse ~ Environment1 + Environment2 + Environment3 + (1|group), data = testData)
summary(fittedModel)



# External predictors

nvar <- 4
mu <- sample(seq(0.01, 0.5, length.out = 50), nvar, replace = TRUE)
data_sim <- corrEnv(n = 2000, nvar = nvar, ngrad = 3, mu = mu, rho = 0.9, rho.non.corr = 0)

testData = createDataExt(sampleSize = 2000, extPredictors = data_sim[[2]], fixedEffects = c(2, 2, 1, 0))

hist(testData$observedResponse)
pairs(testData[, grepl("Environment", names(testData))])

fittedModel <- lmer(observedResponse ~ Environment1 + Environment2 + Environment3 + Environment4 + (1|group), data = testData)
summary(fittedModel)



### FROM DHARMa



# with zero-inflation

testData = createDataExt(sampleSize = 500, intercept = 2, fixedEffects = c(1),
                      overdispersion = 0, family = poisson(), quadraticFixedEffects = c(-3),
                      randomEffectVariance = 0, pZeroInflation = 0.6)

par(mfrow = c(1,2))
plot(testData$Environment1, testData$observedResponse)
hist(testData$observedResponse)


# binomial with multiple trials

testData = createDataExt(sampleSize = 40, intercept = 2, fixedEffects = c(1),
                      overdispersion = 0, family = binomial(), quadraticFixedEffects = c(-3),
                      randomEffectVariance = 0, binomialTrials = 20)

# proportion of successes = successes / (successes + failures)
plot(observedResponse1 / (observedResponse1 + observedResponse0) ~ Environment1, data = testData, ylab = "Proportion 1")


# spatial / temporal correlation

testData = createDataExt(sampleSize = 100, family = poisson(), spatialAutocorrelation = 3,
                      temporalAutocorrelation = 3)

par(mfrow = c(1,2))
plot(log(observedResponse) ~ time, data = testData)
plot(log(observedResponse) ~ x, data = testData)

rbslandau/ecoteach documentation built on May 26, 2019, 12:35 a.m.