simulation.study implements a simulation framework that samples repeatedly
from linear regression models and GLMs, allowing for between-sample heterogeneity.
The purpose is to allow the study of AIC and related statistics in the
context of model selection, with prediction quality as the target.
simulation.study(type = "lm", nsims = 1000, nsamples = c(20, 50, 100, 200,
  500, 1000, 2000, 5000, 10000), alpha = 0, beta.x = 1, nX = 10, nZ = 5,
  meanX = 0, meanZ = 0, XZCov = diag(nX + nZ), varmeanX = 0,
  varmeanZ = 0, simulate.from.data = FALSE, X = NULL, Y = NULL,
  var.res = 1, var.RE.Intercept = 0, var.RE.X = 0, rho = NULL,
  epsilon = NULL, corsim.var = NULL, noise.epsilon = NULL,
  step.k = qchisq(0.05, 1, lower.tail = FALSE), keep.dredge = FALSE,
  Xin.or.out = rep(TRUE, nX), glm.family = NULL, glm.offset = NULL,
  binomial.n = 1, filename = "results")
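Several defaults in the usage are R expressions rather than constants. The following sketch uses base R only, with the default nX = 10 and nZ = 5, to show what those expressions evaluate to. It does not call simulation.study itself.

```r
# Default covariate counts taken from the usage
nX <- 10
nZ <- 5

# step.k default: upper 5% point of a chi-squared distribution with 1 df
step.k <- qchisq(0.05, 1, lower.tail = FALSE)
print(round(step.k, 2))  # about 3.84

# XZCov default: identity matrix, i.e. unit variances and no correlation
XZCov <- diag(nX + nZ)
print(dim(XZCov))

# Xin.or.out default: one TRUE per X covariate
Xin.or.out <- rep(TRUE, nX)
print(length(Xin.or.out))
```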
type |
Character string determining what type of model to fit. At present, available model types are "lm" and "glm", with the former the default. |
nsims |
Number of simulated data sets to analyse for each sample size |
nsamples |
Vector of integers containing the sample sizes |
alpha |
Intercept for simulation model |
beta.x |
Either: vector of slopes for the X covariates; or a single numeric value for a constant slope for all X's |
nX |
Number of "real" covariates |
nZ |
Number of "spurious" covariates |
meanX |
Either: vector of means for the X covariates; or a single numeric value for a constant mean across all X's |
meanZ |
As for meanX but for the Z covariates |
XZCov |
Covariance matrix of the X's and Z's. Must be of dimension (nX+nZ) by
(nX+nZ). Ignored if |
varmeanX |
Either: vector of variances for the means of the X covariates; or a single numeric value for a constant variance across all X's. Non-zero values will produce a different set of covariate means for each individual simulated data set |
varmeanZ |
As for varmeanX but for the Z covariates |
simulate.from.data |
Logical. If |
X |
Matrix of "real" covariates; only used if |
Y |
Vector of "real" response variables; only used if |
var.res |
Residual variance of the simulation model |
var.RE.Intercept |
Random effect variance for the intercept |
var.RE.X |
Either: vector of random effect variances for the X covariate slopes; or a single numeric value for no random slopes in the X's |
rho |
A numeric constant specifying the mean correlation between the X's and the Z's |
epsilon |
A numeric constant specifying the level of variability around the mean
correlation rho; note that a necessary condition is |
corsim.var |
If generating the covariance matrices using rho and epsilon, we need to specify the variances (which otherwise are in the leading diagonal of XZCov) |
noise.epsilon |
A numeric constant used to specify whether XZCov is to vary from sample to sample. Higher values indicate more variability; note that this cannot be greater than 1 minus the largest absolute value of (off-diagonal) correlations in the corresponding correlation matrix |
step.k |
Numeric value of the AIC criterion in the stepwise analysis; defaults to about 3.84, corresponding to a p-value of 0.05 for both adding and removing variables |
keep.dredge |
Logical constant on whether to keep the dredge outputs
( |
Xin.or.out |
Vector of length nX (or |
glm.family |
If a GLM is to be fitted, the error distribution must be supplied (to the standard family argument to glm). |
glm.offset |
An (optional) offset can be supplied if fitting a GLM. (Not currently implemented.) |
binomial.n |
If fitting a binomial GLM, the number of trials per sample. Must be either a scalar (in which case the same number of trials is used for each sample) or a vector of length nsamples. The default is 1. |
filename |
Character string providing the root for the output files. Intermediate files are saved as "filenameX.RData" where X is an incremental count from 1 to length(nsamples). The final output is in "filename.RData". |
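The constraints on XZCov, noise.epsilon, and filename in the arguments can be checked by hand before starting a long run. The sketch below builds a covariance matrix with a constant correlation, computes the stated upper bound on noise.epsilon, and lists the implied output file names. The constant-correlation structure and the names rho0, vars, max.noise, intermediate.files, and final.file are assumptions made for illustration; this is not necessarily how the package generates matrices from rho, epsilon, and corsim.var.

```r
nX <- 10
nZ <- 5
p <- nX + nZ

# Hypothetical inputs: a common correlation and a vector of variances
rho0 <- 0.3
vars <- rep(2, p)

# Correlation matrix with rho0 off the diagonal, rescaled to a covariance
R <- matrix(rho0, p, p)
diag(R) <- 1
XZCov <- diag(sqrt(vars)) %*% R %*% diag(sqrt(vars))

# Checks: required (nX+nZ) by (nX+nZ) dimension and positive definiteness
stopifnot(all(dim(XZCov) == p))
ev <- eigen(XZCov, symmetric = TRUE, only.values = TRUE)$values
stopifnot(all(ev > 0))

# Upper bound on noise.epsilon: 1 minus the largest absolute
# off-diagonal correlation in the corresponding correlation matrix
Rc <- cov2cor(XZCov)
max.noise <- 1 - max(abs(Rc[upper.tri(Rc)]))
print(max.noise)

# Output files implied by the filename argument
filename <- "results"
nsamples <- c(20, 50, 100)
intermediate.files <- paste0(filename, seq_along(nsamples), ".RData")
final.file <- paste0(filename, ".RData")
print(intermediate.files)
print(final.file)
```

Any noise.epsilon above max.noise would violate the documented condition for this particular matrix.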
If keep.dredge==FALSE (the default), the output is a list of length equal to the length
of nsamples, each element containing two matrices, reg.bias (prediction bias for each
sample) and reg.rmse (root mean square error of prediction for each sample). Each
of these two matrices has nsims rows and four columns, corresponding to model
selection by AICc, AIC, BIC and stepwise regression. If keep.dredge==TRUE, then the output
is a list of lists, with a top-level list of length equal to the length of nsamples as before, and
with the next level having length equal to nsims; this inner list contains the full model set
output from dredge, converted to a matrix for storage efficiency.
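To make the keep.dredge==FALSE structure concrete, the sketch below builds a mock object of the described shape and summarises it, with one row per simulated data set in each matrix. The values are random placeholders, not results from simulation.study, and the column labels and object names (mock, method.names, mean.rmse) are assumptions; the actual column names in the package output may differ.

```r
set.seed(1)
nsims <- 5
nsamples <- c(20, 50)
method.names <- c("AICc", "AIC", "BIC", "stepwise")  # assumed labels

# Placeholder matrix: one row per simulation, one column per method
make.matrix <- function() {
  matrix(rnorm(nsims * 4), nrow = nsims, ncol = 4,
         dimnames = list(NULL, method.names))
}

# One list element per sample size, each holding reg.bias and reg.rmse
mock <- lapply(nsamples, function(n) {
  list(reg.bias = make.matrix(), reg.rmse = abs(make.matrix()))
})

# Mean RMSE per selection method, one row per sample size
mean.rmse <- t(sapply(mock, function(res) colMeans(res$reg.rmse)))
rownames(mean.rmse) <- nsamples
print(mean.rmse)
```

With real output, the same summary would be applied to the object saved in the final .RData file instead of mock.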