simRCPdata | R Documentation |
Simulates a data set from a mixture-of-experts model for RCP (for region of common profile) types.
simRCPdata(nRCP=3, S=20, n=200, p.x=3, p.w=0, alpha=NULL, tau=NULL, beta=NULL, gamma=NULL, logDisps=NULL, powers=NULL, X=NULL, W=NULL, offset=NULL, dist="Bernoulli")
nRCP |
Integer giving the number of RCPs |
S |
Integer giving the number of species |
n |
Integer giving the number of observations (sites) |
p.x |
Integer giving the number of covariates (including the intercept) for the model for the latent RCP types |
p.w |
Integer giving the number of covariates (excluding the intercept) for the model for the species data |
alpha |
Numeric vector of length S. Specifies the mean prevalence for each species, on the logit scale |
tau |
Numeric matrix of dimension c(nRCP-1,S). Specifies each species difference from the mean to each RCPs mean for the first nRCP-1 RCPs. The last RCP means are calculated using the sum-to-zero constraints |
beta |
Numeric matrix of dimension c(nRCP-1,p.x). Specifies the RCP's dependence on the covariates (in X) |
gamma |
Numeric matrix of dimension c(n,p.w). Specifies the species' dependence on the covariates (in W) |
logDisps |
Logartihm of the (over-)dispersion parameters for each species for negative binomial, Tweedie and Normal models |
powers |
Power parameters for each species for Tweedie model |
X |
Numeric matrix of dimension c(n,p.x). Specifies the covariates for the RCP model. Must include the intercept, if one is wanted. Default is random numbers in a matrix of the right size. |
W |
Numeric matrix of dimension c(n,p.w). Specifies the covariates for the species model. Must not include the intercept. Unless you want it included twice. Default is to give random levels of a two-level factor. |
offset |
Numeric vector of size n. Specifies any offset to be included into the species level model. |
dist |
Text string. Specifies the distribution of the species data. Current options are "Bernoulli" (default), "Poisson", "NegBin", "Tweedie" and "Normal". |
A data frame that contains the outcomes (species data) and the covariates (environmental data and species-level covariates). This data.frame has a number of special attirbutes, which are information about the model underlying the data. They are:
RCPs |
the true, but unobserved, RCP types |
pis |
the true prior probabilities |
alpha |
the species overall prevalences, on linear predictor scale |
tau |
the deviation from alpha for each RCP type, on linear predictor scale |
beta |
the parameters controlling how the RCP types depend on the covariates |
gamma |
the parameters controlling how each species depends on the species-level covariates |
logDisps |
the logarithm of the dispersion parameter for each species |
mu |
the probabilities of each species occuring in each RCP type |
Scott D. Foster
Foster, S.D., Givens, G.H., Dornan, G.J., Dunstan, P.K. and Darnell, R. (2013) Modelling Regions of Common Profiles Using Biological and Environmental Data. Environmetrics.
#generates synthetic data set.seed( 151) n <- 100 S <- 10 nRCP <- 3 my.dist <- "NegBin" X <- as.data.frame( cbind( x1=runif( n, min=-10, max=10), x2=runif( n, min=-10, max=10))) Offy <- log( runif( n, min=30, max=60)) pols <- list() pols[[1]] <- poly( X$x1, degree=3) #important to scale covariates so that regimix can get half-way decent starting values pols[[2]] <- poly( X$x2, degree=3) X <- as.matrix( cbind( 1, X, pols[[1]], pols[[2]])) colnames( X) <- c("const", 'x1', 'x2', paste( "x1",1:3,sep='.'), paste( "x2",1:3,sep='.')) p.x <- ncol( X[,-(2:3)]) p.w <- 3 W <- matrix(sample( c(0,1), size=(n*p.w), replace=TRUE), nrow=n, ncol=p.w) colnames( W) <- paste( "w",1:3,sep=".") alpha <- rnorm( S) tau.var <- 0.5 b <- sqrt( tau.var/2) #a double exponential for RCP effects tau <- matrix( rexp( n=(nRCP-1)*S, rate=1/b) - rexp( n=(nRCP-1)*S, rate=1/b), nrow=nRCP-1, ncol=S) beta <- 0.2 * matrix( c(-1.2, -2.6, 0.2, -23.4, -16.7, -18.7, -59.2, -76.0, -14.2, -28.3, -36.8, -17.8, -92.9,-2.7), nrow=nRCP-1, ncol=p.x) gamma <- matrix( rnorm( S*p.w), ncol=p.w, nrow=S) logDisp <- log( rexp( S, 1)) set.seed(121) simDat <- simRCPdata( nRCP=nRCP, S=S, p.x=p.x, p.w=p.w, n=n, alpha=alpha, tau=tau, beta=beta, gamma=gamma, X=X[,-(2:3)], W=W, dist=my.dist, logDisp=logDisp, offset=Offy)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.