simData | R Documentation |
Simulates data from a (multivariate) hidden Markov model. Movement data are assumed to be in Cartesian coordinates (not longitude/latitude) and can be generated with or without observation error attributable to temporal irregularity or location measurement error.
simData( nbAnimals = 1, nbStates = 2, dist, Par, beta = NULL, delta = NULL, formula = ~1, formulaDelta = NULL, mixtures = 1, formulaPi = NULL, covs = NULL, nbCovs = 0, spatialCovs = NULL, zeroInflation = NULL, oneInflation = NULL, circularAngleMean = NULL, centers = NULL, centroids = NULL, angleCovs = NULL, obsPerAnimal = c(500, 1500), initialPosition = c(0, 0), DM = NULL, userBounds = NULL, workBounds = NULL, betaRef = NULL, mvnCoords = NULL, stateNames = NULL, model = NULL, states = FALSE, retrySims = 0, lambda = NULL, errorEllipse = NULL, ncores = 1 ) simHierData( nbAnimals = 1, hierStates, hierDist, Par, hierBeta = NULL, hierDelta = NULL, hierFormula = NULL, hierFormulaDelta = NULL, mixtures = 1, formulaPi = NULL, covs = NULL, nbHierCovs = NULL, spatialCovs = NULL, zeroInflation = NULL, oneInflation = NULL, circularAngleMean = NULL, centers = NULL, centroids = NULL, angleCovs = NULL, obsPerLevel, initialPosition = c(0, 0), DM = NULL, userBounds = NULL, workBounds = NULL, mvnCoords = NULL, model = NULL, states = FALSE, retrySims = 0, lambda = NULL, errorEllipse = NULL, ncores = 1 )
nbAnimals |
Number of observed individuals to simulate. |
nbStates |
Number of behavioural states to simulate. |
dist |
A named list indicating the probability distributions of the data streams. Currently
supported distributions are 'bern', 'beta', 'cat', 'exp', 'gamma', 'lnorm', 'logis', 'negbinom', 'norm', 'mvnorm2' (bivariate normal distribution), 'mvnorm3' (trivariate normal distribution),
'pois', 'rw_norm' (normal random walk), 'rw_mvnorm2' (bivariate normal random walk), 'rw_mvnorm3' (trivariate normal random walk), 'vm', 'vmConsensus', 'weibull', and 'wrpcauchy'. For example,
|
Par |
A named list containing vectors of initial state-dependent probability distribution parameters for
each data stream specified in If |
beta |
Matrix of regression parameters for the transition probabilities (more information in "Details"). |
delta |
Initial value for the initial distribution of the HMM. Default: |
formula |
Regression formula for the transition probability covariates. Default: |
formulaDelta |
Regression formula for the initial distribution. Default: |
mixtures |
Number of mixtures for the state transition probabilities (i.e. discrete random effects *sensu* DeRuiter et al. 2017). Default: |
formulaPi |
Regression formula for the mixture distribution probabilities. Default: |
covs |
Covariate values to include in the simulated data, as a dataframe. The names of any covariates specified by |
nbCovs |
Number of covariates to simulate (0 by default). Does not need to be specified if
|
spatialCovs |
List of |
zeroInflation |
A named list of logicals indicating whether the probability distributions of the data streams should be zero-inflated. If |
oneInflation |
A named list of logicals indicating whether the probability distributions of the data streams should be one-inflated. If |
circularAngleMean |
An optional named list indicating whether to use circular-linear (FALSE) or circular-circular (TRUE)
regression on the mean of circular distributions ('vm' and 'wrpcauchy') for turning angles. For example,
Alternatively, |
centers |
2-column matrix providing the x-coordinates (column 1) and y-coordinates (column 2) for any activity centers (e.g., potential
centers of attraction or repulsion) from which distance and angle covariates will be calculated based on the simulated location data. These distance and angle
covariates can be included in |
centroids |
List where each element is a data frame consisting of at least |
angleCovs |
Character vector indicating the names of any circular-circular regression angular covariates in |
obsPerAnimal |
Either the number of observations per animal (if single value) or the bounds of the number of observations per animal (if vector of two values). In the latter case,
the numbers of obervations generated for each animal are uniformously picked from this interval. Alternatively, |
initialPosition |
2-vector providing the x- and y-coordinates of the initial position for all animals. Alternatively, |
DM |
An optional named list indicating the design matrices to be used for the probability distribution parameters of each data
stream. Each element of Design matrices specified using formulas allow standard functions in R formulas
(e.g., |
userBounds |
An optional named list of 2-column matrices specifying bounds on the natural (i.e, real) scale of the probability
distribution parameters for each data stream. For example, for a 2-state model using the wrapped Cauchy ('wrpcauchy') distribution for
a data stream named 'angle' with |
workBounds |
An optional named list of 2-column matrices specifying bounds on the working scale of the probability distribution, transition probability, and initial distribution parameters. For each matrix, the first column pertains to the lower bound and the second column the upper bound.
For data streams, each element of |
betaRef |
Numeric vector of length |
mvnCoords |
Character string indicating the name of location data that are to be simulated using a multivariate normal distribution. For example, if |
stateNames |
Optional character vector of length nbStates indicating state names. |
model |
A |
states |
|
retrySims |
Number of times to attempt to simulate data within the spatial extent of |
lambda |
Observation rate for location data. If |
errorEllipse |
List providing the upper bound for the semi-major axis ( |
ncores |
Number of cores to use for parallel processing. Default: 1 (no parallel processing). |
hierStates |
A hierarchical model structure |
hierDist |
A hierarchical data structure |
hierBeta |
A hierarchical data structure |
hierDelta |
A hierarchical data structure |
hierFormula |
A hierarchical formula structure for the transition probability covariates for each level of the hierarchy ('formula'). Default: |
hierFormulaDelta |
A hierarchical formula structure for the initial distribution covariates for each level of the hierarchy ('formulaDelta'). Default: |
nbHierCovs |
A hierarchical data structure |
obsPerLevel |
A hierarchical data structure |
simHierData
is very similar to simData
except that instead of simply specifying the number of states (nbStates
), distributions (dist
), observations (obsPerAnimal
), covariates (nbCovs
), and a single t.p.m. formula (formula
), the hierStates
argument specifies the hierarchical nature of the states,
the hierDist
argument specifies the hierarchical nature of the data streams, the obsPerLevel
argument specifies the number of observations for each level of the hierarchy, the nbHierCovs
argument specifies the number of covariates for each level of the hierarchy, and the hierFormula
argument specifies a t.p.m. formula for each level of the hierarchy.
All of the hierarhcial arguments in simHierData
are specified as Node
objects from the data.tree
package.
x- and y-coordinate location data are generated only if valid 'step' and 'angle' data streams are specified. Vaild distributions for 'step' include 'gamma', 'weibull', 'exp', and 'lnorm'. Valid distributions for 'angle' include 'vm' and 'wrpcauchy'. If only a valid 'step' data stream is specified, then only x-coordinates are generated.
If DM
is specified for a particular data stream, then the initial values are specified on
the working (i.e., beta) scale of the parameters. The working scale of each parameter is determined by the link function used.
The function getParDM
is intended to help with obtaining initial values on the working scale when specifying a design matrix and other
parameter constraints.
Simulated data that are temporally regular (i.e., lambda=NULL
) and without location measurement error (i.e., errorEllipse=NULL
) are returned
as a momentuHMMData
(or momentuHierHMMData
) object suitable for analysis using fitHMM
.
Simulated location data that are temporally-irregular (i.e., lambda>0
) and/or with location measurement error (i.e., errorEllipse!=NULL
) are returned
as a data frame suitable for analysis using crawlWrap
.
The matrix beta
of regression coefficients for the transition probabilities has
one row for the intercept, plus one row for each covariate, and one column for
each non-diagonal element of the transition probability matrix. For example, in a 3-state
HMM with 2 formula
covariates, the matrix beta
has three rows (intercept + two covariates)
and six columns (six non-diagonal elements in the 3x3 transition probability matrix -
filled in row-wise).
In a covariate-free model (default), beta
has one row, for the intercept.
State-specific formulas can be specified in DM
using special formula functions. These special functions can take
the names paste0("state",1:nbStates)
(where the integer indicates the state-specific formula). For example,
DM=list(step=list(mean=~cov1+state1(cov2),sd=~cov2+state2(cov1)))
includes cov1
on the mean parameter for all states, cov2
on the mean parameter for state 1, cov2
on the sd parameter for all states, and cov1
on the sd parameter for state 2.
State- and parameter-specific formulas can be specified for transition probabilities in formula
using special formula functions.
These special functions can take the names paste0("state",1:nbStates)
(where the integer indicates the current state from which transitions occur),
paste0("toState",1:nbStates)
(where the integer indicates the state to which transitions occur),
or paste0("betaCol",nbStates*(nbStates-1))
(where the integer indicates the column of the beta
matrix). For example with nbStates=3
,
formula=~cov1+betaCol1(cov2)+state3(cov3)+toState1(cov4)
includes cov1
on all transition probability parameters, cov2
on the beta
column corresponding
to the transition from state 1->2, cov3
on transition probabilities from state 3 (i.e., beta
columns corresponding to state transitions 3->1 and 3->2),
and cov4
on transition probabilities to state 1 (i.e., beta
columns corresponding to state transitions 2->1 and 3->1).
Cyclical relationships (e.g., hourly, monthly) may be simulated using the consinor(x,period)
special formula function for covariate x
and sine curve period of time length period
. For example, if
the data are hourly, a 24-hour cycle can be simulated using ~cosinor(cov1,24)
, where the covariate cov1
is a repeating series
of integers 0,1,...,23,0,1,...,23,0,1,...
(note that simData
will not do this for you, the appropriate covariate must be specified using the covs
argument; see example below).
The cosinor(x,period)
function converts x
to 2 covariates
cosinorCos(x)=cos(2*pi*x/period)
and consinorSin(x)=sin(2*pi*x/period
for inclusion in the model (i.e., 2 additional parameters per state). The amplitude of the sine wave
is thus sqrt(B_cos^2 + B_sin^2)
, where B_cos
and B_sin
are the working parameters correponding to cosinorCos(x)
and cosinorSin(x)
, respectively (e.g., see Cornelissen 2014).
When the circular-circular regression model is used, the special function angleFormula(cov,strength,by)
can be used in DM
for the mean of angular distributions (i.e. 'vm', 'vmConsensus', and 'wrpcauchy'),
where cov
is an angle covariate (e.g. wind direction), strength
is a positive real covariate (e.g. wind speed), and by
is an optional factor variable for individual- or group-level effects (e.g. ID, sex). This allows angle covariates to be weighted based on their strength or importance at time step t as in
Rivest et al. (2016).
If the length of covariate values passed (either through 'covs', or 'model') is not the same
as the number of observations suggested by 'nbAnimals' and 'obsPerAnimal' (or 'obsPerLevel' for simHierData
), then the series of
covariates is either shortened (removing last values - if too long) or extended (starting
over from the first values - if too short).
For simData
, when covariates are not included in formulaDelta
(i.e. formulaDelta=NULL
), then delta
is specified as a vector of length nbStates
that
sums to 1. When covariates are included in formulaDelta
, then delta
must be specified
as a k x (nbStates
-1) matrix of working parameters, where k is the number of regression coefficients and the columns correspond to states 2:nbStates
. For example, in a 3-state
HMM with formulaDelta=~cov1+cov2
, the matrix delta
has three rows (intercept + two covariates)
and 2 columns (corresponding to states 2 and 3). The initial distribution working parameters are transformed to the real scale as exp(covsDelta*Delta)/rowSums(exp(covsDelta*Delta))
, where covsDelta
is the N x k design matrix, Delta=cbind(rep(0,k),delta)
is a k x nbStates
matrix of working parameters,
and N=length(unique(data$ID))
.
For simHierData
, delta
must be specified
as a k x (nbStates
-1) matrix of working parameters, where k is the number of regression coefficients and the columns correspond to states 2:nbStates
.
If the simulated data are temporally regular (i.e., lambda=NULL
) with no measurement error (i.e., errorEllipse=NULL
), an object momentuHMMData
(or momentuHierHMMData
),
i.e., a dataframe of:
ID |
The ID(s) of the observed animal(s) |
... |
Data streams as specified by |
x |
Either easting or longitude (if data streams include valid non-negative distribution for 'step') |
y |
Either norting or latitude (if data streams include valid non-negative distribution for 'step') |
... |
Covariates (if any) |
If simulated location data are temporally irregular (i.e., lambda>0
) and/or include measurement error (i.e., errorEllipse!=NULL
), a dataframe of:
time |
Numeric time of each observed (and missing) observation |
ID |
The ID(s) of the observed animal(s) |
x |
Either easting or longitude observed location |
y |
Either norting or latitude observed location |
... |
Data streams that are not derived from location (if applicable) |
... |
Covariates at temporally-regular true ( |
mux |
Either easting or longitude true location |
muy |
Either norting or latitude true location |
error_semimajor_axis |
error ellipse semi-major axis (if applicable) |
error_semiminor_axis |
error ellipse semi-minor axis (if applicable) |
error_ellipse_orientation |
error ellipse orientation (if applicable) |
ln.sd.x |
log of the square root of the x-variance of bivariate normal error (if applicable; required for error ellipse models in |
ln.sd.y |
log of the square root of the y-variance of bivariate normal error (if applicable; required for error ellipse models in |
error.corr |
correlation term of bivariate normal error (if applicable; required for error ellipse models in |
Cornelissen, G. 2014. Cosinor-based rhythmometry. Theoretical Biology and Medical Modelling 11:16.
McClintock BT, London JM, Cameron MF, Boveng PL. 2015. Modelling animal movement using the Argos satellite telemetry location error ellipse. Methods in Ecology and Evolution 6(3):266-277.
Rivest, LP, Duchesne, T, Nicosia, A, Fortin, D, 2016. A general angular regression model for the analysis of data on animal movement in ecology. Journal of the Royal Statistical Society: Series C (Applied Statistics), 65(3):445-463.
Leos-Barajas, V., Gangloff, E.J., Adam, T., Langrock, R., van Beest, F.M., Nabe-Nielsen, J. and Morales, J.M. 2017. Multi-scale modeling of animal movement and general behavior data using hidden Markov models with hierarchical structures. Journal of Agricultural, Biological and Environmental Statistics, 22 (3), 232-248.
prepData
, simObsData
# 1. Pass a fitted model to simulate from # (m is a momentuHMM object - as returned by fitHMM - automatically loaded with the package) # We keep the default nbAnimals=1. m <- example$m obsPerAnimal=c(50,100) data <- simData(model=m,obsPerAnimal=obsPerAnimal) ## Not run: # 2. Pass the parameters of the model to simulate from stepPar <- c(1,10,1,5,0.2,0.3) # mean_1, mean_2, sd_1, sd_2, zeromass_1, zeromass_2 anglePar <- c(pi,0,0.5,2) # mean_1, mean_2, concentration_1, concentration_2 omegaPar <- c(1,10,10,1) # shape1_1, shape1_2, shape2_1, shape2_2 stepDist <- "gamma" angleDist <- "vm" omegaDist <- "beta" data <- simData(nbAnimals=4,nbStates=2,dist=list(step=stepDist,angle=angleDist,omega=omegaDist), Par=list(step=stepPar,angle=anglePar,omega=omegaPar),nbCovs=2, zeroInflation=list(step=TRUE), obsPerAnimal=obsPerAnimal) # 3. Include covariates # (note that it is useless to specify "nbCovs", which are overruled # by the number of columns of "cov") cov <- data.frame(temp=log(rnorm(500,20,5))) stepPar <- c(log(10),0.1,log(100),-0.1,log(5),log(25)) # working scale parameters for step DM anglePar <- c(pi,0,0.5,2) # mean_1, mean_2, concentration_1, concentration_2 stepDist <- "gamma" angleDist <- "vm" data <- simData(nbAnimals=2,nbStates=2,dist=list(step=stepDist,angle=angleDist), Par=list(step=stepPar,angle=anglePar), DM=list(step=list(mean=~temp,sd=~1)), covs=cov, obsPerAnimal=obsPerAnimal) # 4. Include example 'forest' spatial covariate raster layer # nbAnimals and obsPerAnimal kept small to reduce example run time spatialCov<-list(forest=forest) data <- simData(nbAnimals=1,nbStates=2,dist=list(step=stepDist,angle=angleDist), Par=list(step=c(100,1000,50,100),angle=c(0,0,0.1,5)), beta=matrix(c(5,-10,-25,50),nrow=2,ncol=2,byrow=TRUE), formula=~forest,spatialCovs=spatialCov, obsPerAnimal=250,states=TRUE, retrySims=100) # 5. Specify design matrix for 'omega' data stream # natural scale parameters for step and angle stepPar <- c(1,10,1,5) # shape_1, shape_2, scale_1, scale_2 anglePar <- c(pi,0,0.5,0.7) # mean_1, mean_2, concentration_1, concentration_2 # working scale parameters for omega DM omegaPar <- c(log(1),0.1,log(10),-0.1,log(10),-0.1,log(1),0.1) stepDist <- "weibull" angleDist <- "wrpcauchy" omegaDist <- "beta" data <- simData(nbStates=2,dist=list(step=stepDist,angle=angleDist,omega=omegaDist), Par=list(step=stepPar,angle=anglePar,omega=omegaPar),nbCovs=2, DM=list(omega=list(shape1=~cov1,shape2=~cov2)), obsPerAnimal=obsPerAnimal,states=TRUE) # 6. Include temporal irregularity and location measurement error lambda <- 2 # expect 2 observations per time step errorEllipse <- list(M=50,m=25,r=180) obsData <- simData(model=m,obsPerAnimal=obsPerAnimal, lambda=lambda, errorEllipse=errorEllipse) # 7. Cosinor and state-dependent formulas nbStates<-2 dist<-list(step="gamma") Par<-list(step=c(100,1000,50,100)) # include 24-hour cycle on all transition probabilities # include 12-hour cycle on transitions from state 2 formula=~cosinor(hour24,24)+state2(cosinor(hour12,12)) # specify appropriate covariates covs<-data.frame(hour24=0:23,hour12=0:11) beta<-matrix(c(-1.5,1,1,NA,NA,-1.5,-1,-1,1,1),5,2) # row names for beta not required but can be helpful rownames(beta)<-c("(Intercept)", "cosinorCos(hour24, 24)", "cosinorSin(hour24, 24)", "cosinorCos(hour12, 12)", "cosinorSin(hour12, 12)") data.cos<-simData(nbStates=nbStates,dist=dist,Par=Par, beta=beta,formula=formula,covs=covs) # 8. Piecewise constant B-spline on step length mean and angle concentration nObs <- 1000 # length of simulated track cov <- data.frame(time=1:nObs) # time covariate for splines dist <- list(step="gamma",angle="vm") stepDM <- list(mean=~splines2::bSpline(time,df=2,degree=0),sd=~1) angleDM <- list(mean=~1,concentration=~splines2::bSpline(time,df=2,degree=0)) DM <- list(step=stepDM,angle=angleDM) Par <- list(step=c(log(1000),1,-1,log(100)),angle=c(0,log(10),2,-5)) data.spline<-simData(obsPerAnimal=nObs,nbStates=1,dist=dist,Par=Par,DM=DM,covs=cov) # 9. Initial state (delta) based on covariate nObs <- 100 dist <- list(step="gamma",angle="vm") Par <- list(step=c(100,1000,50,100),angle=c(0,0,0.01,0.75)) # create sex covariate cov <- data.frame(sex=factor(rep(c("F","M"),each=nObs))) # sex covariate formulaDelta <- ~ sex + 0 # Female begins in state 1, male begins in state 2 delta <- matrix(c(-100,100),2,1,dimnames=list(c("sexF","sexM"),"state 2")) data.delta<-simData(nbAnimals=2,obsPerAnimal=nObs,nbStates=2,dist=dist,Par=Par, delta=delta,formulaDelta=formulaDelta,covs=cov, beta=matrix(-1.5,1,2),states=TRUE) ## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.