Nothing
#' Creates a dataframe with all the design data for a particular parameter in a
#' crm model
#'
#' Creates a dataframe with all the design data for a particular parameter in a
#' crm model which currently includes "cjs" or "js". These design data are
#' fundamentally different than the design data created for \code{mark} models
#' as explained below.
#'
#' This function is intended to be called from \code{\link{make.design.data}}.
#' It takes the data in \code{x} and creates a dataframe with all of the data
#' needed to fit capture-recapture models (crm) which currently includes "cjs"
#' (Cormack-Jolly-Seber) or "js" (POPAN formulation of the Jolly-Seber model).
#' Before jumping into the details it is useful to have an understanding of the
#' differences between MARK (via the \code{mark in RMark} function) and the
#' package \code{mra} written by Trent McDonald and how they relate to the
#' implementation in \code{\link{cjs_admb}}. With MARK, animals can be placed in
#' groups and the parameters for the model specified via PIMs (parameter
#' index matrices) link the parameters to the specific animals. For
#' example, if for a particular group the \code{Phi} PIM is
#'
#' \preformatted{
#' 1 2 3
#' 4 5
#' 6
#' }
#'
#' Then animals in that group that were first caught/released on occasion 1
#' have the parameters 1,2,3 for the 3 occasions. Those first caught/released
#' on occasion 2 have parameters 4 and 5 for occasions 2 and 3 and those first
#' caught/released on occasion 3 have parameter 6 for occasion 3. Another
#' group of animals would have a different set of indices for the same set of
#' parameters so they could be discriminated. Many people find this rather
#' confusing until they get used to it but even then if you have many different
#' groups and many occasions, then the indexing process is prone to error.
#' Thus, the rationale for RMark which automates the PIM construction and its
#' use is largely transparent to the user. What RMark does is to create a data
#' structure called design data that automatically assigns \code{design data}
#' to the indices in the PIM. For example, 1 to 6 would be given the data used
#' to create that group and 1 to 3 would be assigned to cohort 1, ... and 1
#' would be assigned to occasion 1 and 2 and 4 would be assigned to occasion 2
#' etc. It also creates an age field which follows the diagonals and can be
#' initialized with the intial age at the time of first capture which is group
#' specific. With a formula and these design data a design matrix can be
#' constructed for the model where the row in the design matrix is the
#' parameter index (e.g., the first 6 rows would be for parameters 1 to 6 as
#' shown above). That would be all good except for individual covariates which
#' are not group-specific. MARK handles individual covariates by specifying
#' the covariate name (eg "weight") as a string in the design matrix. Then for
#' each capture history in plugs in the actual covariate values for that animal
#' to complete the design matrix for that animal. For more details see Laake
#' and Rexstad (2008).
#'
#' From a brief look at package \code{mra} and personal communication with
#' Trent McDonald, I give the following brief and possibly incorrect
#' description of the pacakge \code{mra} at the time of writing (28 Aug 2008).
#' In that package, the whole concept of PIMS is abandoned and instead
#' covariates are constructed for each occasion for each animal. Thus, each
#' animal is effectively put in its own group and it has a parameter for each
#' occasion. This novel approach is quite effective at blurring the lines
#' between design data and individual covariate data and it removes the needs
#' for PIMS because each animal (or unique capture history) has a real
#' parameter for each occasion. The downside of the pacakge \code{mra} is that
#' every covariate is assumed to be time-varying and any factor variables like
#' \code{time} are coded manually as dummy variables for each level rather than
#' using the R facilities for handling factor variables in the formula to
#' create the design matrix.
#'
#' In the \code{crm,cjs_admb,js} functions in this package I have used basic idea in
#' \code{mra} but I have taken a different approach to model development that
#' allows for time-varying covariates but does not restrict each covariate to
#' be time-varying and factor variables are used as such which removes the need
#' to construct dummy variables; although the latter could still be used. First
#' an example is in order to explain how this all works. Consider the follwing
#' set of capture histories for small capture-recapture data set with 4 capture
#' occasions:
#'
#' \preformatted{ 1001 0111 0011 }
#'
#' To relate the current structure to the concept of PIMS I define the
#' following matrix
#'
#' \preformatted{
#'
#' 1 2 3
#' 4 5 6
#' 7 8 9
#' }
#'
#' If you think of these as \code{Phi} parameter indices, then 1 to 3 are
#' survivals for the intervals 1-2,2-3,3-4 for animal 1, and 4-6 and 7-9 are
#' the same for animals 2 and 3. This matrix would have a row for each animal.
#' Now you'll notice that for animal 2 parameter 4 is not needed and for animal
#' 3, parameters 7 and 8 are not needed because they are prior to their entry
#' in the study. While that is certainly true there is no harm in having them
#' and the advantage comes in being able to have a complete matrix in R rather
#' than having a triangular one.
#'
#' So now we are finally getting to the description of what this function does.
#' It constructs a dataframe with a row for each animal-occasion. Following on
#' with the example above, depending on how the arguments are set the following
#' dataframe could be constructed:
#'
#' \preformatted{
#' row time Time cohort Cohort age Age initial.age
#' 1 1 0 1 0 0 0 0
#' 2 2 1 1 0 1 1 0
#' 3 3 2 1 0 2 2 0
#' 4 1 0 2 1 0 0 0
#' 5 2 1 2 1 1 1 0
#' 6 3 2 2 1 2 2 0
#' 7 1 0 3 2 0 0 0
#' 8 2 1 3 2 1 1 0
#' 9 3 2 3 2 2 2 0
#' }
#'
#' The fields starting with a lowercase character (time,cohort,age) are created
#' as factor variables and those with an uppercase are created as numeric
#' variables. Note: the \code{age} field is bounded below by the minimum
#' \code{initial.age} to avoid creating factor levels with non-existent data
#' for occasions prior to first capture that are not used. For example, an
#' animal first caught on occasion 2 with an \code{initial.age=0} is
#' technically -1 on occasion 1 with a \code{time.interval} of 1. However, that
#' parameter would never be used in the model and we don't want a factor level
#' of -1.
#'
#' A formula of ~time would create a design matrix with 3 columns (one for each
#' factor level) and ~Time would create one with 2 columns with the first being
#' an intercept and the second with the numeric value of Time.
#'
#' Now here is the simplicity of it. The following few expressions in R will
#' convert this dataframe into a matrix of real parameters (assuming
#' \code{beta=c(1,1,1)} that are structured like the square PIM matrix without
#' the use of PIMs.
#'
#' \preformatted{
#' nocc=4
#' x=data.frame(ch=c("1001","0111","0011"),stringsAsFactors=FALSE)
#' beta=c(1,1,1)
#' x.proc=process.data(x,model="cjs")
#' Phi.dmdf=make.design.data(x.proc)$Phi
#' Phi.dm=create.dm(Phi.dmdf,~time)
#' Phimat=matrix(plogis(Phi.dm%*%beta),nrow=nrow(x),ncol=nocc-1,byrow=TRUE)
#' }
#'
#' Note that the order of the columns for \code{Phi.dmdf} differs slightly from
#' what is shown above. Also, \code{plogis} is an R function that computes the
#' inverse-logit. Once you have the matrix of \code{Phi} and \code{p} values
#' the calculation of the likelihood is straightforward using the formulation
#' of Pledger et al. (2003) (see \code{\link{cjs.lnl}}). The values in the
#' design dataframe are not limited to these fields. The 2 arguments
#' \code{time.varying} and \code{fields} are vectors of character strings which
#' specify the names of the dataframe columns in \code{x} that should be
#' included. For example if \code{x} contained a field \code{sex} with the
#' values "M","F","M" for the 3 records in our example, and the argument
#' \code{fields=c("sex")} was used then a column named \code{sex} would be
#' included in design dataframe with the values
#' "M","M","M","F","F","F","M","M","M". The value of the column \code{sex} in
#' \code{x} is repeated for each of the occasions for that animal(capture
#' history). Now if the value of the field changes for each occasion then we
#' use the argument \code{time.varying} instead. To differentiate the values
#' in the dataframe \code{x} the columns are named with an occasion number.
#' For example, if the variable was named \code{cov} and it was to be used for
#' \code{Phi}, then the variables would be named \code{cov1,cov2,cov3} in
#' \code{x}. Let's say that x was structured as follows:
#'
#' \preformatted{
#' ch cov1 cov2 cov3
#' 1001 1 0 1
#' 0111 0 2 1
#' 0011 0 0 0
#' }
#'
#' If you specified the argument \code{time.varying=c("cov")} then in the
#' design dataframe a field named \code{cov} would be created and the values
#' would be \code{1,0,1,0,2,1,0,0,0}. Thus the value is both animal and
#' occasion specific. Had the covariate been used for \code{p} then they would
#' be named \code{cov2,cov3,cov4} because the covariate is for those occasions
#' for \code{p} whereas for \code{Phi} the covariate is labelled with the
#' occasion that begins the interval. Any number of fields can be specified in \code{fields} and
#' \code{time.varying} that are specified in \code{x}.
#'
#' The input dataframe \code{x} has a few minor requirements on its structure.
#' First, it must contain a field called \code{ch} which contains the
#' capture-history as a string. Note that in general strings are converted to
#' factor variables by default when they are put into a dataframe but as shown
#' above that can be controlled by the argument \code{stringsAsFactors=FALSE}.
#' The capture history should be composed only of 0 or 1 and they should all be
#' the same length (at present no error checking on this). Although it is not
#' necessary, the dataframe can contain a field named \code{freq} which
#' specifies the frequency of that capture history. If the value of
#' \code{freq} is negative then these are treated as loss on capture at the
#' final capture occasion (last 1). If \code{freq} is missing then a value of
#' 1 is assumed for all records. Another optional field is \code{initial.age}
#' which specifies the age of the animal at the time it was first captured.
#' This field is used to construct the \code{age} and \code{Age} fields in the
#' design dataframe. The default is to assume \code{initial.age=0} which means
#' the \code{age} is really time since first marked. Any other fields in
#' \code{x} are user-specified and can be a combination of factor and numeric
#' variables that are either time-invariant or time-varying (and named
#' appropriately).
#'
#' The behavior of create.dmdf can vary depending on the values of begin.time and
#' time.intervals. An explanation of these values and how they can be used is given
#' in \code{\link{process.data}}.
#'
#' @param x processed dataframe from function \code{\link{process.data}}
#' @param parameter list with fields defining each values for each parameter;
#' as created by \code{\link{setup.parameters}}
#' @param time.varying vector of field names that are time-varying for this
#' parameter
#' @param fields character vector containing field names for variables in x to
#' be included in design matrix dataframe; if NULL all other than ch are
#' included
#' @return A dataframe with all of the individual covariate data and the
#' standard design data of time, Time, cohort, Cohort, age and Age; where
#' lowercase first letter implies a factor and uppercase is a numeric variable
#' for those variables.
#' @author Jeff Laake
#' @references Laake, J. and E. Rexstad (2007). RMark -- an alternative
#' approach to building linear models in MARK. Program MARK: A Gentle
#' Introduction. E. Cooch and G. C. White.
#'
#' Pledger, S., K. H. Pollock, et al. (2003). Open capture-recapture models
#' with heterogeneity: I. Cormack-Jolly-Seber model. Biometrics 59(4):786-794.
create.dmdf=function(x,parameter,time.varying=NULL,fields=NULL)
{
last= -parameter$num
begin.num=parameter$begin+1
chp = process.ch(x$data$ch)
firstseen=chp$first
nocc=x$nocc
time.intervals=x$time.intervals
if(is.null(x$data$begin.time))
begin.time=x$begin.time
else
begin.time=x$data$begin.time
# create base df
df=create.base.dmdf(x,parameter)
if(is.null(df))return(NULL)
# add time,age,choort
if(length(begin.time)==1 & is.vector(time.intervals))
{
timedf=data.frame(occ=1:nocc,time=begin.time+c(0,cumsum(time.intervals)))
if(parameter$interval)
if(!all(time.intervals==1))
timedf$time.interval=c(time.intervals,NA)
df=merge(df,timedf,by="occ")
timedf=data.frame(id=x$data$id,cohort=timedf$time[firstseen])
df=merge(df,timedf,by="id")
}else
{
if(length(begin.time)==1)
begin.time=rep(begin.time,nrow(x$data))
if(is.vector(time.intervals))
time.intervals=matrix(time.intervals,nrow=nrow(x$data),ncol=length(time.intervals),byrow=TRUE)
cum.time.intervals=t(apply(time.intervals,1,cumsum))
times=begin.time+cbind(rep(0,nrow(x$data)),cum.time.intervals)
cohort=rep(times[cbind(1:nrow(x$data),firstseen)],each=ncol(times))
if(!parameter$interval | all(time.intervals==1))
timedf=data.frame(occid=apply(expand.grid(occ=1:nocc,id=factor(1:nrow(x$data))),1,paste,collapse=""),
cohort=cohort,time=as.vector(t(times)))
else
timedf=data.frame(occid=apply(expand.grid(occ=1:nocc,id=factor(1:nrow(x$data))),1,paste,collapse=""),
cohort=cohort,time=as.vector(t(times)),time.interval=as.vector(t(cbind(time.intervals,rep(NA,nrow(x$data))))))
df$occid=paste(df$occ,df$id,sep="")
df=merge(df,timedf,by="occid")
df$idocc=NULL
}
if(any(time.intervals==0))
{
firstocc=data.frame(id=factor(1:nrow(x$data)),firstocc=firstseen)
df=merge(df,firstocc,by="id")
}
df=df[order(df$seq),]
df$age=df$time-df$cohort
if(!is.null(x$data$initial.age)) df$age=df$age+x$data$initial.age[df$id]
# if there are any time varying covariates then construct the data matrix for
# those covariates. That matrix is appended to other data in x that is non-time varying
times=df$time[df$id==1]
# occ=1:(nocc+parameter$num)
tcv=NULL
if(!is.null(time.varying))
{
for (i in 1:length(time.varying))
{
vnames=paste(time.varying[i],times,sep="")
if( !all(vnames %in% names(x$data)))
stop("Missing time varying variable ",paste(vnames[!vnames%in%names(x$data)],collapse=","))
if(i==1)
tcv=data.frame(as.vector(t(as.matrix(x$data[, vnames]))))
else
tcv=cbind(tcv,as.vector(t(as.matrix(x$data[, vnames]))))
x$data=x$data[,!names(x$data)%in%vnames]
}
names(tcv)=time.varying
}
if(!is.null(tcv))
df=cbind(df,tcv)
# add static fields
if(is.null(fields))
fields=names(x$data)[!names(x$data)%in%c("ch","initial.age")]
else
fields=c(fields,"id")
df=merge(df,x$data[,fields],by="id")
df=df[order(df$seq),]
df$seq=NULL
df$Time=df$time-min(df$time)
df$Cohort=df$cohort-min(df$cohort)
# df$Age=df$age-max(0,min(df$age))
df$age[df$age<0]=0
df$time=factor(df$time)
df$Age=df$age
df$age=factor(df$age)
df$cohort=factor(df$cohort)
if("group"%in%names(df))
levels(df$group)=apply(x$group.covariates,1,paste,collapse="")
if(!is.null(x$strata_data)&!is.null(df$stratum))
df=cbind(df,x$strata_data)
return(df)
}
create.base.dmdf=function(x,parameter)
{
last= -parameter$num
begin.num=parameter$begin+1
nocc=x$nocc + parameter$num
occasions=begin.num:(parameter$begin+nocc)
sl=factor(x$strata.labels)
nstrata=length(sl)
# MVMS model is currently split off but eventually this should be merged in with
# other bi-level MS models
if(nchar(x$model)>=4 & substr(x$model,1,4)=="MVMS")
{
if(!is.null(parameter$obs) & parameter$obs)
dfl=mvms_design_data(x$strata.list$df.states,x$strata.list$df,transition=parameter$tostrata)
else
dfl=mvms_design_data(x$strata.list$df.states,transition=parameter$tostrata)
df=expand.grid(occ=occasions,id=factor(1:nrow(x$data)))
dfl=dfl[rep(1:nrow(dfl),each=nrow(df)),,drop=FALSE]
df=cbind(df,dfl)
if(parameter$tostrata)
df=df[order(df$id,df$occ,df$stratum,df$tostratum),]
else
df=df[order(df$id,df$occ,df$stratum),]
df$seq=1:nrow(df)
rownames(df)=df$seq
} else
{
if(!is.null(x$strata.list))
{
st.index=which("states"==names(x$strata.list))
oth.index=which("states"!=names(x$strata.list))
oth.name=names(x$strata.list)[oth.index]
oth=factor(rep(x$strata.list[[oth.index]],each=length(x$strata.list$states)))
states=factor(rep(x$strata.list$states,times=length(x$strata.list[[oth.index]])))
# For 2-level independent strata model
if(parameter$whichlevel!=0)
{
if(parameter$whichlevel==1)
sl=factor(x$strata.list$states)
else
sl=factor(x$strata.list[[oth.index]])
nstrata=length(sl)
x$strata.list=NULL
}
}
# not by stratum
if(!parameter$bystratum)
{
df=expand.grid(occ=occasions,id=factor(1:nrow(x$data)))
df$seq=1:nrow(df)
}
else
{
# from stratum - to stratum transition
if(parameter$tostrata)
{
df=expand.grid(tostratum=sl[1:nstrata],stratum=sl[1:nstrata],occ=occasions,id=factor(1:nrow(x$data)))
df=df[,c("stratum","tostratum","occ","id")]
df$seq=1:nrow(df)
}
# by stratum
else
{
df=expand.grid(stratum=sl[1:nstrata],occ=occasions,id=factor(1:nrow(x$data)))
df$seq=1:nrow(df)
}
# two-level stratification
if(!is.null(x$strata.list))
{
state.df=data.frame(stratum=sl[1:nstrata],state=states,oth=oth)
df=merge(df,state.df,by="stratum")
if(parameter$tostrata)
{
state.df=data.frame(tostratum=sl[1:nstrata],tostate=states,tooth=oth)
df=merge(df,state.df,by="tostratum")
}
if(parameter$whichlevel!=0)
{
if(parameter$whichlevel==1){
names(df)[names(df)=="stratum"]="state"
names(df)[names(df)=="tostratum"]="tostate"
}
if(parameter$whichlevel==2) {
names(df)[names(df)=="stratum"]=oth.name
names(df)[names(df)=="tostratum"]=paste("to",oth.name,sep="")
}
}else
{
names(df)[names(df)=="oth"]=oth.name
names(df)[names(df)=="tooth"]=paste("to",oth.name,sep="")
}
}else
{
if(parameter$whichlevel!=0)
{
if(parameter$whichlevel==1)names(df)[names(df)=="stratum"]="state"
if(parameter$whichlevel==2) names(df)[names(df)=="stratum"]=oth.name
if(parameter$whichlevel==1)names(df)[names(df)=="tostratum"]="tostate"
if(parameter$whichlevel==2) names(df)[names(df)=="tostratum"]=paste("to",oth.name,sep="")
}
}
}
}
return(df)
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.