create.dmdf: Creates a dataframe with all the design data for a particular...

create.dmdf    R Documentation

Creates a dataframe with all the design data for a particular parameter in a crm model

Description

Creates a dataframe with all the design data for a particular parameter in a crm model which currently includes "cjs" or "js". These design data are fundamentally different than the design data created for mark models as explained below.

Usage

create.dmdf(x, parameter, time.varying = NULL, fields = NULL)

Arguments

x

processed dataframe from function process.data

parameter

list with fields defining the values for each parameter; as created by setup.parameters

time.varying

vector of field names that are time-varying for this parameter

fields

character vector containing field names for variables in x to be included in design matrix dataframe; if NULL all other than ch are included

Details

This function is intended to be called from make.design.data. It takes the data in x and creates a dataframe with all of the data needed to fit capture-recapture models (crm), which currently include "cjs" (Cormack-Jolly-Seber) and "js" (POPAN formulation of the Jolly-Seber model). Before jumping into the details, it is useful to understand the differences between MARK (via the mark function in RMark) and the package mra written by Trent McDonald, and how they relate to the implementation in cjs_admb. With MARK, animals can be placed in groups, and PIMs (parameter index matrices) link the model parameters to the specific animals. For example, if for a particular group the Phi PIM is

 
  1 2 3 
    4 5 
      6 

Then animals in that group that were first caught/released on occasion 1 have the parameters 1,2,3 for the 3 occasions. Those first caught/released on occasion 2 have parameters 4 and 5 for occasions 2 and 3, and those first caught/released on occasion 3 have parameter 6 for occasion 3. Another group of animals would have a different set of indices for the same set of parameters so they could be discriminated. Many people find this rather confusing until they get used to it, but even then, if you have many different groups and many occasions, the indexing process is prone to error. Hence the rationale for RMark, which automates the PIM construction so that its use is largely transparent to the user.

What RMark does is create a data structure called design data that automatically assigns design data to the indices in the PIM. For example, 1 to 6 would be given the data used to create that group and 1 to 3 would be assigned to cohort 1, ... and 1 would be assigned to occasion 1 and 2 and 4 would be assigned to occasion 2, etc. It also creates an age field which follows the diagonals and can be initialized with the initial age at the time of first capture, which is group-specific. With a formula and these design data, a design matrix can be constructed for the model where the row in the design matrix is the parameter index (e.g., the first 6 rows would be for parameters 1 to 6 as shown above). That would be all good except for individual covariates, which are not group-specific. MARK handles individual covariates by specifying the covariate name (e.g., "weight") as a string in the design matrix. Then for each capture history it plugs in the actual covariate values for that animal to complete the design matrix for that animal. For more details see Laake and Rexstad (2008).
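As a rough illustration of how design data attach to PIM indices, the base R sketch below rebuilds the triangular Phi PIM shown above together with a small table of cohort, occasion and age for each index. The names nocc, pim_dd and pim are made up for this example and are not RMark objects, and an initial age of 0 is assumed.

```r
# Hypothetical illustration in base R; this is not RMark code.
nocc <- 3                                    # occasions covered by the Phi PIM

# Cohort (PIM row) and occasion (PIM column) for indices 1..6
cohort <- rep(seq_len(nocc), times = rev(seq_len(nocc)))     # 1 1 1 2 2 3
time   <- unlist(lapply(seq_len(nocc), function(k) k:nocc))  # 1 2 3 2 3 3

pim_dd <- data.frame(index  = seq_along(cohort),
                     cohort = cohort,
                     time   = time,
                     age    = time - cohort)  # follows the diagonals; initial age assumed 0

# Rebuild the triangular PIM from the design data
pim <- matrix(NA_integer_, nrow = nocc, ncol = nocc)
pim[cbind(pim_dd$cohort, pim_dd$time)] <- pim_dd$index

print(pim_dd)
print(pim)
```

Each row of pim_dd carries the design data for one parameter index, so a formula applied to pim_dd yields a design matrix with one row per index.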

From a brief look at package mra and personal communication with Trent McDonald, I give the following brief and possibly incorrect description of the package mra at the time of writing (28 Aug 2008). In that package, the whole concept of PIMs is abandoned and instead covariates are constructed for each occasion for each animal. Thus, each animal is effectively put in its own group and it has a parameter for each occasion. This novel approach is quite effective at blurring the lines between design data and individual covariate data, and it removes the need for PIMs because each animal (or unique capture history) has a real parameter for each occasion. The downside of the package mra is that every covariate is assumed to be time-varying, and any factor variables like time are coded manually as dummy variables for each level rather than using the R facilities for handling factor variables in the formula to create the design matrix.

In the crm, cjs_admb and js functions in this package I have used the basic idea in mra, but I have taken a different approach to model development. It allows for time-varying covariates but does not restrict each covariate to be time-varying, and factor variables are used as such, which removes the need to construct dummy variables (although the latter could still be used). First, an example is in order to explain how this all works. Consider the following set of capture histories for a small capture-recapture data set with 4 capture occasions:

 1001 0111 0011 

To relate the current structure to the concept of PIMs, I define the following matrix:


 1 2 3 
 4 5 6 
 7 8 9 

If you think of these as Phi parameter indices, then 1 to 3 are survivals for the intervals 1-2,2-3,3-4 for animal 1, and 4-6 and 7-9 are the same for animals 2 and 3. This matrix would have a row for each animal. Now you'll notice that for animal 2 parameter 4 is not needed and for animal 3, parameters 7 and 8 are not needed because they are prior to their entry in the study. While that is certainly true there is no harm in having them and the advantage comes in being able to have a complete matrix in R rather than having a triangular one.
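The square index matrix and the entries that fall before each animal's entry into the study can be sketched in a few lines of base R. The names idx, first and unused are chosen for this example only.

```r
# Illustrative sketch in base R using the three capture histories above
ch   <- c("1001", "0111", "0011")
nocc <- nchar(ch[1])                       # 4 capture occasions

# One row per animal, one column per interval (1-2, 2-3, 3-4)
idx <- matrix(seq_len(length(ch) * (nocc - 1)),
              nrow = length(ch), byrow = TRUE)

# Occasion of first capture for each animal (position of the first "1")
first <- as.integer(regexpr("1", ch, fixed = TRUE))

# Intervals that start before the animal was first captured are never used
unused <- col(idx) < first[row(idx)]

print(idx)
print(idx[unused])                         # indices that are not needed
```

Keeping the unused entries costs a few extra rows but lets the whole structure be stored as an ordinary rectangular matrix.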

So now we are finally getting to the description of what this function does. It constructs a dataframe with a row for each animal-occasion. Following on with the example above, depending on how the arguments are set the following dataframe could be constructed:

 
 row time Time cohort Cohort age Age initial.age
   1    1    0      1      0   0   0           0
   2    2    1      1      0   1   1           0
   3    3    2      1      0   2   2           0
   4    1    0      2      1   0   0           0
   5    2    1      2      1   1   1           0
   6    3    2      2      1   2   2           0
   7    1    0      3      2   0   0           0
   8    2    1      3      2   1   1           0
   9    3    2      3      2   2   2           0

The fields starting with a lowercase character (time,cohort,age) are created as factor variables and those with an uppercase are created as numeric variables. Note: the age field is bounded below by the minimum initial.age to avoid creating factor levels with non-existent data for occasions prior to first capture that are not used. For example, an animal first caught on occasion 2 with an initial.age=0 is technically -1 on occasion 1 with a time.interval of 1. However, that parameter would never be used in the model and we don't want a factor level of -1.
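The sketch below builds an animal-occasion dataframe of this general shape in base R, with factor and numeric versions of each field and the lower bound on age described in the note. It is only an approximation of the idea and not the code used by create.dmdf: unit time intervals and a begin time of 1 are assumed, the helper names id, occ and raw_age are invented for the example, and because age is computed from the bounding rule its values for later cohorts need not match an illustrative table row for row.

```r
# Hypothetical base R sketch; assumes unit time intervals and begin time 1
ch          <- c("1001", "0111", "0011")
initial.age <- c(0, 0, 0)
n_int       <- nchar(ch[1]) - 1                       # intervals for Phi
first       <- as.integer(regexpr("1", ch, fixed = TRUE))

# One row per animal-occasion
dd <- data.frame(id  = rep(seq_along(ch), each = n_int),
                 occ = rep(seq_len(n_int), times = length(ch)))

dd$time   <- factor(dd$occ)                 # lowercase: factor
dd$Time   <- dd$occ - min(dd$occ)           # uppercase: numeric
dd$cohort <- factor(first[dd$id])
dd$Cohort <- first[dd$id] - min(first)

# Age before first capture would be negative; bound it below
raw_age <- initial.age[dd$id] + dd$occ - first[dd$id]
dd$Age  <- pmax(raw_age, min(initial.age))
dd$age  <- factor(dd$Age)

print(dd)
```

Bounding Age before converting it to a factor is what keeps levels such as -1 from appearing in age.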

A formula of ~time would create a design matrix with 3 columns (one for each factor level) and ~Time would create one with 2 columns with the first being an intercept and the second with the numeric value of Time.
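The column counts can be checked with model.matrix from base R, used here as a stand-in for the design-matrix step; create.dm may differ in its details. With R's default treatment contrasts, the three columns for ~time are an intercept plus one column for each of the remaining two levels.

```r
# Stand-in example with base R; default treatment contrasts assumed
dd <- data.frame(occ = rep(1:3, times = 3))
dd$time <- factor(dd$occ)          # factor version
dd$Time <- dd$occ - 1              # numeric version

dm_factor  <- model.matrix(~ time, data = dd)
dm_numeric <- model.matrix(~ Time, data = dd)

print(colnames(dm_factor))         # intercept plus two level columns
print(colnames(dm_numeric))        # intercept plus the numeric slope
```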

Now here is the simplicity of it. The following few expressions in R will convert this dataframe into a matrix of real parameters (assuming beta=c(1,1,1)) that is structured like the square PIM matrix, without the use of PIMs.

 
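library(marked)
nocc=4
x=data.frame(ch=c("1001","0111","0011"),stringsAsFactors=FALSE)
beta=c(1,1,1)
x.proc=process.data(x,model="cjs")
# design data for Phi: one row per animal-occasion
Phi.dmdf=make.design.data(x.proc)$Phi
# design matrix from the formula
Phi.dm=create.dm(Phi.dmdf,~time)
# real parameters: one row per animal, one column per interval
# (as.vector is a precaution in case Phi.dm is a sparse Matrix object)
Phimat=matrix(plogis(as.vector(Phi.dm%*%beta)),ncol=nocc-1,nrow=nrow(x),byrow=TRUE)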
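The same arithmetic can be followed without the package. The base R sketch below uses model.matrix in place of the package's design-matrix step and a hand-built time factor in place of the design dataframe; both substitutions are assumptions made only so the example is self-contained.

```r
# Base R stand-in for the steps above; not the package's own code path
nocc      <- 4
n_animals <- 3
beta      <- c(1, 1, 1)

# One row per animal-occasion, animals stacked one after another
dd <- data.frame(time = factor(rep(seq_len(nocc - 1), times = n_animals)))
dm <- model.matrix(~ time, data = dd)        # 9 rows, 3 columns

# Linear predictor, inverse-logit, then reshape to animals x intervals
Phimat <- matrix(plogis(as.vector(dm %*% beta)),
                 nrow = n_animals, ncol = nocc - 1, byrow = TRUE)
print(Phimat)
```

Filling by row matters here: the design rows are ordered by animal and then by occasion, so byrow=TRUE puts each animal on its own row.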

Note that the order of the columns for Phi.dmdf differs slightly from what is shown above. Also, plogis is an R function that computes the inverse-logit. Once you have the matrix of Phi and p values, the calculation of the likelihood is straightforward using the formulation of Pledger et al. (2003) (see cjs.lnl).

The values in the design dataframe are not limited to these fields. The 2 arguments time.varying and fields are vectors of character strings which specify the names of the dataframe columns in x that should be included. For example, if x contained a field sex with the values "M","F","M" for the 3 records in our example, and the argument fields=c("sex") was used, then a column named sex would be included in the design dataframe with the values "M","M","M","F","F","F","M","M","M". The value of the column sex in x is repeated for each of the occasions for that animal (capture history).

If the value of the field changes from occasion to occasion, then we use the argument time.varying instead. To differentiate the values in the dataframe x, the columns are named with an occasion number. For example, if the variable was named cov and it was to be used for Phi, then the variables would be named cov1,cov2,cov3 in x. Let's say that x was structured as follows:

 
ch    cov1 cov2 cov3
1001     1    0    1
0111     0    2    1
0011     0    0    0

If you specified the argument time.varying=c("cov") then in the design dataframe a field named cov would be created and the values would be 1,0,1,0,2,1,0,0,0. Thus the value is both animal and occasion specific. Had the covariate been used for p, the columns would be named cov2,cov3,cov4 because the covariate applies to those occasions for p, whereas for Phi the covariate is labelled with the occasion that begins the interval. Any number of fields that exist in x can be specified in fields and time.varying.
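The expansion of a time-invariant field and a time-varying covariate to one value per animal-occasion can be sketched in base R as follows. This mimics the result described above for Phi and is not the package's implementation; the sex column and the helper names id, occ and cov_wide are added for illustration.

```r
# Illustrative base R sketch of the expansion for Phi
x <- data.frame(ch   = c("1001", "0111", "0011"),
                sex  = c("M", "F", "M"),
                cov1 = c(1, 0, 0),
                cov2 = c(0, 2, 0),
                cov3 = c(1, 1, 0),
                stringsAsFactors = FALSE)
n_int <- nchar(x$ch[1]) - 1                 # 3 intervals for Phi

dd <- data.frame(id  = rep(seq_len(nrow(x)), each = n_int),
                 occ = rep(seq_len(n_int), times = nrow(x)))

# Time-invariant field: repeat the animal's value for every occasion
dd$sex <- x$sex[dd$id]

# Time-varying field: pick column cov<occ> for each animal-occasion
cov_wide <- as.matrix(x[, paste0("cov", seq_len(n_int))])
dd$cov   <- cov_wide[cbind(dd$id, dd$occ)]

print(dd)
```

For p the same lookup would use the columns cov2, cov3 and cov4, following the naming rule given above.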

The input dataframe x has a few minor requirements on its structure. First, it must contain a field called ch which contains the capture history as a string. Note that in versions of R before 4.0.0, strings were converted to factor variables by default when they were put into a dataframe; as shown above, that can be controlled by the argument stringsAsFactors=FALSE. The capture history should be composed only of 0 or 1 and all histories should be the same length (at present there is no error checking on this). Although it is not necessary, the dataframe can contain a field named freq which specifies the frequency of that capture history. If the value of freq is negative, then these are treated as loss on capture at the final capture occasion (last 1). If freq is missing, then a value of 1 is assumed for all records. Another optional field is initial.age, which specifies the age of the animal at the time it was first captured. This field is used to construct the age and Age fields in the design dataframe. The default is to assume initial.age=0, which means the age is really time since first marked. Any other fields in x are user-specified and can be a combination of factor and numeric variables that are either time-invariant or time-varying (and named appropriately).
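Since the capture histories are not checked, a small user-side check before processing the data may be worthwhile. The function check_ch_data below is a hypothetical helper written for this example and is not part of the package; it tests the requirements listed above and fills in the documented defaults for freq and initial.age.

```r
# Hypothetical helper; not a function in the package
check_ch_data <- function(x) {
  if (!("ch" %in% names(x)) || !is.character(x$ch))
    stop("x must contain a character field named ch")
  if (!all(grepl("^[01]+$", x$ch)))
    stop("capture histories may contain only 0 and 1")
  if (length(unique(nchar(x$ch))) != 1)
    stop("capture histories must all have the same length")
  # Documented defaults when the optional fields are absent
  if (!("freq" %in% names(x))) x$freq <- 1
  if (!("initial.age" %in% names(x))) x$initial.age <- 0
  x
}

x <- data.frame(ch = c("1001", "0111", "0011"), stringsAsFactors = FALSE)
x_checked <- check_ch_data(x)
print(x_checked)
```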

The behavior of create.dmdf can vary depending on the values of begin.time and time.intervals. An explanation of these values and how they can be used is given in process.data.

Value

A dataframe with all of the individual covariate data and the standard design data of time, Time, cohort, Cohort, age and Age; where lowercase first letter implies a factor and uppercase is a numeric variable for those variables.

Author(s)

Jeff Laake

References

Laake, J. and E. Rexstad (2007). RMark – an alternative approach to building linear models in MARK. Program MARK: A Gentle Introduction. E. Cooch and G. C. White.

Pledger, S., K. H. Pollock, et al. (2003). Open capture-recapture models with heterogeneity: I. Cormack-Jolly-Seber model. Biometrics 59(4):786-794.


marked documentation built on Oct. 19, 2023, 5:06 p.m.