Description Usage Arguments Details Value Author(s) See Also Examples

View source: R/make.mark.model.R

Creates a MARK model object that contains a MARK input file with PIMS and design matrix specific to the data and model structure and formulae specified for the model parameters.

1 2 3 4 5 6 7 | ```
make.mark.model(data, ddl, parameters = list(), title = "",
model.name = NULL, initial = NULL, call = NULL,
default.fixed = TRUE, options = NULL, profile.int = FALSE,
chat = NULL, simplify = TRUE, input.links = NULL,
parm.specific = FALSE, mlogit0 = FALSE, hessian = FALSE,
accumulate = TRUE, icvalues = NULL, wrap = TRUE, nodes = 101,
useddl = FALSE, check.model = FALSE)
``` |

`data` |
Data list resulting from function |

`ddl` |
Design data list from function |

`parameters` |
List of parameter formula specifications |

`title` |
Title for the analysis (optional) |

`model.name` |
Model name to override default name (optional) |

`initial` |
Vector of named or unnamed initial values for beta parameters or previously run model (optional) |

`call` |
Pass function call when this function is called from another
function (e.g. |

`default.fixed` |
if TRUE, real parameters for which the design data have been deleted are fixed to default values |

`options` |
character string of options for Proc Estimate statement in MARK .inp file |

`profile.int` |
if TRUE, requests MARK to compute profile intervals |

`chat` |
pre-specified value for chat used by MARK in calculations of model output |

`simplify` |
if FALSE, does not simplify PIM structure |

`input.links` |
specifies set of link functions for parameters with non-simplified structure |

`parm.specific` |
if TRUE, forces a link to be specified for each parameter |

`mlogit0` |
if TRUE, any real parameter that is fixed to 0 and has an mlogit link will have its link changed to logit so it can be simplified |

`hessian` |
if TRUE specifies to MARK to use hessian rather than second partial matrix |

`accumulate` |
if TRUE accumulate like data values into frequencies |

`icvalues` |
numeric vector of individual covariate values for computation of real values |

`wrap` |
if TRUE, data lines are wrapped to be length 80; if length of a row is not a problem set to FALSE and it will run faster |

`nodes` |
number of integration nodes for individual random effects (min 15, max 505, default 101) |

`useddl` |
If TRUE and there are no missing rows or parameters (deleted) then it will use ddl in place of full.ddl that is created internally. |

`check.model` |
if TRUE, code does an internal consistency check between PIMs and design data when making model. |

This function is called by `mark`

to create the model but it can
be called directly to create but not run the model. All of the arguments
have default values except for the first 2 that specify the processed data
list (`data`

) and the design data list (`ddl`

). If only these 2
arguments are specified default models are used for the parameters. For
example, following with the example from `process.data`

and
`make.design.data`

, the default model can be created with:

`mymodel=make.mark.model(proc.example.data,ddl)`

The call to `make.mark.model`

creates a model object but does not do
the analysis. The function returns a list that includes several fields
including a design matrix and the MARK input file that will be used with
`MARK.EXE`

to run the analysis from function
`run.mark.model`

. The following shows the names of the list
elements in mymodel:

1 2 3 4 | ```
names(mymodel) [1] "data" "model" "title" "model.name"
"links" [6] "mixtures" "call" "parameters" "input" "number.of.groups" [11]
"group.labels" "nocc" "begin.time" "covariates" "fixed" [16] "design.matrix"
"pims" "design.data" "strata.labels" "mlogit.list" [21] "simplify"
``` |

The list is defined to be a mark object which is done by assigning a class vector to the list. The classes for an R object can be viewed with the class function as shown below:

1 | ```
class(mymodel) [1] "mark" "CJS"
``` |

Each MARK model has 2 class
values. The first identifies it as a mark object and the second identifies
the type of mark analysis, which is the default "CJS" (recaptures only) in
this case. The use of the class feature has advantages in using generic
functions and identifying types of objects. An object of class `mark`

is defined in more detail in function `mark`

.

To fit non-trivial models it is necessary to understand the remaining
calling arguments of `make.mark.model`

and R formula notation. The
model formulae are specified with the calling argument `parameters`

.
It uses a similar list structure as the `parameters`

argument in
`make.design.data`

. It expects to get a list with elements named
to match the parameters in the particular analysis (e.g., Phi and p in CJS)
and each list element is a list, so it is a list of lists). For each
parameter, the possible list elements are ```
formula, link, fixed,
component, component.name, remove.intercept
```

. In addition, for closed
capture models and robust design model, the element `share`

is included
in the list for p (capture probabilities) and GammaDoublePrime
(respectively) to indicate whether the model is shared (share=TRUE) or
not-shared (the default) (share=FALSE) with c (recapture probabilities) and
GammaPrime respectively.

`formula`

specifies the model for the parameter using R formula
notation. An R formula is denoted with a `~`

followed by variables in
an equation format possibly using the `+`

, `*`

, and `:`

operators. For example, `~sex+age`

is an additive model with the main
effects of `sex`

and `age`

. Whereas, `~sex*age`

includes the
main effects and the interaction and it is equivalent to the formula
specified as `~sex+age+sex:age`

where `sex:age`

is the interaction
term. The model `~age+sex:age`

is slightly different in that the main
effect for `sex`

is dropped which means that intercept of the
`age`

effect is common among the sexes but the age pattern could vary
between the sexes. The model `~sex*Age`

which is equivalent to
`~sex + Age + sex:Age`

has only 4 parameters and specifies a linear
trend with age and both the intercept and slope vary by sex. One additional
operator that can be useful is `I()`

which allows computation of
variables on the fly. For example, the addition of the Agesq variable in
the design data (as described above) can be avoided by using the notation
`~Age + I(Age^2)`

which specifies use of a linear and quadratic effect
for age. Note that specifying the model `~age + I(age^2)`

would be
incorrect and would create an error because `age`

is a factor variable
whereas `Age`

is not.

As an example, consider developing a model in which Phi varies by age and p follows a linear trend over time. This model could be specified and run as follows:

1 2 3 4 | ```
p.Time=list(formula=~Time) Phi.age=list(formula=~age)
Model.p.Time.Phi.age=make.mark.model(proc.example.data,ddl,
parameters=list(Phi=Phi.age,p=p.Time))
Model.p.Time.Phi.age=run.mark.model(Model.p.Time.Phi.age)
``` |

The first 2 commands define the p and Phi models that are used in the
`parameter`

list in the call to `make.mark.model`

. This is a good
approach for defining models because it clearly documents the models, the
definitions can then be used in many calls to `make.mark.model`

and it
will allow a variety of models to be developed quickly by creating different
combinations of the parameter models. Using the notation above with the
period separating the parameter name and the description (eg., p.Time) gives
the further advantage that all possible models can be developed quickly with
the functions `create.model.list`

and
`mark.wrapper`

.

Model formula can use any field in the design data and any individual
covariates defined in `data`

. The restrictions on individual
covariates that was present in versions before 1.3 have now been removed.
You can now use interactions of individual covariates with all design data
covariates and products of individual covariates. You can specify
interactions of individual covariates and factor variables in the design
data with the formula notation. For example, `~region*x1`

describes a
model in which the slope of `x1`

varies by `region`

. Also,
`~time*x1`

describes a model in which the slope for `x1`

varied by
time; however, there would be only one value of the covariate per animal so
this is not a time varying covariate model. Models with time varying
covariates are now more easily described with the improvements in version
1.3 but there are restrictions on how the time varying individual covariates
are named. An example would be a trap dependence model in which capture
probability on occasion i+1 depends on whether they were captured on
occasion i. If there are n occasions in a CJS model, the 0/1 (not
caught/caught) for occasions 1 to n-1 would be n-1 individual covariates to
describe recapture probability for occasions 2 to n. For times 2 to n, a
design data field could be defined such that the variable timex is 1 if
time==x and 0 otherwise. The time varying covariates must be named with a
time suffix on the base name of the covariate. In this example they would be
named as `x2,. . .,xn`

and the model could be specified as ```
~time
+ x
```

for time variation and a constant trap effect or as ```
~time +
time:x
```

for a trap effect that varied by time. If in the
`process.data`

call, the argument `begin.time`

was set to
the year 1990, then the variables would have to be named x1991,x1992,...
because the first recapture occasion would be 1991. Note that the times are
different for different parameters. For example, survival is labeled based
on the beginning of the time interval which would be 1990 so the variables
should be named appropriately for the parameter model in which they will be
used.

In previous versions to handle time varying covariates with a constant effect, it was necessary to use the component feature of the parameter specification to be able to append one or more arbitrary columns to the design matrix. That is no longer required for individual covariates and the component feature was removed in v2.0.8.

There are three other elements of the parameter list that can be useful on
occasion. The first is `link`

which specifies the link function for
transforming between the beta and real parameters. The possible values are
"logit", "log", "identity" and "mlogit(*)" where * is a numeric identifier.
The "sin" link is not allowed because all models are specified using a
design matrix. The typical default values are assigned to each parameter
(eg "logit" for probabilities, "log" for N, and "mlogit" for pent in POPAN),
so in most cases it will not be necessary to specify a link function.

The second is `fixed`

which allows real parameters to be set at fixed
values. The values for `fixed`

can be either a single value or a list
with 5 alternate forms for ease in specifying the fixed parameters.
Specifying `fixed=value`

will set all parameters of that type to the
specified value. For example, `Phi=list(fixed=1)`

will set all Phi to
1. This can be useful for situations like specifying F in the
Burnham/Barker models to all have the same value of 1. Fixed values can
also be specified as a list in which values are specified for certain
indices, times, ages, cohorts, and groups. The first 3 will be the most
useful. The first list format is the most general and flexible but it
requires an understanding of the PIM structure and index numbers for the
real parameters. For example,

`Phi=list(formula=~time, fixed=list(index=c(1,4,7),value=1))`

specifies Phi varying by time, but the real parameters 1,4,7 are set to 1.
The `value`

field is either a single constant or its length must match
the number of indices. For example,

`Phi=list(formula=~time, fixed=list(index=c(1,4,7),value=c(1,0,1)))`

sets real parameters 1 and 7 to 1 and real parameter 4 to 0. Technically,
the index/value format for fixed is not wedded to the parameter type (i.e.,
values for p can be assigned within Phi list), but for the sake of clarity
they should be restricted to fixing real parameters associated with the
particular parameter type. The `time`

and `age`

formats for fixed
will probably be the most useful. The format fixed=list(time=x, value=y)
will set all real parameters (of that type) for time x to value y. For
example,

`p=list(formula=~time,fixed=list(time=1986,value=1))`

sets up time varying capture probability but all values of p for 1986 are set to 1. This can be quite useful to set all values to 0 in years with no sampling (e.g.,

1 |

).
The `age`

, `cohort`

and `group`

formats work in a similar
fashion. It is important to recognize that the value you specify for
`time`

, `age`

, `cohort`

and `group`

must match the
values in the design data list. This is another reason to add binned fields
for age, time etc with `add.design.data`

after creating the
default design data with `make.design.data`

. Also note that the
values for `time`

and `cohort`

are affected by the
`begin.time`

argument specified in `process.data`

. Had I
not specified `begin.time=1980`

, to set p in the last occasion (1986),
the specification would be

`p=list(formula=~time,fixed=list(time=7,value=1))`

because begin.time defaults to 1. The advantage of the time-, age-, and
cohort- formats over the index-format is that it will work regardless of the
group definition which can easily be changed by changing the `groups`

argument in `process.data`

. The index-format will be dependent
on the group structure because the indexing of the PIMS will most likely
change with changes in the group structure.

Parameters can also be fixed at default values by deleting the specific rows
of the design data. See `make.design.data`

and material below.
The default value for fixing parameters for deleted design data can be
changed with the `default=value`

in the parameter list.

Another useful element of the parameter list is the
`remove.intercept`

argument. It is set to TRUE to forcefully remove
the intercept. In R notation this can be done by specifiying the formula
notation ~-1+... but in formula with nested interactions of factor variables
and additive factor variables the -1 notation will not remove the intercept.
It will simply adjust the column definitions but will keep the same number
of columns and the model will be overparameterized. The problem occurs with
nested factor variables like tostratum within stratum for multistrata
designs (see `mstrata`

). As shown in that example, you can
build a formula -1+stratum:tostratum to have transitions that are
stratum-specific. If however you also want to add a sex effect and you
specify -1+sex+stratum:tostratum it will add 2 columns for sex labelled M
and F when in fact you only want to add one column because the intercept is
already contained within the stratum:tostratum term. The argument
remove.intercept will forcefully remove the intercept but it needs to be
able to find a column with all 1's. For example,
Psi=list(formula=~sex+stratum:tostratum,remove.intercept=TRUE) will work but
Psi=list(formula=~-1+sex+stratum:tostratum,remove.intercept=TRUE) will not
work. Also, the -1 notation should be used when there is not an added
factor variable because

1 |

will not work because while the stratum:tostratum effectively includes an intercept it is equivalent to using an identity matrix and is not specified as treatment contrast with one of the columns as all 1's.

The final argument of the parameter list is contrast which can be used to change the contrast used for model.matrix. It uses the default if none specified. The form is shown in ?model.matrix.

The argument simplify determines whether the pims are simplified such that only indices for unique and fixed real parameters are used. For example, with an all different PIM structure with CJS with K occasions there are K*(K-1) real parameters for Phi and p. However, if you use simplify=TRUE with the default model of Phi(.)p(.), the pims are re-indexed to be 1 for all the Phi's and 2 for all the p's because there are only 2 unique real parameters for that model. Using simplify can speed analysis markedly and probably should always be used. This was left as an argument only to test that the simplification was working and produced the same likelihood and real parameter estimates with and without simplification. It only adjust the rows of the design matrix and not the columns. There are some restrictions for simplification. Real parameters that are given a fixed value are maintained in the design matrix although it does simplify amongst the fixed parameters. For example, if there are 50 real parameters all fixed to a value of 1 and 30 all fixed to a value of 0, they are reduced to 2 real parameters fixed to 1 and 0. Also, real parameters such as Psi in Multistrata and pent in POPAN that use multinomial logits are not simplified because they must maintain the structure created by the multinomial logit link. All other parameters in those models are simplified. The only downside of simplification is that the labels for real parameters in the MARK output are unreliable because there is no longer a single label for the real parameter because it may represent many different real parameters in the all-different PIM structure. This is not a problem with the labels in R because the real parameter estimates are translated back to the all-different PIM structure with the proper labels.

The argument `default.fixed`

is related to deletion of design data (see
`make.design.data`

). If design data are deleted and
`default.fixed=T`

the missing real parameters are fixed at a reasonable
default to represent structural "zeros". For example, p is set to 0, Phi is
set to 1, pent is set to 0, etc. For some parameters there are no
reasonable values (e.g., N in POPAN), so not all parameters will have
defaults. If a parameter does not have a default or if
`default.fixed=F`

then the row in the design matrix for that parameter
is all zeroes and its real value will depend on the link function. For
example, with "logit" link the real parameter value will be 0.5 and for the
log link it will be 1. As long as the inverse link is defined for 0 it will
not matter in those cases in which the real parameter is not used because it
represents data that do not exist. For example, in a "CJS" model if initial
captures (releases) only occur every other occasion, but recaptures
(resightings) occurred every occasion, then every other cohort (row) in the
PIM would have no data. Those rows (cohorts) could be deleted from the
design data and it would not matter if the real parameter was fixed.
However, for the case of a Jolly-Seber type model (eg POPAN or Pradel
models) in which the likelihood includes a probability for the leading
zeroes and first 1 in a capture history (a likelihood component for the
first capture of unmaked animals), and groups represent cohorts that enter
through time, you must fix the real parameters for the unused portion of the
PIM (ie for occasions prior to time of birth for the cohort) such that the
estimated probability of observing the structural 0 is 1. This is easily
done by setting the pent (probability of entry) to 0 or by setting the
probability of capture to 0 or both. In that case if
`default.fixed=F`

, the probabilities for all those parameters would be
incorrectly set to 0.5 for p and something non-zero but not predetermined
for pent because of the multinomial logit. Now it may be possible that the
model would correctly estimate these as 0 if the real parameters were kept
in the design, but we know what those values are in that case and they need
not be estimated. If it is acceptable to set `default.fixed=F`

, the
functions such as `summary.mark`

recognize the non-estimated
real parameters and they are not shown in the summaries because in essence
they do not exist. If `default.fixed=T`

the parameters are displayed
with their fixed value and for `summary.mark(mymodel,se=TRUE)`

, they
are listed as "Fixed".

Missing design data does have implications for specifying formula but only when interactions are specified. With missing design data various factors may not be fully crossed. For example, with 2 factors A and B, each with 2 levels, the data are fully crossed if there are data with values A1&B1, A1&B2, A2&B1 and A2&B2. If data exist for each of the 4 combinations then you can described the interaction model as ~A*B and it will estimate 4 values. However, if data are missing for one of more of the 4 cells then the "interaction" formula should be specified as ~-1+A:B or ~-1+B:A or ~-1+A the combinations with data. An example of this could be a marking program with multiple sites which resighted at all occasions but which only marked at sites on alternating occasions. In that case time is nested within site and time-site interaction models would be specified as ~-1+time:site.

The argument `title`

supplies a character string that is used to label
the output. The argument `model.name`

is a descriptor for the model
used in the analysis. The code constructs a model name from the formula
specified in the call (e.g., `Phi(~1)p(~time)`

) but on occasion the
name may be too long or verbose, so it can be over-ridden with the
`model.name`

argument.

The final argument `initial`

can be used to provide initial estimates
for the beta parameters. It is either 1) a single starting value for each
parameter, 2) an unnamed vector of values (one for each parameter), 3) named
vector of values, or 4) the name of `mark`

object that has already been
run. For cases 3 and 4, the code only uses appropriate initial beta
estimates in which the column names of the design matrix (for model) or
vector names match the column names in the design matrix of the model to be
run. Any remaining beta parameters without an initial value specified are
assigned a 0 initial value. If case 4 is used the models must have the same
number of rows in the design matrix and thus presumably the same structure.
As long as the vector elements are named (#3), the length of the
`initial`

vector no longer needs to match the number of parameters in
the new model as long as the elements are named. The names can be retrieved
either from the column names of the design matrix or from
`rownames(x$results$beta)`

where `x`

is the name of the
`mark`

object.

`options`

is a character string that is tacked onto the ```
Proc
Estimate
```

statement for the MARK .inp file. It can be used to request
options such as NoStandDM (to not standardize the design matrix) or
SIMANNEAL (to request use of the simulated annealing optimization method) or
any existing or new options that can be set on the estimate proc.

model: a MARK object except for the elements `output`

and
`results`

. See `mark`

for a detailed description of the
list contents.

Jeff Laake

`process.data`

,`make.design.data`

,
`run.mark.model`

`mark`

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 | ```
# This example is excluded from testing to reduce package check time
data(dipper)
#
# Process data
#
dipper.processed=process.data(dipper,groups=("sex"))
#
# Create default design data
#
dipper.ddl=make.design.data(dipper.processed)
#
# Add Flood covariates for Phi and p that have different values
#
dipper.ddl$Phi$Flood=0
dipper.ddl$Phi$Flood[dipper.ddl$Phi$time==2 | dipper.ddl$Phi$time==3]=1
dipper.ddl$p$Flood=0
dipper.ddl$p$Flood[dipper.ddl$p$time==3]=1
#
# Define range of models for Phi
#
Phidot=list(formula=~1)
Phitime=list(formula=~time)
PhiFlood=list(formula=~Flood)
#
# Define range of models for p
#
pdot=list(formula=~1)
ptime=list(formula=~time)
#
# Make assortment of models
#
dipper.phidot.pdot=make.mark.model(dipper.processed,dipper.ddl,
parameters=list(Phi=Phidot,p=pdot))
dipper.phidot.ptime=make.mark.model(dipper.processed,dipper.ddl,
parameters=list(Phi=Phidot,p=ptime))
dipper.phiFlood.pdot=make.mark.model(dipper.processed,dipper.ddl,
parameters=list(Phi=PhiFlood, p=pdot))
``` |

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.