require(rmarkdown) require(knitr) knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(GenEst) vers <- packageVersion("GenEst") today <- Sys.Date()
This vignette walks through an example of GenEst at the command line and
was constructed using GenEst version r vers
on r today
.
To obtain the most recent version of GenEst, download the most recent version build from USGS or CRAN.
For this vignette, we will be using a completely generic, mock dataset provided with the GenEst package, which contains Searcher Efficiency (SE), Carcass Persistence (CP), Search Schedule (SS), Density Weighted Proportion (DWP), and Carcass Observation (CO) Data.
data(mock) names(mock)
The central function for searcher efficiency analyses is pkm
, which, in its
most basic form, conducts a singular searcher efficiency analysis (i.e., a
singular set of $p$ and $k$ formulae and a singular size classification of
carcasses). As a first example, we will ignore the size category and use
intercept-only models for both $p$ and $k$:
data_SE <- mock$SE pkModel <- pkm(formula_p = p ~ 1, formula_k = k ~ 1, data = data_SE)
Here, we have taken advantage of pkm
's default behavior of selecting
observation columns (see ?pkm
for details).
head(data_SE)
If we wanted to explicitly control the observations, we would use the obsCol
argument:
pkModel <- pkm(formula_p = p ~ 1, formula_k = k ~ 1, data = data_SE, obsCol = c("Search1", "Search2", "Search3", "Search4") )
Note that the search observations must be entered in order such that no
carcasses have non-detected observations (i.e., 0
) after detected
observations (i.e., 1
). Further, no carcasses can be detected more than
once.
If successfully fit, a pkm
model output contains a number of elements,
some printed automatically:
pkModel
and others available upon request:
names(pkModel) pkModel$cells
The plot
function has been defined for pkm
objects, such that one can
simply run
plot(pkModel)
to visualize the model's output.
You can generate random draws of the $p$ and $k$ parameters for each cell
grouping (in pkModel
there are no predictors, so there is one cell grouping
called "all") using the rpk
function which, like other r*
functions in
R (e.g., rnorm
, runif
) takes the number of random draws (n
) as the
first argument:
rpk(n = 10, pkModel)
You can complicate the $p$ and $k$ formulae independently
pkm(formula_p = p ~ Visibility, formula_k = k ~ HabitatType, data = data_SE, obsCol = c("Search1", "Search2", "Search3", "Search4") )
And you can fix $k$ at a nominal value between 0 and 1 (inclusive) using the
kFixed
argument
pkm(formula_p = p ~ Visibility, kFixed = 0.7, data = data_SE, obsCol = c("Search1", "Search2", "Search3", "Search4"))
If the arg allCombos = TRUE
is provided, pkm
fits a set of pkm
models
defined as all allowable models simpler than, and including, the provided model
for both formulae (where "allowable" means that any interaction terms have all
component terms included in the model).
Consider the following model set analysis, where visibility and habitat type are included in the $p$ formula but only habitat type is in the $k$ formula. This generates a set of 10 models:
pkmModSet <- pkm(formula_p = p ~ Visibility*HabitatType, formula_k = k ~ HabitatType, data = data_SE, obsCol = c("Search1", "Search2", "Search3", "Search4"), allCombos = TRUE ) class(pkmModSet) names(pkmModSet)
The plot
function is defined for the pkmSet
class, and by default,
creates a new plot window on command for each sub-model. If we want to only
plot a specific single (or subset) of models from the full set, we can utilize
the specificModel
argument:
plot(pkmModSet, specificModel = "p ~ Visibility + HabitatType; k ~ 1")
The resulting model outputs can be compared in an AICc table
aicc(pkmModSet)
Often, carcasses are grouped in multiple size classes, and we are interested
in analyzing a set of models separately for each size class. To do so, we
use the sizeCol
arg to tell pkm
which column in data_CP
gives the carcass
size class. If, in addition, allCombos = TRUE
, pkm
will fit a pkmSet
that
runs for each unique size class in the column identified by the sizeCol
argument:
pkmModSetSize <- pkm(formula_p = p ~ Visibility*HabitatType, formula_k = k ~ HabitatType, data = data_SE, obsCol = c("Search1", "Search2", "Search3", "Search4"), sizeCol = "Size", allCombos = TRUE) class(pkmModSetSize)
The pkmSetSize
object is a list where each element corresponds to a
different unique size class, and contains the associated pkmSet
object, which
itself is a list of pkm
outputs:
names(pkmModSetSize) names(pkmModSetSize[[1]])
The central function for carcass persistence analyses is cpm
, which, in its
simplest form, conducts a singular carcass persistence analysis (i.e., a
singular set of $l$ and $s$ formulae and a singular size classification of
carcasses). Note that we use $l$ and $s$ to reference $location$ and $scale$
as the parameters for survival models, following survreg
, however we also
provide an alternative parameterization (using parameters $a$ and $b$,
referred to as "ab
" or "ppersist"). As a first example, we will ignore the
size category, use intercept-only models for both $l$ and $s$, and use the
Weibull distribution:
data_CP <- mock$CP cpModel <- cpm(formula_l = l ~ 1, formula_s = s ~ 1, data = data_CP, left = "LastPresentDecimalDays", right = "FirstAbsentDecimalDays", dist = "weibull" )
If successfully fit, a cpm
model output contains a number of elements,
some printed automatically:
cpModel
and others available upon request:
names(cpModel) cpModel$cells
The plot
function has been defined for cpm
objects, such that one can
simply run
plot(cpModel)
to visualize the model's output.
You can generate random draws of the $l$ and $s$ (or $a$ and $b$) parameters
for each cell grouping (in cpModel
there are no predictors, so there is one
cell grouping called "all") using the rcp
function which, like other r*
functions in R (e.g., rnorm
) takes the number of random draws (n
)
as the first argument:
rcp(n = 10, cpModel) rcp(n = 10, cpModel, type = "ppersist")
You can complicate the $l$ and $s$ formulae independently
cpm(formula_l = l ~ Visibility * GroundCover, formula_s = s ~ 1, data = data_CP, left = "LastPresentDecimalDays", right = "FirstAbsentDecimalDays", dist = "weibull" )
Given that the exponential only has one parameter ($l$, location), a model
for scale (formula_s
) is not required:
cpModExp <- cpm(formula_l = l ~ Visibility * GroundCover, data = data_CP, left = "LastPresentDecimalDays", right = "FirstAbsentDecimalDays", dist = "exponential" )
If the arg allCombos = TRUE
is provided, cpm
fits a set of cpm
models
defined as all allowable models simpler than, and including, the provided model
formulae (where "allowable" means that any interaction terms have all component
terms included in the model).
In addition, cpm
with allCombos
can include any subset of the four base
distributions (exponential, weibull, lognormal, loglogistic) and crosses them
with the predictor models.
Consider the following model set analysis, where Visibility
and Season
are
included in the $l$ formula but only Visibility
is in the $s$ formula, and
only the exponential and lognormal distributions are included. This generates a
set of 15 models:
cpmModSet <- cpm(formula_l = l ~ Visibility * Season, formula_s = s ~ Visibility, data = data_CP, left = "LastPresentDecimalDays", right = "FirstAbsentDecimalDays", dist = c("exponential", "lognormal"), allCombos = TRUE ) class(cpmModSet) names(cpmModSet)
The resulting model outputs can be compared in an AICc table
aicc(cpmModSet)
The plot
function is defined for the cpmSet
class, and by default,
creates a new plot window on command for each sub-model. If we want to only
plot a specific single (or subset) of models from the full set, we can utilize
the specificModel
argument:
plot(cpmModSet, specificModel = "dist: lognormal; l ~ Visibility * Season; s ~ Visibility" )
Often, carcasses are grouped in multiple size classes, and we are interested
in analyzing a set of models separately for each size class. To do so, we
furnish cpm
with sizeCol
, which is the name of the column in data_CP
that
gives the size classes of the carcasses. If, in addition, allCombos = TRUE
,
then cpm
returns a cpmSet
for each unique size class in the column
identified by the sizeCol
argument:
cpmModSetSize <- cpm(formula_l = l ~ Visibility * Season, formula_s = s ~ Visibility, data = data_CP, left = "LastPresentDecimalDays", right = "FirstAbsentDecimalDays", dist = c("exponential", "lognormal"), sizeCol = "Size", allCombos = TRUE) class(cpmModSetSize)
The cpmSetSize
object is a list where each element corresponds to a
different unique size class, and contains the associated cpmSet
o bject, which
itself is a list of cpm
outputs:
names(cpmModSetSize) names(cpmModSetSize[[1]]) class(cpmModSetSize[[1]])
For the purposes of mortality estimation, we calculate carcass-specific detection probabilities (see below), which may be difficult to generalize, given the specific history of each observed carcass. Thus, we also provide a simple means to calculate generic detection probabilities that are cell-specific, rather than carcass-specific.
For any estimation of detection probability ($\hat{g}$), we need to have singular SE and CP models to use for each of the size classes. Here, we use the best-fit of the models for each size class:
pkMods <- c("S" = "p ~ 1; k ~ 1", "L" = "p ~ 1; k ~ 1", "M" = "p ~ 1; k ~ 1", "XL" = "p ~ 1; k ~ HabitatType" ) cpMods <- c("S" = "dist: exponential; l ~ Season; NULL", "L" = "dist: exponential; l ~ 1; NULL", "M" = "dist: exponential; l ~ 1; NULL", "XL" = "dist: exponential; l ~ 1; NULL" )
The estgGenericSize
function produces n
random draws of generic (i.e.,
cell-specific, not carcass-sepecific) detection probabilities for
each of the possible carcass cell combinations across the selected SE and CP
models across the size classes. estgGeneric
is a single-size-class
version of function and estgGenericSize
actually loops over
estgGeneric
. The generic $\hat{g}$ is estimated according to a particular
search schedule. When we pass averageSS
a full data_SS
table like we have
here, it will assume that columns filled exclusively with 0s and 1s represent
search schedules for units and will create the average search schedule across
the units.
data_SS <- mock$SS avgSS <- averageSS(data_SS) gsGeneric <- estgGenericSize(nsim = 1000, days = avgSS, modelSetSize_SE = pkmModSetSize, modelSetSize_CP = cpmModSetSize, modelSizeSelections_SE = pkMods, modelSizeSelections_CP = cpMods )
The output from estgGeneric
can be simply summarized
summary(gsGeneric)
or plotted.
plot(gsGeneric)
When estimating mortality, detection probability is determined for individual carcasses based on the dates when they are observed, size class values, associated covariates, the searcher efficiency and carcass persistence models, and the search schedule. The carcass-specific detection probabilities (as opposed to the generic/cell-specific detection probabilities above) are therefore calculated before estimating the total mortality. Although it is possible to estimate these detection probabilities separately, they are best interpreted in the context of a full mortality estimation.
The estM
function is the general wrapper function for estimating M
,
whether for a single size class or multiple size classes. Prior to estimation,
we need to reduce the model-set-size complexed to just a single chosen
model per size class, corresponding to the pkMods
and cpMods
vectors
given above. To reduce the model set complexity, we can use the trimSetSize
function:
pkmModSize <- trimSetSize(pkmModSetSize, pkMods) cpmModSize <- trimSetSize(cpmModSetSize, cpMods)
In addition to the models and search schedule data, estM
requires
density-weighted proportion (DWP) and carcass observation (CO) data. If more
than one size class is represented in the data, a required input is also the
column names associated with the DWP value for each size class (argument
DWPCol
in estM
):
data_CO <- mock$CO data_DWP <- mock$DWP head(data_DWP) DWPcolnames <- names(pkmModSize) eM <- estM(data_CO = data_CO, data_SS = data_SS, data_DWP = data_DWP, frac = 1, model_SE = pkmModSize, model_CP = cpmModSize, unitCol = "Unit", COdate = "DateFound", SSdate = "DateSearched", sizeCol = "Size", nsim = 1000)
estM
returns an object that contains the random draws of pkm
and cpm
parameters (named pk
and ab
, respectively) and the estimated carcass-level
detection parameters (g
), arrival intervals (Aj
), and associated total
mortality (Mhat
) values for each simulation. These Mhat
values should be
considered in combination, and can be summarized and plotted simply:
summary(eM) plot(eM)
It is possible to split the resulting mortality estimation into components that are denoted according to covariates in either the search schedule or carcass observation data sets.
First, a temporal split:
M_season <- calcSplits(M = eM, split_SS = "Construction", split_CO = NULL, data_SS = data_SS, data_CO = data_CO ) summary(M_season) plot(M_season)
Next, a carcass split:
M_class <- calcSplits(M = eM, split_SS = NULL, split_CO = "Split", data_SS = data_SS, data_CO = data_CO ) summary(M_class) plot(M_class)
And finally, if two splits are included, the mortality estimation is expanded fully factorially:
M_SbyC <- calcSplits(M = eM, split_SS = "Construction", split_CO = "Split", data_SS = data_SS, data_CO = data_CO ) summary(M_SbyC) plot(M_SbyC)
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.