View source: R/fit_hglm_occupancy_models.R
fit_hglm_occupancy_models | R Documentation |
Estimate probability of occupancy for a set of features in a set of
planning units. Models are fitted as hierarchical generalized linear models
that account for imperfect detection (following Royle & Link 2006)
using JAGS (via runjags::run.jags()). To limit over-fitting,
covariate coefficients are sampled using a Laplace prior distribution
(equivalent to L1 regularization used in machine learning contexts)
(Park & Casella 2008).
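The link between a Laplace prior and L1 regularization can be checked numerically. The following is a minimal base R sketch, not code from this package: dlaplace() is a hypothetical helper, and the scale b = 0.5 is an arbitrary value chosen for illustration. It shows that the negative log density of a zero-centred Laplace prior equals an absolute-value (L1) penalty on the coefficient plus a constant.

```r
# Hypothetical helper (not part of this package): density of a Laplace
# (double exponential) distribution with location mu and scale b
dlaplace <- function(x, mu = 0, b = 1) {
  exp(-abs(x - mu) / b) / (2 * b)
}

# illustrative coefficient values and an arbitrary prior scale
beta <- c(-2, -0.5, 0, 0.5, 2)
b <- 0.5

# negative log prior density of each coefficient
neg_log_prior <- -log(dlaplace(beta, b = b))

# L1 penalty with weight 1 / b, plus a constant that does not depend on beta
l1_penalty <- abs(beta) / b + log(2 * b)

# the two quantities agree
all.equal(neg_log_prior, l1_penalty)
```

A smaller scale b corresponds to a heavier penalty, so coefficients are shrunk more strongly towards zero.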
fit_hglm_occupancy_models(
site_data,
feature_data,
site_detection_columns,
site_n_surveys_columns,
site_env_vars_columns,
feature_survey_sensitivity_column,
feature_survey_specificity_column,
jags_n_samples = rep(10000, length(site_detection_columns)),
jags_n_burnin = rep(1000, length(site_detection_columns)),
jags_n_thin = rep(100, length(site_detection_columns)),
jags_n_adapt = rep(1000, length(site_detection_columns)),
jags_n_chains = rep(4, length(site_detection_columns)),
n_folds = rep(5, length(site_detection_columns)),
n_threads = 1,
seed = 500,
verbose = FALSE
)
site_data |
|
feature_data |
|
site_detection_columns |
|
site_n_surveys_columns |
|
site_env_vars_columns |
|
feature_survey_sensitivity_column |
|
feature_survey_specificity_column |
|
jags_n_samples |
|
jags_n_burnin |
|
jags_n_thin |
|
jags_n_adapt |
|
jags_n_chains |
|
n_folds |
|
n_threads |
|
seed |
|
verbose |
|
This function (i) prepares the data for model fitting, (ii) fits the models, and (iii) assesses the performance of the models. These analyses are performed separately for each feature. For a given feature:
The data are prepared for model fitting by partitioning the data using
k-fold cross-validation (set via the n_folds argument). The
training and evaluation folds are constructed
in such a manner as to ensure that each training and evaluation
fold contains at least one presence and one absence observation.
A model is fit separately for each fold (see
inst/jags/model.jags for model code). To assess convergence,
the multi-variate potential scale reduction factor
(MPSRF) statistic is calculated for each model.
The performance of the cross-validation models is evaluated.
Specifically, the TSS, sensitivity, and specificity statistics are
calculated (if relevant, weighted by the argument to
site_weights_data). These performance values are calculated using
the models' training and evaluation folds. To assess convergence,
the maximum MPSRF statistic for the models fit for each feature
is calculated.
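The fold construction described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's internal code: make_stratified_folds() is a hypothetical helper, and the detection data are a made-up vector of 12 presences and 28 absences. Sites are dealt out to folds separately for presences and absences, so every evaluation fold (and therefore every training fold) contains both classes.

```r
# Hypothetical helper (not part of this package): assign each site to one
# of k folds, stratified by detection status (1 = presence, 0 = absence)
make_stratified_folds <- function(y, k) {
  # each class needs at least k sites so that every fold receives one
  stopifnot(sum(y == 1) >= k, sum(y == 0) >= k)
  fold <- integer(length(y))
  for (v in c(0, 1)) {
    idx <- which(y == v)
    # shuffle the sites in this class, then deal them out to folds in turn
    idx <- idx[sample.int(length(idx))]
    fold[idx] <- rep_len(seq_len(k), length(idx))
  }
  fold
}

# made-up detection data: 12 presences followed by 28 absences
set.seed(500)
y <- rep(c(1, 0), times = c(12, 28))
fold <- make_stratified_folds(y, k = 5)

# number of presences and absences in each evaluation fold
table(fold, y)
```

For fold i, the sites with fold == i would form the evaluation data and the remaining sites the training data.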
A list object containing:
a list of list objects containing the models.
a tibble::tibble() object containing predictions for each feature.
a tibble::tibble() object containing the performance of the best models for each feature. It contains the following columns:
name of the feature.
maximum multi-variate potential scale reduction factor (MPSRF) value for the models. An MPSRF value less than 1.05 means that all coefficients in a given model have converged, so a value less than 1.05 in this column means that all the models fit for a given feature have successfully converged.
mean TSS statistic for models calculated using training data in cross-validation.
standard deviation in TSS statistics for models calculated using training data in cross-validation.
mean sensitivity statistic for models calculated using training data in cross-validation.
standard deviation in sensitivity statistics for models calculated using training data in cross-validation.
mean specificity statistic for models calculated using training data in cross-validation.
standard deviation in specificity statistics for models calculated using training data in cross-validation.
mean TSS statistic for models calculated using test data in cross-validation.
standard deviation in TSS statistics for models calculated using test data in cross-validation.
mean sensitivity statistic for models calculated using test data in cross-validation.
standard deviation in sensitivity statistics for models calculated using test data in cross-validation.
mean specificity statistic for models calculated using test data in cross-validation.
standard deviation in specificity statistics for models calculated using test data in cross-validation.
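For reference, the sensitivity, specificity, and TSS (true skill statistic) values can be illustrated with their textbook definitions. The sketch below is a simplified assumption and not the package's own calculation, which may differ (for example, in how imperfect detection is handled): calc_performance() is a hypothetical helper, and the observed and predicted values are made up.

```r
# Hypothetical helper (not part of this package): compute the TSS,
# sensitivity, and specificity from observed and predicted 0/1 values,
# optionally weighting each site
calc_performance <- function(obs, pred, w = rep(1, length(obs))) {
  tp <- sum(w[obs == 1 & pred == 1])  # true positives
  fn <- sum(w[obs == 1 & pred == 0])  # false negatives
  tn <- sum(w[obs == 0 & pred == 0])  # true negatives
  fp <- sum(w[obs == 0 & pred == 1])  # false positives
  sens <- tp / (tp + fn)
  spec <- tn / (tn + fp)
  c(tss = sens + spec - 1, sensitivity = sens, specificity = spec)
}

# made-up observations and predictions for ten sites
obs  <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
pred <- c(1, 1, 1, 0, 0, 0, 0, 0, 1, 1)
perf <- calc_performance(obs, pred)
print(perf)
```

In cross-validation, such values would be computed once per fold and then summarized by their mean and standard deviation, as in the columns listed above.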
This function requires the JAGS software to be installed. For information on installing the JAGS software, please consult the documentation for the rjags package.
Park T & Casella G (2008) The Bayesian lasso. Journal of the American Statistical Association, 103: 681–686.
Royle JA & Link WA (2006) Generalized site occupancy models allowing for false positive and false negative errors. Ecology, 87: 835–841.
## Not run:
# set seeds for reproducibility
set.seed(123)
# simulate data for 30 sites, 2 features, and 3 environmental variables
site_data <- simulate_site_data(n_sites = 30, n_features = 2, prop = 0.1)
feature_data <- simulate_feature_data(n_features = 2, prop = 1)
# print JAGS model code
cat(readLines(system.file("jags", "model.jags", package = "surveyvoi")),
sep = "\n")
# fit models
# note that we use a small number of MCMC iterations so that the example
# finishes quickly; you probably want to use the defaults for real work
results <- fit_hglm_occupancy_models(
site_data, feature_data,
c("f1", "f2"), c("n1", "n2"), c("e1", "e2", "e3"),
"survey_sensitivity", "survey_specificity",
n_folds = rep(5, 2),
jags_n_samples = rep(250, 2), jags_n_burnin = rep(250, 2),
jags_n_thin = rep(1, 2), jags_n_adapt = rep(100, 2),
n_threads = 1)
# print model predictions
print(results$predictions)
# print model performance
print(results$performance, width = Inf)
## End(Not run)