View source: R/fit_hglm_occupancy_models.R
fit_hglm_occupancy_models | R Documentation |
Estimate probability of occupancy for a set of features in a set of
planning units. Models are fitted as hierarchical generalized linear models
that account for imperfect detection (following Royle & Link 2006)
using JAGS (via runjags::run.jags()). To limit over-fitting,
covariate coefficients are sampled using a Laplace prior distribution
(equivalent to L1 regularization used in machine learning contexts)
(Park & Casella 2008).
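The link between a Laplace prior and L1 regularization can be checked numerically. The sketch below is illustrative only and is not taken from the package: dlaplace is a hypothetical helper (base R has no Laplace density), and the coefficient values and the scale b are made up. It shows that the negative log prior density equals, up to an additive constant, an absolute-value penalty with weight 1 / b.

```r
# Laplace (double-exponential) density with location 0 and scale b.
# `dlaplace` is a hypothetical helper written for this sketch; it is not
# part of base R or of surveyvoi.
dlaplace <- function(x, b = 1) {
  exp(-abs(x) / b) / (2 * b)
}

# made-up coefficient values and prior scale
beta <- c(-2, -0.5, 0, 0.5, 2)
b <- 2

# negative log prior density, dropping the constant term log(2 * b)
neg_log_prior <- -log(dlaplace(beta, b)) - log(2 * b)

# L1 (lasso) penalty with weight 1 / b
l1_penalty <- abs(beta) / b

# the two quantities agree, so maximizing the posterior under this prior
# penalizes the sum of absolute coefficient values
print(all.equal(neg_log_prior, l1_penalty))
```

A smaller scale b corresponds to a heavier penalty, which shrinks coefficients more strongly toward zero.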
fit_hglm_occupancy_models(
  site_data,
  feature_data,
  site_detection_columns,
  site_n_surveys_columns,
  site_env_vars_columns,
  feature_survey_sensitivity_column,
  feature_survey_specificity_column,
  jags_n_samples = rep(10000, length(site_detection_columns)),
  jags_n_burnin = rep(1000, length(site_detection_columns)),
  jags_n_thin = rep(100, length(site_detection_columns)),
  jags_n_adapt = rep(1000, length(site_detection_columns)),
  jags_n_chains = rep(4, length(site_detection_columns)),
  n_folds = rep(5, length(site_detection_columns)),
  n_threads = 1,
  seed = 500,
  verbose = FALSE
)
site_data |
|
feature_data |
|
site_detection_columns |
|
site_n_surveys_columns |
|
site_env_vars_columns |
|
feature_survey_sensitivity_column |
|
feature_survey_specificity_column |
|
jags_n_samples |
|
jags_n_burnin |
|
jags_n_thin |
|
jags_n_adapt |
|
jags_n_chains |
|
n_folds |
|
n_threads |
|
seed |
|
verbose |
|
This function (i) prepares the data for model fitting, (ii) fits the models, and (iii) assesses the performance of the models. These analyses are performed separately for each feature. For a given feature:
The data are prepared for model fitting by partitioning them using k-fold cross-validation (set via the n_folds argument). The training and evaluation folds are constructed so that each training and evaluation fold contains at least one presence and one absence observation.
A model is fit separately for each fold (see inst/jags/model.jags for model code). To assess convergence, the multivariate potential scale reduction factor (MPSRF) statistic is calculated for each model.
The performance of the cross-validation models is evaluated. Specifically, the TSS, sensitivity, and specificity statistics are calculated (if relevant, weighted by the argument to site_weights_data). These performance values are calculated using the models' training and evaluation folds. To assess convergence, the maximum MPSRF statistic for the models fit for each feature is calculated.
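The fold construction described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the detection vector y is made up, and the approach shown (assigning fold identifiers separately within presences and within absences) is one simple way to guarantee that every training and evaluation fold contains both classes, provided there are at least n_folds observations of each class.

```r
set.seed(500)

# hypothetical detection data: 1 = presence, 0 = absence (not from surveyvoi)
y <- c(rep(1, 8), rep(0, 22))
n_folds <- 5

# assign fold identifiers separately within presences and absences, so that
# every evaluation fold receives at least one observation of each class
# (this requires at least n_folds presences and n_folds absences)
fold_id <- integer(length(y))
for (cls in c(0, 1)) {
  idx <- which(y == cls)
  stopifnot(length(idx) >= n_folds)
  fold_id[idx] <- sample(rep_len(seq_len(n_folds), length(idx)))
}

# build the training and evaluation index sets for each fold
folds <- lapply(seq_len(n_folds), function(k) {
  list(train = which(fold_id != k), test = which(fold_id == k))
})

# confirm that every training and evaluation set contains both classes
has_both <- function(i) all(c(0, 1) %in% y[i])
ok <- vapply(
  folds,
  function(f) has_both(f$train) && has_both(f$test),
  logical(1)
)
print(all(ok))
```

Stratifying by class is a convenient way to meet the constraint; rejection sampling of random partitions would be another option when the data are very unbalanced.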
A list object containing:
a list of list objects containing the models.
a tibble::tibble() object containing predictions for each feature.
a tibble::tibble() object containing the performance of the best models for each feature. It contains the following columns:
name of the feature.
maximum multivariate potential scale reduction factor (MPSRF) value for the models. An MPSRF value less than 1.05 means that all coefficients in a given model have converged, and so a value less than 1.05 in this column means that all the models fit for a given feature have successfully converged.
mean TSS statistic for models calculated using training data in cross-validation.
standard deviation in TSS statistics for models calculated using training data in cross-validation.
mean sensitivity statistic for models calculated using training data in cross-validation.
standard deviation in sensitivity statistics for models calculated using training data in cross-validation.
mean specificity statistic for models calculated using training data in cross-validation.
standard deviation in specificity statistics for models calculated using training data in cross-validation.
mean TSS statistic for models calculated using test data in cross-validation.
standard deviation in TSS statistics for models calculated using test data in cross-validation.
mean sensitivity statistic for models calculated using test data in cross-validation.
standard deviation in sensitivity statistics for models calculated using test data in cross-validation.
mean specificity statistic for models calculated using test data in cross-validation.
standard deviation in specificity statistics for models calculated using test data in cross-validation.
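To make the performance columns concrete, the sketch below computes sensitivity, specificity, and TSS (sensitivity + specificity - 1) for a few folds and then summarizes them by their mean and standard deviation. It is an assumption-laden illustration rather than the package's code: calc_performance is a hypothetical helper, the observations and predictions are made up, and the calculation is unweighted and treats the observations as error-free.

```r
# confusion-matrix statistics for binary observations and predictions
# (`calc_performance` is a hypothetical helper, not a surveyvoi function)
calc_performance <- function(obs, pred) {
  sens <- sum(pred == 1 & obs == 1) / sum(obs == 1)
  spec <- sum(pred == 0 & obs == 0) / sum(obs == 0)
  c(tss = sens + spec - 1, sensitivity = sens, specificity = spec)
}

# made-up observations and predictions for three evaluation folds
obs <- list(c(1, 1, 0, 0), c(1, 0, 0, 0), c(1, 1, 1, 0))
pred <- list(c(1, 0, 0, 0), c(1, 0, 1, 0), c(1, 1, 1, 0))

# one column per fold, one row per statistic
perf <- mapply(calc_performance, obs, pred)

# summarize each statistic across folds
fold_mean <- apply(perf, 1, mean)
fold_sd <- apply(perf, 1, sd)

print(round(rbind(mean = fold_mean, sd = fold_sd), 3))
```

Because each fold needs at least one presence and one absence for these ratios to be defined, the fold construction constraint described earlier is what keeps the denominators non-zero.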
This function requires the JAGS software to be installed. For information on installing the JAGS software, please consult the documentation for the rjags package.
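As a quick pre-flight check, it may help to see whether a JAGS executable is visible from R before fitting models. The sketch below uses only base R and assumes the executable is named jags and is on the system PATH; on some systems (notably Windows) JAGS can be installed without being on the PATH, so a negative result here is not conclusive.

```r
# look for a `jags` executable on the system PATH (base R only)
jags_path <- unname(Sys.which("jags"))
jags_found <- nzchar(jags_path)

if (jags_found) {
  message("JAGS executable found at: ", jags_path)
} else {
  message("No JAGS executable found on the PATH; it may still be installed elsewhere.")
}
```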
Park T & Casella G (2008) The Bayesian lasso. Journal of the American Statistical Association, 103: 681–686.
Royle JA & Link WA (2006) Generalized site occupancy models allowing for false positive and false negative errors. Ecology, 87: 835–841.
## Not run:
# set seeds for reproducibility
set.seed(123)

# simulate data for 30 sites, 2 features, and 3 environmental variables
site_data <- simulate_site_data(n_sites = 30, n_features = 2, prop = 0.1)
feature_data <- simulate_feature_data(n_features = 2, prop = 1)

# print JAGS model code
cat(readLines(system.file("jags", "model.jags", package = "surveyvoi")),
    sep = "\n")

# fit models
# note that we use a small number of MCMC iterations so that the example
# finishes quickly; you probably want to use the defaults for real work
results <- fit_hglm_occupancy_models(
  site_data, feature_data,
  c("f1", "f2"), c("n1", "n2"), c("e1", "e2", "e3"),
  "survey_sensitivity", "survey_specificity",
  n_folds = rep(5, 2), jags_n_samples = rep(250, 2),
  jags_n_burnin = rep(250, 2), jags_n_thin = rep(1, 2),
  jags_n_adapt = rep(100, 2), n_threads = 1)

# print model predictions
print(results$predictions)

# print model performance
print(results$performance, width = Inf)

## End(Not run)