prepare_data: Wrangle data to use for modelling input

View source: R/prepare-data.R

prepare_data    R Documentation

Wrangle data to use for modelling input

Description

prepare_data subsets raw BBS data by the selected species and wrangles the stratified data for use as model input.

Usage

prepare_data(
  strat_data = NULL,
  species_to_run = NULL,
  model = NULL,
  heavy_tailed = FALSE,
  n_knots = NULL,
  min_year = NULL,
  max_year = NULL,
  min_n_routes = 3,
  min_max_route_years = 3,
  min_mean_route_years = 1,
  strata_rem = NULL,
  quiet = FALSE,
  sampler = "jags",
  basis = "original",
  ...
)

Arguments

strat_data

Large list of stratified data returned by stratify()

species_to_run

Character string of the English name of the species to run

model

Character string of model to be used. Options are "slope", "firstdiff", "gam", and "gamye".

heavy_tailed

Logical indicating whether the extra-Poisson error distribution should be modeled as a t-distribution, which has heavier tails than the normal distribution. The default is currently FALSE, but recent results suggest users should strongly consider setting this to TRUE, even though it requires much longer convergence times.

n_knots

Number of knots to be used in GAM function

min_year

Minimum year to keep in analysis

max_year

Maximum year to keep in analysis

min_n_routes

Minimum number of routes per stratum on which the species has been observed. Defaults to 3.

min_max_route_years

Minimum number of years with non-zero observations of the species on at least 1 route. Defaults to 3.

min_mean_route_years

Minimum average of years per route with the species observed. Defaults to 1.

strata_rem

Strata to remove from analysis. Defaults to NULL

quiet

Should progress bars be suppressed?

sampler

Which MCMC sampling software to use. Currently bbsBayes only supports "jags".

basis

Which version of the basis function to use for the GAM smooth. The default, "original", is the same basis used in Smith and Edwards 2020. The alternative, "mgcv", uses the "tp" basis from the package mgcv (also used in brms and rstanarm). If using the "mgcv" option, the user may want to consider adjusting the prior distributions for the parameters and their precision.

...

Additional arguments
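
The three inclusion thresholds (min_n_routes, min_max_route_years, min_mean_route_years) can be read as a joint filter on strata. The following is a minimal base R sketch of that idea, not the package's internal code: the strata_summary table, its column names, and its values are hypothetical and exist only for illustration.

```r
# Hypothetical per-stratum summary; column names and values are made up.
strata_summary <- data.frame(
  stratum          = c("A", "B", "C"),
  n_routes         = c(5, 2, 4),        # routes on which the species was observed
  max_route_years  = c(8, 6, 2),        # most years with non-zero counts on any one route
  mean_route_years = c(3.2, 2.5, 1.5),  # average years per route with the species observed
  stringsAsFactors = FALSE
)

# Default thresholds as listed in Usage
min_n_routes         <- 3
min_max_route_years  <- 3
min_mean_route_years <- 1

# A stratum is kept only if it meets all three thresholds
keep <- strata_summary$n_routes >= min_n_routes &
  strata_summary$max_route_years >= min_max_route_years &
  strata_summary$mean_route_years >= min_mean_route_years

kept_strata <- strata_summary$stratum[keep]
print(kept_strata)
```

In this made-up table, stratum B falls below min_n_routes and stratum C falls below min_max_route_years, so only stratum A would be retained.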

Value

List of data to be used for modelling, including:

model

The model to be used

heavy_tailed

Logical indicating whether the extra-Poisson error distribution should be modeled as a t-distribution

min_nu

If heavy_tailed is TRUE, the minimum value for the truncated gamma distribution on the degrees of freedom of the t-distributed noise. The default is 0, and the user must change it manually after the function is run.

ncounts

The number of counts containing useful data for the species

nstrata

The number of strata used in the analysis

ymin

Minimum year used

ymax

Maximum year used

nonzeroweight

Proportion of routes in each stratum with species observations

count

Vector of counts for the species

strat

Vector of strata to be used in the analysis

obser

Vector of unique observer-route pairings

year

Vector of years for each count

firstyr

Vector of indicator variables as to whether an observer was a first year

month

Vector of numeric month of observation

day

Vector of numeric day of observation

nobservers

Total number of observer-route pairings

fixedyear

Median of all years (ymin:ymax), included only with slope and firstdiff models

nknots

Number of knots to use for smoothing functions, included only with GAM

X.basis

Basis function for n smoothing functions, included only with GAM
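
Because min_nu defaults to 0 and has to be changed by hand after the function returns, a small sketch of that step may help. To keep the snippet runnable on its own, the list below is only a stand-in for the object returned by prepare_data(), and the replacement value 3 is an arbitrary illustration, not a recommendation.

```r
# Stand-in for the list returned by prepare_data(); only the two
# relevant elements are shown and their values are assumptions.
model_data <- list(heavy_tailed = TRUE, min_nu = 0)

# min_nu only matters when the heavy-tailed (t-distributed) error is used
if (isTRUE(model_data$heavy_tailed)) {
  model_data$min_nu <- 3  # illustrative value only
}

print(model_data$min_nu)
```

With a real result from prepare_data(), the same assignment to the min_nu element would be made before passing the list on to model fitting.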

Examples

# Toy example with Pacific Wren sample data
# First, stratify the sample data

strat_data <- stratify(by = "bbs_cws", sample_data = TRUE)

# Prepare the stratified data for use in a model. In this
#   toy example, we will set the minimum year as 2009 and
#   maximum year as 2018, effectively only setting up to
#   model 10 years of data. We will use the "first difference"
#   model.
model_data <- prepare_data(strat_data = strat_data,
                           species_to_run = "Pacific Wren",
                           model = "firstdiff",
                           min_year = 2009,
                           max_year = 2018)

# You can also specify the GAM model, with an optional number of
# knots to use for the GAM basis.
# By default, the number of knots is
# floor(number of unique years for the species / 4)
model_data <- prepare_data(strat_data = strat_data,
                           species_to_run = "Pacific Wren",
                           model = "gam",
                           n_knots = 9)
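
The default knot rule described in the comment above can be sketched in base R. The years vector here is hypothetical rather than taken from BBS data, and the snippet only mirrors the stated rule; it is not the package's internal code.

```r
# Hypothetical survey years for one species (duplicates represent
# several counts made in the same year)
years <- c(2009, 2009, 2010, 2011, 2011, 2012,
           2013, 2014, 2015, 2016, 2017, 2018)

# Number of distinct years with data
n_unique_years <- length(unique(years))

# Default rule as described: floor of unique years divided by 4
default_n_knots <- floor(n_unique_years / 4)

print(default_n_knots)
```

With only 10 distinct years the rule gives 2 knots, which is one reason a user might prefer to set n_knots explicitly for short time series.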



BrandonEdwards/bbsBayes documentation built on Aug. 11, 2024, 9:33 a.m.