In ncsu-landscape-dynamics/rpops: Pest or Pathogen Spread Model

calibrate

R Documentation

Calibrates the reproductive rate and dispersal scales of the pops model.

Description

Either Approximate Bayesian Computation (ABC) or a Markov Chain Monte Carlo (MCMC) approximation is used to estimate the relevant model parameters. Model accuracy is gauged using a custom quantity and allocation disagreement function that assesses the accuracy of the spatial configuration. The metrics tested are the number of predictions, the number of predicted locations, and the cumulative distance to the nearest infection. Calibration uses these metrics to decide whether a run is kept: a run is kept if it is under a threshold, either because it improves the results or because it is randomly kept despite being worse. We recommend running calibration for at least 10,000 iterations; more iterations will provide a better result. If the model converges and doesn't improve for a while, it will exit calibration before reaching the total number of iterations specified.
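
The threshold-based acceptance described above can be illustrated with a minimal, generic ABC rejection sketch in base R. This is not the package's implementation: the toy Poisson model, the flat prior, and the names `observed`, `threshold`, and `accepted` are assumptions made only for illustration. It shows the idea of sampling candidate parameters and keeping a run only when its distance to the observation is under a threshold, until a generation of accepted parameter sets is filled.

```r
set.seed(42)

# Toy "model" (assumption): the observed count is Poisson with an unknown rate.
observed <- 25
threshold <- 3          # a run is kept only if its distance is under this value
generation_size <- 100  # number of accepted parameter sets wanted

accepted <- numeric(0)
attempts <- 0
while (length(accepted) < generation_size) {
  attempts <- attempts + 1
  candidate <- runif(1, min = 0, max = 50)   # draw a rate from a flat prior
  simulated <- rpois(1, lambda = candidate)  # run the toy model once
  if (abs(simulated - observed) < threshold) {
    accepted <- c(accepted, candidate)       # keep runs under the threshold
  }
}

# Share of model runs that were kept, and a summary of the kept parameters.
acceptance_rate <- generation_size / attempts
summary(accepted)
```

In a multi-generation scheme the threshold would typically be tightened from one generation to the next, which is why more generations narrow the parameter estimates at the cost of more model runs.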

Usage

calibrate(
  infected_years_file,
  number_of_observations = 1,
  prior_number_of_observations = 0,
  prior_means = c(0, 0, 0, 0, 0, 0),
  prior_cov_matrix = matrix(0, 6, 6),
  params_to_estimate = c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE),
  number_of_generations = 7,
  generation_size = 1000,
  pest_host_table,
  competency_table,
  infected_file_list,
  host_file_list,
  total_populations_file,
  temp = FALSE,
  temperature_coefficient_file = "",
  precip = FALSE,
  precipitation_coefficient_file = "",
  model_type = "SI",
  latency_period = 0,
  time_step = "month",
  season_month_start = 1,
  season_month_end = 12,
  start_date = "2008-01-01",
  end_date = "2008-12-31",
  use_survival_rates = FALSE,
  survival_rate_month = 3,
  survival_rate_day = 15,
  survival_rates_file = "",
  use_lethal_temperature = FALSE,
  temperature_file = "",
  lethal_temperature = -12.87,
  lethal_temperature_month = 1,
  mortality_frequency = "year",
  mortality_frequency_n = 1,
  management = FALSE,
  treatment_dates = c(""),
  treatments_file = "",
  treatment_method = "ratio",
  natural_kernel_type = "cauchy",
  anthropogenic_kernel_type = "cauchy",
  natural_dir = "NONE",
  natural_kappa = 0,
  anthropogenic_dir = "NONE",
  anthropogenic_kappa = 0,
  pesticide_duration = c(0),
  pesticide_efficacy = 1,
  mask = NULL,
  output_frequency = "year",
  output_frequency_n = 1,
  movements_file = "",
  use_movements = FALSE,
  start_exposed = FALSE,
  generate_stochasticity = TRUE,
  establishment_stochasticity = TRUE,
  movement_stochasticity = TRUE,
  dispersal_stochasticity = TRUE,
  establishment_probability = 0.5,
  dispersal_percentage = 0.99,
  quarantine_areas_file = "",
  use_quarantine = FALSE,
  use_spreadrates = FALSE,
  use_overpopulation_movements = FALSE,
  overpopulation_percentage = 0,
  leaving_percentage = 0,
  leaving_scale_coefficient = 1,
  calibration_method = "ABC",
  number_of_iterations = 1e+05,
  exposed_file_list = "",
  verbose = TRUE,
  write_outputs = "None",
  output_folder_path = "",
  network_filename = "",
  network_movement = "walk",
  success_metric = "mcc",
  use_initial_condition_uncertainty = FALSE,
  use_host_uncertainty = FALSE,
  weather_type = "deterministic",
  temperature_coefficient_sd_file = "",
  precipitation_coefficient_sd_file = "",
  dispersers_to_soils_percentage = 0,
  quarantine_directions = "",
  multiple_random_seeds = FALSE,
  file_random_seeds = NULL,
  use_soils = FALSE,
  soil_starting_pest_file = "",
  start_with_soil_populations = FALSE,
  county_level_infection_data = FALSE
)

Arguments

`infected_years_file`	Raster file with years of initial infection/infestation as individual locations of a pest or pathogen. This is a multiband raster file (e.g. .tif) with each band representing a unique time step (e.g. band 1 = year 1 ... band 6 = year 6, or band 1 = week 1 ... band 6 = week 6). The bands need to align with both the time step selection and the start and end date selection. Units for infections are based on data availability and on the units used when creating your host file (e.g. percent area, # of hosts per cell, etc.). This doesn't include the start year, which is passed in through the initial_infected_file (e.g. if we had observation data from 2017, 2018, and 2019, the 2017 raster file would be the initial_infected_file and a dual-band raster file would have band 1 = 2018 and band 2 = 2019 observations).
`number_of_observations`	The number of observations used for this calibration. Useful if using a previous calibration. This is used to weight the parameters when updating them as new data becomes available. For example, if we have 2,000 observations in 2019 and had 1,000 observations in 2018 and 1,000 in 2017, we would use 2,000 here and 2,000 for our prior_number_of_observations.
`prior_number_of_observations`	The total number of observations from previous calibrations, used to weight the posterior distributions (if this is a new calibration, this value takes the form of a prior weight between 0 and 1). This is used to weight the parameters when updating them as new data becomes available. For example, if we have 2,000 observations in 2019 and had 1,000 observations in 2018 and 1,000 in 2017, we would use 2,000 here and 2,000 for our number_of_observations.
`prior_means`	A vector of the means of the parameters you are estimating, in the order (reproductive_rate, natural_dispersal_distance, percent_natural_dispersal, anthropogenic_dispersal_distance, natural_kappa, anthropogenic_kappa). This is used when updating a parameter set from a previous calibration using the iterative framework.
`prior_cov_matrix`	A covariance matrix from the previous year's posterior parameter estimation, ordered as (reproductive_rate, natural_dispersal_distance, percent_natural_dispersal, anthropogenic_dispersal_distance, natural_kappa, anthropogenic_kappa). This is used when updating a parameter set from a previous calibration using the iterative framework.
`params_to_estimate`	A vector of booleans specifying which parameters to estimate, ordered as (reproductive_rate, natural_dispersal_distance, percent_natural_dispersal, anthropogenic_dispersal_distance, natural_kappa, anthropogenic_kappa).
`number_of_generations`	The number of generations to use to decrease the uncertainty in the parameter estimation (too many and it will take a long time; too few and your parameter sets will be too wide). This is an ABC implementation naming convention, but it should be set to greater than 7 for robust calibrations. There is a trade-off between computational time and model accuracy as this number gets larger. Usually 7 to 9 is the ideal range.
`generation_size`	How many accepted parameter sets should occur in each generation. For example, if generation size is 1,000, then the simulation runs until 1,000 model runs fall below the threshold value. We recommend running at least 1,000; the greater this number, the more accurate the selected model parameters will be.
`pest_host_table`	The file path to a csv that has these columns in this order: host, susceptibility_mean, susceptibility_sd, mortality_rate, mortality_rate_mean, and mortality_time_lag, with each row being a species. Host species must be in the same order in the host_file_list, infected_file_list, pest_host_table rows, and competency_table columns. The host column is a character string of the species name and is only used for metadata and labeling output files. Susceptibility and mortality_rate values must be between 0 and 1.
`competency_table`	A csv with the hosts as the first n columns (n being the number of hosts) and the last column being the competency value. Each row is a set of Booleans for host presence and the competency value (between 0 and 1) for that combination of hosts in a cell.
`infected_file_list`	Paths to raster files with initial infections and standard deviation for each host. These can be passed in 2 formats (a single file with number of hosts, or a single file with 2 layers: number of hosts and standard deviation). Units for infections are based on data availability and on the units used when creating your host file (e.g. percent area, # of hosts per cell, etc.).
`host_file_list`	Paths to raster files with number of hosts and standard deviation on those estimates. These can be passed in 2 formats (a single file with number of hosts, or a single file with 2 layers: number of hosts and standard deviation). The units can take many forms; the two most common that we use are percent area (0 to 100) and # of hosts in the cell. The choice usually depends on the data available and the estimation methods.
`total_populations_file`	Path to raster file with the total population of all hosts and non-hosts. This depends on how your host data is set up. If host is percent area, then this should be a raster with values that are 100 anywhere with host. If the host file is # of hosts in a cell, then this should be a raster with values that are the max of the host raster anywhere the # of hosts is greater than 0.
`temp`	boolean that allows the use of temperature coefficients to modify spread (TRUE or FALSE)
`temperature_coefficient_file`	Path to raster file with temperature coefficient data for the time step and time period specified (e.g. if time_step = "week", start_date = "2017-01-01", and end_date = "2019-12-31", this file would have 52 * 3 = 156 bands with data being weekly temperature coefficients). We convert raw temperature values to coefficients that affect the reproduction and survival of the pest; all values in the raster are between 0 and 1.
`precip`	boolean that allows the use of precipitation coefficients to modify spread (TRUE or FALSE)
`precipitation_coefficient_file`	Raster file with precipitation coefficient data for the time step and time period specified (e.g. if time_step = "week", start_date = "2017-01-01", and end_date = "2019-12-31", this file would have 52 * 3 = 156 bands with data being weekly precipitation coefficients). We convert raw precipitation values to coefficients that affect the reproduction and survival of the pest; all values in the raster are between 0 and 1.
`model_type`	What type of model best represents your system. Options are "SEI" (Susceptible - Exposed - Infected/Infested) or "SI" (Susceptible - Infected/Infested). Default value is "SI".
`latency_period`	How many time steps it takes for exposed populations to become infected/infested. This is an integer value and must be greater than 0 if model_type is "SEI".
`time_step`	How often spread should occur. Options: ('day', 'week', 'month').
`season_month_start`	When does spread first start occurring in the year for your pest or pathogen (integer value between 1 and 12)
`season_month_end`	When does spread end during the year for your pest or pathogen (integer value between 1 and 12)
`start_date`	Date to start the simulation, with format 'YYYY-MM-DD' (e.g. "2008-01-01").
`end_date`	Date to end the simulation, with format 'YYYY-MM-DD' (e.g. "2008-12-31").
`use_survival_rates`	Boolean to indicate if the model will use survival rates to limit the survival or emergence of overwintering generations.
`survival_rate_month`	The month in which overwintering generations emerge. We suggest using the month before emergence for this parameter, as it is when the survival rates raster will be applied.
`survival_rate_day`	What day should the survival rates be applied
`survival_rates_file`	Raster file with survival rates from 0 to 1 representing the percentage of emergence for a cell.
`use_lethal_temperature`	A boolean to answer the question: does your pest or pathogen have a temperature at which it cannot survive? (TRUE or FALSE)
`temperature_file`	Path to raster file with temperature data for minimum temperature
`lethal_temperature`	The temperature in degrees C at which lethal temperature related mortality occurs for your pest or pathogen (-50 to 60)
`lethal_temperature_month`	The month in which lethal temperature related mortality occurs for your pest or pathogen (integer value between 1 and 12).
`mortality_frequency`	Sets how often mortality calculations occur: one of ('year', 'month', 'week', 'day', 'time step', or 'every_n_steps').
`mortality_frequency_n`	Sets number of units from mortality_frequency in which to run the mortality calculation if mortality_frequency is 'every_n_steps'. Must be an integer >= 1.
`management`	Boolean to allow use of management (TRUE or FALSE)
`treatment_dates`	List of dates on which to apply treatment, with format 'YYYY-MM-DD' (needs to be the same length as treatments_file and pesticide_duration).
`treatments_file`	Path to raster files with treatment data by dates. Needs to be a list of files the same length as treatment_dates and pesticide_duration.
`treatment_method`	What method to use when applying treatment, one of ("ratio" or "all infected"). "ratio" removes a portion of all infected and susceptibles; "all infected" removes all infected and a portion of susceptibles.
`natural_kernel_type`	What type of dispersal kernel should be used for natural dispersal. Current dispersal kernel options are ('cauchy', 'exponential', 'uniform', 'deterministic neighbor', 'power law', 'hyperbolic secant', 'gamma', 'weibull', 'logistic').
`anthropogenic_kernel_type`	What type of dispersal kernel should be used for anthropogenic dispersal. Current dispersal kernel options are ('cauchy', 'exponential', 'uniform', 'deterministic neighbor', 'power law', 'hyperbolic secant', 'gamma', 'weibull', 'logistic', 'network').
`natural_dir`	Sets the predominant direction of natural dispersal, usually due to wind. Values: ('N', 'NW', 'W', 'SW', 'S', 'SE', 'E', 'NE', 'NONE').
`natural_kappa`	Sets the strength of the natural direction in the von Mises distribution (numeric value between 0.01 and 12).
`anthropogenic_dir`	Sets the predominant direction of anthropogenic dispersal, usually due to human movement, typically over long distances (e.g. nursery trade, movement of firewood, etc.). Values: ('N', 'NW', 'W', 'SW', 'S', 'SE', 'E', 'NE', 'NONE').
`anthropogenic_kappa`	Sets the strength of the anthropogenic direction in the von Mises distribution (numeric value between 0.01 and 12).
`pesticide_duration`	How long the pesticide (herbicide, vaccine, etc.) lasts before the host is susceptible again. If the value is 0, treatment is a culling (i.e. host removal), not a pesticide treatment (needs to be the same length as treatment_dates and treatments_file).
`pesticide_efficacy`	How effective is the pesticide at preventing the disease or killing the pest (if this is 0.70 then when applied it successfully treats 70 percent of the plants or animals).
`mask`	Raster file used to provide a mask to remove 0's that are not true negatives from comparisons (e.g. mask out lakes and oceans from statistics if modeling terrestrial species). A numerical value represents the area you want to calculate statistics on and an NA value represents the area to remove from the statistics.
`output_frequency`	Sets when outputs occur: one of ('year', 'month', 'week', 'day', 'time step', or 'every_n_steps').
`output_frequency_n`	Sets number of units from output_frequency in which to export model results if output_frequency is 'every_n_steps'. Must be an integer >= 1.
`movements_file`	This is a csv file with columns lon_from, lat_from, lon_to, lat_to, number of animals, and date.
`use_movements`	This is a boolean to turn on use of the movement module.
`start_exposed`	Whether your initial conditions start as exposed or infected (only used if model_type is "SEI"). Default is FALSE. If this is TRUE, you need to have both infected_files (this can be a raster of all 0's) and exposed_files.
`generate_stochasticity`	Boolean to indicate whether to use stochasticity in reproductive functions. Default is TRUE.
`establishment_stochasticity`	Boolean to indicate whether to use stochasticity in establishment functions. Default is TRUE.
`movement_stochasticity`	Boolean to indicate whether to use stochasticity in movement functions. Default is TRUE.
`dispersal_stochasticity`	Boolean to indicate whether to use stochasticity in the dispersal kernel. Default is TRUE.
`establishment_probability`	Threshold to determine establishment if establishment_stochasticity is FALSE (range 0 to 1, default = 0.5)
`dispersal_percentage`	Percentage of dispersal used to calculate the bounding box for deterministic dispersal
`quarantine_areas_file`	Path to raster file with quarantine boundaries used in calculating likelihood of quarantine escape if use_quarantine is TRUE
`use_quarantine`	Boolean to indicate whether or not there is a quarantine area if TRUE must pass in a raster file indicating the quarantine areas (default = FALSE)
`use_spreadrates`	Boolean to indicate whether or not to calculate spread rates
`use_overpopulation_movements`	Boolean to indicate whether to use the overpopulation pest movement module (driven by the natural kernel with its scale parameter modified by a coefficient)
`overpopulation_percentage`	Percentage of occupied hosts when the cell is considered to be overpopulated
`leaving_percentage`	Percentage of pests leaving an overpopulated cell
`leaving_scale_coefficient`	Coefficient to multiply scale parameter of the natural kernel (if applicable)
`calibration_method`	choose which method of calibration to use either 'ABC' (Approximate Bayesian Computation) or 'MCMC' (Markov Chain Monte Carlo Approximation)
`number_of_iterations`	How many iterations to run to allow the calibration to converge (we recommend at least 100,000, but preferably 1 million).
`exposed_file_list`	Paths to raster files with initial exposed populations and standard deviation for each host. These can be passed in 2 formats (a single file with number of hosts, or a single file with 2 layers: number of hosts and standard deviation). Units for infections are based on data availability and on the units used when creating your host file (e.g. percent area, # of hosts per cell, etc.).
`verbose`	Boolean; if TRUE, prints the current status of the calibration (e.g. the current generation, current particle, and the acceptance rate). Default is TRUE.
`write_outputs`	Either "summary_outputs" or "None". If not "None", output_folder_path must be provided.
`output_folder_path`	The full path, with either / or \ (e.g., "C:/user_name/desktop/pops_sod_2020_2023/outputs/").
`network_filename`	The entire file path for the network file. Used if anthropogenic_kernel_type = 'network'.
`network_movement`	What movement type do you want to use in the network kernel either "walk", "jump", or "teleport". "walk" allows dispersing units to leave the network at any cell along the edge. "jump" automatically moves to the nearest node when moving through the network. "teleport" moves from node to node most likely used for airport and seaport networks.
`success_metric`	Choose the success metric that is most relevant to your system or data for comparing simulations vs. observations. Must be one of "quantity", "allocation", "configuration", "quantity and allocation", "quantity and configuration", "allocation and configuration", "quantity, allocation, and configuration", "accuracy", "precision", "recall", "specificity", "accuracy and precision", "accuracy and specificity", "accuracy and recall", "precision and recall", "precision and specificity", "recall and specificity", "accuracy, precision, and recall", "accuracy, precision, and specificity", "accuracy, recall, and specificity", "precision, recall, and specificity", "accuracy, precision, recall, and specificity", "rmse", "distance", "mcc", "mcc and quantity", "mcc and distance", "rmse and distance", "mcc and configuration", "mcc and RMSE", "mcc, quantity, and configuration". Default is "mcc".
`use_initial_condition_uncertainty`	Boolean to indicate whether or not to propagate and partition uncertainty from initial conditions. If TRUE the infected_files needs to have 2 layers one with the mean value and one with the standard deviation. If an SEI model is used the exposed_file needs to have 2 layers one with the mean value and one with the standard deviation
`use_host_uncertainty`	Boolean to indicate whether or not to propagate and partition uncertainty from host data. If TRUE the host_file needs to have 2 layers one with the mean value and one with the standard deviation.
`weather_type`	string indicating how the weather data is passed in either as a mean and standard deviation to represent uncertainty ("probabilistic") or as a time series ("deterministic")
`temperature_coefficient_sd_file`	Raster file with temperature coefficient standard deviation data for the time step and time period specified (e.g. if time_step = "week", this file would have 52 bands with data being weekly temperature coefficient standard deviations). We convert raw temperature values to coefficients that affect the reproduction and survival of the pest; all values in the raster are between 0 and 1.
`precipitation_coefficient_sd_file`	Raster file with precipitation coefficient standard deviation data for the time step and time period specified (e.g. if time_step = "week", this file would have 52 bands with data being weekly precipitation coefficient standard deviations). We convert raw precipitation values to coefficients that affect the reproduction and survival of the pest; all values in the raster are between 0 and 1.
`dispersers_to_soils_percentage`	Range from 0 to 1 representing the percentage of dispersers that fall to the soil and survive.
`quarantine_directions`	String with comma-separated directions to include in the quarantine direction analysis, e.g., 'N,E'. By default all directions (N, S, E, W) are considered.
`multiple_random_seeds`	Boolean to indicate if the model should use multiple random seeds (allows for performing uncertainty partitioning) or a single random seed (backwards compatibility option). Default is FALSE.
`file_random_seeds`	A file path to the .csv file containing the random_seeds table. Use this if you are trying to recreate an exact analysis; otherwise we suggest leaving the default. Default is NULL, which draws the seed numbers for each.
`use_soils`	Boolean to indicate if pests establish in the soil and spread out from there. Typically used for soil borne pathogens.
`soil_starting_pest_file`	path to the raster file with the starting amount of pest or pathogen.
`start_with_soil_populations`	Boolean to indicate whether to use a starting soil pest or pathogen population if TRUE then soil_starting_pest_file is required.
`county_level_infection_data`	Boolean to indicate if infection data is at the county level. If TRUE then the infected_file should be a polygon raster with county level infection/infestation counts.
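
The band-count arithmetic given for the coefficient files (52 weekly bands per year over 3 years = 156 bands) can be checked before running a calibration with a small sketch like the one below. `expected_bands` is a hypothetical helper and not part of the package. It assumes the simulation covers whole calendar years, uses 52 weeks per year as in the example in the argument descriptions, and takes dates written like the defaults in Usage (e.g. "2008-01-01").

```r
# Hypothetical helper (not part of the package): expected number of raster
# bands in a coefficient file that covers whole calendar years.
expected_bands <- function(start_date, end_date,
                           time_step = c("week", "month")) {
  time_step <- match.arg(time_step)
  # Assumption: 52 weekly or 12 monthly steps per simulated year.
  steps_per_year <- c(week = 52, month = 12)
  start_year <- as.integer(format(as.Date(start_date), "%Y"))
  end_year <- as.integer(format(as.Date(end_date), "%Y"))
  n_years <- end_year - start_year + 1L
  n_years * steps_per_year[[time_step]]
}

weekly_bands <- expected_bands("2017-01-01", "2019-12-31", "week")
monthly_bands <- expected_bands("2017-01-01", "2019-12-31", "month")
```

Comparing this count against the number of layers actually present in a coefficient raster is one way to catch a mismatch between the file and the time step and date settings early.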