Bootstrap: Bootstrap the baseline model

View source: R/resample.R

Bootstrap    R Documentation

Bootstrap the baseline model

Description

Run a subset of the baseline model a number of times after resampling, e.g., Hauls in each Stratum or EDSUs in each Stratum.

Usage

Bootstrap(
  outputData,
  projectPath,
  BootstrapMethodTable = data.table::data.table(),
  NumberOfBootstraps = 1L,
  OutputProcesses = character(),
  UseOutputData = FALSE,
  NumberOfCores = 1L,
  BaselineSeedTable = data.table::data.table()
)

Arguments

outputData

The output of the function from an earlier run.

projectPath

The path to the project containing the baseline model to bootstrap.

BootstrapMethodTable

A table with the columns ProcessName, ResampleFunction and Seed, where each row defines the resample function to apply to the output of the given process and the seed to use in the resampling. The seed is used to draw one seed per bootstrap run using getSeedVector. Run RstoxFramework::getResampleFunctions() to get a list of the implemented resample functions. Note that if a process selected in BootstrapMethodTable is not used in the model up to the OutputProcesses, bootstrapping that process will have no effect on the end result (e.g., make sure to select the correct process returning the BioticAssignment data type).
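As a minimal sketch of the table layout described above, the following builds a two-row BootstrapMethodTable in base R. The process names (MeanLength, BioticAssign) are hypothetical placeholders for processes in your own baseline model, and the resample function names are assumptions that should be checked against the output of RstoxFramework::getResampleFunctions().

```r
# Base-R sketch of a BootstrapMethodTable; all values are illustrative placeholders.
BootstrapMethodTable <- data.frame(
  # Hypothetical process names; replace with processes from your own baseline model.
  ProcessName = c("MeanLength", "BioticAssign"),
  # Assumed resample function names; verify with RstoxFramework::getResampleFunctions().
  ResampleFunction = c("ResampleMeanLengthDistributionData", "ResampleBioticAssignment"),
  # One seed per row, used for the resampling of that process.
  Seed = c(1L, 2L),
  stringsAsFactors = FALSE
)

# The argument default is data.table::data.table(), so convert when data.table is installed.
if (requireNamespace("data.table", quietly = TRUE)) {
  BootstrapMethodTable <- data.table::as.data.table(BootstrapMethodTable)
}

print(BootstrapMethodTable)
```

Each row pairs one process with one resample function and one seed, so resampling two different data types takes two rows.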

NumberOfBootstraps

Integer: The number of bootstrap replicates.

OutputProcesses

A vector of the processes to save from each bootstrap replicate.

UseOutputData

Logical: Bootstrapping can be time-consuming, and by setting UseOutputData to TRUE the output file generated by a previous run of the process will be used instead of re-running the bootstrapping. Use this parameter with caution. Any changes made to the Baseline model or to the parameters of the Bootstrap itself will not be accounted for unless UseOutputData = FALSE. The option UseOutputData = TRUE is intended only for saving time when one needs to generate a report from an existing Bootstrap run.

NumberOfCores

The number of cores to use for parallel processing. A copy of the project is created in tempdir() for each core, even when using only one core. Note that this will require disk space equal to NumberOfCores times the size of the project folder (excluding the output/analysis/Bootstrap folder, which will be deleted before the copies are made).

BaselineSeedTable

A table with the columns ProcessName and Seed, giving the seed to use for the Baseline processes that require a Seed parameter. The seed is used to draw one seed per bootstrap run using getSeedVector.
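The idea of drawing one seed per bootstrap run from a single table seed can be sketched as follows. The helper drawSeeds is hypothetical and is not the getSeedVector implementation, whose internals are not described here; it only illustrates why a fixed seed in the table makes every bootstrap run reproducible.

```r
# Hypothetical helper, NOT the actual getSeedVector: derive n per-run seeds from one master seed.
drawSeeds <- function(seed, n) {
  set.seed(seed)                        # fix the RNG state from the master seed
  sample.int(.Machine$integer.max, n)   # draw one integer seed per bootstrap run
}

# Calling the helper twice with the same master seed yields the same per-run seeds.
seedsA <- drawSeeds(seed = 1L, n = 5L)
seedsB <- drawSeeds(seed = 1L, n = 5L)
print(identical(seedsA, seedsB))
```

Because each run receives its own derived seed, individual replicates differ from one another while the whole sequence stays repeatable.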

Details

A copy of the project is made for each core given by NumberOfCores. In the case that NumberOfCores == 1, this is still done for safety. Note that for acoustic-trawl survey estimates, if the AcousticPSUs of a Stratum have different assigned Hauls (not using the Stratum assignment method in DefineBioticAssignment), there is a probability that none of the assigned Hauls of an AcousticPSU are re-sampled in a bootstrap replicate. This will lead to missing acoustic density for that PSU for the target species, which will propagate through to the reports. This forces the use of RemoveMissingValues = TRUE, which implies some degree of under-estimation relative to what the estimate would be if none of the AcousticPSUs came out with missing acoustic density.
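To give a feel for the size of this probability, the sketch below assumes a simple scheme in which the n Hauls of a Stratum are resampled n times with replacement, and an AcousticPSU has k of those Hauls assigned. Under that assumption the chance that none of the k assigned Hauls is drawn is (1 - k/n)^n. The actual resample functions may differ from this scheme, and the values of n and k are purely illustrative.

```r
# Probability that none of k assigned hauls is drawn when n hauls are sampled
# n times with replacement (an assumed, simplified resampling scheme).
probNoneResampled <- function(n, k) (1 - k / n)^n

n <- 10L  # hauls in the stratum (illustrative value)
k <- 2L   # hauls assigned to one AcousticPSU (illustrative value)
pExact <- probNoneResampled(n, k)

# Monte Carlo check: resample haul indices and count replicates missing all assigned hauls.
set.seed(1)
assigned <- seq_len(k)
pSim <- mean(replicate(20000, !any(sample.int(n, n, replace = TRUE) %in% assigned)))

print(c(exact = pExact, simulated = pSim))
```

With these illustrative values the exact probability is 0.8^10, roughly 0.107, so under this simplified scheme about one replicate in ten would leave such a PSU without acoustic density.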

Note on the limitation on NumberOfBootstraps: All requested output data from all the bootstrap runs are accumulated in R memory and written to one RData file at the end of the function, which effectively imposes a limit of a few hundred bootstrap runs for large StoX projects. Use Bootstrap instead. Backwards compatibility sets the function Bootstrap to Bootstrap_3.6.0 for StoX projects saved in StoX 3.6.0 and older.

Value

A BootstrapData object, which is a list of the RstoxData DataTypes and RstoxBase DataTypes.


StoXProject/RstoxFramework documentation built on Oct. 17, 2023, 1:24 p.m.