ReportBootstrap: Report Bootstrap

View source: R/resample.R

ReportBootstrap R Documentation

Report Bootstrap

Description

Reports the sum, mean or other statistics on a variable of the BootstrapData.

Usage

ReportBootstrap(
  BootstrapData,
  BaselineProcess = character(),
  TargetVariable = character(),
  TargetVariableUnit = character(),
  AggregationFunction = RstoxBase::getReportFunctions(getMultiple = FALSE),
  BootstrapReportFunction = RstoxBase::getReportFunctions(getMultiple = TRUE),
  Percentages = double(),
  GroupingVariables = character(),
  InformationVariables = character(),
  Filter = character(),
  RemoveMissingValues = FALSE,
  AggregationWeightingVariable = character(),
  BootstrapReportWeightingVariable = character()
)

Arguments

BootstrapData

The BootstrapData output from Bootstrap.

BaselineProcess

A string naming the baseline process to report from the BootstrapData. If a process with

TargetVariable

The variable to report.

TargetVariableUnit

The unit to use for the TargetVariable.

AggregationFunction

The function to apply to each bootstrap run. This must be a function returning a single value.

BootstrapReportFunction

The function to apply across bootstrap runs, such as "cv" or "c".

Percentages

The percentages for which to report percentiles when BootstrapReportFunction = "summaryStox".

GroupingVariables

The variables to report by. For most applications GroupingVariables should include "Survey" and "SpeciesCategory", unless the user needs to sum over all Survey or SpeciesCategory.

InformationVariables

Variables to include as columns at the end of the report table. These cannot have more unique combinations than the GroupingVariables.

Filter

A string with an R expression to filter out unwanted rows of the report, e.g. "IndividualAge %notin% NA" or "Survey %notin% NA & SpeciesCategory %notin% NA".

RemoveMissingValues

Logical: If TRUE, remove missing values (NAs) from the TargetVariable. The default (FALSE) means that NA is reported if at least one of the values used in the ReportFunction is NA. Use RemoveMissingValues = TRUE with extreme caution, as it may lead to under-estimation. E.g., if RemoveMissingValues = TRUE and a super-individual lacks IndividualRoundWeight, Biomass will be NA, and the portion of Abundance distributed to that super-individual will be excluded when summing Biomass (but included when summing Abundance). It is advised to always run with RemoveMissingValues = FALSE first, and to investigate thoroughly to identify the source of any missing values. The function ImputeSuperIndividuals can be used to impute the missing information from other super-individuals.

AggregationWeightingVariable

The variable to weight by in the AggregationFunction.

BootstrapReportWeightingVariable

The variable to weight by in the BootstrapReportFunction.
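The Filter argument above takes an R expression as a string. The following is a minimal sketch of why the examples use %notin% NA rather than a comparison with NA. It assumes a stand-in definition of %notin% and mimics the filtering with eval(parse()); the toy table and the evaluation method are illustrative assumptions, not the package's internal implementation.

```r
# Assumption: a stand-in for %notin%, defined here for illustration only.
`%notin%` <- Negate(`%in%`)

# Hypothetical report table with one missing age.
report <- data.frame(
  IndividualAge = c(1, 2, NA),
  Abundance     = c(10, 20, 30)
)

# The filter is a string holding an R expression on the table's columns.
filterString <- "IndividualAge %notin% NA"
keep <- eval(parse(text = filterString), envir = report)
filtered <- report[keep, ]

# For comparison, `!=` against NA yields NA for every row, not TRUE/FALSE.
naive <- report$IndividualAge != NA
```

Here filtered keeps the two rows with a known age, whereas the naive comparison cannot be used as a row selector because it is NA throughout.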

Details

This function works in two steps. First, the AggregationFunction is applied to the TargetVariable of the table given by BaselineProcess for each unique combination of the GroupingVariables and for each bootstrap run. Second, a grid of all possible combinations of the GroupingVariables is formed and the result from the first step is placed onto the grid. This yields 0 at each position in the grid where data from the first step are not present. E.g., if a particularly large fish is found in only one haul, and this haul by chance is not selected in a bootstrap run, the TargetVariable will be 0 for that run, reflecting the variability in the data. To complete the second step, the BootstrapReportFunction is applied over the bootstrap runs for each cell in the grid.
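The two steps can be sketched on toy data with base R. This is an illustration only: the column names (BootstrapID, Stratum, Biomass) and the table layout are assumptions, sum and mean stand in for the AggregationFunction and BootstrapReportFunction, and the real function operates on BootstrapData rather than a plain data frame.

```r
# Toy data: hypothetical columns, not the real BootstrapData layout.
toy <- data.frame(
  BootstrapID = c(1, 1, 1, 2, 2),
  Stratum     = c("A", "A", "B", "A", "A"),
  Biomass     = c(10, 5, 20, 12, 6)
)

# Step 1: apply the aggregation function (here sum) per group and per run.
step1 <- aggregate(Biomass ~ BootstrapID + Stratum, data = toy, FUN = sum)

# Step 2a: form the grid of all combinations and place step 1 onto it.
grid <- expand.grid(
  BootstrapID = unique(toy$BootstrapID),
  Stratum     = unique(toy$Stratum),
  stringsAsFactors = FALSE
)
onGrid <- merge(grid, step1, all.x = TRUE)
# Cells with no data in a run get 0 (stratum B was not drawn in run 2).
onGrid$Biomass[is.na(onGrid$Biomass)] <- 0

# Step 2b: summarise across bootstrap runs for each cell of the grid.
report <- aggregate(Biomass ~ Stratum, data = onGrid, FUN = mean)
print(report)
```

Without the zero fill, stratum B would be summarised from run 1 alone, which would hide the run-to-run variability the bootstrap is meant to capture.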

The parameter RemoveMissingValues should be used with extreme caution. The effect of setting RemoveMissingValues to TRUE is that missing values (NAs) are removed in both the first and the second step, and this can be dangerous in either step. E.g., if the Abundance of SuperIndividualsData is positive for super-individuals with missing IndividualWeight, then the Biomass of those super-individuals will be missing as well. If one then wants to sum the Biomass by using AggregationFunction = "sum", one will get NA if RemoveMissingValues = FALSE. If RemoveMissingValues = TRUE, the missing Biomass is ignored, and the summed Biomass will only include the super-individuals that have non-missing IndividualWeight, effectively discarding a portion of the observed abundance. The summed Biomass will in this case be underestimated!
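A small worked example of the first-step effect, using base R only. The table is hypothetical and much simpler than SuperIndividualsData, and sum() with na.rm stands in for the AggregationFunction with RemoveMissingValues.

```r
# Hypothetical super-individuals: the third lacks IndividualWeight.
superInd <- data.frame(
  Abundance        = c(100, 200, 300),
  IndividualWeight = c(2, 3, NA)
)
superInd$Biomass <- superInd$Abundance * superInd$IndividualWeight

# Keeping NAs: the result is NA, which flags the problem.
sumKeepNA <- sum(superInd$Biomass)

# Removing NAs: the 300 individuals without weight contribute nothing.
sumRemoveNA <- sum(superInd$Biomass, na.rm = TRUE)

# Abundance has no missing values, so all 600 individuals are counted.
sumAbundance <- sum(superInd$Abundance)
```

The summed Biomass with NAs removed covers only 300 of the 600 individuals, while the summed Abundance covers all of them, so the two totals are no longer consistent with each other.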

In the second step, setting RemoveMissingValues to TRUE can be even more dangerous, as the only option currently available for the BootstrapReportFunction is the function RstoxBase::summaryStox(), which includes the average and standard deviation, both of which are highly influenced by removing missing data.

Instead of setting RemoveMissingValues to TRUE, it is advised to apply the function ImputeSuperIndividuals to fill in e.g. IndividualWeight where missing. Missing values in the output from ReportBootstrap can also be avoided by adding variables to GroupingVariables, e.g. adding "Stratum" if there are strata that are known from Baseline to contain no fish. These strata will then be present with missing values, but those missing values will not affect other strata when "Stratum" is included in GroupingVariables. It is also recommended to include "Survey" and "SpeciesCategory" in the GroupingVariables, as summary statistics should rarely be computed across these key variables.

Value

A ReportBootstrapData object.


StoXProject/RstoxFramework documentation built on Oct. 17, 2023, 1:24 p.m.