fcs2ModelSelection: Automated Model Selection
In aquaMetrics/fcs2: Fisheries Classification Scheme 2 For SNIFFER


Automatically selects an optimal set of covariate terms for the abundance and prevalence regression equations of the FCS2 model. Terms are attempted sequentially and the approximate abundance and prevalence INLA fits are used to test the significance of each term.
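The sequential procedure described above can be illustrated with a small stand-in. This is a sketch only: it uses a Poisson `glm` p-value in place of the approximate INLA fits, and the helper names `termPValue` and `selectTerms`, the response column `y` and the simulated data are all hypothetical, not part of fcs2.

```r
# Stand-in for the INLA-based test: p-value of `term` in a Poisson GLM
# that also contains the terms retained so far.
termPValue <- function(term, kept, data) {
  f <- reformulate(c(kept, term), response = "y")
  fit <- glm(f, family = poisson, data = data)
  coef(summary(fit))[term, "Pr(>|z|)"]
}

# Try each variable in the order given; retain it when its p-value is
# below `tolerance`. Later sweeps retry the variables not yet retained.
selectTerms <- function(vars, data, tolerance = 0.01, nSweeps = 1) {
  kept <- character(0)
  for (sweep in seq_len(nSweeps)) {
    for (v in setdiff(vars, kept)) {
      if (termPValue(v, kept, data) < tolerance) kept <- c(kept, v)
    }
  }
  kept
}

# Simulated data in which only x1 affects the response.
set.seed(1)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- rpois(n, exp(0.5 + 0.8 * d$x1))

selected <- selectTerms(c("x1", "x2", "x3"), d, tolerance = 0.01, nSweeps = 2)
print(selected)
```

The sketch ignores polynomial orders, random walk and spatial terms, and the separate prevalence and abundance equations; it only shows the order-dependent retain-or-drop loop.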

fcs2ModelSelection(
  runTotalVars = NULL,
  allRunsTotalVar = NULL,
  allRunsRangeVars = NULL,
  dataFrame,
  surveyAreaVar = "SurveyArea",
  nRunsVar = NULL,
  muVars,
  muVarType,
  rhoVars = muVars,
  rhoVarType = muVarType,
  rhoFormula,
  subset = 1:nrow(dataFrame),
  tolerance = 0.01,
  maxNoOrders = 3,
  nSweeps = 1,
  prior.parameters = list(),
  estAllRunsTotalVar = NULL,
  verbose = FALSE
)
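A call might look like the following sketch. The data frame and its column names (`Run1` to `Run3`, `Altitude`, `Width`, `Region`) are invented for illustration, and the call is guarded so that it is only attempted when the fcs2 and INLA packages are installed; it has not been checked against real survey data.

```r
# Hypothetical three-run survey data; all values are simulated.
set.seed(42)
n <- 50
surveys <- data.frame(
  Run1       = rpois(n, 6),
  Run2       = rpois(n, 3),
  Run3       = rpois(n, 1),
  SurveyArea = runif(n, 50, 300),
  Altitude   = runif(n, 0, 400),
  Width      = runif(n, 1, 15),
  Region     = factor(sample(c("N", "S"), n, replace = TRUE))
)

# One term type per candidate variable, in the order they should be tried.
muVars    <- c("Altitude", "Width", "Region")
muVarType <- c("linear", "continuous", "factor")
stopifnot(length(muVars) == length(muVarType))

if (requireNamespace("fcs2", quietly = TRUE) &&
    requireNamespace("INLA", quietly = TRUE)) {
  # rhoVars and rhoVarType are left at their defaults (muVars, muVarType).
  history <- fcs2::fcs2ModelSelection(
    runTotalVars = c("Run1", "Run2", "Run3"),
    dataFrame    = surveys,
    muVars       = muVars,
    muVarType    = muVarType,
    tolerance    = 0.01,
    nSweeps      = 2,
    verbose      = TRUE
  )
}
```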

`runTotalVars`	a character vector of columns in `dataFrame` that give the number of fish caught in each run of a multiple-run survey. These should be in order from the first run to the last.
`allRunsTotalVar`	the name of a column in `dataFrame` that gives the total number of fish caught over all runs of a multiple-run survey.
`allRunsRangeVars`	the names of two columns in `dataFrame` that give the minimum and the maximum value of a range of possible values for the total number of fish caught over all runs in a multiple-run survey.
`dataFrame`	a data frame with surveys as rows and variables as columns. It should contain all variables specified by other arguments.
`surveyAreaVar`	the name of a column in `dataFrame` that gives the survey area. If not specified, the function will search for the default value `"SurveyArea"`.
`nRunsVar`	the name of a column in `dataFrame` that gives the number of runs in each survey. If missing, the number of runs is assumed to be the number of non-missing run total entries, unless `runTotalVars` has length 1 or is missing, in which case a single-run model is used. Values that exceed the number of run total columns are clipped with a warning.
`muVars`	a character vector naming variables to use for terms in the abundance regression equation. Variables are attempted one-by-one in the order given.
`muVarType`	a character vector of the same length as `muVars` indicating the type of abundance term to attempt for each variable. Each element should be one of `"asis"`, `"factor"`, `"linear"`, `"continuous"` or `"spatial"`. See ‘Details’ for a description of each.
`rhoVars`	a character vector naming variables to use for terms in the prevalence regression equation. Variables are attempted one-by-one in the order given. Defaults to `muVars` to use the same variables as specified for abundance.
`rhoVarType`	a character vector of the same length as `rhoVars` indicating the type of prevalence term to attempt for each variable. Each element should be one of `"asis"`, `"factor"`, `"linear"`, `"continuous"` or `"spatial"`. See ‘Details’ for a description of each. Defaults to `muVarType` to use the same term types as specified for abundance.
`rhoFormula`	an optional `formula` specifying which terms should appear in the prevalence regression equation when selecting terms for abundance. If specified, the prevalence model selection is skipped.
`subset`	an optional vector specifying a subset of surveys to be used in the fitting process.
`tolerance`	the threshold for each term's significance probability, below which a term is considered to be significant and is retained. The default value is `0.01` but a smaller value will cause fewer terms to be retained and vice-versa. See `summary.fcs2Fit` for a definition of the significance probability for each variable. The probabilities corresponding to each variable within a `rw2` or `spatial` term are combined (by rescaling the minimum under the assumption of independence) before comparison with `tolerance`.
`maxNoOrders`	the maximum number of polynomial orders to try for each linear component. Defaults to 3.
`nSweeps`	the number of sweeps to make through each list of potential variables. The usage above gives a default of `1`. With `2`, each term is tested a second time in case the presence of later variables makes an earlier term significant.
`prior.parameters`	an optional named list of named vectors giving the parameter values to use for the prior distribution of a variable. See `fcs2Priors` for further details on the prior distributions and how to specify prior parameters. Priors that are not given parameter values have defaults given to them by `.fcs2SetDefaultPriors`. Note that priors for linear variables are ignored by INLA.
`estAllRunsTotalVar`	the name of a column in `dataFrame` that gives an estimate of the total number of fish caught over all runs for surveys where only range data is available. This is used in the approximate abundance INLA fit only and is not used in the full model as fitted with BUGS. If this is not provided and range data are present, the central value of each range is used with a warning.
`verbose`	whether to print progress to screen.
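The `tolerance` entry mentions combining the probabilities within a `rw2` or `spatial` term by rescaling the minimum under independence. One standard way to do this is the Šidák form, 1 - (1 - min p)^k for k probabilities. Whether fcs2 uses exactly this expression is an assumption, and `combineMinProb` is a hypothetical helper, not a package function.

```r
# Sidak-style rescaling of the smallest of k independent probabilities:
# the chance that the minimum of k independent uniform values is <= min(p).
combineMinProb <- function(p) {
  1 - (1 - min(p))^length(p)
}

p <- c(0.01, 0.2, 0.5)        # hypothetical per-variable probabilities
combined <- combineMinProb(p) # 1 - 0.99^3, about 0.0297
combined < 0.01               # FALSE: the term would not be retained
```

Under this form a term with many component variables needs a smaller minimum probability to fall below `tolerance` than a single-variable term does.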

a list of two matrices, each containing a summary of the terms attempted in each iteration. The first matrix gives the term history for the prevalence (rho) and the second gives the history for the abundance (mu).

Each row of a matrix summarises the regression formula used for that component, with covariates appearing as columns. Linear terms are represented as a number giving the order of the term, random walk terms are given by `"rw2"` and spatial terms by `"spatial"`. `fcs2:::termSummary2Formula` can be used to convert a row to a formula for simpler input into `fcs2FitModel`.
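To make the layout concrete, the sketch below builds a mock abundance history by hand and reads off the last row. The covariate names are invented, and representing an absent term as an empty string is an assumption; the matrices returned by the function may mark absent terms differently.

```r
# Mock term history: one row per iteration, one column per covariate.
# "1" and "2" are polynomial orders of a linear term; "rw2" is a random walk.
muHistory <- matrix(
  c("1", "",
    "2", "",
    "2", "rw2"),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("Altitude", "Width"))
)

# The last row summarises the final formula; keep only the terms present.
final    <- muHistory[nrow(muHistory), ]
selected <- final[final != ""]
print(selected)
```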

Since the prevalence terms are selected before the abundance terms, some prevalence terms may no longer be significant in the final fit, as they were not tested alongside the selected abundance formula.

fcs2FitModel, fcs2:::termSummary2Formula

aquaMetrics/fcs2 documentation built on Aug. 21, 2021, 12:55 p.m.