fcs2ModelSelection: Automated Model Selection


View source: R/fcs2ModelSelection.R

Description

Automatically selects an optimal set of covariate terms for the abundance and prevalence regression equations of the FCS2 model. Terms are attempted sequentially and the approximate abundance and prevalence INLA fits are used to test the significance of each term.

Usage

fcs2ModelSelection(
  runTotalVars = NULL,
  allRunsTotalVar = NULL,
  allRunsRangeVars = NULL,
  dataFrame,
  surveyAreaVar = "SurveyArea",
  nRunsVar = NULL,
  muVars,
  muVarType,
  rhoVars = muVars,
  rhoVarType = muVarType,
  rhoFormula,
  subset = 1:nrow(dataFrame),
  tolerance = 0.01,
  maxNoOrders = 3,
  nSweeps = 1,
  prior.parameters = list(),
  estAllRunsTotalVar = NULL,
  verbose = FALSE
)
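
A minimal sketch of a call, assuming a small synthetic data frame. The column names (Run1, Run2, Altitude, Width) and all data values are hypothetical and only illustrate how the arguments line up. The call itself is guarded because it needs the fcs2 package and INLA to be installed.

```r
# Hypothetical survey data: one row per survey (all values are made up).
set.seed(1)
surveys <- data.frame(
  Run1       = rpois(50, 5),        # fish caught in run 1
  Run2       = rpois(50, 3),        # fish caught in run 2
  SurveyArea = runif(50, 50, 200),
  Altitude   = runif(50, 0, 500),
  Width      = runif(50, 1, 20)
)

# One term type per candidate variable, in the order they will be attempted.
muVars    <- c("Altitude", "Width")
muVarType <- c("linear", "continuous")

# Only attempt the selection when the required packages are available.
if (requireNamespace("fcs2", quietly = TRUE) &&
    requireNamespace("INLA", quietly = TRUE)) {
  history <- fcs2::fcs2ModelSelection(
    runTotalVars = c("Run1", "Run2"),
    dataFrame    = surveys,
    muVars       = muVars,
    muVarType    = muVarType,
    tolerance    = 0.01,
    verbose      = TRUE
  )
}
```

Because rhoVars and rhoVarType are left at their defaults, the same variables and term types are attempted for prevalence as for abundance.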

Arguments

runTotalVars

a character vector of columns in dataFrame that give the number of fish caught in each run of a multiple-run survey. These should be in order from the first run to the last.

allRunsTotalVar

the name of a column in dataFrame that gives the total number of fish caught over all runs of a multiple-run survey.

allRunsRangeVars

the names of two columns in dataFrame that give the minimum and the maximum value of a range of possible values for the total number of fish caught over all runs in a multiple-run survey.

dataFrame

a data frame with surveys as rows and variables as columns. It should contain all variables specified by other arguments.

surveyAreaVar

the name of a column in dataFrame that gives the survey area. If not specified, the function will search for the default value "SurveyArea".

nRunsVar

the name of a column in dataFrame that gives the number of runs in each survey. If missing, the number of runs is assumed to be the number of non-missing run total entries, unless runTotalVars has length 1 or is missing, in which case a single-run model is used. Values that exceed the number of run total columns are clipped to that number with a warning.
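
The rule for the number of runs can be sketched in base R as follows. This is an illustration of the behaviour described above, not the package's own code; the column names and the clipRuns helper are hypothetical.

```r
# Hypothetical run totals; NA marks a run that was not carried out.
surveys <- data.frame(
  Run1 = c(4, 7, 2),
  Run2 = c(1, NA, 0),
  Run3 = c(NA, NA, 1)
)
runTotalVars <- c("Run1", "Run2", "Run3")

# Without nRunsVar: count the non-missing run totals in each survey.
nRuns <- rowSums(!is.na(surveys[runTotalVars]))

# With nRunsVar: clip values that exceed the number of run total columns.
clipRuns <- function(nRuns, maxRuns) {
  if (any(nRuns > maxRuns, na.rm = TRUE)) {
    warning("number of runs clipped to ", maxRuns)
  }
  pmin(nRuns, maxRuns)
}
clipped <- suppressWarnings(clipRuns(c(2, 5, 3), length(runTotalVars)))
```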

muVars

a character vector naming variables to use for terms in the abundance regression equation. Variables are attempted one-by-one in the order given.

muVarType

a character vector of the same length as muVars indicating the type of abundance term to attempt for each variable. Each element should be one of "asis", "factor", "linear", "continuous" or "spatial". See ‘Details’ for a description of each.

rhoVars

a character vector naming variables to use for terms in the prevalence regression equation. Variables are attempted one-by-one in the order given. Defaults to muVars to use the same variables as specified for abundance.

rhoVarType

a character vector of the same length as rhoVars indicating the type of prevalence term to attempt for each variable. Each element should be one of "asis", "factor", "linear", "continuous" or "spatial". See ‘Details’ for a description of each. Defaults to muVarType to use the same term types as specified for abundance.
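
Since each variable vector must pair element-by-element with its type vector, a quick check before calling the function can catch mismatches early. The sketch below uses a hypothetical helper, checkVarTypes, and made-up variable names; only the five type strings come from this page.

```r
allowedTypes <- c("asis", "factor", "linear", "continuous", "spatial")

# Hypothetical helper: check that variables and term types pair up.
checkVarTypes <- function(vars, types) {
  if (length(vars) != length(types)) {
    stop("vars and types must have the same length")
  }
  bad <- setdiff(types, allowedTypes)
  if (length(bad) > 0) {
    stop("unknown term type(s): ", paste(bad, collapse = ", "))
  }
  setNames(types, vars)   # named vector: variable -> term type
}

muTerms <- checkVarTypes(c("Altitude", "Width", "Substrate"),
                         c("linear", "continuous", "factor"))
```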

rhoFormula

an optional formula specifying which terms should appear in the prevalence regression equation when selecting terms for abundance. If specified, the prevalence model selection is skipped.

subset

an optional vector specifying a subset of surveys to be used in the fitting process.

tolerance

the threshold for each term's significance probability, below which a term is considered to be significant and is retained. The default value is 0.01 but a smaller value will cause fewer terms to be retained and vice-versa.
See summary.fcs2Fit for a definition of the significance probability for each variable. The probabilities corresponding to each variable within a rw2 or spatial term are combined (by rescaling the minimum under the assumption of independence) before comparison with tolerance.
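
One common way to rescale a minimum under independence is 1 - (1 - min(p))^k for k probabilities. The sketch below assumes that this is what is meant here; it is not confirmed as the package's implementation, and combineProbs and the input values are hypothetical.

```r
# Hypothetical helper: combine k probabilities by rescaling their minimum,
# assuming independence: 1 - (1 - min(p))^k.
combineProbs <- function(p) {
  1 - (1 - min(p))^length(p)
}

p <- c(0.004, 0.30, 0.12)     # made-up probabilities for one rw2 term
combined <- combineProbs(p)   # 1 - 0.996^3, roughly 0.012
retained <- combined < 0.01   # FALSE: the combined value exceeds tolerance
```

Under this assumption, a term whose smallest individual probability is below tolerance can still be dropped once the number of variables in the term is taken into account.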

maxNoOrders

the maximum number of polynomial orders to try for each linear component. Defaults to 3.

nSweeps

the number of sweeps to make through each list of potential variables. The default shown in the usage above is 1. Setting this to 2 gives each term a second test, in case the presence of later variables makes an earlier term significant.

prior.parameters

an optional named list of named vectors giving the parameter values to use for the prior distribution of a variable. See fcs2Priors for further details on the prior distributions and how to specify prior parameters. Priors that are not given parameter values have defaults given to them by .fcs2SetDefaultPriors. Note that priors for linear variables are ignored by INLA.

estAllRunsTotalVar

the name of a column in dataFrame that gives an estimate of the total number of fish caught over all runs for surveys where only range data is available. This is used in the approximate abundance INLA fit only and is not used in the full model as fitted with BUGS. If this is not provided and range data are present, the central value of each range is used with a warning.
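
To avoid the warning, an estimate column can be prepared in advance. The sketch below uses the midpoint of each range, assuming that is what "central value" means; the column names MinTotal, MaxTotal and EstTotal are hypothetical.

```r
# Hypothetical range columns for the all-runs total (values are made up).
surveys <- data.frame(
  MinTotal = c(3, 10, 0),
  MaxTotal = c(7, 10, 5)
)

# Midpoint of each range, used here as the estimated all-runs total.
surveys$EstTotal <- (surveys$MinTotal + surveys$MaxTotal) / 2
```

These columns would then be passed as allRunsRangeVars = c("MinTotal", "MaxTotal") and estAllRunsTotalVar = "EstTotal".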

verbose

whether to print progress to screen.

Value

a list of two matrices, each containing a summary of the terms attempted in each iteration. The first matrix gives the term history for the prevalence (rho) and the second gives the history for the abundance (mu).

Each row of a matrix summarises the regression formula used for that component, with covariates appearing as columns. Linear terms are represented as a number giving the order of the term, random walk terms are given by "rw2" and spatial terms by "spatial". fcs2:::termSummary2Formula can be used to convert a row to a formula for simpler input into fcs2FitModel.
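
The shape of the returned value can be illustrated with a mock object. Everything in this sketch is made up: the covariate names, the entries, and in particular the use of "" for a term that is absent, which is an assumption rather than documented behaviour.

```r
# Mock of the returned structure: a list of two term-history matrices,
# prevalence (rho) first and abundance (mu) second.
rhoHistory <- matrix(c("1", "",
                       "1", "rw2"),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(NULL, c("Altitude", "Width")))
muHistory <- matrix(c("2", "",
                      "2", ""),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(NULL, c("Altitude", "Width")))
history <- list(rhoHistory, muHistory)

# Last row of each matrix: the terms attempted in the most recent iteration.
lastRho <- history[[1]][nrow(history[[1]]), ]
lastMu  <- history[[2]][nrow(history[[2]]), ]
```

A row extracted this way is the kind of input that fcs2:::termSummary2Formula is described as converting; its exact signature is not given on this page, so it is not called here.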

Note

Since the prevalence terms are selected first and then the abundance terms second, it is possible that some prevalence terms are no longer significant in the final fit as these were not attempted with the selected abundance formula.

See Also

fcs2FitModel, fcs2:::termSummary2Formula


aquaMetrics/fcs2 documentation built on Aug. 21, 2021, 12:55 p.m.