fcs2FitModel: Fit the FCS2 Statistical Model
In aquaMetrics/fcs2: Fisheries Classification Scheme 2 For SNIFFER

Description Usage Arguments Details Value FCS2 model Note See Also Examples

Fits the both the original EA FCS2 statistical model and also SNIFFER's multiple-pass extension that allows multiple runs per survey.
The model allows observations to be given either as fish counts in each run of the survey, the total count over all runs or as an upper and lower bound for the all runs total. The fish counts depend upon abundance μ and prevalence ρ components that are each modelled by a covariate regression of linear, non-linear and spatial terms.
The Bayesian model is fitted by sampling a set of potential parameter values from the posterior distribution using MCMC via WinBUGS or OpenBUGS. However, this can take much time so an approximate model fit can be quickly obtained using INLA.

fcs2FitModel(
  runTotalVars = NULL,
  allRunsTotalVar = NULL,
  allRunsRangeVars = NULL,
  muFormula = ~1,
  rhoFormula = ~1,
  dataFrame,
  surveyAreaVar = "SurveyArea",
  nRunsVar = NULL,
  subset = 1:nrow(dataFrame),
  na.action,
  prior.parameters = list(),
  initial.values = list(),
  runINLA,
  runBUGS = FALSE,
  estAllRunsTotalVar = NULL,
  n.chains = 2,
  n.iter = 2000,
  n.burnin = floor(n.iter/2),
  n.thin = max(1, floor(n.chains * (n.iter - n.burnin)/n.sims)),
  n.sims = 1000,
  bugsFilename = "model.bug",
  bugsProgram = "OpenBUGS",
  verbose = TRUE,
  fit,
  ...
)

`runTotalVars`	a character vector of columns in `dataFrame` that give the number of fish caught in each run of a multiple-run survey. These should be in order from the first run to the last.
`allRunsTotalVar`	the name of a column in `dataFrame` that gives the total number of fish caught over all runs of a multiple-run survey.
`allRunsRangeVars`	the names of two columns in `dataFrame` that give the minimum and the maximum value of a range of possible values for the total number of fish caught over all runs in a multiple-run survey.
`muFormula`	a `formula` specifying which terms should appear in the abundance regression equation. This should take the form `~ term1 + term2 +` ... where each term is either a standard linear term, a non-linear first-order random walk term (given by `rw1`), a second-order random walk term (given by `rw2`) or a spatial term (given by `spatial`). A constant term is always present.
`rhoFormula`	a `formula` specifying which terms should appear in the prevalence regression equation. As with `muFormula`, the regression can contain linear terms but also non-linear terms and a single spatial term.
`dataFrame`	a data frame with surveys as rows and variables as columns. It should contain all variables specified through other arguments.
`surveyAreaVar`	the name of a column in `dataFrame` that gives the survey area. If not specified, the function will search for the default value `"SurveyArea"`.
`nRunsVar`	the name of a column in `dataFrame` that gives the number of runs in each survey. If missing, the number of runs is assumed to be the number of non-missing run total entries, unless `runTotalVars` has length 1 or is missing in which case a single-run model is used. Number of runs values greater than the number of run total columns are clipped with a warning.
`subset`	an optional vector specifying a subset of surveys to be used in the fitting process.
`na.action`	a function which indicates what should happen when the data contain missing values (`NA`s). The default is set by the `na.action` setting of `options` and this is usually set to `na.omit`. This setting removes surveys that contain missing data in any required variables. A vector indicating the rows that were removed can be extracted from the returned object using `na.action.fcs2Fit`. Alternatively, `na.pass` can be used to ignore missing values (where possible) or `na.fail` can be given to signal an error if missing values are found.
`prior.parameters`	an optional named list of named vectors giving the parameter values to use for the prior distribution of a variable. See `fcs2Priors` for further details on the prior distributions and how to specify prior parameters. Priors that are not given parameter values have defaults given to them by `.fcs2SetDefaultPriors`. Note that priors for linear variables are ignored by INLA but all priors are used by BUGS to fit the full model.
`initial.values`	an optional named list of initial values to use for variables when fitting the full model using WinBUGS or OpenBUGS. Initial values that are not specified are estimated using the median estimate from INLA, unless `runINLA` is `FALSE` in which case BUGS will simulate an initial value from the prior.
`runINLA`	whether to run INLA to fit approximate abundance and prevalence models. By default, INLA is run for both models only if necessary for providing initial values for all variables. `runINLA` can be `"rho"` or `"mu"` to run only the prevalence or abundance models respectively. However, `"mu"` can only be given if the INLA fit for prevalence has already been run and is provided through `fit`. See `fcs2RunINLA` for details.
`runBUGS`	whether to run WinBUGS or OpenBUGS to fit the full FCS2 model by sampling from the posterior distribution of the parameters. Note that BUGS can often take weeks to run with a large data set, many regression terms, and a large number of iterations `n.iter`. By default, `runBUGS` is `FALSE` and the model is simply approximated using INLA. However, posterior samples from BUGS are required for calculating EQRs using `fcs2SingleEQR`, `fcs2JointEQR` or `fcs2JointAndSingleEQR`.
`estAllRunsTotalVar`	the name of a column in `dataFrame` that gives an estimate of the total number of fish caught over all runs for surveys where only range data is available. This is used in the approximate abundance INLA fit only and is not used in the full model as fitted with BUGS. If this is not provided and range data are present, the central value of each range is used with a warning.
`n.chains`	the number of MCMC chains to run using BUGS. The default is `2` and at least 2 chains are required if convergence measures are required.
`n.iter`	the total number of iterations per chain (including burn-in). The default is `2000` but it is likely that considerably more will be required if a few regression terms are included.
`n.burnin`	the length of burn-in, i.e. number of iterations to discard at the beginning. The default is `n.iter / 2`, that is to discarding the first half of the simulations.
`n.thin`	the thinning rate. Must be a positive integer. Set `n.thin > 1` to save memory and computation time if `n.iter` is large. The default is `max(1, floor(n.chains * (n.iter - n.burnin) / 1000))` which will only thin if there are at least `2000` simulations.
`n.sims`	the approximate number of simulations to keep after thinning.
`bugsFilename`	the name to use for the BUGS model file. The default is `"model.bug"`.
`bugsProgram`	the BUGS program, either `winbugs`, `WinBUGS`, `openbugs` or `OpenBUGS`. If using OpenBUGS, the package BRugs must be installed.
`verbose`	whether to print progress to screen.
`fit`	an `"fcs2Fit"` object, as returned by this function, can alternatively be provided instead of the arguments that specify the model and data. This allows the object returned after setting `runINLA` or `runBUGS` to `FALSE` to be extended, for example to view approximate INLA fits before running BUGS.
`...`	further arguments that are passed to `openbugs`.

The full model (described below) is fitted by drawing Monte Carlo samples from the posterior distribution of the unknown paramters. These represent equally plausable values of the parameters. Properties of the fitted model can be estimated by averaging with these values.

The values are sampled using Markov Chain Monte Carlo (MCMC) via WinBUGS or OpenBUGS. This can take much time to run and convergence of the markov chains should always be verified using plotBUGSTrace before the posterior samples can be trusted.

Approximate parameter estimates can first be produced using Integrated Nested Laplace Approximations (INLA). This method produces estimates in seconds but cannot be used to fit the full FCS2 model. Instead INLA fits two approximate models, one for the prevalence regression and one for the abundance. The INLA fits may be used to predict the significance of regression terms in the model. See fcs2RunINLA for further details of these models.

Returns an "fcs2Fit" object that contains the model specification, the required data and any INLA and BUGS model fits. The "fcs2Fit" object is essentially a list with the following items:

call

the matched call.

modelMatrix

a matrix of variables required by the model, extracted from dataFrame.

muLinearVars

a character vector of linear terms in the abundance regression, each of which appears as a column in modelMatrix

muRW1Vars, muRW2Vars

a character vector of variables with first and second-order random walk terms in the abundance regression.

muSpatialVar

the name of the abundance spatial variable, if present.

muAdjacency

the adjacency information for the abundance spatial term, if present.

rhoLinearVars, rhoRW1Vars, rhoRW2Vars,
rhoSpatialVar, rhoAdjacency

as their mu counterparts but applying to the prevalence regression rather than abundance

dataType

a factor with levels run, total and range indicating whether each survey contains the number of fish caught in each run, the total number over all runs, or a minimum and maximum describing a range of possible values of the total catch.

rwNoLevels

a named list giving the number of discrete levels to use to represent each random walk term.

rwBoundaries

a named list giving the locations of the discrete points that are used to represent each non-linear covariate term.

covariateMin, covariateMax

the minimum and maximum value for each covariate in the model.

N

the number of surveys.

multiRun

whether the multiple-run extension to the FCS2 model is used. This is necessary if more than one run total variable is specified by runTotalVars, or if the number of runs variable nRunsVar exceeds 1.

inlaFits

if INLA has been run, a list with two components:

`rhoFit`	an `"inla"` object containing the approximate prevalence model fit
`muFit`	an `"inla"` object containing the approximate abundance model fit

See inla from the package INLA for a description of "inla" objects.

bugsFit

if BUGS has been run, a "bugs" object containing the full FCS2 model fit. See bugs from the package R2WinBUGS for a description of this object.

In addition, the following arguments are stored: runTotalVars, allRunsTotalVar, allRunsRangeVars, muFormula, rhoFormula, surveyAreaVar and nRunsVar. Also, prior.parameters and initial.values are corrected and completed.

The FCS2 model is a Bayesian hierarchical model of fish counts over a number of surveys. The original EA FCS2 model assumes a single fish count C per survey that is modelled by a zero-inflated Negative Binomial (ZINB) distribution (see rzinbinom). Specifically,

C ~ ZINB(r, m=a μ, z=1 - ρ)

where r is a shape parameter, a is the survey area, μ is the abundance (mean density) and ρ is the prevalence (probability present).

The SNIFFER multiple-pass extension to the FCS2 model instead assumes that joint fish count C_1, …, C_d over d runs follows a zero-inflated Negative Multinomial (ZINM) distribution (see fcs2:::zinmultinom). Specifically,

(C_1, …, C_d) ~ ZINM(r, m_j = a μ q (1 - q)^(j - 1), z=1 - ρ)

where the additional parameter q represents the catch probability.

The remaining components of the model are common to both the original and multiple-pass FCS2. The abundance and prevalence components are both modelled using regressions in terms of other covariates x_i,j. Specifically, for survey i,

log(μ_i) = β_0 + … + f_j(x_i,j; β_j) + …

log(μ_i) = γ_0 + … + g_j(x_i,j; γ_j) + …

where f_j and g_j are regression terms that depend upon parameters β_j and γ_j respectively. The functions log and logit are used to transform the components to the real line. The regression equations are specified by setting muFormula and rhoFormula using symbolic formulae.

The terms can be either linear, non-linear or spatial. Linear terms such as β_1 x + β_2 x^2 are specified as standard formula. Non-linear relationships may be constructed using random walk terms using rw1 for first-order or rw2 for second-order random walks. A single spatial term may be added using spatial. This relies upon a geographical partition of the land into a number of spatial regions. The spatial relationship is modelled by assuming correlations between neighbouring regions.

Being a Bayesian model, prior distributions are required for every unknown parameter. These should represent your beliefs about their values prior to incorporating the information from the dataset. It is common to specify wide uninformative prior distributions to represent prior ignorance. The default priors attempt to do this but try to balance against too wide priors which can cause WinBUGS to fail when fitting the model. The prior distributions should always be checked before fitting the full model. See fcs2Priors for further details on the prior distributions and how to specify prior parameters via prior.parameters.

WinBUGS (http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/contents.shtml) or the newer OpenBUGS (http://www.openbugs.info/w/) must be separately installed in order to fit the full FCS2 model. If OpenBUGS is used, the package BRugs must also be installed. The latest version of BRugs comes with OpenBUGS bundled within it so that an external installation is no longer necessary.

Note that INLA only uses the priors for the shape parameter r and the hyperparameter for each non-linear term.

inla from the package INLA which is used to provide the approximate model fits using Integrated Nested Laplace Approximation;
bugs from the package BRUGS which is used to provide the full model fit using Markov Chain Monte Carlo (MCMC);

print.fcs2Fit and summary.fcs2Fit for summarising "fcs2Fit" objects;
plot.fcs2Fit, plotSpatialTerm and plotCatchPMF for plotting the FCS2 model fit;
plotBUGSTrace for assessing the MCMC convergence;
ppplot for assessing the model fit with a P-P plot;

fcs2SingleEQR, fcs2JointEQR or fcs2JointAndSingleEQR for producing single or joint EQR samples.

### Example 1: Very simple example with no covariates
###

# simulate random dataset
Data <- data.frame(SurveyArea=rlnorm(100, 4.6, 0.5))   # random survey area
Data$Catch <- rzinbinom(100, size=1.1, zeroprob=0.3,
                        nbmean=0.3 * Data$SurveyArea)  # single catch per survey

# fit approximate model with INLA - model contains no regression terms
fit1 <- fcs2FitModel("Catch", dataFrame=Data, surveyAreaVar="SurveyArea")

## Not run: 
# fit full model with OpenBUGS
# (more iterations may be required to achieve convergence)
fit1 <- fcs2FitModel(fit=fit1, runBUGS=TRUE, n.iter=1000,
                     bugsProgram="OpenBUGS")  
## End(Not run)

# summarise fit
summary(fit1)




### Example 2: Multiple-run data with prevalence affected by a barrier
###

# add to simulated dataset
Data$NumberOfRuns <- 3   # 3 runs
Data$Barrier <- runif(100) > 0.8   # 20% chance that Barrier = TRUE

# 3 run catch with barrier effecting chance of 0
Catch <- rzinmultinom(100, size=1.1,
                      zeroprob=expit(-1 + 2 * Data$Barrier),
                      nmmean=0.3 * Data$SurveyArea %o% 0.4^(0:2))
Data$Run1 <- Catch[,1]
Data$Run2 <- Catch[,2]
Data$Run3 <- Catch[,3]

# define model with a single barrier term in prevalence regression
# and run INLA to provide an approximate fit
fit2 <- fcs2FitModel(c("Run1", "Run2", "Run3"), dataFrame=Data,
                     surveyAreaVar="SurveyArea", nRunsVar="NumberOfRuns",
                     rhoFormula= ~ Barrier)

# summarise fit to see whether barrier term is significant
summary(fit2)

## Not run: 
# fit full model with OpenBUGS
fit2 <- fcs2FitModel(fit=fit2, runBUGS=TRUE, n.iter=10000,
                     bugsProgram="OpenBUGS")  
## End(Not run)

# plot the parameter estimates
plot(fit2, group=FALSE)




### Example 3: Detailed example with multiple terms
###

# define adjacency information for 3 spatial regions appearing in a line
adjacency <- list(num=c(1, 2, 1),  # number of regions adjacent to each region
                  adj=c(2,
                        1, 3,
                        2),        # indices of these regions
                  sumNumNeigh=4)   # total number of adjacencies

# add to simulated dataset
# randomly place each survey in one of 3 regions
Data$RegionNumber <- sample(1:3, 100, replace=TRUE)
Data$Altitude <- rlnorm(100, 4.4, 1.1)   # random altitude
Data$WetWidth <- rlnorm(100, 1.5, 0.6)   # random wet width

# simulate catch that varies with barrier, region, altitude but not wet width
Data$Catch <- rzinbinom(100, size=1.1,
                        zeroprob=expit(-1.3 + 2.9 * Data$Barrier +
                                       2.55 * (Data$RegionNumber == 1) -
                                       3.7 * (Data$RegionNumber == 3)),
                        nbmean=0.3 * Data$SurveyArea *
                               exp(-9.5 + 4 * log(Data$Altitude) -
                                   0.5 * log(Data$Altitude)^2))

# define model with barrier and spatial terms in prevalence and
#   with log(altitude) and log(wet width) in abundance
# run INLA to provide an approximate fit
fit3 <- fcs2FitModel("Catch", dataFrame=Data, surveyAreaVar="SurveyArea",
                     rhoFormula= ~ Barrier + spatial(RegionNumber, adjacency),
                     muFormula= ~ log(Altitude) + I(log(Altitude)^2) +
                                rw2(log(WetWidth)))

# summarise fit
summary(fit3)

# plot fit
plot(fit3)

# fit again without wet width term and restrict the variability
#   between regions in the spatial term
fit3 <- fcs2FitModel("Catch", dataFrame=Data, surveyAreaVar="SurveyArea",
                     rhoFormula= ~ Barrier +
                                   spatial(RegionNumber, adjacency,
                                           scale.parameters=c(mean=1, sd=3)),
                     muFormula= ~ log(Altitude) + I(log(Altitude)^2))

# summarise fit, showing each of the variables in the spatial term
summary(fit3, allVars=TRUE)

# plot fit
plot(fit3)

# adjust the priors to make sure non-informative and fit again
fit3 <- fcs2FitModel("Catch", dataFrame=Data, surveyAreaVar="SurveyArea",
                     rhoFormula= ~ Barrier +
                                   spatial(RegionNumber, adjacency,
                                           scale.parameters=c(mean=1, sd=3)),
                     muFormula= ~ log(Altitude) + I(log(Altitude)^2),
                     prior.parameters=list(beta.const=c(mean=0, sd=30),
                                           "beta.log(Altitude)"=c(mean=0, sd=10),
                                           gamma.BarrierTRUE=c(mean=0, sd=5)))

# plot fit
plot(fit3)

## Not run: 

# fit full model with OpenBUGS using a large number of iterations
#   - this may take some time or
#     even crash if some terms are not significant
fit3 <- fcs2FitModel(fit=fit3, runBUGS=TRUE, n.iter=10000,
                     bugsProgram="OpenBUGS")

# check MCMC chains converged and mixed well
plotBUGSTrace(fit3)

# plot fit
plot(fit3)

# plot predicted catch distribution, for first 9 surveys
plotCatchPMF(fit3, Data, 1:9, boundaries=NULL)

## End(Not run)