kfoldCVModelCount: K-fold Cross-Validation for Lek Count Models


View source: R/kfoldCVModelCount.R

Description

K-fold cross-validation for lek count models.

Usage

kfoldCVModelCount(lekGroup, dataList,
                  registeredModel = c("modelCountDetectBinREY",
                                      "modelCountDetectBin",
                                      "modelCountDetectBetaBinREY",
                                      "modelCountDetectBinREYObs2"),
                  parameters, n.chains = 4, n.iter = 30000, thin = 30,
                  backupFile = tempfile(pattern = "bckp",
                                        tmpdir = getwd(),
                                        fileext = ".Rds"))

restartCV(filename)

## S3 method for class 'CVModelCount'
print(x, ...)

LLCount(dataList, listCVCoef, lekGroup, iterations,
        nrepint = 1000, verbose = TRUE)

## S3 method for class 'LLCountsSim'
print(x, ...)

elpdLeks(x)

Arguments

lekGroup

vector of N integers from 1 to K, where N is the number of leks in the dataset and K is the number of subsets used in K-fold cross-validation.

dataList

object of class "caperpyData" returned by the function dataCount2jags containing the dataset used to fit the model.

registeredModel

character string containing the name of a registered count model (see help("modelCountDetectBin") for a list of registered model names).

parameters

character vector containing the names of the parameters to monitor during MCMC iterations. Can be left unspecified.

n.chains

The number of MCMC chains to run.

n.iter

The number of MCMC iterations to monitor.

thin

thinning interval for monitors.

backupFile

character string containing the name of a file that will be used to back up calculations. If the function kfoldCVModelCount is stopped before the calculations are finished, they can be restarted with restartCV, passing this file name as the filename argument.

filename

character string. See argument backupFile above.

listCVCoef

object of class "CVModelCount" returned by the function kfoldCVModelCount or restartCV containing the K objects of class mcmc.list corresponding to the K subsets.

iterations

integer vector containing the indices of the MCMC iterations for which LPD is to be calculated. Can be left unspecified (in which case all iterations are used).

nrepint

The number of Monte Carlo simulations to perform for integration in the calculation of LPD (see the vignette for details).

verbose

logical value indicating whether information should be printed.

x

for elpdLeks and print.LLCountsSim, an object of class "LLCountSim". For print.CVModelCount, an object of class "CVModelCount".

...

additional arguments to be passed from and to other functions.
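
The lekGroup vector can be built in several ways. The sketch below uses base R only and shows one possible way to draw balanced random groups when N is not necessarily a multiple of K, and to check that the result has the form described above. The helper name makeLekGroup is hypothetical and is not part of the package.

```r
## Hypothetical helper (not part of caperpyogm): assign N leks to K
## groups of nearly equal size, in random order.
makeLekGroup <- function(N, K) {
    stopifnot(N >= K, K >= 2)
    ## rep_len recycles 1:K up to length N, so group sizes differ by
    ## at most one; sample() then shuffles the assignment
    sample(rep_len(seq_len(K), N))
}

set.seed(1)
lekGroup <- makeLekGroup(N = 47, K = 5)

## Checks: one integer label per lek, every group from 1 to K is used
length(lekGroup)
table(lekGroup)
```

With N a multiple of K, this gives groups of exactly N/K leks, as in the 10 groups of 33 leks used in the Examples section.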

Details

To test the predictive ability of a model, this approach consists of splitting the original lek count dataset lekcounts into K subsets of leks. In the dataset used by Calenge et al. (in prep.), we used 10 groups of 33 leks. For each subset i, we build a calibration dataset containing all subsets except subset i and fit the model by MCMC to this calibration dataset. The function kfoldCVModelCount performs this operation. We can then predict the count data of each lek using a model fitted without this lek, and calculate the log-probability density (LPD) of all counts on each lek for each vector of parameters simulated by MCMC. This avoids the circularity of using a model fitted to a dataset to predict the same dataset. The function LLCount performs this operation. Finally, the function elpdLeks calculates the expected log-probability density for each lek (mean over all iterations).
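
The procedure described above can be illustrated with a toy example in base R. This is only a sketch: the Poisson model with a single mean, the simulated counts, and all object names are assumptions made for illustration, and the maximum-likelihood estimate stands in for the MCMC fit performed by kfoldCVModelCount.

```r
## Toy illustration of K-fold cross-validation by groups of leks.
## Assumed data: 30 leks, 4 counts per lek, Poisson counts.
set.seed(42)
nLeks <- 30
K <- 5
toy <- data.frame(lek = rep(seq_len(nLeks), each = 4),
                  count = rpois(nLeks * 4, lambda = 3))

## One group label per lek (same role as the argument lekGroup)
grp <- sample(rep(seq_len(K), each = nLeks / K))

## Log predictive density of all counts of each lek, computed with a
## model fitted without the group containing this lek
lpd <- numeric(nLeks)
for (i in seq_len(K)) {
    heldOut <- which(grp == i)
    calib <- toy[!(toy$lek %in% heldOut), ]
    ## "Fit" on the calibration data: ML estimate of the Poisson mean
    lambdaHat <- mean(calib$count)
    for (l in heldOut) {
        y <- toy$count[toy$lek == l]
        lpd[l] <- sum(dpois(y, lambdaHat, log = TRUE))
    }
}
head(lpd)
```

In the package, each fold yields many MCMC parameter vectors rather than a single estimate, so the LPD is a matrix (leks by iterations) rather than a vector.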

Note that the functions kfoldCVModelCount and LLCount can take a very long time. See the vignette vignette("caperpyogm") for a more detailed description of the k-fold validation process.

Value

The functions kfoldCVModelCount and restartCV return an object of class "CVModelCount", which is a list with K components, the component i being an object of class "mcmc.list" corresponding to the model fitted to the dataset excluding lek group i.
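
The way a list with one component per fold can be built up and recovered after an interruption can be sketched in base R. The code below is not the package's implementation: the function runFolds, its stopAfter argument, and the toy per-fold computation are hypothetical, and only illustrate how saving partial results after each fold allows an interrupted run to resume from a backup file.

```r
## Hypothetical sketch of a restartable loop over K folds
runFolds <- function(K, backupFile, stopAfter = K) {
    ## Resume from the backup if it exists, else start from scratch
    res <- if (file.exists(backupFile)) readRDS(backupFile) else list()
    done <- length(res)
    for (i in seq_len(K)) {
        if (i <= done) next          # fold already computed
        if (i > stopAfter) break     # simulate an interruption
        res[[i]] <- i^2              # stand-in for a long model fit
        saveRDS(res, backupFile)     # checkpoint after each fold
    }
    res
}

bck <- tempfile(pattern = "bckp", fileext = ".Rds")
partial <- runFolds(K = 4, backupFile = bck, stopAfter = 2)
full <- runFolds(K = 4, backupFile = bck)  # resumes at fold 3
length(partial)
length(full)
unlink(bck)
```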

The function LLCount returns a matrix of class "LLCountSim" with N rows (the N leks) and P columns (the P MCMC iterations) containing the LPD for all counts of each lek calculated using each MCMC iteration.

The function elpdLeks returns a vector with N elements (the N leks) containing the estimated expected log-probability densities for all counts of each lek.
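
The relation between the two return values can be sketched with a small matrix. This assumes, following the Details section, that the expected LPD is the arithmetic mean over iterations. The matrix lpdMat and its values are invented for illustration, and rowMeans is used here as a stand-in, not as the actual code of elpdLeks.

```r
## Assumed toy matrix standing in for the output of LLCount:
## 3 leks (rows) x 4 MCMC iterations (columns) of LPD values
lpdMat <- matrix(c(-5.1, -4.8, -5.3, -5.0,
                   -2.2, -2.5, -2.1, -2.4,
                   -9.0, -8.7, -9.4, -8.9),
                 nrow = 3, byrow = TRUE)

## Mean over iterations: one value per lek
elpd <- rowMeans(lpdMat)
elpd

## Leks with the lowest values are those least well predicted
order(elpd)
```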

Author(s)

Clement Calenge clement.calenge@ofb.gouv.fr

References

Calenge C., Menoni E., Milhau B., Foulche K., Chiffard J., Marchandeau S. (in prep.). The participatory monitoring of the capercaillie in the French Pyrenees.

See Also

llcBinREY contains datasets generated by this process for several models. See the vignette vignette("caperpyogm") for a more detailed description of the k-fold validation process.

Examples

## We work on the dataset lekcounts
head(lekcounts)

## We prepare the dataset to fit the model with JAGS
dataList <- dataCount2jags(lekcounts$lek, lekcounts$period,
                           lekcounts$nbobs, lekcounts$nbmales,
                           lekcounts$gr, as.numeric(factor(lekcounts$type)),
                           lekcounts$natun, lekcounts$year)
dataList

## We define 10 groups of 33 leks
set.seed(980)
ooo <- sample(rep(1:10, each = 33))

## Performs K-fold validation. WARNING!! THIS CALCULATION TAKES
## SEVERAL HOURS!!!
## Not run: 
listCoefsCVBinREY <- kfoldCVModelCount(ooo, dataList, "modelCountDetectBinREY")

## End(Not run)

## To save time for the user, we have stored the result of this
## command in the dataset listCoefsCVBinREY (for the model
## modelCountDetectBinREY only). We could not include the results of
## cross-validation for other models due to the large object size, but
## we can send them on request.
listCoefsCVBinREY

## Finally, we can use LLCount to calculate the LPD of the counts of
## each lek for each MCMC iteration, under a model that was not fitted
## using these counts.
## WARNING!!! THIS CALCULATION ALSO TAKES MORE THAN ONE HOUR!!!
## Not run: 
llcBinREY <- LLCount(dataList, listCoefsCVBinREY, ooo)

## End(Not run)

## And the result is stored in the dataset:
llcBinREY

## Expected LPD can be calculated with:
elpdLeks(llcBinREY)

ClementCalenge/caperpyogm documentation built on Sept. 14, 2021, 4:14 p.m.