cv4postpr: Leave-one-out cross validation for model selection with ABC


Description

This function performs a leave-one-out cross validation for model selection with ABC via subsequent calls to the function postpr.

Usage

cv4postpr(index, sumstat, postpr.out = NULL, nval, tols, method,
          subset = NULL, kernel = "epanechnikov", numnet = 10, sizenet = 5,
          lambda = c(0.0001, 0.001, 0.01), trace = FALSE, maxit = 500, ...)

Arguments

index

a vector of model indices. It can be character or numeric and will be coerced to factor. It must have the same length as the number of rows in sumstat, so that it indicates which rows of sumstat belong to which model.

sumstat

a vector, matrix or data frame of the simulated summary statistics.

postpr.out

an object of class "postpr", optional. If supplied, all arguments passed to postpr are extracted from this object, except for sumstat, index, and tols, which always have to be supplied as arguments.

nval

the size of the cross-validation sample for each model.

tols

a single tolerance rate or a vector of tolerance rates.

method

a character string indicating the model selection method to use. Possible values are "rejection", "mnlogistic", "neuralnet". See postpr for details.

subset

a logical expression indicating elements or rows to keep. Missing values in index and/or sumstat are taken as FALSE.

kernel

a character string specifying the kernel to be used when method is "mnlogistic" or "neuralnet". Defaults to "epanechnikov". See density for details.

numnet

the number of neural networks when method is "neuralnet". Defaults to 10. It indicates the number of times the function nnet is called.

sizenet

the number of units in the hidden layer. Defaults to 5. Can be zero if there are skip-layer units. See nnet for more details.

lambda

a numeric vector or a single value indicating the weight decay when method is "neuralnet". See nnet for more details. By default, 0.0001, 0.001, or 0.01 is randomly chosen for each of the networks.

trace

logical, TRUE switches on tracing the optimization of nnet. Applies only when method is "neuralnet".

maxit

numeric, the maximum number of iterations. Defaults to 500. Applies only when method is "neuralnet". See also nnet.

...

other arguments passed to nnet.
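As a minimal sketch of how index, sumstat and subset relate to each other, the base R snippet below builds a small toy data set and checks the length requirement described for index. The object names (toy.index, toy.sumstat, toy.keep) and the statistic names are hypothetical and are not part of the package.

```r
## Hypothetical toy inputs; all object names are illustrative assumptions.
set.seed(1)
toy.index <- rep(c("bott", "const", "exp"), each = 4)   # one model label per simulation
toy.sumstat <- matrix(rnorm(12 * 2), ncol = 2,
                      dimnames = list(NULL, c("stat1", "stat2")))
toy.sumstat[3, 1] <- NA                                  # pretend one simulation failed

## index must have one entry per row of sumstat.
stopifnot(length(toy.index) == nrow(toy.sumstat))

## index is coerced to a factor; its levels are the model names.
toy.models <- levels(factor(toy.index))

## A logical vector marking the rows without missing statistics,
## of the kind that could be supplied as 'subset'.
toy.keep <- complete.cases(toy.sumstat)
table(toy.index[toy.keep])
```

The final table shows how many complete simulations remain for each model, which is worth checking before choosing nval.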

Details

For each model, nval simulations are selected in turn as validation simulations, while all other simulations are used as training simulations. For each validation simulation, the function postpr is called to estimate the posterior model probabilities, which can then be compared with the true model.
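The leave-one-out idea can be illustrated with a simplified base R sketch. It uses a plain rejection step on a single simulated summary statistic and is not the code used by cv4postpr or postpr. The helper loo.rejection and the toy models "A" and "B" are hypothetical.

```r
## Simplified leave-one-out sketch (illustration only, not the package code).
set.seed(42)
n.per.model <- 200
index <- rep(c("A", "B"), each = n.per.model)
## One summary statistic; model B is shifted relative to model A.
sumstat <- c(rnorm(n.per.model, mean = 0), rnorm(n.per.model, mean = 3))
models <- unique(index)

## Hypothetical helper: treat simulation i as the observed data and
## estimate model probabilities from the remaining simulations.
loo.rejection <- function(i, index, sumstat, tol) {
  target <- sumstat[i]              # validation simulation
  train.stat <- sumstat[-i]         # training simulations
  train.index <- index[-i]
  dist <- abs(train.stat - target)
  n.accept <- ceiling(tol * length(dist))
  accepted <- order(dist)[seq_len(n.accept)]
  ## Proportion of accepted simulations that come from each model.
  vapply(models, function(m) mean(train.index[accepted] == m), numeric(1))
}

nval <- 5
## Draw nval validation simulations per model.
cvsamples <- unlist(lapply(split(seq_along(index), index), sample, size = nval))
estim <- t(vapply(cvsamples, loo.rejection, numeric(length(models)),
                  index = index, sumstat = sumstat, tol = 0.1))
true <- index[cvsamples]
data.frame(true = true, round(estim, 2))
```

Each row of the result pairs the true model of a validation simulation with its estimated model probabilities.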

Ideally, nval would equal the number of simulations for each model, but this may take too much time. Users are warned not to choose too large a value of nval (especially when neural networks are used). Note that the actual number of cross-validation estimation steps that need to be performed is nval times the number of models.
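As a small worked example of this count, the snippet below uses the figures from the Examples section (three models, nval = 5, and 1000 simulations per model in the reduced data set). The variable names are illustrative.

```r
## Number of cross-validation estimation steps: nval times the number of models.
n.models <- 3

## Settings used in the Examples section.
nval <- 5
steps.example <- nval * n.models      # 15 steps

## A full leave-one-out run on the reduced data set (1000 simulations per model).
sims.per.model <- 1000
steps.full <- sims.per.model * n.models   # 3000 steps

c(example = steps.example, full = steps.full)
```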

The arguments for the function postpr can be supplied in two ways. First, simply give them as arguments when calling this function, in which case postpr.out can be NULL. Second, supply an existing object of class "postpr" as postpr.out. WARNING: when postpr.out is supplied, the same sumstat and index objects have to be used as in the original call to postpr. Column names of sumstat are checked for a match.
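The two ways of supplying the postpr settings might look as follows. This is a sketch that assumes the abc and abc.data packages are installed; it reuses the human data set and the reduced sample from the Examples section. The object names idx, sims, fit, cv.direct and cv.reuse are illustrative.

```r
## Sketch only: runs when the abc and abc.data packages are available.
if (requireNamespace("abc", quietly = TRUE) &&
    requireNamespace("abc.data", quietly = TRUE)) {
  library(abc)
  data(human, package = "abc.data")
  ss <- c(1:1000, 50001:51000, 100001:101000)
  idx <- models[ss]
  sims <- stat.3pops.sim[ss, ]

  ## Way 1: pass the postpr settings directly as arguments.
  cv.direct <- cv4postpr(idx, sims, nval = 5, tols = 0.1, method = "rejection")

  ## Way 2: reuse the settings stored in an existing "postpr" object.
  ## The same index and sumstat objects are passed to both calls.
  fit <- postpr(stat.voight["hausa", ], idx, sims, tol = 0.1, method = "rejection")
  cv.reuse <- cv4postpr(idx, sims, postpr.out = fit, nval = 5, tols = 0.1)
}
```

Note that sumstat, index and tols are given explicitly in both calls, as required by the description of postpr.out.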

See summary.cv4postpr for calculating the prediction error from an object of class "cv4postpr" and plot.cv4postpr for visualizing the misclassification of the models using barplots.

Value

An object of class "cv4postpr", which is a list with the following elements

call

The original calls to postpr for each tolerance rate.

cvsamples

Numeric vector of length nval*the number of models, indicating which rows of sumstat were used as validation values.

tols

The tolerance rates.

true

The true models.

estim

The estimated model probabilities.

method

The method used.

names

A list of two elements: model contains the model names, and statistics.names the names of the summary statistics.

seed

The value of .Random.seed when cv4postpr is called.
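To show how the true and estim elements can be compared, here is a base R sketch on hypothetical values. The layout assumed for estim (one row per validation simulation, one column per model) is an assumption made for this illustration, and summary.cv4postpr may compute its tables differently.

```r
## Hypothetical values; the layout of 'estim' is an assumption.
true <- c("bott", "bott", "const", "const", "exp", "exp")
estim <- rbind(c(0.7, 0.2, 0.1),
               c(0.3, 0.5, 0.2),
               c(0.1, 0.6, 0.3),
               c(0.2, 0.3, 0.5),
               c(0.1, 0.2, 0.7),
               c(0.0, 0.4, 0.6))
colnames(estim) <- c("bott", "const", "exp")

## Assign each validation simulation to its most probable model.
predicted <- colnames(estim)[max.col(estim, ties.method = "first")]
confusion <- table(true = factor(true, levels = colnames(estim)),
                   predicted = factor(predicted, levels = colnames(estim)))
confusion

## Mean posterior probability of each model, grouped by the true model.
mean.prob <- apply(estim, 2, tapply, true, mean)
round(mean.prob, 2)
```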

See Also

postpr, summary.cv4postpr, plot.cv4postpr

Examples

library(abc)
require(abc.data)
data(human)
### Reduce the sample size of the simulations to reduce the running time.
### Do not do that with your own data!
ss <- c(1:1000, 50001:51000, 100001:101000)
cv.modsel <- cv4postpr(models[ss], stat.3pops.sim[ss, ], nval = 5,
                       tols = c(.05, .1), method = "rejection")
summary(cv.modsel)
plot(cv.modsel, names.arg = c("Bottleneck", "Constant", "Exponential"))

Example output

Loading required package: abc.data
Loading required package: nnet
Loading required package: quantreg
Loading required package: SparseM

Attaching package: 'SparseM'

The following object is masked from 'package:base':

    backsolve

Loading required package: MASS
Loading required package: locfit
locfit 1.5-9.1 	 2013-03-22
Confusion matrix based on 5 samples for each model.

$tol0.05
      bott const exp
bott     3     1   1
const    0     2   3
exp      0     0   5

$tol0.1
      bott const exp
bott     3     1   1
const    0     2   3
exp      0     0   5


Mean model posterior probabilities (rejection)

$tol0.05
        bott  const    exp
bott  0.6042 0.2572 0.1386
const 0.1453 0.4402 0.4145
exp   0.0933 0.2412 0.6655

$tol0.1
        bott  const    exp
bott  0.5542 0.2978 0.1479
const 0.1446 0.4369 0.4185
exp   0.1059 0.2618 0.6322
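As a worked example of reading the confusion matrix above, the snippet below copies the counts for tolerance 0.05 and computes misclassification rates in base R. Rows are taken to be the true models, which is consistent with every row summing to the 5 validation samples per model. The object names are illustrative.

```r
## Confusion matrix copied from the example output (tolerance 0.05);
## rows are the true models, columns the assigned models.
conf.mat <- matrix(c(3, 1, 1,
                     0, 2, 3,
                     0, 0, 5),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(true = c("bott", "const", "exp"),
                                   assigned = c("bott", "const", "exp")))

## Overall misclassification rate: share of off-diagonal counts.
overall <- 1 - sum(diag(conf.mat)) / sum(conf.mat)

## Per-model rate: share of each row that falls off the diagonal.
per.model <- 1 - diag(conf.mat) / rowSums(conf.mat)

overall
per.model
```

Here 5 of the 15 validation simulations are misclassified overall, and the constant-size model is the one most often confused with another model.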

abc documentation built on May 2, 2019, 3:32 p.m.
