Estimating posterior model probabilities

Share:

Description

Model selection with Approximate Bayesian Computation (ABC).

Usage

1
2
3
postpr(target, index, sumstat, tol, subset = NULL, method, corr=TRUE,
kernel="epanechnikov", numnet = 10, sizenet = 5, lambda =
c(0.0001,0.001,0.01), trace = TRUE, maxit = 500, ...)

Arguments

target

a vector of the observed summary statistics.

index

a vector of model indices. It can be character or numeric and will be coerced to factor. It must have the same length as sumstat to indicate which row of sumstat belong to which model.

sumstat

a vector, matrix or data frame of the simulated summary statistics.

tol

numeric, the required proportion of points nearest the target values (tolerance), or a vector of the desired tolerance values. If a vector is given

subset

a logical expression indicating elements or rows to keep. Missing values in index and/or sumstat are taken as FALSE.

method

a character string indicating the type of simulation required. Possible values are "rejection", "mnlogistic", "neuralnet". See Details.

corr

logical, if TRUE (default) posterior model probabilities are corrected for the number of simulations performed for each model. If equal number of simulations are available for all models, corr has no effect.

kernel

a character string specifying the kernel to be used when method is "mnlogistic" or "neuralnet". Defaults to "epanechnikov". See density for details.

numnet

the number of neural networks when method is "neuralnet". It corresponds to the number of times the function nnet is called.

sizenet

the number of units in the hidden layer. Can be zero if there are no skip-layer units.

lambda

a numeric vector or a single value indicating the weight decay when method is "neuralnet". By default, 0.0001, 0.001, or 0.01 is randomly chosen for the each of the networks. See nnet for more details.

trace

logical, TRUE switches on tracing the optimization of nnet (applies when method is "neuralnet").

maxit

numeric, the maximum number of iterations. Defaults to 500. See also nnet.

...

other arguments passed on from nnet.

Details

The function computes the posterior model probabilities. Simulations have to be performed with at least two distinct models. When method is "rejection", the posterior probability of a given model is approximated by the proportion of accepted simulations given this model. This approximation holds when the different models are a priori equally likely, and the same number of simulations is performed for each model. When method is "mnlogistic" the posterior model probabilities are estimated using a multinomial logistic regression as implemented in the function multinom from the package nnet. When method is "neuralnet", neural networks are used to predict the probabilities of models based on the observed statistics using nnet. This method can be useful if many summary statistics are used.

Names for the summary statistics are strongly recommended. Names can be supplied as colnames to sumstat (and target). If no names are supplied S1, S2, ... to summary statistics will be assigned to parameters and the user will be warned.

Value

An object of class "postpr", containing the following components:

pred

a vector of model probabilities when method is "mnlogistic" or "neuralnet".

values

the vector of model indices in the accepted region using the rejection method.

weights

vector of regression weights when method is "mnlogistic" or "neuralnet".

ss

summary statistics in the accepted region.

call

the original call.

na.action

a logical vector indicating the elements or rows that were excluded, including both NA/NaN's and elements/rows selected by subset

method

a character string indicating the method used, i.e. "rejection", "mnlogistic" or "neuralnet".

corr

logical, if TRUE the posterior model probabilities are corrected for the number of simulations performed for each model.

nmodels

the number of simulations performed for each model a priori.

models

a character vector of model names (a priori).

numstat

the number of summary statistics used.

names

a list of two elements: model contains the model names, and statistics.names the names of the summary statistics.

Author(s)

Katalin Csillery, Olivier Francois and Michael Blum with some initial code from Mark Beaumont.

References

Beaumont, M.A. (2008) Joint determination of topology, divergence time, and immigration in population trees. In Simulation, Genetics, and Human Prehistory (Matsumura, S., Forster, P. and Renfrew, C., eds) McDonald Institute for Archaeological Research

See Also

summary.postpr

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
## an artifical example
ss <- cbind(runif(1000),rt(1000,df=20))
postpr(target=c(3), index=c(rep("norm",500),rep("t",500)),
sumstat=ss[,1], tol=.1, method="rejection")

## human demographic history
require(abc.data)
data(human)
## five R objects are loaded. See ?human and vignette("abc") for details.

## illustrate the informativeness of two summary statistics: mean and
## variance of Tajima's D
par(mfcol = c(1,3))
boxplot(stat.3pops.sim[,"pi"]~models, main="Mean nucleotide diversity")
boxplot(stat.3pops.sim[,"TajD.m"]~models, main="Mean Tajima's D")
boxplot(stat.3pops.sim[,"TajD.v"]~models, main="Var in Tajima's D")

## model selection with ABC for the European population


modsel.it <- postpr(stat.voight["italian",], models, stat.3pops.sim, tol=.05, method="mnlogistic")
summary(modsel.it)

## In Europe, the most supported model
## is a population bottleneck 
## Check that in Africa, population expansion is the most supported model, while in
##  Asia, it is a population bottleneck
##modsel.ha <- postpr(stat.voight["hausa",], models, stat.3pops.sim, 
##tol=.05, method="mnlogistic")
##modsel.ch <- postpr(stat.voight["chinese",], models, stat.3pops.sim,
## tol=.05, method="mnlogistic")