abc: Parameter estimation with Approximate Bayesian Computation...

View source: R/abc.R

abcR Documentation

Parameter estimation with Approximate Bayesian Computation (ABC)

Description

This function performs multivariate parameter estimation based on summary statistics using an ABC algorithm. The algorithms implemented are rejection sampling, and local linear or non-linear (neural network) regression. A conditional heteroscedastic model is available for the latter two algorithms.

Usage

abc(target, param, sumstat, tol, method, hcorr = TRUE, transf = "none",
logit.bounds, subset = NULL, kernel = "epanechnikov", numnet =
10, sizenet = 5, lambda = c(0.0001,0.001,0.01), trace = FALSE, maxit =
500, ...)

Arguments

target

a vector of the observed summary statistics.

param

a vector, matrix or data frame of the simulated parameter values, i.e. the dependent variable(s) when method is "loclinear", "neuralnet" or "ridge".

sumstat

a vector, matrix or data frame of the simulated summary statistics, i.e. the independent variables when method is "loclinear", "neuralnet" or "ridge".

tol

tolerance, the required proportion of points accepted nearest the target values.

method

a character string indicating the type of ABC algorithm to be applied. Possible values are "rejection", "loclinear", "neuralnet" and "ridge". See also Details.

hcorr

logical, the conditional heteroscedastic model is applied if TRUE (default).

transf

a vector of character strings indicating the kind of transformation to be applied to the parameter values. The possible values are "log", "logit", and "none" (default), when no is transformation applied. See also Details.

logit.bounds

a matrix of bounds if transf is "logit". The matrix has as many lines as parameters (including the ones that are not "logit" transformed) and 2 columns. First column is the minimum bound and second column is the maximum bound.

subset

a logical expression indicating elements or rows to keep. Missing values in param and/or sumstat are taken as FALSE.

kernel

a character string specifying the kernel to be used when method is "loclinear", "neuralnet" or "ridge". Defaults to "epanechnikov". See density for details.

numnet

the number of neural networks when method is "neuralnet". Defaults to 10. It indicates the number of times the function nnet is called.

sizenet

the number of units in the hidden layer. Defaults to 5. Can be zero if there are no skip-layer units. See nnet for more details.

lambda

a numeric vector or a single value indicating the weight decay when method is "neuralnet". See nnet for more details. By default, 0.0001, 0.001, or 0.01 is randomly chosen for each of the networks.

trace

logical, if TRUE switches on tracing the optimization of nnet. Applies only when method is "neuralnet".

maxit

numeric, the maximum number of iterations. Defaults to 500. Applies only when method is "neuralnet". See also nnet.

...

other arguments passed to nnet.

Details

These ABC algorithms generate random samples from the posterior distributions of one or more parameters of interest, θ_1, θ_2, …, θ_n. To apply any of these algorithms, (i) data sets have to be simulated based on random draws from the prior distributions of the θ_i's, (ii) from these data sets, a set of summary statistics have to be calculated, S(y), (iii) the same summary statistics have to be calculated from the observed data, S(y_0), and (iv) a tolerance rate must be chosen (tol). See cv4abc for a cross-validation tool that may help in choosing the tolerance rate.

When method is "rejection", the simple rejection algorithm is used. Parameter values are accepted if the Euclidean distance between S(y) and S(y_0) is sufficiently small. The percentage of accepted simulations is determined by tol. When method is "loclinear", a local linear regression method corrects for the imperfect match between S(y) and S(y_0). The accepted parameter values are weighted by a smooth function (kernel) of the distance between S(y) and S(y_0), and corrected according to a linear transform: θ^{*} = θ - b(S(y) - S(y_0)). θ^{*}'s represent samples form the posterior distribution. This method calls the function lsfit from the stats library. When using the "loclinear" method, a warning about the collinearity of the design matrix of the regression might be issued. In that situation, we recommend to rather use the related "ridge" method that performs local-linear ridge regression and deals with the collinearity issue. The non-linear regression correction method ("neuralnet") uses a non-linear regression to minimize the departure from non-linearity using the function nnet. The posterior samples of parameters based on the rejection algorithm are returned as well, even when one of the regression algorithms is used.

Several additional arguments can be specified when method is "neuralnet". The method is based on the function nnet from the library nnet, which fits single-hidden-layer neural networks. numnet defines the number of neural networks, thus the function nnet is called numnet number of times. Predictions from different neural networks can be rather different, so the median of the predictions from all neural networks is used to provide a global prediction. The choice of the number of neural networks is a trade-off between speed and accuracy. The default is set to 10 networks. The number of units in the hidden layer can be specified via sizenet. Selecting the number of hidden units is similar to selecting the independent variables in a linear or non-linear regression. Thus, it corresponds to the complexity of the network. There is several rule of thumb to choose the number of hidden units, but they are often unreliable. Generally speaking, the optimal choice of sizenet depends on the dimensionality, thus the number of statistics in sumstat. It can be zero when there are no skip-layer units. See also nnet for more details. The method "neuralnet" is recommended when dealing with a large number of summary statistics.

If method is "loclinear", "neuralnet" or "ridge", a correction for heteroscedasticity is applied by default (hcorr = TRUE).

Parameters maybe transformed priori to estimation. The type of transformation is defined by transf. The length of transf is normally the same as the number of parameters. If only one value is given, that same transformation is applied to all parameters and the user is warned. When a parameter transformation used, the parameters are back-transformed to their original scale after the regression estimation. No transformations can be applied when method is "rejection".

Using names for the parameters and summary statistics is strongly recommended. Names can be supplied as names or colnames to param and sumstat (and target). If no names are supplied, P1, P2, ... is assigned to parameters and S1, S2, ... to summary statistics and the user is warned.

Value

The returned value is an object of class "abc", containing the following components:

adj.values

The regression adjusted values, when method is "loclinear", "neuralnet" or "ridge".

unadj.values

The unadjusted values that correspond to "rejection" method.

ss

The summary statistics for the accepted simulations.

weights

The regression weights, when method is "loclinear", "neuralnet" or "ridge".

residuals

The residuals from the regression when method is "loclinear", "neuralnet" or "ridge". These are the "raw" residuals from lsfit or nnet, respectively, thus they are not on the original scale of the parameter(s).

dist

The Euclidean distances for the accepted simulations.

call

The original call.

na.action

A logical vector indicating the elements or rows that were excluded, including both NA/NaN's and elements/rows selected by subset.

region

A logical expression indicting the elements or rows that were accepted.

transf

The parameter transformations that have been used.

logit.bounds

The bounds, if transformation was "logit".

kernel

The kernel used.

method

Character string indicating the method, i.e. "rejection", "loclinear", or "neuralnet".

lambda

A numeric vector of length numnet. The actual values of the decay parameters used in each of the neural networks, when method is "neuralnet". These values are selected randomly from the supplied vector of values.

numparam

Number of parameters used.

numstat

Number of summary statistics used.

aic

The sum of the AIC of the numparam regression. Returned only if method is "loclinear". It is used for selecting informative summary statistics.

bic

The same but with the BIC.

names

A list with two elements: parameter.names and statistics.names. Both contain a vector of character strings with the parameter and statistics names, respectively.

Author(s)

Katalin Csillery, Olivier Francois and Michael Blum with some initial code from Mark Beaumont.

References

Pritchard, J.K., and M.T. Seielstad and A. Perez-Lezaun and M.W. Feldman (1999) Population growth of human Y chromosomes: a study of Y chromosome microsatellites. Molecular Biology and Evolution, 16, 1791–1798.

Beaumont, M.A., Zhang, W., and Balding, D.J. (2002) Approximate Bayesian Computation in Population Genetics, Genetics, 162, 2025-2035.

Blum, M.G.B. and Francois, O. (2010) Non-linear regression models for Approximate Bayesian Computation. Statistics and Computing 20, 63-73.

Csillery, K., M.G.B. Blum, O.E. Gaggiotti and O. Francois (2010) Approximate Bayesian Computation (ABC) in practice. Trends in Ecology and Evolution, 25, 410-418.

See Also

summary.abc, hist.abc, plot.abc, lsfit, nnet, cv4abc

Examples

require(abc.data)
data(musigma2)
?musigma2

## The rejection algorithm
##
rej <- abc(target=stat.obs, param=par.sim, sumstat=stat.sim, tol=.1, method =
"rejection") 

## ABC with local linear regression correction without/with correction
## for heteroscedasticity 
##
lin <- abc(target=stat.obs, param=par.sim, sumstat=stat.sim, tol=.1, hcorr =
FALSE, method = "loclinear", transf=c("none","log"))
linhc <- abc(target=stat.obs, param=par.sim, sumstat=stat.sim, tol=.1, method =
"loclinear", transf=c("none","log")) 

## posterior summaries
##
linsum <- summary(linhc, intvl = .9)
linsum
## compare with the rejection sampling
summary(linhc, unadj = TRUE, intvl = .9)

## posterior histograms
##
hist(linhc, breaks=30, caption=c(expression(mu),
expression(sigma^2))) 

## or send histograms to a pdf file
## Not run: 
hist(linhc, file="linhc", breaks=30, caption=c(expression(mu),
expression(sigma^2)))

## End(Not run)
## diagnostic plots: compare the 2 'abc' objects: "loclinear",
## "loclinear" with correction for heteroscedasticity
##
plot(lin, param=par.sim)
plot(linhc, param=par.sim)

## example illustrates how to add "true" parameter values to a plot
##
postmod <- c(post.mu[match(max(post.mu[,2]), post.mu[,2]),1],
             post.sigma2[match(max(post.sigma2[,2]), post.sigma2[,2]),1])
plot(linhc, param=par.sim, true=postmod)


## artificial example to show how to use the logit tranformations
##
myp <- data.frame(par1=runif(1000,-1,1),par2=rnorm(1000),par3=runif(1000,0,2))
mys <- myp+rnorm(1000,sd=.1)
myt <- c(0,0,1.5)
lin2 <- abc(target=myt, param=myp, sumstat=mys, tol=.1, method =
"loclinear", transf=c("logit","none","logit"),logit.bounds = rbind(c(-1,
1), c(NA, NA), c(0, 2)))
summary(lin2)

abc documentation built on May 20, 2022, 1:11 a.m.