# abc: Parameter estimation with Approximate Bayesian Computation... In abc: Tools for Approximate Bayesian Computation (ABC)

## Description

This function performs multivariate parameter estimation based on summary statistics using an ABC algorithm. The algorithms implemented are rejection sampling, and local linear or non-linear (neural network) regression. A conditional heteroscedastic model is available for the latter two algorithms.

## Usage

 ```1 2 3 4``` ```abc(target, param, sumstat, tol, method, hcorr = TRUE, transf = "none", logit.bounds, subset = NULL, kernel = "epanechnikov", numnet = 10, sizenet = 5, lambda = c(0.0001,0.001,0.01), trace = FALSE, maxit = 500, ...) ```

## Arguments

 `target` a vector of the observed summary statistics. `param` a vector, matrix or data frame of the simulated parameter values, i.e. the dependent variable(s) when `method` is `"loclinear"`, `"neuralnet"` or `"ridge"`. `sumstat` a vector, matrix or data frame of the simulated summary statistics, i.e. the independent variables when `method` is `"loclinear"`, `"neuralnet"` or `"ridge"`. `tol` tolerance, the required proportion of points accepted nearest the target values. `method` a character string indicating the type of ABC algorithm to be applied. Possible values are `"rejection"`, `"loclinear"`, `"neuralnet"` and `"ridge"`. See also `Details`. `hcorr` logical, the conditional heteroscedastic model is applied if `TRUE` (default). `transf` a vector of character strings indicating the kind of transformation to be applied to the parameter values. The possible values are `"log"`, `"logit"`, and `"none"` (default), when no is transformation applied. See also `Details`. `logit.bounds` a matrix of bounds if `transf` is `"logit"`. The matrix has as many lines as parameters (including the ones that are not `"logit"` transformed) and 2 columns. First column is the minimum bound and second column is the maximum bound. `subset` a logical expression indicating elements or rows to keep. Missing values in `param` and/or `sumstat` are taken as `FALSE`. `kernel` a character string specifying the kernel to be used when `method` is `"loclinear"`, `"neuralnet"` or `"ridge"`. Defaults to `"epanechnikov"`. See `density` for details. `numnet` the number of neural networks when `method` is `"neuralnet"`. Defaults to 10. It indicates the number of times the function `nnet` is called. `sizenet` the number of units in the hidden layer. Defaults to 5. Can be zero if there are no skip-layer units. See `nnet` for more details. `lambda` a numeric vector or a single value indicating the weight decay when `method` is `"neuralnet"`. See `nnet` for more details. By default, 0.0001, 0.001, or 0.01 is randomly chosen for each of the networks. `trace` logical, if `TRUE` switches on tracing the optimization of `nnet`. Applies only when `method` is `"neuralnet"`. `maxit` numeric, the maximum number of iterations. Defaults to 500. Applies only when `method` is `"neuralnet"`. See also `nnet`. `...` other arguments passed to `nnet`.

## Details

These ABC algorithms generate random samples from the posterior distributions of one or more parameters of interest, θ_1, θ_2, …, θ_n. To apply any of these algorithms, (i) data sets have to be simulated based on random draws from the prior distributions of the θ_i's, (ii) from these data sets, a set of summary statistics have to be calculated, S(y), (iii) the same summary statistics have to be calculated from the observed data, S(y_0), and (iv) a tolerance rate must be chosen (`tol`). See `cv4abc` for a cross-validation tool that may help in choosing the tolerance rate.

When `method` is `"rejection"`, the simple rejection algorithm is used. Parameter values are accepted if the Euclidean distance between S(y) and S(y_0) is sufficiently small. The percentage of accepted simulations is determined by `tol`. When `method` is `"loclinear"`, a local linear regression method corrects for the imperfect match between S(y) and S(y_0). The accepted parameter values are weighted by a smooth function (`kernel`) of the distance between S(y) and S(y_0), and corrected according to a linear transform: θ^{*} = θ - b(S(y) - S(y_0)). θ^{*}'s represent samples form the posterior distribution. This method calls the function `lsfit` from the `stats` library. When using the `"loclinear"` method, a warning about the collinearity of the design matrix of the regression might be issued. In that situation, we recommend to rather use the related `"ridge"` method that performs local-linear ridge regression and deals with the collinearity issue. The non-linear regression correction method (`"neuralnet"`) uses a non-linear regression to minimize the departure from non-linearity using the function `nnet`. The posterior samples of parameters based on the rejection algorithm are returned as well, even when one of the regression algorithms is used.

Several additional arguments can be specified when `method` is `"neuralnet"`. The method is based on the function `nnet` from the library `nnet`, which fits single-hidden-layer neural networks. `numnet` defines the number of neural networks, thus the function `nnet` is called `numnet` number of times. Predictions from different neural networks can be rather different, so the median of the predictions from all neural networks is used to provide a global prediction. The choice of the number of neural networks is a trade-off between speed and accuracy. The default is set to 10 networks. The number of units in the hidden layer can be specified via `sizenet`. Selecting the number of hidden units is similar to selecting the independent variables in a linear or non-linear regression. Thus, it corresponds to the complexity of the network. There is several rule of thumb to choose the number of hidden units, but they are often unreliable. Generally speaking, the optimal choice of `sizenet` depends on the dimensionality, thus the number of statistics in `sumstat`. It can be zero when there are no skip-layer units. See also `nnet` for more details. The `method` `"neuralnet"` is recommended when dealing with a large number of summary statistics.

If `method` is `"loclinear"`, `"neuralnet"` or `"ridge"`, a correction for heteroscedasticity is applied by default (```hcorr = TRUE```).

Parameters maybe transformed priori to estimation. The type of transformation is defined by `transf`. The length of `transf` is normally the same as the number of parameters. If only one value is given, that same transformation is applied to all parameters and the user is warned. When a parameter transformation used, the parameters are back-transformed to their original scale after the regression estimation. No transformations can be applied when `method` is `"rejection"`.

Using names for the parameters and summary statistics is strongly recommended. Names can be supplied as `names` or `colnames` to `param` and `sumstat` (and `target`). If no names are supplied, P1, P2, ... is assigned to parameters and S1, S2, ... to summary statistics and the user is warned.

## Value

The returned value is an object of class `"abc"`, containing the following components:

 `adj.values` The regression adjusted values, when `method` is `"loclinear"`, `"neuralnet"` or `"ridge"`. `unadj.values` The unadjusted values that correspond to `"rejection"` `method`. `ss` The summary statistics for the accepted simulations. `weights` The regression weights, when `method` is `"loclinear"`, `"neuralnet"` or `"ridge"`. `residuals` The residuals from the regression when `method` is `"loclinear"`, `"neuralnet"` or `"ridge"`. These are the "raw" residuals from `lsfit` or `nnet`, respectively, thus they are not on the original scale of the parameter(s). `dist` The Euclidean distances for the accepted simulations. `call` The original call. `na.action` A logical vector indicating the elements or rows that were excluded, including both `NA`/`NaN`'s and elements/rows selected by `subset`. `region` A logical expression indicting the elements or rows that were accepted. `transf` The parameter transformations that have been used. `logit.bounds` The bounds, if transformation was `"logit"`. `kernel` The kernel used. `method` Character string indicating the `method`, i.e. `"rejection"`, `"loclinear"`, or `"neuralnet"`. `lambda` A numeric vector of length `numnet`. The actual values of the decay parameters used in each of the neural networks, when `method` is `"neuralnet"`. These values are selected randomly from the supplied vector of values. `numparam` Number of parameters used. `numstat` Number of summary statistics used. `aic` The sum of the AIC of the `numparam` regression. Returned only if `method` is `"loclinear"`. It is used for selecting informative summary statistics. `bic` The same but with the BIC. `names` A list with two elements: `parameter.names` and `statistics.names`. Both contain a vector of character strings with the parameter and statistics names, respectively.

## Author(s)

Katalin Csillery, Olivier Francois and Michael Blum with some initial code from Mark Beaumont.

## References

Pritchard, J.K., and M.T. Seielstad and A. Perez-Lezaun and M.W. Feldman (1999) Population growth of human Y chromosomes: a study of Y chromosome microsatellites. Molecular Biology and Evolution, 16, 1791–1798.

Beaumont, M.A., Zhang, W., and Balding, D.J. (2002) Approximate Bayesian Computation in Population Genetics, Genetics, 162, 2025-2035.

Blum, M.G.B. and Francois, O. (2010) Non-linear regression models for Approximate Bayesian Computation. Statistics and Computing 20, 63-73.

Csillery, K., M.G.B. Blum, O.E. Gaggiotti and O. Francois (2010) Approximate Bayesian Computation (ABC) in practice. Trends in Ecology and Evolution, 25, 410-418.

## See Also

`summary.abc`, `hist.abc`, `plot.abc`, `lsfit`, `nnet`, `cv4abc`

## Examples

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55``` ```require(abc.data) data(musigma2) ?musigma2 ## The rejection algorithm ## rej <- abc(target=stat.obs, param=par.sim, sumstat=stat.sim, tol=.1, method = "rejection") ## ABC with local linear regression correction without/with correction ## for heteroscedasticity ## lin <- abc(target=stat.obs, param=par.sim, sumstat=stat.sim, tol=.1, hcorr = FALSE, method = "loclinear", transf=c("none","log")) linhc <- abc(target=stat.obs, param=par.sim, sumstat=stat.sim, tol=.1, method = "loclinear", transf=c("none","log")) ## posterior summaries ## linsum <- summary(linhc, intvl = .9) linsum ## compare with the rejection sampling summary(linhc, unadj = TRUE, intvl = .9) ## posterior histograms ## hist(linhc, breaks=30, caption=c(expression(mu), expression(sigma^2))) ## or send histograms to a pdf file hist(linhc, file="linhc", breaks=30, caption=c(expression(mu), expression(sigma^2))) ## diagnostic plots: compare the 2 'abc' objects: "loclinear", ## "loclinear" with correction for heteroscedasticity ## plot(lin, param=par.sim) plot(linhc, param=par.sim) ## example illustrates how to add "true" parameter values to a plot ## postmod <- c(post.mu[match(max(post.mu[,2]), post.mu[,2]),1], post.sigma2[match(max(post.sigma2[,2]), post.sigma2[,2]),1]) plot(linhc, param=par.sim, true=postmod) ## artificial example to show how to use the logit tranformations ## myp <- data.frame(par1=runif(1000,-1,1),par2=rnorm(1000),par3=runif(1000,0,2)) mys <- myp+rnorm(1000,sd=.1) myt <- c(0,0,1.5) lin2 <- abc(target=myt, param=myp, sumstat=mys, tol=.1, method = "loclinear", transf=c("logit","none","logit"),logit.bounds = rbind(c(-1, 1), c(NA, NA), c(0, 2))) summary(lin2) ```

### Example output        ```Loading required package: abc.data
Loading required package: nnet
Loading required package: quantreg
Loading required package: SparseM

Attaching package: 'SparseM'

The following object is masked from 'package:base':

backsolve

Loading required package: MASS
Loading required package: locfit
locfit 1.5-9.1 	 2013-03-22
musigma2               package:abc.data                R Documentation

_A _s_e_t _o_f _o_b_j_e_c_t_s _u_s_e_d _t_o _e_s_t_i_m_a_t_e _t_h_e _p_o_p_u_l_a_t_i_o_n _m_e_a_n _a_n_d _v_a_r_i_a_n_c_e _i_n _a
_G_a_u_s_s_i_a_n _m_o_d_e_l _w_i_t_h _A_B_C (_s_e_e _t_h_e _v_i_g_n_e_t_t_e _o_f _t_h_e '_a_b_c' _p_a_c_k_a_g_e _f_o_r _m_o_r_e
_d_e_t_a_i_l_s).

_D_e_s_c_r_i_p_t_i_o_n:

'musigma2' loads in five R objects: 'par.sim' is a data frame and
contains the parameter values of the simulated data sets, 'stat'
is a data frame and contains the simulated summary statistics,
'stat.obs' is a data frame and contains the observed summary
statistics, 'post.mu' and 'post.sigma2' are data frames and
contain the true posterior distributions for the two parameters of
interest, mu and sigma^2, respectively.

_U_s_a_g_e:

data(musigma2)

_F_o_r_m_a_t:

The 'par.sim' data frame contains the following columns:

'mu' The population mean.

'sigma2' The population variance.

The 'stat.sim' and 'stat.obs' data frames contain the following
columns:

'mean' The sample mean.

'var' The logarithm of the sample variance.

The 'post.mu' and 'post.sigma2' data frames contain the following
columns:

'x' the coordinates of the points where the density is estimated.

'y' the posterior density values.

_D_e_t_a_i_l_s:

The prior of sigma^2 is an inverse chi^2 distribution with one
degree of freedom. The prior of mu is a normal distribution with
variance of sigma^2. For this simple example, the closed form of
the posterior distribution is available.

_S_o_u_r_c_e:

The observed statistics are the mean and variance of the sepal of
_Iris setosa_, estimated from part of the 'iris' data.

The data were collected by Anderson, Edgar.

_R_e_f_e_r_e_n_c_e_s:

Anderson, E. (1935). The irises of the Gaspe Peninsula, _Bulletin
of the American Iris Society_, *59*, 2-5.

Call:
abc(target = stat.obs, param = par.sim, sumstat = stat.sim, tol = 0.1,
method = "loclinear", transf = c("none", "log"))
Data:
abc.out\$adj.values (1000 posterior samples)
Weights:
abc.out\$weights

mu sigma2
Min.:                3.2091 0.0855
Weighted 5 % Perc.:  3.3250 0.1187
Weighted Median:     3.4198 0.1636
Weighted Mean:       3.4194 0.1674
Weighted Mode:       3.4225 0.1531
Weighted 95 % Perc.: 3.5144 0.2287
Max.:                3.5982 0.3108
mu     sigma2
Min.:                3.20912108 0.08552166
Weighted 5 % Perc.:  3.32503009 0.11871687
Weighted Median:     3.41976457 0.16356863
Weighted Mean:       3.41943316 0.16743565
Weighted Mode:       3.42252573 0.15309016
Weighted 95 % Perc.: 3.51441169 0.22874721
Max.:                3.59822201 0.31079180
Call:
abc(target = stat.obs, param = par.sim, sumstat = stat.sim, tol = 0.1,
method = "loclinear", transf = c("none", "log"))
Data:
abc.out\$unadj.values (1000 posterior samples)

mu sigma2
Min.:      2.4798 0.0651
5% Perc.:  2.6745 0.1245
Median:    3.1856 0.2759
Mean:      3.2173 0.2882
Mode:      3.1079 0.2468
95% Perc.: 3.8752 0.5007
Max.:      4.2307 0.6950
Call:
abc(target = myt, param = myp, sumstat = mys, tol = 0.1, method = "loclinear",
transf = c("logit", "none", "logit"), logit.bounds = rbind(c(-1,
1), c(NA, NA), c(0, 2)))
Data:
abc.out\$adj.values (100 posterior samples)
Weights:
abc.out\$weights

par1    par2    par3
Min.:                  -0.3497 -0.3278  1.1899
Weighted 2.5 % Perc.:  -0.2335 -0.2351  1.2656
Weighted Median:        0.0056  0.0100  1.5337
Weighted Mean:         -0.0076 -0.0077  1.5395
Weighted Mode:          0.0199  0.0194  1.5358
Weighted 97.5 % Perc.:  0.2359  0.2268  1.8149
Max.:                   0.3094  0.2588  1.8408
```

abc documentation built on May 2, 2019, 3:32 p.m.