KDKW.FD.CRN_ind: Approximate maximum likelihood algorithm based on...
In johannabertl/ApproxML: Approximate maximum likelihood estimation

Description Usage Arguments Details Functions Gain sequences Author(s) References Examples

This function approximates the maximum likelihood estimate of the multivariate parameter theta. It requires a function to simulate data given a random seed and compute summary statistics at each position of the parameter space. These simulations are used to obtain estimates the likelihood in a stochastic approximation algorithm based on finite differences approximation of the gradient (Kiefer-Wolfowitz algorithm). The random seed is used to generate common random numbers (CRNs) to reduce the variance of the gradient estimates and thereby the final parameter estimate.

KDKW.FD.CRN_ind(s.obs, theta.0, simfun, rest = matrix(c(-Inf, Inf), ncol
  = 2, nrow = length(theta.0), byrow = T), ce, gamma = 1/6, C = 0, a,
  alpha = 1, A = 0, K, nk, Hfun = bw.nrd0.mult, Hnum, kernel,
  lg = T, ...)

KDKW.FD.CRN_ind.theta.0(theta.0, s.obs, simfun, rest = matrix(c(-Inf,
  Inf), ncol = 2, nrow = length(theta.0), byrow = T), ce, gamma = 1/6,
  C = 0, a, alpha = 1, A = 0, K, nk, Hfun = bw.nrd0.mult, Hnum,
  kernel, lg = T, ...)

KDKW.FD.CRN_ind.s.obs(s.obs, theta.0, simfun, rest = matrix(c(-Inf, Inf),
  ncol = 2, nrow = length(theta.0), byrow = T), ce, gamma = 1/6, C = 0,
  a, alpha = 1, A = 0, K, nk, Hfun = bw.nrd0.mult, Hnum, kernel,
  lg = T, ...)

`s.obs`	Vector of observed summary statistics
`theta.0`	Vector of starting point
`simfun`	name of the function which is used to simulate data (further arguments to simfun are given with ...). Note that simfun needs to have an argument "seed" for common random number generation.
`rest`	Matrix with restrictions on the parameterspace (matrix with 2 columns that contain the lower and upper bounds for the elements of theta). Use -Inf and Inf for unrestricted parameters. The default value is an unrestricted parameter space.
`ce`	numeric
`gamma`	numeric
`C`	integer
`a`	numeric
`alpha`	numeric
`A`	integer
`K`	number of steps
`nk`	number of simulations per step
`Hfun`	function which computes the bandwidth matrix. Can be one of the functions in bw.r
`Hnum`	how often shall the bandwidth matrix be computed? Possible values: "once" or an integer indicating after how many iterations the bandwidth selection is to be renewed (e. g. 1 => bandwidth selection in each step)
`kernel`	kernel function
`lg`	logical. Use the log-likelihood instead of the likelihood?
`...`	further arguments which are passed on to simulate the data with simfun

Note that currently in each iteration, a single integer valued seed is passed on to the simulation function for simulation at theta+ and theta-. This might not be enough for complex simulations and should probably be replaced by an implementation e. g. using setRNG::setRNG that sets a vector of 264 integers as seed, and maybe a flexible number of seeds, if multiple random samples need to be generated.

The simulation function needs to have an argument "seed". At the moment, this is not implemented for most simulation functions.

Note that this is an experimental function that is only for use with SIMCRNpoisson_glmm_regr_ind and a model matrix x consisting of a set of dummy variables that is equivalent to a single factor that defines groups. The summary statistics and density estimates are computed separately for each independent group, and then the overall density estimate is obtained by multiplying them.

KDKW.FD.CRN_ind.theta.0: The same function for a set of different starting values, use e. g. with apply or mclapply. It uses try catch errors.
KDKW.FD.CRN_ind.s.obs: The same function for a set of different summary statistics, use e. g. with apply or mclapply. It uses try catch errors.

The gain sequences are defined following Spall (2013):

a_k = a/(k + A)^alpha and c_k = c/(k + C)^gamma.

Usually, C>0 is only used if the algorithm is resumed. The default values alpha=1 and gamma=1/6 are given in Spall (2013).

Johanna Bertl

Bertl, J.; Ewing, G.; Kosiol, C. & Futschik, A. Approximate Maximum Likelihood Estimation for Population Genetic Inference. Statistical Applications in Genetics and Molecular Biology, 2017, 16, p. 291-312
Kiefer, J. & Wolfowitz, J. Stochastic estimation of the maximum of a regression function. The Annals of Mathematical Statistics, 1952, 23, p. 462-466
Blum, J. R. Multidimensional stochastic approximation methods. The Annals of Mathematical Statistics, 1954, 25, 737-744
Spall, J. C. Introduction to Stochastic Search and Optimization: Estimation, Simulation and Control. Wiley, 2003

x = matrix(c(rep(1, 50), rep(0, 100), rep(1, 50)), ncol=2 )
testsim = SIMCRNpoisson_glmm_regr_ind(100, c(2,3,1), seed = 1234, x)
sumstat = apply(testsim, 2, mean)

test_ind = KDKW.FD.CRN_ind(s.obs=sumstat, theta.0=c(2.5, 3.5, 1.5), simfun=SIMCRNpoisson_glmm_regr_ind, rest=matrix(c(0.5, 5, 0.5, 5, 0.1, 3), ncol=2, nrow=3, byrow=T), ce=0.1, gamma = 1/6, C=0, a = 0.5, alpha = 1, A=0, K=200, nk=20, Hfun = bw.nrd0.flex, Hnum=1, kernel=robust.unscaled.diagonal, lg = T, x=x)

matplot(test_ind$theta, t="l")