estimateModel: Estimate model from single reliable dataset.
In StatisticsNZ/demest: Bayesian Demographic Estimation and Forecasting

estimateModel

R Documentation

Description

Estimate rates, counts, probabilities, or means for a single demographic array. The demographic array is treated as observed without error.

Usage

estimateModel(
  model,
  y,
  exposure = NULL,
  weights = NULL,
  filename = NULL,
  nBurnin = 1000,
  nSim = 1000,
  nChain = 4,
  nThin = 1,
  parallel = TRUE,
  nCore = NULL,
  outfile = NULL,
  nUpdateMax = 50,
  verbose = TRUE,
  useC = TRUE
)

Arguments

`model`	An object of class `SpecModel`, specifying the model to be fit.
`y`	A `demographic array` holding the outcome data.
`exposure`	A `Counts` object specifying exposure or sample size.
`weights`	A `Counts` object containing weights.
`filename`	The name of a file where output is collected.
`nBurnin`	Number of iterations discarded before recording begins.
`nSim`	Number of iterations carried out during recording.
`nChain`	Number of independent chains to use.
`nThin`	Thinning interval.
`parallel`	Logical. If `TRUE` (the default), parallel processing is used.
`nCore`	The number of cores to use, when `parallel` is `TRUE`. If no value supplied, defaults to `nChain`.
`outfile`	Where to direct the ‘stdout’ and ‘stderr’ connection output from the workers when processing in parallel. Passed to function `makeCluster` in package `parallel`.
`nUpdateMax`	Maximum number of iterations completed before releasing memory. If running out of memory, setting a lower value than the default may help.
`verbose`	Logical. If `TRUE` (the default) a message is printed at the end of the calculations.
`useC`	Logical. If `TRUE` (the default), the calculations are done in C. Setting `useC` to `FALSE` may be useful for debugging.

Model, y, and exposure

The model for the contents of the array is specified using function Model.

If model specifies a Poisson, binomial, or multinomial model, then y must have class Counts. If model specifies a normal distribution, then y can have class Counts or Values.

y may include NAs. Missing values can be imputed via function fetch. If model specifies a Poisson distribution, then y can also have known subtotals, which can help with the imputation of the missing values.

An exposure term is optional in Poisson models, and required in binomial models. (For convenience, demest treats the sample size parameter in binomial models as a kind of exposure.) A weights term is optional in normal models.

Output

The output from estimateModel would often be too large to fit into memory. estimateModel therefore departs from the standard R behavior in the way it handles output. Rather than returning an object containing the output, estimateModel creates a file on disk, somewhat like a database.

The name and location of the output file are specified using the filename argument. The file is just a text file, and can be copied and moved.

Users extract items from the file using functions such as fetch, fetchSummary, fetchMCMC, and fetchFiniteSD.

Functions estimateCounts, estimateAccount, and predictModel follow the same strategy for returning output.

nBurnin, nSim, nChain, nThin

estimateModel, estimateCounts, and estimateAccount use Markov chain Monte Carlo (MCMC) methods for inference. MCMC methods have two stages: burnin and production. During the burnin phase, the model moves from an initial guess at the location of the posterior distribution towards the true location. During the production phase, if all goes well, the model samples from the true posterior distribution.

Parameter nBurnin specifies the number of iterations that the model spends moving away from its initial location. Parameter nSim specifies the number of iterations that the model spends sampling from the posterior distribution.
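The two phases can be illustrated with a toy sampler written in base R. This is only a sketch: the random-walk Metropolis algorithm, the standard normal target, and the starting value of 50 are assumptions chosen for illustration, and are not the methods used internally by demest.

```r
# Toy random-walk Metropolis sampler targeting a standard normal.
# NOT the sampler used by demest; it only illustrates burnin vs production.
set.seed(1)
nBurnin <- 1000
nSim <- 1000
n_total <- nBurnin + nSim

draws <- numeric(n_total)
current <- 50  # deliberately poor initial guess, far from the target

for (i in seq_len(n_total)) {
  proposal <- current + rnorm(1, sd = 1)
  log_ratio <- dnorm(proposal, log = TRUE) - dnorm(current, log = TRUE)
  if (log(runif(1)) < log_ratio) {
    current <- proposal  # accept the proposal
  }
  draws[i] <- current
}

# Burnin: the chain travels from the initial guess towards the target.
burnin_draws <- draws[seq_len(nBurnin)]
# Production: the draws that would be treated as a posterior sample.
production_draws <- draws[nBurnin + seq_len(nSim)]

range(burnin_draws)
summary(production_draws)
```

Plotting `draws` against the iteration number shows the early drift away from the starting value, which is the part that burnin discards.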

Collecting every iteration during the production phase would lead to huge output files. Instead, the model collects only one in every nThin iterations. The resulting loss in information is relatively small, since successive iterations are typically highly correlated.
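A small base R sketch of thinning, under stated assumptions: the production draws are simulated here from an AR(1) process with coefficient 0.9 to mimic a highly correlated chain, and the kept iterations are taken to be numbers nThin, 2 * nThin, and so on. Which iterations demest actually keeps is not stated above, so that indexing is hypothetical.

```r
# Simulate a highly autocorrelated series standing in for production draws.
set.seed(1)
nSim <- 1000
nThin <- 5
production <- as.numeric(arima.sim(model = list(ar = 0.9), n = nSim))

# Keep one in every nThin iterations (assumed indexing, for illustration).
kept <- production[seq(from = nThin, to = nSim, by = nThin)]
length(kept)  # nSim / nThin = 200 values stored instead of 1000

# Lag-1 autocorrelation before and after thinning.
lag1 <- function(x) acf(x, plot = FALSE)$acf[2]
lag1(production)
lag1(kept)
```

Because neighbouring draws in the unthinned series carry largely the same information, the thinned series is much smaller but loses comparatively little.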

The calculations are run nChain times, with each chain yielding a different sample. As described in the documentation for fetchMCMC, comparing the samples is a way of checking whether the model has found the posterior distribution. When each chain seems to be sampling from the same distribution, the model is said to have converged.
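One common way to compare chains is the potential scale reduction factor described in Gelman et al. (2013), which contrasts between-chain and within-chain variance. The sketch below is a basic, unsplit version written in base R on simulated draws. The function name `rhat` and the simulated data are assumptions for illustration, and the diagnostics reported by fetchMCMC may be computed differently.

```r
# Basic potential scale reduction factor for one parameter.
# chains: numeric matrix, one row per iteration, one column per chain.
rhat <- function(chains) {
  n <- nrow(chains)
  B <- n * var(colMeans(chains))        # between-chain variance
  W <- mean(apply(chains, 2, var))      # average within-chain variance
  var_plus <- (n - 1) / n * W + B / n   # pooled variance estimate
  sqrt(var_plus / W)
}

set.seed(1)
# Four chains that all sample from the same distribution.
agree <- matrix(rnorm(4 * 500), ncol = 4)
# The same chains, except that the first one is stuck in a different place.
disagree <- agree
disagree[, 1] <- disagree[, 1] + 3

rhat(agree)     # close to 1: chains are indistinguishable
rhat(disagree)  # well above 1: chains have not found the same distribution
```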

At the end of the estimation process, estimateModel and similar functions pool the results from all the chains to form a single sample. This sample has floor(nChain * nSim / nThin) iterations.
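The size of the pooled sample can be worked out directly from the formula. The helper name `pooledSampleSize` is hypothetical and is not part of demest.

```r
# Number of iterations in the pooled sample, following the formula above.
pooledSampleSize <- function(nChain, nSim, nThin) {
  floor(nChain * nSim / nThin)
}

pooledSampleSize(nChain = 4, nSim = 1000, nThin = 1)  # defaults: 4000
pooledSampleSize(nChain = 2, nSim = 50, nThin = 2)    # 50
pooledSampleSize(nChain = 4, nSim = 1000, nThin = 3)  # 1333
```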

References

Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., Rubin, D. B. (2013) Bayesian Data Analysis. Third Edition. Boca Raton: Chapman & Hall/CRC.

Examples

library(datasets)
admissions <- Counts(UCBAdmissions)
admitted <- subarray(admissions, Admit == "Admitted")
filename <- tempfile()
estimateModel(Model(y ~ Binomial(mean ~ Gender + Dept)),
              y = admitted,
              exposure = admissions,
              filename = filename,
              nBurnin = 50,
              nSim = 50,
              nChain = 2,
              nThin = 2)
fetchSummary(filename)

StatisticsNZ/demest documentation built on Nov. 2, 2023, 7:56 p.m.