estimateModel: Estimate model from single reliable dataset.

View source: R/estimate-functions.R

estimateModel    R Documentation

Estimate model from single reliable dataset.

Description

Estimate rates, counts, probabilities, or means for a single demographic array. The demographic array is treated as observed without error.

Usage

estimateModel(
  model,
  y,
  exposure = NULL,
  weights = NULL,
  filename = NULL,
  nBurnin = 1000,
  nSim = 1000,
  nChain = 4,
  nThin = 1,
  parallel = TRUE,
  nCore = NULL,
  outfile = NULL,
  nUpdateMax = 50,
  verbose = TRUE,
  useC = TRUE
)

Arguments

model

An object of class SpecModel, specifying the model to be fit.

y

A demographic array holding the outcome data.

exposure

A Counts object specifying exposure or sample size.

weights

A Counts object containing weights.

filename

The name of a file where output is collected.

nBurnin

Number of iterations discarded before recording begins.

nSim

Number of iterations carried out during recording.

nChain

Number of independent chains to use.

nThin

Thinning interval.

parallel

Logical. If TRUE (the default), parallel processing is used.

nCore

The number of cores to use, when parallel is TRUE. If no value supplied, defaults to nChain.

outfile

Where to direct the ‘stdout’ and ‘stderr’ connection output from the workers when parallel processing. Passed to function makeCluster in package parallel.

nUpdateMax

Maximum number of iterations completed before releasing memory. If running out of memory, setting a lower value than the default may help.

verbose

Logical. If TRUE (the default) a message is printed at the end of the calculations.

useC

Logical. If TRUE (the default), the calculations are done in C. Setting useC to FALSE may be useful for debugging.

Model, y, and exposure

The model for the contents of the array is specified using function Model.

If model specifies a Poisson, binomial, or multinomial model, then y must have class Counts. If model specifies a normal distribution, then y can have class Counts or Values.

y may include NAs. Missing values can be imputed via function fetch. If model specifies a Poisson distribution, then y can also have known subtotals, which can help with the imputation of the missing values.

An exposure term is optional in Poisson models, and required in binomial models. (For convenience, demest treats the sample size parameter in binomial models as a kind of exposure.) A weights term is optional in normal models.

Output

The output from estimateModel would often be too large to fit into memory. estimateModel therefore departs from the standard R behavior in the way it handles output. Rather than returning an object containing the output, estimateModel creates a file on disk, somewhat like a database.

The name and location of the output file are specified using the filename argument. The file is just a text file, and can be copied and moved.

Users extract items from the file using functions such as fetch, fetchSummary, fetchMCMC, and fetchFiniteSD.

Functions estimateCounts, estimateAccount, and predictModel follow the same strategy for returning output.

nBurnin, nSim, nChain, nThin

estimateModel, estimateCounts, and estimateAccount use Markov chain Monte Carlo (MCMC) methods for inference. MCMC methods have two stages: burnin and production. During the burnin phase, the model moves from an initial guess at the location of the posterior distribution towards the true location. During the production phase, if all goes well, the model samples from the true posterior distribution.

Parameter nBurnin specifies the number of iterations that the model spends moving away from its initial location. Parameter nSim specifies the number of iterations that the model spends sampling from the posterior distribution.

Collecting every iteration during the production phase would lead to huge output files. Instead, the model collects only one in every nThin iterations. The resulting loss in information is relatively small, since successive iterations are typically highly correlated.
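The effect of thinning can be illustrated with a small base R sketch. It does not call demest: an AR(1) series stands in for a single chain, the correlation rho is an assumed value, and the choice of which iterations are kept (every nThin-th, starting at iteration nThin) is an assumption made for illustration, not a description of the internals of estimateModel.

```r
## Illustrative only: an AR(1) series stands in for one MCMC chain.
set.seed(1)
nSim <- 10000   # iterations in the production phase (hypothetical value)
nThin <- 10     # keep one iteration in every nThin
rho <- 0.9      # assumed correlation between successive iterations

chain <- as.numeric(arima.sim(model = list(ar = rho), n = nSim))

## Assumption: the recorded iterations are nThin, 2 * nThin, ...
kept <- chain[seq(from = nThin, to = nSim, by = nThin)]

## Lag-1 correlation between neighbouring values
lag1 <- function(x) cor(x[-1], x[-length(x)])
lag1Full <- lag1(chain)
lag1Thinned <- lag1(kept)

c(full = lag1Full, thinned = lag1Thinned, nKept = length(kept))
```

The thinned series is a tenth of the length, but its neighbouring values are much less correlated, so each recorded iteration carries more independent information than an average unthinned one.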

The calculations are run nChain times, with each chain yielding a different sample. As described in the documentation for fetchMCMC, comparing the samples is a way of checking whether the model has found the posterior distribution. When each chain seems to be sampling from the same distribution, the model is said to have converged.
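One common way of comparing chains is the potential scale reduction factor discussed in Gelman et al. (2013). The sketch below is a simplified, non-split version written in base R on simulated draws. The function name potentialScaleReduction and the simulated matrices are hypothetical, and the diagnostics actually available through fetchMCMC may be computed differently.

```r
## Simplified potential scale reduction factor on simulated draws.
set.seed(42)
nChain <- 4
nIter <- 500

potentialScaleReduction <- function(draws) {
  ## draws: matrix with one column per chain, one row per recorded iteration
  n <- nrow(draws)
  W <- mean(apply(draws, 2, var))      # average within-chain variance
  B <- n * var(colMeans(draws))        # between-chain variance
  varPlus <- (n - 1) / n * W + B / n   # pooled estimate of posterior variance
  sqrt(varPlus / W)
}

## All chains sample from the same distribution
agree <- matrix(rnorm(nChain * nIter), nrow = nIter, ncol = nChain)
## Two of the four chains are shifted to a different location
disagree <- sweep(agree, 2, c(0, 0, 3, 3), "+")

rhatAgree <- potentialScaleReduction(agree)
rhatDisagree <- potentialScaleReduction(disagree)
c(agree = rhatAgree, disagree = rhatDisagree)
```

Values close to 1 are consistent with the chains sampling from the same distribution. Values well above 1, as for the shifted chains, suggest that a longer burnin or more iterations are needed.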

At the end of the estimation process, estimateModel and similar functions pool the results from all the chains to form a single sample. This sample has floor(nChain * nSim / nThin) iterations.
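As a quick check on the formula, the size of the pooled sample can be worked out in base R. The helper pooledIterations is hypothetical and simply restates the expression above.

```r
## Hypothetical helper restating floor(nChain * nSim / nThin)
pooledIterations <- function(nChain, nSim, nThin) {
  floor(nChain * nSim / nThin)
}

## Default settings of estimateModel
nDefault <- pooledIterations(nChain = 4, nSim = 1000, nThin = 1)   # 4000

## Settings used in the Examples section
nExample <- pooledIterations(nChain = 2, nSim = 50, nThin = 2)     # 50

## A thinning interval that does not divide evenly is rounded down
nUneven <- pooledIterations(nChain = 4, nSim = 1000, nThin = 3)    # 1333
```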

References

Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., Rubin, D. B. (2013) Bayesian Data Analysis. Third Edition. Boca Raton: Chapman & Hall/CRC.

See Also

estimateCounts is similar to estimateModel, except that y is not observed directly, but must be inferred from multiple noisy datasets. estimateAccount infers a demographic account from multiple noisy datasets. Calculations can be extended using continueEstimation. Forecasts based on the results from estimateModel can be constructed using function predictModel.

Examples

library(datasets)
admissions <- Counts(UCBAdmissions)
admitted <- subarray(admissions, Admit == "Admitted")
filename <- tempfile()
estimateModel(Model(y ~ Binomial(mean ~ Gender + Dept)),
              y = admitted,
              exposure = admissions,
              filename = filename,
              nBurnin = 50,
              nSim = 50,
              nChain = 2,
              nThin = 2)
fetchSummary(filename)

StatisticsNZ/demest documentation built on Nov. 2, 2023, 7:56 p.m.