mse: mse

mse    R Documentation

mse

Description

Prepare data for estimation and calculate estimates using run_lcmcr.

Usage

mse(
  stratum_data,
  stratum_name,
  estimates_dir = NULL,
  min_n = 1,
  K = NULL,
  buffer_size = 10000,
  sampler_thinning = 1000,
  seed = 19481210,
  burnin = 10000,
  n_samples = 10000,
  posterior_thinning = 500
)

Arguments

stratum_data

A data frame including all records in a stratum of interest. Columns indicating sources should be prefixed with in_ and should be numeric.
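As a minimal sketch of preparing stratum_data, the snippet below assumes the source indicators start out as logical values and coerces them to numeric. The id column and the column names in_A and in_B are hypothetical.

```r
# Hypothetical records with logical source indicators.
records <- data.frame(
  id   = 1:4,
  in_A = c(TRUE, FALSE, TRUE, TRUE),
  in_B = c(FALSE, FALSE, TRUE, FALSE)
)

# Find the source columns by their in_ prefix and coerce them to numeric 0/1.
source_cols <- grep("^in_", names(records), value = TRUE)
records[source_cols] <- lapply(records[source_cols], as.numeric)

str(records[source_cols])
```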

stratum_name

An identifier for the stratum.

estimates_dir

File path for the folder containing pre-calculated estimates, if you would like to use pre-calculated results. Note that setting this option forces the model specification parameters to be identical to those used to calculate the pre-calculated estimates. Do not specify a file path if you would like to use a custom model specification.

min_n

The minimum number of records that must appear in a source to be considered valid for estimation. min_n should never be less than or equal to 0; the default value is 1.
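To illustrate the threshold, the sketch below counts the records in each source and keeps the sources that meet min_n. It assumes a source counts as valid when its record count is greater than or equal to min_n. The helper valid_source_names is hypothetical and is not part of the package.

```r
# Hypothetical helper: names of the in_ columns with at least min_n records.
# Assumption: a source is valid when its record count is >= min_n.
valid_source_names <- function(stratum_data, min_n = 1) {
  stopifnot(min_n > 0)
  source_cols <- grep("^in_", names(stratum_data), value = TRUE)
  counts <- colSums(stratum_data[source_cols], na.rm = TRUE)
  names(counts)[counts >= min_n]
}

toy <- data.frame(in_A = c(1, 0, 1), in_B = c(0, 0, 0), in_C = c(1, 1, 1))
valid_source_names(toy, min_n = 1)  # in_A and in_C; in_B has no records
valid_source_names(toy, min_n = 3)  # in_C only
```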

K

The maximum number of latent classes to fit. By default, the function sets K to the smaller of two values: 15, or 2 raised to the number of valid sources, minus 1.
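The default can be worked through with a little arithmetic. The sketch below reads the rule as min(2^n - 1, 15), where n is the number of valid sources. That reading is an assumption about the description above, and default_K is a hypothetical helper, not a package function.

```r
# Hypothetical helper, assuming the default is min(2^n - 1, 15).
default_K <- function(n_valid_sources) {
  min(2^n_valid_sources - 1, 15)
}

# Default K for 2, 3, 4, and 5 valid sources under this reading.
sapply(2:5, default_K)  # 3 7 15 15
```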

buffer_size

Size of the tracing buffer. Default value is 10,000.

sampler_thinning

Thinning interval for the tracing buffer. Default value is 1,000.

seed

Integer seed for the internal random number generator. Default value is 19481210.

burnin

Number of burn-in iterations. Default value is 10,000.

n_samples

Number of samples to be generated. One sample is taken every posterior_thinning iterations of the sampler. Default value is 10,000. The final number of samples from the posterior is n_samples divided by 1,000.

posterior_thinning

Thinning interval for the sampler. Default value is 500.
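As a quick check of the sample counts, the sketch below applies the rule stated under n_samples (final draws equal n_samples divided by 1,000). The helper posterior_draws is hypothetical and only restates that arithmetic.

```r
# Rule stated in the documentation: final draws = n_samples / 1000.
posterior_draws <- function(n_samples) n_samples / 1000

posterior_draws(10000)  # 10 with the default n_samples
posterior_draws(50000)  # 50
```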

Value

A data frame with five columns: validated, a logical value indicating whether the stratum is estimable; N, the draws from the posterior distribution (NA if the stratum is not estimable); valid_sources, a string indicating which sources were used in the estimation; n_obs, the number of observations on valid lists in the stratum of interest (NA if the stratum is not estimable); and stratum_name, the stratum identifier. If the stratum is estimable, the returned data frame has n_samples divided by 1,000 rows.
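One way to work with a result of this shape is sketched below. The data frame is a placeholder that only mirrors the five documented columns: the N values are made up for illustration, and the format of the valid_sources string is an assumption.

```r
# Placeholder result with the five documented columns. The N values are
# made up for illustration and are not real model output.
estimates <- data.frame(
  validated     = TRUE,
  N             = c(120, 135, 128, 142, 131, 126, 138, 129, 133, 140),
  valid_sources = "in_A,in_B,in_C",  # assumed format
  n_obs         = 95,
  stratum_name  = "my_stratum"
)

if (all(estimates$validated)) {
  # Summarize the posterior draws with a median and a 95% interval.
  summary_N <- quantile(estimates$N, probs = c(0.025, 0.5, 0.975))
  print(summary_N)
} else {
  message("Stratum is not estimable; N is NA.")
}
```

Checking validated before summarizing avoids computing quantiles on NA values when a stratum is not estimable.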

Examples


library(verdata)

set.seed(19481210)
# Simulate four sources for 100 records. sample() rescales prob, so the
# weights for in_A need not sum to 1.
in_A <- sample(c(0, 1), size = 100, replace = TRUE, prob = c(0.45, 0.65))
in_B <- sample(c(0, 1), size = 100, replace = TRUE, prob = c(0.5, 0.5))
in_C <- sample(c(0, 1), size = 100, replace = TRUE, prob = c(0.75, 0.25))
# in_D is always 0, so this source contains no records.
in_D <- sample(c(0, 1), size = 100, replace = TRUE, prob = c(1, 0))

my_stratum <- tibble::tibble(in_A, in_B, in_C, in_D)
mse(stratum_data = my_stratum, stratum_name = "my_stratum")


verdata documentation built on June 8, 2025, 11:46 a.m.