mse: mse
In verdata: Analyze Data from the Truth Commission in Colombia

mse	R Documentation

Description

Prepare data for estimation and calculate estimates using run_lcmcr.

Usage

mse(
  stratum_data,
  stratum_name,
  estimates_dir = NULL,
  min_n = 1,
  K = NULL,
  buffer_size = 10000,
  sampler_thinning = 1000,
  seed = 19481210,
  burnin = 10000,
  n_samples = 10000,
  posterior_thinning = 500
)

Arguments

`stratum_data`	A data frame including all records in a stratum of interest. Columns indicating sources should be prefixed with `in_` and should be numeric.
`stratum_name`	An identifier for the stratum.
`estimates_dir`	File path for the folder containing pre-calculated estimates, if you would like to use pre-calculated results. Note that setting this option forces the model specification parameters to be identical to those used to calculate the pre-calculated estimates. Do not specify a file path if you would like to use a custom model specification.
`min_n`	The minimum number of records that must appear in a source for it to be considered valid for estimation. `min_n` must be greater than 0; the default value is 1.
`K`	The maximum number of latent classes to fit. By default, the function sets `K` to the smaller of 2^s - 1 and 15, where s is the number of valid sources.
`buffer_size`	Size of the tracing buffer. Default value is 10,000.
`sampler_thinning`	Thinning interval for the tracing buffer. Default value is 1,000.
`seed`	Integer seed for the internal random number generator. Default value is 19481210.
`burnin`	Number of burn-in iterations. Default value is 10,000.
`n_samples`	Number of samples to be generated. One sample is taken every `posterior_thinning` iterations of the sampler. Default value is 10,000. The final number of samples from the posterior is `n_samples` divided by 1,000.
`posterior_thinning`	Thinning interval for the sampler. Default value is 500.
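
As a rough illustration of how `min_n` and the default `K` interact, the sketch below counts the records in each `in_` column and derives a default `K` from the number of sources that pass the threshold. The helpers `valid_sources()` and `default_K()` are hypothetical names, not part of verdata, and the sketch assumes two things from the argument descriptions above: that a source is valid when its column sum is at least `min_n`, and that the default is the smaller of 2^s - 1 and 15. The package's internal checks may differ.

```r
# Hypothetical helper (not a verdata function): names of the `in_` columns
# whose record count reaches min_n. Assumed reading of "valid source".
valid_sources <- function(stratum_data, min_n = 1) {
  stopifnot(min_n > 0)
  source_cols <- grep("^in_", names(stratum_data), value = TRUE)
  records_per_source <- colSums(stratum_data[source_cols], na.rm = TRUE)
  names(records_per_source)[records_per_source >= min_n]
}

# Hypothetical helper: assumed default for K, the smaller of 2^s - 1 and 15.
default_K <- function(n_valid_sources) {
  min(2^n_valid_sources - 1, 15)
}

# Toy stratum: in_C never records anyone, so it fails min_n = 1.
toy <- data.frame(
  in_A = c(1, 0, 1),
  in_B = c(0, 1, 1),
  in_C = c(0, 0, 0)
)

valid <- valid_sources(toy)
valid
default_K(length(valid))
```

Under these assumptions, adding sources raises the default quickly until the cap of 15 applies, which happens from four valid sources onward.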

Value

A data frame with five columns. `validated` is a logical value indicating whether the stratum is estimable. `N` contains the draws from the posterior distribution (`NA` if the stratum is not estimable). `valid_sources` is a string indicating which sources were used in the estimation. `n_obs` is the number of observations on valid lists in the stratum of interest (`NA` if the stratum is not estimable). `stratum_name` is the stratum identifier. If the stratum is estimable, the returned data frame has `n_samples` divided by 1,000 rows.
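
Because `N` holds one posterior draw per row, a common next step is to reduce the draws to a point estimate and an interval. The sketch below does this in base R on a mock data frame that only imitates the five documented columns. The `N` values are simulated placeholders, the format of the `valid_sources` string is a guess, and `summarise_draws()` is a hypothetical helper rather than a verdata function.

```r
# Mock return value with the five documented columns. The N values are
# simulated placeholders, not real estimates.
set.seed(1)
mock_result <- data.frame(
  validated = TRUE,
  N = round(rnorm(10, mean = 150, sd = 10)),
  valid_sources = "in_A,in_B,in_C",  # placeholder; the real format may differ
  n_obs = 87,
  stratum_name = "my_stratum"
)

# Hypothetical helper: median and central 95% interval of the draws.
# Returns NULL when the stratum was not estimable.
summarise_draws <- function(result) {
  if (!all(result$validated)) {
    return(NULL)
  }
  data.frame(
    stratum_name = result$stratum_name[1],
    n_obs = result$n_obs[1],
    N_median = median(result$N),
    N_lower = unname(quantile(result$N, 0.025)),
    N_upper = unname(quantile(result$N, 0.975))
  )
}

draw_summary <- summarise_draws(mock_result)
draw_summary
```

Checking `validated` first avoids computing quantiles on the `NA` draws that a non-estimable stratum would carry.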

Examples


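library(verdata)

set.seed(19481210)
# Simulate four sources for 100 records (1 = the record appears in the source).
# The weights for in_A sum to 1.1; sample() rescales weights, so this still runs.
in_A <- sample(c(0, 1), size = 100, replace = TRUE, prob = c(0.45, 0.65))
in_B <- sample(c(0, 1), size = 100, replace = TRUE, prob = c(0.5, 0.5))
in_C <- sample(c(0, 1), size = 100, replace = TRUE, prob = c(0.75, 0.25))
# in_D is all zeros, so it has fewer records than the default min_n = 1.
in_D <- sample(c(0, 1), size = 100, replace = TRUE, prob = c(1, 0))

# Combine the source indicators into one stratum and estimate it
# with the default model specification.
my_stratum <- tibble::tibble(in_A, in_B, in_C, in_D)
mse(stratum_data = my_stratum, stratum_name = "my_stratum")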

verdata documentation built on June 8, 2025, 11:46 a.m.