lpmec: lpmec

View source: R/lpme_DoBootPartition.R

lpmec    R Documentation

lpmec

Description

Implements bootstrapped analysis for latent variable models with measurement error correction.

Usage

lpmec(
  Y,
  observables,
  observables_groupings = colnames(observables),
  orientation_signs = NULL,
  make_observables_groupings = FALSE,
  n_boot = 32L,
  n_partition = 10L,
  boot_basis = 1:length(Y),
  return_intermediaries = TRUE,
  ordinal = FALSE,
  estimation_method = "em",
  latent_estimation_fn = NULL,
  mcmc_control = list(backend = "pscl", n_samples_warmup = 500L, n_samples_mcmc = 1000L,
    batch_size = 512L, chain_method = "parallel", subsample_method = "full",
    anchor_parameter_id = NULL, n_thin_by = 1L, n_chains = 2L),
  conda_env = "lpmec",
  conda_env_required = FALSE
)

Arguments

Y

A vector of observed outcome values, one per observation.

observables

A matrix of observable indicators (rows are observations) used to estimate the latent variable.

observables_groupings

A vector specifying groupings for the observable indicators. Default is column names of observables.

orientation_signs

(optional) A numeric vector of length equal to the number of columns in 'observables', containing 1 or -1 to indicate the desired orientation of each column. If provided, each column of 'observables' will be oriented by this sign before analysis. Default is NULL (no orientation applied).

make_observables_groupings

Logical. If TRUE, creates dummy variables for each level of the observable indicators. Default is FALSE.

n_boot

Integer. Number of bootstrap iterations. Default is 32.

n_partition

Integer. Number of partitions for each bootstrap iteration. Default is 10.

boot_basis

Vector of indices or grouping variable for stratified bootstrap. Default is 1:length(Y).
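As an illustration of what a grouping-variable boot_basis could mean, the sketch below resamples whole groups with replacement and keeps every row that belongs to a drawn group. This is one plausible reading of the argument, not a transcription of the package internals; draw_boot_rows and cluster_id are hypothetical names introduced only for this example.

```r
# Hypothetical helper (not part of lpmec): resample the units defined by
# boot_basis, then expand each drawn unit to its row indices.
draw_boot_rows <- function(boot_basis) {
  units <- unique(boot_basis)
  # Draw as many units as exist in the data, with replacement
  drawn <- units[sample.int(length(units), length(units), replace = TRUE)]
  # A unit drawn twice contributes its rows twice
  unlist(lapply(drawn, function(u) which(boot_basis == u)))
}

set.seed(1)
cluster_id <- rep(1:50, each = 4)   # 200 rows in 50 groups (illustrative)
rows <- draw_boot_rows(cluster_id)

# With a basis like the default 1:length(Y), every row is its own unit,
# so the scheme reduces to an ordinary row-level bootstrap.
rows_default <- draw_boot_rows(seq_along(cluster_id))
```

Under this reading, passing a cluster identifier keeps observations from the same group together in every bootstrap draw.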

return_intermediaries

Logical. If TRUE, returns intermediate results. Default is TRUE.

ordinal

Logical indicating whether the observable indicators are ordinal (TRUE) or binary (FALSE). Default is FALSE.

estimation_method

Character specifying the estimation approach. Options include:

  • "em" (default): Uses expectation-maximization via emIRT package. Supports both binary (via emIRT::binIRT) and ordinal (via emIRT::ordIRT) indicators.

  • "pca": First principal component of observables.

  • "averaging": Uses feature averaging.

  • "mcmc": Markov Chain Monte Carlo estimation using either pscl::ideal (R backend) or numpyro (Python backend).

  • "mcmc_joint": Joint Bayesian model that simultaneously estimates latent variables and outcome relationship using numpyro.

  • "mcmc_overimputation": Two-stage MCMC approach with measurement error correction via over-imputation.

  • "custom": Latent estimation is performed by the function supplied in latent_estimation_fn.
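For intuition about the two simplest options, the sketch below computes an averaging-style score and a first-principal-component score in base R. It assumes row means for "averaging" and a standardized prcomp fit for "pca"; the package may center, scale, or orient the scores differently, and the simulated data are illustrative only.

```r
set.seed(42)
# Simulated binary indicators: 200 observations, 6 items (illustrative data)
obs <- matrix(rbinom(200 * 6, size = 1, prob = 0.5), ncol = 6)

# "averaging"-style score: mean of the indicators in each row
x_avg <- rowMeans(obs)

# "pca"-style score: first principal component of the standardized indicators
x_pca <- prcomp(obs, center = TRUE, scale. = TRUE)$x[, 1]

# Standardize both so they are on a comparable scale
x_avg <- as.numeric(scale(x_avg))
x_pca <- as.numeric(scale(x_pca))
```

The sign of a principal component is arbitrary, so a PCA-based score may need to be flipped before it is interpreted.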

latent_estimation_fn

Custom function for estimating latent trait from observables if estimation_method="custom" (optional). The function should accept a matrix of observables (rows are observations) and return a numeric vector of length equal to the number of observations.
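A minimal sketch of a function that satisfies this contract is shown below. The estimator itself (a standardized row sum) and the name my_latent_fn are assumptions chosen for illustration; any function that maps a matrix of observables to one numeric score per row should fit the description above.

```r
# Hypothetical custom estimator: standardized row sum of the indicators.
my_latent_fn <- function(observables) {
  obs <- as.matrix(observables)          # accept a matrix or a data frame
  score <- rowSums(obs, na.rm = TRUE)    # one score per observation (row)
  as.numeric(scale(score))               # numeric vector of length nrow(obs)
}

# Check the contract on simulated data
set.seed(123)
obs <- matrix(sample(c(0, 1), 100 * 8, replace = TRUE), ncol = 8)
x_hat <- my_latent_fn(obs)
stopifnot(is.numeric(x_hat), length(x_hat) == nrow(obs))

# It would then be supplied as follows (not run here):
# lpmec(Y = Y, observables = obs,
#       estimation_method = "custom",
#       latent_estimation_fn = my_latent_fn)
```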

mcmc_control

A list of control settings used when the latent variable is estimated by MCMC. The elements documented here are:

backend

Character string indicating the MCMC engine to use. Valid options are "pscl" (default, uses the R-based pscl::ideal function) or "numpyro" (uses the Python numpyro package via reticulate).

n_samples_warmup

Integer specifying the number of warm-up (burn-in) iterations before samples are collected. Default is 500.

n_samples_mcmc

Integer specifying the number of post-warmup MCMC iterations to retain. Default is 1000.

chain_method

Character string passed to numpyro specifying how to run multiple chains. Options: "parallel" (default), "sequential", or "vectorized".

n_thin_by

Integer indicating the thinning factor for MCMC samples. Default is 1.

n_chains

Integer specifying the number of parallel MCMC chains to run. Default is 2.
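The sketch below builds a complete mcmc_control list with shorter chains than the defaults. Every element name is taken from the Usage signature; the values are illustrative. Whether a partial list is merged with the defaults is not stated here, so the example spells out all elements rather than relying on that.

```r
# Element names come from the Usage signature; values are illustrative only.
my_mcmc_control <- list(
  backend = "pscl",              # or "numpyro" (Python, via reticulate)
  n_samples_warmup = 250L,       # fewer warm-up iterations than the default 500
  n_samples_mcmc = 500L,         # fewer retained iterations than the default 1000
  batch_size = 512L,
  chain_method = "sequential",   # passed to numpyro; see chain_method above
  subsample_method = "full",
  anchor_parameter_id = NULL,
  n_thin_by = 1L,
  n_chains = 2L
)

# It would then be supplied as follows (not run here):
# lpmec(Y = Y, observables = observables,
#       estimation_method = "mcmc", mcmc_control = my_mcmc_control)
```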

conda_env

A character string specifying the name of the conda environment to use via reticulate. Default is "lpmec".

conda_env_required

A logical indicating whether the specified conda environment must be strictly used. If TRUE, an error is thrown if the environment is not found. Default is FALSE.

Details

This function implements a bootstrapped latent variable analysis with measurement error correction. It runs n_boot bootstrap iterations, each with n_partition partitions. For each partition, it calls the lpmec_onerun function to estimate the latent variable and apply the correction methods whose outputs are listed under Value. The results are then aggregated across partitions and bootstrap iterations to produce final estimates and bootstrap standard errors.
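The nested bootstrap and partition structure can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the code of lpmec_onerun: it assumes each partition splits the indicators into two random halves (suggested by the x_est1 and x_est2 outputs and the split-half variance listed under Value), scores each half by simple averaging instead of an IRT model, and forms the IV estimate as the reduced-form slope divided by the first-stage slope. The helper one_partition and the simulated data are hypothetical.

```r
set.seed(123)
n <- 500; k <- 10
x_true <- rnorm(n)                                            # simulated latent trait
obs <- sapply(1:k, function(j) rbinom(n, 1, plogis(x_true)))  # binary items
Y <- 0.5 * x_true + rnorm(n)

# One partition: split the items into two random halves and score each half
one_partition <- function(Y, obs) {
  half <- sample(ncol(obs), floor(ncol(obs) / 2))
  x_est1 <- as.numeric(scale(rowMeans(obs[, half, drop = FALSE])))
  x_est2 <- as.numeric(scale(rowMeans(obs[, -half, drop = FALSE])))
  c(ols = cov(Y, x_est1) / var(x_est1),          # naive OLS slope
    iv  = cov(Y, x_est1) / cov(x_est2, x_est1))  # reduced form / first stage
}

n_boot <- 20; n_partition <- 5                   # small, for illustration
boot_est <- t(replicate(n_boot, {
  idx <- sample.int(n, n, replace = TRUE)        # bootstrap resample of rows
  # Average the estimates over partitions within this bootstrap draw
  rowMeans(replicate(n_partition,
                     one_partition(Y[idx], obs[idx, , drop = FALSE])))
}))

colMeans(boot_est)        # aggregated point estimates
apply(boot_est, 2, sd)    # bootstrap standard errors
```

Averaging over partitions reduces the dependence of the estimate on any single random split of the indicators, while the spread across bootstrap draws supplies the standard error.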

Value

A list with the following elements (names are in snake_case):

  • ols_coef: Coefficient from naive OLS regression.

  • ols_se: Standard error of naive OLS coefficient.

  • ols_tstat: T-statistic of naive OLS coefficient.

  • iv_coef: Coefficient from instrumental variable (IV) regression.

  • iv_se: Standard error of IV regression coefficient.

  • iv_tstat: T-statistic of IV regression coefficient.

  • corrected_iv_coef: IV regression coefficient corrected for measurement error.

  • corrected_iv_se: Standard error of the corrected IV coefficient (currently NA).

  • corrected_iv_tstat: T-statistic of the corrected IV coefficient.

  • var_est: Estimated variance of the measurement error (split-half variance).

  • corrected_ols_coef: OLS coefficient corrected for measurement error.

  • corrected_ols_se: Standard error of the corrected OLS coefficient (currently NA).

  • corrected_ols_tstat: T-statistic of the corrected OLS coefficient (currently NA).

  • corrected_ols_coef_alt: Alternative corrected OLS coefficient (if applicable).

  • corrected_ols_se_alt: Standard error for the alternative corrected OLS coefficient (if applicable).

  • corrected_ols_tstat_alt: T-statistic for the alternative corrected OLS coefficient (if applicable).

  • bayesian_ols_coef_outer_normed: Posterior mean of the OLS coefficient under MCMC, after normalizing by the overall sample standard deviation.

  • bayesian_ols_se_outer_normed: Posterior standard error corresponding to bayesian_ols_coef_outer_normed.

  • bayesian_ols_tstat_outer_normed: T-statistic for bayesian_ols_coef_outer_normed.

  • bayesian_ols_coef_inner_normed: Posterior mean of the OLS coefficient under MCMC, after normalizing each posterior draw individually.

  • bayesian_ols_se_inner_normed: Posterior standard error corresponding to bayesian_ols_coef_inner_normed.

  • bayesian_ols_tstat_inner_normed: T-statistic for bayesian_ols_coef_inner_normed.

  • m_stage_1_erv: Extreme robustness value (ERV) for the first-stage regression (x_est2 on x_est1), if computed.

  • m_reduced_erv: ERV for the reduced model (Y on x_est1), if computed.

  • x_est1: First set of latent variable estimates.

  • x_est2: Second set of latent variable estimates.
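Because the estimators share a naming pattern (a _coef and a _se element for each), a small helper can gather them into one table. The sketch below assumes each of these elements is a single number; coef_table is a hypothetical helper, and res_stub is a stand-in list filled with placeholder numbers, not real output from lpmec.

```r
# Hypothetical helper: tabulate the main coefficient estimates from a result list.
coef_table <- function(res) {
  keys <- c("ols", "iv", "corrected_iv", "corrected_ols")
  data.frame(
    estimator = keys,
    coef = sapply(keys, function(k) res[[paste0(k, "_coef")]]),
    se   = sapply(keys, function(k) res[[paste0(k, "_se")]]),
    row.names = NULL
  )
}

# Stand-in list with placeholder numbers (NOT real output from lpmec)
res_stub <- list(ols_coef = 0.1, ols_se = 0.05,
                 iv_coef = 0.2, iv_se = 0.1,
                 corrected_iv_coef = 0.2, corrected_iv_se = NA_real_,
                 corrected_ols_coef = 0.2, corrected_ols_se = NA_real_)
tab <- coef_table(res_stub)
tab
```

With an actual fit, coef_table(results) would be called on the list returned by lpmec.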

References

Jerzak, C. T. and Jessee, S. A. (2025). Attenuation Bias with Latent Predictors. arXiv:2507.22218 [stat.AP]. https://arxiv.org/abs/2507.22218

Examples


# Generate some example data
set.seed(123)
Y <- rnorm(1000)
observables <- as.data.frame(matrix(sample(c(0,1), 1000*10, replace = TRUE), ncol = 10))

# Run the bootstrapped analysis
results <- lpmec(Y = Y,
                 observables = observables,
                 n_boot = 10,    # small value for illustration only
                 n_partition = 5 # small value for illustration only
                 )

# Print the full results; the corrected IV coefficient is in results$corrected_iv_coef
print(results)



lpmec documentation built on Feb. 9, 2026, 5:07 p.m.