topic_model_data: Topic model dataset
In MLBC: Bias Correction Methods for Models Using Synthetic Data

topic_model_data

R Documentation

Description

A dataset of topic model outputs, built from CEO diary data, for demonstrating bias correction methods in topic model regressions.

Usage

topic_model_data

Format

A list with 8 components:

covars: Data frame (916 x 11): Control variables
estimation_data: Data frame (916 x 672): Contains outcome ly and word frequencies
gamma_draws: Data frame (2000 x 2): MCMC draws
theta_est_full: Data frame (916 x 2): Full sample topic proportions
theta_est_samp: Data frame (916 x 2): Subsample topic proportions
beta_est_full: Data frame (2 x 654): Full sample topic-word distributions
beta_est_samp: Data frame (2 x 654): Subsample topic-word distributions
lda_data: Data frame (916 x 2): LDA validation data
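
The documented shapes above can be checked against the loaded object. The following is a minimal sketch, not part of the package: `expected_dims` and `check_dims` are hypothetical names introduced here for illustration, and the package check is skipped when MLBC is not installed.

```r
# Documented dimensions from the Format section, as c(rows, columns)
expected_dims <- list(
  covars          = c(916, 11),
  estimation_data = c(916, 672),
  gamma_draws     = c(2000, 2),
  theta_est_full  = c(916, 2),
  theta_est_samp  = c(916, 2),
  beta_est_full   = c(2, 654),
  beta_est_samp   = c(2, 654),
  lda_data        = c(916, 2)
)

# Hypothetical helper: for each documented component, report whether it
# exists in `x` and whether its dim() matches the documented value
check_dims <- function(x, expected) {
  vapply(names(expected), function(nm) {
    !is.null(x[[nm]]) &&
      identical(as.numeric(dim(x[[nm]])), as.numeric(expected[[nm]]))
  }, logical(1))
}

# Only touch the packaged data when MLBC is available
if (requireNamespace("MLBC", quietly = TRUE)) {
  data("topic_model_data", package = "MLBC")
  print(check_dims(topic_model_data, expected_dims))
}
```

A `FALSE` entry would suggest that the installed package version ships a component with a different shape than the one documented here.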

Source

CEO diary data from Bandiera et al. (2020), Journal of Political Economy.

Examples

library(MLBC)
data(topic_model_data)

# Basic exploration: outcome vector and full-sample topic proportions
Y <- topic_model_data$estimation_data$ly
theta <- as.matrix(topic_model_data$theta_est_full)

cat("Sample size:", length(Y), "\n")
cat("Mean log employment:", round(mean(Y), 2), "\n")
cat("Topic 1 mean:", round(mean(theta[, 1]), 3), "\n")
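
Since `theta_est_full` and `beta_est_full` are described as topic proportions and topic-word distributions, a natural sanity check is whether each row is non-negative and sums to one. The sketch below assumes the stored estimates are normalized in this way, which the documentation does not state explicitly; `rows_are_proportions` is a hypothetical helper, not an MLBC function.

```r
# Hypothetical helper: TRUE when every entry is non-negative and every
# row sums to 1 within the tolerance `tol`
rows_are_proportions <- function(m, tol = 1e-6) {
  m <- as.matrix(m)
  all(m >= 0) && all(abs(rowSums(m) - 1) < tol)
}

# Only touch the packaged data when MLBC is available
if (requireNamespace("MLBC", quietly = TRUE)) {
  data("topic_model_data", package = "MLBC")
  # Assumption: rows of theta are per-observation topic shares and rows of
  # beta are per-topic word distributions; FALSE means the assumption fails
  cat("theta rows sum to 1:",
      rows_are_proportions(topic_model_data$theta_est_full), "\n")
  cat("beta rows sum to 1:",
      rows_are_proportions(topic_model_data$beta_est_full), "\n")
}
```

A `FALSE` result is informative rather than an error: it would indicate that the estimates are stored on a different scale than assumed here.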

MLBC documentation built on Aug. 8, 2025, 7:31 p.m.