knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
This vignette measures the runtime of a few steps in the alignment workflow. Running this vignette with $V = 1000$, $N = 250$ gives the estimates reported in the accompanying manuscript.
library(MCMCpack)
library(alto)
library(dplyr)
library(purrr)
library(stringr)
library(tictoc)
source("https://raw.githubusercontent.com/krisrs1128/topic_align/main/simulations/simulation_functions.R")
For this simulation, we work with simulated LDA data, as in the "Identifying True Topics" vignette.
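# Expose the vignette parameters (K, V, N, n_models) as variables
attach(params)
lambdas <- list(beta = 0.1, gamma = .5, count = 1e4)
# Topic-word distributions (K x V) and document-topic memberships (N x K)
betas <- rdirichlet(K, rep(lambdas$beta, V))
gammas <- rdirichlet(N, rep(lambdas$gamma, K))
x <- simulate_lda(betas, gammas, lambda = lambdas$count)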
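To make the simulated data concrete, here is a minimal base-R sketch of an LDA-style generative process. It is not the `simulate_lda` function sourced above, whose internals are not shown here. The helper `rdirichlet_base`, the small sizes, and the fixed document length `doc_length` are all assumptions made for illustration.

```r
set.seed(1)

# Illustrative sizes only; the vignette takes K, V, N from `params`.
K <- 3; V <- 20; N <- 5
doc_length <- 1000  # assumed fixed word count per document

# Hypothetical base-R Dirichlet sampler: normalize independent gamma draws.
rdirichlet_base <- function(n, alpha) {
  g <- matrix(rgamma(n * length(alpha), shape = alpha), nrow = n, byrow = TRUE)
  g / rowSums(g)
}

betas <- rdirichlet_base(K, rep(0.1, V))   # K x V topic-word distributions
gammas <- rdirichlet_base(N, rep(0.5, K))  # N x K document-topic memberships

# Each document's word distribution is a mixture of the topics.
probs <- gammas %*% betas                  # N x V, rows sum to 1

# Draw word counts for each document from a multinomial.
x <- t(apply(probs, 1, function(p) rmultinom(1, size = doc_length, prob = p)))
dim(x)
```

The result is an N x V count matrix, the same shape of input that the LDA fitting step expects.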
We split model fitting and alignment so that we can measure their computation
times separately, using the tictoc package. In general, fitting the LDA
models consumes the majority of the time in an alignment workflow, especially
when the sample or vocabulary size is large.
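# Fit one LDA model for each number of topics k = 1, ..., n_models
lda_params <- map(1:n_models, ~ list(k = .))
names(lda_params) <- str_c("K", 1:n_models)

# Time the model fitting
tic()
lda_models <- run_lda_models(x, lda_params, reset = TRUE)
toc()

# Time the product alignment
tic()
align_topics(lda_models, method = "product")
toc()

# Time the transport alignment
tic()
align_topics(lda_models, method = "transport")
toc()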
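The `tic()` / `toc()` pairs print elapsed times to the console. If the timings need to be compared or saved, one option is to collect them in a data frame. The sketch below uses base R's `system.time()` instead of tictoc so that it runs without extra packages. The helper `time_step` is hypothetical, and the two timed expressions are placeholders standing in for `run_lda_models()` and `align_topics()`.

```r
# Hypothetical helper (not part of alto or tictoc): evaluate an expression
# and record its elapsed wall-clock time in seconds.
time_step <- function(label, expr) {
  elapsed <- system.time(expr)[["elapsed"]]
  data.frame(step = label, seconds = elapsed, stringsAsFactors = FALSE)
}

# Placeholder workloads; in the vignette these would be the model fitting
# and alignment calls.
timings <- rbind(
  time_step("sort", sort(runif(1e5))),
  time_step("crossprod", crossprod(matrix(rnorm(1e4), nrow = 100)))
)
timings
```

Because R evaluates function arguments lazily, the expression passed as `expr` is only run inside `system.time()`, so the recorded time covers the computation itself.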