```r
knitr::read_chunk(here::here("code", "chunk-options.R"))
devtools::load_all()
```
The last thing we'll look at before presenting plots for the final model is the color distribution over each topic. This gives us a picture of what our color themes actually are!
```r
library(dplyr)
library(purrr)
library(ggplot2)

load_data(sample_data = FALSE)
```
```r
knitr::read_chunk(here::here("code", "compare-models.R"))
```
For these plots the distribution is represented by a weighted relevance score (the same one used in the [`LDAvis` package](http://www.kennyshirley.com/LDAvis/#topic=0&lambda=0.61&term=)).
The beta matrix, $\beta$, gives the posterior distribution of words given a topic, $p(w|t)$. Relevance is computed as

$$
\text{relevance}(w|t) = \lambda \cdot \log p(w|t) + (1-\lambda)\cdot \log \frac{p(w|t)}{p(w)}.
$$
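As a rough sketch of how this score behaves, relevance can be computed directly from a beta matrix and the marginal word probabilities. The toy `beta` matrix and the `relevance()` helper below are illustrative, not part of the project code, and equal topic weights are assumed when computing $p(w)$:

```r
# Toy beta matrix: rows are topics, columns are words (each row sums to 1).
beta <- matrix(
  c(0.5, 0.3, 0.2,
    0.1, 0.1, 0.8),
  nrow = 2, byrow = TRUE
)

# Marginal word probabilities p(w), assuming equal topic weights.
p_w <- colMeans(beta)

# Relevance of each word for one topic, on the log scale, weighted by lambda.
relevance <- function(topic, lambda = 0.61) {
  lambda * log(beta[topic, ]) + (1 - lambda) * log(beta[topic, ] / p_w)
}

relevance(1)  # word 1 ranks highest for topic 1
```

Lower values of `lambda` down-weight globally common words, which is why the most "relevant" words for a topic are not always the most probable ones.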
Even though our model scores might have leaned towards a model with fewer topics, we can see specific topics where adding more topics separates themes that appear to be quite different. The first two examples are of topic 2 from the 30-topic model, which seems more coherent in the 40-topic model (the second plot).
One final plot, from topic 32, looked questionable to me but seems to have grouped some related (if small) sets.