Parrot scales short text around common words.
Let me know if you run into any problems.
Paper: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3044864
The scale_text function suggests a truncation for the SVD. It produces approximately the same number of pivot words.
```{r example}
library(devtools) install_github("wilryh/parrot", dependencies=TRUE)
library(stm) library(parrot)
processed <- textProcessor( input_data$text, data.frame(input_data), removestopwords=T, lowercase=T, stem=F ) out <- prepDocuments( processed$documents, processed$vocab, processed$meta )
tdm <- doc_to_tdm(out)
embeddings <- read_word_embeddings( in_vocab=out$vocab, ovefile = "O2M_overlap.txt" # must add location on your computer "path/to/O2M_overlap.txt" ## ovefile2 = "path/to/O2M_oov.txt", # very rare words and misspellings ## available here http://www.cis.uni-muenchen.de/~wenpeng/renamed-meta-emb.tar.gz ## must unpack and replace "path/to/" with location on your computer )
scores <- scale_text( meta=out$meta, tdm=tdm
)
document_scores <- score_documents( scores=scores, n_dimensions=10 )
get_keywords(scores, n_dimensions=3, n_words=15)
with(document_scores, cor(sqrt(n_words), X0, use="complete"))
plot_keywords( scores, x_dimension=1, y_dimension=2, q_cutoff=0.9 ) ```
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.