scale_text

Description

Runs pivoted text scaling.

Usage

scale_text(tdm, meta, tdm_vocab, embeddings, embeddings_vocab,
  compress_fast, n_dimension_compression, pivot, verbose,
  constrain_outliers, simple, holdout)
Arguments

tdm
A sparseMatrix. Rows are documents and columns are vocabulary.
meta
A data.frame. Rows must line up with the rows of tdm. This is included only to keep track of any accompanying variables; it is unaltered by the function.
tdm_vocab |
A character vector. Provide vocabulary for columns of tdm if missing in column names. |
embeddings |
A numeric matrix. A matrix of embedding values. |
embeddings_vocab |
A character vector. Provide vocabulary for rows of chosen embeddings if missing in row names. |
compress_fast
A logical scalar. If TRUE, use RSpectra for the compression step; if FALSE, use base R.
n_dimension_compression |
An integer scalar. How many dimensions of PCA to use. The algorithm will not work if this is set too high. If left NULL, a recommended number of dimensions will be calculated automatically. |
pivot |
An integer scalar. This is the power of the pivot. It should be set as high as possible as long as the algorithm still works; 2 or 4 is a good bet. If using out-of-sample embeddings, it can be set lower (e.g. 1/2).
verbose |
A logical scalar. Print progress of the function. |
constrain_outliers |
A logical scalar. If TRUE, in-sample words and embedding scores for documents are constrained to have approximately unit norms. Recommended for online surveys (to reduce the influence of bad data), focused survey questions, and online social media data.
simple |
A logical scalar. If TRUE, pivot only once.
holdout |
A logical or numeric vector indicating which rows to exclude from training.
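A minimal sketch (not from the package documentation) of how holdout and pivot might be combined: a random 10% of documents are excluded from training but can still be scored afterward. The objects tdm and out are assumed to come from the Examples section below.

```r
## Hypothetical usage sketch: hold out ~10% of rows from training.
## Assumes `tdm` and `out` were built as in the Examples section.
set.seed(1)
test_rows <- sample(
    c(TRUE, FALSE), nrow(tdm),
    replace = TRUE, prob = c(0.1, 0.9)
)
scores <- scale_text(
    meta = out$meta,
    tdm = tdm,
    pivot = 2,                  # raise while the algorithm still works
    constrain_outliers = TRUE,
    holdout = test_rows         # rows excluded from training
)
```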
See Also

read_word_embeddings, get_keywords, plot_keywords, score_documents, doc_to_tdm
Examples

## Not run:
library(stm)
library(parrot)
processed <- textProcessor(
input_data$text,
data.frame(input_data),
removestopwords = TRUE, lowercase = TRUE, stem = FALSE
)
out <- prepDocuments(
processed$documents, processed$vocab, processed$meta
)
tdm <- doc_to_tdm(out)
# download and extract embeddings data first
embeddings <- read_word_embeddings(
in_vocab = out$vocab,
ovefile = "O2M_overlap.txt" # replace with its location on your computer, "path/to/O2M_overlap.txt"
## ovefile2 = "O2M_oov.txt", # very rare words and misspellings
## available here http://www.cis.uni-muenchen.de/~wenpeng/renamed-meta-emb.tar.gz
## must unpack and replace "path/to/" with location on your computer
)
scores <- scale_text(
meta = out$meta,
tdm = tdm,
## embeddings = embeddings[["meta"]], ## limited effects on output
compress_fast = TRUE,
constrain_outliers = TRUE
)
document_scores <- score_documents(
scores = scores, n_dimensions = 10
)
get_keywords(scores, n_dimensions = 3, n_words = 15)
with(document_scores, cor(sqrt(n_words), X0, use = "complete"))
plot_keywords(
scores, x_dimension = 1, y_dimension = 2, q_cutoff = 0.9
)
## End(Not run)