View source: R/textmodel_lsa.R
textmodel_lsa (R Documentation)
Fit the Latent Semantic Analysis scaling model to a dfm, which may be weighted (for instance using quanteda::dfm_tfidf()).
textmodel_lsa(x, nd = 10, margin = c("both", "documents", "features"))
x: the dfm on which the model will be fit
nd: the number of dimensions to be included in the output
margin: the margin to be smoothed by the SVD; one of "both" (the default), "documents", or "features"
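As the description notes, the dfm can be weighted before fitting. A minimal sketch, assuming tf-idf weighting via quanteda::dfm_tfidf(); the nd value is only illustrative:
library("quanteda")
dfmat <- dfm(tokens(data_corpus_irishbudget2010))
dfmat_tfidf <- dfm_tfidf(dfmat)               # weight the dfm before fitting
tmod_w <- textmodel_lsa(dfmat_tfidf, nd = 5)  # fit LSA on the weighted dfm
head(tmod_w$docs)                             # document coordinates in the weighted space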
The svds() function from the RSpectra package is used to compute the truncated SVD efficiently.
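A minimal sketch of the underlying computation on a toy matrix rather than a real dfm; RSpectra::svds() returns only the leading k singular triplets, which is what makes the truncated SVD fast on large sparse inputs:
library("RSpectra")
set.seed(10)
m <- matrix(rpois(200, 1), nrow = 10)  # toy 10 x 20 document-feature matrix
sv <- svds(m, k = 5)                   # only the 5 leading singular triplets
sv$d                                   # leading singular values (the role played by sk)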
A textmodel_lsa class object, a list containing:
sk: a numeric vector containing the d values from the SVD
docs: document coordinates from the SVD (u)
features: feature coordinates from the SVD (v)
matrix_low_rank: the low-rank approximation of the input, i.e. the product u d v'
data: the input data as a CsparseMatrix from the Matrix package
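A minimal sketch of how these components fit together, assuming docs is u (documents by nd), sk holds the singular values d, and features is v (features by nd); under those assumptions the low-rank matrix is the product u d v':
library("quanteda")
dfmat <- dfm(tokens(data_corpus_irishbudget2010))[1:10, ]
tmod <- textmodel_lsa(dfmat, nd = 5)
reconstructed <- tmod$docs %*% diag(tmod$sk) %*% t(tmod$features)
reconstructed[1:3, 1:3]
tmod$matrix_low_rank[1:3, 1:3]  # should closely match the reconstruction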
The number of dimensions nd retained in LSA is an empirical issue. Reducing the number of dimensions can remove much of the noise, but keeping too few dimensions or factors may lose important information.
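One common heuristic, not a rule prescribed by the package: inspect the decay of the singular values in sk and keep the dimensions before the curve flattens. A minimal sketch:
library("quanteda")
dfmat <- dfm(tokens(data_corpus_irishbudget2010))
tmod <- textmodel_lsa(dfmat, nd = 10)
plot(tmod$sk, type = "b", xlab = "dimension", ylab = "singular value")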
Haiyan Wang and Kohei Watanabe
Rosario, B. (2000). Latent Semantic Indexing: An Overview. Technical report INFOSYS 240 Spring Paper, University of California, Berkeley.
Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K., & Harshman, R. (1990). Indexing by Latent Semantic Analysis. Journal of the American Society for Information Science, 41(6): 391.
predict.textmodel_lsa(), coef.textmodel_lsa()
library("quanteda")
dfmat <- dfm(tokens(data_corpus_irishbudget2010))
# create an LSA space and return its truncated representation in the low-rank space
tmod <- textmodel_lsa(dfmat[1:10, ])
head(tmod$docs)
# matrix in the low-rank LSA space
tmod$matrix_low_rank[,1:5]
# fold queries into the space generated by dfmat[1:10,]
# and return their truncated representations in the new low-rank space
pred <- predict(tmod, newdata = dfmat[11:14, ])
pred$docs_newspace
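For comparison, a minimal sketch of the textbook fold-in computation from Deerwester et al. (1990), projecting the query rows as q v d^-1; this illustrates the idea only and is not necessarily the exact internal computation used by predict():
# q_hat = q %*% v %*% diag(1/d): illustrative fold-in, not taken from the package source
manual <- as.matrix(dfmat[11:14, ]) %*% tmod$features %*% diag(1 / tmod$sk)
head(manual)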