In this vignette, we show how to perform Latent Semantic Analysis (LSA) using the quanteda package, based on Grossman and Frieder's Information Retrieval: Algorithms and Heuristics.
LSA decomposes a document-feature matrix into a reduced vector space that is assumed to reflect semantic structure.
New documents or queries can be 'folded-in' to this constructed latent semantic space for downstream tasks.
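As a rough sketch of the decomposition, assuming the usual truncated singular value decomposition (the symbols below are illustrative notation, not names used by quanteda), a document-feature matrix X with n documents and m features is approximated with k latent dimensions as:

```latex
X \approx X_k = U_k \Sigma_k V_k^{\top},
\qquad
U_k \in \mathbb{R}^{n \times k},\quad
\Sigma_k \in \mathbb{R}^{k \times k},\quad
V_k \in \mathbb{R}^{m \times k}
```

Each row of U_k gives the coordinates of one document, each row of V_k gives the coordinates of one feature, and Σ_k is the diagonal matrix of the k largest singular values.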
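library(quanteda)
library(quanteda.textmodels)  # textmodel_lsa() is provided by this package in quanteda 2.0 and later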
txt <- c(d1 = "Shipment of gold damaged in a fire",
         d2 = "Delivery of silver arrived in a silver truck",
         d3 = "Shipment of gold arrived in a truck")
# tokenize first: recent quanteda versions no longer build a dfm directly from text
mydfm <- dfm(tokens(txt))
mydfm
mylsa <- textmodel_lsa(mydfm)
The coordinates of the document vectors in the reduced 2-dimensional space are:
mylsa$docs[, 1:2]
A new, unseen document can now be represented in the same reduced 2-dimensional space. The unseen query document is:
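# build the query dfm and align its features with those of mydfm,
# so that features missing from the query are padded with zero counts
querydfm <- dfm_match(dfm(tokens("gold silver truck")), features = featnames(mydfm))
querydfm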
newq <- predict(mylsa, newdata = querydfm)
newq$docs_newspace[, 1:2]
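The fold-in step can be written out in its standard form. This is a sketch in illustrative notation, not a statement of quanteda's internals. Here q is the 1 × m row vector of query feature counts, aligned with the columns of the original document-feature matrix X. V_k and Σ_k are the feature coordinates and singular values of the rank-k decomposition X ≈ U_k Σ_k V_k^T. We assume these correspond to mylsa$features and mylsa$sk.

```latex
\hat{d} = q \, V_k \, \Sigma_k^{-1}
```

This follows from the decomposition itself: because the columns of V_k are orthonormal, U_k = X V_k Σ_k^{-1}, so applying the same map to a new row q places it in the same space as the rows of U_k.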