lma_lspace | R Documentation |
Map a document-term matrix onto a latent semantic space, extract terms from a
latent semantic space (if dtm is a character vector, or map.space = FALSE),
or perform a singular value decomposition of a document-term matrix (if dtm
is a matrix and space is missing).
lma_lspace(dtm = "", space, map.space = TRUE, fill.missing = FALSE,
term.map = NULL, dim.cutoff = 0.5, keep.dim = FALSE,
use.scan = FALSE, dir = getOption("lingmatch.lspace.dir"))
dtm |
A matrix with terms as column names, or a character vector of terms to be extracted
from a specified space. If this is of length 1 and |
space |
A matrix with terms as rownames. If missing, this will be the right singular vectors
of a singular value decomposition of |
map.space |
Logical: if |
fill.missing |
Logical: if |
term.map |
A matrix with |
dim.cutoff |
If a |
keep.dim |
Logical: if |
use.scan |
Logical: if |
dir |
Path to a folder containing spaces. |
A matrix or sparse matrix with either (a) a row per term and column per latent dimension (a latent
space, either calculated from the input, or retrieved when map.space = FALSE), (b) a row per document
and column per latent dimension (when a dtm is mapped to a space), or (c) a row per document and
column per term (when a space is calculated and keep.dim = TRUE).
A traditional latent semantic space is a selection of right singular vectors from the singular
value decomposition of a dtm (svd(dtm)$v[, 1:k], where k is the selected number of
dimensions, decided here by dim.cutoff).
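The decomposition described above can be sketched in base R. This is only an illustration, not the function's implementation: the toy counts are made up, and k is fixed by hand rather than derived from dim.cutoff, whose selection rule is not reproduced here.

```r
# Toy document-term matrix: 4 documents (rows) by 5 terms (columns).
# The counts are invented for illustration.
dtm <- matrix(
  c(
    2, 1, 0, 0, 1,
    1, 2, 0, 1, 0,
    0, 0, 3, 1, 1,
    0, 1, 2, 2, 0
  ),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("doc", 1:4), c("cat", "pet", "car", "wheel", "best"))
)

# Singular value decomposition; s$v holds the right singular vectors,
# one row per term.
s <- svd(dtm)

# Assumption: keep k = 2 dimensions, chosen by hand for this sketch.
k <- 2
space <- s$v[, seq_len(k), drop = FALSE]
rownames(space) <- colnames(dtm)

# One row per term, one column per retained dimension.
dim(space)
```

The retained columns are orthonormal, so crossprod(space) is (numerically) a k by k identity matrix.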
Mapping a new dtm into a latent semantic space consists of multiplying common terms:
dtm[, ct] %*% space[ct, ], where
ct = colnames(dtm)[colnames(dtm) %in% rownames(space)]
(the terms common between the dtm and the space). This
results in a matrix with documents as rows, and dimensions as columns, replacing terms.
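As a minimal base R sketch of this mapping, consider a dtm and a space that share only some terms. All values and term names below are made up for illustration; the options that lma_lspace applies around this step (such as fill.missing or term.map) are not modeled.

```r
# Hypothetical dtm: 2 documents, 3 terms ("zebra" is not in the space).
dtm <- matrix(
  c(
    2, 0, 1,
    1, 3, 0
  ),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("doc1", "doc2"), c("cat", "car", "zebra"))
)

# Hypothetical space: 3 terms, 2 dimensions ("dog" is not in the dtm).
space <- matrix(
  c(
    0.5, 0.1,
    0.2, 0.9,
    0.3, 0.3
  ),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("cat", "car", "dog"), c("dim1", "dim2"))
)

# Terms common to the dtm and the space, in dtm column order.
ct <- colnames(dtm)[colnames(dtm) %in% rownames(space)]

# drop = FALSE keeps matrix shape even if only one term is shared.
mapped <- dtm[, ct, drop = FALSE] %*% space[ct, , drop = FALSE]

# Documents as rows, latent dimensions as columns.
mapped
```

Terms that appear on only one side ("zebra" and "dog" here) contribute nothing to the result, since only the shared terms enter the product.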
Other Latent Semantic Space functions: download.lspace(), select.lspace(), standardize.lspace()
text <- c(
paste(
"Hey, I like kittens. I think all kinds of cats really are just the",
"best pet ever."
),
paste(
"Oh yeah? Well I really like cars. All the wheels and the turbos...",
"I think that's the best ever."
),
paste(
"You know what? Poo on you. Cats, dogs, rabbits -- you know, living",
"creatures... to think you'd care about anything else!"
),
paste(
"You can stick to your opinion. You can be wrong if you want. You know",
"what life's about? Supercharging, diesel guzzling, exhaust spewing,",
"piston moving ignitions."
)
)
dtm <- lma_dtm(text)
# calculate a latent semantic space from the example text
lss <- lma_lspace(dtm)
# show that document similarities between the truncated and full space are the same
spaces <- list(
full = lma_lspace(dtm, keep.dim = TRUE),
truncated = lma_lspace(dtm, lss)
)
sapply(spaces, lma_simets, metric = "cosine")
## Not run:
# specify a directory containing spaces,
# or where you would like to download spaces
space_dir <- "~/Latent Semantic Spaces"
# map to a pretrained space
ddm <- lma_lspace(dtm, "100k", dir = space_dir)
# load the matching subset of the space
# without mapping
lss_100k_part <- lma_lspace(colnames(dtm), "100k", dir = space_dir)
## or
lss_100k_part <- lma_lspace(dtm, "100k", map.space = FALSE, dir = space_dir)
# load the full space
lss_100k <- lma_lspace("100k", dir = space_dir)
## or
lss_100k <- lma_lspace(space = "100k", dir = space_dir)
## End(Not run)