fold_in | R Documentation |
Additional documents can be mapped into a pre-existing latent semantic space without influencing the factor distribution of the space. Use this when additional documents must not influence the previously calculated latent semantic factor structure.
fold_in( docvecs, LSAspace )
LSAspace |
a latent semantic space generated by createLSAspace. |
docvecs |
a textmatrix. |
To keep additional documents from influencing the factor distribution
calculated previously from a particular text basis, they can be folded in
after the singular value decomposition performed in lsa().
Background Information:
For folding-in, a pseudo document vector mi of the new documents
is calculated as shown in equations (1) and (2) (cf. Berry et al., 1995):
(1) di = t(v) Tk Sk^(-1)
(2) mi = Tk Sk t(di)
The document vector t(v) in equation (1) is identical to an additional
column of an input textmatrix M with the term frequencies of the
essay to be folded in. Tk and Sk are the truncated matrices
from the SVD applied through lsa() on a given text
collection to construct the latent semantic space. The resulting vector
mi from equation (2) is identical to an additional column in the
textmatrix representation of the latent semantic space (as produced by
as.textmatrix()). Be careful when using weighting schemes: you
may want to use the global weights of the training textmatrix also for
the new data that you fold in!
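Equations (1) and (2) can be traced by hand with base R's svd(). The following is a minimal sketch on made-up toy data, not the package's own implementation: the matrix M, the document v, and the choice k = 2 are assumptions for illustration only.

```r
# Toy training matrix M (rows = terms, columns = documents); values are made up.
M <- matrix(c(1, 1, 1, 0, 0,
              0, 0, 1, 1, 1,
              1, 0, 0, 0, 2),
            nrow = 5,
            dimnames = list(c("dog", "cat", "mouse", "hamster", "monster"),
                            c("D1", "D2", "D3")))

# Truncated SVD with k = 2 dimensions (k chosen arbitrarily for this sketch).
k   <- 2
dec <- svd(M)
Tk  <- dec$u[, 1:k, drop = FALSE]   # truncated term matrix
Sk  <- diag(dec$d[1:k], nrow = k)   # truncated singular values
Dk  <- dec$v[, 1:k, drop = FALSE]   # truncated document matrix

# Rank-k reconstruction of the training documents, for comparison.
Mk <- Tk %*% Sk %*% t(Dk)

# New document v: term frequencies over the SAME vocabulary as M.
v <- matrix(c(0, 1, 2, 0, 0), ncol = 1,
            dimnames = list(rownames(M), "A1"))

# (1) di = t(v) Tk Sk^(-1)
di <- t(v) %*% Tk %*% solve(Sk)

# (2) mi = Tk Sk t(di)
mi <- Tk %*% Sk %*% t(di)
dimnames(mi) <- dimnames(v)
mi
```

Substituting (1) into (2) gives mi = Tk t(Tk) v, so the new document is projected onto the space spanned by the k retained term vectors; Tk and Sk themselves are left unchanged, which is why the existing factor structure is not affected.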
textmatrix |
a textmatrix representation of the additional documents in the latent semantic space. |
Fridolin Wild f.wild@open.ac.uk
textmatrix, lsa, as.textmatrix
# create a first textmatrix with some files
td = tempfile()
dir.create(td)
write( c("dog", "cat", "mouse"), file=paste(td, "D1", sep="/") )
write( c("hamster", "mouse", "sushi"), file=paste(td, "D2", sep="/") )
write( c("dog", "monster", "monster"), file=paste(td, "D3", sep="/") )
matrix1 = textmatrix(td, minWordLength=1)
unlink(td, recursive=TRUE)

# create a second textmatrix with some more files
td = tempfile()
dir.create(td)
write( c("cat", "mouse", "mouse"), file=paste(td, "A1", sep="/") )
write( c("nothing", "mouse", "monster"), file=paste(td, "A2", sep="/") )
write( c("cat", "monster", "monster"), file=paste(td, "A3", sep="/") )
matrix2 = textmatrix(td, vocabulary=rownames(matrix1), minWordLength=1)
unlink(td, recursive=TRUE)

# create an LSA space from matrix1
space1 = lsa(matrix1, dims=dimcalc_share())
as.textmatrix(space1)

# fold matrix2 into the space generated by matrix1
fold_in( matrix2, space1)
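The caution about weighting schemes in the background notes can be illustrated without the package. The sketch below is an assumption-laden toy: idf() is a hypothetical helper written here for illustration (not a function of this package), and both matrices contain made-up counts. It shows why global weights should be taken from the training data rather than recomputed on the documents to be folded in.

```r
# Raw term frequencies: training documents and new documents, same vocabulary.
terms <- c("dog", "cat", "mouse", "hamster", "monster")
train <- matrix(c(1, 1, 1, 0, 0,
                  0, 0, 1, 1, 1,
                  1, 0, 0, 0, 2),
                nrow = 5, dimnames = list(terms, c("D1", "D2", "D3")))
new   <- matrix(c(0, 1, 2, 0, 0,
                  0, 0, 1, 0, 1),
                nrow = 5, dimnames = list(terms, c("A1", "A2")))

# Hypothetical helper: one inverse-document-frequency weight per term (row).
idf <- function(m) log2(ncol(m) / rowSums(m > 0)) + 1

# Global weights estimated once, on the training collection only.
gw_train <- idf(train)

# Apply the SAME weights to both matrices; a length-nrow vector is recycled
# down the columns, so row i is scaled by gw_train[i].
train_w <- train * gw_train
new_w   <- new * gw_train

# Recomputing the weights on the new documents breaks down: terms that do not
# occur in `new` have document frequency 0, which gives infinite weights.
gw_new <- idf(new)
gw_new
```

In this toy case gw_new is infinite for "dog" and "hamster", and even where it is finite it reflects only the handful of new documents, so vectors weighted with it would not be comparable to those used to build the space.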