as.textmatrix    R Documentation

Display a latent semantic space generated by Latent Semantic Analysis (LSA)

Description

Returns a latent semantic space (created by lsa(), which this page also refers to as createLSAspace) in textmatrix format: rows are terms, columns are documents.

Usage

   as.textmatrix( LSAspace )

Arguments

LSAspace

a latent semantic space generated by lsa() (referred to on this page as createLSAspace).

Details

To compare terms and documents, the internal representation of the latent semantic space must be converted back to a classical document-term matrix of class ‘textmatrix’, like the ones generated by textmatrix().

Remark: Documents and terms can also be compared directly, using the partial matrices of an LSA space. See Berry et al. (1995) for more information.
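
The conversion described above can be sketched in base R. This is a minimal illustration, not the package's implementation: it assumes the space holds the factors of a truncated singular value decomposition, here given the hypothetical local names tk (term vectors), sk (singular values) and dk (document vectors), and that the textmatrix is their product. The toy matrix m mirrors the four documents used in the Examples section.

```r
# Toy term-by-document counts for documents D1..D4 (rows = terms, cols = documents).
m <- matrix(
  c(1, 0, 0, 0,   # cat
    1, 0, 1, 2,   # dog
    0, 1, 0, 0,   # hamster
    0, 0, 2, 0,   # monster
    1, 1, 0, 1,   # mouse
    0, 1, 0, 0),  # sushi
  nrow = 6, byrow = TRUE,
  dimnames = list(c("cat", "dog", "hamster", "monster", "mouse", "sushi"),
                  c("D1", "D2", "D3", "D4"))
)

# Full singular value decomposition (base R).
dec <- svd(m)

# Assumption: keep k = 2 dimensions; tk, sk, dk are illustrative names.
k  <- 2
tk <- dec$u[, 1:k, drop = FALSE]   # term vectors
sk <- dec$d[1:k]                   # singular values
dk <- dec$v[, 1:k, drop = FALSE]   # document vectors

# Multiply the factors back into a term-by-document matrix.
reduced <- tk %*% diag(sk, nrow = k) %*% t(dk)
dimnames(reduced) <- dimnames(m)
round(reduced, 2)

# With all dimensions kept, the product reproduces the original matrix.
full <- dec$u %*% diag(dec$d) %*% t(dec$v)
stopifnot(isTRUE(all.equal(full, m, check.attributes = FALSE)))
```

With k smaller than the number of documents, the result is a low-rank approximation of m, so cells that were zero in the original counts can take non-zero values. Similarity measures computed on such a matrix can therefore differ from those computed on the raw counts.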

Value

textmatrix

a textmatrix representation of the latent semantic space.

Author(s)

Fridolin Wild <f.wild@open.ac.uk>

References

Berry, M., Dumais, S., and O'Brien, G. (1995) Using Linear Algebra for Intelligent Information Retrieval. In: SIAM Review, Vol. 37(4), pp. 573–595.

See Also

textmatrix, lsa, fold_in

Examples


# create some files
td = tempfile()
dir.create(td)
write( c("dog", "cat", "mouse"), file=paste(td, "D1", sep="/"))
write( c("hamster", "mouse", "sushi"), file=paste(td, "D2", sep="/"))
write( c("dog", "monster", "monster"), file=paste(td, "D3", sep="/"))
write( c("dog", "mouse", "dog"), file=paste(td, "D4", sep="/"))

# read files into a document-term matrix
myMatrix = textmatrix(td, minWordLength=1)

# create the latent semantic space, keeping all dimensions
myLSAspace = lsa(myMatrix, dims=dimcalc_raw())

# display it as a textmatrix again
round(as.textmatrix(myLSAspace),2) # should give the original

# create the latent semantic space again, this time with a reduced number of dimensions
myLSAspace = lsa(myMatrix, dims=dimcalc_share())

# display it as a textmatrix again
myNewMatrix = as.textmatrix(myLSAspace)
myNewMatrix # should look different!

# compare two terms with the cosine measure
cosine(myNewMatrix["dog",], myNewMatrix["cat",])

# compare two documents with pearson
cor(myNewMatrix[,1], myNewMatrix[,2], method="pearson")

# clean up
unlink(td, recursive=TRUE)


lsa documentation built on May 9, 2022, 9:10 a.m.