sample.textmatrix    R Documentation
Creates a random subset of the documents of a corpus, to help reduce the corpus in size.
sample.textmatrix(textmatrix, samplesize, index.return=FALSE)
textmatrix      A document-term matrix.
samplesize      Desired number of files.
index.return    If set to TRUE, the positions of the subset in the original column vectors will be returned as well.
Often a corpus is so big that it cannot be processed in memory. One technique to reduce its size is to select a subset of the documents at random, on the assumption that random selection leaves the nature of the term sets and their distributions essentially unchanged.
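The idea can be sketched in base R without the package. This is an illustration under stated assumptions, not the package's implementation: it assumes documents are stored as columns of a plain numeric matrix, the counts are made up, and the names `dtm`, `keep` and `subset_dtm` exist only for this example.

```r
# Toy term-by-document matrix: rows are terms, columns are documents.
# (Assumption: documents are columns; the counts are randomly generated.)
set.seed(42)
dtm <- matrix(rpois(5 * 20, lambda = 2), nrow = 5,
              dimnames = list(paste0("term", 1:5), paste0("D", 1:20)))

# Draw 8 distinct column positions at random, without replacement.
keep <- sort(sample(ncol(dtm), 8))
subset_dtm <- dtm[, keep, drop = FALSE]

# Compare relative term frequencies in the full and the sampled matrix.
# Similar shares suggest the sample kept the term distribution roughly intact.
full_share   <- rowSums(dtm) / sum(dtm)
sample_share <- rowSums(subset_dtm) / sum(subset_dtm)
round(cbind(full = full_share, sample = sample_share), 3)
```

With a sample this small the shares can differ noticeably; the assumption in the paragraph above becomes more plausible as the sample grows.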
filelist    A list of the filenames of the documents in the corpus.
ix          If index.return is set to TRUE, a list is returned that also contains the positions of the sampled documents in the original column vectors.
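To make the role of the returned positions concrete, here is a minimal sketch using a hypothetical helper, `sample_docs`. It is not the package function, and the list element name `x` is an assumption made for this example; only `ix` is named in the documentation above.

```r
# Hypothetical helper, NOT the package function: samples document names and
# optionally reports where each sampled document sat in the original vector.
sample_docs <- function(docs, samplesize, index.return = FALSE) {
  if (samplesize > length(docs)) {
    stop("samplesize is larger than the number of documents")
  }
  pos <- sample(seq_along(docs), samplesize)
  if (index.return) {
    list(x = docs[pos], ix = pos)  # element name 'x' is an assumption
  } else {
    docs[pos]
  }
}

docs <- c("D1", "D2", "D3", "D4")
res <- sample_docs(docs, 3, index.return = TRUE)

# The positions map the sample back onto the original vector.
identical(docs[res$ix], res$x)
```

Keeping the positions is useful when other per-document data, such as labels or metadata, has to be subset in the same way as the sample.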
Fridolin Wild f.wild@open.ac.uk
textmatrix
# textmatrix() is assumed to come from the lsa package
library(lsa)

# create some files
td = tempfile()
dir.create(td)
write( c("dog", "cat", "mouse"), file=paste(td, "D1", sep="/"))
write( c("hamster", "mouse", "sushi"), file=paste(td, "D2", sep="/"))
write( c("dog", "monster", "monster"), file=paste(td, "D3", sep="/"))
write( c("dog", "mouse", "dog"), file=paste(td, "D4", sep="/"))

# create matrices
myMatrix = textmatrix(td, minWordLength=1)

# call the function by its full name: base sample() is not an S3 generic
# and would not dispatch to sample.textmatrix
sample.textmatrix(myMatrix, 3)

# clean up
unlink(td, recursive=TRUE)