Create DSM Object From tm Package (wordspace)


Description

Convert a tm term-document or document-term matrix into a wordspace DSM object.
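A tm term-document or document-term matrix is stored in slam's simple triplet format: row indices i, column indices j, non-zero values v, plus nrow, ncol and dimnames. The sketch below shows roughly what turning such an object into a sparse matrix involves. It uses only the Matrix package and does not call tm or wordspace, so it is an illustration and not the wordspace implementation. The object toy_dtm and the helper triplets_to_sparse() are hypothetical names invented for this sketch.

```r
library(Matrix)  # recommended package shipped with R; provides sparseMatrix()

## Hypothetical stand-in for a tm DocumentTermMatrix in simple triplet form
## (row indices i, column indices j, non-zero values v).
toy_dtm <- list(
  i = c(1L, 1L, 2L, 3L),
  j = c(1L, 3L, 2L, 3L),
  v = c(2, 1, 4, 1),
  nrow = 3L, ncol = 3L,
  dimnames = list(Docs = c("d1", "d2", "d3"),
                  Terms = c("oil", "price", "opec"))
)

## Hypothetical helper (not part of wordspace or tm): expand the triplets
## into a sparse matrix with documents as rows and terms as columns.
triplets_to_sparse <- function(x) {
  sparseMatrix(i = x$i, j = x$j, x = x$v,
               dims = c(x$nrow, x$ncol), dimnames = x$dimnames)
}

M <- triplets_to_sparse(toy_dtm)
print(M)     # rows = documents, columns = terms

## Transposing gives the term-document orientation (terms as rows),
## analogous to the t(wordspace_dtm) step in the Examples section.
print(t(M))
```

Note that the orientation of the input is kept: a DocumentTermMatrix has documents as rows, so a transpose is needed if terms should be the rows.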

Usage

## S3 method for class 'TermDocumentMatrix'
as.dsm(obj, ..., verbose=FALSE)
## S3 method for class 'DocumentTermMatrix'
as.dsm(obj, ..., verbose=FALSE)

Arguments

obj

a term-document or document-term matrix from the tm package, i.e. an object of class TermDocumentMatrix or DocumentTermMatrix.

...

additional arguments are ignored

verbose

if TRUE, a few progress and information messages are shown

Value

An object of class dsm.

Author(s)

Stefan Evert (http://purl.org/stefan.evert)

See Also

as.dsm and the documentation of the tm package

Examples

## Not run: 
library(wordspace)
library(tm) # tm package needs to be installed
data(crude) # news messages on crude oil from Reuters corpus

cat(as.character(crude[[1]]), "\n") # example text

corpus <- tm_map(crude, stripWhitespace) # some pre-processing
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))

cat(as.character(corpus[[1]]), "\n") # pre-processed text

dtm <- DocumentTermMatrix(corpus) # document-term matrix
inspect(dtm[1:5, 90:99])   # rows = documents

wordspace_dtm <- as.dsm(dtm, verbose=TRUE) # convert to DSM
print(wordspace_dtm$S[1:5, 90:99]) # same part of dtm as above

wordspace_tdm <- t(wordspace_dtm) # convert to term-document matrix
print(wordspace_tdm)

## End(Not run)
