corpus2dtm
Transforms a corpus of decisions from the Italian Supreme Court into a document-term matrix.
corpus2dtm(corpus, stopwords)
corpus | a corpus of decisions from the Italian Supreme Court.
stopwords | a character vector of stopwords.
dtm
a base document-term matrix with a minimum term length of 3, keeping only terms that appear in at least 5 documents.
Basic text-cleansing steps build a base document-term matrix by selecting only the terms (columns) that correspond to a suitable vocabulary. Typically, this involves converting tokens to lower case, removing punctuation characters, removing numbers, stemming, removing stop words, and keeping only terms that have at least a minimum length and occur in at least a minimum number of documents.
Package tm version >= 0.6 required.
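The cleansing steps described in this note can be sketched with the control options of `DocumentTermMatrix()` from tm. This is a minimal illustration, not the package's actual source: the helper `sketch_dtm`, its arguments `min_len` and `min_docs`, and the toy texts and stop-word list are all hypothetical. Stemming is switched off here because it requires the additional SnowballC package.

```r
library(tm)

# Toy stand-in for the package's corpus (hypothetical texts, not real decisions).
texts <- c(
  "La Corte rigetta il ricorso n. 123 del 2010.",
  "La Corte accoglie il ricorso e cassa la sentenza.",
  "Il ricorso risulta inammissibile; la Corte condanna il ricorrente."
)
toy_corpus <- VCorpus(VectorSource(texts))
toy_stopwords <- c("la", "il", "e", "del")  # hypothetical stop-word list

# Hypothetical helper mirroring the steps in the note above.
sketch_dtm <- function(corpus, stopwords, min_len = 3, min_docs = 5) {
  DocumentTermMatrix(corpus, control = list(
    tolower = TRUE,                           # lower-case all tokens
    removePunctuation = TRUE,                 # strip punctuation characters
    removeNumbers = TRUE,                     # strip digits
    stopwords = stopwords,                    # drop the supplied stop words
    stemming = FALSE,                         # set TRUE to stem (needs SnowballC)
    wordLengths = c(min_len, Inf),            # minimum term length
    bounds = list(global = c(min_docs, Inf))  # minimum document frequency
  ))
}

# With only three toy documents, lower the document-frequency threshold to 2.
toy_dtm <- sketch_dtm(toy_corpus, toy_stopwords, min_docs = 2)
Terms(toy_dtm)
```

With the defaults `min_len = 3` and `min_docs = 5`, the helper applies the same thresholds that the Value section states for `corpus2dtm`. In the toy call, only terms found in at least two of the three documents are kept.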
## Not run:
library(Supreme)
data("corpus")
data("italianStopWords") # for removing Italian stop words
dtm <- corpus2dtm(corpus, italianStopWords)
## End(Not run)