remove_stopwords - Remove stopwords and words shorter than min.char characters from a TermDocumentMatrix or DocumentTermMatrix.
prep_stopwords - Join multiple vectors of words, convert to lower case, and return sorted unique words.
remove_stopwords(x, stopwords = tm::stopwords("english"), min.char = 3,
    max.char = NULL, stem = FALSE, denumber = TRUE)
prep_stopwords(...)
x | A TermDocumentMatrix or DocumentTermMatrix.
stopwords | A vector of stopwords to remove.
min.char | The minimum number of characters a word must have to be retained.
max.char | The maximum number of characters a word may have to be retained.
stem | Logical. If
denumber | Logical. If
... | Multiple vectors of words.
Returns a TermDocumentMatrix or DocumentTermMatrix.
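The stopwords, min.char, and max.char arguments amount to a filter on term names. The sketch below illustrates that idea on a plain character vector in base R. It is not the package's implementation: filter_terms is a hypothetical helper, exact (case-sensitive) matching against the stopword vector is an assumption, and the stem and denumber arguments are left out.

```r
# Hypothetical helper, not part of the package: filter a character vector
# of terms in the way the stopwords, min.char, and max.char arguments suggest.
filter_terms <- function(terms, stopwords = character(0),
                         min.char = 3, max.char = NULL) {
  len <- nchar(terms)
  keep <- !(terms %in% stopwords)                          # drop stopwords (exact match, assumed)
  if (!is.null(min.char)) keep <- keep & len >= min.char   # drop terms that are too short
  if (!is.null(max.char)) keep <- keep & len <= max.char   # drop terms that are too long
  terms[keep]
}

# "the" and "is" are stopwords here; "up" is shorter than min.char = 3
filter_terms(c("the", "economy", "is", "up", "jobs"),
             stopwords = c("the", "is"))
```

With a matrix rather than a vector, the same logical index would presumably be applied to the term dimension (columns of a DocumentTermMatrix, rows of a TermDocumentMatrix).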
# Build a DocumentTermMatrix and remove stopwords from it
(x <- with(presidential_debates_2012, q_dtm(dialogue, paste(time, tot, sep = "_"))))
remove_stopwords(x)
# The same with a TermDocumentMatrix
(y <- with(presidential_debates_2012, q_tdm(dialogue, paste(time, tot, sep = "_"))))
remove_stopwords(y)
# Combine custom words with the English stopword list
prep_stopwords("the", "ChIcken", "Hello", tm::stopwords("english"), c("John", "Josh"))
<<DocumentTermMatrix (documents: 2912, terms: 3377)>>
Non-/sparse entries: 42058/9791766
Sparsity : 100%
Maximal term length: 16
Weighting : term frequency (tf)
Warning message:
removeNumbers is deprecated; use remove_numbers instead
<<DocumentTermMatrix (documents: 2912, terms: 3180)>>
Non-/sparse entries: 19014/9241146
Sparsity : 100%
Maximal term length: 16
Weighting : term frequency (tf)
<<TermDocumentMatrix (terms: 3377, documents: 2912)>>
Non-/sparse entries: 42058/9791766
Sparsity : 100%
Maximal term length: 16
Weighting : term frequency (tf)
Warning message:
removeNumbers is deprecated; use remove_numbers instead
<<TermDocumentMatrix (terms: 3180, documents: 2912)>>
Non-/sparse entries: 19014/9241146
Sparsity : 100%
Maximal term length: 16
Weighting : term frequency (tf)
[1] "a" "about" "above" "after" "again"
[6] "against" "all" "am" "an" "and"
[11] "any" "are" "aren't" "as" "at"
[16] "be" "because" "been" "before" "being"
[21] "below" "between" "both" "but" "by"
[26] "can't" "cannot" "chicken" "could" "couldn't"
[31] "did" "didn't" "do" "does" "doesn't"
[36] "doing" "don't" "down" "during" "each"
[41] "few" "for" "from" "further" "had"
[46] "hadn't" "has" "hasn't" "have" "haven't"
[51] "having" "he" "he'd" "he'll" "he's"
[56] "hello" "her" "here" "here's" "hers"
[61] "herself" "him" "himself" "his" "how"
[66] "how's" "i" "i'd" "i'll" "i'm"
[71] "i've" "if" "in" "into" "is"
[76] "isn't" "it" "it's" "its" "itself"
[81] "john" "josh" "let's" "me" "more"
[86] "most" "mustn't" "my" "myself" "no"
[91] "nor" "not" "of" "off" "on"
[96] "once" "only" "or" "other" "ought"
[101] "our" "ours" "ourselves" "out" "over"
[106] "own" "same" "shan't" "she" "she'd"
[111] "she'll" "she's" "should" "shouldn't" "so"
[116] "some" "such" "than" "that" "that's"
[121] "the" "their" "theirs" "them" "themselves"
[126] "then" "there" "there's" "these" "they"
[131] "they'd" "they'll" "they're" "they've" "this"
[136] "those" "through" "to" "too" "under"
[141] "until" "up" "very" "was" "wasn't"
[146] "we" "we'd" "we'll" "we're" "we've"
[151] "were" "weren't" "what" "what's" "when"
[156] "when's" "where" "where's" "which" "while"
[161] "who" "who's" "whom" "why" "why's"
[166] "with" "won't" "would" "wouldn't" "you"
[171] "you'd" "you'll" "you're" "you've" "your"
[176] "yours" "yourself" "yourselves"
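The behaviour described for prep_stopwords (join the vectors, lower-case, deduplicate, sort) can be approximated in a few lines of base R. This is a sketch under that reading of the description, not the package source; prep_stopwords_sketch is a hypothetical name.

```r
# Hypothetical re-creation of the described behaviour, not the package source.
prep_stopwords_sketch <- function(...) {
  words <- unlist(list(...), use.names = FALSE)   # join all supplied vectors
  sort(unique(tolower(as.character(words))))      # lower-case, deduplicate, sort
}

prep_stopwords_sketch("the", "ChIcken", "Hello", c("John", "Josh"), "THE")
# [1] "chicken" "hello"   "john"    "josh"    "the"
```

Lower-casing before unique() is what makes "the" and "THE" collapse into a single entry.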