remove_stopwords: Remove Stopwords from a TermDocumentMatrix/DocumentTermMatrix


Description

remove_stopwords - Remove stopwords, and words shorter than min.char characters, from a TermDocumentMatrix or DocumentTermMatrix.

prep_stopwords - Join multiple vectors of words, convert to lower case, and return sorted unique words.
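
The behavior described for prep_stopwords can be approximated in base R. This is only a sketch of the described steps (join, lower-case, de-duplicate, sort), not the package's actual implementation; the name prep_stopwords_sketch is hypothetical.

```r
# Hypothetical stand-in for prep_stopwords(), written from the description only.
prep_stopwords_sketch <- function(...) {
  words <- unlist(list(...), use.names = FALSE)  # join all vectors into one
  sort(unique(tolower(words)))                   # lower-case, de-duplicate, sort
}

sketch_words <- prep_stopwords_sketch("the", "ChIcken", c("John", "the"))
print(sketch_words)
```

Because duplicates are collapsed after lower-casing, "The" and "the" would count as the same word.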

Usage

remove_stopwords(x, stopwords = tm::stopwords("english"), min.char = 3,
  max.char = NULL, stem = FALSE, denumber = TRUE)

prep_stopwords(...)

Arguments

x

A TermDocumentMatrix or DocumentTermMatrix.

stopwords

A vector of stopwords to remove.

min.char

The minimum number of characters a word must have to be retained.

max.char

The maximum number of characters a retained word may have.

stem

Logical. If TRUE the stopwords will be stemmed.

denumber

Logical. If TRUE numbers will be excluded.

...

Vectors of words.
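
To illustrate how these arguments might interact, the sketch below applies the same kind of filters to a plain character vector of terms. It is not the gofastr implementation. The name keep_terms_sketch is hypothetical, and the sketch assumes that min.char and max.char are inclusive bounds, that max.char = NULL means no upper limit, and that "numbers" means any term containing a digit. Stemming (stem) is left out.

```r
# Hypothetical illustration of the filtering arguments on a plain term vector.
keep_terms_sketch <- function(terms, stopwords = character(0), min.char = 3,
                              max.char = NULL, denumber = TRUE) {
  keep <- !(terms %in% stopwords)                       # drop stopwords
  if (!is.null(min.char)) keep <- keep & nchar(terms) >= min.char
  if (!is.null(max.char)) keep <- keep & nchar(terms) <= max.char
  if (isTRUE(denumber)) keep <- keep & !grepl("[0-9]", terms)  # assumed rule
  terms[keep]
}

kept <- keep_terms_sketch(c("the", "economy", "to", "2012", "jobs"),
                          stopwords = c("the", "to"))
print(kept)
```

In a real TermDocumentMatrix or DocumentTermMatrix the surviving terms would select rows or columns respectively, which is consistent with the example output where the term count drops while the document count stays the same.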

Value

Returns a TermDocumentMatrix or DocumentTermMatrix.

Examples

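library(gofastr)

# Build a DocumentTermMatrix, then remove stopwords and short words
(x <- with(presidential_debates_2012, q_dtm(dialogue, paste(time, tot, sep = "_"))))
remove_stopwords(x)

# The same steps on a TermDocumentMatrix
(y <- with(presidential_debates_2012, q_tdm(dialogue, paste(time, tot, sep = "_"))))
remove_stopwords(y)

# Join several word vectors into one lower-case, sorted, unique vector
prep_stopwords("the", "ChIcken", "Hello", tm::stopwords("english"), c("John", "Josh"))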

Example output

<<DocumentTermMatrix (documents: 2912, terms: 3377)>>
Non-/sparse entries: 42058/9791766
Sparsity           : 100%
Maximal term length: 16
Weighting          : term frequency (tf)
Warning message:
removeNumbers is deprecated; use remove_numbers instead 
<<DocumentTermMatrix (documents: 2912, terms: 3180)>>
Non-/sparse entries: 19014/9241146
Sparsity           : 100%
Maximal term length: 16
Weighting          : term frequency (tf)
<<TermDocumentMatrix (terms: 3377, documents: 2912)>>
Non-/sparse entries: 42058/9791766
Sparsity           : 100%
Maximal term length: 16
Weighting          : term frequency (tf)
Warning message:
removeNumbers is deprecated; use remove_numbers instead 
<<TermDocumentMatrix (terms: 3180, documents: 2912)>>
Non-/sparse entries: 19014/9241146
Sparsity           : 100%
Maximal term length: 16
Weighting          : term frequency (tf)
  [1] "a"          "about"      "above"      "after"      "again"     
  [6] "against"    "all"        "am"         "an"         "and"       
 [11] "any"        "are"        "aren't"     "as"         "at"        
 [16] "be"         "because"    "been"       "before"     "being"     
 [21] "below"      "between"    "both"       "but"        "by"        
 [26] "can't"      "cannot"     "chicken"    "could"      "couldn't"  
 [31] "did"        "didn't"     "do"         "does"       "doesn't"   
 [36] "doing"      "don't"      "down"       "during"     "each"      
 [41] "few"        "for"        "from"       "further"    "had"       
 [46] "hadn't"     "has"        "hasn't"     "have"       "haven't"   
 [51] "having"     "he"         "he'd"       "he'll"      "he's"      
 [56] "hello"      "her"        "here"       "here's"     "hers"      
 [61] "herself"    "him"        "himself"    "his"        "how"       
 [66] "how's"      "i"          "i'd"        "i'll"       "i'm"       
 [71] "i've"       "if"         "in"         "into"       "is"        
 [76] "isn't"      "it"         "it's"       "its"        "itself"    
 [81] "john"       "josh"       "let's"      "me"         "more"      
 [86] "most"       "mustn't"    "my"         "myself"     "no"        
 [91] "nor"        "not"        "of"         "off"        "on"        
 [96] "once"       "only"       "or"         "other"      "ought"     
[101] "our"        "ours"       "ourselves"  "out"        "over"      
[106] "own"        "same"       "shan't"     "she"        "she'd"     
[111] "she'll"     "she's"      "should"     "shouldn't"  "so"        
[116] "some"       "such"       "than"       "that"       "that's"    
[121] "the"        "their"      "theirs"     "them"       "themselves"
[126] "then"       "there"      "there's"    "these"      "they"      
[131] "they'd"     "they'll"    "they're"    "they've"    "this"      
[136] "those"      "through"    "to"         "too"        "under"     
[141] "until"      "up"         "very"       "was"        "wasn't"    
[146] "we"         "we'd"       "we'll"      "we're"      "we've"     
[151] "were"       "weren't"    "what"       "what's"     "when"      
[156] "when's"     "where"      "where's"    "which"      "while"     
[161] "who"        "who's"      "whom"       "why"        "why's"     
[166] "with"       "won't"      "would"      "wouldn't"   "you"       
[171] "you'd"      "you'll"     "you're"     "you've"     "your"      
[176] "yours"      "yourself"   "yourselves"

gofastr documentation built on May 2, 2019, 5:39 a.m.