Description Usage Arguments Details See Also Examples
Pre-processes a vector of tweet texts in a number of standard ways.
text_clean(docvec, rmDuplicates = FALSE, cores = 6, stems = NULL,
  partial = FALSE)
rmDuplicates	whether to remove duplicated tweets
cores	number of cores to use for parallel computing
stems	customised stems to be removed
partial	partial cleaning: perform only steps 1 to 11
tweets	tweets retrieved from
1. Convert to basic ASCII text to avoid unusual characters
2. Make everything consistently lower case
3. Remove the "RT" (retweet) marker so retweets become duplicates of the originals
4. Remove links
5. Remove punctuation
6. Remove tabs
7. "&amp;" is "&" in HTML, so strip the leftover "amp" after punctuation is removed
8. Remove leading blanks
9. Remove trailing blanks
10. Collapse general whitespace
11. Remove duplicates
12. Convert to a tm corpus
13. Remove English stop words
14. Remove numbers
15. Stem the words
16. Remove the customised stems
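Steps 1 to 11 (the part run when partial cleaning is requested) can be sketched in base R as below. This is an illustrative sketch following the order of the list above, not the package's actual implementation; the helper name clean_partial is hypothetical, and steps 12 to 16 would additionally require the tm package (corpus conversion, stop word removal, and stemming).

```r
# A minimal sketch of cleaning steps 1-11, assuming plain character input.
clean_partial <- function(docvec, rmDuplicates = TRUE) {
  x <- iconv(docvec, to = "ASCII", sub = "")   # 1. basic ASCII only
  x <- tolower(x)                              # 2. consistent lower case
  x <- gsub("\\brt\\b", "", x)                 # 3. drop the "rt" marker
  x <- gsub("http\\S+", "", x)                 # 4. remove links (before punctuation)
  x <- gsub("[[:punct:]]", "", x)              # 5. remove punctuation
  x <- gsub("\t", " ", x)                      # 6. remove tabs
  x <- gsub("\\bamp\\b", "", x)                # 7. leftover "amp" from "&amp;"
  x <- gsub("^[[:space:]]+", "", x)            # 8. leading blanks
  x <- gsub("[[:space:]]+$", "", x)            # 9. trailing blanks
  x <- gsub("[[:space:]]+", " ", x)            # 10. collapse whitespace
  if (rmDuplicates) x <- unique(x)             # 11. drop duplicates
  x
}

clean_partial("RT @user: Check http://t.co/abc &amp; more!!")
```

Note that removing links (step 4) before punctuation (step 5) matters: stripping punctuation first would break the URLs so they could no longer be matched.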
tweet_corpus
setupTwitterConn()
tweets <- tweet_corpus(search = "audusd", n = 100, since = as.character(Sys.Date() - 7), until = as.character(Sys.Date()))
tweets <- text_clean(tweets$v, rmDuplicates = FALSE, cores = 6, stems = c("audusd"))