pool_tweets | R Documentation |
This function pools a data frame of parsed tweets into document pools.
pool_tweets(
  data,
  remove_numbers = TRUE,
  remove_punct = TRUE,
  remove_symbols = TRUE,
  remove_url = TRUE,
  remove_emojis = TRUE,
  remove_users = TRUE,
  remove_hashtags = TRUE,
  cosine_threshold = 0.9,
  stopwords = "en",
  n_grams = 1L
)
data |
Data frame containing tweets and hashtags. Works with any data frame that has
a "text" column of type character and a "hashtags" column with comma-separated character vectors.
Can be obtained either by using |
remove_numbers |
Logical. If |
remove_punct |
Logical. If |
remove_symbols |
Logical. If |
remove_url |
Logical. If |
remove_emojis |
Logical. If |
remove_users |
Logical. If |
remove_hashtags |
Logical. If |
cosine_threshold |
Double. Value between 0 and 1 specifying the cosine similarity threshold used for document pooling. Tweets without a hashtag are assigned to document (hashtag) pools based on this metric. Low thresholds reduce topic coherence by admitting a large number of tweets without a hashtag into the document pools. Higher thresholds yield more coherent topics but smaller documents. |
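The effect of the threshold can be illustrated with a minimal base R sketch. This is not Twitmo's internal code: the term counts, the pool names, and the helper functions `cosine_similarity()` and `assign_to_pool()` are hypothetical, and the way the package actually builds its document vectors is not specified here.

```r
# Cosine similarity between two numeric term-frequency vectors.
cosine_similarity <- function(a, b) {
  denom <- sqrt(sum(a^2)) * sqrt(sum(b^2))
  if (denom == 0) return(0)
  sum(a * b) / denom
}

# Hypothetical term counts over a shared vocabulary (invented data).
vocab <- c("climate", "march", "phone", "camera")
pools <- rbind(
  climate = c(3, 2, 0, 0),
  tech    = c(0, 0, 4, 1)
)
colnames(pools) <- vocab

unlabeled <- rbind(
  tweet_a = c(2, 1, 0, 0),  # points in nearly the same direction as "climate"
  tweet_b = c(1, 0, 1, 0)   # only loosely similar to either pool
)
colnames(unlabeled) <- vocab

# Assign a tweet to its most similar pool, or to NA if the best
# similarity stays below the threshold (hypothetical helper).
assign_to_pool <- function(tweet, pools, threshold = 0.9) {
  sims <- apply(pools, 1, cosine_similarity, b = tweet)
  best <- which.max(sims)
  if (sims[best] >= threshold) names(best) else NA_character_
}

assignments <- apply(unlabeled, 1, assign_to_pool,
                     pools = pools, threshold = 0.9)
```

With the threshold at 0.9 only `tweet_a` joins a pool. Rerunning the last call with `threshold = 0.5` would also place `tweet_b` in the `tech` pool, which is the loss of coherence that low thresholds cause.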
stopwords |
A character vector, list of character vectors, dictionary or collocations object. See pattern for details. Defaults to stopwords("english"). |
n_grams |
Integer vector specifying the number of elements to be concatenated in each n-gram. Each element of this vector defines an n in the n-gram(s) that are produced. See tokens_ngrams. |
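To make the meaning of a vector-valued `n_grams` concrete, here is a small base R sketch. The helper `make_ngrams()` and the underscore concatenator are assumptions chosen for illustration; the package delegates this step to tokens_ngrams, whose output ordering may differ.

```r
# Build n-grams from a character vector of tokens, once for each value in `n`.
# Hypothetical helper, not part of Twitmo.
make_ngrams <- function(tokens, n = 1L, concatenator = "_") {
  out <- lapply(n, function(k) {
    # No n-gram of size k fits into fewer than k tokens.
    if (k > length(tokens)) return(character(0))
    starts <- seq_len(length(tokens) - k + 1L)
    vapply(
      starts,
      function(i) paste(tokens[i:(i + k - 1L)], collapse = concatenator),
      character(1)
    )
  })
  unlist(out)
}

tokens <- c("improving", "topic", "models")
unigrams <- make_ngrams(tokens, n = 1L)         # same as the input tokens
uni_and_bigrams <- make_ngrams(tokens, n = 1:2) # unigrams followed by bigrams
```

Passing `n = 1:2` therefore keeps the single words and adds every pair of adjacent words.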
Pools tweets by hashtag, using cosine similarity, to create longer pseudo-documents for better LDA estimation, and creates n-gram tokens. The method implements the pooling algorithm from Mehrotra et al. (2013).
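The first stage of hashtag pooling, merging tweets that share a hashtag into one pseudo-document, can be sketched in base R as follows. The tweets are invented, the "hashtags" column is assumed to hold one comma-separated string per tweet, and the sketch leaves out the cosine similarity step for tweets without hashtags; it is an illustration, not the package's implementation.

```r
# Hypothetical tweets with comma-separated hashtags (invented data).
tweets <- data.frame(
  text = c("march for the climate", "climate policy vote", "new phone camera"),
  hashtags = c("climate,protest", "climate", "tech"),
  stringsAsFactors = FALSE
)

# Expand to one row per (tweet, hashtag) pair, so a tweet with two
# hashtags contributes to two pools.
tags <- strsplit(tweets$hashtags, ",", fixed = TRUE)
pairs <- data.frame(
  hashtag = unlist(tags),
  text = rep(tweets$text, lengths(tags)),
  stringsAsFactors = FALSE
)

# Concatenate all tweets that share a hashtag into one pseudo-document.
pooled <- tapply(pairs$text, pairs$hashtag, paste, collapse = " ")
```

Here `pooled` holds three pseudo-documents named `climate`, `protest` and `tech`; the `climate` document combines the first two tweets, giving a topic model more text per document than any single tweet provides.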
List with corpus object and dfm object of pooled tweets.
Mehrotra, R., Sanner, S., Buntine, W., & Xie, L. (2013). Improving LDA Topic Models for Microblogs via Tweet Pooling and Automatic Labeling. pp. 889-892. doi:10.1145/2484028.2484166.
tokens, dfm
## Not run:
library(Twitmo)

# load tweets (included in package)
mytweets <- load_tweets(system.file("extdata", "tweets_20191027-141233.json", package = "Twitmo"))

pool <- pool_tweets(
  data = mytweets,
  remove_numbers = TRUE,
  remove_punct = TRUE,
  remove_symbols = TRUE,
  remove_url = TRUE,
  remove_users = TRUE,
  remove_hashtags = TRUE,
  remove_emojis = TRUE,
  cosine_threshold = 0.9,
  stopwords = "en",
  n_grams = 1
)
## End(Not run)