In abuchmueller/Twitmo: Twitter Topic Modeling and Visualization for R

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-"
)
options(width = 100)
library(Twitmo)
library(magrittr)

Twitmo

The goal of Twitmo is to facilitate topic modeling in R with Twitter data. Twitmo provides a broad range of methods to sample, pre-process and visualize contents of geo-tagged tweets to make modeling the public discourse easy and accessible.

Common questions

Can I use `Twitmo` for pseudo-document pooling if I already sampled data earlier from Twitter without `Twitmo`?

Yes, this is possible in the GitHub version of Twitmo. You can use pool_tweets() on any data frame that has columns named 'text' and 'hashtags'. Any additional columns can be used as document metadata in an STM (see below).
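Before pooling data collected outside of Twitmo, it can help to confirm that the required columns are present. The following is a minimal base R sketch: `has_pool_columns()` and the toy data frame are hypothetical and not part of Twitmo, and storing `hashtags` as a list column is an assumption modeled on how rtweet typically returns hashtags.

```r
# Hypothetical helper (not part of Twitmo): check that a data frame
# carries the two columns pool_tweets() is described as needing.
has_pool_columns <- function(df, required = c("text", "hashtags")) {
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    message("Missing column(s): ", paste(missing_cols, collapse = ", "))
    return(FALSE)
  }
  TRUE
}

# Toy data; the list-column layout for hashtags is an assumption
own_tweets <- data.frame(
  text = c("Reduce #foodwaste now", "Match day!"),
  stringsAsFactors = FALSE
)
own_tweets$hashtags <- list("foodwaste", NA_character_)

has_pool_columns(own_tweets)          # both columns present
has_pool_columns(own_tweets["text"])  # hashtags column missing
```

If the check passes, the data frame could then be handed to pool_tweets() in the same way as tweets loaded with Twitmo.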

Can I use `Twitmo` to model topical prevalence over time?

Twitmo has no built-in methods for this purpose. However, you can slice your data by time, fit a separate LDA model to each slice, and then compare topical prevalence across slices using Twitmo in conjunction with ggplot2.
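The slicing step might look like the sketch below, which uses base R only. The toy data and the column name `created_at` are assumptions for illustration; the commented-out line at the end indicates where Twitmo's pooling and fitting functions would come in.

```r
# Toy data standing in for parsed tweets; the column name created_at
# is an assumption about how the timestamp is stored.
tweets <- data.frame(
  text = c("a", "b", "c", "d"),
  created_at = as.POSIXct(
    c("2019-10-27 14:12:33", "2019-10-30 09:00:00",
      "2019-11-02 18:45:10", "2019-11-15 07:30:00"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# Label each tweet with its year-month, then split into one
# data frame per time slice
tweets$slice <- format(tweets$created_at, "%Y-%m")
slices <- split(tweets, tweets$slice)

names(slices)                     # one entry per month
vapply(slices, nrow, integer(1))  # number of tweets in each slice

# Each slice could then be pooled and modeled separately (not run):
# models <- lapply(slices, function(s) {
#   fit_lda(pool_tweets(s)$document_term_matrix, n_topics = 7)
# })
```

Monthly slices are only one choice; any other `format()` pattern, such as "%Y-%U" for weeks, would produce a different granularity.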

Installation

Important Note for NEW users

If you are using Twitmo for the first time, you might not already have rtweet installed. If you have rtweet version >= 1.0.0 installed, you will not be able to use certain parts of Twitmo, such as parsing or loading tweets, because of breaking changes in rtweet. By default, CRAN only distributes the latest version of a package, and R does not respect upper bounds on dependency versions; I am currently working on a solution. You can make sure you have the correct version of rtweet installed by running

## install devtools package if it's not already
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}

devtools::install_version("rtweet", version = "0.7.0", repos = "http://cran.us.r-project.org")
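To see whether an rtweet version falls below the 1.0.0 threshold mentioned above, a small comparison like the following can be used. This is a sketch: `rtweet_is_compatible()` is a hypothetical helper, not a Twitmo function.

```r
# Hypothetical helper: TRUE if the given rtweet version predates 1.0.0
rtweet_is_compatible <- function(version) {
  package_version(version) < package_version("1.0.0")
}

rtweet_is_compatible("0.7.0")  # the version installed above
rtweet_is_compatible("1.0.2")  # a version with the breaking changes

# Check the installed copy only if rtweet is actually present
if (requireNamespace("rtweet", quietly = TRUE)) {
  rtweet_is_compatible(as.character(utils::packageVersion("rtweet")))
}
```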

You can install Twitmo from CRAN with:

install.packages("Twitmo")

or install from GitHub, where the correct version of rtweet will automatically be installed.

You can install Twitmo from GitHub with:

## install remotes package if it's not already
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}

## install dev version of Twitmo from github
remotes::install_github("abuchmueller/Twitmo")

Note: Installing from GitHub may require you to have Rtools on your system.

Collecting geo-tagged tweets

Make sure you have a regular Twitter account before you start sampling tweets.

# Live stream tweets from the UK for 30 seconds and save to "uk_tweets.json" in current working directory
get_tweets(method = 'stream', 
           location = "GBR", 
           timeout = 30, 
           file_name = "uk_tweets.json")

# Use your own bounding box to stream US mainland tweets
get_tweets(method = 'stream', 
           location = c(-125, 26, -65, 49), 
           timeout = 30,
           file_name = "tweets_from_us_mainland.json")
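The bounding box in the second call reads as (west longitude, south latitude, east longitude, north latitude). A quick way to sanity-check a custom box before streaming is to test a few known coordinates against it. The sketch below is base R only; `in_bbox()` is a hypothetical helper and the city coordinates are approximate.

```r
# Hypothetical helper: is a point inside a bounding box given as
# c(west_lon, south_lat, east_lon, north_lat)?
in_bbox <- function(lon, lat, bbox) {
  stopifnot(length(bbox) == 4, bbox[1] < bbox[3], bbox[2] < bbox[4])
  lon >= bbox[1] & lon <= bbox[3] & lat >= bbox[2] & lat <= bbox[4]
}

us_mainland <- c(-125, 26, -65, 49)

in_bbox(-87.63, 41.88, us_mainland)   # Chicago (approximate)
in_bbox(-149.90, 61.22, us_mainland)  # Anchorage (approximate)
```

A box that excludes a city you expect to cover usually means the longitude and latitude values were swapped.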

Load your tweets from a json file into a data frame

A small sample with raw tweets is included in the package. Access via:

raw_path <- system.file("extdata", "tweets_20191027-141233.json", package = "Twitmo")
mytweets <- load_tweets(raw_path)

Pool tweets into long pseudo-document

pool <- pool_tweets(mytweets)
pool.corpus <- pool$corpus
pool.dfm <- pool$document_term_matrix

Find optimal number of topics

find_lda(pool.dfm)

Fitting an LDA model

model <- fit_lda(pool.dfm, n_topics = 7)

View most relevant terms for each topic

lda_terms(model)

or which hashtags are heavily associated with each topic

lda_hashtags(model)

Inspecting LDA distributions

Check the distribution of your LDA Model with

lda_distribution(model)

Filtering tweets

Sometimes you can build better topic models by blacklisting or whitelisting certain keywords from your data. You can do this with a keyword dictionary using the filter_tweets() function. In this example we exclude all tweets with "football" or "mood" in them from our data.

mytweets %>% dim()
filter_tweets(mytweets, keywords = "football,mood", include = FALSE) %>% dim()

Analogously, if you want to run your collected tweets through a whitelist, use

mytweets %>% dim()
filter_tweets(mytweets, keywords = "football,mood", include = TRUE) %>% dim()
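To make the blacklist and whitelist behavior concrete, here is a base R sketch of keyword filtering on a toy data frame. It is not Twitmo's implementation: `filter_by_keywords()` is hypothetical, and case-insensitive matching on the `text` column is an assumption.

```r
# Hypothetical re-implementation for illustration only
filter_by_keywords <- function(df, keywords, include = TRUE) {
  # Split the comma-separated dictionary and build one alternation pattern
  # (keywords are treated as regular expressions here)
  terms <- trimws(strsplit(keywords, ",", fixed = TRUE)[[1]])
  pattern <- paste(terms, collapse = "|")
  hit <- grepl(pattern, df$text, ignore.case = TRUE)
  if (include) df[hit, , drop = FALSE] else df[!hit, , drop = FALSE]
}

toy <- data.frame(
  text = c("Football tonight", "In a good mood", "Topic models in R"),
  stringsAsFactors = FALSE
)

dim(filter_by_keywords(toy, "football,mood", include = FALSE))  # blacklist
dim(filter_by_keywords(toy, "football,mood", include = TRUE))   # whitelist
```

The two calls partition the data: every tweet lands in exactly one of the two results.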

Fitting a structural topic model (STM)

Structural topic models can be fitted with additional external covariates. In this example we use metadata that comes with the tweets, such as retweet count. This works with parsed, unpooled tweets. Pre-processing and fitting are done with one function.

stm_model <- fit_stm(mytweets,
                     n_topics = 7,
                     xcov = ~ retweet_count + followers_count + reply_count + quote_count + favorite_count,
                     remove_punct = TRUE,
                     remove_url = TRUE,
                     remove_emojis = TRUE,
                     stem = TRUE,
                     stopwords = "en")

STMs can be inspected via

summary(stm_model)

Visualizing models with `LDAvis`

Make sure you have LDAvis and servr installed.

## install LDAvis package if it's not already
if (!requireNamespace("LDAvis", quietly = TRUE)) {
  install.packages("LDAvis")
}

## install servr package if it's not already
if (!requireNamespace("servr", quietly = TRUE)) {
  install.packages("servr")
}

Export fitted models into interactive LDAvis visualizations with one line of code

to_ldavis(model, pool.corpus, pool.dfm)
## for STM use (included in the stm package)
stm::toLDAvis(stm_model, stm_model$prep$documents)

Plotting geo-tagged tweets

Plot your tweets onto a static map

plot_tweets(mytweets, region = "USA(?!:Alaska|:Hawaii)", alpha=0.1)

or plot the distribution of a certain hashtag onto a static map (UK data not included)

plot_hashtag(uk_tweets, region = "UK", hashtag = "foodwaste", ignore_case=TRUE, alpha=0.2)
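The `region` value passed to plot_tweets() above, "USA(?!:Alaska|:Hawaii)", looks like a regular expression with a negative lookahead. Assuming it is interpreted as a Perl-style pattern, the sketch below shows which names it would select. The region names are illustrative examples, not an exhaustive list from any map database.

```r
# Illustrative region names (assumed for this example)
regions <- c("USA", "USA:Alaska", "USA:Hawaii", "USA:Long Island")

# "USA" not immediately followed by ":Alaska" or ":Hawaii";
# lookaheads require perl = TRUE in base R
pattern <- "USA(?!:Alaska|:Hawaii)"
keep <- grepl(pattern, regions, perl = TRUE)

regions[keep]
```

Under that assumption, the pattern keeps the contiguous United States and drops Alaska and Hawaii from the map.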

Interactive maps with `leaflet`

Use the scroll wheel to zoom in and out of the map. Click markers to see tweets. Make sure you have the leaflet package installed.

## install leaflet package if it's not already
if (!requireNamespace("leaflet", quietly = TRUE)) {
  install.packages("leaflet")
}

cluster_tweets(mytweets)

abuchmueller/Twitmo documentation built on Sept. 14, 2022, 8:06 p.m.
