This package includes a series of functions that allow the user to extract a brand’s Twitter timeline, visualize the brand identity through the imagery posted and estimate topic models for both the timeline as well as the mentions of that brand.
Twitter extraction capabilities are courtesy of rtweet
. Image
processing and conversion into collages is thanks to the magick
package. The topic modeling process relies on the BTM
package, which
is
designed for short texts
as opposed to standard LDA, which relies on a larger corpus. The BTM
output is then processed using several functions borrowed from the
LDAvis
package.
You can install the development version of twAccount from GitHub with:
# install.packages("devtools")
devtools::install_github("taylorgrant/twAccount")
remotes::install_github("taylorgrant/twAccount")
# Full Usage
twitter_tm("BMWUSA", search_query = "bmw", topic_ct = 40, term_ct = 30)
# If you have a timeline from `rtweet` with/without mentions, and just want topic models
run_btm("BMWUSA", data_source = 'timeline', n_topics = 40, n_terms = 30)
# If you have a timeline and want images
get_pics(dir_location, timeline)
# If you want to make final collages
unique(lubridate::year(timeline$created_at)) %>%
purrr::walk(make_collage)
The package provides an overall wrapper function twitter_tm()
that
will extract (1) a brand’s Twitter timeline, (2) recent, public tweets
matching a boolean search, and (3) all images that a brand has included
in its publicly available tweets.
Once the data is extracted, a model is run to estimate topic models
based upon the Biterm Topic Models algorithm in the BTM
package.
Currently, the model is specified to run based on the number of topics
selected by the user, rather than running through a series of topic
sizes and maximizing the log-likelihood (maybe later). The topic
modeling process yields estimates of where the topics lie in 2d space,
as well as the top N most salient terms for each topic (N terms
specified by the user).
Estimation of the 2d space relies upon the
Jensen-Shannon Divergence,
the estimates of which are then scaled via PCA. This allows for easy
visualization of the topics on an X,Y axis. For example, if we took the
last 3,200 tweets from the
BMWUSA
account, a version of the 2d topic model representation would look like
the below. This can be presented in a static graph, or in an html
rmarkdown file using the crosstalk
package, or you can simply use the
d3 visualization from LDAvis
.
Ideally, the user would create a new project within RStudio, but this package will build out a new directory in the current working directory of the user.
After running the twitter_tm()
function, the new directory will follow
this structure:
├── [HANDLE]
│ └── [RUN DATE]
│ ├── [FIRST YEAR OF TWEETS]
│ ├── [NEXT YEAR OF TWEETS]
│ ├── [ANOTHER YEAR OF TWEETS]
│ ├── {handle}_tm_timeline.rds
│ ├── {handle}_tm_mentions.rds
│ ├── {handle}_twitter_info.rds
│ └── local_cols
└── [udpipe MODEL]
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.