fit_stm: Fit STM (Structural topic model)

View source: R/fit_stm.R

Description

Estimate a structural topic model

Usage

fit_stm(
  data,
  n_topics = 2L,
  xcov,
  remove_punct = TRUE,
  stem = TRUE,
  remove_url = TRUE,
  remove_emojis = TRUE,
  stopwords = "en",
  ...
)

Arguments

data

Data frame containing tweets and hashtags. Works with any data frame, as long as it has a "text" column of type character and a "hashtags" column with comma-separated character vectors. Such a data frame can be obtained either by applying load_tweets to a JSON object returned by Twitter's API v1.1 or by applying stream_in to any JSON file, as long as it has a "text" and a "hashtags" field. If you are unsure about the requirements, load the sample data included in the package as shown in the Examples section of this help page.
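For illustration, a minimal data frame meeting these requirements can be built by hand in base R. The tweet texts, hashtags and the retweet_count column are made up; the "comma-separated" hashtags are assumed here to be stored as one comma-separated string per tweet; and has_required_columns() is a hypothetical helper, not a Twitmo function.

```r
# Hypothetical toy data: two made-up tweets with the two required columns
# plus one numeric column that could serve as an external covariate.
toy_tweets <- data.frame(
  text = c("Great game tonight #football #sports",
           "New blog post is up https://example.com #rstats"),
  # assumption: hashtags stored as one comma-separated string per tweet
  hashtags = c("football,sports", "rstats"),
  retweet_count = c(3L, 10L),
  stringsAsFactors = FALSE
)

# Hypothetical check: a "text" column of type character and a "hashtags" column
has_required_columns <- function(df) {
  all(c("text", "hashtags") %in% names(df)) && is.character(df$text)
}

has_required_columns(toy_tweets)
#> [1] TRUE
```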

n_topics

Integer with number of topics.

xcov

Either a formula (see stats::formula) with an empty left-hand side specifying the external covariates (metadata) to use, e.g. ~favourites_count + retweet_count, or a character vector (c("favourites_count", "retweet_count")) or a comma-separated character string ("favourites_count,retweet_count") naming the columns to use as external covariates.
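The three accepted forms of xcov name the same covariates. As a sketch of how they relate, the hypothetical helper as_xcov_formula() below (not part of Twitmo, whose internal handling may differ) turns a character vector or a comma-separated string into the equivalent one-sided formula with stats::reformulate().

```r
# Hypothetical helper (not part of Twitmo): convert any of the three
# accepted xcov forms into a one-sided formula.
as_xcov_formula <- function(xcov) {
  # A formula is returned unchanged
  if (inherits(xcov, "formula")) {
    return(xcov)
  }
  if (!is.character(xcov)) {
    stop("xcov must be a formula or a character vector")
  }
  # Split comma-separated strings, trim whitespace, drop empty pieces
  vars <- trimws(unlist(strsplit(xcov, ",", fixed = TRUE)))
  vars <- vars[nzchar(vars)]
  # reformulate() with no response builds ~ var1 + var2 + ...
  stats::reformulate(vars)
}

f1 <- as_xcov_formula(~ favourites_count + retweet_count)
f2 <- as_xcov_formula(c("favourites_count", "retweet_count"))
f3 <- as_xcov_formula("favourites_count,retweet_count")

all.vars(f3)
#> [1] "favourites_count" "retweet_count"
```

Because reformulate() creates the formula in a different environment than the literal ~ expression, the three results are best compared by their variable names (all.vars()) rather than with identical().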

remove_punct

Logical. Indicates whether punctuation (includes Twitter hashtags and usernames) should be removed. Defaults to TRUE.

stem

Logical. If TRUE, terms are stemmed. Defaults to TRUE.

remove_url

Logical. If TRUE, URLs beginning with http or https are found and removed. Defaults to TRUE.
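A rough base-R approximation of this kind of URL removal is sketched below. The regular expression https?://\S+ and the helper strip_urls() are assumptions for illustration only; Twitmo's actual pre-processing may match URLs differently.

```r
# Hypothetical helper (not Twitmo's implementation): remove every run of
# non-whitespace characters that starts with "http://" or "https://".
strip_urls <- function(text) {
  cleaned <- gsub("https?://\\S+", "", text, perl = TRUE)
  # Collapse the doubled spaces left behind and trim both ends
  trimws(gsub("\\s+", " ", cleaned, perl = TRUE))
}

strip_urls("New post https://example.com/a?b=1 #rstats")
#> [1] "New post #rstats"
```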

remove_emojis

Logical. If TRUE, all emojis are removed from the tweets. Defaults to TRUE.

stopwords

A character vector, list of character vectors, dictionary or collocations object. See pattern for details. The default, "en", selects English stopwords (stopwords("english")).

...

Additional arguments passed to stm.

Details

Use this function to estimate an STM from a data frame of parsed tweets. Works with unpooled tweets only. Pre-processing and fitting are done in one run.

Value

Object of class stm. In addition, the pre-processed documents are appended to it as a named list called "prep".

See Also

stm

Examples

## Not run: 

library(Twitmo)

# load tweets (included in package)
mytweets <- load_tweets(system.file("extdata", "tweets_20191027-141233.json", package = "Twitmo"))

# fit STM with tweets
stm_model <- fit_stm(mytweets,
  n_topics = 7,
  xcov = ~ retweet_count + followers_count + reply_count +
    quote_count + favorite_count,
  remove_punct = TRUE,
  remove_url = TRUE,
  remove_emojis = TRUE,
  stem = TRUE,
  stopwords = "en"
)

## End(Not run)


abuchmueller/Twitmo documentation built on Sept. 14, 2022, 8:06 p.m.