clean_tweets: Clean Tweets


Source: R/Functions.R

Description

This function takes a data.frame of raw tweets and performs some basic cleaning and tokenization. It returns the input data.frame with a new clean_text column that holds the tweets after cleaning. For convenience, it also adds a column containing the emojis found in each tweet and a column counting the emojis used in each tweet.
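The emoji columns described above can be approximated in base R. The sketch below is hypothetical and is not the twtools implementation: the helper names extract_emojis() and count_emojis(), the input column text, and the output column names emojis and emoji_count are all assumptions. The Unicode ranges only approximate what counts as an emoji: multi-part emojis such as skin-tone variants or ZWJ sequences are counted per component, and flags are not matched.

```r
# Hypothetical sketch, not the twtools implementation.
# Approximate emoji pattern built from UTF-8 literals so PCRE runs in UTF mode:
# U+1F300-U+1FAFF (pictographs, emoticons, etc.) and U+2600-U+27BF
# (miscellaneous symbols and dingbats). This is an approximation only.
emoji_pattern <- "[\U0001F300-\U0001FAFF\u2600-\u27BF]"

# For each tweet, paste all matched emojis into one string ("" if none).
extract_emojis <- function(text) {
  matches <- regmatches(text, gregexpr(emoji_pattern, text, perl = TRUE))
  vapply(matches, paste, character(1), collapse = "")
}

# For each tweet, count the matched emoji code points.
count_emojis <- function(text) {
  lengths(regmatches(text, gregexpr(emoji_pattern, text, perl = TRUE)))
}

# Example: add the two convenience columns to a data.frame with a `text` column
# (the column names here are assumptions, not taken from twtools).
tweets <- data.frame(
  text = c("great day \U0001F600\U0001F389", "no emoji here"),
  stringsAsFactors = FALSE
)
tweets$emojis <- extract_emojis(tweets$text)
tweets$emoji_count <- count_emojis(tweets$text)
print(tweets)
```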

Usage

clean_tweets(tweet_data, remove.mentions = TRUE, remove.hashtags = TRUE,
  remove.urls = TRUE, remove.retweets = TRUE, remove.numbers = FALSE,
  lowercase = TRUE)

Arguments

remove.mentions

TRUE by default. Controls whether to remove mentions; set to FALSE to keep them.

remove.hashtags

TRUE by default. Controls whether to remove hashtags; set to FALSE to keep them.

remove.urls

TRUE by default. Controls whether to remove URLs; set to FALSE to keep them.

remove.retweets

TRUE by default. Controls whether to remove retweets; set to FALSE to keep them.

remove.numbers

FALSE by default. Controls whether to remove numbers; set to TRUE to remove them.

lowercase

TRUE by default. Controls whether to convert all characters to lowercase; set to FALSE to retain the original case.

tweet_data

A data.frame of raw tweets, typically as returned by search_tweets() from the rtweet package.
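The text-level options above can be illustrated with a short base-R sketch. It is not the twtools code: sketch_clean_text() is a hypothetical name, and the regular expressions (for example, treating a mention as "@" followed by word characters, or a URL as "http" or "https" up to the next whitespace) are assumptions about what each option removes. remove.retweets is left out because this page does not say whether it drops retweeted rows or only strips an "RT" prefix.

```r
# Hypothetical sketch, not the twtools implementation: apply the text-level
# cleaning options documented above to a character vector with base R regex.
sketch_clean_text <- function(text,
                              remove.mentions = TRUE,
                              remove.hashtags = TRUE,
                              remove.urls = TRUE,
                              remove.numbers = FALSE,
                              lowercase = TRUE) {
  # Links: "http://" or "https://" up to the next whitespace.
  if (remove.urls) text <- gsub("https?://\\S+", "", text, perl = TRUE)
  # Mentions: "@" followed by a handle (letters, digits, underscore).
  if (remove.mentions) text <- gsub("@\\w+", "", text, perl = TRUE)
  # Hashtags: "#" and the tag that follows it.
  if (remove.hashtags) text <- gsub("#\\w+", "", text, perl = TRUE)
  # Numbers: runs of ASCII digits.
  if (remove.numbers) text <- gsub("[0-9]+", "", text, perl = TRUE)
  if (lowercase) text <- tolower(text)
  # Collapse the whitespace runs left behind by removals and trim the ends.
  trimws(gsub("\\s+", " ", text, perl = TRUE))
}

# Usage: default options keep numbers and lowercase the rest.
sketch_clean_text("Loving @rstudio tips at https://example.com #rstats 2019")
# returns "loving tips at 2019"
```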


seanchrismurphy/twtools documentation built on May 29, 2019, 4:27 p.m.