bigram_adjustment: Bigram negation adjustment


View source: R/fct_tidytext.R

Description

Adjusts sentiment scores so that negated phrases such as "I am not happy" receive a negative score. Scored one word at a time, such a phrase would otherwise get a positive score from "happy".

Usage

bigram_adjustment(lexicons, tweets_by_id, negation_words, stop_words)

Arguments

lexicons

Lexicons to use. A named list of tibbles.

tweets_by_id

Texts of tweets, as processed in produce_analysis_df.

negation_words

Negation words to use from TweetAnalysis R6 class. A character vector.

stop_words

Stop words to use from TweetAnalysis R6 class. A tibble.

Details

Traditionally, single-word tokenization produces one row of "word to sentiment value" per word. This function instead tokenizes the texts into 2-word tokens (bigrams). Any bigram whose first word is a negation word, per negation_words, gets two rows: one with the full 2-word token, and another with the original (second) word. The sentiment value of both rows is the sentiment value of the original word multiplied by -1. Both rows are then appended to the 1-word-tokenized tibble and summed at the word/tweet level. This cancels out the original word's sentiment and adds the bigram's sentiment. Because the function looks specifically for negation words, the stop words exclude them.
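The arithmetic described above can be illustrated with a minimal base R sketch. It is not the package's implementation (the source file name suggests the package uses tidytext). The toy lexicon, the toy tweets, the tokenize helper, and all column names are assumptions made for illustration, and stop-word filtering is left out.

```r
# Toy inputs (hypothetical; not the package's real data structures).
lexicon <- data.frame(word = c("happy", "sad"), value = c(1, -1),
                      stringsAsFactors = FALSE)
tweets <- data.frame(id = c(1, 2),
                     text = c("i am not happy", "so happy today"),
                     stringsAsFactors = FALSE)
negation_words <- c("not", "no", "never")

# Hypothetical helper: lower-case a text and split it into words.
tokenize <- function(text) {
  words <- strsplit(tolower(text), "[^a-z']+")[[1]]
  words[nzchar(words)]
}

# 1-word tokenization: one row per word, scored against the lexicon.
unigrams <- do.call(rbind, lapply(seq_len(nrow(tweets)), function(i) {
  data.frame(id = tweets$id[i], word = tokenize(tweets$text[i]),
             stringsAsFactors = FALSE)
}))
unigram_scores <- merge(unigrams, lexicon, by = "word")

# 2-word tokenization: each word paired with the word that follows it.
bigrams <- do.call(rbind, lapply(seq_len(nrow(tweets)), function(i) {
  w <- tokenize(tweets$text[i])
  if (length(w) < 2) return(NULL)
  data.frame(id = tweets$id[i], word1 = head(w, -1), word2 = tail(w, -1),
             stringsAsFactors = FALSE)
}))

# Keep bigrams that start with a negation word and whose second word is scored.
negated <- bigrams[bigrams$word1 %in% negation_words, ]
negated <- merge(negated, lexicon, by.x = "word2", by.y = "word")

# Two rows per negated bigram, both carrying the flipped sentiment value.
adjustment <- rbind(
  data.frame(id = negated$id, word = paste(negated$word1, negated$word2),
             value = -negated$value, stringsAsFactors = FALSE),
  data.frame(id = negated$id, word = negated$word2,
             value = -negated$value, stringsAsFactors = FALSE)
)

# Append to the unigram rows and sum at the word/tweet level.
combined <- rbind(unigram_scores[, c("id", "word", "value")], adjustment)
scores <- aggregate(value ~ id + word, data = combined, FUN = sum)

# Per-tweet totals: tweet 1 ("i am not happy") is -1, tweet 2 is 1.
tweet_scores <- aggregate(value ~ id, data = scores, FUN = sum)
print(tweet_scores)
```

In tweet 1, the row for "happy" sums to 0 (the original +1 plus the -1 adjustment), while the row for "not happy" contributes -1, so the negated phrase ends up with a negative score.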


jiwanheo/senTWEETment documentation built on Jan. 20, 2022, 3:20 a.m.