txt_sentiment: Perform dictionary-based sentiment analysis on a tokenised data frame

View source: R/sentiment.R

txt_sentiment    R Documentation

Perform dictionary-based sentiment analysis on a tokenised data frame

Description

This function identifies words which have a positive or negative meaning and applies some basic logic for occurrences of amplifiers/deamplifiers and negators in the neighbourhood of such a word.

  • If a negator occurs in the neighbourhood, positive becomes negative or vice versa.

  • If amplifiers or deamplifiers occur in the neighbourhood, the amplifier weight is added to the sentiment polarity score.

This function took inspiration from qdap::polarity but was completely re-engineered to allow similar calculations on a udpipe-tokenised dataset. It works at the sentence level, and the negator/amplification logic cannot cross a boundary defined by the PUNCT upos part of speech tag.
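As a rough, hypothetical illustration of this logic (this is not the package's internal code and the exact combination rule may differ), assume the amplifier weight increases the magnitude of the polarity and a negator flips its sign:

## Hypothetical back-of-the-envelope sketch, not udpipe internals
## "really painful": polarity -1 for "painful", amplifier "really" in the neighbourhood
-1 * (1 + 0.8)
## "do not like": polarity +1 for "like", flipped by the negator "not"
1 * -1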

Note that if you prefer to build a supervised model to perform sentiment scoring you might be interested in looking at the ruimtehol R package https://github.com/bnosac/ruimtehol instead.

Usage

txt_sentiment(
  x,
  term = "lemma",
  polarity_terms,
  polarity_negators = character(),
  polarity_amplifiers = character(),
  polarity_deamplifiers = character(),
  amplifier_weight = 0.8,
  n_before = 4,
  n_after = 2,
  constrain = FALSE
)

Arguments

x

a data.frame with the columns doc_id, paragraph_id, sentence_id, upos and the column as indicated in term. This is exactly what udpipe returns.

term

a character string with the name of the column of x on which you want to apply the sentiment scoring

polarity_terms

data.frame containing terms which have positive or negative meaning. This data frame should contain the columns term and polarity where term is of type character and polarity can either be 1 or -1.
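For example, a minimal polarity_terms data frame (using made-up terms) could look as follows:

polarity_terms <- data.frame(term = c("good", "nice", "bad", "horrible"),
                             polarity = c(1, 1, -1, -1),
                             stringsAsFactors = FALSE)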

polarity_negators

a character vector of words which will invert the meaning of the polarity_terms such that -1 becomes 1 and vice versa

polarity_amplifiers

a character vector of words which amplify the polarity_terms

polarity_deamplifiers

a character vector of words which deamplify the polarity_terms

amplifier_weight

weight which is added to the polarity score if an amplifier occurs in the neighbourhood

n_before

integer indicating how many words before the polarity_terms word to look for negators/amplifiers/deamplifiers in order to apply their logic

n_after

integer indicating how many words after the polarity_terms word to look for negators/amplifiers/deamplifiers in order to apply their logic

constrain

logical indicating whether to constrain the aggregated sentiment score to lie between -1 and 1

Value

a list containing

  • data: the x data.frame with 2 columns added: polarity and sentiment_polarity.

    • The column polarity is just the polarity column of the polarity_terms dataset, giving the polarity of the term on which the sentiment scoring is applied.

    • The column sentiment_polarity is the value after the amplifier/deamplifier/negator logic has been applied.

  • overall: a data.frame with one row per doc_id containing the columns doc_id, sentences, terms, sentiment_polarity, terms_positive, terms_negative, terms_negation and terms_amplification, providing the aggregate sentiment_polarity score of the dataset x by doc_id, as well as the terminology causing the sentiment, the number of sentences and the number of non-punctuation terms in the document.

Examples

x <- c("I do not like whatsoever when an R package has soo many dependencies.",
       "Making other people install java is annoying, 
        as it is a really painful experience in classrooms.")
## Not run: 
## Do the annotation to get the data.frame needed as input to txt_sentiment
anno <- udpipe(x, "english-gum")

## End(Not run)
anno <- data.frame(doc_id = c(rep("doc1", 14), rep("doc2", 18)), 
                   paragraph_id = 1,
                   sentence_id = 1,
                   lemma = c("I", "do", "not", "like", "whatsoever", 
                             "when", "an", "R", "package", 
                             "has", "soo", "many", "dependencies", ".", 
                             "Making", "other", "people", "install", 
                             "java", "is", "annoying", ",", "as", 
                             "it", "is", "a", "really", "painful", 
                             "experience", "in", "classrooms", "."),
                   upos = c("PRON", "AUX", "PART", "VERB", "PRON", 
                            "SCONJ", "DET", "PROPN", "NOUN", "VERB", 
                             "ADV", "ADJ", "NOUN", "PUNCT", 
                             "VERB", "ADJ", "NOUN", "ADJ", "NOUN", 
                             "AUX", "VERB", "PUNCT", "SCONJ", "PRON", 
                             "AUX", "DET", "ADV", "ADJ", "NOUN", 
                             "ADP", "NOUN", "PUNCT"),
                   stringsAsFactors = FALSE)
scores <- txt_sentiment(x = anno, 
              term = "lemma",
              polarity_terms = data.frame(term = c("annoy", "like", "painful"), 
                                          polarity = c(-1, 1, -1)), 
              polarity_negators = c("not", "neither"),
              polarity_amplifiers = c("pretty", "many", "really", "whatsoever"), 
              polarity_deamplifiers = c("slightly", "somewhat"))
scores$overall
scores$data
scores <- txt_sentiment(x = anno, 
              term = "lemma",
              polarity_terms = data.frame(term = c("annoy", "like", "painful"), 
                                          polarity = c(-1, 1, -1)), 
              polarity_negators = c("not", "neither"),
              polarity_amplifiers = c("pretty", "many", "really", "whatsoever"), 
              polarity_deamplifiers = c("slightly", "somewhat"),
              constrain = TRUE, n_before = 4,
              n_after = 2, amplifier_weight = .8)
scores$overall
scores$data
