txt_sentiment | R Documentation |
This function identifies words which have a positive or negative meaning, and applies some basic logic to amplifiers, deamplifiers and negators that occur in the neighbourhood of such a word.
If a negator occurs in the neighbourhood, positive becomes negative and vice versa.
If amplifiers or deamplifiers occur in the neighbourhood, the amplifier weight is added to the sentiment polarity score.
This function took inspiration from qdap::polarity but was completely re-engineered so that similar scores can be calculated on
a udpipe-tokenised dataset. It works at the sentence level, and the negator/amplification logic cannot cross a boundary defined
by a token with the PUNCT upos part-of-speech tag.
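The neighbourhood logic described above can be sketched in a few lines of base R. This is only an illustration and not the code used inside txt_sentiment: the helper name score_token is hypothetical, and the exact arithmetic is an assumption (an odd number of negators flips the sign, each amplifier adds amplifier_weight to the magnitude, each deamplifier subtracts it). The token at position i is assumed not to be punctuation.

```r
# Hypothetical helper, NOT the udpipe implementation: scores one polarity
# term by looking at its neighbours within the same punctuation-bounded span.
score_token <- function(tokens, upos, i, polarity,
                        negators = character(),
                        amplifiers = character(),
                        deamplifiers = character(),
                        amplifier_weight = 0.8,
                        n_before = 4, n_after = 2) {
  n  <- length(tokens)
  lo <- max(1, i - n_before)
  hi <- min(n, i + n_after)
  # Shrink the window so it never crosses a PUNCT token
  punct_before <- which(upos[lo:i] == "PUNCT")
  if (length(punct_before) > 0) lo <- lo + max(punct_before)
  punct_after <- which(upos[i:hi] == "PUNCT")
  if (length(punct_after) > 0) hi <- i + min(punct_after) - 2
  neighbours <- tokens[setdiff(seq(lo, hi), i)]
  n_neg   <- sum(neighbours %in% negators)
  n_amp   <- sum(neighbours %in% amplifiers)
  n_deamp <- sum(neighbours %in% deamplifiers)
  # Assumption: an odd number of negators inverts the polarity
  flip <- if (n_neg %% 2 == 1) -1 else 1
  # Assumption: amplifiers add to the magnitude, deamplifiers subtract from it
  magnitude <- 1 + amplifier_weight * (n_amp - n_deamp)
  polarity * flip * magnitude
}

tokens <- c("I", "do", "not", "like", "whatsoever", "when", "an", "R",
            "package", "has", "soo", "many", "dependencies", ".")
upos   <- c("PRON", "AUX", "PART", "VERB", "PRON", "SCONJ", "DET", "PROPN",
            "NOUN", "VERB", "ADV", "ADJ", "NOUN", "PUNCT")
# "like" (position 4) has "not" and "whatsoever" in its window
like_score <- score_token(tokens, upos, i = 4, polarity = 1,
                          negators = c("not", "neither"),
                          amplifiers = c("pretty", "many", "really", "whatsoever"))
print(like_score)
```

Under these assumptions the positive term "like" ends up with a negative, amplified score, because "not" is among the four preceding tokens and "whatsoever" among the two following ones. The values returned by txt_sentiment itself may differ.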
Note that if you prefer to build a supervised model to perform sentiment scoring you might be interested in looking at the ruimtehol R package https://github.com/bnosac/ruimtehol instead.
txt_sentiment(
  x,
  term = "lemma",
  polarity_terms,
  polarity_negators = character(),
  polarity_amplifiers = character(),
  polarity_deamplifiers = character(),
  amplifier_weight = 0.8,
  n_before = 4,
  n_after = 2,
  constrain = FALSE
)
x |
a data.frame with the columns doc_id, paragraph_id, sentence_id, upos and the column as indicated in term |
term |
a character string with the name of a column of x on which the sentiment scoring is applied |
polarity_terms |
data.frame containing terms which have a positive or negative meaning. This data frame should contain the columns term and polarity, where term is of type character and polarity is either 1 or -1. |
polarity_negators |
a character vector of words which will invert the meaning of the polarity_terms |
polarity_amplifiers |
a character vector of words which amplify the polarity_terms |
polarity_deamplifiers |
a character vector of words which deamplify the polarity_terms |
amplifier_weight |
weight which is added to the polarity score if an amplifier occurs in the neighbourhood |
n_before |
integer indicating how many words before the polarity_terms word to look for negators/amplifiers/deamplifiers |
n_after |
integer indicating how many words after the polarity_terms word to look for negators/amplifiers/deamplifiers |
constrain |
logical indicating whether to make sure the aggregated sentiment score is between -1 and 1 |
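As a small illustration of the shape expected for polarity_terms, the following sketch builds such a data.frame and checks the stated requirements (a character column term and a polarity column holding only 1 or -1). The helper check_polarity_terms is hypothetical and is not part of udpipe; it only mirrors the description above.

```r
# A polarity lexicon in the documented shape: character term, polarity 1 or -1
polarity_terms <- data.frame(
  term = c("annoy", "like", "painful"),
  polarity = c(-1, 1, -1),
  stringsAsFactors = FALSE
)

# Hypothetical validation helper (not a udpipe function)
check_polarity_terms <- function(p) {
  stopifnot(
    is.data.frame(p),
    all(c("term", "polarity") %in% names(p)),
    is.character(p$term),
    all(p$polarity %in% c(-1, 1))
  )
  invisible(TRUE)
}

check_polarity_terms(polarity_terms)
```

Running a check like this before calling txt_sentiment can catch factor-typed terms or polarity values outside 1 and -1 early.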
a list containing
data: the x data.frame with 2 columns added: polarity and sentiment_polarity.
The column polarity is the polarity column of the polarity_terms dataset, giving the polarity of the term on which the sentiment scoring is applied.
The column sentiment_polarity is the value after the amplifier/deamplifier/negator logic has been applied.
overall: a data.frame with one row per doc_id containing the columns doc_id, sentences, terms, sentiment_polarity, terms_positive, terms_negative, terms_negation and terms_amplification.
It provides the aggregate sentiment_polarity score of the dataset x by doc_id, as well as the terminology causing the sentiment, the number of sentences and the number of non-punctuation terms in the document.
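To make the relationship between the token-level data and the per-document overall table more concrete, here is a rough base R sketch of such an aggregation. It is an assumption-laden illustration, not the package's code: the toy input, the helper summarise_doc, the use of NA for tokens without a score, and the use of a plain sum as the aggregate are all hypothetical choices.

```r
# Toy token-level table standing in for the data element of the result
tokens <- data.frame(
  doc_id = c("doc1", "doc1", "doc1", "doc2", "doc2"),
  sentence_id = c(1, 1, 1, 1, 1),
  upos = c("VERB", "ADV", "PUNCT", "ADJ", "PUNCT"),
  sentiment_polarity = c(-1.8, NA, NA, -1.8, NA),  # NA: token carries no score (assumed)
  stringsAsFactors = FALSE
)

# Hypothetical per-document summary: sentence count, non-punctuation
# term count, and the summed sentiment score (sum is an assumption)
summarise_doc <- function(d) {
  data.frame(
    doc_id = d$doc_id[1],
    sentences = length(unique(d$sentence_id)),
    terms = sum(d$upos != "PUNCT"),
    sentiment_polarity = sum(d$sentiment_polarity, na.rm = TRUE),
    stringsAsFactors = FALSE
  )
}

overall <- do.call(rbind, lapply(split(tokens, tokens$doc_id), summarise_doc))
rownames(overall) <- NULL
print(overall)
```

The real overall element additionally lists the matched positive, negative, negation and amplification terms per document.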
x <- c("I do not like whatsoever when an R package has soo many dependencies.",
       "Making other people install java is annoying, as it is a really painful experience in classrooms.")

## Not run:
## Do the annotation to get the data.frame needed as input to txt_sentiment
anno <- udpipe(x, "english-gum")
## End(Not run)

## Pre-computed annotation, so the example runs without downloading a model
anno <- data.frame(
  doc_id = c(rep("doc1", 14), rep("doc2", 18)),
  paragraph_id = 1,
  sentence_id = 1,
  lemma = c("I", "do", "not", "like", "whatsoever", "when", "an", "R",
            "package", "has", "soo", "many", "dependencies", ".",
            "Making", "other", "people", "install", "java", "is", "annoying",
            ",", "as", "it", "is", "a", "really", "painful", "experience",
            "in", "classrooms", "."),
  upos = c("PRON", "AUX", "PART", "VERB", "PRON", "SCONJ", "DET", "PROPN",
           "NOUN", "VERB", "ADV", "ADJ", "NOUN", "PUNCT",
           "VERB", "ADJ", "NOUN", "ADJ", "NOUN", "AUX", "VERB",
           "PUNCT", "SCONJ", "PRON", "AUX", "DET", "ADV", "ADJ", "NOUN",
           "ADP", "NOUN", "PUNCT"),
  stringsAsFactors = FALSE)

## Score with the default window and weight
scores <- txt_sentiment(
  x = anno, term = "lemma",
  polarity_terms = data.frame(term = c("annoy", "like", "painful"),
                              polarity = c(-1, 1, -1)),
  polarity_negators = c("not", "neither"),
  polarity_amplifiers = c("pretty", "many", "really", "whatsoever"),
  polarity_deamplifiers = c("slightly", "somewhat"))
scores$overall
scores$data

## Same call, with the aggregate score constrained to lie between -1 and 1
scores <- txt_sentiment(
  x = anno, term = "lemma",
  polarity_terms = data.frame(term = c("annoy", "like", "painful"),
                              polarity = c(-1, 1, -1)),
  polarity_negators = c("not", "neither"),
  polarity_amplifiers = c("pretty", "many", "really", "whatsoever"),
  polarity_deamplifiers = c("slightly", "somewhat"),
  constrain = TRUE, n_before = 4, n_after = 2, amplifier_weight = .8)
scores$overall
scores$data