Approximate the sentiment (polarity) of text by grouping variable(s). See sentiment for a full description of the sentiment detection algorithm, the sentiment/valence shifter keys that can be passed into the function, and the other arguments that can be passed.
sentiment_by(
  text.var,
  by = NULL,
  averaging.function = sentimentr::average_downweighted_zero,
  group.names,
  ...
)
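text.var - The text variable. Also takes a sentimentr or sentiment_by object.
by - The grouping variable(s). The default, NULL, groups by the original elements of text.var. Also takes a single grouping variable or a list of one or more grouping variables.
averaging.function - A function for performing the group-by averaging. The default, average_downweighted_zero, downweights zero (neutral) values in the averaging. The sentimentr functions average_mean and average_weighted_mixed_sentiment are also available.
group.names - A vector of names that corresponds to group. Generally for internal use.
... - Other arguments passed to sentiment.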
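An averaging.function only has to turn one group's sentence-level scores into a single number, so a user-defined function can be supplied. The base-R sketch below is hypothetical. The names mean_na and downweight_zero_sketch are invented. The sqrt-based zero weighting is an assumed scheme, meant to show why downweighting neutral sentences pulls the average away from zero. It is not the exact formula behind average_downweighted_zero.

```r
# Hypothetical averaging functions in base R (not sentimentr's own code).
# Each receives the polarity scores of one group and returns one number;
# both drop NAs first.

# Plain mean that ignores NAs.
mean_na <- function(x, ...) {
  mean(x, na.rm = TRUE)
}

# Assumed downweighting scheme: zeros count as sqrt(number of zeros) in the
# denominator instead of one each. Illustration only; this is NOT the exact
# formula used by sentimentr::average_downweighted_zero.
downweight_zero_sketch <- function(x, ...) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA_real_)
  n_zero <- sum(x == 0)
  n_nonzero <- length(x) - n_zero
  sum(x) / (n_nonzero + sqrt(n_zero))
}

scores <- c(0.5, 0, 0, 0, NA)
mean_na(scores)                # 0.125
downweight_zero_sketch(scores) # 0.5 / (1 + sqrt(3)), about 0.183
```

A function of this shape should then be usable as, e.g., `sentiment_by(mytext, averaging.function = downweight_zero_sketch)`.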
Returns a data.table with grouping variables plus:
element_id - The id number of the original vector passed to sentiment
sentence_id - The id number of the sentences within each element_id
word_count - Word count summed by grouping variable
sd - Standard deviation (sd) of the sentiment/polarity score by grouping variable
ave_sentiment - Sentiment/polarity score averaged (via averaging.function) by grouping variable
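The grouped columns can be pictured with a small base-R sketch over an invented sentence-level table. The data and the `by_person` name are hypothetical. The sketch sums word_count, takes the standard deviation of the scores, and averages them with a plain mean. By contrast, sentiment_by computes ave_sentiment with averaging.function, which is average_downweighted_zero by default.

```r
# Hypothetical sentence-level scores (invented for illustration):
# one row per sentence with its group, word count and polarity.
sentences <- data.frame(
  person     = c("A", "A", "A", "B", "B"),
  word_count = c(5, 3, 4, 6, 2),
  sentiment  = c(0.4, 0, -0.2, 0.3, 0.1)
)

# Collapse to one row per group, mirroring the documented output columns.
# A plain mean stands in for averaging.function here.
by_person <- do.call(rbind, lapply(split(sentences, sentences$person), function(d) {
  data.frame(
    person        = d$person[1],
    word_count    = sum(d$word_count),    # summed by group
    sd            = stats::sd(d$sentiment),
    ave_sentiment = mean(d$sentiment)
  )
}))
by_person
```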
sentimentr uses non-standard evaluation when you use with() or %$% (magrittr) and looks for the vectors within the data set passed to it. There is one exception to this. When you pass a get_sentences() object to sentiment_by() as the first argument, text.var, it calls the sentiment_by.get_sentences_data_frame method, which requires text.var to be a get_sentences_data_frame object. Because this object is a data.frame, its method knows it can access the columns of the get_sentences_data_frame object directly (usually text.var is an atomic vector); it just needs the names of the columns to grab.
To illustrate this point, note that all three of these approaches produce exactly the same output:

library(sentimentr)
library(magrittr)  # provides %>% and %$%

## method 1
presidential_debates_2012 %>%
    get_sentences() %>%
    sentiment_by(by = c('person', 'time'))

## method 2
presidential_debates_2012 %>%
    get_sentences() %$%
    sentiment_by(., by = c('person', 'time'))

## method 3
presidential_debates_2012 %>%
    get_sentences() %$%
    sentiment_by(dialogue, by = list(person, time))
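The dispatch described above is ordinary S3 method selection on the class of text.var. The toy generic below shows the two paths. `score_by`, its methods, the `sentences_df` class and the data are all hypothetical, and character counts stand in for sentiment scores. A classed data frame only needs column names in `by`, while an atomic vector needs the grouping vectors themselves.

```r
# Hypothetical generic (not part of sentimentr) showing S3 dispatch on the
# class of the first argument, with nchar() standing in for a sentiment score.
score_by <- function(text.var, by = NULL, ...) UseMethod("score_by")

# Default method: text.var is an atomic vector, so `by` must hold the
# grouping vector(s) themselves.
score_by.default <- function(text.var, by = NULL, ...) {
  if (is.null(by)) by <- seq_along(text.var)
  tapply(nchar(text.var), by, sum)
}

# Method for a classed data frame: the object carries its own columns, so
# `by` only needs the names of the columns to grab.
score_by.sentences_df <- function(text.var, by = NULL, ...) {
  groups <- if (is.null(by)) seq_len(nrow(text.var)) else text.var[by]
  tapply(nchar(text.var$text), groups, sum)
}

df <- data.frame(text = c("good", "bad day", "fine"), person = c("A", "B", "A"))
class(df) <- c("sentences_df", class(df))

score_by(df, by = "person")        # dispatches to score_by.sentences_df
score_by(df$text, by = df$person)  # dispatches to score_by.default
```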
Also note that a get_sentences_data_frame object has a column of class get_sentences_character, which has its own method in sentimentr. When you use with() or %$%, you are not actually passing the get_sentences_data_frame object to sentimentr, so the sentiment_by.get_sentences_data_frame method is not called. Instead, sentiment_by is evaluated in the environment/data of the get_sentences_data_frame object. You can force the object passed this way to be evaluated as a get_sentences_data_frame object, and thus call the sentiment_by.get_sentences_data_frame method, by using the . operator as I've done in method 2 above. Otherwise you pass the name of the text column, which is of class get_sentences_character, and that class's own method is called. In this case the by argument expects vectors or a list of vectors, and because it is evaluated within the data set you can use list(), as in method 3.
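The same within-data evaluation can be seen with base R alone. In this hypothetical example the `debate` data are invented and `tapply` stands in for sentiment_by. Bare column names resolve inside with(), so two grouping variables are given as list(person, time) rather than as a vector of column names.

```r
# Hypothetical sentence-level data (invented for illustration).
debate <- data.frame(
  person    = c("A", "A", "B", "B"),
  time      = c("t1", "t2", "t1", "t1"),
  sentiment = c(0.2, -0.1, 0.4, 0.0)
)

# Inside with(), `person`, `time` and `sentiment` refer to the columns,
# so the grouping is passed as a list of vectors.
with(debate, tapply(sentiment, list(person, time), mean))
```

With magrittr loaded, `debate %$% tapply(sentiment, list(person, time), mean)` evaluates the same way.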
Other sentiment functions: sentiment()
library(sentimentr)

mytext <- c(
'do you like it? It is red. But I hate really bad dogs',
'I am the best friend.',
"Do you really like it? I'm not happy"
)
## works on a character vector, but this is not the preferred method because
## the cost of sentence boundary disambiguation is paid again every time
## `sentiment` is run
## Not run:
sentiment(mytext)
sentiment_by(mytext)
## End(Not run)
## preferred method: run get_sentences() once so that cost is paid only once
mytext <- get_sentences(mytext)
sentiment_by(mytext)
sentiment_by(mytext, averaging.function = average_mean)
sentiment_by(mytext, averaging.function = average_weighted_mixed_sentiment)
get_sentences(sentiment_by(mytext))
(mysentiment <- sentiment_by(mytext, question.weight = 0))
stats::setNames(get_sentences(sentiment_by(mytext, question.weight = 0)),
round(mysentiment[["ave_sentiment"]], 3))
pres_dat <- get_sentences(presidential_debates_2012)
## Not run:
## less optimized way
with(presidential_debates_2012, sentiment_by(dialogue, person))
## End(Not run)
## Not run:
sentiment_by(pres_dat, 'person')
(out <- sentiment_by(pres_dat, c('person', 'time')))
plot(out)
plot(uncombine(out))
sentiment_by(out, presidential_debates_2012$person)
with(presidential_debates_2012, sentiment_by(out, time))
highlight(with(presidential_debates_2012, sentiment_by(out, list(person, time))))
## End(Not run)
## Not run:
## tidy approach
library(dplyr)
library(magrittr)
hu_liu_cannon_reviews %>%
mutate(review_split = get_sentences(text)) %$%
sentiment_by(review_split)
## End(Not run)