data_corpus_moviereviews: Movie reviews with polarity from Pang and Lee (2004)


Description

A corpus object containing 2,000 movie reviews, each classified as having positive or negative sentiment.

Usage

data_corpus_moviereviews

Format

The corpus includes the following document variables:

sentiment

Factor indicating whether a review was manually classified as positive (pos) or negative (neg).

id1

Character value giving the review's position in the corpus.

id2

Random number for each review.

Details

For more information, see cat(meta(data_corpus_moviereviews, "readme")).

Source

https://www.cs.cornell.edu/people/pabo/movie-review-data/

References

Pang, B., Lee, L. (2004) "A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts", Proceedings of the ACL.

Examples

# check polarities
table(data_corpus_moviereviews$sentiment)

# segment the reviews into sentences; each line of a review holds one sentence
data_corpus_moviereviewsents <-
    quanteda::corpus_segment(data_corpus_moviereviews, "\n", extract_pattern = FALSE)
print(data_corpus_moviereviewsents, max_ndoc = 3)
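
As a further sketch, the document variables listed under Format can be inspected with docvars() and ndoc() from quanteda. This assumes the quanteda and quanteda.textmodels packages are installed; the name dv is only illustrative and does not appear in the package documentation.

```r
# Assumption: quanteda and quanteda.textmodels are installed
library("quanteda")
library("quanteda.textmodels")

# extract the document variables as a data frame, one row per review
dv <- docvars(data_corpus_moviereviews)

# list the variable names and show the first few rows
names(dv)
head(dv, 3)

# the number of documents should match the number of rows in dv
ndoc(data_corpus_moviereviews) == nrow(dv)
```

Working from the extracted data frame keeps the corpus itself unchanged, which can be convenient when the variables are only needed for tabulation or for labelling a train/test split.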

quanteda.textmodels documentation built on April 6, 2021, 9:06 a.m.