data_corpus_LMRD: Large Movie Review Dataset from Maas et al. (2011)
In quanteda/quanteda.classifiers: Models for supervised text classification

data_corpus_LMRD

R Documentation


Description

A corpus object containing the sentiment classification dataset from Maas et al. (2011): 25,000 highly polar movie reviews for training and 25,000 for testing.

Usage

data_corpus_LMRD

Format

The corpus docvars consist of:

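docnumber: serial document number, counted separately within each combination of set and polarity
rating: user-assigned movie rating on an integer scale from 1 to 10
set: whether the document belongs to the test set or the training set
polarity: either neg or pos, indicating whether the movie review was negative or positive. Following Maas et al. (2011), a review rated 4 or lower is negative and a review rated 7 or higher is positive; reviews rated 5 or 6 are not included. See Maas et al. (2011) for details of this assignment.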
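As an illustration of the cut-off rule behind polarity (4 or lower is negative, 7 or higher is positive), the base R sketch below maps a rating to the label it implies. The helper rating_to_polarity() is hypothetical and not part of quanteda.classifiers. The optional check at the end assumes the package is installed, that the docvars are named as listed above, and that rating is stored as a number.

```r
# Hypothetical helper (not part of quanteda.classifiers): returns the polarity
# label implied by the Maas et al. (2011) cut-offs for each rating.
# Ratings of 5 or 6 (excluded from the labelled data) map to NA.
rating_to_polarity <- function(rating) {
  out <- rep(NA_character_, length(rating))
  out[which(rating <= 4)] <- "neg"  # which() drops NA comparisons
  out[which(rating >= 7)] <- "pos"
  out
}

rating_to_polarity(c(1, 4, 5, 7, 10))
# returns "neg" "neg" NA "pos" "pos"

# Optional consistency check against the stored docvars (assumes the package
# is installed and that rating is numeric).
if (requireNamespace("quanteda.classifiers", quietly = TRUE)) {
  data("data_corpus_LMRD", package = "quanteda.classifiers")
  dv <- quanteda::docvars(data_corpus_LMRD)
  # Number of documents in each set/polarity combination
  print(table(dv$set, dv$polarity))
  # TRUE if every stored polarity agrees with the cut-off rule
  print(all(rating_to_polarity(dv$rating) == as.character(dv$polarity)))
}
```

Using which() instead of a bare logical index means a missing rating yields NA rather than an assignment error.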

Source

http://ai.stanford.edu/~amaas/data/sentiment/

References

Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts (2011). "Learning Word Vectors for Sentiment Analysis." The 49th Annual Meeting of the Association for Computational Linguistics (ACL 2011).

quanteda/quanteda.classifiers documentation built on Oct. 20, 2023, 6:53 a.m.