This is a subset of the data used in the paper, which was assembled by Prettenhofer and Stein (2010). It contains 1000 reviews of books on Amazon, of which 500 were selected from the original training data and 500 from the test data.
The full dataset has been used for a variety of things, including classification using svm. The subset was chosen small enough to keep the computation time low, while still containing the examples in the paper.
A data frame with 1000 observations on the following 2 variables.
the review in text format (character)
factor indicating the sentiment of the review: negative (1) or positive (2)
Prettenhofer, P., Stein, B. (2010). Cross-language text classification using structural correspondence learning. Proceedings of the 48th annual meeting of the association for computational linguistics, 1118-1127.
1 2 3 4 5 6
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.