knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Predicting Yelp Review Ratings from Text

Here's a simple example that runs some test reviews through the model. I've written three straightforward reviews, plus one tricky one full of negations to try to fool the model.

library(yelpredict)
library(tibble)
library(magrittr)

# A few hand-written test reviews with their true star ratings.
review_examples <- tribble(
  ~review_text, ~star_rating,
  "This place was awful!", 1,
  "The service here was great I loved it it was amazing.", 5,
  "Meh, it was pretty good I guess but not the best.", 4,
  "Not bad, not bad at all, really the opposite of terrible. I liked it.", 5
)

review_examples 

The first step is to "flatten" the true ratings from integer stars down to binary positive/negative labels. I've written a simple function called flatten_stars() to take care of this: you pipe the input data to it and tell it which column contains the numeric ratings, and it does the rest.

review_examples %>%
  flatten_stars(star_rating)
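
For intuition, the flattening itself amounts to thresholding the stars. Here is a minimal sketch of the idea in plain dplyr, assuming (hypothetically) that reviews with 4 or more stars count as positive; the cutoff flatten_stars() actually uses may differ.

library(dplyr)

# Hypothetical stand-in for flatten_stars(): treat 4+ stars as positive.
review_examples %>%
  mutate(rating = if_else(star_rating >= 4, "positive", "negative"))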

The next step is to prepare the text by computing each review's average AFINN sentiment score, its number of negations ("but"s and "not"s), and which word-length quintile it falls into. The prepare_yelp() function takes care of this: you pipe the data to it and specify the column containing the text.

review_examples %>%
  flatten_stars(star_rating) %>%
  prepare_yelp(review_text)
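
If you're curious how an average AFINN score can be computed, here is a rough sketch using the tidytext package. This is just an illustration of the idea, not prepare_yelp()'s actual implementation, and it assumes the AFINN lexicon is available locally (tidytext fetches it via the textdata package).

library(dplyr)
library(tidytext)

# Sketch: tokenize each review, join to the AFINN lexicon, and average
# the sentiment values of the words the lexicon recognizes.
review_examples %>%
  mutate(review_id = row_number()) %>%
  unnest_tokens(word, review_text) %>%
  inner_join(get_sentiments("afinn"), by = "word") %>%
  group_by(review_id) %>%
  summarise(mean_afinn = mean(value))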

Next we invoke the model to get the probability that each review is positive. We do this by piping our flattened and prepared data to get_prob(), which has no mandatory arguments and returns the log-odds and the corresponding probability for each review.

review_examples %>%
  flatten_stars(star_rating) %>%
  prepare_yelp(review_text) %>%
  get_prob()
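
The log-odds and the probability carry the same information: the probability is the logistic transform of the log-odds, which base R exposes as plogis(). For example:

# A log-odds of 1.2 corresponds to a probability of about 0.77.
plogis(1.2)
1 / (1 + exp(-1.2))  # the same transform, written out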

The last modeling step is to predict the rating, which we do with the suitably named predict_rating() function. By default it predicts a positive rating when the probability is greater than 0.5, but this threshold can be changed (a hand-rolled alternative is sketched after the next code block). In my testing, 0.5 gave the best results.

review_examples %>%
  flatten_stars(star_rating) %>%
  prepare_yelp(review_text) %>%
  get_prob() %>%
  predict_rating()
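
If you wanted a stricter cutoff than 0.5, the thresholding is just a comparison against the probability column. Rather than guess at predict_rating()'s argument name, this sketch applies a hypothetical 0.9 cutoff by hand, assuming get_prob() stores the probability in a column named prob (check your output's column names).

library(dplyr)

# Hand-rolled thresholding with a stricter (hypothetical) 0.9 cutoff.
# The column name prob is an assumption, not the package's documented API.
review_examples %>%
  flatten_stars(star_rating) %>%
  prepare_yelp(review_text) %>%
  get_prob() %>%
  mutate(strict_prediction = if_else(prob > 0.9, "positive", "negative"))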

Finally, if you like, you can compute the overall accuracy with get_accuracy(), which takes the name of the column containing the true ratings as its input.

review_examples %>%
  flatten_stars(star_rating) %>%
  prepare_yelp(review_text) %>%
  get_prob() %>%
  predict_rating() %>%
  get_accuracy(rating)
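
Under the hood, accuracy is simply the share of predictions that match the true labels, which you could also compute by hand along these lines. This sketch assumes (hypothetically) that predict_rating() stores its output in a column named prediction and the flattened truth in rating; adjust the names to match your actual output.

library(dplyr)

# Accuracy by hand: the proportion of rows where the prediction
# matches the flattened true rating. Column names are assumptions.
review_examples %>%
  flatten_stars(star_rating) %>%
  prepare_yelp(review_text) %>%
  get_prob() %>%
  predict_rating() %>%
  summarise(accuracy = mean(prediction == rating))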

As expected, the model is 75% accurate on this toy data set: it was fooled by the review with lots of negations.


