wine_reviews: Wine Reviews data
In m-clark/noiris: Data sets


Two data sets regarding wine reviews.

wine_reviews

The first is wine_reviews, a CC0 release of data obtained from Kaggle. I combined the two data sets available there and removed duplicates. Not every row has Twitter info, but all rows have most of the columns. The result is a data frame of more than 160,000 rows and 13 columns. The reviews can serve as an example for text analysis, specifically sentiment analysis.

country: The country that the wine is from
description: The reviewer's description of the wine
designation: The vineyard within the winery where the grapes that made the wine are from
points: The number of points Wine Enthusiast gave the wine, on a scale of 80-100
price: The cost for a bottle of the wine
province: The province or state that the wine is from
region_1: The wine-growing area in a province or state (e.g. Napa)
region_2: Sometimes a more specific region is given within a wine-growing area (e.g. Rutherford inside the Napa Valley)
taster_name: The reviewer's name
taster_twitter_handle: The reviewer's Twitter handle
title: The title of the wine review, which often contains the vintage if you're interested in extracting that feature.
variety: The type of grapes used to make the wine (e.g. Pinot Noir)
winery: The winery that made the wine
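
The title entry above notes that the vintage can often be extracted from the review title. A minimal sketch of one way to do that in base R follows. The titles are invented stand-ins rather than rows from wine_reviews, extract_vintage is a hypothetical helper, and the pattern assumes the vintage is the first standalone four-digit year between 1900 and 2099.

```r
# Toy stand-ins for wine_reviews$title; these titles are made up for illustration.
titles <- c(
  "Hypothetical Winery 2012 Reserve Pinot Noir (Napa Valley)",
  "Example Estate NV Brut Sparkling (California)",
  "Sample Cellars 1998 Cabernet Sauvignon (Rutherford)"
)

# Hypothetical helper: returns the first standalone year 1900-2099, or NA if none.
extract_vintage <- function(title) {
  m <- regexpr("\\b(19|20)[0-9]{2}\\b", title, perl = TRUE)
  out <- rep(NA_integer_, length(title))
  hit <- !is.na(m) & m > 0            # regexpr() returns -1 where nothing matched
  out[hit] <- as.integer(regmatches(title, m))
  out
}

vintage <- extract_vintage(titles)
```

With the real data, the same call would be extract_vintage(wine_reviews$title). A title whose winery name itself contains a year in that range would be picked up as the vintage, so the result is worth spot-checking.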

The second data set is wine_quality, obtained from the UCI repository, and the one that I use in my Introduction to Machine Learning document. It has nearly 6500 rows and 15 columns, mostly with physicochemical qualities of the wine. It can be used for standard regression using the quality score, or classification for color or 'good' quality. However, more than 90% of the scores are 5-7, so it can also serve as an ordinal regression example with appropriate collapsing.

color: Labels are 'red' and 'white'
white: A binary based on color. White == 1.
fixed_acidity: tartaric acid - g / dm^3
volatile_acidity: acetic acid - g / dm^3
citric_acid: g / dm^3
residual_sugar: g / dm^3
chlorides: sodium chloride - g / dm^3
free_sulfur_dioxide: mg / dm^3
total_sulfur_dioxide: mg / dm^3
density: g / cm^3
pH: pH level
sulphates: potassium sulphate - g / dm^3
alcohol: % by volume
quality: Technically 0 (very bad) - 10 (excellent), but actual scores are from 3 to 9
good: Quality scores of 6 or greater. Labels are 'Good' and 'Bad'
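
The derived columns and the collapsing mentioned above can be sketched as follows. The data frame is an invented stand-in that reuses the wine_quality column names. The cut points, which pool the sparse tails 3-4 and 8-9, are only one possible choice and are an assumption, as is the name quality_ord.

```r
# Toy stand-in using wine_quality column names; the values are invented.
toy <- data.frame(
  color   = c("red", "white", "white", "red", "white", "red"),
  quality = c(3, 5, 6, 7, 9, 4)
)

# Binary indicators following the definitions in the format list.
toy$white <- as.integer(toy$color == "white")
toy$good  <- factor(ifelse(toy$quality >= 6, "Good", "Bad"),
                    levels = c("Bad", "Good"))

# One possible collapsing for ordinal regression (an assumption, not the
# package's own): pool the rarely observed scores at each end.
toy$quality_ord <- cut(
  toy$quality,
  breaks = c(-Inf, 4, 5, 6, 7, Inf),
  labels = c("3-4", "5", "6", "7", "8-9"),
  ordered_result = TRUE
)
```

The resulting ordered factor can then be used as the response in an ordinal model, while good and white suit binary classification.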

wine_reviews: Kaggle

wine_quality: P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009. Data available from the UCI Machine Learning Repository.


library(noiris)
str(wine_reviews)
str(wine_quality)

m-clark/noiris documentation built on Sept. 9, 2019, 9:08 a.m.
