readUCI facilitates the process of importing data from the University
of California Irvine Machine Learning Repository. These datasets are
especially good for machine learning practice, and can be used to
create reproducible code examples. As of December 2019, there are 488
available datasets.
The dataset UCI_datasets lists all of the available datasets from the
repository along with some of their characteristics, including data
types and the common tasks performed with each dataset.
You can install the package through GitHub:
#devtools::install_github("emmal73/readUCI")
library(readUCI)
read_UCI
abalone <- read_UCI("abalone", "abalone.data")
head(abalone)
#> # A tibble: 6 x 9
#> X1 X2 X3 X4 X5 X6 X7 X8 X9
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 M 0.455 0.365 0.095 0.514 0.224 0.101 0.15 15
#> 2 M 0.35 0.265 0.09 0.226 0.0995 0.0485 0.07 7
#> 3 F 0.53 0.42 0.135 0.677 0.256 0.142 0.21 9
#> 4 M 0.44 0.365 0.125 0.516 0.216 0.114 0.155 10
#> 5 I 0.33 0.255 0.08 0.205 0.0895 0.0395 0.055 7
#> 6 I 0.425 0.3 0.095 0.352 0.141 0.0775 0.12 8
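The two arguments above are a dataset folder and a file name inside it. As a rough sketch of how such a pair could map to a download location, the helper below builds a URL in base R. The name uci_url and the base URL (the repository's classic file layout) are assumptions for illustration, not the package's actual implementation.

```r
# Hypothetical helper: builds a download URL from a dataset folder and a file name.
# Assumption: files live under the repository's classic "machine-learning-databases" path.
uci_base <- "https://archive.ics.uci.edu/ml/machine-learning-databases"

uci_url <- function(dataset, file) {
  paste(uci_base, dataset, file, sep = "/")
}

url <- uci_url("abalone", "abalone.data")
print(url)

# Reading the file needs network access, so it is left commented out.
# header = FALSE because the raw file carries no column names.
# abalone_raw <- read.csv(url, header = FALSE, stringsAsFactors = FALSE)
```

Note that base read.csv would label the columns V1 to V9 rather than X1 to X9 as in the tibble above.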
The imported data does not have variable names, as we can see by just
calling preview_names.
We can add those manually, based on the information provided in the
abalone.names file. This information can also be found on the
repository homepage for the Abalone dataset.
abalone_names <- read_UCI("abalone", "abalone.names")
abalone_names[58:79,]
#> # A tibble: 22 x 1
#> X1
#> <chr>
#> 1 7. Attribute information:
#> 2 " Given is the attribute name"
#> 3 " brief description. The number of rings is the value to predict: ei~
#> 4 " as a continuous value or as a classification problem."
#> 5 "\tName\t\tData Type\tMeas.\tDescription"
#> 6 "\t----\t\t---------\t-----\t-----------"
#> 7 "\tSex\t\tnominal\t\t\tM"
#> 8 "\tLength\t\tcontinuous\tmm\tLongest shell measurement"
#> 9 "\tDiameter\tcontinuous\tmm\tperpendicular to length"
#> 10 "\tHeight\t\tcontinuous\tmm\twith meat in shell"
#> # ... with 12 more rows
Here we will manually add the names.
names(abalone) <- c("Sex", "Length", "Diameter", "Height", "Whole Weight", "Shucked Weight", "Viscera Weight", "Shell Weight", "Rings")
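If typing the names by hand feels error-prone, the attribute lines in the .names file can sometimes be parsed instead. The sketch below uses four lines copied from the output above and takes the first tab-separated field of each. The variable names attr_lines and attr_names are illustrative, and the tab-separated layout is assumed from the Abalone file only; other .names files may be laid out differently.

```r
# Four attribute lines as they appear in abalone.names ("\t" is a tab character).
attr_lines <- c(
  "\tSex\t\tnominal\t\t\tM",
  "\tLength\t\tcontinuous\tmm\tLongest shell measurement",
  "\tDiameter\tcontinuous\tmm\tperpendicular to length",
  "\tHeight\t\tcontinuous\tmm\twith meat in shell"
)

# Strip the leading tab, then split on runs of tabs.
fields <- strsplit(trimws(attr_lines), "\t+")

# The attribute name is the first field on each line.
attr_names <- vapply(fields, function(x) x[1], character(1))
print(attr_names)
```

The same idea would extend to the remaining rows of the attribute table, provided they follow the same layout.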
Next we can run preview_names to clean up the column names and display
them.
names(abalone) <- preview_names(abalone)
#> [1] "sex" "length" "diameter" "height"
#> [5] "whole_weight" "shucked_weight" "viscera_weight" "shell_weight"
#> [9] "rings"
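The cleaning step shown above (lower case, spaces turned into underscores) can be approximated in base R. This is a minimal sketch of that kind of transformation, not the code behind preview_names; the function name clean_names_sketch is hypothetical.

```r
# Hypothetical stand-in for the name cleaning step, written in base R.
clean_names_sketch <- function(x) {
  x <- tolower(trimws(x))
  x <- gsub("[^a-z0-9]+", "_", x)  # runs of other characters become one underscore
  gsub("^_+|_+$", "", x)           # drop leading and trailing underscores
}

raw_names <- c("Sex", "Length", "Diameter", "Height", "Whole Weight",
               "Shucked Weight", "Viscera Weight", "Shell Weight", "Rings")

cleaned <- clean_names_sketch(raw_names)
print(cleaned)
```

For these nine names the sketch gives the same result as the output above; names with unusual characters may be handled differently by the package.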
For your convenience, five datasets from the UCI repository are included with the package, already imported and cleaned. These are:
adult: used to predict whether income is greater than $50K from
census data
flags: contains details of countries and their flags
las_vegas: contains features of online reviews of 21 hotels in Las
Vegas
tictactoe: used for binary classification, based on possible
tic-tac-toe configurations
wine: used to determine origin of wines based on chemical analysis