readUCI facilitates importing data from the University of California, Irvine (UCI) Machine Learning Repository. These datasets are well suited to machine learning practice and can be used to create reproducible code examples. As of December 2019, the repository lists 488 datasets.
The UCI_datasets dataset lists all of the datasets available in the repository, along with some of their characteristics, including data types and common tasks to perform with that data.
You can install the package through GitHub:
# devtools::install_github("emmal73/readUCI")
library(readUCI)
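read_UCI takes the name of a dataset and the name of a file within it. For example, to import the Abalone data: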
# Read the abalone.data file of the Abalone dataset
abalone <- read_UCI("abalone", "abalone.data")
head(abalone)
The imported data does not have variable names, as we can see by calling preview_names:
# preview_names(abalone)
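Why do the columns arrive unnamed? UCI .data files such as abalone.data are typically plain comma-separated text with no header row. The base R sketch below illustrates this under that assumption; it is not read_UCI's actual implementation. It parses two illustrative rows in that layout with read.csv(header = FALSE), and because there is no header, R falls back to the generic names V1, V2, and so on.

```r
# Two illustrative rows in a comma-separated, header-less layout like abalone.data
# (assumption: read_UCI may parse files differently internally).
raw_lines <- c(
  "M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15",
  "M,0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7"
)

# header = FALSE treats the first row as data, so R assigns V1, V2, ... as names
abalone_sketch <- read.csv(text = raw_lines, header = FALSE)

names(abalone_sketch)
#> [1] "V1" "V2" "V3" "V4" "V5" "V6" "V7" "V8" "V9"
```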
We can add them manually, based on the information in the abalone.names file, which is also available on the Abalone dataset's homepage in the repository.
abalone_names <- read_UCI("abalone", "abalone.names")
# Show the rows of abalone.names that describe the attributes
abalone_names[58:79, ]
Here we will manually add the names.
names(abalone) <- c("Sex", "Length", "Diameter", "Height", "Whole Weight", "Shucked Weight", "Viscera Weight", "Shell Weight", "Rings")
Next, we can run preview_names to clean up the column names and display them:
names(abalone) <- preview_names(abalone)
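The README does not spell out what "cleaning" the names involves. As a rough illustration only, here is a hypothetical base R helper, clean_names_sketch, that applies one common convention: lower case, with spaces and punctuation collapsed into underscores. This is an assumption about the kind of transformation meant, not preview_names' actual code.

```r
# Hypothetical helper, NOT readUCI's implementation: one plausible way to
# "clean" column names (lower case, non-alphanumeric runs become "_").
clean_names_sketch <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z0-9]+", "_", x)  # collapse spaces/punctuation into a single underscore
  x <- gsub("^_+|_+$", "", x)      # drop leading and trailing underscores
  make.unique(x, sep = "_")        # keep names distinct if cleaning creates duplicates
}

clean_names_sketch(c("Sex", "Whole Weight", "Shell Weight"))
# returns c("sex", "whole_weight", "shell_weight")
```

Names like whole_weight can be used with $ without backticks, which is the usual motivation for this style of cleanup.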
For convenience, the package includes five datasets from the UCI repository that have already been imported and cleaned:
- adult: used to predict whether income exceeds $50K from census data
- flags: contains details of countries and their flags
- las_vegas: contains features of online reviews of 21 hotels in Las Vegas
- tictactoe: used for binary classification of possible tic-tac-toe board configurations
- wine: used to determine the origin of wines from their chemical analysis