The goal of ucimlrepo is to download and import data sets directly into R from the UCI Machine Learning Repository.
[!IMPORTANT]
This package is an unofficial port of the Python ucimlrepo package.
[!NOTE]
Want to have datasets alongside a help documentation entry? Check out the {ucidata} R package! It provides a small selection of data sets from the UC Irvine Machine Learning Repository, each with a help entry.
You can install the development version of ucimlrepo from GitHub with:
# install.packages("remotes")
remotes::install_github("coatless-rpkg/ucimlrepo")
To use ucimlrepo, load the package using:
library(ucimlrepo)
With the package now loaded, we can download a dataset using the fetch_ucirepo() function or use the list_available_datasets() function to view a list of available datasets.
For example, to download the iris dataset, we can use:
# Fetch a dataset by name
iris_by_name <- fetch_ucirepo(name = "iris")
names(iris_by_name)
#> [1] "data" "metadata" "variables"
The returned object is a nested list. For example, we can extract the original data frame containing the iris dataset using:
iris_uci <- iris_by_name$data$original
head(iris_uci)
#>   sepal length sepal width petal length petal width       class
#> 1          5.1         3.5          1.4         0.2 Iris-setosa
#> 2          4.9         3.0          1.4         0.2 Iris-setosa
#> 3          4.7         3.2          1.3         0.2 Iris-setosa
#> 4          4.6         3.1          1.5         0.2 Iris-setosa
#> 5          5.0         3.6          1.4         0.2 Iris-setosa
#> 6          5.4         3.9          1.7         0.4 Iris-setosa
Alternatively, we could retrieve two data frames, one for the features and one for the targets:
iris_features <- iris_by_name$data$features
iris_targets <- iris_by_name$data$targets
We can then view the first few rows of each data frame:
head(iris_features)
#>   sepal length sepal width petal length petal width
#> 1          5.1         3.5          1.4         0.2
#> 2          4.9         3.0          1.4         0.2
#> 3          4.7         3.2          1.3         0.2
#> 4          4.6         3.1          1.5         0.2
#> 5          5.0         3.6          1.4         0.2
#> 6          5.4         3.9          1.7         0.4
head(iris_targets)
#>         class
#> 1 Iris-setosa
#> 2 Iris-setosa
#> 3 Iris-setosa
#> 4 Iris-setosa
#> 5 Iris-setosa
#> 6 Iris-setosa
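If a single data frame is needed again after working with the two pieces separately, the features and targets can be bound back together column-wise. The sketch below is only an illustration: it uses a small hand-built stand-in list (`iris_stub`, a hypothetical name) that mimics the `data$features` and `data$targets` layout shown above, so it runs without a network connection. With a real download you would substitute `iris_by_name`. It assumes both data frames keep the same row order.

```r
# Hypothetical stand-in mimicking the structure returned by fetch_ucirepo();
# the values are the first three rows of the output shown above.
iris_stub <- list(
  data = list(
    features = data.frame(
      `sepal length` = c(5.1, 4.9, 4.7),
      `sepal width`  = c(3.5, 3.0, 3.2),
      `petal length` = c(1.4, 1.4, 1.3),
      `petal width`  = c(0.2, 0.2, 0.2),
      check.names = FALSE  # keep the column names with spaces as-is
    ),
    targets = data.frame(
      class = c("Iris-setosa", "Iris-setosa", "Iris-setosa")
    )
  )
)

features <- iris_stub$data$features
targets  <- iris_stub$data$targets

# Guard against mismatched row counts before binding column-wise.
stopifnot(nrow(features) == nrow(targets))
iris_combined <- cbind(features, targets)

names(iris_combined)
```

Because the column names contain spaces, individual columns need backticks or string indexing, for example ``iris_combined$`sepal length` `` or `iris_combined[["sepal length"]]`.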
Alternatively, you can fetch a dataset directly by its ID, which you can find with list_available_datasets() or by looking up the dataset on the UCI ML Repo website:
# Fetch a dataset by id
iris_by_id <- fetch_ucirepo(id = 53)
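Since fetching depends on a network request, a script may want to continue when a download fails rather than stop. The following is a minimal sketch of one way to do that, not part of ucimlrepo itself: `safe_fetch()` and its `fetcher` argument are hypothetical names introduced here, and the sketch assumes that a failed fetch surfaces as an ordinary R error.

```r
# Hypothetical helper (not provided by ucimlrepo): returns NULL instead of
# stopping when the fetch signals an error.
# `fetcher` is injectable so the helper can be exercised offline.
safe_fetch <- function(..., fetcher = NULL) {
  if (is.null(fetcher)) {
    # Only resolved when no stub is supplied, so the package is needed
    # solely for real downloads.
    fetcher <- ucimlrepo::fetch_ucirepo
  }
  tryCatch(
    fetcher(...),
    error = function(e) {
      message("Could not fetch dataset: ", conditionMessage(e))
      NULL
    }
  )
}

# Offline demonstration with a stub that always fails.
failing_stub <- function(...) stop("no network")
result <- safe_fetch(id = 53, fetcher = failing_stub)
is.null(result)
```

With the package installed and a connection available, `safe_fetch(id = 53)` would pass its arguments straight through to `fetch_ucirepo()`.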
We can also view a list of data sets available for download using the list_available_datasets() function:
# List available datasets
list_available_datasets()
[!NOTE]
Not all 600+ datasets on UCI ML Repo are available for download using the package. The current list of available datasets can be viewed here.
If you would like to see a specific dataset added, please submit a comment on an issue ticket in the upstream repository.