In jlmelville/snedata: SNE Simulation Dataset Functions

download_fashion_mnist

R Documentation

Download Fashion-MNIST

Description

Download the Fashion-MNIST database of images of fashion products.

Usage

download_fashion_mnist(base_url = fashion_mnist_url, verbose = FALSE)

Arguments

`base_url`	Base URL at which the files are located.
`verbose`	If `TRUE`, then download progress will be logged as a message.

Format

A data frame with 786 variables:

px1, px2, px3 ... px784: Integer pixel value, from 0 (white) to 255 (black).
Label: The fashion item represented by the image, in the range 0-9.
Description: The name of the fashion item associated with the Label.

Pixels are organized row-wise. The Label variable is stored as a factor. The labels correspond to:

0: T-shirt/top
1: Trouser
2: Pullover
3: Dress
4: Coat
5: Sandal
6: Shirt
7: Sneaker
8: Bag
9: Ankle boot

These names are also stored in the Description factor.
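The correspondence between Label codes and Description names can be illustrated with a small, self-contained sketch. It builds a toy data frame with only a Label column (the `toy` object and its three labels are made up for illustration, not taken from the real data) and derives a Description factor from it. This shows one way to express the mapping; it is not necessarily how the package itself builds the column.

```r
# Class names for Label codes 0-9, in the order listed above
fashion_labels <- c(
  "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
  "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"
)

# Hypothetical toy data: three items with made-up labels (no pixel columns)
toy <- data.frame(Label = factor(c(0, 9, 5), levels = 0:9))

# The factor levels are "0".."9", so the underlying integer codes are 1..10;
# indexing the name vector by those codes maps each Label to its name
toy$Description <- factor(
  fashion_labels[as.integer(toy$Label)],
  levels = fashion_labels
)
toy
```

Using the level order of `fashion_labels` keeps the Description levels in label order rather than alphabetical order.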

There are 70,000 items in the data set. The first 60,000 are the training set, as found in the train-images-idx3-ubyte.gz file. The remaining 10,000 are the test set, from the t10k-images-idx3-ubyte.gz file.

Items in the dataset can be visualized with the show_mnist_digit function.
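Because pixels are stored row-wise, a single item can be reshaped into a 28 x 28 image matrix by filling it row by row. The sketch below assumes only the px1 to px784 column layout described above; the `row_to_matrix` helper and the toy data are hypothetical and are not meant to reproduce what show_mnist_digit does internally.

```r
# Hypothetical helper (not part of snedata): turn one row of px1..px784
# into a 28 x 28 matrix, filling row by row because pixels are stored row-wise
row_to_matrix <- function(df, i) {
  px <- unlist(df[i, paste0("px", 1:784)], use.names = FALSE)
  matrix(px, nrow = 28, ncol = 28, byrow = TRUE)
}

# Toy data frame with one item: pixel k has value (k - 1) %% 256
toy <- as.data.frame(matrix((0:783) %% 256, nrow = 1))
names(toy) <- paste0("px", 1:784)

m <- row_to_matrix(toy, 1)
m[1, 1:3]  # first three pixels of the top row
m[2, 1]    # px29, the first pixel of the second row

# image() draws m[i, j] at x = i, y = j, so transpose and reverse the rows
# to put the top image row at the top. The gray palette maps 0 to white and
# 255 to black, matching the pixel convention described above.
image(t(m[28:1, ]), col = gray((255:0) / 255), zlim = c(0, 255), axes = FALSE)
```

Fixing `zlim = c(0, 255)` keeps the gray scale consistent across items whose own pixel ranges differ.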

For more information see https://github.com/zalandoresearch/fashion-mnist.

Details

Downloads the image and label files for the training and test sets and converts them into a single data frame. Fashion-MNIST is intended as a drop-in replacement for the MNIST digits dataset that is more useful for benchmarking machine learning algorithms, because it is harder to classify.
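The conversion step can be sketched in base R, assuming the gzipped files use the IDX layout of the original MNIST files: an image file starts with four big-endian 32-bit integers (magic number 2051, item count, rows, columns) followed by unsigned pixel bytes, and a label file starts with two (magic number 2049, item count) followed by one unsigned byte per label. The helpers `read_idx_images` and `read_idx_labels` are hypothetical, not snedata's internal code. To stay self-contained, the sketch writes a tiny synthetic file pair to temporary paths and reads it back rather than downloading anything.

```r
# Hypothetical helpers (not snedata's internal code) for gzipped IDX files
read_idx_images <- function(path) {
  con <- gzfile(path, "rb")
  on.exit(close(con))
  # Header: magic number (2051), item count, rows, columns as big-endian int32
  hdr <- readBin(con, "integer", n = 4, size = 4, endian = "big")
  stopifnot(hdr[1] == 2051)
  n <- hdr[2]
  npx <- hdr[3] * hdr[4]
  # Pixels follow as unsigned bytes, one image after another, each row-wise
  px <- readBin(con, "integer", n = n * npx, size = 1, signed = FALSE)
  m <- matrix(px, nrow = n, ncol = npx, byrow = TRUE)
  colnames(m) <- paste0("px", seq_len(npx))
  m
}

read_idx_labels <- function(path) {
  con <- gzfile(path, "rb")
  on.exit(close(con))
  # Header: magic number (2049) and item count as big-endian int32
  hdr <- readBin(con, "integer", n = 2, size = 4, endian = "big")
  stopifnot(hdr[1] == 2049)
  readBin(con, "integer", n = hdr[2], size = 1, signed = FALSE)
}

# Synthetic image file: 2 images of 2 x 2 pixels
img_path <- tempfile(fileext = ".gz")
con <- gzfile(img_path, "wb")
writeBin(c(2051L, 2L, 2L, 2L), con, size = 4, endian = "big")
writeBin(as.raw(c(0, 64, 128, 255, 1, 2, 3, 4)), con)
close(con)

# Synthetic label file: labels 9 and 0
lbl_path <- tempfile(fileext = ".gz")
con <- gzfile(lbl_path, "wb")
writeBin(c(2049L, 2L), con, size = 4, endian = "big")
writeBin(as.raw(c(9, 0)), con)
close(con)

images <- read_idx_images(img_path)
labels <- read_idx_labels(lbl_path)
df <- data.frame(images, Label = factor(labels, levels = 0:9))
df
```

With the real files, the same two readers would be applied to the training and test files and the two data frames stacked with `rbind()`, training set first, to give the 70,000-row layout described under Format.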

Value

Data frame containing Fashion-MNIST.

Note

Originally based on a function by Brendan O'Connor.

References

Xiao, H., Rasul, K., & Vollgraf, R. (2017). Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms. arXiv preprint arXiv:1708.07747. https://github.com/zalandoresearch/fashion-mnist/

Examples

## Not run: 
# download the data set
fashion <- download_fashion_mnist()

# first 60,000 instances are the training set
fashion_train <- head(fashion, 60000)
# the remaining 10,000 are the test set
fashion_test <- tail(fashion, 10000)

# PCA on 1000 randomly sampled examples
fashion_r1000 <- fashion[sample(nrow(fashion), 1000), ]
pca <- prcomp(fashion_r1000[, 1:784], retx = TRUE, rank. = 2)
# plot the scores of the first two components
plot(pca$x[, 1:2], type = "n")
text(pca$x[, 1:2],
  labels = fashion_r1000$Label,
  col = rainbow(length(levels(fashion$Label)))[fashion_r1000$Label]
)

## End(Not run)

jlmelville/snedata documentation built on March 5, 2025, 12:22 p.m.