getDataset: Load example datasets


View source: R/data.R

Description

Download and load one of the example datasets for the package: covertype or mnist. These datasets are required for the vignettes in the package. The code generating these datasets is available at https://github.com/jbaker92/sgmcmc-data.

Usage

getDataset(dataset)

Arguments

dataset

string which determines the dataset to load: either "covertype" or "mnist".

Value

Returns the desired dataset. The next two sections give more details about each dataset.

covertype

The samples in this dataset correspond to 30×30m patches of forest in the US, collected for the task of predicting each patch’s cover type, i.e. the dominant species of tree. We use the LIBSVM version of the dataset, which converts the original multiclass problem into a binary one.

format: A matrix with 581012 rows and 55 columns. The first column contains the classification labels; the remaining 54 columns are the explanatory variables.
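
As a minimal sketch of how this layout can be split into labels and features, the snippet below builds a small synthetic matrix with the same column arrangement (labels first, then 54 explanatory variables). The object names `toy`, `y` and `X` are illustrative, and the 0/1 label coding is an assumption made for the example, not a statement about the downloaded data.

```r
# Synthetic stand-in for covertype: 5 rows, 55 columns.
# Column 1 holds placeholder binary labels (0/1 coding is assumed here);
# columns 2 to 55 hold 54 placeholder explanatory variables.
set.seed(1)
toy <- cbind(
  sample(c(0, 1), 5, replace = TRUE),
  matrix(rnorm(5 * 54), nrow = 5, ncol = 54)
)

# Split into the label vector and the feature matrix
y <- toy[, 1]
X <- toy[, -1, drop = FALSE]

dim(X)
length(y)
```

The same indexing should apply to the matrix returned by getDataset("covertype"), given the format described above.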

source: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html

mnist

The MNIST dataset consists of images of handwritten digits from 0 to 9. Each image is 28x28 pixels and can be interpreted as a matrix of numbers giving the value at each pixel. These 28x28 matrices are then flattened into vectors of length 784. Each image has an associated label identifying the digit it shows. This label is encoded as a vector of length 10, where element i is 1 if the digit is i-1 and 0 otherwise. The dataset is split into two parts: 55,000 training points and 10,000 test points.
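
The encoding described above can be illustrated with a short base R sketch. The helper `oneHot` is hypothetical and not part of the package; the row-by-row flattening order is also an assumption, since the order used when building the dataset is not stated here.

```r
# Hypothetical helper (not part of sgmcmc): encode a digit 0-9 as a
# length-10 vector where element i is 1 if the digit is i - 1.
oneHot <- function(digit) {
  v <- numeric(10)
  v[digit + 1] <- 1
  v
}

label <- oneHot(3)

# Recover the digit from its encoding
digit <- which.max(label) - 1

# Flatten a 28x28 image (a zero placeholder here) into a vector of length 784.
# t() makes as.vector() read the matrix row by row (assumed order).
image <- matrix(0, nrow = 28, ncol = 28)
pixels <- as.vector(t(image))
length(pixels)
```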

format: A list with two elements train and test.

source: http://yann.lecun.com/exdb/mnist/

Examples

## Not run: 
# Download the covertype dataset
covertype = getDataset("covertype")
# Download the mnist dataset
mnist = getDataset("mnist")

## End(Not run)

sgmcmc documentation built on Oct. 30, 2019, 11:39 a.m.