slb.load.datasets: Load Datasets from Available Repositories


View source: R/load.R

Description

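A function to load one or more benchmark datasets from the available repositories (PMLB, UCI, MNIST), optionally filtered by repository, dataset name, and task type.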

Usage

slb.load.datasets(repositories = NULL, datasets = NULL, tasks = NULL,
  clean.invalid = TRUE, clean.ohe = FALSE, verbose = TRUE, ...)

Arguments

repositories

the name of the repository, or repositories, to query for datasets. Defaults to NULL, which returns matching datasets from all repositories.

  • NULL Load datasets from all repositories matching other queries.

  • "pmlb" Load datasets from the Penn Machine-Learning Benchmarks.

  • "uci" Load datasets from the University of California - Irvine Machine Learning Repository.

  • "mnist" Load datasets from the MNIST dataset.

  • c("repo1", "repo2", ...) Load data from the indicated repositories.

datasets

the name of the dataset, or datasets, to load. Defaults to NULL.

  • NULL Load every dataset matching the other query arguments, regardless of name.

  • 'datasetid' Load the dataset with the given id, provided it matches the other query arguments.

  • c("datasetid1", "datasetid2", ...) Load data from the indicated datasets.

tasks

the type of the task, either "classification" or "regression". Defaults to NULL.

  • NULL Return all datasets matching the desired query.

  • 'classification' Load all classification datasets matching the desired query.

  • 'regression' Load all regression datasets matching the desired query.

  • c("taskid1", "taskid2", ...) Load data for the indicated tasks.

clean.invalid

whether to remove samples with invalid entries. Defaults to TRUE.

  • TRUE Remove samples that have features with NaN or non-finite entries.

  • FALSE Do not remove samples that have features with NaN or non-finite entries.
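
As a rough illustration of what removing invalid samples means, the base R sketch below drops every row that contains a NaN, NA, or infinite entry. The helper drop.invalid.rows is hypothetical and is not part of slb; the package's own cleaning may differ in detail.

```r
# Hypothetical helper, not part of slb: drop samples (rows) that contain
# any NaN, NA, or infinite feature value, and drop the matching responses.
drop.invalid.rows <- function(X, Y) {
  # is.finite() is FALSE for NA, NaN, Inf, and -Inf
  keep <- apply(X, 1, function(row) all(is.finite(row)))
  list(X = X[keep, , drop = FALSE], Y = Y[keep])
}

# Four samples in two dimensions; rows 2 and 3 contain invalid entries.
X <- matrix(c(1, 2, NaN, 4, 5, Inf, 7, 8), nrow = 4, ncol = 2)
Y <- c(0, 1, 0, 1)
cleaned <- drop.invalid.rows(X, Y)
print(dim(cleaned$X))
```

Filtering X and Y with the same logical index keeps each remaining sample aligned with its response.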

clean.ohe

options for whether to one-hot-encode columns. Defaults to FALSE.

  • clean.ohe < 1 Converts columns with fewer than clean.ohe * n unique values to one-hot encoding, where n is the number of samples.

  • is.integer(clean.ohe) Converts columns with fewer than clean.ohe unique values to one-hot encoding.

  • FALSE Do not one-hot-encode any columns.
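
The threshold rule above can be sketched in base R as follows. This is a minimal illustration under the assumption that the threshold is the value of clean.ohe itself (scaled by the number of samples n when it is below 1); the helper ohe.candidates is hypothetical and is not part of slb.

```r
# Hypothetical helper, not part of slb: flag the columns that would be
# one-hot encoded under the threshold rule described above.
ohe.candidates <- function(X, clean.ohe) {
  if (identical(clean.ohe, FALSE)) {
    return(rep(FALSE, ncol(X)))
  }
  n <- nrow(X)
  # A fractional threshold is scaled by the number of samples (assumption).
  limit <- if (clean.ohe < 1) clean.ohe * n else clean.ohe
  n.unique <- apply(X, 2, function(col) length(unique(col)))
  n.unique < limit
}

# Ten samples: column 1 has 2 unique values, column 2 has 10.
X <- matrix(c(rep(c(1, 2), 5), (1:10) / 10), ncol = 2)
frac.flags <- ohe.candidates(X, 0.5)  # limit is 0.5 * 10 = 5 unique values
int.flags <- ohe.candidates(X, 3L)    # limit is 3 unique values
none.flags <- ohe.candidates(X, FALSE)
```

Under either threshold only the first column qualifies, since the second column has a distinct value in every row.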

verbose

whether to print messages to the console if a repository or dataset is being ignored. Defaults to TRUE.

...

trailing arguments.

Value

A list of lists, where each element is a named list for a particular benchmark dataset, containing at least the following:

X

[n, d] array with the n samples in d dimensions.

Y

[n] vector or [n, r] array with responses for each of the n samples.
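
To show how a result of this shape might be consumed, the sketch below builds a mock list with the documented X and Y entries and summarises each dataset. The dataset names toy.a and toy.b and their values are made up for illustration; a real call would key the list by the loaded dataset names.

```r
# Mock return value with the documented shape; names and values are
# invented for illustration and do not come from any repository.
result <- list(
  toy.a = list(X = matrix(rnorm(20), nrow = 10, ncol = 2), Y = rep(c(0, 1), 5)),
  toy.b = list(X = matrix(rnorm(15), nrow = 5, ncol = 3), Y = rnorm(5))
)

# Summarise each dataset: number of samples n and number of dimensions d.
shapes <- t(sapply(result, function(ds) c(n = nrow(ds$X), d = ncol(ds$X))))
print(shapes)

# Check that every dataset has one response per sample.
aligned <- sapply(result, function(ds) NROW(ds$Y) == nrow(ds$X))
```

NROW is used for the responses so that the check works whether Y is an [n] vector or an [n, r] array.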

Author(s)

Eric Bridgeford

Examples

library(slb)
# request one specific dataset from the pmlb repository
test <- slb.load.datasets(repositories="pmlb", datasets="ESL", clean.invalid=FALSE, clean.ohe=FALSE)
length(test$ESL$Y) == 488 # a known example from the pmlb dataset

# request all of the pmlb classification datasets
## Not run: 
test <- slb.load.datasets(repositories="pmlb", tasks="classification")
length(test) == 166  # validates that we loaded all of the classification datasets from pmlb

## End(Not run)

neurodata/slbR documentation built on May 22, 2019, 2:41 p.m.