slb.load.datasets: Load Datasets from Available Repositories
In neurodata/slbR: Statistical Learning Benchmarks


View source: R/load.R

A function to load specified datasets from the available benchmark repositories (PMLB, UCI, and MNIST).

slb.load.datasets(repositories = NULL, datasets = NULL, tasks = NULL, clean.invalid = TRUE, clean.ohe = FALSE, verbose = TRUE, ...)

`repositories`	the name(s) of the repositories to query for datasets. Defaults to `NULL`, which returns matching datasets from all repositories.
	`NULL`	Load datasets from all repositories that match the other queries.
	`"pmlb"`	Load datasets from the Penn Machine-Learning Benchmarks.
	`"uci"`	Load datasets from the University of California - Irvine Machine Learning Repository.
	`"mnist"`	Load datasets from the MNIST dataset.
	`c("repo1", "repo2", ...)`	Load datasets from the indicated repositories.
`datasets`	the name(s) of the datasets to load. Defaults to `NULL`.
	`NULL`	Load all datasets that match the other queries, without filtering by name.
	`'datasetid'`	Load the dataset with the given id, if it matches the other queries.
	`c("datasetid1", "datasetid2", ...)`	Load the indicated datasets.
`tasks`	the type of task, either `"classification"` or `"regression"`. Defaults to `NULL`.
	`NULL`	Load all datasets that match the other queries, regardless of task.
	`'classification'`	Load all classification datasets that match the other queries.
	`'regression'`	Load all regression datasets that match the other queries.
	`c("taskid1", "taskid2", ...)`	Load datasets for the indicated tasks.
`clean.invalid`	whether to remove samples with invalid entries. Defaults to `TRUE`.
	`TRUE`	Remove samples that have features with `NaN` or non-finite entries.
	`FALSE`	Do not remove samples that have features with `NaN` or non-finite entries.
`clean.ohe`	whether, and under what threshold, to one-hot encode columns. Defaults to `FALSE`. Below, `thr` is the value passed as `clean.ohe` and `n` is the number of samples.
	`clean.ohe < 1`	One-hot encode columns with fewer than `thr*n` unique values.
	`is.integer(clean.ohe)`	One-hot encode columns with fewer than `thr` unique values.
	`FALSE`	Do not one-hot encode any columns.
`verbose`	whether to print messages to the console when a repository or dataset is being ignored. Defaults to `TRUE`.
`...`	trailing args.
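
The `clean.invalid` and `clean.ohe` options above can be pictured with a small base-R sketch. The helpers `drop_invalid` and `ohe_columns` are hypothetical names introduced here for illustration; they are not part of slb, and the package's actual implementation may differ. The sketch assumes that a fractional `clean.ohe` is scaled by the number of samples and an integer one is used as an absolute count, as the argument description suggests.

```r
# Hypothetical helpers illustrating the documented semantics; NOT slb internals.

# Keep only samples (rows) whose features are all finite
# (is.finite() is FALSE for NA, NaN, Inf and -Inf).
drop_invalid <- function(X, Y) {
  keep <- apply(X, 1, function(row) all(is.finite(row)))
  list(X = X[keep, , drop = FALSE], Y = Y[keep])
}

# Return the indices of columns that would be one-hot encoded.
ohe_columns <- function(X, clean.ohe = FALSE) {
  # FALSE must be checked first, since FALSE < 1 is TRUE in R
  if (isFALSE(clean.ohe)) return(integer(0))
  n <- nrow(X)
  # a fraction (< 1) is scaled by n; otherwise the value is an absolute count
  cutoff <- if (clean.ohe < 1) clean.ohe * n else clean.ohe
  n_unique <- apply(X, 2, function(col) length(unique(col)))
  unname(which(n_unique < cutoff))
}

# Made-up toy data: rows 3 and 6 contain invalid entries.
X <- cbind(c(1, 2, NaN, 4, 5, 6),
           c(0, 1, 0, 1, 0, Inf),
           c(10, 20, 30, 40, 50, 60))
Y <- c("a", "b", "a", "b", "a", "b")

cleaned <- drop_invalid(X, Y)
ohe_fraction <- ohe_columns(cleaned$X, clean.ohe = 0.75)  # cutoff is 0.75 * 4 = 3
ohe_count <- ohe_columns(cleaned$X, clean.ohe = 3L)       # cutoff is 3
ohe_none <- ohe_columns(cleaned$X, clean.ohe = FALSE)
```

In this toy case only the second column has fewer than three unique values after cleaning, so both the fractional and the integer settings select it.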

A list of lists, where each element is a key-worded list for a particular benchmark dataset, containing at least the following:

`X`	`[n, d]` array with the `n` samples in `d` dimensions.
`Y`	`[n]` vector or `[n, r]` array with responses for each of the `n` samples.
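
As a rough illustration of working with this structure, the sketch below builds a stand-in for the return value by hand and summarizes it. The dataset names and values are made up, and the real return value may carry additional elements beyond `X` and `Y`.

```r
# Hand-built stand-in for a return value; names and numbers are made up.
set.seed(1)
result <- list(
  toy_classification = list(X = matrix(rnorm(20), nrow = 10, ncol = 2),
                            Y = rep(c(0, 1), times = 5)),
  toy_regression = list(X = matrix(rnorm(15), nrow = 5, ncol = 3),
                        Y = rnorm(5))
)

# Number of samples (n) and dimensions (d) per dataset, one row per dataset.
shapes <- t(sapply(result, function(ds) c(n = nrow(ds$X), d = ncol(ds$X))))

# Y may be an [n] vector or an [n, r] array; count its samples either way.
n_responses <- function(Y) if (is.null(dim(Y))) length(Y) else nrow(Y)

# Check that every dataset has one response per sample.
consistent <- sapply(result, function(ds) n_responses(ds$Y) == nrow(ds$X))
```

The same `sapply` pattern applies to a list returned by `slb.load.datasets`, since each element is itself a named list containing `X` and `Y`.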

Eric Bridgeford

library(slb)
# request one specific dataset from the pmlb repository
test <- slb.load.datasets(repositories="pmlb", datasets="ESL", clean.invalid=FALSE, clean.ohe=FALSE)
length(test$ESL$Y) == 488 # a known example from the pmlb dataset

# request all of the pmlb classification datasets
## Not run: 
test <- slb.load.datasets(repositories="pmlb", tasks="classification")
length(test) == 166  # validates that we loaded all of the classification datasets from pmlb

## End(Not run)