listOMLDataSets: List the first 5000 OpenML data sets.

listOMLDataSets    R Documentation

List the first 5000 OpenML data sets.

Description

The returned data.frame contains the data set id “data.id”, the “status” (“active”, “deactivated”, “in_preparation”) and descriptive data qualities.

Note that by default only active data sets are returned (because “status = "active"”). Furthermore, the default “limit = 5000” caps the number of results at 5000.

Usage

listOMLDataSets(
  number.of.instances = NULL,
  number.of.features = NULL,
  number.of.classes = NULL,
  number.of.missing.values = NULL,
  tag = NULL,
  data.name = NULL,
  limit = 5000,
  offset = NULL,
  status = "active",
  verbosity = NULL
)

Arguments

number.of.instances

[numeric(1) | numeric(2)]
If not NULL, subsets the entries to those matching the given value or, if a vector of length 2 is passed, the given range.

number.of.features

[numeric(1) | numeric(2)]
If not NULL, subsets the entries to those matching the given value or, if a vector of length 2 is passed, the given range.

number.of.classes

[numeric(1) | numeric(2)]
If not NULL, subsets the entries to those matching the given value or, if a vector of length 2 is passed, the given range.

number.of.missing.values

[numeric(1) | numeric(2)]
If not NULL, subsets the entries to those matching the given value or, if a vector of length 2 is passed, the given range.

tag

[character]
If not NULL, only entries with the corresponding tags are listed.

data.name

[character(1)]
Name of the data set.

limit

[numeric(1)]
Optional. The maximum number of entries to return. Without specifying offset, it returns the first 'limit' entries. Setting limit = NULL returns all available entries.

offset

[numeric(1)]
Optional. The position to start from. Offsets are zero-based indices into the list of results and do not refer to data set IDs. The offset is ignored when no limit is given.

status

[character]
Subsets the results according to the status. Possible values are {"active", "deactivated", "in_preparation", "all"}. Default is "active".

verbosity

[integer(1)]
Print verbose output on console? Possible values are:
0: normal output,
1: info output,
2: debug output.
Default is set via setOMLConfig.
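
The limit and offset arguments can be combined to fetch the listing page by page. The following is a minimal sketch, not part of the OpenML package: the helper name fetchDataSetPages and its parameters page.size, max.pages and list.fun are hypothetical. It assumes that a page shorter than page.size (or an empty or NULL result) means the listing is exhausted; how the server behaves for an offset past the end is not documented here.

```r
# Hypothetical helper (not part of OpenML): collect the listing in pages.
# `list.fun` defaults to OpenML::listOMLDataSets; the default is evaluated
# lazily, so a stub can be passed to try the sketch without a server.
fetchDataSetPages = function(page.size = 1000, max.pages = 10,
                             list.fun = OpenML::listOMLDataSets, ...) {
  pages = list()
  for (i in seq_len(max.pages)) {
    # offset is a zero-based position in the result list, not a data.id
    offset = (i - 1) * page.size
    page = list.fun(limit = page.size, offset = offset, ...)
    # assumption: NULL or zero rows means there is nothing left to fetch
    if (is.null(page) || nrow(page) == 0) break
    pages[[i]] = page
    # a short page is taken as the last page
    if (nrow(page) < page.size) break
  }
  # returns NULL if no page was collected
  do.call(rbind, pages)
}
```

With the package loaded, fetchDataSetPages(page.size = 1000, status = "active") would request at most 10 pages of 1000 entries each; further arguments are passed through to the listing function.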

Value

[data.frame].

Note

This function is memoised, i.e., if you call it twice in a running R session, the first call queries the server and stores the results in memory, while the second and all subsequent calls return the cached results from the first call. You can reset the cache by calling forget on the function manually.
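
A cache reset followed by a fresh query might look like the sketch below. It assumes that the forget mentioned above is memoise::forget from the memoise package; the wrapper refreshDataSetListing and its parameters list.fun and forget.fun are hypothetical names introduced only for illustration.

```r
# Hypothetical wrapper (not part of OpenML): clear the cache, then re-query.
# Assumption: the cache is managed by the memoise package, so that
# memoise::forget() is the function that resets it.
refreshDataSetListing = function(list.fun = OpenML::listOMLDataSets,
                                 forget.fun = memoise::forget, ...) {
  forget.fun(list.fun)  # drop all cached results of the listing function
  list.fun(...)         # this call is no longer answered from the cache
}
```

Both defaults are evaluated lazily, so stubs can be supplied to try the wrapper without the packages or a server connection.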

See Also

Other listing functions: chunkOMLlist(), listOMLDataSetQualities(), listOMLEstimationProcedures(), listOMLEvaluationMeasures(), listOMLFlows(), listOMLRuns(), listOMLSetup(), listOMLStudies(), listOMLTaskTypes(), listOMLTasks()

Other data set-related functions: OMLDataSetDescription, OMLDataSet, convertMlrTaskToOMLDataSet(), convertOMLDataSetToMlr(), deleteOMLObject(), getOMLDataSet(), tagOMLObject(), uploadOMLDataSet()

Examples

## Not run:
datasets = listOMLDataSets()
tail(datasets)
## End(Not run)
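
The filter arguments described under Arguments accept either a single value or a vector of length 2 giving a range. The sketch below combines both forms; the wrapper name listSmallBinaryDataSets and the parameter list.fun are hypothetical, and the chosen filter values are arbitrary illustrations.

```r
# Hypothetical wrapper (not part of OpenML): list data sets with 100 to 1000
# instances, exactly 2 classes and no missing values.
# `list.fun` is injectable so the sketch can be tried offline with a stub.
listSmallBinaryDataSets = function(list.fun = OpenML::listOMLDataSets, ...) {
  list.fun(
    number.of.instances = c(100, 1000),  # length 2: a range
    number.of.classes = 2,               # length 1: a single value
    number.of.missing.values = 0,        # length 1: a single value
    ...                                  # e.g. limit, tag or status
  )
}
```

With the package loaded and a server connection available, listSmallBinaryDataSets(limit = 50) would forward these filters, plus the limit, to listOMLDataSets.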

OpenML documentation built on Oct. 20, 2022, 1:07 a.m.