nearest_datasets-methods: Select nearest datasets given input 'x'.
In pmlbr: Interface to the Penn Machine Learning Benchmarks Data Repository

nearest_datasets

R Documentation

Select nearest datasets given input 'x'.

Description

If 'x' is a data.frame, its dataset characteristics are computed first. If 'x' is a character string naming a PMLB dataset, the precomputed statistics/characteristics in 'summary_stats' are used instead.

Usage

nearest_datasets(x, ...)

## Default S3 method:
nearest_datasets(x, ...)

## S3 method for class 'character'
nearest_datasets(
  x,
  n_neighbors = 5,
  dimensions = c("n_instances", "n_features"),
  target_name = "target",
  ...
)

## S3 method for class 'data.frame'
nearest_datasets(
  x,
  y = NULL,
  n_neighbors = 5,
  dimensions = c("n_instances", "n_features"),
  task = c("classification", "regression"),
  target_name = "target",
  ...
)

Arguments

`x`	Character string naming a PMLB dataset, or a data.frame of n_samples x n_features (or n_features + 1 with a target column).
`...`	Further arguments passed to each method.
`n_neighbors`	Integer. The number of dataset names to return as neighbors.
`dimensions`	Character vector specifying the dataset characteristics to include in the similarity calculation. Dimensions must correspond to numeric columns of [all_summary_stats.tsv](https://github.com/EpistasisLab/pmlb/blob/master/pmlb/all_summarystats.tsv). If 'all', uses all numeric columns. The default shown in Usage is c("n_instances", "n_features").
`target_name`	Character string specifying column of target/dependent variable.
`y`	Vector of target column. Required when 'x' does not contain the target column.
`task`	Character string specifying classification or regression for summary stat generation.

Value

Character vector of the names of the datasets most similar to 'x', most similar dataset first.
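
To illustrate what a similarity calculation over dataset characteristics can look like, here is a minimal base R sketch. It is not the package's implementation: the helper `nearest_sketch`, the toy `stats` table, and the choice of Euclidean distance on standardized columns are all assumptions made for illustration.

```r
# Toy table of dataset characteristics (made-up values, not real PMLB statistics).
stats <- data.frame(
  dataset     = c("a", "b", "c", "d"),
  n_instances = c(100, 120, 5000, 90),
  n_features  = c(4, 5, 50, 4),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of pmlbr): rank datasets by Euclidean distance
# to `query` after standardizing each selected characteristic.
nearest_sketch <- function(query, stats, n_neighbors = 5,
                           dimensions = c("n_instances", "n_features")) {
  ref <- as.matrix(stats[, dimensions, drop = FALSE])
  centers <- colMeans(ref)
  scales <- apply(ref, 2, sd)
  scales[scales == 0] <- 1  # avoid dividing by zero for constant columns

  # Put the reference table and the query on the same standardized scale.
  ref_z <- sweep(sweep(ref, 2, centers, "-"), 2, scales, "/")
  q_z <- (unlist(query[dimensions]) - centers) / scales

  # Distance from every reference dataset to the query, closest first.
  d <- sqrt(rowSums(sweep(ref_z, 2, q_z, "-")^2))
  stats$dataset[order(d)][seq_len(min(n_neighbors, nrow(stats)))]
}

query <- c(n_instances = 92, n_features = 4)
nearest_sketch(query, stats, n_neighbors = 2)
# "d" "a"
```

Standardizing first keeps a large-valued characteristic such as n_instances from dominating the distance; whether pmlbr scales its dimensions this way is not stated on this page.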

Examples

if (interactive()){
  nearest_datasets('penguins')
  nearest_datasets(fetch_data('penguins'))
}
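
The arguments documented above can be combined as in the following sketch. The helper `split_target` is hypothetical (not part of pmlbr). The example also assumes that the data frame returned by fetch_data('penguins') stores its target in a column named "target" and that 'penguins' is a classification task; check both before relying on it.

```r
# Hypothetical helper: separate the target column from the feature columns.
split_target <- function(df, target_name = "target") {
  stopifnot(target_name %in% names(df))
  list(
    x = df[, names(df) != target_name, drop = FALSE],
    y = df[[target_name]]
  )
}

# Guarded so that nothing is downloaded in non-interactive sessions.
if (interactive() && requireNamespace("pmlbr", quietly = TRUE)) {
  library(pmlbr)

  # Character method: ask for fewer neighbors than the default of 5.
  nearest_datasets("penguins", n_neighbors = 3)

  # data.frame method: pass the features and the target separately via 'y'.
  parts <- split_target(fetch_data("penguins"))
  nearest_datasets(parts$x, y = parts$y, task = "classification")
}
```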

pmlbr documentation built on April 12, 2025, 1:59 a.m.
