nearest_datasets-methods: Select nearest datasets given input 'x'.

nearest_datasets    R Documentation

Select nearest datasets given input 'x'.

Description

If 'x' is a data.frame, computes its dataset characteristics. If 'x' is a character string naming a PMLB dataset, uses the precomputed dataset statistics/characteristics in 'summary_stats'.

Usage

nearest_datasets(x, ...)

## Default S3 method:
nearest_datasets(x, ...)

## S3 method for class 'character'
nearest_datasets(
  x,
  n_neighbors = 5,
  dimensions = c("n_instances", "n_features"),
  target_name = "target",
  ...
)

## S3 method for class 'data.frame'
nearest_datasets(
  x,
  y = NULL,
  n_neighbors = 5,
  dimensions = c("n_instances", "n_features"),
  task = c("classification", "regression"),
  target_name = "target",
  ...
)

Arguments

x

Character string giving the name of a PMLB dataset, or a data.frame of n_samples x n_features (or n_features + 1 when it includes a target column).

...

Further arguments passed to each method.

n_neighbors

Integer. The number of dataset names to return as neighbors.

dimensions

Character vector specifying the dataset characteristics to include in the similarity calculation. Dimensions must correspond to numeric columns of [all_summary_stats.tsv](https://github.com/EpistasisLab/pmlb/blob/master/pmlb/all_summary_stats.tsv). Defaults to c("n_instances", "n_features"), as shown in Usage. If 'all', uses all numeric columns.

target_name

Character string specifying the name of the target/dependent variable column.

y

Vector of target column. Required when 'x' does not contain the target column.

task

Character string specifying classification or regression for summary stat generation.
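
To make the role of 'dimensions' and 'n_neighbors' more concrete, the following base R sketch ranks rows of a small summary table by distance to a query. It is only an illustration: the table 'stats', the vector 'query', and the choice of standardized Euclidean distance are assumptions made here, not the package's confirmed implementation.

```r
# Hypothetical summary table: one row per dataset, numeric characteristics as columns.
stats <- data.frame(
  dataset     = c("a", "b", "c", "d"),
  n_instances = c(100, 120, 5000, 90),
  n_features  = c(4, 5, 50, 4)
)

# Hypothetical characteristics of the input dataset.
query <- c(n_instances = 98, n_features = 4)

# Characteristics used in the comparison (plays the role of 'dimensions').
dims <- c("n_instances", "n_features")
n_neighbors <- 2

# Standardize each dimension so that large-valued columns do not dominate.
m <- as.matrix(stats[, dims])
centers <- colMeans(m)
scales <- apply(m, 2, sd)
m_scaled <- scale(m, center = centers, scale = scales)
q_scaled <- (query[dims] - centers) / scales

# Euclidean distance from the query to every dataset (assumed metric).
d <- sqrt(rowSums(sweep(m_scaled, 2, q_scaled)^2))

# Dataset names ordered from most to least similar, truncated to n_neighbors.
neighbors <- stats$dataset[order(d)][seq_len(n_neighbors)]
print(neighbors)
```

Restricting 'dims' to fewer columns changes which characteristics influence the ranking, which is the effect the 'dimensions' argument is described as having.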

Value

Character vector of the names of the datasets most similar to 'x', ordered with the most similar dataset first.

Examples

nearest_datasets('penguins')
nearest_datasets(fetch_data('penguins'))
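
The calls below sketch how the remaining arguments from Usage might be combined. They assume that the pmlbr package is installed, that network access to PMLB is available, and that the data.frame returned by fetch_data('penguins') stores the outcome in a column named "target" (the default 'target_name'). The variable names 'penguins' and 'features' are introduced here for illustration only.

```r
library(pmlbr)

# Character method: look up neighbors by name, restricting the result to 3 names.
nearest_datasets("penguins", n_neighbors = 3)

# data.frame method: the target column is part of 'x' (assumed to be "target").
penguins <- fetch_data("penguins")
nearest_datasets(penguins, task = "classification", target_name = "target")

# data.frame method: features in 'x', target passed separately through 'y'.
features <- penguins[, setdiff(names(penguins), "target"), drop = FALSE]
nearest_datasets(
  features,
  y = penguins$target,
  n_neighbors = 3,
  dimensions = c("n_instances", "n_features"),
  task = "classification"
)
```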


pmlbr documentation built on Sept. 29, 2023, 1:06 a.m.