MinSizeClassification: Algorithm for minimum sample size estimation in ML

View source: R/MinSizeClassification.R

MinSizeClassification    R Documentation

Algorithm for minimum sample size estimation in ML

Description

This algorithm estimates the minimum sample size needed for a specified classification algorithm to reach a given minimum value of a metric ("Accuracy" or "Kappa"). It may be used for binary or multi-class classification.

Usage

MinSizeClassification(
  X,
  Y,
  algorithm,
  metric,
  thr_metric,
  formula_rhs = "(1-a)-b*X^c",
  start_parameters,
  p_vec = 1:99/100,
  cv_number = 5,
  show_plot = T,
  n.cores = 1
)

Arguments

X

Dataset with the variable(s) to use for prediction.

Y

Vector with the response variable, i.e., the single variable to classify.

algorithm

Classification algorithm to use: one of "knn" (k-Nearest Neighbors), "glm" (logistic regression), "nb" (Naive Bayes), or "rf" (Random Forest). For binary classification, any of these may be used. For multi-class classification, only "knn" or "rf" may be used.

metric

"Accuracy" or "Kappa". Classification metric to use to determine minimum sample size.
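
As an illustration of the two metrics, the sketch below computes accuracy and Cohen's kappa from a confusion matrix in base R. The toy vectors `truth` and `pred` are made up for this example, and the package itself may compute these metrics through another library.

```r
# Toy labels (hypothetical data, for illustration only)
truth <- factor(c("a", "a", "a", "b", "b", "b", "b", "a"), levels = c("a", "b"))
pred  <- factor(c("a", "a", "b", "b", "b", "a", "b", "a"), levels = c("a", "b"))

# Confusion matrix: rows are predictions, columns are true classes
tab <- table(pred, truth)
n   <- sum(tab)

# Accuracy: proportion of observations on the diagonal
accuracy <- sum(diag(tab)) / n

# Agreement expected by chance, from the row and column totals
expected <- sum(rowSums(tab) * colSums(tab)) / n^2

# Cohen's kappa: agreement beyond chance, rescaled so that 1 is perfect
kappa_stat <- (accuracy - expected) / (1 - expected)

print(c(Accuracy = accuracy, Kappa = kappa_stat))
```

Kappa corrects accuracy for chance agreement, so it is usually the more conservative choice when classes are imbalanced.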

thr_metric

Threshold value of the metric for which the minimum sample size is calculated.
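
To show how a threshold can map to a sample size, the sketch below inverts the default right-hand side from Usage, (1-a)-b*X^c, for X. The helper `min_size_for_threshold` and the parameter values are hypothetical, and the sketch assumes b > 0 and c < 0, so that the curve rises towards the asymptote 1 - a. It is not necessarily how the package solves for the sample size.

```r
# Hypothetical helper: solve thr = (1 - a) - b * X^c for X.
# Assumes b > 0 and c < 0 (metric increases with sample size X).
min_size_for_threshold <- function(thr, a, b, c) {
  asymptote <- 1 - a
  # A threshold at or above the asymptote is never reached
  if (thr >= asymptote) {
    return(NA_real_)
  }
  ((asymptote - thr) / b)^(1 / c)
}

# Illustrative parameter values (assumptions, not package output)
n_min <- min_size_for_threshold(thr = 0.8, a = 0.1, b = 2, c = -0.5)
print(n_min)

# A threshold above the asymptote (0.9 here) cannot be attained
print(min_size_for_threshold(thr = 0.95, a = 0.1, b = 2, c = -0.5))
```

In practice the result would be rounded up to a whole number of samples.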

p_vec

Vector of proportions of the training data to sample. The code loops over these proportions; for each one it takes a sample of the corresponding size and calculates the metric. Default is 1:99/100, i.e., 0.01, 0.02, ..., 0.99.
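
The loop over proportions can be sketched as follows. This is a simplified illustration in base R, not the package's implementation: it uses simulated data, a single held-out test set instead of cross-validation, and logistic regression via `glm`. All object names are made up for the example.

```r
set.seed(1)

# Simulated binary classification data (hypothetical)
n_total <- 600
x1 <- rnorm(n_total)
x2 <- rnorm(n_total)
y  <- factor(ifelse(x1 + x2 + rnorm(n_total, sd = 0.5) > 0, "yes", "no"))
dat <- data.frame(x1, x2, y)

# Hold out a fixed test set; the remaining rows are the training pool
test_idx <- sample(n_total, 200)
train <- dat[-test_idx, ]
test  <- dat[test_idx, ]

# A short vector of proportions, standing in for the default 1:99/100
p_vec <- c(0.1, 0.25, 0.5, 0.75, 0.99)

results <- do.call(rbind, lapply(p_vec, function(p) {
  # Sample size implied by this proportion of the training pool
  n_sub <- max(2, floor(p * nrow(train)))
  sub   <- train[sample(nrow(train), n_sub), ]

  # Fit on the subsample and score on the held-out test set
  fit  <- glm(y ~ x1 + x2, data = sub, family = binomial)
  prob <- predict(fit, newdata = test, type = "response")
  pred <- ifelse(prob > 0.5, "yes", "no")

  data.frame(p = p, n = n_sub, accuracy = mean(pred == test$y))
}))

print(results)
```

Each row pairs a sample size with the metric obtained at that size, which is the kind of table a learning curve can then be fitted to.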

n.cores

Number of logical CPU threads to use. Default is 1.

Value

A list containing the minimum sample size, its confidence interval (CI), a data frame of sample sizes with the corresponding metric values, and the fitted parameters of the metric curve.
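
As a rough illustration of where fit parameters could come from, the sketch below fits the default right-hand side from Usage, (1-a)-b*X^c, to simulated (sample size, metric) pairs with `stats::nls`. The data, the true parameter values, and the starting values are all assumptions for this example; whether the package uses `nls` internally is not stated here.

```r
set.seed(42)

# Simulated learning-curve data (hypothetical): metric rises with sample size
n_obs  <- round(exp(seq(log(20), log(2000), length.out = 40)))
metric <- (1 - 0.1) - 2 * n_obs^(-0.5) + rnorm(length(n_obs), sd = 0.005)
curve_data <- data.frame(X = n_obs, metric = metric)

# Nonlinear least squares fit; starting values play the role of start_parameters
fit <- nls(metric ~ (1 - a) - b * X^c,
           data  = curve_data,
           start = list(a = 0.1, b = 1.5, c = -0.4))

# Estimated a, b, c
params <- coef(fit)
print(params)

# Fitted metric at a sample size of 500
pred_500 <- predict(fit, newdata = data.frame(X = 500))
print(pred_500)
```

The fitted parameters can then be used to find the sample size at which the curve crosses a chosen threshold.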


gpcastelo/ML-minimum-sample-size documentation built on June 3, 2023, 8:48 p.m.