npdrLearnerCV: 'npdrLearnerCV'

View source: R/npdrLearner.R

npdrLearnerCV R Documentation

npdrLearnerCV

Description

Tune a hyperparameter to maximize the cross-validation accuracy of a k-nearest-neighbors (kNN) classifier.

You can tune k itself, but keep in mind that the selected k may be underestimated, because each training fold is smaller than the original sample. When another hyperparameter is tuned, k is fixed at the npdr theoretical value, which adapts to the training-fold size (to do: make this more flexible with knn alpha).

You can also tune the number of ICA or PCA components, because these components define the space in which nearest neighbors are calculated.

To do: create a function interface that lets users supply their own sapply_hyper_fn.
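The tuning loop described above can be illustrated with a minimal base-R sketch that selects k by cross-validated accuracy of a majority-vote kNN classifier using Manhattan distance. This is not the npdrLearner implementation: the helpers `knn_predict_manhattan` and `cv_tune_k` are hypothetical, ties in the vote go to the first class level, and the returned `cv_table` is simplified to the `hyp` and `means` columns used in the plotting example below.

```r
# Hypothetical helper (not part of npdr): majority-vote kNN with Manhattan distance.
knn_predict_manhattan <- function(train_x, train_y, test_x, k) {
  train_x <- as.matrix(train_x)
  test_x <- as.matrix(test_x)
  apply(test_x, 1, function(row) {
    # Manhattan (L1) distance from this test instance to every training instance
    d <- colSums(abs(t(train_x) - row))
    nn <- order(d)[seq_len(min(k, length(d)))]
    votes <- table(train_y[nn])
    # Majority vote; ties go to the first level in table order
    names(votes)[which.max(votes)]
  })
}

# Hypothetical helper: choose k by num_folds-fold cross-validation accuracy.
cv_tune_k <- function(x, label = "class", tune_grid = c(1, 3, 5),
                      num_folds = 5, seed = 1) {
  set.seed(seed)  # makes the random fold assignment reproducible
  y <- as.factor(x[[label]])
  feats <- x[, setdiff(names(x), label), drop = FALSE]
  folds <- sample(rep(seq_len(num_folds), length.out = nrow(x)))
  # acc: rows = folds, columns = tune_grid values
  acc <- sapply(tune_grid, function(k) {
    sapply(seq_len(num_folds), function(f) {
      test <- folds == f
      pred <- knn_predict_manhattan(feats[!test, , drop = FALSE], y[!test],
                                    feats[test, , drop = FALSE], k)
      mean(pred == as.character(y[test]))
    })
  })
  means <- colMeans(acc)
  best <- which.max(means)
  list(best_param = tune_grid[best], best_acc = means[best],
       cv_table = data.frame(hyp = tune_grid, means = means))
}

# Toy data: two well-separated classes with two attributes
set.seed(42)
toy <- data.frame(a = c(rnorm(30, 0), rnorm(30, 4)),
                  b = c(rnorm(30, 0), rnorm(30, 4)),
                  class = rep(c("ctrl", "case"), each = 30))
res <- cv_tune_k(toy, tune_grid = c(1, 5, 9), num_folds = 5)
res$best_param
```

Because each fold trains on only (num_folds - 1)/num_folds of the instances, the k selected this way reflects the smaller training size, which is the source of the underestimation noted above.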

Usage

npdrLearnerCV(
  x,
  label = "class",
  tune_grid = seq(10, 90, 10),
  dist_metric = "manhattan",
  tune_type = "knn",
  num_folds = 5,
  verbose = F
)

Arguments

x

m x (p+1) data frame of m instances (rows), with 1 class column and p attribute columns

label

name of the class column (default: "class")

tune_grid

vector of hyperparameter values to evaluate; the value with the highest cross-validation accuracy is selected

dist_metric

distance metric used to compute the distance matrix between instances (default: "manhattan"; other options include "euclidean" and, for GWAS data, "allele-sharing-manhattan").
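For the two standard metrics, base R's `dist()` shows what the names mean on a toy matrix of three instances. This sketch does not cover "allele-sharing-manhattan", whose exact definition in npdr is not given on this page.

```r
# Three instances (rows) with two attributes each
m <- matrix(c(0, 0,
              1, 2,
              3, 4), ncol = 2, byrow = TRUE)
manhattan <- as.matrix(dist(m, method = "manhattan"))  # sum of absolute differences
euclidean <- as.matrix(dist(m, method = "euclidean"))  # square root of sum of squares
manhattan[1, 3]  # |0 - 3| + |0 - 4| = 7
euclidean[1, 3]  # sqrt(3^2 + 4^2) = 5
```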

tune_type

type of hyperparameter to optimize (default: "knn", the number of nearest neighbors k). Other options are "ica" (number of ICA components for an ICA space transformation) and "pca" (number of components for a PCA transformation).
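With "pca", the tuned value is the number of components that span the neighbor space. A minimal sketch of that idea with base R's `prcomp()`: estimate the rotation on the training fold only, then project both the training and test instances onto the first n_comp components, where distances and neighbors would then be computed. The helper `pca_space` is hypothetical, and the centering and scaling choices are assumptions rather than npdrLearner internals.

```r
# Hypothetical helper: project train and test instances onto the first
# n_comp principal components estimated from the training data only.
pca_space <- function(train_x, test_x, n_comp) {
  pc <- prcomp(train_x, center = TRUE, scale. = TRUE)  # assumed centering/scaling
  n_comp <- min(n_comp, ncol(pc$x))
  list(train = pc$x[, seq_len(n_comp), drop = FALSE],
       test  = predict(pc, newdata = test_x)[, seq_len(n_comp), drop = FALSE])
}

set.seed(7)
train <- matrix(rnorm(40 * 5), ncol = 5)
test  <- matrix(rnorm(10 * 5), ncol = 5)
sp <- pca_space(train, test, n_comp = 2)
dim(sp$train)  # 40 2
dim(sp$test)   # 10 2
```

Fitting the rotation inside each training fold keeps the test fold out of the transformation, so the cross-validation accuracy is not optimistically biased by the projection.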

num_folds

number of cross-validation folds used for tuning (default: 5)

Value

A list containing the best hyperparameter value (best_param), its highest mean cross-validation accuracy (best_acc), and a table of accuracies by fold and hyperparameter value (cv_table).

Examples

library(flexclust) # needed for the npdrLearner kNN classifier
library(fastICA)   # needed only if tuning an ICA transformation
# dats: data frame with a "class" column and attribute columns (see argument x)
cv.out <- npdrLearnerCV(x = dats, label = "class",
                        tune_grid = seq(20, 90, 5),  # tuning knn
                        dist_metric = "manhattan",
                        tune_type = "knn",
                        num_folds = 5, verbose = TRUE)
cv.out$best_param
plot(cv.out$cv_table$hyp, cv.out$cv_table$means,
     xlab = "hyperparameter", ylab = "accuracy",
     main = "CV hyperparameter tuning", type = "l")
text(cv.out$best_param, cv.out$best_acc,
     paste("max.loc =", cv.out$best_param))

insilico/glmSTIR documentation built on July 7, 2023, 12:29 a.m.