npdrLearnerCV: 'npdrLearnerCV'
In insilico/glmSTIR: Nearest-neighbor Projected-Distance Regression

npdrLearnerCV

R Documentation

`npdrLearnerCV`

Description

Tune a hyperparameter to maximize the cross-validation accuracy of a k-nearest-neighbors classifier. You can tune k, but keep in mind that the resulting k might be underestimated because each training set is smaller than the original sample. When other hyperparameters are optimized, k is fixed to the npdr theoretical value that adapts to the training size (todo: make more flexible with knn alpha). You can also tune the number of ICA or PCA components, since the components define the space in which nearest neighbors are calculated. todo: create function interface that allows user to create their own sapply_hyper_fn.
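
The general idea of tuning k by cross-validation can be sketched in base R. This is a minimal illustration, not the implementation used by `npdrLearnerCV`: the helper names `knn_predict` and `cv_accuracy` are hypothetical, `iris` stands in for the user's data, and a simple majority vote is assumed for classification.

```r
# Minimal base-R sketch: choose k for a kNN classifier by k-fold cross-validation.
# knn_predict and cv_accuracy are hypothetical helpers, not part of npdr.
set.seed(1)
dat <- iris                       # stand-in data; "Species" plays the role of the class column
label <- "Species"
X <- as.matrix(dat[, setdiff(names(dat), label)])
y <- dat[[label]]

# Manhattan distances between all pairs of instances, computed once
D <- as.matrix(dist(X, method = "manhattan"))

# Predict each test instance by majority vote among its k nearest training instances
knn_predict <- function(D, y, train_idx, test_idx, k) {
  sapply(test_idx, function(i) {
    nn <- train_idx[order(D[i, train_idx])[seq_len(k)]]
    names(which.max(table(y[nn])))
  })
}

# Mean accuracy over all folds for one value of k
cv_accuracy <- function(k, folds) {
  mean(sapply(unique(folds), function(f) {
    test_idx <- which(folds == f)
    train_idx <- which(folds != f)
    pred <- knn_predict(D, y, train_idx, test_idx, k)
    mean(pred == as.character(y[test_idx]))
  }))
}

num_folds <- 5
folds <- sample(rep(seq_len(num_folds), length.out = nrow(X)))  # random fold assignment
tune_grid <- seq(5, 45, 10)                                     # candidate values of k
acc <- sapply(tune_grid, cv_accuracy, folds = folds)
best_k <- tune_grid[which.max(acc)]
```

Each training set here holds only 4/5 of the instances, which is why a k chosen this way can be smaller than the k that would suit the full sample.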

Usage

npdrLearnerCV(
  x,
  label = "class",
  tune_grid = seq(10, 90, 10),
  dist_metric = "manhattan",
  tune_type = "knn",
  num_folds = 5,
  verbose = F
)

Arguments

`x`	m x (p+1) data frame of m instances, with 1 class column and p attributes
`label`	name of the class column (default: `"class"`)
`tune_grid`	vector of hyperparameter values to test for best classification accuracy
`dist_metric`	for distance matrix between instances (default: `"manhattan"`, others include `"euclidean"`, and for GWAS `"allele-sharing-manhattan"`).
`tune_type`	type of hyperparameter to optimize. default: `"knn"`, others include `"ica"` (number of ICA components for the ICA space transformation) and `"pca"` (number of components for the PCA transformation).
`num_folds`	number of cross-validation folds for tuning
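
As a sketch of the input layout described for `x`, the following builds a simulated data frame with one row per instance, p attribute columns and one class column. The object name `dats`, the attribute names, and the choice of a factor for the class column are assumptions for illustration; the documentation does not state which class type the function expects.

```r
# Simulated input in the layout described for `x` (all names are illustrative).
set.seed(42)
m <- 100   # number of instances
p <- 10    # number of attributes

# p numeric attribute columns
attrs <- as.data.frame(matrix(rnorm(m * p), nrow = m, ncol = p))
names(attrs) <- paste0("attr", seq_len(p))

# One class column named "class", matching the default of the `label` argument
dats <- data.frame(attrs, class = factor(rep(c(0, 1), each = m / 2)))

# Shift the first attribute for class 1 so the classes are separable to some degree
is_case <- dats$class == 1
dats$attr1[is_case] <- dats$attr1[is_case] + 1

dim(dats)   # m rows; p attribute columns plus the class column
```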

Value

list containing best hyperparameter (best_param), its highest accuracy (best_acc), and a table of fold and parameter accuracies (cv_table)

Examples

library(flexclust) # need for npdrLearner knn classifier
library(fastICA)   # need if tuning ica transformation
# dats: data frame of instances with a "class" column (not defined in this example)
cv.out <- npdrLearnerCV(x=dats, label="class", 
              tune_grid = seq(20,90,5),   # tuning knn
              dist_metric = "manhattan",
              tune_type = "knn",
              num_folds=5, verbose=TRUE)
cv.out$best_param
plot(cv.out$cv_table$hyp,cv.out$cv_table$means,
        xlab="hyperparameter", ylab="accuracy", 
        main="CV hyperparameter tuning", type="l")
text(cv.out$best_param,cv.out$best_acc,paste("max.loc =",cv.out$best_param))
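# Or you can tune the number of nearest neighbors (knn).
# To tune ICA or PCA components instead, set tune_type to "ica" or "pca"
# and pass a grid of component counts as tune_grid.
cv.out <- npdrLearnerCV(x=dats, label="class", 
              tune_grid = seq(20,90,5),   # tuning knn
              dist_metric = "manhattan",
              tune_type = "knn",
              num_folds=5, verbose=TRUE)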

insilico/glmSTIR documentation built on July 7, 2023, 12:29 a.m.