run.train.knn.classifier: run.train.knn.classifier - Train KNN classifier


View source: R/run.train.knn.classifier.R

Description

run.train.knn.classifier - Train KNN classifier

Usage

run.train.knn.classifier(dat, use.cols, label.col, min.num.neighbours, max.num.neighbours, evaluation.metric, num.folds, num.repeats)

Arguments

dat

NO DEFAULT. data.frame. A data.frame containing cells (rows) and features/markers (columns), used to train the k-nearest neighbour (KNN) classifier.

use.cols

NO DEFAULT. Character vector. Names of the columns (features/markers) in dat used to train the k-nearest neighbour (KNN) classifier.

label.col

NO DEFAULT. Character. Name of the column in dat that contains the population label of each cell.

min.num.neighbours

DEFAULTS to 1. Numeric. Minimum number of nearest neighbours (k) to evaluate when training the KNN classifier.

max.num.neighbours

DEFAULTS to 1. Numeric. Maximum number of nearest neighbours (k) to evaluate when training the KNN classifier. With the defaults for min.num.neighbours and max.num.neighbours, only k = 1 is evaluated.

evaluation.metric

DEFAULTS to 'Accuracy'. Character. Metric used to evaluate classifier performance: accuracy (pass "Accuracy") or Cohen's kappa (pass "Kappa").
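To make the two options concrete, here is a small base-R sketch of how accuracy and Cohen's kappa can be computed from true and predicted population labels. The helper name classification_metrics is hypothetical and not part of Spectre; it only illustrates the standard definitions, where kappa corrects the observed agreement p_o for the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e).

```r
# Hypothetical helper, not part of Spectre: accuracy and Cohen's kappa
# from true and predicted population labels.
classification_metrics <- function(truth, predicted) {
  levs <- union(unique(as.character(truth)), unique(as.character(predicted)))
  # Confusion matrix with identical row (truth) and column (prediction) levels
  tab <- table(factor(truth, levels = levs), factor(predicted, levels = levs))
  n <- sum(tab)
  # Observed agreement: fraction of cells whose predicted label matches the truth
  p_observed <- sum(diag(tab)) / n
  # Chance agreement, from the truth and prediction marginal frequencies
  p_expected <- sum(rowSums(tab) * colSums(tab)) / n^2
  kappa <- (p_observed - p_expected) / (1 - p_expected)
  c(Accuracy = p_observed, Kappa = kappa)
}

truth     <- c("B", "B", "T", "T", "NK", "NK")
predicted <- c("B", "T", "T", "T", "NK", "B")
m <- classification_metrics(truth, predicted)
m  # Accuracy = 4/6 (about 0.667), Kappa = 0.5
```

Kappa is lower than accuracy here because part of the 4/6 agreement would be expected even from labels assigned at random with the same frequencies.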

num.folds

DEFAULTS to 10. Numeric. Number of chunks (folds) the training data is split into for cross-validation. In each round, one chunk is held out to test the trained classifier and the remaining chunks are used to train it.

num.repeats

DEFAULTS to 10. Numeric. Number of times the cross-validation is repeated for each number of neighbours. Each repeat splits the data into chunks differently, so different cells are used for testing and training.
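The interplay of num.folds and num.repeats can be sketched as follows. This is an illustrative base-R helper (make_repeated_folds is a hypothetical name, not a Spectre function) that shuffles cells into equally sized folds once per repeat; unlike many cross-validation tools, it does not stratify the folds by population.

```r
# Hypothetical helper, not part of Spectre: one fold id per row, drawn
# afresh for each repeat so every repeat splits the data differently.
make_repeated_folds <- function(n, num.folds = 10, num.repeats = 10) {
  lapply(seq_len(num.repeats), function(r) {
    # Balanced fold ids 1..num.folds, then shuffled (no stratification)
    sample(rep_len(seq_len(num.folds), n))
  })
}

set.seed(42)
folds <- make_repeated_folds(n = 25, num.folds = 5, num.repeats = 3)

# In repeat 1, fold 2 is the test set and the other four folds train the model
test_idx  <- which(folds[[1]] == 2)
train_idx <- which(folds[[1]] != 2)
```

With num.folds = 5 and num.repeats = 3, the classifier is trained 15 times for each number of neighbours, and each cell is used for testing exactly once per repeat.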

Details

Trains a k-nearest neighbour (KNN) classifier over a range of neighbour counts, starting at min.num.neighbours and increasing by 1 until max.num.neighbours is reached. For each number of neighbours, the accuracy of the classifier (or whichever metric is set in evaluation.metric) is computed. Before training, the data are normalised to the range 0 to 1 using min-max scaling. NOTE: training time grows with the size of the training data, so keep the training set to a manageable size.
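Min-max scaling maps each marker value x to (x - min(x)) / (max(x) - min(x)). The sketch below applies it to the columns in use.cols; min_max_scale is a hypothetical helper, and mapping constant columns to 0 is an assumption made here to avoid dividing by zero, not documented Spectre behaviour.

```r
# Hypothetical helper, not Spectre's code: min-max scale the columns in
# use.cols to [0, 1] with x' = (x - min(x)) / (max(x) - min(x)).
min_max_scale <- function(dat, use.cols) {
  scaled <- dat
  for (col in use.cols) {
    x <- dat[[col]]
    rng <- range(x, na.rm = TRUE)
    if (rng[2] > rng[1]) {
      scaled[[col]] <- (x - rng[1]) / (rng[2] - rng[1])
    } else {
      # Assumption: a constant column carries no information, so map it to 0
      scaled[[col]] <- rep(0, length(x))
    }
  }
  scaled
}

cells <- data.frame(CD3 = c(10, 20, 30), CD19 = c(5, 5, 5),
                    population = c("T cell", "T cell", "B cell"))
scaled <- min_max_scale(cells, use.cols = c("CD3", "CD19"))
scaled$CD3   # 0.0 0.5 1.0
scaled$CD19  # 0 0 0
```

Scaling matters for KNN because distances are computed across all markers at once; without it, a marker with a wide numeric range would dominate the neighbour search. When a trained classifier is later applied to new cells, the minimum and maximum would normally be taken from the training data rather than recomputed on the new data.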

Value

The performance of the classifier for each number of neighbours, together with a description of how training was performed: the resampling scheme (how the data were split into training and testing sets) and the sample sizes (number of data points used for training and testing). It also includes the recommended number of neighbours for the data and the reasoning behind that choice.
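The overall procedure (sweep k, cross-validate each value, report performance, recommend a k) can be sketched in base R as below. This is a minimal illustration under stated assumptions, not Spectre's implementation: knn_predict and tune_knn are hypothetical names, only accuracy is reported, scaling is done once on the full data set for simplicity, and the recommended k is assumed to be the one with the highest mean accuracy (smallest k on ties).

```r
# Hypothetical sketch, not Spectre's implementation: choose k for a KNN
# classifier by repeated k-fold cross-validation, using base R only.

# Predict a label for each row of test_x by majority vote of its k nearest
# training cells (Euclidean distance).
knn_predict <- function(train_x, train_y, test_x, k) {
  apply(test_x, 1, function(row) {
    d <- sqrt(colSums((t(train_x) - row)^2))
    nearest <- train_y[order(d)[seq_len(k)]]
    votes <- table(nearest)
    # which.max returns the first maximum, so ties go to the alphabetically first label
    names(votes)[which.max(votes)]
  })
}

tune_knn <- function(dat, use.cols, label.col, min.num.neighbours = 1,
                     max.num.neighbours = 1, num.folds = 10, num.repeats = 10) {
  x <- as.matrix(dat[, use.cols, drop = FALSE])
  # Min-max scale every column to [0, 1]; constant columns end up as 0
  rng <- apply(x, 2, range)
  x <- sweep(x, 2, rng[1, ])
  x <- sweep(x, 2, pmax(rng[2, ] - rng[1, ], .Machine$double.eps), "/")
  y <- as.character(dat[[label.col]])
  ks <- seq(min.num.neighbours, max.num.neighbours)

  # One random split into num.folds chunks per repeat, shared by every k
  fold_sets <- lapply(seq_len(num.repeats),
                      function(r) sample(rep_len(seq_len(num.folds), nrow(x))))

  accuracy <- sapply(ks, function(k) {
    fold_acc <- numeric(0)
    for (folds in fold_sets) {
      for (f in seq_len(num.folds)) {
        test <- folds == f
        pred <- knn_predict(x[!test, , drop = FALSE], y[!test],
                            x[test, , drop = FALSE], k)
        fold_acc <- c(fold_acc, mean(pred == y[test]))
      }
    }
    mean(fold_acc)  # average accuracy over all folds and repeats
  })

  results <- data.frame(k = ks, Accuracy = accuracy)
  # Assumed selection rule: highest mean accuracy, smallest k on ties
  list(results = results, best.k = ks[which.max(accuracy)])
}

# Toy data: two well-separated populations measured on two markers
set.seed(1)
cells <- data.frame(
  CD3 = c(rnorm(20, mean = 10), rnorm(20, mean = 0)),
  CD19 = c(rnorm(20, mean = 0), rnorm(20, mean = 10)),
  population = rep(c("T cell", "B cell"), each = 20)
)
fit <- tune_knn(cells, use.cols = c("CD3", "CD19"), label.col = "population",
                min.num.neighbours = 1, max.num.neighbours = 5,
                num.folds = 5, num.repeats = 2)
fit$results  # mean cross-validated accuracy for k = 1, ..., 5
fit$best.k
```

Using the same fold assignments for every k means differences in accuracy reflect the choice of k rather than a lucky or unlucky split. The sketch assumes the data set has at least num.folds rows, so that no fold is empty.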

Author(s)

Givanna Putri, ghar1821@uni.sydney.edu.au

References

https://sydneycytometry.org.au/spectre.


tomashhurst/Spectre documentation built on Dec. 23, 2021, 11:55 a.m.