train.knn.classifier: Train KNN classifier


View source: R/train.knn.classifier.R

Description

Train a k-nearest neighbour (KNN) classifier and evaluate it over a range of neighbour counts.

Usage

train.knn.classifier(dat, use.cols, label.col, min.num.neighbours, max.num.neighbours, evaluation.metric, num.folds, num.repeats)

Arguments

dat

NO DEFAULT. data.frame. A data frame of cells (rows) by features/markers (columns) used to train the k-nearest neighbour (KNN) classifier.

use.cols

NO DEFAULT. Vector of column names to use for training the k-nearest neighbour (KNN) classifier.

label.col

NO DEFAULT. Character. Name of the column representing the population name of the cells in dat.

min.num.neighbours

DEFAULTS to 1. Numeric. The minimum number of nearest neighbours used to train the k-nearest neighbour (KNN) classifier.

max.num.neighbours

DEFAULTS to 1. Numeric. The maximum number of nearest neighbours used to train the k-nearest neighbour (KNN) classifier.

method

DEFAULTS to random. Either random, which randomly shuffles the data and splits them into two halves, one for training and the other for testing, or CV (cross-validation), where the data are split into num.folds complementary portions, with num.folds-1 portions used for training and the remaining portion for testing.

num.folds

DEFAULTS to 10. Numeric. Number of chunks the training data is split into. The classifier holds out 1 chunk for testing (after it is trained) and uses the remaining chunks for training.

num.repeats

DEFAULTS to 1. Numeric. Number of times the training data will be split into num.folds chunks. If you set this to 3 and num.folds to 10, the data are split into 10 chunks, the classifier is trained on those chunks 10 times, and the entire procedure is repeated 3 times (each time different data will be in each chunk).

seed

DEFAULTS to 42. Seed used when splitting data into training and testing sets.
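The two splitting strategies described under method, num.folds and num.repeats can be pictured with a few lines of base R. This is only a sketch on toy row indices, not the package's own code; the variable names (toy, fold.ids and so on) are made up for illustration.

```r
# Illustrative only: toy data standing in for `dat`.
set.seed(42)                       # mirrors the documented default seed
n <- 20
toy <- data.frame(id = seq_len(n))

# "random" method: shuffle the rows, then cut them into two halves.
shuffled  <- sample(n)
train.idx <- shuffled[seq_len(n / 2)]
test.idx  <- shuffled[-seq_len(n / 2)]

# "CV" method: assign every row to one of num.folds folds, and redo the
# assignment num.repeats times with a fresh shuffle each time.
num.folds   <- 5
num.repeats <- 3
fold.ids <- replicate(
  num.repeats,
  sample(rep(seq_len(num.folds), length.out = n))
)                                  # n x num.repeats matrix of fold labels

# In repeat 1, fold 1 is held out for testing and the other folds train.
test.rows  <- which(fold.ids[, 1] == 1)
train.rows <- which(fold.ids[, 1] != 1)
```

Each column of fold.ids is one repeat, so a full run would loop over every column and, within it, hold out each fold in turn.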

Train a k-nearest neighbour (KNN) classifier. The classifier is trained once for each number of neighbours, starting at min.num.neighbours and increasing by 1 until max.num.neighbours is reached. For each number of neighbours, the accuracy (or another evaluation metric, as specified in evaluation.metric) of the classifier is computed. Before training, the data are normalised to the range 0 to 1 using min-max scaling. NOTE: the larger the training data, the longer it takes to train the classifier, so please be mindful of the training data size.
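As a rough illustration of the min-max scaling step, the sketch below rescales each feature column to the range 0 to 1 in base R. The helper min.max.scale is hypothetical (it is not a function exported by the package), the built-in iris data stands in for dat, and mapping a constant column to 0 is an assumption made here to avoid dividing by zero.

```r
# Hypothetical helper: rescale a numeric vector to [0, 1].
min.max.scale <- function(x) {
  rng <- range(x, na.rm = TRUE)
  # Assumption: a constant column carries no information, so map it to 0
  # rather than dividing by zero.
  if (rng[1] == rng[2]) return(x - rng[1])
  (x - rng[1]) / (rng[2] - rng[1])
}

# iris stands in for `dat`; these columns play the role of `use.cols`.
use.cols <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
scaled <- as.data.frame(lapply(iris[, use.cols], min.max.scale))

# Every scaled column now spans exactly 0 to 1.
sapply(scaled, range)
```

Scaling matters for KNN because distances are otherwise dominated by whichever marker has the largest numeric range.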

Value

The performance of the classifier for each number of neighbours, together with a description of how the training was performed: the sampling process used (how the data were split for training and testing) and the sample sizes (number of data points used for training and testing). Also included are the recommended number of neighbours for the data and the reasoning behind that recommendation.
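To give a feel for what a per-neighbour performance table and a recommended number of neighbours might look like, here is a minimal sketch of the sweep using knn() from the class package on the built-in iris data. It is not the function's actual implementation or return structure: the random half split, the accuracy metric, and the rule of picking the smallest k with the highest accuracy are all assumptions made for illustration.

```r
library(class)  # provides knn(); a recommended package shipped with R

set.seed(42)
use.cols  <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
label.col <- "Species"
min.num.neighbours <- 1
max.num.neighbours <- 10

# Min-max scale the features to [0, 1] before computing distances.
scaled <- as.data.frame(lapply(
  iris[, use.cols],
  function(x) (x - min(x)) / (max(x) - min(x))
))

# Random split: half of the rows for training, the rest for testing.
n <- nrow(iris)
train.idx <- sample(n, n / 2)

# Train and test once per candidate number of neighbours.
ks <- seq(min.num.neighbours, max.num.neighbours)
accuracy <- sapply(ks, function(k) {
  pred <- knn(train = scaled[train.idx, ],
              test  = scaled[-train.idx, ],
              cl    = iris[train.idx, label.col],
              k     = k)
  mean(pred == iris[-train.idx, label.col])
})

performance <- data.frame(k = ks, accuracy = accuracy)

# Assumed rule: recommend the smallest k that reaches the best accuracy.
best.k <- performance$k[which.max(performance$accuracy)]
print(performance)
```

With cross-validation the accuracy for each k would instead be averaged over all held-out folds and repeats.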

Author(s)

Givanna Putri, ghar1821@uni.sydney.edu.au

References

https://sydneycytometry.org.au/spectre.


sydneycytometry/Spectre documentation built on March 20, 2021, 2:15 a.m.