train.knn.classifier: Train KNN classifier
In ImmuneDynamics/Spectre: High-dimensional cytometry and imaging analysis

Description

Train a k-nearest neighbour (KNN) classifier on labelled cells.

Usage

train.knn.classifier(dat, use.cols, label.col, min.num.neighbours,
max.num.neighbours, evaluation.metric, num.folds, num.repeats)

Arguments

`dat`	NO DEFAULT. data.frame. A data frame of cells (rows) by features/markers (columns) used to train the k-nearest neighbour (KNN) classifier.
`use.cols`	NO DEFAULT. Vector of column names to use for training the KNN classifier.
`label.col`	NO DEFAULT. Character. Name of the column in dat that holds the population name of each cell.
`min.num.neighbours`	DEFAULTS to 1. Numeric. Minimum number of nearest neighbours used to train the KNN classifier.
`max.num.neighbours`	DEFAULTS to 1. Numeric. Maximum number of nearest neighbours used to train the KNN classifier.
`method`	DEFAULTS to random. Either random, which randomly shuffles the data and splits it into two halves, one for training and the other for testing, or CV (cross-validation), where the data is split into num.folds complementary portions, with num.folds-1 portions used for training and the remaining one for testing.
`num.folds`	DEFAULTS to 10. Numeric. Number of chunks the training data is split into. The classifier uses 1 chunk for testing (after it is trained) and the remaining chunks for training.
`num.repeats`	DEFAULTS to 1. Numeric. Number of times the training data is split into num.folds chunks. If you set this to 3 and num.folds to 10, the data is split into 10 chunks, the classifier is trained and tested 10 times (once per held-out chunk), and the entire procedure is repeated 3 times, each time with different data in each chunk.
`seed`	DEFAULTS to 42. Seed used when splitting data into training and testing sets.

Details

Train a k-nearest neighbour (KNN) classifier. The classifier is trained for each number of neighbours from min.num.neighbours to max.num.neighbours, in steps of 1. For each number of neighbours, the accuracy (or other evaluation metric as specified in evaluation.metric) of the classifier is computed. Data is normalised to the range 0 to 1 using min-max scaling before it is used to train the classifier. NOTE: the larger the training data, the longer it takes to train the classifier, so please be mindful of the training data size.
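
The procedure described here (min-max scaling, repeated fold splitting, and a sweep over the number of neighbours) can be illustrated with a minimal sketch. This is not Spectre's implementation: it assumes the `class` package's `knn()` function as the classifier, plain accuracy as the metric, and the built-in `iris` data as a stand-in for cytometry data. The helper names `min_max_scale` and `sweep_knn` are hypothetical.

```r
library(class)  # provides knn(); assumed here, not necessarily what Spectre uses

# Rescale every column to [0, 1]; constant columns are mapped to 0.
min_max_scale <- function(x) {
  as.data.frame(lapply(x, function(col) {
    rng <- range(col)
    if (rng[1] == rng[2]) rep(0, length(col)) else (col - rng[1]) / (rng[2] - rng[1])
  }))
}

# Mean cross-validated accuracy for each k in min_k..max_k (hypothetical helper).
sweep_knn <- function(dat, use_cols, label_col, min_k = 1, max_k = 5,
                      num_folds = 10, num_repeats = 1, seed = 42) {
  set.seed(seed)
  x <- min_max_scale(dat[, use_cols, drop = FALSE])
  y <- factor(dat[[label_col]])
  ks <- min_k:max_k
  acc <- matrix(NA_real_, nrow = num_repeats * num_folds, ncol = length(ks))
  row <- 0
  for (r in seq_len(num_repeats)) {
    # Reshuffle fold membership on every repeat.
    fold <- sample(rep(seq_len(num_folds), length.out = nrow(x)))
    for (f in seq_len(num_folds)) {
      row <- row + 1
      test <- fold == f
      for (j in seq_along(ks)) {
        pred <- knn(x[!test, , drop = FALSE], x[test, , drop = FALSE],
                    cl = y[!test], k = ks[j])
        acc[row, j] <- mean(as.character(pred) == as.character(y[test]))
      }
    }
  }
  data.frame(k = ks, accuracy = colMeans(acc))
}

res <- sweep_knn(iris, use_cols = names(iris)[1:4], label_col = "Species",
                 min_k = 1, max_k = 5, num_folds = 5, num_repeats = 2)
print(res)
```

For simplicity the sketch scales the full data set once before splitting. Because every candidate k is fitted num.folds times num.repeats times, run time grows quickly with both the data size and the width of the neighbour range.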

Value

The performance of the classifier for each number of neighbours, together with a description of how the training was performed: the sampling process used (how the data was split for training and testing) and the sample sizes (number of data points used for training and testing). Also included is the recommended number of neighbours for the data and the reasoning behind that recommendation.
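
As an illustration of how a recommended number of neighbours could be derived from such a performance summary, here is a minimal sketch. The table layout, the made-up accuracy values, the helper name `recommend_k`, and the tie-breaking rule (prefer the smallest k among the best scores) are all assumptions for this example; the structure of the object actually returned by train.knn.classifier is not specified here.

```r
# Hypothetical performance table: one row per candidate number of neighbours.
# The accuracy values are invented for illustration only.
perf <- data.frame(k = 1:5, accuracy = c(0.91, 0.94, 0.96, 0.96, 0.95))

# Pick the smallest k that reaches the best score (assumed tie-breaking rule).
recommend_k <- function(perf, metric = "accuracy") {
  perf <- perf[order(perf$k), ]        # make sure candidates are in ascending k
  perf$k[which.max(perf[[metric]])]    # which.max returns the first maximum
}

best_k <- recommend_k(perf)
cat("Recommended number of neighbours:", best_k, "\n")
```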