knn_classifier: Classify cells from one Seurat object in terms of another...

View source: R/classifier.R

Description

Classify cells from one Seurat object in terms of another Seurat object's identity field, with a "reject option" for unfamiliar cells.

Usage

knn_classifier(dge_train, dge_test, ident.use = "ident", vars.all = NULL,
  my_transform = "PCA_20", badness = NULL, k = 25, reject_prop = 0)

Arguments

dge_train

Cells to train classifier on. Seurat object.

dge_test

Cells to be classified. Seurat object.

ident.use

Identity variable to use for training labels.

vars.all

List of raw genes/features to use. Features that 'FetchData' can retrieve are accessed that way and should be numeric; any other features are filled in with zeroes. If NULL, uses variable genes from both 'dge_train' and 'dge_test'.

my_transform

NULL, character, or function. If 'is.null(my_transform)', then 'my_transform' is the identity. If 'my_transform' has the form "PCA_<integer>" (the default in the usage above is "PCA_20"), then 'my_transform' is an unscaled <integer>-dimensional PCA based on the training data. This option triggers special behavior for quantifying classifier badness, because NN will perform badly in a principal subspace. If a function is given, 'my_transform' should accept and return matrices where rows are cells.
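
To illustrate the function form of 'my_transform', here is a minimal base-R sketch of a transform that is fit on training cells and then applied to any matrix whose rows are cells. The helper name 'make_pca_transform' is hypothetical, and reading "unscaled" as centering without variance scaling is an assumption; the package's own PCA step may differ.

```r
# Hypothetical helper (not part of the package): fit a PCA on training cells
# and return a function that projects any cells-by-features matrix onto the
# same principal subspace.
# Assumption: "unscaled" means centered but not scaled to unit variance.
make_pca_transform <- function(train_mat, n_pcs) {
  fit <- prcomp(train_mat, center = TRUE, scale. = FALSE, rank. = n_pcs)
  function(mat) {
    # Center new cells with the training means, then rotate.
    centered <- sweep(mat, 2, fit$center, "-")
    centered %*% fit$rotation
  }
}

set.seed(1)
train_mat <- matrix(rnorm(200 * 10), nrow = 200)  # 200 cells, 10 features
test_mat  <- matrix(rnorm(50 * 10),  nrow = 50)   # 50 cells, same features

pca_3 <- make_pca_transform(train_mat, n_pcs = 3)
test_embedded <- pca_3(test_mat)
dim(test_embedded)  # rows are still cells; columns are the 3 components
```

Because the returned function accepts and returns matrices with cells as rows, it has the shape described for a function-valued 'my_transform'.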

badness

Either "pc_dist" or "neighbor_dist" or 'NULL'. If 'NULL', default depends on 'my_transform'. You can't use "pc_dist" unless 'my_transform' has the form "PCA_<integer>".
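
The page does not define "pc_dist" precisely. One plausible reading, given the remarks about the principal subspace, is each cell's distance from that subspace, which would explain why it needs a PCA transform. The sketch below computes that quantity in base R; the function name 'subspace_residual' is hypothetical and this is an assumption about the measure, not the package's code.

```r
# Assumption: "pc_dist" is read here as the distance from a cell to the
# principal subspace, i.e. the norm of what the PCA projection discards.
subspace_residual <- function(mat, center, rotation) {
  centered  <- sweep(mat, 2, center, "-")
  projected <- centered %*% rotation %*% t(rotation)
  sqrt(rowSums((centered - projected)^2))
}

set.seed(2)
train_mat <- matrix(rnorm(100 * 6), nrow = 100)   # 100 cells, 6 features
fit  <- prcomp(train_mat, center = TRUE, scale. = FALSE)
rot2 <- fit$rotation[, 1:2]                        # keep a 2-D subspace

# One cell lying in the subspace, and one pushed 5 units away from it
# along a direction orthogonal to the retained components.
on_plane  <- matrix(fit$center + 3 * fit$rotation[, 1], nrow = 1)
off_plane <- on_plane + 5 * matrix(fit$rotation[, 6], nrow = 1)

residuals <- subspace_residual(rbind(on_plane, off_plane), fit$center, rot2)
residuals  # approximately 0 for the first cell and 5 for the second
```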

k

Number of nearest neighbors to use. Default 25.

reject_prop

Expected rate of false rejections you're willing to tolerate on held-out training instances. The usage above shows a default of 0. This rate is not honest if 'my_transform' is chosen using the training data, and it cannot account for batch effects.
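
As a rough illustration of how a tolerated false-rejection rate can become a cutoff, the sketch below takes the (1 - reject_prop) quantile of badness scores measured on held-out training cells. The helper 'reject_threshold' and the simulated scores are hypothetical; the package may calibrate its threshold differently.

```r
# Hypothetical helper: the cutoff is the (1 - reject_prop) quantile of the
# badness scores observed on held-out training cells, so about reject_prop
# of those familiar cells would be rejected.
reject_threshold <- function(heldout_badness, reject_prop) {
  quantile(heldout_badness, probs = 1 - reject_prop, names = FALSE)
}

set.seed(3)
heldout_badness <- rexp(1000)   # simulated stand-in for real badness scores

cutoff <- reject_threshold(heldout_badness, reject_prop = 0.01)
false_reject_rate <- mean(heldout_badness > cutoff)

# With reject_prop = 0 the cutoff is the largest held-out score, so no
# held-out training cell is rejected.
cutoff_zero <- reject_threshold(heldout_badness, reject_prop = 0)
```

Under this reading, a 'reject_prop' of 0 still allows rejection of test cells whose badness exceeds anything seen on held-out training cells.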

Details

Using k-nearest neighbors, classify cells from 'dge_test' in terms of the options in 'unique(FetchData(dge_train, ident.use))', plus a reject option. A cell is rejected when its badness (usually distance to the nearest neighbors) exceeds a threshold (see 'reject_prop'). Badness gets adjusted by cluster, because some clusters naturally are less concentrated on the principal subspace or the coordinates of interest.
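
The general idea of a kNN vote with a reject option can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the package's implementation: the function 'knn_with_reject' is hypothetical, badness is taken to be the mean distance to the k nearest training cells, the cutoff is supplied by hand, and the per-cluster adjustment is omitted.

```r
# Hypothetical sketch: majority vote among the k nearest training cells,
# with rejection when the mean neighbor distance exceeds a cutoff.
knn_with_reject <- function(train, labels, test, k, cutoff) {
  # Euclidean distances: rows are test cells, columns are training cells.
  d2 <- outer(rowSums(test^2), rowSums(train^2), "+") - 2 * test %*% t(train)
  dists <- sqrt(pmax(d2, 0))  # clamp tiny negatives caused by rounding

  pred    <- character(nrow(test))
  badness <- numeric(nrow(test))
  for (i in seq_len(nrow(test))) {
    nn         <- order(dists[i, ])[seq_len(k)]
    votes      <- table(labels[nn])
    pred[i]    <- names(votes)[which.max(votes)]
    badness[i] <- mean(dists[i, nn])
  }
  pred[badness > cutoff] <- "reject"
  data.frame(ident = pred, badness = badness, stringsAsFactors = FALSE)
}

set.seed(4)
# Two well-separated training clusters in two dimensions.
train <- rbind(matrix(rnorm(60, mean = 0, sd = 0.5), ncol = 2),
               matrix(rnorm(60, mean = 5, sd = 0.5), ncol = 2))
labels <- rep(c("A", "B"), each = 30)

# Two familiar cells and one far from everything in the training set.
test <- rbind(c(0, 0), c(5, 5), c(50, 50))
result <- knn_with_reject(train, labels, test, k = 5, cutoff = 3)
result$ident
```

The third test cell sits far from both clusters, so its mean neighbor distance exceeds the cutoff and it is labeled "reject" rather than being forced into class A or B.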

Value

Seurat object identical to 'dge_test' but with new/modified fields for:

- 'classifier_ident' (predicted class)
- 'classifier_badness' (lower means higher confidence)
- 'classifier_probs_<each identity class from trainset>' (predicted class probabilities)


maehrlab/thymusatlastools documentation built on May 28, 2019, 2:32 a.m.