knn_classifier: Classify cells from one Seurat object in terms of another...
In maehrlab/thymusatlastools: Tools for analysis of single-cell transcriptomic data


Classify cells from one Seurat object in terms of another Seurat object's identity field, with a "reject option" for unfamiliar cells.

knn_classifier(dge_train, dge_test, ident.use = "ident", vars.all = NULL, my_transform = "PCA_20", badness = NULL, k = 25, reject_prop = 0)

`dge_train`	Cells to train classifier on. Seurat object.
`dge_test`	Cells to be classified. Seurat object.
`ident.use`	Identity variable to use for training labels.
`vars.all`	List of raw genes/features to use. Features that 'FetchData' can retrieve are accessed through it and should be numeric; any others are filled in with zeroes. If NULL, uses variable genes from both 'dge_train' and 'dge_test'.
`my_transform`	NULL, character, or function. If 'is.null(my_transform)', then the transform is the identity (note that the usage signature defaults to "PCA_20", not NULL). If 'my_transform' has the form "PCA_<integer>", then the transform is an unscaled <integer>-dimensional PCA based on the training data. This option triggers special behavior for quantifying classifier badness, because nearest-neighbor distances are less informative within a principal subspace. If a function is given, 'my_transform' should accept and return matrices where rows are cells.
`badness`	Either "pc_dist" or "neighbor_dist" or 'NULL'. If 'NULL', the default depends on 'my_transform'. You can't use "pc_dist" unless 'my_transform' has the form "PCA_<integer>".
`k`	Number of nearest neighbors to use. Default 25.
`reject_prop`	Expected rate of false rejections you're willing to tolerate on held-out training instances. Default is 0, per the usage signature. This rate is not an honest estimate if 'my_transform' is chosen using the training data, and it cannot account for batch effects.
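To make the "PCA_<integer>" option more concrete, here is a minimal base-R sketch of an unscaled PCA fit on training cells and applied to test cells. It is not the package's code: the toy matrices and the names `pca_transform` and `subspace_dist` are hypothetical, and reading "pc_dist" as the distance from a cell to the principal subspace is an assumption, not something this page confirms.

```r
# Hypothetical sketch in base R; not the thymusatlastools implementation.
set.seed(1)
n_train <- 200; n_test <- 50; n_genes <- 30; n_pcs <- 5

# Toy expression matrices with cells as rows and genes as columns.
train <- matrix(rnorm(n_train * n_genes), nrow = n_train)
test  <- matrix(rnorm(n_test * n_genes), nrow = n_test)

# Unscaled PCA fit on the training cells only (centered, not scaled).
pca <- prcomp(train, center = TRUE, scale. = FALSE)
loadings <- pca$rotation[, 1:n_pcs, drop = FALSE]

# Transform in the shape 'my_transform' expects: rows are cells in and out.
pca_transform <- function(x) {
  centered <- sweep(x, 2, pca$center, "-")
  centered %*% loadings
}

# Assumed "pc_dist"-style badness: distance from each cell to the
# principal subspace spanned by the retained components.
subspace_dist <- function(x) {
  centered <- sweep(x, 2, pca$center, "-")
  residual <- centered - (centered %*% loadings) %*% t(loadings)
  sqrt(rowSums(residual^2))
}

test_embedded <- pca_transform(test)  # n_test x n_pcs matrix
test_badness  <- subspace_dist(test)  # one non-negative value per test cell
```

Because the components are estimated from the training cells alone, test cells that vary along directions the training data never explored land far from the subspace, which is one plausible reason a subspace distance is useful as a badness measure here.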

Using k-nearest neighbors, classify cells from 'dge_test' into the classes in 'unique(FetchData(dge_train, ident.use))', plus a reject option. A cell is rejected when its badness (usually the distance to its nearest neighbors) exceeds a threshold (see 'reject_prop'). Badness is adjusted by cluster, because some clusters are naturally less concentrated on the principal subspace or the coordinates of interest.
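The procedure described above can be sketched in base R as follows. This is an illustration under stated assumptions rather than the package's implementation: badness is taken to be the mean distance to the k nearest training cells, the threshold is calibrated by leave-one-out on the training cells, and the per-cluster adjustment is modeled as one quantile threshold per cluster. The helper names `knn_one` and `classify` and the toy data are hypothetical.

```r
# Hypothetical sketch in base R; not the thymusatlastools implementation.
set.seed(2)

# Two toy clusters in 2-D, plus three test cells: two familiar, one far away.
train_x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
                 matrix(rnorm(100, mean = 5), ncol = 2))
train_y <- rep(c("A", "B"), each = 50)
test_x  <- rbind(c(0, 0), c(5, 5), c(50, -50))

# Classify one query cell by majority vote among its k nearest reference cells.
knn_one <- function(query, ref_x, ref_y, k) {
  d <- sqrt(rowSums(sweep(ref_x, 2, query, "-")^2))
  nearest <- order(d)[seq_len(k)]
  votes <- table(factor(ref_y[nearest], levels = sort(unique(ref_y))))
  list(label   = names(votes)[which.max(votes)],
       probs   = setNames(as.numeric(votes) / k, names(votes)),
       badness = mean(d[nearest]))  # assumed badness: mean neighbor distance
}

k <- 10
reject_prop <- 0.01

# Leave-one-out badness on training cells, used to calibrate the thresholds.
loo_badness <- vapply(seq_len(nrow(train_x)), function(i) {
  knn_one(train_x[i, ], train_x[-i, , drop = FALSE], train_y[-i], k)$badness
}, numeric(1))

# One threshold per cluster: the (1 - reject_prop) quantile of its own badness.
thresholds <- vapply(split(loo_badness, train_y), quantile, numeric(1),
                     probs = 1 - reject_prop, names = FALSE)

# Predict a label, or reject when badness exceeds the predicted cluster's threshold.
classify <- function(query) {
  fit <- knn_one(query, train_x, train_y, k)
  if (fit$badness > thresholds[[fit$label]]) "rejected" else fit$label
}

predictions <- apply(test_x, 1, classify)
```

Calibrating on the training cells ties the threshold to a target false-rejection rate, but as the 'reject_prop' description notes, that rate is only approximate when the transform was itself chosen on the training data or when batch effects are present.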

Seurat object identical to 'dge_test' but with new/modified fields for
- 'classifier_ident' (predicted class)
- 'classifier_badness' (lower means higher confidence)
- 'classifier_probs_<each identity class from trainset>' (predicted class probabilities)

maehrlab/thymusatlastools documentation built on May 28, 2019, 2:32 a.m.