train_classifier: Train cell type classifier


Description

Train a classifier for a new cell type. If the cell type has a parent, the parent cell classifying model must be a scClassifR object.

Usage

train_classifier(
  train_obj,
  cell_type,
  features,
  parent_cell = NA_character_,
  parent_clf = NULL,
  path_to_models = c("default", "."),
  zscore = TRUE,
  ...
)

## S4 method for signature 'Seurat'
train_classifier(
  train_obj,
  cell_type,
  features,
  parent_cell = NA_character_,
  parent_clf = NULL,
  path_to_models = c("default", "."),
  zscore = TRUE,
  seurat_tag_slot = "active.ident",
  seurat_parent_tag_slot = "predicted_cell_type",
  seurat_assay = "RNA",
  seurat_slot = "counts",
  ...
)

## S4 method for signature 'SingleCellExperiment'
train_classifier(
  train_obj,
  cell_type,
  features,
  parent_cell = NA_character_,
  parent_clf = NULL,
  path_to_models = c("default", "."),
  zscore = TRUE,
  sce_tag_slot = "ident",
  sce_parent_tag_slot = "predicted_cell_type",
  sce_assay = "logcounts",
  ...
)

Arguments

train_obj

Object used for training the new model. A Seurat object or SingleCellExperiment object is expected. If the model being trained has a parent, the parent tag slot (seurat_parent_tag_slot or sce_parent_tag_slot) may be indicated. This slot is filled automatically if the user has previously run the classify_cells function. If no (predicted) cell type annotation is provided, the function can still be run if either (1) parent_cell or (2) parent_clf is provided.

cell_type

String indicating the name of the subtype. This must exactly match the cell tag/label if the cell tag/label is a string.

features

list of features used for the new training model

parent_cell

String indicating the name of the parent cell type, if the parent cell type classifier has already been saved in the model database. Adjust path_to_models to point to the correct database.

parent_clf

classification model for the parent cell type

path_to_models

Path to the folder containing the model database. By default, the pretrained models in the package are used. If the user has trained new models, indicate the folder containing the new_models.rda file.

zscore

Whether gene expression in train_obj is transformed to z-scores.

...

arguments passed to other methods

seurat_tag_slot

String, name of the slot in the cell meta data indicating the cell tag/label in the training object. Strings indicating cell types are expected in this slot. For a Seurat object, the default value is "active.ident". Expected values are strings (A-Z, a-z, 0-9, no special characters accepted) or binary/logical values: 0/"no"/F/FALSE for cells that are not the new cell type, 1/"yes"/T/TRUE for cells that are the new cell type.

seurat_parent_tag_slot

String, name of a slot in the cell meta data indicating the assigned/predicted cell type. Default is "predicted_cell_type". This slot is filled automatically if the user has called the classify_cells function. The slot must contain only string values.

seurat_assay

Name of the assay to use in the training object. Defaults to the 'RNA' assay.

seurat_slot

Type of expression data to use in the training object. For a Seurat object, the available types are "counts", "data" and "scale.data". Defaults to "counts", which contains unnormalized data.

sce_tag_slot

String, name of the annotation slot indicating the cell tag/label in the training object. For a SingleCellExperiment object, the default value is "ident". Expected values are strings (A-Z, a-z, 0-9, no special characters accepted) or binary/logical values: 0/"no"/F/FALSE for cells that are not the new cell type, 1/"yes"/T/TRUE for cells that are the new cell type.

sce_parent_tag_slot

String, name of a slot in the cell meta data indicating the pre-assigned/predicted cell type. Default is "predicted_cell_type". This slot is filled automatically if the user has called the classify_cells function. The slot must contain only string values.

sce_assay

Name of the assay to use in the training object. Defaults to the 'logcounts' assay.
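
To illustrate what the zscore argument refers to, the following base R sketch standardizes a toy expression matrix per gene. It only shows the general idea of a z-score transformation; the matrix values are made up, and this is not necessarily how the package implements the transformation internally.

```r
# Toy expression matrix: genes in rows, cells in columns (values are made up)
expr <- matrix(
  c( 1,  2,  3,  4,
    10, 10, 20, 20,
     0,  5,  0,  5),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("CD19", "MS4A1", "CD79A"), paste0("cell", 1:4))
)

# scale() standardizes columns, so transpose to scale each gene across cells,
# then transpose back to the genes-by-cells layout
expr_z <- t(scale(t(expr)))

# After the transformation each gene has mean 0 and standard deviation 1
gene_means <- rowMeans(expr_z)
gene_sds <- apply(expr_z, 1, sd)
```

Standardizing per gene puts features with very different expression ranges on a comparable scale before they are passed to a classifier.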

Value

scClassifR object

Note

Only one cell type is expected for each cell in the object. Ambiguous cell types, such as "T cells/NK cells/ILC", are excluded from training. When training a model for a parent cell type, cells belonging to its subtypes must be annotated with the parent cell type in order to be used. For example, when training for B cells, plasma cells must be annotated as B cells.
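
The labelling rule in this note can be prepared before training with a few lines of base R. The sketch below works on a plain character vector of made-up labels; the subtype list is hypothetical, and treating "/" as the marker of an ambiguous label is an assumption based on the example above, not a documented rule of the package.

```r
# Made-up cell labels, one per cell
labels <- c("B cells", "Plasma cells", "T cells",
            "T cells/NK cells/ILC", "NK cells")

# Hypothetical list of subtypes that should count as B cells
subtypes_of_b <- c("Plasma cells")

# Relabel subtypes with the parent cell type so they are used when
# training the parent (B cell) classifier
labels_for_b <- ifelse(labels %in% subtypes_of_b, "B cells", labels)

# Flag ambiguous labels (assumed here to be several types joined by "/")
is_ambiguous <- grepl("/", labels_for_b, fixed = TRUE)

# Labels that remain usable for training
table(labels_for_b[!is_ambiguous])
```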

Examples

# load small example dataset
data("tirosh_mel80_example")

# this dataset already contains pre-defined cell labels
table(Seurat::Idents(tirosh_mel80_example))

# define genes to use to classify this cell type (B cells in this example)
selected_features_B <- c("CD19", "MS4A1", "CD79A")

# train the classifier, the "cell_type" argument must match
# the cell labels in the data, except upper/lower case
set.seed(123)
clf_b <- train_classifier(train_obj = tirosh_mel80_example,
  features = selected_features_B, cell_type = "b cells")

# classify cell types using the B cell classifier,
# a test classifier process may be used before applying the classifier
tirosh_mel80_example <- classify_cells(classify_obj = tirosh_mel80_example,
  classifiers = c(clf_b))

# tag all cells that are plasma cells (random example here)
tirosh_mel80_example[['plasma_cell_tag']] <- c(rep(1, 80), rep(0, 400))

# set new features for the subtype
p_features <- c("SDC1", "CD19", "CD79A")

# train the classifier, using the "B cell" classifier as parent.
# This means only cells already classified as "B cells" will be evaluated.
# The "seurat_tag_slot" parameter tells the classifier which cell meta data
# slot to use for the training process.
set.seed(123)
plasma_clf <- train_classifier(train_obj = tirosh_mel80_example,
  cell_type = "Plasma cell", features = p_features, parent_clf = clf_b,
  seurat_tag_slot = 'plasma_cell_tag')
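
The plasma cell tag above is a 0/1 vector, one of the binary/logical encodings listed for the tag slot arguments. As a rough sketch, the helper below shows how those listed encodings map to TRUE/FALSE. The function to_logical_tag is hypothetical and written only for illustration; it is not part of the package and does not claim to reproduce its internal handling.

```r
# Hypothetical helper: map the documented binary/logical tag encodings
# (0/"no"/F/FALSE and 1/"yes"/T/TRUE) to logical values
to_logical_tag <- function(x) {
  x_chr <- tolower(as.character(x))
  positive <- c("1", "yes", "t", "true")
  negative <- c("0", "no", "f", "false")
  out <- rep(NA, length(x_chr))      # anything else stays NA
  out[x_chr %in% positive] <- TRUE
  out[x_chr %in% negative] <- FALSE
  out
}

numeric_tags <- to_logical_tag(c(1, 0, 1))
string_tags <- to_logical_tag(c("yes", "no", "T", "FALSE"))
other_tags <- to_logical_tag("B cells")  # a cell type string, not a binary tag
```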

grisslab/scClassifR documentation built on Oct. 27, 2021, 12:13 p.m.