cell_classifier: Cell classifier
In SingleCellSignalR: Cell Signalling Using Single Cell RNAseq Data Analysis


Classifies cells using cell type specific markers.

cell_classifier(
  data,
  genes,
  markers = markers_default,
  tsne = NULL,
  plot.details = FALSE,
  write = TRUE,
  verbose = TRUE
)

`data`	a data frame of n rows (genes) and m columns (cells) of read or UMI counts (note: rownames(data) = genes)
`genes`	a character vector of HUGO official gene symbols of length n
`markers`	a data frame of cell type signature genes
`tsne`	(optional) a table of m rows (one per cell) and 2 columns with t-SNE projection coordinates for each cell
`plot.details`	a logical (if TRUE, then plots the number of cells attributed to one cell type, see below)
`write`	a logical (if TRUE, then writes the classification results to text files, see below)
`verbose`	a logical

The 'markers' argument must be a table of cell type gene signatures, with one cell type in each column. The column names are the names of the cell types.

The *markers_default* table provides an example of this format.
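As an illustration of this layout, the following minimal sketch builds a small markers table by hand. The object name `markers_example` and the gene symbols are hypothetical, chosen only to show the shape (one cell type per column, column names used as cell type names); they are not the content of the package's default table.

```r
# Hypothetical signatures: names and genes are for illustration only.
markers_example <- data.frame(
  "T-cells"     = c("CD3D", "CD3E", "CD2"),
  "B-cells"     = c("CD19", "MS4A1", "CD79A"),
  "Macrophages" = c("CD68", "CD14", "CSF1R"),
  check.names = FALSE,       # keep the column names exactly as written
  stringsAsFactors = FALSE   # keep gene symbols as character strings
)

# The column names are the cell type names.
colnames(markers_example)
```

A table of this shape could then be passed as the 'markers' argument in place of the default.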

If 'tsne' is not provided, then the function does not display the cells on a t-SNE map. Although t-SNE maps are widely used to display cells in a 2D projection, the user can provide any table with two columns and as many rows as 'data' has columns (e.g. the first two components of a PCA).
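The PCA alternative mentioned above can be sketched with base R as follows. The toy data, the `log1p` transform and the name `coords` are assumptions made for the example; only the final shape (one row per cell, two columns) comes from the description of 'tsne'.

```r
set.seed(1)

# Toy count matrix: 50 genes (rows) x 20 cells (columns).
data <- matrix(rpois(1000, lambda = 5), nrow = 50, ncol = 20)
rownames(data) <- paste0("gene", seq_len(50))
colnames(data) <- paste0("cell", seq_len(20))

# prcomp() expects observations in rows, so transpose to cells x genes.
pca <- prcomp(t(log1p(data)))

# Keep the first two principal components: one row per cell, two columns.
coords <- pca$x[, 1:2]
dim(coords)
```

Under these assumptions, `coords` has the layout expected for 'tsne' and could be supplied as `tsne = coords`.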

If 'plot.details' is TRUE, then the function plots the number of cells attributed to a single cell type as a function of the threshold applied to the normalized gene signature average.

If 'write' is TRUE, then the function writes four text files:

(1) The "raw classification matrix" provides the normalized average gene signature for each cell type in each individual cell, a number between 0 and 1. This matrix has one row per cell type and one column per cell, and each column sums to 1. Row names are the cell type names (column names of the markers table) and column names are the individual cell identifiers (column names of 'data').

(2) The "thresholded classification matrix" is obtained by eliminating all the values of the "raw classification matrix" that are below a threshold a*. In practice, a* is determined automatically by the function to maximize the number of cells that are assigned to a single cell type, and all the cells (columns) assigned to 0 or >1 cell types are discarded. The number of cells assigned to a single type as a function of a* can be plotted by setting 'plot.details=TRUE'.

(3) A cluster vector assigning each cell to a cell type. Note that a supplementary, virtual cluster is created to collect all the cells assigned to 0 or >1 types. This virtual cluster is named "undefined".

(4) A table associating each cell type with a cluster number in the cluster vector.
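The threshold selection described above can be sketched in base R. This is not the package's implementation: the random stand-in for the raw matrix, the 0.01 threshold grid, the helper `count_single` and the encoding of "undefined" as an extra cluster number are all assumptions made for illustration.

```r
set.seed(42)
n_types <- 4
n_cells <- 30

# Stand-in for the raw classification matrix (cell types x cells).
raw <- matrix(runif(n_types * n_cells), nrow = n_types,
              dimnames = list(paste0("type", seq_len(n_types)),
                              paste0("cell", seq_len(n_cells))))
raw <- sweep(raw, 2, colSums(raw), "/")  # each column now sums to 1

# Number of cells with exactly one cell type at or above threshold a.
count_single <- function(a) sum(colSums(raw >= a) == 1)

# Assumed search grid; a* is the threshold that maximizes the count.
thresholds <- seq(0, 1, by = 0.01)
n_single <- vapply(thresholds, count_single, integer(1))
a_star <- thresholds[which.max(n_single)]

# Thresholded matrix: values below a* are set to 0.
thresholded <- raw
thresholded[thresholded < a_star] <- 0

# Cells with exactly one remaining type keep it; all others go to an
# extra cluster (n_types + 1) standing in for "undefined".
keep <- colSums(thresholded > 0) == 1
cluster <- ifelse(keep, apply(thresholded, 2, which.max), n_types + 1)
table(cluster)
```

Plotting `n_single` against `thresholds` gives a curve of the kind that 'plot.details=TRUE' is described as producing.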

The function returns a list containing the thresholded table, the maximum table, the raw table, a cluster vector and the cluster names. The maximum table is a special thresholded table where in every column only the maximum gene signature is kept. It can be used to force the classification of every cell.
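To make the idea of the maximum table concrete, here is a minimal sketch on a hand-written raw matrix. The values and the names `raw`, `max_table` and `forced` are hypothetical, and the sketch does not show how the package itself handles ties.

```r
# Hypothetical raw matrix: 3 cell types (rows) x 3 cells (columns),
# filled column by column so that each column sums to 1.
raw <- matrix(c(0.60, 0.30, 0.10,
                0.20, 0.50, 0.30,
                0.45, 0.35, 0.20),
              nrow = 3,
              dimnames = list(paste0("type", 1:3), paste0("cell", 1:3)))

# Maximum table: in every column, keep only the largest value.
max_table <- apply(raw, 2, function(x) {
  x[x < max(x)] <- 0
  x
})

# Forced classification: every cell receives its top-scoring type.
forced <- rownames(raw)[apply(raw, 2, which.max)]
forced
```

Every column of `max_table` has exactly one non-zero entry, which is why this table can classify cells that thresholding would leave undefined.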

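library(SingleCellSignalR)
# Toy expression matrix: 50 genes (rows) x 20 cells (columns)
data <- matrix(runif(1000, 0, 1), nrow = 50, ncol = 20)
rownames(data) <- paste("gene", seq_len(50))
# Five toy cell types with two signature genes each (one type per column)
markers <- matrix(paste("gene", seq_len(10)), ncol = 5, nrow = 2)
colnames(markers) <- paste("type", seq_len(5))
cell_classifier(data, rownames(data), markers)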