CellSIUS: Cell Subtype Identification from Upregulated gene...

Description

Identification of cluster-specific gene sets and cell classification according to their expression.

CellSIUS enables the identification and characterization of (rare) cell sub-populations from complex scRNA-seq datasets. It takes as input the expression values of N cells grouped into M (> 1) clusters. Within each cluster, genes with a bimodal expression distribution are selected, and only those with cluster-specific expression are retained. Among these candidate marker genes, sets with correlated expression patterns are identified by graph-based clustering. Finally, cells are assigned to subgroups based on their average expression of each gene set. The CellSIUS output reports the rare cell types or subtypes as cell indices, together with their transcriptomic signatures.
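The final step described above (assigning cells by their average expression of a gene set) can be illustrated on toy data. The following is a minimal sketch, not the CellSIUS implementation: the data are simulated, the names `toy_mat`, `gene_set` and `rare_cells` are made up for illustration, and a two-center 1D k-means from base R is used as a stand-in for whatever mode detection CellSIUS applies internally.

```r
# Toy illustration only: simulated data, NOT the CellSIUS implementation.
set.seed(1)
n_cells  <- 40
gene_set <- c("geneA", "geneB", "geneC")   # hypothetical correlated gene set

# Background expression for all cells (genes in rows, cells in columns)
toy_mat <- matrix(rnorm(length(gene_set) * n_cells, mean = 0.5, sd = 0.2),
                  nrow = length(gene_set),
                  dimnames = list(gene_set, paste0("cell", seq_len(n_cells))))

# Five "rare" cells express the whole gene set at a much higher level
rare_cells <- paste0("cell", 1:5)
toy_mat[, rare_cells] <- toy_mat[, rare_cells] + 4

# Average expression of the gene set in each cell
score <- colMeans(toy_mat[gene_set, , drop = FALSE])

# Split the scores into two modes; fixed starting centers keep this deterministic
km <- kmeans(score, centers = c(min(score), max(score)))
high_mode <- which.max(km$centers)

# Cells in the upper mode form the candidate subgroup
sub_cells <- names(score)[km$cluster == high_mode]
print(sub_cells)
```

In this toy setting the upper mode recovers the five cells that were shifted upward; real data are noisier, which is why parameters such as min_n_cells and min_fc exist.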

Usage

CellSIUS(mat.norm, group_id, min_n_cells = 10, min_fc = 2,
  corr_cutoff = NULL, iter = 0, max_perc_cells = 50,
  fc_between_cutoff = 1, mcl_path = "~/local/bin/mcl")

Arguments

mat.norm

Numeric matrix: normalized gene expression matrix whose row names are gene IDs and whose column names are cell IDs.

group_id

Character: vector of cluster assignments, one per cell. Make sure that the order of the cluster assignments matches the order of the columns of mat.norm.

min_n_cells

Integer: when identifying bimodal gene distributions, this specifies the minimum number of cells per mode. Clusters with a total number of cells below min_n_cells will be entirely ignored. Defaults to 10.

min_fc

Numeric: minimum difference in mean expression [log2] between the two modes of the gene expression distribution. Defaults to 2.

corr_cutoff

Numeric: correlation cutoff for MCL clustering of candidate marker genes. If NULL, the cutoff is set automatically for each cluster, which in general works much better than forcing a fixed value. Leave this at the default unless you have a good reason to change it. Defaults to NULL.

iter

0 or 1: relevant for the final step (assigning cells to subgroups). If set to 1, the first mode of gene expression is discarded and cells are assigned based on the 2nd and 3rd modes, which results in a more stringent assignment. Defaults to 0, which is usually stringent enough.

max_perc_cells

Numeric: maximum percentage of cells that can be part of a subcluster. Defaults to 50, implying that a “subgroup” cannot contain more than half of the total observations.

fc_between_cutoff

Numeric: minimum difference [log2] in gene expression between cells in the subcluster and all other cells. The higher the value, the more cluster-specific the gene signature. Note that this should not be set higher than min_fc. Defaults to 1.

mcl_path

Character: path to the MCL executable. The external GNU MCL software for UNIX needs to be installed by the user. Defaults to "~/local/bin/mcl".
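Several of the constraints stated in the arguments above (cell order of group_id, more than one cluster, fc_between_cutoff not exceeding min_fc, clusters smaller than min_n_cells being ignored) can be checked before calling CellSIUS. The sketch below assumes nothing about the package internals: `check_cellsius_inputs` is a hypothetical helper written for this example, not a CellSIUS function, and the name-based order check only applies if group_id happens to carry cell IDs as names.

```r
# Hypothetical helper (NOT part of CellSIUS): sanity checks on the inputs.
# Returns (invisibly) the clusters that fall below min_n_cells.
check_cellsius_inputs <- function(mat.norm, group_id, min_n_cells = 10,
                                  min_fc = 2, fc_between_cutoff = 1) {
  stopifnot(is.matrix(mat.norm), is.numeric(mat.norm))
  stopifnot(!is.null(rownames(mat.norm)), !is.null(colnames(mat.norm)))
  if (length(group_id) != ncol(mat.norm)) {
    stop("group_id must have one entry per column (cell) of mat.norm")
  }
  # Only possible when group_id is named by cell ID (an assumption here)
  if (!is.null(names(group_id)) &&
      !identical(names(group_id), colnames(mat.norm))) {
    stop("names(group_id) are not in the same order as colnames(mat.norm)")
  }
  if (length(unique(group_id)) < 2) {
    stop("at least two clusters are required")
  }
  if (fc_between_cutoff > min_fc) {
    stop("fc_between_cutoff should not be set higher than min_fc")
  }
  sizes <- table(group_id)
  small <- names(sizes)[as.vector(sizes) < min_n_cells]
  if (length(small) > 0) {
    message("Clusters below min_n_cells: ", paste(small, collapse = ", "))
  }
  invisible(small)
}

# Toy inputs: 20 genes x 30 cells in three clusters of 14, 12 and 4 cells
set.seed(42)
toy_mat <- matrix(rexp(20 * 30), nrow = 20,
                  dimnames = list(paste0("gene", 1:20), paste0("cell", 1:30)))
toy_groups <- rep(c("c1", "c2", "c3"), times = c(14, 12, 4))
names(toy_groups) <- colnames(toy_mat)

ignored <- check_cellsius_inputs(toy_mat, toy_groups)
```

Here the four-cell cluster is flagged because, according to the min_n_cells description, clusters below that size are ignored entirely.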

Details

CellSIUS returns a data.table of a collection of key/value pairs:

Warning

The execution of CellSIUS requires the external GNU Markov Cluster Algorithm (MCL) software for UNIX. It needs to be installed by the user [link].
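Because a missing MCL executable only surfaces once CellSIUS is running, it may be worth checking for it up front. The sketch below uses base R only; `find_mcl` is a hypothetical helper, not a CellSIUS function, and the fallback to an executable named `mcl` on the PATH is an assumption about how MCL was installed.

```r
# Hypothetical helper (NOT part of CellSIUS): locate an MCL executable.
find_mcl <- function(mcl_path = "~/local/bin/mcl") {
  candidate <- path.expand(mcl_path)
  # file.access(mode = 1) returns 0 when the file is executable
  if (file.exists(candidate) && file.access(candidate, mode = 1) == 0) {
    return(candidate)
  }
  # Fall back to an executable called "mcl" on the PATH (assumed name)
  on_path <- unname(Sys.which("mcl"))
  if (nzchar(on_path)) {
    return(on_path)
  }
  NA_character_
}

mcl <- find_mcl()
if (is.na(mcl)) {
  message("MCL executable not found; install MCL before running CellSIUS.")
} else {
  message("Using MCL at: ", mcl)
}
```

The returned path can then be passed as the mcl_path argument.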

References

Wegmann, R. et al., 2018. CellSIUS for sensitive and specific identification and characterization of rare cell populations from complex single-cell RNA sequencing data. Nature Communications, submission.

See Also

Accessory functions are provided to help the user to summarize and visualize the CellSIUS results:

Examples

library(CellSIUS)
library(ggplot2)
library(data.table)
library(RColorBrewer)

# Assumed to exist already in the session:
#   norm.counts: normalized expression matrix (genes x cells)
#   clusters:    cluster assignment, one entry per column of norm.counts
#   my_tsne:     2D coordinates, one row per cell

# 2D cell projection
qplot(my_tsne[,1], my_tsne[,2], xlab="tSNE1", ylab="tSNE2", main="2D scRNAseq cell projection")
qplot(my_tsne[,1], my_tsne[,2], xlab="tSNE1", ylab="tSNE2", color=as.factor(clusters), main="Cells colored by clusters")

# Run CellSIUS
# WARNING: before running CellSIUS, make sure that mcl_path points
# to the executable of your local MCL installation

CellSIUS.out <- CellSIUS(mat.norm = norm.counts, group_id = clusters, min_n_cells = 10, min_fc = 2,
             corr_cutoff = NULL, iter = 0, max_perc_cells = 50,
             fc_between_cutoff = 1, mcl_path = "~/local/bin/mcl")

#____________________________
# EXPLORE CellSIUS RESULTS:
#____________________________

# Summary of sub-populations identified by CellSIUS

Result_List <- CellSIUS_GetResults(CellSIUS.out = CellSIUS.out)

# 2D visualization of sub-populations identified by CellSIUS
CellSIUS_plot(coord = my_tsne, CellSIUS.out = CellSIUS.out)

# Final clustering assignment

Final_Clusters <- CellSIUS_final_cluster_assignment(CellSIUS.out = CellSIUS.out, group_id = clusters, min_n_genes = 3)
table(Final_Clusters)
qplot(my_tsne[,1], my_tsne[,2], xlab="tSNE1", ylab="tSNE2", color=as.factor(Final_Clusters))

Novartis/CellSIUS documentation built on June 4, 2019, 12:01 a.m.