Accurate and fast cell marker gene identification with COSG
COSG is a cosine similarity-based method for more accurate and scalable marker gene identification.
The method and benchmarking results are described in Dai et al. (2022). The preprint is available on bioRxiv.
This is the R version of COSG; the Python version is hosted at https://github.com/genecell/COSG.
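As a rough illustration of the cosine-similarity idea, the base R sketch below compares each gene's expression vector with a 0/1 indicator vector for one cell group. It is a toy example on made-up data, not the scoring that COSG actually implements; the names `expr`, `groups`, `cosine_sim`, and `ind_A` are hypothetical.

```r
# Toy data (made up): 6 cells, the first 3 in group "A", the last 3 in group "B".
groups <- c("A", "A", "A", "B", "B", "B")
expr <- rbind(
  geneX = c(5, 4, 6, 0, 0, 0),  # expressed only in group A
  geneY = c(2, 2, 2, 2, 2, 2)   # expressed uniformly in all cells
)

# Cosine similarity between two numeric vectors.
cosine_sim <- function(x, y) {
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

# Indicator vector for group A: 1 for cells in the group, 0 otherwise.
ind_A <- as.numeric(groups == "A")

# Similarity of every gene (row) to the group-A indicator.
sims <- apply(expr, 1, cosine_sim, y = ind_A)
print(round(sims, 3))
```

In this toy setting, the gene restricted to group A is more similar to the group-A indicator than the uniformly expressed gene.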
# install.packages('remotes')
remotes::install_github(repo = 'genecell/COSGR')
Please check out the vignette and the PBMC10K tutorial to get started.
suppressMessages(library(Seurat))
library(COSG)  # provides cosg()
data('pbmc_small',package='Seurat')
# Check cell groups:
table(Idents(pbmc_small))
#>
#> 0 1 2
#> 36 25 19
#######
# Run COSG:
marker_cosg <- cosg(
  pbmc_small,
  groups='all',
  assay='RNA',
  slot='data',
  mu=1,
  n_genes_user=100)
#######
# Check the marker genes:
head(marker_cosg$names)
#> 0 1 2
#> 1 CD7 S100A8 MS4A1
#> 2 CCL5 TYMP CD79A
#> 3 GNLY S100A9 TCL1A
#> 4 LAMP1 FCGRT NT5C
#> 5 GZMA IFITM3 CD79B
#> 6 LCK LST1 FCER2
head(marker_cosg$scores)
#> 0 1 2
#> 1 0.6391917 0.8954042 0.6922908
#> 2 0.6391267 0.8312083 0.5832425
#> 3 0.6328148 0.8120045 0.5757478
#> 4 0.6164937 0.7755955 0.5533107
#> 5 0.5846589 0.7413060 0.5163446
#> 6 0.5795238 0.7380483 0.5115180
#######
# Run COSG for selected groups, i.e., '0' and '2':
marker_cosg <- cosg(
  pbmc_small,
  groups=c('0', '2'),
  assay='RNA',
  slot='data',
  mu=1,
  n_genes_user=100)
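The names and scores tables printed earlier have one column per cell group, which can be awkward to filter or plot. The sketch below stacks them into one long table. It uses a hand-built mock of the first three rows of the output from the groups='all' run, so it runs without COSG or Seurat. It assumes the real result keeps the same column-per-group layout; `long` and `top2` are hypothetical names.

```r
# Mock of the first three rows printed for the groups='all' run.
marker_cosg <- list(
  names = data.frame(
    "0" = c("CD7", "CCL5", "GNLY"),
    "1" = c("S100A8", "TYMP", "S100A9"),
    "2" = c("MS4A1", "CD79A", "TCL1A"),
    check.names = FALSE, stringsAsFactors = FALSE
  ),
  scores = data.frame(
    "0" = c(0.6391917, 0.6391267, 0.6328148),
    "1" = c(0.8954042, 0.8312083, 0.8120045),
    "2" = c(0.6922908, 0.5832425, 0.5757478),
    check.names = FALSE
  )
)

# Stack the per-group columns into one long table (column-major order).
n <- nrow(marker_cosg$names)
long <- data.frame(
  group = rep(colnames(marker_cosg$names), each = n),
  rank  = rep(seq_len(n), times = ncol(marker_cosg$names)),
  gene  = as.vector(as.matrix(marker_cosg$names)),
  score = as.vector(as.matrix(marker_cosg$scores)),
  stringsAsFactors = FALSE
)

# Keep the two top-ranked genes per group.
top2 <- long[long$rank <= 2, ]
print(top2)
```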
You can set the parameter mu, the penalty on expression in non-target cell groups, to larger values, such as mu=10 or mu=100.
You can set the parameter remove_lowly_expressed to TRUE to exclude genes that are expressed at very low levels in the target cell group, and use the parameter expressed_pct to adjust the percentage threshold. For example:
marker_region <- cosg(
  seo,
  groups='all',
  assay='peaks',
  slot='data',
  mu=100,
  n_genes_user=100,
  remove_lowly_expressed=TRUE,
  expressed_pct=0.1
)
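To make the percentage threshold concrete, the base R sketch below computes, for one target group, the fraction of cells in which each gene is detected and keeps the genes at or above a cutoff. It illustrates the general idea on made-up data. How COSG applies expressed_pct internally (for example, whether the comparison is strict) is not shown here and is an assumption; `expr`, `groups`, and `pct_expressing` are hypothetical names.

```r
# Toy data (made up): 8 cells, 4 in group "A" and 4 in group "B".
groups <- c("A", "A", "A", "A", "B", "B", "B", "B")
expr <- rbind(
  geneX = c(3, 0, 0, 0, 0, 0, 0, 0),  # detected in 1 of 4 group-A cells
  geneY = c(2, 1, 4, 0, 0, 1, 0, 0)   # detected in 3 of 4 group-A cells
)

# Cutoff expressed as a fraction of cells, as in expressed_pct=0.1 above.
expressed_pct <- 0.5

# Fraction of target-group cells with non-zero expression, per gene.
in_target <- groups == "A"
pct_expressing <- rowMeans(expr[, in_target, drop = FALSE] > 0)

# Genes that pass the (assumed non-strict) threshold in the target group.
keep <- names(pct_expressing)[pct_expressing >= expressed_pct]
print(pct_expressing)
print(keep)
```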
If COSG is useful for your research, please consider citing Dai, M., Pei, X., Wang, X.-J., 2022. Accurate and fast cell marker gene identification with COSG. Brief. Bioinform. bbab579.