scMAGIC: A cell type classifier for single cell RNA sequencing data

View source: R/scMAGIC.R

Description

The main function of scMAGIC. Given reference data and query data, scMAGIC assigns a cell type label to each cell in the query dataset based on the reference.
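As a rough illustration of reference-based labeling, the sketch below scores each query cell against per-cell-type reference profiles with Kendall correlation (the default of 'method1') and gives each cell the best-scoring type. It is a minimal sketch on made-up data: the matrices `ref` and `query` are hypothetical, picking the maximum correlation is an assumption, and scMAGIC's two-round procedure, marker-gene selection and confidence scoring are not reproduced.

```r
# Hypothetical reference profiles: genes x cell types
ref <- matrix(c(6, 0.5, 0.5, 2,
                0.5, 8.5, 2.5, 0.5,
                1, 0.5, 6.5, 4.5),
              nrow = 4,
              dimnames = list(c("Cd3e", "Cd19", "Lyz2", "Actb"),
                              c("T cell", "B cell", "Macrophage")))
# Hypothetical query cells: genes x cells, same genes as the reference
query <- matrix(c(4, 0, 1, 3,
                  0, 5, 2, 1),
                nrow = 4,
                dimnames = list(rownames(ref), c("q1", "q2")))

# cor() on two matrices correlates every column of `query` with every
# column of `ref`, giving a cells x cell-types similarity matrix.
# "spearman" or "pearson" could be used instead of "kendall".
sim <- cor(query, ref, method = "kendall")

# Assumption of this sketch: each cell takes the most similar cell type
pred <- colnames(ref)[apply(sim, 1, which.max)]
names(pred) <- rownames(sim)
print(pred)
```

The 'cosine', 'multinomial' and 'randomforest' options accepted by 'method1' and 'method2' are not covered by base R's `cor()` and are not shown here.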

Usage

scMAGIC(exp_sc_mat, exp_ref_mat, exp_ref_label = NULL,
        single_round = F, identify_unassigned = T,
        atlas = c(NULL, 'MCA', 'HCL'), use.RUVseq = T,
        method_findmarker = c('COSG', 'Seurat'),
        percent_high_exp = 0.7, num_marker_gene = 100,
        cluster_num_pc = 50, cluster_resolution = 3, min_cell = 1,
        method1 = c('kendall', 'spearman', 'pearson', 'cosine', 'multinomial'),
        method2 = c('multinomial', 'kendall', 'spearman', 'pearson', 'cosine', 'randomforest'),
        corr_use_HVGene1 = 2000, corr_use_HVGene2 = 2000,
        threshold = 5, num_threads = 4, cluster_assign = F, simple.output = T)

Arguments

exp_sc_mat

The expression matrix of the query data, with gene symbols as row names and cell barcodes as column names.

exp_ref_mat

The expression matrix of the reference data. If the reference is single cell counts data ('sc-counts'), 'exp_ref_mat' is a counts matrix whose row names are gene symbols and whose column names are cell barcodes; otherwise, 'exp_ref_mat' is a matrix whose row names are gene symbols and whose column names are cell type labels.

exp_ref_label

If the reference is single cell counts data ('sc-counts'), 'exp_ref_label' is a vector of well-annotated cell type labels, one for each cell barcode (column) of 'exp_ref_mat'; otherwise, 'exp_ref_label' is NULL.

single_round

Whether to use the single-round annotation strategy. By default (FALSE), scMAGIC uses the two-round annotation strategy.

identify_unassigned

Whether to label some cells as "Unassigned". Default is TRUE. If you believe the reference covers all cell types in the query data, set it to FALSE.

atlas

The cell atlas used alongside the reference: 'MCA' (Mouse Cell Atlas) if the reference is from mouse, 'HCL' (Human Cell Landscape) if the reference is from human, or NULL if no atlas is available.

use_RUVseq

Whether to use 'RUVSeq' to remove the batch effect between the reference and the atlas. Default is TRUE.

method_findmarker

Method used to find marker genes, either 'COSG' or 'Seurat'. Default is 'COSG'.

percent_high_exp

Within each cell type, genes whose expression values are higher than those of a fraction 'percent_high_exp' of all genes (that is, above the 'percent_high_exp' quantile) are selected.

num_marker_gene

Number of marker genes selected for each cell type.

cluster_num_pc

Number of PCs used in clustering.

cluster_resolution

Resolution of the clustering algorithm. A larger resolution yields more clusters.

min_cell

If the number of validated cells assigned to a reference cell type is not less than 'min_cell', these cells are added to the local reference.

method1

The method of similarity calculation in the first-round annotation: 'kendall', 'spearman', 'pearson', 'cosine' or 'multinomial'. Default is 'kendall'.

method2

The method of similarity calculation in the second-round annotation: 'multinomial', 'kendall', 'spearman', 'pearson', 'cosine' or 'randomforest'. Default is 'multinomial'.

corr_use_HVGene1

Number of top variable genes used in the similarity calculation of the first-round annotation.

corr_use_HVGene2

Number of top variable genes used in the similarity calculation of the second-round annotation.

threshold

If the confidence score of a label is lower than 'threshold', the label is considered incorrect.

num_threads

Number of CPU threads used in the calculation.

cluster_assign

Whether to annotate cells at the cluster level, that is, by assigning one cell type to each cluster.

simple_output

Whether to return only the simple output (cell type labels and confidence scores). Set it to FALSE to also return intermediate results.
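When only a single cell counts matrix ('exp_ref_mat') and its labels ('exp_ref_label') are at hand, a reference of the second kind, with cell type labels as column names, can be obtained by aggregating cells per label. The sketch below uses toy data and assumes mean aggregation; the name `ref_by_type` is hypothetical, and this is not necessarily how scMAGIC summarizes a reference internally.

```r
# Toy single cell reference: 4 genes x 6 cells (made-up counts)
exp_ref_mat <- matrix(c(5, 0, 1, 2,
                        7, 1, 0, 2,
                        0, 9, 3, 1,
                        1, 8, 2, 0,
                        2, 0, 6, 4,
                        0, 1, 7, 5),
                      nrow = 4,
                      dimnames = list(c("Cd3e", "Cd19", "Lyz2", "Actb"),
                                      paste0("cell", 1:6)))
# One label per column (cell barcode) of exp_ref_mat
exp_ref_label <- c("T cell", "T cell", "B cell", "B cell",
                   "Macrophage", "Macrophage")
stopifnot(length(exp_ref_label) == ncol(exp_ref_mat))

# Average the cells of each type (assumption: mean aggregation),
# giving a genes x cell-types matrix with cell types as column names
cell_types <- unique(exp_ref_label)
ref_by_type <- sapply(cell_types, function(ct) {
  rowMeans(exp_ref_mat[, exp_ref_label == ct, drop = FALSE])
})
print(ref_by_type)
```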
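The 'percent_high_exp' rule can be illustrated in base R with `quantile()`: within each cell type, genes above the 'percent_high_exp' quantile of that type's expression values are kept. This is a minimal sketch on a made-up genes x cell-types matrix (`high_exp_genes` is a hypothetical name), not scMAGIC's own implementation.

```r
# Hypothetical genes x cell-types expression matrix
ref_by_type <- matrix(c(6, 0.5, 0.5, 2, 0.1, 3,
                        0.5, 8.5, 2.5, 0.5, 0.2, 1),
                      nrow = 6,
                      dimnames = list(paste0("gene", 1:6),
                                      c("T cell", "B cell")))
percent_high_exp <- 0.7

# For each cell type, keep genes whose expression exceeds the
# percent_high_exp quantile of that cell type's expression values
high_exp_genes <- lapply(colnames(ref_by_type), function(ct) {
  x <- ref_by_type[, ct]
  names(x)[x > quantile(x, probs = percent_high_exp)]
})
names(high_exp_genes) <- colnames(ref_by_type)
print(high_exp_genes)
```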
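The 'min_cell' rule amounts to counting validated cells per cell type and keeping the types whose count is not less than 'min_cell'. In this sketch the validated labels are hypothetical; how scMAGIC decides that a cell is validated is not shown.

```r
# Hypothetical labels of query cells that passed validation
validated_labels <- c("T cell", "T cell", "B cell", "T cell", "NK cell")
min_cell <- 2

# Count validated cells per cell type and keep the types with at least
# min_cell cells; their cells would form the local reference
counts <- table(validated_labels)
types_in_local_ref <- names(counts)[counts >= min_cell]
print(types_in_local_ref)
```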
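Selecting the top variable genes controlled by 'corr_use_HVGene1' and 'corr_use_HVGene2' can be sketched by ranking genes by their variance across cells. Plain variance is a stand-in chosen for this example, and scMAGIC may use a different variability measure; the data and `hv_genes` are hypothetical.

```r
# Hypothetical expression matrix: genes x cells
expr <- rbind(geneA = c(0, 10, 0, 10),
              geneB = c(1, 1, 1, 1),
              geneC = c(0, 5, 0, 5),
              geneD = c(2, 3, 2, 3))
colnames(expr) <- paste0("cell", 1:4)
corr_use_HVGene1 <- 2

# Rank genes by variance across cells (assumption: plain variance)
# and keep the top corr_use_HVGene1 genes
gene_var <- apply(expr, 1, var)
hv_genes <- names(sort(gene_var, decreasing = TRUE))[seq_len(corr_use_HVGene1)]
print(hv_genes)
```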
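For 'cluster_assign', one common way to give each cluster a single cell type is a majority vote over the labels of its cells. This documentation does not state which rule scMAGIC applies, so the sketch below, with hypothetical labels and cluster memberships, is an assumption.

```r
# Hypothetical per-cell labels and cluster memberships
cell_labels <- c("T cell", "T cell", "B cell", "B cell", "B cell", "T cell")
clusters    <- c(1, 1, 1, 2, 2, 2)

# For each cluster, take the most frequent label (assumption: majority vote)
majority <- tapply(cell_labels, clusters,
                   function(x) names(which.max(table(x))))

# Give every cell the label of its cluster
cluster_labels <- as.vector(majority[as.character(clusters)])
print(cluster_labels)
```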

Value

A data frame including the cell type labels and confidence scores.
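The labels and confidence scores are linked through the 'threshold' argument. Assuming 'identify_unassigned = TRUE' and that rejected labels become "Unassigned", the rule looks like the sketch below. The scores are made up, and how scMAGIC computes confidence scores is not shown.

```r
# Hypothetical labels and confidence scores for four query cells
labels <- c("T cell", "B cell", "Macrophage", "T cell")
scores <- c(12.3, 4.1, 7.8, 2.0)
threshold <- 5

# Labels whose score is below the threshold are treated as incorrect
# (assumption: they are replaced with "Unassigned")
final_labels <- ifelse(scores < threshold, "Unassigned", labels)
print(final_labels)
```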

Author(s)

Yu Zhang

Examples
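The following runnable sketch builds random toy inputs in the shapes described under Arguments and checks that they are consistent. The scMAGIC call itself is left as a comment because it needs the installed package, and random counts with placeholder gene names will not yield meaningful labels.

```r
# Toy inputs with random counts (illustration only)
set.seed(123)
genes <- paste0("Gene", 1:200)

# Query: genes x cells, cell barcodes as column names
exp_sc_mat <- matrix(rpois(200 * 50, lambda = 2), nrow = 200,
                     dimnames = list(genes, paste0("query_cell", 1:50)))

# Single cell reference: genes x cells, plus one label per reference cell
exp_ref_mat <- matrix(rpois(200 * 60, lambda = 2), nrow = 200,
                      dimnames = list(genes, paste0("ref_cell", 1:60)))
exp_ref_label <- rep(c("T cell", "B cell", "Macrophage"), each = 20)

# Basic consistency checks before annotation
stopifnot(length(exp_ref_label) == ncol(exp_ref_mat))
stopifnot(length(intersect(rownames(exp_sc_mat), rownames(exp_ref_mat))) > 0)

# With the scMAGIC package installed, annotation would then be run as, e.g.:
# library(scMAGIC)
# result <- scMAGIC(exp_sc_mat, exp_ref_mat, exp_ref_label,
#                   atlas = 'MCA', num_threads = 4)
```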


Drizzle-Zhang/scMAGIC documentation built on March 17, 2023, 2:31 a.m.