assignCellType: Assign cell type.


View source: R/assignCellType.R

Description

This is the main function of scTypeGSEA. In a single call it performs quality control, pre-processes the data, clusters the cells, computes fold changes, runs GSEA and labels the cells.

Usage

assignCellType(
  obj,
  datatype = "RNA",
  metadata = NULL,
  min.cells = 3,
  min.features = 200,
  percent.mt = 10,
  oversd = NULL,
  normalization.method = "LogNormalize",
  scale.factor = 10000,
  selection.method = "vst",
  nfeatures = 2000,
  npcs = 50,
  cluster_cell = NULL,
  dims = 1:50,
  k.param = 20,
  resolution = 0.5,
  hclustmethod = "complete",
  ncluster = 3,
  min.pct = 0.1,
  test.use = "wilcox",
  logfc.threshold = 0.1,
  db = "PanglaoDB_list",
  minSize = 15,
  maxSize = 500,
  annotation.file = NULL,
  seq.levels = c(1:22, "X", "Y"),
  include.body = TRUE,
  upstream = 2000,
  downstream = 0
)

Arguments

obj

A Seurat object or any matrix in which each column is a cell.

datatype

Data type for your data, which can be "RNA" for scRNAseq data, "ATAC" for scATACseq data or any other data type.

metadata

Metadata to add when the Seurat object is created.

min.cells

An integer value. Include features detected in at least this many cells.

min.features

An integer value. Include cells where at least this many features are detected.

percent.mt

The highest allowed percentage of reads that map to the mitochondrial genome.

oversd

Remove cells whose library size is greater than mean + oversd * sd. The default is NULL, which removes no cells.

normalization.method

Normalization method. Options are 'LogNormalize', 'CLR' and 'RC'.

scale.factor

Sets the scale factor for cell-level normalization.

selection.method

Method for choosing the top variable features. Options are 'vst', 'mean.var.plot' and 'dispersion'.

nfeatures

An integer value. The number of features to select as top variable features.

npcs

An integer value. The total number of PCs to compute and store (50 by default).

cluster_cell

The cluster result for cells if it is already known.

dims

An integer vector (for example 1:50). The dimensions of the reduction to use as input. (Used when clustering single-cell data.)

k.param

An integer value. Defines k for the k-nearest neighbor algorithm. (Used when clustering single-cell data.)

resolution

Value of the resolution parameter. Use a value above (below) 1.0 to obtain a larger (smaller) number of communities. (Used when clustering single-cell data.)

hclustmethod

The agglomeration method to use for hierarchical clustering. The default is "complete". (Used when clustering other data types.)

ncluster

An integer value. The number of clusters to use when the cells are grouped by hierarchical clustering.

min.pct

A decimal value between 0 and 1. Only test genes that are detected in a minimum fraction of min.pct cells in either of the two populations. Meant to speed up the function by not testing genes that are very infrequently expressed.

test.use

Denotes which test to use. Available options are 'wilcox', 'bimod', 'roc', 'negbinom', 'poisson', 'LR', 'MAST' and 'DESeq2'. The default in the usage above is 'wilcox'. We suggest 'MAST' for scRNAseq data and 'wilcox' for other data types.

logfc.threshold

A decimal value between 0 and 1. Limit testing to genes which show, on average, at least X-fold difference (log-scale) between the two groups of cells. Increasing logfc.threshold speeds up the function, but can miss weaker signals.

db

The cell type database to use. For single-cell data, three databases are provided: the 'PanglaoDB' database (db = 'PanglaoDB_list'), the 'GSEA' database (db = 'GSEA_list') and the reference genome for Arabidopsis (db = 'TAIR_list'). It can also be a path to a new reference database, which must be a file in 'rds' format.

minSize

An integer value. Minimal size of a gene set to test. All pathways below the threshold are excluded.

maxSize

An integer value. Maximal size of a gene set to test. All pathways above the threshold are excluded.

annotation.file

Path to GTF annotation file. (Only for "ATAC" data)

seq.levels

Which seqlevels to keep (corresponds to chromosomes usually).

include.body

Include the gene body? (Only for "ATAC" data)

upstream

Number of bases upstream to consider. (Only for "ATAC" data)

downstream

Number of bases downstream to consider. (Only for "ATAC" data)
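
The oversd rule described above (remove cells whose library size is greater than mean + oversd * sd) can be sketched in base R. This is only an illustration under stated assumptions: the helper filter_by_library_size and the toy matrix are hypothetical and not part of scTypeGSEA, and it is assumed here that library size means the column sum of the raw counts.

```r
# Hypothetical toy count matrix: rows are features, columns are cells.
set.seed(1)
counts <- matrix(rpois(200, lambda = 5), nrow = 20, ncol = 10,
                 dimnames = list(paste0("gene", 1:20), paste0("cell", 1:10)))
counts[, 10] <- counts[, 10] * 10L  # inflate one cell so it becomes an outlier

# Hypothetical helper, not an scTypeGSEA function.
# Assumption: library size is the per-cell sum of raw counts.
filter_by_library_size <- function(counts, oversd = NULL) {
  if (is.null(oversd)) return(counts)  # NULL keeps every cell
  lib_size <- colSums(counts)
  cutoff <- mean(lib_size) + oversd * sd(lib_size)
  counts[, lib_size <= cutoff, drop = FALSE]
}

filtered <- filter_by_library_size(counts, oversd = 2)
setdiff(colnames(counts), colnames(filtered))  # names of the removed cells
```

Because the mean and standard deviation are themselves pulled upward by outliers, a small oversd is needed to catch a single extreme cell in a small data set.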

Value

Returns the Seurat object with cell types assigned, a cell type matrix and a cluster list.

Examples

pbmc_example <- assignCellType(small_pbmc_rna, min.cells = 1, min.features = 10,
                               nfeatures = 100, npcs = 10,
                               dims = 1:10, k.param = 5, resolution = 0.75,
                               min.pct = 0.25, test.use = "MAST", minSize = 5)
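
The call above follows the single-cell path. For other data types, the hclustmethod and ncluster arguments point to hierarchical clustering. The base R sketch below shows what those two settings control. It is an assumption-laden illustration, not the package's implementation: the toy matrix is made up, and the use of Euclidean distance between columns is an assumption.

```r
set.seed(42)
# Hypothetical toy matrix: 5 features x 12 samples (columns), three shifted groups.
mat <- cbind(
  matrix(rnorm(20, mean = 0),  nrow = 5),
  matrix(rnorm(20, mean = 10), nrow = 5),
  matrix(rnorm(20, mean = 20), nrow = 5)
)
colnames(mat) <- paste0("sample", seq_len(ncol(mat)))

# dist() compares rows, so transpose to cluster the columns.
# Assumption: Euclidean distance (the dist() default).
tree <- hclust(dist(t(mat)), method = "complete")  # plays the role of hclustmethod

# Cut the tree into a fixed number of groups (plays the role of ncluster).
clusters <- cutree(tree, k = 3)
table(clusters)
```

Other agglomeration methods accepted by hclust, such as "average" or "ward.D2", can be substituted in the same place.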

cailab-tamu/scTypeGSEA documentation built on July 15, 2020, 10:56 a.m.