View source: R/FeatureGeneSelection.R
feature_gene_selection | R Documentation |
This function selects a subset of feature genes that are expected to express differently across cell types before calculating CDI.
feature_gene_selection(
X,
batch_label = NULL,
count_slot = NULL,
batch_slot = NULL,
method = "wds",
nfeature = 500,
zp_threshold = 0.95
)
X |
The class of X can be "matrix", "Seurat" object, or "SingleCellExperiment" object. If X is a matrix, it should be a UMI count matrix where each row represents a gene, and each column represents a cell. If X is a Seurat object or SingleCellExperiment object, users need to specify where the count matrix, and batch labels are stored in count_slot and batch_slot, respectively. |
batch_label |
A vector of characters indicating the batch labels of the cells. The length of batch_label should be the same as the number of columns in the count matrix. The default value is NULL indicating that there is no batch information available. |
count_slot |
A string indicating the location of raw UMI count. For Seurat object, it is a slot in "RNA" of "assays"; For SingleCellExperiment object, it is a slot in "assays". Each row represents a gene, and each column represents a cell. The genes should be those before feature gene selection. |
batch_slot |
A string indicating the location of batch labels of cells. For Seurat object, it is a slot in meta.data; For SingleCellExperiment object, it is a slot in "colData". The default value is NULL indicating that there is no batch information available. |
method |
A character indicating the method used to select feature genes. "wds" (default) represent the working dispersion score proposed in our study; "vst" is the default method for feature gene selection in FindVariableFeatures function of Seurat package. |
nfeature |
An integer indicating the number of features to select. The default value is 500. |
zp_threshold |
A number of zero proportion threshold. The range should be (0,1). Genes with zero proportion greater than this value will be excluded before feature selection. |
A vector of indices of selected feature genes corresponding to the row indices of input count matrix.
Stuart and Butler et al. Comprehensive Integration of Single-Cell Data. Cell (2019) [Seurat V3]
## Simulate a matrix of 100 rows (genes), where the first 50 genes have
## different mean expression levels.
## Apply feature_gene_selection to select genes
ng <- 100; nc <- 100
set.seed(1)
X <- cbind(
matrix(
c(rnbinom(ng*nc/4, size = 1, mu = 0.1),
rnbinom(ng*nc/4, size = 1, mu = 0.5)),
nrow = ng,
byrow = TRUE),
matrix(
c(rnbinom(ng*nc/4, size = 1, mu = 1),
rnbinom(ng*nc/4, size = 1, mu = 0.5)),
nrow = ng,
byrow = TRUE))
colnames(X) <- paste0('c', seq_len(nc))
rownames(X) <- paste0('g', seq_len(ng))
Batches <- rep(seq_len(2), nc/2)
## Input: matrix
feature_gene_selection(
X = X,
batch_label = Batches,
nfeature = 20,
zp_threshold = 0.95)
## Input: SingleCellExperiment object
library(SingleCellExperiment)
sim_sce <- SingleCellExperiment(
list(count = X),
colData = data.frame(
Cell_name = colnames(X),
batch = Batches),
rowData = data.frame(gene_name = rownames(X)))
selected_genes <- feature_gene_selection(
X = sim_sce,
count_slot = "count",
batch_slot = "batch",
nfeature = 20,
zp_threshold = 0.95)
## Input: Seurat object
library(Seurat)
library(SeuratObject)
sim_seurat <- CreateSeuratObject(counts = as.data.frame(X))
sim_seurat <- AddMetaData(sim_seurat, colnames(X), "Cell_name")
sim_seurat <- AddMetaData(sim_seurat, Batches, "batch")
selected_genes <- feature_gene_selection(
X = sim_seurat,
count_slot = "counts",
batch_slot = "batch",
nfeature = 20,
zp_threshold = 0.95)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.