selectGenes | R Documentation |
This function identifies highly variable genes in each dataset and combines these gene sets (by either union or intersection) for use in downstream analysis. Assuming that gene expression approximately follows a Poisson distribution, it selects genes whose expression variance, relative to mean expression, exceeds a given threshold. Alternatively, a desired number of genes can be selected for each dataset by ranking genes on relative variance; the per-dataset selections are then combined in the same way.
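The selection idea described above can be sketched in base R. This is only an illustration under simplifying assumptions, not the package's implementation: the simulated data, the helper names (`simulateDataset`, `relativeVariance`) and the use of `log10(variance / mean)` as the relative variance are all hypothetical choices made for this example.

```r
# Illustrative sketch in base R; this is NOT rliger's implementation.
set.seed(42)
nGenes <- 200
nCells <- 1000
geneNames <- paste0("gene", seq_len(nGenes))

# Simulate one dataset: Poisson counts, with the first 20 genes overdispersed.
simulateDataset <- function() {
  mu <- runif(nGenes, min = 2, max = 5)
  counts <- matrix(rpois(nGenes * nCells, lambda = rep(mu, times = nCells)),
                   nrow = nGenes, dimnames = list(geneNames, NULL))
  counts[1:20, ] <- rnbinom(20 * nCells, size = 0.5,
                            mu = rep(mu[1:20], times = nCells))
  counts
}

# Assumed relative variance: log10(variance / mean), about 0 for a Poisson gene.
relativeVariance <- function(counts) {
  geneMean <- rowMeans(counts)
  geneVar <- apply(counts, 1, var)
  log10(geneVar / geneMean)
}

datasets <- list(a = simulateDataset(), b = simulateDataset())
relVar <- lapply(datasets, relativeVariance)

# Option 1: keep genes whose relative variance exceeds a threshold.
byThresh <- lapply(relVar, function(rv) names(rv)[rv > 0.1])

# Option 2: keep a fixed number of top-ranked genes per dataset.
byRank <- lapply(relVar, function(rv) names(sort(rv, decreasing = TRUE))[1:30])

# Combine the per-dataset selections by union or by intersection.
selectedUnion <- Reduce(union, byThresh)
selectedIntersection <- Reduce(intersect, byThresh)
```

The union keeps any gene that is variable in at least one dataset, while the intersection keeps only genes that are variable in every dataset, so the intersection is always a subset of the union.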
selectGenes(object, thresh = 0.1, nGenes = NULL, alpha = 0.99, ...)
## S3 method for class 'liger'
selectGenes(
  object,
  thresh = 0.1,
  nGenes = NULL,
  alpha = 0.99,
  useDatasets = NULL,
  useUnsharedDatasets = NULL,
  unsharedThresh = 0.1,
  combine = c("union", "intersection"),
  chunk = 1000,
  verbose = getOption("ligerVerbose", TRUE),
  var.thresh = thresh,
  alpha.thresh = alpha,
  num.genes = nGenes,
  datasets.use = useDatasets,
  unshared.datasets = useUnsharedDatasets,
  unshared.thresh = unsharedThresh,
  tol = NULL,
  do.plot = NULL,
  cex.use = NULL,
  unshared = NULL,
  ...
)
## S3 method for class 'Seurat'
selectGenes(
  object,
  thresh = 0.1,
  nGenes = NULL,
  alpha = 0.99,
  useDatasets = NULL,
  layer = "ligerNormData",
  assay = NULL,
  datasetVar = "orig.ident",
  combine = c("union", "intersection"),
  verbose = getOption("ligerVerbose", TRUE),
  ...
)
object |
A liger, ligerDataset or
|
thresh |
Variance threshold used to identify variable genes. A higher
threshold results in fewer selected genes. The liger and Seurat S3 methods accept
a single value or a vector with a specific threshold for each dataset in
|
nGenes |
Number of genes to find for each dataset. By setting this,
we optimize the threshold used for each dataset so that we get |
alpha |
Alpha threshold. Controls upper bound for expected mean gene
expression. Lower threshold means higher upper bound. Default |
... |
Arguments passed to other methods. |
useDatasets |
A character vector of dataset names, or a numeric or logical
vector of dataset indices, specifying the datasets to use for shared variable
feature selection. Default |
useUnsharedDatasets |
A character vector of dataset names, or a numeric or logical
vector of dataset indices, specifying the datasets to use for finding unshared
variable features. Default |
unsharedThresh |
The same thing as |
combine |
How to combine variable genes selected from all datasets.
Choose from |
chunk |
Integer. Maximum number of cells in each chunk, when
gene selection is applied to any HDF5 based dataset. Default |
verbose |
Logical. Whether to show progress information. Default
|
var.thresh , alpha.thresh , num.genes , datasets.use , unshared.datasets , unshared.thresh |
Deprecated. These arguments are renamed and will be removed in the future. Please see function usage for replacement. |
tol , do.plot , cex.use , unshared |
Deprecated. The gene variability
metric is now visualized with a separate function
|
layer |
Where the input normalized counts should be from. Default
|
assay |
Name of assay to use. Default |
datasetVar |
Metadata variable name that stores the dataset source
annotation. Default |
Updated object
liger method - Each involved dataset stored in ligerDataset is updated with its featureMeta slot and varUnsharedFeatures slot (if requested with useUnsharedDatasets), while varFeatures(object) will be updated with the final combined gene set.
Seurat method - Final selection will be updated at Seurat::VariableFeatures(object). Per-dataset information is stored in the meta.features slot of the chosen Assay.
pbmc <- normalize(pbmc)
# Select genes by thresholding the relative variance
pbmc <- selectGenes(pbmc, thresh = .1)
# Select a specified number of genes for each dataset
pbmc <- selectGenes(pbmc, nGenes = c(60, 60))