selectGenes: Select a subset of informative genes

View source: R/rliger.R


Description

This function identifies highly variable genes from each dataset and combines these gene sets (either by union or intersection) for use in downstream analysis. Assuming that gene expression approximately follows a Poisson distribution, it selects genes whose expression variance exceeds a given variance threshold relative to mean gene expression. When do.plot = TRUE, it also draws a log-scale plot of gene variance vs. gene expression for each dataset, with a line indicating the variance expected at each expression level under the Poisson assumption. Selected genes are plotted in green.
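
The Poisson-based criterion described above can be illustrated with a minimal, self-contained base-R sketch. It is not rliger's own code: the helper select_variable_genes_sketch is hypothetical, and it assumes that (1) each cell is normalized so its counts sum to 1, (2) the Poisson-expected variance of a gene is its mean times the average of 1 / (total counts per cell), and (3) alpha.thresh is Bonferroni-corrected across genes to form an upper bound on the mean. The real implementation may differ in its details.

```r
# Hypothetical helper, NOT rliger's function: a simplified Poisson-based
# variable-gene filter under the assumptions stated in the lead-in.
# raw: genes x cells count matrix with gene names as row names.
select_variable_genes_sketch <- function(raw, var.thresh = 0.1, alpha.thresh = 0.99) {
  counts_per_cell <- colSums(raw)
  # Library-size normalization: every cell (column) sums to 1
  norm <- sweep(raw, 2, counts_per_cell, "/")
  gene_mean <- rowMeans(norm)
  gene_var <- apply(norm, 1, var)
  # ASSUMPTION: under a Poisson model, expected variance of a gene's
  # normalized expression is roughly gene_mean * mean(1 / counts_per_cell)
  poisson_const <- mean(1 / counts_per_cell)
  # ASSUMPTION: Bonferroni-style correction of alpha.thresh across genes,
  # used to build an upper bound on each gene's mean expression
  alpha_corr <- alpha.thresh / nrow(raw)
  mean_upper <- gene_mean + qnorm(1 - alpha_corr / 2) *
    sqrt(gene_mean * poisson_const / ncol(raw))
  # Keep genes whose variance exceeds the mean upper bound and lies more
  # than var.thresh above the Poisson expectation on the log10 scale
  keep <- gene_var / poisson_const > mean_upper &
    log10(gene_var) > log10(gene_mean * poisson_const) + var.thresh
  rownames(raw)[which(keep)]
}

# Usage on simulated counts: 190 Poisson genes plus 10 "on/off" genes
set.seed(1)
n_cells <- 1000
null_genes <- matrix(rpois(190 * n_cells, 20), nrow = 190)
on_off <- matrix(rpois(10 * n_cells, 100 * rbinom(10 * n_cells, 1, 0.5)), nrow = 10)
raw <- rbind(null_genes, on_off)
rownames(raw) <- c(paste0("null", 1:190), paste0("var", 1:10))
sel <- select_variable_genes_sketch(raw)
```

In this simulation the ten on/off genes are far more variable than a Poisson model predicts and should be selected, while the Poisson genes should mostly be rejected. Because var.thresh only tightens the log-scale condition, raising it can never enlarge the selected set.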

Usage

selectGenes(
  object,
  var.thresh = 0.1,
  alpha.thresh = 0.99,
  num.genes = NULL,
  tol = 1e-04,
  datasets.use = 1:length(object@raw.data),
  combine = "union",
  capitalize = FALSE,
  do.plot = FALSE,
  cex.use = 0.3,
  chunk = 1000,
  unshared = FALSE,
  unshared.datasets = NULL,
  unshared.thresh = NULL
)

Arguments

object

A liger object on which normalize has already been called.

var.thresh

Variance threshold; the main threshold used to identify variable genes. Genes whose expression variance exceeds this threshold (relative to the mean) are selected, so a higher threshold selects fewer genes. Accepts a single value or a vector with a separate var.thresh for each dataset. (default 0.1)

alpha.thresh

Alpha threshold. Controls the upper bound on expected mean gene expression; a lower threshold gives a higher upper bound. (default 0.99)

num.genes

Number of genes to find for each dataset. If provided, var.thresh is optimised separately for each dataset to obtain this number of genes. Accepts a single value or a vector with the same length as the number of datasets. (optional, default NULL)

tol

Tolerance to use for the optimization when num.genes is provided. (default 0.0001)

datasets.use

Indices of the datasets to include when discovering highly variable genes. (default 1:length(object@raw.data))

combine

How to combine variable genes across datasets. Either "union" or "intersection". (default "union")

capitalize

Capitalize gene names to match homologous genes (i.e. across species). (default FALSE)

do.plot

Display a log-scale plot of gene variance vs. gene expression for each dataset. Selected genes are plotted in green. (default FALSE)

cex.use

Point size (cex) for the plot. (default 0.3)

chunk

Size of chunks in the HDF5 file. (default 1000)

unshared

Whether to consider unshared features. (default FALSE)

unshared.datasets

A list of the datasets for which to consider unshared features; e.g. list(2) uses the second dataset.

unshared.thresh

A list of threshold values to apply to each unshared dataset. If only one value is provided, it is applied to all unshared datasets. If a list is provided, its length must match the length of unshared.datasets.
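
How num.genes, tol and combine interact can be sketched in base R. The helpers count_selected, tune_var_thresh and combine_genes are hypothetical and not part of rliger. The sketch assumes each gene already has a score (log10 variance minus log10 Poisson-expected variance) that is compared against var.thresh, ignores the alpha.thresh condition, and assumes a search interval of c(0, 1.5). It uses stats::optimize() to minimize the gap between the requested and obtained gene counts within tolerance tol.

```r
# Hypothetical helpers illustrating num.genes/tol and combine; NOT rliger code.
# ASSUMPTION: a gene is selected when its precomputed score > var.thresh.
count_selected <- function(scores, var.thresh) sum(scores > var.thresh)

# Search for a var.thresh that yields roughly num.genes genes by minimizing
# |num.genes - count| over an assumed interval with stats::optimize()
tune_var_thresh <- function(scores, num.genes, tol = 1e-4, interval = c(0, 1.5)) {
  objective <- function(x) abs(num.genes - count_selected(scores, x))
  optimize(objective, interval = interval, tol = tol)$minimum
}

# Combine per-dataset gene sets as the combine argument describes
combine_genes <- function(gene.sets, combine = c("union", "intersection")) {
  combine <- match.arg(combine)
  if (combine == "union") Reduce(union, gene.sets) else Reduce(intersect, gene.sets)
}

# Usage on made-up scores for two datasets with opposite gene rankings
genes <- paste0("g", 1:300)
scores1 <- setNames(seq(0.005, 1.5, by = 0.005), genes)
scores2 <- setNames(rev(seq(0.005, 1.5, by = 0.005)), genes)
thresh1 <- tune_var_thresh(scores1, num.genes = 150)
thresh2 <- tune_var_thresh(scores2, num.genes = 150)
set1 <- names(scores1)[scores1 > thresh1]
set2 <- names(scores2)[scores2 > thresh2]
union_genes <- combine_genes(list(set1, set2), "union")
shared_genes <- combine_genes(list(set1, set2), "intersection")
```

Because the objective is a step function of the threshold, the optimized value typically lands near, not necessarily exactly on, the requested count. Intersection keeps only genes found in every dataset, so it is never larger than the union.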

Value

A liger object with the var.genes slot set to the selected genes.

Examples

ligerex <- createLiger(list(ctrl = ctrl, stim = stim))
ligerex <- normalize(ligerex)
ligerex <- selectGenes(ligerex)

rliger documentation built on Nov. 9, 2023, 1:07 a.m.