FindInformativeVariants: Find informative variants

View source: R/FindInformativeVariants.R

FindInformativeVariants    R Documentation

Find informative variants

Description

Computes the fraction of cells in which each variant is detected and the entropy associated with each variant, based on a Seurat object containing a variant call consensus matrix (VAR). Stores this information in the assay metadata and computes the most informative variants (stored as VariableFeatures of the assay). Returns the updated Seurat object. If a genotype object is also given, it is updated with the same information, as well as with the sorted lists variants_by_coverage and variants_by_information.

Usage

FindInformativeVariants(
  seurat,
  genotype = NA,
  n.variants = 10000,
  assay = "VAR"
)

Arguments

seurat

A Seurat object containing a consensus matrix.

genotype

A genotype object to be updated with the coverage and information data, or NA to give no genotype to update and return only a Seurat object.

n.variants

numeric(1). The number of most informative variants stored in the @informative_variants slot (Default: 10000).

assay

character(1). The name of the assay which stores the single cell variant calls data within the Seurat object, in vartrix formatting (sparse matrix with values 1-2-3 for ref-alt-het).

Details

FindInformativeVariants populates the following metadata columns and slots in the genotype/Seurat objects:

$excess_entropy (see below),

$coverage (number of cells with data for the variant),

$coverage_frac (fraction of cells with data for the variant),

@variants_by_coverage (variants sorted from highest to lowest coverage, genotype object only),

@variants_by_information (variants sorted from highest to lowest excess entropy, genotype object only),

@informative_variants or VariableFeatures() (most informative variants; the categorical-data counterpart of the most variable features used for continuous data).
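As a rough illustration of the coverage quantities listed above, the following base R sketch works on a small toy matrix. It is not the package's implementation. It assumes rows are variants, columns are cells, and 0 encodes "no data" (only the 1-2-3 ref-alt-het values are stated in this page). The matrix and the names varA, varB, varC are made up.

```r
# Toy consensus matrix: rows = variants, columns = cells.
# Assumed encoding: 0 = no data, 1 = ref, 2 = alt, 3 = het.
calls <- matrix(
  c(1, 0, 2, 0,
    1, 1, 3, 2,
    0, 0, 0, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("varA", "varB", "varC"), paste0("cell", 1:4))
)

coverage <- rowSums(calls != 0)          # number of cells with data, per variant
coverage_frac <- coverage / ncol(calls)  # same, as a fraction of all cells

# Variant names ordered from highest to lowest coverage
variants_by_coverage <- names(sort(coverage, decreasing = TRUE))
print(variants_by_coverage)
```

Here varB is covered in all four cells, varA in two, and varC in one, so the ordering is varB, varA, varC.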

The amount of information provided by a variant at the single cell level is computed as the Shannon entropy, sum(-p*log2(p)), minus the minimal entropy contributed by the mere coverage of the variant, i.e. the entropy value if the data had just the two levels nodata and data. This is analogous to the excess variance for continuous data, translated to discrete variant data. This quantity is referred to as excess_entropy in the genotype metadata.
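The definition above can be sketched for a single variant in base R. This is one reading of the description, not the package's actual code. It assumes that p is the fraction of cells in each call state and that 0 encodes "no data". The helper names shannon_entropy and excess_entropy are hypothetical.

```r
# Shannon entropy in bits from a vector of counts; empty levels contribute nothing.
shannon_entropy <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  sum(-p * log2(p))
}

# Excess entropy of one variant across cells.
# Assumed encoding: 0 = no data, 1 = ref, 2 = alt, 3 = het.
excess_entropy <- function(calls) {
  n_nodata <- sum(calls == 0)
  n_data <- sum(calls != 0)
  full <- shannon_entropy(tabulate(calls + 1, nbins = 4))  # four levels: nodata/ref/alt/het
  coverage_only <- shannon_entropy(c(n_nodata, n_data))    # two levels: nodata/data
  full - coverage_only
}

# Two toy variants over 8 cells, each covered in half of the cells
uninformative <- c(0, 0, 0, 0, 1, 1, 1, 1)  # all covered cells agree (ref)
informative   <- c(0, 0, 0, 0, 1, 1, 2, 2)  # covered cells split between ref and alt

print(excess_entropy(uninformative))  # 0 bits: coverage alone explains the entropy
print(excess_entropy(informative))    # 0.5 bits: 1.5 bits total minus 1 bit from coverage
```

Both toy variants have the same coverage, so they differ only in the excess term. This is what makes the measure useful for ranking variants independently of how often they are detected.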

Value

Returns the updated Seurat object if no genotype is given, or a list(seurat, genotype) of the two updated objects if a genotype is given.

Examples

# Update the Seurat object only (no genotype object given)
MySeuratObject <- FindInformativeVariants(MySeuratObject, n.variants = 20000)
# Update both objects at once. zeallot provides the %<-% multi-assignment operator; without it, the returned list must be deconstructed manually.
library(zeallot)
c(MySeuratObject, MyGenotypes) %<-% FindInformativeVariants(MySeuratObject, MyGenotypes, n.variants = 20000)
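Without zeallot, the returned list can be unpacked by position. The sketch below uses a hypothetical stand-in function, find_variants_stub, that mimics only the documented return shape list(seurat, genotype) so that the snippet runs on its own. In real use, the call to FindInformativeVariants would take its place.

```r
# Hypothetical stand-in: reproduces only the documented return shape, list(seurat, genotype).
find_variants_stub <- function(seurat, genotype) list(seurat, genotype)

res <- find_variants_stub("seurat placeholder", "genotype placeholder")
MySeuratObject <- res[[1]]  # first element: updated Seurat object
MyGenotypes <- res[[2]]     # second element: updated genotype object
```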

nbroguiere/burgertools documentation built on Jan. 30, 2024, 3:48 a.m.