keepHighGene: Filter and return highly expressed or highly variable genes

keepHighGene    R Documentation

Filter and return highly expressed or highly variable genes

Description

This function filters genes in the expression count matrix by their average expression and variability. Genes with low expression and low variability are usually less informative: they contribute little to downstream analyses and mostly add technical noise. Pre-filtering the gene expression matrix before any analysis is recommended.

We apply the function FindVariableFeatures with selection.method = "mvp" from the Seurat package to the log-transformed expression matrix to detect highly variable genes. This method does not require a pre-specified number of highly variable genes.
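The variable-gene step described above can be sketched on toy data as follows. This is a minimal illustration, not the package's internal code: the toy matrix, the object names, and the use of NormalizeData for the log transform are assumptions made for the example. The Seurat part is skipped when Seurat is not installed.

```r
# Toy count matrix (hypothetical data): 200 genes x 60 spots.
set.seed(1)
n_genes <- 200
n_spots <- 60
toy_counts <- matrix(
  rpois(n_genes * n_spots, lambda = 5),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("spot", seq_len(n_spots)))
)
# Make the first 20 genes overdispersed so some genes stand out as variable.
toy_counts[1:20, ] <- rnbinom(20 * n_spots, mu = 5, size = 0.5)

hvg <- character(0)
if (requireNamespace("Seurat", quietly = TRUE)) {
  # Assumption: log-transform via Seurat's default log-normalization.
  obj <- Seurat::CreateSeuratObject(
    counts = Matrix::Matrix(toy_counts, sparse = TRUE)
  )
  obj <- Seurat::NormalizeData(obj, verbose = FALSE)
  # "mvp" selects genes by mean and dispersion cutoffs, so no target
  # number of variable genes has to be supplied.
  obj <- Seurat::FindVariableFeatures(obj, selection.method = "mvp",
                                      verbose = FALSE)
  hvg <- Seurat::VariableFeatures(obj)
}
length(hvg)
```

The number of genes returned depends on the data and on the default cutoffs of the "mvp" method, which is why no fixed count is expected here.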

Usage

keepHighGene(
  count_mat,
  top_high = 5000,
  mean_cutoff = 1,
  return_matrix = FALSE,
  verbose = TRUE
)

Arguments

count_mat

(matrix of num) Input count matrix to be filtered. Can be either a standard matrix or a sparse matrix.

top_high

(int) Only search for highly expressed and highly variable genes among this many of the most highly expressed genes. Default: 5000.

mean_cutoff

(num) Genes whose average expression across all spots exceeds this cutoff are kept as highly expressed genes. Default: 1.

return_matrix

(logical) Whether to return the filtered matrix instead of gene names. Default: FALSE.

verbose

(logical) Whether to print progress information. Default: TRUE.
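How top_high and mean_cutoff interact can be illustrated with a small base R sketch. It covers only the expression-based part of the filter; the highly variable genes that the function also keeps are not shown. The toy matrix and variable names are hypothetical, and the strict ">" comparison is an assumption based on the word "exceeding" above.

```r
# Hypothetical toy matrix: 5 genes x 4 spots.
toy <- rbind(
  geneA = c(0, 0, 1, 0),    # mean 0.25
  geneB = c(2, 1, 0, 1),    # mean 1.00
  geneC = c(5, 3, 4, 4),    # mean 4.00
  geneD = c(10, 12, 9, 9),  # mean 10.00
  geneE = c(0, 1, 0, 1)     # mean 0.50
)
top_high <- 3
mean_cutoff <- 1

# Restrict the search to the top_high genes with the largest mean expression.
gene_means <- rowMeans(toy)
n_top <- min(top_high, nrow(toy))
top_genes <- names(sort(gene_means, decreasing = TRUE))[seq_len(n_top)]

# Within that set, keep genes whose mean exceeds the cutoff.
high_expr <- top_genes[gene_means[top_genes] > mean_cutoff]
high_expr
```

Here geneB falls within the top three genes but has a mean of exactly 1, so under the strict comparison assumed in this sketch only geneD and geneC are kept.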

Value

A vector of gene names or, if return_matrix = TRUE, a filtered expression count matrix with the same class as count_mat.

Examples


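library(SpotClean)

# Load the example raw count matrix and check its dimensions
data(mbrain_raw)
dim(mbrain_raw)

# Filter genes and return the filtered matrix instead of gene names
mbrain_raw_f <- keepHighGene(mbrain_raw, mean_cutoff=100,
                             return_matrix=TRUE)
dim(mbrain_raw_f)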


zijianni/SpotClean documentation built on Nov. 15, 2023, 12:53 a.m.