filterCodingGenes: Filter Coding Gene Symbols (or any matching input Patterns)

View source: R/Seurat.Utils.R

filterCodingGenes    R Documentation

Description

Removes gene symbols that match the specified patterns. By default, the patterns target non-coding gene symbols, so the result contains the coding genes. The function reports the number of gene symbols before and after filtering and the percentage remaining.

Usage

filterCodingGenes(
  genes,
  pattern_NC = c("^A[CFLP][0-9]{6}", "^Z[0-9]{5}", "^LINC0[0-9]{4}", "^C[1-9]+orf[1-9]+",
    "[-|\\.]AS[1-9]*$", "[-|\\.]DT[1-9]*$", "^MIR[1-9]", "^SNHG[1-9]"),
  v = TRUE,
  unique = TRUE,
  ...
)

Arguments

genes

A character vector of gene symbols.

pattern_NC

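A character vector of regular-expression patterns; gene symbols matching them are filtered out. Default (non-coding gene symbols, as shown in Usage): c("^A[CFLP][0-9]{6}", "^Z[0-9]{5}", "^LINC0[0-9]{4}", "^C[1-9]+orf[1-9]+", "[-|\\.]AS[1-9]*$", "[-|\\.]DT[1-9]*$", "^MIR[1-9]", "^SNHG[1-9]").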

v

Verbose: whether to print the number of genes before and after filtering. Default: TRUE.

unique

Whether to return unique gene symbols. Default: TRUE.

...

Additional arguments to pass to str_detect.
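
The following base-R sketch shows which illustrative symbols the default pattern_NC values from Usage would catch. It is an approximation, not the package code: it uses grepl(perl = TRUE) instead of str_detect, and the symbols vector is made up for illustration. Note that the patterns are case-sensitive and that "^A[CFLP][0-9]{6}" requires exactly six digits after the two-letter prefix.

```r
# Default patterns copied from the Usage section.
pattern_NC <- c("^A[CFLP][0-9]{6}", "^Z[0-9]{5}", "^LINC0[0-9]{4}", "^C[1-9]+orf[1-9]+",
                "[-|\\.]AS[1-9]*$", "[-|\\.]DT[1-9]*$", "^MIR[1-9]", "^SNHG[1-9]")

# Illustrative symbols (assumption: chosen for this sketch only).
symbols <- c("AC123456", "AC123", "LINC01234", "C1orf7", "c1orf7",
             "X1.AS1", "TP53", "MYC")

# One column per pattern, one row per symbol; TRUE where the pattern matches.
hits <- sapply(pattern_NC, function(p) grepl(p, symbols, perl = TRUE))
rownames(hits) <- symbols

# A symbol counts as matched if at least one pattern matches it.
matched_any <- rowSums(hits) > 0
print(matched_any)
```

Here "AC123" (too few digits) and "c1orf7" (lowercase "c") match none of the default patterns, while "AC123456", "LINC01234", "C1orf7" and "X1.AS1" each match one.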

Value

A character vector of filtered gene symbols.

Examples

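# Symbols chosen to match the default patterns shown in Usage,
# plus three coding genes (TP53, BRCA1, MYC) that match none of them.
genes <- c("AC123456", "AL456789", "C1orf7", "TP53", "BRCA1", "X1.AS1", "MYC")
genes_kept <- filterCodingGenes(genes)
print(genes_kept)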
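
Because the function accepts any input patterns, the same approach can filter other symbol families. The sketch below is a minimal base-R stand-in for the behaviour described above, not the Seurat.utils source. The function name filter_by_patterns_sketch, the custom patterns, the use of grepl instead of str_detect, and the exact message format are all assumptions made for illustration.

```r
# Hypothetical stand-in for illustration; not the Seurat.utils implementation.
filter_by_patterns_sketch <- function(genes, patterns, v = TRUE, unique = TRUE) {
  # Combine the patterns into one alternation and flag matching symbols.
  combined <- paste(patterns, collapse = "|")
  is_match <- grepl(combined, genes, perl = TRUE)

  # Keep only the symbols that match none of the patterns.
  kept <- genes[!is_match]
  if (unique) kept <- base::unique(kept)

  # Report counts before and after filtering, and the percentage remaining.
  if (v) {
    pct <- round(100 * length(kept) / length(genes), 1)
    message(length(genes), " symbols in, ", length(kept), " kept (", pct, "%)")
  }
  kept
}

# Example patterns (assumption): mitochondrial and ribosomal protein genes.
custom <- c("^MT-", "^RP[SL][0-9]")
genes <- c("MT-CO1", "RPS6", "RPL13A", "TP53", "TP53", "MYC")
kept <- filter_by_patterns_sketch(genes, custom)
print(kept)
```

With these inputs, only the two distinct unmatched symbols, TP53 and MYC, remain.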


vertesy/Seurat.utils documentation built on Dec. 4, 2024, 5:20 p.m.