filter_genes: Filter lowly expressed genes


View source: R/de_analysis.R

Description

'filter_genes()' is a wrapper function for several filtering methods.

Usage

filter_genes(
  count_df,
  id,
  filter_method,
  min_samples = 10,
  min_cpm = 0.25,
  ...
)

Arguments

count_df

preprocessed dataframe of pure counts

id

vector of gene IDs

filter_method

Either 'edgeR', 'samplenr', or 'cpm'

min_samples

minimum number of samples

min_cpm

minimum cpm

...

additional arguments to 'filterByExpr()'

Details

I encourage users to exercise caution before using this filter function. Oftentimes, the filtering step should be specific to the sequencing experiment.

The 'edgeR' option is a wrapper for 'edgeR::filterByExpr()'. The 'samplenr' option filters out genes whose counts across samples are lower than 2 * number_of_samples. The 'cpm' option filters out genes whose row sums (excluding cells lower than 'min_cpm') are less than number_of_samples / min_samples.
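The 'samplenr' and 'cpm' rules can be sketched in base R on a toy matrix. This is only one literal reading of the descriptions above, not the package's actual implementation. The toy counts, the variable names ('keep_samplenr', 'keep_cpm'), and the choice to zero out cells below 'min_cpm' before summing are all assumptions made for illustration.

```r
# Toy count matrix: 4 genes (rows) x 3 samples (columns); values are made up.
counts <- matrix(
  c(  0,  1,   0,
     10, 12,   9,
      2,  1,   2,
    100, 80, 120),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("s", 1:3))
)
n_samples <- ncol(counts)

# 'samplenr'-style rule (assumed reading): keep genes whose total count
# across samples is at least 2 * number_of_samples.
keep_samplenr <- rowSums(counts) >= 2 * n_samples

# 'cpm'-style rule (assumed reading): convert counts to counts per million,
# set cells below min_cpm to zero, then keep genes whose row sum is at least
# number_of_samples / min_samples.
min_cpm <- 0.25
min_samples <- 10
cpm <- t(t(counts) / colSums(counts)) * 1e6
cpm[cpm < min_cpm] <- 0
keep_cpm <- rowSums(cpm) >= n_samples / min_samples

# Subset the matrix with either logical vector to apply the filter.
filtered_samplenr <- counts[keep_samplenr, , drop = FALSE]
filtered_cpm <- counts[keep_cpm, , drop = FALSE]
```

With library sizes as small as in this toy matrix, every non-zero cell has a very large CPM, so the 'cpm'-style rule removes little here. On real data the two rules can disagree, which is one reason to choose the filter with the experiment in mind.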

Value

a 'list' ('DGEList') with the following components:

counts

a vector of the filtered counts

samples

a dataframe containing the library sizes and the normalization factors

genes

a dataframe containing the gene IDs

Examples

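library(dplyr)   # provides %>% and mutate()
library(tidyde)  # provides check_sample_names() and filter_genes()

counts <- readr::read_delim("data/GSE60450_Lactation-GenewiseCounts.txt", delim = "\t")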
meta <- readr::read_delim("data/SampleInfo_Corrected.txt", delim = "\t") %>%
  mutate(FileName = stringr::str_replace(FileName, "\\.", "-"))

# this step may differ depending on how your data is formatted
id <- as.character(counts$EntrezGeneID)

check_sample_names(counts, c(1,2), meta, FileName) %>%
  purrr::pluck("mod_count") %>%
  filter_genes(., id, "edgeR")

latlio/tidyde documentation built on Dec. 21, 2021, 9:40 a.m.