filter_genes: Filter lowly expressed genes
In latlio/tidyde: Tidy Differential Expression


View source: R/de_analysis.R

'filter_genes()' is a wrapper function for several filtering methods.

filter_genes(
  count_df,
  id,
  filter_method,
  min_samples = 10,
  min_cpm = 0.25,
  ...
)

`count_df`	preprocessed dataframe of pure counts
`id`	vector of gene IDs
`filter_method`	Either 'edgeR', 'samplenr', or 'cpm'
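`min_samples`	minimum number of samples (used by the 'cpm' option)
`min_cpm`	minimum counts per million (CPM); used by the 'cpm' option
`...`	additional arguments passed to 'edgeR::filterByExpr()' (used by the 'edgeR' option)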

I encourage users to exercise caution before using this filter function, because the filtering step should often be specific to the sequencing experiment. The 'edgeR' option is a wrapper for 'edgeR::filterByExpr()'. The 'samplenr' option filters out genes whose counts across samples are lower than 2*number_of_samples. The 'cpm' option filters out genes whose row sums (excluding cells lower than 'min_cpm') are less than number_of_samples/min_samples.
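As a rough illustration of the 'samplenr' and 'cpm' rules, here is a minimal base-R sketch on a made-up count matrix. It is not the package's implementation. The toy data, the variable names, and the exact comparisons are assumptions: 'samplenr' is read as "total count across samples", and 'cpm' is read as "number of samples whose CPM reaches 'min_cpm'". Check R/de_analysis.R for the actual behavior.

```r
# Toy count matrix: 4 genes (rows) x 3 samples (columns); values are made up.
counts <- matrix(
  c(  0,   1,   2,
     10,  20,  30,
      0,   0,   0,
    500, 400, 600),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("s", 1:3))
)
n_samples   <- ncol(counts)
min_samples <- 10    # same default as filter_genes()
min_cpm     <- 0.25  # same default as filter_genes()

# 'samplenr'-style rule (assumed reading): drop genes whose total count
# across samples is lower than 2 * number_of_samples.
keep_samplenr <- rowSums(counts) >= 2 * n_samples

# 'cpm'-style rule (assumed reading): convert counts to counts per million
# using each sample's library size, count the samples whose CPM reaches
# min_cpm, and drop genes where that number is below
# number_of_samples / min_samples.
lib_sizes  <- colSums(counts)
cpm_values <- sweep(counts, 2, lib_sizes, "/") * 1e6
n_detected <- rowSums(cpm_values >= min_cpm)
keep_cpm   <- n_detected >= n_samples / min_samples

# Filtered matrices under each rule.
filtered_samplenr <- counts[keep_samplenr, , drop = FALSE]
filtered_cpm      <- counts[keep_cpm, , drop = FALSE]
```

With libraries this small, every non-zero count clears the default 'min_cpm', so the two rules disagree only on genes with a few scattered reads. Real library sizes are in the millions, where 'min_cpm' becomes a meaningful threshold.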

a 'list' ('DGEList') with the following components:

`counts`	a matrix of the filtered counts
`samples`	a dataframe containing the library sizes and the normalization factors
`genes`	a dataframe containing the gene IDs

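library(dplyr)  # provides %>% and mutate()

# read the gene-wise counts and the sample metadata (tab-delimited files)
counts <- readr::read_delim("data/GSE60450_Lactation-GenewiseCounts.txt", delim = "\t")
meta <- readr::read_delim("data/SampleInfo_Corrected.txt", delim = "\t") %>%
  mutate(FileName = stringr::str_replace(FileName, "\\.", "-"))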

# this step may differ depending on how your data is formatted
id <- as.character(counts$EntrezGeneID)

check_sample_names(counts, c(1,2), meta, FileName) %>%
  purrr::pluck("mod_count") %>%
  filter_genes(., id, "edgeR")