filter_feature: filter_feature.R
In OxfordCMS/OCMS_Utility: Utility functions for OCMS

filter_feature

R Documentation

filter_feature.R

Description

Filters out features (ASVs) based on a count cutoff threshold and ASV prevalence across samples.

Usage

filter_feature(
  count_df,
  tax_df,
  filter_method = "abs_count",
  asv_cutoff = 1,
  prev_cutoff = 2
)

Arguments

`count_df`	dataframe. count table with samples in columns and ASV in rows. feature ID in rownames.
`tax_df`	dataframe. featureID must match the rownames of `count_df`. has columns `'featureID','Kingdom','Phylum', 'Class','Order','Family','Genus','Species'`
`filter_method`	default `"abs_count"`. must be one of `c("abs_count", "percent_sample", "percent_dataset")`. determines how `asv_cutoff` is interpreted: as a read count, as a percentage of each sample, or as a percentage of the entire dataset (see Details).
`asv_cutoff`	cutoff used to filter sequences. features are kept when they are greater than this cutoff
`prev_cutoff`	prevalence cutoff. ASVs must reach the `asv_cutoff` in at least this many samples to be kept.
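
The requirement that `tax_df$featureID` match the rownames of `count_df` can be checked before calling the function. The following is a minimal sketch on made-up toy data, not part of the package; the names `tax_cols`, `same_ids` and `same_order` are introduced here for illustration only.

```r
# Toy inputs; all values are made up for illustration.
count_df <- data.frame(
  S1 = c(10, 0, 3),
  S2 = c(5, 2, 0),
  row.names = c("ASV1", "ASV2", "ASV3")
)

# Taxonomy table with the documented columns; taxonomy left as NA here
# because only the feature IDs matter for this check.
tax_cols <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")
tax_df <- data.frame(featureID = c("ASV3", "ASV1", "ASV2"),
                     stringsAsFactors = FALSE)
tax_df[tax_cols] <- NA_character_

# Same set of features in both tables?
same_ids <- setequal(tax_df$featureID, rownames(count_df))
# Same order as well?
same_order <- identical(tax_df$featureID, rownames(count_df))

# Reorder the count table to follow the taxonomy table
count_df <- count_df[tax_df$featureID, , drop = FALSE]
stopifnot(identical(rownames(count_df), tax_df$featureID))
```

Using `drop = FALSE` keeps `count_df` a dataframe even if it has a single sample column.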

Details

Filtering is performed based on read count and sample prevalence. ASVs are kept if they pass the ASV count cut-off OR if they pass the sample prevalence cut-off.

`filter_method = 'abs_count'` uses a read count as the threshold cutoff. Recommended default of 1.

`filter_method = 'percent_sample'` uses percent of sample total read count as the threshold cutoff. Therefore, ASVs must reach a certain percentage of a given sample. Recommended default of 0.01 for 0.01% of each sample.

`filter_method = 'percent_dataset'` uses percent of dataset total read count as the threshold cutoff. Therefore, ASVs must reach a certain percentage of the entire dataset. Recommended default of 0.01 for 0.01% of entire dataset.

`prev_cutoff` has a minimum value of 1 (sequence must reach the cutoff in at least 1 sample, which would not filter out any sequences). Default value is set to 2, which is the most relaxed cutoff. A recommended default is the number of samples that make up 5% of the total number of samples.
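
To make the three threshold types and the prevalence count concrete, here is a minimal base R sketch on made-up data. It is not the OCMS_Utility implementation: `above_cutoff` is a hypothetical helper, and reading `'percent_dataset'` as each count divided by the total reads in the whole table is an assumption. Rounding the 5% prevalence recommendation up with `ceiling()` is likewise an assumption.

```r
# Toy count table: 4 ASVs (rows) x 4 samples (columns); values are made up.
# Each sample totals 1000 reads.
count_df <- data.frame(
  S1 = c(50, 0, 1, 949),
  S2 = c(40, 0, 0, 960),
  S3 = c(0, 2, 0, 998),
  S4 = c(60, 0, 0, 940),
  row.names = c("ASV1", "ASV2", "ASV3", "ASV4")
)

# Hypothetical helper (not the package's code).
# Returns a logical matrix: TRUE where a value is greater than asv_cutoff.
above_cutoff <- function(count_df, filter_method, asv_cutoff) {
  counts <- as.matrix(count_df)
  switch(
    filter_method,
    # raw read count
    abs_count = counts > asv_cutoff,
    # percent of each sample's total read count
    percent_sample = sweep(counts, 2, colSums(counts), "/") * 100 > asv_cutoff,
    # percent of the dataset's total read count (assumed interpretation)
    percent_dataset = counts / sum(counts) * 100 > asv_cutoff,
    stop("unknown filter_method: ", filter_method)
  )
}

hits <- above_cutoff(count_df, "percent_sample", 0.01)

# Prevalence: number of samples in which each ASV exceeds the cutoff
prevalence <- rowSums(hits)

# ASVs that reach the cutoff in at least prev_cutoff samples
prev_cutoff <- 2
meets_prevalence <- names(prevalence)[prevalence >= prev_cutoff]

# Number of samples making up 5% of all samples (rounded up, never below 1)
prev_5pct <- max(1, ceiling(0.05 * ncol(count_df)))
```

With this toy table, `ASV2` and `ASV3` exceed 0.01% in only one sample each, so they do not meet a `prev_cutoff` of 2, while `ASV1` and `ASV4` do.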

Value

list of:
`filtered_table` - filtered count table
`p_agg` - plot of sequences removed/kept based on relative abundance vs ASV prevalence, aggregated view (mean ASV relative abundance)
`p_exp` - expanded view of the same plot (ASV relative abundance for every sample shown)
`feat_keep` - vector of ASVs remaining after filtering
`feat_remove` - vector of ASVs removed during filtering

Examples

library(magrittr)  # provides %>%
library(tibble)    # provides column_to_rownames()

data(dss_example)

# put featureID as rownames
tax_df <- dss_example$merged_taxonomy
count_df <- dss_example$merged_abundance_id %>%
  column_to_rownames('featureID')
# set features in the count and taxonomy tables to be in the same order
count_df <- count_df[tax_df$featureID,]

filtered_ls <- filter_feature(count_df, tax_df, 'percent_sample', 0.001, 2)
summary(filtered_ls)
filtered_count <- filtered_ls$filtered
dim(filtered_count)
head(filtered_count)

OxfordCMS/OCMS_Utility documentation built on July 16, 2025, 9:06 p.m.