filter_feature: filter_feature.R

View source: R/filter_feature.R

filter_feature    R Documentation

filter_feature.R

Description

Filters out features (ASVs) based on an abundance cutoff threshold and ASV prevalence across samples.

Usage

filter_feature(
  count_df,
  tax_df,
  filter_method = "abs_count",
  asv_cutoff = 1,
  prev_cutoff = 2
)

Arguments

count_df

Dataframe. Count table with samples in columns and ASVs in rows. Feature IDs are the rownames.

tax_df

Dataframe. featureID must match the rownames of count_df. Has columns 'featureID', 'Kingdom', 'Phylum', 'Class', 'Order', 'Family', 'Genus', 'Species'.

filter_method

Filtering method. Must be one of c("abs_count", "percent_sample", "percent_dataset"); default is "abs_count". With "percent_dataset", ASVs must reach a certain percentage of the entire dataset.

asv_cutoff

Cutoff used to filter sequences. Features are kept when they are greater than this cutoff. The cutoff is a read count or a percentage, depending on filter_method.

prev_cutoff

prevalence cutoff. ASVs must reach the asv_cutoff in at least this many samples to be kept.
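As an illustration of how asv_cutoff and prev_cutoff interact, here is a minimal base R sketch on a hypothetical toy count table. It is not the package's implementation; it assumes the "abs_count" method and a strict "greater than" comparison, as described for asv_cutoff above.

```r
# Toy count table: ASVs in rows, samples in columns (hypothetical data)
count_df <- data.frame(
  s1 = c(10, 0, 3),
  s2 = c(5, 0, 0),
  s3 = c(8, 2, 0),
  row.names = c("ASV1", "ASV2", "ASV3")
)

asv_cutoff <- 1   # read count threshold
prev_cutoff <- 2  # minimum number of samples that must pass the threshold

# Number of samples in which each ASV exceeds the cutoff
n_samples_passing <- rowSums(count_df > asv_cutoff)

# ASVs that exceed the cutoff in at least prev_cutoff samples
passes_prevalence <- n_samples_passing >= prev_cutoff
print(passes_prevalence)
```

In this toy table only ASV1 exceeds the cutoff in two or more samples; ASV2 and ASV3 each exceed it in a single sample.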

Details

Filtering is performed based on read count and sample prevalence. ASVs are kept if they pass the ASV count cut-off OR if they pass the sample prevalence cut-off.

filter_method = 'abs_count' uses a read count as the threshold cutoff. The recommended asv_cutoff is 1.

filter_method = 'percent_sample' uses a percentage of each sample's total read count as the threshold cutoff. Therefore, ASVs must reach a certain percentage of a given sample. The recommended asv_cutoff is 0.01, for 0.01% of each sample.

filter_method = 'percent_dataset' uses a percentage of the dataset's total read count as the threshold cutoff. Therefore, ASVs must reach a certain percentage of the entire dataset. The recommended asv_cutoff is 0.01, for 0.01% of the entire dataset.

prev_cutoff has a minimum value of 1 (a sequence must reach the cutoff in at least 1 sample, which would not filter out any sequences). The default value is 2, which is the most relaxed cutoff. A recommended value is the number of samples that make up 5% of the total number of samples.
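The three filter methods differ only in the scale on which counts are compared with asv_cutoff. The following base R sketch, on a hypothetical toy table, shows one way those scales could be computed, along with a prev_cutoff derived from the 5% recommendation. The variable names are illustrative, and rounding up with ceiling() is an assumption rather than documented behaviour.

```r
# Toy count table: ASVs in rows, samples in columns (hypothetical data)
count_df <- data.frame(
  s1 = c(90, 9, 1),
  s2 = c(50, 50, 0),
  row.names = c("ASV1", "ASV2", "ASV3")
)
count_mat <- as.matrix(count_df)

# abs_count: raw read counts are compared with asv_cutoff
abs_count <- count_mat

# percent_sample: each count as a percentage of its sample's total reads
percent_sample <- sweep(count_mat, 2, colSums(count_mat), "/") * 100

# percent_dataset: each count as a percentage of all reads in the dataset
percent_dataset <- count_mat / sum(count_mat) * 100

# prev_cutoff as the number of samples making up 5% of all samples,
# rounded up (assumption) and never below the default of 2
n_samples <- 50  # hypothetical study size
prev_cutoff <- max(2, ceiling(0.05 * n_samples))
```

Note that the same ASV can sit on different sides of a percentage cutoff depending on the method: ASV3 is 1% of sample s1 but only 0.5% of the whole dataset.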

Value

list of:
filtered_table - filtered count table
p_agg - plot of sequences removed/kept based on relative abundance vs ASV prevalence, aggregated view (mean ASV relative abundance)
p_exp - expanded view (ASV relative abundance for every sample shown)
feat_keep - vector of ASVs remaining after filtering
feat_remove - vector of ASVs removed during filtering
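The feat_keep vector can be used to bring other tables in line with the filtered counts. A minimal sketch, using a hand-built stand-in for the returned list and a hypothetical two-column taxonomy table (a real tax_df has the full set of taxonomy columns):

```r
# Stand-in for the list returned by filter_feature() (hypothetical values)
filtered_ls <- list(
  feat_keep = c("ASV1", "ASV3"),
  feat_remove = c("ASV2")
)

# Hypothetical, abbreviated taxonomy table
tax_df <- data.frame(
  featureID = c("ASV1", "ASV2", "ASV3"),
  Genus = c("Bacteroides", "Escherichia", "Lactobacillus"),
  stringsAsFactors = FALSE
)

# Keep only taxonomy rows for features that survived filtering
tax_filtered <- tax_df[tax_df$featureID %in% filtered_ls$feat_keep, ]
```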

Examples

library(OCMSutility)
library(magrittr) # provides %>%
library(tibble)   # provides column_to_rownames()

data(dss_example)

# put featureID as rownames
tax_df <- dss_example$merged_taxonomy
count_df <- dss_example$merged_abundance_id %>%
  column_to_rownames('featureID')
# set features in count and taxonomy tables to be in the same order
count_df <- count_df[tax_df$featureID,]

filtered_ls <- filter_feature(count_df, tax_df, 'percent_sample', 0.001, 2)
summary(filtered_ls)
filtered_count <- filtered_ls$filtered
dim(filtered_count)
head(filtered_count)

OxfordCMS/OCMSutility documentation built on Feb. 27, 2025, 8:19 p.m.