prefilter_data: Pre-filter features

View source: R/cadra_functions.R

prefilter_data    R Documentation

Pre-filter features

Description

Pre-filter a dataset prior to running candidate_search, to avoid testing features that are too prevalent or too sparse across the samples in the dataset.

Usage

prefilter_data(FS, max_cutoff = 0.6, min_cutoff = 0.03, verbose = FALSE)

Arguments

FS

a matrix of binary features or a SummarizedExperiment class object from the SummarizedExperiment package, where rows represent features of interest (e.g. genes, transcripts, exons, etc.) and columns represent samples. The assay of FS contains binary (1/0) values indicating the presence/absence of ‘omics’ features.

max_cutoff

a numeric value between 0 and 1 giving the maximum prevalence of a feature (the fraction of samples in the FS object in which it is present) at or above which the feature is filtered out. Default is 0.6 (features that occur in 60 percent or more of the samples are removed).

min_cutoff

a numeric value between 0 and 1 giving the minimum prevalence of a feature (the fraction of samples in the FS object in which it is present) at or below which the feature is filtered out. Default is 0.03 (features that occur in 3 percent or less of the samples are removed).

verbose

a logical value indicating whether or not to print diagnostic messages. Default is FALSE.
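
The effect of the two cut-offs can be illustrated with a minimal base R sketch on a plain binary matrix. This is not the package's implementation: the toy matrix FS_toy and its row names are hypothetical, and the boundary handling (removing features at exactly the cut-off) follows the argument descriptions above and is an assumption about how prefilter_data behaves.

```r
# Hypothetical toy feature set: 4 features (rows) x 10 samples (columns),
# with binary (1/0) values indicating presence/absence
FS_toy <- rbind(
  feat_common = c(1, 1, 1, 1, 1, 1, 1, 1, 0, 0),  # present in 80% of samples
  feat_mid    = c(1, 1, 1, 0, 0, 0, 0, 0, 0, 0),  # present in 30% of samples
  feat_low    = c(1, 0, 0, 0, 0, 0, 0, 0, 0, 0),  # present in 10% of samples
  feat_absent = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0)   # present in 0% of samples
)
colnames(FS_toy) <- paste0("sample_", seq_len(ncol(FS_toy)))

max_cutoff <- 0.6
min_cutoff <- 0.03

# Prevalence = fraction of samples in which each feature is present
prevalence <- rowMeans(FS_toy)

# Keep features strictly between the two cut-offs
# (assumption: features exactly at a cut-off are removed, as the
# argument descriptions state)
keep <- prevalence > min_cutoff & prevalence < max_cutoff
FS_toy_filt <- FS_toy[keep, , drop = FALSE]

rownames(FS_toy_filt)
```

In this toy case feat_common is dropped as too prevalent and feat_absent as too sparse, leaving feat_mid and feat_low. For a SummarizedExperiment input, the same calculation would presumably apply to the binary assay matrix.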

Value

A SummarizedExperiment object containing only the features retained under the given cut-offs.

Examples


# Load pre-computed feature set
data(sim_FS)

# Filter out features having < 3% or > 60% prevalence across all samples
# (the default cut-offs)
sim_FS_filt1 <- prefilter_data(FS = sim_FS)

# Change the min cut-off to 1% prevalence, instead of the default of 3%
sim_FS_filt2 <- prefilter_data(FS = sim_FS, min_cutoff = 0.01)

# Change the max cut-off to 65% prevalence, instead of the default of 60%
sim_FS_filt3 <- prefilter_data(FS = sim_FS, max_cutoff = 0.65)


montilab/CaDrA documentation built on Aug. 22, 2024, 11:55 p.m.