filter_outliers: Filter lowly abundant features
In PhilipBerg/PaiR: Imputation and Significance Analysis of Proteomics Data

Description

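Filters lowly abundant features (rows) out of a data set. By default, all numeric columns are used. A measurement is treated as an outlier if it falls below a lower limit, which is either Tukey's lower fence or a user-supplied value. Missing values are always treated as outliers.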
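The row-level rule described above can be sketched in base R as follows. `keep_features()` is a hypothetical helper, not part of PaiR, and it assumes that `percent` is a proportion in (0, 1]. Under that assumption the default of 1 drops a feature only when every target value is missing or below the limit.

```r
# Hypothetical helper, not part of PaiR. Assumes `percent` is a proportion in
# (0, 1]: a feature is dropped when its share of outlying or missing target
# values reaches `percent`.
keep_features <- function(values, lower_limit, percent = 1) {
  values <- as.matrix(values)
  # A cell is an outlier if it is missing or below the lower limit;
  # TRUE | NA is TRUE, so missing cells always count as outliers
  is_outlier <- is.na(values) | values < lower_limit
  # Share of outlying cells per feature (row)
  share <- rowMeans(is_outlier)
  share < percent
}

toy <- data.frame(
  s1 = c(10, NA, 2, 8),
  s2 = c(12, NA, 1, NA),
  s3 = c(11, NA, 3, 9)
)
keep <- keep_features(toy, lower_limit = 5)
toy[keep, ]
```

Rows 2 (all missing) and 3 (all below 5) are dropped, while row 4 survives because only one of its three values is missing. Lowering `percent` to, say, 0.3 in this sketch would also drop row 4.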

Usage

filter_outliers(data, target = NULL, percent = 1, k = 1.5, lower_limit = NULL)

Arguments

`data`	A data frame to filter features from.
`target`	Columns to base the filtering on; supports tidyselect syntax (see `tidyselect-package`).
`percent`	A feature is filtered out if it is lowly abundant or missing in `percent` of the `target` columns.
`k`	Multiplier for Tukey's lower fence, Q1 - k * IQR, where Q1 is the first quartile and IQR the interquartile range; any value below this limit is considered an outlier.
`lower_limit`	A user-defined lower limit below which a measurement is considered an outlier.
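How `k` and `lower_limit` determine the cut-off can be sketched as below. `lower_fence()` is a hypothetical helper, not PaiR's implementation. It assumes that a supplied `lower_limit` replaces Tukey's fence and that the fence is computed from all target values pooled together; the package may handle either point differently.

```r
# Hypothetical helper, not part of PaiR. Assumptions: a user-supplied
# lower_limit takes precedence, and the fence uses all values pooled.
lower_fence <- function(values, k = 1.5, lower_limit = NULL) {
  if (!is.null(lower_limit)) {
    return(lower_limit)
  }
  # Pool every target value into one numeric vector
  x <- unlist(values, use.names = FALSE)
  q1 <- stats::quantile(x, 0.25, na.rm = TRUE, names = FALSE)
  # Tukey's lower fence: Q1 - k * IQR
  q1 - k * stats::IQR(x, na.rm = TRUE)
}

toy <- data.frame(a = c(1, 2, 3, 4, NA), b = c(5, 6, 7, 8, NA))
lower_fence(toy)                   # Q1 = 2.75, IQR = 3.5, so 2.75 - 5.25
lower_fence(toy, k = 0.5)          # a smaller k moves the fence upwards
lower_fence(toy, lower_limit = 3)  # the user-defined limit is returned as-is
```

A smaller `k` places the fence closer to the bulk of the data, so more measurements are classified as outliers.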

Value

`data` with the lowly abundant features removed.

Examples

# Since Tukey's fences are not ideal for raw proteomics data, one could use,
# e.g., the tenth percentile as an indicator of low abundance
filter_outliers(yeast, lower_limit = stats::quantile(yeast[-1], .1, na.rm = TRUE))

# We recommend normalizing the data before filtering outliers with Tukey's fences.
# This ensures that peptides are not considered outliers merely because a set of
# samples has, on average, lower quantification, and that the lower fence is not
# smaller than the smallest value in the dataset.
yeast <- psrn(yeast, "identifier")
filter_outliers(yeast, target = -1, percent = 1, k = 1.5)
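The normalization advice above can be illustrated with a small toy example in base R. It assumes log-scale intensities and a fence computed from all values pooled together. It uses plain per-sample median centering as a stand-in normalization, not PaiR's `psrn()`. The helpers `tukey_lower()` and `flagged_per_sample()` are hypothetical.

```r
# Toy log-scale intensities: five features (rows) in five samples (columns).
# Sample s5 is systematically 10 units lower, i.e. it has lower
# quantification on average.
base <- c(20, 21, 22, 23, 24)
toy <- data.frame(s1 = base, s2 = base, s3 = base, s4 = base, s5 = base - 10)

# Hypothetical helper: Tukey's lower fence over all values pooled (an assumption)
tukey_lower <- function(values, k = 1.5) {
  x <- unlist(values, use.names = FALSE)
  stats::quantile(x, 0.25, na.rm = TRUE, names = FALSE) -
    k * stats::IQR(x, na.rm = TRUE)
}

# Hypothetical helper: number of cells below the fence in each sample
flagged_per_sample <- function(values, k = 1.5) {
  colSums(as.matrix(values) < tukey_lower(values, k), na.rm = TRUE)
}

flagged_per_sample(toy)  # every value in s5 falls below the fence of 15.5

# Per-sample median centering, a simple stand-in for normalization
centered <- as.data.frame(
  lapply(toy, function(col) col - stats::median(col, na.rm = TRUE))
)
flagged_per_sample(centered)  # no value falls below the fence of -4
```

Before centering, all five features are flagged in s5 purely because of the sample-level shift; after centering, none are.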

PhilipBerg/PaiR documentation built on March 18, 2022, noon