pw_outlier: pw_outlier
In jrod55/pseudoDrift: pseudoDrift

View source: R/pw_outlier.R

Description

Pairwise outlier removal of replicate samples within analytical batches.
This is useful, for example, for identifying technical errors, particularly when samples are not replicated extensively enough to allow more conventional outlier detection.
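
To make the idea of a pairwise comparison concrete, here is a minimal base R sketch that computes differences between every pair of replicates of a sample within a batch. It is not the package's implementation: the toy data, the choice of absolute differences of log10 areas, and the helper name `pairwise_diffs` are all assumptions made for illustration.

```r
# Toy peak areas for one compound; the values are made up for illustration.
toy <- data.frame(
  sample = c("S1", "S1", "S1", "S2", "S2"),
  batch  = c(1, 1, 1, 1, 1),
  rep    = c(1, 2, 3, 1, 2),
  area   = c(1000, 1100, 5000, 800, 820)
)

# Hypothetical helper: absolute differences of log10 areas for every
# pair of replicates in a vector.
pairwise_diffs <- function(area) {
  if (length(area) < 2) return(numeric(0))
  pairs <- combn(length(area), 2)  # each column is one pair of indices
  abs(log10(area[pairs[1, ]]) - log10(area[pairs[2, ]]))
}

# Group areas by sample within batch, then compare replicates pairwise.
groups <- split(toy$area, list(toy$sample, toy$batch), drop = TRUE)
diffs  <- lapply(groups, pairwise_diffs)

# Inspect the differences for sample S1 in batch 1.
diffs[["S1.1"]]
```

In this toy data, the two pairs that involve the third replicate of S1 have much larger differences than the remaining pair, which is the kind of signal a pairwise approach can use even with only two or three replicates per sample.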

Usage

pw_outlier(
  df = NULL,
  n.cores = 1,
  mad_threshold = 3,
  pw_threshold = 0.95,
  peak_shrinkage = TRUE,
  grouping_factor = "batch",
  return_plot = FALSE,
  plot_name = "pw_outlier_plot",
  samps_exclude = "QC"
)

Arguments

`df`	The dataframe containing peak data. At minimum it should contain columns labeled: name, sample, batch, compound, area, rep, rep_tech. Additional columns are OK, but will not be used.
`n.cores`	`numeric()` The number of cores to use for processing when running on a multi-core machine.
`mad_threshold`	`numeric()` The median absolute deviation (MAD) threshold to be used when identifying potential outliers.
`pw_threshold`	`numeric()` The pairwise outlier difference threshold, given as a quantile between 0 and 1. Default is 0.95, meaning 5% of the data will be identified as potential outliers.
`peak_shrinkage`	`logical()` Whether samples surpassing the MAD threshold should be shrunk towards the distribution (TRUE) or completely removed from the analysis (FALSE). If TRUE (default), values are shrunk towards the distribution of samples, which maintains the rankings of samples while reducing the overall skew of the distribution during the pairwise elimination calculation.
`grouping_factor`	The column label containing the grouping factor from which pairwise differences will be calculated. Default is batch column.
`return_plot`	`logical()` Whether to return a density plot showing the pairwise difference threshold distributions per compound. Default is FALSE.
`plot_name`	`character()` The name of the output .pdf file, used if return_plot is TRUE. The file is saved to the current working directory.
`samps_exclude`	`character()` Label designating the QC sample in the sample column of df. These will not be subjected to pairwise outlier detection since they can be useful downstream, for example with signal drift correction.
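
The roles of `mad_threshold`, `peak_shrinkage`, and `pw_threshold` can be illustrated with a small base R sketch. This is only an illustration under stated assumptions, not the code used by `pw_outlier`: the data are made up, shrinkage is approximated here by clamping values to the MAD boundary, and the package may shrink values differently.

```r
# Made-up log-scale peak areas with one gross outlier (14.0).
x <- c(9.8, 9.9, 10.0, 10.0, 10.1, 10.1, 10.2, 10.3, 14.0)

mad_threshold <- 3
center <- median(x)
limit  <- mad_threshold * mad(x)  # mad() returns the scaled MAD by default

# Flag values lying more than mad_threshold MADs from the median.
flagged <- abs(x - center) > limit

# Analogous to peak_shrinkage = FALSE: drop the flagged values.
x_removed <- x[!flagged]

# Analogous to peak_shrinkage = TRUE (assumed form): clamp flagged values
# to the boundary, so the order of values is preserved up to ties.
x_shrunk <- pmin(pmax(x, center - limit), center + limit)

# Quantile threshold on made-up pairwise differences: with
# pw_threshold = 0.95, values above the 95th percentile are flagged.
pw_threshold <- 0.95
d <- c((1:19) / 100, 0.9)
cutoff <- quantile(d, probs = pw_threshold)
potential_outlier <- d > cutoff
```

Clamping keeps every observation in the data set while limiting how far an extreme value can stretch the distribution, whereas removal discards the observation entirely.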

Value

list() containing:

df (original input data).
df_cleaned (pairwise outlier cleaned data).
df_rm (samples removed by the pairwise outlier elimination).
batch_plots (plots of the quantile threshold for each batch). Only returned if return_plot is set to TRUE.

Examples

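library(pseudoDrift)

# `dat` is assumed to be a data frame with the columns described under Arguments
pw_out <- pw_outlier(
  df = dat,
  samps_exclude = "QC",
  n.cores = 2,
  mad_threshold = 3,
  pw_threshold = 0.95,
  peak_shrinkage = TRUE,
  grouping_factor = "batch",
  return_plot = FALSE,
  plot_name = "pw_outlier_plot"
)

# Copy the elements of the returned list into the global environment
list2env(pw_out, .GlobalEnv)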
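
Note that unpacking the result into the global environment creates objects named after the list elements (df, df_cleaned, df_rm) and overwrites any existing objects with those names. A possible alternative, sketched below, is to unpack into a separate environment or to access the elements directly. The list `pw_out_demo` is a made-up stand-in for the value returned by `pw_outlier`, used only so the snippet runs on its own.

```r
# Stand-in for the list returned by pw_outlier(); the contents are made up.
pw_out_demo <- list(
  df         = data.frame(area = c(1, 2, 3)),
  df_cleaned = data.frame(area = c(1, 2)),
  df_rm      = data.frame(area = 3)
)

# Unpack into a dedicated environment instead of the global one.
res_env <- list2env(pw_out_demo, envir = new.env())
ls(res_env)

# Or access the elements directly without creating new variables.
n_kept <- nrow(pw_out_demo$df_cleaned)
```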

jrod55/pseudoDrift documentation built on April 6, 2024, 5:23 a.m.