pw_outlier: pw_outlier

View source: R/pw_outlier.R

pw_outlier    R Documentation

pw_outlier

Description

Pairwise outlier removal of replicate samples within analytical batches.
This is useful, for example, for identifying technical errors, particularly when samples are not replicated extensively enough for more conventional outlier detection.

Usage

pw_outlier(
  df = NULL,
  n.cores = 1,
  mad_threshold = 3,
  pw_threshold = 0.95,
  peak_shrinkage = TRUE,
  grouping_factor = "batch",
  return_plot = FALSE,
  plot_name = "pw_outlier_plot",
  samps_exclude = "QC"
)

Arguments

df

A data frame containing peak data. At minimum it must contain the columns name, sample, batch, compound, area, rep and rep_tech.
Additional columns are allowed but are ignored.
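As a quick pre-flight check, the required columns can be verified before calling pw_outlier(). The sketch below builds a small toy data set and checks it with check_pw_columns(), a hypothetical helper that is not part of pseudoDrift. The values, and the assumption that name and sample may hold the same labels, are illustrative only.

```r
# Hypothetical helper (not part of pseudoDrift): stop early if df lacks any
# of the columns that pw_outlier() documents as required.
check_pw_columns <- function(df) {
  required <- c("name", "sample", "batch", "compound", "area", "rep", "rep_tech")
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("df is missing required column(s): ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

# Toy data: two samples plus a QC, three technical replicates each, one batch.
# All values are made up for illustration.
dat <- data.frame(
  name     = rep(c("S1", "S2", "QC"), each = 3),
  sample   = rep(c("S1", "S2", "QC"), each = 3),
  batch    = "B1",
  compound = "cmpd_A",
  area     = c(100, 102, 98, 250, 255, 400, 150, 151, 149),
  rep      = 1,
  rep_tech = rep(1:3, times = 3),
  extra    = "ignored"  # additional columns are allowed
)

check_pw_columns(dat)
```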

n.cores

numeric() Number of cores to use for processing on a multi-core machine. Default is 1.

mad_threshold

numeric() The median absolute deviation (MAD) threshold used to identify potential outliers. Default is 3.
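A common way to apply a MAD threshold is to flag values lying more than mad_threshold scaled MADs from the median. The sketch below assumes that rule and uses stats::mad() with its default consistency constant (1.4826). flag_mad() is a hypothetical helper, and pw_outlier() may define the cutoff differently.

```r
# Flag values whose distance from the median exceeds mad_threshold MADs.
# Assumption: the scaled MAD from stats::mad(); pw_outlier()'s exact rule may differ.
flag_mad <- function(x, mad_threshold = 3) {
  med <- median(x, na.rm = TRUE)
  s <- mad(x, na.rm = TRUE)
  # With no spread there is nothing to compare against, so flag nothing
  if (is.na(s) || s == 0) return(rep(FALSE, length(x)))
  abs(x - med) > mad_threshold * s
}

areas <- c(100, 102, 98, 101, 99, 400)
flag_mad(areas, mad_threshold = 3)
# returns FALSE FALSE FALSE FALSE FALSE TRUE
```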

pw_threshold

numeric() Quantile threshold for pairwise differences; must be between 0 and 1. The default, 0.95, flags pairwise differences above the 95th percentile, so roughly the top 5% of the data are identified as potential outliers.
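To make the quantile threshold concrete, the sketch below assumes that pairwise differences are absolute area differences between every pair of replicates of the same sample, and flags pairs above the pw_threshold quantile of those differences. pairwise_flags() is a hypothetical helper; it illustrates the idea rather than reproducing the internals of pw_outlier().

```r
# Hypothetical sketch: absolute differences between every pair of replicates
# of each sample, flagged when above the pw_threshold quantile of all pairs.
pairwise_flags <- function(area, sample_id, pw_threshold = 0.95) {
  stopifnot(pw_threshold >= 0, pw_threshold <= 1)
  pairs <- do.call(rbind, lapply(split(seq_along(area), sample_id), function(idx) {
    if (length(idx) < 2) return(NULL)  # a single replicate has no pairs
    cmb <- combn(idx, 2)               # each column is one pair of row indices
    data.frame(i = cmb[1, ], j = cmb[2, ],
               diff = abs(area[cmb[1, ]] - area[cmb[2, ]]))
  }))
  cutoff <- quantile(pairs$diff, probs = pw_threshold, names = FALSE)
  pairs$flagged <- pairs$diff > cutoff
  pairs
}

area <- c(100, 102, 98, 250, 255, 400)
sample_id <- rep(c("S1", "S2"), each = 3)
res <- pairwise_flags(area, sample_id, pw_threshold = 0.95)
```

With six pairs and pw_threshold = 0.95, only the largest difference, between replicates 4 and 6 of S2, exceeds the cutoff.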

peak_shrinkage

logical() Whether samples exceeding the MAD threshold are shrunk towards the distribution (TRUE) or removed from the analysis entirely (FALSE).
If TRUE (default), values are shrunk towards the distribution of samples, preserving the sample rankings while reducing the skew of the distribution used in the pairwise elimination calculation.
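One way to shrink extreme values while keeping their order is to compress only the part that lies beyond the MAD boundary. The sketch below is an assumption, not the pseudoDrift implementation: the hypothetical shrink_mad() maps each value beyond median +/- mad_threshold * MAD to the boundary plus a log-compressed excess. That mapping is continuous and strictly increasing, so rankings are preserved while the long tail is pulled in.

```r
# Hypothetical rank-preserving shrinkage (not pseudoDrift's code): values beyond
# median +/- mad_threshold * MAD are pulled towards that boundary on a log scale.
shrink_mad <- function(x, mad_threshold = 3) {
  med <- median(x, na.rm = TRUE)
  s <- mad(x, na.rm = TRUE)
  if (is.na(s) || s == 0) return(x)
  upper <- med + mad_threshold * s
  lower <- med - mad_threshold * s
  out <- x
  hi <- !is.na(x) & x > upper
  lo <- !is.na(x) & x < lower
  # log1p() keeps the map continuous at the boundary and strictly increasing
  out[hi] <- upper + s * log1p((x[hi] - upper) / s)
  out[lo] <- lower - s * log1p((lower - x[lo]) / s)
  out
}

areas <- c(100, 102, 98, 101, 99, 400, 900)
shrunk <- shrink_mad(areas)
```

Here 400 and 900 end up at roughly 123.5 and 126.5: still above every other value and still in the same order, but far closer to the bulk of the data.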

grouping_factor

character() Name of the column containing the grouping factor within which pairwise differences are calculated. Default is "batch".
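Computing the threshold within each level of grouping_factor gives each batch its own cutoff. The sketch below assumes that behaviour and uses base R's ave() to compute a per-group quantile; flag_by_group() and the pw_diffs data are hypothetical.

```r
# Hypothetical illustration: the pw_threshold quantile is computed separately
# within each level of the grouping column, so each batch has its own cutoff.
flag_by_group <- function(d, value_col, grouping_factor = "batch", pw_threshold = 0.95) {
  cutoff <- ave(d[[value_col]], d[[grouping_factor]],
                FUN = function(v) quantile(v, probs = pw_threshold, names = FALSE))
  d$cutoff <- cutoff
  d$flagged <- d[[value_col]] > cutoff
  d
}

# Made-up pairwise differences, one row per replicate pair
pw_diffs <- data.frame(
  batch = rep(c("B1", "B2"), each = 4),
  diff  = c(1, 2, 3, 40, 10, 20, 30, 35)
)
res <- flag_by_group(pw_diffs, "diff")
```

With per-batch cutoffs the difference of 35 in B2 is flagged, whereas a single cutoff over all eight values would flag only the 40 in B1.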

return_plot

logical() Whether to return a density plot of the pairwise difference threshold distributions for each compound. Default is FALSE.

plot_name

character() If return_plot is TRUE, the name of the output .pdf file, which is saved to the current working directory. Default is "pw_outlier_plot".

samps_exclude

character() Label identifying QC samples in the sample column of df. These samples are excluded from pairwise outlier detection because they remain useful downstream, for example for signal drift correction. Default is "QC".
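The exclusion can be pictured as setting QC rows aside, cleaning the remaining rows, and binding the QC rows back unchanged. run_excluding() and the stand-in detector drop_large() are hypothetical and only illustrate this flow; they are not pw_outlier() internals.

```r
# Hypothetical sketch of the samps_exclude behaviour: QC rows skip detection
# and are returned unchanged alongside the cleaned non-QC rows.
run_excluding <- function(df, detect, samps_exclude = "QC") {
  is_qc <- df$sample %in% samps_exclude
  cleaned <- detect(df[!is_qc, , drop = FALSE])
  rbind(cleaned, df[is_qc, , drop = FALSE])
}

dat <- data.frame(
  sample = c("S1", "S1", "QC", "QC"),
  area   = c(100, 900, 150, 5000)
)
# Stand-in detector for illustration only: drop any area above 500
drop_large <- function(d) d[d$area <= 500, , drop = FALSE]
res <- run_excluding(dat, drop_large)
```

The QC reading of 5000 survives because QC rows never reach the detector, while the non-QC 900 is dropped.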

Value

list() containing:

  • df: the original input data.

  • df_cleaned: the data after pairwise outlier cleaning.

  • df_rm: the samples removed by pairwise outlier elimination.

  • batch_plots: plots of the quantile threshold for each batch. Returned only if return_plot is TRUE.

Examples

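library(pseudoDrift)

# `dat` is a data frame of peak data containing at least the columns
# name, sample, batch, compound, area, rep and rep_tech
pw_out <- pw_outlier(
  df = dat,
  samps_exclude = "QC",
  n.cores = 2,
  mad_threshold = 3,
  pw_threshold = 0.95,
  peak_shrinkage = TRUE,
  grouping_factor = "batch",
  return_plot = FALSE,
  plot_name = "pw_outlier_plot"
)

# Copy the returned list elements (df, df_cleaned, df_rm) into the global environment
list2env(pw_out, envir = .GlobalEnv)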

jrod55/pseudoDrift documentation built on April 6, 2024, 5:23 a.m.