pseudo_sdc: pseudo_sdc

View source: R/pseudo_sdc.R

pseudo_sdc    R Documentation

pseudo_sdc

Description

Signal drift correction using QC samples present in some batches but absent in others.

Usage

pseudo_sdc(
  df = NULL,
  n.cores = 1,
  train.batch = NULL,
  test.breaks = NULL,
  test.window = NULL,
  test.index = NULL,
  criteria = "MSE",
  qc.label = NULL,
  qc.multibatch = FALSE,
  min.qc = 5,
  quantile.increment = 1,
  log_transform = TRUE,
  mad_outlier = TRUE,
  mad_threshold = 3
)

Arguments

df

The dataframe containing peak data. At minimum should contain columns labeled: name, sample, batch, compound, area, experiment_index, batch_index.

n.cores

numeric() The number of cores to use for processing when running on a multi-core machine.

train.batch

character() The name of the batch in df that contains QC samples. Only one batch should be given; it is used to train the regression spline model.

test.breaks

numeric() A numeric vector giving the numbers of equal-sized sub-batches into which train.batch is divided and tested.
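
As a rough illustration of what dividing a batch into equal-sized sub-batches can look like, the sketch below splits a run of injection positions into k contiguous groups with base R. The helper name split_into_subbatches is hypothetical and not part of pseudoDrift, and the package's internal splitting may differ.

```r
# Hypothetical helper (not a pseudoDrift function): assign each injection
# position in a batch to one of k contiguous, roughly equal-sized sub-batches.
split_into_subbatches <- function(batch_index, k) {
  # Rank the positions, then cut the ranks into k equal-width intervals
  cut(rank(batch_index, ties.method = "first"), breaks = k, labels = FALSE)
}

batch_index <- 1:12                       # made-up injection positions
sub_batches <- split_into_subbatches(batch_index, k = 3)
table(sub_batches)                        # number of injections per sub-batch
```

Passing several values of k in turn (for example 2 and 3) mirrors the idea of testing more than one sub-batch count.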

test.window

numeric() A numeric vector containing the sizes of sliding windows to test when performing the sliding window median calculation.
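
A sliding window median can be sketched in a few lines of base R. The function below is a hypothetical helper, not pseudoDrift code; it assumes a centered window that is truncated at the ends of the series, which may not match how the package positions its window.

```r
# Hypothetical helper: centered sliding window median.
# The window is shortened at the start and end of the series.
sliding_median <- function(x, window) {
  half <- window %/% 2
  vapply(seq_along(x), function(i) {
    lo <- max(1, i - half)
    hi <- min(length(x), i + half)
    median(x[lo:hi])
  }, numeric(1))
}

area <- c(10, 12, 50, 11, 13, 12, 14)     # made-up peak areas with one spike
smoothed <- sliding_median(area, window = 3)
smoothed
```

With a window of 1 the series is returned unchanged; larger windows smooth more strongly, which is why several window sizes are worth comparing.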

test.index

numeric() A numeric vector containing the injection position offset for pseudo QC inclusion in the peak data matrix.

criteria

character() The criterion to minimize when determining the optimal set of parameters. Should be one of:

  • "RSD" relative standard deviation assuming a Gaussian distribution of errors

  • "RSD_robust" relative standard deviation assuming a non-Gaussian distribution of errors

  • "MSE" mean squared error

  • "TSS" total sum of squares
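
To make the four criteria concrete, the sketch below computes one plausible version of each on a small vector of made-up QC peak areas. These definitions are assumptions for illustration: the robust RSD is taken as MAD over median, and MSE and TSS are measured against the QC mean. The formulas used inside pseudoDrift may differ.

```r
# Illustrative definitions only; pseudoDrift's internal formulas may differ.
qc_area <- c(98, 102, 100, 97, 103)       # made-up QC peak areas

# Classical RSD (%): standard deviation relative to the mean
rsd <- sd(qc_area) / mean(qc_area) * 100

# Robust RSD (%): scaled MAD relative to the median (assumed definition)
rsd_robust <- mad(qc_area) / median(qc_area) * 100

# Total sum of squares and mean squared deviation around the QC mean
tss <- sum((qc_area - mean(qc_area))^2)
mse <- mean((qc_area - mean(qc_area))^2)

c(RSD = rsd, RSD_robust = rsd_robust, MSE = mse, TSS = tss)
```

Smaller values of any of these indicate tighter QC measurements, which is the sense in which a criterion is minimized.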

qc.label

character() Label designating the QC sample in the sample column of df.

min.qc

numeric() The minimum number of pseudo-QC samples to consider during model training. Should be a value greater than 2.

quantile.increment

numeric() Incremental step for quantiles of peak areas to retain in the training model.

log_transform

logical() TRUE (default) or FALSE. Should the data be log transformed?

mad_outlier

logical() TRUE (default) or FALSE. Should median absolute deviation (MAD) based outliers be excluded from the signal drift calculation?

mad_threshold

numeric() How many MADs from the median a value can be before it is considered an outlier. Default is 3.
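
The MAD-based outlier rule can be sketched as follows. The helper is_mad_outlier is hypothetical and not part of pseudoDrift; it uses stats::mad(), which by default scales the raw MAD by 1.4826, and whether the package applies the same scaling is an assumption.

```r
# Hypothetical helper: flag values lying more than `threshold` MADs
# from the median. stats::mad() applies its default 1.4826 scaling.
is_mad_outlier <- function(x, threshold = 3) {
  abs(x - median(x)) > threshold * mad(x)
}

area <- c(100, 102, 98, 101, 99, 250)     # made-up peak areas, one extreme value
flags <- is_mad_outlier(area, threshold = 3)
area[!flags]                              # values kept for the drift calculation
```

Raising the threshold keeps more extreme values; lowering it excludes more.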

Value

list() containing:

  • df (original input data)

  • df_pseudoQC (data with the calculated pseudoQC samples included). Includes an additional column labeled 'class' which categorizes samples as true QC, Sample, or Pseudo_QC.

  • df_pseudoQC_corrected (signal drift corrected data using pseudoQC samples). Same columns as df_pseudoQC, with an additional 'area_corrected' column containing the signal drift corrected data.

  • criteria_table (table with results for the applied criterion along with the criteria that were not used).

Examples

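library(pseudoDrift)

# Simulate peak data with 100 samples per batch
sim_dat = simulate_data(db_ids = "FIO00738",
                        nsamps_per_batch = 100,
                        xls_file_name = system.file("extdata", "Index.xls", package = "pseudoDrift"),
                        valid_sdf_file = system.file("extdata", "valid-test.sdf", package = "pseudoDrift"))

# Use the first element of the t4_sim_mat entry as the input data
df = sim_dat[["t4_sim_mat"]][[1]]

# Train on batch "B3" and test each combination of breaks, windows, and index offsets
sdc_out = pseudo_sdc(df = df,
                     train.batch = "B3",
                     test.breaks = seq(2, 3, 1),
                     test.window = seq(1, 3, 2),
                     test.index = seq(2, 3, 1),
                     qc.label = "QC",
                     min.qc = 2)

# Copy the elements of the returned list into the global environment
list2env(sdc_out, .GlobalEnv)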

jrod55/pseudoDrift documentation built on April 6, 2024, 5:23 a.m.