pseudo_sdc: pseudo_sdc
In jrod55/pseudoDrift: pseudoDrift

View source: R/pseudo_sdc.R

pseudo_sdc

Description

Signal drift correction using QC samples present in some batches but absent in others.

Usage

pseudo_sdc(
  df = NULL,
  n.cores = 1,
  train.batch = NULL,
  test.breaks = NULL,
  test.window = NULL,
  test.index = NULL,
  criteria = "MSE",
  qc.label = NULL,
  qc.multibatch = FALSE,
  min.qc = 5,
  quantile.increment = 1,
  log_transform = TRUE,
  mad_outlier = TRUE,
  mad_threshold = 3
)

Arguments

`df`	The data frame containing peak data. At minimum, it should contain columns labeled: name, sample, batch, compound, area, experiment_index, batch_index.
`n.cores`	`numeric()` The number of cores to use for processing when running on a multi-core machine.
`train.batch`	`character()` The name of the batch in df that contains QC samples. Only one batch should be given; it is used to train the regression spline model.
`test.breaks`	`numeric()` A numeric vector giving the numbers of equal-sized sub-batches into which train.batch is divided and tested.
`test.window`	`numeric()` A numeric vector containing the sizes of sliding windows to test when performing the sliding window median calculation.
`test.index`	`numeric()` A numeric vector containing the injection position offset for pseudo QC inclusion in the peak data matrix.
`criteria`	`character()` The criterion to minimize when determining the optimal set of parameters. Should be one of: "RSD" (relative standard deviation, assuming a Gaussian distribution of errors); "RSD_robust" (relative standard deviation, assuming a non-Gaussian distribution of errors); "MSE" (mean squared error); "TSS" (total sum of squares).
`qc.label`	`character()` Label designating the QC sample in the sample column of df.
`min.qc`	`numeric()` The minimum number of pseudo-QC samples to consider during model training. Should be a value greater than 2.
`quantile.increment`	`numeric()` Incremental step for the quantiles of peak areas to retain in the training model.
`log_transform`	`logical()` Should the data be log transformed? TRUE (default) or FALSE.
`mad_outlier`	`logical()` Should outliers based on the median absolute deviation (MAD) be excluded from the signal drift calculation? TRUE (default) or FALSE.
`mad_threshold`	`numeric()` How many MADs from the median a value can be before it is considered an outlier. Default is 3.
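
The `mad_outlier` and `mad_threshold` arguments describe a MAD-based outlier rule. As a rough illustration of that idea only, the sketch below flags values lying more than `threshold` MADs from the median. The helper `flag_mad_outliers()` is hypothetical and is not part of pseudoDrift; whether the package uses the raw MAD or the scaled version returned by default from `stats::mad()` is an assumption here.

```r
# Hypothetical helper, not taken from the pseudoDrift source.
# Flags values more than `threshold` MADs away from the median.
flag_mad_outliers <- function(x, threshold = 3) {
  med <- stats::median(x, na.rm = TRUE)
  # constant = 1 gives the raw MAD; stats::mad() scales by 1.4826 by default
  spread <- stats::mad(x, center = med, constant = 1, na.rm = TRUE)
  abs(x - med) > threshold * spread
}

# Made-up peak areas with one obviously deviating value
areas <- c(100, 102, 98, 101, 99, 250)
outlier <- flag_mad_outliers(areas, threshold = 3)

# Keep only the values that pass the rule
areas[!outlier]
```

A larger threshold keeps more points in the drift calculation, while a smaller one excludes more aggressively.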

Value

list() containing:

df (original input data)
df_pseudoQC (data with the calculated pseudo-QC samples included). Includes an additional column labeled 'class', which categorizes each row as a true QC, Sample, or Pseudo_QC sample.
df_pseudoQC_corrected (signal drift corrected data using pseudo-QC samples). Has the same columns as df_pseudoQC, plus an additional 'area_corrected' column holding the signal drift corrected data.
criteria_table (table with results for the criterion that was applied, along with the criteria that were not used).
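
To make the four criteria reported in criteria_table more concrete, the sketch below computes them for a small vector of QC peak areas using textbook definitions. It is an illustration under assumptions, not the package's code: the helper `qc_criteria()` and its `fitted` argument are hypothetical, and pseudoDrift may define or scale these quantities differently.

```r
# Hypothetical helper using textbook definitions; pseudoDrift may differ.
# `fitted` stands in for model predictions and defaults to the mean of x.
qc_criteria <- function(x, fitted = rep(mean(x), length(x))) {
  c(
    RSD        = 100 * stats::sd(x) / mean(x),            # percent, Gaussian assumption
    RSD_robust = 100 * stats::mad(x) / stats::median(x),  # percent, MAD scaled by 1.4826
    MSE        = mean((x - fitted)^2),                    # mean squared error
    TSS        = sum((x - mean(x))^2)                     # total sum of squares
  )
}

# Made-up QC peak areas
qc_area <- c(98, 100, 102, 101, 99)
crit <- qc_criteria(qc_area)
round(crit, 3)
```

Lower values indicate less residual variation among the QC samples, which is why the chosen criterion is minimized.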

Examples

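library(pseudoDrift)

# Simulate peak data with 100 samples per batch
sim_dat <- simulate_data(db_ids = "FIO00738",
                         nsamps_per_batch = 100,
                         xls_file_name = system.file("extdata", "Index.xls", package = "pseudoDrift"),
                         valid_sdf_file = system.file("extdata", "valid-test.sdf", package = "pseudoDrift"))

# Use the first simulated data set
df <- sim_dat[["t4_sim_mat"]][[1]]

# Train on batch "B3" and test a small set of candidate parameters
sdc_out <- pseudo_sdc(df = df,
                      train.batch = "B3",
                      test.breaks = seq(2, 3, 1),
                      test.window = seq(1, 3, 2),
                      test.index = seq(2, 3, 1),
                      qc.label = "QC",
                      min.qc = 2)

# Copy each element of the returned list into the global environment
list2env(sdc_out, .GlobalEnv)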
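
The call above passes three candidate vectors (test.breaks, test.window, test.index). Assuming every combination of them is evaluated, which this page does not state explicitly, the size of the search can be previewed with base R. The column names in `param_grid` are illustrative only.

```r
# Assumption: all combinations of the candidate values are tested.
param_grid <- expand.grid(
  breaks = seq(2, 3, 1),  # candidate numbers of sub-batches
  window = seq(1, 3, 2),  # candidate sliding-window sizes
  index  = seq(2, 3, 1)   # candidate injection position offsets
)

# Number of parameter combinations implied by the example
nrow(param_grid)
```

Widening any of the three vectors multiplies the number of combinations, so larger searches may benefit from setting n.cores above 1.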

jrod55/pseudoDrift documentation built on April 6, 2024, 5:23 a.m.