msPrepare: Summarize, filter, impute, transform and normalize...


View source: R/prepare.R

Description

Wrapper function for the entire MSPrep pre-analytics pipeline. Calls msSummarize(), msFilter(), msImpute(), and msNormalize(), applying the selected transformation before normalization.

Usage

msPrepare(
  data,
  cvMax = 0.5,
  minPropPresent = 1/3,
  filterPercent = 0.8,
  imputeMethod = c("halfmin", "bpca", "knn", "rf", "none"),
  kKnn = 5,
  nPcs = 3,
  maxIterRf = 10,
  nTreeRf = 100,
  compoundsAsNeighbors = FALSE,
  normalizeMethod = c("median", "ComBat", "quantile", "quantile + ComBat",
    "median + ComBat", "CRMN", "RUV", "SVA", "none"),
  nControl = 10,
  controls = NULL,
  nComp = 2,
  kRUV = 3,
  covariatesOfInterest = NULL,
  batch = NULL,
  transform = c("log10", "log2", "none"),
  replicate = "replicate",
  compVars = c("mz", "rt"),
  sampleVars = c("subject_id"),
  colExtraText = NULL,
  separator = NULL,
  missingValue = NA,
  returnSummaryDetails = FALSE,
  returnToSE = FALSE,
  returnToDF = FALSE
)

Arguments

data

Data set as either a data frame or 'SummarizedExperiment'.

cvMax

Decimal value from 0 to 1 representing the acceptable level of coefficient of variation between replicates.

minPropPresent

Decimal value from 0 to 1 representing the minimum proportion of replicates in which a compound must be present to be summarized with the median or mean. Below this proportion, the compound will be set to 0.

filterPercent

Decimal value indicating filtration threshold. Compounds which are present in fewer samples than the specified proportion will be removed.

imputeMethod

String specifying imputation method. Options are "halfmin" (half the minimum value), "bpca" (Bayesian PCA), "knn" (k-nearest neighbors), "rf" (random forest), or "none" to skip imputation.

kKnn

Number of neighbors for 'knn' method.

nPcs

Number of principal components used for re-estimation for 'bpca' method.

maxIterRf

Maximum number of iterations to be performed given the stopping criterion is not met beforehand for 'rf' method.

nTreeRf

Number of trees to grow in each forest for 'rf' method.

compoundsAsNeighbors

For KNN imputation. If TRUE, compounds will be used as neighbors rather than samples. Note that using compounds as neighbors is significantly slower than using samples.

normalizeMethod

Name of normalization method. "ComBat" (only ComBat batch correction), "quantile" (only quantile normalization), "quantile + ComBat" (quantile with ComBat batch correction), "median" (only median normalization), "median + ComBat" (median with ComBat batch correction), "CRMN" (cross-contribution compensating multiple standard normalization), "RUV" (remove unwanted variation), "SVA" (surrogate variable analysis), or "none" to skip normalization.

nControl

Number of controls to estimate/utilize (for CRMN and RUV).

controls

Vector of control identifiers, given as column numbers from the metafin dataset (for CRMN and RUV). Leave as NULL for data-driven controls.

nComp

Number of factors to use in CRMN algorithm.

kRUV

Number of factors to use in RUV algorithm.

covariatesOfInterest

Sample variables used as covariates in normalization algorithms (required for ComBat, CRMN, and SVA).

batch

Name of the sample variable identifying batch.

transform

Select transformation to apply to data prior to normalization. Options are "log10", "log2", and "none".

replicate

Name of sample variable specifying replicate. Must match an element in 'sampleVars' or a column in the column data of a 'SummarizedExperiment'.

compVars

Vector of the columns which identify compounds. If a 'SummarizedExperiment' is used for 'data', row variables will be used.

sampleVars

Vector of the ordered sample variables found in each sample column.

colExtraText

Any extra text to ignore at the beginning of the sample column names. Unused for 'SummarizedExperiment'.

separator

Character or text separating each sample variable in sample columns. Unused for 'SummarizedExperiment'.

missingValue

Specifies the abundance value which indicates missing data. May be a numeric or 'NA'.

returnSummaryDetails

Logical value specifying whether to return details of replicate summarization.

returnToSE

Logical value specifying whether to return as 'SummarizedExperiment'.

returnToDF

Logical value specifying whether to return as data frame.
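
The interaction between 'cvMax', 'minPropPresent', and 'missingValue' can be illustrated with a small base R sketch. The helper summarizeReplicates() below is hypothetical and is not part of MSPrep; the rule it applies (mean when the replicates agree, median otherwise) is an assumption based on the argument descriptions above, not the package's actual implementation.

```r
# Hypothetical helper (not an MSPrep function): summarize the technical
# replicates of one compound for one subject.
summarizeReplicates <- function(x, cvMax = 0.5, minPropPresent = 1/3,
                                missingValue = NA) {
  # Recode the designated missing value as NA
  if (!is.na(missingValue)) {
    x[!is.na(x) & x == missingValue] <- NA
  }
  present <- x[!is.na(x)]

  # Too few replicates observed: set the compound to 0
  if (length(present) == 0 || length(present) / length(x) < minPropPresent) {
    return(0)
  }
  # A single observed replicate has no spread to evaluate
  if (length(present) == 1) {
    return(present)
  }

  # Coefficient of variation between the observed replicates
  cv <- sd(present) / mean(present)

  # Assumption: use the mean when CV is acceptable, the median otherwise
  if (cv <= cvMax) mean(present) else median(present)
}

summarizeReplicates(c(10, 11, 12))        # low CV, mean is used
summarizeReplicates(c(1, 1, 100))         # high CV, median is used
summarizeReplicates(c(NA, NA, NA, 5))     # present in 1/4 < 1/3, set to 0
```

Setting missingValue = 1, as in some exported abundance tables, would cause every replicate equal to 1 to be treated as absent before the proportion is computed.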

Value

A data frame or 'SummarizedExperiment' with summarized technical replicates (if present), filtered compounds, missing values imputed, and transformed and normalized abundances. Default return type is set to match the data input but may be altered with the 'returnToSE' or 'returnToDF' arguments.
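
To make the stages named above concrete, here is a minimal base R sketch of filtering, imputing, transforming, and normalizing a toy abundance matrix. It does not call MSPrep; the matrix, the per-compound reading of "halfmin", and the reading of "median" normalization as centering each sample on its median are all assumptions made for illustration.

```r
# Toy abundance matrix (hypothetical data): rows are compounds,
# columns are samples, NA marks a missing abundance.
abund <- matrix(c(100, 200,  NA, 400,
                   NA,  NA,  NA,  50,
                   10,  20,  40,  80),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("c1", "c2", "c3"), paste0("s", 1:4)))

# Filter: keep compounds present in at least filterPercent of samples
filterPercent <- 0.5
keep <- rowMeans(!is.na(abund)) >= filterPercent
filtered <- abund[keep, , drop = FALSE]

# Impute ("halfmin", assumed per compound): replace each missing value
# with half of that compound's minimum observed abundance
imputed <- t(apply(filtered, 1, function(x) {
  x[is.na(x)] <- min(x, na.rm = TRUE) / 2
  x
}))

# Transform: log10 of the imputed abundances
transformed <- log10(imputed)

# Normalize ("median", assumed): subtract each sample's median
normalized <- sweep(transformed, 2, apply(transformed, 2, median))

rownames(filtered)    # compounds that survived the filter
imputed["c1", "s3"]   # value filled in for the missing entry
```

Here compound c2 is observed in only one of four samples, so it is dropped before imputation, and the missing value for c1 is filled from c1's own minimum.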

Examples

# Load example data
data(msquant)

# Call function to tidy, summarize, filter, impute, and normalize data
preparedData <- msPrepare(msquant, cvMax = 0.50, minPropPresent = 1/3,
                         filterPercent = 0.8, imputeMethod = "halfmin",
                         normalizeMethod = "quantile",
                         compVars = c("mz", "rt"),
                         sampleVars = c("spike", "batch", "replicate", 
                                        "subject_id"),
                         colExtraText = "Neutral_Operator_Dif_Pos_",
                         separator = "_", missingValue = 1, 
                         returnToSE = FALSE)
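
The 'colExtraText', 'separator', and 'sampleVars' arguments in the call above describe how sample information is encoded in the sample column names. The base R sketch below shows one way such names could be decoded; the column names are hypothetical examples written to match those arguments, and the code is not MSPrep's own parsing routine.

```r
# Hypothetical sample column names in the style the call above expects
colNames <- c("Neutral_Operator_Dif_Pos_1x_O1_A_01",
              "Neutral_Operator_Dif_Pos_1x_O1_B_01",
              "Neutral_Operator_Dif_Pos_2x_O2_A_02")

colExtraText <- "Neutral_Operator_Dif_Pos_"
separator <- "_"
sampleVars <- c("spike", "batch", "replicate", "subject_id")

# This sketch assumes every column name starts with the extra text
stopifnot(startsWith(colNames, colExtraText))

# Drop the leading extra text, then split what remains on the separator
stripped <- substring(colNames, nchar(colExtraText) + 1)
parts <- strsplit(stripped, separator, fixed = TRUE)

# One row per sample column, one column per ordered sample variable
sampleInfo <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
names(sampleInfo) <- sampleVars
sampleInfo
```

The order of 'sampleVars' matters: each element is matched by position to the pieces obtained after splitting on 'separator'.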

KechrisLab/MSPrep documentation built on Feb. 2, 2022, 2:43 a.m.