msqrobsum: Robust differential abundance analysis for label-free...


View source: R/functions.r

Description

Perform robust differential abundance analysis directly on peptide abundances from a label-free quantitative proteomics experiment. Alternatively, first robustly summarize these peptide abundances into protein abundances, and then perform a robust differential abundance analysis on the summarized protein abundances.
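
The summarization step can be pictured with a small sketch for a single protein. It assumes that the summarization fits expression ~ 0 + sample + feature with MASS::rlm (Huber M-estimation) and uses the sample coefficients as protein-level abundances; the toy data are made up, and msqrobsum's own parameterization may differ.

```r
# Sketch (assumption): robust peptide-to-protein summarization for ONE protein.
# MASS::rlm fits sample + feature effects; the sample coefficients are taken
# as the summarized protein abundances. Not msqrobsum's internal code.
library(MASS)

set.seed(1)
toy <- expand.grid(feature = factor(c("f1", "f2", "f3")),
                   sample  = factor(c("s1", "s2", "s3", "s4")))
sample_effect  <- c(s1 = 20, s2 = 21, s3 = 22, s4 = 23)   # made-up log-scale levels
feature_effect <- c(f1 = 0, f2 = 0.5, f3 = -0.5)         # made-up peptide offsets
toy$expression <- sample_effect[as.character(toy$sample)] +
  feature_effect[as.character(toy$feature)] +
  rnorm(nrow(toy), sd = 0.05)

# Huber M-estimation (psi.huber is the rlm default)
fit <- rlm(expression ~ 0 + sample + feature, data = toy, maxit = 20)
summarized <- coef(fit)[grep("^sample", names(coef(fit)))]
summarized  # one log-scale abundance per sample, relative to feature f1
```

Because f1 is the reference feature here, the summarized values land close to the made-up sample levels 20 to 23.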

Usage

msqrobsum(data, formulas, group_vars = "protein", contrasts = NULL,
  mode = c("msqrobsum", "msqrob", "sum"), robust_lmer_iter = "auto",
  squeeze_variance = TRUE, p_adjust_method = c("BH", p.adjust.methods),
  keep_model = FALSE, rlm_args = list(maxit = 20L),
  lmer_args = list(control = lmerControl(calc.derivs = FALSE)),
  parallel_args = list(strategy = "multisession"),
  type_df = "traceHat", squeeze_covariate = FALSE, fit_fun = do_mm)

Arguments

data

An MSnSet object or a data frame with at least the following 3 columns:

expression

Expression values in log scale.

sample

Which sample/run the measurement belongs to. Only required when mode = “sum” or “msqrobsum”.

feature

Which feature (eg. peptide id) the measurement belongs to. Only required when mode = “sum” or “msqrobsum”.

Any additional variables used in formulas.

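
If your intensities are stored as a wide peptide-by-sample matrix, a sketch like the following reshapes them into the long format described above. The matrix values, the peptide-to-protein mapping and the condition labels are made up for illustration.

```r
# Sketch: reshape a wide matrix of log-scale intensities (rows = peptides,
# columns = runs) into a long data frame with feature, sample and expression.
# All values and labels below are hypothetical.
wide <- matrix(c(20.1, 21.0, 19.8,
                 20.4, 21.3, 20.0),
               nrow = 3,
               dimnames = list(c("pepA", "pepB", "pepC"), c("run1", "run2")))
protein_of   <- c(pepA = "P1", pepB = "P1", pepC = "P2")
condition_of <- c(run1 = "control", run2 = "treated")

long <- data.frame(
  feature    = rep(rownames(wide), times = ncol(wide)),
  sample     = rep(colnames(wide), each = nrow(wide)),
  expression = as.vector(wide),  # column-major order: features vary fastest
  stringsAsFactors = FALSE
)
long$protein   <- unname(protein_of[long$feature])     # grouping variable
long$condition <- unname(condition_of[long$sample])    # used in formulas
long
```

A data frame of this shape is what the data argument expects when it is not an MSnSet object.
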
formulas

Vector of formulas. These are the msqrob model specifications. Each is a two-sided linear “lme4” formula object describing both the fixed-effects and random-effects parts of the model, with the response on the left of a ~ operator and the terms, separated by + operators, on the right. Random-effects terms are distinguished by vertical bars (|) separating expressions for design matrices from grouping factors (e.g. (1 | treatment)). See the “lme4” package for more details. When multiple formulas are specified, the first formula is tried first to fit the data; if this fails, the second is tried, and so on.
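
The fallback over several formulas can be sketched as a loop that returns the first model that fits without an error. fit_first_working() is a hypothetical helper, and stats::lm stands in for the lme4 fit that msqrobsum actually performs.

```r
# Sketch: try each formula in turn and keep the first one that fits.
# fit_first_working() is hypothetical; stats::lm replaces lme4::lmer here.
fit_first_working <- function(formulas, data, fit = stats::lm) {
  for (f in formulas) {
    model <- tryCatch(fit(f, data = data), error = function(e) NULL)
    if (!is.null(model)) return(list(formula = f, model = model))
  }
  NULL  # no formula could be fitted for this group
}

d <- data.frame(expression = c(1.0, 2.1, 2.9, 4.2),
                condition  = factor(c("a", "a", "b", "b")))
# The first formula uses a column (batch) that does not exist, so it errors
# and the second formula is used instead.
res <- fit_first_working(c(expression ~ batch, expression ~ condition), d)
res$formula
```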

group_vars

Character vector of variable names. The variables used to group the data (eg. protein id). A model will be fitted for each group.
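
Fitting one model per group amounts to splitting the data by the grouping variable and fitting each piece separately. The sketch below uses split() and lapply() with stats::lm as an illustrative stand-in for msqrobsum's internal grouping and model fit; the data are made up.

```r
# Sketch: one model per protein, as with group_vars = "protein".
# split() + lapply() and stats::lm are stand-ins, not msqrobsum internals.
d <- data.frame(
  protein    = rep(c("P1", "P2"), each = 4),
  condition  = factor(rep(c("a", "a", "b", "b"), times = 2)),
  expression = c(20.0, 20.2, 21.1, 20.9, 18.5, 18.4, 18.6, 18.5)
)
per_group <- split(d, d$protein)  # list with one data frame per protein
models  <- lapply(per_group, function(g) lm(expression ~ condition, data = g))
effects <- sapply(models, function(m) unname(coef(m)["conditionb"]))
effects  # estimated b - a difference for each protein
```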

contrasts

Numeric matrix of contrasts, or a character string with a variable name. When a variable name is specified, it should correspond to a categorical variable (e.g. treatment) that is specified in the model as a random effect; every possible contrast between its levels will then be calculated. A contrast matrix should likewise only involve categorical parameters specified as random effects. This is because the reference level in the model can change between groups (e.g. proteins) due to missing category levels.
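
To make "every possible contrast" concrete, the sketch below builds a matrix with all pairwise differences between the levels of a categorical variable (rows = levels, columns = comparisons). all_pairwise_contrasts() and the "B-A" column naming are hypothetical; the row names msqrobsum expects in a user-supplied contrast matrix are not specified here.

```r
# Sketch: all pairwise contrasts between factor levels.
# all_pairwise_contrasts() is a hypothetical helper for illustration.
all_pairwise_contrasts <- function(levels) {
  pairs <- combn(levels, 2)  # each column is one pair of levels
  L <- matrix(0, nrow = length(levels), ncol = ncol(pairs),
              dimnames = list(levels, paste0(pairs[2, ], "-", pairs[1, ])))
  for (j in seq_len(ncol(pairs))) {
    L[pairs[2, j], j] <-  1   # second level of the pair
    L[pairs[1, j], j] <- -1   # minus the first level
  }
  L
}

L <- all_pairwise_contrasts(c("A", "B", "C"))
L
```

With the 5 conditions of the examples, this gives choose(5, 2) = 10 comparisons.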

mode

Character. “msqrobsum”: summarization and MSqRob analysis are performed on the data. “msqrob”: only MSqRob analysis is performed on the data. “sum”: only summarization is performed on the data.

robust_lmer_iter

Integer or “auto”. Number of iterations used for robust estimation in MSqRob (M-estimation with Huber weights). When set to “auto”, it defaults to 1 if mode = “msqrobsum” and to 20 if mode = “msqrob”.
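
M-estimation with Huber weights is usually computed by iteratively reweighted least squares. The sketch below assumes stats::lm in place of lme4::lmer, the common tuning constant k = 1.345 and a MAD scale estimate; n_iter plays the role of robust_lmer_iter. huber_weights() and robust_fit() are hypothetical helpers, not msqrobsum functions.

```r
# Sketch: Huber M-estimation via iteratively reweighted least squares.
# Assumptions: lm instead of lmer, k = 1.345, MAD scale of the residuals.
huber_weights <- function(r, k = 1.345) {
  s <- mad(r)                 # robust scale of the residuals
  u <- abs(r / s)
  ifelse(u <= k, 1, k / u)    # weight 1 inside [-k, k], downweighted outside
}

robust_fit <- function(formula, data, n_iter = 20) {
  fit <- lm(formula, data = data)  # ordinary least-squares start
  for (i in seq_len(n_iter)) {
    data$.w <- huber_weights(residuals(fit))
    fit <- lm(formula, data = data, weights = .w)
  }
  list(fit = fit, weights = data$.w)  # weights used in the final fit
}

d <- data.frame(x = 1:10,
                y = c(2.1, 3.9, 6.2, 7.8, 10.1, 30, 14.2, 15.8, 18.1, 20.0))
res <- robust_fit(y ~ x, d)
round(res$weights, 2)  # the outlying 6th observation gets a weight below 1
```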

squeeze_variance

Logical. If “TRUE” (default), the residual standard deviations of all models are squeezed towards a common value (empirical Bayes variance moderation).
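
The squeezing combines each model's residual variance with a prior variance, weighted by their degrees of freedom, in the style of limma::squeezeVar; the output columns sigma_prior, df_prior, sigma_post and df_post suggest this scheme, but whether msqrobsum uses exactly this formula is an assumption. In the sketch the prior values are simply given, whereas in practice they are estimated from all models.

```r
# Sketch: empirical Bayes squeezing of residual standard deviations.
# sigma_post^2 = (df_prior * sigma_prior^2 + df * sigma^2) / (df_prior + df)
# The prior values passed below are assumed, not estimated.
squeeze <- function(sigma, df, sigma_prior, df_prior) {
  df_post    <- df + df_prior
  sigma_post <- sqrt((df_prior * sigma_prior^2 + df * sigma^2) / df_post)
  data.frame(sigma, df, sigma_post, df_post)
}

out <- squeeze(sigma = c(0.1, 0.5, 2.0), df = c(3, 3, 3),
               sigma_prior = 0.5, df_prior = 4)
out  # small sigmas are pulled up, large ones pulled down, towards 0.5
```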

p_adjust_method

Character. Correction method for multiple testing. Defaults to "BH" (Benjamini-Hochberg; "fdr" is an alias). See “stats::p.adjust” for more information on all available methods.
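
The accepted values are those of stats::p.adjust. A short example of the default Benjamini-Hochberg correction on made-up p-values:

```r
# Benjamini-Hochberg adjustment with stats::p.adjust on hypothetical p-values.
p <- c(0.01, 0.02, 0.03, 0.04, 0.20)
p.adjust(p, method = "BH")   # 0.05 0.05 0.05 0.05 0.20
p.adjust(p, method = "fdr")  # identical: "fdr" is an alias for "BH"
p.adjust.methods             # all accepted values of p_adjust_method
```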

keep_model

Logical. If “TRUE”, all lme4 models are kept in the output (memory heavy). Defaults to “FALSE”.

rlm_args

Named list. All parameters to be passed to the rlm function used in the summarization step. Defaults to list(maxit = 20L); an empty list uses the rlm defaults. See “MASS::rlm” for more information on all parameters and default settings.
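
A named list like rlm_args can be forwarded to MASS::rlm with do.call(), which is one plausible way such an argument is consumed; the exact internal call in msqrobsum is an assumption. The data below are made up.

```r
# Sketch: forwarding a named argument list to MASS::rlm via do.call().
library(MASS)

d <- data.frame(x = 1:8, y = c(1.1, 2.0, 2.9, 4.2, 5.0, 12.0, 7.1, 7.9))
rlm_args <- list(maxit = 50L)  # overrides rlm's default of 20 iterations
# The formula is passed unnamed so that rlm dispatches to its formula method.
fit <- do.call(rlm, c(list(y ~ x, data = d), rlm_args))
fit$converged
```

The same pattern applies to lmer_args and lme4::lmer.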

lmer_args

Named list. All parameters to be passed to the lmer function used in the MSqRob analysis. Defaults to list(control = lmerControl(calc.derivs = FALSE)); an empty list uses the lmer defaults. See “lme4::lmer” for more information on all parameters and default settings.

parallel_args

Named list. All parameters to be passed to the plan function from the future package, which allows for parallelization. Set strategy = “multisession” to parallelize over all available cores (default). Set the workers parameter to an integer to choose the number of cores to be used. Set strategy = “sequential” to disable parallelization. See “future::plan” for more information on all available parallelization strategies and other parameters with their default settings.
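
A sketch for choosing parallel_args based on the cores available on the machine. make_parallel_args() is a hypothetical helper; the list it returns is meant to be passed on as parallel_args, so that, for example, list(strategy = "multisession", workers = 2) corresponds to future::plan("multisession", workers = 2).

```r
# Sketch: build a parallel_args list, capping workers at the detected cores.
# make_parallel_args() is hypothetical and not part of msqrobsum.
make_parallel_args <- function(n_workers) {
  available <- parallel::detectCores()
  if (is.na(available)) available <- 1L  # detectCores() may return NA
  n_workers <- min(n_workers, available)
  if (n_workers <= 1) {
    list(strategy = "sequential")        # no parallelization
  } else {
    list(strategy = "multisession", workers = n_workers)
  }
}

make_parallel_args(1)
make_parallel_args(2)  # "multisession" if at least 2 cores are available
```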

Value

A data frame with the following columns:

group vars
data
data_summarized
formula
df
sigma
intercept
sigma_post
df_prior
sigma_prior
df_post
contrasts

Examples

## Load the package (installed from statOmics/MSqRobSum).
library(msqrobsum)

## Robust summarization from peptide intensities to protein summaries on
## the built-in data set with peptide intensities from 100 proteins.
## For only 100 proteins we will not benefit from parallelization because
## robust summarization is a fairly fast routine.
results1 <- msqrobsum( data = peptide_intensities
                    , mode = 'sum'
                    , group_vars = 'protein'
                    , parallel_args = list(strategy = 'sequential'))

## MSqRobSum analysis
## There are 20 samples belonging to 5 different conditions.
## Differential expression is tested between all conditions.
form = expression ~ (1|condition)
results2 <- msqrobsum(data = peptide_intensities
                    , formulas = form
                    , mode = 'msqrobsum'
                    , group_vars = 'protein'
                    , contrasts = 'condition'
                    , parallel_args = list(strategy = 'sequential'))

## MSqRob analysis
## There are 20 samples belonging to 5 different conditions.
## Since there is no prior summarization from peptide to protein intensities,
## the model has to take into account the sample and feature (peptide) effects.
## If the full model cannot be fitted, the simpler second formula is tried.
form = c(expression ~ (1|condition) + (1|sample) + (1|feature), expression ~ (1|condition))
## Differential expression is tested between all conditions.
## Fitting the full MSqRob models takes longer than the simplified models in MSqRobSum.
## Therefore it is suggested that you allow for parallelization,
## especially if you have big data sets with many samples and thousands of proteins,
## e.g. if you have 2 available processing cores:
results3 <- msqrobsum(data = peptide_intensities
                    , formulas = form
                    , mode = 'msqrob'
                    , group_vars = 'protein'
                    , contrasts = 'condition'
                    , parallel_args = list(strategy = 'multisession', workers = 2))

statOmics/MSqRobSum documentation built on July 5, 2021, 4:49 p.m.