fit.model: Fit peptide-based models

View source: R/fit_model.R

fit.modelR Documentation

Fit peptide-based models

Description

Fits a model to each protein of a protdata object and returns a corresponding protLM object. In its standard settings, the function returns a protLM object containing robust ridge models as described in Goeminne et al. (2015). However, the user can also specify to turn off the ridge regression and fit the models by ordinary least squares (OLS) and/or to turn off the down-weighing of outliers by M estimation with Huber weights and/or to turn off the Empirical Bayes squeezing of variances.

Usage

fit.model(protdata, response = NULL, fixed = NULL, random = NULL,
  add.intercept = TRUE, shrinkage.fixed = NULL, weights = "Huber",
  k = 1.345, par_squeeze = NULL, squeezeVar = TRUE, min_df = 1,
  robust_var = TRUE, robustM = TRUE, scaleUnshrFix = FALSE,
  modfiedGS = FALSE, tolPwrss = 1e-10, verbose = FALSE,
  printProgress = FALSE, shiny = FALSE, message_fitting = NULL,
  message_thetas = NULL, message_squeeze = NULL, message_update = NULL,
  ...)

Arguments

protdata

A protdata object to which peptide-based models must be fitted. Note that factors should be coded as factors and numeric variables as numeric in each data slot.

response

The name of the column in the data slot of the protdata object that contains the response variable for the model, mostly the column containing the log2 transformed intensity values.

fixed

Either a vector of names corresponding to the columns in the data slot of the protdata object containing the predictor variables, or a right-hand sided fixed effects formula without intercept, which should be indicated in the argument add.intercept. NULL (the default) indicates that no fixed effects other than a possible fixed intercept are present in the model.

random

Either a vector of names corresponding to the columns in the data slot of the protdata object containing the predictor variables or a right-hand sided random effects formula. Adding the peptide sequences as one of the random effects predictors is highly recommended as individual peptide effects are often quite strong. NULL (the default) indicates that no random effects are present in the model.

add.intercept

A logical value indicating whether the fitted models should contain a fixed intercept. If missing, the value is set to TRUE, indicating the intercept should be present in the model.

shrinkage.fixed

A numeric vector containing only 0 and/or 1 of length equal to the number of fixed effects, potential intercept included. The nth element in the shrinkage.fixed vector indicates whether the nth fixed effect should be shrunken (value 1) or not (value 0). If add.intercept=TRUE, the first element of the vector indicates the intercept. shrinkage.intercept = NULL (default) indicates all fixed effects except the intercept should be shrunken.

weights

The type of weighing that should be performed. Supported weighing methods incluce "Huber" (the default) for M estimation with Huber weights and NULL when no weighing should be applied. One can also supply a list of weights with length equal to the number of proteins in the protdata object. Each element of the list should either contain "Huber" or NULL or a numeric vector containing weights with length equal to the number of observations for that protein.

k

The tuning constant for the Huber mean when weighing down outliers. The default (k = 1.345) produces 95 % efficiency relative to the sample mean when the population is normal but provides substantial resistance to outliers when it is not.

par_squeeze

A character vector indicating which model parameters need to be squeezed. When squeezing random effects, provide their names. Fixed effects are present in shrinkage groups, e.g. ridgeGroup.1. If you want them to be squeezed as well, provide the names of the shrinkage groups that need to be squeezed. The default NULL indicates that no parameters will be squeezed.

squeezeVar

A logical indicating whether the residual standard deviation of all models should be squeezed towards a common value. Defaults to TRUE. If set to FALSE, residual standard deviations will not be squeezed.

min_df

A numeric value indicating the minimal degrees of freedom that will be taken into account for calculating the prior degrees of freedom and prior variance. Only used when par_squeeze=TRUE or squeezeVar is not NULL.

robust_var

A logical indicating wheter the estimation of the prior degrees of freedom and the prior variance (needed for shrinkage) should be robustified against outlier variances. Only used when par_squeeze=TRUE or squeezeVar is not NULL. Defaults to TRUE.

robustM

A logical indicating wheter the weighing of the observations in the M estimation step should be based on the Huber weights of the residuals devided by the median absolute deviation (mad) of the residuals instead of the residuals devided by the residual standard deviation. Older MSqRob implementations used the approach with the residual standard deviation as a default. With the new approach, downweighing outliers will be more efficient, as the mad is not strongly influenced by outliers. Defaults to TRUE.

modfiedGS

A logical indicating wheter the Modified Gram-Schmidt algorithm should be used for orthogonalizing the fixed effects design matrix instead of the regular QR decomposition algorithm. Defaults to FALSE.

tolPwrss

A numeric value indicating the maximally tolerated deviation on the penalized weighted residual sum of squares when iteratively estimating the weights by M estimation.

verbose

A logical value indicating whether the models should be printed out. Defaults to FALSE.

printProgress

A logical indicating whether the R should print a message before fitting each model. Defaults to FALSE.

shiny

A logical indicating whether this function is being used by a Shiny app. Setting this to TRUE only works when using this function in a Shiny app and allows for dynamic progress bars. Defaults to FALSE.

message_fitting

Only used when printProgress=TRUE and shiny=TRUE. A single-element character vector: the message to be displayed to the user during the fitting of the models, or NULL to hide the current message (if any).

message_thetas

Only used when printProgress=TRUE and shiny=TRUE. A single-element character vector: the message to be displayed to the user during the extraction of the variances, or NULL to hide the current message (if any).

message_squeeze

Only used when printProgress=TRUE and shiny=TRUE. A single-element character vector: the message to be displayed to the user during the squeezing of the variances, or NULL to hide the current message (if any).

message_update

Only used when printProgress=TRUE and shiny=TRUE. A single-element character vector: the message to be displayed to the user during the updating of the models, or NULL to hide the current message (if any).

...

Other parameters to be passed to the model fitting function internally.

Value

A protLM object containing the names of all proteins in the input protdata object and their corresponding fitted models.

References

Ludger Goeminne, Kris Gevaert and Lieven Clement. Peptide-level robust ridge regression improves estimation, sensitivity and specificity in data-dependent quantitative label-free shotgun proteomics. Molecular & Cellular Proteomics, 2016.

Examples

data(proteinsCPTAC, package="MSqRob")
#Fitting models for the first 10 proteins in the protdata object proteinsCPTAC using the robust ridge approach of Goeminne et al. (2015):
modelRRCPTAC <- fit.model(protdata=proteinsCPTAC[1:10], response="value", fixed="conc", random=c("Sequence","instrlab"), add.intercept=TRUE, shrinkage.fixed=NULL, weights="Huber", k = 1.345, tolPwrss = 1e-10, verbose=FALSE)
#Fitting models for the first 10 proteins in the protdata object proteinsCPTAC using an ordinary least squares approach (i.e. no shrinkage and no M estimation):
modelLmCPTAC <- fit.model(protdata=proteinsCPTAC[1:10], response="value", fixed=c("conc","Sequence","instrlab"), random=NULL, add.intercept=TRUE, shrinkage.fixed=c(0,0,0), weights=NULL, k = 1.345, tolPwrss = 1e-10, verbose=FALSE)

statOmics/MSqRob documentation built on Dec. 8, 2022, 6 a.m.