msImpute: Imputation of peptide log-intensity in mass spectrometry...

View source: R/msImpute.R

msImpute    R Documentation

Imputation of peptide log-intensity in mass spectrometry label-free proteomics by low-rank approximation

Description

Returns a completed matrix of peptide log-intensity where missing values (NAs) are imputed by low-rank approximation of the input matrix. Non-NA entries remain unmodified. msImpute requires at least 4 non-missing measurements per peptide across all samples. It is assumed that peptide intensities (DDA), or MS1/MS2 normalised peak areas (DIA), are log2-transformed and normalised (e.g. by quantile normalisation).
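The requirement of at least 4 non-missing measurements per peptide can be met by filtering the matrix before imputation. The following is a minimal base R sketch on a made-up toy matrix; the object names (`n_obs`, `y_filtered`) are illustrative and not part of msImpute.

```r
# Toy log2-intensity matrix: 4 peptides (rows) x 6 samples (columns).
# The values are invented for illustration only.
y <- matrix(
  c(20.1, 19.8,   NA, 20.3, 20.0, 19.9,   # 5 observed values
      NA,   NA, 18.2,   NA,   NA, 18.5,   # 2 observed values
    22.4, 22.1, 22.6,   NA, 22.3,   NA,   # 4 observed values
      NA, 17.0,   NA,   NA, 17.3, 16.9),  # 3 observed values
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("pep", 1:4), paste0("s", 1:6))
)

# Count non-missing measurements per peptide and keep rows with at least 4.
n_obs <- rowSums(!is.na(y))
y_filtered <- y[n_obs >= 4, , drop = FALSE]
```

Here `drop = FALSE` keeps the result a matrix even if only one peptide passes the filter.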

Usage

msImpute(
  y,
  method = c("v2-mnar", "v2", "v1"),
  group = NULL,
  a = 0.2,
  rank.max = NULL,
  lambda = NULL,
  thresh = 1e-05,
  maxit = 100,
  trace.it = FALSE,
  warm.start = NULL,
  final.svd = TRUE,
  biScale_maxit = 20,
  gauss_width = 0.3,
  gauss_shift = 1.8
)

Arguments

y

Numeric matrix giving log-intensity where missing values are denoted by NA. Rows are peptides, columns are samples.

method

Character. Allowed values are "v2-mnar" (modified low-rank approximation for MNAR data), "v2" (msImpute v2, the enhanced version, for MAR data) and "v1" (the initial release of msImpute).

group

Character or factor vector of length ncol(y) giving the experimental group of each sample (column of y).

a

Numeric. The weight parameter. Defaults to 0.2. Weights the MAR-imputed distribution in the imputation scheme.

rank.max

Numeric. Restricts the rank of the solution. Set to min(dim(y)-1) by default in "v1".

lambda

Numeric. Nuclear-norm regularization parameter. Controls the low-rank property of the solution to the matrix completion problem. By default, it is determined at the scaling step. If set to zero the algorithm reverts to "hardImputation", where the convergence will be slower. Applicable to "v1" only.

thresh

Numeric. Convergence threshold. Defaults to 1e-05. Applicable to "v1" only.

maxit

Numeric. Maximum number of iterations allowed for the algorithm to converge. Defaults to 100. Applicable to "v1" only.

trace.it

Logical. Prints traces of progress of the algorithm. Applicable to "v1" only.

warm.start

List. An SVD object can be used to initialize the algorithm instead of random initialization. Applicable to "v1" only.

final.svd

Logical. Should the final SVD object be saved? The solutions to the matrix completion problem are computed from the U, D and V components of the final SVD. Applicable to "v1" only.

biScale_maxit

Numeric. Maximum number of iterations allowed for the scaling algorithm to converge. See scaleData. Applicable to "v1" only.

gauss_width

Numeric. The width parameter of the Gaussian distribution used to impute the MNAR peptides (features). This is the width parameter in the down-shift imputation method.

gauss_shift

Numeric. The shift parameter of the Gaussian distribution used to impute the MNAR peptides (features). This is the shift parameter in the down-shift imputation method.
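To illustrate what gauss_width and gauss_shift control, here is a minimal sketch of down-shift imputation for a single sample. It assumes the Perseus-style convention, in which the Gaussian is centred at the observed mean minus shift times the observed standard deviation, with a standard deviation of width times the observed standard deviation. The helper `downshift_draws` is hypothetical and not part of msImpute, and msImpute may parameterise its Gaussian differently internally.

```r
# Hypothetical helper, not part of msImpute: draw n values from a Gaussian
# that is shifted down and narrowed relative to the observed values in x.
# Assumes the Perseus-style parameterisation described above.
downshift_draws <- function(x, n, width = 0.3, shift = 1.8) {
  mu <- mean(x, na.rm = TRUE)   # mean of the observed log-intensities
  s  <- sd(x, na.rm = TRUE)     # standard deviation of the observed values
  rnorm(n, mean = mu - shift * s, sd = width * s)
}

# Simulated sample: 50 observed log-intensities and 10 missing values.
set.seed(1)
x <- c(rnorm(50, mean = 20, sd = 2), rep(NA, 10))

# One draw per missing value.
draws <- downshift_draws(x, n = sum(is.na(x)))
```

A larger shift moves the imputed values further into the low-intensity tail, and a smaller width concentrates them more tightly around that shifted centre.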

Details

msImpute builds on the softImpute-als algorithm in the softImpute package. The algorithm estimates a low-rank matrix (a smaller matrix than the input matrix) that approximates the data with reasonable accuracy. softImpute-als determines the optimal rank of the matrix through the lambda parameter, which it learns from the data. This algorithm is implemented in method="v1". In v2 we have used an information-theoretic approach to estimate the optimal rank, instead of relying on softImpute-als defaults. Similarly, we have implemented a new approach to estimate lambda from the data.

Low-rank approximation is a linear reconstruction of the data, and is only appropriate for imputation of data that are missing at random (MAR). To make the algorithm applicable to data that are missing not at random (MNAR), we have implemented method="v2-mnar", which imputes the missing observations as a weighted sum of values imputed by msImpute v2 (method="v2") and random draws from a Gaussian distribution. Values that tend to be missing completely in one or more experimental groups are weighted more (shrunken) towards imputation by sampling from a Gaussian parameterised by the smallest observed values in the sample (similar to minProb, or Perseus). However, if the missing values are evenly distributed across the samples for a peptide, the imputed values for that peptide are shrunken towards the low-rank imputed values. The distribution of missing values is judged by the EBM metric implemented in selectFeatures, which is also an information-theoretic measure.
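The idea of imputing by low-rank approximation under nuclear-norm regularisation can be sketched in base R as an iterative soft-thresholded SVD. This is a simplified illustration of the general technique, not the softImpute-als algorithm and not the msImpute implementation. The helper name `soft_svd_impute`, its mean-fill initialisation and the fixed `lambda` are assumptions made for the example.

```r
# Hypothetical helper: impute NAs by repeatedly (1) filling them with the
# current low-rank fit and (2) refitting with a soft-thresholded SVD.
soft_svd_impute <- function(y, lambda = 1, maxit = 100, thresh = 1e-5) {
  miss <- is.na(y)
  # Initialise missing entries with the mean of all observed values.
  z <- y
  z[miss] <- mean(y, na.rm = TRUE)
  for (i in seq_len(maxit)) {
    s <- svd(z)
    d <- pmax(s$d - lambda, 0)      # shrink singular values towards zero
    fit <- s$u %*% (d * t(s$v))     # same as U %*% diag(d) %*% t(V)
    z_new <- y
    z_new[miss] <- fit[miss]        # observed entries are never changed
    # Relative change between successive completed matrices.
    delta <- sum((z_new - z)^2) / max(sum(z^2), 1e-9)
    z <- z_new
    if (delta < thresh) break
  }
  z
}

# Simulated log-intensity matrix: 30 peptides x 8 samples with low-rank signal.
set.seed(42)
truth <- outer(rnorm(30), rnorm(8)) + 20
y <- truth + matrix(rnorm(30 * 8, sd = 0.1), 30, 8)
y[sample(length(y), 40)] <- NA      # remove 40 entries at random (MAR-like)

yimp <- soft_svd_impute(y, lambda = 0.5)
```

Larger values of `lambda` shrink more singular values to zero and therefore give a lower-rank fit. Setting it to zero removes the shrinkage altogether.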

Value

Missing values are imputed by low-rank approximation of the input matrix. If input is a numeric matrix, a numeric matrix of identical dimensions is returned.
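The properties stated here and in the Description (identical dimensions, no remaining NAs, observed entries unmodified) can be checked after imputation. The sketch below uses a hypothetical helper, `check_imputed`, and a stand-in imputed matrix rather than actual msImpute output.

```r
# Hypothetical helper, not part of msImpute: compare an input matrix y with
# its imputed counterpart yimp and report three logical checks.
check_imputed <- function(y, yimp) {
  stopifnot(is.matrix(y), is.matrix(yimp))
  obs <- !is.na(y)
  c(
    same_dim      = identical(dim(y), dim(yimp)),
    no_missing    = !anyNA(yimp),
    obs_unchanged = isTRUE(all.equal(y[obs], yimp[obs]))
  )
}

# Toy input with two missing values.
y <- matrix(c(1, NA, 3, 4, 5, NA), nrow = 2)

# Stand-in for an imputed matrix (NOT msImpute output): NAs replaced by 0.
yimp <- y
yimp[is.na(y)] <- 0

checks <- check_imputed(y, yimp)
```

With real data, `yimp` would be the matrix returned by msImpute for the same `y`.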

Author(s)

Soroor Hediyeh-zadeh

References

Hastie, T., Mazumder, R., Lee, J. D., & Zadeh, R. (2015). Matrix completion and low-rank SVD via fast alternating least squares. The Journal of Machine Learning Research, 16(1), 3367-3402.

Hediyeh-zadeh, S., Webb, A. I., & Davis, M. J. (2020). MSImpute: Imputation of label-free mass spectrometry peptides by low-rank approximation. bioRxiv.

See Also

selectFeatures

Examples

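# Load the example data set pxd010943
data(pxd010943)
# Log2-transform the intensities, since msImpute expects log-intensities
y <- log2(data.matrix(pxd010943))
# Derive group labels by stripping the replicate suffix (_1 to _4) from column names
group <- gsub("_[1234]", "", colnames(y))
# Impute with the MNAR-aware method, which uses the group labels
yimp <- msImpute(y, method = "v2-mnar", group = group)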

DavisLaboratory/msImpute documentation built on Jan. 5, 2024, 3:50 a.m.