imputation: Quantitative mass spectrometry data imputation

Description Usage Arguments Details Author(s) References Examples

Description

The impute_matrix function performs data imputation on matrix objects instance using a variety of methods (see below).

Users should proceed with care when imputing data and take precautions to assure that the imputation produce valid results, in particular with naive imputations such as replacing missing values with 0.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
impute_matrix(x, method, ...)

imputeMethods()

impute_neighbour_average(x, k = min(x, na.rm = TRUE))

impute_knn(x, ...)

impute_mle(x, ...)

impute_bpca(x, ...)

impute_mixed(x, randna, mar, mnar, ...)

impute_min(x)

Arguments

x

A matrix with missing values to be imputed.

method

character(1) defining the imputation method. See imputeMethods() for available ones.

...

Additional parameters passed to the inner imputation function.

k

numeric(1) providing the imputation value used for the first and last samples if they contain an NA. The default is to use the smallest value in the data.

randna

logical of length equal to nrow(object) defining which rows are missing at random. The other ones are considered missing not at random. Only relevant when methods is mixed.

mar

Imputation method for values missing at random. See method above.

mnar

Imputation method for values missing not at random. See method above.

Details

There are two types of mechanisms resulting in missing values in LC/MSMS experiments.

MNAR features should ideally be imputed with a left-censor method, such as QRILC below. Conversely, it is recommended to use host deck methods such nearest neighbours, Bayesian missing value imputation or maximum likelihood methods when values are missing at random.

Currently, the following imputation methods are available.

The imputeMethods() function returns a vector with valid imputation method arguments.

Author(s)

Laurent Gatto

References

Olga Troyanskaya, Michael Cantor, Gavin Sherlock, Pat Brown, Trevor Hastie, Robert Tibshirani, David Botstein and Russ B. Altman, Missing value estimation methods for DNA microarrays Bioinformatics (2001) 17 (6): 520-525.

Oba et al., A Bayesian missing value estimation method for gene expression profile data, Bioinformatics (2003) 19 (16): 2088-2096.

Cosmin Lazar (2015). imputeLCMD: A collection of methods for left-censored missing data imputation. R package version 2.0. http://CRAN.R-project.org/package=imputeLCMD.

Lazar C, Gatto L, Ferro M, Bruley C, Burger T. Accounting for the Multiple Natures of Missing Values in Label-Free Quantitative Proteomics Data Sets to Compare Imputation Strategies. J Proteome Res. 2016 Apr 1;15(4):1116-25. doi: 10.1021/acs.jproteome.5b00981. PubMed PMID:26906401.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
## test data
set.seed(42)
m <- matrix(rlnorm(60), 10)
dimnames(m) <- list(letters[1:10], LETTERS[1:6])
m[sample(60, 10)] <- NA

## available methods
imputeMethods()

impute_matrix(m, method = "zero")

impute_matrix(m, method = "min")

impute_matrix(m, method = "knn")

## same as impute_zero
impute_matrix(m, method = "with", val = 0)

## impute with half of the smalles value
impute_matrix(m, method = "with",
              val = min(m, na.rm = TRUE) * 0.5)

## all but third and fourth features' missing values
## are the result of random missing values
randna <- rep(TRUE, 10)
randna[c(3, 9)] <- FALSE

impute_matrix(m, method = "mixed",
              randna = randna,
              mar = "knn",
              mnar = "min")

MsCoreUtils documentation built on Nov. 8, 2020, 10:59 p.m.