
The `impute` method performs data imputation on an `MSnSet` instance using a variety of methods (see below). The imputation and the parameters are logged into the `processingData(object)` slot.

Users should proceed with care when imputing data and take precautions to ensure that the imputation produces valid results, in particular with naive imputations such as replacing missing values with 0.

There are two types of mechanisms resulting in missing values in LC/MSMS experiments.

- Missing values resulting from the absence of detection of a feature, despite ions being present at detectable concentrations, for example in the case of ion suppression or as a result of the stochastic, data-dependent nature of the MS acquisition method. These missing values are expected to be randomly distributed in the data and are defined as missing at random (MAR) or missing completely at random (MCAR).

- Biologically relevant missing values resulting from the absence or the low abundance of ions (below the limit of detection of the instrument). These missing values are not expected to be randomly distributed in the data and are defined as missing not at random (MNAR).

MNAR features should ideally be imputed with a left-censor method, such as `QRILC` below. Conversely, it is recommended to use hot deck methods such as nearest neighbours, Bayesian missing value imputation or maximum likelihood methods when values are missing at random.
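The difference between the two mechanisms can be illustrated on simulated data. The sketch below uses base R only; the simulated intensities, the 20% missingness rate and the detection limit `lod` are assumptions chosen for illustration and are not taken from `naset`.

```r
set.seed(42)

## Simulated log-intensities for 1000 features (illustrative values only)
full <- rnorm(1000, mean = 20, sd = 2)

## MCAR: remove 200 values uniformly at random
mcar <- full
mcar[sample(length(full), 200)] <- NA

## MNAR (left-censored): remove every value below a hypothetical
## limit of detection, here the 20th percentile of the complete data
lod <- quantile(full, probs = 0.2, names = FALSE)
mnar <- full
mnar[full < lod] <- NA

## Compare the mean of the observed values with the complete-data mean
c(full = mean(full),
  mcar = mean(mcar, na.rm = TRUE),
  mnar = mean(mnar, na.rm = TRUE))
```

Under left-censoring only the low tail is lost, so the observed values are shifted upwards. This is why methods that impute small values are suggested for MNAR features, while neighbour-based or model-based methods are suggested when values are missing at random.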

Currently, the following imputation methods are available:

- MLE
Maximum likelihood-based imputation method using the EM algorithm. Implemented in the `norm::imp.norm` function. See `imp.norm` for details and additional parameters. Note that here, `...` are passed to the `em.norm` function, rather than to the actual imputation function `imp.norm`.

- bpca
Bayesian missing value imputation, as implemented in the `pcaMethods::pca` function. See `pca` for details and additional parameters.

- knn
Nearest neighbour averaging, as implemented in the `impute::impute.knn` function. See `impute.knn` for details and additional parameters.

- QRILC
A missing data imputation method that performs the imputation of left-censored missing data using random draws from a truncated distribution with parameters estimated using quantile regression. Implemented in the `imputeLCMD::impute.QRILC` function. See `impute.QRILC` for details and additional parameters.

- MinDet
Performs the imputation of left-censored missing data using a deterministic minimal value approach. Considering an expression data matrix with *n* samples and *p* features, for each sample, the missing entries are replaced with a minimal value observed in that sample. The minimal value observed is estimated as being the q-th quantile (default `q = 0.01`) of the observed values in that sample. Implemented in the `imputeLCMD::impute.MinDet` function. See `impute.MinDet` for details and additional parameters.

- MinProb
Performs the imputation of left-censored missing data by random draws from a Gaussian distribution centred on a minimal value. Considering an expression data matrix with *n* samples and *p* features, for each sample, the mean value of the Gaussian distribution is set to a minimal observed value in that sample. The minimal value observed is estimated as being the q-th quantile (default `q = 0.01`) of the observed values in that sample. The standard deviation is estimated as the median of the feature standard deviations. Note that when estimating the standard deviation of the Gaussian distribution, only the peptides/proteins with more than 50% recorded values are considered. Implemented in the `imputeLCMD::impute.MinProb` function. See `impute.MinProb` for details and additional parameters.

- min
Replaces the missing values by the smallest non-missing value in the data.

- zero
Replaces the missing values by 0.

- mixed
A mixed imputation applying two methods (to be defined by the user as `mar` for values missing at random and `mnar` for values missing not at random, see example) on two M[C]AR/MNAR subsets of the data (as defined by the user by a `randna` logical, of length equal to `nrow(object)`).

- nbavg
Average neighbour imputation for fractions collected along a fractionation/separation gradient, such as sub-cellular fractions. The method assumes that the fractions are ordered along the gradient and is invalid otherwise.

Continuous sets of `NA` values at the beginning and the end of the quantitation vectors are set to the lowest observed value in the data or to a user defined value passed as argument `k`. Then, when a missing value is flanked by two non-missing neighbouring values, it is imputed by the mean of its direct neighbours. A stretch of 2 or more missing values will not be imputed. See the example below.

- none
No imputation is performed and the missing values are left untouched. Implemented in case one wants to only impute values missing at random or not at random with the `mixed` method.
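The `MinDet` and `MinProb` descriptions above can be made concrete with a small base R sketch. The helpers `sample_min`, `min_det_sketch` and `min_prob_sketch` and the toy matrix are hypothetical and written for illustration only; they are simplified stand-ins and not the `imputeLCMD` implementations, which accept additional parameters.

```r
## Toy expression matrix: 5 features (rows) x 3 samples (columns)
x <- cbind(s1 = c(10.0, 12.0,   NA, 15.0, 11.0),
           s2 = c(  NA, 12.5, 14.0, 15.5, 11.5),
           s3 = c(10.5,   NA, 14.5, 15.0,   NA))
rownames(x) <- paste0("f", 1:5)

## Hypothetical helper: q-th quantile of the observed values of each sample
sample_min <- function(x, q = 0.01)
  apply(x, 2, quantile, probs = q, na.rm = TRUE, names = FALSE)

## MinDet-like: replace the NAs of each sample by that sample's minimal value
min_det_sketch <- function(x, q = 0.01) {
  mins <- sample_min(x, q)
  for (j in seq_len(ncol(x)))
    x[is.na(x[, j]), j] <- mins[j]
  x
}

## MinProb-like: draw from a Gaussian centred on the sample's minimal value.
## The standard deviation is the median of the feature-wise standard
## deviations, using only features with more than 50% recorded values.
min_prob_sketch <- function(x, q = 0.01) {
  mins <- sample_min(x, q)
  keep <- rowMeans(!is.na(x)) > 0.5
  sigma <- median(apply(x[keep, , drop = FALSE], 1, sd, na.rm = TRUE))
  for (j in seq_len(ncol(x))) {
    nas <- is.na(x[, j])
    x[nas, j] <- rnorm(sum(nas), mean = mins[j], sd = sigma)
  }
  x
}

det <- min_det_sketch(x)

set.seed(1)  # the MinProb-like variant is stochastic
prob <- min_prob_sketch(x)
```

The deterministic variant gives every missing entry of a sample the same value, whereas the probabilistic variant spreads the imputed values around that minimum, so repeated runs differ unless a seed is set.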

The `naset` `MSnSet` is a real quantitative data set in which quantitative values have been replaced by `NA`s. See `script/naset.R` for details.

`signature(object = "MSnSet", method, ...)`

This method performs data imputation on the `object` `MSnSet` instance using the `method` algorithm. `...` is used to pass parameters to the imputation function. See the respective methods for details and additional parameters.
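How a `method` name and `...` could work together, including the `mixed` case with its `randna`, `mar` and `mnar` arguments, is sketched below in base R on a plain numeric matrix. The names `imputers`, `impute_sketch` and `impute_mixed_sketch` are hypothetical and this is not the MSnbase code; it only mirrors the call pattern described above.

```r
## Hypothetical single-method imputers operating on a numeric matrix
imputers <- list(
  min  = function(x) { x[is.na(x)] <- min(x, na.rm = TRUE); x },
  zero = function(x) { x[is.na(x)] <- 0; x },
  none = function(x) x
)

## Hypothetical dispatcher: `method` selects the algorithm and `...`
## carries the method-specific arguments (here only used by "mixed")
impute_sketch <- function(x, method, ...) {
  method <- match.arg(method, c(names(imputers), "mixed"))
  if (method == "mixed")
    return(impute_mixed_sketch(x, ...))
  imputers[[method]](x, ...)
}

## Mixed imputation: rows flagged in `randna` are imputed with `mar`,
## all other rows with `mnar`
impute_mixed_sketch <- function(x, randna, mar, mnar) {
  stopifnot(is.logical(randna), length(randna) == nrow(x))
  x[randna, ]  <- impute_sketch(x[randna, , drop = FALSE], mar)
  x[!randna, ] <- impute_sketch(x[!randna, , drop = FALSE], mnar)
  x
}

x <- matrix(c(5, NA, 7, NA,
              1,  2, 3,  4), nrow = 4)
randna <- c(TRUE, TRUE, FALSE, FALSE)

## Leave the values flagged as missing at random untouched and impute
## only the remaining ones
res <- impute_sketch(x, "mixed", randna = randna,
                     mar = "none", mnar = "min")
```

Combining `"none"` with another method in this way corresponds to the use case given for the `none` method: only one of the two subsets is imputed.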

Laurent Gatto and Samuel Wieczorek

Olga Troyanskaya, Michael Cantor, Gavin Sherlock, Pat Brown, Trevor Hastie, Robert Tibshirani, David Botstein and Russ B. Altman, Missing value estimation methods for DNA microarrays, Bioinformatics (2001) 17 (6): 520-525.

Oba et al., A Bayesian missing value estimation method for gene expression profile data, Bioinformatics (2003) 19 (16): 2088-2096.

Cosmin Lazar (2015). imputeLCMD: A collection of methods for left-censored missing data imputation. R package version 2.0. http://CRAN.R-project.org/package=imputeLCMD.

Lazar C, Gatto L, Ferro M, Bruley C, Burger T. Accounting for the Multiple Natures of Missing Values in Label-Free Quantitative Proteomics Data Sets to Compare Imputation Strategies. J Proteome Res. 2016 Apr 1;15(4):1116-25. doi: 10.1021/acs.jproteome.5b00981. PubMed PMID: 26906401.

```
library("MSnbase")
data(naset)

## table of missing values along the rows
table(fData(naset)$nNA)

## table of missing values along the columns
pData(naset)$nNA

## non-random missing values
notna <- which(!fData(naset)$randna)
length(notna)
notna

impute(naset, method = "min")

if (require("imputeLCMD")) {
    impute(naset, method = "QRILC")
    impute(naset, method = "MinDet")
}

if (require("norm"))
    impute(naset, method = "MLE")

impute(naset, "mixed",
       randna = fData(naset)$randna,
       mar = "knn", mnar = "QRILC")

## neighbour averaging
x <- naset[1:4, 1:6]
exprs(x)[1, 1] <- NA ## min value
exprs(x)[2, 3] <- NA ## average
exprs(x)[3, 1:2] <- NA ## min value and average
## 4th row: no imputation
exprs(x)

exprs(impute(x, "nbavg"))
```

```
  0   1   2   3   4   8   9  10
301 247  91  13   2  23  10   2
[1] 34 45 56 39 47 52 49 61 41 42 55 45 51 43 57 53
[1] 35
[1] 6 20 79 88 130 187 227 231 238 264 275 317 324 363 373 382 409 437 445
[20] 453 456 474 484 485 492 514 516 546 568 580 594 631 648 664 671
MSnSet (storageMode: lockedEnvironment)
assayData: 689 features, 16 samples
element names: exprs
protocolData: none
phenoData
sampleNames: M1F1A M1F4A ... M2F11B (16 total)
varLabels: nNA
varMetadata: labelDescription
featureData
featureNames: AT1G09210 AT1G21750 ... AT4G39080 (689 total)
fvarLabels: nNA randna
fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'
Annotation:
- - - Processing information - - -
Data imputation using min Mon Feb 15 13:06:15 2021
MSnbase version: 1.15.6
MSnSet (storageMode: lockedEnvironment)
assayData: 689 features, 16 samples
element names: exprs
protocolData: none
phenoData
sampleNames: M1F1A M1F4A ... M2F11B (16 total)
varLabels: nNA
varMetadata: labelDescription
featureData
featureNames: AT1G09210 AT1G21750 ... AT4G39080 (689 total)
fvarLabels: nNA randna
fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'
Annotation:
- - - Processing information - - -
Data imputation using MinDet Mon Feb 15 13:06:17 2021
Using default parameters
MSnbase version: 1.15.6
Iterations of EM:
1...2...3...4...5...6...7...8...9...10...11...
MSnSet (storageMode: lockedEnvironment)
assayData: 689 features, 16 samples
element names: exprs
protocolData: none
phenoData
sampleNames: M1F1A M1F4A ... M2F11B (16 total)
varLabels: nNA
varMetadata: labelDescription
featureData
featureNames: AT1G09210 AT1G21750 ... AT4G39080 (689 total)
fvarLabels: nNA randna
fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'
Annotation:
- - - Processing information - - -
Data imputation using MLE Mon Feb 15 13:06:17 2021
Using default parameters
MSnbase version: 1.15.6
MSnSet (storageMode: lockedEnvironment)
assayData: 689 features, 16 samples
element names: exprs
protocolData: none
phenoData
sampleNames: M1F1A M1F4A ... M2F11B (16 total)
varLabels: nNA
varMetadata: labelDescription
featureData
featureNames: AT1G09210 AT1G21750 ... AT4G39080 (689 total)
fvarLabels: nNA randna
fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'
Annotation:
- - - Processing information - - -
Data imputation using mixed Mon Feb 15 13:06:18 2021
Using default parameters
MSnbase version: 1.15.6
             M1F1A    M1F4A   M1F7A  M1F11A    M1F2B    M1F5B
AT1G09210       NA 0.275500 0.21600 0.18525 0.465667 0.199667
AT1G21750 0.332000 0.279667      NA 0.16600 0.451500 0.200375
AT1G51760       NA       NA 0.16825 0.18825 0.459750 0.214500
AT1G56340 0.336733       NA      NA      NA 0.487167 0.201833
Assuming values are ordered.
             M1F1A    M1F4A     M1F7A  M1F11A    M1F2B    M1F5B
AT1G09210 0.166000 0.275500 0.2160000 0.18525 0.465667 0.199667
AT1G21750 0.332000 0.279667 0.2228335 0.16600 0.451500 0.200375
AT1G51760 0.166000 0.167125 0.1682500 0.18825 0.459750 0.214500
AT1G56340 0.336733       NA        NA      NA 0.487167 0.201833
```
```
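The `nbavg` result shown at the end of the output above can be reproduced with a short base R sketch. The helper `nbavg_sketch` is hypothetical and is not the MSnbase implementation; following the output above, it sets a missing first or last value to `k` and then averages interior missing values that have two non-missing neighbours. The matrix values are copied from that output.

```r
## Values copied from the example output (before imputation)
x <- rbind(AT1G09210 = c(NA, 0.275500, 0.21600, 0.18525, 0.465667, 0.199667),
           AT1G21750 = c(0.332000, 0.279667, NA, 0.16600, 0.451500, 0.200375),
           AT1G51760 = c(NA, NA, 0.16825, 0.18825, 0.459750, 0.214500),
           AT1G56340 = c(0.336733, NA, NA, NA, 0.487167, 0.201833))
colnames(x) <- c("M1F1A", "M1F4A", "M1F7A", "M1F11A", "M1F2B", "M1F5B")

## Hypothetical helper: impute one quantitation vector `v`, assumed to be
## ordered along the gradient
nbavg_sketch <- function(v, k) {
  n <- length(v)
  ## a missing first or last value is set to k
  if (is.na(v[1])) v[1] <- k
  if (is.na(v[n])) v[n] <- k
  ## interior NAs flanked by two non-missing values get the neighbours' mean
  i <- which(is.na(v))
  i <- i[i > 1 & i < n]
  i <- i[!is.na(v[i - 1]) & !is.na(v[i + 1])]
  v[i] <- (v[i - 1] + v[i + 1]) / 2
  v
}

## Default k: the lowest observed value in the data
k <- min(x, na.rm = TRUE)
imp <- t(apply(x, 1, nbavg_sketch, k = k))
imp
```

The stretch of three missing values in the fourth row is left as `NA`, because none of them has two non-missing direct neighbours.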
