koptimp: Imputation of Low Values

Description

Wrapper function that selects an optimal number of neighbours (k) for impute.knn from the impute package. For each candidate value of k, the predictions made by impute.knn on randomly selected data points are compared with their original values to calculate the root mean squared error (RMSE). thres is the limit under which intensities in the original matrix are considered missing. perc is the proportion of "non missing" intensities randomly selected to estimate the RMSE (for example, 0.1 for 10%). The optimal number koptim is the value of k at which the improvement in RMSE falls below 10%. This value is automatically used to compute the resulting imputed matrix x.
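
The evaluation step described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's own code: mask_rmse, impute_fun and median_impute are hypothetical names, and a column-median imputer stands in for impute.knn so that the sketch runs without the impute package.

```r
## Hypothetical helper: hide a fraction `perc` of the values above `thres`,
## impute them with `impute_fun`, and return the RMSE on the hidden entries.
mask_rmse <- function(x, thres = 1, perc = 0.1, impute_fun) {
  x <- as.matrix(x)
  x[x <= thres] <- NA                      # low values are treated as missing
  known <- which(!is.na(x))                # candidate "non missing" entries
  hidden <- sample(known, size = max(1, round(perc * length(known))))
  truth <- x[hidden]
  x[hidden] <- NA                          # mask the selected entries
  pred <- impute_fun(x)[hidden]            # predictions for the masked entries
  sqrt(mean((pred - truth)^2))
}

## Stand-in imputer (assumption): replace NAs by the column median.
## In koptimp this role is played by impute.knn for each k in lk.
median_impute <- function(m) {
  for (j in seq_len(ncol(m))) {
    m[is.na(m[, j]), j] <- median(m[, j], na.rm = TRUE)
  }
  m
}

set.seed(1)
toy <- matrix(rexp(200, rate = 0.2) + 0.5, nrow = 20)  # toy intensities
err <- mask_rmse(toy, thres = 1, perc = 0.1, impute_fun = median_impute)
```

Repeating such a call niter times for each candidate k would give one RMSE value per iteration and per k, which is the shape of the rmse component described under Value.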

Usage

  koptimp(x, thres = 1, log.t = TRUE, lk = 3:10, perc = 0.1, niter = 10, ...)

Arguments

x

A data frame or matrix to be imputed.

thres

Threshold below which intensities in x are considered missing.

log.t

A logical which specifies whether or not the log transformation is performed on the data set before imputation.

lk

A vector of numbers of neighbours to be tested.

perc

Proportion of non-low values to be randomly selected (for example, 0.1 for 10%).

niter

Number of iterations.

...

Arguments passed to or from other methods.

Value

A list containing the following components:

x

An imputed data matrix using k=koptim.

koptim

Optimal number of neighbours found in lk.

rmse

Root mean squared error matrix (niter by length of lk).
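
As an illustration of how the rmse and koptim components relate, the sketch below applies one possible reading of the "less than 10%" rule to an RMSE matrix: it returns the first k in lk for which moving to the next candidate reduces the mean RMSE by less than 10%. The function name pick_k, the tol argument and the toy numbers are assumptions made for this example. The rule actually implemented in koptimp may differ in detail.

```r
## Hypothetical helper: pick k from an niter-by-length(lk) RMSE matrix.
pick_k <- function(rmse, lk, tol = 0.10) {
  avg <- colMeans(rmse)                      # mean RMSE per candidate k
  n <- length(avg)
  gain <- (avg[-n] - avg[-1]) / avg[-n]      # relative improvement from k to the next k
  small <- which(gain < tol)                 # steps that improve RMSE by less than tol
  if (length(small) == 0) lk[n] else lk[small[1]]
}

## Toy RMSE matrix (made-up values): 2 iterations, candidates k = 3..6
lk <- 3:6
rmse <- rbind(c(1.00, 0.80, 0.78, 0.77),
              c(1.10, 0.84, 0.80, 0.79))
k_best <- pick_k(rmse, lk)
```

With these toy numbers the mean RMSE drops by about 22% from k = 3 to k = 4 and by less than 4% afterwards, so the sketch returns 4.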

Note

The version of the impute package must be 1.8.0 or greater. At the time of writing this package, only the version available on the Bioconductor website seemed to be regularly updated.

Author(s)

David Enot

References

Hastie, T., Tibshirani, R., Sherlock, G., Eisen, M., Brown, P. and Botstein, D. (1999). Imputing Missing Data for Gene Expression Arrays. Technical report, Stanford University Statistics Department. http://www-stat.stanford.edu/~hastie/Papers/missing.pdf

Troyanskaya, O., Cantor, M., Sherlock, G., Brown, P., Hastie, T., Tibshirani, R., Botstein, D. and Altman, R. B. (2001). Missing value estimation methods for DNA microarrays. Bioinformatics, 17(6), 520-525.

Examples

  ## load the package and data
  library(FIEmspro)
  data(abr1)
  mat <- abr1$pos[, 110:300]

  ## find an optimal k between 3 and 6 to impute values lower than 1
  ## 10 percent of intensities > 1 are used to evaluate each solution
  ## imputation is done on the log-transformed matrix
  res <- koptimp(mat, thres = 1, log.t = TRUE, lk = 3:6, perc = 0.1, niter = 5)
  names(res)

  ## check RMSE of the solutions at various k
  boxplot(res$rmse, xlab = "Number of neighbours", ylab = "Root mean square error")

  ## do the imputation manually with a given k,
  ## mirroring thres = 1 and log.t = TRUE
  library(impute)
  mat[mat <= 1] <- NA
  mat <- log(mat)
  ## use k = 6 for example; rowmax and colmax are both set to 1
  mimp <- t(impute.knn(t(mat), k = 6, rowmax = 1, colmax = 1, maxp = ncol(mat))$data)
  ## transform back to the original space
  mimp <- exp(mimp)

wilsontom/FIEmspro documentation built on Feb. 19, 2018, 9:03 a.m.