Description
Wrapper function to select an optimal number of neighbours (k) for impute.knn from the impute package. For each candidate value of k, impute.knn predicts randomly selected data points, and the predictions are compared with the original values to calculate the root mean squared error (RMSE). In the original matrix x, thres is the limit below which intensities are considered missing. perc is the fraction of "non-missing" intensities (those above thres) randomly selected to estimate the RMSE, for example 0.1 for 10%. The optimal number, koptim, is the number of neighbours k at which the improvement in RMSE falls below 10%. This value is then used automatically to compute the imputed matrix returned as x.
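The evaluation step described above can be sketched in base R. This is a minimal illustration, not the code of koptimp: the helper mask_rmse() and its impute_fun argument are hypothetical names. It assumes, following the Examples, that values at or below thres count as missing, and that impute_fun returns a matrix of the same shape as its input. With impute.knn, for instance, impute_fun could be function(m) t(impute.knn(t(m), k = 5)$data), mirroring the transposition used in the Examples.

```r
# Hypothetical helper (not part of the package): estimate the RMSE of an
# imputation method by hiding a random fraction of the observed values.
mask_rmse <- function(x, thres, perc, impute_fun) {
  x <- as.matrix(x)
  # values at or below thres are treated as missing
  x[which(x <= thres)] <- NA
  observed <- which(!is.na(x))
  if (length(observed) == 0) stop("no values above thres to evaluate")
  # randomly select a fraction perc of the observed ("non-missing") values
  n_test <- max(1, round(perc * length(observed)))
  test_idx <- observed[sample.int(length(observed), n_test)]
  truth <- x[test_idx]
  # hide the selected values, impute, and compare with the hidden originals
  x_masked <- x
  x_masked[test_idx] <- NA
  x_imp <- impute_fun(x_masked)
  sqrt(mean((x_imp[test_idx] - truth)^2))
}

# Usage with a simple column-mean imputer (illustration only)
col_mean_impute <- function(m) {
  for (j in seq_len(ncol(m))) m[is.na(m[, j]), j] <- mean(m[, j], na.rm = TRUE)
  m
}
set.seed(1)
m <- matrix(rexp(200, rate = 0.2), nrow = 20)
mask_rmse(m, thres = 1, perc = 0.1, impute_fun = col_mean_impute)
```

Running this once per candidate k and repeating it niter times would give a matrix of RMSE values comparable in spirit to the rmse component returned by the function.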

Arguments

| Argument | Description |
|---|---|
| x | A data frame or matrix to be imputed. |
| thres | Threshold below which intensities in x are considered missing. |
| log.t | A logical specifying whether the data are log-transformed before imputation. |
| lk | A vector of candidate numbers of neighbours (k) to test. |
| perc | Fraction of non-missing intensities (above thres) randomly selected to estimate the RMSE; for example, perc = 0.1 selects 10%. |
| niter | Number of iterations. |
| ... | Arguments passed to or from other methods. |
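
Value

A list with the following components:

| Component | Description |
|---|---|
| x | The data matrix imputed using koptim neighbours. |
| koptim | Optimal number of neighbours found in lk. |
| rmse | Root mean squared error matrix, with one column per value in lk. |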
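The 10% rule can be read in more than one way. The hypothetical helper choose_k() below implements one plausible reading and is not the package's own code: it averages each column of the RMSE matrix (assuming, as the boxplot in the Examples suggests, one column per value of lk, with lk in increasing order) and returns the first k for which moving to the next candidate improves the mean RMSE by less than 10%.

```r
# Hypothetical helper illustrating one reading of the 10% rule; the exact
# criterion used by koptimp may differ.
choose_k <- function(rmse, lk, tol = 0.1) {
  rmse <- as.matrix(rmse)
  stopifnot(ncol(rmse) == length(lk))   # assumes one column per tested k
  avg <- colMeans(rmse)                 # mean RMSE for each tested k
  if (length(avg) < 2) return(lk[1])
  # relative improvement gained by moving from lk[i] to lk[i + 1]
  gain <- (avg[-length(avg)] - avg[-1]) / avg[-length(avg)]
  small <- which(gain < tol)
  # if every step still improves by at least tol, keep the largest k tested
  if (length(small) == 0) lk[length(lk)] else lk[small[1]]
}
```

With the result from the Examples, choose_k(res$rmse, lk = 3:6) can be compared with res$koptim to see whether this reading matches the rule actually applied.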

Note

The installed version of the impute package must be 1.8.0 or later. At the time this package was written, only the version available on the Bioconductor website appeared to be regularly updated.
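The version requirement can be checked from an R session. This is a small sketch using base R's requireNamespace() and utils::packageVersion(); the helper name has_min_version() is hypothetical.

```r
# Hypothetical helper: TRUE if pkg is installed and at least min_version,
# FALSE (rather than an error) if it is missing or too old.
has_min_version <- function(pkg, min_version) {
  if (!requireNamespace(pkg, quietly = TRUE)) return(FALSE)
  utils::packageVersion(pkg) >= min_version
}

# impute is distributed through Bioconductor; if this returns FALSE,
# BiocManager::install("impute") installs the current release.
has_min_version("impute", "1.8.0")
```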

Author(s)

David Enot dle@aber.ac.uk

References

Hastie, T., Tibshirani, R., Sherlock, G., Eisen, M., Brown, P. and Botstein, D. (1999). Imputing Missing Data for Gene Expression Arrays. Technical report, Statistics Department, Stanford University. http://www-stat.stanford.edu/~hastie/Papers/missing.pdf

Troyanskaya, O., Cantor, M., Sherlock, G., Brown, P., Hastie, T., Tibshirani, R., Botstein, D. and Altman, R. B. (2001). Missing value estimation methods for DNA microarrays. Bioinformatics, 17(6), 520-525.

Examples

```r
## load data
data(abr1)
mat <- abr1$pos[, 110:300]
## find an optimal number of k between 3 and 6 to impute values at or below 1
## 10% of the intensities > 1 are used to evaluate each solution
## imputation is done on the log-transformed matrix
res <- koptimp(mat, thres = 1, log.t = TRUE, lk = 3:6, perc = 0.1, niter = 5)
names(res)
## check the RMSE of the solutions at various k
boxplot(res$rmse, xlab = "Number of neighbours", ylab = "Root mean squared error")
## do the imputation by hand with a given k
## (same settings as above: thres = 1 and log.t = TRUE)
library(impute)
mat[mat <= 1] <- NA
mat <- log(mat)
## use k = 6, for example; impute.knn expects variables in rows, hence the transpositions
mimp <- t(impute.knn(t(mat), k = 6, rowmax = 1, colmax = 1, maxp = ncol(mat))$data)
## transform back to the original scale
mimp <- exp(mimp)
```