wNNSel: Imputation using the wNNSel method.

Description

'wNNSel' imputes missing values, particularly in high-dimensional data. It uses a cross validation procedure to select the best values of the tuning parameters. It also works when the number of samples is smaller than the number of covariates.

Usage

wNNSel(x, x.initial = NULL, x.true = NULL, k, useAll = TRUE,
  x.dist = "euclidean", kernel = "gaussian", method = "2", impute.fn,
  convex = TRUE, m.values = seq(2, 8, by = 2),
  c.values = seq(0.1, 0.5, by = 0.1),
  lambda.values = seq(0, 0.6, by = 0.01)[-1], times.max = 5,
  testNA.prop = 0.05, withinFolds = FALSE, folds, verbose = TRUE)

Arguments

x

a numeric data matrix containing missing values

x.initial

optional. A complete data matrix, e.g. obtained by mean imputation of x. If provided, it is used for the computation of correlations.

x.true

a matrix of true or complete data. If provided, MSIE will be returned in the results list.

k

optional. The number of nearest neighbors to use for imputation.

useAll

logical. If TRUE, all available neighbors are used for the imputation.

x.dist

distance to compute. The default is x.dist="euclidean", which uses the Euclidean distance. Set x.dist to NULL for the Manhattan distance.

kernel

kernel function to be used in nearest neighbors imputation. Default kernel function is "gaussian".

method

convex function that performs the selection of variables. If method="1", the linear function is used; if method="2", the power function is used.

impute.fn

the imputation function to run on the length-k vector of values for a missing feature. If not specified, wNN imputation is used: a weighted mean of the neighboring values, weighted by the specified kernel.

convex

logical. If TRUE, selected variables are used for the computation of distance. The default is TRUE.

m.values

a vector of integer values, required when method="2".

c.values

a vector of values between 0 and 1 (strictly less than 1). Required when method="1".

lambda.values

a vector of candidate values for the tuning parameter λ.

times.max

maximum number of repetitions for the cross validation procedure.

testNA.prop

proportion of values to be deleted artificially from the incomplete matrix x for cross validation. The default is 0.05 (5 percent).

withinFolds

logical. Use only if the neighbors/rows belong to particular folds/groups. Default is set to FALSE.

folds

a list of vectors specifying folds/groups for neighbors. The length of the list is equal to the number of folds/groups. Each element (vector) of the list holds the row indices belonging to that particular fold/group.

verbose

logical. If TRUE, prints status updates
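
The structure described for folds (a list with one vector of row indices per group) can be built in base R. The sketch below is only an illustration: the grouping vector `group` is hypothetical, and whether wNNSel expects a named or unnamed list is not stated above, so the names are dropped here as an assumption.

```r
# Hypothetical grouping of 10 rows into 3 groups (illustrative values only).
group <- c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3)

# split() collects the row indices of each group into one list element.
# unname() drops the group labels so the result is a plain list of vectors.
folds <- unname(split(seq_along(group), group))

length(folds)  # number of folds/groups
folds[[2]]     # row indices belonging to the second group
```

A list built this way would be passed as the folds argument together with withinFolds = TRUE.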

Details

For each sample, identify the missing features. For each missing feature, find the nearest neighbors in which that feature is observed. Impute the missing value by applying the imputation function to the vector of values taken from those neighbors. By default, wNNSel automatically searches for optimal values of the tuning parameters for a given data matrix.
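
The steps above can be sketched in base R. This is a simplified illustration, not the package's implementation: it uses all available neighbors, skips the selection of covariates, and assumes a Gaussian kernel of the form exp(-0.5 (d / lambda)^2) with lambda acting as a bandwidth. The function name `wnn_impute_sketch` is hypothetical.

```r
# Simplified weighted nearest neighbor imputation (illustrative sketch only).
wnn_impute_sketch <- function(x, lambda = 0.5) {
  x_imp <- x
  n <- nrow(x)
  for (i in seq_len(n)) {
    miss <- which(is.na(x[i, ]))          # missing features of sample i
    if (length(miss) == 0) next
    # Distance from row i to every row, using columns observed in both rows.
    d <- vapply(seq_len(n), function(j) {
      both <- !is.na(x[i, ]) & !is.na(x[j, ])
      if (!any(both)) return(NA_real_)
      sqrt(mean((x[i, both] - x[j, both])^2))
    }, numeric(1))
    d[i] <- NA                            # a row is not its own neighbor
    for (s in miss) {
      # Neighbors that have feature s observed and a computable distance.
      donors <- which(!is.na(x[, s]) & !is.na(d))
      if (length(donors) == 0) next       # nothing to impute from
      w <- exp(-0.5 * (d[donors] / lambda)^2)        # Gaussian kernel weights
      if (sum(w) == 0) w <- rep(1, length(donors))   # guard against underflow
      x_imp[i, s] <- sum(w * x[donors, s]) / sum(w)  # weighted mean
    }
  }
  x_imp
}

set.seed(1)
x <- matrix(rnorm(40), 8, 5)
x[cbind(c(1, 3, 6), c(2, 4, 1))] <- NA    # three artificial missing entries
x_imp <- wnn_impute_sketch(x, lambda = 0.5)
anyNA(x_imp)
```

Smaller values of lambda concentrate the weight on the closest neighbors, while larger values move the imputed value toward the plain mean of the available neighbors.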

The default method uses x.dist="euclidean" and includes only selected covariates, so the distances are computed from the important covariates only. If method="1", the linear function of the absolute value of r is used, defined by

\frac{|r|}{1-c} - \frac{c}{1-c},

for |r| > c, and 0 otherwise. The power function |r|^m is used when method="2", which is the default. For a more detailed discussion, see the references.
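
Both selection functions are easy to evaluate directly. The sketch below is a minimal illustration in base R; the function names `linear_weight` and `power_weight` are hypothetical and not part of the package, and the argument `cutoff` plays the role of c in the formula above.

```r
# Linear function: |r| / (1 - c) - c / (1 - c) for |r| > c, and 0 otherwise.
# Algebraically this equals (|r| - c) / (1 - c).
linear_weight <- function(r, cutoff) {
  stopifnot(cutoff >= 0, cutoff < 1)
  a <- abs(r)
  ifelse(a > cutoff, (a - cutoff) / (1 - cutoff), 0)
}

# Power function: |r|^m.
power_weight <- function(r, m) {
  abs(r)^m
}

r <- c(-0.9, -0.3, 0, 0.2, 0.6, 1)
linear_weight(r, cutoff = 0.3)  # exactly 0 whenever |r| <= 0.3
power_weight(r, m = 4)          # small |r| shrinks quickly toward 0
```

The linear function sets values with |r| at or below c exactly to zero, whereas the power function only shrinks them, more strongly for larger m.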

Value

a list containing the imputed data matrix and the cross validation results

x.impute

imputed data matrix

MSIE

True error. Note it is only available when x.true is provided.

lambda.opt

optimal value of λ selected by cross validation

m.opt

optimal value of m selected by cross validation

MSIE.cv

cross validation error
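
To make the error measures concrete, the sketch below hides a proportion of the observed entries, imputes them, and computes the mean squared error on the hidden entries. It assumes that MSIE denotes a mean squared imputation error over the evaluated entries. The imputer is a plain column-mean stand-in rather than the wNNSel method, and the helper name `msie` is hypothetical.

```r
# Mean squared error over a chosen set of entries (assumed meaning of MSIE).
msie <- function(x_true, x_imp, idx) {
  mean((x_true[idx] - x_imp[idx])^2)
}

set.seed(42)
x <- matrix(rnorm(200), 20, 10)
x[sample(length(x), 20)] <- NA            # data with genuine missing values

# Hide 5 percent of the observed entries (mirrors testNA.prop = 0.05).
observed <- which(!is.na(x))
test_idx <- sample(observed, ceiling(0.05 * length(observed)))
x_cv <- x
x_cv[test_idx] <- NA

# Stand-in imputer: column means (NOT the wNNSel method).
col_means <- colMeans(x_cv, na.rm = TRUE)
x_imp <- x_cv
na_pos <- which(is.na(x_cv), arr.ind = TRUE)
x_imp[na_pos] <- col_means[na_pos[, "col"]]

# The hidden entries are known in x, so the error can be evaluated there.
cv_error <- msie(x, x_imp, test_idx)
cv_error
```

Repeating this for each candidate combination of tuning parameters and keeping the combination with the smallest error is the general idea behind a cross validation search of this kind.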

References

Tutz, G. and Ramzan, S. (2015). Improved methods for the imputation of missing data by nearest neighbor methods. Computational Statistics and Data Analysis, Vol. 90, pp. 84-99.

Faisal, S. and Tutz, G. (2017). Missing value imputation for gene expression data by tailored nearest neighbors. Statistical Applications in Genetics and Molecular Biology, Vol. 16(2), pp. 95-106.

See Also

cv.wNNSel, wNNSel.impute

Examples

library(wNNSel)
set.seed(3)
x.true = matrix(rnorm(100), 10, 10)
## create 10% missing values in x
x.miss = artifNA(x.true, 0.10)
## impute the missing values
result <- wNNSel(x.miss)
## imputed matrix
result$x.impute
## cross validation result can be accessed using
result$cross.val

Example output

[1] "Cross validation in process..."
[1] "Cross validation complete"
             [,1]       [,2]        [,3]        [,4]       [,5]       [,6]
 [1,] -0.96193342 -0.7447816 -0.57848372  0.90062473  0.7865069  0.7268389
 [2,] -0.29252572 -1.1312186 -0.94230073  0.85177045 -0.3104631 -0.8094409
 [3,] -0.32474639 -0.7163585 -0.20372818  0.72771517  1.6988848  0.2670851
 [4,] -1.15213189  0.2526524 -1.66647484  0.02576859 -0.7945937 -1.7372637
 [5,]  0.19578283  0.1520457 -0.48445511 -0.35212962  0.3484377 -1.4114251
 [6,]  0.03012394 -0.3076564 -0.74107266  0.70551551 -2.2654011 -0.4535512
 [7,]  0.08541773 -0.9530173  1.16061578  0.61171985 -0.1622053 -1.0354913
 [8,]  1.11661021 -0.6482428  1.01206712  0.03825201 -0.2672170  1.3621429
 [9,] -1.21885742  1.2243136 -0.07207847 -0.97928377 -0.4555460  0.9174567
[10,]  1.26736872  0.1998116 -1.13678230  0.79376123 -0.8991663 -0.7851422
            [,7]       [,8]       [,9]       [,10]
 [1,]  0.5735182 -0.0313255  1.7355352 -0.85381845
 [2,]  0.9181962  0.4670973  0.7308506 -0.98999433
 [3,]  0.2562873  1.0241977  0.6886400 -0.65087774
 [4,]  0.4180793  0.1770846  1.2244061  1.05394666
 [5,]  1.1743374  0.2318261  0.7942963 -0.39087803
 [6,] -0.4808464  0.7475925  0.4119010 -0.07058639
 [7,] -0.4188297  1.2170685  0.2191506 -0.46205081
 [8,]  0.9551128  0.5300453 -0.8864638  0.54090827
 [9,]  0.5428348 -0.9880528  0.4397603  0.93163497
[10,]  0.1861974 -0.1568529 -0.8863898 -0.20927435
lambda.opt      m.opt    MSIE.cv 
 0.1800000  4.0000000  0.8685304 

wNNSel documentation built on May 2, 2019, 2:49 p.m.