Imputes missing values in a high-dimensional matrix composed of categorical variables using k Nearest Neighbors.
data |
a numeric matrix consisting of integers between 1 and n.cat,
where n.cat is the maximum number of levels the categorical variables
can take. If Each row of |
mat.na |
a numeric matrix containing missing values. Must have the same number of
columns as |
fac |
a numeric or character vector of length |
fac.na |
a numeric or character vector of length |
nn |
an integer specifying k, i.e. the number of nearest neighbors, used to impute the missing values. |
distance |
character string naming the distance measure used in k Nearest Neighbors.
Must be either |
n.num |
an integer giving the number of rows of |
use.weights |
should weighted k nearest neighbors be used to impute the missing values?
If |
verbose |
should more information about the progress of the imputation be printed? |
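The coding requirement on data (integers between 1 and n.cat) can be checked before calling the function. The following is a minimal base R sketch, not part of the package: the helper name check_categorical_matrix and its error messages are hypothetical and only illustrate the stated constraint.

```r
# Hypothetical helper (not part of the package): checks that a matrix is
# coded as integers between 1 and n.cat, ignoring missing values.
check_categorical_matrix <- function(x, n.cat = max(x, na.rm = TRUE)) {
  if (!is.matrix(x) || !is.numeric(x))
    stop("x must be a numeric matrix")
  obs <- x[!is.na(x)]
  if (any(obs != round(obs)))
    stop("x must contain integers only")
  if (any(obs < 1 | obs > n.cat))
    stop("values must lie between 1 and n.cat")
  invisible(TRUE)
}

# A small toy matrix with three categories and a few missing values.
set.seed(1)
toy <- matrix(sample(3, 60, TRUE), 10)
toy[sample(60, 5)] <- NA
ok <- check_categorical_matrix(toy, n.cat = 3)

# A matrix containing a 0 violates the coding and is rejected.
toy0 <- toy
toy0[1, 1] <- 0
bad <- tryCatch(check_categorical_matrix(toy0, n.cat = 3),
                error = function(e) conditionMessage(e))
```

Running such a check first makes a miscoded matrix (for example, genotypes coded 0, 1, 2 instead of 1, 2, 3) fail early with a readable message.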
If mat.na = NULL, then a matrix of the same size as data in which the missing values have been replaced. If mat.na has been specified, then a matrix of the same size as mat.na in which the missing values have been replaced.
While in knncatimpute all variables/rows are considered when replacing missing values, knncatimputeLarge only considers the rows with no missing values when searching for the k nearest neighbors.
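The idea of searching for neighbors only among the rows without missing values can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the function names smd_to_rows and impute_from_complete are hypothetical, the distance is a plain simple matching distance computed over the positions observed in the incomplete row, and missing entries are filled by an unweighted majority vote with ties broken in favor of the smallest category.

```r
# Simple matching distance between one row x and every row of ref:
# the fraction of positions, among those observed in x, that differ.
smd_to_rows <- function(x, ref) {
  obs <- !is.na(x)
  colMeans(t(ref[, obs, drop = FALSE]) != x[obs])
}

# Hypothetical sketch: impute each incomplete row from its k nearest
# neighbors, where only complete rows are candidate neighbors.
impute_from_complete <- function(mat, k = 3) {
  complete <- rowSums(is.na(mat)) == 0
  ref <- mat[complete, , drop = FALSE]
  out <- mat
  for (i in which(!complete)) {
    d <- smd_to_rows(mat[i, ], ref)
    nb <- ref[order(d)[seq_len(min(k, nrow(ref)))], , drop = FALSE]
    for (j in which(is.na(mat[i, ]))) {
      # Unweighted majority vote among the neighbors (an assumption).
      votes <- table(nb[, j])
      out[i, j] <- as.integer(names(votes)[which.max(votes)])
    }
  }
  out
}

set.seed(42)
toy <- matrix(sample(3, 2000, TRUE), 200)
toy[sample(2000, 20)] <- NA
toy_imp <- impute_from_complete(toy, k = 3)
sum(is.na(toy))
sum(is.na(toy_imp))
```

Restricting the candidates to complete rows means every distance is computed over fully observed reference values, which keeps the neighbor search simple when the matrix is large and only a small share of rows contain missing values.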
Holger Schwender, holger.schwender@udo.edu
Schwender, H. and Ickstadt, K. (2008). Imputing Missing Genotypes with k Nearest Neighbors. Technical Report, SFB 475, Department of Statistics, University of Dortmund. Appears soon.
knncatimpute, gknn, smc, pcc
## Not run:
# Generate a data set consisting of 100 columns and 2000 rows (actually,
# knncatimputeLarge is made for much larger data sets), where the values
# are randomly drawn from the integers 1, 2, and 3.
# Afterwards, remove 20 of the observations randomly.
mat <- matrix(sample(3, 200000, TRUE), 2000)
mat[sample(200000, 20)] <- NA
# Apply knncatimputeLarge to mat to remove the missing values.
mat2 <- knncatimputeLarge(mat)
sum(is.na(mat))
sum(is.na(mat2))
# Now assume that the first 100 rows belong to SNPs from chromosome 1,
# the second 100 rows to SNPs from chromosome 2, and so on.
chromosome <- rep(1:20, each = 100)
# Apply knncatimputeLarge to mat chromosomewise, i.e. only consider
# the SNPs that belong to the same chromosome when replacing missing
# genotypes.
mat4 <- knncatimputeLarge(mat, fac = chromosome)
## End(Not run)