Home

/

CRAN

/

knnGarden

/

dataFiller: Missing Observations Filling Function

dataFiller: Missing Observations Filling Function
In knnGarden: Multi-distance based k-Nearest Neighbors

Description Usage Arguments Details Value Note Author(s) References See Also Examples

View source: R/dataFiller.R

fill in the missing observations in a dataset by exploring similarities between cases

1	dataFiller(data, NAstring = NA)

`data`	a dataset that contains missing observations in some cases
`NAstring`	a character or string that denotes missing values in the input dataset

fill the cases with missing observations by finding the median of 10 most similar cases with the current one. Of course, the missing in the same column of the 10 cases will be removed when calculating the median. The criterion we define "similar" is based on euclidian distance between standardized cases

A complete data set with missing observations filled will be returned.

The cases with missing values in the input dataset will be printed on the screen instead of being returned. The return will be only the complete data set with missing observations filled.

Boxian Wei(The ideas are inspired by Luis Torgo, and thanks)

Luis Torgo (2003) Data Mining with R:learning by case studies. LIACC-FEP, University of Porto

knnMCN, knnVCN

##---- Should be DIRECTLY executable !! ----
##-- ==>  Define data, use random,
##--	or do  help(data=index)  for the standard data sets.

## Define Data 
library(knnGarden)
data(iris)
v1=c(iris[1:4,3],NA,iris[6:10,3])
v2=iris[101:110,4]
v3=iris[101:110,1]
v4=c(iris[11:18,3],NA,iris[20,3])
data1=data.frame(v1,v2,v3,v4)

## Call Function
data2=dataFiller(data1)

## The function is currently defined as
function (data, NAstring = NA) 
{
    central.value <- function(x) {
        if (is.numeric(x)) 
            median(x, na.rm = T)
        else if (is.factor(x)) 
            levels(x)[which.max(table(x))]
        else {
            f <- as.factor(x)
            levels(f)[which.max(table(f))]
        }
    }
    dist.mtx <- as.matrix(daisy(data, stand = T))
    ShowMissing = NULL
    ShowMissing = data[which(!complete.cases(data)), ]
    for (r in which(!complete.cases(data))) data[r, which(is.na(data[r, 
        ]))] <- apply(data.frame(data[c(as.integer(names(sort(dist.mtx[r, 
        ])[2:11]))), which(is.na(data[r, ]))]), 2, central.value)
    cat("the missing case(s) in the orignal dataset ", "\n\n")
    print(ShowMissing)
    cat("\n\n")
    return(data)
  }

knnGarden documentation built on May 2, 2019, 11:02 a.m.

knnGarden index

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

knnGarden
Multi-distance based k-Nearest Neighbors

dataFiller: Missing Observations Filling Function
In knnGarden: Multi-distance based k-Nearest Neighbors

Description

Usage

Arguments

Details

Value

Note

Author(s)

References

See Also

Examples

Related to dataFiller in knnGarden...

R Package Documentation

Browse R Packages

We want your feedback!

knnGarden Multi-distance based k-Nearest Neighbors

dataFiller: Missing Observations Filling Function In knnGarden: Multi-distance based k-Nearest Neighbors

Description

Usage

Arguments

Details

Value

Note

Author(s)

References

See Also

Examples

Related to dataFiller in knnGarden...

R Package Documentation

Browse R Packages

We want your feedback!

knnGarden
Multi-distance based k-Nearest Neighbors

dataFiller: Missing Observations Filling Function
In knnGarden: Multi-distance based k-Nearest Neighbors