na.cleaner: Missing Value Imputation

In analytics: Regression Outlier Detection, Stationary Bootstrap, Testing Weak Stationarity, NA Imputation, and Other Tools for Data Analysis

Description

Imputes missing values using one of several methods. Handles both continuous and categorical variables.

Usage

na.cleaner(dataset, t1 = 0.5, t2 = 0.5, auto = TRUE, maxDel1 = 0.2, maxDel2 = 0.3, Mode = "mean", neigh = 3:7)

Arguments

`dataset`	a matrix or data frame. May contain continuous and/or categorical variables.
`t1`	the threshold, in the interval [0, 1], beyond which a record is deemed to have a high proportion of NAs. Default: 0.5.
`t2`	the threshold, in the interval [0, 1], beyond which a variable is deemed to have a high proportion of NAs. Default: 0.5.
`auto`	if TRUE (the default), records and/or variables deemed to have a high proportion of NAs are eliminated automatically. If FALSE, the user chooses which records/variables to delete.
`maxDel1`	the maximum proportion of records, in the interval [0, 1], that may be deleted. Default: 0.2.
`maxDel2`	the maximum proportion of variables, in the interval [0, 1], that may be deleted. Default: 0.3.
`Mode`	a string specifying the imputation method to use: one of "mean" (default), "median", "mean&lm", "median&lm", "knn" (written "kNN" in the Examples), or "all" (see Details).
`neigh`	the numbers of neighbours to use in knn, for both continuous and categorical variables. Default: 3:7. knn is run once for each value in neigh. For continuous variables, the outcomes of those runs are averaged. For categorical variables, the imputed value is the most common imputed value across runs.
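The two ideas described in the argument list, flagging records and variables by their share of NAs and combining knn results across several values of neigh, can be sketched in base R. This is only an illustration under stated assumptions, not the code used inside na.cleaner: the toy data, all variable names, and the per-run imputed values are hypothetical, and "beyond the threshold" is read here as strictly greater than.

```r
# Illustrative sketch only; not the implementation inside na.cleaner().
# All names and values below are hypothetical.
toy_data <- data.frame(
  a = c(1, NA, NA, NA),
  b = c(NA, NA, 6, 8),
  c = c(9, NA, 11, 12)
)

t1 <- 0.5  # threshold for records (rows)
t2 <- 0.5  # threshold for variables (columns)

# Proportion of NAs in each record and in each variable
na_share_rows <- rowMeans(is.na(toy_data))
na_share_cols <- colMeans(is.na(toy_data))

# Assumption: "beyond" the threshold means strictly greater than it
flagged_rows <- which(na_share_rows > t1)
flagged_cols <- which(na_share_cols > t2)

# Combining runs across neigh = 3:7 for a single missing cell.
# Continuous variable: average the values imputed by each run.
runs_continuous <- c(20.1, 19.8, 20.4, 20.0, 19.7)
imputed_continuous <- mean(runs_continuous)

# Categorical variable: take the most common imputed value.
# which.max() returns the first maximum, so ties go to the level
# that sorts first alphabetically.
runs_categorical <- c("setosa", "virginica", "setosa", "setosa", "virginica")
vote_counts <- table(runs_categorical)
imputed_categorical <- names(vote_counts)[which.max(vote_counts)]
```

In this toy example only the second record (all three values missing) and variable `a` (three of four values missing) exceed the 0.5 thresholds. Variable `b`, with exactly half of its values missing, is not flagged under the strict reading.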

Details

Any of the available methods may be the best choice for a particular dataset, but because it is generally not possible to know in advance which one that is, Mode "all" may be a good, robust choice. For categorical variables the only method implemented is knn, so the Mode argument effectively applies only to continuous variables.
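To illustrate why Mode matters only for continuous variables, here is a minimal base R sketch that fills numeric columns with a summary statistic and leaves categorical columns untouched. It assumes that the "mean" and "median" modes replace each NA with the column mean or median, which this page does not spell out. `impute_numeric` and the toy data are hypothetical and are not part of the analytics package.

```r
# Illustrative sketch only; impute_numeric() is a hypothetical helper,
# not a function from the analytics package.
impute_numeric <- function(df, fun = mean) {
  for (col in names(df)) {
    # Only continuous (numeric) columns are filled; factors are skipped
    if (is.numeric(df[[col]])) {
      missing <- is.na(df[[col]])
      df[[col]][missing] <- fun(df[[col]], na.rm = TRUE)
    }
  }
  df
}

toy <- data.frame(
  x = c(1, NA, 3, 10),
  g = factor(c("a", NA, "b", "a"))
)

mean_filled <- impute_numeric(toy, mean)      # NA in x becomes mean(1, 3, 10)
median_filled <- impute_numeric(toy, median)  # NA in x becomes median(1, 3, 10)
```

The two results differ only in column `x`. The missing value in the factor `g` remains NA in both, since a categorical variable would need a different method such as knn.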

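Value

The original dataset with its missing values imputed. Records or variables eliminated for having a high proportion of NAs (see `auto`) are absent from the result.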

Author(s)

Albert Dorador

See Also

kNN, rowmean

Examples

# Copy mtcars and set roughly 40% of the entries in each column to NA
mtcars_mod <- mtcars
set.seed(1)
mtcars_mod <- as.data.frame(lapply(mtcars_mod, function(cc) {
  cc[sample(c(TRUE, NA), prob = c(0.6, 0.4), size = length(cc), replace = TRUE)]
}))
rownames(mtcars_mod) <- rownames(mtcars)

# Compare methods
kNN_dt <- na.cleaner(dataset = mtcars_mod, Mode = "kNN")
mean_lm_dt <- na.cleaner(dataset = mtcars_mod, Mode = "mean&lm")
median_dt <- na.cleaner(dataset = mtcars_mod, Mode = "median")
all_dt <- na.cleaner(dataset = mtcars_mod, Mode = "all")

# Reference values from the original data. The hard-coded indices apparently
# correspond to the records and the variable that na.cleaner removes for this
# seed, so that the dimensions match the imputed output.
mtcars_ref <- as.matrix(mtcars[-c(4, 6, 8, 13, 18, 20), -6])

# Deviation of each imputed dataset from the original values
# (norm() defaults to the one norm)
dev_kNN <- norm(mtcars_ref - as.matrix(kNN_dt))
dev_m_ml <- norm(mtcars_ref - as.matrix(mean_lm_dt))
dev_md <- norm(mtcars_ref - as.matrix(median_dt))
dev_all <- norm(mtcars_ref - as.matrix(all_dt))

# Same procedure on iris, which also has a categorical variable (Species)
iris_mod <- iris
set.seed(5)
iris_mod <- as.data.frame(lapply(iris_mod, function(cc) {
  cc[sample(c(TRUE, NA), prob = c(0.6, 0.4), size = length(cc), replace = TRUE)]
}))
rownames(iris_mod) <- rownames(iris)
na.cleaner(dataset = iris_mod, neigh = 1, Mode = "all")

analytics documentation built on May 2, 2019, 3:37 p.m.
