MissingImputation: Missing Imputation


View source: R/MissingImputation.r

Description

This function performs regression-based imputation of missing values. It accepts a data frame that must contain at least one column with at least one missing value. Columns with no missing values may also be included; they are left as-is and can serve as predictors for the columns that are imputed.

The function checks whether each variable contains only strictly positive values or only percentage values, and constrains the imputed values to follow the same pattern. To override this behavior for a variable, include at least one observation that breaks the pattern (for example, a zero or negative value in an otherwise strictly positive column).

The sample_frac option builds the predictive models on a fraction of the observations instead of all of them. This is intended to speed up imputation, although not every fraction is faster than simply using all observations. I would recommend a value less than 0.5.
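The positive/percentage check can be pictured with the minimal sketch below. It is not taken from the modeler source: `detect_pattern` and `constrain_imputed` are hypothetical names, percentages are assumed to lie between 0 and 1, and keeping a positive column positive by raising predictions to its smallest observed value is an assumed rule. The last lines show how a single out-of-pattern observation switches the constraint off.

```r
# Hypothetical helpers, not part of modeler: classify the observed values of a
# numeric column and constrain imputed values to the same pattern.
# Assumption: "percentage" means every observed value lies in [0, 1].
detect_pattern <- function(x) {
  obs <- x[!is.na(x)]
  if (all(obs >= 0 & obs <= 1)) return("percentage")
  if (all(obs > 0)) return("positive")
  "none"
}

constrain_imputed <- function(pred, pattern, observed) {
  switch(pattern,
         percentage = pmin(pmax(pred, 0), 1),                  # clip into [0, 1]
         positive   = pmax(pred, min(observed, na.rm = TRUE)), # stay above zero
         pred)                                                 # "none": unchanged
}

x <- c(2.5, 3.1, NA, 4.0)
detect_pattern(x)                               # "positive"
detect_pattern(c(x, -1))                        # "none": one negative value breaks the rule
constrain_imputed(c(-0.7, 3.3), "positive", x)  # 2.5 3.3
```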

Usage

MissingImputation(missing_df, num_iter = 10, progress = F,
  sample_frac = 1)

Arguments

missing_df

A data frame containing at least some columns with missing values

num_iter

Number of iterations to perform

progress

A logical indicator; if TRUE, the number of completed iterations is printed

sample_frac

A number between 0 and 1 indicating the fraction of observations to use to build the predictive models. Default is 1 which will use all observations.
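As a rough illustration of what sample_frac controls, the sketch below fits the model for one column on a random fraction of the rows where that column is observed, then predicts only the missing cells. It is a hypothetical base-R sketch, not modeler's code: `impute_column_sampled` is an invented name, lm() stands in for whatever model the package fits, and all other columns are assumed to be numeric and already complete.

```r
# Hypothetical sketch, not modeler's code: impute one numeric column with lm(),
# fitting the model on a random fraction of the rows where that column is observed.
# Assumes all other columns are numeric and already complete.
impute_column_sampled <- function(df, target, sample_frac = 1) {
  stopifnot(sample_frac > 0, sample_frac <= 1)
  is_missing <- is.na(df[[target]])
  obs_rows <- which(!is_missing)
  # Number of observed rows used to fit the model (at least one)
  n_fit <- max(1L, floor(sample_frac * length(obs_rows)))
  fit_rows <- obs_rows[sample.int(length(obs_rows), n_fit)]
  model <- lm(reformulate(setdiff(names(df), target), response = target),
              data = df[fit_rows, , drop = FALSE])
  # Predict only the missing cells; observed values are left untouched
  df[[target]][is_missing] <- predict(model, newdata = df[is_missing, , drop = FALSE])
  df
}

set.seed(42)
df <- iris[, 1:4]                                # numeric columns only
df$Sepal.Length[sample(nrow(df), 40)] <- NA
out <- impute_column_sampled(df, "Sepal.Length", sample_frac = 0.4)
```

With 110 observed rows and sample_frac = 0.4, the model here is fitted on 44 rows.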

Value

A list containing:

complete_obs: A data frame with the missing values replaced through regression imputation

change: The differences between successive iterations for each variable
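To make the role of num_iter and change concrete, here is a minimal base-R sketch of iterative regression imputation. It is not the modeler implementation: `iterative_impute` is a hypothetical name, it handles numeric columns only, missing cells start at the column mean, and storing change as a matrix of mean absolute movements (one row per iteration, one column per imputed variable) is an assumption about a reasonable layout, not the package's documented structure.

```r
# Hypothetical sketch, not modeler's code: iterative regression imputation for an
# all-numeric data frame. Missing cells start at the column mean; each iteration
# re-predicts them from the other columns with lm(). `change` records the mean
# absolute movement of the imputed values per iteration (rows) and variable (columns).
iterative_impute <- function(df, num_iter = 10) {
  miss <- lapply(df, is.na)                       # remember which cells were missing
  targets <- names(df)[vapply(miss, any, logical(1))]
  for (v in targets) df[[v]][miss[[v]]] <- mean(df[[v]], na.rm = TRUE)
  change <- matrix(NA_real_, nrow = num_iter, ncol = length(targets),
                   dimnames = list(NULL, targets))
  for (i in seq_len(num_iter)) {
    for (v in targets) {
      # Fit on rows where v was originally observed, using current values of the rest
      fit <- lm(reformulate(setdiff(names(df), v), response = v),
                data = df[!miss[[v]], , drop = FALSE])
      pred <- predict(fit, newdata = df[miss[[v]], , drop = FALSE])
      change[i, v] <- mean(abs(pred - df[[v]][miss[[v]]]))
      df[[v]][miss[[v]]] <- pred
    }
  }
  list(complete_obs = df, change = change)
}

set.seed(1)
df <- iris[, 1:4]
df$Sepal.Length[sample(150, 30)] <- NA
df$Petal.Width[sample(150, 20)] <- NA
res <- iterative_impute(df, num_iter = 5)
res$change   # values shrinking toward zero suggest the imputations have settled
```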

Examples

iris_df <- iris
set.seed(123456)
# Flag roughly 25% of Species values and 33% of Sepal.Length values as missing
na1 <- runif(nrow(iris_df)) < .25
na2 <- runif(nrow(iris_df)) < .33
iris_df[["Species"]][na1] <- NA
iris_df[["Sepal.Length"]][na2] <- NA
iris_complete <- MissingImputation(iris_df, num_iter = 5)
# Compare original and imputed values, using each variable's own missingness mask
species_difs <- data.frame(Species.Orig = as.character(iris[["Species"]][na1]),
                           Species.Replace = as.character(iris_complete$complete_obs[["Species"]][na1]),
                           stringsAsFactors = FALSE)
sepal_difs <- data.frame(Sepal.Orig = iris[["Sepal.Length"]][na2],
                         Sepal.Replace = iris_complete$complete_obs[["Sepal.Length"]][na2])
## Not run: View(species_difs)
## Not run: ggplot2::ggplot(sepal_difs, ggplot2::aes(x = Sepal.Orig, y = Sepal.Replace)) + ggplot2::geom_point()

mattmills49/modeler documentation built on May 21, 2019, 1:25 p.m.