This stage of the analysis uses the mice package. The data frame or matrix passed to mice() must have at least two columns.

library(RepDataPeerAssessment1)
library(lattice)
library(mice)

df  <- activity[, c("steps", "interval")]
# length(steps)
res.mice <- mice(df, method = "pmm")
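
As a rough illustration of what method = "pmm" (predictive mean matching) does, the base R sketch below fills each missing value with the observed value of a "donor" whose predicted mean is close. It is a simplified version under stated assumptions, not the algorithm mice actually runs (mice also draws the regression parameters at random). The names pmm_sketch and k and the toy data are hypothetical.

```r
# Simplified predictive mean matching for a single variable (illustration only).
# pmm_sketch, k and the toy data are hypothetical; they are not part of mice.
pmm_sketch <- function(y, x, k = 5) {
  dat  <- data.frame(y = y, x = x)
  obs  <- !is.na(y)
  fit  <- lm(y ~ x, data = dat[obs, ])   # regression on the observed cases
  pred <- predict(fit, newdata = dat)    # predicted mean for every case
  y.obs <- y[obs]
  out <- y
  for (i in which(!obs)) {
    # the k observed cases whose predictions are closest to case i
    donors <- order(abs(pred[obs] - pred[i]))[seq_len(min(k, sum(obs)))]
    # borrow the observed value of one randomly chosen donor
    out[i] <- y.obs[donors[sample.int(length(donors), 1)]]
  }
  out
}

set.seed(1)
x <- 1:50
y <- 2 * x + rnorm(50)
missing.rows <- c(5, 20, 35)
y[missing.rows] <- NA
filled <- pmm_sketch(y, x)
filled[missing.rows]
```

Because every imputed value is copied from an observed case, the filled-in values always stay within the range of the observed data.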

The imputed values for steps are stored in a data frame with one row per missing observation and one column per imputation. mice generates five imputations by default.

res.mice$imp$steps
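
The layout of the imp component can be checked on a small example. The sketch below uses a hypothetical toy data frame (toy, toy.mice and imp.steps are illustrative names) rather than the activity data, and sets m = 5 explicitly although that is the default.

```r
library(mice)

# Hypothetical toy data: 20 rows, 4 missing values in steps
set.seed(42)
toy <- data.frame(steps = rpois(20, 30), interval = seq(0, 95, by = 5))
toy$steps[c(3, 8, 12, 17)] <- NA

toy.mice <- mice(toy, method = "pmm", m = 5, seed = 123, printFlag = FALSE)

imp.steps <- toy.mice$imp$steps
dim(imp.steps)        # rows = missing values in steps, columns = imputations
rownames(imp.steps)   # row names of the records that were missing in toy
```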

To get a full dataset with both observed and imputed values, use the mice function complete(). Its second argument selects which of the five completed data frames to return; 1 returns the first:

mice.imp.1 <- mice::complete(res.mice, 1)
head(mice.imp.1)
summary(mice.imp.1$steps)

For the second completed data frame, change the index to 2.

mice.imp.2 <- mice::complete(res.mice, 2)
head(mice.imp.2)
summary(mice.imp.2$steps)
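
Rather than extracting the completed data frames one index at a time, complete() can also stack them. The sketch below assumes the same hypothetical toy data as a stand-in for the activity data and compares the mean of steps across the five imputations; toy.long and mean.by.imp are illustrative names.

```r
library(mice)

# Hypothetical toy data: 20 rows, 4 missing values in steps
set.seed(42)
toy <- data.frame(steps = rpois(20, 30), interval = seq(0, 95, by = 5))
toy$steps[c(3, 8, 12, 17)] <- NA
toy.mice <- mice(toy, method = "pmm", m = 5, seed = 123, printFlag = FALSE)

# Stack the five completed data sets; the .imp column identifies the imputation
toy.long <- mice::complete(toy.mice, action = "long")
mean.by.imp <- tapply(toy.long$steps, toy.long$.imp, mean)
mean.by.imp
```

Large differences between the five means would suggest that the results are sensitive to which imputation is chosen.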

Plots of the five imputations. In both plots, imputation number 0 is the original data, with no imputed values.

stripplot(res.mice, pch = 20, cex = 1.2)
xyplot(res.mice, steps ~ interval | .imp, pch = 1, cex = 0.5, alpha = 0.5)

3. Create a dataset with missing values filled in

Create a new dataset that is equal to the original dataset but with the missing data filled in.

dim(mice.imp.1)
summary(mice.imp.1)
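
One possible way to confirm that a filled-in dataset really is "equal to the original but with the missing data filled in" is sketched below in base R. check_filled is a hypothetical helper, not part of mice, and the four-row data frame is a stand-in for the real data.

```r
# check_filled is a hypothetical helper, not part of mice
check_filled <- function(original, filled) {
  orig.m <- as.matrix(original)
  fill.m <- as.matrix(filled)
  was.observed <- !is.na(orig.m)
  c(same.dim      = identical(dim(orig.m), dim(fill.m)),
    no.missing    = !anyNA(fill.m),
    observed.kept = all(orig.m[was.observed] == fill.m[was.observed]))
}

# Stand-in data: two missing values, filled with made-up numbers
original <- data.frame(steps = c(NA, 4, 7, NA), interval = c(0, 5, 10, 15))
filled <- original
filled$steps[is.na(filled$steps)] <- c(4, 7)
result <- check_filled(original, filled)
result
```

With the objects from this analysis, the equivalent call would be check_filled(df, mice.imp.1).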

Building a custom function for imputeTestbench

# A sample function to randomly impute the missing data
library(imputeTS)

sss <- function(In){
  out <- na.random(In)
  out <- as.numeric(out)
  return(out)
}
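# Impute the input with predictive mean matching (mice)
pmm <- function(In) {
  library(mice)
  library(RepDataPeerAssessment1)

  # mice() needs at least two columns, so pair the input with interval;
  # this assumes In has one value per row of activity
  df  <- data.frame(steps = In, interval = activity$interval)
  res.mice <- mice(df, method = "pmm")
  mice.imp.1 <- mice::complete(res.mice, 1)
  out <- as.numeric(mice.imp.1$steps)
  return(out)
}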
pmm(activity$steps)
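library(imputeTestbench)

# aus is not defined in this section; it is assumed to be a complete numeric series
ex <- impute_errors(dataIn = aus, methodPath = '../../R/imputation.R',
                    methods = c('na.mean', 'na.locf', 'na.approx', 'sss'))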

ex
source("../../R/imputation.R")
run.mice.pmm()
library(imputeTestbench)

imp <- impute_errors(dataIn = activity$steps, 
                     methodPath = '../../R/imputation.R',
                     methods = c('na.mean', 'pmm'))

imp
plot_errors(imp)
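
The idea behind this kind of comparison can be sketched in base R: remove some values from a complete series, impute them, and measure how far the imputed values are from the true ones. The code below is a minimal illustration under that assumption, not the imputeTestbench implementation. random_fill, mean_fill and rmse_after_removal are hypothetical names, and the built-in austres series stands in for real data. Like sss above, each imputation function takes a vector with NA values and returns a numeric vector of the same length.

```r
# Hypothetical helpers for illustration; not the imputeTestbench implementation

# Fill each gap with a randomly sampled observed value
random_fill <- function(In) {
  miss <- is.na(In)
  In[miss] <- sample(In[!miss], sum(miss), replace = TRUE)
  as.numeric(In)
}

# Fill each gap with the mean of the observed values
mean_fill <- function(In) {
  In[is.na(In)] <- mean(In, na.rm = TRUE)
  as.numeric(In)
}

# Remove a fraction of the values at random, impute, and return the RMSE
rmse_after_removal <- function(x, impute, frac = 0.1) {
  holes <- sample(length(x), round(frac * length(x)))
  x.miss <- x
  x.miss[holes] <- NA
  x.hat <- impute(x.miss)
  sqrt(mean((x[holes] - x.hat[holes])^2))
}

set.seed(7)
x <- as.numeric(datasets::austres)   # complete built-in series
errors <- c(mean   = rmse_after_removal(x, mean_fill),
            random = rmse_after_removal(x, random_fill))
errors
```

A single run like this is noisy; repeating the removal step several times and averaging the errors gives a steadier comparison.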


AlfonsoRReyes/RepDataPeerAssessment1 documentation built on May 5, 2019, 4:53 a.m.