In AlfonsoRReyes/RepDataPeerAssessment1: Reproducible Research Assignment 1. Developed as a package

Source: https://github.com/neerajdhanraj/imputeTestbench/blob/master/Manuscript%20imputeTestbench.pdf

The default dataset if dataIn = NULL is nottem, a time series object of average air temperatures recorded at Nottingham Castle from 1920 to 1930. This dataset is included with the base datasets package.

library(imputeTestbench)
set.seed(123)

a <- impute_errors()
a

The errprof object is a list with seven elements. The ﬁrst element, Parameter, is a character string of the error metric used for comparing imputation methods. The second element, MissingPercent, is a numeric vector of the missing percentages that were evaluated in the input dataset. The remaining ﬁve elements show the average error for each imputation method at each interval of missing data in MissingPercent. The averages at each interval are based on the repetitions speciﬁed in the initial call to impute_errors() where the default is Repetition = 1 . Although the print method for the errprof object returns a list, the object stores the unique error estimates for every imputation method, repetition, and missing data interval. These values are used to estimate the averages in the printed output and to plot the distribution of errors with plot_errors() shown below.

plot_errors(a)

plot_errors(a, plotType = 'line')

plot_impute(showmiss = T)

Demonstration of imputeTestbench with examples

The austres dataset is a ts object of Australian population in thousands, measured quarterly from 1971 to 1994 (Brockwell and Davis, 1996). The plot_errors() function shows that all imputation methods had larger error values with additional missing observations, as expected, and that the na.mean imputation method had the largest error values. Differences between the error values can be understood by viewing a sample of the imputed data with plot_impute(). The example belows shows an example of imputed values using the na.approx, na.locf, and na.mean functions at 10% and 90% missing observations using MCAR sampling.

aus <- datasets::austres
aus
str(aus)
summary(aus)
ex <- impute_errors((dataIn = aus))
ex

plot_errors(ex, plotType = 'line')

plot_impute(aus)

plot_impute(aus, methods = c("na.mean", "na.locf", "na.approx"), missPercent = 90)

Sample

Source: http://www.neerajbokde.com/cran/imputetestbench

datax <- c(1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5)
length(datax)
datax

library(imputeTestbench)

q <- impute_errors(datax)
q

plot_impute(datax, methods = c("na.mean", "na.locf", "na.approx"))

plot_errors(q)

plot_errors(dataIn = q, plotType = "line")

attrib <- attr(q, 'errall')
attrib

Applying on activities dataset

library(RepDataPeerAssessment1)

data("activity")
activity

ok <- complete.cases(activity$steps)
st <- activity$steps[ok]
length(st)

library(imputeTestbench)

itb <- impute_errors(st)
itb

plot_errors(itb)

plot_errors(itb, plotType = 'line')

plot_impute(st, methods = c("na.mean", "na.locf", "na.interp"), missPercent = 10)

err_mape <- impute_errors(st, errorParameter = "mae")
err_mape

plot_errors(dataIn = err_mape, plotType = "line")

attr will give us the values of the five methods (approx, interp, interpolation, locf, mean) for each of the nine NA percentage errors.

attr(itb, 'errall')

Adding an imputation function

As described above, imputation methods supplied by the user can be added to impute_errors(). The example below demonstrates the addition of a random number imputation method to the error proﬁle. An R script ﬁle must be created for adding and saving the function. Additional functions can be added to the script as needed. User-supplied functions for imputation should use time series data with missing values as input and return the time series data with the imputed values as shown below.

The path where the R script is saved is used as an input string to the methodPath argument. The name of the new function is added to the methods argument, including any of the default methods used by impute_errors. Results are shown below.

ex <- impute_errors(dataIn = aus, methodPath = '../../R/imputation.R',
methods = c('na.mean', 'na.locf', 'na.approx', 'sss'))

ex

AlfonsoRReyes/RepDataPeerAssessment1 documentation built on May 5, 2019, 4:53 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

AlfonsoRReyes/RepDataPeerAssessment1
Reproducible Research Assignment 1. Developed as a package

In AlfonsoRReyes/RepDataPeerAssessment1: Reproducible Research Assignment 1. Developed as a package

Demonstration of imputeTestbench with examples

Sample

Applying on activities dataset

Adding an imputation function

R Package Documentation

Browse R Packages

We want your feedback!

AlfonsoRReyes/RepDataPeerAssessment1 Reproducible Research Assignment 1. Developed as a package

In AlfonsoRReyes/RepDataPeerAssessment1: Reproducible Research Assignment 1. Developed as a package

Demonstration of imputeTestbench with examples

Sample

Applying on activities dataset

Adding an imputation function

R Package Documentation

Browse R Packages

We want your feedback!

AlfonsoRReyes/RepDataPeerAssessment1
Reproducible Research Assignment 1. Developed as a package