pgu.imputation: pgu.imputation

Description

Analyses and substitutes imputation sites in a data set.

Format

R6::R6Class object.

Details

Analyses imputation sites in a data set. Imputation sites are first replaced by missing values (NA); the NAs are then substituted using classical or machine-learning-based imputation algorithms. This object is used by the shiny-based GUI and is not intended for use in individual R scripts.

Active bindings

imputationStatistics

Returns the instance variable imputationStatistics. (tibble::tibble)

imputationSites

Returns the instance variable imputationSites. (tibble::tibble)

one_hot_df

Returns the positions of missing values in one-hot encoding. (tibble::tibble)

imputationSiteDistribution

Returns the instance variable imputationSiteDistribution. (matrix)

imputationAgentAlphabet

Returns the instance variable imputationAgentAlphabet. (character)

imputationAgent

Returns the instance variable imputationAgent. (character)

setImputationAgent

Sets the instance variable imputationAgent. (character)

nNeighbors

Returns the instance variable nNeighbors. (integer)

setNNeighbors

Sets the instance variable nNeighbors. (integer)

flux_df

Returns the instance variable flux_df. (tibble::tibble)

outflux_thr

Returns the instance variable outflux_thr. (numeric)

setOutflux_thr

Sets the instance variable outflux_thr. (numeric)

pred_frac

Returns the instance variable pred_frac. (numeric)

setPred_frac

Sets the instance variable pred_frac. (numeric)

pred_mat

Returns the instance variable pred_mat. (matrix)

exclude_vec

Returns the instance variable exclude_vec. (character)

seed

Returns the instance variable seed. (numeric)

setSeed

Sets the instance variable seed. (numeric)

iterations

Returns the instance variable iterations. (numeric)

setIterations

Sets the instance variable iterations. (numeric)

amv

Returns the instance variable amv. (numeric)

success

Returns the instance variable success. (logical)

Methods

Public methods


Method new()

Creates and returns a new pgu.imputation object.

Usage
pgu.imputation$new(
  seed = 42,
  iterations = 4,
  imputationAgent = "none",
  nNeighbors = 3,
  pred_frac = 1,
  outflux_thr = 0.5
)
Arguments
seed

Initially sets the instance variable seed. Default is 42. (integer)

iterations

Initially sets the instance variable iterations. Default is 4. (integer)

imputationAgent

Initially sets the instance variable imputationAgent. Default is "none". Options are: "none", "median", "mean", "expValue", "monteCarlo", "knn", "pmm", "cart", "randomForest", "M5P". (string)

nNeighbors

Initially sets the instance variable nNeighbors. Default is 3. (integer)

pred_frac

Initially sets the instance variable pred_frac. Default is 1. (numeric)

outflux_thr

Initially sets the instance variable outflux_thr. Default is 0.5. (numeric)

Returns

A new pgu.imputation object. (pguIMP::pgu.imputation)


Method finalize()

Clears the heap and indicates that the instance of pgu.imputation has been removed from the heap.

Usage
pgu.imputation$finalize()

Method print()

Prints instance variables of a pgu.imputation object.

Usage
pgu.imputation$print()
Returns

string


Method gatherImputationSites()

Gathers imputation sites from pguIMP's missings and outliers classes.

Usage
pgu.imputation$gatherImputationSites(
  missings_df = "tbl_df",
  outliers_df = "tbl_df"
)
Arguments
missings_df

Dataframe comprising information about the imputation sites of pguIMP's missings class. (tibble::tibble)

outliers_df

Dataframe comprising information about the imputation sites of pguIMP's outliers class. (tibble::tibble)


Method gatherImputationSiteStatistics()

Gathers statistical information about imputation sites. The information is stored within the class's instance variable imputationStatistics.

Usage
pgu.imputation$gatherImputationSiteStatistics(data_df = "tbl_df")
Arguments
data_df

The data frame to be analyzed. (tibble::tibble)
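
The page does not say which statistics are gathered. As a rough base R sketch, assuming that per-feature counts and fractions of missing values are of interest, the summary could look like this. The function name and column names are hypothetical and not part of pguIMP.

```r
# Hypothetical helper, not part of pguIMP: summarises NA counts per column.
imputation_site_statistics <- function(data_df) {
  # Number of missing entries in every column
  n_missing <- vapply(data_df, function(col) sum(is.na(col)), integer(1))
  data.frame(
    feature = names(data_df),
    measurements = nrow(data_df),
    missings = unname(n_missing),
    fraction_of_missings = unname(n_missing) / nrow(data_df),
    stringsAsFactors = FALSE
  )
}

example_df <- data.frame(x = c(1, NA, 3, 4), y = c(NA, NA, 1, 2))
stats_df <- imputation_site_statistics(example_df)
print(stats_df)
```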


Method gatherImputationSiteDistribution()

Gathers the distribution of imputation sites within the data frame. The information is stored within the class's instance variable imputationSiteDistribution.

Usage
pgu.imputation$gatherImputationSiteDistribution(data_df = "tbl_df")
Arguments
data_df

The data frame to be analyzed. (tibble::tibble)

Returns

A data frame (tibble::tibble)


Method insertImputationSites()

Takes a dataframe, replaces the imputation sites indicated by the instance variable imputationSites by NA, and returns the mutated dataframe.

Usage
pgu.imputation$insertImputationSites(data_df = "tbl_df")
Arguments
data_df

The data frame to be analyzed. (tibble::tibble)

Returns

A mutated version of data_df. (tibble::tibble)
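
The replacement step can be illustrated with a small base R sketch. The layout of the site table used here (one row per site, with columns `row` and `feature`) is an assumption made for the example; the page does not document the structure of imputationSites, and the helper name is hypothetical.

```r
# Hypothetical helper, not part of pguIMP.
# Assumed layout of sites_df: column `row` (row index) and `feature` (column name).
insert_imputation_sites <- function(data_df, sites_df) {
  for (i in seq_len(nrow(sites_df))) {
    # Overwrite the flagged cell with NA so it can be imputed later
    data_df[sites_df$row[i], sites_df$feature[i]] <- NA
  }
  data_df
}

example_df <- data.frame(x = c(1, 2, 300), y = c(5, -99, 7))
sites_df <- data.frame(
  row = c(3, 2),
  feature = c("x", "y"),
  stringsAsFactors = FALSE
)
masked_df <- insert_imputation_sites(example_df, sites_df)
print(masked_df)
```

The input data frame is left untouched; only the returned copy carries the NAs.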


Method one_hot()

Gathers statistical information about missing values in one hot format. The result is stored in the instance variable one_hot_df.

Usage
pgu.imputation$one_hot(data_df = "tbl_df")
Arguments
data_df

The data frame to be analyzed. (tibble::tibble)
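
One way to picture a one-hot encoding of missing values is an indicator table with the same shape as the data, holding 1 where a value is missing and 0 otherwise. The base R sketch below shows that idea only; the exact columns of one_hot_df are not documented here, and the helper name is hypothetical.

```r
# Hypothetical helper, not part of pguIMP: 1 marks a missing value, 0 an observed one.
one_hot_missings <- function(data_df) {
  as.data.frame(lapply(data_df, function(col) as.integer(is.na(col))))
}

example_df <- data.frame(x = c(1, NA, 3), y = c(NA, NA, 2))
one_hot_df <- one_hot_missings(example_df)
print(one_hot_df)
```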


Method analyzeImputationSites()

Takes a dataframe and analyses the imputation sites.

Usage
pgu.imputation$analyzeImputationSites(data_df = "tbl_df")
Arguments
data_df

The data frame to be analyzed. (tibble::tibble)


Method imputationSiteIdxByFeature()

Returns the position of an attribute's imputation sites within a data frame.

Usage
pgu.imputation$imputationSiteIdxByFeature(featureName = "character")
Arguments
featureName

The attribute's name. (character)

Returns

The position of the imputation sites. (numeric)


Method nanFeatureList()

Characterizes each row of the data frame as either complete or by the name of the attribute that is missing within the row. If entries of multiple attributes are missing within a row, the row is characterized as multiple.

Usage
pgu.imputation$nanFeatureList(data_df = "tbl_df")
Arguments
data_df

The data frame to be analyzed. (tibble::tibble)

Returns

Vector of row characteristics. (character)
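
The row labelling described above can be sketched in base R as follows. The label strings "complete" and "multiple" are taken from the description; the helper name is hypothetical, and the actual method may differ in detail.

```r
# Hypothetical helper, not part of pguIMP: labels every row by its missingness.
nan_feature_list <- function(data_df) {
  missing_mat <- is.na(data_df)          # logical matrix with column names
  n_missing <- rowSums(missing_mat)
  labels <- rep("complete", nrow(data_df))
  single <- which(n_missing == 1)
  # Rows with exactly one missing entry are labelled with that column's name
  labels[single] <- vapply(
    single,
    function(i) colnames(missing_mat)[which(missing_mat[i, ])],
    character(1)
  )
  labels[n_missing > 1] <- "multiple"
  labels
}

example_df <- data.frame(x = c(1, NA, 3, NA), y = c(1, 2, NA, NA))
row_labels <- nan_feature_list(example_df)
print(row_labels)
```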


Method average_number_of_predictors()

Calculates the average number of predictors for a given dataframe and minpuc and mincor variables using the mice::quickpred routine.

Usage
pgu.imputation$average_number_of_predictors(
  data_df = "tbl_df",
  minpuc = 0,
  mincor = 0.1
)
Arguments
data_df

The dataframe to be analyzed. (tibble::tibble)

minpuc

Specifies the minimum threshold for the proportion of usable cases. (numeric)

mincor

Specifies the minimum threshold against which the absolute correlation in the dataframe is compared. (numeric)

Returns

The average number of predictors. (numeric)


Method detectPredictors()

Identifies possible predictors for each feature. The result is written to the instance variable pred_mat. Two intermediate results are also stored: an influx/outflux dataframe, written to the instance variable flux_df, and a list of features excluded from the search for possible predictors, written to the instance variable exclude_vec.

Usage
pgu.imputation$detectPredictors(data_df = "tbl_df")
Arguments
data_df

The dataframe to be analyzed. (tibble::tibble)
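
To make the influx/outflux idea concrete, the following base R sketch computes both coefficients from the response pattern, following the definitions commonly used with the mice package (mice::flux). It is not the pguIMP implementation. In particular, the last step, which excludes features whose outflux falls below a threshold, is only an assumed reading of how outflux_thr might be used; the helper name is hypothetical.

```r
# Hypothetical helper, not part of pguIMP.
# influx_j : share of observed cells that sit in rows where feature j is missing
# outflux_j: share of missing cells that sit in rows where feature j is observed
flux_sketch <- function(data_df) {
  observed <- !is.na(data_df)                    # logical matrix, TRUE = observed
  n_obs_per_row <- rowSums(observed)
  n_mis_per_row <- ncol(observed) - n_obs_per_row
  # The row vectors are recycled down each column, i.e. applied row by row
  influx <- colSums((!observed) * n_obs_per_row) / sum(observed)
  outflux <- colSums(observed * n_mis_per_row) / sum(!observed)  # NaN without any NA
  data.frame(
    feature = colnames(observed),
    influx = unname(influx),
    outflux = unname(outflux),
    stringsAsFactors = FALSE
  )
}

example_df <- data.frame(
  a = c(1, 2, 3, 4),
  b = c(NA, 2, 3, NA),
  c = c(1, NA, 3, 4)
)
flux_df <- flux_sketch(example_df)
print(flux_df)

# Assumption: features with low outflux are dropped as predictors
outflux_thr <- 0.5
exclude_vec <- flux_df$feature[flux_df$outflux < outflux_thr]
print(exclude_vec)
```

A fully observed feature such as `a` has an influx of 0 and, here, an outflux of 1, which is why complete features tend to be useful predictors.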


Method handleImputationSites()

Chooses a cleaning method based on the instance variable imputationAgent and handles the imputation sites in the dataframe. Returns a cleaned data set. Displays the progress if shiny is loaded.

Usage
pgu.imputation$handleImputationSites(data_df = "tbl_df", progress = "Progress")
Arguments
data_df

The data frame to be analyzed. (tibble::tibble)

progress

If shiny is loaded, the analysis' progress is stored within this instance of the shiny Progress class. (shiny::Progress)

Returns

Cleaned dataframe. (tibble::tibble)
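
The dispatch on imputationAgent can be pictured with a small base R sketch. Only the agents "none", "median" and "mean" are covered, numeric columns are assumed, and progress reporting is left out. The function names are hypothetical and the code is an illustration, not the package's implementation.

```r
# Hypothetical helpers, not part of pguIMP.
impute_column <- function(values, agent = "none") {
  # Pick the fill value according to the chosen agent
  fill <- switch(agent,
    none = NA_real_,
    median = stats::median(values, na.rm = TRUE),
    mean = mean(values, na.rm = TRUE),
    stop("unsupported agent in this sketch: ", agent)
  )
  values[is.na(values)] <- fill
  values
}

handle_sites <- function(data_df, agent = "none") {
  # Apply the same agent to every (numeric) column, keeping the data frame shape
  data_df[] <- lapply(data_df, impute_column, agent = agent)
  data_df
}

example_df <- data.frame(x = c(1, NA, 3, 8), y = c(2, 4, NA, 6))
median_df <- handle_sites(example_df, agent = "median")
mean_df <- handle_sites(example_df, agent = "mean")
print(median_df)
print(mean_df)
```

With the agent "none" the NAs are left in place, which mirrors the default of the constructor.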


Method imputeByMedian()

Substitutes imputation sites by the median of the respective attribute. Returns the cleaned dataframe. Displays the progress if shiny is loaded.

Usage
pgu.imputation$imputeByMedian(data_df = "tbl_df", progress = "Progress")
Arguments
data_df

The data frame to be analyzed. (tibble::tibble)

progress

If shiny is loaded, the analysis' progress is stored in this instance of the shiny Progress class. (shiny::Progress)

Returns

Cleaned dataframe. (tibble::tibble)


Method imputeByMean()

Substitutes imputation sites by the arithmetic mean of the respective attribute. Returns the cleaned dataframe. Displays the progress if shiny is loaded.

Usage
pgu.imputation$imputeByMean(data_df = "tbl_df", progress = "Progress")
Arguments
data_df

The data frame to be analyzed. (tibble::tibble)

progress

If shiny is loaded, the analysis' progress is stored in this instance of the shiny Progress class. (shiny::Progress)

Returns

Cleaned dataframe. (tibble::tibble)


Method imputeByExpectationValue()

Substitutes imputation sites by the expectation value of the respective attribute. Returns the cleaned dataframe. Displays the progress if shiny is loaded.

Usage
pgu.imputation$imputeByExpectationValue(
  data_df = "tbl_df",
  progress = "Progress"
)
Arguments
data_df

The data frame to be analyzed. (tibble::tibble)

progress

If shiny is loaded, the analysis' progress is stored in this instance of the shiny Progress class. (shiny::Progress)

Returns

Cleaned dataframe. (tibble::tibble)


Method imputeByMC()

Substitutes imputation sites by values generated by a Monte Carlo simulation. The procedure runs several times, as defined by the instance variable iterations. The run with the best result is identified and used for substitution. Returns the cleaned dataframe. Displays the progress if shiny is loaded.

Usage
pgu.imputation$imputeByMC(data_df = "tbl_df", progress = "Progress")
Arguments
data_df

The data frame to be analyzed. (tibble::tibble)

progress

If shiny is loaded, the analysis' progress is stored in this instance of the shiny Progress class. (shiny::Progress)

Returns

Cleaned dataframe. (tibble::tibble)
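
The repeat-and-select pattern described above might be sketched as follows in base R. Two points are assumptions made purely for illustration, since the page states neither: values are drawn from a normal distribution fitted to the observed data, and the "best" run is the one whose overall mean stays closest to the observed mean. The function name is hypothetical.

```r
# Hypothetical helper, not part of pguIMP.
# Assumptions: normal sampling model; "best" = smallest shift of the column mean.
impute_by_mc_sketch <- function(values, iterations = 4, seed = 42) {
  set.seed(seed)                       # note: this resets the global RNG state
  missing_idx <- which(is.na(values))
  if (length(missing_idx) == 0) {
    return(values)
  }
  mu <- mean(values, na.rm = TRUE)
  sigma <- stats::sd(values, na.rm = TRUE)
  best <- NULL
  best_score <- Inf
  for (i in seq_len(iterations)) {
    candidate <- values
    candidate[missing_idx] <- stats::rnorm(length(missing_idx), mean = mu, sd = sigma)
    score <- abs(mean(candidate) - mu) # how far this run moves the mean
    if (score < best_score) {
      best <- candidate
      best_score <- score
    }
  }
  best
}

values <- c(1, 2, NA, 4, NA, 6)
imputed <- impute_by_mc_sketch(values, iterations = 4, seed = 42)
print(imputed)
```

Fixing the seed makes repeated calls return the same substitution, which matches the role of the seed instance variable.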


Method imputeByKnn()

Substitutes imputation sites by predictions of a KNN analysis of the whole dataframe. Returns the cleaned dataframe. Displays the progress if shiny is loaded.

Usage
pgu.imputation$imputeByKnn(data_df = "tbl_df", progress = "Progress")
Arguments
data_df

The data frame to be analyzed. (tibble::tibble)

progress

If shiny is loaded, the analysis' progress is stored in this instance of the shiny Progress class. (shiny::Progress)

Returns

Cleaned dataframe. (tibble::tibble)
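
For readers unfamiliar with KNN imputation, the following base R sketch shows the general technique: every incomplete row is filled with the average of its nearest complete rows. It assumes numeric, comparably scaled columns and at least one complete row, uses unweighted Euclidean distance, and is not the algorithm or library that pguIMP calls internally. The function name is hypothetical.

```r
# Hypothetical helper, not part of pguIMP: plain k-nearest-neighbour imputation.
impute_by_knn_sketch <- function(data_df, n_neighbors = 3) {
  mat <- as.matrix(data_df)
  complete <- stats::complete.cases(mat)
  donors <- mat[complete, , drop = FALSE]        # fully observed rows
  for (i in which(!complete)) {
    missing <- is.na(mat[i, ])
    # Euclidean distance to every donor, using only features observed in row i
    diffs <- sweep(donors[, !missing, drop = FALSE], 2, mat[i, !missing])
    distances <- sqrt(rowSums(diffs^2))
    nearest <- order(distances)[seq_len(min(n_neighbors, nrow(donors)))]
    # Fill each missing feature with the mean over the nearest donors
    mat[i, missing] <- colMeans(donors[nearest, missing, drop = FALSE])
  }
  as.data.frame(mat)
}

example_df <- data.frame(
  x = c(1, 2, 3, 10, 2),
  y = c(1, 2, 3, 10, NA)
)
imputed_df <- impute_by_knn_sketch(example_df, n_neighbors = 3)
print(imputed_df)
```

In the example, the three donors closest to the incomplete row are the first three rows, so the distant row with the value 10 does not influence the imputed entry.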


Method imputeByMice()

Substitutes imputation sites by values generated by one of the methods of the mice package. The procedure runs several times, as defined by the instance variable iterations. The run with the best result is identified and used for substitution. Returns the cleaned dataframe. Displays the progress if shiny is loaded.

Usage
pgu.imputation$imputeByMice(data_df, progress = "Progress")
Arguments
data_df

The data frame to be analyzed. (tibble::tibble)

progress

If shiny is loaded, the analysis' progress is stored in this instance of the shiny Progress class. (shiny::Progress)

Returns

Cleaned dataframe. (tibble::tibble)


Method imputeByM5P()

Substitutes imputation sites by predictions of an M5P tree trained on the whole dataframe. Returns the cleaned dataframe. Displays the progress if shiny is loaded.

Usage
pgu.imputation$imputeByM5P(data_df = "tbl_df", progress = "Progress")
Arguments
data_df

The data frame to be analyzed. (tibble::tibble)

progress

If shiny is loaded, the analysis' progress is stored in this instance of the shiny Progress class. (shiny::Progress)

Returns

Cleaned dataframe. (tibble::tibble)


Method imputationSiteHeatMap()

Displays the distribution of missing values in the form of a heatmap.

Usage
pgu.imputation$imputationSiteHeatMap()
Returns

A heatmap plot. (ggplot2::ggplot)


Method featureBarPlot()

Displays the distribution of an attribute's values as a histogram.

Usage
pgu.imputation$featureBarPlot(data_df = "tbl_df", feature = "character")
Arguments
data_df

The data frame to be analyzed. (tibble::tibble)

feature

The attribute to be shown. (character)

Returns

A histogram. (ggplot2::ggplot)


Method featureBoxPlotWithSubset()

Displays the distribution of an attribute's values as a box plot.

Usage
pgu.imputation$featureBoxPlotWithSubset(
  data_df = "tbl_df",
  feature = "character"
)
Arguments
data_df

The data frame to be analyzed. (tibble::tibble)

feature

The attribute to be shown. (character)

Returns

A box plot. (ggplot2::ggplot)


Method featurePlot()

Displays the distribution of an attribute's values as a composition of a box plot and a histogram.

Usage
pgu.imputation$featurePlot(data_df = "tbl_df", feature = "character")
Arguments
data_df

The data frame to be analyzed. (tibble::tibble)

feature

The attribute to be shown. (character)

Returns

A composite plot. (ggplot2::ggplot)


Method fluxPlot()

Displays an influx/outflux plot.

Usage
pgu.imputation$fluxPlot()
Returns

A composite plot. (ggplot2::ggplot)


Method clone()

The objects of this class are cloneable with this method.

Usage
pgu.imputation$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

Author(s)

Sebastian Malkusch, malkusch@med.uni-frankfurt.de


pguIMP documentation built on Sept. 30, 2021, 5:08 p.m.