pgu.outliers: pgu.outliers

Description

Detects and replaces possible outliers in a data set.

Format

R6::R6Class object.

Details

Performs Grubbs' test to detect outliers in the normalized and Z-score transformed data set. Replaces missing values with substitutes using classical and AI-powered substitution algorithms. For this purpose, outliers are handled as imputation sites.

Active bindings

outliersParameter

Returns the instance variable outliersParameter. (tibble::tibble)

outliers

Returns the instance variable outliers. (tibble::tibble)

one_hot_df

Returns the positions of missing values in one-hot encoding. (tibble::tibble)

outliersStatistics

Returns the instance variable outliersStatistics. (tibble::tibble)

outliersAgentAlphabet

Returns the instance variable outliersAgentAlphabet. (character)

outliersAgent

Returns the instance variable outliersAgent. (character)

setOutliersAgent

Sets the instance variable outliersAgent. (character)

featureData

Returns the instance variable featureData. (numeric)

alpha

Returns the instance variable alpha. (numeric)

setAlpha

Sets the instance variable alpha. (numeric)

epsilon

Returns the instance variable epsilon. (numeric)

setEpsilon

Sets the instance variable epsilon. (numeric)

minSamples

Returns the instance variable minSamples. (integer)

setMinSamples

Sets the instance variable minSamples. (integer)

gamma

Returns the instance variable gamma. (numeric)

setGamma

Sets the instance variable gamma. (numeric)

nu

Returns the instance variable nu. (numeric)

setNu

Sets the instance variable nu. (numeric)

k

Returns the instance variable k. (integer)

setK

Sets the instance variable k. (integer)

cutoff

Returns the instance variable cutoff. (numeric)

setCutoff

Sets the instance variable cutoff. (numeric)

seed

Returns the instance variable seed. (integer)

setSeed

Sets the instance variable seed. (integer)

Methods

Public methods


Method new()

Creates and returns a new pgu.outliers object.

Usage
pgu.outliers$new(
  data_df = "tbl_df",
  alpha = 0.05,
  epsilon = 0.1,
  minSamples = 4,
  gamma = 0.05,
  nu = 0.1,
  k = 4,
  cutoff = 0.99,
  seed = 42
)
Arguments
data_df

The data to be cleaned. (tibble::tibble)

alpha

Initial definition of the instance variable alpha. (numeric)

epsilon

Initial definition of the instance variable epsilon. (numeric)

minSamples

Initial definition of the instance variable minSamples. (integer)

gamma

Initial definition of the instance variable gamma. (numeric)

nu

Initial definition of the instance variable nu. (numeric)

k

Initial definition of the instance variable k. (integer)

cutoff

Initial definition of the instance variable cutoff. (numeric)

seed

Initial definition of the instance variable seed. (integer)

Returns

A new pgu.outliers object. (pguIMP::pgu.outliers)


Method finalize()

Clears the heap and indicates that the instance of pgu.outliers has been removed from the heap.

Usage
pgu.outliers$finalize()

Method print()

Prints instance variables of a pgu.outliers object.

Usage
pgu.outliers$print()
Returns

string


Method resetOutliers()

Resets instance variables and performs Grubbs' test to detect outliers in the normalized and Z-score transformed data set. Progress is indicated by the progress object passed to the function.

Usage
pgu.outliers$resetOutliers(data_df = "tbl_df")
Arguments
data_df

Dataframe to be analyzed. (tibble::tibble)


Method filterFeatures()

Filters attributes from the given dataframe that are known to the class.

Usage
pgu.outliers$filterFeatures(data_df = "tbl_df")
Arguments
data_df

Dataframe to be filtered. (tibble::tibble)

Returns

A filtered dataframe. (tibble::tibble)


Method checkFeatureValidity()

Checks if the feature consists of a sufficient number of instances.

Usage
pgu.outliers$checkFeatureValidity(data_df = "tbl_df", feature = "character")
Arguments
data_df

Dataframe to be analyzed. (tibble::tibble)

feature

The attribute to be analyzed. (character)


Method detectOutliersParameter()

Determines the outliers parameter by analyzing the tibble data_df and the instance variable outliers. Results are stored in the instance variable outliersParameter.

Usage
pgu.outliers$detectOutliersParameter(data_df = "tbl_df")
Arguments
data_df

Dataframe to be analyzed. (tibble::tibble)


Method outliersFeatureList()

Characterizes each row of the data frame as either complete or by the attribute that has been identified as an outlier within the row. If several attributes of a row were identified as outliers, the row is characterized as multiple.

Usage
pgu.outliers$outliersFeatureList(data_df = "tbl_df")
Arguments
data_df

The data frame to be analyzed. (tibble::tibble)

Returns

Vector of row characteristics. (character)
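The row characterization described above can be sketched in base R. This is only an illustration, not the package's implementation: the helper name label_rows and the logical outlier mask it takes as input are assumptions made for this example, and the labels "complete" and "multiple" are taken from the description.

```r
# Hypothetical helper (not part of pguIMP): labels each row of a logical
# outlier mask. TRUE marks a cell that was flagged as an outlier.
label_rows <- function(mask) {
  n_flagged <- rowSums(mask)
  labels <- rep("complete", nrow(mask))
  single <- which(n_flagged == 1)
  # Rows with exactly one flagged cell are labeled with that attribute's name.
  labels[single] <- vapply(
    single,
    function(i) colnames(mask)[which(mask[i, ])],
    character(1)
  )
  # Rows with more than one flagged cell are labeled "multiple".
  labels[n_flagged > 1] <- "multiple"
  labels
}

# Illustrative mask with two attributes, "a" and "b", and four rows.
mask <- matrix(
  c(FALSE, TRUE, TRUE, FALSE,
    FALSE, FALSE, TRUE, FALSE),
  ncol = 2,
  dimnames = list(NULL, c("a", "b"))
)
labels <- label_rows(mask)
print(labels)
```

In this sketch the second row is labeled with the attribute name "a" and the third row, where both attributes are flagged, is labeled "multiple".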


Method featureOutlier()

Returns the detected outliers of a given attribute.

Usage
pgu.outliers$featureOutlier(feature = "character")
Arguments
feature

The attribute to be analyzed. (character)

Returns

The attribute's outliers. (tibble::tibble)


Method one_hot()

Gathers statistical information about missing values in one-hot format. The result is stored in the instance variable one_hot_df.

Usage
pgu.outliers$one_hot(data_df = "tbl_df")
Arguments
data_df

The data frame to be analyzed. (tibble::tibble)
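As a rough illustration of what a one-hot encoding of missing values can look like, the following base R sketch maps every cell to 1 if it is NA and to 0 otherwise. The helper name one_hot_missing and the convention that 1 marks a missing value are assumptions for this example; the layout that the package stores in one_hot_df may differ.

```r
# Hypothetical helper (not part of pguIMP): encodes missing cells as 1
# and present cells as 0, keeping the column names of the input.
one_hot_missing <- function(data_df) {
  as.data.frame(is.na(data_df) * 1L)
}

example_df <- data.frame(a = c(1, NA, 3), b = c(NA, NA, 6))
encoded <- one_hot_missing(example_df)
print(encoded)
```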


Method detectOutliers()

Chooses a method for the identification of anomalies based upon the instance variable outliersAgent. Detects anomalies in a data frame by one-dimensional analysis of each feature.

Usage
pgu.outliers$detectOutliers(data_df = "tbl_df", progress = "Progress")
Arguments
data_df

Data frame to be analyzed. (tibble::tibble)

progress

If shiny is loaded, the analysis' progress is stored in this instance of the shiny Progress class. (shiny::Progress)


Method detectByGrubbs()

Identifies anomalies in the data frame based on Grubbs' test. Iterates over the whole data frame. Calls the object's public function grubbs_numeric until no more anomalies are identified. The threshold for anomaly detection is defined in the instance variable alpha. Displays the progress if shiny is loaded.

Usage
pgu.outliers$detectByGrubbs(data_df = "tbl_df", progress = "Progress")
Arguments
data_df

Data frame to be analyzed. (tibble::tibble)

progress

If shiny is loaded, the analysis' progress is stored in this instance of the shiny Progress class. (shiny::Progress)


Method grubbs_numeric()

Performs Grubbs' test for anomalies to detect a single outlier in the provided attribute's data. If an outlier is found, it is added to the instance variable outliers. The threshold for anomaly detection is defined in the instance variable alpha. The function indicates a find by a positive feedback.

Usage
pgu.outliers$grubbs_numeric(data_df = "tbl_df", feature = "character")
Arguments
data_df

The data frame to be analyzed. (tibble::tibble)

feature

The attribute within the data frame to be analyzed. (character)

Returns

Feedback if an outlier was found. (logical)
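The interplay of a single-outlier test and its repeated application can be sketched in base R. This is a minimal sketch of a two-sided Grubbs' test under stated assumptions, not the package's code: the function names grubbs_single and grubbs_iterative are hypothetical, the input is a plain numeric vector rather than a tibble column, and no normalization or Z-score transformation is applied beforehand.

```r
# Hypothetical helper: one round of a two-sided Grubbs' test.
# Returns whether the most extreme value is an outlier and where it sits.
grubbs_single <- function(x, alpha = 0.05) {
  idx <- which(!is.na(x))
  n <- length(idx)
  no_find <- list(found = FALSE, index = NA_integer_)
  if (n < 3) return(no_find)
  values <- x[idx]
  s <- stats::sd(values)
  if (s == 0) return(no_find)
  dev <- abs(values - mean(values))
  g <- max(dev) / s
  # Critical value derived from the t distribution with n - 2 degrees of freedom.
  t_crit <- stats::qt(alpha / (2 * n), df = n - 2, lower.tail = FALSE)
  g_crit <- (n - 1) / sqrt(n) * sqrt(t_crit^2 / (n - 2 + t_crit^2))
  if (g > g_crit) list(found = TRUE, index = idx[which.max(dev)]) else no_find
}

# Hypothetical helper: repeats the single test until nothing more is found.
# Each detected value is set to NA so the next round ignores it.
grubbs_iterative <- function(x, alpha = 0.05) {
  flagged <- integer(0)
  repeat {
    res <- grubbs_single(x, alpha)
    if (!res$found) break
    flagged <- c(flagged, res$index)
    x[res$index] <- NA
  }
  flagged
}

x <- c(10, 11, 9, 10, 12, 10, 11, 50)
outlier_idx <- grubbs_iterative(x)
print(outlier_idx)
```

For this vector the loop stops after one find: once the value at position 8 is masked, the remaining values no longer exceed the critical value.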


Method detectByDbscan()

Identifies anomalies in the data frame based on DBSCAN. Iterates over the whole data frame. Calls the object's public function dbscan_numeric until all features are analyzed. The cluster hyperparameters are defined in the instance variables epsilon and minSamples. The results of the dbscan_numeric routine are added to the instance variable outliers. Displays the progress if shiny is loaded.

Usage
pgu.outliers$detectByDbscan(data_df = "tbl_df", progress = "Progress")
Arguments
data_df

Data frame to be analyzed. (tibble::tibble)

progress

If shiny is loaded, the analysis' progress is stored in this instance of the shiny Progress class. (shiny::Progress)


Method dbscan_numeric()

Identifies anomalies in a single feature of a data frame based on DBSCAN. The cluster hyperparameters are defined in the instance variables epsilon and minSamples. Displays the progress if shiny is loaded.

Usage
pgu.outliers$dbscan_numeric(data_df = "tbl_df", feature = "character")
Arguments
data_df

Data frame to be analyzed. (tibble::tibble)

feature

Feature to be analyzed. (character)

progress

If shiny is loaded, the analysis' progress is stored in this instance of the shiny Progress class. (shiny::Progress)

Returns

A data frame comprising the information about detected anomalies of the feature. (tibble::tibble)
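To illustrate how DBSCAN separates noise from clustered values in a single feature, the following base R sketch applies the usual definitions directly: a point is a core point if at least min_samples points (itself included) lie within epsilon, and a point is noise if it is neither a core point nor within epsilon of one. The function name dbscan_noise_1d and the choice to count the point itself as a neighbor are assumptions for this example; the package may delegate to a clustering library with different conventions.

```r
# Hypothetical helper (not part of pguIMP): flags DBSCAN noise points
# in a numeric vector without missing values.
dbscan_noise_1d <- function(x, epsilon = 0.1, min_samples = 4) {
  # Pairwise absolute distances between all values.
  neighbors <- abs(outer(x, x, "-")) <= epsilon
  # Core points have at least min_samples neighbors, counting themselves.
  is_core <- rowSums(neighbors) >= min_samples
  # A point belongs to a cluster if it is core or close to a core point.
  reachable <- vapply(
    seq_along(x),
    function(i) any(neighbors[i, ] & is_core),
    logical(1)
  )
  !reachable
}

x <- c(0.00, 0.02, 0.04, 0.05, 0.07, 1.00)
noise <- dbscan_noise_1d(x, epsilon = 0.1, min_samples = 4)
print(which(noise))
```

With these settings the five values near zero form one dense group, while the isolated value 1.00 has no neighbor within epsilon and is reported as noise.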


Method detectBySvm()

Identifies anomalies in the data frame based on one-class SVM. Iterates over the whole data frame. Calls the object's public function svm_numeric until all features are analyzed. The cluster hyperparameters are defined in the instance variables gamma and nu. The results of the svm_numeric routine are added to the instance variable outliers. Displays the progress if shiny is loaded.

Usage
pgu.outliers$detectBySvm(data_df = "tbl_df", progress = "Process")
Arguments
data_df

Data frame to be analyzed. (tibble::tibble)

progress

If shiny is loaded, the analysis' progress is stored in this instance of the shiny Progress class. (shiny::Progress)


Method svm_numeric()

Identifies anomalies in a single feature of a data frame based on one-class SVM. The cluster hyperparameters are defined in the instance variables gamma and nu. Displays the progress if shiny is loaded.

Usage
pgu.outliers$svm_numeric(data_df = "tbl_df", feature = "character")
Arguments
data_df

Data frame to be analyzed. (tibble::tibble)

feature

Feature to be analyzed. (character)

progress

If shiny is loaded, the analysis' progress is stored in this instance of the shiny Progress class. (shiny::Progress)

Returns

A data frame comprising the information about detected anomalies of the feature. (tibble::tibble)


Method detectByKnn()

Identifies anomalies in the data frame based on knnO. Iterates over the whole data frame. Calls the object's public function knn_numeric until all features are analyzed. The cluster hyperparameters are defined in the instance variables alpha and minSamples. The results of the knn_numeric routine are added to the instance variable outliers. Displays the progress if shiny is loaded.

Usage
pgu.outliers$detectByKnn(data_df = "tbl_df", progress = "Process")
Arguments
data_df

Data frame to be analyzed. (tibble::tibble)

progress

If shiny is loaded, the analysis' progress is stored in this instance of the shiny Progress class. (shiny::Progress)


Method knn_numeric()

Identifies anomalies in a single feature of a data frame based on knnO. The cluster hyperparameters are defined in the instance variables alpha and minSamples. Displays the progress if shiny is loaded.

Usage
pgu.outliers$knn_numeric(data_df = "tbl_df", feature = "character")
Arguments
data_df

Data frame to be analyzed. (tibble::tibble)

feature

Feature to be analyzed. (character)

progress

If shiny is loaded, the analysis' progress is stored in this instance of the shiny Progress class. (shiny::Progress)

Returns

A data frame comprising the information about detected anomalies of the feature. (tibble::tibble)


Method setImputationSites()

Replaces the detected anomalies in a user-provided data frame with NA for further imputation routines.

Usage
pgu.outliers$setImputationSites(data_df = "tbl_df")
Arguments
data_df

Data frame to be mutated. (tibble::tibble)

Returns

A data frame with anomalies replaced by NA. (tibble::tibble)
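Turning detected anomalies into imputation sites amounts to overwriting the flagged cells with NA. The following base R sketch shows the idea under stated assumptions: the helper name replace_with_na is hypothetical, and the outlier table with the columns feature and row is an invented layout for this example, since the structure of the instance variable outliers is not documented here.

```r
# Hypothetical helper (not part of pguIMP): sets every cell listed in
# outlier_df to NA. outlier_df is assumed to hold one row per anomaly,
# with the column name in `feature` and the row index in `row`.
replace_with_na <- function(data_df, outlier_df) {
  for (i in seq_len(nrow(outlier_df))) {
    data_df[outlier_df$row[i], outlier_df$feature[i]] <- NA
  }
  data_df
}

measurements <- data.frame(a = c(1.0, 2.0, 90.0), b = c(5.0, -40.0, 6.0))
outlier_df <- data.frame(
  feature = c("a", "b"),
  row = c(3L, 2L),
  stringsAsFactors = FALSE
)
cleaned <- replace_with_na(measurements, outlier_df)
print(cleaned)
```

The cleaned data frame keeps its shape and column types, so any imputation routine that handles NA can be applied to it afterwards.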


Method calcOutliersStatistics()

Calculates the statistics on the previously performed outlier detection analysis and stores the results in the instance variable outliersStatistics.

Usage
pgu.outliers$calcOutliersStatistics(data_df = "tbl_df")
Arguments
data_df

The data frame to be analyzed. (tibble::tibble)


Method outlierTable()

Creates a datatable in which substituted outliers are highlighted by a colored background.

Usage
pgu.outliers$outlierTable(data_df = "tbl_df")
Arguments
data_df

The data frame to be analyzed. (tibble::tibble)

Returns

A colored datatable. (DT::datatable)


Method plotOutliersDistribution()

Displays the occurrence of outlier candidates per attribute as a bar plot.

Usage
pgu.outliers$plotOutliersDistribution()
Returns

A bar plot. (ggplot2::ggplot)


Method featureBarPlot()

Displays the distribution of an attribute's values as a histogram.

Usage
pgu.outliers$featureBarPlot(data_df = "tbl_df", feature = "character")
Arguments
data_df

Dataframe to be analyzed. (tibble::tibble)

feature

Attribute to be shown. (character)

Returns

A histogram. (ggplot2::ggplot)


Method featureBoxPlotWithSubset()

Displays the distribution of an attribute's values as a box plot.

Usage
pgu.outliers$featureBoxPlotWithSubset(
  data_df = "tbl_df",
  feature = "character"
)
Arguments
data_df

Dataframe to be analyzed. (tibble::tibble)

feature

Attribute to be shown. (character)

Returns

A box plot. (ggplot2::ggplot)


Method featurePlot()

Displays the distribution of an attribute's values as a composition of a box plot and a histogram.

Usage
pgu.outliers$featurePlot(data_df = "tbl_df", feature = "character")
Arguments
data_df

Dataframe to be analyzed. (tibble::tibble)

feature

Attribute to be shown. (character)

Returns

A composite plot. (ggplot2::ggplot)


Method clone()

The objects of this class are cloneable with this method.

Usage
pgu.outliers$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

Author(s)

Sebastian Malkusch, malkusch@med.uni-frankfurt.de


pguIMP documentation built on Sept. 30, 2021, 5:08 p.m.