compute_impact: A framework for analyzing the impact of discretization noise...

Description

The impact on the performance of the chosen classifier is reported in terms of Accuracy, Precision, Recall, Brier Score, AUC, F-Measure and Matthews Correlation Coefficient (MCC). The impact on feature importance is reported in terms of the likelihood of rank shifts.
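The performance measures listed above are standard binary-classification metrics. As a rough illustration only, the hypothetical helper perf_metrics() below computes them from 0/1 labels and predicted probabilities. It is not part of DiscNoise, and the 0.5 probability threshold is an assumption.

```r
# Hypothetical helper (not part of DiscNoise): binary classification metrics
# from 0/1 labels and predicted probabilities of class 1.
perf_metrics <- function(actual, prob, threshold = 0.5) {
  pred <- as.integer(prob >= threshold)  # assumed decision threshold
  # Confusion-matrix counts, as doubles to avoid integer overflow
  tp <- as.numeric(sum(pred == 1 & actual == 1))
  tn <- as.numeric(sum(pred == 0 & actual == 0))
  fp <- as.numeric(sum(pred == 1 & actual == 0))
  fn <- as.numeric(sum(pred == 0 & actual == 1))
  precision <- tp / (tp + fp)  # NaN if nothing is predicted positive
  recall <- tp / (tp + fn)
  # AUC via the Mann-Whitney rank statistic
  r <- rank(prob)
  n_pos <- sum(actual == 1)
  n_neg <- sum(actual == 0)
  auc <- (sum(r[actual == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
  mcc_den <- sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
  c(
    accuracy  = (tp + tn) / length(actual),
    precision = precision,
    recall    = recall,
    f_measure = 2 * precision * recall / (precision + recall),
    brier     = mean((prob - actual)^2),  # uses probabilities, not labels
    auc       = auc,
    mcc       = (tp * tn - fp * fn) / mcc_den
  )
}

actual <- c(1, 1, 0, 0, 1, 0)
prob <- c(0.9, 0.6, 0.4, 0.2, 0.3, 0.7)
print(perf_metrics(actual, prob))
```

For these six observations there are 2 true positives, 2 true negatives, 1 false positive and 1 false negative, so accuracy is 4/6 and MCC is 1/3.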

Usage

compute_impact(data, dep_var, classifier, limit, step_size,
  parallel = FALSE, n_cores = 1, boot_size = 100, cutpoint = NULL,
  save_interim_results = FALSE, dest_path = NULL)

Arguments

data

must be an object of type data.frame containing the continuous dependent variable.

dep_var

a string giving the column name, in the data parameter, of the continuous dependent variable. This is the variable whose discretization introduces the discretization noise.

classifier

a string naming the classifier. Currently supported classifiers are 'rf' (Random forest), 'lrm' (Logistic regression), 'CART' (Classification tree) and 'knn' (K-Nearest Neighbors).

limit

a numeric value that demarcates the noisy area in the data, as defined by the user or a domain expert. Typically, limit determines how much of the data around the cutpoint is treated as the noisy area.

step_size

a numeric value giving the increments in which the impact of the noisy area is analyzed. Choose a larger step size for faster runs, or a smaller step size for a more accurate impact estimate.
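To make the interplay of limit and step_size concrete, here is a sketch under stated assumptions: that both are expressed in the units of the dependent variable, and that the noisy area is the band cutpoint +/- width, with width growing by step_size up to limit. The helpers noisy_widths() and in_noisy_area() are hypothetical and do not reflect DiscNoise's actual implementation.

```r
# Hypothetical sketch, not DiscNoise's implementation.
# Assumes limit and step_size are in the units of the dependent variable
# and that the noisy area is the band cutpoint +/- width.

# Widths of the noisy area to evaluate, from step_size up to limit
noisy_widths <- function(limit, step_size) {
  seq(from = step_size, to = limit, by = step_size)
}

# TRUE for observations whose dependent value falls inside the noisy band
in_noisy_area <- function(y, cutpoint, width) {
  abs(y - cutpoint) <= width
}

y <- c(1, 8, 11, 15, 30)
for (w in noisy_widths(limit = 10, step_size = 2.5)) {
  cat("width", w, ":", sum(in_noisy_area(y, cutpoint = 10, width = w)),
      "observations in the noisy area\n")
}
```

Under these assumptions, a smaller step_size simply yields more widths to evaluate (and therefore more model runs), which matches the speed/accuracy trade-off described above.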

parallel

a logical value indicating whether the function is executed in parallel (recommended). Defaults to FALSE.

n_cores

a numeric value specifying the number of cores to be used for parallel execution. Defaults to 1.
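As a rough sketch of how parallel and n_cores commonly interact in R code, the hypothetical helper run_iterations() below dispatches independent iterations with parallel::mclapply() when parallel execution is requested and falls back to lapply() otherwise. DiscNoise's actual parallel backend is not documented on this page. Note that mclapply() relies on forking, which is unavailable on Windows, hence the fallback.

```r
# Hypothetical helper (not part of DiscNoise): run n_iter independent
# iterations of fun, in parallel when requested and supported.
run_iterations <- function(n_iter, fun, parallel = FALSE, n_cores = 1) {
  idx <- seq_len(n_iter)
  if (parallel && n_cores > 1 && .Platform$OS.type != "windows") {
    # Forked workers; results are returned in the order of idx
    parallel::mclapply(idx, fun, mc.cores = n_cores)
  } else {
    # Sequential fallback (also used on Windows)
    lapply(idx, fun)
  }
}

res <- run_iterations(4, function(i) i^2, parallel = TRUE, n_cores = 2)
print(unlist(res))
```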

boot_size

a numeric value specifying the number of bootstrap iterations used in the framework. Defaults to 100.
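This page does not say how the bootstrap samples are drawn. One common scheme, the out-of-sample bootstrap, trains on n rows drawn with replacement and tests on the rows that were never drawn. The hypothetical bootstrap_splits() below sketches that scheme, assuming it is what boot_size iterates over.

```r
# Hypothetical sketch (not part of DiscNoise): out-of-sample bootstrap splits.
# Each iteration draws n row indices with replacement for training and
# uses the rows that were never drawn as the test set.
bootstrap_splits <- function(n, boot_size = 100) {
  lapply(seq_len(boot_size), function(i) {
    train <- sample.int(n, size = n, replace = TRUE)
    list(train = train, test = setdiff(seq_len(n), train))
  })
}

set.seed(42)  # for reproducible resampling
splits <- bootstrap_splits(n = 50, boot_size = 5)
print(sapply(splits, function(s) length(s$test)))  # test-set size per iteration
```

On average roughly 36.8% of the rows end up in each test set, which is why a larger boot_size gives more stable estimates at the cost of more model fits.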

cutpoint

a numeric value specifying the cutpoint used to discretize the continuous dependent variable. This is the cutpoint around which the discretization noise is analyzed. If not specified, the median of the dependent variable is used as the cutpoint.
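The default described above can be sketched as follows. The helper discretize() is hypothetical, and whether DiscNoise assigns values exactly equal to the cutpoint to the lower or the upper class is an assumption (here: lower, via a strict `>`).

```r
# Hypothetical sketch (not part of DiscNoise): binarize a continuous
# dependent variable at a cutpoint, defaulting to the median.
discretize <- function(y, cutpoint = NULL) {
  if (is.null(cutpoint)) cutpoint <- median(y, na.rm = TRUE)
  # Assumption: values equal to the cutpoint fall in class 0
  as.integer(y > cutpoint)
}

y <- c(3, 7, 1, 9, 5)
print(discretize(y))                # median (5) used as the cutpoint
print(discretize(y, cutpoint = 2))  # user-supplied cutpoint
```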

save_interim_results

a logical value specifying whether the intermediate performance and interpretation results are saved. Defaults to FALSE.

dest_path

a string specifying the destination path in which the intermediate results are saved.

Value

Returns a list containing the performance and interpretation impact. The individual elements of the list are matrices.
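The Description expresses the interpretation impact as the likelihood of rank shifts. Assuming this means the share of bootstrap iterations in which a feature's importance rank changes once noise is taken into account, a hypothetical helper could compute one value per feature from two rank matrices (rows = bootstrap iterations, columns = features). The names and layout of the matrices that compute_impact actually returns are not documented on this page.

```r
# Hypothetical helper (not part of DiscNoise): per-feature likelihood that
# the importance rank differs between two settings across bootstrap runs.
rank_shift_likelihood <- function(base_ranks, noisy_ranks) {
  stopifnot(identical(dim(base_ranks), dim(noisy_ranks)))
  # Proportion of iterations (rows) in which each feature's rank changed
  colMeans(base_ranks != noisy_ranks)
}

features <- c("a", "b", "c")
base <- matrix(c(1, 2, 3,
                 1, 2, 3), nrow = 2, byrow = TRUE,
               dimnames = list(NULL, features))
noisy <- matrix(c(1, 3, 2,
                  1, 2, 3), nrow = 2, byrow = TRUE,
                dimnames = list(NULL, features))
print(rank_shift_likelihood(base, noisy))
```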


rgopikrishnan91/DiscNoise documentation built on May 6, 2019, 6:59 p.m.