IAUC: Influence Functions On AUC
In BoShiangKe/InfluenceAUC: Identify Influential Observations in Binary Classification

IAUC	R Documentation

Description

Provides two sample versions (DEIF and SIF) of the influence function for the AUC.

Usage

IAUC(
  score,
  binary,
  threshold = 0.5,
  hypothesis = FALSE,
  testdiff = 0.5,
  alpha = 0.05,
  name = NULL
)

Arguments

`score`	A vector containing the predictions (continuous scores) assigned by classifiers; Must be numeric.
`binary`	A vector containing the true class labels (1: positive, 0: negative); Must have the same length as 'score'.
`threshold`	A numeric value determining the threshold to distinguish influential observations from normal ones; Must lie between 0 and 1; Defaults to 0.5.
`hypothesis`	A logical value controlling whether SIF is evaluated under its asymptotic distribution; Defaults to FALSE.
`testdiff`	A numeric value determining the difference in the hypothesis testing; Must lie between 0 and 1; Defaults to 0.5.
`alpha`	A numeric value determining the significance level in the hypothesis testing; Must lie between 0 and 1; Defaults to 0.05.
`name`	A vector of names for the observations; Must have the same length as 'score'; Defaults to NULL.

Details

Applies two sample versions of the influence function to the AUC:

deleted empirical influence function (DEIF)
sample influence function (SIF)

The influence function is a deletion diagnostic: it measures how the AUC changes when a single observation is removed. Such single-case techniques can suffer from the masking effect when several influential observations are present, because removing one of them may leave the others to hide its impact. To investigate potential cases in binary classification thoroughly, we suggest that users also apply ICLC and LAUC. For a complete discussion of these functions, please see the reference.
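
The deletion idea behind these diagnostics can be illustrated with a minimal sketch in base R. This is not the package's implementation and it does not reproduce the exact DEIF or SIF formulas from the reference; it only shows the raw change in AUC when each observation is left out in turn. The helpers `auc_mw` and `loo_auc_change` are hypothetical names introduced for this illustration, and the AUC is computed with the Mann-Whitney statistic, counting ties as one half.

```r
# Hypothetical helper (not part of InfluenceAUC):
# AUC computed as the Mann-Whitney statistic.
auc_mw <- function(score, binary) {
  pos <- score[binary == 1]
  neg <- score[binary == 0]
  # Compare every positive score with every negative score;
  # a tie contributes 1/2.
  cmp <- outer(pos, neg, function(p, q) (p > q) + 0.5 * (p == q))
  mean(cmp)
}

# Hypothetical helper: change in AUC after deleting each observation in turn.
# A positive value means the AUC rises once that observation is removed.
loo_auc_change <- function(score, binary) {
  full <- auc_mw(score, binary)
  vapply(seq_along(score), function(i) {
    auc_mw(score[-i], binary[-i]) - full
  }, numeric(1))
}

# Toy data (made up for illustration): the third observation is a
# positive case with a low score.
score  <- c(0.9, 0.8, 0.3, 0.6, 0.2, 0.1)
binary <- c(1, 1, 1, 0, 0, 0)

full_auc <- auc_mw(score, binary)          # 8/9, about 0.889
change   <- loo_auc_change(score, binary)
print(round(change, 3))
```

In this toy data, deleting the low-scoring positive case (observation 3) or the high-scoring negative case (observation 4) raises the AUC to 1, whereas deleting any other observation lowers it. The sketch assumes each class keeps at least one member after a deletion; otherwise the AUC is undefined and `auc_mw` returns `NaN`.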

Value

A list of objects including:

(1) 'output': a list of results with 'AUC' (numeric), 'SIF' (a list of data frames) and 'DEIF' (a list of data frames);
(2) 'rdata': a data frame of essential results for visualization;
(3) 'threshold': the numeric value used to distinguish influential observations from normal ones;
(4) 'test_output': a list of data frames containing the hypothesis testing results;
(5) 'test_data': a data frame of essential hypothesis testing results for visualization;
(6) 'testdiff': the numeric value used as the difference in the hypothesis testing;
(7) 'alpha': the numeric value used as the significance level.

Author(s)

Bo-Shiang Ke and Yuan-chin Ivan Chang

References

Ke, B. S., Chiang, A. J., & Chang, Y. C. I. (2018). Influence Analysis for the Area Under the Receiver Operating Characteristic Curve. Journal of biopharmaceutical statistics, 28(4), 722-734.

Examples

library(ROCR)
data("ROCR.simple")
# Print IAUC results directly; 'hypothesis' expects a logical, not a string
IAUC(ROCR.simple$predictions, ROCR.simple$labels, hypothesis = TRUE)

data(mtcars)
glmfit <- glm(vs ~ wt + disp, family = binomial, data = mtcars)
prob <- as.vector(predict(glmfit, newdata = mtcars, type = "response"))
output <- IAUC(prob, mtcars$vs, threshold = 0.3, testdiff = 0.3,
               hypothesis = TRUE, name = rownames(mtcars))
# Show results
print(output)
# Visualize results
plot(output)

BoShiangKe/InfluenceAUC documentation built on Nov. 4, 2024, 2:48 a.m.