puSummary: puSummary

View source: R/puSummary.R

Description

This function computes PU-performance metrics useful for model selection. NOTE: The metrics are subject to uncertainty due to the nature of PU-data, and the user should check the plausibility of any automatic selection.

Usage

puSummary(data, lev = NULL, model = NULL, calcAUC = TRUE)

Arguments

data

a data frame or matrix with columns obs, pred, and pos. The first two are the binary observed and predicted outcomes; the last is the continuous outcome for the positive class.

lev

a character vector of factor levels for the response (default is NULL).

model

a character string for the model name, as taken from the method argument of train (default is NULL).

Details

Calculates performance metrics based on positive and unlabeled data.

The following metrics are calculated and stored:

From the labeled positive data, the true positive rate (TPR), and thus the false negative rate (FNR), can be estimated directly. Unfortunately, it is not possible to estimate the true negative rate (TNR), and therefore not the false positive rate (FPR) either. However, we can estimate the probability of positive prediction (PPP), i.e. the fraction of unlabeled samples that the model predicts as positive.

The TPR and the PPP are useful and intuitive quantities. Assume that i) the TPR can be estimated accurately and ii) we need to select one of several candidate models which all exhibit the same TPR. We can then select the model with the lowest PPP: the unlabeled samples are a mixture of positives and negatives, so their PPP is a weighted average of the TPR and the FPR, and for a given TPR (and FNR) a lower PPP implies a lower FPR (and higher TNR). See plot_PPPvsTPR.
Furthermore, the puAuc
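
The TPR and PPP estimates described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper name tpr_ppp and the class labels "pos" and "un" are assumptions made for the example.

```r
# Hypothetical helper (not part of oneClass): estimate TPR and PPP
# from observed PU labels and binary predictions.
# The labels "pos" and "un" are assumed for this sketch.
tpr_ppp <- function(obs, pred, positive = "pos", unlabeled = "un") {
  pred_pos <- pred == positive
  c(
    # TPR: fraction of labeled positives predicted as positive
    tpr = mean(pred_pos[obs == positive]),
    # PPP: fraction of unlabeled samples predicted as positive
    ppp = mean(pred_pos[obs == unlabeled])
  )
}

# Toy data: 4 labeled positives followed by 6 unlabeled samples
obs  <- c(rep("pos", 4), rep("un", 6))
pred <- c("pos", "pos", "pos", "un",
          "pos", "un", "un", "un", "un", "pos")

est <- tpr_ppp(obs, pred)
print(est)
```

Among candidate models with a similar tpr, the one with the smaller ppp would be preferred under the reasoning given above.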

Value

A vector of performance estimates.

References

Phillips, Steven J. and Dudík, Miroslav (2008): Modeling of species distributions with Maxent: new extensions and a comprehensive evaluation. Ecography, 31, 2.
Liu, Bing and Dai, Yang and Li, Xiaoli and Lee, Wee Sun and Yu, Philip S. (2003): Building text classifiers using positive and unlabeled examples. In: Intl. Conf. on Data Mining, 2003.

See Also

postResample

Examples

 ## Not run: 
### create a PU-data example of random numbers
P <- rnorm( 30, 2 )
U <- c( rnorm( 80 ), rnorm( 20, 2 ) )
d <- data.frame( obs = puFactor( rep( c( 1, 0 ), c( 30, 100 ) ), positive = 1 ), 
                 pred = puFactor( c(P, U)>0, positive = TRUE ),
                 pos = c( P, U ) )
puSummary(d)

## End(Not run)

benmack/oneClass documentation built on Dec. 15, 2020, 7:38 p.m.