puSummary: puSummary
In benmack/oneClass: One-class classification in the absence of test data


This function computes PU-performance metrics useful for model selection. NOTE: The metrics are subject to uncertainty due to the nature of PU-data, and the user should check the plausibility of any automatic selection.

puSummary(data, lev = NULL, model = NULL, calcAUC = TRUE)

`data`	a data frame or matrix with columns `obs`, `pred`, and `pos`. The first two are the binary observed and predicted outcomes; the last is the continuous outcome for the positive class.
`lev`	a character vector of factor levels for the response (default is `NULL`).
`model`	a character string for the model name, as taken from the `method` argument of `train` (default is `NULL`).

Calculates performance metrics based on positive and unlabeled data.

The following metrics are calculated and stored:

tpr true positive rate (from P-data)
ppp probability of positive prediction (from U-data)
puAuc the area under the ROC curve (from PU-data), see Phillips et al. (2008)
puF (tpr^2)/ppp, see Liu et al. (2003)
negD01 -sqrt( (1-tpr)^2 + ppp^2 ), i.e. the negative distance between the point c(0,1) and the point c(ppp, tpr)
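
As a rough illustration of the definitions above, the metrics can be reproduced in a few lines of base R. This is a minimal sketch and not the package's implementation: `pu_metrics_sketch` and `pu_auc_sketch` are hypothetical helpers, and the rank-based AUC formula (with the unlabeled samples treated as negatives) is only an assumption about how puAuc could be computed.

```r
# Hypothetical helper (not part of oneClass): threshold-based PU metrics.
# pred_P: logical predictions for the labeled positive samples
# pred_U: logical predictions for the unlabeled samples
pu_metrics_sketch <- function(pred_P, pred_U) {
  tpr <- mean(pred_P)  # fraction of positives predicted as positive
  ppp <- mean(pred_U)  # fraction of unlabeled predicted as positive
  c(tpr    = tpr,
    ppp    = ppp,
    puF    = tpr^2 / ppp,
    negD01 = -sqrt((1 - tpr)^2 + ppp^2))
}

# Hypothetical helper (not part of oneClass): rank-based AUC
# (Mann-Whitney statistic), treating the unlabeled samples as negatives.
pu_auc_sketch <- function(pos_P, pos_U) {
  n_P <- length(pos_P)
  n_U <- length(pos_U)
  r <- rank(c(pos_P, pos_U))  # ties receive average ranks
  (sum(r[seq_len(n_P)]) - n_P * (n_P + 1) / 2) / (n_P * n_U)
}

# Toy predictions, made up for illustration
pred_P <- c(TRUE, TRUE, TRUE, FALSE)
pred_U <- c(TRUE, FALSE, FALSE, FALSE, TRUE,
            FALSE, FALSE, FALSE, FALSE, FALSE)
m <- pu_metrics_sketch(pred_P, pred_U)

# Toy continuous outcomes, made up for illustration
auc <- pu_auc_sketch(pos_P = c(3, 2), pos_U = c(1, 2.5, 0))
```

Higher values are better for puF, negD01 and the AUC sketch, which is why the distance to the ideal point c(0,1) is negated.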

From the labeled positive data, the true positive rate (TPR), and thus the false negative rate (FNR), can be estimated directly. Unfortunately, it is not possible to estimate the true negative rate (TNR), and thus neither the false positive rate (FPR). However, we can estimate the probability of positive prediction (PPP), i.e. the fraction of unlabeled samples that the model predicts as positive. The TPR and the PPP are useful and intuitive quantities. Assume that i) the TPR can be estimated accurately and ii) we need to select one of several candidate models which all exhibit the same TPR. We can then select the model with the lowest PPP: at a fixed TPR, the positives hidden in the unlabeled data contribute equally to the PPP of every candidate, so a lower PPP implies a higher TNR (and lower FPR). See plot_PPPvsTPR.
Furthermore, the puAuc
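
The selection rule for candidates with equal TPR can be sketched as follows. The candidate table and the positive fraction `p` of the unlabeled data are made up for illustration; in practice `p` is unknown, and it is used here only to show that, under the assumption PPP = p * TPR + (1 - p) * FPR, the lowest PPP corresponds to the lowest FPR.

```r
# Hypothetical candidate models with identical TPR (illustrative numbers)
candidates <- data.frame(
  model = c("A", "B", "C"),
  tpr   = c(0.90, 0.90, 0.90),
  ppp   = c(0.45, 0.30, 0.38),
  stringsAsFactors = FALSE
)

# The rule only applies when all candidates share the same TPR
stopifnot(length(unique(candidates$tpr)) == 1)

# Select the candidate with the lowest PPP
best <- candidates[which.min(candidates$ppp), ]

# Assumed (normally unknown) fraction of positives in the unlabeled data
p <- 0.2
# FPR implied by the mixture assumption PPP = p * TPR + (1 - p) * FPR
candidates$fpr <- (candidates$ppp - p * candidates$tpr) / (1 - p)
```

Because `p` and the TPR are the same for every candidate, the ordering by PPP and the ordering by implied FPR coincide, whatever value `p` takes below 1.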

A vector of performance estimates.

Phillips, Steven J. and Dudík, Miroslav (2008): Modeling of species distributions with Maxent: new extensions and a comprehensive evaluation. Ecography, 31, 2.
Liu, Bing and Dai, Yang and Li, Xiaoli and Lee, Wee Sun and Yu, Philip S. (2003): Building text classifiers using positive and unlabeled examples. In: Intl. Conf. on Data Mining, 2003.

postResample

 ## Not run: 
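### create a PU-data example of random numbers
library( oneClass )  # provides puFactor() and puSummary()
set.seed( 1 )        # make the random example reproducible
P <- rnorm( 30, 2 )                    # 30 labeled positive samples
U <- c( rnorm( 80 ), rnorm( 20, 2 ) )  # 100 unlabeled samples: 80 negatives, 20 positives
d <- data.frame( obs = puFactor( rep( c( 1, 0 ), c( 30, 100 ) ), positive = 1 ), 
                 pred = puFactor( c(P, U)>0, positive = TRUE ),
                 pos = c( P, U ) )
puSummary(d)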

## End(Not run)