createFoldsPu: Create training/test partitions for PU data

Description

Creates training/test partitions for positive-unlabeled (PU) data. The positive data is split into k groups, while the unlabeled data is used in full in every fold. For standard cross-validation, use createFolds, which served as the template for this function.
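
The partitioning idea described above can be sketched in base R. This is an illustration only, not the package's implementation: `sketch_folds_pu` is a hypothetical helper, and it assumes that each fold is returned as a vector of training indices containing the positives outside one group plus all unlabeled samples. The actual return structure of createFoldsPu may differ.

```r
# Hypothetical helper, not part of oneClass: illustrates the partition idea only.
sketch_folds_pu <- function(y, k, positive, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  idx_pos <- which(y == positive)   # positive samples
  idx_un  <- which(y != positive)   # unlabeled samples
  stopifnot(k >= 2, k <= length(idx_pos))
  # Assign each positive sample to one of k groups of near-equal size.
  group <- sample(rep(seq_len(k), length.out = length(idx_pos)))
  # Assumed training indices of fold i: positives outside group i plus all unlabeled.
  folds <- lapply(seq_len(k), function(i) sort(c(idx_pos[group != i], idx_un)))
  names(folds) <- sprintf("Fold%02d", seq_len(k))
  folds
}

# Toy labels: 10 positives (1) followed by 40 unlabeled samples (0).
y <- c(rep(1, 10), rep(0, 40))
folds <- sketch_folds_pu(y, k = 5, positive = 1, seed = 42)
lengths(folds)
```

In this toy case every fold holds 8 of the 10 positives and all 40 unlabeled samples, so only the positive part changes from fold to fold.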

Usage

createFoldsPu(y, k, positive = NULL, indepUn = NULL, seed = NULL)

Arguments

y

a vector of outcomes for the positive and the negative class

k

an integer for the number of folds (applied to the positive class)

positive

the positive class in y. If NULL (the default), the label with the lower frequency is assumed to be the positive class.

indepUn

optional; either a fraction (0 < indepUn < 1) of the unlabeled samples, or a vector of indices of the unlabeled samples, to be used for validation.

seed

an integer used to set the random seed
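
As a rough illustration of the argument semantics above, the following base-R sketch shows one way the defaults could be resolved. Both helpers (`guess_positive`, `resolve_indep_un`) are hypothetical and not part of oneClass; in particular, rounding the fraction to a sample count is an assumption made here for the example.

```r
# Hypothetical helpers, not part of oneClass.

# Default for `positive`: the label with the lower frequency in y.
guess_positive <- function(y) {
  names(which.min(table(y)))
}

# Resolve `indepUn` to indices of unlabeled validation samples.
resolve_indep_un <- function(y, positive, indepUn, seed = NULL) {
  idx_un <- which(y != positive)
  if (length(indepUn) == 1 && indepUn > 0 && indepUn < 1) {
    # A fraction: draw that share of the unlabeled samples at random.
    if (!is.null(seed)) set.seed(seed)
    n_val <- round(indepUn * length(idx_un))  # rounding is an assumption
    return(sort(idx_un[sample.int(length(idx_un), n_val)]))
  }
  # Otherwise: a vector of indices, which must point at unlabeled samples.
  stopifnot(all(indepUn %in% idx_un))
  sort(indepUn)
}

# Toy labels: 10 positives (1) followed by 40 unlabeled samples (0).
y <- c(rep(1, 10), rep(0, 40))
positive <- guess_positive(y)
val_frac <- resolve_indep_un(y, positive, indepUn = 0.25, seed = 1)
val_idx  <- resolve_indep_un(y, positive, indepUn = which(y == 0)[1:10])
```

Note that `guess_positive` returns the label as a character string, because it is taken from the names of the frequency table.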

See Also

createMultiFoldsPu, createFolds

Examples

## Not run: 

## a synthetic data set
data(bananas)

## create a PU-adapted partition:
## leave-one-out with the positive training samples
## independent training/validation sets with unlabeled samples
## note that the validation samples will not be included in the final model. 

idx <- createFoldsPu(bananas$tr$y, 20, positive=1, 
           indepUn=which(bananas$tr$y==0)[1:250]) 
fit <- trainOcc(x=bananas$tr[, -1], y=bananas$tr[, 1], 
                index=idx)
pred <- predict(fit, bananas$x)

### compare the TPR estimated from training/test data
hist(fit, pred, ylim=c(0, .25))
hop.pos <- holdOutPredictions(fit)$pos 
### compare the TPR derived from train and test data.
lines( quantile(hop.pos, seq(0, 1, 0.1)), 
       seq(0, 1, 0.1)*.25 )
lines( quantile(pred[bananas$y[]==1], seq(0, 1, 0.1)), 
       seq(0, 1, 0.1)*.25, col="red")
featurespace(fit)
### note: final model fitted without the unlabeled validation data
### specified above by indepUn=which(bananas$tr$y==0)[1:250]
rownames(fit$trainingData)
### note: you might want to aggregate the predictions first and then
### calculate the performance metric based on aggregated hold-out predictions
fit.up <- update(fit, aggregatePredictions=TRUE)
plot(fit.up$results$puF, fit.up$results$puFAP)
fit.up$results[which.max(fit.up$results$puF), ]
fit.up$results[which.max(fit.up$results$puFAP), ]
featurespace(fit)
colnames(fit.up$results)  

## End(Not run)

benmack/oneClass documentation built on Dec. 15, 2020, 7:38 p.m.