createFoldsPu: createFoldsPu
In benmack/oneClass: One-class classification in the absence of test data


Create training/test partitions for positive-unlabeled (PU) data. The positive data is split into k groups, while the unlabeled data is used in full in every fold. For standard cross-validation, use createFolds, which served as the template for this function.
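The partitioning idea described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper name `pu_folds_sketch`, the fold names, and the choice to return training indices (as a list suitable for an `index`-style argument) are all assumptions.

```r
# Hypothetical helper, not part of oneClass: illustrates the PU partition idea.
pu_folds_sketch <- function(y, k, positive, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  idx_pos <- which(y == positive)   # positive samples
  idx_un  <- which(y != positive)   # unlabeled samples
  # Randomly assign each positive sample to one of k groups of near-equal size.
  grp <- sample(rep(seq_len(k), length.out = length(idx_pos)))
  # Fold i trains on the positives outside group i plus every unlabeled sample.
  folds <- lapply(seq_len(k), function(i) sort(c(idx_pos[grp != i], idx_un)))
  names(folds) <- sprintf("Fold%02d", seq_len(k))
  folds
}

# Toy labels: 6 positives (1) and 10 unlabeled samples (0).
y <- c(rep(1, 6), rep(0, 10))
folds <- pu_folds_sketch(y, k = 3, positive = 1, seed = 42)
lengths(folds)
```

In this sketch each fold holds out a different third of the positives, while all ten unlabeled indices appear in every fold.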

createFoldsPu(y, k, positive = NULL, indepUn = NULL, seed = NULL)

`y`	a vector of outcomes for the positive and the negative class
`k`	an integer for the number of folds (applied to the positive class)
`positive`	the positive class in y. If NULL, the label with the smaller frequency is assumed to be the positive class.
`indepUn`	optional; either a fraction (0 < indepUn < 1) of the unlabeled samples, or a vector of indices of the unlabeled samples, to be used for validation.
`seed`	an integer used to set the random seed
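The two argument conventions above (the default for `positive` and the two accepted forms of `indepUn`) could be resolved roughly as follows. This is a sketch under stated assumptions: `guess_positive` and `resolve_indep_un` are hypothetical names, and the rounding and random sampling used for the fraction case are illustrative choices, not confirmed package behavior.

```r
# Hypothetical helpers for illustration only; not the oneClass implementation.

# Pick the label with the smallest frequency as the positive class.
guess_positive <- function(y) {
  tab <- table(y)
  names(tab)[which.min(tab)]
}

# Turn indepUn (a fraction or an index vector) into validation indices.
resolve_indep_un <- function(y, positive, indepUn, seed = NULL) {
  idx_un <- which(y != positive)
  if (length(indepUn) == 1 && indepUn > 0 && indepUn < 1) {
    # Fraction: draw that share of the unlabeled samples at random (assumed).
    if (!is.null(seed)) set.seed(seed)
    n_val <- round(indepUn * length(idx_un))
    return(sort(idx_un[sample.int(length(idx_un), n_val)]))
  }
  # Index vector: every index must point at an unlabeled sample.
  stopifnot(all(indepUn %in% idx_un))
  sort(indepUn)
}

y <- c(rep(1, 5), rep(0, 20))
pos <- guess_positive(y)
val_frac <- resolve_indep_un(y, pos, indepUn = 0.25, seed = 1)
val_idx  <- resolve_indep_un(y, pos, indepUn = 6:10)
```

Either way, the result is a set of unlabeled indices reserved for validation.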

See also: createMultiFoldsPu, createFolds

## Not run: 

## a synthetic data set
data(bananas)

## create a pu-adapted partition:
## leave-one-out with the positive training samples
## independent training/validation sets with unlabeled samples
## note that the validation samples will not be included in the final model. 

idx <- createFoldsPu(bananas$tr$y, 20, positive=1, 
           indepUn=which(bananas$tr$y==0)[1:250]) 
fit <- trainOcc(x=bananas$tr[, -1], y=bananas$tr[, 1], 
                index=idx)
pred <- predict(fit, bananas$x)

### compare the TPR estimated from training/test data
hist(fit, pred, ylim=c(0, .25))
hop.pos <- holdOutPredictions(fit)$pos 
### compare the TPR derived from train and test data.
lines( quantile(hop.pos, seq(0, 1, 0.1)), 
       seq(0, 1, 0.1)*.25 )
lines( quantile(pred[bananas$y[]==1], seq(0, 1, 0.1)), 
       seq(0, 1, 0.1)*.25, col="red")
featurespace(fit)
### note: final model fitted without the unlabeled validation data
### specified above by indepUn=which(bananas$tr$y==0)[1:250]
rownames(fit$trainingData)
### note: you might want to aggregate the predictions first and then
### calculate the performance metric based on aggregated hold-out predictions
fit.up <- update(fit, aggregatePredictions=TRUE)
plot(fit.up$results$puF, fit.up$results$puFAP)
fit.up$results[which.max(fit.up$results$puF), ]
fit.up$results[which.max(fit.up$results$puFAP), ]
featurespace(fit)
colnames(fit.up$results)  

## End(Not run)