createFoldsPu: Create training/test splits for PU data

Description

Create training/test partitions for positive/unlabeled (PU) data.
The positive data is split into k groups, while the unlabeled data is used
completely in every fold. For standard cross-validation, use createFolds
(from caret), which served as the template for this function.
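The splitting scheme described above can be sketched in a few lines of base R. This is a hypothetical illustration only: `pu_folds_sketch` is not part of oneClass, `indepUn` and `seed` are left out, and it assumes (as with the `index` argument passed to `trainOcc` in the Examples) that each list element holds the row indices used for training in that fold.

```r
# Hypothetical sketch of a PU fold split; NOT the oneClass implementation.
# Returns a list of k integer vectors with training-set row indices.
pu_folds_sketch <- function(y, k, positive) {
  pos_idx <- which(y == positive)  # labeled positive samples
  un_idx  <- which(y != positive)  # unlabeled samples
  # Randomly assign every positive sample to one of k groups
  # (assumes there are at least k positive samples).
  grp <- sample(rep_len(seq_len(k), length(pos_idx)))
  # Fold i trains on all positives outside group i plus ALL unlabeled samples;
  # the positives in group i are the held-out positives of that fold.
  lapply(seq_len(k), function(i) sort(c(pos_idx[grp != i], un_idx)))
}

# Usage with a toy outcome vector: 10 positives (1) and 30 unlabeled (0)
set.seed(1)
y <- c(rep(1, 10), rep(0, 30))
folds <- pu_folds_sketch(y, k = 5, positive = 1)
sapply(folds, function(tr) sum(y[tr] == 1))  # 8 positives in every training set
```

Every training set contains all 30 unlabeled rows, and each positive is held out in exactly one fold.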
Usage

createFoldsPu(y, k, positive = NULL, indepUn = NULL, seed = NULL)
Arguments

y         a vector of outcomes for the positive and the negative class
k         an integer giving the number of folds (applied to the positive class)
positive  the positive class in y. If NULL (the default), the label with the
          smaller frequency is assumed to be the positive class.
indepUn   optional. Either a fraction (0 < indepUn < 1) of the unlabeled samples,
          or a vector of indices of unlabeled samples, to be set aside as
          independent validation data.
seed      an integer used to set the random seed
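How the `positive` default and the two forms of `indepUn` could be interpreted is illustrated by two small helpers. Both are hypothetical (they are not exported by oneClass) and only restate the argument descriptions above; breaking frequency ties with `which.min` and rounding the fraction with `round` are assumptions.

```r
# Hypothetical helpers restating the argument semantics; NOT oneClass code.

# positive = NULL: take the label with the smaller frequency
# (ties broken by which.min; assumes both labels actually occur in y).
resolve_positive <- function(y, positive = NULL) {
  if (!is.null(positive)) return(positive)
  tab <- table(y)
  lab <- names(tab)[which.min(tab)]
  y[match(lab, as.character(y))]  # return the label with the type of y
}

# indepUn: either a fraction in (0, 1) of the unlabeled samples, drawn at
# random, or a vector of row indices that must point to unlabeled samples.
resolve_indep_un <- function(y, positive, indepUn = NULL) {
  un_idx <- which(y != positive)
  if (is.null(indepUn)) return(integer(0))
  if (length(indepUn) == 1 && indepUn > 0 && indepUn < 1) {
    n_val <- round(indepUn * length(un_idx))
    # index via sample.int so a single unlabeled sample is handled correctly
    return(sort(un_idx[sample.int(length(un_idx), n_val)]))
  }
  stopifnot(all(indepUn %in% un_idx))
  sort(as.integer(indepUn))
}

y <- c(rep(1, 10), rep(0, 30))
pos <- resolve_positive(y)                      # 1, the rarer label
set.seed(42)
val <- resolve_indep_un(y, pos, indepUn = 0.5)  # 15 random unlabeled rows
```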
See Also

createMultiFoldsPu, createFolds
Examples

## Not run:
## a synthetic data set
data(bananas)
## create a PU-adapted partition:
## leave-one-out with the positive training samples
## independent training/validation sets with unlabeled samples
## note that the validation samples will not be included in the final model.
idx <- createFoldsPu(bananas$tr$y, 20, positive = 1,
                     indepUn = which(bananas$tr$y == 0)[1:250])
fit <- trainOcc(x = bananas$tr[, -1], y = bananas$tr[, 1],
                index = idx)
pred <- predict(fit, bananas$x)
### compare the TPR estimated from training/test data
hist(fit, pred, ylim=c(0, .25))
hop.pos <- holdOutPredictions(fit)$pos
### compare the TPR derived from train and test data.
lines( quantile(hop.pos, seq(0, 1, 0.1)),
seq(0, 1, 0.1)*.25 )
lines( quantile(pred[bananas$y[]==1], seq(0, 1, 0.1)),
seq(0, 1, 0.1)*.25, col="red")
featurespace(fit)
### note: final model fitted without the unlabeled validation data
### specified above by indepUn=which(bananas$tr$y==0)[1:250]
rownames(fit$trainingData)
### note: you might want to aggregate the predictions first and then
### calculate the performance metric based on the aggregated hold-out predictions
fit.up <- update(fit, aggregatePredictions=TRUE)
plot(fit.up$results$puF, fit.up$results$puFAP)
fit.up$results[which.max(fit.up$results$puF), ]
fit.up$results[which.max(fit.up$results$puFAP), ]
featurespace(fit)
colnames(fit.up$results)
## End(Not run)
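The note about aggregating predictions can be made concrete. With leave-one-out on the positives, each fold has a single held-out positive, so a per-fold recall is either 0 or 1 and any per-fold PU metric swings wildly. The sketch contrasts averaging a per-fold metric with computing it once on pooled hold-out predictions. It assumes a Lee-and-Liu-style criterion, recall^2 / P(predicted positive), with P(predicted positive) estimated on the unlabeled validation samples and a decision threshold of 0; whether this matches the puF and puFAP columns of oneClass exactly is an assumption, and all decision values below are made up.

```r
# Assumed PU criterion: recall^2 / P(predicted positive), threshold 0.
# Stand-in only; not verified to equal oneClass's puF.
pu_f <- function(pred_pos, pred_un) {
  recall   <- mean(pred_pos > 0)  # share of held-out positives predicted positive
  prob_pos <- mean(pred_un > 0)   # share of unlabeled samples predicted positive
  recall^2 / prob_pos
}

# Made-up hold-out decision values: 4 folds, one held-out positive per fold
# (leave-one-out on the positives), plus predictions of each fold's model
# for the same 5 unlabeled validation samples.
folds <- list(
  list(pos =  0.8, un = c(-1.0, -0.5, 0.2, -0.3, -2.0)),
  list(pos = -0.1, un = c(-1.0, -0.4, 0.1, -0.2, -2.0)),
  list(pos =  0.5, un = c(-0.9, -0.6, 0.3, -0.1, -1.5)),
  list(pos =  1.2, un = c(-1.1,  0.4, 0.2, -0.3, -2.0))
)

# Option 1: compute the metric per fold, then average
per_fold <- sapply(folds, function(f) pu_f(f$pos, f$un))
averaged <- mean(per_fold)

# Option 2: pool all hold-out predictions first, then compute the metric once
aggregated <- pu_f(unlist(lapply(folds, `[[`, "pos")),
                   unlist(lapply(folds, `[[`, "un")))
```

With these made-up values the per-fold scores are 5, 0, 5 and 2.5 (averaged: 3.125), while the pooled estimate is 2.25; a single misclassified positive drives one fold to 0.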