SMM: Sample Mean Matching
In mlquantify: Algorithms for Class Distribution Estimation


SMM is a member of the DyS framework that represents the score distributions of the positive, negative, and unlabelled sets by their simple mean scores. Because each distribution is reduced to a single mean, the class distribution is given by a closed-form equation.
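The closed-form estimate can be illustrated with a minimal sketch. It assumes that the mean test score is a mixture of the two class means, weighted by the unknown positive proportion, and solves for that proportion. The helper name `smm_sketch`, the clipping to [0, 1], and the names of the returned vector are assumptions made for illustration; this is not the package's own source code.

```r
# Hypothetical helper, not part of mlquantify: a sketch of the closed-form
# mean-matching estimate, assuming the test mean is a mixture of class means.
smm_sketch <- function(p.score, n.score, test) {
  mu_p <- mean(p.score)  # mean score of the positive validation instances
  mu_n <- mean(n.score)  # mean score of the negative validation instances
  mu_t <- mean(test)     # mean score of the unlabelled test instances

  # Solve mu_t = alpha * mu_p + (1 - alpha) * mu_n for alpha
  alpha <- (mu_t - mu_n) / (mu_p - mu_n)

  # Clip to [0, 1] so the result is a valid proportion (assumption of this sketch)
  alpha <- min(max(alpha, 0), 1)

  c(positive = alpha, negative = 1 - alpha)
}

# Synthetic scores, illustrative values only
p  <- c(0.9, 0.8, 0.7)       # positive scores, mean 0.8
n  <- c(0.1, 0.2, 0.3)       # negative scores, mean 0.2
te <- c(0.8, 0.8, 0.2, 0.8)  # test scores, mean 0.65
smm_sketch(p, n, te)
```

With these synthetic values the test mean lies three quarters of the way from the negative mean to the positive mean, so the estimated positive proportion is about 0.75.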

SMM(p.score, n.score, test)

`p.score`	a numeric `vector` of positive scores estimated either from a validation set or from a cross-validation method.
`n.score`	a numeric `vector` of negative scores estimated either from a validation set or from a cross-validation method.
`test`	a numeric `vector` containing the score estimated for the positive class from each test set instance.

A numeric vector containing the class distribution estimated from the test set.

Hassan, W., Maletzke, A., Batista, G. (2020). Accurately Quantifying a Billion Instances per Second. In IEEE International Conference on Data Science and Advanced Analytics (DSAA).

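library(mlquantify)
library(randomForest)
library(caret)

# Split the data into training, validation, and test folds
cv <- createFolds(aeAegypti$class, 3)
tr <- aeAegypti[cv$Fold1, ]
validation <- aeAegypti[cv$Fold2, ]
ts <- aeAegypti[cv$Fold3, ]

# Draw a test sample with 80 positive (class 1) and 20 negative (class 2) instances
ts_sample <- rbind(ts[sample(which(ts$class == 1), 80), ],
                   ts[sample(which(ts$class == 2), 20), ])

# Train the scorer, then score the validation set and the test sample
scorer <- randomForest(class ~ ., data = tr, ntree = 500)
# Columns 1 and 2 hold the class probabilities; column 3 holds the true class
scores <- cbind(predict(scorer, validation, type = "prob"), validation$class)
test.scores <- predict(scorer, ts_sample, type = "prob")

# Estimate the class distribution from the positive-class scores (column 1)
SMM(p.score = scores[scores[, 3] == 1, 1],
    n.score = scores[scores[, 3] == 2, 1],
    test = test.scores[, 1])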