selfTraining: Self-training method
In mabelc/SSC: Semi-Supervised Classification Methods

Description Usage Arguments Details Value References Examples

Self-training is a simple and effective semi-supervised learning classification method. The self-training classifier is initially trained with a reduced set of labeled examples. Then it is iteratively retrained with its own most confident predictions over the unlabeled examples. Self-training follows a wrapper methodology using a base supervised classifier to establish the possible class of unlabeled instances.

1
2
3

selfTraining(x, y, x.inst = TRUE, learner, learner.pars = NULL,
  pred = "predict", pred.pars = NULL, max.iter = 50,
  perc.full = 0.7, thr.conf = 0.5)

`x`	A object that can be coerced as matrix. This object has two possible interpretations according to the value set in the `x.inst` argument: a matrix with the training instances where each row represents a single instance or a precomputed (distance or kernel) matrix between the training examples.
`y`	A vector with the labels of the training instances. In this vector the unlabeled instances are specified with the value `NA`.
`x.inst`	A boolean value that indicates if `x` is or not an instance matrix. Default is `TRUE`.
`learner`	either a function or a string naming the function for training a supervised base classifier, using a set of instances (or optionally a distance matrix) and it's corresponding classes.
`learner.pars`	A list with additional parameters for the `learner` function if necessary. Default is `NULL`.
`pred`	either a function or a string naming the function for predicting the probabilities per classes, using the base classifier trained with the `learner` function. Default is `"predict"`.
`pred.pars`	A list with additional parameters for the `pred` function if necessary. Default is `NULL`.
`max.iter`	maximum number of iterations to execute the self-labeling process. Default is 50.
`perc.full`	A number between 0 and 1. If the percentage of new labeled examples reaches this value the self-training process is stopped. Default is 0.7.
`thr.conf`	A number between 0 and 1 that indicates the confidence threshold. At each iteration, only the newly labelled examples with a confidence greater than this value (`thr.conf`) are added to the training set.

For predicting the most accurate instances per iteration, selfTraining uses the predictions obtained with the learner specified. To train a model using the learner function, it is required a set of instances (or a precomputed matrix between the instances if x.inst parameter is FALSE) in conjunction with the corresponding classes. Additionals parameters are provided to the learner function via the learner.pars argument. The model obtained is a supervised classifier ready to predict new instances through the pred function. Using a similar idea, the additional parameters to the pred function are provided using the pred.pars argument. The pred function returns the probabilities per class for each new instance. The value of the thr.conf argument controls the confidence of instances selected to enlarge the labeled set for the next iteration.

The stopping criterion is defined through the fulfillment of one of the following criteria: the algorithm reaches the number of iterations defined in the max.iter parameter or the portion of the unlabeled set, defined in the perc.full parameter, is moved to the labeled set. In some cases, the process stops and no instances are added to the original labeled set. In this case, the user must assign a more flexible value to the thr.conf parameter.

A list object of class "selfTraining" containing:

model: The final base classifier trained using the enlarged labeled set.
instances.index: The indexes of the training instances used to train the model. These indexes include the initial labeled instances and the newly labeled instances. Those indexes are relative to x argument.
classes: The levels of y factor.
pred: The function provided in the pred argument.
pred.pars: The list provided in the pred.pars argument.

David Yarowsky.
Unsupervised word sense disambiguation rivaling supervised methods.
In Proceedings of the 33rd annual meeting on Association for Computational Linguistics, pages 189-196. Association for Computational Linguistics, 1995.

library(ssc)

## Load Wine data set
data(wine)

cls <- which(colnames(wine) == "Wine")
x <- wine[, -cls] # instances without classes
y <- wine[, cls] # the classes
x <- scale(x) # scale the attributes

## Prepare data
set.seed(20)
# Use 50% of instances for training
tra.idx <- sample(x = length(y), size = ceiling(length(y) * 0.5))
xtrain <- x[tra.idx,] # training instances
ytrain <- y[tra.idx]  # classes of training instances
# Use 70% of train instances as unlabeled set
tra.na.idx <- sample(x = length(tra.idx), size = ceiling(length(tra.idx) * 0.7))
ytrain[tra.na.idx] <- NA # remove class information of unlabeled instances

# Use the other 50% of instances for inductive testing
tst.idx <- setdiff(1:length(y), tra.idx)
xitest <- x[tst.idx,] # testing instances
yitest <- y[tst.idx] # classes of testing instances

## Example: Training from a set of instances with 1-NN as base classifier.
m1 <- selfTraining(x = xtrain, y = ytrain, 
                   learner = caret::knn3, 
                   learner.pars = list(k = 1),
                   pred = "predict")
pred1 <- predict(m1, xitest)
table(pred1, yitest)

## Example: Training from a distance matrix with 1-NN as base classifier.
dtrain <- as.matrix(proxy::dist(x = xtrain, method = "euclidean", by_rows = TRUE))
m2 <- selfTraining(x = dtrain, y = ytrain, x.inst = FALSE,
                   learner = ssc::oneNN, 
                   pred = "predict",
                   pred.pars = list(distance.weighting = "none"))
ditest <- proxy::dist(x = xitest, y = xtrain[m2$instances.index,],
                      method = "euclidean", by_rows = TRUE)
pred2 <- predict(m2, ditest)
table(pred2, yitest)

## Example: Training from a set of instances with SVM as base classifier.
learner <- e1071::svm
learner.pars <- list(type = "C-classification", kernel="radial", 
                     probability = TRUE, scale = TRUE)
pred <- function(m, x){
  r <- predict(m, x, probability = TRUE)
  prob <- attr(r, "probabilities")
  prob
}
m3 <- selfTraining(x = xtrain, y = ytrain, 
                   learner = learner, 
                   learner.pars = learner.pars, 
                   pred = pred)
pred3 <- predict(m3, xitest)
table(pred3, yitest)

## Example: Training from a set of instances with Naive-Bayes as base classifier.
m4 <- selfTraining(x = xtrain, y = ytrain, 
                   learner = function(x, y) e1071::naiveBayes(x, y), 
                   pred = "predict",
                   pred.pars = list(type = "raw"))
pred4 <- predict(m4, xitest)
table(pred4, yitest)

## Example: Training from a set of instances with C5.0 as base classifier.
m5 <- selfTraining(x = xtrain, y = ytrain, 
                   learner = C50::C5.0, 
                   pred = "predict",
                   pred.pars = list(type = "prob"))
pred5 <- predict(m5, xitest)
table(pred5, yitest)