Self-training is a simple and effective semi-supervised learning classification method. The self-training classifier is initially trained with a reduced set of labeled examples. Then it is iteratively retrained with its own most confident predictions over the unlabeled examples. Self-training follows a wrapper methodology using a base supervised classifier to establish the possible class of unlabeled instances.
selfTraining(learner, max.iter = 50, perc.full = 0.7, thr.conf = 0.5)
A model from the parsnip package used to train the supervised base classifier on a set of instances. The model needs to provide probability predictions (or, optionally, a distance matrix) and the corresponding classes.
Maximum number of iterations of the self-labeling process. Default is 50.
A number between 0 and 1. If the proportion of newly labeled examples reaches this value, the self-training process stops. Default is 0.7.
A number between 0 and 1 that indicates the confidence threshold. At each iteration, only the newly labeled examples with a confidence greater than this value are selected to enlarge the labeled set. Default is 0.5.
To select the most confidently classified instances in each iteration, selfTraining uses the predictions obtained with the specified learner. Training a model with the learner function requires a set of instances (or a precomputed matrix between the instances, depending on the x.inst parameter) together with the corresponding classes. Additional parameters are passed to the learner function via the learner.pars argument. The resulting model is a supervised classifier ready to predict new instances through the pred function. Similarly, additional parameters for the pred function are passed via the pred.pars argument. The pred function returns the probabilities per class for each new instance. The value of the thr.conf argument controls the confidence of the instances selected to enlarge the labeled set for the next iteration.
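The selection step described above can be illustrated with a small base R sketch. It is not the package's internal code: the probability matrix `prob` and its values are made up for illustration, and the confidence of a prediction is assumed to be the highest class probability in its row.

```r
# Hypothetical class-probability matrix for four unlabeled instances
# (rows) and three classes (columns); the values are invented.
prob <- matrix(
  c(0.90, 0.05, 0.05,
    0.40, 0.35, 0.25,
    0.10, 0.75, 0.15,
    0.34, 0.33, 0.33),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("A", "B", "C"))
)

thr.conf <- 0.5

# Assumed confidence measure: the highest class probability in each row.
confidence <- apply(prob, 1, max)

# Predicted class: the column holding that highest probability.
predicted <- colnames(prob)[max.col(prob, ties.method = "first")]

# Keep only the predictions whose confidence is greater than thr.conf.
selected <- which(confidence > thr.conf)
new.labels <- predicted[selected]
```

With these values, only the first and third instances pass the threshold, so a lower `thr.conf` would be needed to label the other two.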
The process stops when one of the following criteria is met: the algorithm reaches the number of iterations defined in the max.iter parameter, or the portion of the unlabeled set defined in the perc.full parameter has been moved to the labeled set. In some cases, the process stops without adding any instances to the original labeled set. In that case, the user must assign a more flexible value to the thr.conf parameter.
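The interplay of the stopping criteria can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the SSLR implementation: `fit_centroids`, `predict_prob`, and `self_train` are hypothetical helpers, the base learner is a toy nearest-centroid classifier, and every confident prediction is added in each iteration.

```r
# Hypothetical base learner: one centroid (row of feature means) per class.
fit_centroids <- function(x, y) {
  t(sapply(levels(y), function(cl) colMeans(x[y == cl, , drop = FALSE])))
}

# Class probabilities as a softmax over negative squared distances.
predict_prob <- function(centroids, x) {
  d <- apply(centroids, 1, function(ctr) rowSums(sweep(x, 2, ctr)^2))
  d <- matrix(d, nrow = nrow(x), dimnames = list(NULL, rownames(centroids)))
  p <- exp(-(d - apply(d, 1, min)))  # shift by the row minimum for stability
  p / rowSums(p)
}

# Simplified self-training loop; unlabeled instances have y == NA.
self_train <- function(x, y, max.iter = 50, perc.full = 0.7, thr.conf = 0.5) {
  n.unlabeled <- sum(is.na(y))
  added <- 0
  iterations <- 0
  for (iter in seq_len(max.iter)) {          # criterion 1: max.iter
    unlabeled <- which(is.na(y))
    # Criterion 2: the requested share of the unlabeled set has been labeled.
    if (length(unlabeled) == 0 || added / n.unlabeled >= perc.full) break
    iterations <- iterations + 1
    centroids <- fit_centroids(x[-unlabeled, , drop = FALSE], y[-unlabeled])
    prob <- predict_prob(centroids, x[unlabeled, , drop = FALSE])
    keep <- apply(prob, 1, max) > thr.conf
    # No confident prediction: the loop ends without adding instances.
    if (!any(keep)) break
    pred <- colnames(prob)[max.col(prob, ties.method = "first")]
    y[unlabeled[keep]] <- pred[keep]
    added <- added + sum(keep)
  }
  list(y = y, added = added, iterations = iterations)
}

# Usage on iris with only 15 labeled instances (5 per class).
x <- as.matrix(iris[, 1:4])
y <- iris$Species
labeled <- c(1:5, 51:55, 101:105)
y[-labeled] <- NA
res <- self_train(x, y, max.iter = 10, perc.full = 0.7, thr.conf = 0.5)
```

If `res$added` is 0, no prediction exceeded `thr.conf` in the first iteration, which is the situation where a more flexible threshold is required.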
(When the model is fitted) A list object of class "selfTraining" containing:
The final base classifier trained using the enlarged labeled set.
The indexes of the training instances used to train the model. These indexes include the initial labeled instances and the newly labeled instances.
Those indexes are relative to
The levels of
The function provided in the
The list provided in the
David Yarowsky. Unsupervised word sense disambiguation rivaling supervised methods. In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics, pages 189-196. Association for Computational Linguistics, 1995.
library(tidyverse)
library(tidymodels)
library(caret)
library(SSLR)

data(wine)

set.seed(1)
train.index <- createDataPartition(wine$Wine, p = .7, list = FALSE)
train <- wine[ train.index,]
test <- wine[-train.index,]

cls <- which(colnames(wine) == "Wine")

#% LABELED
labeled.index <- createDataPartition(train$Wine, p = .2, list = FALSE)
train[-labeled.index,cls] <- NA

#We need a model with probability predictions from parsnip
#https://tidymodels.github.io/parsnip/articles/articles/Models.html
#It should be with mode = classification

#For example, with Random Forest
rf <- rand_forest(trees = 100, mode = "classification") %>%
  set_engine("randomForest")

m <- selfTraining(learner = rf,
                  perc.full = 0.7,
                  thr.conf = 0.5,
                  max.iter = 10) %>%
  fit(Wine ~ ., data = train)

#Accuracy
predict(m,test) %>%
  bind_cols(test) %>%
  metrics(truth = "Wine", estimate = .pred_class)