Description
Co-Training by Committee (CoBC) is a semi-supervised learning algorithm with a
co-training style. The algorithm trains N classifiers with the learning scheme
defined in the learner argument, using a reduced set of labeled examples. In each
iteration, an unlabeled example is labeled for a classifier if the most confident
classifications assigned by the other N-1 classifiers agree on the proposed label.
The candidate unlabeled examples are selected randomly from a pool of size u.
Usage

coBC(x, y, x.inst = TRUE, learner, learner.pars = NULL, pred = "predict",
     pred.pars = NULL, N = 3, perc.full = 0.7, u = 100, max.iter = 50)
Arguments

x
  An object that can be coerced to a matrix. This object has two possible
  interpretations according to the value set in the x.inst argument: a matrix with
  the training instances, where each row represents a single instance, or a
  precomputed distance matrix between the training examples.
y
  A vector with the labels of the training instances. In this vector the unlabeled
  instances are specified with the value NA.
x.inst
  A boolean value that indicates whether x is a matrix of instances (TRUE) or a
  precomputed distance matrix (FALSE). Default is TRUE.
learner
  Either a function or a string naming the function for training a supervised base
  classifier, using a set of instances (or optionally a distance matrix) and its
  corresponding classes.
learner.pars
  A list with additional parameters for the learner function, if necessary.
  Default is NULL.
pred
  Either a function or a string naming the function for predicting the
  probabilities per class, using the base classifiers trained with the learner
  function. Default is "predict".
pred.pars
  A list with additional parameters for the pred function, if necessary.
  Default is NULL.
N
  The number of classifiers used as committee members. All these classifiers are
  trained using the learner function. Default is 3.
perc.full
  A number between 0 and 1. If the percentage of new labeled examples reaches this
  value, the self-labeling process is stopped. Default is 0.7.
u
  Number of unlabeled instances in the pool. Default is 100.
max.iter
  Maximum number of iterations to execute in the self-labeling process.
  Default is 50.
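For instance, a minimal call sketching how the two parameter lists are forwarded;
the iris-based split below is illustrative only and not part of the package examples:

library(ssc)
x <- scale(iris[, -5])            # instance matrix
y <- iris$Species
set.seed(3)
y[sample(length(y), 100)] <- NA   # mark most instances as unlabeled

m <- coBC(x = x, y = y,
          learner = caret::knn3,
          learner.pars = list(k = 3),       # forwarded to caret::knn3
          pred = "predict",
          pred.pars = list(type = "prob"))  # forwarded to the predict call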
Details

This method trains an ensemble of diverse classifiers. To promote the initial
diversity, the classifiers are trained from the reduced set of labeled examples by
bagging. The self-labeling process stops when one of the following criteria is met:
the algorithm reaches the number of iterations defined in the max.iter parameter, or
the portion of the unlabeled set defined in the perc.full parameter has been moved to
the enlarged labeled set of the classifiers.
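As a rough illustration, the following conceptual sketch shows one self-labeling
iteration. It is not the package's implementation: committee (a list of fitted
classifiers) and predict_prob (a function returning a matrix of class probabilities
with class names as columns) are hypothetical stand-ins.

cobc_step <- function(x, y, committee, predict_prob, u = 100) {
  unlabeled <- which(is.na(y))
  # Draw a random pool of candidate examples from the unlabeled set.
  pool <- sample(unlabeled, size = min(u, length(unlabeled)))
  for (i in seq_along(committee)) {
    others <- committee[-i]  # the other N-1 committee members
    # Average the class probabilities assigned by the other members.
    probs <- Reduce(`+`, lapply(others, function(m)
      predict_prob(m, x[pool, , drop = FALSE]))) / length(others)
    conf <- apply(probs, 1, max)  # committee confidence per candidate
    best <- which.max(conf)       # most confident candidate
    # Assign the agreed label; classifier i would then be retrained
    # on its enlarged labeled set.
    y[pool[best]] <- colnames(probs)[which.max(probs[best, ])]
  }
  y
}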
Value

A list object of class "coBC" containing:

model
  The final N base classifiers trained using the enlarged labeled set.
model.index
  List of N vectors of indexes related to the training instances used per each
  classifier. These indexes are relative to the y argument.
instances.index
  The indexes of all training instances used to train the N models. These indexes
  include the initial labeled instances and the newly labeled instances. These
  indexes are relative to the y argument.
model.index.map
  List of N vectors with the same information as model.index, but with indexes
  relative to the instances.index vector.
classes
  The levels of the y factor.
pred
  The function provided in the pred argument.
pred.pars
  The list provided in the pred.pars argument.
x.inst
  The value provided in the x.inst argument.
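Assuming the component names listed above, a fitted model such as m1 from the
Examples below can be inspected as follows:

length(m1$model)          # the N base classifiers
str(m1$model.index)       # training-instance indexes per classifier
head(m1$instances.index)  # all labeled indexes, relative to y
m1$classes                # the levels of y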
References

Avrim Blum and Tom Mitchell. Combining labeled and unlabeled data with co-training.
In Eleventh Annual Conference on Computational Learning Theory, COLT '98, pages
92-100, New York, NY, USA, 1998. ACM. ISBN 1-58113-057-0. doi: 10.1145/279943.279962.
Examples

library(ssc)

## Load Wine data set
data(wine)

cls <- which(colnames(wine) == "Wine")
x <- wine[, -cls]  # instances without classes
y <- wine[, cls]   # the classes
x <- scale(x)      # scale the attributes

## Prepare data
set.seed(20)
# Use 50% of instances for training
tra.idx <- sample(x = length(y), size = ceiling(length(y) * 0.5))
xtrain <- x[tra.idx, ]  # training instances
ytrain <- y[tra.idx]    # classes of training instances
# Use 70% of train instances as unlabeled set
tra.na.idx <- sample(x = length(tra.idx), size = ceiling(length(tra.idx) * 0.7))
ytrain[tra.na.idx] <- NA  # remove class information of unlabeled instances

# Use the other 50% of instances for inductive testing
tst.idx <- setdiff(1:length(y), tra.idx)
xitest <- x[tst.idx, ]  # testing instances
yitest <- y[tst.idx]    # classes of testing instances

## Example: Training from a set of instances with 1-NN as base classifier.
set.seed(1)
m1 <- coBC(x = xtrain, y = ytrain,
           learner = caret::knn3,
           learner.pars = list(k = 1),
           pred = "predict")
pred1 <- predict(m1, xitest)
table(pred1, yitest)

## Example: Training from a distance matrix with 1-NN as base classifier.
dtrain <- proxy::dist(x = xtrain, method = "euclidean", by_rows = TRUE)
set.seed(1)
m2 <- coBC(x = dtrain, y = ytrain, x.inst = FALSE,
           learner = ssc::oneNN,
           pred = "predict",
           pred.pars = list(distance.weighting = "none"))
ditest <- proxy::dist(x = xitest, y = xtrain[m2$instances.index, ],
                      method = "euclidean", by_rows = TRUE)
pred2 <- predict(m2, ditest)
table(pred2, yitest)

## Example: Training from a set of instances with SVM as base classifier.
learner <- e1071::svm
learner.pars <- list(type = "C-classification", kernel = "radial",
                     probability = TRUE, scale = TRUE)
pred <- function(m, x) {
  r <- predict(m, x, probability = TRUE)
  attr(r, "probabilities")
}
set.seed(1)
m3 <- coBC(x = xtrain, y = ytrain,
           learner = learner,
           learner.pars = learner.pars,
           pred = pred)
pred3 <- predict(m3, xitest)
table(pred3, yitest)

## Example: Training from a set of instances with Naive Bayes as base classifier.
set.seed(1)
m4 <- coBC(x = xtrain, y = ytrain,
           learner = function(x, y) e1071::naiveBayes(x, y),
           pred = "predict",
           pred.pars = list(type = "raw"))
pred4 <- predict(m4, xitest)
table(pred4, yitest)

## Example: Training from a set of instances with C5.0 as base classifier.
set.seed(1)
m5 <- coBC(x = xtrain, y = ytrain,
           learner = C50::C5.0,
           pred = "predict",
           pred.pars = list(type = "prob"))
pred5 <- predict(m5, xitest)
table(pred5, yitest)
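As a quick follow-up (not part of the original examples), the inductive accuracy of
the five models can be compared from the predictions above:

sapply(list(knn = pred1, knn.dist = pred2, svm = pred3,
            nb = pred4, c50 = pred5),
       function(p) mean(p == yitest))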