sslCoTrain: Co-Training
In SSL: Semi-Supervised Learning


View source: R/SSL.R

Co-Training

sslCoTrain(xl, yl, xu, method1 = "nb", method2 = "nb", nrounds1, nrounds2,
           portion = 0.5, n = 10, seed = 0, ...)

`xl`	an n * p matrix or data.frame of labeled data.
`yl`	an n * 1 integer vector of labels.
`xu`	an m * p matrix or data.frame of unlabeled data.
`method1, method2`	strings specifying the first and second classification models to use. `xgb` means extreme gradient boosting; please refer to `xgb.train`. For other options, see `train`.
`nrounds1, nrounds2`	number of boosting rounds, required when `method1` or `method2` is `xgb`. See `xgb.train`.
`portion`	the proportion used to split the labeled data into two parts, given as a fraction (default 0.5).
`n`	the number of unlabeled examples to add to the labeled data in each iteration.
`seed`	an integer specifying the random number generation state for the data split.
`...`	other parameters.

sslCoTrain divides the labeled data into two parts and trains one classifier on each part. Each classifier then predicts labels for some of the unlabeled examples, and those examples are added to the labeled data. The newly labeled data help the other classifier improve its performance.
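The procedure described above can be sketched in base R. This is a minimal illustration, not the package's implementation: `fit_centroid`, `predict_centroid` and `co_train_sketch` are hypothetical helpers, a nearest-centroid classifier stands in for `method1` and `method2`, and the stratified split, the confidence measure (distance margin between the two closest centroids), the stopping rule (run until no unlabeled rows remain) and the final combination of the two classifiers are all assumptions.

```r
# Hypothetical stand-in classifier: one centroid per class.
fit_centroid <- function(x, y) {
  classes <- sort(unique(y))
  centers <- do.call(rbind, lapply(classes, function(k) {
    colMeans(x[y == k, , drop = FALSE])
  }))
  list(classes = classes, centers = centers)
}

# Predict the nearest centroid; confidence is the margin between the
# second-closest and the closest centroid (assumes at least two classes).
predict_centroid <- function(model, x) {
  d <- sapply(seq_len(nrow(model$centers)), function(k) {
    rowSums(sweep(x, 2, model$centers[k, ])^2)
  })
  d <- matrix(d, nrow = nrow(x))            # rows: observations, cols: classes
  best <- max.col(-d, ties.method = "first")
  sorted <- t(apply(d, 1, sort))
  list(label = model$classes[best], confidence = sorted[, 2] - sorted[, 1])
}

co_train_sketch <- function(xl, yl, xu, portion = 0.5, n = 10, seed = 0) {
  xl <- as.matrix(xl)
  xu <- as.matrix(xu)
  set.seed(seed)
  # Assumption: split the labeled rows into two parts, stratified by class.
  idx <- unlist(lapply(split(seq_along(yl), yl), function(i) {
    i[sample.int(length(i), floor(portion * length(i)))]
  }))
  x1 <- xl[idx, , drop = FALSE];  y1 <- yl[idx]
  x2 <- xl[-idx, , drop = FALSE]; y2 <- yl[-idx]
  pool <- seq_len(nrow(xu))                 # unlabeled rows not yet used

  while (length(pool) > 0) {
    # Classifier 1 hands its n most confident predictions to part 2.
    p1 <- predict_centroid(fit_centroid(x1, y1), xu[pool, , drop = FALSE])
    take <- head(order(p1$confidence, decreasing = TRUE), n)
    x2 <- rbind(x2, xu[pool[take], , drop = FALSE])
    y2 <- c(y2, p1$label[take])
    pool <- pool[-take]
    if (length(pool) == 0) break
    # Classifier 2 does the same for part 1.
    p2 <- predict_centroid(fit_centroid(x2, y2), xu[pool, , drop = FALSE])
    take <- head(order(p2$confidence, decreasing = TRUE), n)
    x1 <- rbind(x1, xu[pool[take], , drop = FALSE])
    y1 <- c(y1, p2$label[take])
    pool <- pool[-take]
  }

  # Assumption: for each unlabeled row, keep the more confident classifier.
  r1 <- predict_centroid(fit_centroid(x1, y1), xu)
  r2 <- predict_centroid(fit_centroid(x2, y2), xu)
  ifelse(r1$confidence >= r2$confidence, r1$label, r2$label)
}

data(iris)
known <- c(1:20, 51:70, 101:120)
pred <- co_train_sketch(iris[known, 1:4], rep(1:3, each = 20),
                        iris[-known, 1:4], n = 10)
table(pred)
```

The sketch returns one integer label per row of `xu`, mirroring the shape of the value that sslCoTrain is documented to return.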

an m * 1 integer vector of predicted labels for the unlabeled data.

Junxiang Wang

Blum, A., & Mitchell, T. (1998). Combining labeled and unlabeled data with co-training. COLT: Proceedings of the Workshop on Computational Learning Theory.

`train`, `xgb.train`

library(SSL)
data(iris)
xl <- iris[, 1:4]
# Suppose we know the first twenty observations of each class
# and want to predict the remaining ones with co-training.
# 1 = setosa, 2 = versicolor, 3 = virginica
yl <- rep(1:3, each = 20)
known.label <- c(1:20, 51:70, 101:120)
xu <- xl[-known.label, ]
xl <- xl[known.label, ]
yu <- sslCoTrain(xl, yl, xu, method1 = "xgb", nrounds1 = 100,
                 method2 = "xgb", nrounds2 = 100, n = 60)
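Because the true species of the held-out iris rows are known, the predictions can be checked against them. The snippet below is a small sketch that assumes `yu` is the vector returned by the call above and that it uses the same 1 to 3 coding as `yl`; `yu.true` is a name introduced here for illustration. The comparison runs only if `yu` exists in the session.

```r
data(iris)
known.label <- c(1:20, 51:70, 101:120)
# True labels of the rows treated as unlabeled (1 setosa, 2 versicolor, 3 virginica)
yu.true <- as.integer(iris$Species)[-known.label]

# Assumption: yu comes from sslCoTrain and is coded like yl
if (exists("yu") && length(yu) == length(yu.true)) {
  print(table(predicted = yu, actual = yu.true))  # confusion table
  print(mean(yu == yu.true))                      # share of correct predictions
}
```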