LearningCurveSSL: Compute Semi-Supervised Learning Curve

View source: R/LearningCurve.R

LearningCurveSSL    R Documentation

Description

Evaluate semi-supervised classifiers for different amounts of unlabeled training examples or different fractions of unlabeled vs. labeled examples.

Usage

LearningCurveSSL(X, y, ...)

## S3 method for class 'matrix'
LearningCurveSSL(X, y, classifiers,
  measures = list(Accuracy = measure_accuracy), type = "unlabeled",
  n_l = NULL, with_replacement = FALSE, sizes = 2^(1:8), n_test = 1000,
  repeats = 100, verbose = FALSE, n_min = 1, dataset_name = NULL,
  test_fraction = NULL, fracs = seq(0.1, 0.9, 0.1), time = TRUE,
  pre_scale = FALSE, pre_pca = FALSE, low_level_cores = 1, ...)

Arguments

X

design matrix

y

vector of labels

...

arguments passed to underlying function

classifiers

named list; classifiers to evaluate (see Details)

measures

named list of functions giving the measures to be used

type

Type of learning curve, either "unlabeled" or "fraction"

n_l

Number of labeled objects to be used in the experiments (see details)

with_replacement

Indicates whether the subsampling is done with replacement or not (default: FALSE)

sizes

vector with number of unlabeled objects for which to evaluate performance

n_test

Number of test points if with_replacement is TRUE

repeats

Number of learning curves to draw

verbose

Print a progress bar during execution (default: FALSE)

n_min

Minimum number of labeled objects per class

dataset_name

character; Name of the dataset

test_fraction

numeric; If not NULL, this fraction of the objects will be left out to serve as the test set

fracs

list; fractions of labeled data to use

time

logical; Whether execution time should be saved.

pre_scale

logical; Whether the features should be scaled before the dataset is used

pre_pca

logical; Whether the features should be preprocessed using a PCA step

low_level_cores

integer; Number of cores to use to compute the repeats of the learning curve

Details

classifiers is a named list of classifiers, where each classifier should be a function that accepts 4 arguments: a numeric design matrix of the labeled objects, a factor of labels, a numeric design matrix of unlabeled objects and a factor of labels for the unlabeled objects.

measures is a named list of performance measures. These are functions that accept seven arguments: a trained classifier, a numeric design matrix of the labeled objects, a factor of labels, a numeric design matrix of the unlabeled objects, a factor of labels for the unlabeled objects, a numeric design matrix of the test objects, and a factor of labels of the test objects. See measure_accuracy for an example.
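A custom measure can follow the same seven-argument signature. The sketch below is a hypothetical error-rate measure, not part of RSSL. It assumes the trained classifier supports predict(object, X) and returns labels; the toy threshold classifier is likewise an invented stand-in so that the snippet runs in base R without RSSL.

```r
# Hypothetical measure: fraction of misclassified test objects.
# Arguments follow the seven-argument order described above; the
# labeled and unlabeled inputs are accepted but unused here.
measure_error_rate <- function(trained_classifier,
                               X_l = NULL, y_l = NULL,
                               X_u = NULL, y_u = NULL,
                               X_test = NULL, y_test = NULL) {
  predicted <- predict(trained_classifier, X_test)
  mean(as.character(predicted) != as.character(y_test))
}

# Toy stand-in for a trained classifier (assumption, for illustration only):
# predicts "B" when the first feature exceeds a threshold, "A" otherwise.
toy_classifier <- structure(list(threshold = 0), class = "toy_threshold")
predict.toy_threshold <- function(object, newdata, ...) {
  factor(ifelse(newdata[, 1] > object$threshold, "B", "A"),
         levels = c("A", "B"))
}

X_test <- matrix(c(-1, -2, 1, 2), ncol = 1)
y_test <- factor(c("A", "B", "B", "B"), levels = c("A", "B"))

error_rate <- measure_error_rate(toy_classifier,
                                 X_test = X_test, y_test = y_test)
print(error_rate)
```

A function like this could then be supplied alongside the built-in measures, for example as measures = list("Error" = measure_error_rate).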

This function can generate two different types of learning curves. If type="unlabeled", the number of labeled objects remains fixed at the value of n_l, while sizes controls the number of unlabeled objects. If with_replacement=TRUE, n_test controls the number of objects used for the test set; if with_replacement=FALSE, objects are drawn without replacement from the input dataset and all remaining objects are used as the test set. Each class is represented by at least n_min labeled objects. Besides a number, n_l accepts the following options: "enough", which takes the maximum of the number of features plus 5 and 20, i.e. max(ncol(X)+5,20); "d", which takes the number of features; and "2d", which takes 2 times the number of features.
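The arithmetic behind the n_l shortcuts can be illustrated with a small helper. resolve_n_l is a hypothetical function written for this sketch, not part of RSSL; it follows the expression max(ncol(X)+5,20) given above for "enough".

```r
# Hypothetical helper mirroring the n_l shortcuts described above.
# X is the design matrix; n_l is either a number or one of the strings.
resolve_n_l <- function(n_l, X) {
  d <- ncol(X)
  if (is.numeric(n_l)) return(n_l)
  switch(n_l,
         "enough" = max(d + 5, 20),
         "d"      = d,
         "2d"     = 2 * d,
         stop("unknown n_l option: ", n_l))
}

X_small <- matrix(0, nrow = 100, ncol = 2)   # 2 features
X_wide  <- matrix(0, nrow = 100, ncol = 30)  # 30 features

n_l_values <- c(
  enough_small = resolve_n_l("enough", X_small),  # max(2 + 5, 20)
  enough_wide  = resolve_n_l("enough", X_wide),   # max(30 + 5, 20)
  d_wide       = resolve_n_l("d", X_wide),
  two_d_wide   = resolve_n_l("2d", X_wide)
)
print(n_l_values)
```

For a low-dimensional dataset the floor of 20 applies, while for 30 features "enough" gives 35 labeled objects.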

If type="fraction", the total number of objects remains fixed, while the fraction of labeled objects is changed. fracs sets the fractions of labeled objects that should be considered, while test_fraction determines the fraction of the total number of objects left out to serve as the test set.
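As a rough illustration of how the set sizes might work out for type="fraction", the sketch below splits 2000 objects with test_fraction=0.9 and the default fractions. It rests on two assumptions that this page does not confirm: that the labeled fraction is taken relative to the objects remaining after the test split, and that counts are rounded to whole objects. The actual implementation may differ on both points.

```r
# Sketch under assumptions: the labeled fraction is applied to the objects
# that remain after the test split, and counts are rounded to whole objects.
n_total <- 2000
test_fraction <- 0.9
fracs <- seq(0.1, 0.9, 0.1)

n_test <- round(test_fraction * n_total)  # objects held out for testing
n_train <- n_total - n_test               # objects left for labeled + unlabeled
n_labeled <- round(fracs * n_train)       # labeled objects per fraction
n_unlabeled <- n_train - n_labeled        # the rest are treated as unlabeled

split_sizes <- data.frame(frac = fracs, labeled = n_labeled,
                          unlabeled = n_unlabeled, test = n_test)
print(split_sizes)
```

Under these assumptions the labeled and unlabeled counts always sum to the same total, which is what distinguishes this curve type from type="unlabeled".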

Value

LearningCurve object

See Also

Other RSSL utilities: SSLDataFrameToMatrices(), add_missinglabels_mar(), df_to_matrices(), measure_accuracy(), missing_labels(), split_dataset_ssl(), split_random(), true_labels()

Examples

library(RSSL)

set.seed(1)
df <- generate2ClassGaussian(2000,d=2,var=0.6)

classifiers <- list(
  "LS" = function(X, y, X_u, y_u) {
    LeastSquaresClassifier(X, y, lambda = 0)
  },
  "Self" = function(X, y, X_u, y_u) {
    SelfLearning(X, y, X_u, LeastSquaresClassifier)
  }
)

measures <- list("Accuracy" =  measure_accuracy,
                 "Loss Test" = measure_losstest,
                 "Loss labeled" = measure_losslab,
                 "Loss Lab+Unlab" = measure_losstrain
)

# These take a couple of seconds to run
## Not run: 
# Increase the number of unlabeled objects
lc1 <- LearningCurveSSL(as.matrix(df[,1:2]),df$Class,
                        classifiers=classifiers,
                        measures=measures, n_test=1800,
                        n_l=10,repeats=3)

plot(lc1)

# Increase the fraction of labeled objects, example with 2 datasets
lc2 <- LearningCurveSSL(X=list("Dataset 1"=as.matrix(df[,1:2]),
                               "Dataset 2"=as.matrix(df[,1:2])),
                        y=list("Dataset 1"=df$Class,
                               "Dataset 2"=df$Class),
                        classifiers=classifiers,
                        measures=measures,
                        type = "fraction",repeats=3,
                        test_fraction=0.9)

plot(lc2)

## End(Not run)

RSSL documentation built on March 31, 2023, 7:27 p.m.