
View source: R/LearningCurve.R

Evaluate semi-supervised classifiers for different amounts of unlabeled training examples or different fractions of unlabeled vs. labeled examples.

```
LearningCurveSSL(X, y, ...)

## S3 method for class 'matrix'
LearningCurveSSL(X, y, classifiers,
  measures = list(Accuracy = measure_accuracy), type = "unlabeled",
  n_l = NULL, with_replacement = FALSE, sizes = 2^(1:8), n_test = 1000,
  repeats = 100, verbose = FALSE, n_min = 1, dataset_name = NULL,
  test_fraction = NULL, fracs = seq(0.1, 0.9, 0.1), time = TRUE,
  pre_scale = FALSE, pre_pca = FALSE, low_level_cores = 1, ...)
```

| Argument | Description |
|---|---|
| `X` | design matrix |
| `y` | vector of labels |
| `...` | arguments passed to underlying function |
| `classifiers` | list; Classifiers to crossvalidate |
| `measures` | named list of functions giving the measures to be used |
| `type` | Type of learning curve, either "unlabeled" or "fraction" |
| `n_l` | Number of labeled objects to be used in the experiments (see details) |
| `with_replacement` | Indicates whether the subsampling is done with replacement or not (default: FALSE) |
| `sizes` | vector with number of unlabeled objects for which to evaluate performance |
| `n_test` | Number of test points if with_replacement is TRUE |
| `repeats` | Number of learning curves to draw |
| `verbose` | Print progressbar during execution (default: FALSE) |
| `n_min` | Minimum number of labeled objects per class (see details) |
| `dataset_name` | character; Name of the dataset |
| `test_fraction` | numeric; If not NULL, a fraction of the objects will be left out to serve as the test set |
| `fracs` | list; fractions of labeled data to use |
| `time` | logical; Whether execution time should be saved |
| `pre_scale` | logical; Whether the features should be scaled before the dataset is used |
| `pre_pca` | logical; Whether the features should be preprocessed using a PCA step |
| `low_level_cores` | integer; Number of cores to use to compute repeats of the learning curve |

`classifiers` is a named list of classifiers, where each classifier should be a function that accepts 4 arguments: a numeric design matrix of the labeled objects, a factor of labels, a numeric design matrix of unlabeled objects and a factor of labels for the unlabeled objects.

`measures` is a named list of performance measures. These are functions that accept seven arguments: a trained classifier, a numeric design matrix of the labeled objects, a factor of labels, a numeric design matrix of unlabeled objects and a factor of labels for the unlabeled objects, a numeric design matrix of the test objects and a factor of labels of the test objects. See `measure_accuracy` for an example.
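The two calling conventions above can be illustrated with a minimal base R sketch. Everything in it is hypothetical and not part of RSSL: `fit_nearest_mean` is a toy nearest-class-mean classifier that follows the four-argument convention, and `measure_error_rate` is a toy measure that follows the seven-argument convention. The argument names (`X_l`, `y_l`, `X_u`, `y_u`, `X_test`, `y_test`) are assumptions chosen to mirror the order described above, and the sketch assumes the trained classifier supports `predict()`.

```r
# Hypothetical four-argument classifier: labeled X, labeled y, unlabeled X,
# unlabeled y. This toy version ignores the unlabeled data entirely.
fit_nearest_mean <- function(X, y, X_u, y_u) {
  # One row of feature means per class (K x d matrix)
  means <- do.call(rbind, lapply(levels(y), function(k) {
    colMeans(X[y == k, , drop = FALSE])
  }))
  structure(list(means = means, classes = levels(y)), class = "nearest_mean")
}

# predict method: assign each row of newdata to the closest class mean
predict.nearest_mean <- function(object, newdata, ...) {
  dists <- sapply(seq_len(nrow(object$means)), function(k) {
    rowSums(sweep(newdata, 2, object$means[k, ])^2)
  })
  dists <- matrix(dists, nrow = nrow(newdata))  # keep n x K shape when n == 1
  factor(object$classes[max.col(-dists, ties.method = "first")],
         levels = object$classes)
}

# Hypothetical seven-argument measure: fraction of misclassified test objects
measure_error_rate <- function(trained_classifier, X_l = NULL, y_l = NULL,
                               X_u = NULL, y_u = NULL,
                               X_test = NULL, y_test = NULL) {
  predicted <- predict(trained_classifier, X_test)
  mean(as.character(predicted) != as.character(y_test))
}

# Small deterministic dataset with two well-separated classes
X <- rbind(c(-1, -1), c(-2, -2), c(1, 1), c(2, 2))
y <- factor(c("a", "a", "b", "b"))
model <- fit_nearest_mean(X, y, X_u = NULL, y_u = NULL)
err <- measure_error_rate(model, X_test = X, y_test = y)
```

Functions written this way could then be placed in named lists, for example `list("NM" = fit_nearest_mean)` and `list("Error" = measure_error_rate)`, in the same manner as the lists in the examples.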

This function allows for two different types of learning curves to be generated. If `type="unlabeled"`, the number of labeled objects remains fixed at the value of `n_l`, while `sizes` controls the number of unlabeled objects. `n_test` controls the number of objects used for the test set; if `with_replacement=FALSE`, objects are drawn without replacement from the input dataset and all remaining objects are used as the test set. We make sure each class is represented by at least `n_min` labeled objects. For `n_l`, additional options include: "enough", which takes the maximum of the number of features plus 5 and 20, i.e. `max(ncol(X)+5,20)`; "d", which takes the number of features; and "2d", which takes 2 times the number of features.
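The special values of `n_l` can be read as a small lookup on the number of features. The sketch below only illustrates the rules stated above; `resolve_n_l` is a hypothetical helper name, not a function exported by RSSL, and the package's internal handling may differ.

```r
# Hypothetical helper: translate an n_l setting into a number of labeled objects
resolve_n_l <- function(n_l, X) {
  d <- ncol(X)                      # number of features
  if (is.numeric(n_l)) return(n_l)  # a plain number is used as given
  switch(n_l,
         "enough" = max(d + 5, 20), # at least 20, or features + 5 if larger
         "d"      = d,              # number of features
         "2d"     = 2 * d,          # twice the number of features
         stop("unknown n_l option: ", n_l))
}

X_small <- matrix(0, nrow = 10, ncol = 2)
X_wide  <- matrix(0, nrow = 10, ncol = 30)

n_enough_small <- resolve_n_l("enough", X_small)
n_enough_wide  <- resolve_n_l("enough", X_wide)
n_d            <- resolve_n_l("d", X_wide)
n_2d           <- resolve_n_l("2d", X_wide)
```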

If `type="fraction"`, the total number of objects remains fixed, while the fraction of labeled objects is changed. `fracs` sets the fractions of labeled objects that should be considered, while `test_fraction` determines the fraction of the total number of objects left out to serve as the test set.
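As a rough illustration of the fraction-based split, the following base R sketch partitions object indices into test, labeled and unlabeled sets. `split_by_fraction` is a hypothetical helper, not part of RSSL. It assumes that the labeled fraction is applied to the objects that remain after the test set has been held out, which the text above does not state explicitly, and it does not enforce the `n_min` constraint.

```r
# Hypothetical helper: split n objects into test / labeled / unlabeled indices
split_by_fraction <- function(n, frac, test_fraction) {
  idx <- sample.int(n)                       # random permutation of 1..n
  n_test <- round(test_fraction * n)         # objects held out for testing
  is_test <- seq_len(n) <= n_test
  test <- idx[is_test]
  train <- idx[!is_test]
  n_lab <- round(frac * length(train))       # assumption: fraction of the rest
  is_lab <- seq_along(train) <= n_lab
  list(labeled = train[is_lab], unlabeled = train[!is_lab], test = test)
}

set.seed(1)
parts <- split_by_fraction(n = 1000, frac = 0.1, test_fraction = 0.9)
sizes_by_part <- lengths(parts)
```

Repeating this split for every value in `fracs` keeps the test set size and the total number of training objects constant while shifting objects from the unlabeled set to the labeled set.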

LearningCurve object

Other RSSL utilities: `SSLDataFrameToMatrices()`, `add_missinglabels_mar()`, `df_to_matrices()`, `measure_accuracy()`, `missing_labels()`, `split_dataset_ssl()`, `split_random()`, `true_labels()`

```
library(RSSL)

set.seed(1)
df <- generate2ClassGaussian(2000, d = 2, var = 0.6)

# Each classifier takes labeled X and y plus unlabeled X_u and y_u
classifiers <- list(
  "LS" = function(X, y, X_u, y_u) {
    LeastSquaresClassifier(X, y, lambda = 0)
  },
  "Self" = function(X, y, X_u, y_u) {
    SelfLearning(X, y, X_u, LeastSquaresClassifier)
  }
)

measures <- list(
  "Accuracy" = measure_accuracy,
  "Loss Test" = measure_losstest,
  "Loss labeled" = measure_losslab,
  "Loss Lab+Unlab" = measure_losstrain
)

# These take a couple of seconds to run
## Not run:
# Increase the number of unlabeled objects
lc1 <- LearningCurveSSL(as.matrix(df[, 1:2]), df$Class,
                        classifiers = classifiers,
                        measures = measures, n_test = 1800,
                        n_l = 10, repeats = 3)
plot(lc1)

# Increase the fraction of labeled objects, example with 2 datasets
lc2 <- LearningCurveSSL(X = list("Dataset 1" = as.matrix(df[, 1:2]),
                                 "Dataset 2" = as.matrix(df[, 1:2])),
                        y = list("Dataset 1" = df$Class,
                                 "Dataset 2" = df$Class),
                        classifiers = classifiers,
                        measures = measures,
                        type = "fraction", repeats = 3,
                        test_fraction = 0.9)
plot(lc2)
## End(Not run)
```
