View source: R/uncertainty-sampling.r
The 'uncertainty sampling' approach to active learning identifies the unlabeled observation about which the user-specified supervised classifier is "least certain." In the active learning framework, this observation is then submitted to the oracle for labeling.
uncertainty_sampling(x, y, uncertainty = "entropy", classifier,
  num_query = 1, ...)
x: a matrix containing the labeled and unlabeled data
y: a vector of the labels for each observation in x. Use NA for unlabeled observations.
uncertainty: a string that contains the uncertainty measure. See below for details.
classifier: a string that contains the supervised classifier as given in the
num_query: the number of observations to be queried.
...: additional arguments that are sent to the
The term "least certain" is quite general, but we have implemented three of the most widely used methods:
Entropy: query the unlabeled observation that maximizes the entropy of the posterior class probabilities under the trained classifier
Least confidence: query the unlabeled observation whose most probable class has the smallest posterior probability under the trained classifier
Margin: query the unlabeled observation that minimizes the difference between the two largest posterior probabilities under the trained classifier
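The three measures can be sketched in base R as follows. This is a minimal illustration and not the package's own code: the helper names (entropy_score, least_confidence_score, margin_score) and the small posterior matrix are hypothetical, and each score is oriented so that a larger value means "more uncertain."

```r
# Hypothetical helpers for illustration; they are not part of the package.
# `posterior` is a matrix with one row per unlabeled observation and one
# column per class; each row sums to 1.

entropy_score <- function(posterior) {
  # Treat 0 * log(0) as 0 so that degenerate rows do not produce NaN.
  terms <- ifelse(posterior > 0, posterior * log(posterior), 0)
  -rowSums(terms)
}

least_confidence_score <- function(posterior) {
  # One minus the probability of the most likely class.
  1 - apply(posterior, 1, max)
}

margin_score <- function(posterior) {
  # Negated gap between the two largest probabilities, so that a small
  # margin yields a large score.
  apply(posterior, 1, function(p) {
    top_two <- sort(p, decreasing = TRUE)[1:2]
    -(top_two[1] - top_two[2])
  })
}

# Made-up posterior probabilities for three observations and three classes.
posterior <- rbind(
  c(0.90, 0.05, 0.05),
  c(0.40, 0.35, 0.25),
  c(0.50, 0.49, 0.01)
)

scores <- cbind(
  entropy          = entropy_score(posterior),
  least_confidence = least_confidence_score(posterior),
  margin           = margin_score(posterior)
)

# Row index of the most uncertain observation under each measure.
most_uncertain <- apply(scores, 2, which.max)
print(most_uncertain)
```

With these made-up probabilities, entropy and least confidence both select the second row, while margin selects the third, which shows that the measures can disagree when there are more than two classes.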
The uncertainty argument must be one of these three methods; entropy is the default. Note that the three methods are equivalent (they yield the same observation to be queried) in binary classification.
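The binary case can be checked numerically. With two classes, each measure is a monotone function of the larger of the two posterior probabilities, so all three produce the same ranking. The sketch below uses made-up probabilities and standard textbook formulas for the measures; it is an illustration, not the package's implementation.

```r
# Made-up posterior probabilities of the first class for five observations.
p1 <- c(0.95, 0.70, 0.55, 0.20, 0.35)
posterior <- cbind(p1, 1 - p1)

# Each score is oriented so that larger means "more uncertain".
entropy          <- -rowSums(posterior * log(posterior))
least_confidence <- 1 - pmax(posterior[, 1], posterior[, 2])
margin           <- -abs(posterior[, 1] - posterior[, 2])

# Rank observations from most to least uncertain under each measure.
rank_entropy          <- order(entropy, decreasing = TRUE)
rank_least_confidence <- order(least_confidence, decreasing = TRUE)
rank_margin           <- order(margin, decreasing = TRUE)

# All three rankings coincide in the two-class case.
stopifnot(identical(rank_entropy, rank_least_confidence))
stopifnot(identical(rank_entropy, rank_margin))
print(rank_entropy)
```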
We require a user-specified supervised classifier from the caret R package. Furthermore, we assume that the classifier returns posterior probabilities of class membership; otherwise, an error is thrown. To obtain a list of valid classifiers, see the caret vignettes, which are available on CRAN. Also, see the modelLookup function in caret.
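One way to check ahead of time whether a caret classifier can return class probabilities is to inspect the output of modelLookup. The sketch below assumes the caret package is installed; it shows a manual check only, and how uncertainty_sampling itself performs the check is not documented here.

```r
library(caret)

# modelLookup() returns one row per tuning parameter, with logical columns
# `forClass` and `probModel` describing what the model supports.
lda_info <- modelLookup("lda")
supports_probs <- all(lda_info$forClass) && all(lda_info$probModel)
print(supports_probs)

# List every caret model code that supports classification with
# class probabilities.
all_models <- modelLookup()
prob_classifiers <- unique(
  all_models$model[all_models$forClass & all_models$probModel]
)
head(prob_classifiers)
```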
Additional arguments to the specified caret classifier can be passed via the ... argument.
Unlabeled observations in y are assumed to have NA for a label.
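The NA convention makes it straightforward to separate the labeled training set from the unlabeled pool. The following sketch uses the iris data for illustration; the variable names are hypothetical and the split is only a plausible picture of what a caller might do, not the package's internal code.

```r
x <- iris[, -5]
y <- iris[, 5]

# Keep labels for 10 observations per species and mark the rest as unlabeled.
y <- replace(y, -c(1:10, 51:60, 101:110), NA)

# NA in `y` identifies the unlabeled pool.
is_unlabeled <- is.na(y)

x_labeled   <- x[!is_unlabeled, ]
y_labeled   <- y[!is_unlabeled]
x_unlabeled <- x[is_unlabeled, ]

# Number of labeled and unlabeled observations.
print(c(labeled = nrow(x_labeled), unlabeled = nrow(x_unlabeled)))
```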
It is often convenient to query unlabeled observations in batch. By default, we query the single unlabeled observation with the largest uncertainty measure value. With the num_query argument, the user can specify the number of observations to return in batch. If there are ties in the uncertainty measure values, they are broken by the order in which the unlabeled observations are given.
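The batch selection and tie-breaking rule described above can be sketched as follows. The function select_batch and the score vector are hypothetical and serve only to illustrate the rule; they are not taken from the package.

```r
# Hypothetical helper: return the indices of the `num_query` most uncertain
# observations, breaking ties by original position.
select_batch <- function(scores, num_query = 1) {
  # Sort by decreasing score; the second key makes the tie-break explicit.
  ranked <- order(-scores, seq_along(scores))
  head(ranked, num_query)
}

# Made-up uncertainty scores; observations 2 and 4 are tied for the maximum.
scores <- c(0.2, 0.9, 0.5, 0.9, 0.1)

single <- select_batch(scores)
batch  <- select_batch(scores, num_query = 3)

print(single)  # 2
print(batch)   # 2 4 3
```

Here the tie between observations 2 and 4 is resolved in favor of observation 2 because it appears first.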
a list that contains the least_certain observation and miscellaneous results. See above for details.
x <- iris[, -5]
y <- iris[, 5]
# For demonstration, suppose that few observations are labeled in 'y'.
y <- replace(y, -c(1:10, 51:60, 101:110), NA)
uncertainty_sampling(x=x, y=y, classifier="lda")
uncertainty_sampling(x=x, y=y, uncertainty="entropy",
                     classifier="qda", num_query=5)