View source: R/query-committee.r
The 'query by committee' approach to active learning utilizes a committee of C
classifiers that are each trained on the labeled training data. Our goal is to
"query the oracle" with the observations that have the maximum disagreement
among the C trained classifiers.
query_committee(x, y, fit_committee, predict_committee,
  disagreement = c("kullback", "vote_entropy", "post_entropy"),
  num_query = 1, ...)
x | a matrix containing the labeled and unlabeled data
y | a vector of the labels for each observation in x
disagreement | a string that contains the disagreement measure among the committee members. See Details.
num_query | the number of observations to be queried.
committee | a list containing the committee of classifiers. See Details for the required format.
Note that this approach is similar to "Query by Bagging" (QBB), but each
committee member is specified by the user. With the QBB approach, only one
supervised classifier is specified by the user, and each committee member is
trained on a resampled subset of the labeled training data. Also, note that
we have implemented QBB as query_by_bagging.
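To make the contrast with QBB concrete, the sketch below draws the resampled index sets that a bagging-style committee could be trained on. It assumes bootstrap resampling (sampling with replacement) and uses hypothetical names (`num_members`, `boot_indices`); it is an illustration only, not the code behind query_by_bagging.

```r
# Toy labels: NA marks an unlabeled observation.
y <- factor(c("a", "b", NA, "a", NA, "b", "a", NA, "b", "a"))
labeled <- which(!is.na(y))

set.seed(1)
num_members <- 3  # hypothetical committee size

# One resample of the labeled indices per committee member
# (assumption: sampling with replacement, same size as the labeled set).
boot_indices <- replicate(
  num_members,
  labeled[sample.int(length(labeled), replace = TRUE)],
  simplify = FALSE
)
```

Each element of `boot_indices` would then select the rows used to train one copy of the single user-specified classifier, whereas query by committee trains different classifiers on the same labeled rows.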
To determine maximum disagreement among committee members, we have implemented three approaches:
kullback: query the unlabeled observation that maximizes the Kullback-Leibler divergence between the label distributions of any one committee member and the consensus
vote_entropy: query the unlabeled observation that maximizes the vote entropy among all committee members
post_entropy: query the unlabeled observation that maximizes the entropy of average posterior probabilities of all committee members
The disagreement argument must be one of the three; kullback is the default.
To calculate the committee disagreement, we use the formulae from Dr. Burr Settles' excellent "Active Learning Literature Survey" available at http://burrsettles.com/pub/settles.activelearning.pdf.
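As a rough illustration of the three measures, the sketch below computes them from a list of posterior-probability matrices, one per committee member. The helper names and the input layout (rows are unlabeled observations, columns are classes) are assumptions made for this example, and the Kullback-Leibler measure is taken as the average divergence of each member from the consensus; the package's internal code may differ.

```r
# Row-wise entropy, treating 0 * log(0) as 0.
entropy_rows <- function(p) {
  -rowSums(ifelse(p > 0, p * log(p), 0))
}

# Consensus: average of the members' posterior matrices.
consensus_posterior <- function(posteriors) {
  Reduce(`+`, posteriors) / length(posteriors)
}

# Entropy of the average posterior probabilities.
post_entropy <- function(posteriors) {
  entropy_rows(consensus_posterior(posteriors))
}

# Entropy of the distribution of hard votes across members.
vote_entropy <- function(posteriors) {
  num_members <- length(posteriors)
  num_classes <- ncol(posteriors[[1]])
  # Each member votes for its most probable class (first class wins ties).
  votes <- sapply(posteriors, function(p) max.col(p, ties.method = "first"))
  votes <- matrix(votes, ncol = num_members)
  vote_share <- t(apply(votes, 1, tabulate, nbins = num_classes)) / num_members
  entropy_rows(vote_share)
}

# Average Kullback-Leibler divergence of each member from the consensus.
kullback <- function(posteriors) {
  consensus <- consensus_posterior(posteriors)
  kl_per_member <- lapply(posteriors, function(p) {
    rowSums(ifelse(p > 0, p * log(p / consensus), 0))
  })
  Reduce(`+`, kl_per_member) / length(posteriors)
}

# Hypothetical posteriors: 3 members, 2 unlabeled observations, 2 classes.
posteriors <- list(
  rbind(c(0.9, 0.1), c(0.6, 0.4)),
  rbind(c(0.8, 0.2), c(0.4, 0.6)),
  rbind(c(0.9, 0.1), c(0.5, 0.5))
)

which.max(kullback(posteriors))
which.max(vote_entropy(posteriors))
which.max(post_entropy(posteriors))
```

In this toy input the members agree on the first observation and split on the second, so all three measures rank the second observation as the one to query.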
We require a user-specified supervised classifier from the caret R package.
Furthermore, we assume that the classifier returns posterior probabilities of
class membership; otherwise, an error is thrown. To obtain a list of valid
classifiers, see the caret vignettes, which are available on CRAN. Also, see
the modelLookup function in the caret package.
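One way to check whether a caret model code can return class probabilities is to inspect the `forClass` and `probModel` columns returned by modelLookup. The helper name `supports_class_probs` below is hypothetical, and the sketch only runs the lookup if caret is installed.

```r
# Hypothetical helper: TRUE if caret flags the model as a classifier that
# can produce class probabilities.
supports_class_probs <- function(model_code) {
  info <- caret::modelLookup(model_code)
  all(info$forClass) && all(info$probModel)
}

if (requireNamespace("caret", quietly = TRUE)) {
  print(supports_class_probs("lda"))
  print(supports_class_probs("rf"))
}
```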
To specify the committee members, we require a character vector of classifiers
in the committee argument with elements corresponding to each supervised
classifier (each committee member). The classifiers must match the naming in
the caret package.
Unlabeled observations in y are assumed to have NA for a label.
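Under this convention, the labeled and unlabeled observations can be separated with is.na. The toy vector and the names `labeled` and `unlabeled` below are for illustration only.

```r
# Toy label vector: NA marks an unlabeled observation.
y <- factor(c("a", NA, "b", NA, NA, "a"))

labeled <- which(!is.na(y))    # indices usable for training
unlabeled <- which(is.na(y))   # candidates to send to the oracle
```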
It is often convenient to query unlabeled observations in batch. By default,
we query the unlabeled observation with the largest disagreement measure
value. With the num_query argument, the user can specify the number of
observations to return in batch. If there are ties in the disagreement measure
values, they are broken by the order in which the unlabeled observations are
given.
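The batch selection and tie-breaking rule described above can be sketched as follows. The scores and the variable names are made up for the example, and this is one possible way to implement the rule rather than the package's own code.

```r
# Hypothetical disagreement scores for five unlabeled observations.
disagreement_scores <- c(0.2, 0.9, 0.9, 0.1, 0.5)
num_query <- 3

# Sort by decreasing score; the second key keeps tied scores in input order.
ranking <- order(-disagreement_scores, seq_along(disagreement_scores))
query <- ranking[seq_len(num_query)]
query
```

Observations 2 and 3 tie for the largest score, so observation 2 is ranked first because it appears earlier in the input.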
a list that contains the least_certain observation and miscellaneous results. See above for details.
x <- iris[, -5]
y <- iris[, 5]
# For demonstration, suppose that few observations are labeled in 'y'.
y <- replace(y, -c(1:12, 51:62, 101:112), NA)
fit_committee <- list(
  lda = function(x, y) { MASS::lda(x, y) },
  qda = function(x, y) { MASS::qda(x, y) },
  random_forest = function(x, y) {
    randomForest::randomForest(x, y, ntree = 50, maxnodes = 5)
  }
)
predict_committee <- list(
  lda = function(object, x) { predict(object, x)$posterior },
  qda = function(object, x) { predict(object, x)$posterior },
  random_forest = function(object, x) { predict(object, x, type = "prob") }
)
query_committee(x = x, y = y, fit_committee = fit_committee,
                predict_committee = predict_committee, num_query = 3)$query
query_committee(x = x, y = y, fit_committee = fit_committee,
                predict_committee = predict_committee,
                disagreement = "post_entropy", num_query = 3)$query