get_uncertain_docs | R Documentation |
Get documents that the previous iteration of the EM algorithm is least sure about.
get_uncertain_docs(
docs,
bound,
max_query,
index_name,
hand_labeled_index,
force_list = F,
query_type = "basic_entropy",
quantileBreaks = c(75, 20),
sampleProps = c(0.5, 0.3, 0.2),
mu = 0.001,
tau = 0.001,
regions = "both",
dfm = NULL,
seed = NULL,
n_cluster = NULL
)
docs |
[matrix] Matrix of labeled and unlabeled documents. |
bound |
[numeric] The choice of lower bound for entropy-based uncertainty selection. |
max_query |
[numeric] Maximum number of uncertain documents that can be queried. |
index_name |
[character] Character string naming the variable in 'docs' that holds the index value of each document. |
hand_labeled_index |
[vector] Vector of index values for hand labeled documents in |
force_list |
[logical] Switch indicating whether to force the filtering of documents with
no entropy. Set to |
query_type |
[string] String indicating which type of uncertainty sampling to use. Options are |
quantileBreaks |
[vector] Vector of break points to distinguish entropy zones. The first value is the break point between the first and second tier, the second is the break point between the second and third tier. |
sampleProps |
[vector] Vector of sampling proportions for each entropy zone. The first value is
the proportion of |
n_cluster |
[int] Number of clusters. |
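The tiered sampling controlled by quantileBreaks and sampleProps can be illustrated with a minimal sketch. This is not the package's implementation: sample_by_entropy_tier is a hypothetical helper, and it assumes that the break points are percentiles of the entropy distribution, that the first tier is the highest-entropy zone, and that each proportion is applied to max_query.

```r
# Hypothetical helper (not part of the package): sample document ids from
# three entropy tiers. Assumes quantile_breaks are percentiles of entropy
# and sample_props are shares of max_query, one per tier.
sample_by_entropy_tier <- function(ids, entropy, max_query,
                                   quantile_breaks = c(75, 20),
                                   sample_props = c(0.5, 0.3, 0.2),
                                   seed = NULL) {
  stopifnot(length(ids) == length(entropy),
            length(quantile_breaks) == 2,
            length(sample_props) == 3)
  if (!is.null(seed)) set.seed(seed)

  # Convert the percentile break points into entropy cut-offs.
  cuts <- quantile(entropy, probs = quantile_breaks / 100, names = FALSE)

  # Tier 1 holds the highest-entropy documents, tier 3 the lowest.
  tier <- ifelse(entropy >= cuts[1], 1L,
                 ifelse(entropy >= cuts[2], 2L, 3L))

  picked <- lapply(1:3, function(k) {
    pool <- ids[tier == k]
    # Never request more documents than the tier contains.
    n_k <- min(length(pool), floor(max_query * sample_props[k]))
    if (n_k == 0) return(pool[0])
    pool[sample.int(length(pool), n_k)]
  })
  unlist(picked)
}

# Usage: 100 documents with evenly spaced entropy values.
ids <- 1:100
entropy <- seq(0, 1, length.out = 100)
sample_by_entropy_tier(ids, entropy, max_query = 10, seed = 1)
```

With the default settings, half of the queried documents come from the top quarter of the entropy distribution, so labeling effort concentrates on the least certain documents while still covering the lower tiers.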
[vector] Vector of id values of documents that the EM algorithm is uncertain about.
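To make the returned vector concrete, here is a minimal sketch of entropy-based selection under stated assumptions. select_uncertain_ids and its class_prob argument are hypothetical and not part of the package; the sketch assumes each row of class_prob holds one document's class probabilities from the previous EM iteration, and that bound is a minimum Shannon entropy.

```r
# Hypothetical helper (not the package's implementation): return the ids of
# the documents with the highest Shannon entropy, excluding documents that
# are already hand labeled.
select_uncertain_ids <- function(ids, class_prob, bound, max_query,
                                 hand_labeled_index = NULL) {
  stopifnot(is.matrix(class_prob), length(ids) == nrow(class_prob))

  # Shannon entropy per document; 0 * log(0) is treated as 0.
  plogp <- ifelse(class_prob > 0, class_prob * log(class_prob), 0)
  entropy <- -rowSums(plogp)

  # Candidates: not yet hand labeled and at or above the entropy bound.
  keep <- !(ids %in% hand_labeled_index) & entropy >= bound
  cand_ids <- ids[keep]
  cand_entropy <- entropy[keep]

  # Most uncertain first, capped at max_query.
  ord <- order(cand_entropy, decreasing = TRUE)
  cand_ids[head(ord, max_query)]
}

# Usage: four documents, two classes; "a" is already hand labeled.
ids <- c("a", "b", "c", "d")
class_prob <- rbind(c(0.5, 0.5), c(0.9, 0.1), c(1, 0), c(0.6, 0.4))
select_uncertain_ids(ids, class_prob, bound = 0.3, max_query = 2,
                     hand_labeled_index = "a")
```

In this toy input, document "c" has zero entropy and falls below the bound, so only "d" and "b" remain as candidates.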