get_uncertain_docs: Get Uncertain Documents
In activetext/activeR: a semi-supervised active learning algorithm for text classification.

Description

Get documents that the previous iteration of the EM algorithm is least sure about.
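As a rough illustration of what "least sure about" can mean, the sketch below computes Shannon entropy over a matrix of class probabilities and ranks documents from most to least uncertain. The matrix `probs`, the helper `row_entropy`, and the document names are hypothetical and not part of activeR; the package's own entropy computation may differ.

```r
# Hypothetical class-probability matrix: one row per document, one column per class.
probs <- matrix(
  c(0.50, 0.50,
    0.90, 0.10,
    0.99, 0.01),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("doc1", "doc2", "doc3"), c("pos", "neg"))
)

# Shannon entropy per row (natural log); terms with p == 0 contribute 0.
# This helper is an assumption for illustration, not an activeR function.
row_entropy <- function(p) {
  terms <- ifelse(p > 0, -p * log(p), 0)
  rowSums(terms)
}

entropy <- row_entropy(probs)

# Rank documents from most to least uncertain.
most_uncertain <- names(sort(entropy, decreasing = TRUE))
```

A document with evenly split class probabilities has the highest entropy, so it appears first in `most_uncertain`.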

Usage

get_uncertain_docs(
  docs,
  bound,
  max_query,
  index_name,
  hand_labeled_index,
  force_list = F,
  query_type = "basic_entropy",
  quantileBreaks = c(75, 20),
  sampleProps = c(0.5, 0.3, 0.2),
  mu = 0.001,
  tau = 0.001,
  regions = "both",
  dfm = NULL,
  seed = NULL,
  n_cluster = NULL
)

Arguments

`docs`	[matrix] Matrix of labeled and unlabeled documents.
`bound`	[numeric] The choice of lower bound for entropy-based uncertainty selection.
`max_query`	[numeric] Maximum number of uncertain documents that can be queried.
`index_name`	[character] Character string indicating the variable in `docs` that denotes the index value of the documents.
`hand_labeled_index`	[vector] Vector of index values for hand labeled documents in `docs`.
`force_list`	[logical] Switch indicating whether to force the filtering of documents with no entropy. Set to `FALSE` by default.
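`query_type`	[string] String indicating which type of uncertainty sampling to use. Options are `"standard_entropy"`, `"normalized_entropy"`, `"tiered_entropy"`, or `"tiered_entropy_weighted"`. Note that the default in the usage signature is `"basic_entropy"`, which is not among the listed options.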
`quantileBreaks`	[vector] Vector of break points to distinguish entropy zones. The first value is the break point between the first and second tier, the second is the break point between the second and third tier.
`sampleProps`	[vector] Vector of sampling proportions for each entropy zone. The first value is the proportion of `max_query` to be sampled from the high entropy region, the second value is the proportion to be sampled from the middle entropy region, and the third value is the proportion to be sampled from the lowest entropy region.
`n_cluster`	[int] Number of clusters.
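The interaction between `quantileBreaks`, `sampleProps`, and `max_query` can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's implementation: it assumes the break points are percentiles of the entropy distribution, that the tiers are ordered high, middle, low, and that each tier's quota is `max_query` times its proportion, rounded. The variable names and the `draw` helper are hypothetical.

```r
set.seed(1)  # fixed seed so this sketch is reproducible

# Hypothetical per-document entropy values, named by document id.
entropy <- setNames(runif(100), paste0("doc", 1:100))

max_query <- 10
quantile_breaks <- c(75, 20)      # assumed to be percentiles
sample_props <- c(0.5, 0.3, 0.2)  # high, middle, low entropy tiers

# Convert the percentile break points into entropy cut-offs.
cuts <- unname(quantile(entropy, probs = quantile_breaks / 100))

high   <- names(entropy)[entropy >= cuts[1]]
middle <- names(entropy)[entropy < cuts[1] & entropy >= cuts[2]]
low    <- names(entropy)[entropy < cuts[2]]

# Number of documents to query from each tier.
n_per_tier <- round(max_query * sample_props)

# Sample ids without replacement, never asking for more than the tier holds.
draw <- function(ids, n) ids[sample.int(length(ids), min(n, length(ids)))]

queried <- c(
  draw(high, n_per_tier[1]),
  draw(middle, n_per_tier[2]),
  draw(low, n_per_tier[3])
)
```

Sampling some documents from the middle and low tiers, rather than only the highest-entropy ones, is one way to keep the queried set from concentrating on a narrow band of documents.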

Value

[vector] Vector of id values of documents that the EM algorithm is uncertain about.
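To make the shape of the return value concrete, here is a hedged sketch of how `bound`, `max_query`, `index_name`, and `hand_labeled_index` might combine to produce a vector of ids. It assumes a data frame with a precomputed `entropy` column, which is an illustrative simplification; the actual function takes a matrix and computes uncertainty internally, and its exact filtering rules are not documented here.

```r
# Hypothetical inputs; column names and values are illustrative only.
docs <- data.frame(
  id = c("a", "b", "c", "d", "e"),
  entropy = c(0.69, 0.10, 0.55, 0.64, 0.02),
  stringsAsFactors = FALSE
)
index_name <- "id"
hand_labeled_index <- c("a")
bound <- 0.05    # assumed: documents at or below this entropy are skipped
max_query <- 2

# Keep documents that are not hand labeled and whose entropy exceeds the bound.
is_unlabeled <- !(docs[[index_name]] %in% hand_labeled_index)
candidates <- docs[is_unlabeled & docs$entropy > bound, ]

# Order by decreasing entropy and keep at most max_query ids.
candidates <- candidates[order(candidates$entropy, decreasing = TRUE), ]
uncertain_ids <- head(candidates[[index_name]], max_query)
```

In this sketch, document `a` is excluded because it is already hand labeled, `e` falls below the bound, and the two highest-entropy remaining documents are returned.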

activetext/activeR documentation built on May 31, 2024, 10:21 a.m.
