View source: R/functions_active_helper.R
clean_data | R Documentation |
Structures data in preparation for the Active-EM implementation. Optionally filters documents by chosen character strings and adds an index value for each document.
clean_data(
docs,
n_class,
doc_name,
index_name,
labels_name = NULL,
filters = NULL,
add_index = T,
add_filter = T,
keep_labels = F
)
docs |
[matrix] Matrix of labeled and/or unlabeled documents. |
n_class |
[numeric] Number of classes to be considered. |
doc_name |
[string] Character string indicating the variable in 'docs' that denotes the text of the documents to be classified. |
index_name |
[character] Character string indicating the variable in 'docs' that denotes the index value of the document to be classified. |
labels_name |
[character] Character string indicating the variable in |
filters |
[character] A vector of regular expressions used to filter out unwanted documents. |
add_index |
[logical] Boolean logical value indicating whether or not to add an index in the restructuring process. |
add_filter |
[logical] Boolean logical value indicating whether or not to filter documents in the restructuring process. |
keep_labels |
[logical] Boolean logical value indicating whether or not to keep an existing column of labels in the dataset. |
[matrix] Structured matrix of labeled and unlabeled documents, updated with labels for the documents in 'toLabel'.
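The filtering and indexing options described above can be illustrated with a minimal base-R sketch. This is not the implementation of clean_data; it only shows, under stated assumptions, what dropping documents that match a vector of regular expressions and then attaching an index could look like. The data frame, its column names ('text', 'label', 'id'), and the example patterns are all hypothetical, and the order of the two steps is an assumption.

```r
# Hypothetical input: a small data frame of documents.
# The column names 'text' and 'label' are assumptions for illustration.
docs <- data.frame(
  text  = c("Budget vote passes", "RT: click here to win", "Committee hearing today"),
  label = c(1, NA, 0),
  stringsAsFactors = FALSE
)

# Regular expressions marking unwanted documents (analogous to 'filters').
filters <- c("^RT:", "click here")

# Test each pattern separately, then flag a document if any pattern matches.
matches  <- lapply(filters, grepl, x = docs$text)
unwanted <- Reduce(`|`, matches)

# Drop the flagged documents (analogous to add_filter = T).
kept <- docs[!unwanted, , drop = FALSE]

# Attach a sequential index to the remaining documents (analogous to add_index = T).
kept$id <- seq_len(nrow(kept))

print(kept)
```

Testing each pattern on its own and combining the results with a logical OR avoids pasting the expressions into a single pattern, which can change their meaning when a pattern contains alternation.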