clean_data: Structure Data

View source: R/functions_active_helper.R

clean_data    R Documentation

Structure Data

Description

Structures data in preparation for the Active-EM implementation. Includes options to filter documents by chosen character strings and to add an index value for each document.

Usage

clean_data(
  docs,
  n_class,
  doc_name,
  index_name,
  labels_name = NULL,
  filters = NULL,
  add_index = T,
  add_filter = T,
  keep_labels = F
)

Arguments

docs

[matrix] Matrix of labeled and/or unlabeled documents.

n_class

[numeric] Number of classes to be considered.

doc_name

[character] Character string indicating the variable in 'docs' that denotes the text of the documents to be classified.

index_name

[character] Character string indicating the variable in 'docs' that denotes the index value of the document to be classified.

labels_name

[character] Character string indicating the variable in 'docs' that denotes the already known labels of the documents. Defaults to NULL.

filters

[character] A vector of regular expressions used to filter out unwanted documents.

add_index

[logical] Logical value indicating whether or not to add an index in the restructuring process.

add_filter

[logical] Logical value indicating whether or not to filter documents in the restructuring process.

keep_labels

[logical] Logical value indicating whether or not to keep an existing column of labels in the dataset.
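
To make the filtering and indexing options above more concrete, here is a minimal base-R sketch of the kind of restructuring they describe. It is not the package's implementation of clean_data. The helper name sketch_clean, the column names text and id, and the use of a data frame rather than a matrix are all assumptions made for illustration, as is the choice to number the remaining documents consecutively.

```r
# Illustrative sketch only: NOT the package's implementation of clean_data().
# `sketch_clean`, the `text` column, and the `id` column are hypothetical names.
sketch_clean <- function(docs, doc_name, filters = NULL,
                         add_index = TRUE, add_filter = TRUE) {
  if (add_filter && length(filters) > 0) {
    # Combine the regular expressions so that a document is dropped
    # if it matches any one of them.
    pattern <- paste(filters, collapse = "|")
    docs <- docs[!grepl(pattern, docs[[doc_name]]), , drop = FALSE]
  }
  if (add_index) {
    # Assumed indexing scheme: consecutive integers over the remaining rows.
    docs$id <- seq_len(nrow(docs))
  }
  rownames(docs) <- NULL
  docs
}

# Toy documents; two of them match the unwanted patterns.
docs <- data.frame(
  text = c("Budget vote passes", "RT: click here",
           "New trade deal", "ad: buy now"),
  stringsAsFactors = FALSE
)

cleaned <- sketch_clean(docs, doc_name = "text",
                        filters = c("^RT:", "^ad:"))
print(cleaned)
```

In this sketch the index is assigned after filtering, so the retained documents are numbered without gaps. Whether clean_data itself orders the two steps this way is not stated in this page.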

Value

[matrix] Structured matrix of labeled and unlabeled documents, updated with labels for the documents in 'toLabel'.


activetext/activeR documentation built on May 31, 2024, 10:21 a.m.