classifier_select_docs: Subsets a data frame of documents based on a classifier.

View source: R/text_analysis.R

classifier_select_docs    R Documentation

Description

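Subsets a data frame of documents based on a classifier. Each document in new_docs is classified with a trained naive Bayes (quanteda) or xgboost model, and the documents assigned to class_to_keep are returned, or a logical vector marking them is returned instead.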

Usage

classifier_select_docs(
  classifier,
  new_docs,
  text_field = "description",
  return_logical = FALSE,
  logical_to_prob = FALSE,
  class_to_keep = 1,
  boolean = FALSE,
  xgb.cutpoint = 0.5,
  stem = FALSE,
  ...
)

Arguments

classifier

The trained classifier used for the classification: either a naive Bayes model (quanteda) or an xgboost model (xgboost).

new_docs

A data frame or dfm containing the documents to classify.

text_field

The name of the field (column) in new_docs that contains the text to classify.

return_logical

If FALSE (the default), return the subset of documents; if TRUE, return a logical vector indicating which documents belong to that subset.

logical_to_prob

Only used when return_logical is TRUE. If TRUE, class probabilities are returned instead of class categories.

class_to_keep

The class (as predicted by the classifier) whose documents are kept.

xgb.cutpoint

The cut-off applied to the xgboost classifier's predicted values when assigning documents to classes (default 0.5).

stem

Should words be stemmed during preprocessing?

...

Other arguments passed on to preprocess_corpus.
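The interplay of xgb.cutpoint, class_to_keep and logical_to_prob can be illustrated with a small base R sketch. This is not the package's implementation: the helpers classify_with_cutpoint() and select_flags() are hypothetical, and the sketch assumes a binary classifier that yields, for each document, a predicted probability of class 1, that values above the cut-off are labelled class 1, and that logical_to_prob returns the probability of class_to_keep.

```r
# Hypothetical sketch, not durhamevp's code.
# prob_class1: assumed predicted probability that each document is class 1.
classify_with_cutpoint <- function(prob_class1, cutpoint = 0.5) {
  # Probabilities above the cut-off become class 1, the rest class 0
  ifelse(prob_class1 > cutpoint, 1, 0)
}

select_flags <- function(prob_class1, class_to_keep = 1,
                         logical_to_prob = FALSE, cutpoint = 0.5) {
  if (logical_to_prob) {
    # Assumed behaviour: probability of class_to_keep instead of a TRUE/FALSE flag
    return(if (class_to_keep == 1) prob_class1 else 1 - prob_class1)
  }
  # TRUE where the predicted class matches the class to keep
  classify_with_cutpoint(prob_class1, cutpoint) == class_to_keep
}

p <- c(0.9, 0.2, 0.6)
select_flags(p)                     # flags for class 1
select_flags(p, class_to_keep = 0)  # flags for class 0
```

Raising the cut-off makes the selection of class 1 stricter, since fewer documents clear the threshold.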

Value

Either the subset of new_docs classified as class_to_keep, or a logical vector indicating this subset (class probabilities if logical_to_prob is TRUE), depending on the value of return_logical.
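The logical form of the return value is useful when the full data frame should be kept. In the base R sketch below, keep is a hypothetical stand-in for the result of classifier_select_docs(classifier, docs, return_logical = TRUE), and the documents are toy data; indexing with it is assumed to recover the same rows that return_logical = FALSE would return.

```r
# Toy documents; `keep` is a hypothetical stand-in for the vector returned by
# classifier_select_docs(classifier, docs, return_logical = TRUE)
docs <- data.frame(description = c("doc one", "doc two", "doc three"),
                   stringsAsFactors = FALSE)
keep <- c(TRUE, FALSE, TRUE)

# Keep every row but record the classifier's decision in a new column
docs$selected <- keep

# Index with the same vector to obtain the selected rows
# (assumed to match the subset returned when return_logical = FALSE)
subset_docs <- docs[keep, , drop = FALSE]
print(subset_docs$description)
```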


gidonc/durhamevp documentation built on April 8, 2022, 10:31 a.m.