classifier_selection_keywords: Classify documents based on keywords.

View source: R/text_analysis.R

classifier_selection_keywords    R Documentation

Classify documents based on keywords.

Description

classifier_selection_keywords uses a classifier to select documents based on keywords. In 'select' mode the keywords are constructed from archivesearchresults. In 'eval' mode the keywords are taken from eval_options and a binary dfm is constructed from the document text. In 'eval_dfm' mode the keywords are taken from the dfm columns, and eval_classify_var should be a docvar of the dfm.
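
As a rough illustration of the binary document-feature matrix mentioned for 'eval' mode, the base R sketch below marks, for each document, whether each keyword occurs in its text. The helper name binary_keyword_matrix and the sample texts are hypothetical, and plain case-insensitive substring matching is an assumption; the package itself may tokenise and match differently.

```r
# Hypothetical helper (not part of durhamevp): build a 0/1 matrix with one
# row per document and one column per keyword.
binary_keyword_matrix <- function(texts, keywords) {
  texts <- tolower(texts)
  # For each keyword, record whether it appears as a substring of each text.
  m <- vapply(
    keywords,
    function(k) as.integer(grepl(tolower(k), texts, fixed = TRUE)),
    integer(length(texts))
  )
  # vapply returns a plain vector when there is only one document,
  # so restore the documents-by-keywords shape explicitly.
  m <- matrix(m, nrow = length(texts), ncol = length(keywords))
  dimnames(m) <- list(paste0("doc", seq_along(texts)), keywords)
  m
}

# Made-up example texts, for illustration only.
docs <- c("A riot broke out near the poll.",
          "The candidate spoke at the husting.")
kw_matrix <- binary_keyword_matrix(docs, c("riot", "poll", "candidate"))
print(kw_matrix)
```

Substring matching means "poll" would also match "polling"; a token-based approach would be stricter.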

Usage

classifier_selection_keywords(
  train,
  archivesearchresults,
  class_to_keep = 1,
  training_classify_var = "EV_article",
  prior = "docfreq",
  text_field = "ocr",
  classifier_type = "xgboost",
  mode = "select",
  eval_options = list(keywords = c("candidate", "poll", "election", "stone", "riot",
    "mob", "husting", "disturbance", "rough", "incident"), text_field = "ocr",
    eval_classify_var = "EV_article", eval_dfm_classifications = "foo")
)
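
Because eval_options is an ordinary list argument, supplying your own list replaces the default as a whole rather than merging with it. One cautious pattern, sketched here in base R, is to start from a copy of the defaults shown in the usage above and change only the entries you need. The variable names default_eval_options and my_eval_options are illustrative; whether the function tolerates missing list entries is not stated in this page, so the sketch keeps all four.

```r
# Copy of the default eval_options from the usage section above.
default_eval_options <- list(
  keywords = c("candidate", "poll", "election", "stone", "riot",
               "mob", "husting", "disturbance", "rough", "incident"),
  text_field = "ocr",
  eval_classify_var = "EV_article",
  eval_dfm_classifications = "foo"
)

# Override only the keywords; the other entries keep their default values.
my_eval_options <- utils::modifyList(
  default_eval_options,
  list(keywords = c("riot", "mob", "disturbance"))
)
str(my_eval_options)
```

The resulting list could then be passed as eval_options = my_eval_options together with mode = "eval".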

Arguments

train

the training set of documents

classifier_type

The type of classifier to use ("nb" = naive Bayes, "xgboost" = xgboost).

mode

Whether documents should be selected ("select"), or the document selection evaluated from a text field ("eval") or from a dfm ("eval_dfm"). Evaluation assumes the search results have already been classified.


gidonc/durhamevp documentation built on April 8, 2022, 10:31 a.m.