CreateTextFeatures: Creates text-based features from a radiology report corpus
In wlktan/LireNLPSystem: LireNLPSystem


View source: R/CreateTextFeatures.R

This function creates N-gram features from a radiology report corpus, where the N-gram length (unigram, bigram, or trigram) is set by `n_gram_length`.

CreateTextFeatures(segmented.reports, id_col = "imageid",
  text.cols = c("body", "impression"),
  all.stop.words = setdiff(stopwords(), c("no", "not", "nor")),
  finding.dictionary = NULL, docfreq = "prop", min_doc_prop = 0,
  max_doc_prop = 1, termfreq = "count", min_term_freq = 1,
  max_term_freq = NULL, tf_type = "boolean", df_type = "unary",
  n_gram_length = 1)

`segmented.reports`	Input data frame containing the ID column named by `id_col` and the text columns named by `text.cols`
`id_col`	Name of the ID column in segmented.reports; defaults to "imageid"
`text.cols`	Vector of findings text column names in segmented.reports, defaults to c("body","impression")
`all.stop.words`	Vector of stop words; defaults to the English stop word list with the negation terms "no", "not", and "nor" removed from it, so that negations are kept as features
`finding.dictionary`	Dictionary object to map findings, defaults to NULL
`docfreq`	Type of document-frequency threshold; see docfreq_type in quanteda::dfm_trim; one of "count", "prop", "rank", "quantile"; defaults to "prop"
`min_doc_prop`	Minimum document frequency of a feature, below which the feature is removed; defaults to 0
`max_doc_prop`	Maximum document frequency of a feature, above which the feature is removed; defaults to 1
`termfreq`	Type of term-frequency threshold; see termfreq_type in quanteda::dfm_trim; one of "count", "prop", "rank", "quantile"; defaults to "count"
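`min_term_freq`	Minimum frequency of a feature across all documents, below which the feature is removed; defaults to 1
`max_term_freq`	Maximum frequency of a feature across all documents, above which the feature is removed; defaults to NULL (no upper limit)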
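`tf_type`	Term-frequency weighting scheme; see quanteda::dfm_weight; one of "count", "prop", "propmax", "logcount", "boolean", "augmented", "logave"; defaults to "boolean"
`df_type`	Document-frequency weighting scheme; see quanteda::docfreq; one of "count", "inverse", "inversemax", "inverseprob", "unary"; defaults to "unary"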
`n_gram_length`	Length of the N-gram features: 1 (unigrams), 2 (bigrams), or 3 (trigrams); defaults to 1 (unigrams)
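
To make the arguments above more concrete, the following is a minimal sketch of how the documented defaults could map onto plain quanteda calls. It is not the package's implementation: the toy `reports` data frame, the step that pastes the text columns together, and the order of the steps are all assumptions made for illustration, and `finding.dictionary` is left out entirely.

```r
library(quanteda)

# Hypothetical toy reports; column names mirror the documented defaults.
reports <- data.frame(
  imageid = c("r1", "r2", "r3"),
  body = c("no acute fracture", "mild disc bulge", "no disc bulge"),
  impression = c("normal study", "disc bulge at l4", "normal study"),
  stringsAsFactors = FALSE
)

# Assumption: the text columns are combined into one document per report.
text <- paste(reports$body, reports$impression)
names(text) <- reports$imageid

# English stop words with the negation terms kept (cf. all.stop.words).
stop.words <- setdiff(stopwords::stopwords("en"), c("no", "not", "nor"))
toks <- tokens(text, remove_punct = TRUE)
toks <- tokens_remove(toks, stop.words)

# Unigram features (cf. n_gram_length = 1).
toks <- tokens_ngrams(toks, n = 1)
features <- dfm(toks)

# Trim by document proportion and by total term count
# (cf. docfreq, min_doc_prop, max_doc_prop, termfreq, min_term_freq).
features <- dfm_trim(features,
  min_docfreq = 0, max_docfreq = 1, docfreq_type = "prop",
  min_termfreq = 1, termfreq_type = "count")

# Presence/absence weighting of term frequencies (cf. tf_type = "boolean").
features <- dfm_weight(features, scheme = "boolean")

# Document-frequency weights; "unary" assigns 1 to every feature (cf. df_type).
df <- docfreq(features, scheme = "unary")
```

With the default thresholds nothing is trimmed; raising `min_docfreq` or lowering `max_docfreq` in the sketch drops features that are too rare or too common across reports.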