CreateTextFeatures: Creates text-based features from a radiology report corpus


View source: R/CreateTextFeatures.R

Description

This function creates N-gram features from a radiology report corpus, where N-gram

Usage

CreateTextFeatures(segmented.reports, id_col = "imageid",
  text.cols = c("body", "impression"),
  all.stop.words = setdiff(stopwords(), c("no", "not", "nor")),
  finding.dictionary = NULL, docfreq = "prop", min_doc_prop = 0,
  max_doc_prop = 1, termfreq = "count", min_term_freq = 1,
  max_term_freq = NULL, tf_type = "boolean", df_type = "unary",
  n_gram_length = 1)

Arguments

segmented.reports

Input data frame with

id_col

The ID column in segmented.reports, defaults to imageid

text.cols

Vector of findings text column names in segmented.reports, defaults to c("body","impression")

all.stop.words

List of stop words, defaults to English stopword list excluding negation
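The default removes the negation terms from the stop word list so that phrases such as "no fracture" survive stop word filtering. A minimal base R sketch of that idea follows; the short `toy_stop_words` vector and the sample tokens are made up for illustration and stand in for the full list returned by `stopwords()`.

```r
# Hypothetical, shortened stop word list standing in for stopwords()
toy_stop_words <- c("the", "is", "no", "not", "nor", "of")

# Take the negation terms out of the stop word list, as the default does
all_stop_words <- setdiff(toy_stop_words, c("no", "not", "nor"))

# Tokens from a made-up report sentence
report_tokens <- c("there", "is", "no", "evidence", "of", "fracture")

# Drop stop words; "no" is kept because it is no longer in the list
kept_tokens <- report_tokens[!report_tokens %in% all_stop_words]
print(kept_tokens)
```

Keeping "no", "not", and "nor" matters for radiology text because a negated finding and a positive finding would otherwise reduce to the same tokens.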

finding.dictionary

Dictionary object to map findings, defaults to NULL

docfreq

See quanteda::dfm_trim (docfreq_type); one of "count", "prop", "rank", "quantile"; defaults to "prop"

min_doc_prop

Minimum document frequency of a feature, below which the feature is removed; defaults to 0

max_doc_prop

Maximum document frequency of a feature, above which the feature is removed; defaults to 1
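To make the effect of min_doc_prop and max_doc_prop concrete, here is a rough base R sketch of trimming by document proportion. It is not the package's implementation (which, per the docfreq entry, relies on quanteda::dfm_trim); the toy matrix, the thresholds, and the inclusive comparisons are assumptions chosen for illustration.

```r
# Toy counts: 4 reports (rows) by 3 features (columns); values are made up
counts <- matrix(
  c(1, 0, 2,
    1, 0, 1,
    3, 1, 0,
    1, 0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("report", 1:4), c("disc", "fracture", "stenosis"))
)

# Proportion of reports in which each feature appears at least once
doc_prop <- colMeans(counts > 0)

# Hypothetical thresholds, standing in for min_doc_prop and max_doc_prop
min_doc_prop <- 0.3
max_doc_prop <- 0.9

# Keep features whose document proportion lies within the thresholds
keep <- doc_prop >= min_doc_prop & doc_prop <= max_doc_prop
trimmed <- counts[, keep, drop = FALSE]
print(colnames(trimmed))
```

In this toy case "disc" is dropped for appearing in every report and "fracture" for appearing in too few, which is the usual motivation for trimming at both ends.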

termfreq

See quanteda::dfm_trim (termfreq_type); one of "count", "prop", "rank", "quantile"; defaults to "count"

min_term_freq

Minimum frequency of a feature across all documents, below which the feature is removed; defaults to 1

max_term_freq

Maximum frequency of a feature across all documents, above which the feature is removed; defaults to NULL

tf_type

See quanteda::dfm_weight; One of "count", "prop", "propmax", "logcount", "boolean", "augmented", "logave"

df_type

See quanteda::docfreq; One of "count", "inverse", "inversemax", "inverseprob", "unary"

n_gram_length

Unigram, bigram, or trigram features (1, 2, or 3); defaults to 1 (unigrams)
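As an illustration of what unigram, bigram, and trigram features look like, the base R sketch below builds n-grams from a token vector. The helper `make_ngrams` and the underscore separator are hypothetical choices for this example, not part of the package, and the sketch does not show whether a given n_gram_length also includes shorter n-grams.

```r
# Hypothetical helper: paste every run of n consecutive tokens into one feature
make_ngrams <- function(tokens, n) {
  # Too few tokens to form even one n-gram
  if (length(tokens) < n) return(character(0))
  starts <- seq_len(length(tokens) - n + 1)
  vapply(
    starts,
    function(i) paste(tokens[i:(i + n - 1)], collapse = "_"),
    character(1)
  )
}

# Made-up tokens from a report sentence
tokens <- c("no", "acute", "fracture")

unigrams <- make_ngrams(tokens, 1)
bigrams  <- make_ngrams(tokens, 2)
trigrams <- make_ngrams(tokens, 3)

print(unigrams)
print(bigrams)
print(trigrams)
```

Longer n-grams keep a negation attached to the finding it modifies ("no_acute_fracture"), at the cost of a larger and sparser feature set.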

Value

A document-feature matrix in which each row is a unique report, each column is a feature, and each cell holds the count of that feature in the report.

Examples

CreateTextFeatures(segmented.reports)
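The example above assumes that segmented.reports already exists. A slightly fuller sketch, under stated assumptions, might look like the following: the toy data frame uses only the default column names from the Usage section (imageid, body, impression), the report text is invented, and the real function may expect additional columns that this page does not list. The call is guarded so that the block also runs where the package is not installed.

```r
# Toy input; column names follow the documented defaults, text is made up
segmented.reports <- data.frame(
  imageid = c("img001", "img002"),
  body = c("No acute fracture.", "Mild disc bulge at L4-L5."),
  impression = c("Normal study.", "Degenerative changes."),
  stringsAsFactors = FALSE
)

# Only call the package function if LireNLPSystem is available
if (requireNamespace("LireNLPSystem", quietly = TRUE)) {
  features <- LireNLPSystem::CreateTextFeatures(
    segmented.reports,
    id_col = "imageid",
    text.cols = c("body", "impression"),
    n_gram_length = 2
  )
  # Expect one row per report; the columns depend on the trimming settings
  print(dim(features))
}
```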

wlktan/LireNLPSystem documentation built on May 27, 2019, 12:13 p.m.