View source: R/CreateTextFeatures.R
This function creates N-gram features from a radiology report corpus.
CreateTextFeatures(segmented.reports, id_col = "imageid",
  text.cols = c("body", "impression"),
  all.stop.words = setdiff(stopwords(), c("no", "not", "nor")),
  finding.dictionary = NULL, docfreq = "prop", min_doc_prop = 0,
  max_doc_prop = 1, termfreq = "count", min_term_freq = 1,
  max_term_freq = NULL, tf_type = "boolean", df_type = "unary",
  n_gram_length = 1)
segmented.reports | Input data frame with
id_col | The ID column in segmented.reports; defaults to "imageid"
text.cols | Vector of findings text column names in segmented.reports; defaults to c("body", "impression")
all.stop.words | Vector of stop words; defaults to the English stop word list excluding the negation terms "no", "not", and "nor"
finding.dictionary | Dictionary object to map findings; defaults to NULL
docfreq | See quanteda::dfm_trim (docfreq_type); one of "count", "prop", "rank", "quantile"; defaults to "prop"
min_doc_prop, max_doc_prop | Minimum/maximum values of a feature's document frequency, below/above which features will be removed; default to 0 and 1
termfreq | See quanteda::dfm_trim (termfreq_type); one of "count", "prop", "rank", "quantile"; defaults to "count"
min_term_freq | Minimum value of a feature's frequency across all documents, below which the feature will be removed; defaults to 1
max_term_freq | Maximum value of a feature's frequency across all documents, above which the feature will be removed; defaults to NULL
tf_type | See quanteda::dfm_weight; one of "count", "prop", "propmax", "logcount", "boolean", "augmented", "logave"; defaults to "boolean"
df_type | See quanteda::docfreq; one of "count", "inverse", "inversemax", "inverseprob", "unary"; defaults to "unary"
n_gram_length | Unigram, bigram, or trigram features; defaults to 1 (unigrams)
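The arguments above refer to several quanteda functions. The following is a minimal sketch of how such a pipeline could look when written directly against quanteda. It is not the source of CreateTextFeatures; the toy report text, the variable names, and the chosen thresholds are all assumptions made for illustration.

```r
library(quanteda)

# Hypothetical toy reports (not taken from any real corpus)
reports <- c(r1 = "No acute fracture. Mild disc bulge.",
             r2 = "Disc bulge not seen. No stenosis.",
             r3 = "Mild stenosis with disc bulge.")

# English stop words with negation terms kept, mirroring the
# default value of all.stop.words
stops <- setdiff(stopwords("en"), c("no", "not", "nor"))

# Tokenize, lowercase, drop stop words, then build unigrams and bigrams
toks <- tokens(reports, remove_punct = TRUE)
toks <- tokens_tolower(toks)
toks <- tokens_remove(toks, stops)
toks <- tokens_ngrams(toks, n = 1:2)

# Document-feature matrix: one row per report, one column per N-gram
feats <- dfm(toks)

# Keep features that occur in at least half of the reports
# (compare docfreq = "prop" with min_doc_prop / max_doc_prop)
feats <- dfm_trim(feats, min_docfreq = 0.5, docfreq_type = "prop")

# Presence/absence weighting (compare tf_type = "boolean")
feats <- dfm_weight(feats, scheme = "boolean")

featnames(feats)
```

Keeping "no", "not", and "nor" out of the stop word list means negated findings such as "no stenosis" still contribute the negation token to the feature set.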
A document-feature matrix in which each row is a unique report, each column is a feature, and each cell holds the feature's count in that report.
CreateTextFeatures(segmented.reports)
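For readers without a report corpus at hand, here is a hedged sketch of a toy input and a call with non-default arguments. The column names follow the documented defaults for id_col and text.cols; the report text and the object name bigram.features are made up for illustration, and the real function may expect additional columns that this page does not list.

```r
# Hypothetical toy input; column names follow the defaults
# id_col = "imageid" and text.cols = c("body", "impression").
segmented.reports <- data.frame(
  imageid    = c("img001", "img002"),
  body       = c("No acute fracture.", "Mild disc bulge at L4-L5."),
  impression = c("Normal study.", "Degenerative change, no stenosis."),
  stringsAsFactors = FALSE
)

# Guarded so the sketch also runs where the package is not attached.
if (exists("CreateTextFeatures")) {
  # Request bigram features instead of the default unigrams
  bigram.features <- CreateTextFeatures(segmented.reports,
                                        id_col = "imageid",
                                        text.cols = c("body", "impression"),
                                        n_gram_length = 2)
}
```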