get_dfm | R Documentation
Builds a document-feature matrix (dfm) using the quanteda package.
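To make the idea of a document-feature matrix concrete, here is a minimal base-R sketch. It is not the activeText or quanteda implementation. `toy_dfm()` is a hypothetical helper, the tokenizer (lowercase, split on non-letters) and the tiny stop-word list are assumptions, and stemming is skipped entirely. It only mirrors the spirit of the `minChar` and `removeStopWords` arguments: one row per document, one column per feature, cells holding raw counts.

```r
# toy_dfm() is a HYPOTHETICAL helper for illustration only; it is not part of
# activeText or quanteda, and it does no stemming.
toy_dfm <- function(texts, min_char = 4, stop_words = c("this", "that", "with")) {
  tokens <- lapply(tolower(texts), function(x) {
    tk <- unlist(strsplit(x, "[^a-z]+"))   # split on runs of non-letters
    tk <- tk[nchar(tk) >= min_char]        # drop short tokens (cf. minChar)
    tk[!tk %in% stop_words]                # drop stop words (cf. removeStopWords)
  })
  vocab <- sort(unique(unlist(tokens)))
  # One row per document, one column per feature; cells hold raw counts
  m <- do.call(rbind, lapply(tokens, function(tk) {
    as.numeric(table(factor(tk, levels = vocab)))
  }))
  dimnames(m) <- list(paste0("doc", seq_along(texts)), vocab)
  m
}

texts <- c("Active learning labels documents.",
           "Learning with documents, documents everywhere!")
m <- toy_dfm(texts)
m
```

Here "with" is removed as a stop word, and "documents" is counted twice in the second document.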
get_dfm(
docs,
doc_name = "text",
index_name = "id",
stem = T,
ngrams = 1,
trimPct = 1e-04,
min_doc_freq = 2,
idfWeight = F,
removeStopWords = T,
minChar = 4
)
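The `ngrams` argument sets how many consecutive tokens form one feature. As a rough base-R illustration (not how quanteda builds n-grams internally), the hypothetical helper `make_ngrams()` below turns a token vector into word n-grams; joining tokens with `_` is an assumption made for this sketch.

```r
# make_ngrams() is a HYPOTHETICAL helper; the "_" separator is an assumption.
make_ngrams <- function(tokens, n = 1) {
  if (n < 1 || length(tokens) < n) return(character(0))
  starts <- seq_len(length(tokens) - n + 1)   # start position of each n-gram
  vapply(starts, function(i) paste(tokens[i:(i + n - 1)], collapse = "_"),
         character(1))
}

bigrams <- make_ngrams(c("active", "learning", "with", "text"), n = 2)
bigrams
# "active_learning" "learning_with" "with_text"
```

With `n = 1` the tokens come back unchanged, which corresponds to the default `ngrams = 1`.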
| Argument | Description |
|---|---|
| docs | [matrix] Matrix of labeled and unlabeled documents. |
| doc_name | [character] Name of the variable in 'docs' that contains the text of the documents to be classified. |
| index_name | [character] Name of the variable in 'docs' that contains the index value of each document to be classified. |
| stem | [logical] Switch indicating whether to stem terms. |
| ngrams | [integer] Size of the n-grams used to build the dfm (e.g., 1 for unigrams, 2 for bigrams). |
| trimPct | [numeric] Threshold, expressed as a percentage of documents containing a term, at which terms are removed from the document-term matrix. E.g., if |
| min_doc_freq | [integer] Minimum number of documents a term must appear in to remain in the document-term matrix. |
| idfWeight | [logical] Switch indicating whether to apply inverse document frequency (IDF) weighting to the document-term matrix. Only works if |
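The `min_doc_freq` and `trimPct` arguments both prune rare terms. The sketch below shows that kind of document-frequency trimming in base R. It assumes, without confirmation from this page, that `trimPct` acts as a minimum share of documents a term must appear in; `trim_terms()` is a hypothetical helper, not the function's actual internals (which rely on quanteda).

```r
# trim_terms() is a HYPOTHETICAL helper. ASSUMPTION: trim_pct is treated as a
# minimum proportion of documents containing the term.
trim_terms <- function(m, min_doc_freq = 2, trim_pct = 1e-04) {
  doc_freq <- colSums(m > 0)        # number of documents containing each term
  prop     <- doc_freq / nrow(m)    # share of documents containing each term
  keep     <- doc_freq >= min_doc_freq & prop >= trim_pct
  m[, keep, drop = FALSE]           # keep matrix shape even if one column survives
}

m <- matrix(c(1, 0, 2,  0, 0, 1,  3, 1, 1), nrow = 3,
            dimnames = list(paste0("doc", 1:3), c("alpha", "beta", "gamma")))
kept <- trim_terms(m)
colnames(kept)
# "alpha" "gamma"
```

"beta" appears in only one document, so it falls below `min_doc_freq = 2` and is dropped.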
Value: [matrix] Document-term matrix.
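When `idfWeight` is enabled, the returned matrix holds weighted values rather than raw counts. The sketch below uses one common IDF form, count × log10(N / df), where N is the number of documents and df the number of documents containing the term. The exact weighting scheme `get_dfm` applies through quanteda is not stated on this page, and `idf_weight()` is a hypothetical helper.

```r
# idf_weight() is a HYPOTHETICAL helper using count * log_base(N / df);
# the scheme get_dfm actually uses may differ.
idf_weight <- function(m, base = 10) {
  doc_freq <- colSums(m > 0)                    # documents containing each term
  idf <- log(nrow(m) / doc_freq, base = base)   # rarer terms get larger weights
  idf[doc_freq == 0] <- 0                       # avoid 0 * Inf = NaN for empty columns
  m * rep(unname(idf), each = nrow(m))          # scale column j by idf[j]
}

m <- matrix(c(1, 0, 2,  0, 0, 1,  3, 1, 1), nrow = 3,
            dimnames = list(paste0("doc", 1:3), c("alpha", "beta", "gamma")))
w <- idf_weight(m)
round(w, 3)
```

A term present in every document ("gamma" here) gets weight log10(1) = 0, so it contributes nothing after weighting.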