If certain documents (typically very short ones) occur frequently in your data and you wish to remove them before fitting the LDA model, this function can be used to flag them. The operation is trivial, but it serves as a useful reminder to inspect your data visually before running LDA, so that documents that don't require topic modeling in the first place can be thrown out.
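One way to find candidates for `exact` during that visual inspection is to tabulate the corpus and look at short documents that repeat. The base R sketch below does this on a made-up toy corpus; the 20-character cutoff and the variable names are illustrative assumptions, not part of the package.

```r
# Hypothetical toy corpus; in practice `data` is your own character vector.
data <- c("thanks!", "I loved the plot and the acting.", "thanks!",
          "ok", "The soundtrack was forgettable.", "ok", "thanks!")

# Count how often each distinct document occurs, most frequent first.
doc_counts <- sort(table(data), decreasing = TRUE)

# Keep documents that occur more than once and are short
# (<= 20 characters is an arbitrary threshold chosen for illustration).
frequent <- doc_counts[doc_counts > 1 & nchar(names(doc_counts)) <= 20]
print(frequent)

# Candidate strings to pass as `exact` once they have been checked by eye.
exact_candidates <- names(frequent)
```

Tabulating only surfaces candidates; deciding which of them are safe to discard is still a judgment made by reading them.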
flag.exact(data, exact, verbose = FALSE, quiet = FALSE)
data: a character vector containing the raw corpus. Each element should correspond to one 'document'.
exact: a character vector in which each element is a string, phrase, or longer snippet of text that you wish to discard; a document is flagged only if an element matches its entire content.
verbose: logical. Track the categories of exact matches. For instance, if a document exactly matches the third element of exact, its entry in the returned vector is 3.
quiet: logical. Should a summary of the preprocessing steps be printed to the screen?
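The matching rule described by `exact` and `verbose` can be approximated in base R with `match()`. The helper `flag_exact_sketch()` below is hypothetical and is not the package's implementation; it assumes comparisons are exact and case-sensitive with no whitespace trimming, and it ignores `quiet`.

```r
# flag_exact_sketch() is a hypothetical helper, not part of the package.
flag_exact_sketch <- function(data, exact, verbose = FALSE) {
  # match() gives the index of the first element of `exact` equal to each
  # document, or 0 (via nomatch) when the document matches none of them.
  category <- match(data, exact, nomatch = 0L)
  if (verbose) {
    category                    # 0 = no match, j = matched exact[j]
  } else {
    as.integer(category > 0L)   # 1 = matched some element of exact, 0 = no match
  }
}

data  <- c("ok", "Great movie, would watch again.", "thanks!", "ok")
exact <- c("thanks!", "ok")

flag_exact_sketch(data, exact, verbose = TRUE)   # returns c(2, 0, 1, 2)
flag_exact_sketch(data, exact, verbose = FALSE)  # returns c(1, 0, 1, 1)
```

Because `match()` compares whole strings, a document such as "ok " (with a trailing space) would not be flagged under this sketch.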
category: an integer vector of the same length as data. If verbose = TRUE, 0 indicates that the document did not match any of the strings in exact, and an integer j = 1, ..., K (where K is the length of exact) indicates that the document was an exact match to the jth element of exact. If verbose = FALSE, it is an indicator vector of whether the document exactly matched any of the elements of exact (without indicating which element it matched).
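With either value of verbose, documents whose entry is 0 matched nothing, so the vector can be used directly to subset the corpus. In the sketch below, `category` is a hand-written stand-in for the output with verbose = TRUE (an assumption for illustration), chosen to be consistent with the toy `data` and `exact` shown.

```r
# Toy corpus and strings to discard (illustrative only).
data     <- c("ok", "Loved it.", "thanks!", "ok", "Too long by half.", "thanks!")
exact    <- c("thanks!", "ok")
# Hypothetical verbose = TRUE output for this corpus: 0 = no match, j = exact[j].
category <- c(2L, 0L, 1L, 2L, 0L, 1L)

# How many documents matched each element of `exact` (level 0 = no match)?
counts <- table(factor(category, levels = 0:length(exact)))
print(counts)

# Keep only documents that matched nothing; these go on to the LDA fit.
data_kept <- data[category == 0L]
print(data_kept)
```

The same `category == 0L` subset works unchanged on the verbose = FALSE indicator vector.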
See also: flag.partial