Description:

Often a data set contains documents that you wish to remove
before you fit the LDA model, and these documents share a
common "boilerplate" string or phrase (possibly alongside
unique information, such as an account number). This function
can be used to flag those documents. Like the function
flag.exact, it performs a very simple operation that may be
most useful as a signal to the user that he or she should
visually inspect the data before running LDA, so that
documents which don't require topic modeling can be removed
in the first place.
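The workflow described above (flag documents that contain a boilerplate snippet, then drop them before fitting LDA) can be sketched in base R. This is a minimal illustration, not the package's implementation. It assumes that "partial match" means the snippet occurs as a literal, case-sensitive substring of the document. The variable names `flagged` and `to_model` are hypothetical.

```r
# Hypothetical base-R sketch (not the package's implementation).
# Assumption: a document is flagged if any snippet occurs in it as a
# literal, case-sensitive substring.
data <- c("Automatic Message: Account 12 ...",
          "A document with unknown content",
          "Boilerplate text: Customer 1532 ...")
partial <- c("Automatic Message:", "Boilerplate text")

# grepl(fixed = TRUE) treats each snippet as plain text rather than a regex.
# lapply() yields one logical vector per snippet; Reduce("|", ...) ORs them
# elementwise, starting from FALSE so an empty `partial` flags nothing.
flagged <- Reduce("|", lapply(partial, grepl, x = data, fixed = TRUE), FALSE)

# Keep only the documents that still need topic modeling.
to_model <- data[!flagged]
```

Here `flagged` is `c(TRUE, FALSE, TRUE)`, so only the second document is passed on to the model.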
Usage:

flag.partial(data, partial, verbose, quiet = FALSE)
Arguments:

data | a character vector containing the raw corpus. Each element should correspond to a 'document'.
partial | a character vector in which each element is a string, phrase, or longer snippet of text that you wish to discard if it occurs as a substring of a document.
verbose | logical. Track the categories of partial matches. For instance, if a document partially matches the third element of partial, its entry in the returned vector is 3.
quiet | logical. Should a summary of the preprocessing steps be printed to the screen?
Value:

category | an integer vector of the same length as data. If verbose = TRUE, 0 indicates that the document did not match any of the strings in partial, and an integer j = 1, ..., K (where K is the length of partial) indicates that the document was a partial match to the jth element of partial. If verbose = FALSE, the result is an indicator vector of whether the document partially matched any of the elements of partial, without indicating which element it matched.
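The two encodings of the return value can be illustrated with a small base-R re-implementation. This is a hypothetical sketch for illustration only: `partial_codes` is not part of the package. It assumes literal, case-sensitive substring matching and, because the documentation does not say what happens when a document matches several elements of partial, it assumes the first (lowest j) match is reported.

```r
# Hypothetical re-implementation for illustration only (not the package's
# code). Assumptions: literal, case-sensitive matching (fixed = TRUE), and a
# document matching several snippets gets the lowest index j.
partial_codes <- function(data, partial) {
  vapply(data, function(doc) {
    # Indices of the elements of `partial` that occur inside this document.
    hits <- which(vapply(partial, function(p) grepl(p, doc, fixed = TRUE),
                         logical(1), USE.NAMES = FALSE))
    if (length(hits) > 0) hits[1] else 0L  # 0 means "no match"
  }, integer(1), USE.NAMES = FALSE)
}

data <- c("Automatic Message: Account 12 ...",
          "Automatic Message: Account 314 ...",
          "A document with unknown content",
          "Boilerplate text: Customer 1532 ...")
partial <- c("Automatic Message:", "Auto Text:", "Boilerplate text")

codes <- partial_codes(data, partial)  # verbose-style coding: c(1, 1, 0, 3)
indicator <- as.integer(codes > 0)     # non-verbose indicator: c(1, 1, 0, 1)
```

Collapsing the verbose codes with `codes > 0` recovers the non-verbose indicator, which is why the two modes agree on which documents are flagged.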
See also: flag.exact
Examples:

# Assumes the package that provides flag.partial() is already loaded.
data <- c("Automatic Message: Account 12 ...",
          "Automatic Message: Account 314 ...",
          "A document with unknown content",
          "Boilerplate text: Customer 1532 ...")
# Boilerplate snippets to search for inside each document.
match.partial <- c("Automatic Message:", "Auto Text:", "Boilerplate text")
flag.partial(data, match.partial, verbose = FALSE, quiet = FALSE)  # c(1, 1, 0, 1)
flag.partial(data, match.partial, verbose = TRUE, quiet = FALSE)   # c(1, 1, 0, 3)