View source: R/remove_boilerplate.R
Remove repetitive "boilerplate" text from documents to minimize noise in the STM analysis.
remove_boilerplate(input_dir, ngram_dir, output_dir, rep_text_dir,
  header_footer_dir, language = "en")
input_dir | Directory containing text files to extract ngrams from.
ngram_dir | Directory in which to find ngrams.
output_dir | Directory in which to save texts with boilerplate removed.
rep_text_dir | Directory in which to save repetitive text for review.
header_footer_dir | Directory in which to save header and footer text for review.
language | Language in which documents are written. Defaults to "en".
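The arguments above could be wired together roughly as follows. This is a minimal sketch, not an example from the package: the directory names and the placeholder document are hypothetical, and it assumes that ngram_dir has already been populated by an earlier ngram-extraction step (it is left empty here). The call is guarded so the script still runs when the package providing remove_boilerplate() is not attached.

```r
# Hypothetical directory layout under a temporary folder; these names are
# illustrative only and are not prescribed by the package.
base_dir <- file.path(tempdir(), "boilerplate_demo")
dirs <- list(
  input_dir         = file.path(base_dir, "texts"),
  ngram_dir         = file.path(base_dir, "ngrams"),
  output_dir        = file.path(base_dir, "texts_clean"),
  rep_text_dir      = file.path(base_dir, "repetitive_text"),
  header_footer_dir = file.path(base_dir, "headers_footers")
)

# Create every directory up front so there is somewhere to read from and
# write to.
for (d in dirs) dir.create(d, recursive = TRUE, showWarnings = FALSE)

# A placeholder input document (made-up content).
writeLines(
  c("Annual Report", "Body text of the document.", "Page 1"),
  file.path(dirs$input_dir, "doc1.txt")
)

# Run only if remove_boilerplate() is available in the session.
# Assumption: in real use, ngram_dir already holds ngrams from a prior step;
# with an empty ngram_dir the function may fail or find nothing to remove.
if (exists("remove_boilerplate", mode = "function")) {
  do.call(remove_boilerplate, c(dirs, list(language = "en")))
}
```

Keeping rep_text_dir and header_footer_dir separate from output_dir makes it easier to review what was stripped before feeding the cleaned texts into the STM analysis.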