Removes sentences from a corpus or a character vector that are shorter than a specified minimum length, longer than a specified maximum length, or that match an exclusion pattern.
corpus_trimsentences(
  x,
  min_length = 1,
  max_length = 10000,
  exclude_pattern = NULL,
  return_tokens = FALSE
)
char_trimsentences(
  x,
  min_length = 1,
  max_length = 10000,
  exclude_pattern = NULL
)
x | corpus or character object whose sentences will be selected.
min_length, max_length | minimum and maximum lengths in word tokens (excluding punctuation).
exclude_pattern | a stringi regular expression whose match (at the sentence level) will be used to exclude sentences.
return_tokens | if TRUE, return the tokenized sentences after trimming; otherwise return the same type of object as the input.
a corpus or character vector equal in length to the input, or
a tokenized set of sentences if return_tokens = TRUE.  If the input was a corpus, then all
docvars and metadata are preserved.  For documents whose sentences have
been removed entirely, a null string ("") will be returned.
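To make the interaction of min_length, max_length and exclude_pattern concrete, the following base-R sketch imitates the behaviour described above. It is not the package's implementation: trim_sentences_sketch() is a hypothetical helper, the sentence splitter is deliberately naive (it breaks after ".", "!" or "?" followed by whitespace and does not handle abbreviations), and the pattern is matched with base R's Perl-style regular expressions rather than stringi.

```r
# Hypothetical helper, NOT part of quanteda: a rough base-R imitation of
# trimming sentences by word count and by regular expression.
trim_sentences_sketch <- function(x, min_length = 1, max_length = 10000,
                                  exclude_pattern = NULL) {
  vapply(x, function(doc) {
    # naive sentence split: break on whitespace that follows ., ! or ?
    sents <- strsplit(doc, "(?<=[.!?])\\s+", perl = TRUE)[[1]]
    # count word tokens per sentence, ignoring punctuation
    n_words <- lengths(regmatches(sents, gregexpr("[[:alnum:]]+", sents)))
    keep <- n_words >= min_length & n_words <= max_length
    if (!is.null(exclude_pattern)) {
      # drop any sentence in which the pattern matches
      keep <- keep & !grepl(exclude_pattern, sents, perl = TRUE)
    }
    # a document with no surviving sentences collapses to ""
    paste(sents[keep], collapse = " ")
  }, character(1), USE.NAMES = FALSE)
}

txt <- c("PAGE 1. A single sentence.  Short sentence. Three word sentence.",
         "PAGE 2. Very short! Shorter.",
         "Very long sentence, with three parts, separated by commas.  PAGE 3.")

by_length  <- trim_sentences_sketch(txt, min_length = 3)
by_pattern <- trim_sentences_sketch(txt, exclude_pattern = "^PAGE \\d+")
```

In this sketch the second document comes back as "" under min_length = 3, because none of its sentences reaches three words, while the output keeps one element per input document.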
This function has been superseded by corpus_trim(); use
that function instead.
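As a hedged illustration of the migration, the calls below show roughly how the same trimming might be written with corpus_trim() and char_trim(). The sketch assumes a quanteda version in which these functions take what, min_ntoken, max_ntoken and exclude_pattern arguments; the handling of documents left without any sentences may differ from corpus_trimsentences(), so check the output on your own data.

```r
library(quanteda)

txt <- c("PAGE 1. A single sentence.  Short sentence. Three word sentence.",
         "PAGE 2. Very short! Shorter.",
         "Very long sentence, with three parts, separated by commas.  PAGE 3.")
corp <- corpus(txt)

# keep only sentences with at least 3 word tokens
trimmed <- corpus_trim(corp, what = "sentences", min_ntoken = 3)
as.character(trimmed)

# drop sentences that start with "PAGE <digit(s)>"
no_pages <- corpus_trim(corp, what = "sentences",
                        exclude_pattern = "^PAGE \\d+")
as.character(no_pages)

# the character-vector counterpart
trimmed_chr <- char_trim(txt, what = "sentences", min_ntoken = 3)
```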
txt <- c("PAGE 1. A single sentence.  Short sentence. Three word sentence.",
         "PAGE 2. Very short! Shorter.",
         "Very long sentence, with three parts, separated by commas.  PAGE 3.")
corp <- corpus(txt, docvars = data.frame(serial = 1:3))
texts(corp)
# exclude sentences shorter than 3 tokens
texts(corpus_trimsentences(corp, min_length = 3))
# exclude sentences that start with "PAGE <digit(s)>"
texts(corpus_trimsentences(corp, exclude_pattern = "^PAGE \\d+"))
# on a character
char_trimsentences(txt, min_length = 3)
char_trimsentences(txt, exclude_pattern = "sentence\\.")