Uncleaned text may result in errors, warnings, and incorrect results in subsequent analysis.
check_text checks text for potential problems
and suggests possible fixes. Potential text anomalies that are detected
include: factors, missing ending punctuation, empty cells, double punctuation,
non-space after comma, no alphabetic characters, non-ascii, missing value,
and potentially misspelled words.
The text variable.
A connection, or a character string naming the file to print to.
Returns a list with the following reports of potential text faults:
non_character- Text that is non-character.
missing_ending_punctuation- Text with no endmark at the end of the string.
empty- Text that contains an empty element (i.e., "").
double_punctuation- Text that contains two qdap punctuation marks in the same string.
non_space_after_comma- Text that contains commas with no space after them.
no_alpha- Text that contains string elements with no alphabetic characters.
non_ascii- Text that contains non-ASCII characters.
missing_value- Text that contains missing values (i.e., NA).
containing_escaped- Text that contains escaped characters (see ?Quotes).
containing_digits- Text that contains digits.
indicating_incomplete- Text that contains endmarks that are indicative of incomplete/trailing sentences (e.g., "...").
potentially_misspelled- Text that contains potentially misspelled words.
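To make a few of these report categories concrete, here is a minimal base R sketch of how some of them could be detected. It is not qdap's implementation: the function name sketch_text_checks and the regular expressions are assumptions chosen for illustration, and qdap's own rules (for example, its set of endmarks) may differ.

```r
# Hypothetical helper: NOT part of qdap. For a handful of the categories
# above, returns the indices of the elements that would be flagged.
sketch_text_checks <- function(x) {
  stopifnot(is.character(x))
  present <- !is.na(x)
  # Indices of non-missing elements matching a (Perl-style) pattern
  flag <- function(pattern) which(present & grepl(pattern, x, perl = TRUE))
  list(
    missing_value = which(!present),
    # Empty after trimming whitespace
    empty = which(present & !nzchar(trimws(x))),
    # A comma immediately followed by a non-space character
    non_space_after_comma = flag(",(?=\\S)"),
    no_alpha = which(present & !grepl("[[:alpha:]]", x)),
    containing_digits = flag("[[:digit:]]"),
    # Assumed endmarks for this sketch: . ? ! |
    missing_ending_punctuation = which(present & !grepl("[.?!|]\\s*$", x))
  )
}

x <- c("i like", "they,were there", "", NA, "3;", "A valid sentence.")
res <- sketch_text_checks(x)
str(res)
```

Each element of the returned list holds the positions of the offending strings, so a single string can appear under several categories at once (in this sketch the empty string is flagged as empty, as having no alphabetic characters, and as missing ending punctuation).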
The output is a list, but it prints as pretty-formatted output showing the potential problem elements, the accompanying text, and suggested fixes.
## Not run: 
x <- c("i like", "i want. thet them .", "I am ! that|", "", NA,
    "they,were there", ".", " ", "?", "3;", "I like goud eggs!",
    "i 4like...", "\\tgreat", "She said \"yes\"")
check_text(x)
print(check_text(x), include.text=FALSE)

y <- c("A valid sentence.", "yet another!")
check_text(y)

## End(Not run)
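Some of the problems in input like this have mechanical fixes. The following base R sketch shows three of them: converting factors to character, adding a space after commas, and trimming stray whitespace. The helper name sketch_fix_text is hypothetical, and turning empty strings into NA is an assumption made for this sketch; these are illustrative fixes, not the suggestions that check_text itself prints.

```r
# Hypothetical helper: NOT part of qdap.
sketch_fix_text <- function(x) {
  x <- as.character(x)                         # factors become character
  x <- gsub(",(?=\\S)", ", ", x, perl = TRUE)  # add a space after commas
  x <- trimws(x)                               # drop leading/trailing whitespace
  x[!is.na(x) & !nzchar(x)] <- NA              # assumption: treat empty strings as missing
  x
}

fixed <- sketch_fix_text(c("they,were there", "   ", " i like ", NA))
print(fixed)
```

Misspellings, missing endmarks, and similar faults need human judgment, so this sketch leaves them alone.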