check_text - Uncleaned text may cause errors, warnings, and incorrect results in subsequent analysis. check_text checks text for potential problems and suggests possible fixes. The text anomalies it can detect include: factors, missing ending punctuation, empty cells, double punctuation, missing space after a comma, no alphabetic characters, non-ASCII characters, missing values, and potentially misspelled words.
available_checks - Provides a data.frame view of all the checks available in the check_text function.
check_text(x, file = NULL, checks = NULL, n = 10, ...)
available_checks()
x - The text variable.
file - A connection, or a character string naming the file to print to. If …
checks - A vector of checks to include from …
n - The number of affected elements to print out (the rest are truncated).
... - Ignored.
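As a rough illustration of the usage and arguments above, the following sketch runs a restricted set of checks on a small vector. It assumes the textclean package is installed and that the names accepted by checks match the check names reported by available_checks(); the sample text is made up for illustration.

```r
library(textclean)

# Made-up sample text: one element lacks an end mark,
# one has a comma with no space after it
x <- c("I like it", "they,were there.", "Great job!")

# Data frame describing every check that check_text can run
avail <- available_checks()

# Run only two checks (names assumed to match those reported by
# available_checks()) and show at most 5 affected elements per check
report <- check_text(x, checks = c("no_endmark", "no_space_after_comma"), n = 5)

# Printing the object displays the report
print(report)
```

Restricting checks keeps the report short when only a few kinds of fault matter for the analysis at hand.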
Returns a list reporting the following potential text faults:
contraction- Text elements that contain contractions
date- Text elements that contain dates
digit- Text elements that contain digits/numbers
email- Text elements that contain email addresses
emoticon- Text elements that contain emoticons
empty- Text elements that contain empty text cells (all white space)
escaped- Text elements that contain escaped back spaced characters
hash- Text elements that contain Twitter style hash tags (e.g., #rstats)
html- Text elements that contain HTML markup
incomplete- Text elements that contain incomplete sentences (e.g., uses ending punctuation like ...)
kern- Text elements that contain kerning (e.g., 'The B O M B!')
list_column- Text variable that is a list column
missing_value- Text elements that contain missing values
misspelled- Text elements that contain potentially misspelled words
no_alpha- Text elements that contain no alphabetic (a-z) letters
no_endmark- Text elements that are missing ending punctuation
no_space_after_comma- Text elements that contain commas with no space afterwards
non_ascii- Text elements that contain non-ASCII text
non_character- Text variable that is not a character column (likely factor)
non_split_sentence- Text elements that contain unsplit sentences (more than one sentence per element)
tag- Text elements that contain Twitter style handle tags (e.g., @trinker)
time- Text elements that contain timestamps
url- Text elements that contain URLs
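To make a few of the faults listed above concrete, here is a minimal base R sketch of the kind of pattern each one describes. The helper sketch_checks is hypothetical and is not part of textclean; its regular expressions are assumptions chosen for illustration, and the package's own rules may differ.

```r
# Hypothetical helper (not textclean's implementation): returns, for a few
# fault types, the indices of the elements that show that fault
sketch_checks <- function(x) {
  stopifnot(is.character(x))
  list(
    # only white space (or nothing at all)
    empty                = which(!is.na(x) & grepl("^\\s*$", x)),
    # NA elements
    missing_value        = which(is.na(x)),
    # no a-z letters in either case
    no_alpha             = which(!is.na(x) & !grepl("[A-Za-z]", x)),
    # does not end in ., ? or ! (trailing white space allowed)
    no_endmark           = which(!is.na(x) & !grepl("[.?!]\\s*$", x)),
    # a comma followed directly by something other than a space
    no_space_after_comma = which(grepl(",[^ ]", x)),
    # any character outside the ASCII range
    non_ascii            = which(grepl("[^\\x01-\\x7F]", x, perl = TRUE))
  )
}

# Made-up sample text
x <- c("I like it", "they,were there.", "   ", NA, "3;", "caf\u00e9 time!")
res <- sketch_checks(x)
str(res)
```

Returning indices rather than the text itself mirrors the idea of reporting which elements are affected, so the caller can inspect or repair just those elements.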
The output is a list containing meta checks and elemental checks, but it prints as formatted output showing the potential problem elements, the accompanying text, and suggested fixes.
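The difference between the stored list and the printed report can be seen by stripping the class before inspecting the object. This is a small sketch that assumes textclean is installed and that "no_endmark" is an accepted value for checks; the sample text is made up, and no particular element names in the list are assumed.

```r
library(textclean)

# Made-up sample text
x <- c("I like it", "they,were there.")

report <- check_text(x, checks = "no_endmark")

# The underlying object is an ordinary list
is.list(report)

# Inspect the raw structure without the formatted print method
str(unclass(report), max.level = 1)

# Auto-printing the object gives the formatted report instead
report
```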