stri_process | R Documentation |
A wrapper function for various preprocessing options for strings
stri_process(x, force_encoding = "UTF-8", alltolower = FALSE, erase_patterns = NULL, token_exclude_length = NULL, rm_diacritics = FALSE, replace_dashes_hyphens_by = NULL, rm_roman_numeral_listing = FALSE, replace_by_blank_regex = NULL, erase_regex = NULL, harmonize_blanks = FALSE)
x |
A |
force_encoding |
The encoding to be forced on the string. |
alltolower |
Turn all letters to lower case. |
erase_patterns |
Fixed non-regex patterns to be erased from text as is.
The |
token_exclude_length |
Remove tokens that have the specified number of characters or fewer, enclosed by word boundaries. |
rm_diacritics |
Turn diacritics into their ASCII equivalents. |
replace_dashes_hyphens_by |
Various forms of dashes and hyphens defined in the Unicode table (e.g., long dash, dash, hyphen) are replaced by the specified fixed pattern. |
rm_roman_numeral_listing |
Erase all brackets and their content if the bracket contains a combination of i, v, and x. Higher Roman numerals also require M and C; however, this option targets the low-numbered listings typically used in reports. More sophisticated regex replacements are possible with the parameters below. |
replace_by_blank_regex |
A regex pattern to be replaced by a blank. Use "|" to replace more than one pattern. |
erase_regex |
A regex pattern to be replaced by nothing, i.e., "". Use "|" to replace more than one pattern. |
harmonize_blanks |
Remove blanks at the beginning and end of a string and collapse sequences of multiple blanks into one. |
The processed string.
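The effect of several arguments can be approximated in base R. The sketch below is not the implementation of stri_process: the helper name sketch_process, the regular expressions, the dash range U+2010 to U+2015, and the order in which the steps run are all assumptions made for illustration. It covers only alltolower, erase_patterns, replace_dashes_hyphens_by, token_exclude_length, and harmonize_blanks.

```r
# Hypothetical base-R approximation of a few stri_process options.
# The name, regexes, and step order are assumptions, not the package's code.
sketch_process <- function(x,
                           alltolower = FALSE,
                           erase_patterns = NULL,
                           token_exclude_length = NULL,
                           replace_dashes_hyphens_by = NULL,
                           harmonize_blanks = FALSE) {
  # alltolower: turn all letters to lower case
  if (alltolower) x <- tolower(x)

  # erase_patterns: remove fixed (non-regex) patterns as is
  for (p in erase_patterns) x <- gsub(p, "", x, fixed = TRUE)

  # replace_dashes_hyphens_by: ASCII hyphen plus U+2010..U+2015 (assumed range)
  if (!is.null(replace_dashes_hyphens_by)) {
    x <- gsub("[-\u2010-\u2015]", replace_dashes_hyphens_by, x, perl = TRUE)
  }

  # token_exclude_length: drop tokens of at most n characters between word boundaries
  if (!is.null(token_exclude_length)) {
    pattern <- sprintf("\\b\\w{1,%d}\\b", as.integer(token_exclude_length))
    x <- gsub(pattern, "", x, perl = TRUE)
  }

  # harmonize_blanks: collapse runs of whitespace, then trim both ends
  if (harmonize_blanks) x <- trimws(gsub("[[:space:]]+", " ", x))

  x
}

raw <- "  Mid\u2013Term  REPORT of [DRAFT] a team "
out <- sketch_process(raw,
                      alltolower = TRUE,
                      erase_patterns = "[draft]",
                      replace_dashes_hyphens_by = " ",
                      token_exclude_length = 2,
                      harmonize_blanks = TRUE)
print(out)  # "mid term report team"
```

In this sketch lower-casing runs before erase_patterns, so the fixed pattern has to be given in lower case. Whether stri_process applies its steps in the same order is not stated in this documentation.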