string_utility: String processing utility functions.

Description

Short utility functions that each clean one characteristic of a string: special characters, German umlauts, or diacritical letters. They are combined in string_clean.

Usage

.remove_special_chars(string)

.replace_umlaute(string)

.remove_diacritics(string)

Arguments

string

A character vector.

Details

.remove_special_chars replaces every character that belongs to neither of the regex classes \w and \d and is not a literal whitespace with a single whitespace. The function preserves German umlauts and diacritical letters.

Elaboration on the Regex classes: https://stackoverflow.com/a/2998550/13542638.
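As a rough illustration of the character-class rule above, the following base R sketch replaces every character outside \w, \d, and whitespace with a space. The name remove_special_chars_sketch is hypothetical and this is not the thoremisc source. It assumes perl = TRUE together with the PCRE (*UCP) prefix, so that \w also covers non-ASCII letters such as umlauts.

```r
# Hypothetical stand-in for illustration only, not the package implementation.
remove_special_chars_sketch <- function(string) {
  # [^\w\d\s] matches anything that is not a word character, digit or whitespace.
  # (*UCP) asks PCRE to treat \w, \d and \s as Unicode-aware (assumption:
  # the R build's PCRE supports Unicode properties).
  gsub("(*UCP)[^\\w\\d\\s]", " ", string, perl = TRUE)
}

remove_special_chars_sketch("This will be modified: hello-world.")
# "This will be modified  hello world "
```

In this sketch each offending character becomes its own space, so runs of whitespace can appear in the result.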

.replace_umlaute replaces German umlauts with their ASCII representations: "ä"->"ae", "ö"->"oe", and "ü"->"ue". "ß" is treated as a diacritical letter and is handled by .remove_diacritics.
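The mapping above can be sketched in base R as a series of fixed-string replacements. The function name replace_umlaute_sketch is hypothetical. Only the three lower-case pairs named in the text are covered, because the handling of upper-case umlauts is not described here.

```r
# Hypothetical stand-in for illustration only, not the package implementation.
replace_umlaute_sketch <- function(string) {
  umlaute <- c("\u00e4", "\u00f6", "\u00fc")  # ä, ö, ü
  ascii   <- c("ae", "oe", "ue")
  for (i in seq_along(umlaute)) {
    # fixed = TRUE treats the umlaut as a literal string, not a regex
    string <- gsub(umlaute[i], ascii[i], string, fixed = TRUE)
  }
  string
}

replace_umlaute_sketch("tr\u00f6r\u00f6")
# "troeroe"
```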

.remove_diacritics replaces diacritical letters (é, ç, ...) with their "plain" versions. This function can only handle diacritical letters from Latin-based alphabets. Elements of string that contain non-Latin letters (e.g. Cyrillic) are replaced by NA, and a warning is given.

Reference: https://stackoverflow.com/a/20495866/13542638
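One common way to get this kind of behaviour in base R is transliteration with iconv. The sketch below is an assumption about a possible approach, not the confirmed implementation of .remove_diacritics, and the name remove_diacritics_sketch is hypothetical. The exact transliteration, and whether non-Latin input comes back as NA, depends on the platform's iconv library.

```r
# Hypothetical stand-in for illustration only, not the package implementation.
remove_diacritics_sketch <- function(string) {
  # Assumes UTF-8 input; //TRANSLIT asks iconv to approximate characters
  # that have no ASCII equivalent (results are platform-dependent).
  out <- iconv(string, from = "UTF-8", to = "ASCII//TRANSLIT")
  # Warn about elements that became NA during the conversion.
  failed <- is.na(out) & !is.na(string)
  if (any(failed)) {
    warning(sum(failed), " element(s) could not be transliterated and are NA.")
  }
  out
}

remove_diacritics_sketch("caf\u00e9")
```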

Value

.remove_special_chars returns string with every character outside \w, \d, and whitespace replaced by a single whitespace.

.replace_umlaute returns string with any German umlauts replaced by their ASCII representations.

.remove_diacritics returns string with diacritical letters replaced by their ASCII versions.

See Also

string_clean and string_redund_ws

Examples

thoremisc:::.remove_special_chars("This will be modified: hello-world.")
thoremisc:::.replace_umlaute("Äh, trörö in Überlingen, nicht auf dem Darß.")
thoremisc:::.remove_diacritics("Åll thëşé fūñny leŧters wîll be nørmalised.")

thorepet/thoremisc documentation built on Oct. 8, 2021, 7:48 a.m.