remove.nonASCII: Remove non-ASCII characters

Description

A function to remove non-ASCII characters from a character vector. ASCII defines 128 characters (2^7 combinations), which covers the unaccented Latin letters, digits and punctuation used routinely in English text. Unicode came along later and has room for over a million code points, which makes it very useful for Arabic, Cyrillic and other scripts, as well as diacritical marks and emojis. However, whilst UTF-8 is backwards compatible with ASCII, non-ASCII characters can play merry havoc with your text processing. I've found that in many cases it is just easier to remove these characters if they only constitute a small proportion of your text. Et voilà, this function was made.

Usage

remove.nonASCII(text.clean)

Arguments

text.clean

A character vector containing one or more strings (i.e. its length is at least 1).

Value

A list, where the first item is a vector with all non-ASCII characters removed, and the second item is a vector of indexes showing which entries of the input contained non-ASCII characters.
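
Examples

The package's own source is not shown on this page, so the following is only a minimal sketch of how the documented behaviour could be reproduced in base R. The function name `strip_non_ascii` and the element names `text` and `index` are hypothetical, chosen for illustration; the real `remove.nonASCII` may well be implemented differently.

```r
# Hypothetical stand-in for remove.nonASCII(), written in base R only.
strip_non_ascii <- function(text.clean) {
  stopifnot(is.character(text.clean), length(text.clean) >= 1)

  # Anything outside the code point range 1-127 is treated as non-ASCII.
  pattern <- "[^\\x01-\\x7F]"

  # Flag the entries that contain at least one non-ASCII character.
  has_non_ascii <- grepl(pattern, text.clean, perl = TRUE)

  list(
    text  = gsub(pattern, "", text.clean, perl = TRUE),  # cleaned strings
    index = which(has_non_ascii)                         # affected entries
  )
}

# Unicode escapes keep this example itself ASCII-only.
x <- c("plain text", "caf\u00e9", "na\u00efve \u2603")
res <- strip_non_ascii(x)

res[[1]]  # "plain text" "caf" "nave "
res[[2]]  # 2 3
```

Indexing the result by position (`res[[1]]`, `res[[2]]`) follows the ordering described under Value: the cleaned vector first, the indexes of the affected entries second.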


bvidgen/tc documentation built on May 9, 2019, 2:21 a.m.