rm_diacretics: Remove diacritics from letters

View source: R/util.R

rm_diacretics    R Documentation

Remove diacritics from letters

Description

rm_diacretics replaces letters with diacritics (like "é") with the same letters without diacritics (like "e"). iconv(..., to = "ASCII//TRANSLIT") might also work, but it sometimes fails for me.

Usage

rm_diacretics(strings)

Arguments

strings

String containing names, separated by spaces or periods (or both). Vectorised.

Details

Removing diacritics can be useful if they are used inconsistently in the data being linked. Differences in diacritics count towards string distances, so "José" and "Jose" are treated as one edit apart rather than identical.
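To illustrate why this matters for linkage, here is a minimal sketch using base R's adist() (generalized Levenshtein distance). It does not call rm_diacretics; the single-letter substitution via chartr() is only a stand-in for it, and the example names are made up.

```r
# Base R only. "\u00e9" is the letter é, written as an escape for portability.
x <- "Jos\u00e9"  # "José"
y <- "Jose"

# Distance with the diacritic left in place: the é/e mismatch costs one edit.
d_raw <- adist(x, y)[1, 1]

# Distance after replacing é with e (stand-in for rm_diacretics).
d_clean <- adist(chartr("\u00e9", "e", x), y)[1, 1]

print(c(raw = d_raw, cleaned = d_clean))
```

On short names a single edit is a large share of the total length, which is why inconsistent diacritics can noticeably lower match scores.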

The list of diacritics is currently far from complete: it covers only those I encountered in the baptism and marriage records I looked at.
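A lookup-based replacement of this kind could be sketched as follows. This is not the package's implementation: the function name strip_diacritics_sketch and the short character list are assumptions made for illustration, and any character not in the list passes through unchanged.

```r
# Hypothetical helper, NOT the capelinker source. Base R only.
strip_diacritics_sketch <- function(strings) {
  # Illustrative, incomplete lookup: é è ë ê å ä ö ü
  from <- "\u00e9\u00e8\u00eb\u00ea\u00e5\u00e4\u00f6\u00fc"
  # Plain replacements, position by position (same length as `from`).
  to <- "eeeeaaou"
  # chartr() maps each character in `old` to the character at the same
  # position in `new`, and is vectorised over `x`.
  chartr(old = from, new = to, x = strings)
}

res <- strip_diacritics_sketch(c("Ren\u00e9", "\u00e9\u00e5"))
print(res)
```

Extending coverage under this approach means appending a character to `from` and its replacement to `to`, which makes the incompleteness explicit and easy to audit.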

Value

The input strings with letters carrying diacritics replaced by their plain equivalents.

Examples

rm_diacretics(strings = "éå")
iconv("éå", "UTF-8", "ASCII//TRANSLIT") # a bit unpredictable for me


rijpma/capelinker documentation built on Nov. 7, 2024, 3:06 a.m.