string_std: String standardization prior to matching
In epicentre-msf/hmatch: Tools for Cleaning and Matching Hierarchically-Structured Data

string_std

R Documentation

String standardization prior to matching

Description

Standardizes strings prior to performing a match, using the following transformations:

convert to lowercase (base::tolower)
remove sequences of non-alphanumeric characters at start or end of string
replace remaining sequences of non-alphanumeric characters with "_"
remove diacritics (stringi::stri_trans_general)
(optional) convert roman numerals (I, II, ..., XLIX) to arabic (1, 2, ..., 49)
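
The non-optional steps listed above can be approximated in base R. The following is a minimal sketch, not the hmatch implementation: the helper name `std_sketch` is hypothetical, the order of the steps is an assumption, and a small hand-written `chartr()` mapping stands in for `stringi::stri_trans_general` so that the block runs without extra packages.

```r
# Hypothetical helper: a base-R approximation of the documented steps.
# NOT the hmatch source; the step order and the diacritic table are assumptions.
std_sketch <- function(x) {
  # 1. standardize case
  x <- tolower(x)
  # 2. remove diacritics (illustrative subset of lowercase Latin letters only;
  #    hmatch is documented to use stringi::stri_trans_general instead)
  x <- chartr(
    "\u00e0\u00e1\u00e2\u00e4\u00e7\u00e8\u00e9\u00ea\u00eb\u00ee\u00ef\u00f4\u00f6\u00f9\u00fb\u00fc",
    "aaaaceeeeiioouuu",
    x
  )
  # 3. drop runs of non-alphanumeric characters at the start or end
  x <- gsub("^[^[:alnum:]]+|[^[:alnum:]]+$", "", x)
  # 4. collapse each remaining run of non-alphanumeric characters to "_"
  gsub("[^[:alnum:]]+", "_", x)
}

std_sketch("United STATES")      # "united_states"
std_sketch("Mungindu-II (Sud)")  # "mungindu_ii_sud"
```

Because step 4 replaces a whole run at once, consecutive spaces or a mix of punctuation and spaces yield a single "_" rather than several.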

Usage

string_std(x, convert_roman = FALSE)

Arguments

`x`	a string
`convert_roman`	logical indicating whether to convert roman numerals (I, II, ..., XLIX) to arabic (1, 2, ..., 49)

Value

The standardized version of x

Examples

string_std("United STATES")
string_std("R\u00e9publique  d\u00e9mocratique du  Congo")

# convert roman numerals to arabic
string_std("Mungindu-II (Sud)")
string_std("Mungindu-II (Sud)", convert_roman = TRUE)

# note the conversion only works if the numeral is separated from other
# alphanumeric characters by punctuation or space characters
string_std("MunginduII", convert_roman = TRUE) # roman numeral not recognized
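
One way to see why the numeral must stand alone is to treat conversion as a per-token operation on an already standardized string. The sketch below is an illustration under that assumption, not the hmatch source: `roman_tokens_sketch` and its `max_value` argument are hypothetical names, and the conversion relies on `utils::as.roman()` from base R.

```r
# Hypothetical helper, not part of hmatch: convert roman-numeral tokens in an
# already standardized string (lowercase, "_"-separated) to arabic numbers.
# Assumption: only whole tokens are considered, capped at the documented 49.
roman_tokens_sketch <- function(x, max_value = 49L) {
  convert_one <- function(tokens) {
    # only tokens made up entirely of roman-numeral letters are candidates
    is_candidate <- grepl("^[ivxlcdm]+$", tokens)
    values <- rep(NA_integer_, length(tokens))
    # as.roman() yields NA (with a warning) for malformed numerals
    values[is_candidate] <- suppressWarnings(
      as.integer(utils::as.roman(toupper(tokens[is_candidate])))
    )
    convert <- !is.na(values) & values <= max_value
    tokens[convert] <- as.character(values[convert])
    paste(tokens, collapse = "_")
  }
  out <- vapply(strsplit(x, "_", fixed = TRUE), convert_one, character(1))
  out[is.na(x)] <- NA_character_  # keep missing values missing
  out
}

roman_tokens_sketch("mungindu_ii_sud")  # "mungindu_2_sud"
roman_tokens_sketch("munginduii")       # "munginduii" (no standalone token)
```

A token-based approach like this will also convert short words that happen to be valid numerals (for example "vi"), which is one reason such a conversion is reasonably left optional.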

epicentre-msf/hmatch documentation built on Nov. 15, 2023, 1:47 a.m.
