str_norm: String normalize-transform

View source: R/str_norm.R

str_norm    R Documentation

String normalize-transform

Description

Strips special characters, optionally lowercases, and optionally re-encodes to ASCII.

Usage

str_norm(x, lower = FALSE, ..., to_ASCII = TRUE)

Arguments

x

A character vector, or an object coercible to character.

lower

Logical. Should the output be lowercased? Defaults to FALSE.

...

Arguments passed to stringr::str_replace_all(), i.e. pattern and replacement.

to_ASCII

Logical. Should the function first convert the input to ASCII (this is often helpful for otherwise stubborn special characters)? Defaults to TRUE.
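As a rough illustration of what converting to ASCII can do to a stubborn character, the sketch below applies base R's iconv() to the bullet in the Examples string. Whether str_norm() calls iconv() with these exact arguments is an assumption, and the result of such conversions can vary by platform, so no converted string is shown.

```r
# Illustrative only: the exact conversion str_norm() performs is assumed here.
x <- "Corrosion Survey Database (COR\u2022SUR)"

# Convert UTF-8 to ASCII; `sub` replaces bytes that have no ASCII equivalent
x_ascii <- iconv(enc2utf8(x), from = "UTF-8", to = "ASCII", sub = " ")

# Check that every remaining code point is in the ASCII range
all(utf8ToInt(x_ascii) < 128L)
```

Once the bullet is gone, a simple pattern passed via ... has only plain ASCII characters left to deal with.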

Details

This is a convenience function designed to streamline tasks such as fuzzy character matching, particularly with scraped text. Some options are therefore intentionally hard-coded: any run of more than one whitespace character is collapsed to one, and leading and trailing whitespace is trimmed from the output.
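The hard-coded whitespace handling described above can be sketched in base R as follows. The helper name squish_ws is hypothetical and is not part of this package; it only illustrates the collapse-then-trim behavior and makes no claim about how str_norm() implements it. Here each run is collapsed to a single space.

```r
# Hypothetical helper (not part of wzMisc): base-R sketch of the whitespace handling.
squish_ws <- function(x) {
  x <- as.character(x)
  # Collapse any run of whitespace characters to a single space
  x <- gsub("[[:space:]]+", " ", x)
  # Trim leading and trailing whitespace
  trimws(x, which = "both")
}

squish_ws("  Corrosion   Survey \t Database  ")
# "Corrosion Survey Database"
```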

If no arguments are passed via ... to stringr::str_replace_all(), generic defaults are used. These defaults are meant to give a potentially more useful result than a bare error message, but still returning something departs from conventional error handling and may not be what the caller expects. As such, this behavior might change in future versions, and explicit arguments to str_replace_all() (again, via ...) should always be provided.
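The trade-off described above can be sketched with two hypothetical helpers: one falls back to generic defaults with a warning, the other treats missing arguments as an error. Neither is the wzMisc implementation; the function names and the fallback pattern are assumptions made for illustration, and base R's gsub() stands in for stringr::str_replace_all().

```r
# Hypothetical illustration; not the wzMisc implementation.
replace_with_fallback <- function(x, pattern = NULL, replacement = NULL) {
  if (is.null(pattern) || is.null(replacement)) {
    # Assumed fallback values, chosen only for this sketch
    pattern <- "[^[:alnum:]]"
    replacement <- " "
    warning("No pattern/replacement supplied; using generic defaults.")
  }
  gsub(pattern, replacement, as.character(x))
}

replace_strict <- function(x, pattern, replacement) {
  # Missing arguments are an error rather than a silent fallback
  if (missing(pattern) || missing(replacement)) {
    stop("Both `pattern` and `replacement` must be supplied.")
  }
  gsub(pattern, replacement, as.character(x))
}

x <- "COR-SUR (2023)"
suppressWarnings(replace_with_fallback(x))  # returns a value despite missing arguments
replace_strict(x, "[^[:alnum:]]", " ")      # same result, with explicit arguments
```

Supplying pattern and replacement explicitly gives the same result under either design, which is why the explicit form is the safer habit.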

Value

A character vector of the same length as x, normalized according to the input arguments and whitespace-normalized. See Details for what whitespace-normalized means.

Examples

x <- "Corrosion Survey Database (COR•SUR)"
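# pattern and replacement are named so they pass through ... and are not matched to `lower`
str_norm(x, pattern = "\\W", replacement = " ")
str_norm(x, pattern = "\\s", replacement = " ") # keep parentheses
str_norm(x, pattern = "\\W", replacement = " ", to_ASCII = FALSE) # iconv option not used
str_norm(x, pattern = "[A-Za-z]", replacement = " ", to_ASCII = FALSE) # inverse: replace letters rather than non-word characters

str_norm(Sys.Date(), pattern = "\\W", replacement = " ")
str_norm(1:10, pattern = "\\d", replacement = "-")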

## Not run: 
str_norm(x) # will try to use the default pattern and replacement. Read the error message!

## End(Not run)

slin30/wzMisc documentation built on Jan. 27, 2023, 1 a.m.