standardize_strings: Standardize the character columns of a 'DataFrame' object.

Description

Given a DataFrame, a set of columns, and a list of replacements, performs standard pre-processing of the strings in those columns: a) lowercasing, b) character replacements, c) whitespace squishing (removing whitespace from the start and end of each string and reducing repeated whitespace inside it).
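
The three steps can be illustrated on a plain character vector. This is only a minimal sketch in base R, assuming the replacements are regular expressions applied in order; `standardize_vector` is a hypothetical helper and is not the package's implementation, which operates on DataFrame columns.

```r
# Hypothetical helper, not part of the package: applies the three steps
# to a plain character vector using base R only.
standardize_vector <- function(x, from, to) {
  stopifnot(length(from) == length(to))
  x <- tolower(x)                 # a) lowercase
  for (i in seq_along(from)) {    # b) apply each replacement in order
    x <- gsub(from[i], to[i], x)
  }
  x <- gsub("\\s+", " ", x)       # c) collapse repeated whitespace
  trimws(x)                       #    and trim both ends
}

standardize_vector(
  "  Smith & Sons  (Holdings),  Ltd ",
  from = c(",", "\\(.*\\)", "&"),
  to   = c("", "", " and ")
)
# [1] "smith and sons ltd"
```

Squishing runs last in this sketch so that any extra spaces introduced by a replacement such as ' and ' are cleaned up as well.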

Usage

standardize_strings(df, columns, replacements = "ghbusiness")

Arguments

df

A DataFrame object.

columns

A character vector with the names of the columns to operate on.

replacements

The replacements to apply to the strings. Defaults to "ghbusiness".

Either a list giving an element-wise mapping of patterns to their replacements, or the name of a pre-defined set of replacements contained in the replacements_ list object.
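
One way to picture the two accepted forms is a small resolver that turns either a set name or a list(from, to) pair into the same structure. This is a sketch under stated assumptions: `replacement_sets`, the set name "demo" and its contents, and `resolve_replacements` are all hypothetical stand-ins, not the package's replacements_ object or its internal logic.

```r
# Hypothetical registry standing in for the package's pre-defined sets;
# the name "demo" and its contents are invented for illustration.
replacement_sets <- list(
  demo = list(c(",", "&"), c("", " and "))
)

# Hypothetical helper: accept either a set name or a list(from, to) pair
# and return the patterns and their replacements in a uniform shape.
resolve_replacements <- function(replacements) {
  if (is.character(replacements) && length(replacements) == 1) {
    if (!replacements %in% names(replacement_sets)) {
      stop("unknown replacement set: ", replacements)
    }
    replacements <- replacement_sets[[replacements]]
  }
  stopifnot(
    is.list(replacements),
    length(replacements) == 2,
    length(replacements[[1]]) == length(replacements[[2]])
  )
  list(from = replacements[[1]], to = replacements[[2]])
}

resolve_replacements("demo")$from                          # looked up by name
resolve_replacements(list(c("\\(.*\\)"), c("")))$from      # supplied directly
```

Checking that both vectors have the same length up front keeps the mapping element-wise, so each pattern has exactly one replacement.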

Value

A DataFrame object.

Examples

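# `businesses` is assumed to be an existing DataFrame with these columns.
columns <- c("address", "trading_name")

# Use the pre-defined "ghbusiness" set of replacements.
standardize_strings(businesses, columns, replacements = "ghbusiness")

# Supply custom replacements: patterns in `from` map element-wise to `to`.
from <- c(",", "\\(.*\\)", "&")
to <- c("", "", " and ")
standardize_strings(
  businesses,
  columns,
  replacements = list(from, to)
)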

xavier-gilbert/sabre documentation built on May 7, 2021, 12:40 p.m.