pm_parse: Parse Street Addresses
In slu-openGIS/postmastr: Tidy Tools for Standardizing and Parsing Street Addresses


A wrapper around the parse functions that can be used to shorten all of postmastr's core code down to a single function call once dictionaries have been created and tested against the data.

pm_parse(.data, input, address, output, new_address, ordinal = TRUE,
    operator = "at", unnest = FALSE, include_commas = FALSE, include_units = TRUE,
    keep_parsed = "no", side = "right", left_vars, keep_ids = FALSE, houseSuf_dict,
    dir_dict, street_dict, suffix_dict, unit_dict, city_dict, state_dict,
    locale = "us")

`.data`	A source data set to be parsed
`input`	Describes the format of the source address. One of either `"full"` or `"short"`. A short address contains, at the most, a house number, street directionals, a street name, a street suffix, and a unit type and number. A full address contains all of the elements of a short address as well as, at the most, a city, state, and postal code.
`address`	A character variable containing address data to be parsed
`output`	Describes the format of the output address. One of either `"full"` or `"short"`. A short address contains, at the most, a house number, street directionals, a street name, a street suffix, and a unit type and number. A full address contains all of the elements of a short address as well as, at the most, a city, state, and postal code.
`new_address`	Name of new variable to store rebuilt address in.
`ordinal`	A logical scalar; if `TRUE`, street names that contain numeric word values (e.g. "Second") will be converted and standardized to ordinal values (e.g. "2nd"). The default is `TRUE` because it returns much more compact, clean addresses (e.g. "168th St" as opposed to "One Hundred Sixty Eighth St").
`operator`	A character scalar to be used as the intersection operator (between the 'x' and 'y' sides of the intersection).
`unnest`	A logical scalar; if `TRUE`, house ranges will be unnested (e.g. a house range that has been expanded to cover four addresses with `pm_houseRange_parse` will be converted from a single observation to four observations, one for each house number). If `FALSE` (default), the single observation will remain.
`include_commas`	A logical scalar; if `TRUE`, a comma is added both before and after the city name in rebuilt addresses. If `FALSE` (default), no punctuation is added.
`include_units`	A logical scalar; if `TRUE` (default), the unit name and number (if given) will be included in the output string. Otherwise if `FALSE`, the unit name and number will not be included.
`keep_parsed`	Character string; if `"yes"`, all parsed elements will be added to the source data after replacement. If `"limited"`, only the `pm.city`, `pm.state`, and postal code variables will be retained. Otherwise, if `"no"`, only the rebuilt address will be added to the source data (default).
`side`	One of either `"left"` or `"right"` - should parsed data be placed to the left or right of the original data? Placing data to the left may be useful in particularly wide data sets.
`left_vars`	A character scalar or vector of variables to place on the left-hand side of the output when `side` is equal to `"middle"`.
`keep_ids`	Logical scalar; if `TRUE`, the identification numbers will be kept in the source data after replacement. Otherwise, if `FALSE`, they will be removed (default).
`houseSuf_dict`	Optional; name of house suffix dictionary object. Standardization and parsing are skipped if none is specified.
`dir_dict`	Optional; name of directional dictionary object. If none is specified, the full default directional dictionary will be used.
`street_dict`	Optional; name of street dictionary object. Standardization is skipped if none is specified.
`suffix_dict`	Optional; name of street suffix dictionary object. If none is specified, the full default street suffix dictionary will be used.
`unit_dict`	Optional; name of unit dictionary object - NOT CURRENTLY ENABLED
`city_dict`	Required for `"full"` addresses; name of city dictionary object.
`state_dict`	Optional; name of state dictionary object. If none is specified, the full default state dictionary will be used.
`locale`	A string indicating the country these data represent; the only current option is "us" but this is included to facilitate future expansion.

This function does not currently return countries. If a country identifier is present in the data to be parsed, it will be trimmed off the address and not returned.

An updated version of the source data with, at a minimum, a new variable containing standardized street addresses for each observation. Options allow for columns containing parsed elements to be returned as well.

# construct dictionaries
dirs <- pm_dictionary(type = "directional", filter = c("N", "S", "E", "W"), locale = "us")
sufs <- pm_dictionary(type = "suffix", locale = "us")
mo <- pm_dictionary(type = "state", filter = "MO", case = c("title", "upper"), locale = "us")
cities <- pm_append(type = "city",
    input = c("Brentwood", "Clayton", "CLAYTON", "Maplewood", "St. Louis",
              "SAINT LOUIS", "Webster Groves"),
    output = c(NA, NA, "Clayton", NA, NA, "St. Louis", NA))

# add example data
df <- sushi1

# identify
df <- pm_identify(df, var = address)

# temporary code to drop the observation that contains a unit
df <- dplyr::filter(df, name != "Drunken Fish - Ballpark Village")

# parse, full output
pm_parse(df, input = "full", address = address, output = "full", keep_parsed = "no",
    dir_dict = dirs, suffix_dict = sufs, city_dict = cities, state_dict = mo)

# parse, short output
pm_parse(df, input = "full", address = address, output = "short", keep_parsed = "no",
    new_address = clean_address, dir_dict = dirs, suffix_dict = sufs,
    city_dict = cities, state_dict = mo)
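
# a further sketch, assuming the dictionaries and df created above:
# keep every parsed element next to the rebuilt address and add commas
# around the city name; the argument values are the ones documented under
# keep_parsed and include_commas, and the resulting columns are not shown here
pm_parse(df, input = "full", address = address, output = "full", keep_parsed = "yes",
    new_address = clean_address, include_commas = TRUE, dir_dict = dirs,
    suffix_dict = sufs, city_dict = cities, state_dict = mo)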