regex_addr: String replacement cleaning of a data frame of NYC addresses.
In gmculp/rNYCclean: A package to clean NYC addresses


View source: R/regex_addr.R

The regex_addr function cleans a data frame of NYC addresses by string replacement against a look-up dataset of locations. The look-up dataset was constructed from the NYC Department of City Planning's (DCP) PAD (Property Address Directory) and SND (Street Name Dictionary). The function also attempts to reconcile addresses that contain post office box information or indicators of a missing address (e.g., "UNKNOWN", "HOMELESS").
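To make the idea of look-up based string replacement concrete, here is a minimal base R sketch. It is not the package's implementation: the `lookup` table, the `missing_pattern` expression, and the `clean_addr_sketch` function are all hypothetical, and the real look-up data is built from PAD and SND rather than from three hand-written patterns. The sketch only flags post office box and missing-address entries; how regex_addr actually reconciles them is not shown here.

```r
# Hypothetical look-up table of truncated street-type suffixes and their
# full forms (an assumption for illustration, not the package data).
lookup <- data.frame(
  pattern     = c("\\bS$", "\\bA$", "\\bBLV$"),
  replacement = c("STREET", "AVENUE", "BOULEVARD"),
  stringsAsFactors = FALSE
)

# Hypothetical pattern for post office boxes and missing-address indicators.
missing_pattern <- "\\b(UNKNOWN|HOMELESS)\\b|\\bP\\.?\\s*O\\.?\\s*BOX\\b"

clean_addr_sketch <- function(addr) {
  # Normalize case and surrounding whitespace before matching.
  cleaned <- toupper(trimws(addr))
  # Apply each look-up replacement in turn to the end of the address.
  for (i in seq_len(nrow(lookup))) {
    cleaned <- sub(lookup$pattern[i], lookup$replacement[i], cleaned, perl = TRUE)
  }
  # Flag entries that look like a PO box or a missing address.
  flagged <- grepl(missing_pattern, cleaned, perl = TRUE)
  data.frame(cleaned = cleaned, flagged = flagged, stringsAsFactors = FALSE)
}

clean_addr_sketch(c("80 CENTRE S", "30 LAFAYETTE A", "PO BOX 123", "UNKNOWN"))
```

A suffix rule this simple would also rewrite a legitimate trailing "S" (for example a directional), which is one reason a look-up dataset derived from actual street names is preferable to a handful of generic patterns.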

regex_addr(in_df, new_addr_col_name, addr1_col_name, addr2_col_name = NULL)

`in_df`	a data frame containing NYC addresses. Required.
`new_addr_col_name`	the name of the output address column, as a string. Required.
`addr1_col_name`	the name of the input column holding address line one, as a string. Required.
`addr2_col_name`	the name of the input column holding address line two, as a string. Optional; defaults to NULL.

A data frame containing the input data frame plus the cleaned address column.
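One way to sanity-check a result against this description is to confirm that every input column is still present and that the new column exists. The helper below is a hypothetical sketch, not part of the package, and it uses stand-in data so that it runs without rNYCclean installed; the cleaned values in `out_df` are made up for illustration.

```r
# Hypothetical helper: TRUE when `out_df` keeps every column of `in_df`
# and also contains the column named `new_col`.
check_cleaned_output <- function(in_df, out_df, new_col) {
  all(names(in_df) %in% names(out_df)) && new_col %in% names(out_df)
}

# Stand-in data; in practice `out_df` would be the value returned by regex_addr.
in_df  <- data.frame(u_id = 1:2, ADDR1 = c("80 CENTRE S", "125 WORTH S"))
out_df <- cbind(in_df, regex.ADDR = c("80 CENTRE STREET", "125 WORTH STREET"))

check_cleaned_output(in_df, out_df, "regex.ADDR")
```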

# load the package
library(rNYCclean)

# create a data frame of addresses
ADDR1 <- c("80 CENTRE S","125 WORTH S","42-09 28 ST",
    "250 BEDFORD PARK BLV","30 LAFAYETTE A","125","1545 ATLANTIC")
ADDR2 <- c("","UNIT 329","1st FLR","SUITE 212B","ROOM 3","WORTH STREET","")
BORO_CODE <- c(rep(1,length(ADDR1)-1),3)
u_id <- seq_along(ADDR1)
df <- data.frame(u_id, ADDR1, ADDR2, BORO_CODE)

# get the version of DCP PAD used to build the package data
rNYCclean::pad_version

#one address input column
df1 <- regex_addr(in_df = df, new_addr_col_name = "regex.ADDR", 
    addr1_col_name = "ADDR1")

#preview records
head(df1)

# two address input columns
df2 <- regex_addr(in_df = df, new_addr_col_name = "regex.ADDR", 
    addr1_col_name = "ADDR1", addr2_col_name = "ADDR2")

#preview records
head(df2)