regex_addr: String replacement cleaning of a data frame of NYC addresses.


View source: R/regex_addr.R

Description

The regex_addr function cleans a data frame of NYC addresses by string replacement, using a look-up dataset of locations. The locations dataset was constructed from the NYC Department of City Planning's (DCP) Property Address Directory (PAD) and Street Name Dictionary (SND). In addition, the function attempts to reconcile addresses that contain post office box information or indicators of a missing address (e.g., "UNKNOWN", "HOMELESS").
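To make the idea of look-up-based string replacement concrete, here is a minimal base R sketch. It is not the package's implementation: the look-up table, the helper name clean_addr_sketch, the patterns, and the way post office boxes and missing-address indicators are handled are all assumptions made for illustration only.

```r
# Hypothetical look-up table of regex patterns and replacements.
# This is NOT the PAD/SND-derived dataset shipped with rNYCclean.
lookup <- data.frame(
  pattern     = c("\\bS$", "\\bA$", "\\bBLV$"),
  replacement = c("STREET", "AVENUE", "BOULEVARD"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: apply each pattern in turn, then flag
# PO boxes and blank out missing-address indicators.
clean_addr_sketch <- function(addr, lookup) {
  out <- toupper(trimws(addr))
  for (i in seq_len(nrow(lookup))) {
    out <- gsub(lookup$pattern[i], lookup$replacement[i], out, perl = TRUE)
  }
  # Assumed handling: PO boxes are only flagged, not rewritten
  is_po_box <- grepl("^P\\.?\\s*O\\.?\\s*BOX\\b", out, perl = TRUE)
  # Assumed handling: missing-address indicators become NA
  is_missing <- out %in% c("", "UNKNOWN", "HOMELESS")
  out[is_missing] <- NA_character_
  data.frame(cleaned = out, po_box = is_po_box, stringsAsFactors = FALSE)
}

res <- clean_addr_sketch(
  c("80 CENTRE S", "30 LAFAYETTE A", "PO BOX 12", "unknown"),
  lookup
)
print(res)
```

The real look-up dataset is far larger and is tied to a specific PAD release, so the output of regex_addr itself may differ from what this toy table produces.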

Usage

regex_addr(in_df, new_addr_col_name, addr1_col_name,
    addr2_col_name = NULL)

Arguments

in_df

a data frame containing NYC addresses. Required.

new_addr_col_name

the name of the output address column, as a string. Required.

addr1_col_name

the name of the input column holding address line one, as a string. Required.

addr2_col_name

the name of the input column holding address line two, as a string. Optional.

Value

A data frame containing the input data frame plus the cleaned address column.

Examples

# load the package (assumes rNYCclean is installed)
library(rNYCclean)

# create a data frame of addresses
ADDR1 <- c("80 CENTRE S", "125 WORTH S", "42-09 28 ST",
    "250 BEDFORD PARK BLV", "30 LAFAYETTE A", "125", "1545 ATLANTIC")
ADDR2 <- c("", "UNIT 329", "1st FLR", "SUITE 212B", "ROOM 3", "WORTH STREET", "")
BORO_CODE <- c(rep(1, length(ADDR1) - 1), 3)
u_id <- seq_along(ADDR1)
df <- data.frame(u_id, ADDR1, ADDR2, BORO_CODE)

# get the version of DCP PAD used to build the package data
rNYCclean::pad_version

# one address input column
df1 <- regex_addr(in_df = df, new_addr_col_name = "regex.ADDR",
    addr1_col_name = "ADDR1")

# preview records
head(df1)

# two address input columns
df2 <- regex_addr(in_df = df, new_addr_col_name = "regex.ADDR",
    addr1_col_name = "ADDR1", addr2_col_name = "ADDR2")

# preview records
head(df2)

gmculp/rNYCclean documentation built on July 14, 2020, 5:07 a.m.