NYC.CleanGeoZip: Clean and geocode NYC addresses by zip code.


Description

The NYC.CleanGeoZip function uses the rGBAT, rUSPS, and rNYCclean packages to clean and geocode NYC addresses by five-digit zip code.

Usage

NYC.CleanGeoZip(in_df, id_colname, addr1_colname, 
    addr2_colname=NULL, zip_colname, source_cols, geocode_fields, 
    GBAT_name, in_clus=1, USPS_verify=TRUE)

Arguments

in_df

a data frame containing NYC addresses. Required.

id_colname

the name of the unique identifier column, as a string. Required.

addr1_colname

the name of the address line one column, as a string. Required.

addr2_colname

the name of the address line two column, as a string. Optional; defaults to NULL.

zip_colname

the name of the five-digit zip code column, as a string. Required.

source_cols

a vector of column names from in_df to return alongside the geocoder results. Required.

geocode_fields

a vector of geocoder output field names (e.g., 'F1E.output.bbl') to return with the results. Required.

GBAT_name

the release of the NYC Department of City Planning (DCP) Geosupport geocoding software, as a string (e.g., "18B"). Required.

in_clus

the number of clusters available to the function, as an integer. Optional; defaults to 1.

USPS_verify

logical; if TRUE, addresses are run through the IBM InfoSphere address verification service. Optional; defaults to TRUE.
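Most of these arguments are column names that must exist in in_df, the identifier column must be unique, and zip_colname must hold five-digit zip codes, so it can help to check the inputs before starting a geocoding run. The following is a minimal base-R sketch under those stated requirements only; check_cleangeozip_inputs() is a hypothetical helper, not part of rBES, and it does not validate geocode_fields or GBAT_name.

```r
# Hypothetical helper (not part of rBES): checks in_df against the
# requirements listed under Arguments before calling NYC.CleanGeoZip().
# Returns TRUE invisibly, or stops with a descriptive error.
check_cleangeozip_inputs <- function(in_df, id_colname, addr1_colname,
                                     zip_colname, source_cols,
                                     addr2_colname = NULL) {
  # data.table objects also inherit from data.frame
  if (!is.data.frame(in_df)) stop("in_df must be a data frame")

  # every named column must exist (a NULL addr2_colname drops out of c())
  needed <- c(id_colname, addr1_colname, addr2_colname, zip_colname,
              source_cols)
  missing_cols <- setdiff(needed, names(in_df))
  if (length(missing_cols) > 0)
    stop("columns not found in in_df: ",
         paste(missing_cols, collapse = ", "))

  # the identifier column must uniquely identify each row
  if (anyDuplicated(in_df[[id_colname]]) > 0)
    stop(id_colname, " must uniquely identify each row")

  # zip codes must be exactly five digits (NA also fails this check)
  zips <- as.character(in_df[[zip_colname]])
  bad <- !grepl("^[0-9]{5}$", zips)
  if (any(bad))
    stop("non five-digit zip codes in rows: ",
         paste(which(bad), collapse = ", "))

  invisible(TRUE)
}

# usage with two rows taken from the example below
df <- data.frame(u_id = 1:2, ADDR = c("80 CENTRE", "125 WORTH S"),
                 ZIP_CODE = c("10013", "10013"), stringsAsFactors = FALSE)
check_cleangeozip_inputs(df, "u_id", "ADDR", "ZIP_CODE", source_cols = "u_id")
```

Running a check like this first surfaces misspelled column names or malformed zip codes before any time is spent on cleaning and geocoding.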

Value

A data frame or data.table (matching the class of in_df) containing the cleaned and geocoded NYC addresses.

Examples

# create a data frame of addresses
ADDR <- c("80 CENTRE","125 WORTH S","42 09 28 S","253 BROADW",
    "620 ATLANT","125 WOR","1 FRANKLIN","1 FRANKLIN",
    "1 1 1 AVE","1 1 1 AVE")
CITY <- c("NEW YORK","NEW YORK","LONG ISLAND CITY","NEW YORK",
    "BROOKLYN","NEW YORK","BROOKLYN","BROOKLYN","NEW YORK",
    "NEW YORK")
ZIP_CODE <- c('10013','10013','11101','10007','11217','10013',
    '11222','11249','10003','10014')
u_id <- seq_along(ADDR)
df <- data.frame(u_id, ADDR, CITY, ZIP_CODE, stringsAsFactors = FALSE)

# specify columns from the input data frame to retain
source_cols <- c('u_id')

# specify geocoder return fields
geocode_fields <- c('F1E.output.bin','F1E.output.bbl','F1E.longitude',
    'F1E.latitude','JN.ZCTA_10','F1E.output.ret_code','F1E.output.msg')

# clean and geocode by zip code
gc_df <- NYC.CleanGeoZip(in_df = df, id_colname = "u_id",
    addr1_colname = "ADDR", zip_colname = "ZIP_CODE",
    source_cols = source_cols, geocode_fields = geocode_fields,
    GBAT_name = "18B")

# preview results
head(gc_df)

# view metadata
NYC.CleanGeoZip_metadata

gmculp/rBES documentation built on May 25, 2019, 11:31 p.m.