View source: R/parallel.seqsplt_addr.R
The parallel.seqsplt_addr function splits address strings in a large data frame (more than 10,000 records) into sequential combinations of words. It uses parallel processing to do this more efficiently.
parallel.seqsplt_addr(in_clus, in_df, new_addr_col_name, id_col_name,
  addr_col_name, third_col_name, remove_orig = TRUE)
in_clus | the number of clusters available to the function, as an integer. Required.
in_df | a data frame containing addresses. Required.
new_addr_col_name | the name of the output addresses column, as a string. Required.
id_col_name | the name of the unique identifier column, as a string. Required.
addr_col_name | the name of the input addresses column, as a string. Required.
third_col_name | the name of either the borough code or the zip code column, as a string. Required.
remove_orig | whether to exclude the original address from the output, as a logical (TRUE or FALSE). Optional; defaults to TRUE.
A data frame containing id_col_name, third_col_name, and a column of address strings split into sequential combinations of words.
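This page does not spell out exactly which combinations are produced. As a rough illustration only, the sketch below assumes that "sequential combinations of words" means every contiguous run of words in an address. The helper `seq_word_runs` is hypothetical and is not part of the package; the function's actual output may differ.

```r
# Hypothetical helper: return every contiguous run of words in one address.
# Assumption: "sequential combinations" means contiguous word runs.
seq_word_runs <- function(addr) {
  # split on whitespace after trimming leading/trailing blanks
  words <- strsplit(trimws(addr), "[[:space:]]+")[[1]]
  n <- length(words)
  runs <- character(0)
  for (start in seq_len(n)) {
    for (end in start:n) {
      # join words start..end back into a single string
      runs <- c(runs, paste(words[start:end], collapse = " "))
    }
  }
  runs
}

runs <- seq_word_runs("253 BROADWAY FLR 3")
print(runs)
```

Under this assumption a four-word address yields ten strings, which suggests why the output can grow quickly on large data frames.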
# create a data frame of addresses
ADDR <- c("ROOM 326 125 WORTH STREET", "253 BROADWAY FLR 3",
          "C/O DOHMH 42-09 28 STREET")
BORO_CODE <- c(1, 1, 4)
u_id <- seq_along(ADDR)
df <- data.frame(u_id, ADDR, BORO_CODE)

# split address column into sequential combinations
df1 <- parallel.seqsplt_addr(in_clus = 1, in_df = df,
    new_addr_col_name = "ADDR.seqsplt", id_col_name = "u_id",
    addr_col_name = "ADDR", third_col_name = "BORO_CODE")

# preview records
head(df1)
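For readers new to cluster-based parallelism in R, the sketch below shows one common pattern using the base `parallel` package: start a cluster, apply a function to each address on the workers, then stop the cluster. It is an assumption that `in_clus` plays a role similar to the worker count here, since this page does not describe the function's internals. `split_words` is a hypothetical stand-in for the per-address work.

```r
library(parallel)

addrs <- c("ROOM 326 125 WORTH STREET", "253 BROADWAY FLR 3",
           "C/O DOHMH 42-09 28 STREET")

# Hypothetical stand-in for the per-address work: split into single words.
split_words <- function(addr) strsplit(addr, " ", fixed = TRUE)[[1]]

# Start a cluster with one worker (by assumption, analogous to in_clus = 1).
cl <- makeCluster(1)

# Apply split_words to each address on the workers; always stop the cluster.
words_by_addr <- tryCatch(
  parLapply(cl, addrs, split_words),
  finally = stopCluster(cl)
)

# Stack the per-address results into one long data frame keyed by an id.
out <- data.frame(
  u_id = rep(seq_along(addrs), lengths(words_by_addr)),
  word = unlist(words_by_addr),
  stringsAsFactors = FALSE
)
print(out)
```

Starting and stopping a cluster has a fixed cost, so a pattern like this tends to pay off only when there are many records to process.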