chunked_pmap: Apply a function to each row of a database result set.

Description Usage Arguments Details Value See Also Examples

Description

chunked_pmap pulls chunks from a database result set and applies a function to each row. The transformed chunks are then row-bound into a single dataframe.

Usage

chunked_pmap(res, f, n, gc = NA)

Arguments

res

DBI::dbSendQuery result set.

f

The function applied to each row of the database result set. f is assumed to be built from a dplyr pipeline: it takes a dataframe as its sole argument and returns a transformed dataframe.

n

The number of rows to pull back from the database result set for each chunk.

gc

The type of garbage collection to run. The default, NA, runs no garbage collection. Choices include

  • r = R garbage collection

  • j = Java garbage collection

  • rj = R and Java garbage collection
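The gc choices could be dispatched roughly as sketched below. This is an assumption about the mechanism, not the package's actual code: the helper name run_gc is hypothetical, and the rJava call is one plausible way to request a Java collection when the result set involves a JDBC connection.

```r
# Hypothetical dispatch for the gc argument (illustrative only)
run_gc <- function(gc = NA) {
  if (is.na(gc)) return(invisible(NULL))  # default: no garbage collection
  if (gc %in% c("r", "rj")) base::gc()    # R garbage collection
  if (gc %in% c("j", "rj")) {
    # assumed rJava route to Java's System.gc(); requires rJava
    rJava::J("java.lang.System")$gc()
  }
  invisible(NULL)
}
```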

Details

This function is useful when processing database result sets that are extremely large. For example, if XML docs, JSON docs, or blobs of any kind are stored in a database column, then chunked_pmap can be used to iteratively process and transform the data.
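The fetch-transform-bind loop described above can be sketched with documented DBI semantics. The function name chunked_pmap_sketch is hypothetical and the sketch omits the gc handling; it is a simplified illustration, not the package's actual implementation.

```r
# Hypothetical simplified version of the chunking loop (gc handling omitted)
chunked_pmap_sketch <- function(res, f, n) {
  chunks <- list()
  # DBI::dbHasCompleted() returns TRUE once every row has been fetched
  while (!DBI::dbHasCompleted(res)) {
    chunk <- DBI::dbFetch(res, n = n)        # pull at most n rows
    chunks[[length(chunks) + 1L]] <- f(chunk)  # transform this chunk
  }
  dplyr::bind_rows(chunks)                   # row-bind transformed chunks
}
```

Because only n rows are materialized at a time, memory stays bounded even when individual columns hold large XML, JSON, or blob values.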

Value

A dataframe that has been transformed by f.

See Also

Examples

## Not run: 
# create in-memory SQLite database
dbcon <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
DBI::dbWriteTable(dbcon, "mtcars", mtcars)

# create and send a query
sql_query <- "SELECT * FROM mtcars where hp > 120"
res <- DBI::dbSendQuery(dbcon, sql_query)

# create a function to operate on chunks
# the function should accept a dataframe and return a dataframe
library(magrittr)  # provides the %>% pipe used below
f <- function(df) {
  df %>%
    dplyr::mutate(hp_to_cyl = hp / cyl) %>%
    dplyr::arrange(hp_to_cyl)
}

# process 5 rows at a time
mtcars_new <- chunked_pmap(res, f, 5)

# cleanup
DBI::dbClearResult(res)
DBI::dbDisconnect(dbcon)

## End(Not run)

curtisalexander/CRAmisc documentation built on May 14, 2019, 12:52 p.m.