flow_dfr: Row-wise caching of operations on data frame


Description

Row-wise caching of operations on a data frame.

Usage


Arguments

...

Named arguments to pass to fn. The first argument must be a data.frame or tibble. Row names are not supported.

fn

The function to apply to the data frame. It must accept a data frame as the first argument.

fn_id

Optional id to uniquely identify the function. By default, rflow functions reuse the cache if the same function is given. The id allows the user to suppress console messages and to explicitly indicate whether to reuse the old cache or create a new one.

flow_options

List of options created using get_flow_options.
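Because row names are not supported in the data frame passed through `...`, information stored in them presumably has to be moved into a regular column beforehand. The following is a minimal base-R sketch of that preparation step; the variable name `cars` and the column name `model` are illustrative choices, not part of rflow.

```r
# Copy the built-in data set so the original stays untouched.
cars <- mtcars

# Keep the information held in the row names as an ordinary column ...
cars$model <- rownames(cars)

# ... and reset the row names to the default integer sequence.
rownames(cars) <- NULL

head(cars[, c("model", "mpg", "cyl")], 3)
```

After this step the car model travels with each row as data, so it would take part in any row-level comparison instead of living in an attribute.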

Details

The function fn operates on the data frame it receives as its first argument. fn receives only the rows that changed; it may drop some of those rows, but must not add any new rows. fn may return fewer or more columns, or modify existing columns, as long as it always returns a consistent schema (i.e., the same column names and data types) across all calls. The data frame df passed to fn includes one additional column, ..row_hash.., which must be returned unchanged so that changes can be identified.
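The contract described above can be illustrated with a toy row-wise cache written in base R. This is only a sketch of the idea and not rflow's implementation: the helpers `row_key()` and `make_rowwise_cache()` and the example function `add_ratio()` are hypothetical, and pasting the column values together merely stands in for whatever hash rflow actually computes.

```r
# Stand-in for a row hash: paste all column values of each row together.
row_key <- function(df) do.call(paste, c(unname(as.list(df)), sep = "\r"))

# Wrap `fn` so that it is only called on rows not seen before.
make_rowwise_cache <- function(fn) {
  cache <- NULL  # results of earlier calls, including the key column
  function(df) {
    df[["..row_hash.."]] <- row_key(df)
    is_new <- !(df[["..row_hash.."]] %in% cache[["..row_hash.."]])
    if (any(is_new)) {
      # fn sees only the new rows and must return the key column as is
      cache <<- rbind(cache, fn(df[is_new, , drop = FALSE]))
    }
    # Return cached results in the order of the input rows
    idx <- match(df[["..row_hash.."]], cache[["..row_hash.."]])
    out <- cache[idx[!is.na(idx)], , drop = FALSE]
    rownames(out) <- NULL
    out
  }
}

# A conforming fn: adds one column, keeps ..row_hash.., same schema each call.
rows_seen <- 0L
add_ratio <- function(df) {
  rows_seen <<- rows_seen + nrow(df)  # count rows actually processed
  df$ratio <- df$x / df$y
  df
}

cached <- make_rowwise_cache(add_ratio)
d1 <- data.frame(x = c(1, 2, 3), y = c(2, 4, 6))
r1 <- cached(d1)                               # all three rows are new
d2 <- rbind(d1, data.frame(x = 10, y = 5))
r2 <- cached(d2)                               # only the appended row is new
rows_seen                                      # 4
```

In this sketch `add_ratio()` processes four rows in total rather than seven, because the three unchanged rows of the second call are served from the cache.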

The arguments fn, fn_id and flow_options, when provided, must be named. The argument fn must always be provided.

Value

The flow object.

Examples

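# Example fn: the data frame is the first argument, i is an optional row index.
df_fn <- function(df, i = NULL) {
    if (is.null(i)) {
        # no index given: keep all rows and add the mean of the first 10 columns
        dfi <- df
        dfi$rm <- rowMeans(dfi[1:10])
    } else {
        # index given: keep only the selected rows, always as a data frame
        dfi <- df[i, , drop = FALSE]
    }
    dfi
}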

# the flow element can also become input for another flow_df function 
# in order to allow multiple, chained computations
dfr_flow <- flow_dfr(mtcars, 1, fn = df_fn)
collected_dfr <- dfr_flow %>%
    collect()

numeract/rflow documentation built on May 28, 2019, 3:39 p.m.