The do verb converts the data to a data frame before running the operations. The do_xdf verb (called doXdf in earlier versions) keeps the data in Xdf format, so it is not (as) limited by memory.
## S3 method for class 'RxFileData'
do(.data, ...)
## S3 method for class 'grouped_tbl_xdf'
do(.data, ...)
do_xdf(.data, ...)
doXdf(.data, ...)
## S3 method for class 'RxFileData'
do_xdf(.data, ...)
## S3 method for class 'grouped_tbl_xdf'
do_xdf(.data, ...)
## S3 method for class 'RxDataSource'
do(.data, ...)
## S3 method for class 'RxDataSource'
do_xdf(.data, ...)
.data: A tbl for an Xdf data source, or a raw Xdf data source.
...: Expressions to apply.
The difference between the do and do_xdf verbs is that the former converts the data into a data frame before running the expressions on it, while the latter passes the data as Xdf files. do is thus more flexible in the expressions it can run (basically anything that works with data frames), whereas do_xdf is better able to handle large datasets. The final output from do_xdf must still be able to fit in memory (see below).
do_xdf was called doXdf in previous versions of this package; it has been renamed to match dplyr's snake_case naming convention.
To run expressions on a grouped Xdf tbl, do and do_xdf split the data into one file per group, and the expressions are run on each file. Note, however, that this may be slow if you have a large number of groups; and, for do, the operation will be limited by memory if the number of rows per group is large.
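The split-and-apply behaviour described above can be pictured with a rough base R analogy. This is only a sketch of the idea, not how dplyrXdf is implemented: temporary .rds files stand in for the per-group Xdf files, and the helper name run_per_group is hypothetical.

```r
# Base R analogy of "one file per group, then run the expression on each file".
# Assumptions: .rds files stand in for Xdf files; run_per_group is a
# hypothetical helper, not part of dplyrXdf.
run_per_group <- function(data, group_col, fun) {
    groups <- split(data, data[[group_col]])

    # write each group to its own temporary file
    files <- vapply(names(groups), function(g) {
        f <- tempfile(fileext = ".rds")
        saveRDS(groups[[g]], f)
        f
    }, character(1))

    # read each file back in turn and apply the function to it
    results <- lapply(files, function(f) fun(readRDS(f)))
    unlink(files)
    results
}

# one linear model per value of cyl
models <- run_per_group(mtcars, "cyl", function(d) lm(mpg ~ wt, data = d))
names(models)
```

In this analogy every group costs one file write and one file read, which gives a feel for why a large number of groups may be slow.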
The do and do_xdf verbs always return a data frame, unlike the other verbs for Xdf objects. This is because they are meant to execute code that can return arbitrarily complex objects, and Xdf files can only store atomic data.
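To illustrate why a data frame is the natural container here, consider a plain base R sketch with no Xdf involved. A data frame can carry a list column whose elements are whole model objects, which a format restricted to atomic data cannot represent. The variable names below are illustrative only.

```r
# Fit one model per group and store the fitted objects in a list column.
# Plain base R on mtcars; nothing here is specific to dplyrXdf.
groups <- split(mtcars, mtcars$cyl)
fits <- lapply(groups, function(d) lm(mpg ~ wt, data = d))

res <- data.frame(cyl = as.numeric(names(fits)))
res$m <- unname(fits)      # list column: each element is an lm object

class(res$m)               # "list"
class(res$m[[1]])          # "lm"
is.atomic(res$m)           # FALSE
```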
do in package dplyr
mtx <- as_xdf(mtcars, overwrite=TRUE)

# unnamed arg
do(mtx, {
    mpg2 <- 2 * .$mpg
    cyl2 <- 2 * .$cyl
    .
})
do_xdf(mtx, rxDataStep(., transformFunc=function(.data) {
    .data$mpg2 <- 2 * .data$mpg
    .data$cyl2 <- 2 * .data$cyl
    .data
}))

# named arg
do(mtx, m=lm(mpg ~ cyl, data=.))
do_xdf(mtx, m=rxLinMod(mpg ~ cyl, data=.))

# fitting multiple models to subsets of the data
if(require("nycflights13")) {
    flx <- as_xdf(flights, overwrite=TRUE)
    flx %>%
        group_by(carrier) %>%
        do(m=lm(arr_delay ~ dep_time, data=.))

    # with do_xdf: useful if each subset is very large, but called code must be Xdf-aware
    flx %>%
        group_by(carrier) %>%
        do_xdf(m2=rxLinMod(arr_delay ~ dep_time, data=.))
}