Download a dataset to the local machine
x: An Xdf data source object.
as_data_frame: For the
...: If the output is to be a data frame, further arguments to the
name: For the
RevoScaleR does not have an exact analogue of the dplyr concept of a src, and because of this, the dplyrXdf implementations of collect and compute are somewhat different. In dplyrXdf, these functions serve two related, overlapping purposes:
Copy an arbitrary data source from a backend to an Xdf file or data frame. The data source can be any (non-Xdf) RevoScaleR data source, such as a SQL Server table (class RxSqlServerData).
Download an Xdf file from a remote filesystem, such as the HDFS filesystem of a Hadoop or Spark cluster.
The code handles both the case where you are logged into the edge node of a Hadoop/Spark cluster and the case where you are a remote client. In the latter case, downloading is a two-stage process: the data is first transferred from HDFS to the native filesystem of the edge node, and then downloaded from the edge node to the client.
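The two-stage transfer described above can be pictured with a small base R sketch. It is only an illustration of the flow, not how dplyrXdf performs the transfer: local scratch directories stand in for HDFS, the edge node and the client, and the helpers `stage_copy` and `download_sketch` are hypothetical names that do not exist in dplyrXdf or RevoScaleR.

```r
# Illustrative only: scratch directories play the roles of HDFS, the edge
# node's native filesystem, and the client. Neither helper is part of dplyrXdf.
stage_copy <- function(src, dest_dir) {
  dest <- file.path(dest_dir, basename(src))
  if (!file.copy(src, dest, overwrite = TRUE)) stop("copy failed: ", src)
  dest
}

download_sketch <- function(hdfs_file, edge_dir, client_dir, on_edge_node) {
  # Stage 1: "HDFS" -> native filesystem of the edge node
  edge_file <- stage_copy(hdfs_file, edge_dir)
  if (on_edge_node) return(edge_file)  # already on the local machine

  # Stage 2: edge node -> client (only needed for a remote client)
  stage_copy(edge_file, client_dir)
}

# Create the three scratch directories and a file to "download"
root <- tempfile("transfer_sketch")
dir.create(root)
dirs <- file.path(root, c("hdfs", "edge", "client"))
for (d in dirs) dir.create(d)
src <- file.path(dirs[1], "mtcars.csv")
write.csv(mtcars, src, row.names = FALSE)

local_copy  <- download_sketch(src, dirs[2], dirs[3], on_edge_node = TRUE)
remote_copy <- download_sketch(src, dirs[2], dirs[3], on_edge_node = FALSE)
```

When the session runs on the edge node, the sketch stops after the first copy. A remote client needs the second hop as well.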
If you want to look at the first few rows of a small Xdf file in HDFS, it may be faster to use compute to copy the entire file to the native filesystem and then run head on the copy, than to run head on the original. This is due to RevoScaleR overhead in Spark and Hadoop.
For the RxDataSource methods, collect returns a data frame, and compute returns a tbl_xdf data source. For the RxXdfData methods, both return either a data frame or a tbl_xdf, depending on the as_data_frame argument.
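The return-type rules above can be summarised as a small S3 dispatch sketch in base R. Everything here is a mock for illustration: the `*_sketch` generics and the `mock_source`, `mock_xdf` and `mock_tbl_xdf` classes are assumptions standing in for the real methods and classes, and `as_data_frame` is deliberately given no default because the sketch makes no claim about the actual defaults.

```r
# Mock generics and classes for illustration; none of these names are part of
# dplyrXdf or RevoScaleR.
collect_sketch <- function(x, ...) UseMethod("collect_sketch")
compute_sketch <- function(x, ...) UseMethod("compute_sketch")

# Stand-in for "write the data to an Xdf file and return a tbl_xdf"
as_mock_tbl_xdf <- function(data) {
  structure(list(data = data), class = "mock_tbl_xdf")
}

# Non-Xdf data source: collect gives a data frame, compute gives a tbl_xdf
collect_sketch.mock_source <- function(x, ...) x$data
compute_sketch.mock_source <- function(x, ...) as_mock_tbl_xdf(x$data)

# Xdf data source: the as_data_frame argument decides the return type
xdf_result <- function(x, as_data_frame) {
  if (as_data_frame) x$data else as_mock_tbl_xdf(x$data)
}
collect_sketch.mock_xdf <- function(x, as_data_frame, ...) xdf_result(x, as_data_frame)
compute_sketch.mock_xdf <- function(x, as_data_frame, ...) xdf_result(x, as_data_frame)

src_db  <- structure(list(data = mtcars), class = "mock_source")
src_xdf <- structure(list(data = mtcars), class = "mock_xdf")

class(collect_sketch(src_db))                          # a data frame
class(compute_sketch(src_db))                          # the mock tbl_xdf
class(compute_sketch(src_xdf, as_data_frame = TRUE))   # a data frame
class(collect_sketch(src_xdf, as_data_frame = FALSE))  # the mock tbl_xdf
```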
as_xdf, as_data_frame, copy_to, compute in package dplyr
mtx <- as_xdf(mtcars, overwrite=TRUE)
# all of these return a data frame (or a tbl_df) for input in the native filesystem
as.data.frame(mtx)
as_data_frame(mtx) # returns a tbl_df
collect(mtx)
compute(mtx)
# collect and compute are meant for downloading data from remote backends
## Not run:
# downloading from a database
connStr <- "SERVER=hostname;DATABASE=RevoTestDB;TRUSTED_CONNECTION=yes"
mtdb <- RxSqlServerData("mtcars", connectionString=connStr)
copy_to(mtdb, mtcars)
as.data.frame(mtdb)
collect(mtdb) # returns a data frame
compute(mtdb) # returns a tbl_xdf
# downloading from HDFS
mtc <- copy_to_hdfs(mtcars)
as.data.frame(mtc)
collect(mtc) # returns a data frame
compute(mtc) # returns a tbl_xdf
## End(Not run)