Description

Upload a dataset to a remote backend.
Usage
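The full method signatures are not reproduced here; the following is only a rough sketch of the calling pattern, based on the arguments documented below.

copy_to(dest, df, name, ...)

copy_to_hdfs(..., host, port)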
Arguments

dest
The destination source: either a RevoScaleR data source object, or a filesystem object of class RxHdfsFileSystem.

df
The dataset to copy: a data frame, an Xdf file, or another RevoScaleR data source.

...
Further arguments to lower-level functions; see below.

name
The filename, optionally including the path, for the uploaded Xdf file. The default upload location is the user's home directory on the target filesystem.

host, port
The HDFS hostname and port number to connect to. You should only need to set these if you are accessing an attached Azure Data Lake Store via HDFS.
Details

RevoScaleR does not have an exact analogue of the dplyr concept of a src, and because of this, the dplyrXdf implementation of copy_to is somewhat different. In dplyrXdf, the function serves two related, overlapping purposes.
First, it can be used to copy a dataset to a different format, for example from an Xdf file to a SQL Server database. To do this, dest should be a data source object of the target class (RxSqlServerData for SQL Server), specifying the name/location of the copied data.
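For instance, a data source copy might look like the following sketch, where the connection string and table name are placeholders:

# sketch: copy a data frame into a SQL Server table
connStr <- "SERVER=myserver;DATABASE=mydb;TRUSTED_CONNECTION=yes"
sqlDest <- RxSqlServerData("mtcars_copy", connectionString=connStr)
copy_to(sqlDest, mtcars)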
Second, it can be used to upload a dataset to a different filesystem, such as the HDFS filesystem of a Hadoop or Spark cluster. The dataset will be saved in Xdf format. For this, dest should be a RxHdfsFileSystem object.
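For example, an HDFS upload might look like this sketch, assuming a working connection to the cluster:

# sketch: upload a data frame to HDFS as a composite Xdf file
hd <- RxHdfsFileSystem()
mtcars_hd <- copy_to(hd, mtcars, "mtcars_hdfs")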
The copy_to_hdfs function is a simple wrapper to the HDFS upload method that avoids having to create an explicit filesystem object. Its arguments other than host and port are passed as-is to copy_to.RxHdfsFileSystem.
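In other words, the following two calls should be equivalent:

# sketch: copy_to with an explicit filesystem object vs the copy_to_hdfs wrapper
copy_to(RxHdfsFileSystem(), mtcars)
copy_to_hdfs(mtcars)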
The method for uploading to HDFS can handle both the case where you are logged into the edge node of a Hadoop/Spark cluster, and the case where you are a remote client. In the latter case, uploading is a two-stage process: the data is first transferred to the native filesystem of the edge node, and then copied from the edge node into HDFS. Similarly, it can handle uploading both to the host HDFS filesystem and to an attached Azure Data Lake Store. If dest points to an ADLS host, the file will be uploaded there. You can override this by supplying an explicit URI for the uploaded file, in the form adl://azure.host.name/path. The name for the host HDFS filesystem is adl://host/.
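A sketch of targeting ADLS storage follows; the hostname and path are placeholders, and passing a full adl:// URI as the destination name is an assumption based on the description above:

# sketch: upload to an attached Azure Data Lake Store via the host argument
copy_to_hdfs(mtcars, host="adl://mystore.azuredatalakestore.net")
# assumed: an explicit adl:// URI can also be supplied as the destination name
copy_to_hdfs(mtcars, "adl://mystore.azuredatalakestore.net/user/me/mtcars")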
For the HDFS upload method, any arguments in ... are passed to hdfs_upload, and ultimately to the Hadoop fs -copyFromLocal command. For the data source copy method, arguments in ... are passed to rxDataStep.
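For instance, a data source copy could pass rxDataStep arguments such as rowsPerRead (reusing the sqlDest object from the sketch above):

# sketch: extra arguments are passed on to rxDataStep for a data source copy
copy_to(sqlDest, mtcars, rowsPerRead=5000)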
copy_to is meant for copying datasets to different backends. If you are simply copying a file to HDFS, consider using hdfs_upload; if you are copying an Xdf file to a different location in the same filesystem, use copy_xdf or file.copy.
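As a heavily hedged sketch of the plain file transfer alternative, assuming hdfs_upload takes a local source path followed by an HDFS destination path:

# sketch (assumption): transfer an existing file as-is, without converting it to Xdf
hdfs_upload("local_data/mtcars.csv", "/user/me")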
Value

An Xdf data source object pointing to the uploaded data.
Note on composite Xdf

There are actually two kinds of Xdf files: standard and composite. A composite Xdf file is a directory containing multiple data and metadata files, which the RevoScaleR functions treat as a single dataset. Xdf files in HDFS must be composite in order to work properly; copy_to will convert an existing Xdf file into composite format if it isn't already. Non-Xdf datasets (data frames and other RevoScaleR data sources, such as text files) will similarly be uploaded as composite.
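Although copy_to handles this conversion automatically, you can also convert beforehand. A sketch using as_composite_xdf (listed under See Also), assuming it takes the data followed by an output filename:

# sketch: create a composite Xdf locally, then upload it
mtc <- as_composite_xdf(mtcars, "mtcars_comp")
copy_to_hdfs(mtc)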
See Also

rxHadoopCopyFromClient, rxHadoopCopyFromLocal;
collect and compute for downloading data from HDFS;
as_xdf, as_composite_xdf
Examples

## Not run:
# copy a data frame to SQL Server
connStr <- "SERVER=hostname;DATABASE=RevoTestDB;TRUSTED_CONNECTION=yes"
mtdb <- RxSqlServerData("mtcars", connectionString=connStr)
copy_to(mtdb, mtcars)

# copy an Xdf file to SQL Server: will overwrite any existing table with the same name
mtx <- as_xdf(mtcars, overwrite=TRUE)
copy_to(mtdb, mtx)

# copy a data frame to HDFS
hd <- RxHdfsFileSystem()
mth <- copy_to(hd, mtcars)

# assign a new filename on copy
mth2 <- copy_to(hd, mtcars, "mtcars_2")

# copy an Xdf file to HDFS
mth3 <- copy_to(hd, mtx, "mtcars_3")

# copy_to_hdfs(...) is the same as copy_to(hd, ...)
delete_xdf(mth)
copy_to_hdfs(mtcars)

# copying to attached ADLS storage
copy_to_hdfs(mtcars, host="adl://adls.host.name")

## End(Not run)