copy_to: Upload a dataset to a remote backend

Description

Upload a dataset to a remote backend

Usage

## S3 method for class 'RxDataSource'
copy_to(dest, df, ...)

## S3 method for class 'RxHdfsFileSystem'
copy_to(dest, df, name = NULL, ...)

copy_to_hdfs(..., host = hdfs_host(), port = rxGetOption("hdfsPort"))

Arguments

dest

The destination source: either a RevoScaleR data source object, or a filesystem object of class RxHdfsFileSystem.

df

A dataset. For the RxDataSource method, this can be any RevoScaleR data source object, presumably of a different class to the destination. For the RxHdfsFileSystem method, this can be the filename of an Xdf file, a RevoScaleR data source, or anything that can be coerced to a data frame.

...

Further arguments to lower-level functions; see below.

name

The filename, optionally including the path, for the uploaded Xdf file. The default upload location is the user's home directory (/user/<username>) in the filesystem pointed to by dest. Not used for the RxDataSource method.

host, port

The HDFS hostname and port number to connect to. You should only need to set these if you have an attached Azure Data Lake Store that you are accessing via HDFS.

Details

RevoScaleR does not have an exact analogue of the dplyr concept of a src, and because of this, the dplyrXdf implementation of copy_to is somewhat different. In dplyrXdf, the function serves two related, overlapping purposes:

- Copying a dataset into a RevoScaleR data source, such as a table in a SQL database (the RxDataSource method)
- Uploading a dataset to HDFS (the RxHdfsFileSystem method)

The copy_to_hdfs function is a simple wrapper to the HDFS upload method that avoids having to create an explicit filesystem object. Its arguments other than host and port are simply passed as-is to copy_to.RxHdfsFileSystem.
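
For instance, these two calls do the same thing:

# copy_to_hdfs creates the RxHdfsFileSystem object for you
hd <- RxHdfsFileSystem()
copy_to(hd, mtcars)

copy_to_hdfs(mtcars)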

The method for uploading to HDFS can handle both the case where you are logged into the edge node of a Hadoop/Spark cluster, and the case where you are a remote client. In the latter case, the upload is a two-stage process: the data is first transferred to the native filesystem of the edge node, and then copied from the edge node into HDFS. Similarly, it can handle uploading both to the host HDFS filesystem, and to an attached Azure Data Lake Store. If dest points to an ADLS host, the file will be uploaded there. You can override this by supplying an explicit URI for the uploaded file, in the form adl://azure.host.name/path. The name for the host HDFS filesystem is adl://host/.
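
As a hedged sketch of the override (assuming the URI is supplied via the name argument; the hostname and path here are hypothetical):

# upload to a specific ADLS store, overriding the default destination
copy_to_hdfs(mtcars, name="adl://azure.host.name/user/me/mtcars")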

For the HDFS upload method, any arguments in ... are passed to hdfs_upload, and ultimately to the Hadoop fs -copyFromLocal command. For the data source copy method, arguments in ... are passed to rxDataStep.
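
For example, a sketch assuming rxDataStep's overwrite argument (any other rxDataStep argument could be passed the same way):

# overwrite is not a copy_to argument; it is passed via ... to rxDataStep
copy_to(mtdb, mtcars, overwrite=TRUE)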

copy_to is meant for copying datasets to different backends. If you are simply copying a file to HDFS, consider using hdfs_upload; or if you are copying an Xdf file to a different location in the same filesystem, use copy_xdf or file.copy.
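
Minimal sketches of those alternatives (the filenames here are hypothetical):

# plain file upload, no conversion to Xdf
hdfs_upload("local_file.csv", "/user/me")

# copy an Xdf data source to another location in the same filesystem
copy_xdf(mtx, "mtcars_copy")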

Value

An Xdf data source object pointing to the uploaded data.

Note on composite Xdf

There are actually two kinds of Xdf files: standard and composite. A composite Xdf file is a directory containing multiple data and metadata files, which the RevoScaleR functions treat as a single dataset. Xdf files in HDFS must be composite in order to work properly; copy_to will convert an existing Xdf file into composite, if it's not already in that format. Non-Xdf datasets (data frames and other RevoScaleR data sources, such as text files) will similarly be uploaded as composite.
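
For instance, a minimal sketch that does the conversion explicitly before uploading (assuming as_composite_xdf accepts a data frame, as as_xdf does in the examples below):

# create a composite Xdf: a directory of data and metadata files
mtc <- as_composite_xdf(mtcars)
copy_to(RxHdfsFileSystem(), mtc)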

See Also

rxHadoopCopyFromClient, rxHadoopCopyFromLocal, collect and compute for downloading data from HDFS, as_xdf, as_composite_xdf

Examples

## Not run: 
# copy a data frame to SQL Server
connStr <- "SERVER=hostname;DATABASE=RevoTestDB;TRUSTED_CONNECTION=yes"
mtdb <- RxSqlServerData("mtcars", connectionString=connStr)
copy_to(mtdb, mtcars)

# copy an Xdf file to SQL Server: will overwrite any existing table with the same name
mtx <- as_xdf(mtcars, overwrite=TRUE)
copy_to(mtdb, mtx)

# copy a data frame to HDFS
hd <- RxHdfsFileSystem()
mth <- copy_to(hd, mtcars)
# assign a new filename on copy
mth2 <- copy_to(hd, mtcars, "mtcars_2")

# copy an Xdf file to HDFS
mth3 <- copy_to(hd, mtx, "mtcars_3")

# same as copy_to(hd, ...)
delete_xdf(mth)
copy_to_hdfs(mtcars)

# copying to attached ADLS storage
copy_to_hdfs(mtcars, host="adl://adls.host.name")

## End(Not run)
