Description
Functions for working with files in HDFS: directory listing; file copy, move and delete; directory creation and deletion; testing for file/directory existence; checking whether an object is in HDFS; and emptying the trash.
Usage

hdfs_dir(path = ".", ..., full_path = FALSE, include_dirs = FALSE,
  recursive = FALSE, dirs_only = FALSE, pattern = NULL,
  host = hdfs_host())
## S3 method for class 'dplyrXdf_hdfs_dir'
print(x, ...)
hdfs_host(object = NULL)
hdfs_dir_exists(path, host = hdfs_host())
hdfs_file_exists(path, host = hdfs_host())
hdfs_dir_create(path, ..., host = hdfs_host())
hdfs_dir_remove(path, ..., host = hdfs_host())
hdfs_file_copy(src, dest, ..., host = hdfs_host())
hdfs_file_move(src, dest, ..., host = hdfs_host())
hdfs_file_remove(path, ..., host = hdfs_host())
hdfs_expunge()
in_hdfs(object)
Arguments

path: An HDFS pathname.
...: Further arguments passed to the lower-level RevoScaleR and Hadoop functions.
full_path: For hdfs_dir, whether to return the filenames with the full path attached.
include_dirs: For hdfs_dir, whether to include directories in the listing.
recursive: For hdfs_dir, whether to list the contents of subdirectories as well.
dirs_only: For hdfs_dir, whether to return only directories.
pattern: For hdfs_dir, an optional regular expression; only filenames that match it are returned, as for dir.
host: The HDFS hostname, as a string.
object: For hdfs_host and in_hdfs, an R object, typically a RevoScaleR data source.
src, dest: For hdfs_file_copy and hdfs_file_move, the source and destination paths.
Details

These are utility functions to simplify working with files and directories in HDFS. For the most part, they wrap lower-level functions provided by RevoScaleR, which in turn wrap various Hadoop file system commands. They work with any file that is stored in HDFS, not just Xdf files.
The hdfs_dir function is analogous to dir for the native filesystem. Like that function, and unlike rxHadoopListFiles, it returns a vector of filenames (rxHadoopListFiles returns a vector of printed output from the hadoop fs -ls command, which is not quite the same thing). Again unlike rxHadoopListFiles, it does not print anything by default (the print method takes care of that).
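As a brief sketch (the directories and filenames below are hypothetical, and the output depends on your cluster), the listing can be filtered and qualified via the pattern and full_path arguments:

## list the current HDFS home directory
hdfs_dir()

## only .xdf files under /user/RevoShare, with the path attached
hdfs_dir("/user/RevoShare", pattern="\\.xdf$", full_path=TRUE)

## subdirectories only, listed recursively
hdfs_dir("/user/RevoShare", dirs_only=TRUE, recursive=TRUE)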
hdfs_dir_exists and hdfs_file_exists test for the existence of a given directory and file, respectively. They are analogous to dir.exists and file.exists for the native filesystem.
hdfs_dir_create and hdfs_dir_remove create and remove directories. They are analogous to dir.create and unlink(recursive=TRUE) for the native filesystem.
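A minimal sketch of how these might be combined; the scratch path below is hypothetical:

## create a scratch directory if it doesn't already exist
if(!hdfs_dir_exists("/user/RevoShare/scratch"))
    hdfs_dir_create("/user/RevoShare/scratch")

hdfs_file_exists("/user/RevoShare/scratch/data.xdf")

## delete the directory and its contents
hdfs_dir_remove("/user/RevoShare/scratch")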
hdfs_file_copy and hdfs_file_move copy and move files. They are analogous to file.copy and file.rename for the native filesystem. Unlike rxHadoopCopy and rxHadoopMove, they are vectorised in both src and dest.
Currently, RevoScaleR has only limited support for accessing multiple HDFS filesystems simultaneously. In particular, src and dest should both be on the same HDFS filesystem, whether that is identified by hostname or is an Azure Data Lake Store.
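For example, since src is vectorised, several files can be copied to a destination directory in one call (the filenames below are hypothetical):

## copy multiple files into one directory
hdfs_file_copy(c("data1.xdf", "data2.xdf", "data3.xdf"), "backup")

## move/rename a single file
hdfs_file_move("backup/data1.xdf", "backup/data1_old.xdf")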
hdfs_file_remove deletes files. It is analogous to file.remove and unlink for the native filesystem.
hdfs_expunge empties the HDFS trash.
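A short sketch, continuing with the hypothetical files above; if the HDFS trash is enabled on the cluster, removed files are moved there rather than deleted immediately, and hdfs_expunge then empties it:

hdfs_file_remove(c("backup/data2.xdf", "backup/data3.xdf"))

## empty the HDFS trash
hdfs_expunge()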
Value

hdfs_dir returns a vector of filenames, optionally with the full path attached.
hdfs_host returns the hostname of the HDFS filesystem for the given object. If no object is specified, or if the object is not in HDFS, it returns the hostname of the currently active HDFS filesystem. This is generally "default" unless you are in the RxHadoopMR or RxSpark compute context and using an Azure Data Lake Store, in which case it returns the ADLS name node.
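For example, a local Xdf file is not in HDFS, so hdfs_host falls back to the hostname of the currently active filesystem (a sketch; the result depends on your compute context):

mtx <- as_xdf(mtcars, overwrite=TRUE)   # local Xdf file
hdfs_host(mtx)                          # generally "default"
hdfs_host()                             # likewise, with no object specified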
hdfs_dir_exists and hdfs_file_exists return TRUE or FALSE depending on whether the directory or file exists.
The other hdfs_* functions return TRUE or FALSE depending on whether the operation succeeded.
in_hdfs returns whether the given object is stored in HDFS. This will be TRUE for an Xdf data source or file data source in HDFS, or a Spark data source. Classes for the latter include RxHiveData, RxParquetData and RxOrcData.
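A sketch, assuming the Hive table and Parquet file referenced here exist on your cluster:

## Spark data sources count as being in HDFS
hv <- RxHiveData(table="mytable")
in_hdfs(hv)     # TRUE

pq <- RxParquetData(file="/user/RevoShare/data.parquet")
in_hdfs(pq)     # TRUE

## a local Xdf file is not in HDFS
mtx <- as_xdf(mtcars, overwrite=TRUE)
in_hdfs(mtx)    # FALSE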
See Also

dir, dir.exists, file.exists, dir.create,
file.copy, file.rename, file.remove, unlink,
rxHadoopListFiles, rxHadoopFileExists,
rxHadoopMakeDir, rxHadoopRemoveDir,
rxHadoopCopy, rxHadoopMove, rxHadoopRemove
Examples

## Not run:
hdfs_host()
mtx <- as_xdf(mtcars, overwrite=TRUE)
mth <- copy_to_hdfs(mtx)
in_hdfs(mtx)
in_hdfs(mth)
hdfs_host(mth)
# always TRUE
hdfs_dir_exists("/")
# should always be TRUE if Microsoft R is installed on the cluster
hdfs_dir_exists("/user/RevoShare")
# listing of home directory: /user/<username>
hdfs_dir()
# upload an arbitrary file
desc <- system.file("DESCRIPTION", package="dplyrXdf")
hdfs_upload(desc, "dplyrXdf_description")
hdfs_file_exists("dplyrXdf_description")
# creates /user/<username>/foo
hdfs_dir_create("foo")
hdfs_file_copy("dplyrXdf_description", "foo")
hdfs_file_exists("foo/dplyrXdf_description")
hdfs_file_remove("dplyrXdf_description")
hdfs_dir_remove("foo")
## End(Not run)