Functions for working with files in HDFS: directory listing; file copy, move and delete; directory create and delete; test for file/directory existence; check if in HDFS; expunge Trash.
hdfs_dir(path = ".", ..., full_path = FALSE, include_dirs = FALSE,
recursive = FALSE, dirs_only = FALSE, pattern = NULL,
host = hdfs_host())
## S3 method for class 'dplyrXdf_hdfs_dir'
print(x, ...)
hdfs_host(object = NULL)
hdfs_dir_exists(path, host = hdfs_host())
hdfs_file_exists(path, host = hdfs_host())
hdfs_dir_create(path, ..., host = hdfs_host())
hdfs_dir_remove(path, ..., host = hdfs_host())
hdfs_file_copy(src, dest, ..., host = hdfs_host())
hdfs_file_move(src, dest, ..., host = hdfs_host())
hdfs_file_remove(path, ..., host = hdfs_host())
hdfs_expunge()
in_hdfs(object)
path |
An HDFS pathname. |
... |
For |
full_path |
For |
include_dirs |
For |
recursive |
For |
dirs_only |
For |
pattern |
For |
host |
The HDFS hostname as a string, in the form |
object |
For |
src, dest |
For |
These are utility functions to simplify working with files and directories in HDFS. For the most part, they wrap lower-level functions provided by RevoScaleR, which in turn wrap various Hadoop file system commands. They work with any file that is stored in HDFS, not just Xdf files.
The hdfs_dir function is analogous to dir for the native filesystem. Like that function, and unlike rxHadoopListFiles, it returns a vector of filenames (rxHadoopListFiles returns a vector of printed output from the hadoop fs -ls command, which is not quite the same thing). Again unlike rxHadoopListFiles, it does not print anything by default (the print method takes care of that).
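As a rough illustration of the "vector of filenames" behaviour described above, the following sketch uses only base R's dir on a temporary directory, so it runs without a cluster. The scratch directory name is hypothetical, and the assumption that hdfs_dir's full_path and pattern arguments mirror dir's full.names and pattern is inferred from the analogy, not confirmed here.

```r
# Native-filesystem sketch: dir() returns a character vector of names.
tmp <- file.path(tempdir(), "hdfs_dir_demo")   # hypothetical scratch directory
unlink(tmp, recursive = TRUE)                  # start from a clean slate
dir.create(tmp)
file.create(file.path(tmp, c("a.csv", "b.csv", "notes.txt")))

files <- dir(tmp)                        # bare filenames, sorted alphabetically
csvs  <- dir(tmp, pattern = "\\.csv$")   # filter names by regular expression
full  <- dir(tmp, full.names = TRUE)     # names with the directory prepended

# On a cluster, the assumed counterpart would be a call such as
#   hdfs_dir("some/hdfs/path", full_path = TRUE)
# (not run here; requires dplyrXdf and an HDFS connection).
```

Because the result is an ordinary character vector, it can be subset, filtered or passed to other functions directly, which is not the case for the printed output of hadoop fs -ls.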
hdfs_dir_exists and hdfs_file_exists test for the existence of a given directory and file, respectively. They are analogous to dir.exists and file.exists for the native filesystem.
hdfs_dir_create and hdfs_dir_remove create and remove directories. They are analogous to dir.create and unlink(recursive=TRUE) for the native filesystem.
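The native analogues named above can be exercised locally to see the create, test and remove cycle. This is only a sketch on the local filesystem with a hypothetical scratch directory; the HDFS functions are assumed to follow the same pattern, with the difference that they report success as TRUE or FALSE, whereas unlink returns a status code.

```r
# Native-filesystem sketch of the create / exists / remove cycle.
d <- file.path(tempdir(), "hdfs_analogy_dir")  # hypothetical scratch directory
unlink(d, recursive = TRUE)                    # make sure it does not exist yet

before  <- dir.exists(d)                 # directory is absent at this point
created <- dir.create(d)                 # TRUE if the directory was created
file.create(file.path(d, "x.txt"))       # put a file inside it
during  <- dir.exists(d)                 # directory is now present

# unlink() needs recursive = TRUE to remove a directory and its contents;
# it returns 0 for success and 1 for failure rather than a logical value.
status <- unlink(d, recursive = TRUE)
after  <- dir.exists(d)                  # directory is gone again

# Assumed cluster counterpart (not run here):
#   hdfs_dir_create("foo"); hdfs_dir_exists("foo"); hdfs_dir_remove("foo")
```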
hdfs_file_copy and hdfs_file_move copy and move files. They are analogous to file.copy and file.rename for the native filesystem. Unlike rxHadoopCopy and rxHadoopMove, they are vectorised in both src and dest.
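To show what "vectorised in both src and dest" means in practice, here is a sketch using the native analogues file.copy and file.rename with equal-length vectors of source and destination paths. The file names and scratch directory are hypothetical, and it is an assumption that hdfs_file_copy and hdfs_file_move pair up their arguments element by element in the same way.

```r
# Native-filesystem sketch of vectorised copy and move.
root <- file.path(tempdir(), "hdfs_copy_demo")  # hypothetical scratch directory
unlink(root, recursive = TRUE)
dir.create(root)

# Two source files with known contents.
src <- file.path(root, c("a.txt", "b.txt"))
writeLines("alpha", src[1])
writeLines("beta", src[2])

# Copy each source to the destination at the same position.
dest   <- file.path(root, c("a_copy.txt", "b_copy.txt"))
copied <- file.copy(src, dest)           # one logical result per file

# Move (rename) each copy; the originals in src are left untouched.
moved_to <- file.path(root, c("a_moved.txt", "b_moved.txt"))
moved    <- file.rename(dest, moved_to)  # one logical result per file

# Assumed cluster counterpart (not run here):
#   hdfs_file_copy(c("a.txt", "b.txt"), c("a_copy.txt", "b_copy.txt"))
```

Each call returns one logical value per file, so a partial failure can be detected by checking the result with all().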
Currently, RevoScaleR has only limited support for accessing multiple HDFS filesystems simultaneously. In particular, src and dest should both be on the same HDFS filesystem, whether host or ADLS.
hdfs_file_remove deletes files. It is analogous to file.remove and unlink for the native filesystem.
hdfs_expunge empties the HDFS trash.
hdfs_dir returns a vector of filenames, optionally with the full path attached.
hdfs_host returns the hostname of the HDFS filesystem for the given object. If no object is specified, or if the object is not in HDFS, it returns the hostname of the currently active HDFS filesystem. This is generally "default" unless you are in the RxHadoopMR or RxSpark compute context and using an Azure Data Lake Store, in which case it returns the ADLS name node.
hdfs_dir_exists and hdfs_file_exists return TRUE or FALSE depending on whether the directory or file exists.
The other hdfs_* functions return TRUE or FALSE depending on whether the operation succeeded.
in_hdfs returns whether the given object is stored in HDFS. This will be TRUE for an Xdf data source or file data source in HDFS, or a Spark data source. Classes for the latter include RxHiveData, RxParquetData and RxOrcData.
dir, dir.exists, file.exists, dir.create,
file.copy, file.rename, file.remove, unlink,
rxHadoopListFiles, rxHadoopFileExists,
rxHadoopMakeDir, rxHadoopRemoveDir,
rxHadoopCopy, rxHadoopMove, rxHadoopRemove
## Not run:
hdfs_host()
mtx <- as_xdf(mtcars, overwrite=TRUE)
mth <- copy_to_hdfs(mtx)
in_hdfs(mtx)
in_hdfs(mth)
hdfs_host(mth)
# always TRUE
hdfs_dir_exists("/")
# should always be TRUE if Microsoft R is installed on the cluster
hdfs_dir_exists("/user/RevoShare")
# listing of home directory: /user/<username>
hdfs_dir()
# upload an arbitrary file
desc <- system.file("DESCRIPTION", package="dplyrXdf")
hdfs_upload(desc, "dplyrXdf_description")
hdfs_file_exists("dplyrXdf_description")
# creates /user/<username>/foo
hdfs_dir_create("foo")
hdfs_file_copy("dplyrXdf_description", "foo")
hdfs_file_exists("foo/dplyrXdf_description")
hdfs_file_remove("dplyrXdf_description")
hdfs_dir_remove("foo")
## End(Not run)