hdfs: Utilities for HDFS


Description

Functions for working with files in HDFS: directory listing; file copy, move and delete; directory create and delete; test for file/directory existence; check if in HDFS; expunge Trash.

Usage

hdfs_dir(path = ".", ..., full_path = FALSE, include_dirs = FALSE,
  recursive = FALSE, dirs_only = FALSE, pattern = NULL,
  host = hdfs_host())

## S3 method for class 'dplyrXdf_hdfs_dir'
print(x, ...)

hdfs_host(object = NULL)

hdfs_dir_exists(path, host = hdfs_host())

hdfs_file_exists(path, host = hdfs_host())

hdfs_dir_create(path, ..., host = hdfs_host())

hdfs_dir_remove(path, ..., host = hdfs_host())

hdfs_file_copy(src, dest, ..., host = hdfs_host())

hdfs_file_move(src, dest, ..., host = hdfs_host())

hdfs_file_remove(path, ..., host = hdfs_host())

hdfs_expunge()

in_hdfs(object)

Arguments

path

An HDFS pathname.

...

For hdfs_dir, further switches, prefixed by "-", to pass to the Hadoop fs -ls command. For other functions, further arguments to pass to rxHadoopCommand.

full_path

For hdfs_dir, whether to prepend the directory path to filenames to give the full path. If FALSE, only the file names are returned.

include_dirs

For hdfs_dir, whether subdirectory names should be included in the listing. Always TRUE for non-recursive listings.

recursive

For hdfs_dir, whether the listing should recurse into subdirectories.

dirs_only

For hdfs_dir, whether only subdirectory names should be included.

pattern

For hdfs_dir, an optional regular expression. Only filenames that match the expression will be returned.

host

The HDFS hostname as a string, in the form adl://host.name. You should only need to set this if you have an attached Azure Data Lake Store that you are accessing via HDFS. It can also be an RxHdfsFileSystem object, in which case the hostname is taken from the object.

object

For in_hdfs and hdfs_host, an R object, typically a RevoScaleR data source object.

src, dest

For hdfs_file_copy and hdfs_file_move, the source and destination paths.

Details

These are utility functions to simplify working with files and directories in HDFS. For the most part, they wrap lower-level functions provided by RevoScaleR, which in turn wrap various Hadoop file system commands. They work with any file that is stored in HDFS, not just Xdf files.

The hdfs_dir function is analogous to dir for the native filesystem. Like that function, and unlike rxHadoopListFiles, it returns a vector of filenames (rxHadoopListFiles returns the printed output of the hadoop fs -ls command, which is not quite the same thing). Also unlike rxHadoopListFiles, it does not print anything by default; the print method takes care of that.
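As a sketch of how these options combine (this assumes a working Hadoop cluster; the paths shown are hypothetical):

```r
library(dplyrXdf)

# list only Xdf files, with the directory path prepended to each name
hdfs_dir("/user/RevoShare", pattern="\\.xdf$", full_path=TRUE)

# recurse into subdirectories, returning directory names only
hdfs_dir("/user/RevoShare", recursive=TRUE, dirs_only=TRUE)
```

Because the return value is a plain character vector, it can be passed directly to other functions such as hdfs_file_remove, in the same way that the output of dir can be passed to file.remove.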

hdfs_dir_exists and hdfs_file_exists test for the existence of a given directory and file, respectively. They are analogous to dir.exists and file.exists for the native filesystem.

hdfs_dir_create and hdfs_dir_remove create and remove directories. They are analogous to dir.create and unlink(recursive=TRUE) for the native filesystem.

hdfs_file_copy and hdfs_file_move copy and move files. They are analogous to file.copy and file.rename for the native filesystem. Unlike rxHadoopCopy and rxHadoopMove, they are vectorised in both src and dest.
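The vectorisation means several files can be copied or moved in one call, much like file.copy. A minimal sketch, assuming a working Hadoop cluster (the file and directory names are hypothetical):

```r
library(dplyrXdf)

# copy two files into one destination directory in a single call;
# rxHadoopCopy would require one call per file
src <- c("data/file1.csv", "data/file2.csv")
hdfs_file_copy(src, "backup")

# move (rename) a single file; src and dest are recycled as needed
hdfs_file_move("old/results.xdf", "new/results.xdf")
```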

Currently, RevoScaleR has only limited support for accessing multiple HDFS filesystems simultaneously. In particular, src and dest should both be on the same HDFS filesystem, whether host or ADLS.

hdfs_file_remove deletes files. It is analogous to file.remove and unlink for the native filesystem.

hdfs_expunge empties the HDFS trash.

Value

hdfs_dir returns a vector of filenames, optionally with the full path attached.

hdfs_host returns the hostname of the HDFS filesystem for the given object. If no object is specified, or if the object is not in HDFS, it returns the hostname of the currently active HDFS filesystem. This is generally "default" unless you are in the RxHadoopMR or RxSpark compute context and using an Azure Data Lake Store, in which case it returns the ADLS name node.

hdfs_dir_exists and hdfs_file_exists return TRUE or FALSE depending on whether the directory or file exists.

The other hdfs_* functions return TRUE or FALSE depending on whether the operation succeeded.

in_hdfs returns TRUE or FALSE depending on whether the given object is stored in HDFS. This will be TRUE for an Xdf data source or file data source in HDFS, or for a Spark data source. Classes for the latter include RxHiveData, RxParquetData and RxOrcData.
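For example, a sketch of how in_hdfs distinguishes local from HDFS-backed data sources (this assumes RevoScaleR is available; the file paths are hypothetical):

```r
library(dplyrXdf)

# an Xdf data source on the native filesystem
local_xdf <- RxXdfData("mtcars.xdf")
in_hdfs(local_xdf)    # FALSE

# the same data source type, but pointing into HDFS
hdfs_xdf <- RxXdfData("/user/me/mtcars", fileSystem=RxHdfsFileSystem())
in_hdfs(hdfs_xdf)     # TRUE
```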

See Also

dir, dir.exists, file.exists, dir.create, file.copy, file.rename, file.remove, unlink, rxHadoopListFiles, rxHadoopFileExists, rxHadoopMakeDir, rxHadoopRemoveDir, rxHadoopCopy, rxHadoopMove, rxHadoopRemove

Examples

## Not run: 
hdfs_host()

mtx <- as_xdf(mtcars, overwrite=TRUE)
mth <- copy_to_hdfs(mtx)
in_hdfs(mtx)
in_hdfs(mth)
hdfs_host(mth)

# always TRUE
hdfs_dir_exists("/")
# should always be TRUE if Microsoft R is installed on the cluster
hdfs_dir_exists("/user/RevoShare")

# listing of home directory: /user/<username>
hdfs_dir()

# upload an arbitrary file
desc <- system.file("DESCRIPTION", package="dplyrXdf")
hdfs_upload(desc, "dplyrXdf_description")
hdfs_file_exists("dplyrXdf_description")

# creates /user/<username>/foo
hdfs_dir_create("foo")
hdfs_file_copy("dplyrXdf_description", "foo")
hdfs_file_exists("foo/dplyrXdf_description")

hdfs_file_remove("dplyrXdf_description")
hdfs_dir_remove("foo")

## End(Not run)

RevolutionAnalytics/dplyrXdf documentation built on June 3, 2019, 9:08 p.m.