Functions providing high-level access to the Hadoop Distributed File System (HDFS).
DFS_cat( file, con = stdout(), henv = hive() )
DFS_delete( file, recursive = FALSE, henv = hive() )
DFS_dir_create( path, henv = hive() )
DFS_dir_exists( path, henv = hive() )
DFS_dir_remove( path, recursive = TRUE, henv = hive() )
DFS_file_exists( file, henv = hive() )
DFS_get_object( file, henv = hive() )
DFS_read_lines( file, n = -1L, henv = hive() )
DFS_rename( from, to, henv = hive() )
DFS_list( path = ".", henv = hive() )
DFS_tail( file, n = 6L, size = 1024L, henv = hive() )
DFS_put( files, path = ".", henv = hive() )
DFS_put_object( obj, file, henv = hive() )
DFS_write_lines( text, file, henv = hive() )
| henv | An object containing the local Hadoop configuration. | 
| file | a character string representing a file on the DFS. | 
| files | a character string representing files located on the local file system to be copied to the DFS. | 
| n | an integer specifying the number of lines to read. | 
| obj | an R object to be serialized to/from the DFS. | 
| path | a character string representing a full path name in the
DFS (without the leading  | 
| recursive | logical. Should elements of the path other than the last be deleted recursively? | 
| size | an integer specifying the number of bytes to be read. Must
be sufficiently large otherwise  | 
| text | a (vector of) character string(s) to be written to the DFS. | 
| con | A connection to be used for printing the output provided by
 | 
| from | a character string representing a file or directory on the DFS to be renamed. | 
| to | a character string representing the new filename on the DFS. | 
The Hadoop Distributed File System (HDFS) is typically part of a Hadoop cluster but can also be used as a stand-alone, general-purpose distributed file system (DFS). Several high-level functions provide easy access to distributed storage.
DFS_cat is useful for producing output in user-defined
functions. It reads from files on the DFS and typically prints the
output to the standard output. Its behaviour is similar to the base
function cat.
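As a minimal sketch of the default and a redirected connection, assuming the hive package is installed, a Hadoop configuration is reachable through hive(), and a text file already exists at the illustrative path /tmp/test/hdfs.txt. The have_cluster flag is a hypothetical guard, not part of hive, so that the block can be sourced safely on machines without a cluster:

```r
## Hypothetical guard: set to TRUE only on a machine with a working Hadoop setup.
have_cluster <- FALSE

if (have_cluster) {
  library(hive)
  f <- "/tmp/test/hdfs.txt"    # illustrative path on the DFS
  DFS_cat(f)                   # print the file to standard output (the default)
  DFS_cat(f, con = stderr())   # send the same output to another connection
}
```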
DFS_dir_create creates directories with the given path names if
they do not already exist. Its behaviour is similar to the base
function dir.create.
DFS_dir_exists and DFS_file_exists return a logical
vector indicating whether the directory or file, respectively, named by
their argument exists. See also function file.exists.
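A common pattern is to test for a file before reading it. The following is a sketch only: read_if_present is a hypothetical helper, not part of hive, and it assumes that DFS_file_exists returns a single TRUE or FALSE for a single file name:

```r
## Hypothetical helper (not part of hive): read a DFS text file only if it exists.
## The default for henv is evaluated lazily, so defining the function does not
## require a Hadoop installation.
read_if_present <- function(file, n = -1L, henv = hive::hive()) {
  if (!isTRUE(hive::DFS_file_exists(file, henv = henv))) {
    return(character(0))   # nothing to read
  }
  hive::DFS_read_lines(file, n = n, henv = henv)
}

## Usage on a configured cluster (illustrative path):
## read_if_present("/tmp/test/hdfs.txt", n = 1L)
```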
DFS_dir_remove attempts to remove the directory named in its
argument and if recursive is set to TRUE also attempts
to remove subdirectories in a recursive manner.
DFS_list produces a character vector of the names of files
in the directory named by its argument.
DFS_read_lines is a reader for (plain text) files stored on the
DFS. It returns a vector of character strings representing lines in
the (text) file. If n is given as an argument it reads that
many lines from the given file. Its behaviour is similar to the base
function readLines.
DFS_put copies files named by its argument to a given path in
the DFS.
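For illustration, a local file can be created with base R and then copied to the DFS. This is a sketch under the assumption that a Hadoop configuration is available; the directory /tmp/test and the have_cluster guard are illustrative choices, not part of hive:

```r
## Hypothetical guard: set to TRUE only on a machine with a working Hadoop setup.
have_cluster <- FALSE

if (have_cluster) {
  library(hive)
  local_file <- tempfile(fileext = ".txt")              # file on the local file system
  writeLines(c("Hello HDFS", "Bye Bye HDFS"), local_file)
  DFS_dir_create("/tmp/test")                           # target directory on the DFS
  DFS_put(local_file, "/tmp/test")                      # copy the local file to the DFS
  DFS_list("/tmp/test")                                 # inspect the result
}
```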
DFS_put_object serializes an R object to the DFS.
DFS_write_lines writes a given vector of character strings to a
file stored on the DFS. Its behaviour is similar to the base
function writeLines.
DFS_delete(), DFS_dir_create(), and DFS_dir_remove() return a logical value indicating whether the
operation succeeded for the given argument.
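Because these functions report success as a logical value, the result can be checked explicitly. The helper below is a hypothetical sketch, not part of hive, that turns a failed deletion into an error:

```r
## Hypothetical helper (not part of hive): delete a DFS file and stop on failure.
## The default for henv is evaluated lazily, so defining the function does not
## require a Hadoop installation.
delete_or_stop <- function(file, henv = hive::hive()) {
  ok <- hive::DFS_delete(file, henv = henv)
  if (!isTRUE(ok)) {
    stop("could not delete ", file, " on the DFS")
  }
  invisible(TRUE)
}

## Usage on a configured cluster (illustrative path):
## delete_or_stop("/tmp/test/hdfs.txt")
```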
DFS_dir_exists() and DFS_file_exists() return TRUE if
the named directories or files exist in the HDFS.
DFS_get_object() returns the deserialized object stored in a
file on the HDFS.
DFS_list() returns a character vector representing the directory listing of the corresponding
path on the HDFS.
DFS_read_lines() returns a character vector whose length equals the
number of lines read.
DFS_tail() returns a character vector whose length equals the number of
lines read up to the end of a file on the HDFS.
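A short sketch of DFS_tail(), assuming a configured Hadoop setup and an existing text file at the illustrative path /tmp/test/hdfs.txt. The have_cluster guard is hypothetical, and the value chosen for size is only an example of reading more bytes than the default:

```r
## Hypothetical guard: set to TRUE only on a machine with a working Hadoop setup.
have_cluster <- FALSE

if (have_cluster) {
  library(hive)
  f <- "/tmp/test/hdfs.txt"            # illustrative path on the DFS
  DFS_tail(f)                          # defaults: n = 6L, size = 1024L
  DFS_tail(f, n = 1L)                  # only the final line
  DFS_tail(f, n = 6L, size = 4096L)    # read more bytes from the end of the file
}
```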
Stefan Theussl
The Hadoop Distributed File System (https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html).
## Do we have access to the root directory of the DFS?
## Not run: DFS_dir_exists("/")
## Some self-explanatory DFS interaction
## Not run: 
DFS_list( "/" )
DFS_dir_create( "/tmp/test" )
DFS_write_lines( c("Hello HDFS", "Bye Bye HDFS"), "/tmp/test/hdfs.txt" )
DFS_list( "/tmp/test" )
DFS_read_lines( "/tmp/test/hdfs.txt" )
## End(Not run)
## Serialize an R object to the HDFS
## Not run: 
foo <- function()
  "You got me serialized."
sro <- "/tmp/test/foo.sro"
DFS_put_object( foo, sro )
DFS_get_object( sro )()
## End(Not run)
## finally (recursively) remove the created directory
## Not run: DFS_dir_remove( "/tmp/test" )