Most of the code in this package is the same as in the original rwebhdfs package. I changed the "http" address to "https" and switched the token to a delegation token to make the package easier to use.
An additional function, read_all(), loads all files in a directory into a variable.
This R package provides access to HDFS via the WebHDFS REST API. For more information, please see: http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/WebHDFS.html
Ensure that WebHDFS is enabled in the HDFS configuration:
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
More examples will be added to the function help pages; for now, here is a brief usage guide.
I recommend HDP 2.0 for a quick demo and testing: http://hortonworks.com/hdp/downloads/
webhdfs is an S3 object and can be created using:
hdfs <- webhdfs("localhost", 50070, "hue",token="your delegation token")
write_file(hdfs, "test")
file_stat(hdfs, "test")
data <- read_all(hdfs, "dirPath")
foo <- tempfile()
writeLines("foobar", foo)
write_file(hdfs, "foo", foo)
read_file(hdfs, "foo")
mkdir(hdfs, "bar")
rename_file(hdfs, "foo", "bar/foo")
delete_file(hdfs, "test")
delete_file(hdfs, "bar", recursive=TRUE)
rwebhdfs is not on CRAN yet. I plan to use it in a couple of Hadoop projects before submitting it to CRAN, so that I can decide whether all functions are intuitive and well designed.
To get the latest version from GitHub:
webhdfs is implemented as an S3 object, and all common file-system functions are coded as S3 methods. Since R already provides basic file-system functions such as write.*, I tried to name my functions along similar lines while keeping them easy to find with auto-completion. So you will find functions such as write_file, read_file and rename_file.
It seems that in Hadoop itself, WebHDFS is implemented as a subclass of FileSystem, and many others such as FTP, S3 and (regular) HDFS extend the same interface. I think it would be great to do the same in R, so that data can be fetched and stored more transparently across different file systems.
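To make the idea of a shared file-system interface more concrete, here is a minimal sketch of how S3 generics could dispatch to different backends. Everything in it is hypothetical: the classes memfs and localfs and the generics fs_write and fs_read are invented for illustration and are not part of rwebhdfs.

```r
# Hypothetical sketch: none of these names come from rwebhdfs.

# Toy backend 1: keeps "files" in an environment (in memory).
memfs <- function() {
  structure(list(store = new.env(parent = emptyenv())),
            class = c("memfs", "filesystem"))
}

# Toy backend 2: keeps files under a local directory.
localfs <- function(root = tempfile("localfs")) {
  dir.create(root, recursive = TRUE, showWarnings = FALSE)
  structure(list(root = root), class = c("localfs", "filesystem"))
}

# Generics: callers use the same verbs regardless of backend.
fs_write <- function(fs, path, text, ...) UseMethod("fs_write")
fs_read  <- function(fs, path, ...) UseMethod("fs_read")

# Methods for the in-memory backend.
fs_write.memfs <- function(fs, path, text, ...) {
  assign(path, text, envir = fs$store)
  invisible(fs)
}
fs_read.memfs <- function(fs, path, ...) {
  get(path, envir = fs$store, inherits = FALSE)
}

# Methods for the local-directory backend.
fs_write.localfs <- function(fs, path, text, ...) {
  writeLines(text, file.path(fs$root, path))
  invisible(fs)
}
fs_read.localfs <- function(fs, path, ...) {
  readLines(file.path(fs$root, path))
}

# Code written against the generics works with either backend.
copy_file <- function(from_fs, to_fs, path) {
  fs_write(to_fs, path, fs_read(from_fs, path))
}

mem <- memfs()
loc <- localfs()
fs_write(mem, "foo", "foobar")
copy_file(mem, loc, "foo")
fs_read(loc, "foo")
```

A WebHDFS, FTP or S3 backend would then only need its own constructor and methods; copy_file and any other code built on the generics would stay unchanged.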
Discussion of the design decisions and the choice of OO system is more than welcome. I have no experience with OO programming in R and chose S3 based on the suggestions here: http://adv-r.had.co.nz/OO-essentials.html
Both Kerberos and delegation token security are implemented. Use the securityON flag in the webhdfs constructor to enable security; if a token is also supplied, the delegation token is used, otherwise Kerberos is assumed. However, I have not tested this feature yet. Please report any issues you see.
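As a rough illustration of the three modes at the REST level, the sketch below builds a WebHDFS request URL. The helper webhdfs_url and its arguments are hypothetical and do not show how rwebhdfs builds its requests internally. The query parameters delegation and user.name are the ones described in the WebHDFS documentation. The "https" default is an assumption based on the change described at the top of this document.

```r
# Hypothetical helper, not part of rwebhdfs: build a WebHDFS REST URL.
webhdfs_url <- function(host, port, path, op, user = NULL, token = NULL,
                        security_on = FALSE, scheme = "https") {
  query <- paste0("op=", op)
  if (security_on && !is.null(token)) {
    # Delegation token: sent as the `delegation` query parameter.
    query <- paste0(query, "&delegation=",
                    utils::URLencode(token, reserved = TRUE))
  } else if (!security_on && !is.null(user)) {
    # Security off: the caller is identified by `user.name`.
    query <- paste0(query, "&user.name=",
                    utils::URLencode(user, reserved = TRUE))
  }
  # Security on without a token: nothing is added to the URL here,
  # because Kerberos (SPNEGO) is negotiated by the HTTP client.
  path <- sub("^/", "", path)
  paste0(scheme, "://", host, ":", port, "/webhdfs/v1/", path, "?", query)
}

# Delegation token
webhdfs_url("localhost", 50070, "/tmp/foo", "OPEN",
            token = "abc123", security_on = TRUE)
# Security off, plain user name
webhdfs_url("localhost", 50070, "/tmp/foo", "GETFILESTATUS", user = "hue")
# Kerberos assumed (security on, no token)
webhdfs_url("localhost", 50070, "/tmp/foo", "OPEN", security_on = TRUE)
```

In the Kerberos case the URL carries no credentials at all, so the HTTP client making the request has to support SPNEGO negotiation.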