FastRWeb - infrastructure to serve web pages with R scripts efficiently

Description

FastRWeb is not just a package, but an entire infrastructure allowing the use of R scripts to create web pages and graphics.

The basic idea is that a URL of the form http://server/cgi-bin/R/foo?bar=value is processed by FastRWeb by sourcing the foo.R script and running the function run(bar="value"), which is expected to be defined in that script. The result of a script can be anything from an HTML page to bitmap graphics or a PDF document.
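As an illustration, a minimal foo.R could look like the sketch below. It assumes the FastRWeb package is installed and uses its out() and done() helpers; the escape_html() helper, the default value of bar and the page content are hypothetical and only serve the example.

```r
# foo.R - hypothetical script placed in the web.R directory
library(FastRWeb)  # provides out() and done()

# Hypothetical helper: escape characters that are special in HTML so that
# user-supplied values cannot inject markup into the page.
escape_html <- function(x) {
  x <- gsub("&", "&amp;", x, fixed = TRUE)
  x <- gsub("<", "&lt;", x, fixed = TRUE)
  gsub(">", "&gt;", x, fixed = TRUE)
}

# FastRWeb calls run() with the query string parsed into arguments,
# so /cgi-bin/R/foo?bar=value results in run(bar = "value").
run <- function(bar = "nothing", ...) {
  out("<h1>Hello from foo.R</h1>")            # append HTML to the output buffer
  out("<p>bar = ", escape_html(bar), "</p>")
  done()                                      # return the buffered page as the result
}
```

Query parameters arrive as character strings, so a script that needs numbers has to convert them itself (for example with as.numeric()).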

FastRWeb uses CGI or PHP as the front-end and an Rserve server as the back-end. For details see Urbanek, S. (2008) FastRWeb: Fast Interactive Web Framework for Data Mining Using R, IASC 2008.

The R code in the package itself provides R-side tools that facilitate the delivery of results to a browser, such as WebResult, WebPlot, out and done (see the See Also section below).

Installation

The default configuration of FastRWeb assumes that the project root will be in /var/FastRWeb and that the server is a unix machine. It is possible to install FastRWeb in other settings, but it will require modification of the configuration.

First, the FastRWeb package should be installed (typically using install.packages("FastRWeb") in R). The installed package contains a shell script that will set up the environment in /var/FastRWeb. To run the script, use

system(paste("cd", shQuote(system.file(package="FastRWeb")), "&& sh install.sh"))

For the anatomy of the /var/FastRWeb project root see below.

Once the project root has been created, you can inspect the Rserve configuration file /var/FastRWeb/code/rserve.conf and adjust it to your needs. You can also look at the Rserve initialization script /var/FastRWeb/code/rserve.R, which is used to pre-load data, packages etc. into Rserve. If you are happy with both, you can start Rserve using /var/FastRWeb/code/start.

In order to tell your web server to use FastRWeb, you have two options: a CGI script or a PHP script. The former is more common as it works with any web server. The FastRWeb R package builds the Rcgi script and installs it into the cgi-bin directory of the package as part of its installation process, but it has no way of knowing the location of your server's cgi-bin directory, so it is left to the user to copy the script to the proper location. Use system.file("cgi-bin", package="FastRWeb") in R to locate the package directory, which contains an executable Rcgi (or Rcgi.exe on Windows), and copy that executable into your server's cgi-bin directory (on Debian/Ubuntu this is typically /usr/lib/cgi-bin, on Mac OS X it is /Library/WebServer/CGI-Executables). Most examples in FastRWeb assume that you have renamed the script to R instead of Rcgi, but you can choose any name.
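The copy step can also be done from R. The following is a sketch only: install_rcgi() and its arguments are hypothetical names, the target directory has to be supplied by you, and writing into a system cgi-bin directory usually requires elevated privileges.

```r
# Copy the Rcgi executable shipped with FastRWeb into a web server's cgi-bin
# directory. `install_rcgi` and its arguments are hypothetical names.
install_rcgi <- function(cgi_dir,
                         src_dir = system.file("cgi-bin", package = "FastRWeb"),
                         name = "R") {
  on_windows <- .Platform$OS.type == "windows"
  src <- file.path(src_dir, if (on_windows) "Rcgi.exe" else "Rcgi")
  if (!file.exists(src)) stop("Rcgi executable not found in ", src_dir)

  # Keep the .exe extension on Windows; elsewhere use the chosen name as-is.
  dest <- file.path(cgi_dir, if (on_windows) paste0(name, ".exe") else name)
  if (!file.copy(src, dest, overwrite = TRUE)) stop("could not copy to ", dest)

  Sys.chmod(dest, mode = "0755")  # make sure the copy stays executable
  invisible(dest)
}

# Example for Debian/Ubuntu (typically needs root privileges):
# install_rcgi("/usr/lib/cgi-bin")
```

The default name "R" matches the URLs used in the FastRWeb examples; pass a different name if you prefer another URL.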

With Rserve started and the CGI script in place, you should be able to open a browser and run your first script. The URL will probably look something like http://my.server/cgi-bin/R/main. This will invoke the script /var/FastRWeb/web.R/main.R by sourcing it and running the run() function.
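To make the request flow concrete, the mapping from URL to script and arguments can be sketched in plain R. This is a conceptual illustration only and not FastRWeb's actual implementation (which lives in the Rcgi front-end and the Rserve-side code); dispatch() and its arguments are hypothetical names.

```r
# Conceptual sketch: map a request path and query string to a script in
# web.R and call its run() function. `dispatch` is a hypothetical name.
dispatch <- function(path, query = "", root = "/var/FastRWeb") {
  # Reject anything that could escape the web.R directory.
  if (!grepl("^[A-Za-z0-9_./-]+$", path) || grepl("..", path, fixed = TRUE))
    stop("invalid script path")
  script <- file.path(root, "web.R", paste0(path, ".R"))
  if (!file.exists(script)) stop("no such script: ", script)

  # Decode one URL-encoded component ("+" stands for a space in query strings).
  decode <- function(x) utils::URLdecode(gsub("+", " ", x, fixed = TRUE))

  # Split "a=1&b=2" into a named list of decoded character strings.
  args <- list()
  if (nzchar(query)) {
    for (pair in strsplit(query, "&", fixed = TRUE)[[1]]) {
      key <- decode(sub("=.*$", "", pair))
      if (!nzchar(key)) next
      value <- if (grepl("=", pair, fixed = TRUE)) sub("^[^=]*=", "", pair) else ""
      args[[key]] <- decode(value)
    }
  }

  env <- new.env()
  source(script, local = env)  # defines run() in a private environment
  do.call(env$run, args)       # e.g. run(bar = "value")
}
```

With this sketch, dispatch("main") corresponds to the URL above and dispatch("foo", "bar=value") to a request for foo?bar=value.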

For advanced topics, please see the Rserve documentation. For production systems we encourage the use of the gid, uid, sockmod and umask configuration directives to restrict access to Rserve in line with your web server configuration.
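After changing these directives and restarting Rserve, it can be useful to check what permissions the socket actually ended up with. The sketch below uses only base R and works on unix systems; socket_access() is a hypothetical name, and the default path assumes the default project root with a socket file named socket.

```r
# Report the permissions and ownership of the Rserve socket (unix only).
# `socket_access` is a hypothetical helper; the default path is an assumption
# based on the default project root.
socket_access <- function(path = "/var/FastRWeb/socket") {
  info <- file.info(path)
  if (is.na(info$mode)) stop("no such file: ", path)
  list(
    mode  = format(info$mode),   # permission bits in octal
    owner = info$uname,
    group = info$grname,
    # TRUE if users outside the owner and group have any access at all
    world_accessible = bitwAnd(as.integer(info$mode), 7L) != 0L
  )
}
```

If world_accessible is TRUE on a production system, any local user can talk to Rserve through the socket, which is usually a sign that sockmod or umask should be tightened.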

Project root anatomy

The project root (typically /var/FastRWeb) contains various directories:

  • web.R - this directory contains the R scripts that will be served by FastRWeb. FastRWeb takes the part of the URL path after the CGI binary, appends .R and uses the result to locate the file in the web.R directory. Once located, the file is sourced and the run() function is called with the query string parsed into its arguments. The default installation also sources common.R in addition to the specified script (see code/rserve.R and the init() function for details on how this is achieved - you can modify the behavior as you please).

  • web - this directory can contain static content that can be referenced using the "file" command in WebResult.

  • code - this directory contains supporting infrastructure and configuration files for the Rserve back-end. If the start script in this directory is used, it loads the rserve.conf configuration file and sources rserve.R to initialize the Rserve master. The init() function (if present, e.g., defined in rserve.R) is run on every request.

  • tmp - this directory is used for temporary files. It should be purged occasionally to prevent accumulation of temporary files. FastRWeb provides ways of cleaning up (e.g., see the "tmpfile" command in WebResult), but crashed or aborted requests may still leave temporary files around. Only files from this directory can be served using the "tmpfile" WebResult command.

  • logs - this directory is optional and if present, the Rcgi script will log requests in the cgi.log file in this directory. It records the request time, duration, IP address, WebResult command, payload, optional cookie filter and the user-agent. If you want to enable logging, simply create the logs directory with sufficient permissions to allow the Rcgi script to write in it.

  • run - this directory is optional as well and used for run-time systems such as global login authorization etc. It is not populated or used in the CRAN version of FastRWeb, but we encourage this structure for any user-defined subsystems.

In addition, the default configuration uses a local socket named socket to communicate with the Rserve instance. Note that this allows you to use regular unix permissions to limit access to Rserve.
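The occasional purge of the tmp directory mentioned above can be scripted, for example from a scheduled job. The following base R sketch is one possible approach and not part of FastRWeb; purge_tmp(), its arguments and the 24-hour default are hypothetical choices.

```r
# Remove files in the tmp directory that have not been modified for a while.
# `purge_tmp` is a hypothetical helper, not part of FastRWeb.
purge_tmp <- function(tmp_dir = "/var/FastRWeb/tmp",
                      max_age_hours = 24,
                      dry_run = TRUE) {
  files <- list.files(tmp_dir, full.names = TRUE)
  info <- file.info(files)
  age_hours <- as.numeric(difftime(Sys.time(), info$mtime, units = "hours"))

  # which() drops entries whose metadata could not be read (NA).
  old <- files[which(!info$isdir & age_hours > max_age_hours)]
  if (!dry_run) unlink(old)
  old  # the files that were (or, in a dry run, would be) removed
}

# Inspect first, then delete:
# purge_tmp()                 # lists candidates only
# purge_tmp(dry_run = FALSE)  # actually removes them
```

The age threshold should be comfortably longer than any request is expected to run, so that files still in use by active requests are not removed.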

See Also

WebResult, WebPlot, out, done, add.header