knitr::opts_chunk$set(echo = TRUE)
The purpose of this workspace organization is to allow script running on different settings (platform, files organization,...).
It also propose a project oriented organization of files to facilitate way of sharing scripts.
This way to organize script is not mandatory to use this package, but it provides helper functions to handle this organization. From our experience it really helps to share scripts, projects and make easier to run a script on different installation without modifying it.
The principle is to separate the running context (how machine is configured, what is the connexion to the DB and where the scripts are located on the computer !).
A summary of the following:
A Working environment is organized in two levels, workspace and projects :
A workspace is a directory containing several R analysis projects. They share a workspace configuration (for example DB settings, a platform type, ...). The R scripts live in the projects directories (never at the workspace level).
Using git, a workspace is typically a repository, and project are sub directory, or sometimes submodules.
Using subversion (and external repositories feature), it is possible to share projects between workspaces. If the project uses the coding convention of the framework, it will run in the new workspace without any modification if the workspaces are for the same platform otherwise modifications will be limited to the differences between the platforms (new questions, ...) (if not, it's time to improve the framework).
On the root of the workspace directory, there are 2 files:
system.R : this is the global configuration of the workspace and the file called by all the projects in the workspace. It is under version. This file is minimal (just to load the local conf and the bootstrap system file). So all the code should be abstract from the running env (see "location" file). You shouldn't have to modify it
location.R : This is the environment specific configuration file, containing all the local configurations. It's not under version. We defined here variables specific of the worspace (and machine if they are not defined in .Rprofile).
```r{eval=FALSE} share.lib('mylib', platform=T) # load the platform specific library named mylib
Others directories are projects. ## Organization of a project. A *project* is whatever you want (all scripts for a kind of data, of analysis, or a theme...). You can organize projects as you want. Sometimes a project is a time delimited analysis (for an article) or a project will regroup all scripts run by the web server to producces the results of a platform, ... To load the configuration and make its functions available, you just have to load the file in the workspace directory (upper level of the directory of a project). So the first line of a script in a project should be to include it. (see next section running a script of a project) By convention, all runable scripts must live in a project directory. If a file contains only function or variable definition it should be in a lib/ sub-directory or into a plateform specific directory. ```r source('../system.R') # Caution, use a relative path, never an absolute path
The recommended convention is to create a little file named "conf.r" in each project directory.
This conf.R file will contain all the project-specific configuration. The first line (and sometimes the only one) of this file should be 'source("../system.R")'
Then, all the scripts in this project will simple load this little file in first place. The first line of all scripts in this project will become:
source('conf.R')
All scripts should be run using their own directory as working directory, all paths must be relative to the project directory !
In other words, a project should only know (or directly access) one file outside its own directory : the "system.R" file, all others files should not be reached directly : never write something like .
# BAD source("../share/lib/survey.R") # Better share.lib("survey") # share.lib() function provided by the package is dedicated to load library in share/lib/ wherever it is located.
If a project needs a file outside its directory, it should be a shared library, so use share.lib
() or share.data.path
function or get the path in a variable defined in "location.R" (or a function that return the path).
Think that a project should not "know" that another project exists in the same workspace. If two projects need the same file, this is a good evidence that this file should be placed in "share/" (as a framework's global library or as a platform specific one)
# BAD read.csv("/Users/User/Documents/very-important-data.csv") # It will never work on someone else's computer # GOOD read.csv(paste0(MY_FILE_LOCATION, 'very-important-data.csv')) # MY_FILE_LOCATION is defined in the location.R at the workspace root dir. # GOOD also read.csv( path_to_my_data('very-important-data.csv') ) # path_to_my_data() returns the path of a file in a location, it's defined in location.R
Constants and path function are just convention (names are defined by the developpers). Someone who wants to run the scripts just has to set where are the files on his computer.
To run a script, the working directory of R must be the directory of the project. All paths should be relative to this directory (never use absolute path).
R --vanilla < myscript.r
If you want to handle command line parameters, you should use Rscript to run the script
Rscript myscript.r param=value
Using Rscript to run the script will allow to catch the parameters passed on the command line (using parseArgs()
from swMisc package for example).
If you use RStudio, you have to create a RStudio project for each project in your workspace. By doing this, RStudio will always use the project directory as the working directory of R, and you will be able to use the "Run" or "Source" buttons directly.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.