In cturbelin/ifnBase: InfluenzaNet Data Analysis Base

knitr::opts_chunk$set(echo = TRUE)

Workspace organisation

Purpose

The purpose of this workspace organization is to allow script running on different settings (platform, files organization,...).

It also propose a project oriented organization of files to facilitate way of sharing scripts.

This way to organize script is not mandatory to use this package, but it provides helper functions to handle this organization. From our experience it really helps to share scripts, projects and make easier to run a script on different installation without modifying it.

The principle is to separate the running context (how machine is configured, what is the connexion to the DB and where the scripts are located on the computer !).

A summary of the following:

All scripts must never use absolute file path.
Scripts are in subdirectory and they are run in their own directory as working directory
location.R must contains all workspace specific variables (path to file outside the workspace, db connexion,...)

Global Organization

A Working environment is organized in two levels, workspace and projects :

workspace1
- project1
- ...
- project
- [share]

A workspace is a directory containing several R analysis projects. They share a workspace configuration (for example DB settings, a platform type, ...). The R scripts live in the projects directories (never at the workspace level).

Using git, a workspace is typically a repository, and project are sub directory, or sometimes submodules.

Using subversion (and external repositories feature), it is possible to share projects between workspaces. If the project uses the coding convention of the framework, it will run in the new workspace without any modification if the workspaces are for the same platform otherwise modifications will be limited to the differences between the platforms (new questions, ...) (if not, it's time to improve the framework).

On the root of the workspace directory, there are 2 files:

system.R : this is the global configuration of the workspace and the file called by all the projects in the workspace. It is under version. This file is minimal (just to load the local conf and the bootstrap system file). So all the code should be abstract from the running env (see "location" file). You shouldn't have to modify it
location.R : This is the environment specific configuration file, containing all the local configurations. It's not under version. We defined here variables specific of the worspace (and machine if they are not defined in .Rprofile).

The "share" directory is shared by all projects !

lib/ can contains shared library of functions, that are not (yet) in a package.
data/ should contains some data files
platform/ contains platform specific configuration files and internationalization a subdirectory with the same name as the platform file could also contain platform specific library

```r{eval=FALSE} share.lib('mylib', platform=T) # load the platform specific library named mylib

Others directories are projects.

## Organization of a project.

A *project* is whatever you want (all scripts for a kind of data, of analysis, or a theme...). You can organize projects as you want. Sometimes a project is a time delimited analysis (for an article) or a project will regroup all scripts run by the web server to producces the results of a platform, ...

To load the configuration and make its functions available, you just have to load the file in the workspace directory (upper level of the directory of a project).

So the first line of a script in a project should be to include it. (see next section running a script of a project)

By convention, all runable scripts must live in a project directory. If a file contains only function or variable definition it should be in a lib/ sub-directory or into a plateform specific directory.

```r
 source('../system.R')  # Caution, use a relative path, never an absolute path

Using "conf.R" file

The recommended convention is to create a little file named "conf.r" in each project directory.

This conf.R file will contain all the project-specific configuration. The first line (and sometimes the only one) of this file should be 'source("../system.R")'

Then, all the scripts in this project will simple load this little file in first place. The first line of all scripts in this project will become:

 source('conf.R')

Writing scripts of a project

All scripts should be run using their own directory as working directory, all paths must be relative to the project directory !

In other words, a project should only know (or directly access) one file outside its own directory : the "system.R" file, all others files should not be reached directly : never write something like .

 # BAD
source("../share/lib/survey.R")
 # Better
 share.lib("survey") # share.lib() function provided by the package is dedicated to load library in share/lib/ wherever it is located.

If a project needs a file outside its directory, it should be a shared library, so use share.lib() or share.data.path function or get the path in a variable defined in "location.R" (or a function that return the path).

Think that a project should not "know" that another project exists in the same workspace. If two projects need the same file, this is a good evidence that this file should be placed in "share/" (as a framework's global library or as a platform specific one)

 # BAD
read.csv("/Users/User/Documents/very-important-data.csv") # It will never work on someone else's computer
 # GOOD
read.csv(paste0(MY_FILE_LOCATION, 'very-important-data.csv')) 
# MY_FILE_LOCATION is defined in the location.R at the workspace root dir.

# GOOD also
read.csv( path_to_my_data('very-important-data.csv') ) 
# path_to_my_data() returns the path of a file in a location, it's defined in location.R

Constants and path function are just convention (names are defined by the developpers). Someone who wants to run the scripts just has to set where are the files on his computer.

Running a script of a project

To run a script, the working directory of R must be the directory of the project. All paths should be relative to this directory (never use absolute path).

    R --vanilla < myscript.r

If you want to handle command line parameters, you should use Rscript to run the script

 Rscript myscript.r param=value

Using Rscript to run the script will allow to catch the parameters passed on the command line (using parseArgs() from swMisc package for example).

Using RStudio

If you use RStudio, you have to create a RStudio project for each project in your workspace. By doing this, RStudio will always use the project directory as the working directory of R, and you will be able to use the "Run" or "Source" buttons directly.

cturbelin/ifnBase documentation built on Nov. 5, 2023, 12:54 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com