project-package: Organise a R data analysis project

Description Details Author(s)

Description

When working with R on a data analysis project, data files, scripts and output files multiply very quickly.

The purpose of this package is to help the user to keep things organised and improve readibility, maintainability an reusability of its R projects. Moreover it improves the productivity by saving a few seconds very often: it reduces the time spent searching, loading and saving data and scripts.

This package is designed to be used with the Rstudio projects system.

Details

Package: project
Type: Package
Version: 0.5
Date: 2016-05-19
License: GPL (>= 2)

A data analysis project uses one or several data sources and a bunch of scripts to answer questions about some subject and to generate material that will help the diffusion of the results of the project. Ths quality of the answers is not the only critorion of quality of a project: it should also be easy to read and the results should be easy to reproduce.

The purpose of this package is to improve these two aspects by encouraging the user to adopt some simple conventions: every project has the same structure, the scripts always do the same type of operations and are heavily commented, the data is always in the same folder and is also documented, etc.

Project setup

To begin with this package, one has to first create an empty project in Rstudio. Then one can use function prInit to setup the project and create a file tree. More precisely prInit generates three directories: data for storing data files, scripts for storing R scripts and output for storing output files.

Additionally, three scripts are created and opened in the script editor: data.R for the import and processing of data, main for the analysis of the data and start for all instructions that have to be exetuted everytime the project is opened (load libraries, define constants and functions, etc.).

When the user opens the project in Rstudio, the script start is automatically executed. By convention, all scripts starting with the prefix "tools" are also executed. It can be useful to automatically load important specific functions, constants or data.

Script management

The user can open and/or create scripts with the function prScript. He just has to give to the function the name of the script (without the extention ".R") and eventually a subdirectory. The fucntion searches in the folder "scripts" the corresponding file. If it does not exists, it is created. In all cases it is opened in the script editor.

The scripts created by prScript contains some comments that encourages the user to explain what the script does and eventually what were the results.

although it is not mandatory, the user should use some conventions in the naming of its scripts: a script that process data should start with the prefix "data" while a script that analyses data should start with prefix "analysis". Other scripts generally contain helper functions and their name should start with "tools". They will be sourced when the project is opened.

When possible, data operations and analysis operations should be put in different scripts. If the data operations are long, then the data script should save the final data objects and the analysis script should load the saved data.

Scripts can be sourced with function prSource. It is a wrapper to the function source but with the same interface as prScript.

Data management

Every R object can be saved with prSave. The function just needs the name of the object (quoted) and it will save it in the right place so one has not to remember the location where objects are saved. prSave can also attach to an object a description that will be displayed when the object is loaded.

To load data, simply use prLoad.

Author(s)

François Guillem

Maintainer: François Guillem <guillem.francois@gmail.com>


cuche27/project documentation built on May 14, 2019, 12:50 p.m.