Repana is an opinionated framework, meaning that the project's structure
must be predefined to determine where different types of files are
stored. The structure of repana is governed by the config.yml file,
and the repana::make_structure() function aids in constructing the
directory layout. If no config.yml is present, make_structure()
generates one.
The default structure is established using the make_structure()
function, which creates a config.yml file with predefined items for the
Repana package.
    default:
      dirs:
        data: _data
        functions: _functions
        handmade: handmade
        database: database
        reports: reports
        logs: logs
      clean_before_new_analysis:
        - database
        - reports
        - logs
      defaultdb:
        package: duckdb
        dbconnect: duckdb
        read_only: FALSE
      template: _template.txt
The dirs section defines the directories that the structure should
maintain. Each entry consists of a nickname for the directory and its
corresponding physical location. The get_dirs() function takes a
nickname and returns the physical location, for use within programs.
For example, using the default definition, get_dirs("data") returns "_data". This abstraction allows program logic to remain separate from the actual physical directory names, enabling different users to use the same programs without modification, even if the physical locations differ.
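The nickname idea can be illustrated with a minimal base-R sketch. It does not call repana and is not its implementation: the `dirs` list and the `lookup_dir()` helper are hypothetical stand-ins for the dirs section and for get_dirs(), and `survey.csv` is a made-up file name.

```r
# Hypothetical stand-in for the dirs section of config.yml:
# each nickname maps to a physical location.
dirs <- list(data = "_data", functions = "_functions", reports = "reports")

# Hypothetical helper that mimics the idea behind get_dirs();
# it is an illustration, not repana's code.
lookup_dir <- function(nickname, mapping = dirs) {
  if (!nickname %in% names(mapping)) {
    stop("Unknown directory nickname: ", nickname)
  }
  mapping[[nickname]]
}

# Program logic refers only to the nickname, never to the physical name.
input_file <- file.path(lookup_dir("data"), "survey.csv")
print(input_file)
```

If a collaborator maps data to another location, only the mapping changes; the line that builds `input_file` stays the same. In a real project the equivalent call would be get_dirs("data").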
By default, six directories are defined, each serving a specific purpose:
| Entry     | Purpose                                                        |
|-----------|----------------------------------------------------------------|
| data      | Input data to the project                                      |
| functions | Functions used in the project                                  |
| handmade  | Files created not using programs in the project                |
| database  | Database and other secondary files created by the project      |
| reports   | Reports, graphs, files and other output created by the project |
| logs      | Log of executed files                                          |
: Directories defined in config.yml
Note: The handmade directory is crucial for maintaining the spirit of reproducible analysis. While all project output should ideally stem from program actions on inputs, the handmade directory serves as a space for files modified by hand or kept for reference.
As mentioned earlier, the essence of reproducible analysis involves
being able to reproduce project outputs with the same inputs. To ensure
outputs are produced by a new analysis, it is recommended to delete
existing outputs before recreating them. The clean_before_new_analysis
section specifies the directories deleted before a new analysis. The
make_structure() function updates the .gitignore file to exclude these
directories from git version control.
WARNING: The clean_structure() function will delete all directories
listed under the clean_before_new_analysis entry.
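To make the effect of the warning concrete, the following base-R sketch deletes the three default directories inside a temporary folder while leaving an input directory untouched. It is only an illustration of the behaviour described above, not repana's implementation: `clean_dirs()`, the temporary project root, and the file names are all hypothetical.

```r
# Work inside a temporary directory so that nothing real is deleted.
project_root <- file.path(tempdir(), "repana_clean_sketch")
dir.create(project_root, showWarnings = FALSE)

# Directories listed under clean_before_new_analysis in the default config.yml.
to_clean <- c("database", "reports", "logs")

# Simulate leftovers from a previous run, plus an input directory to keep.
for (d in c(to_clean, "_data")) {
  dir.create(file.path(project_root, d), showWarnings = FALSE)
}
writeLines("old output", file.path(project_root, "reports", "stale.txt"))
writeLines("x,y", file.path(project_root, "_data", "input.csv"))

# Hypothetical helper: delete each listed directory and everything in it.
clean_dirs <- function(root, dirs) {
  for (d in dirs) {
    unlink(file.path(root, d), recursive = TRUE)
  }
  invisible(dirs)
}

clean_dirs(project_root, to_clean)

# Output directories are gone; the input data is still there.
print(dir.exists(file.path(project_root, to_clean)))
print(file.exists(file.path(project_root, "_data", "input.csv")))
```

Anything that cannot be regenerated by the project's programs should therefore live outside the listed directories, for example under handmade.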
The defaultdb section defines the arguments needed to create a
connection with a database using the DBI system. Multiple connections
can be defined under new entries. The get_con() function establishes a
connection based on the information in the config.yml file. Refer to
the Database configuration vignette for detailed instructions on
setting up and using database connections.
If using the RStudio IDE, the package installs an addin named "Repana insert template", which inserts a default template for program documentation. This default template can be modified, and if a different file is used, the template section tells the system where to find it. See Modifying the template for how to use and modify the template.
A workflow using GitHub and repana in RStudio would be:

1. Create the project in GitHub.
2. Update the README.md file.
3. Copy the URL link of the project.
4. In RStudio, create a new project from "Version Control", select Git, and fill in the URL link of the project and the location.
5. Once the project is created, run the repana::make_structure() function.
6. Your new project is ready.
7. Share the config.yml file with your collaborators so they can adapt it to local conditions. The config.yml is included in .gitignore and not uploaded to GitHub, so that each collaborator can have their own definition.
8. Update the project and create new programs (e.g. 01_xxx, 02_xxx, etc.).
9. Run the project programs using repana::master().
WARNING: By default, the _data directory is not included in the
.gitignore file. Consider including it if the _data directory contains
sensitive information that should not be uploaded to GitHub. This
directory could be shared between collaborators using a different
method.
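One way to add the entry by hand is sketched below in base R. This is not a repana function: `add_to_gitignore()` is a hypothetical helper, and the demonstration writes to a temporary file whose starting contents are made up, since the exact lines that make_structure() writes are not shown here.

```r
# Hypothetical helper: append an entry to a .gitignore file
# unless that exact line is already present.
add_to_gitignore <- function(entry, path = ".gitignore") {
  lines <- if (file.exists(path)) readLines(path, warn = FALSE) else character(0)
  if (!entry %in% lines) {
    writeLines(c(lines, entry), path)
  }
  invisible(path)
}

# Demonstrate on a temporary file rather than on a real project.
demo_ignore <- tempfile("gitignore_")
writeLines(c("database", "reports", "logs"), demo_ignore)

add_to_gitignore("_data", demo_ignore)
add_to_gitignore("_data", demo_ignore)  # the second call changes nothing

print(readLines(demo_ignore))
```

Note that ignoring a directory only prevents future additions; files that were already committed remain in the repository history.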
For more information, see the Repana Documentation.