```r
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
A research compendium is a self-contained collection of data, code, and documentation that accompanies a research project. By structuring a project as an R package, you gain:
- Standardized project metadata and dependency declarations (`DESCRIPTION`)
- Integrated documentation tooling (roxygen2, vignettes)
- A testing framework (testthat)

SCIproj automates the creation of such a compendium, adding opinionated defaults for reproducible workflows (targets), dependency snapshots (renv), and FAIR-compliant metadata (`CITATION.cff`).
Install SCIproj from GitHub:
```r
# install.packages("remotes")
remotes::install_github("saskiaotto/SCIproj")
```
Create a new project with a single call:
```r
library(SCIproj)
create_proj("~/projects/my_analysis")
```
This creates a fully scaffolded research compendium with renv and targets enabled by default.
```r
create_proj("~/projects/baltic_cod",
  add_license    = "MIT",
  license_holder = "Jane Doe",
  orcid          = "0000-0001-2345-6789",
  use_docker     = TRUE,
  use_git        = TRUE
)
```
Directory names with underscores or hyphens are fine; the R package name in `DESCRIPTION` is automatically sanitized (e.g., `baltic_cod` becomes `baltic.cod`).
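Valid R package names may contain only letters, digits, and periods, which is why the substitution is needed. A minimal sketch of such a sanitization step (a hypothetical illustration, not SCIproj's actual implementation) could look like:

```r
# Hypothetical sketch: derive a valid R package name from a directory path
# by replacing underscores and hyphens with periods. The real rule used by
# create_proj() may handle more edge cases.
sanitize_pkg_name <- function(dir_name) {
  gsub("[_-]", ".", basename(dir_name))
}

sanitize_pkg_name("~/projects/baltic_cod")
#> [1] "baltic.cod"
```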
After creation, the project directory looks like this:
```
your-project/
├── DESCRIPTION         # Project metadata, dependencies, and author info (with ORCID).
├── README.Rmd          # Top-level project description.
├── your-project.Rproj  # RStudio project file.
├── CITATION.cff        # Machine-readable citation metadata for FAIR compliance.
├── CONTRIBUTING.md     # Contribution guidelines.
├── LICENSE.md          # Full license text (here: MIT).
├── NAMESPACE           # Auto-generated by roxygen2 (do not edit by hand).
│
├── data-raw/           # Raw data files and pre-processing scripts.
│   ├── clean_data.R    # Script template for data cleaning.
│   ├── DATA_SOURCES.md # Data provenance: source, license, DOI, download date.
│   └── ...
│
├── data/               # Cleaned datasets stored as .rda files.
│
├── R/                  # Custom R functions and dataset documentation.
│   ├── function_ex.R   # Template for custom functions.
│   ├── data.R          # Template for dataset documentation.
│   └── ...
│
├── analyses/           # R scripts or R Markdown/Quarto documents for analyses.
│   ├── figures/        # Generated plots.
│   └── ...
│
├── docs/               # Publication-ready documents (article, report, presentation).
├── trash/              # Temporary files that can be safely deleted.
│
├── _targets.R          # Pipeline definition for reproducible workflow.
├── renv/               # renv library and settings.
├── renv.lock           # Lockfile for reproducible package versions.
└── Dockerfile          # Container definition for full reproducibility.
```
| Directory / File | Purpose |
|---------------------|------------------------------------------------------|
| R/ | Reusable R functions (documented with roxygen2) |
| data/ | Cleaned, analysis-ready datasets (.rda format) |
| data-raw/ | Raw data files and the script that cleans them |
| analyses/ | Analysis scripts, R Markdown reports, figures |
| docs/ | Manuscripts, presentations, supplementary material |
| trash/ | Temporary files not under version control |
| _targets.R | Pipeline definition for targets |
| CITATION.cff | Machine-readable citation metadata |
| CONTRIBUTING.md | Guidelines for collaborators |
SCIproj encourages FAIR (Findable, Accessible, Interoperable, Reusable) research practices through several built-in features:
A Citation File Format file is created automatically. It includes the project title, author name, version, release date, and optionally a license and ORCID iD. Services like GitHub and Zenodo can parse this file to generate proper citations.
```r
create_proj("my_project",
  license_holder = "Jane Doe",
  orcid          = "0000-0001-2345-6789",
  add_license    = "MIT"
)
```
When data_raw = TRUE (the default), a DATA_SOURCES.md template is placed in data-raw/. Use it to document the provenance of every dataset: source, URL, DOI, license, download date, and file names.
Pass your ORCID iD via the orcid parameter to embed it in CITATION.cff, making your authorship unambiguously machine-readable.
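For illustration, a generated `CITATION.cff` might look roughly like the following sketch (field values are placeholders, and the exact fields SCIproj writes may differ; the structure follows the Citation File Format 1.2.0 schema):

```yaml
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "my_project"
version: 0.0.1
date-released: "2024-01-15"
license: MIT
authors:
  - family-names: "Doe"
    given-names: "Jane"
    orcid: "https://orcid.org/0000-0001-2345-6789"
```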
By default (use_targets = TRUE), SCIproj adds a _targets.R pipeline template. The targets package provides:
- Dependency tracking: only targets whose inputs changed are rebuilt.
- Caching: results are stored in the `_targets/` data store.
- Visualization: `tar_visnetwork()` shows the pipeline as a graph.

A typical workflow:
```r
# 1. Define targets in _targets.R
# 2. Inspect the pipeline
targets::tar_manifest()
targets::tar_visnetwork()
# 3. Run the pipeline
targets::tar_make()
# 4. Read a result
targets::tar_read(my_result)
```
Edit _targets.R to define your data-loading, analysis, and reporting steps. Each step is a target that depends on upstream targets and R functions in R/.
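As an illustration, a minimal `_targets.R` along these lines might look like this (the file and function names are hypothetical placeholders, not the template's actual contents):

```r
library(targets)

# Source the functions defined in R/ so targets can track them.
tar_source()

# Hypothetical pipeline: clean_survey(), fit_model(), and plot_results()
# are placeholders for your own functions in R/.
list(
  tar_target(raw_file, "data-raw/survey.csv", format = "file"),
  tar_target(clean, clean_survey(raw_file)),
  tar_target(model, fit_model(clean)),
  tar_target(fig, plot_results(model))
)
```

Because `raw_file` is declared with `format = "file"`, editing the CSV invalidates every downstream target on the next `tar_make()`.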
By default (use_renv = TRUE), SCIproj initializes renv with the "explicit" snapshot type.
This means renv discovers dependencies from DESCRIPTION rather than scanning all R files, which is the recommended approach for package-based compendia.
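With an explicit snapshot, a new dependency must therefore be declared in `DESCRIPTION` before it can land in the lockfile. One way to do that (a sketch assuming the usual usethis workflow; `dplyr` is just an example package) is:

```r
# Declare the dependency in DESCRIPTION (added under Imports) ...
usethis::use_package("dplyr")

# ... then record the installed version in renv.lock.
renv::snapshot()
```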
Key commands:
```r
renv::status()   # check if lockfile is in sync
renv::snapshot() # update the lockfile after adding packages
renv::restore()  # reinstall packages from the lockfile
```
The renv.lock file should be committed to version control so collaborators can reproduce your exact package versions.
Set use_docker = TRUE to add a Dockerfile and .dockerignore. The Dockerfile provides a template for building a container that reproduces your computational environment, independent of the host system.
Set create_github_repo = TRUE to create a GitHub repository (requires a configured GITHUB_PAT). Add ci = "gh-actions" to include a GitHub Actions workflow for automated R CMD check on push.
```r
create_proj("my_project",
  use_git            = TRUE,
  create_github_repo = TRUE,
  ci                 = "gh-actions"
)
```
Choose from "MIT", "GPL", "AGPL", "LGPL", "Apache", "CCBY", or "CC0" via the add_license parameter. The selected license is applied to DESCRIPTION and recorded in CITATION.cff.
Set testthat = TRUE to add testing infrastructure (tests/testthat.R and tests/testthat/). Writing tests for your analysis functions helps catch regressions early.
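For example, a test for a hypothetical cleaning function `clean_survey()` (a placeholder name, not part of the template) could live in `tests/testthat/test-clean_survey.R`:

```r
# tests/testthat/test-clean_survey.R
# Hypothetical test: clean_survey() and its expected behavior are placeholders.
test_that("clean_survey() drops rows with missing values", {
  raw <- data.frame(x = c(1, NA, 3), y = c("a", "b", "c"))
  cleaned <- clean_survey(raw)
  expect_equal(nrow(cleaned), 2)
  expect_false(anyNA(cleaned))
})
```

Run all tests with `devtools::test()` or as part of `R CMD check`.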
Set makefile = TRUE to add a makefile.R script as an alternative to targets for orchestrating your workflow.
```r
SCIproj::create_proj("~/projects/my_study", add_license = "MIT",
                     license_holder = "Your Name")
```

1. Open the `.Rproj` file in RStudio.
2. Add raw data to `data-raw/` and document it in `DATA_SOURCES.md`.
3. Write cleaning code in `data-raw/clean_data.R`; save cleaned data to `data/` with `usethis::use_data()`.
4. Write functions in `R/` and document them with roxygen2.
5. Define targets in `_targets.R` to connect data, functions, and reports.
6. Run `targets::tar_make()` to execute the pipeline.
7. Write reports in `analyses/` using R Markdown or Quarto, reading results with `targets::tar_read()`.
8. Run `renv::snapshot()` before sharing.
9. Enable CI (e.g., GitHub Actions) to run `R CMD check` automatically.