
An R package for the initialization and organization of a scientific project following reproducible research and FAIR principles.
SCIproj is an R package that lets users initialize a project with its
function create_proj() and then manage the scientific project as an
R package or research compendium. This approach combines structure
(where files are located) with workflow (how analyses are reproduced
or replicated).
The package is built on modern reproducibility standards and guidelines such as:
The package has some default settings to ensure reproducibility. These include:
```
your-project/
├── DESCRIPTION # Project metadata, dependencies, and author info (with ORCID).
├── README.Rmd # Top-level project description.
├── your-project.Rproj # RStudio project file.
├── CITATION.cff # Machine-readable citation metadata for FAIR compliance.
├── CONTRIBUTING.md # Contribution guidelines.
├── LICENSE.md # Full license text (optional, requires add_license).
├── NAMESPACE # Auto-generated by roxygen2 (do not edit by hand).
│
├── data-raw/ # Raw data files and pre-processing scripts.
│ ├── clean_data.R # Script template for data cleaning.
│ ├── DATA_SOURCES.md # Data provenance: source, license, DOI, download date.
│ └── ...
│
├── data/ # Cleaned datasets stored as .rda files.
│
├── R/ # Custom R functions and dataset documentation.
│ ├── function_ex.R # Template for custom functions.
│ ├── data.R # Template for dataset documentation.
│ └── ...
│
├── analyses/ # R scripts or R Markdown/Quarto documents for analyses.
│ ├── figures/ # Generated plots.
│ └── ...
│
├── docs/ # Publication-ready documents (article, report, presentation).
├── trash/ # Temporary files that can be safely deleted.
│
├── _targets.R # Pipeline definition for reproducible workflow (default).
├── renv/ # renv library and settings (default).
├── renv.lock # Lockfile for reproducible package versions (default).
└── Dockerfile # Container definition for full reproducibility (optional).
```
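To confirm that an existing project still follows this layout (for example after files have been moved around), a small base-R check can help. The helper check_project_layout() below is hypothetical and not part of SCIproj, and its default list of expected paths is an assumption taken from the tree above; adapt it to the options you passed to create_proj().

```r
# Hypothetical helper (not part of SCIproj): report which parts of the
# project layout exist in a given directory.
check_project_layout <- function(path = ".",
                                 expected = c("DESCRIPTION", "README.Rmd",
                                              "CITATION.cff", "R",
                                              "data-raw", "data",
                                              "analyses", "docs")) {
  # file.exists() returns TRUE for both files and directories
  data.frame(
    component = expected,
    exists = file.exists(file.path(path, expected)),
    stringsAsFactors = FALSE
  )
}

# Example: a partial skeleton in a temporary directory
proj <- file.path(tempdir(), "your-project")
dir.create(file.path(proj, "R"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(proj, "data-raw"), showWarnings = FALSE)
invisible(file.create(file.path(proj, "DESCRIPTION")))

status <- check_project_layout(proj)
# Components that are still missing
print(status$component[!status$exists])
```

Since the function only reads the file system, it is safe to run on a real project root.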
- The targets pipeline tracks dependencies automatically, so only the steps affected by a change are re-run.
- renv ensures that the exact same package versions are used everywhere.
- CITATION.cff makes the project machine-readable and citable; DATA_SOURCES.md documents data provenance.
- devtools::load_all() instantly makes all clean datasets and custom functions available.

Install the development version from GitHub:
```r
# Using remotes
# install.packages("remotes")
remotes::install_github("saskiaotto/SCIproj")

# Or, preferably, using the pak package
# install.packages("pak")
pak::pkg_install("saskiaotto/SCIproj")
```

```r
library("SCIproj")
create_proj("my_research_project")
```
This creates a project with renv, targets, CITATION.cff, and
DATA_SOURCES.md by default.
Customize with parameters:

```r
# Full-featured project with GitHub, CI, and ORCID
create_proj("my_research_project",
  add_license = "MIT",
  license_holder = "Jane Doe",
  orcid = "0000-0001-2345-6789",
  create_github_repo = TRUE,
  ci = "gh-actions"
)

# Minimal project without workflow tools
create_proj("my_research_project",
  use_renv = FALSE,
  use_targets = FALSE
)
```
| Parameter | Default | Description |
|:---|:---|:---|
| data_raw | TRUE | Add data-raw/ folder with templates |
| makefile | FALSE | Add makefile.R template |
| testthat | FALSE | Add testthat infrastructure |
| use_pipe | FALSE | Add magrittr pipe (native \|> recommended) |
| add_license | NULL | License type: "MIT", "GPL", "Apache", etc. |
| license_holder | "Your name" | License holder / project author |
| orcid | NULL | ORCID iD for CITATION.cff |
| use_git | TRUE | Initialize local git repo |
| create_github_repo | FALSE | Create GitHub repo (needs GITHUB_PAT) |
| ci | "none" | CI type: "none" or "gh-actions" |
| use_renv | TRUE | Initialize renv for dependency management |
| use_targets | TRUE | Add _targets.R pipeline template |
| use_docker | FALSE | Add Dockerfile template |
| open_proj | FALSE | Open new project in RStudio |
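Because the orcid value ends up in CITATION.cff, a typo propagates straight into the project's citation metadata. An ORCID iD has four hyphen-separated groups of four characters, and the last character is an ISO 7064 MOD 11-2 check digit (0 to 9, or X for the value 10). The sketch below checks both before the value is passed to create_proj(); is_valid_orcid() is a hypothetical helper, not part of SCIproj, and it assumes a single string as input.

```r
# Hypothetical helper (not part of SCIproj): check the format and the
# ISO 7064 MOD 11-2 check digit of a single ORCID iD string.
is_valid_orcid <- function(id) {
  # Four groups of four characters; only the very last may be "X"
  if (!grepl("^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]$", id)) {
    return(FALSE)
  }
  chars <- strsplit(gsub("-", "", id, fixed = TRUE), "")[[1]]
  total <- 0
  # Fold the first 15 digits into the running MOD 11-2 total
  for (d in as.integer(chars[1:15])) {
    total <- (total + d) * 2
  }
  check <- (12 - total %% 11) %% 11
  expected <- if (check == 10) "X" else as.character(check)
  chars[16] == expected
}

is_valid_orcid("0000-0002-1825-0097")  # TRUE
is_valid_orcid("0000-0002-1825-0098")  # FALSE: wrong check digit
is_valid_orcid("0000-0002-1825-009")   # FALSE: too short
```

A format-only regular expression would accept iDs with a wrong final digit; the checksum catches most single-digit typos and transpositions.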
1. Create the project with create_proj().
2. Edit DESCRIPTION with project metadata: title, summary,
   contributors (with ORCID), license, and dependencies.
3. Edit README.Rmd with project details: objectives, timeline, and
   workflow.
4. Document data provenance in data-raw/DATA_SOURCES.md: source,
   license, download date, and DOI for each dataset.
5. Place the original (raw) data in data-raw/. Use clean_data.R (or
   additional scripts) for pre-processing, and store the clean
   datasets with usethis::use_data().
6. Document the clean datasets with roxygen in R/ (see the template
   data.R). For details, see Documenting data.
7. Place custom functions in R/ with roxygen documentation. See the
   documentation chapter in the R Packages book.
8. Write tests for your functions in tests/ (set testthat = TRUE in
   create_proj()). See Testing basics.
9. Place analysis scripts and notebooks in analyses/, and save plots
   in analyses/figures/.
10. Place final manuscripts, reports, and presentations in docs/. Use
    R Markdown, Quarto, or templates from rticles, thesisdown, or
    Quarto journal extensions.
11. Keep dependencies in sync: usethis::use_package() for DESCRIPTION
    and renv::snapshot() for the lockfile.
12. Update CITATION.cff when you archive or publish your project.
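The data steps above (raw files in data-raw/, a cleaning script, and a saved clean dataset) can be sketched as a data-raw/clean_data.R script. This is a minimal, self-contained illustration, not the SCIproj template: the file survey.csv, its columns, the toy values, and the cleaning rules are all hypothetical, and a temporary directory stands in for the project root so the script runs anywhere. In a real project the final save() call would be replaced by usethis::use_data(survey_clean, overwrite = TRUE), which writes data/survey_clean.rda.

```r
# data-raw/clean_data.R -- minimal sketch with hypothetical file and column names.
# A temporary directory stands in for the project root.
root <- file.path(tempdir(), "clean-data-demo")
dir.create(file.path(root, "data-raw"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "data"), showWarnings = FALSE)

# Simulate an untouched raw file (normally the original download)
raw_path <- file.path(root, "data-raw", "survey.csv")
writeLines(c(
  "Station ID,Catch (kg),Date",
  "A1,12.5,2023-05-01",
  "A2,,2023-05-01",
  "B1,7.25,2023-05-02"
), raw_path)

# 1. Read the raw data; raw files are never edited by hand
raw <- read.csv(raw_path, check.names = FALSE, stringsAsFactors = FALSE)

# 2. Clean: consistent names, proper types, drop incomplete records
survey_clean <- data.frame(
  station = raw[["Station ID"]],
  catch_kg = as.numeric(raw[["Catch (kg)"]]),
  date = as.Date(raw[["Date"]]),
  stringsAsFactors = FALSE
)
survey_clean <- survey_clean[complete.cases(survey_clean), ]
rownames(survey_clean) <- NULL

# 3. Store the clean dataset as .rda in data/
#    (in a project: usethis::use_data(survey_clean, overwrite = TRUE))
save(survey_clean,
     file = file.path(root, "data", "survey_clean.rda"),
     compress = "bzip2")
```

The matching documentation would then go in R/data.R as a roxygen block placed above the string "survey_clean", so that devtools::load_all() exposes a documented dataset.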
- devtools::load_all() or Ctrl/Cmd + Shift + L in RStudio: load all
  clean datasets and custom functions.
- devtools::document() or Ctrl/Cmd + Shift + D: update the roxygen
  documentation and NAMESPACE.
- devtools::test() or Ctrl/Cmd + Shift + T: run the tests.
- targets::tar_make() to execute all targets;
  targets::tar_visnetwork() to visualize dependencies.
- renv::snapshot() after installing or updating packages.

For a detailed introduction to targets, see the targets user manual.
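A _targets.R file is an ordinary R script that loads targets, sets options, and ends with a list of tar_target() calls. The sketch below is a minimal example, not the template that create_proj() writes: the path data-raw/survey.csv, the catch_kg and station columns, and the helper summarise_by_station() are hypothetical. Declaring the input with format = "file" lets targets notice when the raw file itself changes.

```r
# _targets.R -- minimal sketch; paths, columns, and helper names are hypothetical.
library(targets)

# Packages loaded before each target runs (stats is only an example)
tar_option_set(packages = "stats")

# Custom functions usually live in R/ and are loaded with tar_source();
# the helper is defined inline here so the sketch is self-contained.
summarise_by_station <- function(df) {
  aggregate(catch_kg ~ station, data = df, FUN = mean)
}

list(
  # Track the raw file so that edits to it invalidate downstream targets
  tar_target(raw_file, "data-raw/survey.csv", format = "file"),
  tar_target(survey, read.csv(raw_file)),
  tar_target(station_means, summarise_by_station(survey))
)
```

Running targets::tar_make() from the project root builds the three targets in dependency order; after editing only summarise_by_station(), a second run should rebuild station_means while skipping the unchanged upstream targets.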
For maximum reproducibility, consider also using
Docker (use_docker = TRUE). See the Rocker
Project for R-specific Docker images.
When your project is finalized:

- Update CITATION.cff with the DOI of the archived project.
- Create codemeta.json with codemetar::write_codemeta() for richer metadata.