R/wflow_start.R

#' Start a new workflowr project
#'
#' \code{wflow_start} creates a directory with the essential files for
#' a workflowr project. The default behavior is to add these files to
#' a new directory, but it is also possible to populate an existing
#' directory. By default, the working directory is changed to the
#' workflowr project directory.
#'
#' This is the recommended function to set up the file infrastructure for
#' a workflowr project. If you are using RStudio, you can also create
#' a new workflowr project as an "RStudio Project Template". Go to
#' "File" -> "New Project..." then select "workflowr project" from the
#' list of project types. In the future, you can return to your
#' project by choosing menu option "Open Project..." and selecting the
#' \code{.Rproj} file located at the root of the workflowr project
#' directory. In RStudio, opening this file will change the working
#' directory to the appropriate location, set the file navigator to
#' the workflowr project directory, and configure the Git pane.
#'
#' \code{wflow_start} populates the chosen directory with the
#' following files:
#'
#' \preformatted{|--- .gitignore
#' |--- .Rprofile
#' |--- _workflowr.yml
#' |--- analysis/
#' |   |--- about.Rmd
#' |   |--- index.Rmd
#' |   |--- license.Rmd
#' |   |--- _site.yml
#' |--- code/
#' |   |--- README.md
#' |--- data/
#' |   |--- README.md
#' |--- docs/
#' |--- <directory>.Rproj
#' |--- output/
#' |   |--- README.md
#' |--- README.md
#' }
#'
#' The two \bold{required} subdirectories are \code{analysis/} and
#' \code{docs/}. These directories should never be removed from the
#' workflowr project.
#'
#' \code{analysis/} contains all the source R Markdown files that
#' implement the analyses for your project. It contains a special R
#' Markdown file, \code{index.Rmd}, that typically does not include R
#' code, and will be used to generate \code{index.html}, the
#' homepage for the project website.  Additionally, this directory
#' contains the important configuration file \code{_site.yml}. The
#' website theme, navigation bar, and other properties can be
#' controlled through this file (for more details see the
#' documentation on
#' \href{https://bookdown.org/yihui/rmarkdown/rmarkdown-site.html}{R
#' Markdown websites}). Do not delete \code{index.Rmd} or
#' \code{_site.yml}.
#'
#' \code{docs/} will contain all the webpages generated from the R
#' Markdown files in \code{analysis/}. Any figures generated by
#' rendering the R Markdown files are also stored here. Each figure is
#' saved according to the following convention:
#' \code{docs/figure/<Rmd-filename>/<chunk-name>-#.png}, where
#' \code{#} corresponds to which of the plots the chunk generated (one
#' chunk can produce several plots).
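#'
#' As an illustration, suppose a hypothetical file
#' \code{analysis/first-analysis.Rmd} has a chunk named \code{scatter}
#' that draws two plots. Under the convention above, the figures would
#' be expected at:
#'
#' \preformatted{docs/figure/first-analysis.Rmd/scatter-1.png
#' docs/figure/first-analysis.Rmd/scatter-2.png
#' }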
#'
#' \code{_workflowr.yml} is an additional configuration file used only
#' by workflowr. It is used to apply the workflowr reproducibility
#' checks consistently across all R Markdown files. The most important
#' setting is \code{knit_root_dir} which determines the directory
#' where the scripts in \code{analysis/} are executed. The default is
#' to run code from the project root (\emph{i.e.,} \code{"."}). To
#' execute the code from \code{analysis/}, for example, change the
#' setting to \code{knit_root_dir: "analysis"}. See
#' \code{\link{wflow_html}} for more details.
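#'
#' A minimal sketch of the relevant line in \code{_workflowr.yml} after
#' making that change (any other settings in the file are left as
#' generated):
#'
#' \preformatted{knit_root_dir: "analysis"
#' }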
#'
#' Another required file is the RStudio project file (ending in
#' \code{.Rproj}). \emph{Do not delete this file even if you do not
#' use RStudio; among other essential tasks, it is used to determine
#' the project root directory.}
#'
#' The \bold{optional} directories are \code{data/}, \code{code/}, and
#' \code{output/}. These directories are suggestions for organizing
#' your workflowr project and can be removed if you do not find them
#' relevant to your project.
#'
#' \code{data/} should be used to store "raw" (unprocessed) data
#' files.
#'
#' \code{code/} should be used to store additional code that might not
#' be appropriate to include in R Markdown files (e.g., code to
#' preprocess the data, long-running scripts, or functions that are
#' used in multiple R Markdown files).
#'
#' \code{output/} should be used to store processed data files and
#' other outputs generated from the code and analyses. For example,
#' scripts in \code{code/} that pre-process raw data files from
#' \code{data/} should save the processed data files in
#' \code{output/}.
#'
#' All these subdirectories except for \code{docs/} include a README
#' file summarizing the contents of the subdirectory, and can be
#' modified as desired, for example, to document the files stored in
#' each directory.
#'
#' \code{.Rprofile} is an optional file in the root directory of the
#' workflowr project containing R code that is executed whenever the
#' \code{.Rproj} file is loaded in RStudio, or whenever R is started
#' up inside the project root directory. This file includes the line
#' of code \code{library("workflowr")} to ensure that the workflowr
#' package is loaded.
#'
#' Finally, \code{.gitignore} is an optional file that indicates to
#' Git which files should be ignored---that is, files that are never
#' committed to the repository. Some suggested files to ignore such as
#' \code{.Rhistory} and \code{.Rdata} are listed here.
#'
#' @note Do not delete the file \code{.Rproj} even if you do not use
#'   RStudio; workflowr will not work correctly unless this file is
#'   there.
#'
#' @param directory character. The directory where the workflowr
#'   project files will be added, e.g., "~/my-wflow-project". When
#'   \code{existing = FALSE}, the directory will be created.
#'
#' @param name character (default: \code{NULL}). The name of the
#'   project, e.g. "My Workflowr Project". When \code{name = NULL}, the
#'   project name is automatically determined based on
#'   \code{directory}. For example, if \code{directory =
#'   "~/projects/my-wflow-project"}, then \code{name} is set to
#'   \code{"my-wflow-project"}. The project name is displayed on the
#'   website's navigation bar and in the \code{README.md} file.
#'
#' @param git logical (default: \code{TRUE}). Should the workflowr files be
#'   committed with Git? If \code{git = TRUE} and no existing Git repository is
#'   detected, \code{wflow_start} will initialize the repository and make an
#'   initial commit. If a Git repository already exists in the chosen directory,
#'   \code{wflow_start} will commit any newly created or modified files to the
#'   existing repository (this also requires \code{existing = TRUE}). If \code{git
#'   = FALSE}, \code{wflow_start} will not perform any Git commands.
#'
#' @param existing logical (default: \code{FALSE}). Indicate whether
#'   \code{directory} already exists. This argument is added to prevent
#'   accidental creation of files in an existing directory; setting
#'   \code{existing = FALSE} prevents files from being created if the
#'   specified directory already exists.
#'
#' @param overwrite logical (default: \code{FALSE}). Similar to
#'   \code{existing}, this argument prevents files from accidentally
#'   being overwritten when \code{overwrite = FALSE}. When
#'   \code{overwrite = TRUE}, any existing file in \code{directory} that
#'   has the same name as a workflowr file will be replaced by the
#'   workflowr file. When \code{git = TRUE}, all the standard workflowr
#'   files will be added and committed (regardless of whether they were
#'   overwritten or still contain the original content).
#'
#' @param change_wd logical (default: \code{TRUE}). Change the working
#'   directory to the \code{directory}.
#'
#' @param disable_remote logical (default: \code{FALSE}). Create a Git
#'   \href{https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks}{pre-push
#'   hook} that prevents pushing to a remote Git repository (i.e. using
#'   \code{\link{wflow_git_push}}). This is useful for extremely confidential
#'   projects that cannot be shared via an online Git hosting service (e.g.
#'   GitHub or GitLab). The hook is saved in the file
#'   \code{.git/hooks/pre-push}. If you change your mind and want to push the
#'   repository, you can delete that file. This option is only available if
#'   \code{git = TRUE}, and it is currently supported only on Linux and macOS.
#'
#' @param dry_run logical (default: \code{FALSE}). When \code{dry_run
#'   = TRUE}, the actions are previewed without executing them.
#'
#' @param user.name character (default: \code{NULL}). The user name
#'   used by Git to sign commits, e.g., "Ada Lovelace". This setting
#'   only applies to the workflowr project being created. To specify the
#'   global setting for the Git user name, use
#'   \code{\link{wflow_git_config}} instead. When \code{user.name =
#'   NULL}, no user name is recorded for the project, and the global
#'   setting will be used. This setting can be modified later
#'   by running \code{git config --local} in the Terminal.
#'
#' @param user.email character (default: \code{NULL}). The email
#'   address used by Git to sign commits, e.g.,
#'   "ada.lovelace@ox.ac.uk". This setting only applies to the workflowr
#'   project being created. To specify the global setting for the Git
#'   email address, use \code{\link{wflow_git_config}} instead. When
#'   \code{user.email = NULL}, no email address is recorded for the
#'   project, and the global setting will be used. This setting can be
#'   modified later by running \code{git config --local} in the Terminal.
#'
#' @return An object of class \code{wflow_start}, which is a list with the
#'   following elements:
#'
#'    \item{directory}{The input argument \code{directory}.}
#'
#'    \item{name}{The input argument \code{name}.}
#'
#'    \item{git}{The input argument \code{git}.}
#'
#'    \item{existing}{The input argument \code{existing}.}
#'
#'    \item{overwrite}{The input argument \code{overwrite}.}
#'
#'    \item{change_wd}{The input argument \code{change_wd}.}
#'
#'    \item{disable_remote}{The input argument \code{disable_remote}.}
#'
#'    \item{dry_run}{The input argument \code{dry_run}.}
#'
#'    \item{user.name}{The input argument \code{user.name}.}
#'
#'    \item{user.email}{The input argument \code{user.email}.}
#'
#'    \item{commit}{The object returned by
#'    \link{git2r}::\code{\link[git2r]{commit}}, or \code{NULL} if \code{git =
#'    FALSE}.}
#'
#' @seealso vignette("wflow-01-getting-started")
#'
#' @examples
#' \dontrun{
#'
#' wflow_start("path/to/new-project")
#'
#' # Provide a custom name for the project.
#' wflow_start("path/to/new-project", name = "My Project")
#'
#' # Preview what wflow_start would do
#' wflow_start("path/to/new-project", dry_run = TRUE)
#'
#' # Add workflowr files to an existing project.
#' wflow_start("path/to/current-project", existing = TRUE)
#'
#' # Add workflowr files to an existing project, but do not automatically
#' # commit them.
#' wflow_start("path/to/current-project", git = FALSE, existing = TRUE)
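#'
#' # The calls below are sketches that were not part of the original
#' # examples; the name and email address are placeholder values.
#'
#' # Record a Git identity for this project only. user.name and
#' # user.email must be supplied together.
#' wflow_start("path/to/new-project", user.name = "Ada Lovelace",
#'             user.email = "ada.lovelace@ox.ac.uk")
#'
#' # Install a pre-push hook that blocks pushing to a remote. This
#' # requires git = TRUE and is not available on Windows.
#' wflow_start("path/to/confidential-project", disable_remote = TRUE)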
#' }
#'
#' @export
wflow_start <- function(directory,
                        name = NULL,
                        git = TRUE,
                        existing = FALSE,
                        overwrite = FALSE,
                        change_wd = TRUE,
                        disable_remote = FALSE,
                        dry_run = FALSE,
                        user.name = NULL,
                        user.email = NULL) {

  # Check input arguments ------------------------------------------------------

  if (!is.character(directory) | length(directory) != 1)
    stop("directory must be a one element character vector: ", directory)
  # Accept NULL or a character vector of length one; reject everything else
  if (!(is.null(name) || (is.character(name) && length(name) == 1)))
    stop("name must be NULL or a one element character vector: ", name)
  assert_is_flag(git)
  assert_is_flag(existing)
  assert_is_flag(overwrite)
  if (overwrite && !existing) {
    stop("Cannot overwrite non-existent project. Set existing = TRUE if you wish to overwrite existing workflowr files.")
  }
  assert_is_flag(change_wd)
  assert_is_flag(disable_remote)
  assert_is_flag(dry_run)
  # Accept NULL or a character vector of length one; reject everything else
  if (!(is.null(user.name) || (is.character(user.name) && length(user.name) == 1)))
    stop("user.name must be NULL or a one element character vector: ", user.name)
  if (!(is.null(user.email) || (is.character(user.email) && length(user.email) == 1)))
    stop("user.email must be NULL or a one element character vector: ", user.email)
  if ((is.null(user.name) && !is.null(user.email)) ||
      (!is.null(user.name) && is.null(user.email)))
    stop("Must specify both user.name and user.email, or neither.")

  check_wd_exists()

  if (!existing && fs::dir_exists(directory)) {
    stop("Directory already exists. Set existing = TRUE if you wish to add workflowr files to an already existing project.")
  } else if (existing && !fs::dir_exists(directory)) {
    stop("Directory does not exist. Set existing = FALSE to create a new directory for the workflowr files.")
  }

  directory <- absolute(directory)

  # A workflowr directory cannot be created within an existing Git repository if
  # git = TRUE & existing = FALSE.
  if (git && !existing) {
    check_for_existing_git_directory(directory)
  }

  # Require that user.name and user.email be set locally or globally
  if (git && is.null(user.name) && is.null(user.email)) {
    check_git_config(path = directory, "`wflow_start` with `git = TRUE`")
  }

  # Do not allow git = FALSE and disable_remote = TRUE
  if (!git && disable_remote) {
    stop("disable_remote is only available if git=TRUE")
  }

  # Do not allow disable_remote = TRUE on Windows
  if (disable_remote && .Platform$OS.type == "windows") {
    stop("disable_remote is not available on Windows")
  }

  do.call(wflow_start_, args = as.list(environment()))
}

wflow_start_ <- function() {}
formals(wflow_start_) <- formals(wflow_start)
body(wflow_start_) <- quote({

  # Create directory if it doesn't already exist
  if (!existing && !fs::dir_exists(directory) && !dry_run) {
    fs::dir_create(directory)
  }

  # Convert to absolute path. Needs to be run again after creating the directory
  # because symlinks can only be resolved for existing directories.
  directory <- absolute(directory)

  # Configure name of workflowr project
  if (is.null(name)) {
    name <- basename(directory)
  }

  # Get variables to interpolate into _workflowr.yml
  wflow_version <- as.character(utils::packageVersion("workflowr"))
  the_seed_to_set <- as.numeric(format(Sys.Date(), "%Y%m%d")) # YYYYMMDD

  # Add files ------------------------------------------------------------------

  # Use templates defined in R/infrastructure.R
  names(templates)[which(names(templates) == "Rproj")] <-
    glue::glue("{basename(directory)}.Rproj")
  names(templates) <- file.path(directory, names(templates))
  project_files <- names(templates)

  # Create subdirectories
  subdirs <- file.path(directory, c("analysis", "code", "data", "docs",
                                    "output"))
  if (!dry_run) {
    fs::dir_create(subdirs)
  }

  if (!dry_run) {
    for (fname in project_files) {
      if (!fs::file_exists(fname) || overwrite) {
        cat(glue::glue(templates[[fname]]), file = fname)
      }
    }
  }

  # Create .nojekyll file in docs/ directory
  nojekyll <- file.path(directory, "docs", ".nojekyll")
  project_files <- c(project_files, nojekyll)
  if (!dry_run) {
    fs::file_create(nojekyll)
  }

  # Configure, initialize, and commit ------------------------------------------

  # Configure RStudio
  rs_version <- check_rstudio_version()

  # Change working directory to workflowr project
  if (change_wd && !dry_run) {
    setwd(directory)
  }

  # Configure Git repository
  if (git && !dry_run) {
    if (!git2r::in_repository(directory)) {
      git2r::init(directory)
    }
    repo <- git2r::repository(directory)
    # Set local user.name and user.email
    if (!is.null(user.name) && !is.null(user.email)) {
      git2r::config(repo, user.name = user.name, user.email = user.email)
    }
    # Make the first workflowr commit
    git2r_add(repo, project_files, force = TRUE)
    status <- git2r::status(repo)
    if (length(status$staged) == 0) {
      warning("No new workflowr files were committed.")
    } else {
      commit <- git2r::commit(repo, message = "Start workflowr project.")
    }
    # Create pre-push hook to prevent pushing confidential projects
    if (disable_remote) {
      pre_push_file <- file.path(git2r::workdir(repo), ".git/hooks/pre-push")
      if (!fs::file_exists(pre_push_file) || overwrite) {
        # extras is a list defined in infrastructure.R
        cat(glue::glue(extras[["disable_remote"]]), file = pre_push_file)
      }
      if (!file_is_executable(pre_push_file)) {
        fs::file_chmod(pre_push_file, "a+x")
      }
    }
  }

  # Prepare output -------------------------------------------------------------

  o <- list(directory = directory,
            name = name,
            git = git,
            existing = existing,
            overwrite = overwrite,
            change_wd = change_wd,
            disable_remote = disable_remote,
            dry_run = dry_run,
            user.name = user.name,
            user.email = user.email,
            commit = if (exists("commit", inherits = FALSE)) commit else NULL)
  class(o) <- "wflow_start"

  return(o)
})

#' @export
print.wflow_start <- function(x, ...) {
  if (x$dry_run) {
    cat("wflow_start (\"dry run mode\"):\n")
    if (x$existing) {
      cat(sprintf("- Files will be added to existing directory %s\n", x$directory))
    } else {
      cat(sprintf("- New directory will be created at %s\n", x$directory))
    }
    cat(sprintf("- Project name will be \"%s\"\n", x$name))
    if (x$change_wd) {
      cat(sprintf("- Working directory will be changed to %s\n", x$directory))
    } else {
      cat(sprintf("- Working directory will continue to be %s\n", getwd()))
    }
    if (x$existing && git2r::in_repository(x$directory)) {
      repo <- git2r::repository(x$directory, discover = TRUE)
      cat(sprintf("- Git repo already present at %s\n", git2r::workdir(repo)))
    } else if (x$git) {
      cat(sprintf("- Git repo will be initiated at %s\n", x$directory))
    } else {
      cat(sprintf("- Git repo will not be initiated at %s\n", x$directory))
    }
    if (x$git) {
      cat("- Files will be committed with Git\n")
    } else {
      cat("- Files will not be committed with Git\n")
    }
    if (x$disable_remote) {
      cat("- Pushing to remote repository will be disabled\n")
    }
  } else {
    cat("wflow_start:\n")
    if (x$existing) {
      cat(sprintf("- Files added to existing directory %s\n", x$directory))
    } else {
      cat(sprintf("- New directory created at %s\n", x$directory))
    }
    cat(sprintf("- Project name is \"%s\"\n", x$name))
    if (x$change_wd) {
      cat(sprintf("- Working directory changed to %s\n", x$directory))
    } else {
      cat(sprintf("- Working directory continues to be %s\n", getwd()))
    }
    if (git2r::in_repository(x$directory)) {
      repo <- git2r::repository(x$directory, discover = TRUE)
      if (x$git && !x$existing) {
        cat(sprintf("- Git repo initiated at %s\n", git2r::workdir(repo)))
      } else if (x$git && x$existing && length(git2r::commits(repo)) == 1) {
        cat(sprintf("- Git repo initiated at %s\n", git2r::workdir(repo)))
      } else {
        cat(sprintf("- Git repo already present at %s\n", git2r::workdir(repo)))
      }
      if (x$git) {
        if (is.null(x$commit)) {
          cat("- Files were not committed\n")
        } else {
          cat(sprintf("- Files were committed in version %s\n",
                      shorten_sha(x$commit$sha)))
        }
      }
    } else {
      cat("- No Git repo\n")
    }
    if (x$disable_remote) {
      cat("- Pushing to remote repository is disabled\n")
    }
  }

  return(invisible(x))
}

check_rstudio_version <- function() {
  if (rstudioapi::isAvailable()) {
    rs_version <- rstudioapi::getVersion()
    if (rs_version < "1.0.0") {
      message(strwrap(paste("You can gain lots of new useful features",
                        "by updating to RStudio version 1.0 or greater.",
                        "You are running RStudio",
                        as.character(rs_version)), prefix = "\n"))
    }
  } else {
    rs_version <- NULL
  }
  return(rs_version)
}

check_for_existing_git_directory <- function(directory) {
  # In order to check if location is within an existing Git repository, first
  # must obtain the most upstream existing directory
  dir_existing <- obtain_existing_path(directory)
  if (git2r::in_repository(dir_existing)) {
    r <- git2r::repository(dir_existing, discover = TRUE)
    stop(call. = FALSE,
      "The directory where you have chosen to create a new workflowr directory",
      " is already within a Git repository. This is potentially dangerous. If",
      " you want to have a workflowr project created within this existing Git",
      " repository, re-run wflow_start with `git = FALSE` and then manually",
      " commit the new files. The following directory contains the existing .git",
      " directory: ", git2r::workdir(r))
  }
  return(invisible(NULL))
}

workflowr documentation built on Aug. 23, 2023, 1:09 a.m.