suppressMessages({
suppressPackageStartupMessages({
library(BiocStyle)
library(BiocBuildTools)
library(BiocPkgTools)
library(igraph)
library(dplyr)
library(magrittr)
})
})

Basic concepts

Release and devel collections

The Bioconductor project has fostered development and use of software for analysis of state-of-the-art genome-scale assays for almost two decades. The project has successfully addressed two conflicting objectives:

To achieve these objectives, it was recognized at the outset of the project that package management should emulate the process by which the R language evolves. A "release branch" constitutes a stable collection. Changes to packages in the release branch are permitted only to address bugs or documentation shortfalls; otherwise, code in the release branch is considered locked. New features are introduced through changes to packages in the "devel branch". A change in the devel branch that alters a package's released API must be staged: the "old" API components that are to be removed must remain available for one release in a "deprecated" state, after which they can be declared defunct and removed.
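The staging described above can be sketched with base R's `.Deprecated()` and `.Defunct()`. The function names `old_summary`, `old_summary_defunct`, and `new_summary` are hypothetical and stand in for a package's API; this is a minimal illustration, not code from any Bioconductor package.

```r
# Hypothetical replacement function introduced in devel.
new_summary <- function(x) {
  c(mean = mean(x), sd = stats::sd(x))
}

# Release N: the old entry point still works, but warns and forwards
# to the replacement.
old_summary <- function(x) {
  .Deprecated("new_summary")
  new_summary(x)
}

# Release N+1: the old entry point signals an error naming the replacement.
# (In a real package this body would replace that of old_summary.)
old_summary_defunct <- function(x) {
  .Defunct("new_summary")
}

# The deprecated call returns the same result, with a warning.
res <- suppressWarnings(old_summary(c(1, 2, 3)))
```

Users of the deprecated function thus get one full release cycle of warnings before the call starts to fail.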

To accommodate the rapid pace of innovation in biotechnology, release branches of Bioconductor packages are produced every six months, transitioning from the current devel branch. For code from the devel branch to transition into release, formal tests must be passed.

Implementation of release and devel collections in git

The ensemble of R packages managed and distributed in the Bioconductor project is a collection of git repositories. The "devel" version of each package is the "master" branch of the associated repository. The "release" version of each package is a formally tagged branch of the associated repository. The complete history of code changes to each package is preserved in the git log, and branch tags of the form RELEASE_X_Y identify the various package releases.

In summary, each git repository for each Bioconductor package contains the history of modifications to source code and documentation, with the master branch providing the current devel image, and RELEASE_X_Y tagged branches providing past releases.
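As a small illustration of the RELEASE_X_Y naming pattern, the helper below maps a Bioconductor version string to the corresponding branch name. The helper `release_branch` is hypothetical and is not part of BiocBuildTools.

```r
# Hypothetical helper: map a version string such as "3.18" to the
# branch name pattern RELEASE_X_Y described above.
release_branch <- function(version) {
  stopifnot(is.character(version), length(version) == 1L,
            grepl("^[0-9]+\\.[0-9]+$", version))
  paste0("RELEASE_", sub(".", "_", version, fixed = TRUE))
}

br <- release_branch("3.18")
br
```

In a local clone of a package repository, the resulting name could then be passed to `git checkout` to inspect that release's sources.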

The current package collections

Bioconductor has three main package types: software, annotation, and experiment. 'Software' packages primarily support analysis and visualization, 'annotation' packages provide reference information about genomes, genes, and other concepts of biology, and 'experiment' packages provide curated data and documentation for exemplary experiments.

To obtain the list of package names, we use the git repo git.bioconductor.org/admin/manifest, which has three text files with package names for each of the three types.
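A minimal sketch of extracting package names from one of these manifest files follows. It assumes that each entry is a line of the form `Package: <name>`, with blank lines and `#` comment lines in between; that layout is an assumption here, and the helper `parse_manifest` is hypothetical. The example runs on inline text rather than on a clone of the manifest repository.

```r
# Hypothetical parser: keep lines that start with "Package:" and
# strip the field name, leaving the package names.
parse_manifest <- function(lines) {
  hits <- grep("^Package:", lines, value = TRUE)
  trimws(sub("^Package:", "", hits))
}

# Inline stand-in for the contents of a manifest file (assumed layout).
example_lines <- c(
  "## comment line",
  "Package: parody",
  "",
  "Package: vsn"
)
pkgs <- parse_manifest(example_lines)
pkgs
```

With a local clone of the manifest repository, the same function could be applied to `readLines()` of each of the three files.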

A small collection for illustration

We identified a small group of packages and took a snapshot of the associated repositories.

library(BiocBuildTools)
bioc_coreset()
bioc_coreset(small=TRUE)

In the following, we unpack the snapshot into a temporary folder that holds one source directory per package.

td = tempfile()
dir.create(td)
unzip(system.file("demo_srcs/litclone.zip", package="BiocBuildTools"), exdir=td)
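To confirm what was unpacked, one can list the immediate subfolders that look like R package sources, that is, those containing a DESCRIPTION file. The helper `find_pkg_dirs` is hypothetical, and the example builds a toy folder rather than relying on the zip archive shipped with BiocBuildTools.

```r
# Hypothetical helper: return immediate subfolders of `root` that
# contain a DESCRIPTION file.
find_pkg_dirs <- function(root) {
  dirs <- list.dirs(root, full.names = TRUE, recursive = FALSE)
  dirs[file.exists(file.path(dirs, "DESCRIPTION"))]
}

# Toy folder: one package-like directory and one unrelated directory.
root <- tempfile()
dir.create(file.path(root, "pkgA"), recursive = TRUE)
dir.create(file.path(root, "notes"))
writeLines(c("Package: pkgA", "Version: 0.1"),
           file.path(root, "pkgA", "DESCRIPTION"))

found <- basename(find_pkg_dirs(root))
found
```

Applied to the folder `td` created above, `basename(find_pkg_dirs(td))` would give the names of the packages in the snapshot.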

Self-testing

Bioconductor's guidelines for contributions indicate that contributed packages must pass R CMD check, a constantly evolving procedure for assessing adequacy of package documentation and risks of error in package code.

We'll use the rcmdcheck package to capture information on package compliance with basic standards. This runs R CMD check and organizes the message stream from that process.

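# Run R CMD check on the unpacked 'parody' sources and capture the results
chk_parody = rcmdcheck::rcmdcheck(file.path(td, "parody"))
# List the components of the returned check object
names(chk_parody)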

The basic outcomes of a 'passed' check process are listed here:

c(nerr = length(chk_parody$errors),
  nwarn = length(chk_parody$warnings),
  nnote = length(chk_parody$notes))

No error was detected, but a warning and several notes were reported. We will look at these in further detail below.
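The same tally can be wrapped in a small function so that it can be applied to the check results of several packages. The helpers `check_counts` and `check_passed` are hypothetical, and treating a check as 'passed' when it reports no errors is an assumption made for this sketch. The mock object only mimics the `errors`, `warnings`, and `notes` components used above, with placeholder strings in place of real check output.

```r
# Hypothetical helper: count errors, warnings and notes in a check result.
check_counts <- function(chk) {
  c(nerr = length(chk$errors),
    nwarn = length(chk$warnings),
    nnote = length(chk$notes))
}

# Assumption for this sketch: a check 'passes' when it reports no errors.
check_passed <- function(chk) {
  length(chk$errors) == 0L
}

# Mock result with placeholder messages (not real R CMD check output).
mock_chk <- list(
  errors = character(0),
  warnings = "placeholder warning",
  notes = c("placeholder note 1", "placeholder note 2")
)

counts <- check_counts(mock_chk)
passed <- check_passed(mock_chk)
```

Given a named list of check results, `t(sapply(results, check_counts))` would produce one row of counts per package.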

Implicit interoperability testing

A goal of the core members of the Bioconductor project is the development of reusable infrastructure components that are employed by independent package contributors. Programming with common data structures and APIs simplifies development of chained workflows, and facilitates methods comparison and optimization. Reusable components can be analyzed for inefficiencies and improved to the benefit of the entire community of users and developers.

R packages declare interdependencies explicitly in the DESCRIPTION file. An example is

dcf = read.dcf(paste0(td, "/vsn/DESCRIPTION"))
cat(dcf[,"Imports"])
cat("\n")

The fields Depends, Imports, Suggests and LinkingTo define the independently maintained packages that must be available for vsn to work effectively. Details on these types of dependency are provided in Writing R Extensions.
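A minimal sketch of collecting all four dependency fields as vectors of package names is given below. The helper `dep_names` is hypothetical; it relies only on `read.dcf()` from base R and is demonstrated on a toy DESCRIPTION file written to a temporary location.

```r
# Hypothetical helper: read the dependency fields of a DESCRIPTION file
# and return a named list of package names, one element per field.
dep_names <- function(desc_file,
                      fields = c("Depends", "Imports", "Suggests", "LinkingTo")) {
  dcf <- read.dcf(desc_file, fields = fields)
  lapply(setNames(fields, fields), function(f) {
    val <- unname(dcf[1, f])
    if (is.na(val)) return(character(0))   # field absent from the file
    parts <- trimws(strsplit(val, ",", fixed = TRUE)[[1]])
    # Drop version constraints such as "(>= 3.5)".
    parts <- sub("[[:space:]]*\\(.*$", "", parts)
    parts[nzchar(parts)]
  })
}

# Toy DESCRIPTION file for illustration.
desc <- tempfile()
writeLines(c(
  "Package: toy",
  "Version: 0.1",
  "Depends: R (>= 3.5)",
  "Imports: methods, stats,",
  "    utils (>= 1.0)"
), desc)

deps <- dep_names(desc)
deps$Imports
```

Fields that are absent from the file, such as Suggests in the toy example, come back as empty character vectors.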

We can visualize the interdependencies of a small collection using BiocPkgTools.

library(BiocPkgTools)
library(BiocBuildTools)
library(dplyr)
library(magrittr)
dd = buildPkgDependencyDataFrame()
dfc = dd %>%
  filter(Package %in% bioc_coreset(small=FALSE) & dependency %in% bioc_coreset(small=FALSE))
gg = buildPkgDependencyIgraph(dfc)
plot(gg)

Some of the packages in the small set are isolated.

setdiff(bioc_coreset(), union(dfc$Package, dfc$dependency))
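The logic of the filter and the `setdiff()` call can be seen on a toy example in base R. The package names below are made up, and the data frame only mimics the `Package` and `dependency` columns used above.

```r
# Made-up core set and dependency table for illustration.
core <- c("pkgA", "pkgB", "pkgC", "pkgD")
deps <- data.frame(
  Package = c("pkgA", "pkgB", "pkgA"),
  dependency = c("pkgB", "pkgC", "externalPkg"),
  stringsAsFactors = FALSE
)

# Keep only edges whose two endpoints are both in the core set.
within_core <- deps[deps$Package %in% core & deps$dependency %in% core, ]

# Core packages that appear in no retained edge are isolated.
isolated <- setdiff(core, union(within_core$Package, within_core$dependency))
isolated
```

Here pkgD is isolated because it neither depends on nor is depended upon by another member of the core set; the edge from pkgA to externalPkg is dropped by the filter.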

Supporting developers with build system enhancements

The current build system has functioned well as the software collection has grown from a few hundred to more than 1800 packages. The key resources for developers are

We propose to enhance these facilities by

The last aim will take considerable work, and its discussion is deferred.

Improving check report delivery

Here is a screenshot of the browse_checks app in BiocBuildTools.

We use pkgnet to generate network statistics and displays related to package and function dependencies.

TO DO:



vjcitn/BiocBuildTools documentation built on March 15, 2024, 4:19 a.m.