suppressMessages({ suppressPackageStartupMessages({ library(BiocStyle) library(BiocQE) library(BiocPkgTools) library(igraph) library(dplyr) library(magrittr) }) })
The Bioconductor project has fostered development and use of software for analysis of state-of-the-art genome-scale assays for almost two decades. The project has successfully addressed two conflicting objectives:
Users require familiar and stable toolsets to perform analyses that may take years to complete.
Developers want to make use of cutting-edge innovations in biotechnology and computer science to build their tools.
To achieve these objectives, it was recognized at the outset of the project that package management should emulate the process by which the R language evolves. A "release branch" is defined that constitutes a stable collection. Changes to packages in the release branch are permitted to address bugs or documentation shortfalls; otherwise, code in the release branch is considered permanently locked. Changes to packages in the "devel branch" can introduce new features. A change to a package in the devel branch that alters that package's API in release must be staged: the "old" API components that are to be removed must remain available for one release in "deprecated" state, after which these components can be declared defunct and removed from functionality.
To accommodate the rapid pace of innovation in biotechnology, release branches of Bioconductor packages are produced every six months, transitioning from the current devel branch. For code from the devel branch to transition into release, formal tests must be passed.
The ensemble of R packages managed and distributed in the
Bioconductor project is a collection of git repositories.
The "devel" version of each package is the "master"
branch of the associated repository. The "release"
version of each package is a formally tagged
branch of the associated repository. The complete
history of code changes to each package is preserved
in the git log and branch tags of the form
RELEASE_X_Y
identify the various package releases.
In summary, each git repository for each Bioconductor package
contains the history of modifications to source code and
documentation, with the master branch providing the current devel image,
and RELEASE_X_Y
tagged branches providing past releases.
Bioconductor has three main package types: software, annotation, and experiment. 'Software' packages primarily support analysis and visualization, 'annotation' packages provide reference information about genomes, genes, and other concepts of biology, and 'experiment' packages provide curated data and documentation for exemplary experiments.
To obtain the list of package names, we use the git repo git.bioconductor.org/admin/manifest, which has three text files with package names for each of the three types.
We identified a small group of packages and took a snapshot of the associated repositories.
library(BiocQE) bioc_coreset()
In the following, we unzip the snapshot in a temporary folder.
td = tempdir() curd = getwd() setwd(td) file.copy(system.file("demo_srcs", package="BiocQE"), td, recursive=TRUE) setwd(curd) # for knitr
Bioconductor's guidelines for contributions indicate that
contributed packages must pass R CMD check
, a constantly
evolving procedure for assessing adequacy of package
documentation and risks of error in package code.
We'll use the r BiocStyle::CRANpkg("rcmdcheck")
package to capture information
on package compliance to basic standards. This runs R CMD check
and organizes the message stream from that process.
setwd(td) chk_parody = rcmdcheck::rcmdcheck("demo_srcs/parody", quiet=TRUE) names(chk_parody) setwd(curd) # for knitr
The basic outcomes of a 'passed' check process are listed here:
c(nerr=length(chk_parody$errors),nwarn= length(chk_parody$warnings), nnote= length(chk_parody$notes))
No error was detected, but a warning and several notes were reported. We will look at these in further detail below.
A goal of the core members of the Bioconductor project is the development of reusable infrastructure components that are employed by independent package contributors. Programming with common data structures and APIs simplifies development of chained workflows, and facilitates methods comparison and optimization. Reusable components can be analyzed for inefficiencies and improved to the benefit of the entire community of users and developers.
R packages declare interdependencies explicitly in the DESCRIPTION
file. An example is
setwd(td) dcf = read.dcf("demo_srcs/IRanges/DESCRIPTION") cat(dcf[,"Imports"]) cat("\n") setwd(curd)
The fields Depends
, Imports
, Suggests
and LinkingTo
define
the independently maintained packages that must be available for
r BiocStyle::Biocpkg("HCAExplorer")
to work effectively. Details on
these types of dependency are provided in
Writing R Extensions.
We can visualize the interdependencies of a small collection
using r BiocStyle::Biocpkg("BiocPkgTools")
.
suppressMessages({ library(BiocPkgTools) library(BiocQE) library(dplyr) library(magrittr) }) dd = buildPkgDependencyDataFrame() dfc = dd %>% filter(Package %in% bioc_coreset() & dependency %in% bioc_coreset()) gg = buildPkgDependencyIgraph(dfc) plot(gg)
Some of the packages in the small set are isolated.
setdiff(bioc_coreset(), union(dfc$Package, dfc$dependency))
The current build system has functioned well as the software collection has grown from a few hundred to more than 1800 packages. The key resources for developers are
BiocManager::install
BiocManager::valid
We propose to enhance these facilities by
The last aim will take considerable work and discussion is deferred.
Here is a screen shot of the browse_checks
app in BiocQE
We use pkgnet to generate network statistics and displays related to package and function dependencies.
TO DO:
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.