knitr::opts_chunk$set(comment = "#>", collapse = TRUE, screenshot.force = FALSE)

Introduction

in this vignette, you can learn the R functions usage of BioInstaller R package.

Key points:

Core R functions

Function set.biosoftwares.db() can be used to set a database file saving the information of installed software and database.

Next, you can use the function install.bioinfo(show.all.names = TRUE) get all soteware and database supported by BioInstaller.

library(BioInstaller)
set.biosoftwares.db(tempfile())
# Show all avaliable softwares/dependece in default inst/extdata/config/github/github.toml 
# and inst/extdata/config/nongithub/nongithub.toml
x <- install.bioinfo(show.all.names = TRUE)
suppressWarnings(
  DT::datatable(matrix(x, ncol=3), caption = sprintf("Items supported by BioInstaller"))
)

When setting the show.all.versions to TRUE, the versions of software and database will be returned.

# Fetching versions of softwares
 install.bioinfo(name = 'samtools', show.all.versions = TRUE)

Parameter verbose in install.bioinfo can be used to show the extra debug information.

Besides, if you only want to download the source code or raw database files to a specific directory, the parameter download.dir and download.onlyare required.

# Install 'demo' with debug infomation
download.dir <- sprintf('%s/demo_2', tempdir())
install.bioinfo('demo', download.dir = download.dir, verbose = TRUE)

# Download demo source code
download.dir <- sprintf('%s/demo_3', tempdir())
install.bioinfo('demo', download.dir = download.dir,
  download.only = TRUE, verbose = TRUE)

After finishing the download step, typically, you need to install the software by running several commands or an installation script. BioInstaller stores the related installation script or commands in the configuration files, and you can one-click to install the related software or database.

Besides, BioInstaller will pass several parameters to installation command or script. Just like destdir of ./configure --prefix={{destdir}}; make; make install for compiling C program.

It is optional to create bin directory in the destdir, and copy all the executable files in it. The bin in destdir can be set in the variable PATH for re-use in any other working directory.

If you want to download the source code in A directory and install it to B directory, you need to simultaneously set the parameters destdir and download.dirin function install.bioinfo.

# Set download.dir and destdir (destdir like /usr/local 
# including bin, lib, include and others), 
# destdir will work if install step {{destdir}} be used
download.dir <- sprintf('%s/demo_source', tempdir())
destdir <- sprintf('%s/demo', tempdir())
install.bioinfo('demo', download.dir = download.dir, destdir = destdir)

Saved informations after installation

It is important to save related information of installation. This step can help you to use the software or database in the other pipeline. So, after installed the software and database, BioInstaller will save the information, such as software name, version, path, update time, in the database file set by function set.biosoftwares.db(), which also defined by environment variable BIO_SOFWARES_DB_ACTIVE.

temp.db <- tempfile()
set.biosoftwares.db(temp.db)
is.biosoftwares.db.active(temp.db)

# Install 'demo' quite
download.dir <- sprintf('%s/demo_1', tempdir())
install.bioinfo('demo', download.dir = download.dir, verbose = FALSE)

Function get.info() can be used to get the saved information of installed software and databases.

When you want to delete the saved information, you can use function del.info.

config <- get.info('demo')
config

config <- configr::read.config(temp.db)
config$demo$comments <- 'This is a demo.'
params <- list(config.dat = config, file.path = temp.db)
do.call(configr::write.config, params)
get.info('demo')
del.info('demo')

Local mode

Local mode of BioInstaller was useful when you have downloaded the source code or database file, and have not run the install steps.

Tips:

download.dir <- sprintf('%s/github_demo_local', tempdir())
install.bioinfo('github_demo', download.dir = download.dir, download.only = TRUE, verbose = FALSE)
install.bioinfo('github_demo', local.source = download.dir)

download.dir <- sprintf('%s/demo_local', tempdir())
install.bioinfo('demo_2', download.dir = download.dir, download.only = TRUE, verbose = FALSE)
install.bioinfo('demo_2', download.dir = download.dir, local.source = sprintf('%s/GRCh37_MT_ensGene.txt.gz', download.dir), decompress = TRUE)

Download all versions

Function craw.all.version is the simplest method to download all available URL files in nongithub or database files.

download.dir <- sprintf('%s/crawl.all.versions', tempdir())
crawl.all.versions('demo', download.dir = download.dir)

Meta information of software and database

Function get.meta can be used to get all software and databases meta information, such as description and publication, supported by BioInstaller.

# Get all meta source files
meta_files <- get.meta.files()
meta_files

# Get all of meta informaton in BioInstaller
meta <- get.meta()
names(meta)
meta[1:4]
meta$db$cfg_meta
meta$db$item$atcircdb

# Examples of get.meta
db_cfg_meta <- get.meta(value = "cfg_meta", config = 'db')
db_cfg_meta

db_cfg_meta_parsed <- get.meta(value = 'cfg_meta', config = 'db', read.config.params = list(rcmd.parse = TRUE))
db_cfg_meta_parsed

db_cfg_meta <- get.meta(config = 'github', value = 'item')
db_cfg_meta$bwa

# Get databases meta file
db_meta_file <- get.meta(config = 'db_meta_file')
db_meta_file
db_meta_file <- meta_files[["db_meta_file"]]
db_meta_file

Download database

Database files are required for almost all bioinformatics data analysis pipeline, especially for sequence mapping and annotation steps. We hope BioInstaller can help you to access these resources easily in R, and you can use the function install.bioinfo directly download the supported databases.

# get all database name
library(stringr)
x <- install.bioinfo(show.all.names = T)
x <- x[str_detect(x, "^db_|reffa|bundle")]
suppressWarnings(
  DT::datatable(matrix(x, ncol=3), caption = sprintf("Database supported by BioInstaller (n=%s)", length(x)))
)

# all databases config 
db_cfg_meta <- get.meta(config = 'db', value = 'cfg_meta', 
                        read.config.params=list(rcmd.parse = TRUE))
cfg_dir <- db_cfg_meta$cfg_dir
cfg_dir
avaliable_cfg <- db_cfg_meta$avaliable_cfg
avaliable_cfg
sprintf("%s/%s", cfg_dir, avaliable_cfg)

Just like ANNOVAR and Bioconductor have done, we hope to establish a integrated and shared database pool in this tool.

# ANNOVAR
download.dir <- sprintf('%s/db_annovar', tempdir())
config.toml <- system.file("extdata", "config/db/db_annovar.toml", 
  package = "BioInstaller")
#install.bioinfo('db_ucsc_refgene', download.dir = download.dir, 
#  nongithub.cfg = config.toml, extra.list = list(buildver = "hg19"))

# db_main
download.dir <- sprintf('%s/db_main', tempdir())
config.toml <- system.file("extdata", "config/db/db_main.toml", 
  package = "BioInstaller")
install.bioinfo('db_diseaseenhancer', download.dir = download.dir, 
  nongithub.cfg = config.toml)

Write you own configuration file

If the software and database have not been supported by BioInstaller, you can write your own YAML or TOML format configuration file. A related vignette can help you to do this.

Session info

Here is the output of sessionInfo() on the system on which this document was compiled:

sessionInfo()


JhuangLab/BioInstaller documentation built on Jan. 28, 2023, 1:55 p.m.