Reproducible Bioinformatics Community

The aim of the Reproducible Bioinformatics project is to create easy-to-use bioinformatics workflows that fulfill the following rules (Sandve et al., PLoS Comput Biol. 2013):

1. For Every Result, Keep Track of How It Was Produced
2. Avoid Manual Data Manipulation Steps
3. Archive the Exact Versions of All External Programs Used
4. Version Control All Custom Scripts
5. Record All Intermediate Results, When Possible in Standardized Formats
6. For Analyses That Include Randomness, Note Underlying Random Seeds
7. Always Store Raw Data behind Plots
8. Generate Hierarchical Analysis Output, Allowing Layers of Increasing Detail to Be Inspected
9. Connect Textual Statements to Underlying Results
10. Provide Public Access to Scripts, Runs, and Results

Reproducible Bioinformatics is a non-profit and open-source project.

We are a group of bioinformaticians interested in simplifying the use of bioinformatics tools for biologists with or without scripting ability. At the same time, we aim to provide robust and reproducible workflows.

For this reason we have developed the docker4seq package.

Three workflows are currently available in the stable version of the docker4seq package (more information below):

- RNAseq workflow
- miRNAseq workflow
- ChIPseq workflow

Under development are:

- PDX workflow: variant calling in patient-derived xenografts (PDX) from RNAseq and EXOMEseq data
- Single cell analysis workflow
- Metagenomics workflow

All workflows are controlled by a set of R functions that are part of the docker4seq package. The algorithms they use are all encapsulated in Docker images, which are stored in the docker.io/repbioinfo repository.

More info on docker4seq: docker4seq web page

How to be part of the Reproducible Bioinformatics Project community

Any bioinformatician interested in embedding specific applications in the available workflows, or in developing a new workflow, is asked to embed the application(s) in a Docker image, save it in a public repository, and configure one or more R functions that can be used to interact with the Docker image. The module/workflow needs to fulfill at least the first six of Sandve's rules.

Steps required to submit a new application/workflow:

docker4seq

docker4seq is registered with RRID SCR_017006 at SciCrunch. docker4seq is part of ELIXIR bio.tools.

A collection of functions to execute computationally demanding NGS applications, e.g. read mapping and counting, wrapped in Docker containers. To install it you can use devtools:

install.packages("devtools")
library(devtools)
install_github("kendomaniac/docker4seq", ref="master")

Requirements

You need to have Docker installed on your machine; for more information see https://docs.docker.com/engine/installation/. The docker4seq package is expected to run on a 64-bit Linux machine with at least 4 cores. 32 GB of RAM are required only if mapping is done with STAR; if mapping is done with Salmon, 16 GB of RAM are sufficient. A scratch folder should be present, e.g. /data/scratch, and it should be writable by everybody:

chmod 777 /data/scratch
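
The requirements above can be checked with a short shell script before installing the package. The following is a minimal sketch and is not part of docker4seq: the function names (check, mapper_for_ram and the small predicates) and the SCRATCH variable are illustrative assumptions, while the thresholds are the ones stated above. It only reports what it finds and changes nothing on the system.

```shell
#!/bin/sh
# Preflight sketch for the docker4seq requirements (illustrative, not part of the package).
# SCRATCH defaults to the example path from the text and can be overridden.
SCRATCH="${SCRATCH:-/data/scratch}"

# Print which mapper the given amount of RAM (whole GB) can support.
mapper_for_ram() {
    if [ "$1" -ge 32 ]; then
        echo "STAR or Salmon"
    elif [ "$1" -ge 16 ]; then
        echo "Salmon only"
    else
        echo "insufficient"
    fi
}

# Run a command and report OK or FAIL for it without aborting the script.
check() {
    desc=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "OK: $desc"
    else
        echo "FAIL: $desc"
    fi
}

# Small predicates, one per requirement.
is_linux64() { [ "$(uname -s)" = "Linux" ] && [ "$(getconf LONG_BIT)" = "64" ]; }
has_four_cores() { [ "$(nproc 2>/dev/null || echo 0)" -ge 4 ]; }
scratch_writable() { [ -d "$SCRATCH" ] && [ -w "$SCRATCH" ]; }

check "64-bit Linux" is_linux64
check "at least 4 cores" has_four_cores
check "docker found on PATH" command -v docker
check "scratch folder $SCRATCH exists and is writable" scratch_writable

# MemTotal is given in kB; round up to whole GB because the kernel
# reports slightly less than the installed memory.
ram_gb=$(awk '/^MemTotal:/ {printf "%d", ($2 + 1048575) / 1048576}' /proc/meminfo 2>/dev/null || echo 0)
echo "RAM: ${ram_gb:-0} GB, mapping: $(mapper_for_ram "${ram_gb:-0}")"
```

A different scratch location can be tested by setting the variable when the script is run, e.g. SCRATCH=/tmp/scratch sh preflight.sh, where preflight.sh is whatever name the script was saved under.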

The functions in the docker4seq package require that the user has sudo privileges or belongs to the docker group. See the following document for more information: https://docs.docker.com/install/linux/linux-postinstall/
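
Whether the current user belongs to the docker group can be verified from the shell. This is a minimal sketch, assuming the group is named docker as in the Docker post-install guide; the has_group helper is hypothetical and not part of docker4seq. It does not test sudo rights, since that could prompt for a password.

```shell
#!/bin/sh
# True when the space-separated group list in $2 contains the group named $1.
# Padding both sides with spaces avoids partial matches such as "dockerx".
has_group() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

if [ "$(id -u)" -eq 0 ]; then
    echo "running as root"
elif has_group docker "$(id -Gn)"; then
    echo "user is in the docker group"
else
    echo "user is not in the docker group; see the Docker post-install guide"
fi
```

After a user has been added to the group, a new login session is normally needed before the membership shows up in the output of id.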

IMPORTANT: The first time docker4seq is installed, the downloadContainers function needs to be executed to download the containers required by docker4seq to the local repository.

More information on the functionalities of the package is available in the docker4seq/4SeqGUI vignette.

testSeqbox: The docker4seq library now includes the function testSeqbox, which checks whether the software required by docker4seq is properly installed. See ?testSeqbox for how to use it.

Workflow compliance with Sandve's rules:

Disclaimer:

docker4seq developers have no liability for any use of docker4seq functions, including without limitation, any loss of data, incorrect results, or any costs, liabilities, or damages that result from use of docker4seq.

4SeqGUI Project

4SeqGUI is the GUI that controls the docker4seq functionalities. It is the graphical interface used in the SeqBox project (see below).

Video tutorials for 4SeqGUI:

HowTo run a full RNAseq analysis

HowTo run a full miRNAseq analysis

HowTo run a full ChIPseq analysis

The SeqBox Project

Short-read sequencing technology has been in use for more than a decade. However, the analysis of RNAseq and ChIPseq data is still computationally demanding, and simple access to raw data does not guarantee reproducibility of results between laboratories. To address these two aspects, we developed SeqBox, a cheap, efficient and reproducible RNAseq/ChIPseq hardware/software solution based on the NUC6I7KYK mini-PC (an Intel consumer gaming computer with a fast processor and a high-performance SSD) and the Docker container platform. In SeqBox, the analysis of RNAseq and ChIPseq data is supported by a friendly GUI, which gives scientists with or without scripting experience access to fast and reproducible analyses.

More information on SeqBox characteristics and cost is available at www.seqbox.com



kendomaniac/docker4seq documentation built on July 15, 2024, 12:02 a.m.