In mireia-bioinfo/pipelineNGS: Methods for Epigenetic Data Processing

pipelineNGS

pipelineNGS is a package for processing epigenomic high-throughput data, specifically histone mark ChIP-seq and ATAC-seq.

As this package is a wrapper for several command-line tools, you need to have these programs in your $PATH. If they are not in your $PATH, you can instead provide the path to each binary file using the appropriate arguments.
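
One way to confirm that the tools are reachable before running anything is base R's `Sys.which()`. The sketch below is not part of pipelineNGS, and the tool names in it (`bowtie2`, `samtools`, `fastqc`, `macs2`) are assumptions chosen for illustration; replace them with the programs your analysis actually calls.

```r
# Hypothetical tool list (an assumption): adjust to the programs you need.
tools <- c("bowtie2", "samtools", "fastqc", "macs2")

# Sys.which() returns the full path of each executable, or "" when the
# executable cannot be found in $PATH.
find_tools <- function(tools) {
  paths <- Sys.which(tools)
  data.frame(tool = tools,
             path = unname(paths),
             found = unname(nzchar(paths)),
             stringsAsFactors = FALSE)
}

status <- find_tools(tools)
print(status)

# Report anything that would need an explicit path argument instead.
missing_tools <- status$tool[!status$found]
if (length(missing_tools) > 0) {
  message("Not in $PATH: ", paste(missing_tools, collapse = ", "))
}
```

Any tool reported as missing is a candidate for the explicit path arguments mentioned above.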

Additionally, you will need to download reference files to perform the different steps in the pre-processing pipeline:

Reference genome indexed with Bowtie2. See more information here. You can download the fasta files for the different genome builds from the UCSC download site.
Chromosome sizes (gen_sizes). You can also download this file from UCSC: hg38 or hg19. Only necessary for ATAC-seq offset correction.
ENCODE Blacklist (blacklist). It can be downloaded from here.
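
Before starting a run, it can help to verify that the reference files are where you expect them. The following is a minimal sketch, not a pipelineNGS function: `has_bowtie2_index()` and `check_refs()` are hypothetical helpers, and the demo uses empty placeholder files in a temporary directory. It relies on the fact that a Bowtie2 index is a set of files sharing a prefix, with names ending in `.1.bt2` (or `.1.bt2l` for large indexes), among others.

```r
# TRUE if the first Bowtie2 index file exists for the given prefix.
# Small indexes end in .bt2, large indexes in .bt2l.
has_bowtie2_index <- function(prefix) {
  any(file.exists(paste0(prefix, c(".1.bt2", ".1.bt2l"))))
}

# Hypothetical helper: named logical vector, one entry per reference file.
check_refs <- function(index, blacklist, gen_sizes = NULL) {
  ok <- c(index = has_bowtie2_index(index),
          blacklist = file.exists(blacklist))
  if (!is.null(gen_sizes)) {
    ok["gen_sizes"] <- file.exists(gen_sizes)
  }
  ok
}

# Demo with empty placeholder files in a temporary directory.
ref_dir <- tempfile("refs")
dir.create(ref_dir)
invisible(file.create(file.path(ref_dir, "hg38.1.bt2")))
invisible(file.create(file.path(ref_dir, "blacklist.bed")))

refs_ok <- check_refs(index = file.path(ref_dir, "hg38"),
                      blacklist = file.path(ref_dir, "blacklist.bed"),
                      gen_sizes = file.path(ref_dir, "hg38.chrom.sizes"))
print(refs_ok)  # gen_sizes is FALSE here because that file was not created
```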

Open your R session, install the devtools package if it is not already on your machine, and type the following:

# Install pipelineNGS package
devtools::install_github("mireia-bioinfo/pipelineNGS")

# Load pipelineNGS package
library(pipelineNGS)
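
If you are not sure whether devtools is already installed, a guard like the one below avoids reinstalling it. This is a small sketch using base R's `requireNamespace()`; `needs_install()` is a hypothetical helper name, and the actual install call is left commented out so the snippet has no side effects.

```r
# TRUE when the package cannot be loaded from the current library paths.
needs_install <- function(pkg) {
  !requireNamespace(pkg, quietly = TRUE)
}

if (needs_install("devtools")) {
  message("devtools not found; install it with install.packages(\"devtools\")")
  # install.packages("devtools")  # uncomment to install from CRAN
}
```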

This package currently implements pipelines for analyzing the following experiments:

ATAC-seq (ATAC).
ChIP-seq for histone marks (CHIP).
CUTandTAG for histone marks (CT).
CUTandRUN for transcription factors (CR).
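
The codes in parentheses are the values passed as `seq_type` in the example further down. As a sketch of how a script could catch a mistyped code early, the lookup below maps each code to its description. `validate_seq_type()` is a hypothetical helper, not a pipelineNGS function, and the package may perform its own checks.

```r
# Codes and descriptions taken from the list above.
seq_types <- c(ATAC = "ATAC-seq",
               CHIP = "ChIP-seq for histone marks",
               CT   = "CUTandTAG for histone marks",
               CR   = "CUTandRUN for transcription factors")

# Hypothetical helper: stop with a readable message on an unknown code,
# otherwise return the matching description.
validate_seq_type <- function(seq_type) {
  valid <- is.character(seq_type) && length(seq_type) == 1 &&
    seq_type %in% names(seq_types)
  if (!valid) {
    stop("seq_type must be one of: ", paste(names(seq_types), collapse = ", "))
  }
  seq_types[[seq_type]]
}

validate_seq_type("CHIP")
```

The match is exact and case-sensitive, so `"chip"` is rejected rather than silently accepted.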

In the following figure you can see a description of the steps needed for the analysis of each type of experiment, with specific arguments (if any) used in the different steps.

Here is an example of how to run a ChIP-seq analysis with single-end data.

## General parameters
index <- "/vault/refs/indexes/hg38"
blacklist <- "/vault/refs/Blacklist/lists/hg38-blacklist.v2.bed"

## Example Single End ##
fastq_files <- c("fastq/sample1_L1.fastq.gz", "fastq/sample1_L2.fastq.gz",
                 "fastq/sample1_L3.fastq.gz", "fastq/sample2_L2.fastq.gz",
                 "fastq/sample3_L1.fastq.gz", "fastq/sample3_L3.fastq.gz")

## Convert to list to use as input for process_epigenome()
# Create one list element for each sample
# Sample name = file name up to the first underscore (avoids masking base::names)
sample_names <- sapply(strsplit(basename(fastq_files), "_"), function(x) x[1])
fastq_input <- split(fastq_files, sample_names)
fastq_input
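
To make the grouping step concrete, the self-contained sketch below rebuilds the same list and counts the files per sample. It uses `sub()` instead of `strsplit()` to extract the sample name, which is an equivalent alternative for these file names, and `sample_ids` is a name introduced here for illustration.

```r
fastq_files <- c("fastq/sample1_L1.fastq.gz", "fastq/sample1_L2.fastq.gz",
                 "fastq/sample1_L3.fastq.gz", "fastq/sample2_L2.fastq.gz",
                 "fastq/sample3_L1.fastq.gz", "fastq/sample3_L3.fastq.gz")

# Drop everything from the first underscore onwards to get the sample name.
sample_ids <- sub("_.*$", "", basename(fastq_files))

# split() returns a named list with one element per sample.
fastq_input <- split(fastq_files, sample_ids)

# Number of FASTQ files (lanes) grouped under each sample.
lengths(fastq_input)
# sample1 sample2 sample3
#       3       1       2
```

Each list element holds all lane files for one sample, and the element names are what the call below passes as `out_name`.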

## Using the files described in the previous chunk:
process_epigenome(fastq_files=fastq_input,
                  out_name=names(fastq_input),
                  run_fastqc=TRUE,
                  seq_type="CHIP",  # ChIP-seq for histone marks
                  type="SE",        # single-end reads (assumed counterpart of "PE")
                  index=index,
                  blacklist=blacklist,
                  cores=6)

mireia-bioinfo/pipelineNGS documentation built on Jan. 2, 2023, 11:18 a.m.
