process_epigenome: Automated processing of ChIP-seq and ATAC-seq samples
In mireia-bioinfo/pipelineNGS: Methods for Epigenetic Data Processing

process_epigenome

R Documentation

Description

This function performs all necessary steps in the ChIP-seq and ATAC-seq processing pipeline, from FastQ files to called peaks.

Usage

process_epigenome(
  fastq_files,
  out_name = NULL,
  seq_type = c("ATAC", "CHIP", "CT"),
  type = "SE",
  cores = 8,
  path_fastqc = "FastQC/",
  path_bam = "BAM/",
  path_peaks = "Peaks/",
  path_logs = "Logs/",
  run_fastqc = TRUE,
  index = "/vault/refs/indexes/hg38",
  extra_bowtie2 = "",
  remove = c("chrM", "chrUn", "_random", "_hap", "_gl", "EBVls"),
  blacklist = "/vault/refs/hg38-blacklist.v2.bed",
  type_peak = c("narrow", "broad"),
  shift = c(TRUE, FALSE),
  chunk = 1e+07,
  gen_sizes = "/vault/refs/hg38.chromSizes.txt"
)

Arguments

`fastq_files`	File names of the samples to be analysed. For each sample, either a character string (single-end) or a character vector of length 2 with the R1 and R2 files (paired-end). See Details.
`out_name`	Character vector, with the same length as `fastq_files`, indicating the output filenames.
`seq_type`	Experiment type, one of "ATAC" (default), "CHIP" or "CT".
`type`	Sequence type, one of "SE" (single end) or "PE" (paired end).
`cores`	Number of threads to use for the analysis.
`path_fastqc`	Character indicating the output directory for the FastQC reports.
`path_bam`	Character indicating the output directory for the bam files.
`path_peaks`	Character indicating the output directory for the peak files.
`path_logs`	Character indicating the output directory for the logs.
`run_fastqc`	Logical indicating whether to run FastQC (TRUE) or not (FALSE). Default: TRUE.
`index`	Character indicating the location and basename for the Bowtie2 index.
`extra_bowtie2`	Character containing additional arguments to be passed to bowtie2 alignment call.
`remove`	Character vector with chromosome name patterns to filter out. Any chromosome whose name contains a match for one of these patterns will be removed.
`blacklist`	Character indicating the file containing blacklist regions in bed format. Any reads overlapping these regions will be discarded.
`type_peak`	Character indicating the type of peak to be called with MACS2, either "narrow" or "broad".
`shift`	Logical indicating whether the reads should be shifted -100bp and extended to 200bp (TRUE) or not (FALSE, default).
`chunk`	Size of the chunk to load into memory for ATAC-seq read offset. This argument is necessary only when `type="SE"`.
`gen_sizes`	Character string indicating the path where the file with chromosome name and sizes can be found. This argument is necessary only when `type="SE"`.
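
The matching rule described for `remove` can be illustrated with a short sketch. This is not the package's internal code: it assumes that "containing matches" means plain substring matching, and the chromosome names are made up for illustration.

```r
# Hypothetical chromosome names, for illustration only
chroms <- c("chr1", "chr2", "chrX", "chrM", "chrUn_KI270302v1",
            "chr1_KI270706v1_random")

# Default patterns from the `remove` argument
remove <- c("chrM", "chrUn", "_random", "_hap", "_gl", "EBVls")

# Flag a chromosome if its name contains any of the patterns
# (assumption: fixed substring matching, not regular expressions)
drop <- vapply(
  chroms,
  function(chr) any(vapply(remove, grepl, logical(1), x = chr, fixed = TRUE)),
  logical(1)
)

# Chromosomes that would survive the filter
kept <- chroms[!drop]
kept
```

Because matching is by substring, a short pattern such as "_random" removes every unplaced scaffold carrying that suffix, whichever chromosome it is attached to.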

Details

This function processes ATAC-seq or ChIP-seq data from FastQ files using the following pipeline:

1. Quality control (FastQC).
2. Alignment to the reference genome (Bowtie2).
3. Post-processing (Samtools), including removal of duplicates, blacklisted regions and non-reference chromosomes.
4. (Only for ATAC-seq) Offset correction (Samtools).
5. Peak calling (MACS2).
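
As a rough illustration of the step list above, the sketch below shows which steps would run for a given configuration and how `type_peak` and `shift` might translate into a MACS2 call. Both helpers (`plan_steps`, `macs2_command`) are hypothetical and not part of pipelineNGS. The MACS2 flags are standard `macs2 callpeak` options, but the genome size `-g hs` and the exact flag set are assumptions, and the calls made by `process_epigenome()` may differ.

```r
# Hypothetical helper: list the steps that would run, following the list above
plan_steps <- function(seq_type = c("ATAC", "CHIP", "CT"), run_fastqc = TRUE) {
  seq_type <- match.arg(seq_type)
  c(
    if (run_fastqc) "Quality control (FastQC)",
    "Alignment (Bowtie2)",
    "Post-processing (Samtools)",
    # Offset correction is described as ATAC-seq only
    if (seq_type == "ATAC") "Offset correction (Samtools)",
    "Peak calling (MACS2)"
  )
}

# Hypothetical helper: build (but do not run) a MACS2 command string
macs2_command <- function(bam, name, path_peaks = "Peaks/",
                          type_peak = c("narrow", "broad"), shift = FALSE) {
  type_peak <- match.arg(type_peak)
  args <- c("macs2 callpeak", "-t", bam, "-f BAM", "-g hs",
            "-n", name, "--outdir", path_peaks)
  if (type_peak == "broad") args <- c(args, "--broad")
  # Shift reads -100 bp and extend them to 200 bp, as described for `shift`
  if (shift) args <- c(args, "--nomodel", "--shift -100", "--extsize 200")
  paste(args, collapse = " ")
}

plan_steps("CHIP")
macs2_command("BAM/sample1.bam", "sample1", shift = TRUE)
```

Building the command as a string first makes it easy to inspect or log before handing it to `system()`.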

This function can process paired and single end FastQ files:

Single end files. The argument fastq_files should be a character vector with the name of each file.
Paired end files. The argument fastq_files should be a list with one element per sample, where each element is a character vector of length 2: the first value is the R1 file and the second is the R2 file.
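
The two input shapes can be sketched as follows. The file names are placeholders, and the paired-end layout assumes that each list element holds the R1 and R2 file names of one sample. The shape checks are an optional suggestion, not something the function is documented to perform.

```r
# Single-end: one file name per sample
fastq_se <- c("path/to/sampleA.fastq.gz", "path/to/sampleB.fastq.gz")

# Paired-end: one list element per sample, each holding c(R1, R2)
fastq_pe <- list(
  c("path/to/sampleA_R1.fastq.gz", "path/to/sampleA_R2.fastq.gz"),
  c("path/to/sampleB_R1.fastq.gz", "path/to/sampleB_R2.fastq.gz")
)

# One output name per sample in both cases
out_name <- c("sampleA", "sampleB")

# Basic shape checks before launching a long run
stopifnot(
  length(out_name) == length(fastq_se),
  length(out_name) == length(fastq_pe),
  all(lengths(fastq_pe) == 2)
)

## Not run:
# process_epigenome(fastq_files = fastq_pe, out_name = out_name,
#                   seq_type = "CHIP", type = "PE")
```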

Value

Creates the folders path_fastqc, path_bam, path_peaks and path_logs (by default in your working directory), containing the output files from the different analyses.
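
After a run, a quick way to confirm that the output folders were populated is to count the files in each one. This is a minimal sketch that assumes the default directory names from the Usage section, relative to the working directory. The output file names themselves are not documented here, so only counts are reported.

```r
# Default output directories from the Usage section
out_dirs <- c(fastqc = "FastQC/", bam = "BAM/", peaks = "Peaks/", logs = "Logs/")

# Count the files found in each directory (NA if it does not exist yet)
n_files <- vapply(
  out_dirs,
  function(d) {
    d <- sub("/$", "", d)  # drop trailing slash for portability
    if (dir.exists(d)) length(list.files(d)) else NA_integer_
  },
  integer(1)
)
n_files
```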

Examples

## Not run: 
process_epigenome(fastq_files=c("path/to/file.fastq.gz", "path/to/file2.fastq.gz"),
                  seq_type="ATAC",
                  out_name=c("sample1", "sample2"),
                  type="SE",
                  cores=8)

## End(Not run)

mireia-bioinfo/pipelineNGS documentation built on Jan. 2, 2023, 11:18 a.m.