process_epigenome | R Documentation |
This function performs all necessary steps in the ATAC-seq or ChIP-seq processing pipeline.
process_epigenome(
  fastq_files,
  out_name = NULL,
  seq_type = c("ATAC", "CHIP", "CT"),
  type = "SE",
  cores = 8,
  path_fastqc = "FastQC/",
  path_bam = "BAM/",
  path_peaks = "Peaks/",
  path_logs = "Logs/",
  run_fastqc = TRUE,
  index = "/vault/refs/indexes/hg38",
  extra_bowtie2 = "",
  remove = c("chrM", "chrUn", "_random", "_hap", "_gl", "EBVls"),
  blacklist = "/vault/refs/hg38-blacklist.v2.bed",
  type_peak = c("narrow", "broad"),
  shift = c(TRUE, FALSE),
  chunk = 1e+07,
  gen_sizes = "/vault/refs/hg38.chromSizes.txt"
)
fastq_files |
Character string (single-end) or character vector of length 2 (paired-end) with the file names of the samples to be analysed. |
out_name |
Character vector, with the same length as |
seq_type |
Experiment type, one of "ATAC" (default), "CHIP" or "CT". |
type |
Sequence type, one of "SE" (single end) or "PE" (paired end). |
cores |
Number of threads to use for the analysis. |
path_fastqc |
Character indicating the output directory for the FastQC reports. |
path_bam |
Character indicating the output directory for the bam files. |
path_peaks |
Character indicating the output directory for the peak files. |
path_logs |
Character indicating the output directory for the logs. |
run_fastqc |
Logical indicating whether to run FastQC (TRUE) or not (FALSE). Default: TRUE. |
index |
Character indicating the location and basename for the Bowtie2 index. |
extra_bowtie2 |
Character containing additional arguments to be passed to bowtie2 alignment call. |
remove |
Character vector of patterns identifying the chromosomes to filter out. Any chromosome whose name contains a match for one of these patterns will be removed. |
blacklist |
Character indicating the file containing blacklist regions in bed format. Any reads overlapping these regions will be discarded. |
type_peak |
Character indicating the type of peak to be called with MACS2, either "narrow" or "broad". |
shift |
Logical indicating whether the reads should be shifted -100bp and extended to 200bp (TRUE) or not (FALSE, default). |
chunk |
Size of the chunk to load into memory for ATAC-seq read offset.
This argument is necessary only when |
gen_sizes |
Character string indicating the path where the file with
chromosome name and sizes can be found. This argument is necessary only when |
This function processes ATAC-seq or ChIP-seq data from FastQ files using the following pipeline:
Quality Control (FastQC).
Alignment to reference genome (Bowtie2).
Post-processing (Samtools), including removing duplicates, blacklisted regions and non-reference chromosomes.
(only for ATAC-seq) Offset correction (Samtools).
Peak calling (MACS2).
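The removal of non-reference chromosomes in the post-processing step can be illustrated in base R. This is only a minimal sketch, assuming the patterns in remove are combined into a single regular expression and matched against chromosome names; the helper keep_chromosomes() and the example chromosome names are hypothetical and are not part of the package.

```r
# Hypothetical helper, for illustration only (not the package's implementation).
# Keeps the chromosome names that do not contain any of the `remove` patterns.
keep_chromosomes <- function(chr_names,
                             remove = c("chrM", "chrUn", "_random",
                                        "_hap", "_gl", "EBVls")) {
  # Join the patterns into one alternation, e.g. "chrM|chrUn|_random|..."
  pattern <- paste(remove, collapse = "|")
  chr_names[!grepl(pattern, chr_names)]
}

# Made-up chromosome names resembling an hg38 BAM header
chr_names <- c("chr1", "chr2", "chrX", "chrM",
               "chrUn_KI270302v1", "chr1_KI270706v1_random")

kept <- keep_chromosomes(chr_names)
print(kept)
```

Because the match is on substrings, a pattern such as "_random" drops every unplaced contig carrying that suffix without listing each one.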
This function can process paired and single end FastQ files:
Single end files. The argument fastq_files should be a character vector with the name of each file.
Paired end files. The argument fastq_files should be a list, where each element is a vector of size 1: the first element is the R1 file and the second element is the R2 file.
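The two input shapes described above might be built and checked as follows. This is a sketch under one reading of the description (a paired-end input is a list of two length-one character vectors); the file paths are placeholders and check_fastq_input() is a hypothetical helper, not a function exported by the package.

```r
# Single end: a character vector with one file name per sample (placeholder paths)
se_files <- c("path/to/sample1.fastq.gz", "path/to/sample2.fastq.gz")

# Paired end: a list whose first element is R1 and whose second element is R2
pe_files <- list("path/to/sample1_R1.fastq.gz", "path/to/sample1_R2.fastq.gz")

# Hypothetical validator: TRUE when the shape of fastq_files matches `type`
check_fastq_input <- function(fastq_files, type = c("SE", "PE")) {
  type <- match.arg(type)
  if (type == "SE") {
    is.character(fastq_files) && length(fastq_files) >= 1
  } else {
    is.list(fastq_files) && length(fastq_files) == 2 &&
      all(vapply(fastq_files,
                 function(x) is.character(x) && length(x) == 1,
                 logical(1)))
  }
}

print(check_fastq_input(se_files, "SE"))
print(check_fastq_input(pe_files, "PE"))
```

Checking the shape before launching the pipeline is a cheap way to catch a mismatch between fastq_files and type before any alignment starts.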
Creates the folders path_fastqc, path_bam, path_peaks and path_logs, by default in your working directory, containing the output files from the different analyses.
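The resulting folder layout can be reproduced with base R. The sketch below uses the default folder names from the usage section and a temporary directory as a stand-in for the working directory; how the function itself creates the folders is not documented here, so this only illustrates the expected result.

```r
# Default output folders, taken from the function's usage section
out_dirs <- c(path_fastqc = "FastQC/", path_bam = "BAM/",
              path_peaks = "Peaks/", path_logs = "Logs/")

# Assumption: a temporary directory stands in for the working directory
base_dir <- file.path(tempdir(), "epigenome_demo")

# Drop trailing slashes so the paths behave the same on every platform
full_paths <- file.path(base_dir, sub("/+$", "", out_dirs))

# Create each folder; existing folders are left untouched
for (d in full_paths) {
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}

created <- dir.exists(full_paths)
print(basename(full_paths))
```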
## Not run: 
process_epigenome(fastq_files=c("path/to/file.fastq.gz", "path/to/file2.fastq.gz"),
                  seq_type="ATAC",
                  out_name=c("sample1", "sample2"),
                  type="SE",
                  cores=8)
## End(Not run)