| process_epigenome | R Documentation |
This function performs all necessary steps in the ChIP-seq processing pipeline.
process_epigenome(
fastq_files,
out_name = NULL,
seq_type = c("ATAC", "CHIP", "CT"),
type = "SE",
cores = 8,
path_fastqc = "FastQC/",
path_bam = "BAM/",
path_peaks = "Peaks/",
path_logs = "Logs/",
run_fastqc = TRUE,
index = "/vault/refs/indexes/hg38",
extra_bowtie2 = "",
remove = c("chrM", "chrUn", "_random", "_hap", "_gl", "EBVls"),
blacklist = "/vault/refs/hg38-blacklist.v2.bed",
type_peak = c("narrow", "broad"),
shift = c(TRUE, FALSE),
chunk = 1e+07,
gen_sizes = "/vault/refs/hg38.chromSizes.txt"
)
fastq_files |
Character string (single-end) or character vector of length 2 (paired-end) with the file names of the samples to be analysed. |
out_name |
Character vector, with the same length as |
seq_type |
Experiment type, one of "ATAC" (default), "CHIP" or "CT". |
type |
Sequence type, one of "SE" (single end) or "PE" (paired end). |
cores |
Number of threads to use for the analysis. |
path_fastqc |
Character indicating the output directory for the FastQC reports. |
path_bam |
Character indicating the output directory for the bam files. |
path_peaks |
Character indicating the output directory for the peak files. |
path_logs |
Character indicating the output directory for the logs. |
run_fastqc |
Logical indicating whether to run FastQC (TRUE) or not (FALSE). Default: TRUE. |
index |
Character indicating the location and basename for the Bowtie2 index. |
extra_bowtie2 |
Character containing additional arguments to be passed to bowtie2 alignment call. |
remove |
Character vector of patterns identifying the chromosomes to filter out. Any chromosome whose name contains a match for one of these patterns will be removed. |
blacklist |
Character indicating the file containing blacklist regions in bed format. Any reads overlapping these regions will be discarded. |
type_peak |
Character indicating the type of peak to be called with MACS2, either "narrow" or "broad". |
shift |
Logical indicating whether the reads should be shifted -100bp and extended to 200bp (TRUE) or not (FALSE, default). |
chunk |
Size of the chunk to load into memory for ATAC-seq read offset.
This argument is necessary only when |
gen_sizes |
Character string indicating the path where the file with
chromosome name and sizes can be found. This argument is necessary only when |
This function processes ATAC-seq or ChIP-seq data from FastQ files using the following pipeline:
Quality Control (FastQC).
Alignment to reference genome (Bowtie2).
Post-processing (Samtools), including removing duplicates, blacklisted regions and non-reference chromosomes.
(only for ATAC-seq) Offset correction (Samtools).
Peak calling (MACS2).
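As an illustration of the chromosome filter in the post-processing step, the sketch below shows how the default remove patterns could be matched against chromosome names. It assumes the patterns are matched anywhere in the name with base R's grepl(); the chromosome names are made up for the example, and the package may implement the filter differently.

```r
# Default patterns from the `remove` argument.
remove <- c("chrM", "chrUn", "_random", "_hap", "_gl", "EBVls")

# Hypothetical chromosome names, for illustration only.
chr_names <- c("chr1", "chr2", "chrX", "chrM",
               "chrUn_KI270302v1", "chr1_KI270706v1_random")

# Assumption: a name is dropped if it contains any of the patterns.
# The patterns are joined into one regular expression with "|".
pattern <- paste(remove, collapse = "|")
kept <- chr_names[!grepl(pattern, chr_names)]

print(kept)
```

Only names that contain none of the patterns remain in kept, so unplaced and alternative contigs are dropped alongside the mitochondrial chromosome.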
This function can process paired and single end FastQ files:
Single end files. The argument fastq_files should be a character vector with the name
of each file.
Paired end files. The argument fastq_files should be a list, where each element is a
character vector of size 2: the first element is the R1 file and the second is the R2 file.
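The two input shapes can be sketched as follows. All file paths are hypothetical placeholders, and the paired-end layout assumes each list element holds the R1 and R2 files of one sample, in that order.

```r
# Single end: one file name per sample (hypothetical paths).
fastq_se <- c("path/to/sample1.fastq.gz",
              "path/to/sample2.fastq.gz")

# Paired end: one list element per sample, R1 first and R2 second
# (assumed layout; hypothetical paths).
fastq_pe <- list(
  c("path/to/sample1_R1.fastq.gz", "path/to/sample1_R2.fastq.gz"),
  c("path/to/sample2_R1.fastq.gz", "path/to/sample2_R2.fastq.gz")
)

# Sanity check before starting a long run: every paired-end
# sample must provide exactly two files.
stopifnot(all(lengths(fastq_pe) == 2))
```

These objects would then be passed as fastq_files, together with type = "SE" or type = "PE" respectively.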
Creates the folders path_fastqc, path_bam, path_peaks and path_logs,
by default in your working directory, containing the output files from the different
analyses.
## Not run:
process_epigenome(fastq_files=c("path/to/file.fastq.gz", "path/to/file2.fastq.gz"),
seq_type="ATAC",
out_name=c("sample1", "sample2"),
type="SE",
cores=8)
## End(Not run)