subset_bam_call_peaks | R Documentation |
This functions does peak calling on each cell population in order to refine gene annotation for large bins. For instance, a 50000bp bins might contain the TSS of several genes, while in reality only one or two of these genes are overlapping the signal (peak). To do so, first in-silico cell sorting is applied based on previously defined clusters contained in the SingleCellExperiment. Taking BAM files of each sample as input, samtools pools then splits reads from each cell barcode into 1 BAM file per cell cluster (pseudo-bulk). Then MACS2 calls peaks on each cluster. The peaks are aggregated and merged if closer to a certain distance defined by user (10000bp). Then,
This function takes as input a SingleCellExperiment, that must contain a 'cell_cluster' column in it's colData, an output directory where to store temporary files, the list of BAM files corresponding to each sample and containing the cell barcode information as a tag (for instance tag CB:Z:xxx, XB:Z:xxx or else...) or single-cell BED files containing the raw reads and corresponding to the 'barcode' column metadata, the p.value used by MACS2 to distinguish significant peaks, the reference genome (either hg38 or mm10), the maximal merging distance in bp and a data.frame containing gene TSS genomic cooridnates of corresponding genome (if set to NULL, will automatically load geneTSS). The output is a SingleCellExperiment with GRanges object containing ranges of each merged peaks that falls within genomic bins of the SingleCellExperiment, saving the bin range as additional column (window_chr, window_start, window_end), as well as the closests genes and their distance relative to the peak. The peaks may be present in several rows if multiple genes are close / overlap to the peaks.
Note that the user must have MACS2 installed and available in the PATH. Users can open command terminal and type 'which macs2' to verify the availability of these programs. Will only work on unix operating system. Check operating system with 'print(.Platform)'.
subset_bam_call_peaks(
scExp,
odir,
input,
format = "BAM",
p.value = 0.05,
ref = "hg38",
peak_distance_to_merge = 10000,
geneTSS_annotation = NULL,
run_coverage = FALSE,
progress = NULL
)
scExp |
A SingleCellExperiment object |
odir |
Output directory where to write temporary files and each cluster's BAM file |
input |
A character vector of file paths to each sample's BAM file, containing cell barcode information as tags. BAM files can be paired-end or single-end. |
format |
Format of the input data, either "BAM" or "scBED". |
p.value |
a p-value to use for MACS2 to determine significant peaks. (0.05) |
ref |
A reference genome, either hg38, mm10 or ce11. ('hg38') |
peak_distance_to_merge |
Maximal distance to merge peaks together after peak calling , in bp. (10000) |
geneTSS_annotation |
A data.frame annotation of genes TSS. If NULL will automatically load Gencode list of genes fro specified reference genome. |
run_coverage |
Create coverage tracks (.bw) for each cluster ? |
progress |
A shiny Progress instance to display progress bar. |
The BED files of the peaks called for each clusters, as well as the merged peaks are written in the output directory.
A SingleCellExperiment with refinded annotation
## Not run:
data("scExp")
subset_bam_call_peaks(scExp, "path/to/out/", list("sample1" =
"path/to/BAM/sample1.bam", "sample2" = "path/to/BAM/sample2.bam"),
p.value = 0.05, ref = "hg38", peak_distance_to_merge = 10000,
geneTSS_annotation = NULL)
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.