TRESS_peak: Detecting m6A methylation regions from Methylated RNA...

View source: R/TRESS_peak.R

TRESS_peakR Documentation

Detecting m6A methylation regions from Methylated RNA Immunoprecipitation Sequencing.

Description

This is a wrapper function to call m6A peaks transcriptome wide. When there are multiple biological replicates, it

  • Divides the whole genome to obtain bin-level read counts: DivideBins

  • Calls candidate m6A methylation regions: CallCandidates

  • Model fitting on candidate peaks based on Negative Binomial distribution: CallPeaks.multiRep

If there is only one replicate, it calls CallPeaks.oneRep to detect m6A methylation regions.

Usage

TRESS_peak(IP.file, Input.file, Path_To_AnnoSqlite,
           binsize = 50,
           WhichThreshold = "fdr_lfc",
           pval.cutoff0 = 1e-5,
           fdr.cutoff0 = 0.05,
           lfc.cutoff0 = 0.7,
           lowcount = 30,
           InputDir,
           OutputDir = NA,
           experiment_name,
           filetype = "bam",
           IncludeIntron = FALSE)

Arguments

IP.file

A vector of characters containing the name of BAM files for all IP samples.

Input.file

A vector of characters containing the name of BAM files for all input control samples.

Path_To_AnnoSqlite

A character to specify the path to a "*.sqlite" file used for genome annotation.

binsize

A numerical value to specify the size of window to bin the genome and get bin-level read counts. Default value is 50.

WhichThreshold

A character to specify which criterion to select significant bins in the first step, and also significant m6A regions in the second step. It takes among "pval", "fdr", "lfc", "pval_lfc" and "fdr_lfc". "pval": The inference is only based on P-values; "fdr": The inference is only based on FDR; "lfc": The inference is only based on log fold changes between normalized IP and normalized input read counts; "pval_lfc": The inference is based on both p-values and log fold changes; "fdr_lfc": The inference is based on both FDR and log fold changes. Default is "fdr_lfc".

pval.cutoff0

A numerical value to specify a cutoff for p-value. Default is 1e-5.

fdr.cutoff0

A numerical value to specify a cutoff for fdr. Default is 0.05.

lfc.cutoff0

A numerical value to specify a cutoff for log fold change between normalized IP and input read counts. Default is 0.7 for fold change of 2.

lowcount

An integer to filter out regions with total input counts < lowcount. Default is 30.

InputDir

A character to specify the input directory of all BAM files.

OutputDir

A character to specify an output directory save all results. Default is NA, which will not save any results.

experiment_name

A character to specify the name of results.

filetype

A character to specify the format of input data. Possible choices are: "bam", "bed" and "GRanges". Default is "bam".

IncludeIntron

A logical value indicating whether to include (TRUE) intronic regions or not (False). Default is FALSE.

Details

TRESS implements a two-step procedure to conduct peak calling for MeRIP-seq data with multiple biological replicates. In the first step, it quickly divide the whole genome into equal sized bins and loosely indentifies candidate peak regions using an ad hoc procedure. In the second step, it detects high confident peaks among candidate regions and ranks them with more rigorous statistical modeling based on an empirical Bayesian hierarchical model.

When there is only one biological replciate, candidate regions from the above two-step procedure will be output as the final list of peaks. P-values come from binomial test, which are further adjusted using Benjamini-Hochberg procedure.

Value

If directory OutputDir is specified, this function will output two sets of results. One is saved as ".rda", which contains all bin-level data (genome coordinates and read counts matrix). The other one is an ".xls" file, which contains information of all peaks. The columns of the peak excel files are:

chr

Chromosome number of each peak.

start

The start of genomic position of each peak.

end

The end of genomic position of each peak.

strand

The strand of each peak.

summit

The summit of each peak.

lg.fc

Log fold change between normalized IP and normalized input read counts for each peak.

pvals

P-values calculated based on the Wald-test.

p.adj

Adjusted p-values using Benjamini-Hochberg procedure.

If there are multiple replicates, the excel will also include following columns:

mu

Estimated methylation level of each peak.

mu.var

Estimated variance for methylation level of each peak

stats

Wald test statistics of each peak

shrkPhi

The shrinkage estimation of dispersion for mehtylation levels of each peak.

shrkTheta

The shrinkage estimation for scale parameter theta in the gamma distribution.

rSocre

A score defined by TRESS to rank each peak. The higher the score, the higher the rank would be.

Note, there are additional columns regardless of the number of replicates. Those columns contain read counts from respective samples and have names "*.bam".

Author(s)

Zhenxing Guo <zhenxing.guo@emory.edu>

References

Guo, Z., Shafik, A. M., Jin, P., Wu, Z., and Wu, H. (2021) Detecting m6A methylation regions from Methylated RNA Immunoprecipitation Sequencing. Bioinformatics, 1-7. https://doi-org.proxy.library.emory.edu/10.1093/bioinformatics/btab181

Examples

## Use BAM files in datasetTRES
# install_github("https://github.com/ZhenxingGuo0015/datasetTRES")
## Not run: 
library(datasetTRES)
IP.file = c("cb_ip_rep1_chr19.bam", "cb_ip_rep2_chr19.bam")
Input.file = c("cb_input_rep1_chr19.bam", "cb_input_rep2_chr19.bam")
BamDir = file.path(system.file(package = "datasetTRES"), "extdata/")
annoDir = file.path(
  system.file(package = "datasetTRES"),
  "extdata/mm9_chr19_knownGene.sqlite"
  )
OutDir = "/directory/to/output"
 TRESS_peak(IP.file = IP.file,
           Input.file = Input.file,
           Path_To_AnnoSqlite = annoDir,
           InputDir = BamDir,
           OutputDir = OutDir,
           experiment_name = "examplebyBam",
           filetype = "bam")
peaks = read.table(paste0(OutDir, "/", "c"),
                   sep = "\t", header = TRUE)

## End(Not run)

haowulab/TRESS documentation built on Aug. 27, 2022, 7:11 p.m.