TRESS_DMRfit: Differential m6A methylation analysis for MeRIP-seq data...

View source: R/TRESS_DMRfit.R

TRESS_DMRfitR Documentation

Differential m6A methylation analysis for MeRIP-seq data under general experimental design

Description

This function performs differential m6A analysis through the following three steps:

  • Divide the whole genome to obtain bin-level read counts: DivideBins

  • Call candidate differential m6A methylation regions (DMRs): CallCandidates

  • Model fitting on candidate DMRs based on Negative Binomial distribution: CallDMRs.paramEsti

Usage

TRESS_DMRfit(IP.file, Input.file, Path_To_AnnoSqlite,
             variable = NULL, model = NULL,
             InputDir, OutputDir = NA,
             experimentName = NA,
             binsize = 50,
             WhichThreshold = "fdr",
             pval.cutoff = 1e-5,
             fdr.cutoff = 0.05,
             lfc.cutoff = 0.4,
             IncludeIntron = TRUE,
             filetype = "bam",
             filterRegion = TRUE,
             shrkPhi = TRUE,
             addsuedo = FALSE)

Arguments

IP.file

A vector of characters containing the name of BAM files for all IP samples.

Input.file

A vector of characters containing the name of BAM files for all input control samples.

Path_To_AnnoSqlite

A character to specify the path to a "*.sqlite" file used for genome annotation.

variable

A dataframe containing condition information of all samples. Default is NULL.

model

A formula to specify which factor in "variable" will be included into design for model fitting. Default is NULL.

InputDir

A character to specify the input directory of all BA, files.

OutputDir

A character to specify an output directory to save bin-level and region-level data. Default is NA, which will not save any results.

experimentName

A character to specify the name of results if "OutputDir" is provided.

binsize

A numerical value to specify the size of window to bin the genome. Default value is 50.

WhichThreshold

A character to specify which criterion to select significant bins in order to obtain candidate DMRs from the first step. It takes among "pval", "fdr", "lfc", "pval_lfc" and "fdr_lfc". "pval": The inference is only based on P-values; "fdr": The inference is only based on FDR; "lfc": The inference is only based on log fold changes between normalized IP and normalized input read counts; "pval_lfc": The inference is based on both p-values and log fold changes; "fdr_lfc": The inference is based on both FDR and log fold changes. Default is "fdr".

pval.cutoff

A numerical value to specify a p-value cutoff in the selection of significant bins to form candidate DMRs. Default is 1e-5.

fdr.cutoff

A numerical value to specify a FDR cutoff in the selection of significant bins to form candidate DMRs. Default is 0.05.

lfc.cutoff

A numerical value to specify a cutoff of log fold change between normalized IP and input counts in the selection of significant bins to form candidate DMRs. Default is 0.4 for fold change of 1.5.

filetype

A character to specify the format of input data. Possible choices are: "bam", "bed" and "GRanges". Default is "bam".

IncludeIntron

A logical value indicating whether to include (TRUE) bins overlapping with intronic regions or not (False). Default is TRUE.

filterRegion

A logical value indicating whether to filter out candidate DMRs based on their marginal coefficient of variation (CV) in methylation ratios. If TRUE, then a candidate DMR with CV < 25% quantile would be filtered out. Default value is TRUE.

shrkPhi

A logical value to indicate whether conducting shringkage estimate for dispersion parameter. Default is TRUE.

addsuedo

A logical value to indicate whether or not adding a psuedo count 5 on raw read counts. Default is FALSE.

Details

For complete details on each step (especially step 3) in above "Description" section, please see the manual pages of respective functions.

Value

This function generates three sets of results: "allBins", "Candidates" and "DMRfit" returned respectively by function DivideBins, CallCandidates and CallDMRs.paramEsti. If "OutputDir" is not specified, only "DMRfit" will be returned. If "OutputDir" is specified, in addition to returning "DMRfit", "allBins" and "Candidates" will also be saved under the provided output directory.

Detailed structure of "DMRfit", "Candidates" and "allBins" can be found in the manual of respective functions.

Author(s)

Zhenxing Guo <zhenxing.guo@emory.edu>

References

Zhenxing Guo, Andrew M. Shafik, Peng Jin, Hao Wu. (2022) Differential RNA Methylation Analysis for MeRIP-seq Data under General Experimental Design. Bioinformatics, 38 (20), 4705-4712. https://academic.oup.com/bioinformatics/article/38/20/4705/6692302?login=true

See Also

DivideBins, CallCandidates, CallDMRs.paramEsti

Examples

## Not run: 
Input.file = c("input1.bam", "input2.bam",..., "inputN.bam")
IP.file = c("ip1.bam", "ip2.bam", ..., "ipN.bam")
InputDir = "/directory/to/BAMfile"
OutputDir = "/directory/to/output"
Path_sqlit = "/path/to/xxx.sqlite"
design = "YourDesign"
model = "YourModel"
DMR.fit = TRESS_DMRfit(IP.file = IP.file,
                       Input.file = Input.file,
                       Path_To_AnnoSqlite = Path_sqlit,
                       variable = design,
                       model = model,
                       WhichThreshold = "fdr",
                       InputDir = InputDir,
                       OutputDir = OutputDir,
                       experimentName = "example"
                       )
                       
## End(Not run)

ZhenxingGuo0015/TRESS documentation built on April 14, 2023, 4:21 p.m.