meRIP_QC_report: Generate quality control report of a single MeRIP data site.


View source: R/meRIP_QC_report.R

Description

meRIP_QC_report generates a single quality control report for a SummarizedExperiment object from a MeRIP experiment.

Usage

meRIP_QC_report(se_M, txdb = NULL, save_title = "modX",
  save_dir = save_title, gtcoord = NULL, p_threshold = NULL,
  fdr_threshold = NULL, log2FC_cutoff = 0, min_num_mod = 10000,
  save_inference_result = TRUE, GC_idx_feature = NULL,
  DM_analysis = FALSE, DM_method = "DESeq2", expected_change = NULL,
  PCA_plot = FALSE, row_minimal_counts = 10, cqn = FALSE)

Arguments

se_M

A SummarizedExperiment object containing the counts of each modification site in each BAM file. Appropriate colData and rowRanges should be available. Specifically, colData should be a DataFrame object including the following columns:

SRR_RUN : a factor variable that uniquely identifies each column of the count matrix; it could be the ID of each BAM file.

IP_input : a factor variable indicating whether each column belongs to IP or input; the levels need to be c("input", "IP").
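The colData requirements above can be sketched in base R as follows. This is only an illustration: the run IDs are hypothetical, and a plain data.frame stands in for the DataFrame that colData actually holds.

```r
# Hypothetical run IDs; substitute the IDs of your own BAM files.
col_data <- data.frame(
  SRR_RUN  = factor(c("SRR001", "SRR002", "SRR003", "SRR004")),
  IP_input = factor(c("input", "input", "IP", "IP"),
                    levels = c("input", "IP"))  # level order matters
)

# Sanity checks mirroring the requirements stated above.
stopifnot(
  !anyDuplicated(col_data$SRR_RUN),                       # IDs are unique
  identical(levels(col_data$IP_input), c("input", "IP"))  # required levels
)
```

Before use, such a table would typically be converted with S4Vectors::DataFrame(col_data) and supplied as the colData of the SummarizedExperiment.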

txdb

A TxDb object corresponding to the rowRanges; it is either obtained from Bioconductor or converted from a user-provided GFF file.

save_title

A character string indicating the header of the reports generated.

save_dir

A character string indicating the directory in which to save the report; by default it takes the value of save_title (see Usage).

gtcoord

Optional: A variable containing the Guitar coordinates, as defined by the Guitar package. If not provided, the Guitar coordinates are generated automatically from txdb.

Cui X, Wei Z, Zhang L, Liu H, Sun L, Zhang S, Huang Y and Meng J (2016). "Guitar: an R/Bioconductor package for gene annotation guided transcriptomic analysis of RNA related genomic features." BioMed Research International.

p_threshold

A numeric value between 0 and 1 giving the p-value cutoff of the statistical inference; it is ignored if fdr_threshold is not NULL.
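The precedence between the two thresholds can be sketched as below. The helper select_significant is hypothetical and not part of meripQC. The Benjamini-Hochberg adjustment and the NA result when neither cutoff is given are assumptions of this sketch, not documented behaviour.

```r
# Hypothetical helper, not part of meripQC: flags significant sites given
# raw p-values and the two optional thresholds.
select_significant <- function(pvalue, p_threshold = NULL, fdr_threshold = NULL) {
  if (!is.null(fdr_threshold)) {
    # fdr_threshold takes precedence; BH adjustment is an assumption here.
    return(p.adjust(pvalue, method = "BH") < fdr_threshold)
  }
  if (!is.null(p_threshold)) {
    return(pvalue < p_threshold)
  }
  # Neither cutoff supplied: this sketch simply returns NA for every site.
  rep(NA, length(pvalue))
}

pvals <- c(0.001, 0.02, 0.04, 0.5)
by_p   <- select_significant(pvals, p_threshold = 0.05)
by_fdr <- select_significant(pvals, p_threshold = 0.05, fdr_threshold = 0.05)
```

In this toy input the third site passes the raw p-value cutoff but not the adjusted one, so the two calls disagree on it.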

fdr_threshold

A numeric value between 0 and 1 giving the FDR cutoff of the statistical inference.

By default, meRIP_QC_report calls DESeq2 and infers methylation under the design log2(Q) ~ intercept + I(IP). The Wald test is conducted on the coefficient estimate of the second term, I(IP).
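To make the design concrete, the following base R sketch builds the corresponding design matrix for four hypothetical samples. It only illustrates the intercept and the I(IP) indicator term; it does not reproduce what DESeq2 does internally.

```r
# Two input and two IP samples (hypothetical), with the required level order.
ip_input <- factor(c("input", "input", "IP", "IP"), levels = c("input", "IP"))

# Column 1 is the intercept; column 2 is the indicator I(IP),
# which is 0 for input samples and 1 for IP samples.
design <- model.matrix(~ ip_input)
colnames(design)
```

The coefficient on the second column is the log2 fold change of IP over input, which is the quantity the Wald test examines.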

log2FC_cutoff

The log2 fold change cutoff of the inference result; the default setting is 0.

min_num_mod

The minimal number of sites inferred in the Methylation and Control groups, i.e. sites where IP is larger than input and vice versa (for control); the default setting is 10000.

save_inference_result

Whether to save the result of the inference, default setting is TRUE.

GC_idx_feature

Optional: The GC content values for each feature (row) of the count matrix.

DM_analysis

Optional: Whether to conduct differential methylation analysis or not, default setting is FALSE.

DM_method

The statistical inference method used in the differential methylation procedure. The default setting is "DESeq2"; the alternative setting is "QNB", which uses the QNB package to compute the differential methylation statistics.

Liu, L., et al. (2017). "QNB: differential RNA methylation analysis for count-based small-sample sequencing data with a quad-negative binomial model." BMC Bioinformatics 18(1): 387.

expected_change

Optional: either "hyper" or "hypo", indicating the expected change of the treated condition over the input condition. This is useful when inferring the target sites of RNA modification writers or erasers from MeRIP-Seq data. Default setting is NULL.

PCA_plot

Whether to draw the PCA plot with DESeq2; the default setting is FALSE. The plot can be time-consuming because of the rlog transformation required in DESeq2.

row_minimal_counts

A non-negative integer; methylation sites with a total count (row sum) smaller than this threshold are excluded from the statistical inference. The default setting is 10.

The row filter is recommended when dealing with a sparse count matrix. It can improve the computational efficiency of the inference process and, occasionally, the statistical power of the tests.
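The row filter described above amounts to a row-sum comparison, sketched here in base R on a made-up count matrix (the site and run names are hypothetical).

```r
# Toy count matrix: 4 sites (rows) by 4 runs (columns).
counts <- matrix(c(0, 1, 2, 3,
                   5, 6, 7, 8,
                   0, 0, 0, 9,
                   4, 3, 2, 1),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("site", 1:4), paste0("run", 1:4)))

row_minimal_counts <- 10

# Keep sites whose total count is not smaller than the threshold.
keep <- rowSums(counts) >= row_minimal_counts
filtered <- counts[keep, , drop = FALSE]
```

Here the row sums are 6, 26, 9 and 10, so only site2 and site4 are retained.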

cqn

Indicates whether to normalize the GC content dependency of the methylation / differential methylation log2FC; the default is FALSE.

Select FALSE if you want to diagnose GC content batch effect.

Select TRUE if you want to send the inference result to downstream analysis.

Details

The function generates a quality control report from a well-formatted SummarizedExperiment object containing the read count matrix and the genomic locations of each row feature. In the current version, meRIP_QC_report supports the generation of the following reports.

1. A read count distribution plot.

2. A GC content diagnosis plot for single columns of the SummarizedExperiment.

3. A methylation profile report in tabular format based on the DESeq2 result.

4. A GC content diagnosis plot for methylation sites.

5. Guitar plot for methylation sites.

6. Exon length distribution for methylation sites.

Value

This function generates the quality control report files under the directory provided by save_dir.

See Also

many.

Examples

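# Assumes the meripQC package and its example objects se_mm10 and Gtcoord_mm10 are available.
library(meripQC)

meRIP_QC_report(se_M = se_mm10,
                txdb = TxDb.Mmusculus.UCSC.mm10.knownGene::TxDb.Mmusculus.UCSC.mm10.knownGene,
                gtcoord = Gtcoord_mm10,
                min_num_mod = 1000)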

#To do:
1. add QNB: done.
1.5 add 2 functions: Mod_count_denovo, Mod_count_annotation.
2. add cqn (adjust GC content) / probably add GC content adjustment for ChIP-seq (if possible).
3. add plot over-dispersion for both QNB and DESeq2.
4. change the save dir into paste, or record the original dir. (don't reset the directory at the end; if the run cannot complete because of an error midway, the user's directory will be messed up)
5. The output of the inference result should not be RDS; use a readable format such as csv.
6. If someone does not provide the guitar coordinate (gtcoord = NULL), generate the coordinate automatically from txdb.
7. Organize and merge into one html report.
8. Remove unnecessary export
9. A summarization purposed OLM on log2FC ~ GC_content_z + exon_length + stop_codon.

ZhenWei10/meripQC documentation built on May 13, 2019, 11:51 p.m.