dmdeepm6A: Single base m6A and DM m6A dientification

Description Usage Arguments Details Value Author(s) References Examples

Description

This function is main function of DMDeepm6A which is used to predict m6A site and DM m6A site in single base resolution from MeRIP-Seq (m6A-Seq) data. The main features of the function includes:

1. Identification m6A site in single base resolution from exomePeak detected peak region.

2. Annotation the m6A site located gene name and transcript position, e.g. 5'UTR, CDS, 3'UTR or LncRNA.

3. Identify differential methylation m6A sites in single base resolution.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
dmdeepm6A(ip_bams,
          input_bams,
          output_filepath = NA,
          experiment_name = "DMDeepm6A_out",
          sample_conditions = NA,
          exomepeak_path = NA,
          gft_genome = NA,
          model_filepath = NA,
          default_genome = TRUE,
          tx_genes = NA,
          txdb = NA,
          BSgenome = NA,
          egSYMBOL = NA,
          minimal_gene_length = 0,
          sig_site_thresh = 0.8,
          diff_method = "exomepeak",
          diff_normalize = "TotalReads",
          diff_bin_width = 201,
          sig_diff_thresh = 0.01)

Arguments

ip_bams

A vector of characters denote the file path of IP samples in bam formate

input_bams

A vector of characters denote the file path of input samples in bam formate, should be paired with the IP samples

output_filepath

The file path where to output the result

experiment_name

The name of the experiment

sample_conditions

A vector of characters denote the condition of IP and Input samples. Should be the same length as the ip_bams and the values should be "untreated" or "treated"

exomepeak_path

A vector of characters denote the file paths where you save the pre-generated exomePeak result for each IP-Input sample paire

gft_genome

The gtf genome annotation, this is should be consistent with the BSgenome.

model_filepath

The CNN models used to do site prediction, usually do not need to input, only if you trained several CNN model by yourself

default_genome

A logical parameter denotes whether use the default genome, default is TRUE. The default genome is hg19 for human

tx_genes

The transcriptome used to do the analysis, usually do not need to input

txdb

The txdb famate genome annotation. You need to input it similar to "TxDb.Hsapiens.UCSC.hg19.knownGene" if DefaultGenome is FALSE.

BSgenome

The sequence of genome. You need to input it similar to "BSgenome.Hsapiens.UCSC.hg19" if DefaultGenome is FALSE.

egSYMBOL

The gene name annotation. You need to input it similar to "org.Hs.egSYMBOL" if DefaultGenome is FALSE.

minimal_gene_length

The minimal length of a trascript when make transcriptome from genome

sig_site_thresh

The probability treshold used to define a detected single base m6A site as significant

diff_method

The method used to do differential methylation test for candidate DmM sites, the defalt is the same as exomePeak, and QNB is an alternative which will improve the precision but sacrifice the sensitivity a lot.

diff_normalize

The normalization used to do QNB test for differential methylation site. The default is "DMSites" which means the readscount will be normalized with the readscount of all candidate DM sites. Alternatively, it can be set as "TotalReads" or "deseq" which means the normalization will be done based on total number of reads of each replicate or as discribed in DeSeq method.

diff_bin_width

The bin width used to extend candidate DmM sites to count the reads.

sig_diff_thresh

The thresh used to identify whether a candidate DmM site is significantly differential methylated.

Details

tx_genes is generated from the TXDB annotation genome you are using. It is saved as "psuedoGene.RData" under the output filepath. You can use it next time when you are using the same genome to save some time of making it from TXDB.

You need to input txdb, BSgenome and egSYMBOL if you do not want to use the default hg19 genome. Generally, you need to install this 3 genome information before you use them.

Value

By default, deepm6A will output results both

1. as BED/XLS files on disk (default: "Deepm6A_out") under the specified directory (default: current working directory).

2. returned the annotated DM m6A site identified in single base resolution as list object under the R environment.

Author(s)

Songyao Zhang

References

FDMDeep-m6A: Identification and prioritization of functional differential methylation genes.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
## example for default hg19 genome

## get input bam
ip_bam1 <- system.file("extdata", "treated_ip1.bam", package="DMDeepm6A")
ip_bam2 <- system.file("extdata", "treated_ip2.bam", package="DMDeepm6A")
ip_bam3 <- system.file("extdata", "untreated_ip1.bam", package="DMDeepm6A")
ip_bam4 <- system.file("extdata", "untreated_ip2.bam", package="DMDeepm6A")
input_bam1 <- system.file("extdata", "treated_input1.bam", package="DMDeepm6A")
input_bam2 <- system.file("extdata", "treated_input2.bam", package="DMDeepm6A")
input_bam3 <- system.file("extdata", "untreated_input1.bam", package="DMDeepm6A")
input_bam4 <- system.file("extdata", "untreated_input2.bam", package="DMDeepm6A")

ip_bams <- c(ip_bam1, ip_bam2, ip_bam3, ip_bam4)
input_bams <- c(input_bam1, input_bam2, input_bam3, input_bam4)

## get sample condition
sample_condition <- c("treated", "treated", "untreated", "untreated")

## get genome annotation, generally, you can leave this default if you use the default hg19 genome
## we use a toy gtf here to make this example run faster.
gft_genome <- system.file("extdata", "genes.gtf", package="DMDeepm6A")

## diff peak calling
re <- dmdeepm6A(ip_bams = ip_bams,
                input_bams = input_bams,
                sample_conditions = sample_condition,
                gft_genome = gft_genome)

## do only peak calling for several replicates

## get input bam
ip_bam1 <- system.file("extdata", "treated_ip1.bam", package="DMDeepm6A")
ip_bam2 <- system.file("extdata", "treated_ip2.bam", package="DMDeepm6A")
input_bam1 <- system.file("extdata", "treated_input1.bam", package="DMDeepm6A")
input_bam2 <- system.file("extdata", "treated_input2.bam", package="DMDeepm6A")

ip_bams <- c(ip_bam1, ip_bam2)
input_bams <- c(input_bam1, input_bam2)

## peak calling. sample_conditions should leave default (NA) or set all rep as "untreated".
re <- dmdeepm6A(ip_bams = ip_bams,
                input_bams = input_bams,
                gft_genome = gft_genome)

# An genome input formate example for rat rn5 genome
# not run

## get genome annotation
# library(TxDb.Rnorvegicus.UCSC.rn5.refGene)
# txdb <- TxDb.Rnorvegicus.UCSC.rn5.refGene

# library(BSgenome.Rnorvegicus.UCSC.rn5)
# BSgenome <- BSgenome.Rnorvegicus.UCSC.rn5

# library(org.Rn.eg.db)
# egSYMBOL <- org.Rn.egSYMBOL

# sigpeak <- deepm6A(ip_bam,
#                    input_bam,
#                    sample_conditions = sample_conditions,
#                    DefaultGenome = FALSE,
#                    txdb = txdb,
#                    BSgenome = BSgenome,
#                    egSYMBOL = egSYMBOL)

NWPU-903PR/DMDeepm6A documentation built on May 28, 2019, 8:58 p.m.