m6A_express: Predicting context-specific m6A regulation of gene expression

Description Usage Arguments Details Value Author(s) Examples

Description

The package is designed to predict genes whose expressions are regulaed by m6A methylation based on a negative binomial regression model with pooling prior information across genes to explicitly estimate the sample-to-sample variability between biological replicates and improve the precision and robustness of regression coefficients, which is suitable for MeRIP-seq data in small sample scenario.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
m6Aexpress(# input files in BAM format
            express_data,
            treated_express_data=character(0),
            IP_BAM,
            INPUT_BAM,
            TREATED_IP_BAM=character(0),
            TREATED_INPUT_BAM=character(0),

            # annotation file in GTF format or TXDB format
            annot_type="hg19",
            species="human",
            GENE_ANNO_GTF=NULL,
            isGTFAnnotationFile=FALSE,
            TXDB=NA,

            # specific parameters
            GENOME = NA,
            UCSC_TABLE_NAME = "knownGene",
            isPairedEnd = FALSE,
            mode="DE-DM",
            CV_values = 0.3,
            num_sample_subgroup,
            
            # threshod cutoff
            pvalue=NA,
            FDR=0.05,
            diff_gene_pvalue=NA,
            diff_gene_fdr=0.05,
            diff_peak_pvalue=NA,
            diff_peak_fdr=0.05,

            # output direction
            OUTPUT_DIR= NA
            )

Arguments

express_data

a character vector giving names of input samples containing read mapping results in BAM format. The input files could be the conventional RNA-seq data or the INPUT aligned MeRIP-seq data, which are used to quantify as the gene expression.

treated_express_data

a character vector giving names of input treated samples containing read mapping results in BAM format. The treated samples could be the conventional RNA-seq data or the INPUT aligned MeRIP-seq data, which are used to quantify as the gene expression under treated condition. These files are only provided in differential expression context.

IP_BAM

a character vector giving names of IP samples in MeRIP-seq data, which contained read mapping results in BAM format.

INPUT_BAM

a character vector giving names of INPUT samples in MeRIP-seq data, which contained read mapping results in BAM format.

TREATED_IP_BAM

a character vector giving names of treated IP samples in MeRIP-seq data, which contained read mapping results in BAM format. These files are only provided in differential methylation context.

TREATED_INPUT_BAM

a character vector giving names of treated INPUT samples in MeRIP-seq data, which contained read mapping results in BAM format. These files are only provided in differential methylation context.

annot_type

a character string specifying an in-built annotation used to quantify each gene's reads count for expression data. It has four possible values including "mm10", "mm9", "hg38" and "hg19", corresponding to the NCBI RefSeq annotations for genomes ‘mm10’, ‘mm9’, ‘hg38’ and ‘hg19’, respectively. "hg19" by default.

species

a character string specifying the name of species, which could be "human", "mouse". This parameter is only appliable when GENE_ANNO_GTF is NULL.

GENE_ANNO_GTF

A character string giving name of a user-provided annotation file in GTF format. Note that GENE_ANNO_GTF will override annot_type if both provided.

isGTFAnnotationFile

logical indicating whether the annotation provided via the annot_type argument is in GTF format. FALSE by default. This option is only applicable when GENE_ANNO_GTF is not NULL.

mode

A string, such as "DE-DM","HVP", which specifies the predicated m6A-reg-exp gene in differential gene and differential methylation ("DE-DM") context or high variable peak ("HVP") context, default: mode="DE-DM".

CV_values

A decimal number, which is the cutoff for coefficient of variation (CV) used to select high variable peaks. The high variable peaks are defined as CV values are bigger than the cutoff: default CV_values=0.3.

num_sample_subgroup

A vector of integer, which specifies the number of samples for each sub-group. For example, num_sample_subgroup=c(2,2,3,1) means four sub-groups have two samples, two samples, three samples and one sample in each sub-group, respectively.

TXDB

An optional TxDb object for gene annotation information used in the analysis, default: NA. Please refere to "GenomicFeatures" package for more details about the "TxDb" object.

GENOME

A string,such as "hg19" or "mm10", which specifies the genome assembly used. If a gene annotation file is provided, this function will not use it directly; otherwise, this function will download the gene annotation from UCSC using the genome assembly specified here and the gene annotation table specified in "UCSC_TABLE_NAME".

UCSC_TABLE_NAME

A string, which gives the gene annotation used from UCSC, default: "knownGene". Please use function: supportedUCSCtables() to check available tables. Some tables may not be available for all genomes, and the "refGene" table does not work correctly due to multiple occuences of the same transcript on the same chromosome.

pvalue

a decimal number, which specifies the p-value cut-off in the predicting m6A regulated expression gene of m6A-express model.

FDR

a decimal number, which specifies the FDR cut-off in the predicting m6A regulated expression gene of m6A-express model, default: 0.05.

diff_gene_pvalue

a decimal number, which specifies the p-value cut-off to identify differential expression gene in DESeq2 R-package.

diff_gene_fdr

a decimal number, which specifies the fdr cut-off to identify differential expression gene in DESeq2 R-package, default: 0.05.

diff_peak_pvalue

a decimal number, which specifies the p-value cut-off to identify differential methylation peak sites in QNB R-package.

diff_peak_fdr

a decimal number, which specifies the fdr cut-off to identify differential methylation peak sites in QNB R-package, default: 0.05.

OUTPUT_DIR

A string, which specify the output directory, default: OUTPUT_DIR=NA, the output result will save in the current directory. Otherwise, m6A_express will output the significant m6A regulated expression gene under given cutoff, e.g. FDR<0.05 or pvalue<0.05.

Details

In order to analysis the regulation of m6A methylation on gene expression, we develop the R package m6A-express. The inputs of m6A-express are BAM files of RNA-seq data or INPUT BAM files of MeRIP-seq data, which are used to quantify gene expression, and IP BAM files with paired INPUT BAM files that are used to quantify methylation inensity in m6A-express model. For the origianl gene annotation file, such as GTF or TXDB format file, user could provide it or directly use the default in-built annotation file by given the function parameter species, annot_type, GENOME and UCSC_TABLE_NAME = "knownGene".

The m6Aexpress function is an all-in-one command that performs all the core functions of the m6A-express R-package. The core function of m6A_express achieved two modes to predict methylation regulated expression genes:

1. Predict m6A regulated expression genes under multiple conditions, such as in mutiple tissues context, and we call it basic mode.

2. Predict the differential methylation peak sites regulated differential expression genes in specific context and we call it DE-DM mode.

Value

In the basic mode, m6A-express will output results in specific directory including:

1. Peak calling result botain by exomePeak R-package, which are detailedly explained in exomePeak R-package.

2. The paired reads count and methylation intensity for each gene in tab-delimited text file.

3. The predicting m6A methylation regulated expression (m6A-reg-exp) genes under given cut-off in .XLS (tab-delimited) format file. The .XLS file includes five columns: the name of m6A-reg-exp gene, Beta0, Beta1, pvalue and FDR.

gene_name

The name of m6A methylation regulated expression gene in the given annotation file.

Beta0

Beta0 is a gene-specific intercept and models the baseline log gene expression.

Beta1

Beta1 captures the influence of m6A methylation on gene expression.

pvalue

pvalue of the predicted m6A-reg-exp genes.

FDR

FDR of the predicted m6A-reg-exp genes.

In the DE-DM mode, m6A-express will output results in specific directory including:

1. Consistent peak sites information between two conditions obtained by exomePeak R-package, which are detailedly explained in exomePeak R-package.

2. Differential expression (DE) gene as .XLS file on disk in the specified directory given DE cut-off (e.g. diff_gene_pvalue=0.05 or diff_gene_fdr=0.05).

3. The differential m6A methylation (DM) peak sites information saved as .XLS file in the specified directory given DM cut-off (e.g. diff_peak_pvalue=0.05 or diff_peak_fdr=0.05).

4. The paired reads count and methylation intensity for each differential expression gene with differential methylation peaks (DE-DM) in tab-delimited text file.

5. The predicting m6A methylation regulated expression (m6A-reg-exp) genes out from DE-DM genes under given cut-off (e.g. pvalue=0.05 or FDR=0.05) in .XLS (tab-delimited) format file. The .XLS file includes five columns: the name of m6A-reg-exp gene, Beta0, Beta1, pvalue and FDR.

In the HVP mode, m6A-express will output results in specific directory including:

1. Consistent peak sites information across multiple conditions obtained by exomePeak R-package, which are detailedly explained in exomePeak R-package.

2. The High varialbe peak sites information as .xlsx data format.

3. The paired reads count and methylation intensity for each differential expression gene with high variable peaks (HVP) in tab-delimited text file.

4. The predicting m6A methylation regulated expression (m6A-reg-exp) genes out from HVP genes under given cut-off (e.g. pvalue=0.05 or FDR=0.05) in .XLS (tab-delimited) format file. The .XLS file includes five columns: the name of m6A-reg-exp gene, Beta0, Beta1, pvalue and FDR.

Author(s)

Teng Zhang <tengzhang126@163.com>

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
## Not run: 
# Obtain BAM files
f1 <- system.file("extdata", "IP1.bam", package="m6Aexpress")
f2 <- system.file("extdata", "IP2.bam", package="m6Aexpress")
f3 <- system.file("extdata", "IP3.bam", package="m6Aexpress")
f4 <- system.file("extdata", "IP4.bam", package="m6Aexpress")
f5 <- system.file("extdata", "Input1.bam", package="m6Aexpress")
f6 <- system.file("extdata", "Input2.bam", package="m6Aexpress")
f7 <- system.file("extdata", "Input3.bam", package="m6Aexpress")
f8 <- system.file("extdata", "Input4.bam", package="m6Aexpress")
# Input the annotation file
gtf <- system.file("extdata", "hg19toy.gtf", package="m6Aexpress")
# In model="DE-DM"
IP_BAM <- c(f1,f2)
TREATED_IP_BAM <- c(f3,f4)
INPUT_BAM <- c(f5,f6)
TREATED_INPUT_BAM <- c(f7,f8)
## Predict m6A-reg-exp genes out from DE-DM genes.
m6A_reg_exp_gene <- m6Aexpress(express_data=INPUT_BAM, treated_express_data=TREATED_INPUT_BAM, IP_BAM=IP_BAM, TREATED_IP_BAM=TREATED_IP_BAM, INPUT_BAM=INPUT_BAM, TREATED_INPUT_BAM=TREATED_INPUT_BAM,annot_type="hg19", GENE_ANNO_GTF=gtf,isGTFAnnotationFile=TRUE, pvalue=0.05,mode="DE-DM")

## End(Not run)
## Not run: 
# In model="HVP"
IP_BAM <- c(f1,f2,f3,f4)
INPUT_BAM <- c(f5,f6,f7,f8)

# Predict m6A-reg-exp gene by m6A-express model in "HVP" context
m6A_reg_exp_gene <- m6Aexpress(express_data=INPUT_BAM, IP_BAM=IP_BAM, INPUT_BAM=INPUT_BAM, annot_type="hg19", GENE_ANNO_GTF=gtf,isGTFAnnotationFile=TRUE, pvalue=0.05,mode="HVP", CV_values = 0.3, num_sample_subgroup=c(2,2))

## End(Not run)

NWPU-903PR/m6Aexpress documentation built on Dec. 17, 2021, 5:18 a.m.